NeMo Guardrails Library API Server Endpoints Reference#
This reference documents the REST API endpoints provided by the NeMo Guardrails library API server. The server exposes an OpenAI-compatible Chat Completions API with additional guardrails-specific extensions.
Starting the Server#
Start the server using the CLI:
nemoguardrails server --port 8000 --config /path/to/config
For more information about server options, see Run the NeMo Guardrails Server.
Endpoints Overview#
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Generate a guarded chat completion |
| GET | `/v1/models` | List available models from the configured provider |
| GET | `/v1/rails/configs` | List available guardrails configurations |
| GET | `/v1/challenges` | Get red teaming challenges |
| GET | `/` | Chat UI (if enabled) or health status |
POST /v1/chat/completions#
Generate a chat completion with guardrails applied.
The request and response formats are compatible with the OpenAI Chat Completions API,
with guardrails-specific fields nested under a guardrails object.
Request Body#
{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
],
"stream": false,
"temperature": 0.7,
"max_tokens": 256,
"guardrails": {
"config_id": "my-config"
}
}
OpenAI Fields#
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The LLM model to use for chat completion (e.g., `meta/llama-3.1-8b-instruct`). |
| `messages` | array of objects | No | The list of messages in the current conversation. Each message has `role` and `content` fields. |
| `stream` | boolean | No | If `true`, the response is streamed as Server-Sent Events. Default: `false`. |
| `max_tokens` | integer | No | The maximum number of tokens to generate. |
| `temperature` | float | No | Sampling temperature (0-2). Higher values make output more random. |
| `top_p` | float | No | Top-p (nucleus) sampling parameter. |
| `stop` | string or array | No | Stop sequence(s) where the model stops generating. |
| `presence_penalty` | float | No | Presence penalty parameter (-2.0 to 2.0). |
| `frequency_penalty` | float | No | Frequency penalty parameter (-2.0 to 2.0). |
Guardrails Fields#
Guardrails-specific fields are nested under the guardrails object in the request body.
| Field | Type | Required | Description |
|---|---|---|---|
| `config_id` | string | No | The ID of the guardrails configuration to use. If not set, uses the server's default configuration. Mutually exclusive with `config_ids`. |
| `config_ids` | array of strings | No | List of configuration IDs to combine. Mutually exclusive with `config_id`. |
| `thread_id` | string | No | ID of an existing thread for conversation persistence. Must be 16-255 characters. |
| `context` | object | No | Additional context data to add to the conversation. |
| `options` | object | No | Additional options for controlling the generation. See Generation Options. |
| `state` | object | No | A state object to continue a previous interaction. Must contain an `events` or `state` key; pass an empty object `{}` to start a new conversation. |
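As a sketch of how these fields fit together, the helper below assembles a request body and enforces the documented constraints (the `config_id`/`config_ids` mutual exclusion and the `thread_id` length). The helper name and the `topic_control` configuration ID are illustrative, not part of the library.

```python
# Sketch: assemble a POST /v1/chat/completions body from the fields above.
# build_chat_request is an illustrative helper, not part of the library.

def build_chat_request(messages, config_id=None, config_ids=None,
                       thread_id=None, stream=False):
    """Return a request body dict enforcing the documented constraints."""
    if config_id is not None and config_ids is not None:
        # config_id and config_ids are mutually exclusive.
        raise ValueError("config_id and config_ids are mutually exclusive")

    guardrails = {}
    if config_id is not None:
        guardrails["config_id"] = config_id
    if config_ids is not None:
        guardrails["config_ids"] = config_ids
    if thread_id is not None:
        # thread_id must be 16-255 characters.
        if not 16 <= len(thread_id) <= 255:
            raise ValueError("thread_id must be 16-255 characters")
        guardrails["thread_id"] = thread_id

    return {
        "model": "meta/llama-3.1-8b-instruct",
        "messages": messages,
        "stream": stream,
        "guardrails": guardrails,
    }

# Combine two configurations (the second ID is hypothetical).
body = build_chat_request(
    [{"role": "user", "content": "Hello"}],
    config_ids=["content_safety", "topic_control"],
)
```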
Authentication Headers#
The server supports per-request API key injection via custom HTTP headers. This allows different requests to use different API keys for the configured LLM models, without modifying the server configuration or environment variables.
Header Format#
For each model in your guardrails configuration, you can provide a custom API key using a header in the format:
X-{model-name}-Authorization: your-api-key-here
The header name is case-insensitive. The model name must match the `model` field in your configuration, with spaces and special characters preserved as-is.
Examples#
Single Model Configuration
If your configuration uses gpt-3.5-turbo as the main model:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Gpt-3.5-Turbo-Authorization: sk-custom-key-123" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Hello"}],
"guardrails": {"config_id": "my-config"}
}'
Multi-Model Configuration
If your configuration uses multiple models (e.g., gpt-3.5-turbo for main generation and gpt-4 for self-check), you can provide separate keys for each:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Gpt-3.5-Turbo-Authorization: sk-main-key-789" \
-H "X-Gpt-4-Authorization: sk-selfcheck-key-012" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Hello"}],
"guardrails": {"config_id": "my-config"}
}'
Behavior#
- Headers are matched to models by comparing the model name (case-insensitive).
- If a header is provided for a model, it overrides the API key configured in the guardrails configuration or environment variables for that specific request only.
- If no header is provided for a model, the default API key from the configuration is used.
- API keys are automatically reset to their original values after each request completes, preventing leakage between requests.
- This works for both streaming and non-streaming requests.
Use Cases#
This feature is particularly useful for:
- **Multi-tenant applications**: Different users can use their own API keys without server reconfiguration.
- **Cost tracking**: Route different requests to different API accounts for billing purposes.
- **A/B testing**: Test different API keys or accounts within the same deployment.
- **Development**: Test with personal API keys without modifying shared configurations.
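The header scheme above can also be exercised from Python. The sketch below builds the per-model headers; `auth_header_for` is an illustrative helper (not a library function), and the keys are the placeholder values from the curl examples.

```python
# Sketch: build the per-model X-{model-name}-Authorization headers
# described above. auth_header_for is an illustrative helper.

def auth_header_for(model: str, api_key: str) -> dict:
    """Build the X-{model-name}-Authorization header for one model."""
    # Header matching on the server is case-insensitive, so the model
    # name can be used as-is.
    return {f"X-{model}-Authorization": api_key}

headers = {"Content-Type": "application/json"}
headers.update(auth_header_for("gpt-3.5-turbo", "sk-main-key-789"))
headers.update(auth_header_for("gpt-4", "sk-selfcheck-key-012"))

# Hypothetical usage with the OpenAI SDK (requires a running server):
# client.chat.completions.create(..., extra_headers=headers)
```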
Generation Options#
The guardrails.options field controls which rails are applied and what information is returned.
{
"guardrails": {
"config_id": "my-config",
"options": {
"rails": {
"input": true,
"output": true,
"dialog": true,
"retrieval": true
},
"llm_params": {
"temperature": 0.7
},
"llm_output": false,
"output_vars": ["relevant_chunks"],
"log": {
"activated_rails": true,
"llm_calls": false
}
}
}
}
Rails Options#
| Field | Type | Description |
|---|---|---|
| `input` | boolean or array | Enable input rails. Set to `true` to run all configured input rails, or to an array of rail names to run only those rails. |
| `output` | boolean or array | Enable output rails. Set to `true` to run all configured output rails, or to an array of rail names to run only those rails. |
| `dialog` | boolean | Enable dialog rails. Default: `true`. |
| `retrieval` | boolean or array | Enable retrieval rails. Set to `true` to run all configured retrieval rails, or to an array of rail names to run only those rails. |
| `tool_input` | boolean or array | Enable tool input rails. Default: `true`. |
| `tool_output` | boolean or array | Enable tool output rails. Default: `true`. |
Other Options#
| Field | Type | Description |
|---|---|---|
| `llm_params` | object | Additional parameters to pass to the LLM call (e.g., `temperature`). |
| `llm_output` | boolean | Whether to include custom LLM output in the response. Default: `false`. |
| `output_vars` | boolean or array | Context variables to return. Set to `true` to return all context variables, or to an array of variable names to return only those. |
Log Options#
| Field | Type | Description |
|---|---|---|
| `activated_rails` | boolean | Include information about which rails were activated. Default: `false`. |
| `llm_calls` | boolean | Include details about all LLM calls (prompts, completions, token usage). Default: `false`. |
| `internal_events` | boolean | Include the array of internal generated events. Default: `false`. |
| `colang_history` | boolean | Include conversation history in Colang format. Default: `false`. |
Response Body#
The response follows the standard OpenAI ChatCompletion format with an additional guardrails object.
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1709424000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "I'm doing well, thank you!"
},
"finish_reason": "stop"
}
],
"guardrails": {
"config_id": "content_safety",
"llm_output": null,
"output_data": null,
"log": null,
"state": null
}
}
Response Fields#
| Field | Type | Description |
|---|---|---|
| `id` | string | A unique identifier for the chat completion (e.g., `chatcmpl-abc123`). |
| `object` | string | Always `chat.completion`. |
| `created` | integer | Unix timestamp of when the completion was created. |
| `model` | string | The model used for the completion. |
| `choices` | array | Array of completion choices. Each choice contains `index`, `message`, and `finish_reason`. |
| `guardrails` | object | Guardrails-specific output data. See below. |
Guardrails Response Fields#
| Field | Type | Description |
|---|---|---|
| `config_id` | string | The guardrails configuration ID associated with this response. |
| `state` | object | State object for continuing the conversation in future requests. |
| `llm_output` | object | Additional LLM output data. Only included if `options.llm_output` is `true`. |
| `output_data` | object | Values for requested output variables. Only included if `options.output_vars` is set. |
| `log` | object | Logging information based on the requested `options.log` settings. |
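A sketch of reading these fields from a parsed response body follows. The sample dict matches the response shape above; the inner structure of the `log` entries and the rail name are illustrative.

```python
# Sketch: extract the assistant message and guardrails-specific fields
# from a parsed /v1/chat/completions response body.

response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Paris."},
         "finish_reason": "stop"}
    ],
    "guardrails": {
        "config_id": "content_safety",
        "state": None,
        "llm_output": None,
        "output_data": {"relevant_chunks": []},
        # Illustrative log shape; present only when requested via options.log.
        "log": {"activated_rails": [{"name": "self check input"}]},
    },
}

answer = response["choices"][0]["message"]["content"]
gr = response["guardrails"]
# log may be null when logging was not requested, hence the `or {}`.
rails_fired = [r["name"]
               for r in (gr.get("log") or {}).get("activated_rails", [])]
```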
Examples#
Basic Request#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
],
"guardrails": {
"config_id": "content_safety"
}
}'
Request with Streaming#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Tell me a story"}
],
"stream": true,
"guardrails": {
"config_id": "content_safety"
}
}'
Streaming responses use Server-Sent Events (SSE). Each chunk is a chat.completion.chunk object:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"meta/llama-3.1-8b-instruct","choices":[{"delta":{"content":"Once"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"meta/llama-3.1-8b-instruct","choices":[{"delta":{"content":" upon"},"index":0,"finish_reason":null}]}
data: [DONE]
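A minimal client-side parser for this SSE stream might look like the following sketch; `parse_sse_line` is an illustrative function, not a library API. It also recognizes the `error` events the server can emit mid-stream.

```python
import json

# Sketch: classify one SSE line from the streaming endpoint. Each data
# line is "[DONE]", an error object, or a chat.completion.chunk whose
# delta may carry a piece of content.

def parse_sse_line(line: str):
    """Return ("done", None), ("error", obj), ("delta", text), or None."""
    if not line.startswith("data: "):
        return None  # blank keep-alive lines and SSE comments
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return ("done", None)
    obj = json.loads(payload)
    if "error" in obj:
        return ("error", obj["error"])
    delta = obj["choices"][0].get("delta", {})
    return ("delta", delta.get("content", ""))

chunk = ('data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",'
         '"created":1700000000,"model":"meta/llama-3.1-8b-instruct",'
         '"choices":[{"delta":{"content":"Once"},"index":0,'
         '"finish_reason":null}]}')
```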
Request with Specific Rails#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Hello"}
],
"guardrails": {
"config_id": "content_safety",
"options": {
"rails": {
"input": ["check jailbreak"],
"output": false,
"dialog": false
}
}
}
}'
Request with Logging#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Hello"}
],
"guardrails": {
"config_id": "content_safety",
"options": {
"log": {
"activated_rails": true,
"llm_calls": true
}
}
}
}'
Request with OpenAI Python SDK#
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="not-used"
)
response = client.chat.completions.create(
model="meta/llama-3.1-8b-instruct",
messages=[
{"role": "user", "content": "What is the capital of France?"}
],
extra_body={
"guardrails": {
"config_id": "content_safety"
}
}
)
print(response.choices[0].message.content)
GET /v1/models#
List the available LLM models from the configured upstream provider.
This endpoint proxies the request to the provider specified by MAIN_MODEL_ENGINE and returns the results in the standard OpenAI models list format.
For a guide on configuring providers, see List Available Models.
Request#
No request body or query parameters. The Authorization header, if present, is forwarded to the upstream provider.
curl http://localhost:8000/v1/models
Response Body#
{
"data": [
{
"id": "meta/llama-3.1-8b-instruct",
"object": "model",
"created": 1700000000,
"owned_by": "system"
},
{
"id": "meta/llama-3.1-70b-instruct",
"object": "model",
"created": 1700000000,
"owned_by": "system"
}
]
}
Response Fields#
| Field | Type | Description |
|---|---|---|
| `data` | array | List of model objects. |
| `id` | string | The model identifier (e.g., `meta/llama-3.1-8b-instruct`). |
| `object` | string | Always `model`. |
| `created` | integer | Unix timestamp of the model's creation. |
| `owned_by` | string | The organization that owns the model. |
Error Responses#
| Status | Description |
|---|---|
| 502 | The upstream provider is unreachable or returned an error. |
| 4xx | Proxied from the upstream provider (e.g., 401 for an invalid API key). |
Note
If the engine is not in the built-in provider table and MAIN_MODEL_BASE_URL is not set, the endpoint returns an empty model list instead of an error.
Example with OpenAI Python SDK#
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
for model in client.models.list().data:
print(model.id)
GET /v1/rails/configs#
List all available guardrails configurations.
Returns an array of configuration objects.
[
{"id": "content_safety"},
{"id": "customer-service"},
{"id": "content-moderation"}
]
curl http://localhost:8000/v1/rails/configs
GET /v1/challenges#
Get the list of available red teaming challenges.
Returns an array of challenge objects. The structure depends on the registered challenges.
[
{
"id": "jailbreak-1",
"description": "Attempt to bypass safety guardrails",
"category": "jailbreak"
}
]
curl http://localhost:8000/v1/challenges
Note
Challenges must be registered via a challenges.json file in the configuration directory or programmatically using register_challenges().
GET /#
Root endpoint that serves the Chat UI or returns a health status.
Chat UI Disabled: When the Chat UI is disabled (--disable-chat-ui), returns a health status:
{"status": "ok"}
Chat UI Enabled: When the Chat UI is enabled (default), serves the interactive chat interface.
Error Responses#
Errors from the chat completions endpoint are returned as ChatCompletion objects with the error message in the assistant’s content, or as HTTP exceptions.
Configuration Error#
When the guardrails configuration cannot be loaded:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Could not load the ['my-config'] guardrails configuration. An internal error has occurred."
},
"finish_reason": "stop"
}
]
}
Missing Configuration#
When no config_id is provided and no default is set, the server returns an HTTP 422 error:
{
"detail": "No guardrails config_id provided and server has no default configuration"
}
Thread ID Validation Error#
When thread_id is less than 16 characters:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The `thread_id` must have a minimum length of 16 characters."
},
"finish_reason": "stop"
}
],
"guardrails": {
"config_id": "my-config"
}
}
Invalid State Format#
When the guardrails.state object does not contain an events or state key, the server returns an HTTP 422 error:
{
"detail": "Invalid state format: state must contain 'events' or 'state' key. Use an empty dict {} to start a new conversation."
}
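As a sketch of the state round-trip implied by this error, the helper below starts a conversation with an empty dict and passes the returned state back on the next turn. `next_request` and the sample state value are illustrative.

```python
# Sketch: carry the guardrails state object across turns. Per the error
# above, a non-empty state must contain an "events" or "state" key, and
# an empty dict {} starts a new conversation. The server returns the
# updated state under response["guardrails"]["state"].

def next_request(messages, state, config_id="my-config"):
    if state and not ("events" in state or "state" in state):
        raise ValueError("state must contain 'events' or 'state' key")
    return {
        "model": "meta/llama-3.1-8b-instruct",
        "messages": messages,
        "guardrails": {"config_id": config_id, "state": state},
    }

# Turn 1: start a new conversation with an empty state dict.
turn1 = next_request([{"role": "user", "content": "Hello"}], {})

# Turn 2: pass back the state returned in turn 1's response
# (the value here is illustrative).
returned_state = {"events": []}
turn2 = next_request([{"role": "user", "content": "And then?"}], returned_state)
```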
Internal Server Error#
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Internal server error"
},
"finish_reason": "stop"
}
],
"guardrails": {
"config_id": "my-config"
}
}
Streaming Errors#
During streaming, errors are sent as SSE events with an error object:
data: {"error": {"message": "...", "type": "...", "param": "...", "code": "..."}}
Environment Variables#
The server supports the following environment variables:
| Variable | Description |
|---|---|
| … | Default guardrails configuration ID when none is specified in the request. |
| `MAIN_MODEL_ENGINE` | The LLM engine to use when the `/v1/models` endpoint queries the upstream provider. |
| `MAIN_MODEL_BASE_URL` | Base URL for the LLM provider when the engine is not in the built-in provider table. |
| `OPENAI_API_KEY` | API key for OpenAI models. |
| `NVIDIA_API_KEY` | API key for NVIDIA-hosted models on build.nvidia.com. |
| `ANTHROPIC_API_KEY` | API key for Anthropic models. Used when the configured engine is Anthropic. |
| `AZURE_OPENAI_API_KEY` | API key for Azure OpenAI. Used when the configured engine is Azure OpenAI. |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI resource endpoint URL (e.g., `https://<resource>.openai.azure.com`). |
| … | Azure OpenAI API version. |
| `COHERE_API_KEY` | API key for Cohere models. Used when the configured engine is Cohere. |
| … | Override the Cohere API base URL. |
| … | Comma-separated list of allowed CORS origins. |