NeMo Guardrails Library API Server Endpoints Reference#
This reference documents the REST API endpoints provided by the NeMo Guardrails library API server. The server exposes an OpenAI-compatible Chat Completions API with additional guardrails-specific extensions.
Starting the Server#
Start the server using the CLI:
nemoguardrails server --port 8000 --config /path/to/config
For more information about server options, see Run the NeMo Guardrails Server.
Endpoints Overview#
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Generate a guarded chat completion |
| GET | `/v1/models` | List available models from the configured provider |
| GET | `/v1/rails/configs` | List available guardrails configurations |
| GET | `/v1/challenges` | Get red teaming challenges |
| GET | `/` | Chat UI (if enabled) or health status |
POST /v1/chat/completions#
Generate a chat completion with guardrails applied.
The request and response formats are compatible with the OpenAI Chat Completions API,
with guardrails-specific fields nested under a guardrails object.
Request Body#
{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
],
"stream": false,
"temperature": 0.7,
"max_tokens": 256,
"guardrails": {
"config_id": "my-config"
}
}
OpenAI Fields#
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The LLM model to use for chat completion (e.g., `meta/llama-3.1-8b-instruct`). |
| `messages` | array of objects | No | The list of messages in the current conversation. Each message has `role` and `content` fields. |
| `stream` | boolean | No | If `true`, the response is streamed as Server-Sent Events. Default: `false`. |
| `max_tokens` | integer | No | The maximum number of tokens to generate. |
| `temperature` | float | No | Sampling temperature (0-2). Higher values make output more random. |
| `top_p` | float | No | Top-p (nucleus) sampling parameter. |
| `stop` | string or array | No | Stop sequence(s) where the model stops generating. |
| `presence_penalty` | float | No | Presence penalty parameter (-2.0 to 2.0). |
| `frequency_penalty` | float | No | Frequency penalty parameter (-2.0 to 2.0). |
Guardrails Fields#
Guardrails-specific fields are nested under the guardrails object in the request body.
| Field | Type | Required | Description |
|---|---|---|---|
| `config_id` | string | No | The ID of the guardrails configuration to use. If not set, uses the server's default configuration. Mutually exclusive with `config_ids`. |
| `config_ids` | array of strings | No | List of configuration IDs to combine. Mutually exclusive with `config_id`. |
| `thread_id` | string | No | ID of an existing thread for conversation persistence. Must be 16-255 characters. |
| `context` | object | No | Additional context data to add to the conversation. |
| `options` | object | No | Additional options for controlling the generation. See Generation Options. |
| `state` | object | No | A state object to continue a previous interaction. Must contain an `events` or `state` key; pass an empty object `{}` to start a new conversation. |
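As a sketch of how these fields fit together, the helper below assembles a request body and enforces the documented constraints (the `config_id`/`config_ids` mutual exclusion and the `thread_id` length). The helper name and the `topic_control` configuration ID are illustrative, not part of the library.

```python
# Sketch: assemble a POST /v1/chat/completions body from the fields above.
# build_chat_request is an illustrative helper, not part of the library.

def build_chat_request(messages, config_id=None, config_ids=None,
                       thread_id=None, stream=False):
    """Return a request body dict enforcing the documented constraints."""
    if config_id is not None and config_ids is not None:
        # config_id and config_ids are mutually exclusive.
        raise ValueError("config_id and config_ids are mutually exclusive")

    guardrails = {}
    if config_id is not None:
        guardrails["config_id"] = config_id
    if config_ids is not None:
        guardrails["config_ids"] = config_ids
    if thread_id is not None:
        # thread_id must be 16-255 characters.
        if not 16 <= len(thread_id) <= 255:
            raise ValueError("thread_id must be 16-255 characters")
        guardrails["thread_id"] = thread_id

    return {
        "model": "meta/llama-3.1-8b-instruct",
        "messages": messages,
        "stream": stream,
        "guardrails": guardrails,
    }

# Combine two configurations (the second ID is hypothetical).
body = build_chat_request(
    [{"role": "user", "content": "Hello"}],
    config_ids=["content_safety", "topic_control"],
)
```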
Authentication Headers#
The server supports per-request API key injection via custom HTTP headers. This allows different requests to use different API keys for the configured LLM models, without modifying the server configuration or environment variables.
Header Format#
For each model in your guardrails configuration, you can provide a custom API key using a header in the format:
X-{model-name}-Authorization: your-api-key-here
The header name is case-insensitive. The model name must match the `model` field in your configuration, with spaces and special characters preserved as-is.
Examples#
Single Model Configuration
If your configuration uses gpt-3.5-turbo as the main model:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Gpt-3.5-Turbo-Authorization: sk-custom-key-123" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Hello"}],
"guardrails": {"config_id": "my-config"}
}'
Multi-Model Configuration
If your configuration uses multiple models (e.g., gpt-3.5-turbo for main generation and gpt-4 for self-check), you can provide separate keys for each:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Gpt-3.5-Turbo-Authorization: sk-main-key-789" \
-H "X-Gpt-4-Authorization: sk-selfcheck-key-012" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Hello"}],
"guardrails": {"config_id": "my-config"}
}'
Behavior#
- Headers are matched to models by comparing the model name (case-insensitive).
- If a header is provided for a model, it overrides the API key configured in the guardrails configuration or environment variables for that specific request only.
- If no header is provided for a model, the default API key from the configuration is used.
- API keys are automatically reset to their original values after each request completes, preventing leakage between requests.
- This works for both streaming and non-streaming requests.
Use Cases#
This feature is particularly useful for:
- **Multi-tenant applications**: Different users can use their own API keys without server reconfiguration.
- **Cost tracking**: Route different requests to different API accounts for billing purposes.
- **A/B testing**: Test different API keys or accounts within the same deployment.
- **Development**: Test with personal API keys without modifying shared configurations.
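The header scheme above can also be exercised from Python. The sketch below builds the per-model headers; `auth_header_for` is an illustrative helper (not a library function), and the keys are the placeholder values from the curl examples.

```python
# Sketch: build the per-model X-{model-name}-Authorization headers
# described above. auth_header_for is an illustrative helper.

def auth_header_for(model: str, api_key: str) -> dict:
    """Build the X-{model-name}-Authorization header for one model."""
    # Header matching on the server is case-insensitive, so the model
    # name can be used as-is.
    return {f"X-{model}-Authorization": api_key}

headers = {"Content-Type": "application/json"}
headers.update(auth_header_for("gpt-3.5-turbo", "sk-main-key-789"))
headers.update(auth_header_for("gpt-4", "sk-selfcheck-key-012"))

# Hypothetical usage with the OpenAI SDK (requires a running server):
# client.chat.completions.create(..., extra_headers=headers)
```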
Generation Options#
The guardrails.options field controls which rails are applied and what information is returned.
{
"guardrails": {
"config_id": "my-config",
"options": {
"rails": {
"input": true,
"output": true,
"dialog": true,
"retrieval": true
},
"llm_params": {
"temperature": 0.7
},
"llm_output": false,
"output_vars": ["relevant_chunks"],
"log": {
"activated_rails": true,
"llm_calls": false
}
}
}
}
Rails Options#
| Field | Type | Description |
|---|---|---|
| `input` | boolean or array | Enable input rails. Set to `true` to run all configured input rails, or to an array of rail names to run only those rails. |
| `output` | boolean or array | Enable output rails. Set to `true` to run all configured output rails, or to an array of rail names to run only those rails. |
| `dialog` | boolean | Enable dialog rails. Default: `true`. |
| `retrieval` | boolean or array | Enable retrieval rails. Set to `true` to run all configured retrieval rails, or to an array of rail names to run only those rails. |
| `tool_input` | boolean or array | Enable tool input rails. Default: `true`. |
| `tool_output` | boolean or array | Enable tool output rails. Default: `true`. |
Other Options#
| Field | Type | Description |
|---|---|---|
| `llm_params` | object | Additional parameters to pass to the LLM call (e.g., `temperature`). |
| `llm_output` | boolean | Whether to include custom LLM output in the response. Default: `false`. |
| `output_vars` | boolean or array | Context variables to return. Set to `true` to return all context variables, or to an array of variable names to return only those. |
Log Options#
| Field | Type | Description |
|---|---|---|
| `activated_rails` | boolean | Include information about which rails were activated. Default: `false`. |
| `llm_calls` | boolean | Include details about all LLM calls (prompts, completions, token usage). Default: `false`. |
| `internal_events` | boolean | Include the array of internal generated events. Default: `false`. |
| `colang_history` | boolean | Include conversation history in Colang format. Default: `false`. |
Response Body#
The response follows the standard OpenAI ChatCompletion format with an additional guardrails object.
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1709424000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "I'm doing well, thank you!"
},
"finish_reason": "stop"
}
],
"guardrails": {
"config_id": "content_safety",
"llm_output": null,
"output_data": null,
"log": null,
"state": null
}
}
Response Fields#
| Field | Type | Description |
|---|---|---|
| `id` | string | A unique identifier for the chat completion (e.g., `chatcmpl-abc123`). |
| `object` | string | Always `chat.completion`. |
| `created` | integer | Unix timestamp of when the completion was created. |
| `model` | string | The model used for the completion. |
| `choices` | array | Array of completion choices. Each choice contains `index`, `message`, and `finish_reason`. |
| `guardrails` | object | Guardrails-specific output data. See below. |
Guardrails Response Fields#
| Field | Type | Description |
|---|---|---|
| `config_id` | string | The guardrails configuration ID associated with this response. |
| `state` | object | State object for continuing the conversation in future requests. |
| `llm_output` | object | Additional LLM output data. Only included if `options.llm_output` is `true`. |
| `output_data` | object | Values for requested output variables. Only included if `options.output_vars` is set. |
| `log` | object | Logging information based on the requested `options.log` settings. |
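A sketch of reading these fields from a parsed response body follows. The sample dict matches the response shape above; the inner structure of the `log` entries and the rail name are illustrative.

```python
# Sketch: extract the assistant message and guardrails-specific fields
# from a parsed /v1/chat/completions response body.

response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Paris."},
         "finish_reason": "stop"}
    ],
    "guardrails": {
        "config_id": "content_safety",
        "state": None,
        "llm_output": None,
        "output_data": {"relevant_chunks": []},
        # Illustrative log shape; present only when requested via options.log.
        "log": {"activated_rails": [{"name": "self check input"}]},
    },
}

answer = response["choices"][0]["message"]["content"]
gr = response["guardrails"]
# log may be null when logging was not requested, hence the `or {}`.
rails_fired = [r["name"]
               for r in (gr.get("log") or {}).get("activated_rails", [])]
```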
Examples#
Basic Request#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
],
"guardrails": {
"config_id": "content_safety"
}
}'
Request with Streaming#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Tell me a story"}
],
"stream": true,
"guardrails": {
"config_id": "content_safety"
}
}'
Streaming responses use Server-Sent Events (SSE). Each chunk is a chat.completion.chunk object:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"meta/llama-3.1-8b-instruct","choices":[{"delta":{"content":"Once"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"meta/llama-3.1-8b-instruct","choices":[{"delta":{"content":" upon"},"index":0,"finish_reason":null}]}
data: [DONE]
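A minimal client-side parser for this SSE stream might look like the following sketch; `parse_sse_line` is an illustrative function, not a library API. It also recognizes the `error` events the server can emit mid-stream.

```python
import json

# Sketch: classify one SSE line from the streaming endpoint. Each data
# line is "[DONE]", an error object, or a chat.completion.chunk whose
# delta may carry a piece of content.

def parse_sse_line(line: str):
    """Return ("done", None), ("error", obj), ("delta", text), or None."""
    if not line.startswith("data: "):
        return None  # blank keep-alive lines and SSE comments
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return ("done", None)
    obj = json.loads(payload)
    if "error" in obj:
        return ("error", obj["error"])
    delta = obj["choices"][0].get("delta", {})
    return ("delta", delta.get("content", ""))

chunk = ('data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",'
         '"created":1700000000,"model":"meta/llama-3.1-8b-instruct",'
         '"choices":[{"delta":{"content":"Once"},"index":0,'
         '"finish_reason":null}]}')
```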
Request with Specific Rails#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Hello"}
],
"guardrails": {
"config_id": "content_safety",
"options": {
"rails": {
"input": ["check jailbreak"],
"output": false,
"dialog": false
}
}
}
}'
Request with Logging#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Hello"}
],
"guardrails": {
"config_id": "content_safety",
"options": {
"log": {
"activated_rails": true,
"llm_calls": true
}
}
}
}'
Request with OpenAI Python SDK#
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="not-used"
)
response = client.chat.completions.create(
model="meta/llama-3.1-8b-instruct",
messages=[
{"role": "user", "content": "What is the capital of France?"}
],
extra_body={
"guardrails": {
"config_id": "content_safety"
}
}
)
print(response.choices[0].message.content)
GET /v1/models#
List the available LLM models from the configured upstream provider.
This endpoint proxies the request to the provider specified by MAIN_MODEL_ENGINE and returns the results in the standard OpenAI models list format.
For a guide on configuring providers, see List Available Models.
Request#
No request body or query parameters. The Authorization header, if present, is forwarded to the upstream provider.
curl http://localhost:8000/v1/models
Response Body#
{
"data": [
{
"id": "meta/llama-3.1-8b-instruct",
"object": "model",
"created": 1700000000,
"owned_by": "system"
},
{
"id": "meta/llama-3.1-70b-instruct",
"object": "model",
"created": 1700000000,
"owned_by": "system"
}
]
}
Response Fields#
| Field | Type | Description |
|---|---|---|
| `data` | array | List of model objects. |
| `id` | string | The model identifier (e.g., `meta/llama-3.1-8b-instruct`). |
| `object` | string | Always `model`. |
| `created` | integer | Unix timestamp of the model's creation. |
| `owned_by` | string | The organization that owns the model. |
Error Responses#
| Status | Description |
|---|---|
| 502 | The upstream provider is unreachable or returned an error. |
| 4xx | Proxied from the upstream provider (e.g., 401 for an invalid API key). |
Note
If the engine is not in the built-in provider table and MAIN_MODEL_BASE_URL is not set, the endpoint returns an empty model list instead of an error.
Example with OpenAI Python SDK#
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
for model in client.models.list().data:
print(model.id)
GET /v1/rails/configs#
List all available guardrails configurations.
Returns an array of configuration objects.
[
{"id": "content_safety"},
{"id": "customer-service"},
{"id": "content-moderation"}
]
curl http://localhost:8000/v1/rails/configs
GET /v1/challenges#
Get the list of available red teaming challenges.
Returns an array of challenge objects. The structure depends on the registered challenges.
[
{
"id": "jailbreak-1",
"description": "Attempt to bypass safety guardrails",
"category": "jailbreak"
}
]
curl http://localhost:8000/v1/challenges
Note
Challenges must be registered via a challenges.json file in the configuration directory or programmatically using register_challenges().
GET /#
Root endpoint that serves the Chat UI or returns a health status.
Chat UI Disabled: When the Chat UI is disabled (--disable-chat-ui), returns a health status:
{"status": "ok"}
Chat UI Enabled: When the Chat UI is enabled (default), serves the interactive chat interface.
Error Responses#
Errors from the chat completions endpoint are returned as ChatCompletion objects with the error message in the assistant’s content, or as HTTP exceptions.
Configuration Error#
When the guardrails configuration cannot be loaded:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Could not load the ['my-config'] guardrails configuration. An internal error has occurred."
},
"finish_reason": "stop"
}
]
}
Missing Configuration#
When no config_id is provided and no default is set, the server returns an HTTP 422 error:
{
"detail": "No guardrails config_id provided and server has no default configuration"
}
Thread ID Validation Error#
When thread_id is less than 16 characters:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The `thread_id` must have a minimum length of 16 characters."
},
"finish_reason": "stop"
}
],
"guardrails": {
"config_id": "my-config"
}
}
Invalid State Format#
When the guardrails.state object does not contain an events or state key, the server returns an HTTP 422 error:
{
"detail": "Invalid state format: state must contain 'events' or 'state' key. Use an empty dict {} to start a new conversation."
}
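As a sketch of the state round-trip implied by this error, the helper below starts a conversation with an empty dict and passes the returned state back on the next turn. `next_request` and the sample state value are illustrative.

```python
# Sketch: carry the guardrails state object across turns. Per the error
# above, a non-empty state must contain an "events" or "state" key, and
# an empty dict {} starts a new conversation. The server returns the
# updated state under response["guardrails"]["state"].

def next_request(messages, state, config_id="my-config"):
    if state and not ("events" in state or "state" in state):
        raise ValueError("state must contain 'events' or 'state' key")
    return {
        "model": "meta/llama-3.1-8b-instruct",
        "messages": messages,
        "guardrails": {"config_id": config_id, "state": state},
    }

# Turn 1: start a new conversation with an empty state dict.
turn1 = next_request([{"role": "user", "content": "Hello"}], {})

# Turn 2: pass back the state returned in turn 1's response
# (the value here is illustrative).
returned_state = {"events": []}
turn2 = next_request([{"role": "user", "content": "And then?"}], returned_state)
```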
Internal Server Error#
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Internal server error"
},
"finish_reason": "stop"
}
],
"guardrails": {
"config_id": "my-config"
}
}
Streaming Errors#
During streaming, errors are sent as SSE events with an error object:
data: {"error": {"message": "...", "type": "...", "param": "...", "code": "..."}}
Environment Variables#
The server supports the following environment variables:
| Variable | Description |
|---|---|
| … | Default guardrails configuration ID when none is specified in the request. |
| `MAIN_MODEL_ENGINE` | The LLM engine to use when the `/v1/models` endpoint queries the upstream provider. |
| `MAIN_MODEL_BASE_URL` | Base URL for the LLM provider when the engine is not in the built-in provider table. |
| `OPENAI_API_KEY` | API key for OpenAI models. |
| `NVIDIA_API_KEY` | API key for NVIDIA-hosted models on build.nvidia.com. |
| `ANTHROPIC_API_KEY` | API key for Anthropic models. Used when the configured engine is Anthropic. |
| `AZURE_OPENAI_API_KEY` | API key for Azure OpenAI. Used when the configured engine is Azure OpenAI. |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI resource endpoint URL (e.g., `https://<resource>.openai.azure.com`). |
| … | Azure OpenAI API version. |
| `COHERE_API_KEY` | API key for Cohere models. Used when the configured engine is Cohere. |
| … | Override the Cohere API base URL. |
| … | Comma-separated list of allowed CORS origins. |