NeMo Guardrails Library API Server Endpoints Reference#
This reference documents the REST API endpoints provided by the NeMo Guardrails library API server. The server exposes an OpenAI-compatible Chat Completions API with additional guardrails-specific extensions.
Starting the Server#
Start the server using the CLI:
nemoguardrails server --port 8000 --config /path/to/config
For more information about server options, see Run the NeMo Guardrails Server.
Endpoints Overview#
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/v1/chat/completions` | Generate a guarded chat completion |
| `GET` | `/v1/models` | List available models from the configured provider |
| `GET` | `/v1/rails/configs` | List available guardrails configurations |
| `GET` | `/v1/challenges` | Get red teaming challenges |
| `GET` | `/` | Chat UI (if enabled) or health status |
POST /v1/chat/completions#
Generate a chat completion with guardrails applied.
The request and response formats are compatible with the OpenAI Chat Completions API,
with guardrails-specific fields nested under a guardrails object.
Request Body#
{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
],
"stream": false,
"temperature": 0.7,
"max_tokens": 256,
"guardrails": {
"config_id": "my-config"
}
}
OpenAI Fields#
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The LLM model to use for chat completion (e.g., `meta/llama-3.1-8b-instruct`). |
| `messages` | array of objects | No | The list of messages in the current conversation. Each message has `role` and `content` fields. |
| `stream` | boolean | No | If `true`, partial results are streamed as Server-Sent Events. |
| `max_tokens` | integer | No | The maximum number of tokens to generate. |
| `temperature` | float | No | Sampling temperature (0-2). Higher values make output more random. |
| `top_p` | float | No | Top-p (nucleus) sampling parameter. |
| `stop` | string or array | No | Stop sequence(s) where the model stops generating. |
| `presence_penalty` | float | No | Presence penalty parameter (-2.0 to 2.0). |
| `frequency_penalty` | float | No | Frequency penalty parameter (-2.0 to 2.0). |
Guardrails Fields#
Guardrails-specific fields are nested under the guardrails object in the request body.
| Field | Type | Required | Description |
|---|---|---|---|
| `config_id` | string | No | The ID of the guardrails configuration to use. If not set, uses the server’s default configuration. Mutually exclusive with `config_ids`. |
| `config_ids` | array of strings | No | List of configuration IDs to combine. Mutually exclusive with `config_id`. |
| `thread_id` | string | No | ID of an existing thread for conversation persistence. Must be 16-255 characters. |
| `context` | object | No | Additional context data to add to the conversation. |
| `options` | object | No | Additional options for controlling the generation. See Generation Options. |
| `state` | object | No | A state object to continue a previous interaction. Must contain an `events` or `state` key. |
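The constraints above can be enforced client-side before sending a request. The sketch below uses a hypothetical helper, `build_guardrails_request`, which is not part of the library; it assembles a request body and checks the documented `thread_id` length and `config_id`/`config_ids` mutual-exclusivity rules:

```python
def build_guardrails_request(messages, model, config_id=None, config_ids=None,
                             thread_id=None, options=None, state=None):
    """Assemble a /v1/chat/completions body, enforcing documented constraints."""
    if config_id is not None and config_ids is not None:
        raise ValueError("config_id and config_ids are mutually exclusive")
    if thread_id is not None and not (16 <= len(thread_id) <= 255):
        raise ValueError("thread_id must be 16-255 characters")

    guardrails = {}
    if config_id is not None:
        guardrails["config_id"] = config_id
    if config_ids is not None:
        guardrails["config_ids"] = config_ids
    if thread_id is not None:
        guardrails["thread_id"] = thread_id
    if options is not None:
        guardrails["options"] = options
    if state is not None:
        guardrails["state"] = state

    return {"model": model, "messages": messages, "guardrails": guardrails}
```

The returned dict matches the request body shown above and can be passed to any HTTP client, or as `extra_body` when using the OpenAI Python SDK.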
Generation Options#
The guardrails.options field controls which rails are applied and what information is returned.
{
"guardrails": {
"config_id": "my-config",
"options": {
"rails": {
"input": true,
"output": true,
"dialog": true,
"retrieval": true
},
"llm_params": {
"temperature": 0.7
},
"llm_output": false,
"output_vars": ["relevant_chunks"],
"log": {
"activated_rails": true,
"llm_calls": false
}
}
}
}
Rails Options#
| Field | Type | Description |
|---|---|---|
| `input` | boolean or array | Enable input rails. Set to `true` to run all input rails, or to an array of input rail names to run only those. |
| `output` | boolean or array | Enable output rails. Set to `true` to run all output rails, or to an array of output rail names to run only those. |
| `dialog` | boolean | Enable dialog rails. Default: `true`. |
| `retrieval` | boolean or array | Enable retrieval rails. Set to `true` to run all retrieval rails, or to an array of retrieval rail names to run only those. |
| `tool_input` | boolean or array | Enable tool input rails. |
| `tool_output` | boolean or array | Enable tool output rails. |
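The boolean-or-array shape of these options can also be validated client-side. The sketch below is a hypothetical helper (covering only the four classic rail types, not the tool rails) that builds the `rails` object and rejects malformed values:

```python
def rails_options(input=True, output=True, dialog=True, retrieval=True):
    """Build the guardrails.options.rails object, validating value shapes.

    input/output/retrieval accept a boolean or a list of rail names;
    dialog accepts only a boolean.
    """
    opts = {"input": input, "output": output, "dialog": dialog, "retrieval": retrieval}
    for name, value in opts.items():
        if isinstance(value, bool):
            continue
        is_name_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
        if name == "dialog" or not is_name_list:
            raise TypeError(f"invalid value for rails option {name!r}")
    return opts
```

For example, `rails_options(input=["check jailbreak"], output=False, dialog=False)` mirrors the "Request with Specific Rails" curl example later in this reference.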
Other Options#
| Field | Type | Description |
|---|---|---|
| `llm_params` | object | Additional parameters to pass to the LLM call (e.g., `temperature`, `max_tokens`). |
| `llm_output` | boolean | Whether to include custom LLM output in the response. Default: `false`. |
| `output_vars` | boolean or array | Context variables to return. Set to `true` to return all context variables, or to an array of variable names to return only those. |
Log Options#
| Field | Type | Description |
|---|---|---|
| `activated_rails` | boolean | Include information about which rails were activated. Default: `false`. |
| `llm_calls` | boolean | Include details about all LLM calls (prompts, completions, token usage). Default: `false`. |
| `internal_events` | boolean | Include the array of internal generated events. Default: `false`. |
| `colang_history` | boolean | Include conversation history in Colang format. Default: `false`. |
Response Body#
The response follows the standard OpenAI ChatCompletion format with an additional guardrails object.
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1709424000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "I'm doing well, thank you!"
},
"finish_reason": "stop"
}
],
"guardrails": {
"config_id": "content_safety",
"llm_output": null,
"output_data": null,
"log": null,
"state": null
}
}
Response Fields#
| Field | Type | Description |
|---|---|---|
| `id` | string | A unique identifier for the chat completion (e.g., `chatcmpl-abc123`). |
| `object` | string | Always `chat.completion`. |
| `created` | integer | Unix timestamp of when the completion was created. |
| `model` | string | The model used for the completion. |
| `choices` | array | Array of completion choices. Each choice contains `index`, `message`, and `finish_reason`. |
| `guardrails` | object | Guardrails-specific output data. See below. |
Guardrails Response Fields#
| Field | Type | Description |
|---|---|---|
| `config_id` | string | The guardrails configuration ID associated with this response. |
| `state` | object | State object for continuing the conversation in future requests. |
| `llm_output` | object | Additional LLM output data. Only included if `options.llm_output` is `true`. |
| `output_data` | object | Values for requested output variables. Only included if `options.output_vars` is set. |
| `log` | object | Logging information based on the requested `options.log` fields. |
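Because the top-level `guardrails` object is an extension beyond the standard OpenAI schema, strongly-typed OpenAI client models may not surface it; reading the raw response JSON keeps it accessible. A minimal sketch (the helper name is ours, not a library API), exercised against the sample response above:

```python
def guardrails_from_response(response_json):
    """Return the guardrails extension block of a chat completion, or {}."""
    return response_json.get("guardrails") or {}

# Sample response body from this reference.
sample = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1709424000,
    "model": "meta/llama-3.1-8b-instruct",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "I'm doing well, thank you!"},
         "finish_reason": "stop"}
    ],
    "guardrails": {"config_id": "content_safety", "llm_output": None,
                   "output_data": None, "log": None, "state": None},
}

print(guardrails_from_response(sample)["config_id"])  # content_safety
```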
Examples#
Basic Request#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
],
"guardrails": {
"config_id": "content_safety"
}
}'
Request with Streaming#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Tell me a story"}
],
"stream": true,
"guardrails": {
"config_id": "content_safety"
}
}'
Streaming responses use Server-Sent Events (SSE). Each chunk is a chat.completion.chunk object:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"meta/llama-3.1-8b-instruct","choices":[{"delta":{"content":"Once"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"meta/llama-3.1-8b-instruct","choices":[{"delta":{"content":" upon"},"index":0,"finish_reason":null}]}
data: [DONE]
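To reassemble the streamed text client-side, a small parser over the raw SSE lines is enough. The sketch below is our own helper (it operates on an iterable of lines rather than a live connection, so any HTTP client can feed it); it concatenates each chunk's delta content, stops at `[DONE]`, and raises on the error events described under Streaming Errors:

```python
import json

def collect_stream_content(sse_lines):
    """Reassemble assistant text from chat.completion.chunk SSE lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if "error" in chunk:  # streaming errors arrive as SSE events too
            raise RuntimeError(chunk["error"].get("message", "stream error"))
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)

chunks = [
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Once"},"index":0,"finish_reason":null}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":" upon"},"index":0,"finish_reason":null}]}',
    "data: [DONE]",
]
print(collect_stream_content(chunks))  # Once upon
```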
Request with Specific Rails#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Hello"}
],
"guardrails": {
"config_id": "content_safety",
"options": {
"rails": {
"input": ["check jailbreak"],
"output": false,
"dialog": false
}
}
}
}'
Request with Logging#
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{"role": "user", "content": "Hello"}
],
"guardrails": {
"config_id": "content_safety",
"options": {
"log": {
"activated_rails": true,
"llm_calls": true
}
}
}
}'
Request with OpenAI Python SDK#
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="not-used"
)
response = client.chat.completions.create(
model="meta/llama-3.1-8b-instruct",
messages=[
{"role": "user", "content": "What is the capital of France?"}
],
extra_body={
"guardrails": {
"config_id": "content_safety"
}
}
)
print(response.choices[0].message.content)
GET /v1/models#
List the available LLM models from the configured upstream provider.
This endpoint proxies the request to the provider specified by MAIN_MODEL_ENGINE and returns the results in the standard OpenAI models list format.
For a guide on configuring providers, see List Available Models.
Request#
No request body or query parameters. The Authorization header, if present, is forwarded to the upstream provider.
curl http://localhost:8000/v1/models
Response Body#
{
"data": [
{
"id": "meta/llama-3.1-8b-instruct",
"object": "model",
"created": 1700000000,
"owned_by": "system"
},
{
"id": "meta/llama-3.1-70b-instruct",
"object": "model",
"created": 1700000000,
"owned_by": "system"
}
]
}
Response Fields#
| Field | Type | Description |
|---|---|---|
| `data` | array | List of model objects. |
| `id` | string | The model identifier (e.g., `meta/llama-3.1-8b-instruct`). |
| `object` | string | Always `model`. |
| `created` | integer | Unix timestamp of the model’s creation. |
| `owned_by` | string | The organization that owns the model. |
Error Responses#
| Status | Description |
|---|---|
| 502 | The upstream provider is unreachable or returned an error. |
| 4xx | Proxied from the upstream provider (e.g., 401 for an invalid API key). |
Note
If the engine is not in the built-in provider table and MAIN_MODEL_BASE_URL is not set, the endpoint returns an empty model list instead of an error.
Example with OpenAI Python SDK#
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
for model in client.models.list().data:
print(model.id)
GET /v1/rails/configs#
List all available guardrails configurations.
Returns an array of configuration objects.
[
{"id": "content_safety"},
{"id": "customer-service"},
{"id": "content-moderation"}
]
curl http://localhost:8000/v1/rails/configs
GET /v1/challenges#
Get the list of available red teaming challenges.
Returns an array of challenge objects. The structure depends on the registered challenges.
[
{
"id": "jailbreak-1",
"description": "Attempt to bypass safety guardrails",
"category": "jailbreak"
}
]
curl http://localhost:8000/v1/challenges
Note
Challenges must be registered via a challenges.json file in the configuration directory or programmatically using register_challenges().
GET /#
Root endpoint that serves the Chat UI or returns a health status.
Chat UI Disabled: When the Chat UI is disabled (--disable-chat-ui), returns a health status:
{"status": "ok"}
Chat UI Enabled: When the Chat UI is enabled (default), serves the interactive chat interface.
Error Responses#
Errors from the chat completions endpoint are returned as ChatCompletion objects with the error message in the assistant’s content, or as HTTP exceptions.
Configuration Error#
When the guardrails configuration cannot be loaded:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Could not load the ['my-config'] guardrails configuration. An internal error has occurred."
},
"finish_reason": "stop"
}
]
}
Missing Configuration#
When no config_id is provided and no default is set, the server returns an HTTP 422 error:
{
"detail": "No guardrails config_id provided and server has no default configuration"
}
Thread ID Validation Error#
When `thread_id` has fewer than 16 characters:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The `thread_id` must have a minimum length of 16 characters."
},
"finish_reason": "stop"
}
],
"guardrails": {
"config_id": "my-config"
}
}
Invalid State Format#
When the guardrails.state object does not contain an events or state key, the server returns an HTTP 422 error:
{
"detail": "Invalid state format: state must contain 'events' or 'state' key. Use an empty dict {} to start a new conversation."
}
Internal Server Error#
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "meta/llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Internal server error"
},
"finish_reason": "stop"
}
],
"guardrails": {
"config_id": "my-config"
}
}
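A client therefore has to route on both the HTTP status and the body shape. The sketch below is our own helper, not a library API: it raises for the documented 422 validation errors and otherwise returns the assistant content, which for configuration and internal errors carries the error message as plain text (the server sets no structured flag for these):

```python
import json

def handle_chat_response(status_code, body_text):
    """Route a /v1/chat/completions result per the documented error modes."""
    body = json.loads(body_text)
    if status_code == 422:
        # Validation errors (missing config_id, invalid state format) use a detail field.
        raise ValueError(body["detail"])
    if status_code >= 400:
        raise RuntimeError(f"HTTP {status_code}: {body_text}")
    # Configuration and internal errors come back as a normal ChatCompletion
    # whose assistant content holds the message, so callers only see text here.
    return body["choices"][0]["message"]["content"]

ok = handle_chat_response(200, json.dumps({
    "choices": [{"index": 0, "finish_reason": "stop",
                 "message": {"role": "assistant", "content": "Paris."}}]}))
print(ok)  # Paris.
```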
Streaming Errors#
During streaming, errors are sent as SSE events with an error object:
data: {"error": {"message": "...", "type": "...", "param": "...", "code": "..."}}
Environment Variables#
The server supports the following environment variables:
| Variable | Description |
|---|---|
|  | Default guardrails configuration ID when none is specified in the request. |
| `MAIN_MODEL_ENGINE` | The LLM engine to use when the server proxies requests to the upstream provider. |
| `MAIN_MODEL_BASE_URL` | Base URL for the LLM provider, used when the engine is not in the built-in provider table. |
| `OPENAI_API_KEY` | API key for OpenAI models. |
| `NVIDIA_API_KEY` | API key for NVIDIA-hosted models on build.nvidia.com. |
| `ANTHROPIC_API_KEY` | API key for Anthropic models. Used when the configured engine is Anthropic. |
| `AZURE_OPENAI_API_KEY` | API key for Azure OpenAI. Used when the configured engine is Azure OpenAI. |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI resource endpoint URL (e.g., `https://<resource>.openai.azure.com`). |
|  | Azure OpenAI API version. |
| `COHERE_API_KEY` | API key for Cohere models. Used when the configured engine is Cohere. |
|  | Override the Cohere API base URL. |
|  | Comma-separated list of allowed CORS origins. |