Model Configuration
This page explains how to configure the models section of your Guardrails config.yml file. For a complete reference of all configuration options, refer to the Configuration YAML Schema Reference.
NVIDIA NIM Configuration
The NVIDIA NeMo Guardrails library integrates with NVIDIA NIM microservices:
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
This integration provides access to:
Locally deployed NIMs. You can run models on your own infrastructure with optimized inference.
NVIDIA API Catalog. You can access hosted models on build.nvidia.com.
Specialized NIMs. You can use NemoGuard Content Safety, Topic Control, and Jailbreak Detect.
Local NIM Deployment
For locally deployed NIMs, specify the base URL:
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
    parameters:
      base_url: http://localhost:8000/v1
Task-Specific Models
Configure different models for specific tasks:
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
  - type: self_check_input
    engine: nim
    model: meta/llama3-8b-instruct
  - type: self_check_output
    engine: nim
    model: meta/llama-3.1-70b-instruct
  - type: generate_user_intent
    engine: nim
    model: meta/llama-3.1-8b-instruct
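To illustrate how a task-specific entry relates to the main model, the following Python sketch selects the entry whose type matches a task name and otherwise falls back to the main entry. The helper name resolve_model and the fallback rule are assumptions made for illustration. This is not the library's own lookup code.

```python
from typing import Optional

# Mirrors the models list from the YAML above as plain Python data.
MODELS = [
    {"type": "main", "engine": "nim", "model": "meta/llama-3.1-8b-instruct"},
    {"type": "self_check_input", "engine": "nim", "model": "meta/llama3-8b-instruct"},
    {"type": "self_check_output", "engine": "nim", "model": "meta/llama-3.1-70b-instruct"},
    {"type": "generate_user_intent", "engine": "nim", "model": "meta/llama-3.1-8b-instruct"},
]


def resolve_model(models: list, task: str) -> Optional[dict]:
    """Return the entry whose type equals the task, else the main entry.

    The fallback to main is an assumption of this sketch (hypothetical helper).
    """
    by_type = {entry["type"]: entry for entry in models}
    return by_type.get(task, by_type.get("main"))


# A task with a dedicated entry uses that entry's model.
print(resolve_model(MODELS, "self_check_output")["model"])
# A task without a dedicated entry falls back to the main model in this sketch.
print(resolve_model(MODELS, "generate_bot_message")["model"])
```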
Configuration Examples
OpenAI
The following example shows how to configure an OpenAI model as the main application LLM:
models:
  - type: main
    engine: openai
    model: gpt-4o
Azure OpenAI
The following example shows how to configure an Azure OpenAI deployment as the main application LLM:
models:
  - type: main
    engine: azure
    model: gpt-4
    parameters:
      azure_endpoint: https://my-resource.openai.azure.com/
      azure_deployment: my-gpt4-deployment
      api_version: "2024-02-15-preview"
You can supply the resource endpoint as azure_endpoint (preferred, matches the OpenAI Python SDK) or base_url (v0.21-compatibility alias). Both fields accept only the resource URL. The framework composes the deployment path. Setting both raises an error.
Set AZURE_OPENAI_API_KEY in the environment, set api_key_env_var on the model entry, or pass parameters.api_key directly. The framework constructs the deployment URL, sets api-version as a query parameter, and authenticates with the api-key header.
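As a rough picture of the URL composition described above, the following Python sketch builds a chat-completions URL and headers from the parameters block. It follows the public Azure OpenAI REST convention (/openai/deployments/<deployment>/chat/completions). The function name build_azure_request and the error messages are hypothetical, and the framework's internal code may differ.

```python
from urllib.parse import urlencode


def build_azure_request(params: dict, api_key: str) -> tuple:
    """Compose a chat-completions URL and headers (illustrative sketch only)."""
    endpoint = params.get("azure_endpoint")
    base_url = params.get("base_url")
    # The two fields are alternatives; supplying both is treated as an error.
    if endpoint and base_url:
        raise ValueError("Set either azure_endpoint or base_url, not both")
    resource = endpoint or base_url
    if not resource:
        raise ValueError("azure_endpoint (or base_url) is required")
    # Only the resource URL is configured; the deployment path is appended here.
    path = f"/openai/deployments/{params['azure_deployment']}/chat/completions"
    # The API version travels as a query parameter.
    query = urlencode({"api-version": params["api_version"]})
    url = f"{resource.rstrip('/')}{path}?{query}"
    # Key-based authentication uses the api-key header.
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return url, headers


url, headers = build_azure_request(
    {
        "azure_endpoint": "https://my-resource.openai.azure.com/",
        "azure_deployment": "my-gpt4-deployment",
        "api_version": "2024-02-15-preview",
    },
    api_key="placeholder-key",  # placeholder value, not a real credential
)
print(url)
```

Because the deployment path is appended for you, the configured endpoint should contain only the resource URL, with no /openai/deployments/ suffix.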
Note
Azure OpenAI is supported natively on the default framework in v0.22 with key-based authentication. For Azure AD or token-based authentication, configure engine: openai manually or use LangChain with NEMOGUARDRAILS_LLM_FRAMEWORK=langchain. Refer to Migrating to 0.22 for both alternatives.
Anthropic
The following example shows how to configure an Anthropic model as the main application LLM:
models:
  - type: main
    engine: anthropic
    model: claude-3-5-sonnet-20241022
Note
Anthropic’s API is not OpenAI-compatible, so this engine is opt-in. Set NEMOGUARDRAILS_LLM_FRAMEWORK=langchain and install langchain-anthropic. For background, refer to Migrating to 0.22.
vLLM (OpenAI-Compatible)
vLLM exposes an OpenAI-compatible API, so the recommended configuration uses engine: openai pointed at the vLLM endpoint. The built-in client handles it with no LangChain dependency.
models:
  - type: main
    engine: openai
    model: meta-llama/Llama-3.1-8B-Instruct
    parameters:
      base_url: http://localhost:5000/v1
      api_key: EMPTY
The following example shows how to configure Llama Guard as a guardrail model using the same pattern:
models:
  - type: llama_guard
    engine: openai
    model: meta-llama/LlamaGuard-7b
    parameters:
      base_url: http://localhost:5000/v1
      api_key: EMPTY
When self-hosted vLLM does not enforce authentication, set parameters.api_key to a non-empty placeholder such as EMPTY. If your deployment requires a real token, replace parameters.api_key with the literal token, or omit it and set api_key_env_var at the top level of the model entry, not inside parameters:
- type: main
  engine: openai
  model: meta-llama/Llama-3.1-8B-Instruct
  api_key_env_var: MY_VLLM_API_KEY
  parameters:
    base_url: http://localhost:5000/v1
Note
Set the referenced environment variable before calling RailsConfig.from_content or RailsConfig.from_path. Otherwise, config loading fails with the error "Model API Key environment variable 'X' not set." because a Pydantic validator on the model schema performs the check eagerly.
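Because the check runs at load time, it can help to verify the environment before loading the configuration. The following Python sketch lists every api_key_env_var name that is referenced but missing. The helper missing_api_key_vars is hypothetical, and treating an empty value as missing is an assumption of this sketch.

```python
import os


def missing_api_key_vars(models: list, environ=os.environ) -> list:
    """List api_key_env_var names that are referenced but unset or empty."""
    names = [m["api_key_env_var"] for m in models if m.get("api_key_env_var")]
    return [name for name in names if not environ.get(name)]


# Plain-data copy of the model entry shown above.
models = [
    {
        "type": "main",
        "engine": "openai",
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "api_key_env_var": "MY_VLLM_API_KEY",
        "parameters": {"base_url": "http://localhost:5000/v1"},
    }
]

missing = missing_api_key_vars(models)
if missing:
    print("Set these variables before loading the config:", ", ".join(missing))
```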
Note
The legacy engine: vllm_openai with parameters.openai_api_base form is only needed when running under NEMOGUARDRAILS_LLM_FRAMEWORK=langchain. For new configurations, prefer the form above.
Other OpenAI-Compatible Endpoints
The same engine: openai plus parameters.base_url pattern works for any provider whose wire protocol is OpenAI-compatible. Examples include OpenRouter, Together.ai, Fireworks.ai, Groq, DeepSeek’s hosted API at https://api.deepseek.com/v1, TGI deployments that expose /v1/chat/completions, and the llama.cpp server with --api. Provide parameters.base_url and either parameters.api_key or a top-level api_key_env_var.
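The following Python sketch shows where each field belongs when you build such an entry programmatically: api_key goes inside parameters, while api_key_env_var sits at the top level of the model entry. The helper openai_compatible_entry, the model name deepseek-chat, and the variable name DEEPSEEK_API_KEY are illustrative assumptions. Requiring exactly one credential source is a choice made by this sketch.

```python
import json


def openai_compatible_entry(model, base_url, api_key=None,
                            api_key_env_var=None, model_type="main"):
    """Build one models entry for an OpenAI-compatible endpoint (sketch)."""
    # This sketch insists on exactly one credential source.
    if (api_key is None) == (api_key_env_var is None):
        raise ValueError("Provide exactly one of api_key or api_key_env_var")
    entry = {
        "type": model_type,
        "engine": "openai",
        "model": model,
        "parameters": {"base_url": base_url},
    }
    if api_key is not None:
        entry["parameters"]["api_key"] = api_key  # literal key: inside parameters
    else:
        entry["api_key_env_var"] = api_key_env_var  # env var name: top level
    return entry


entry = openai_compatible_entry(
    model="deepseek-chat",                   # assumed model name
    base_url="https://api.deepseek.com/v1",
    api_key_env_var="DEEPSEEK_API_KEY",      # hypothetical variable name
)
print(json.dumps({"models": [entry]}, indent=2))
```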
Google Vertex AI
The following example shows how to configure a Google Vertex AI model as the main application LLM:
models:
  - type: main
    engine: vertexai
    model: gemini-1.0-pro
Note
Vertex AI’s API is not OpenAI-compatible, so this engine is opt-in. Set NEMOGUARDRAILS_LLM_FRAMEWORK=langchain and install langchain-google-vertexai. For background, refer to Migrating to 0.22.
Complete Example
The following example shows how to configure the main application LLM, embeddings model, and a dedicated NemoGuard model for input and output checking:
models:
  # Main application LLM
  - type: main
    engine: nim
    model: meta/llama-3.1-70b-instruct
    parameters:
      temperature: 0.7
      max_tokens: 2000

  # Embeddings for knowledge base
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2

  # Dedicated model for input checking
  - type: self_check_input
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

  # Dedicated model for output checking
  - type: self_check_output
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
Model Parameters
Pass additional parameters to the underlying LLM client. For engines served by the built-in client, such as any OpenAI-compatible endpoint, the runtime forwards parameters to the OpenAI-compatible HTTP request. Examples include temperature, max_tokens, base_url, api_key, default_query, and default_headers. For LangChain engines, parameters follow the conventions of the underlying LangChain class.
models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      temperature: 0.7
      max_tokens: 1000
      top_p: 0.9
Supported parameters vary by provider. For built-in engines, see the OpenAI-compatible client options. For LangChain engines, refer to the corresponding LangChain provider documentation.
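One way to think about the parameters block is that some keys configure the connection and others tune each generation request. The following Python sketch separates the two groups using the split found in the OpenAI Python SDK, where base_url, api_key, default_query, and default_headers are client options. Whether the runtime partitions parameters this way is an assumption, and split_parameters is a hypothetical helper.

```python
# Keys that configure the client connection in the OpenAI Python SDK.
CLIENT_OPTIONS = {"base_url", "api_key", "default_query", "default_headers"}


def split_parameters(parameters: dict) -> tuple:
    """Separate connection-level options from per-request options (sketch)."""
    client = {k: v for k, v in parameters.items() if k in CLIENT_OPTIONS}
    request = {k: v for k, v in parameters.items() if k not in CLIENT_OPTIONS}
    return client, request


client_opts, request_opts = split_parameters(
    {
        "base_url": "http://localhost:5000/v1",
        "api_key": "EMPTY",
        "temperature": 0.7,
        "max_tokens": 1000,
        "top_p": 0.9,
    }
)
print(client_opts)
print(request_opts)
```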