Model Configuration#
This section describes how to configure LLM models and embedding models in the config.yml file.
The models Key#
The `models` key defines the LLM providers and models used by the NeMo Guardrails library.
```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
```
| Attribute | Description |
|---|---|
| `type` | The model type (`main`, `embeddings`, or a task-specific type such as `self_check_input`). |
| `engine` | The LLM provider (for example, `openai` or `nim`). |
| `model` | The model name (for example, `gpt-3.5-turbo-instruct`). |
| `parameters` | Optional parameters to pass to the LangChain class that is used by the LLM provider. For example, when `engine` is set to `openai`, the parameters are passed to the corresponding LangChain class for OpenAI models. |
LLM Engines#
Core Engines#
| Engine | Description |
|---|---|
| `openai` | OpenAI models |
| `nim` | NVIDIA NIM microservices |
| `nvidia_ai_endpoints` | Alias for `nim` |
| `azure` | Azure OpenAI models |
| `anthropic` | Anthropic Claude models |
| `cohere` | Cohere models |
| `vertexai` | Google Vertex AI |
Self-Hosted Engines#
| Engine | Description |
|---|---|
| `huggingface_hub` | HuggingFace Hub models |
| `huggingface_endpoint` | HuggingFace Inference Endpoints |
| `vllm_openai` | vLLM with OpenAI-compatible API |
| `trt_llm` | TensorRT-LLM |
| `self_hosted` | Generic self-hosted models |
Auto-Discovered LangChain Providers#
The library automatically discovers all LLM providers from LangChain Community at runtime, adding more than 50 providers beyond the core engines listed above. Use the provider name as the `engine` value in your configuration.
To explore and select an LLM provider, use the `find-providers` CLI command:
```bash
nemoguardrails find-providers [--list]
```
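A discovered provider is referenced by its LangChain Community name. The following is a minimal sketch assuming the `ollama` provider is available in your installation and an Ollama server is running locally; the provider name, model, and `base_url` are illustrative assumptions, not part of the core engine list:

```yaml
models:
  - type: main
    # Assumption: "ollama" appears in the output of `nemoguardrails find-providers`
    # for your installation; any auto-discovered provider name is used the same way.
    engine: ollama
    model: llama3.1
    parameters:
      base_url: http://localhost:11434  # default local Ollama endpoint (assumed)
```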
Embedding Engines#
| Engine | Description |
|---|---|
| `FastEmbed` | FastEmbed (default) |
| `openai` | OpenAI embeddings |
| `nim` | NVIDIA NIM embeddings |
Embeddings Configuration#
```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2
```
NVIDIA NIM Configuration#
The NeMo Guardrails library integrates with NVIDIA NIM microservices:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
```
This provides access to:
- **Locally-deployed NIMs**: Run models on your own infrastructure with optimized inference.
- **NVIDIA API Catalog**: Access hosted models on build.nvidia.com (see the sketch after this list).
- **Specialized NIMs**: Nemotron Content Safety, Topic Control, and Jailbreak Detect.
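For the hosted NVIDIA API Catalog, the following minimal sketch assumes you have an API key from build.nvidia.com exported as the `NVIDIA_API_KEY` environment variable (the variable read by the underlying `langchain-nvidia-ai-endpoints` package); with no `base_url` set, requests go to the hosted endpoint by default:

```yaml
models:
  - type: main
    engine: nim
    # Assumption: NVIDIA_API_KEY is set in the environment; omitting base_url
    # targets the hosted API Catalog rather than a local NIM deployment.
    model: meta/llama-3.1-8b-instruct
```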
Local NIM Deployment#
For locally-deployed NIMs, specify the base URL:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
    parameters:
      base_url: http://localhost:8000/v1
```
Task-Specific Models#
Configure different models for specific tasks:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
  - type: self_check_input
    engine: nim
    model: meta/llama3-8b-instruct
  - type: self_check_output
    engine: nim
    model: meta/llama-3.1-70b-instruct
  - type: generate_user_intent
    engine: nim
    model: meta/llama-3.1-8b-instruct
```
Available Task Types#
| Task Type | Description |
|---|---|
| `main` | Primary application LLM |
| `embeddings` | Embedding generation |
| `self_check_input` | Input validation checks |
| `self_check_output` | Output validation checks |
| `generate_user_intent` | Canonical user intent generation |
| `generate_next_steps` | Next step prediction |
| `generate_bot_message` | Bot response generation |
| `self_check_facts` | Fact verification |
Configuration Examples#
OpenAI#
The following example shows how to configure an OpenAI model as the main application LLM:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4o
```
Azure OpenAI#
The following example shows how to configure an Azure OpenAI deployment as the main application LLM:
```yaml
models:
  - type: main
    engine: azure
    model: gpt-4
    parameters:
      azure_deployment: my-gpt4-deployment
      azure_endpoint: https://my-resource.openai.azure.com
```
Anthropic#
The following example shows how to configure an Anthropic Claude model as the main application LLM:
```yaml
models:
  - type: main
    engine: anthropic
    model: claude-3-5-sonnet-20241022
```
vLLM (OpenAI-Compatible)#
The following example shows how to configure a model served by vLLM as the main application LLM, using the OpenAI-compatible API:
```yaml
models:
  - type: main
    engine: vllm_openai
    parameters:
      openai_api_base: http://localhost:5000/v1
      model_name: meta-llama/Llama-3.1-8B-Instruct
```
The following example shows how to configure Llama Guard as a guardrail model, also served by vLLM through the OpenAI-compatible API:
```yaml
models:
  - type: llama_guard
    engine: vllm_openai
    parameters:
      openai_api_base: http://localhost:5000/v1
      model_name: meta-llama/LlamaGuard-7b
```
Google Vertex AI#
The following example shows how to configure a Google Vertex AI model as the main application LLM:
```yaml
models:
  - type: main
    engine: vertexai
    model: gemini-1.0-pro
```
Complete Example#
The following example shows how to configure the main application LLM, the embeddings model, and a dedicated NemoGuard content safety model for input and output checking:
```yaml
models:
  # Main application LLM
  - type: main
    engine: nim
    model: meta/llama-3.1-70b-instruct
    parameters:
      temperature: 0.7
      max_tokens: 2000

  # Embeddings for knowledge base
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2

  # Dedicated model for input checking
  - type: self_check_input
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

  # Dedicated model for output checking
  - type: self_check_output
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
```
Model Parameters#
Pass additional parameters to the underlying LangChain class:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      temperature: 0.7
      max_tokens: 1000
      top_p: 0.9
```
Common parameters vary by provider. Refer to the LangChain documentation for provider-specific options.
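As an illustration only, the sketch below contrasts parameters for two providers used elsewhere in this section; the parameter names come from the corresponding LangChain classes and should be verified against the versions you have installed:

```yaml
models:
  # OpenAI: parameters are forwarded to the LangChain class for OpenAI chat models.
  - type: main
    engine: openai
    model: gpt-4o
    parameters:
      temperature: 0.2
      max_tokens: 512

  # NIM: parameters are forwarded to the ChatNVIDIA class; base_url selects the endpoint.
  - type: self_check_input
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
    parameters:
      base_url: http://localhost:8000/v1
      temperature: 0.0
```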