Model Configuration#
This section describes how to configure LLM models and embedding models in the config.yml file.
The models Key#
The models key defines the LLM providers and models used by the NeMo Guardrails toolkit.
```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
```
| Attribute | Description |
|---|---|
| `type` | The model type (for example, `main`, `embeddings`, or a task-specific type such as `self_check_input`). |
| `engine` | The LLM provider (for example, `openai`, `nim`, or `anthropic`). |
| `model` | The model name (for example, `gpt-4o` or `meta/llama-3.1-8b-instruct`). |
| `parameters` | Optional parameters to pass to the LangChain class that is used by the LLM provider. For example, when `engine` is set to `openai`, the toolkit loads the `ChatOpenAI` class. The `ChatOpenAI` class supports `temperature`, `max_tokens`, and other class-specific arguments. |
LLM Engines#
Core Engines#
| Engine | Description |
|---|---|
| `openai` | OpenAI models |
| `nim` | NVIDIA NIM microservices |
| `nvidia_ai_endpoints` | Alias for `nim` |
| `azure` | Azure OpenAI models |
| `anthropic` | Anthropic Claude models |
| `cohere` | Cohere models |
| `vertexai` | Google Vertex AI |
Self-Hosted Engines#
| Engine | Description |
|---|---|
| `huggingface_hub` | HuggingFace Hub models |
| `huggingface_endpoint` | HuggingFace Inference Endpoints |
| `vllm_openai` | vLLM with OpenAI-compatible API |
| `trt_llm` | TensorRT-LLM |
| `self_hosted` | Generic self-hosted models |
Auto-Discovered LangChain Providers#
The toolkit automatically discovers all LLM providers from LangChain Community at runtime. This includes 50+ additional providers. Use the provider name as the engine value in your configuration.
To explore and select from the available providers, use the find-providers CLI command:
```
nemoguardrails find-providers [--list]
```
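For example, a configuration that uses an auto-discovered provider might look like the following sketch. The `ollama` engine name and its parameters are illustrative assumptions; use whichever provider name `find-providers` reports in your environment.

```yaml
models:
  - type: main
    # Assumption: "ollama" appears in the output of `nemoguardrails find-providers`
    engine: ollama
    model: llama3.1
    parameters:
      # Assumption: the underlying LangChain class accepts a base_url argument
      base_url: http://localhost:11434
```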
Embedding Engines#
| Engine | Description |
|---|---|
| `FastEmbed` | FastEmbed (default) |
| `openai` | OpenAI embeddings |
| `nim` | NVIDIA NIM embeddings |
Embeddings Configuration#
```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2
```
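The `openai` and `nim` embedding engines are configured the same way. The following sketch shows a NIM embedding model; the model name is an illustrative assumption, so substitute the embedding model available in your deployment.

```yaml
models:
  - type: embeddings
    engine: nim
    # Assumption: an NVIDIA embedding NIM such as this one is available to you
    model: nvidia/nv-embedqa-e5-v5
```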
NVIDIA NIM Configuration#
The NeMo Guardrails toolkit provides seamless integration with NVIDIA NIM microservices:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
```
This provides access to:

- Locally-deployed NIMs: Run models on your own infrastructure with optimized inference.
- NVIDIA API Catalog: Access hosted models on build.nvidia.com (see the sketch after this list).
- Specialized NIMs: NemoGuard Content Safety, Topic Control, and Jailbreak Detection.
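When targeting the hosted API Catalog, requests must be authenticated. The sketch below assumes the commonly used `NVIDIA_API_KEY` environment variable carries the credential; that variable is an assumption, not something this section prescribes.

```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
    # Assumption: with no base_url set, the hosted build.nvidia.com endpoint is used,
    # authenticated through the NVIDIA_API_KEY environment variable.
```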
Local NIM Deployment#
For locally-deployed NIMs, specify the base URL:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
    parameters:
      base_url: http://localhost:8000/v1
```
Task-Specific Models#
Configure different models for specific tasks:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
  - type: self_check_input
    engine: nim
    model: meta/llama3-8b-instruct
  - type: self_check_output
    engine: nim
    model: meta/llama-3.1-70b-instruct
  - type: generate_user_intent
    engine: nim
    model: meta/llama-3.1-8b-instruct
```
Available Task Types#
| Task Type | Description |
|---|---|
| `main` | Primary application LLM |
| `embeddings` | Embedding generation |
| `self_check_input` | Input validation checks |
| `self_check_output` | Output validation checks |
| `generate_user_intent` | Canonical user intent generation |
| `generate_next_steps` | Next step prediction |
| `generate_bot_message` | Bot response generation |
| `self_check_facts` | Fact verification |
Configuration Examples#
OpenAI#
The following example shows how to configure the OpenAI model as the main application LLM:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4o
```
Azure OpenAI#
The following example shows how to configure the Azure OpenAI model as the main application LLM using the Azure OpenAI API:
```yaml
models:
  - type: main
    engine: azure
    model: gpt-4
    parameters:
      azure_deployment: my-gpt4-deployment
      azure_endpoint: https://my-resource.openai.azure.com
```
Anthropic#
The following example shows how to configure the Anthropic model as the main application LLM:
```yaml
models:
  - type: main
    engine: anthropic
    model: claude-3-5-sonnet-20241022
```
vLLM (OpenAI-Compatible)#
The following example shows how to configure a vLLM-served model as the main application LLM through the OpenAI-compatible API:
```yaml
models:
  - type: main
    engine: vllm_openai
    parameters:
      openai_api_base: http://localhost:5000/v1
      model_name: meta-llama/Llama-3.1-8B-Instruct
```
Google Vertex AI#
The following example shows how to configure the Google Vertex AI model as the main application LLM:
```yaml
models:
  - type: main
    engine: vertexai
    model: gemini-pro
    parameters:
      project: my-gcp-project
      location: us-central1
```
Complete Example#
The following example shows how to configure the main application LLM, embeddings model, and a dedicated NemoGuard model for input and output checking:
```yaml
models:
  # Main application LLM
  - type: main
    engine: nim
    model: meta/llama-3.1-70b-instruct
    parameters:
      temperature: 0.7
      max_tokens: 2000

  # Embeddings for knowledge base
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2

  # Dedicated model for input checking
  - type: self_check_input
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

  # Dedicated model for output checking
  - type: self_check_output
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
```
Model Parameters#
Pass additional parameters to the underlying LangChain class:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      temperature: 0.7
      max_tokens: 1000
      top_p: 0.9
```
Common parameters vary by provider. Refer to the LangChain documentation for provider-specific options.
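As one illustration of provider-specific parameters, the following sketch assumes the `anthropic` engine loads LangChain's `ChatAnthropic` class and that these values are forwarded to it as keyword arguments:

```yaml
models:
  - type: main
    engine: anthropic
    model: claude-3-5-sonnet-20241022
    parameters:
      # Assumption: forwarded as ChatAnthropic keyword arguments
      temperature: 0.3
      max_tokens: 1024
      top_k: 40
```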