Model Configuration#
This section describes how to configure LLM models and embedding models in the config.yml file.
The models Key#
The models key defines the LLM providers and models used by the NeMo Guardrails toolkit.
```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
```
| Attribute | Description |
|---|---|
| `type` | The model type (for example, `main`, `embeddings`, or a task-specific type such as `self_check_input`). |
| `engine` | The LLM provider (for example, `openai`, `nim`, or `anthropic`). |
| `model` | The model name (for example, `gpt-4o` or `meta/llama-3.1-8b-instruct`). |
| `parameters` | Optional parameters to pass to the LangChain class that is used by the LLM provider. For example, when `engine` is set to `openai`, the toolkit loads the `ChatOpenAI` class. The `ChatOpenAI` class supports `temperature`, `max_tokens`, and other class-specific arguments. |
LLM Engines#
Core Engines#
| Engine | Description |
|---|---|
| `openai` | OpenAI models |
| `nim` | NVIDIA NIM microservices |
| `nvidia_ai_endpoints` | Alias for `nim` |
| `azure` | Azure OpenAI models |
| `anthropic` | Anthropic Claude models |
| `cohere` | Cohere models |
| `vertexai` | Google Vertex AI |
Self-Hosted Engines#
| Engine | Description |
|---|---|
| `huggingface_hub` | HuggingFace Hub models |
| `huggingface_endpoint` | HuggingFace Inference Endpoints |
| `vllm_openai` | vLLM with OpenAI-compatible API |
| `trt_llm` | TensorRT-LLM |
| `self_hosted` | Generic self-hosted models |
Auto-Discovered LangChain Providers#
The toolkit automatically discovers all LLM providers from LangChain Community at runtime. This includes 50+ additional providers. Use the provider name as the engine value in your configuration.
To explore and select from the available providers, use the find-providers CLI command:
```
nemoguardrails find-providers [--list]
```
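For example, a configuration that uses an auto-discovered provider might look like the following sketch. The `ollama` engine name and its parameters are illustrative assumptions; use whichever provider name `find-providers` reports in your environment.

```yaml
models:
  - type: main
    # Assumption: "ollama" appears in the output of `nemoguardrails find-providers`
    engine: ollama
    model: llama3.1
    parameters:
      # Assumption: the underlying LangChain class accepts a base_url argument
      base_url: http://localhost:11434
```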
Embedding Engines#
| Engine | Description |
|---|---|
| `FastEmbed` | FastEmbed (default) |
| `openai` | OpenAI embeddings |
| `nim` | NVIDIA NIM embeddings |
Embeddings Configuration#
```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2
```
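The `openai` and `nim` embedding engines are configured the same way. The following sketch shows a NIM embedding model; the model name is an illustrative assumption, so substitute the embedding model available in your deployment.

```yaml
models:
  - type: embeddings
    engine: nim
    # Assumption: an NVIDIA embedding NIM such as this one is available to you
    model: nvidia/nv-embedqa-e5-v5
```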
NVIDIA NIM Configuration#
The NeMo Guardrails toolkit provides seamless integration with NVIDIA NIM microservices:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
```
This provides access to:

- Locally-deployed NIMs: Run models on your own infrastructure with optimized inference.
- NVIDIA API Catalog: Access hosted models on build.nvidia.com (see the sketch after this list).
- Specialized NIMs: NemoGuard Content Safety, Topic Control, and Jailbreak Detection.
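When targeting the hosted API Catalog, requests must be authenticated. The sketch below assumes the commonly used `NVIDIA_API_KEY` environment variable carries the credential; that variable is an assumption, not something this section prescribes.

```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
    # Assumption: with no base_url set, the hosted build.nvidia.com endpoint is used,
    # authenticated through the NVIDIA_API_KEY environment variable.
```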
Local NIM Deployment#
For locally-deployed NIMs, specify the base URL:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
    parameters:
      base_url: http://localhost:8000/v1
```
Task-Specific Models#
Configure different models for specific tasks:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
  - type: self_check_input
    engine: nim
    model: meta/llama3-8b-instruct
  - type: self_check_output
    engine: nim
    model: meta/llama-3.1-70b-instruct
  - type: generate_user_intent
    engine: nim
    model: meta/llama-3.1-8b-instruct
```
Available Task Types#
| Task Type | Description |
|---|---|
| `main` | Primary application LLM |
| `embeddings` | Embedding generation |
| `self_check_input` | Input validation checks |
| `self_check_output` | Output validation checks |
| `generate_user_intent` | Canonical user intent generation |
| `generate_next_steps` | Next step prediction |
| `generate_bot_message` | Bot response generation |
| `self_check_facts` | Fact verification |
Configuration Examples#
OpenAI#
The following example shows how to configure the OpenAI model as the main application LLM:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4o
```
Azure OpenAI#
The following example shows how to configure the Azure OpenAI model as the main application LLM using the Azure OpenAI API:
```yaml
models:
  - type: main
    engine: azure
    model: gpt-4
    parameters:
      azure_deployment: my-gpt4-deployment
      azure_endpoint: https://my-resource.openai.azure.com
```
Anthropic#
The following example shows how to configure the Anthropic model as the main application LLM:
```yaml
models:
  - type: main
    engine: anthropic
    model: claude-3-5-sonnet-20241022
```
vLLM (OpenAI-Compatible)#
The following example shows how to configure a vLLM-served model as the main application LLM through the OpenAI-compatible API:
```yaml
models:
  - type: main
    engine: vllm_openai
    parameters:
      openai_api_base: http://localhost:5000/v1
      model_name: meta-llama/Llama-3.1-8B-Instruct
```
Google Vertex AI#
The following example shows how to configure the Google Vertex AI model as the main application LLM:
```yaml
models:
  - type: main
    engine: vertexai
    model: gemini-pro
    parameters:
      project: my-gcp-project
      location: us-central1
```
Complete Example#
The following example shows how to configure the main application LLM, embeddings model, and a dedicated NemoGuard model for input and output checking:
```yaml
models:
  # Main application LLM
  - type: main
    engine: nim
    model: meta/llama-3.1-70b-instruct
    parameters:
      temperature: 0.7
      max_tokens: 2000

  # Embeddings for knowledge base
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2

  # Dedicated model for input checking
  - type: self_check_input
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

  # Dedicated model for output checking
  - type: self_check_output
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
```
Model Parameters#
Pass additional parameters to the underlying LangChain class:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      temperature: 0.7
      max_tokens: 1000
      top_p: 0.9
```
Common parameters vary by provider. Refer to the LangChain documentation for provider-specific options.
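As one illustration of provider-specific parameters, the following sketch assumes the `anthropic` engine loads LangChain's `ChatAnthropic` class and that these values are forwarded to it as keyword arguments:

```yaml
models:
  - type: main
    engine: anthropic
    model: claude-3-5-sonnet-20241022
    parameters:
      # Assumption: forwarded as ChatAnthropic keyword arguments
      temperature: 0.3
      max_tokens: 1024
      top_k: 40
```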