Model Configuration#

In this section, learn how to configure the models used in your guardrails configuration. For a complete reference of all configuration options, refer to the Configuration YAML Schema Reference.

NVIDIA NIM Configuration#

The NeMo Guardrails library provides seamless integration with NVIDIA NIM microservices:

models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct

This provides access to:

Locally-deployed NIMs: Run models on your own infrastructure with optimized inference.
NVIDIA API Catalog: Access hosted models on build.nvidia.com.
Specialized NIMs: NemoGuard Content Safety, Topic Control, and Jailbreak Detect.

Local NIM Deployment#

For locally-deployed NIMs, specify the base URL:

models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
    parameters:
      base_url: http://localhost:8000/v1

Task-Specific Models#

Configure different models for specific tasks:

models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct

  - type: self_check_input
    engine: nim
    model: meta/llama3-8b-instruct

  - type: self_check_output
    engine: nim
    model: meta/llama-3.1-70b-instruct

  - type: generate_user_intent
    engine: nim
    model: meta/llama-3.1-8b-instruct

Configuration Examples#

OpenAI#

The following example shows how to configure the OpenAI model as the main application LLM:

models:
  - type: main
    engine: openai
    model: gpt-4o

Azure OpenAI#

The following example shows how to configure the Azure OpenAI model as the main application LLM using the Azure OpenAI API:

models:
  - type: main
    engine: azure
    model: gpt-4
    parameters:
      azure_deployment: my-gpt4-deployment
      azure_endpoint: https://my-resource.openai.azure.com

Note

Azure OpenAI is OpenAI-compatible at the wire level, but the LangChain path is the convenient default because langchain-openai handles the deployment-name URL pattern and api-version query string for you. Set NEMOGUARDRAILS_LLM_FRAMEWORK=langchain and install langchain-openai. Azure is also reachable through the built-in client with manual plumbing; see Migrating to 0.22.

Anthropic#

The following example shows how to configure the Anthropic model as the main application LLM:

models:
  - type: main
    engine: anthropic
    model: claude-3-5-sonnet-20241022

Note

Anthropic’s API isn’t OpenAI-compatible, so this engine is opt-in: set NEMOGUARDRAILS_LLM_FRAMEWORK=langchain and install langchain-anthropic. For background, see Migrating to 0.22.

vLLM (OpenAI-Compatible)#

vLLM exposes an OpenAI-compatible API, so the recommended configuration uses engine: openai pointed at the vLLM endpoint. The built-in client handles it with no LangChain dependency.

models:
  - type: main
    engine: openai
    model: meta-llama/Llama-3.1-8B-Instruct
    parameters:
      base_url: http://localhost:5000/v1
      api_key: EMPTY

The following example shows how to configure Llama Guard as a guardrail model using the same pattern:

models:
  - type: llama_guard
    engine: openai
    model: meta-llama/LlamaGuard-7b
    parameters:
      base_url: http://localhost:5000/v1
      api_key: EMPTY

When self-hosted vLLM does not enforce authentication, set parameters.api_key to any non-empty placeholder such as EMPTY. If your deployment requires a real token, replace parameters.api_key with the literal token, or omit it and set api_key_env_var at the top level of the model entry (not inside parameters:):

- type: main
  engine: openai
  model: meta-llama/Llama-3.1-8B-Instruct
  api_key_env_var: MY_VLLM_API_KEY
  parameters:
    base_url: http://localhost:5000/v1

Note

The referenced environment variable must be set before RailsConfig.from_content or RailsConfig.from_path is called. Otherwise, config loading fails with Model API Key environment variable 'X' not set.. This is a Pydantic validator on the model schema; the check is eager, not lazy.

Note

The legacy engine: vllm_openai with parameters.openai_api_base form is only needed when running under NEMOGUARDRAILS_LLM_FRAMEWORK=langchain. For new configurations, prefer the form above.

Other OpenAI-compatible endpoints#

The same engine: openai plus parameters.base_url pattern works for any provider whose wire protocol is OpenAI-compatible, including OpenRouter, Together.ai, Fireworks.ai, Groq, DeepSeek’s hosted API at https://api.deepseek.com/v1, TGI deployments that expose /v1/chat/completions, and llama.cpp server with --api. Provide parameters.base_url and either parameters.api_key or a top-level api_key_env_var.

Google Vertex AI#

The following example shows how to configure the Google Vertex AI model as the main application LLM:

models:
  - type: main
    engine: vertexai
    model: gemini-1.0-pro

Note

Vertex AI’s API isn’t OpenAI-compatible, so this engine is opt-in: set NEMOGUARDRAILS_LLM_FRAMEWORK=langchain and install langchain-google-vertexai. For background, see Migrating to 0.22.

Complete Example#

The following example shows how to configure the main application LLM, embeddings model, and a dedicated NemoGuard model for input and output checking:

models:
  # Main application LLM
  - type: main
    engine: nim
    model: meta/llama-3.1-70b-instruct
    parameters:
      temperature: 0.7
      max_tokens: 2000

  # Embeddings for knowledge base
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2

  # Dedicated model for input checking
  - type: self_check_input
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

  # Dedicated model for output checking
  - type: self_check_output
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

Model Parameters#

Pass additional parameters to the underlying LLM client. For engines served by the built-in client (any OpenAI-compatible endpoint), parameters are forwarded to the OpenAI-compatible HTTP request (for example, temperature, max_tokens, base_url, api_key, default_query, default_headers). For LangChain engines, parameters follow the conventions of the underlying LangChain class.

models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      temperature: 0.7
      max_tokens: 1000
      top_p: 0.9

Common parameters vary by provider. For built-in engines, see the OpenAI-compatible client options. For LangChain engines, refer to the corresponding LangChain provider documentation.