Complete Configuration Reference

This reference documents all configuration options for config.yml, derived from the authoritative Pydantic schema in nemoguardrails/rails/llm/config.py.

Configuration Structure

models:           # LLM and embedding model configurations
  - type: main
    engine: openai
    model: gpt-4

rails:            # Guardrail configurations
  input:
    flows: []
  output:
    flows: []
  config: {}

prompts:          # Task-specific prompts
  - task: self_check_input
    content: "..."

instructions:     # System instructions
  - type: general
    content: "..."

Models Configuration

The models key defines the LLM providers and models used by NeMo Guardrails.

Model Schema

models:
  - type: main                    # Required: Model type
    engine: openai                # Required: LLM provider
    model: gpt-4                  # Required: Model name
    mode: chat                    # Optional: "chat" or "text" (default: "chat")
    api_key_env_var: OPENAI_KEY   # Optional: Environment variable for API key
    parameters:                   # Optional: Provider-specific parameters
      temperature: 0.7
      max_tokens: 1000
    cache:                        # Optional: Caching configuration
      enabled: false
      maxsize: 50000

Model Attributes

| Attribute | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Model type: main, embeddings, or task-specific |
| engine | string | Yes | LLM provider (see Engines) |
| model | string | Yes | Model name (can also be in parameters.model_name) |
| mode | string | No | Completion mode: chat or text (default: chat) |
| api_key_env_var | string | No | Environment variable containing the API key |
| parameters | object | No | Provider-specific parameters passed to LangChain |
| cache | object | No | Cache configuration for this model |
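
The attribute rules above (required keys, the chat default for mode, and the model name living either at the top level or in parameters.model_name) can be sketched as a small normalizer. This is an illustrative helper, not part of NeMo Guardrails; the function name `normalize_model_entry` and its error messages are assumptions.

```python
from typing import Any, Dict

# Keys that every entry must carry; the model name is checked separately
# because it may also be supplied as parameters.model_name.
REQUIRED_KEYS = ("type", "engine")


def normalize_model_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one `models` entry with defaults applied (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in entry]
    if missing:
        raise ValueError(f"missing required keys: {missing}")

    parameters = dict(entry.get("parameters") or {})
    # Prefer the top-level name, then fall back to parameters.model_name.
    model = entry.get("model") or parameters.get("model_name")
    if not model:
        raise ValueError("no model name in 'model' or 'parameters.model_name'")

    mode = entry.get("mode", "chat")
    if mode not in ("chat", "text"):
        raise ValueError(f"mode must be 'chat' or 'text', got {mode!r}")

    return {**entry, "model": model, "mode": mode, "parameters": parameters}
```

For example, an entry that only sets parameters.model_name comes back with model filled in and mode set to chat.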

Model Types

| Type | Description |
|---|---|
| main | Primary application LLM |
| embeddings | Embedding generation model |
| self_check_input | Input validation model |
| self_check_output | Output validation model |
| content_safety | Content safety model |
| topic_control | Topic control model |
| generate_user_intent | Canonical user intent generation |
| generate_next_steps | Next step prediction |
| generate_bot_message | Bot response generation |
| fact_checking | Fact verification model |
| llama_guard | LlamaGuard content moderation |
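
One way to picture how task-specific types relate to main is a lookup that prefers a dedicated model and otherwise uses the main one. The fallback to main is an assumption made for this sketch, and `select_model` is a hypothetical name, not a library function.

```python
from typing import Dict, List, Optional


def select_model(models: List[Dict[str, str]], task: str) -> Optional[Dict[str, str]]:
    """Pick the entry whose type equals the task, else the main entry (assumed fallback)."""
    by_type = {model["type"]: model for model in models}
    return by_type.get(task) or by_type.get("main")


models = [
    {"type": "main", "engine": "openai", "model": "gpt-4"},
    {"type": "self_check_input", "engine": "nim", "model": "meta/llama-3.1-70b-instruct"},
]
```

With this list, `select_model(models, "self_check_input")` returns the NIM entry, while `select_model(models, "generate_bot_message")` returns the main entry.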

Engines

Core Engines

| Engine | Description |
|---|---|
| openai | OpenAI models |
| nim | NVIDIA NIM microservices |
| nvidia_ai_endpoints | Alias for nim |
| azure | Azure OpenAI models |
| anthropic | Anthropic Claude models |
| cohere | Cohere models |
| vertexai | Google Vertex AI |

Self-Hosted Engines

| Engine | Description |
|---|---|
| huggingface_hub | HuggingFace Hub models |
| huggingface_endpoint | HuggingFace Inference Endpoints |
| vllm_openai | vLLM with OpenAI-compatible API |
| trt_llm | TensorRT-LLM |
| self_hosted | Generic self-hosted models |

Embedding Engines

| Engine | Description |
|---|---|
| FastEmbed | FastEmbed (default) |
| openai | OpenAI embeddings |
| nim | NVIDIA NIM embeddings |

Model Cache Configuration

models:
  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemotron-safety-guard-8b-v3
    cache:
      enabled: true
      maxsize: 50000
      stats:
        enabled: false
        log_interval: null

| Attribute | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable caching for this model |
| maxsize | integer | 50000 | Maximum cache entries |
| stats.enabled | boolean | false | Enable cache statistics tracking |
| stats.log_interval | float | null | Seconds between stats logging |
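
To make maxsize and stats.enabled concrete, here is a minimal bounded cache with hit/miss counters. It uses least-recently-used eviction as a stand-in; the eviction policy NeMo Guardrails actually applies may differ, and the class name `BoundedCache` is hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class BoundedCache:
    """Toy cache: at most `maxsize` entries, LRU eviction (an assumption of this sketch)."""

    def __init__(self, maxsize: int = 50000, track_stats: bool = False) -> None:
        self.maxsize = maxsize
        self.track_stats = track_stats
        self.hits = 0
        self.misses = 0
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            if self.track_stats:
                self.hits += 1
            return self._data[key]
        if self.track_stats:
            self.misses += 1
        return None

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # drop the least recently used entry
```

A cache created with `maxsize=2` keeps only the two most recently used prompts, so a third distinct prompt evicts the oldest one.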


Rails Configuration

The rails key configures the guardrails that control LLM behavior.

Rails Schema

rails:
  input:
    parallel: false
    flows:
      - self check input
      - check jailbreak

  output:
    parallel: false
    flows:
      - self check output
    streaming:
      enabled: false
      chunk_size: 200
      context_size: 50
      stream_first: true

  retrieval:
    flows:
      - check retrieval sensitive data

  dialog:
    single_call:
      enabled: false
      fallback_to_multiple_calls: true
    user_messages:
      embeddings_only: false

  actions:
    instant_actions: []

  tool_output:
    flows: []
    parallel: false

  tool_input:
    flows: []
    parallel: false

  config:
    # Rail-specific configurations

Input Rails

Process user messages before they reach the LLM.

rails:
  input:
    parallel: false      # Execute flows in parallel
    flows:
      - self check input
      - check jailbreak
      - mask sensitive data on input

| Attribute | Type | Default | Description |
|---|---|---|---|
| parallel | boolean | false | Execute input rails in parallel |
| flows | list | [] | Names of flows that implement input rails |
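
The difference between sequential and parallel execution can be sketched with asyncio. The two check functions below are hypothetical stand-ins for input flows, and the early exit in sequential mode is an assumption of this sketch rather than documented library behavior.

```python
import asyncio
from typing import Awaitable, Callable, List

Check = Callable[[str], Awaitable[bool]]


async def contains_no_secret(text: str) -> bool:
    await asyncio.sleep(0)  # placeholder for an LLM or API call
    return "secret" not in text.lower()


async def is_short_enough(text: str) -> bool:
    await asyncio.sleep(0)
    return len(text) <= 200


async def run_input_checks(text: str, checks: List[Check], parallel: bool = False) -> bool:
    """Return True only if every check passes."""
    if parallel:
        # All checks start together; total latency is roughly the slowest check.
        results = await asyncio.gather(*(check(text) for check in checks))
        return all(results)
    for check in checks:
        # Sequential mode stops at the first failing check.
        if not await check(text):
            return False
    return True
```

Calling `asyncio.run(run_input_checks("hello", [contains_no_secret, is_short_enough], parallel=True))` runs both checks concurrently and combines their verdicts.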

Built-in Input Flows

| Flow | Description |
|---|---|
| self check input | LLM-based policy compliance check |
| check jailbreak | Jailbreak detection heuristics |
| jailbreak detection model | NIM-based jailbreak detection |
| mask sensitive data on input | Mask PII in user input |
| detect sensitive data on input | Detect and block PII |
| llama guard check input | LlamaGuard content moderation |
| content safety check input | NVIDIA content safety model |
| topic safety check input | Topic control model |

Output Rails

Process LLM responses before they are returned to users.

rails:
  output:
    parallel: false
    flows:
      - self check output
      - self check facts
    streaming:
      enabled: false
      chunk_size: 200
      context_size: 50
      stream_first: true

| Attribute | Type | Default | Description |
|---|---|---|---|
| parallel | boolean | false | Execute output rails in parallel |
| flows | list | [] | Names of flows that implement output rails |
| streaming | object | - | Streaming output configuration |

Output Streaming Configuration

| Attribute | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable streaming mode |
| chunk_size | integer | 200 | Tokens per processing chunk |
| context_size | integer | 50 | Tokens carried from previous chunk |
| stream_first | boolean | true | Stream before applying output rails |
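
The interplay of chunk_size and context_size can be illustrated with a rolling window over a token list. This is one plausible reading of the two settings, not the library's buffering code; `rolling_chunks` is a hypothetical helper, and whether the real buffer counts context tokens toward chunk_size is not covered here.

```python
from typing import Iterator, List, Tuple


def rolling_chunks(
    tokens: List[str], chunk_size: int = 200, context_size: int = 50
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (context, chunk) pairs.

    `chunk` holds up to `chunk_size` new tokens; `context` holds the last
    `context_size` tokens that preceded it, so a rail can see text that
    straddles a chunk boundary.
    """
    if chunk_size <= 0 or context_size < 0:
        raise ValueError("chunk_size must be positive and context_size non-negative")
    for start in range(0, len(tokens), chunk_size):
        context = tokens[max(0, start - context_size):start]
        yield context, tokens[start:start + chunk_size]
```

With eight tokens, `chunk_size=3` and `context_size=2`, the second chunk is checked together with the last two tokens of the first.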

Built-in Output Flows

| Flow | Description |
|---|---|
| self check output | LLM-based policy compliance check |
| self check facts | Fact verification |
| self check hallucination | Hallucination detection |
| mask sensitive data on output | Mask PII in output |
| llama guard check output | LlamaGuard content moderation |
| content safety check output | NVIDIA content safety model |

Retrieval Rails

Process chunks retrieved from the knowledge base.

rails:
  retrieval:
    flows:
      - check retrieval sensitive data

Dialog Rails

Control conversation flow after user intent is determined.

rails:
  dialog:
    single_call:
      enabled: false
      fallback_to_multiple_calls: true
    user_messages:
      embeddings_only: false
      embeddings_only_similarity_threshold: null
      embeddings_only_fallback_intent: null

| Attribute | Type | Default | Description |
|---|---|---|---|
| single_call.enabled | boolean | false | Use single LLM call for intent + response |
| single_call.fallback_to_multiple_calls | boolean | true | Fall back if single call fails |
| user_messages.embeddings_only | boolean | false | Use only embeddings for intent matching |
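
Embeddings-only matching, together with the embeddings_only_similarity_threshold and embeddings_only_fallback_intent keys from the example above, can be sketched as a nearest-neighbor lookup with a cutoff. The cosine metric, the toy vectors, and the function names are assumptions for illustration; the library's matching logic may differ.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_intent(
    query_vec: Sequence[float],
    intent_vecs: Dict[str, Sequence[float]],
    similarity_threshold: Optional[float] = None,
    fallback_intent: Optional[str] = None,
) -> Optional[str]:
    """Return the closest intent, or the fallback when the best score is below the threshold."""
    best_intent, best_score = None, -1.0
    for intent, vec in intent_vecs.items():
        score = cosine(query_vec, vec)
        if score > best_score:
            best_intent, best_score = intent, score
    if similarity_threshold is not None and best_score < similarity_threshold:
        return fallback_intent
    return best_intent
```

Leaving the threshold at null (None here) always returns the closest intent, however weak the match.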

Action Rails

Control custom action and tool invocations.

rails:
  actions:
    instant_actions:
      - action_name_1
      - action_name_2

Tool Rails

Control tool input/output processing.

rails:
  tool_output:
    flows:
      - validate tool parameters
    parallel: false

  tool_input:
    flows:
      - filter tool results
    parallel: false

Rails Config Section

The rails.config section contains configuration for specific built-in rails.

Jailbreak Detection

rails:
  config:
    jailbreak_detection:
      # Heuristics-based detection
      server_endpoint: null
      length_per_perplexity_threshold: 89.79
      prefix_suffix_perplexity_threshold: 1845.65

      # NIM-based detection
      nim_base_url: "http://localhost:8000/v1/"
      nim_server_endpoint: "classify"
      api_key_env_var: "JAILBREAK_KEY"

| Attribute | Type | Default | Description |
|---|---|---|---|
| server_endpoint | string | null | Heuristics model endpoint |
| length_per_perplexity_threshold | float | 89.79 | Length/perplexity threshold |
| prefix_suffix_perplexity_threshold | float | 1845.65 | Prefix/suffix perplexity threshold |
| nim_base_url | string | null | NIM base URL (e.g., http://localhost:8000/v1) |
| nim_server_endpoint | string | "classify" | NIM endpoint path |
| api_key_env_var | string | null | Environment variable for API key |
| api_key | string | null | API key (not recommended) |
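
The two heuristic thresholds can be read as simple comparisons once perplexity values are available. In the sketch below the perplexities are passed in as plain numbers (the real heuristics compute them with a language model), and the direction of each comparison is an assumption; `is_suspicious` is a hypothetical name.

```python
def is_suspicious(
    text_length: int,
    perplexity: float,
    prefix_perplexity: float,
    suffix_perplexity: float,
    length_per_perplexity_threshold: float = 89.79,
    prefix_suffix_perplexity_threshold: float = 1845.65,
) -> bool:
    """Flag input when either heuristic exceeds its threshold (assumed semantics)."""
    if perplexity <= 0:
        raise ValueError("perplexity must be positive")
    # Heuristic 1: long input with low overall perplexity.
    if text_length / perplexity > length_per_perplexity_threshold:
        return True
    # Heuristic 2: an unusually high-perplexity prefix or suffix.
    return max(prefix_perplexity, suffix_perplexity) > prefix_suffix_perplexity_threshold
```

Raising either threshold makes the corresponding heuristic less sensitive.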

Sensitive Data Detection (Presidio)

rails:
  config:
    sensitive_data_detection:
      recognizers: []
      input:
        entities:
          - PERSON
          - EMAIL_ADDRESS
          - PHONE_NUMBER
          - CREDIT_CARD
        mask_token: "*"
        score_threshold: 0.2
      output:
        entities:
          - PERSON
          - EMAIL_ADDRESS
      retrieval:
        entities: []

| Attribute | Type | Default | Description |
|---|---|---|---|
| recognizers | list | [] | Custom Presidio recognizers |
| input/output/retrieval.entities | list | [] | Entity types to detect |
| input/output/retrieval.mask_token | string | "*" | Token for masking |
| input/output/retrieval.score_threshold | float | 0.2 | Detection confidence threshold |
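
The roles of entities, score_threshold, and mask_token can be sketched with pre-computed detections. The `Detection` tuple and `mask_text` function are hypothetical; in practice Presidio produces the detections, and the exact way the library applies mask_token (per character, as here, or per entity) is an assumption.

```python
from typing import List, NamedTuple, Sequence


class Detection(NamedTuple):
    entity: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    score: float


def mask_text(
    text: str,
    detections: List[Detection],
    entities: Sequence[str],
    mask_token: str = "*",
    score_threshold: float = 0.2,
) -> str:
    """Mask spans whose entity type is configured and whose score meets the threshold."""
    keep = [d for d in detections if d.entity in entities and d.score >= score_threshold]
    # Replace from the right so earlier offsets stay valid.
    for d in sorted(keep, key=lambda d: d.start, reverse=True):
        text = text[:d.start] + mask_token * (d.end - d.start) + text[d.end:]
    return text
```

A detection scored below score_threshold, or of an entity type missing from entities, leaves the text untouched.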

Injection Detection

rails:
  config:
    injection_detection:
      injections:
        - sqli
        - template
        - code
        - xss
      action: reject    # "reject" or "omit"
      yara_path: ""
      yara_rules: {}

| Attribute | Type | Default | Description |
|---|---|---|---|
| injections | list | [] | Injection types: sqli, template, code, xss |
| action | string | "reject" | Action: reject or omit |
| yara_path | string | "" | Custom YARA rules path |
| yara_rules | object | {} | Inline YARA rules |
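
The difference between the reject and omit actions can be shown with toy patterns. The regular expressions below stand in for YARA rules and are not the rules shipped with the library; `apply_injection_policy` is a hypothetical helper, and the assumption is that reject blocks the whole text while omit strips only the matched part.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Toy patterns standing in for YARA rules (illustrative only).
TOY_RULES: Dict[str, Pattern[str]] = {
    "sqli": re.compile(r"\bunion\s+select\b|\bdrop\s+table\b", re.IGNORECASE),
    "xss": re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL),
}


def apply_injection_policy(
    text: str, injections: List[str], action: str = "reject"
) -> Tuple[bool, str]:
    """Return (allowed, text) after applying the configured action."""
    if action not in ("reject", "omit"):
        raise ValueError(f"unknown action: {action!r}")
    active = [TOY_RULES[name] for name in injections if name in TOY_RULES]
    if not any(rule.search(text) for rule in active):
        return True, text
    if action == "reject":
        return False, ""  # block the whole text
    for rule in active:
        text = rule.sub("", text)  # strip only the offending spans
    return True, text
```

With omit, the surrounding text survives and only the matched span is removed.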

Fact Checking

rails:
  config:
    fact_checking:
      parameters:
        endpoint: "http://localhost:5000"
      fallback_to_self_check: false

Content Safety

rails:
  config:
    content_safety:
      multilingual:
        enabled: false
        refusal_messages:
          en: "Sorry, I cannot help with that."
          es: "Lo siento, no puedo ayudar con eso."

Third-Party Integrations

AutoAlign

rails:
  config:
    autoalign:
      parameters: {}
      input:
        guardrails_config: {}
      output:
        guardrails_config: {}

Patronus

rails:
  config:
    patronus:
      input:
        evaluate_config:
          success_strategy: all_pass  # or any_pass
          params: {}
      output:
        evaluate_config:
          success_strategy: all_pass
          params: {}
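
The two success_strategy values can be read as "every evaluator must pass" versus "at least one must pass". A minimal sketch of that reading, with a hypothetical function name and a plain list of booleans standing in for evaluator results:

```python
from typing import Sequence


def evaluation_passed(results: Sequence[bool], success_strategy: str = "all_pass") -> bool:
    """Combine per-evaluator verdicts according to the strategy (assumed semantics)."""
    if success_strategy == "all_pass":
        return all(results)
    if success_strategy == "any_pass":
        return any(results)
    raise ValueError(f"unknown success_strategy: {success_strategy!r}")
```

With results `[True, False]`, all_pass fails the check and any_pass lets it through.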

Clavata

rails:
  config:
    clavata:
      server_endpoint: "https://gateway.app.clavata.ai:8443"
      policies: {}
      label_match_logic: ANY  # or ALL
      input:
        policy: "policy_alias"
        labels: []
      output:
        policy: "policy_alias"
        labels: []

Pangea AI Guard

rails:
  config:
    pangea:
      input:
        recipe: "recipe_key"
      output:
        recipe: "recipe_key"

Trend Micro

rails:
  config:
    trend_micro:
      v1_url: "https://api.xdr.trendmicro.com/beta/aiSecurity/guard"
      api_key_env_var: "TREND_MICRO_API_KEY"

Cisco AI Defense

rails:
  config:
    ai_defense:
      timeout: 30.0
      fail_open: false
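
A common reading of timeout and fail_open is that a check which errors out or exceeds the timeout either blocks the content (fail closed) or lets it through (fail open). The sketch below illustrates that pattern with asyncio; `guarded_check` and the check callables are hypothetical, and this is not the integration's actual code.

```python
import asyncio
from typing import Awaitable, Callable


async def guarded_check(
    check: Callable[[str], Awaitable[bool]],
    text: str,
    timeout: float = 30.0,
    fail_open: bool = False,
) -> bool:
    """Return the check's verdict, or `fail_open` if it times out or cannot connect."""
    try:
        return await asyncio.wait_for(check(text), timeout=timeout)
    except (asyncio.TimeoutError, ConnectionError):
        # fail_open=False blocks on failure; fail_open=True allows the content.
        return fail_open
```

Keeping fail_open at false favors safety over availability when the external service is unreachable.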

Private AI

rails:
  config:
    private_ai_detection:
      server_endpoint: "http://localhost:8080/process/text"
      input:
        entities: []
      output:
        entities: []
      retrieval:
        entities: []

Fiddler Guardrails

rails:
  config:
    fiddler:
      fiddler_endpoint: "http://localhost:8080/process/text"
      safety_threshold: 0.1
      faithfulness_threshold: 0.05

Guardrails AI

rails:
  config:
    guardrails_ai:
      input:
        validators:
          - name: toxic_language
            parameters:
              threshold: 0.5
            metadata: {}
      output:
        validators:
          - name: pii
            parameters: {}

Prompts Configuration

Define prompts for LLM tasks.

prompts:
  - task: self_check_input
    content: |
      Your task is to check if the user input is safe.
      User input: {{ user_input }}
      Answer [Yes/No]:
    output_parser: null
    max_length: 16000
    max_tokens: null
    mode: standard
    stop: null
    models: null    # Restrict to specific engines/models

| Attribute | Type | Default | Description |
|---|---|---|---|
| task | string | - | Task identifier |
| content | string | - | Prompt template (mutually exclusive with messages) |
| messages | list | - | Chat messages (mutually exclusive with content) |
| output_parser | string | null | Output parser name |
| max_length | integer | 16000 | Maximum prompt length (characters) |
| max_tokens | integer | null | Maximum response tokens |
| mode | string | "standard" | Prompting mode |
| stop | list | null | Stop tokens |
| models | list | null | Restrict to engines/models (e.g., ["openai", "nim/llama-3.1"]) |
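
How a content template with `{{ user_input }}` turns into a prompt, and how max_length bounds it, can be sketched with a small substitution routine. This uses a regular expression as a stand-in for a full template engine, and raising an error on overflow is an assumption of the sketch; `render_prompt` is a hypothetical helper.

```python
import re
from typing import Mapping

# Matches placeholders of the form {{ name }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(content: str, variables: Mapping[str, str], max_length: int = 16000) -> str:
    """Fill placeholders and enforce a character limit on the result."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for placeholder {name!r}")
        return str(variables[name])

    rendered = _PLACEHOLDER.sub(substitute, content)
    if len(rendered) > max_length:
        raise ValueError(f"prompt is {len(rendered)} characters, limit is {max_length}")
    return rendered
```

For instance, `render_prompt("User input: {{ user_input }}", {"user_input": "hi"})` yields the filled-in line, and a tiny max_length triggers the length error.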


Other Configuration Options

Instructions

instructions:
  - type: general
    content: |
      You are a helpful assistant.

Sample Conversation

sample_conversation: |
  user: Hello
  assistant: Hi! How can I help you?

Knowledge Base

knowledge_base:
  folder: kb
  embedding_search_provider:
    name: default
    parameters: {}
    cache:
      enabled: false

Core Settings

core:
  embedding_search_provider:
    name: default
    parameters: {}

Tracing

tracing:
  enabled: false
  adapters:
    - name: FileSystem
  span_format: opentelemetry
  enable_content_capture: false

Streaming

streaming:
  enabled: false
  stream_on_start: false
  stream_on_end: true
  first_chunk_suffix: ""
  last_chunk_suffix: ""

Import Paths

import_paths:
  - path/to/shared/config

Complete Example

models:
  # Main application LLM
  - type: main
    engine: nim
    model: meta/llama-3.1-70b-instruct
    parameters:
      temperature: 0.7

  # Content safety model
  - type: content_safety
    engine: nim
    parameters:
      base_url: "http://localhost:8000/v1"
      model_name: "nvidia/llama-3.1-nemotron-safety-guard-8b-v3"

  # Embeddings
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2

rails:
  input:
    flows:
      - content safety check input $model=content_safety

  output:
    flows:
      - content safety check output $model=content_safety

  config:
    jailbreak_detection:
      nim_base_url: "http://localhost:8001/v1/"

prompts:
  - task: content_safety_check_input $model=content_safety
    content: |
      Check if this content is safe: {{ user_input }}
    output_parser: nemoguard_parse_prompt_safety
    max_tokens: 50

instructions:
  - type: general
    content: |
      You are a helpful, harmless, and honest assistant.

streaming:
  enabled: true