Guardrails Configuration

This section describes how to configure guardrails (rails) in the config.yml file to control LLM behavior.

The rails Key

The rails key defines which guardrails are active and their configuration options. Rails are organized into five categories based on when they trigger during the guardrails process.

Rail Categories

The following table summarizes the different rail categories and their trigger points.

| Category | Trigger Point | Purpose |
|---|---|---|
| Input rails | When user input is received | Validate, filter, or modify user input |
| Output rails | When the LLM generates output | Validate, filter, or modify bot responses |
| Dialog rails | After the canonical form of the user message is computed | Control conversation flow |
| Retrieval rails | After RAG retrieval completes | Process retrieved chunks |
| Execution rails | Before and after action execution | Control tool and action calls |

The following diagram shows the guardrails process summarized in the table above in more detail.

[Figure: Diagram showing the programmable guardrails flow]
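
As a rough mental model of this flow, the following Python sketch runs one conversational turn through input rails, the LLM call, and output rails. It is a hypothetical simplification, not the NeMo Guardrails implementation. Rails are modeled as plain callables that return the (possibly modified) text, or `None` to block. The names `run_rails`, `guarded_turn`, `REFUSAL`, and the two toy rails are invented for illustration. Dialog, retrieval, and execution rails are left out.

```python
from typing import Callable, Optional

# Hypothetical rail type: return the (possibly modified) text, or None to block.
Rail = Callable[[str], Optional[str]]

REFUSAL = "I'm sorry, I can't respond to that."  # illustrative refusal text


def run_rails(text: str, rails: list[Rail], trace: list[str]) -> Optional[str]:
    """Run rails in list order, recording each name and stopping at the first block."""
    for rail in rails:
        trace.append(rail.__name__)
        result = rail(text)
        if result is None:
            return None  # blocked: later rails in this category are skipped
        text = result
    return text


def guarded_turn(
    user_message: str,
    input_rails: list[Rail],
    generate: Callable[[str], str],
    output_rails: list[Rail],
    trace: list[str],
) -> str:
    """Simplified turn: input rails, then the LLM call, then output rails."""
    checked = run_rails(user_message, input_rails, trace)
    if checked is None:
        return REFUSAL  # the LLM is never called for blocked input
    trace.append("llm")
    response = run_rails(generate(checked), output_rails, trace)
    return REFUSAL if response is None else response


# Toy rails for demonstration only.
def block_jailbreak(text: str) -> Optional[str]:
    return None if "ignore previous instructions" in text.lower() else text


def redact_secret(text: str) -> Optional[str]:
    return text.replace("s3cr3t", "[REDACTED]")
```

The `trace` list makes the ordering visible. A blocked input records only the input rails that ran. An allowed input also records the LLM call and every output rail.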

Basic Configuration

```yaml
rails:
  input:
    flows:
      - self check input
      - check jailbreak
      - mask sensitive data on input

  output:
    flows:
      - self check output
      - self check facts
      - mask sensitive data on output

  retrieval:
    flows:
      - check retrieval sensitive data
```

Input Rails

Input rails process user messages before they reach the LLM:

```yaml
rails:
  input:
    flows:
      - self check input           # LLM-based input validation
      - check jailbreak            # Jailbreak detection
      - mask sensitive data on input  # PII masking
```

Available Flows for Input Rails

| Flow | Description |
|---|---|
| self check input | LLM-based policy compliance check |
| check jailbreak | Detect jailbreak attempts |
| mask sensitive data on input | Mask PII in user input |
| detect sensitive data on input | Detect and block PII |
| llama guard check input | LlamaGuard content moderation |
| content safety check input | NVIDIA content safety model |
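
The two sensitive-data flows differ in what happens after a match. Masking rewrites the input and lets it continue; detection blocks it. NeMo Guardrails performs this detection with Microsoft Presidio. The sketch below swaps in a single regular expression for email addresses purely to show the two behaviors. The names `mask_emails` and `should_block` are hypothetical, and the `<EMAIL_ADDRESS>` placeholder format is an assumption about what masked text looks like.

```python
import re

# Stand-in for a PII recognizer: matches simple email addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_emails(text: str) -> str:
    """'mask' behavior: replace each match with an entity placeholder and continue."""
    return EMAIL_RE.sub("<EMAIL_ADDRESS>", text)


def should_block(text: str) -> bool:
    """'detect' behavior: True means the message would be refused."""
    return EMAIL_RE.search(text) is not None
```

For example, `mask_emails("Contact bob.smith@example.org today")` yields `"Contact <EMAIL_ADDRESS> today"`, while `should_block` on the same string returns `True`.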

Output Rails

Output rails process LLM responses before they are returned to users:

```yaml
rails:
  output:
    flows:
      - self check output          # LLM-based output validation
      - self check facts           # Fact verification
      - self check hallucination   # Hallucination detection
      - mask sensitive data on output  # PII masking
```

Available Flows for Output Rails

| Flow | Description |
|---|---|
| self check output | LLM-based policy compliance check |
| self check facts | Verify factual accuracy |
| self check hallucination | Detect hallucinations |
| mask sensitive data on output | Mask PII in output |
| llama guard check output | LlamaGuard content moderation |
| content safety check output | NVIDIA content safety model |
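
The self check output flow asks an LLM, through a prompt, whether the bot message complies with your policy. The sketch below shows only the decision step. It assumes the checking prompt asks a Yes/No question such as "Should the bot message be blocked?". That prompt wording, the function names, and the choice to treat an unparseable answer as "block" are assumptions for illustration, not the library's code.

```python
import re

REFUSAL = "I'm sorry, I can't respond to that."  # illustrative refusal text

# Matches a leading "yes" or "no" as a whole word, ignoring case and leading whitespace.
_ANSWER_RE = re.compile(r"\s*(yes|no)\b", re.IGNORECASE)


def parse_block_decision(llm_answer: str) -> bool:
    """Return True if the checker's answer means "block the bot message"."""
    match = _ANSWER_RE.match(llm_answer)
    if match is None:
        return True  # fail closed on unparseable answers (a design choice here)
    return match.group(1).lower() == "yes"


def guard_bot_message(bot_message: str, llm_answer: str) -> str:
    """Return the bot message unchanged, or a refusal if the check says block."""
    return REFUSAL if parse_block_decision(llm_answer) else bot_message
```

Failing closed trades occasional false refusals for never passing through a message the checker did not clearly approve. Failing open makes the opposite trade-off.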

Dialog Rails

Dialog rails control conversation flow after user intent is determined:

```yaml
rails:
  dialog:
    single_call:
      enabled: false
      fallback_to_multiple_calls: true

    user_messages:
      embeddings_only: false
```

Dialog Configuration Options

| Option (under rails.dialog) | Description | Default |
|---|---|---|
| single_call.enabled | Use a single LLM call to generate the user intent, next step, and bot message | false |
| single_call.fallback_to_multiple_calls | Fall back to multiple LLM calls if the single call fails | true |
| user_messages.embeddings_only | Use only embeddings for user intent matching | false |
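
The following hypothetical sketch shows how the two `single_call` options interact. It assumes the single call returns three non-empty lines: user intent, next step, and bot message. That output format, the `plan_turn` and `parse_single_call` names, and the stub LLM callables are illustrative assumptions rather than the library's internals.

```python
from typing import Callable

LLMCall = Callable[[str], str]  # hypothetical: prompt in, completion out


def parse_single_call(raw: str) -> tuple[str, str, str]:
    """Split a combined completion into (intent, next step, bot message)."""
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if len(lines) != 3:
        raise ValueError(f"expected 3 non-empty lines, got {len(lines)}")
    return lines[0], lines[1], lines[2]


def plan_turn(
    user_message: str,
    single_call: LLMCall,
    intent_call: LLMCall,
    next_step_call: LLMCall,
    message_call: LLMCall,
    enabled: bool = False,
    fallback_to_multiple_calls: bool = True,
) -> tuple[str, str, str]:
    if enabled:
        try:
            # single_call.enabled: one LLM round trip for all three steps
            return parse_single_call(single_call(user_message))
        except ValueError:
            # single_call.fallback_to_multiple_calls decides what happens next
            if not fallback_to_multiple_calls:
                raise
    # Default path (or fallback): three separate LLM calls
    intent = intent_call(user_message)
    next_step = next_step_call(intent)
    return intent, next_step, message_call(next_step)
```

Enabling the single call saves LLM round trips. The fallback keeps a malformed combined completion from failing the whole turn.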

Retrieval Rails

Retrieval rails process chunks retrieved from the knowledge base:

```yaml
rails:
  retrieval:
    flows:
      - check retrieval sensitive data
```
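
Retrieval rails operate on the list of retrieved chunks rather than on a single message. A rail can therefore rewrite or discard individual chunks before they reach the prompt. The sketch below illustrates that shape. It does not specify what `check retrieval sensitive data` actually does to a matching chunk: the drop-versus-mask behavior, the SSN pattern, and the function names are assumptions.

```python
import re
from typing import Callable, Optional

# Hypothetical chunk rail: return the (possibly rewritten) chunk, or None to drop it.
ChunkRail = Callable[[str], Optional[str]]

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # illustrative US SSN pattern


def drop_chunks_with_ssn(chunk: str) -> Optional[str]:
    return None if SSN_RE.search(chunk) else chunk


def mask_ssn(chunk: str) -> Optional[str]:
    return SSN_RE.sub("<US_SSN>", chunk)


def apply_retrieval_rails(chunks: list[str], rails: list[ChunkRail]) -> list[str]:
    """Apply each rail to every chunk, in order; a None result removes the chunk."""
    for rail in rails:
        chunks = [out for out in map(rail, chunks) if out is not None]
    return chunks
```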

Execution Rails

Execution rails control custom action and tool invocations:

```yaml
rails:
  execution:
    flows:
      - check tool input
      - check tool output
```
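
Conceptually, the two execution flows bracket a tool call. One inspects the arguments before the tool runs; the other inspects the result before it is used. The sketch below models that as a Python wrapper. `guarded_tool_call`, `ToolCallBlocked`, and the example checks are hypothetical. They only show where the checks sit relative to the call.

```python
from typing import Any, Callable


class ToolCallBlocked(Exception):
    """Raised when a (hypothetical) execution rail rejects a tool call."""


def guarded_tool_call(
    tool: Callable[..., Any],
    args: dict[str, Any],
    check_tool_input: Callable[[str, dict[str, Any]], bool],
    check_tool_output: Callable[[str, Any], bool],
) -> Any:
    name = tool.__name__
    if not check_tool_input(name, args):  # runs before the tool executes
        raise ToolCallBlocked(f"input rejected for {name}")
    result = tool(**args)
    if not check_tool_output(name, result):  # runs before the result is used
        raise ToolCallBlocked(f"output rejected for {name}")
    return result


# Example tool and checks (illustrative only).
def add(a: int, b: int) -> int:
    return a + b


def ints_only(name: str, args: dict[str, Any]) -> bool:
    return all(isinstance(v, int) for v in args.values())


def below_100(name: str, result: Any) -> bool:
    return result < 100
```

Because the input check runs first, a call such as `add(a="2", b=3)` is rejected before the tool can fail on bad arguments.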

Rail-Specific Configuration

Configure options for individual rails under the rails.config key:

```yaml
rails:
  config:
    # Sensitive data detection settings
    sensitive_data_detection:
      input:
        entities:
          - PERSON
          - EMAIL_ADDRESS
          - PHONE_NUMBER
      output:
        entities:
          - PERSON
          - EMAIL_ADDRESS

    # Jailbreak detection settings
    jailbreak_detection:
      length_per_perplexity_threshold: 89.79
      prefix_suffix_perplexity_threshold: 1845.65

    # Fact-checking settings
    fact_checking:
      parameters:
        endpoint: "http://localhost:5000"
```
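
The two jailbreak thresholds correspond to perplexity-based heuristics. As we understand them, a prompt is flagged in either of two cases:

- Its length divided by its perplexity exceeds `length_per_perplexity_threshold`.
- The perplexity of its prefix or suffix exceeds `prefix_suffix_perplexity_threshold`.

The sketch below applies that comparison to perplexity values you supply; computing perplexity requires a language model and is out of scope here. The function name, the dataclass, and the exact comparison are assumptions rather than the library's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JailbreakThresholds:
    # Defaults copied from the configuration example above.
    length_per_perplexity_threshold: float = 89.79
    prefix_suffix_perplexity_threshold: float = 1845.65


def looks_like_jailbreak(
    prompt: str,
    prompt_perplexity: float,
    prefix_perplexity: float,
    suffix_perplexity: float,
    thresholds: Optional[JailbreakThresholds] = None,
) -> bool:
    """Apply both heuristics to precomputed (assumed) perplexity values."""
    t = thresholds or JailbreakThresholds()
    # Heuristic 1: long prompts with comparatively low perplexity.
    if len(prompt) / prompt_perplexity > t.length_per_perplexity_threshold:
        return True
    # Heuristic 2: a high-perplexity (gibberish-like) prefix or suffix.
    return max(prefix_perplexity, suffix_perplexity) > t.prefix_suffix_perplexity_threshold
```

Raising either threshold makes the corresponding heuristic less sensitive, which reduces false positives at the cost of missing more attacks.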

Example Configuration

Complete guardrails configuration example:

```yaml
rails:
  # Input validation
  input:
    flows:
      - self check input
      - check jailbreak
      - mask sensitive data on input

  # Output validation
  output:
    flows:
      - self check output
      - self check facts

  # Retrieval processing
  retrieval:
    flows:
      - check retrieval sensitive data

  # Dialog behavior
  dialog:
    single_call:
      enabled: false

  # Rail-specific settings
  config:
    sensitive_data_detection:
      input:
        entities:
          - PERSON
          - EMAIL_ADDRESS
          - CREDIT_CARD
      output:
        entities:
          - PERSON
          - EMAIL_ADDRESS
```
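
Flow names are free-form strings, so a typo in config.yml is easy to miss. The following sketch checks the `input` and `output` flow lists of a configuration against the flows listed in the tables on this page. The configuration is represented as a Python dict, as it would look after YAML parsing. `DOCUMENTED_FLOWS` is copied from those tables and may not cover every flow the library supports. Treat an unknown name as something to double-check rather than a definite error. `unknown_flows` is a hypothetical helper.

```python
from typing import Any

# Flow names copied from the "Available Flows" tables on this page.
DOCUMENTED_FLOWS: dict[str, set[str]] = {
    "input": {
        "self check input",
        "check jailbreak",
        "mask sensitive data on input",
        "detect sensitive data on input",
        "llama guard check input",
        "content safety check input",
    },
    "output": {
        "self check output",
        "self check facts",
        "self check hallucination",
        "mask sensitive data on output",
        "llama guard check output",
        "content safety check output",
    },
}


def unknown_flows(config: dict[str, Any]) -> dict[str, list[str]]:
    """Return, per category, flow names that are not in DOCUMENTED_FLOWS."""
    rails = config.get("rails", {})
    report: dict[str, list[str]] = {}
    for category, known in DOCUMENTED_FLOWS.items():
        flows = rails.get(category, {}).get("flows", [])
        unknown = [name for name in flows if name not in known]
        if unknown:
            report[category] = unknown
    return report


# The input/output part of the complete example above, as parsed YAML.
example_config = {
    "rails": {
        "input": {"flows": ["self check input", "check jailbreak", "mask sensitive data on input"]},
        "output": {"flows": ["self check output", "self check facts"]},
    }
}
```

For the example configuration the report is empty. A misspelled name such as "self check ouput" would be reported under `output`.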