# Guardrails Configuration
This section describes how to configure guardrails (rails) in the config.yml file to control LLM behavior.
## The `rails` Key

The `rails` key defines which guardrails are active and their configuration options.
Rails are organized into five categories based on when they trigger during the guardrails process.
## Rail Categories
The following table summarizes the different rail categories and their trigger points.
| Category | Trigger Point | Purpose |
|---|---|---|
| Input rails | When user input is received | Validate, filter, or modify user input |
| Output rails | When the LLM generates output | Validate, filter, or modify bot responses |
| Dialog rails | After the canonical form is computed | Control conversation flow |
| Retrieval rails | After RAG retrieval completes | Process retrieved chunks |
| Execution rails | Before/after action execution | Control tool and action calls |
*(Diagram: the guardrails process, showing the point at which each rail category triggers.)*
## Basic Configuration

```yaml
rails:
  input:
    flows:
      - self check input
      - check jailbreak
      - mask sensitive data on input
  output:
    flows:
      - self check output
      - self check facts
      - mask sensitive data on output
  retrieval:
    flows:
      - mask sensitive data on retrieval
```
## Input Rails

Input rails process user messages before they reach the LLM:

```yaml
rails:
  input:
    flows:
      - self check input              # LLM-based input validation
      - check jailbreak               # Jailbreak detection
      - mask sensitive data on input  # PII masking
```
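The `self check input` flow works only if the configuration also defines the prompt it uses. A minimal sketch, assuming the usual pairing of the flow with a `self_check_input` prompt task (the policy wording below is illustrative, not a built-in default):

```yaml
prompts:
  - task: self_check_input
    content: |
      Your task is to check if the user message below complies with the policy.

      Policy for user messages:
      - should not ask the bot to impersonate someone
      - should not contain explicit or abusive content

      User message: "{{ user_input }}"

      Question: Should the user message be blocked (Yes or No)?
      Answer:
```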
### Available Flows for Input Rails

| Flow | Description |
|---|---|
| `self check input` | LLM-based policy compliance check |
| `check jailbreak` | Detect jailbreak attempts |
| `mask sensitive data on input` | Mask PII in user input |
| `detect sensitive data on input` | Detect and block PII |
| `llama guard check input` | LlamaGuard content moderation |
| `content safety check input` | NVIDIA content safety model |
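The LlamaGuard flows additionally need a `llama_guard` entry in the `models` section. A sketch, assuming LlamaGuard is served through a vLLM OpenAI-compatible endpoint; the URL and model name are placeholders:

```yaml
models:
  - type: llama_guard
    engine: vllm_openai
    parameters:
      openai_api_base: "http://localhost:5000/v1"  # placeholder endpoint
      model_name: "meta-llama/LlamaGuard-7b"       # placeholder model

rails:
  input:
    flows:
      - llama guard check input
```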
## Output Rails

Output rails process LLM responses before they are returned to users:

```yaml
rails:
  output:
    flows:
      - self check output              # LLM-based output validation
      - self check facts               # Fact verification
      - self check hallucination       # Hallucination detection
      - mask sensitive data on output  # PII masking
```
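Like the input variant, `self check output` expects a `self_check_output` prompt task, and `self check facts` checks the response against evidence retrieved from the knowledge base. A minimal prompt sketch (illustrative wording, assuming the standard `{{ bot_response }}` template variable):

```yaml
prompts:
  - task: self_check_output
    content: |
      Your task is to check if the bot message below complies with the policy.

      Policy for bot messages:
      - should not contain explicit or abusive content
      - should not reveal sensitive or internal information

      Bot message: "{{ bot_response }}"

      Question: Should the message be blocked (Yes or No)?
      Answer:
```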
### Available Flows for Output Rails

| Flow | Description |
|---|---|
| `self check output` | LLM-based policy compliance check |
| `self check facts` | Verify factual accuracy |
| `self check hallucination` | Detect hallucinations |
| `mask sensitive data on output` | Mask PII in output |
| `llama guard check output` | LlamaGuard content moderation |
| `content safety check output` | NVIDIA content safety model |
## Dialog Rails

Dialog rails control conversation flow after the user intent is determined:

```yaml
rails:
  dialog:
    single_call:
      enabled: false
      fallback_to_multiple_calls: true
    user_messages:
      embeddings_only: false
```
### Dialog Configuration Options

| Option | Description | Default |
|---|---|---|
| `single_call.enabled` | Use a single LLM call for intent, next step, and message | `false` |
| `single_call.fallback_to_multiple_calls` | Fall back to multiple calls if the single call fails | `true` |
| `user_messages.embeddings_only` | Use only embeddings for user intent matching | `false` |
## Retrieval Rails

Retrieval rails process chunks retrieved from the knowledge base:

```yaml
rails:
  retrieval:
    flows:
      - mask sensitive data on retrieval
```
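Like its input and output counterparts, the retrieval flow takes its entity list from `sensitive_data_detection`, here under a `retrieval` key. A sketch; entity names such as `PERSON` follow Presidio's entity types:

```yaml
rails:
  retrieval:
    flows:
      - mask sensitive data on retrieval
  config:
    sensitive_data_detection:
      retrieval:
        entities:
          - PERSON
          - EMAIL_ADDRESS
```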
## Execution Rails

Execution rails control custom action and tool invocations:

```yaml
rails:
  execution:
    flows:
      - check tool input
      - check tool output
```
## Rail-Specific Configuration

Configure options for specific rails under the `config` key:

```yaml
rails:
  config:
    # Sensitive data detection settings
    sensitive_data_detection:
      input:
        entities:
          - PERSON
          - EMAIL_ADDRESS
          - PHONE_NUMBER
      output:
        entities:
          - PERSON
          - EMAIL_ADDRESS

    # Jailbreak detection settings
    jailbreak_detection:
      length_per_perplexity_threshold: 89.79
      prefix_suffix_perplexity_threshold: 1845.65

    # Fact-checking settings
    fact_checking:
      parameters:
        endpoint: "http://localhost:5000"
```
## Example Configuration

A complete guardrails configuration example:

```yaml
rails:
  # Input validation
  input:
    flows:
      - self check input
      - check jailbreak
      - mask sensitive data on input

  # Output validation
  output:
    flows:
      - self check output
      - self check facts

  # Retrieval processing
  retrieval:
    flows:
      - mask sensitive data on retrieval

  # Dialog behavior
  dialog:
    single_call:
      enabled: false

  # Rail-specific settings
  config:
    sensitive_data_detection:
      input:
        entities:
          - PERSON
          - EMAIL_ADDRESS
          - CREDIT_CARD
      output:
        entities:
          - PERSON
          - EMAIL_ADDRESS
```