Streaming Configuration

NeMo Guardrails supports two levels of streaming configuration:

Global streaming - Controls LLM token generation
Output rail streaming - Controls how output rails process streamed tokens

Configuration Comparison

Aspect	Global `streaming`	Output Rail `streaming.enabled`
Scope	LLM token generation	Output rail processing
Required for	Any streaming	Streaming with output rails
Affects	How LLM produces tokens	How rails process token chunks
Default	`False`	`False`
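
The "Required for" row can be read as a simple rule: streaming with output rails needs both flags set. The sketch below expresses that rule over a config that has already been parsed into a Python dictionary. The helper name `missing_streaming_flags` is hypothetical and is not part of the NeMo Guardrails API; it only illustrates the check.

```python
from typing import Any, List, Mapping


def missing_streaming_flags(config: Mapping[str, Any]) -> List[str]:
    """Return the flags still unset for streaming with output rails.

    Hypothetical helper for illustration only. `config` is assumed to be
    the contents of config.yml parsed into nested dictionaries.
    """
    missing: List[str] = []

    # Global flag: controls LLM token generation (defaults to False).
    if config.get("streaming") is not True:
        missing.append("streaming")

    # Output rail flag: controls how rails process token chunks
    # (also defaults to False).
    output = (config.get("rails") or {}).get("output") or {}
    if (output.get("streaming") or {}).get("enabled") is not True:
        missing.append("rails.output.streaming.enabled")

    return missing
```

An empty list means both levels are enabled; any other result names the flag or flags that are still at their default of `False`.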

Quick Example

When using streaming with output rails, both configurations are required:

```yaml
# Global: Enable LLM streaming
streaming: True

rails:
  output:
    flows:
      - self check output
    # Output rail streaming: Enable chunked processing
    streaming:
      enabled: True
      chunk_size: 200
      context_size: 50
```
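
To give a rough intuition for `chunk_size` and `context_size`, the sketch below groups a token stream into fixed-size chunks and carries the tail of each chunk forward as context for the next one. It assumes that `chunk_size` is the number of tokens per chunk and `context_size` is the number of tokens retained from the previous chunk. The function `chunk_with_context` is hypothetical, and the library's actual buffering may differ in its details.

```python
from typing import Iterable, Iterator, List, Tuple


def chunk_with_context(
    tokens: Iterable[str], chunk_size: int = 200, context_size: int = 50
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (context, chunk) pairs from a token stream.

    Illustration only, not the NeMo Guardrails implementation.
    `context` holds up to `context_size` tokens from the end of the
    previous chunk; `chunk` holds up to `chunk_size` new tokens.
    """
    if chunk_size <= 0 or context_size < 0:
        raise ValueError("chunk_size must be > 0 and context_size >= 0")

    context: List[str] = []
    chunk: List[str] = []
    for token in tokens:
        chunk.append(token)
        if len(chunk) == chunk_size:
            yield context, chunk
            # Keep the tail of this chunk as context for the next one.
            context = chunk[-context_size:] if context_size else []
            chunk = []

    # Emit whatever is left once the stream ends.
    if chunk:
        yield context, chunk
```

Under these assumptions, a 450-token response with the values above would be checked in three passes of 200, 200, and 50 new tokens, with the second and third passes each seeing 50 tokens of preceding context.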

Streaming Configuration Details

The following guides provide detailed documentation for each streaming configuration area.

Global Streaming

Enable streaming mode for LLM token generation in config.yml.


Output Rail Streaming

Configure how output rails process streamed tokens in chunked mode.
