Streaming Configuration#

NeMo Guardrails supports two levels of streaming configuration:

  1. Global streaming - Controls LLM token generation

  2. Output rail streaming - Controls how output rails process streamed tokens

Configuration Comparison#

| Aspect       | Global streaming        | Output rail streaming.enabled  |
|--------------|-------------------------|--------------------------------|
| Scope        | LLM token generation    | Output rail processing         |
| Required for | Any streaming           | Streaming with output rails    |
| Affects      | How LLM produces tokens | How rails process token chunks |
| Default      | False                   | False                          |
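The "Required for" row in the comparison above implies a simple consistency rule: output rail streaming only makes sense when global streaming is also on. The sketch below checks that rule on a plain Python dict shaped like config.yml. The helper name `check_streaming_config` is hypothetical and is not part of NeMo Guardrails; the library may perform its own validation differently.

```python
def check_streaming_config(config: dict) -> list[str]:
    """Return a list of problems found in a config.yml-shaped dict.

    Hypothetical helper for illustration only; not a NeMo Guardrails API.
    """
    problems = []

    # Both settings default to False, as the comparison states.
    global_streaming = bool(config.get("streaming", False))
    output = config.get("rails", {}).get("output", {})
    rail_streaming = bool(output.get("streaming", {}).get("enabled", False))

    # Output rail streaming depends on the LLM streaming tokens at all.
    if rail_streaming and not global_streaming:
        problems.append(
            "rails.output.streaming.enabled is True but top-level streaming is False"
        )

    # Streaming with output rails configured needs the rail-level switch too.
    if global_streaming and output.get("flows") and not rail_streaming:
        problems.append(
            "streaming is True and output rails are configured, "
            "but rails.output.streaming.enabled is False"
        )

    return problems
```

An empty list means the two settings agree with each other; each string in a non-empty list names one mismatch.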

Quick Example#

To stream responses while output rails are active, enable both settings:

# Global: Enable LLM streaming
streaming: True

rails:
  output:
    flows:
      - self check output
    # Output rail streaming: Enable chunked processing
    streaming:
      enabled: True
      chunk_size: 200
      context_size: 50
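To build intuition for `chunk_size` and `context_size`, here is a minimal sketch of chunked processing. It assumes that `chunk_size` is the number of new tokens handed to the output rails at a time and that `context_size` is the number of trailing tokens carried over from earlier output for continuity. The function `rail_windows` is hypothetical and is not the library's implementation; the actual buffering in NeMo Guardrails may differ.

```python
from typing import Iterable, Iterator


def rail_windows(
    tokens: Iterable[str],
    chunk_size: int = 200,
    context_size: int = 50,
) -> Iterator[tuple[list[str], list[str]]]:
    """Yield (context, chunk) pairs from a token stream.

    Hypothetical illustration of chunked rail processing, assuming
    chunk_size counts new tokens and context_size counts carried-over tokens.
    """
    if chunk_size <= 0 or context_size < 0:
        raise ValueError("chunk_size must be > 0 and context_size must be >= 0")

    context: list[str] = []
    chunk: list[str] = []
    for token in tokens:
        chunk.append(token)
        if len(chunk) == chunk_size:
            yield context, chunk
            # Keep only the last context_size tokens seen so far.
            # (A [-0:] slice would return the whole list, hence the guard.)
            context = (context + chunk)[-context_size:] if context_size else []
            chunk = []

    # Flush the final, possibly shorter, chunk.
    if chunk:
        yield context, chunk
```

With a ten-token stream, `chunk_size=4`, and `context_size=2`, this yields three windows: the first has no context, and each later one carries the last two tokens that preceded it. Under this reading, a larger `chunk_size` means fewer rail invocations but longer waits before tokens are released, while a larger `context_size` gives each check more surrounding text.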

Streaming Configuration Details#

The following guides provide detailed documentation for each streaming configuration area.

Global Streaming

Enable streaming mode for LLM token generation in config.yml.

Output Rail Streaming

Configure how output rails process streamed tokens in chunked mode.
