# Release Notes
The following sections summarize and highlight the changes for each release. For a complete record of changes in a release, refer to the CHANGELOG.md in the GitHub repository.
## 0.20.0
### Key Features
Added support for multilingual content safety models such as NVIDIA Nemotron Safety Guard 8B v3. This feature uses the fast-langdetect package to detect the language of the user's input and return refusal messages in the appropriate language. To use this feature, install the NeMo Guardrails library with the `multilingual` extra:

```shell
pip install nemoguardrails[multilingual]
```
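As a rough illustration of the detection step, the sketch below substitutes a toy `detect_language` function for fast-langdetect so that it runs without extra dependencies; the heuristics are purely illustrative and are not how the package works:

```python
# Toy language detector standing in for the fast-langdetect package;
# the real feature uses a trained model, not these heuristics.
def detect_language(text: str) -> str:
    # CJK Unified Ideographs -> treat as Chinese for this sketch.
    if any("\u4e00" <= ch <= "\u9fff" for ch in text):
        return "zh"
    # A few Spanish marker words, purely illustrative.
    if any(word in text.lower() for word in ("hola", "puedo", "solicitud")):
        return "es"
    return "en"

print(detect_language("抱歉,我无法处理该请求。"))  # zh
print(detect_language("Hola, ¿puedo ayudar?"))  # es
print(detect_language("Hello there"))  # en
```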
Added support for configuring custom refusal messages per language to complement multilingual content safety models. You can enable multilingual refusal messages and specify custom refusal messages in the `rails.config.content_safety` section of the `config.yml` file:

```yaml
rails:
  config:
    content_safety:
      multilingual:
        enabled: true
        refusal_messages:
          en: "Sorry, I cannot help with that request."
          es: "Lo siento, no puedo ayudar con esa solicitud."
          zh: "抱歉,我无法处理该请求。"
          # Add other languages as needed
```
For more information, refer to Multilingual Refusal Messages.
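At runtime, resolving a configured refusal message could look like the following sketch; `pick_refusal` and its English fallback are illustrative assumptions, not the library's actual API:

```python
# Mirrors the refusal_messages mapping from the config.yml snippet above.
refusal_messages = {
    "en": "Sorry, I cannot help with that request.",
    "es": "Lo siento, no puedo ayudar con esa solicitud.",
    "zh": "抱歉,我无法处理该请求。",
}

def pick_refusal(detected_lang: str, fallback: str = "en") -> str:
    # Fall back to English when no message is configured for the language.
    return refusal_messages.get(detected_lang, refusal_messages[fallback])

print(pick_refusal("es"))
print(pick_refusal("fr"))  # unconfigured language -> fallback message
```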
Added support for NVIDIA GLiNER-PII for detecting entities such as names, email addresses, phone numbers, social security numbers, and more. For more information, refer to GLiNER Integration.
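To convey conceptually what entity detection returns, here is a minimal regex-based sketch; the actual GLiNER-PII integration uses a trained NER model and covers many more entity types, so treat the patterns below as illustrative only:

```python
import re

# Illustrative patterns only; GLiNER-PII is model-based, not regex-based.
PII_PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "phone": r"\b\d{3}[-.]\d{3}[-.]\d{4}\b",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
}

def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, matched_text) pairs found in the text."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        for match in re.finditer(pattern, text):
            found.append((label, match.group()))
    return found

print(find_pii("Reach jane@example.com or 555-123-4567; SSN 123-45-6789."))
```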
### Breaking Changes
A breaking change removes redundant streaming configuration for output rails. Prior to the change, streaming had to be enabled in two places: `streaming` and `rails.output.streaming.enabled`. This change removes the top-level `streaming` configuration.

Example `config.yml` before the change:

```yaml
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama-3.3-70b-instruct
  - type: content_safety
    engine: nvidia_ai_endpoints
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety
    streaming:
      enabled: True
      chunk_size: 200
      context_size: 50

streaming: True  # No longer needed starting from v0.20.0
```
Example `config.yml` after the change:

```yaml
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama-3.3-70b-instruct
  - type: content_safety
    engine: nvidia_ai_endpoints
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety
    streaming:
      enabled: True
      chunk_size: 200
      context_size: 50
```
For more information, refer to Streaming Generated Responses in Real-Time.
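If you maintain many configurations, a small migration pass like the hypothetical sketch below (not part of NeMo Guardrails) can strip the now-redundant top-level key; it operates on a config already parsed into a dict:

```python
# Hypothetical v0.20.0 migration helper, not part of NeMo Guardrails:
# drops the redundant top-level `streaming` key when output-rail
# streaming is already enabled under rails.output.streaming.
def migrate_streaming_config(config: dict) -> dict:
    migrated = dict(config)
    output_streaming = (
        migrated.get("rails", {}).get("output", {}).get("streaming", {})
    )
    if output_streaming.get("enabled") and "streaming" in migrated:
        migrated.pop("streaming")
    return migrated

old = {
    "rails": {"output": {"streaming": {"enabled": True, "chunk_size": 200}}},
    "streaming": True,
}
print(migrate_streaming_config(old))
```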
### Other Changes
Restructured the documentation with improved navigation, clearer content organization, and updated configuration reference and user guides.