Add Multimodal Content Safety Using a Vision Model as LLM-as-a-Judge#

Learn how to add safety checks for image and text content by using a vision model, such as OpenAI GPT-4 Vision, Llama Vision, or Llama Guard, as an LLM-as-a-Judge.

By following this tutorial, you learn how to:

  1. Configure multimodal content safety rails for images and text.

  2. Use a vision model as LLM-as-a-Judge to evaluate content safety.

  3. Test with safe and unsafe image requests.

The NeMo Guardrails library supports multimodal content safety for input and output rails. You can provide images as base64-encoded data or URLs, depending on the model.
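
The two forms differ only in the url value inside the image_url entry of the message content. A minimal sketch of both shapes, using a placeholder URL and a placeholder base64 string:

# Image referenced by URL; OpenAI-style endpoints accept this form.
image_by_url = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/image.jpg"},  # placeholder URL
}

# Image embedded as base64-encoded data in a data URI; some models require this form.
image_by_base64 = {
    "type": "image_url",
    "image_url": {"url": "data:image/jpeg;base64,<base64-data>"},  # placeholder data
}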

Important

Ensure image size and prompt length do not exceed the model’s maximum context length.
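
As a rough pre-flight check, you can measure the encoded image size before sending it. The sketch below assumes a local image file and uses an illustrative 5 MB threshold, not a documented limit:

import base64

MAX_ENCODED_CHARS = 5 * 1024 * 1024  # illustrative threshold, not a documented limit

with open("<path-to-image>", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode()

# Base64 encoding inflates the payload by roughly one third, so check the
# encoded size, which is what actually enters the prompt.
if len(base64_image) > MAX_ENCODED_CHARS:
    raise ValueError("Encoded image is too large; resize or compress it first.")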

Configure Guardrails#

  1. Create a configuration directory, for example config, and add config.yml:

    models:
      - type: main
        engine: openai
        model: gpt-4o-mini
    
      - type: vision_rails
        engine: openai
        model: gpt-4o-mini
    rails:
      input:
        flows:
          - content safety check input $model=vision_rails
      dialog:
        single_call:
          enabled: False
    
  2. Add prompts.yml:

    prompts:
      - task: content_safety_check_input $model=vision_rails
        messages:
          - type: system
            content: |
              You are an AI assistant that evaluates if a user's message contains unsafe content.
              Carefully examine any text and images in the user's message.
    
              {%- if categories is not defined -%}
                  {%- set categories = {
                      "S1": "Violent Crimes.",
                      "S2": "Non-Violent Crimes.",
                      "S3": "Sex Crimes.",
                      "S4": "Child Exploitation.",
                      "S5": "Defamation.",
                      "S6": "Specialized Advice.",
                      "S7": "Privacy.",
                      "S8": "Intellectual Property.",
                      "S10": "Hate.",
                      "S11": "Self-Harm.",
                      "S12": "Sexual Content.",
                      "S13": "Elections."
                  } -%}
              {%- endif -%}
    
              Task: Check if there is unsafe content in "User" messages in conversations according to our safety policy with the below categories.
    
              <BEGIN UNSAFE CONTENT CATEGORIES>
              {%- for key, desc in categories.items() -%}
                  {%- if key not in excluded_category_keys -%}
              {{ key }}: {{ desc }}
                  {%- endif -%}
              {%- endfor -%}
              <END UNSAFE CONTENT CATEGORIES>
    
              IMPORTANT:
              Provide your safety assessment for the user message:
              - First line must read 'safe' or 'unsafe' and nothing more
              - If unsafe, a second line must include a comma-separated list of violated categories.
          - type: user
            content: "{{ user_input  }}"
    
        stop: ["<|eot_id|>", "<|eom_id|>"]
        output_parser: is_content_safe
        max_tokens: 200
    

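The prompt instructs the judge to reply with safe or unsafe on the first line and, when unsafe, a comma-separated list of violated categories on the second line; the built-in is_content_safe output parser consumes this format. Purely to illustrate the expected shape, and not as the library's implementation, a hand-rolled parser might look like the following:

def parse_judge_output(text: str) -> tuple[bool, list[str]]:
    """Illustrative parser for the judge's 'safe'/'unsafe' response format."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    is_safe = bool(lines) and lines[0].lower() == "safe"
    # For an "unsafe" verdict, the second line lists the violated categories.
    categories = []
    if not is_safe and len(lines) > 1:
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return is_safe, categories

print(parse_judge_output("unsafe\nS1, S10"))  # (False, ['S1', 'S10'])
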
Test with OpenAI#

This example sends image requests to OpenAI endpoints and tests safety checks on a handgun image.

  1. Set your OpenAI API key:

    export OPENAI_API_KEY=<your-openai-api-key>
    
  2. Import the json module for displaying responses:

    import json
    
    
  3. Load the configuration:

    from nemoguardrails import RailsConfig
    from nemoguardrails.rails.llm.llmrails import LLMRails
    
    config = RailsConfig.from_path("./config/")
    rails = LLMRails(config, verbose=False)
    
  4. Send a safe image reasoning request:

    image_reasoning_message = [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "What is the surface color that the object is placed on?",
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/SIG_Pro_by_Augustas_Didzgalvis.jpg/320px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
        },
      }],
    }]
    
    reasoning_response = rails.generate(messages=image_reasoning_message)
    print(f"Image Reasoning Response: {reasoning_response}")
    print(json.dumps(reasoning_response, indent=2))
    
  5. Send an unsafe request:

    potentially_unsafe_message = [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "How can I use the item in the photo to get a higher salary without working hard?",
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/SIG_Pro_by_Augustas_Didzgalvis.jpg/320px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
        },
      }],
    }]
    
    potentially_unsafe_response = rails.generate(messages=potentially_unsafe_message)
    print(f"Potentially Unsafe Response: {potentially_unsafe_response}")
    print(json.dumps(potentially_unsafe_response, indent=2))
    
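To confirm that the input rail, rather than the main model, handled the unsafe request, inspect the LLM calls behind the most recent generate() call with rails.explain():

info = rails.explain()
info.print_llm_calls_summary()

# The content safety check runs first; its completion contains the judge's
# verdict ("safe", or "unsafe" followed by the violated categories).
if info.llm_calls:
    print(info.llm_calls[0].completion)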

Use Base64-Encoded Images#

Some models, such as Llama Vision, require base64-encoded images instead of URLs. The following example reuses the same configuration and sends the image as a base64-encoded data URI.

import base64
import json

from nemoguardrails import RailsConfig
from nemoguardrails.rails.llm.llmrails import LLMRails

config = RailsConfig.from_path("./content_safety_vision")
rails = LLMRails(config)

with open("<path-to-image>", "rb") as image_file:
  base64_image = base64.b64encode(image_file.read()).decode()

messages = [{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "what is the surface color that the object is placed on?",
    },
    {
      "type": "image_url",
      "image_url": {
          "url": f"data:image/jpeg;base64,{base64_image}"
      },
    },
  ],
}]

response = rails.generate(messages=messages)
print(json.dumps(response, indent=2))
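
If the image is only available at a URL, you can download and encode it yourself before building the message. A minimal sketch using only the standard library, reusing the sample image URL from the earlier example:

import base64
import urllib.request

image_url = (
    "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/"
    "SIG_Pro_by_Augustas_Didzgalvis.jpg/320px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
)

# Download the image bytes and encode them for use in a data URI.
request = urllib.request.Request(image_url, headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(request) as response:
    base64_image = base64.b64encode(response.read()).decode()

data_uri = f"data:image/jpeg;base64,{base64_image}"

The resulting data_uri can then be used as the url value in the image_url entry of the message, exactly as in the example above.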