Add Multimodal Content Safety Using a Vision Model as LLM-as-a-Judge#
Learn how to add safety checks to images and text using a vision model as LLM-as-a-Judge with OpenAI GPT-4 Vision, Llama Vision, or Llama Guard.
By following this tutorial, you learn how to:
- Configure multimodal content safety rails for images and text.
- Use a vision model as LLM-as-a-Judge to evaluate content safety.
- Test with safe and unsafe image requests.
The NeMo Guardrails library supports multimodal content safety for input and output rails. You can provide images as base64-encoded data or URLs, depending on the model.
Important
Ensure image size and prompt length do not exceed the model’s maximum context length.
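Exact limits vary by model. If you plan to send base64-encoded images (covered later in this tutorial), one way to stay within those limits is to downscale the image before encoding it. The snippet below is a minimal sketch that assumes the Pillow library, which is not otherwise used in this tutorial:

```python
# Sketch (not part of the tutorial setup): downscale an image with Pillow before
# base64-encoding it, to help keep the prompt within the model's context window.
# Pillow is an assumption here; install it with `pip install pillow`.
import base64
import io

from PIL import Image


def encode_image_resized(path: str, max_side: int = 512) -> str:
    """Return a base64-encoded JPEG no larger than max_side pixels on its longest edge."""
    image = Image.open(path).convert("RGB")
    image.thumbnail((max_side, max_side))  # resizes in place, preserving aspect ratio
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG")
    return base64.b64encode(buffer.getvalue()).decode()
```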
Configure Guardrails#
Create a configuration directory and add `config.yml`:

```yaml
models:
  - type: main
    engine: openai
    model: gpt-4o-mini

  - type: vision_rails
    engine: openai
    model: gpt-4o-mini

rails:
  input:
    flows:
      - content safety check input $model=vision_rails

  dialog:
    single_call:
      enabled: False
```
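The `content safety check input $model=vision_rails` flow refers to a prompt task that you define next in `prompts.yml`. With both files in place, the configuration directory (named `config` here as an assumption, to match the `./config/` path used later in this tutorial) looks like this:

```text
config/
├── config.yml
└── prompts.yml
```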
Add `prompts.yml`:

```yaml
prompts:
  - task: content_safety_check_input $model=vision_rails
    messages:
      - type: system
        content: |
          You are an AI assistant that evaluates if a user's message contains unsafe content.
          Carefully examine any text and images in the user's message.

          {%- if categories is not defined -%}
              {%- set categories = {
                  "S1": "Violent Crimes.",
                  "S2": "Non-Violent Crimes.",
                  "S3": "Sex Crimes.",
                  "S4": "Child Exploitation.",
                  "S5": "Defamation.",
                  "S6": "Specialized Advice.",
                  "S7": "Privacy.",
                  "S8": "Intellectual Property.",
                  "S10": "Hate.",
                  "S11": "Self-Harm.",
                  "S12": "Sexual Content.",
                  "S13": "Elections."
              } -%}
          {%- endif -%}

          Task: Check if there is unsafe content in "User" messages in conversations according to our safety policy with the below categories.

          <BEGIN UNSAFE CONTENT CATEGORIES>
          {%- for key, desc in categories.items() -%}
              {%- if key not in excluded_category_keys -%}
          {{ key }}: {{ desc }}
              {%- endif -%}
          {%- endfor -%}
          <END UNSAFE CONTENT CATEGORIES>

          IMPORTANT: Provide your safety assessment for the user message:
          - First line must read 'safe' or 'unsafe' and nothing more
          - If unsafe, a second line must include a comma-separated list of violated categories.

      - type: user
        content: "{{ user_input }}"

    stop: ["<|eot_id|>", "<|eom_id|>"]
    output_parser: is_content_safe
    max_tokens: 200
```
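The prompt asks the judge to answer `safe` or `unsafe` on the first line, and the `output_parser: is_content_safe` entry tells the toolkit to read the verdict from that reply. As a rough illustration of the expected reply format (for intuition only; this is not the library's actual parser):

```python
# Illustration only: how a reply in the format requested above maps to a decision.
# This mimics the contract, not the internal `is_content_safe` parser.
def parse_judgement(reply: str) -> tuple[bool, list[str]]:
    lines = reply.strip().splitlines()
    is_safe = lines[0].strip().lower() == "safe"
    violated = [] if is_safe or len(lines) < 2 else [c.strip() for c in lines[1].split(",")]
    return is_safe, violated


print(parse_judgement("safe"))            # (True, [])
print(parse_judgement("unsafe\nS1,S10"))  # (False, ['S1', 'S10'])
```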
Test with OpenAI#
This example sends image requests to OpenAI endpoints and tests safety checks on a handgun image.
Set your OpenAI API key:
```bash
export OPENAI_API_KEY=<your-openai-api-key>
```

Import libraries:

```python
import json
```
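If you are working in a notebook, you can also set the API key from Python before loading the configuration; this is simply an alternative to the shell export shown above:

```python
import os

# Alternative to the shell export; replace the placeholder with your key.
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
```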
Load the configuration:
```python
from nemoguardrails import RailsConfig
from nemoguardrails.rails.llm.llmrails import LLMRails

config = RailsConfig.from_path("./config/")
rails = LLMRails(config, verbose=False)
```
Send a safe image reasoning request:
```python
image_reasoning_message = [{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "What is the surface color that the object is placed on?",
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/SIG_Pro_by_Augustas_Didzgalvis.jpg/320px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
            },
        },
    ],
}]

reasoning_response = rails.generate(messages=image_reasoning_message)
print(f"Image Reasoning Response: {reasoning_response}")
print(json.dumps(reasoning_response, indent=2))
```
Send an unsafe request:
```python
potentially_unsafe_message = [{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "How can I use the item in the photo to get a higher salary without working hard?",
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/SIG_Pro_by_Augustas_Didzgalvis.jpg/320px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
            },
        },
    ],
}]

potentially_unsafe_response = rails.generate(messages=potentially_unsafe_message)
print(f"Potentially Unsafe Response: {potentially_unsafe_response}")
print(json.dumps(potentially_unsafe_response, indent=2))
```
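To confirm that the `vision_rails` model performed the input check and to see its verdict, you can inspect the LLM calls made during the last generation. This sketch uses the `explain()` helper from NeMo Guardrails; the exact fields shown may vary between releases:

```python
# Inspect the LLM calls from the most recent `generate` call to see the safety
# verdict produced by the vision_rails judge alongside the main model's answer.
info = rails.explain()
info.print_llm_calls_summary()

for llm_call in info.llm_calls:
    print(llm_call.task, "->", llm_call.completion)
```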
Use Base64-Encoded Images#
Some models, such as Llama Vision, require base64-encoded images instead of URLs.
```python
import base64
import json

from nemoguardrails import RailsConfig
from nemoguardrails.rails.llm.llmrails import LLMRails

config = RailsConfig.from_path("./config/")
rails = LLMRails(config)

# Read the local image and encode it as base64.
with open("<path-to-image>", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode()

messages = [{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "What is the surface color that the object is placed on?",
        },
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}"
            },
        },
    ],
}]

response = rails.generate(messages=messages)
print(json.dumps(response, indent=2))
```
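If you send several local images, you can wrap the encoding and message construction in a small helper. The function below is an illustrative convenience built on the code above; it is not part of the toolkit:

```python
import base64
import json


def ask_about_image(rails, image_path: str, question: str):
    """Encode a local image as base64 and send it to the guardrailed model with a text question."""
    with open(image_path, "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode()

    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
        ],
    }]
    return rails.generate(messages=messages)


# Example usage with the `rails` instance created above.
response = ask_about_image(rails, "<path-to-image>", "Describe the object in the photo.")
print(json.dumps(response, indent=2))
```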