Llama-Guard Integration
NeMo Guardrails provides out-of-the-box support for content moderation using Meta’s Llama Guard model.
In our testing, Llama Guard moderated both input and output content significantly better than the self-check method. See the additional documentation for the recommended deployment method and the performance evaluation numbers.
Usage
To configure your bot to use Llama Guard for input and output checking, follow the steps below:
- Add a model of type llama_guard to the models section of the config.yml file (the example below uses a vLLM setup):
models:
  ...
  - type: llama_guard
    engine: vllm_openai
    parameters:
      openai_api_base: "http://localhost:5123/v1"
      model_name: "meta-llama/LlamaGuard-7b"
- Include the llama guard check input and llama guard check output flow names in the rails section of the config.yml file:
rails:
  input:
    flows:
      - llama guard check input
  output:
    flows:
      - llama guard check output
- Define the llama_guard_check_input and llama_guard_check_output prompts in the prompts.yml file:
prompts:
  - task: llama_guard_check_input
    content: |
      <s>[INST] Task: ...
      <BEGIN UNSAFE CONTENT CATEGORIES>
      O1: ...
      O2: ...
  - task: llama_guard_check_output
    content: |
      <s>[INST] Task: ...
      <BEGIN UNSAFE CONTENT CATEGORIES>
      O1: ...
      O2: ...
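The three steps above have to agree on names: the model type, the two flow names, and the two prompt tasks. As a rough sketch, the check below walks already-parsed configuration dictionaries and reports which pieces are missing. The function name missing_llama_guard_pieces and the example dictionaries are hypothetical and only mirror the YAML shown above; this is not part of NeMo Guardrails.

```python
def missing_llama_guard_pieces(config: dict, prompts: dict) -> list:
    """Return human-readable problems; an empty list means all pieces are present."""
    problems = []

    # Step 1: a model of type llama_guard in the models section.
    models = config.get("models") or []
    if not any(m.get("type") == "llama_guard" for m in models):
        problems.append("no model of type llama_guard")

    # Step 2: the flow names listed under rails.input and rails.output.
    rails = config.get("rails") or {}
    for direction in ("input", "output"):
        flows = (rails.get(direction) or {}).get("flows") or []
        if f"llama guard check {direction}" not in flows:
            problems.append(f"flow 'llama guard check {direction}' is not listed")

    # Step 3: the matching prompt tasks in prompts.yml.
    tasks = {p.get("task") for p in (prompts.get("prompts") or [])}
    for direction in ("input", "output"):
        if f"llama_guard_check_{direction}" not in tasks:
            problems.append(f"prompt 'llama_guard_check_{direction}' is not defined")

    return problems


# Dictionaries mirroring the YAML above (prompt content left elided, as in the source).
example_config = {
    "models": [
        {
            "type": "llama_guard",
            "engine": "vllm_openai",
            "parameters": {
                "openai_api_base": "http://localhost:5123/v1",
                "model_name": "meta-llama/LlamaGuard-7b",
            },
        }
    ],
    "rails": {
        "input": {"flows": ["llama guard check input"]},
        "output": {"flows": ["llama guard check output"]},
    },
}
example_prompts = {
    "prompts": [
        {"task": "llama_guard_check_input", "content": "..."},
        {"task": "llama_guard_check_output", "content": "..."},
    ]
}

print(missing_llama_guard_pieces(example_config, example_prompts))
```

Note that the flow names use spaces while the prompt task names use underscores, which is an easy detail to mix up when editing the two files by hand.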
The rails execute the llama_guard_check_* actions. Each action returns a response with two fields: allowed, which is True if the user input or the bot message should be allowed and False otherwise, and policy_violations, which lists the unsafe content categories that were violated, as defined in the Llama Guard prompt.
define flow llama guard check input
  $llama_guard_response = execute llama_guard_check_input
  $allowed = $llama_guard_response["allowed"]
  $llama_guard_policy_violations = $llama_guard_response["policy_violations"]
  if not $allowed
    bot refuse to respond
    stop
# (similar flow for checking output)
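To make the shape of the response used by this flow concrete, here is a minimal sketch of how a raw Llama Guard completion could be turned into the allowed and policy_violations fields. It assumes the model replies with "safe", or with "unsafe" followed by a second line of comma-separated category codes such as O1,O3. The function name parse_guard_output is hypothetical; this is an illustration, not the parser that NeMo Guardrails ships.

```python
def parse_guard_output(raw: str) -> dict:
    """Map a raw Llama Guard completion to the fields read by the flow."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]

    # Assumption: an empty or unrecognized reply is treated as not allowed,
    # which is the conservative choice for a moderation check.
    if not lines:
        return {"allowed": False, "policy_violations": []}

    verdict = lines[0].lower()
    if verdict == "safe":
        return {"allowed": True, "policy_violations": []}

    violations = []
    if verdict == "unsafe" and len(lines) > 1:
        # The second line carries the violated categories, e.g. "O1,O3".
        violations = [code.strip() for code in lines[1].split(",") if code.strip()]
    return {"allowed": False, "policy_violations": violations}


for sample in ("safe", "unsafe\nO1,O3"):
    print(repr(sample), "->", parse_guard_output(sample))
```

Failing closed on unexpected output is a design choice of this sketch; a deployment that prefers availability over strictness might choose differently.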
A complete example configuration that uses Llama Guard for input and output moderation is provided in this example folder.
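To try such a configuration from Python, a minimal sketch might look like the following. It assumes the nemoguardrails package is installed, that a Llama Guard server is reachable at the openai_api_base set in config.yml, and that the configuration files live in a directory named ./config (a hypothetical path). The helpers build_messages and ask are illustrative names, not part of the library.

```python
def build_messages(user_text: str) -> list:
    """Wrap a single user utterance in the chat-message format."""
    return [{"role": "user", "content": user_text}]


def ask(config_dir: str, user_text: str):
    """Load the rails configuration and generate one guarded response."""
    # Imported lazily so this module can be loaded without the package installed.
    from nemoguardrails import LLMRails, RailsConfig

    config = RailsConfig.from_path(config_dir)  # reads config.yml, prompts.yml, flows
    rails = LLMRails(config)
    return rails.generate(messages=build_messages(user_text))


if __name__ == "__main__":
    # "./config" is an assumed location; point this at your own configuration folder.
    print(ask("./config", "Hello!"))
```

If the input rail flags the message, the flow shown earlier makes the bot refuse to respond instead of passing the request on to the main model.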