Agentic Security

Agentic security provides specialized guardrails for LLM-based agents that use tools and interact with external systems.

Injection Detection

The NeMo Guardrails library can detect potential exploitation attempts that rely on injection, such as code injection, cross-site scripting, SQL injection, and template injection. Injection detection is primarily intended for agentic systems, where it complements other security controls as part of a defense-in-depth strategy.

The first part of injection detection is YARA rules. A YARA rule specifies a set of strings (text or binary patterns) to match and a Boolean expression that defines the logic of the rule. YARA rules are familiar to many security teams.
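To make the "strings plus Boolean condition" structure concrete, here is a minimal sketch that compiles a small rule with the yara-python package and scans a piece of text. The rule `demo_sqli` and the helper `match_rule_names` are made up for illustration and are not among the library's default rules. The helper returns None when yara-python is not installed.

```python
# Illustrative only: this rule is a made-up demo, not one of the default rules.
DEMO_RULE = r"""
rule demo_sqli
{
    strings:
        $union = "UNION SELECT" nocase
        $drop = "DROP TABLE" nocase
    condition:
        $union or $drop
}
"""


def match_rule_names(text):
    """Return the names of the rules that match, or None if yara-python is missing."""
    try:
        import yara  # provided by the yara-python package
    except ImportError:
        return None
    rules = yara.compile(source=DEMO_RULE)
    # Scan the text as bytes and collect the name of each matching rule.
    return [match.rule for match in rules.match(data=text.encode("utf-8"))]


if __name__ == "__main__":
    print(match_rule_names("SELECT name FROM users UNION SELECT secret FROM keys"))
    print(match_rule_names("The weather in Santa Clara is sunny."))
```

The `strings` section names the patterns, and the `condition` section combines them with Boolean operators. Real rules usually use more specific patterns and stricter conditions to limit false positives.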

The second part of injection detection is the action to take when a rule is triggered. You can reject the text and return “I’m sorry, the desired output triggered rule(s) designed to mitigate exploitation of {detections}.” Rejecting the output is the safest action and the most appropriate for production deployments. Alternatively, you can omit the triggering text from the response.
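The difference between the two actions can be sketched in plain Python. This is not the library's implementation: `apply_action`, the shape of the `detections` argument, and the comma-separated formatting of rule names are assumptions made for illustration. Only the wording of the rejection message comes from the text above.

```python
REJECT_TEMPLATE = (
    "I'm sorry, the desired output triggered rule(s) designed to "
    "mitigate exploitation of {detections}."
)


def apply_action(text, detections, action):
    """Apply a hypothetical 'reject' or 'omit' action to model output.

    detections is a list of (rule_name, matched_text) pairs (an assumed shape).
    """
    if not detections:
        # Nothing was flagged, so the output passes through unchanged.
        return text
    if action == "reject":
        # Replace the whole output with a refusal that names the triggered rules.
        names = ", ".join(sorted({name for name, _ in detections}))
        return REJECT_TEMPLATE.format(detections=names)
    if action == "omit":
        # Keep the output but strip every flagged fragment.
        for _, matched in detections:
            text = text.replace(matched, "")
        return text
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    flagged = [("sqli", "DROP TABLE users;")]
    print(apply_action("Run this: DROP TABLE users; Thanks.", flagged, "reject"))
    print(apply_action("Run this: DROP TABLE users; Thanks.", flagged, "omit"))
```

With `reject`, none of the original output reaches the caller. With `omit`, the surrounding text survives, which is more convenient but relies on the rules catching every harmful fragment.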

About the Default Rules

By default, the NeMo Guardrails library provides the following rules:

Code injection (Python): Recommended if the LLM output is used as an argument to downstream functions or passed to a code interpreter.
SQL injection: Recommended if the LLM output is used as part of a SQL query to a database.
Template injection (Jinja): Recommended if the LLM output is rendered using the Jinja templating language. This rule is usually paired with the code injection rules.
Cross-site scripting (Markdown and JavaScript): Recommended if the LLM output is rendered directly in HTML or Markdown.

You can view the default rules in the yara_rules directory of the GitHub repository.

Configuring Injection Detection

To activate injection detection, specify the rules to apply and the action to take, and include the injection detection output flow. For example:

rails:
  config:
    injection_detection:
      injections:
        - code
        - sqli
        - template
        - xss
      action: reject

  output:
    flows:
      - injection detection

The following table describes the rails.config.injection_detection fields:

Field	Description	Default Value
`injections`	Specifies the injection detection rules to use. The library includes the following injections: `code` for Python code injection, `sqli` for SQL injection, `template` for Jinja template injection, and `xss` for cross-site scripting.	None (required)
`action`	Specifies the action to take when injection is detected. Supported actions: `reject` returns a message to the user indicating that the query could not be handled and they should try again; `omit` returns the model response with the detected offending content removed.	None (required)
`yara_path`	Specifies the path to a directory that contains custom YARA rules.	`library/injection_detection/yara_rules` in the NeMo Guardrails package.
`yara_rules`	Specifies inline YARA rules. The field is a dictionary that maps rule names to rules. Each rule is a string, typically written as a YAML block scalar: `<inline-rule-name>: |-` followed by the indented `<inline-rule-content>`. If specified, these inline rules override the rules found in the `yara_path` field.	None

For information about writing YARA rules, refer to the YARA documentation.

Example

Before you begin, install the yara-python package, or install the NeMo Guardrails package with pip install nemoguardrails[jailbreak].

Set your NVIDIA API key as an environment variable:
```
$ export NVIDIA_API_KEY=<nvapi-...>
```

Create a configuration directory, such as config, and add a config.yml file with contents like the following:

models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama-3.3-70b-instruct

rails:
  config:
    injection_detection:
      injections:
        - code
        - sqli
        - template
        - xss
      action: reject

  output:
    flows:
      - injection detection

Load the guardrails configuration:

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

Send a possibly unsafe request:

messages = [
    {
        "role": "user",
        "content": "Write a Python script that uses the requests library, and maybe urllib, to get the weather in Santa Clara.",
    },
]

response = rails.generate(messages=messages)
print(response)

Example Output

{'role': 'assistant', 'content': '**Getting the Weather in Santa Clara using Python**\n=====================================================\n\nTo get the weather in Santa Clara, we can use the OpenWeatherMap API, which provides current and forecasted weather conditions. We will use the `requests` library to make an HTTP request to the API and the `json` library to parse the response.\n\n**Prerequisites**\n---------------\n\n* Python 3.x\n* `requests` library (`pip install requests`)\n* OpenWeatherMap API key (sign up for free at [OpenWeatherMap](https://home.openweathermap.org/users/sign_up))\n\n**Code**\n-----\n\n```python\nimport requests\nimport json\n\ndef get_weather(api_key, city, units=\'metric\'):\n    """\n    Get the current weather in a city.\n\n    Args:\n        api_key (str): OpenWeatherMap API key\n        city (str): City name\n        units (str, optional): Units of measurement (default: \'metric\')\n\n    Returns:\n        dict: Weather data\n    """\n    base_url = \'http://api.openweathermap.org/data/2.5/weather\'\n    params = {\n        \'q\': city,\n        \'units\': units,\n        \'appid\': api_key\n    }\n    response = requests.get(base_url, params=params)\n    response.raise_for_status()\n    return response.json()\n\ndef main():\n    api_key = \'YOUR_API_KEY\'  # replace with your OpenWeatherMap API key\n    city = \'Santa Clara\'\n    weather_data = get_weather(api_key, city)\n    print(\'Weather in {}:\'.format(city))\n    print(\'Temperature: {}°C\'.format(weather_data[\'main\'][\'temp\']))\n    print(\'Humidity: {}%\'.format(weather_data[\'main\'][\'humidity\']))\n    print(\'Conditions: {}\'.format(weather_data[\'weather\'][0][\'description\']))\n\nif __name__ == \'__main__\':\n    main()\n```\n\n**Explanation**\n--------------\n\n1. We import the required libraries: `requests` for making HTTP requests and `json` for parsing the response.\n2. 
We define a function `get_weather` that takes the API key, city name, and units of measurement as arguments.\n3. We construct the API URL and parameters using the `base_url` and `params` dictionary.\n4. We make a GET request to the API using `requests.get` and raise an exception if the response status code is not 200 (OK).\n5. We parse the response data using `response.json()` and return the result as a dictionary.\n6. In the `main` function, we set the API key and city name, call `get_weather`, and print the weather data.\n\n**Notes**\n-------\n\n* Replace `YOUR_API_KEY` with your actual OpenWeatherMap API key.\n* You can adjust the units of measurement by passing a different value for the `units` parameter (e.g., \'imperial\' for Fahrenheit).\n* This script only retrieves the current weather conditions. For forecasted weather, you can use the [OpenWeatherMap forecast API](https://openweathermap.org/forecast5).\n\nI hope this helps! Let me know if you have any questions or need further assistance.'}