Deploy NemoGuard NIMs

NemoGuard NIMs are specialized models built for specific use cases supported by the Guardrails service. Learn how to deploy NemoGuard NIMs in your environment and apply them to a guardrail configuration.

NIM                                             Use Case
nvidia/llama-3.1-nemotron-safety-guard-8b-v3    Content safety: classifies inputs and outputs as safe or unsafe across 23 content categories
nvidia/llama-3.1-nemoguard-8b-topic-control     Topic control: restricts conversations to a defined set of allowed topics
nvidia/nemoguard-jailbreak-detect               Jailbreak detection: detects prompt injection and jailbreak attempts

Prerequisites

Before you begin:

  • You have access to a running NeMo Platform.
  • NMP_BASE_URL is set to the NeMo Platform base URL.
  • Your infrastructure has 1 GPU available per NIM deployment.
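
As a quick pre-flight check, the environment prerequisite can be validated before any SDK call is made. The following is a minimal sketch; the helper name `require_base_url` is hypothetical and not part of the NeMo Platform SDK.

```python
import os


def require_base_url(env=None, name="NMP_BASE_URL"):
    """Return the platform base URL from the environment, or fail early.

    Hypothetical helper (not part of the SDK): it only checks that the
    variable is present and non-empty, and strips a trailing slash so the
    value can be safely joined with API paths later.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the examples")
    return value.rstrip("/")
```

Calling `require_base_url()` at the top of a script surfaces a missing variable immediately instead of partway through a deployment.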

Step 1: Configure the Client

Instantiate the NeMoPlatform SDK.

import os
from nemo_platform import NeMoPlatform

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

Step 2: Deploy the NIMs

Use the Platform's Inference Gateway service to deploy each NIM. This process creates a DeploymentConfig that specifies the NIM image, and a Deployment that runs it.

Enabling KV cache reuse on the LLM-based NIMs can improve inference speed. The content safety and topic control examples enable this feature by setting NIM_ENABLE_KV_CACHE_REUSE=1 through the nim_deployment.additional_envs option. The jailbreak detection example does not set it.
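
The nim_deployment payload used in the examples that follow can be assembled with a small function, which keeps the KV cache setting in one place. This is a sketch: `build_nim_deployment` and its `kv_cache_reuse` parameter are hypothetical names, while the dictionary keys are the ones shown in this guide.

```python
def build_nim_deployment(image_name, image_tag, gpu=1, kv_cache_reuse=False):
    """Build a nim_deployment dict in the shape used by this guide.

    Hypothetical helper: only the keys (gpu, image_name, image_tag,
    additional_envs) come from the examples in this document.
    """
    spec = {
        "gpu": gpu,
        "image_name": image_name,
        "image_tag": image_tag,
    }
    if kv_cache_reuse:
        # Environment variable values are passed as strings.
        spec["additional_envs"] = {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
    return spec


# Example: the content safety NIM with KV cache reuse enabled.
content_safety_spec = build_nim_deployment(
    "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
    "1.14.0",
    kv_cache_reuse=True,
)
```

The resulting dictionary can be passed as the `nim_deployment` argument when creating a deployment config.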

Deploy a Content-Safety NIM

nemo inference deployment-configs create \
--name "nemotron-safety-guard-config" \
--nim-deployment '{
"gpu": 1,
"image_name": "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
"image_tag": "1.14.0",
"additional_envs": {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
}'

nemo inference deployments create \
--name "nemotron-safety-guard" \
--config "nemotron-safety-guard-config"

nemo wait inference deployment nemotron-safety-guard

# Python SDK equivalent of the CLI commands above
client.inference.deployment_configs.create(
    name="nemotron-safety-guard-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
        "image_tag": "1.14.0",
        "additional_envs": {
            "NIM_ENABLE_KV_CACHE_REUSE": "1",
        },
    },
)

client.inference.deployments.create(
    name="nemotron-safety-guard",
    config="nemotron-safety-guard-config",
)

client.models.wait_for_status(
    deployment_name="nemotron-safety-guard",
    desired_status="READY",
)

print("Content safety NIM ready")

Deploy a Topic-Control NIM

nemo inference deployment-configs create \
--name "nemoguard-topic-control-config" \
--nim-deployment '{
"gpu": 1,
"image_name": "nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control",
"image_tag": "1.10.1",
"additional_envs": {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
}'

nemo inference deployments create \
--name "nemoguard-topic-control" \
--config "nemoguard-topic-control-config"

nemo wait inference deployment nemoguard-topic-control

# Python SDK equivalent of the CLI commands above
client.inference.deployment_configs.create(
    name="nemoguard-topic-control-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control",
        "image_tag": "1.10.1",
        "additional_envs": {
            "NIM_ENABLE_KV_CACHE_REUSE": "1",
        },
    },
)

client.inference.deployments.create(
    name="nemoguard-topic-control",
    config="nemoguard-topic-control-config",
)

client.models.wait_for_status(
    deployment_name="nemoguard-topic-control",
    desired_status="READY",
)

print("Topic control NIM ready")

Deploy a Jailbreak-Detection NIM

nemo inference deployment-configs create \
--name "nemoguard-jailbreak-config" \
--nim-deployment '{
"gpu": 1,
"image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect",
"image_tag": "1.10.1"
}'

nemo inference deployments create \
--name "nemoguard-jailbreak" \
--config "nemoguard-jailbreak-config"

nemo wait inference deployment nemoguard-jailbreak

# Python SDK equivalent of the CLI commands above
client.inference.deployment_configs.create(
    name="nemoguard-jailbreak-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect",
        "image_tag": "1.10.1",
    },
)

client.inference.deployments.create(
    name="nemoguard-jailbreak",
    config="nemoguard-jailbreak-config",
)

client.models.wait_for_status(
    deployment_name="nemoguard-jailbreak",
    desired_status="READY",
)

print("Jailbreak detection NIM ready")
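
All three deployments follow the same sequence: create a config, create a deployment from it, then wait for READY. A minimal sketch of a wrapper for that sequence is shown here. The function `deploy_nim` is hypothetical; it takes the client as an argument and uses only the SDK calls that appear in the examples above, with the `<name>-config` naming convention those examples follow.

```python
def deploy_nim(client, name, nim_deployment, config_name=None):
    """Create a deployment config and a deployment, then wait for READY.

    Hypothetical wrapper around the three SDK calls shown in this guide.
    `client` is expected to be a NeMoPlatform instance.
    """
    # Default to the "<deployment name>-config" convention used in this guide.
    config_name = config_name or f"{name}-config"

    client.inference.deployment_configs.create(
        name=config_name,
        nim_deployment=nim_deployment,
    )
    client.inference.deployments.create(name=name, config=config_name)
    client.models.wait_for_status(deployment_name=name, desired_status="READY")
    return config_name
```

For example, `deploy_nim(client, "nemoguard-jailbreak", {"gpu": 1, "image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect", "image_tag": "1.10.1"})` would reproduce the jailbreak detection deployment.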

Step 3: Verify the Model Entity Names

After the content safety and topic control NIMs are deployed, the Inference Gateway discovers the models served by each NIM and registers them as Model Entities in your workspace. Use these entities in guardrail configurations with the workspace/name format.

List all Model Entities in your workspace to find the names:

models = client.models.list(workspace="default")
for model in models:
    print(f"{model.workspace}/{model.name}")

The NemoGuard NIMs register Model Entities with the following default names:

NIM                                      Model Entity Reference
llama-3.1-nemotron-safety-guard-8b-v3    default/nvidia-llama-3-1-nemotron-safety-guard-8b-v3
llama-3.1-nemoguard-8b-topic-control     default/nvidia-llama-3-1-nemoguard-8b-topic-control
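
Judging from the two rows in this table, the default entity name appears to be the NIM model name with "/" and "." replaced by "-". The sketch below encodes that observation. It is an inference from this table only, not a documented rule, and `model_entity_ref` is a hypothetical helper, so confirm the result against the output of client.models.list before relying on it.

```python
import re


def model_entity_ref(nim_model, workspace="default"):
    """Guess the workspace/name reference for a NIM's Model Entity.

    Assumption: "/" and "." in the model name become "-", as seen in the
    two default names listed in this guide. Verify against the real list.
    """
    entity_name = re.sub(r"[/.]", "-", nim_model)
    return f"{workspace}/{entity_name}"


content_safety_ref = model_entity_ref("nvidia/llama-3.1-nemotron-safety-guard-8b-v3")
topic_control_ref = model_entity_ref("nvidia/llama-3.1-nemoguard-8b-topic-control")
```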

The jailbreak detection NIM exposes a /v1/classify endpoint rather than an OpenAI-compatible chat completions endpoint, so it does not register a Model Entity. Instead, reference the NIM by setting nim_base_url to its Inference Gateway URL, as described in Step 4.

Step 4: Use the NIMs in Guardrail Configurations

Content Safety and Topic Control

Reference the Model Entities in your guardrail configuration using the workspace/name format. For a complete example combining content safety and topic control rails, see Executing Input and Output Rails in Parallel.

Jailbreak Detection

Configure the jailbreak detection NIM using the rails.config.jailbreak_detection field. Set nim_base_url to the Inference Gateway provider route exposed by the deployment you created in Step 2. The URL follows the pattern /apis/inference-gateway/v2/workspaces/{workspace}/provider/{deployment_name}/-/v1, where deployment_name matches the deployment name from Step 2.

config = client.guardrail.configs.create(
    name="nemoguard-jailbreak-config",
    description="Jailbreak detection using self-hosted NemoGuard NIM",
    data={
        "rails": {
            "config": {
                "jailbreak_detection": {
                    "nim_base_url": f"{os.environ['NMP_BASE_URL']}/apis/inference-gateway/v2/workspaces/default/provider/nemoguard-jailbreak/-/v1",
                }
            },
            "input": {
                "flows": ["jailbreak detection model"],
            },
        },
    },
)
print(f"Created config: {config.name}")
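
The provider route can also be built with a small function so that the workspace and deployment name are not hard-coded in the URL string. This is a minimal sketch; `jailbreak_nim_base_url` is a hypothetical helper, and the path is the pattern given earlier in this step.

```python
def jailbreak_nim_base_url(base_url, deployment_name, workspace="default"):
    """Build the Inference Gateway provider route for a deployment.

    Hypothetical helper. The path follows the pattern from this guide:
    /apis/inference-gateway/v2/workspaces/{workspace}/provider/{deployment_name}/-/v1
    """
    # Strip a trailing slash so the joined URL has no double slash.
    root = base_url.rstrip("/")
    return (
        f"{root}/apis/inference-gateway/v2/workspaces/{workspace}"
        f"/provider/{deployment_name}/-/v1"
    )


example_url = jailbreak_nim_base_url("http://localhost:8080", "nemoguard-jailbreak")
```

The returned value can be used as the `nim_base_url` entry in the `jailbreak_detection` configuration.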

Cleanup

nemo guardrail configs delete nemoguard-jailbreak-config

# Note: Deleting the deployment will free up its GPU(s) when complete
nemo inference deployments delete nemotron-safety-guard
nemo inference deployments delete nemoguard-topic-control
nemo inference deployments delete nemoguard-jailbreak

nemo wait inference deployment nemotron-safety-guard --status DELETED
nemo wait inference deployment nemoguard-topic-control --status DELETED
nemo wait inference deployment nemoguard-jailbreak --status DELETED

nemo inference deployment-configs delete nemotron-safety-guard-config
nemo inference deployment-configs delete nemoguard-topic-control-config
nemo inference deployment-configs delete nemoguard-jailbreak-config

# Python SDK equivalent of the CLI commands above
client.guardrail.configs.delete(name="nemoguard-jailbreak-config")

# Note: Deleting the deployment will free up its GPU(s) when complete
client.inference.deployments.delete(name="nemotron-safety-guard")
client.inference.deployments.delete(name="nemoguard-topic-control")
client.inference.deployments.delete(name="nemoguard-jailbreak")

client.models.wait_for_status(
    deployment_name="nemotron-safety-guard", desired_status="DELETED"
)
client.models.wait_for_status(
    deployment_name="nemoguard-topic-control", desired_status="DELETED"
)
client.models.wait_for_status(
    deployment_name="nemoguard-jailbreak", desired_status="DELETED"
)

client.inference.deployment_configs.delete(name="nemotron-safety-guard-config")
client.inference.deployment_configs.delete(name="nemoguard-topic-control-config")
client.inference.deployment_configs.delete(name="nemoguard-jailbreak-config")

print("Cleanup complete")
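
The per-deployment cleanup sequence can be wrapped in the same way as deployment. The sketch below mirrors the order used above: delete the deployment, wait for DELETED, then delete its config. The function `teardown_nim` is hypothetical and assumes the `<name>-config` naming convention from this guide.

```python
def teardown_nim(client, name, config_name=None):
    """Delete a deployment, wait until it is gone, then delete its config.

    Hypothetical wrapper around the SDK calls shown in the cleanup example.
    `client` is expected to be a NeMoPlatform instance.
    """
    config_name = config_name or f"{name}-config"

    client.inference.deployments.delete(name=name)
    # Wait until the deployment reports DELETED before removing its config.
    client.models.wait_for_status(deployment_name=name, desired_status="DELETED")
    client.inference.deployment_configs.delete(name=config_name)
```

Unlike the example above, which deletes all three deployments before waiting, this sketch handles one deployment at a time when called in a loop.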

Next Steps