Deploy NemoGuard NIMs
NemoGuard NIMs are specialized models built for specific use cases supported by the Guardrails service. Learn how to deploy NemoGuard NIMs in your environment and apply them to a guardrail configuration.
| NIM | Use Case |
|---|---|
| nvidia/llama-3.1-nemotron-safety-guard-8b-v3 | Content safety: classifies inputs and outputs as safe or unsafe across 23 content categories |
| nvidia/llama-3.1-nemoguard-8b-topic-control | Topic control: restricts conversations to a defined set of allowed topics |
| nvidia/nemoguard-jailbreak-detect | Jailbreak detection: detects prompt injection and jailbreak attempts |
Prerequisites
Before you begin:
- You have access to a running NeMo Platform.
- NMP_BASE_URL is set to the NeMo Platform base URL.
- Your infrastructure has 1 GPU available per NIM deployment.
Step 1: Configure the Client
Instantiate the NeMoPlatform SDK.
import os

from nemo_platform import NeMoPlatform

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
Step 2: Deploy the NIMs
Use the Platform's Inference Gateway service to deploy each NIM. This process creates a DeploymentConfig that specifies the NIM image, and a Deployment that runs it.
Enabling KV cache reuse on the LLM-based NIMs can improve inference speed. The content safety and topic control examples enable this feature by setting NIM_ENABLE_KV_CACHE_REUSE=1 through the nim_deployment.additional_envs option.
Deploy a Content-Safety NIM
nemo inference deployment-configs create \
  --name "nemotron-safety-guard-config" \
  --nim-deployment '{
    "gpu": 1,
    "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
    "image_tag": "1.14.0",
    "additional_envs": {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
  }'

nemo inference deployments create \
  --name "nemotron-safety-guard" \
  --config "nemotron-safety-guard-config"

nemo wait inference deployment nemotron-safety-guard

client.inference.deployment_configs.create(
    name="nemotron-safety-guard-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
        "image_tag": "1.14.0",
        "additional_envs": {
            "NIM_ENABLE_KV_CACHE_REUSE": "1",
        },
    },
)

client.inference.deployments.create(
    name="nemotron-safety-guard",
    config="nemotron-safety-guard-config",
)

client.models.wait_for_status(
    deployment_name="nemotron-safety-guard",
    desired_status="READY",
)
print("Content safety NIM ready")
Deploy a Topic-Control NIM
nemo inference deployment-configs create \
  --name "nemoguard-topic-control-config" \
  --nim-deployment '{
    "gpu": 1,
    "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control",
    "image_tag": "1.10.1",
    "additional_envs": {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
  }'

nemo inference deployments create \
  --name "nemoguard-topic-control" \
  --config "nemoguard-topic-control-config"

nemo wait inference deployment nemoguard-topic-control

client.inference.deployment_configs.create(
    name="nemoguard-topic-control-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control",
        "image_tag": "1.10.1",
        "additional_envs": {
            "NIM_ENABLE_KV_CACHE_REUSE": "1",
        },
    },
)

client.inference.deployments.create(
    name="nemoguard-topic-control",
    config="nemoguard-topic-control-config",
)

client.models.wait_for_status(
    deployment_name="nemoguard-topic-control",
    desired_status="READY",
)
print("Topic control NIM ready")
Deploy a Jailbreak-Detection NIM
nemo inference deployment-configs create \
  --name "nemoguard-jailbreak-config" \
  --nim-deployment '{
    "gpu": 1,
    "image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect",
    "image_tag": "1.10.1"
  }'

nemo inference deployments create \
  --name "nemoguard-jailbreak" \
  --config "nemoguard-jailbreak-config"

nemo wait inference deployment nemoguard-jailbreak

client.inference.deployment_configs.create(
    name="nemoguard-jailbreak-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect",
        "image_tag": "1.10.1",
    },
)

client.inference.deployments.create(
    name="nemoguard-jailbreak",
    config="nemoguard-jailbreak-config",
)

client.models.wait_for_status(
    deployment_name="nemoguard-jailbreak",
    desired_status="READY",
)
print("Jailbreak detection NIM ready")
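The three deployments above follow the same sequence: create a DeploymentConfig, create a Deployment, then wait for READY. As a minimal sketch, that sequence could be wrapped in one function. The name `deploy_nim` and its parameters are hypothetical and not part of the SDK; the function relies only on the SDK calls already shown in this step and assumes each config is named after its deployment with a `-config` suffix, as in the examples.

```python
from typing import Any


def deploy_nim(
    client: Any,
    name: str,
    image_name: str,
    image_tag: str,
    kv_cache_reuse: bool = False,
) -> None:
    """Create a DeploymentConfig and a Deployment, then wait until it is READY.

    Hypothetical helper: it only wraps the SDK calls shown in this step.
    """
    nim_deployment: dict[str, Any] = {
        "gpu": 1,
        "image_name": image_name,
        "image_tag": image_tag,
    }
    if kv_cache_reuse:
        # In this guide only the LLM-based NIMs set this variable.
        nim_deployment["additional_envs"] = {"NIM_ENABLE_KV_CACHE_REUSE": "1"}

    # Assumed naming convention: "<deployment name>-config".
    config_name = f"{name}-config"
    client.inference.deployment_configs.create(
        name=config_name,
        nim_deployment=nim_deployment,
    )
    client.inference.deployments.create(name=name, config=config_name)
    client.models.wait_for_status(deployment_name=name, desired_status="READY")
```

For example, `deploy_nim(client, "nemoguard-jailbreak", "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect", "1.10.1")` would issue the same calls as the jailbreak-detection example, assuming `client` is the instance from Step 1.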
Step 3: Verify the Model Entity Names
After the content safety and topic control NIMs are deployed, the Inference Gateway discovers the models served by each NIM and registers them as Model Entities in your workspace. Use these entities in guardrail configurations with the workspace/name format.
List all Model Entities in your workspace to find the names:
models = client.models.list(workspace="default")
for model in models:
    print(f"{model.workspace}/{model.name}")
The NemoGuard NIMs register Model Entities with the following default names:
| NIM | Model Entity Reference |
|---|---|
| llama-3.1-nemotron-safety-guard-8b-v3 | default/nvidia-llama-3-1-nemotron-safety-guard-8b-v3 |
| llama-3.1-nemoguard-8b-topic-control | default/nvidia-llama-3-1-nemoguard-8b-topic-control |
The jailbreak detection NIM exposes a /v1/classify endpoint rather than an OpenAI-compatible chat completions endpoint, so it does not register a Model Entity. Instead, reference the NIM by setting nim_base_url to its Inference Gateway URL, as described in Step 4.
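The default names in the table suggest that a Model Entity name is the NIM's full model name with `/` and `.` replaced by `-`. The sketch below encodes that inferred rule and checks a model listing against it. The rule is an assumption drawn from the table, not a documented guarantee, and `expected_entity_ref` and `missing_entities` are hypothetical helper names. The output of `client.models.list` remains the authoritative source.

```python
from typing import Any, Iterable


def expected_entity_ref(nim_model: str, workspace: str = "default") -> str:
    """Guess the Model Entity reference for a NIM model name.

    Assumption: "/" and "." become "-", as the default names in the table suggest.
    """
    name = nim_model.replace("/", "-").replace(".", "-")
    return f"{workspace}/{name}"


def missing_entities(models: Iterable[Any], expected: Iterable[str]) -> list[str]:
    """Return the expected references that are absent from a model listing.

    Each item in `models` needs `workspace` and `name` attributes, like the
    objects printed in the listing example.
    """
    found = {f"{m.workspace}/{m.name}" for m in models}
    return [ref for ref in expected if ref not in found]
```

A call such as `missing_entities(client.models.list(workspace="default"), [expected_entity_ref("nvidia/llama-3.1-nemoguard-8b-topic-control")])` would return an empty list once the topic control entity is registered, provided the inferred naming rule holds.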
Step 4: Use the NIMs in Guardrail Configurations
Content Safety and Topic Control
Reference the Model Entities in your guardrail configuration using the workspace/name format. For a complete example combining content safety and topic control rails, see Executing Input and Output Rails in Parallel.
Jailbreak Detection
Configure the jailbreak detection NIM using the rails.config.jailbreak_detection field. Set nim_base_url to the Inference Gateway provider route exposed by the deployment you created in Step 2. The URL follows the pattern /apis/inference-gateway/v2/workspaces/{workspace}/provider/{deployment_name}/-/v1, where deployment_name matches the deployment name from Step 2.
config = client.guardrail.configs.create(
    name="nemoguard-jailbreak-config",
    description="Jailbreak detection using self-hosted NemoGuard NIM",
    data={
        "rails": {
            "config": {
                "jailbreak_detection": {
                    "nim_base_url": f"{os.environ['NMP_BASE_URL']}/apis/inference-gateway/v2/workspaces/default/provider/nemoguard-jailbreak/-/v1",
                }
            },
            "input": {
                "flows": ["jailbreak detection model"],
            },
        },
    },
)
print(f"Created config: {config.name}")
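The provider route used for nim_base_url can also be assembled from its parts, which avoids retyping the long path for each workspace or deployment. The following is a minimal sketch of the URL pattern described above; `provider_route_url` is a hypothetical helper name, not part of the SDK.

```python
def provider_route_url(base_url: str, workspace: str, deployment_name: str) -> str:
    """Build the Inference Gateway provider route for a deployment.

    Follows the pattern
    /apis/inference-gateway/v2/workspaces/{workspace}/provider/{deployment_name}/-/v1.
    """
    return (
        # Drop any trailing slash so the joined URL has no double slash.
        f"{base_url.rstrip('/')}"
        f"/apis/inference-gateway/v2/workspaces/{workspace}"
        f"/provider/{deployment_name}/-/v1"
    )
```

With this helper, the `nim_base_url` value in the configuration could be written as `provider_route_url(os.environ["NMP_BASE_URL"], "default", "nemoguard-jailbreak")`.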
Cleanup
nemo guardrail configs delete nemoguard-jailbreak-config
# Note: Deleting the deployment will free up its GPU(s) when complete
nemo inference deployments delete nemotron-safety-guard
nemo inference deployments delete nemoguard-topic-control
nemo inference deployments delete nemoguard-jailbreak
nemo wait inference deployment nemotron-safety-guard --status DELETED
nemo wait inference deployment nemoguard-topic-control --status DELETED
nemo wait inference deployment nemoguard-jailbreak --status DELETED
nemo inference deployment-configs delete nemotron-safety-guard-config
nemo inference deployment-configs delete nemoguard-topic-control-config
nemo inference deployment-configs delete nemoguard-jailbreak-config
client.guardrail.configs.delete(name="nemoguard-jailbreak-config")
# Note: Deleting the deployment will free up its GPU(s) when complete
client.inference.deployments.delete(name="nemotron-safety-guard")
client.inference.deployments.delete(name="nemoguard-topic-control")
client.inference.deployments.delete(name="nemoguard-jailbreak")
client.models.wait_for_status(
    deployment_name="nemotron-safety-guard", desired_status="DELETED"
)
client.models.wait_for_status(
    deployment_name="nemoguard-topic-control", desired_status="DELETED"
)
client.models.wait_for_status(
    deployment_name="nemoguard-jailbreak", desired_status="DELETED"
)
client.inference.deployment_configs.delete(name="nemotron-safety-guard-config")
client.inference.deployment_configs.delete(name="nemoguard-topic-control-config")
client.inference.deployment_configs.delete(name="nemoguard-jailbreak-config")
print("Cleanup complete")
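The cleanup order above (delete the deployments, wait for DELETED, then delete the deployment configs) could be expressed as a loop over deployment names. This is a sketch only: `teardown_nims` is a hypothetical helper, it uses just the SDK calls shown in this section, and it assumes each config is named after its deployment with a `-config` suffix.

```python
from typing import Any, Iterable


def teardown_nims(client: Any, deployment_names: Iterable[str]) -> None:
    """Delete deployments, wait for deletion, then delete their configs.

    Hypothetical helper; assumes each config is named "<deployment>-config",
    as in this guide.
    """
    names = list(deployment_names)
    # Request deletion of every deployment first, mirroring the order used above.
    for name in names:
        client.inference.deployments.delete(name=name)
    # Wait until each deployment reports DELETED.
    for name in names:
        client.models.wait_for_status(deployment_name=name, desired_status="DELETED")
    # Configs are removed last, as in the cleanup steps above.
    for name in names:
        client.inference.deployment_configs.delete(name=f"{name}-config")
```

The guardrail configuration is not covered by this helper and would still be deleted separately with `client.guardrail.configs.delete`.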
Next Steps
- Improving Content Safety with NemoGuard NIMs: Full content safety tutorial using build.nvidia.com-hosted NIMs
- Executing Input and Output Rails in Parallel: Combine multiple rails for comprehensive safety coverage