Restrict Topics with Llama 3.1 NemoGuard 8B TopicControl NIM#
Learn how to restrict conversations to allowed topics using Llama 3.1 NemoGuard 8B TopicControl NIM.
By following this tutorial, you learn how to:
Deploy the Llama 3.1 NemoGuard 8B TopicControl NIM microservice to your local machine.
Configure topic control rails on a main LLM.
Restrict conversations to specific allowed topics.
Prerequisites#
The NeMo Guardrails library installed.
A personal NVIDIA NGC API key with access to the NVIDIA NGC Catalog and NVIDIA Public API Endpoints services. For more information, refer to NGC API Keys in the NVIDIA GPU Cloud documentation.
Docker installed.
NVIDIA Container Toolkit installed.
GPUs meeting the memory requirement specified in the NVIDIA Llama 3.1 NemoGuard 8B TopicControl NIM Model Profiles.
Deploy the Llama 3.1 NemoGuard 8B TopicControl NIM Microservice#
Follow the getting started guide to deploy the Llama 3.1 NemoGuard 8B TopicControl NIM microservice to your local machine.
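For quick reference while following that guide, the following is a minimal deployment sketch. It assumes `NGC_API_KEY` is already exported; the container image path, tag, and the `MODEL_NAME` value shown here are placeholders, so confirm the exact values in the getting started guide.

```bash
# Log in to the NVIDIA container registry with your NGC API key.
# The username must be the literal string $oauthtoken.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

# Placeholder names; verify the image path and tag in the getting started guide.
export MODEL_NAME="llama-3.1-nemoguard-8b-topic-control"
export NIM_IMAGE="<topic-control-nim-image>"

# Start the NIM microservice and expose it on port 8123,
# matching the base_url used in the guardrails configuration below.
docker run -it --name=$MODEL_NAME \
  --gpus=all --runtime=nvidia \
  -e NGC_API_KEY="$NGC_API_KEY" \
  -e NIM_SERVED_MODEL_NAME=$MODEL_NAME \
  -e NIM_CUSTOM_MODEL_NAME=$MODEL_NAME \
  -p 8123:8000 \
  $NIM_IMAGE
```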
Configure Guardrails#
Create a `config/config.yaml` file and add the following content. This sets up the following:

- OpenAI’s `gpt-3.5-turbo-instruct` as the main LLM model
- `llama-3.1-nemoguard-8b-topic-control` as the topic control model
```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

  - type: "topic_control"
    engine: nim
    parameters:
      base_url: "http://localhost:8123/v1"
      model_name: "llama-3.1-nemoguard-8b-topic-control"

rails:
  input:
    flows:
      - topic safety check input $model=topic_control
```
The following table explains the configuration parameters for the topic control model shown in the configuration above.

| Parameter | Requirement |
|---|---|
| `base_url` | Must match the NIM host and port (`8123` in this example) |
| `model_name` | Must match `$MODEL_NAME` from the `docker run` command |
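As an optional check that is not part of the original steps, you can confirm both values before loading the guardrails configuration by querying the NIM's OpenAI-compatible models endpoint, assuming the microservice is running on port 8123 as above.

```bash
# The returned model id should match model_name in config/config.yaml
# (llama-3.1-nemoguard-8b-topic-control in this example).
curl -s http://localhost:8123/v1/models
```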
Create a `config/prompts.yml` file with the topic control prompt template:

```yaml
prompts:
  - task: topic_safety_check_input $model=topic_control
    content: |
      You are to act as a customer service agent, providing users with factual information in accordance to the knowledge base. Your role is to ensure that you respond only to relevant queries and adhere to the following guidelines

      Guidelines for the user messages:
      - Do not answer questions related to personal opinions or advice on user's order, future recommendations
      - Do not provide any information on non-company products or services.
      - Do not answer enquiries unrelated to the company policies.
      - Do not answer questions asking for personal details about the agent or its creators.
      - Do not answer questions about sensitive topics related to politics, religion, or other sensitive subjects.
      - If a user asks topics irrelevant to the company's customer service relations, politely redirect the conversation or end the interaction.
      - Your responses should be professional, accurate, and compliant with customer relations guidelines, focusing solely on providing transparent, up-to-date information about the company that is already publicly available.
      - allow user comments that are related to small talk and chit-chat.
```
Customize the guidelines to match your specific use case and allowed topics.
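For example, a support assistant restricted to billing topics might use a narrower set of guidelines. The following is an illustrative, hypothetical sketch rather than content from the model card; adapt the wording to your own policy.

```yaml
# Hypothetical example: guidelines for a telecom billing support assistant.
prompts:
  - task: topic_safety_check_input $model=topic_control
    content: |
      You are to act as a billing support agent for a telecommunications company.
      Guidelines for the user messages:
      - Only answer questions about invoices, payments, data plans, and service outages.
      - Do not provide legal, medical, or investment advice.
      - Do not discuss competitors' products or pricing.
      - Allow small talk and chit-chat, but steer the conversation back to billing support.
```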
Verify the Guardrails#
Set your OpenAI API key for the main LLM:
```bash
export OPENAI_API_KEY=<your-openai-api-key>
```

Load the guardrails configuration:
```python
import asyncio

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

async def generate_response(messages):
    response = await rails.generate_async(messages=messages)
    return response
```
Verify the guardrails with an off-topic request:
```python
messages = [{"role": "user", "content": "What is the best political party to vote for?"}]
response = asyncio.run(generate_response(messages))
print(response["content"])
```
```text
I'm sorry, I can't respond to that.
```

The topic control rail blocks the off-topic request about politics.
Verify the guardrails with an allowed request:
```python
messages = [{"role": "user", "content": "What is your return policy?"}]
response = asyncio.run(generate_response(messages))
print(response["content"])
```
The model responds normally with information about the return policy.
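If you want to confirm which rails and LLM calls ran for a request, you can inspect the most recent generation with the library's explain helper. This is a minimal sketch; the exact summary output depends on your NeMo Guardrails version.

```python
# Inspect the last generation: the summary lists the LLM calls that were made,
# including the topic safety check sent to the topic_control model.
info = rails.explain()
info.print_llm_calls_summary()
```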
(Optional) Cache TensorRT-LLM Engines#
Cache the optimized TensorRT-LLM engines to avoid rebuilding them on each container start.
Create a cache directory.
```bash
export LOCAL_NIM_CACHE=<path-to-cache-directory>
mkdir -p $LOCAL_NIM_CACHE
sudo chmod 666 $LOCAL_NIM_CACHE
```
Run the container with the cache mounted.
```bash
docker run -it --name=$MODEL_NAME \
  --gpus=all --runtime=nvidia \
  -e NGC_API_KEY="$NGC_API_KEY" \
  -e NIM_SERVED_MODEL_NAME=$MODEL_NAME \
  -e NIM_CUSTOM_MODEL_NAME=$MODEL_NAME \
  -v $LOCAL_NIM_CACHE:/opt/nim/.cache/ \
  -u $(id -u) \
  -p 8123:8000 \
  $NIM_IMAGE
```
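After the first start completes, subsequent container starts reuse the cached engines instead of rebuilding them. As an optional check that is not part of the original steps, you can verify that the cache was populated and that the service is ready, assuming the NIM exposes its standard health endpoint on the mapped port.

```bash
# The cache should now contain downloaded model artifacts and built engines.
ls -lh "$LOCAL_NIM_CACHE"

# Confirm the microservice is ready before sending requests.
curl -s http://localhost:8123/v1/health/ready
```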