Caching Instructions and Prompts

Memory Model Cache

Guardrails supports an in-memory cache that avoids making LLM calls for repeated prompts. The cache stores user prompts and their corresponding LLM responses. Prior to making an LLM call,…
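The lookup-before-call pattern described above can be illustrated with a minimal sketch. This is not the Guardrails implementation or its configuration API: `PromptCache`, `call_llm`, and `fake_llm` are hypothetical names, and the sketch assumes an exact-match lookup on the prompt text with no eviction or size limit.

```python
from typing import Callable, Dict


class PromptCache:
    """Exact-match in-memory cache mapping a prompt to its LLM response.

    Hypothetical illustration only; not part of the Guardrails API.
    """

    def __init__(self, call_llm: Callable[[str], str]) -> None:
        self._call_llm = call_llm          # assumed LLM client function
        self._store: Dict[str, str] = {}   # prompt -> cached response
        self.hits = 0
        self.misses = 0

    def generate(self, prompt: str) -> str:
        # Look in the cache first; a hit avoids the LLM call entirely.
        if prompt in self._store:
            self.hits += 1
            return self._store[prompt]
        # Miss: call the LLM once and remember the response.
        self.misses += 1
        response = self._call_llm(prompt)
        self._store[prompt] = response
        return response


def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call, used only to make the sketch runnable.
    return f"echo: {prompt}"


cache = PromptCache(fake_llm)
first = cache.generate("Is this message safe?")
second = cache.generate("Is this message safe?")  # served from the cache
```

In this sketch the second call returns the stored response without invoking `fake_llm` again, which is the saving a prompt cache is meant to provide for repeated prompts.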

KV Cache Reuse for NemoGuard NIM

When you configure NeMo Guardrails to call NemoGuard NIMs in response to a client request, every NIM call that checks the input or the response adds to the inference latency. The application LLM can…
