Caching Instructions and Prompts
Memory Model Cache
Guardrails supports an in-memory cache that avoids making LLM calls for repeated prompts. The cache stores user prompts and their corresponding LLM responses. Prior to making an LLM call, the cache is checked for the current prompt; if a matching entry is found, the stored response is returned and the LLM call is skipped.
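The check-before-call flow can be pictured with a minimal sketch. The class and function names below are illustrative only and are not the Guardrails API; the sketch assumes a plain dictionary keyed by a hash of the prompt.

```python
import hashlib
from typing import Callable, Optional


class InMemoryPromptCache:
    """Toy in-memory cache that maps prompts to previously returned LLM responses."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def _key(self, prompt: str) -> str:
        # Hash the prompt so arbitrarily long prompts map to a fixed-size key.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> Optional[str]:
        return self._store.get(self._key(prompt))

    def put(self, prompt: str, response: str) -> None:
        self._store[self._key(prompt)] = response


def generate(prompt: str, cache: InMemoryPromptCache, call_llm: Callable[[str], str]) -> str:
    """Check the cache before calling the LLM; repeated prompts never reach the model."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached                      # cache hit: no LLM call
    response = call_llm(prompt)            # cache miss: pay for one LLM call
    cache.put(prompt, response)            # store it so the next identical prompt is free
    return response
```

Because only exact repeats of a prompt hit the cache, the benefit is largest for workloads where the same inputs recur, such as repeated safety checks on common queries.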
KV Cache Reuse for NemoGuard NIM
When you configure NeMo Guardrails to call NemoGuard NIMs in response to a client request, every NIM call that checks the input or the response adds to the inference latency. The application LLM can begin generating its answer only after the input checks complete, so reducing the latency of each NIM call shortens the overall response time.
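Conceptually, KV cache reuse (also called prefix caching) lets the NIM skip recomputing the attention key/value state for a prompt prefix it has already processed. Because the NemoGuard instructions sent with each check are identical across requests, only the request-specific turn needs fresh computation. The sketch below illustrates just this bookkeeping; the class name, whitespace "tokenization", and prompt text are illustrative assumptions, not the NIM implementation, which performs the reuse server-side.

```python
import hashlib


class ToyPrefixKVCache:
    """Illustrative bookkeeping for KV cache reuse: state computed for a shared
    prompt prefix is kept, so only request-specific tokens need fresh work."""

    def __init__(self) -> None:
        self._cached_prefixes: set[str] = set()

    def tokens_to_process(self, shared_prefix: str, request_turn: str) -> int:
        """Return how many 'tokens' must be freshly processed for this request."""
        key = hashlib.sha256(shared_prefix.encode("utf-8")).hexdigest()
        prefix_tokens = len(shared_prefix.split())   # crude whitespace tokenization
        turn_tokens = len(request_turn.split())
        if key in self._cached_prefixes:
            return turn_tokens                       # prefix KV state reused
        self._cached_prefixes.add(key)               # first time: compute and keep the prefix
        return prefix_tokens + turn_tokens


cache = ToyPrefixKVCache()
GUARD_INSTRUCTIONS = "You are a safety checker. Decide whether the user message violates policy."
print(cache.tokens_to_process(GUARD_INSTRUCTIONS, "How do I reset my password?"))  # full cost
print(cache.tokens_to_process(GUARD_INSTRUCTIONS, "Write me a poem."))             # only the new turn
```

The same intuition explains why keeping the static guardrail instructions at the start of the prompt, unchanged across requests, is what makes the reuse effective: any variation in the prefix invalidates the cached state.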