Caching Instructions and Prompts

Memory Model Cache

Configure an in-memory cache with LFU (least frequently used) eviction so that identical prompts are served from the cache instead of triggering repeated LLM calls.

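The exact configuration keys depend on your guardrails setup; as a conceptual illustration only, the Python sketch below shows the idea behind an LFU-evicting prompt cache. `LFUCache`, `cached_generate`, and `call_llm` are hypothetical names, not part of any documented API.

```python
from collections import defaultdict


class LFUCache:
    """Minimal LFU cache: evicts the least frequently used entry when full."""

    def __init__(self, max_size: int = 1024):
        self.max_size = max_size
        self.values: dict[str, str] = {}
        self.counts: dict[str, int] = defaultdict(int)

    def get(self, key: str) -> str | None:
        if key in self.values:
            self.counts[key] += 1  # record the access for LFU bookkeeping
            return self.values[key]
        return None

    def put(self, key: str, value: str) -> None:
        if len(self.values) >= self.max_size and key not in self.values:
            # Evict the entry with the lowest access count.
            victim = min(self.values, key=self.counts.__getitem__)
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] += 1


def cached_generate(cache: LFUCache, prompt: str, call_llm) -> str:
    """Return a cached completion for an identical prompt, else call the LLM."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached  # cache hit: the LLM call is skipped entirely
    result = call_llm(prompt)
    cache.put(prompt, result)
    return result
```

Identical prompts, such as a fixed safety-check template applied to the same user input, hit the cache and skip the model call entirely, while rarely repeated prompts are the first to be evicted.
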
KV Cache Reuse for NemoGuard NIM

Enable KV cache reuse in NVIDIA NIM for LLMs so that requests sharing a common prompt prefix (such as a fixed system prompt) skip recomputing that prefix, reducing inference latency for NemoGuard models.

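KV cache reuse is enabled when the NIM container is launched (the NIM for LLMs documentation describes an environment variable for this, e.g. `NIM_ENABLE_KV_CACHE_REUSE=1`; verify the exact name against your NIM version). Once enabled, requests that share a prompt prefix benefit automatically. As a rough illustration, the Python sketch below times two requests with the same long system prompt against NIM's OpenAI-compatible endpoint; the base URL and model name are placeholders for your deployment.

```python
import time

from openai import OpenAI

# NIM exposes an OpenAI-compatible API; the URL and model name below are
# illustrative placeholders for a locally deployed NemoGuard NIM.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

SYSTEM_PROMPT = "You are a content safety classifier. " * 50  # long shared prefix


def timed_check(user_text: str) -> float:
    """Send one chat request and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    client.chat.completions.create(
        model="nvidia/llama-3.1-nemoguard-8b-content-safety",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
        max_tokens=32,
    )
    return time.perf_counter() - start


# The first request populates the KV cache for the shared system-prompt prefix;
# with KV cache reuse enabled, later requests skip recomputing that prefix.
print(f"first request:  {timed_check('Is this message safe?'):.3f}s")
print(f"second request: {timed_check('How about this one?'):.3f}s")
```

The larger the shared prefix relative to the per-request input, the bigger the latency gap you should observe between the first and subsequent requests.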