Custom LLM Frameworks#
The NVIDIA NeMo Guardrails library has two layers of LLM extensibility: providers and frameworks. Most users only need the provider layer. This guide is for the smaller set of cases that need to replace the framework layer itself.
The Two-Layer Model#
Framework Layer (system-wide, swappable)
|-- DefaultFramework (built-in, all OpenAI-compatible HTTP)
|   |-- openai (provider)
|   |-- nim (provider)
|   |-- ollama (provider)
|   '-- <your custom provider>
|-- LangChainFramework (built-in, opt-in)
|   '-- LangChain providers
'-- <YourCustomFramework>
    '-- <your providers>
A provider is the name a user sets as engine: in config.yml. It is a label that your framework dispatches on. In DefaultFramework, openai, nim, and ollama are provider names that all dispatch to the same OpenAIChatModel runtime. They differ only in default base URLs and small per-provider conventions. In LangChainFramework, each provider name dispatches to its own LangChain class. Your framework decides whether multiple provider names share one runtime or each name has its own. Adding a provider is the right move when you want to plug in one new backend and the surrounding framework’s behavior is fine. For details, refer to Custom LLM Providers and Custom LLM Model.
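The "many names, one runtime" idea can be sketched without the library. The snippet below is an illustration only: `SharedRuntime`, `PROVIDER_DEFAULTS`, `build_model`, the provider labels, and the placeholder URLs are all hypothetical and are not part of NeMo Guardrails.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SharedRuntime:
    """Hypothetical stand-in for one runtime class shared by several provider names."""

    provider: str
    model: str
    base_url: str


# Hypothetical per-provider defaults. The URLs are placeholders, not real endpoints.
PROVIDER_DEFAULTS: Dict[str, str] = {
    "provider_a": "https://a.example.invalid/v1",
    "provider_b": "http://localhost:9999/v1",
}


def build_model(provider: str, model: str, base_url: Optional[str] = None) -> SharedRuntime:
    """Dispatch on the provider label; only the defaults differ per name."""
    if provider not in PROVIDER_DEFAULTS:
        raise KeyError(f"Unknown provider {provider!r}")
    return SharedRuntime(provider, model, base_url or PROVIDER_DEFAULTS[provider])


a = build_model("provider_a", "m1")
b = build_model("provider_b", "m1", base_url="http://gateway.example.invalid/v1")
```

Both calls return the same class. The provider label only selects defaults, and an explicit `base_url` overrides them.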
A framework owns the entire LLM stack: how models are constructed, how providers are looked up, and how resources are released at shutdown. Adding a framework is the right move when you want to replace the entire stack (for example, route everything through LiteLLM, a proprietary in-house orchestrator, or a service mesh).
| Decision | Pick a provider | Pick a framework |
|---|---|---|
| You need one new engine alongside the existing ones | Yes | No |
| You have one new HTTP backend with custom auth | Yes (subclass | No |
| You want all engines to flow through your own gateway | No | Yes |
| You want to disable LangChain entirely and replace it with LiteLLM | No | Yes |
| You want per-call observability hooks across every model | Maybe | Yes if you also need to control construction and shutdown |
In practice almost every customization is a provider. A custom framework is reserved for the cases where you are replacing more than one engine and you need shared lifecycle management across them.
The LLMFramework Contract#
The protocol is nemoguardrails.types.LLMFramework and is @runtime_checkable, so callers can verify a framework with isinstance(instance, LLMFramework). As a Python Protocol, it expresses a contract. Nothing prevents you from passing an object that duck-types most of it, but the rest of the NVIDIA NeMo Guardrails library assumes both invariants below hold:
- The registered object structurally matches the LLMFramework protocol (the four methods and their signatures listed below).
- Its reset attribute is an async coroutine function. The registry awaits it directly during test teardown.
A custom framework implements four methods.
    from typing import Any, Dict, List, Optional

    from nemoguardrails import LLMModel


    class MyFramework:
        def create_model(
            self,
            model_name: str,
            provider_name: str,
            model_kwargs: Optional[Dict[str, Any]] = None,
        ) -> LLMModel: ...

        def register_provider(self, name: str, provider_cls: Any) -> None: ...

        def get_provider_names(self) -> List[str]: ...

        async def reset(self) -> None: ...
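An `isinstance` check against a `@runtime_checkable` protocol only confirms that the member names exist. It does not verify signatures or whether `reset` is a coroutine function, so the second invariant needs its own check. The sketch below uses a stand-in protocol, `FrameworkLike`, and a hypothetical helper, `check_framework`, so that it runs without the library installed. It is not the library's validation code.

```python
import inspect
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class FrameworkLike(Protocol):
    """Stand-in protocol with the four method names from the contract."""

    def create_model(
        self,
        model_name: str,
        provider_name: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> Any: ...

    def register_provider(self, name: str, provider_cls: Any) -> None: ...

    def get_provider_names(self) -> List[str]: ...

    async def reset(self) -> None: ...


class Good:
    def create_model(self, model_name, provider_name, model_kwargs=None):
        return object()

    def register_provider(self, name, provider_cls):
        pass

    def get_provider_names(self):
        return []

    async def reset(self):
        pass


class SyncReset(Good):
    def reset(self):  # wrong: a plain function, not a coroutine function
        pass


class MissingMethods:
    def create_model(self, model_name, provider_name, model_kwargs=None):
        return object()


def check_framework(obj: Any) -> bool:
    """Hypothetical helper: structural check plus the async-reset invariant."""
    return isinstance(obj, FrameworkLike) and inspect.iscoroutinefunction(obj.reset)


structural = [isinstance(o, FrameworkLike) for o in (Good(), SyncReset(), MissingMethods())]
full = [check_framework(o) for o in (Good(), SyncReset(), MissingMethods())]
```

`SyncReset` passes the structural check but fails the full one, which is the gap the second invariant covers.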
create_model#
Called once per models: entry in config.yml when LLMRails builds its task models. model_name is the value of model:, provider_name is the value of engine:, and model_kwargs carries everything from the entry’s parameters block plus a few platform keys like mode. Your framework decides what provider_name means. Typically, you use it to dispatch to a specific LLMModel class or to pick provider-specific defaults. Return any object that implements LLMModel. For details, refer to Custom LLM Model.
The framework owns construction. It can cache and reuse expensive resources, such as HTTP clients, gRPC channels, and auth tokens. It can also inject defaults for headers, timeouts, and retries, or short-circuit on a registered custom provider. Review DefaultFramework and LangChainFramework for two contrasting implementations.
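One way to picture "the framework owns construction" is a `create_model` that reuses a client when two entries share a backend and injects a default when the entry omits one. Everything in this sketch is hypothetical: `FakeClient`, `FakeModel`, `PoolingFramework`, `_client_for`, the `base_url` and `api_key` parameter keys, and the 30-second timeout. It does not reproduce how DefaultFramework is implemented.

```python
from typing import Any, Dict, Optional, Tuple


class FakeClient:
    """Hypothetical stand-in for an expensive resource such as an HTTP client."""

    def __init__(self, base_url: str, api_key: str) -> None:
        self.base_url = base_url
        self.api_key = api_key


class FakeModel:
    """Hypothetical model object that borrows a client owned by the framework."""

    def __init__(self, model: str, client: FakeClient, **params: Any) -> None:
        self.model = model
        self.client = client
        self.params = params


class PoolingFramework:
    def __init__(self) -> None:
        self._clients: Dict[Tuple[str, str], FakeClient] = {}

    def _client_for(self, base_url: str, api_key: str) -> FakeClient:
        # One client per (base_url, api_key) pair, created on first use.
        key = (base_url, api_key)
        if key not in self._clients:
            self._clients[key] = FakeClient(base_url, api_key)
        return self._clients[key]

    def create_model(
        self,
        model_name: str,
        provider_name: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> FakeModel:
        kwargs = dict(model_kwargs) if model_kwargs else {}  # copy; never mutate the caller's dict
        base_url = kwargs.pop("base_url", "http://localhost:8000/v1")  # assumed key and default
        api_key = kwargs.pop("api_key", "")  # assumed key
        kwargs.setdefault("timeout", 30.0)  # framework-injected default (assumed value)
        return FakeModel(model_name, self._client_for(base_url, api_key), **kwargs)


fw = PoolingFramework()
shared = {"base_url": "http://gateway.example.invalid/v1", "temperature": 0.2}
m1 = fw.create_model("model-a", "my_engine", shared)
m2 = fw.create_model("model-b", "my_engine", shared)
m3 = fw.create_model("model-c", "my_engine", {"base_url": "http://other.example.invalid/v1"})
```

`m1` and `m2` end up sharing one client because their keys match, while `m3` gets its own.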
register_provider#
Called by user code (usually from a config.py) to add a custom class your framework should dispatch to. Implementations typically just record the class in an in-memory dict; create_model then checks that dict before falling back to its built-in dispatch.
get_provider_names#
Returns the list of provider names this framework knows about, including built-ins and anything registered at runtime. Used by tooling (nemoguardrails find_providers) and for debugging.
reset#
reset is called at process or test boundaries to release framework-owned resources. It must:
- Close any pooled HTTP clients, gRPC channels, file handles, or database connections.
- Clear any registered-provider state if you want a clean slate (some frameworks like DefaultFramework separate aclose from clear_providers and call both from reset; others may want to keep registrations).
- Be idempotent: calling reset twice in a row must not raise.
- Be safe to call from a running event loop. The registry awaits it directly with _areset_frameworks.
After reset, the instance must remain usable. New resources are constructed lazily on the next create_model call.
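The requirements above (release resources, stay idempotent, remain usable afterwards) can be met with a lazily created resource that reset detaches before closing. This is a minimal sketch under stated assumptions: `FakePool` and `ResettableFramework` are hypothetical names, and `FakePool.aclose` stands in for closing a real client.

```python
import asyncio
from typing import Optional


class FakePool:
    """Hypothetical resource with an async close, standing in for an HTTP client."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class ResettableFramework:
    def __init__(self) -> None:
        self._pool: Optional[FakePool] = None

    def _get_pool(self) -> FakePool:
        # Lazy construction: the pool is built on first use and again after a reset.
        if self._pool is None:
            self._pool = FakePool()
        return self._pool

    def create_model(self, model_name, provider_name, model_kwargs=None):
        return (model_name, self._get_pool())

    async def reset(self) -> None:
        # Detach first so that a second call finds nothing to close.
        pool, self._pool = self._pool, None
        if pool is not None:
            await pool.aclose()


async def demo():
    fw = ResettableFramework()
    _, first = fw.create_model("m", "p")
    await fw.reset()
    await fw.reset()  # second call in a row is a no-op
    _, second = fw.create_model("m", "p")  # instance is still usable
    return first, second


first_pool, second_pool = asyncio.run(demo())
```

Swapping the attribute to `None` before awaiting the close keeps the second call from touching an already closed resource.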
Today reset is invoked only by the test suite; the runtime does not call it on nemoguardrails server shutdown. Implement it for test isolation, not for production cleanup.
Minimal Working Example#
The example below is fully self-contained and runs end-to-end without any
external dependencies. The model is an “echo” implementation that returns a
fixed string for every prompt. Swap in real HTTP calls or SDK invocations after
you verify that the registration and dispatch path works. Refer to
custom-llm-model.md for the canonical httpx-based pattern.
Create a config directory my_config/ next to your smoke-test script with
two files:
my_config/
├── config.py # framework + LLMModel definitions, registered at import time
└── config.yml # references the framework's engine name
my_config/config.py:
    from typing import Any, Dict, List, Optional

    from nemoguardrails import LLMModel, LLMResponse, LLMResponseChunk, register_framework, set_default_framework


    class EchoLLMModel:
        """Returns a canned response. Useful as a skeleton or in offline tests."""

        def __init__(self, model: str, response: str = "echo", **kwargs: Any):
            self._model = model
            self._response = response
            self._default_kwargs = kwargs

        @property
        def model_name(self) -> str:
            return self._model

        @property
        def provider_name(self) -> Optional[str]:
            return "my_engine"

        @property
        def provider_url(self) -> Optional[str]:
            return None

        async def generate_async(self, prompt, *, stop=None, **kwargs) -> LLMResponse:
            return LLMResponse(content=self._response)

        async def stream_async(self, prompt, *, stop=None, **kwargs):
            yield LLMResponseChunk(delta_content=self._response)
            yield LLMResponseChunk(finish_reason="stop")


    class MyFramework:
        def __init__(self):
            self._providers: Dict[str, Any] = {}

        def create_model(
            self,
            model_name: str,
            provider_name: str,
            model_kwargs: Optional[Dict[str, Any]] = None,
        ) -> LLMModel:
            kwargs = dict(model_kwargs) if model_kwargs else {}
            kwargs.pop("mode", None)
            if provider_name in self._providers:
                return self._providers[provider_name](model=model_name, **kwargs)
            return EchoLLMModel(model=model_name, **kwargs)

        def register_provider(self, name: str, provider_cls: Any) -> None:
            self._providers[name] = provider_cls

        def get_provider_names(self) -> List[str]:
            return sorted({"my_engine", *self._providers})

        async def reset(self) -> None:
            # Release any framework-scoped resources you hold (HTTP clients,
            # connection pools, caches). The echo framework only owns a registry
            # dict, so clearing it is sufficient. A real framework typically
            # closes a shared `httpx.AsyncClient` here.
            self._providers.clear()


    register_framework("my", MyFramework())
    set_default_framework("my")
my_config/config.yml:
    models:
      - type: main
        engine: my_engine
        model: echo
        parameters:
          response: "echo from echo"
Trying it out#
Run a smoke test from the parent directory of my_config/. LLMRails
imports config.py automatically, which triggers the register_framework
and set_default_framework calls at the bottom of that file:
# smoke.py (next to my_config/)
from nemoguardrails import LLMRails, RailsConfig
config = RailsConfig.from_path("./my_config")
app = LLMRails(config)
result = app.generate(messages=[{"role": "user", "content": "hi"}])
print(result["content"]) # -> echo from echo
If the smoke test prints echo from echo, the framework is wired up. From
there, replace EchoLLMModel.generate_async and stream_async with real
backend calls. Refer to custom-llm-model.md.
After register_framework("my", MyFramework()), the framework is selectable in three ways:
1. Process-wide default at import time. Set the environment variable before importing the NVIDIA NeMo Guardrails library:

       export NEMOGUARDRAILS_LLM_FRAMEWORK=my

   The registry reads NEMOGUARDRAILS_LLM_FRAMEWORK at module load and uses it as the active framework name.
2. Programmatic flip in config.py. Call set_default_framework("my") after registering. All subsequent LLMRails constructions use it.
3. Targeted dispatch. If you want different frameworks for different model entries, route directly with framework.create_model in your own initialization code (advanced; not the standard path).
config.yml entries do not name the framework; they name a provider. The framework is implicit in whichever one is active.
    models:
      - type: main
        engine: my_engine
        model: my-flagship-model
        parameters:
          temperature: 0.2
Reference Implementations#
Review these production-grade frameworks:
- nemoguardrails/llm/frameworks/default.py: DefaultFramework. Pools OpenAICompatibleClient instances keyed on (base_url, api_key, timeouts, headers, query). Splits lifecycle into aclose (HTTP teardown), clear_providers (registry teardown), and reset (both, used in tests).
- nemoguardrails/integrations/langchain/llm_adapter.py: LangChainFramework. Defers to nemoguardrails.integrations.langchain.providers for registration, calls init_langchain_model for construction, wraps the result in LangChainLLMAdapter. Has a no-op reset because the LangChain side has no pooled state of its own.
- nemoguardrails/llm/frameworks/registry.py: register_framework, get_framework, set_default_framework, get_default_framework, _areset_frameworks. Read this to understand the environment variable, lazy lookup, and registration behavior.
Failure Modes#
Registering a provider before any framework is active#
register_provider from nemoguardrails.llm.providers resolves the active framework with get_default_framework() and calls framework.register_provider on it. The registry has a built-in default framework that is constructed lazily on first access, so this almost always works without explicit setup. The failure mode appears only when the user sets NEMOGUARDRAILS_LLM_FRAMEWORK to a name that has not been registered yet:
export NEMOGUARDRAILS_LLM_FRAMEWORK=my
# config.py runs BEFORE `register_framework("my", ...)`
from nemoguardrails.llm.providers import register_provider
register_provider("echo", EchoLLMModel)
# KeyError: Unknown framework 'my'. Available frameworks: []
The fix is simple: register the framework before any provider, or keep NEMOGUARDRAILS_LLM_FRAMEWORK unset until after register_framework has run.
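The ordering problem is easier to see in a toy registry that reads the variable once and resolves the name only on lookup. The sketch below is an illustration, not the library's registry: `ToyRegistry` and its `env` parameter are hypothetical, and only the environment variable name and the error wording come from this guide.

```python
import os
from typing import Any, Dict, Mapping, Optional

ENV_VAR = "NEMOGUARDRAILS_LLM_FRAMEWORK"  # variable name taken from this guide


class ToyRegistry:
    """Illustrative registry; hypothetical, not the library's implementation."""

    def __init__(self, env: Optional[Mapping[str, str]] = None) -> None:
        env = os.environ if env is None else env
        self._frameworks: Dict[str, Any] = {}
        # Read once at construction, mirroring "read at module load".
        self._active = env.get(ENV_VAR, "default")

    def register_framework(self, name: str, framework: Any) -> None:
        self._frameworks[name] = framework

    def get_default_framework(self) -> Any:
        # The name is resolved lazily, so a missing registration surfaces here.
        if self._active not in self._frameworks:
            raise KeyError(
                f"Unknown framework {self._active!r}. "
                f"Available frameworks: {sorted(self._frameworks)}"
            )
        return self._frameworks[self._active]


registry = ToyRegistry(env={ENV_VAR: "my"})

# Lookup before registration fails, as in the failure mode above.
try:
    registry.get_default_framework()
    failed_before_registration = False
except KeyError:
    failed_before_registration = True

# Registering first makes the same lookup succeed.
my_framework = object()
registry.register_framework("my", my_framework)
resolved = registry.get_default_framework()
```

The same lookup fails before registration and succeeds after it, which is why registering the framework before any provider avoids the error.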
Unknown framework on activation#
set_default_framework("typo")
# KeyError: Unknown framework 'typo'. Register it first or use one of: ['default', 'langchain']
The two built-in names always appear in this hint because the registry knows them by default. If you are working with only your own framework, register it first, then call set_default_framework.
Best Practices#
- Treat reset as a hard contract, not a hint. Test it. Pooled HTTP connections that survive across tests cause surprising flakes elsewhere.
- Prefer composition over inheritance. MyFramework does not need to subclass DefaultFramework. The protocol is small enough to implement from scratch.
- Pool HTTP clients on the framework when multiple models: entries share a backend. create_model runs once per entry at LLMRails startup, so a model can safely build its own client. When two entries point at the same backend, only the framework can deduplicate them. DefaultFramework._get_or_create_client keys clients by (base_url, api_key, ...) for exactly this case.
- Do not import LangChain in a default-framework-style implementation. The whole point of swapping the framework layer is to avoid pulling in dependencies you do not need. Keep your imports tight.
- Document your framework’s provider taxonomy. get_provider_names is what nemoguardrails find_providers shows users.