Custom LLM Frameworks#

The NVIDIA NeMo Guardrails library has two layers of LLM extensibility: providers and frameworks. Most users only need the provider layer. This guide is for the smaller set of cases that need to replace the framework layer itself.

The Two-Layer Model#

Framework Layer (system-wide, swappable)
|-- DefaultFramework (built-in, all OpenAI-compatible HTTP)
|     |-- openai (provider)
|     |-- nim (provider)
|     |-- ollama (provider)
|     '-- <your custom provider>
|-- LangChainFramework (built-in, opt-in)
|     '-- LangChain providers
'-- <YourCustomFramework>
      '-- <your providers>

A provider is the name a user sets as engine: in config.yml; it is a label your framework dispatches on. In DefaultFramework, openai, nim, and ollama are provider names that all dispatch to the same OpenAIChatModel runtime. They differ only in default base URLs and small per-provider conventions. In LangChainFramework, each provider name dispatches to its own LangChain class. Your framework decides whether multiple provider names share one runtime or each name has its own. Adding a provider is the right move when you want to plug in one new backend and the surrounding framework’s behavior is fine. For details, refer to Custom LLM Providers and Custom LLM Model.

A framework owns the entire LLM stack: how models are constructed, how providers are looked up, and how resources are released at shutdown. Adding a framework is the right move when you want to replace the entire stack (for example, route everything through LiteLLM, a proprietary in-house orchestrator, or a service mesh).

Decision	Pick a provider	Pick a framework
You need one new engine alongside the existing ones	Yes	No
You have one new HTTP backend with custom auth	Yes (subclass `OpenAICompatibleClient` if it is OpenAI-shaped)	No
You want all engines to flow through your own gateway	No	Yes
You want to disable LangChain entirely and replace it with LiteLLM	No	Yes
You want per-call observability hooks across every model	Maybe	Yes if you also need to control construction and shutdown

In practice, almost every customization is a provider. A custom framework is reserved for cases where you are replacing more than one engine and need shared lifecycle management across them.

The LLMFramework Contract#

The protocol is nemoguardrails.types.LLMFramework and is @runtime_checkable, so callers can verify a framework with isinstance(instance, LLMFramework). As with any runtime-checkable Python Protocol, that check only confirms that the methods exist, not that their signatures match. Nothing prevents you from passing an object that duck-types most of the contract, but the rest of the NVIDIA NeMo Guardrails library assumes both invariants below hold:

The registered object structurally matches the LLMFramework protocol (the four methods and their signatures listed below).
Its reset attribute is an async coroutine function. The registry awaits it directly during test teardown.
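Both invariants can be checked before registration. The following is a minimal sketch under stated assumptions: FrameworkLike is a hypothetical local stand-in that mirrors the four method names from this guide (it is not the library's LLMFramework), and check_framework is an illustrative helper, not a library function. It shows why the second invariant is listed separately: isinstance only confirms the method names exist, so the coroutine check has to be explicit.

```python
import inspect
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class FrameworkLike(Protocol):
    """Hypothetical stand-in mirroring the four methods named in this guide."""

    def create_model(
        self,
        model_name: str,
        provider_name: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> Any: ...

    def register_provider(self, name: str, provider_cls: Any) -> None: ...

    def get_provider_names(self) -> List[str]: ...

    async def reset(self) -> None: ...


def check_framework(candidate: Any) -> List[str]:
    """Return a list of contract problems; an empty list means both checks pass."""
    problems: List[str] = []
    # Invariant 1: structural match. isinstance() only checks that the
    # method names exist, not their signatures.
    if not isinstance(candidate, FrameworkLike):
        problems.append("missing one or more protocol methods")
    # Invariant 2: reset must be declared with `async def`.
    if not inspect.iscoroutinefunction(getattr(candidate, "reset", None)):
        problems.append("reset is not a coroutine function")
    return problems


class SyncResetFramework:
    """Has all four methods, but reset is synchronous by mistake."""

    def create_model(self, model_name, provider_name, model_kwargs=None):
        return object()

    def register_provider(self, name, provider_cls):
        pass

    def get_provider_names(self):
        return []

    def reset(self):  # should be `async def`
        pass


print(check_framework(SyncResetFramework()))
# -> ['reset is not a coroutine function']
```

A check like this fits naturally in a unit test for your framework class, so a synchronous reset is caught before the registry tries to await it.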

A custom framework implements four methods.

from typing import Any, Dict, List, Optional

from nemoguardrails import LLMModel


class MyFramework:
    def create_model(
        self,
        model_name: str,
        provider_name: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> LLMModel: ...

    def register_provider(self, name: str, provider_cls: Any) -> None: ...

    def get_provider_names(self) -> List[str]: ...

    async def reset(self) -> None: ...

`create_model`#

Called once per models: entry in config.yml when LLMRails builds its task models. model_name is the value of model:, provider_name is the value of engine:, and model_kwargs carries everything from the entry’s parameters block plus a few platform keys like mode. Your framework decides what provider_name means. Typically, you use it to dispatch to a specific LLMModel class or to pick provider-specific defaults. Return any object that implements LLMModel. For details, refer to Custom LLM Model.

The framework owns construction. It can cache and reuse expensive resources, such as HTTP clients, gRPC channels, and auth tokens. It can also inject defaults for headers, timeouts, and retries, or short-circuit on a registered custom provider. Review DefaultFramework and LangChainFramework for two contrasting implementations.
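The construction responsibilities above can be sketched as follows. This is an illustration, not the DefaultFramework implementation: FakeClient, PooledModel, PoolingFramework, the base_url, api_key, and timeout parameter names, and the default URL are all assumptions chosen for the example. It shows one shared client per (base_url, api_key) pair and a default timeout injected when the entry does not set one.

```python
from typing import Any, Dict, Optional, Tuple


class FakeClient:
    """Hypothetical stand-in for a pooled HTTP client."""

    def __init__(self, base_url: str, api_key: Optional[str]):
        self.base_url = base_url
        self.api_key = api_key


class PooledModel:
    """Hypothetical model that borrows a client owned by the framework."""

    def __init__(self, model: str, client: FakeClient, **kwargs: Any):
        self.model = model
        self.client = client
        self.default_kwargs = kwargs


class PoolingFramework:
    DEFAULT_BASE_URL = "http://localhost:8000/v1"  # assumed default, for illustration only

    def __init__(self) -> None:
        self._clients: Dict[Tuple[str, Optional[str]], FakeClient] = {}

    def _client_for(self, base_url: str, api_key: Optional[str]) -> FakeClient:
        # Reuse the client when an earlier entry used the same backend.
        key = (base_url, api_key)
        if key not in self._clients:
            self._clients[key] = FakeClient(base_url, api_key)
        return self._clients[key]

    def create_model(self, model_name, provider_name, model_kwargs=None) -> PooledModel:
        kwargs = dict(model_kwargs) if model_kwargs else {}
        kwargs.pop("mode", None)  # platform key, not a model parameter
        base_url = kwargs.pop("base_url", self.DEFAULT_BASE_URL)
        api_key = kwargs.pop("api_key", None)
        kwargs.setdefault("timeout", 30.0)  # framework-level default
        return PooledModel(model_name, self._client_for(base_url, api_key), **kwargs)


framework = PoolingFramework()
a = framework.create_model("model-a", "my_engine", {"base_url": "http://gw.internal/v1"})
b = framework.create_model("model-b", "my_engine", {"base_url": "http://gw.internal/v1", "mode": "chat"})
c = framework.create_model("model-c", "my_engine", {"base_url": "http://other.internal/v1"})
print(a.client is b.client, a.client is c.client)  # -> True False
```

Keying the pool on every value that changes client behavior keeps two entries with different credentials from silently sharing a connection.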

`register_provider`#

Called by user code (usually from a config.py) to add a custom class your framework should dispatch to. Implementations typically just record the class in an in-memory dict; create_model then checks that dict before falling back to its built-in dispatch.

`get_provider_names`#

Returns the list of provider names this framework knows about, including built-ins and anything registered at runtime. Used by tooling (nemoguardrails find_providers) and for debugging.
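As a sketch of how register_provider, get_provider_names, and create_model can fit together, the example below keeps built-in and registered providers in separate dicts. The class names and the choice to raise ValueError on an unknown provider are assumptions made for illustration; the guide does not prescribe an error type, and a framework may prefer a fallback instead.

```python
from typing import Any, Dict, List, Optional


class BuiltinModel:
    """Hypothetical model class shipped with the framework."""

    def __init__(self, model: str, **kwargs: Any):
        self.model = model
        self.default_kwargs = kwargs


class CustomModel(BuiltinModel):
    """Hypothetical user-supplied model class."""


class DispatchingFramework:
    # Names the framework knows without any registration.
    _BUILTINS: Dict[str, Any] = {"my_engine": BuiltinModel}

    def __init__(self) -> None:
        self._registered: Dict[str, Any] = {}

    def register_provider(self, name: str, provider_cls: Any) -> None:
        self._registered[name] = provider_cls

    def get_provider_names(self) -> List[str]:
        # Built-ins plus anything registered at runtime, de-duplicated.
        return sorted({*self._BUILTINS, *self._registered})

    def create_model(
        self,
        model_name: str,
        provider_name: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> Any:
        kwargs = dict(model_kwargs) if model_kwargs else {}
        # Registered classes are checked first, so users can override a built-in name.
        cls = self._registered.get(provider_name) or self._BUILTINS.get(provider_name)
        if cls is None:
            raise ValueError(
                f"Unknown provider {provider_name!r}. "
                f"Known providers: {self.get_provider_names()}"
            )
        return cls(model=model_name, **kwargs)


framework = DispatchingFramework()
framework.register_provider("custom", CustomModel)
print(framework.get_provider_names())  # -> ['custom', 'my_engine']
print(type(framework.create_model("m", "custom")).__name__)  # -> CustomModel
```

Failing loudly on an unknown name turns a typo in engine: into an immediate error rather than a model that quietly uses the wrong backend.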

`reset`#

reset is meant to be called at process or test boundaries to release framework-owned resources. It must:

Close any pooled HTTP clients, gRPC channels, file handles, or database connections.
Clear any registered-provider state if you want a clean slate. Some frameworks, such as DefaultFramework, separate aclose from clear_providers and call both from reset; others may want to keep registrations.
Be idempotent: calling reset twice in a row must not raise.
Be safe to call from a running event loop. The registry awaits it directly with _areset_frameworks.

After reset, the instance must remain usable. New resources are constructed lazily on the next create_model call.
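The reset requirements above can be sketched with a lazily created client. FakeAsyncClient and LazyClientFramework are hypothetical names used only for this illustration, and the other three protocol methods are omitted to keep the focus on reset. The sketch shows a reset that closes the resource, tolerates a second call, and leaves the instance usable.

```python
import asyncio
from typing import Optional


class FakeAsyncClient:
    """Hypothetical stand-in for a client with an async close method."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class LazyClientFramework:
    """Only the resource handling is shown; the other methods are omitted."""

    def __init__(self) -> None:
        self._client: Optional[FakeAsyncClient] = None

    def _get_client(self) -> FakeAsyncClient:
        # Built on first use, and rebuilt on first use after a reset.
        if self._client is None:
            self._client = FakeAsyncClient()
        return self._client

    async def reset(self) -> None:
        # Detach first so a second reset() finds nothing left to close.
        client, self._client = self._client, None
        if client is not None:
            await client.aclose()


async def main():
    framework = LazyClientFramework()
    first = framework._get_client()
    await framework.reset()
    await framework.reset()  # idempotent: must not raise
    second = framework._get_client()  # instance is still usable
    return first.closed, second.closed, first is second


print(asyncio.run(main()))  # -> (True, False, False)
```

Detaching the reference before awaiting the close also means a failure inside aclose cannot leave a half-closed client attached to the framework.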

Today reset is invoked only by the test suite; the runtime does not call it on nemoguardrails server shutdown. Implement it for test isolation, not for production cleanup.

Minimal Working Example#

The example below is self-contained and runs end-to-end with no dependencies beyond the library itself. The model is an “echo” implementation that returns a fixed string for every prompt. Swap in real HTTP calls or SDK invocations after you verify that the registration and dispatch path works. Refer to Custom LLM Model for the canonical httpx-based pattern.

Create a config directory my_config/ next to your smoke-test script with two files:

my_config/
├── config.py    # framework + LLMModel definitions, registered at import time
└── config.yml   # references the framework's engine name

my_config/config.py:

from typing import Any, Dict, List, Optional

from nemoguardrails import (
    LLMModel,
    LLMResponse,
    LLMResponseChunk,
    register_framework,
    set_default_framework,
)


class EchoLLMModel:
    """Returns a canned response. Useful as a skeleton or in offline tests."""

    def __init__(self, model: str, response: str = "echo", **kwargs: Any):
        self._model = model
        self._response = response
        self._default_kwargs = kwargs

    @property
    def model_name(self) -> str:
        return self._model

    @property
    def provider_name(self) -> Optional[str]:
        return "my_engine"

    @property
    def provider_url(self) -> Optional[str]:
        return None

    async def generate_async(self, prompt, *, stop=None, **kwargs) -> LLMResponse:
        return LLMResponse(content=self._response)

    async def stream_async(self, prompt, *, stop=None, **kwargs):
        yield LLMResponseChunk(delta_content=self._response)
        yield LLMResponseChunk(finish_reason="stop")


class MyFramework:
    def __init__(self):
        self._providers: Dict[str, Any] = {}

    def create_model(
        self,
        model_name: str,
        provider_name: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> LLMModel:
        kwargs = dict(model_kwargs) if model_kwargs else {}
        kwargs.pop("mode", None)
        if provider_name in self._providers:
            return self._providers[provider_name](model=model_name, **kwargs)
        return EchoLLMModel(model=model_name, **kwargs)

    def register_provider(self, name: str, provider_cls: Any) -> None:
        self._providers[name] = provider_cls

    def get_provider_names(self) -> List[str]:
        return sorted({"my_engine", *self._providers})

    async def reset(self) -> None:
        # Release any framework-scoped resources you hold (HTTP clients,
        # connection pools, caches). The echo framework only owns a registry
        # dict, so clearing it is sufficient. A real framework typically
        # closes a shared `httpx.AsyncClient` here.
        self._providers.clear()


register_framework("my", MyFramework())
set_default_framework("my")

my_config/config.yml:

models:
  - type: main
    engine: my_engine
    model: echo
    parameters:
      response: "echo from echo"

Trying it out#

Run a smoke test from the parent directory of my_config/. LLMRails imports config.py automatically, which triggers the register_framework and set_default_framework calls at the bottom of that file:

# smoke.py (next to my_config/)
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./my_config")
app = LLMRails(config)

result = app.generate(messages=[{"role": "user", "content": "hi"}])
print(result["content"])  # -> echo from echo

If the smoke test prints echo from echo, the framework is wired up. From there, replace EchoLLMModel.generate_async and stream_async with real backend calls. Refer to Custom LLM Model for the httpx-based pattern.
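When you replace stream_async, it can help to exercise the streaming shape on its own before wiring in a backend. The sketch below assumes nothing from the library: Chunk is a hypothetical stand-in that carries only the delta_content and finish_reason fields used in the echo example, and WordStreamModel and collect are illustrative names. It emits one chunk per word, then a final chunk that carries only the finish reason.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional, Tuple


@dataclass
class Chunk:
    """Hypothetical stand-in with the two fields used in the echo example."""

    delta_content: Optional[str] = None
    finish_reason: Optional[str] = None


class WordStreamModel:
    """Streams a canned response one word at a time."""

    def __init__(self, response: str):
        self._response = response

    async def stream_async(self, prompt, *, stop=None, **kwargs) -> AsyncIterator[Chunk]:
        for index, word in enumerate(self._response.split(" ")):
            # Re-attach the separating space to every word after the first.
            yield Chunk(delta_content=word if index == 0 else " " + word)
        # The last chunk carries no text, only the reason the stream ended.
        yield Chunk(finish_reason="stop")


async def collect(model, prompt) -> Tuple[str, Optional[str]]:
    """Join the text deltas and capture the finish reason."""
    parts = []
    finish_reason = None
    async for chunk in model.stream_async(prompt):
        if chunk.delta_content:
            parts.append(chunk.delta_content)
        if chunk.finish_reason:
            finish_reason = chunk.finish_reason
    return "".join(parts), finish_reason


print(asyncio.run(collect(WordStreamModel("echo from echo"), "hi")))
# -> ('echo from echo', 'stop')
```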

After register_framework("my", MyFramework()), the framework is selectable in three ways:

Process-wide default at import time. Set the environment variable before importing the NVIDIA NeMo Guardrails library:
```
export NEMOGUARDRAILS_LLM_FRAMEWORK=my
```
The registry reads NEMOGUARDRAILS_LLM_FRAMEWORK at module load and uses it as the active framework name.
Programmatic flip in config.py. Call set_default_framework("my") after registering. All subsequent LLMRails constructions use it.
Targeted dispatch. If you want different frameworks for different model entries, route directly with framework.create_model in your own initialization code (advanced; not the standard path).
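For the targeted-dispatch option, one possible shape is a small routing table in your own initialization code. Everything here is hypothetical: TaggingFramework returns plain dicts instead of real models so the routing is visible, and build_models and the routes mapping are illustrative names, not library APIs. The entries mirror the type, engine, model, and parameters keys of a models: entry.

```python
from typing import Any, Dict, List


class TaggingFramework:
    """Hypothetical framework whose 'models' record which framework built them."""

    def __init__(self, tag: str):
        self.tag = tag

    def create_model(self, model_name, provider_name, model_kwargs=None):
        return {
            "framework": self.tag,
            "engine": provider_name,
            "model": model_name,
            "kwargs": dict(model_kwargs or {}),
        }


def build_models(
    entries: List[Dict[str, Any]],
    routes: Dict[str, Any],
    fallback: Any,
) -> Dict[str, Any]:
    """Build one model per entry, choosing the framework by engine name."""
    models = {}
    for entry in entries:
        framework = routes.get(entry["engine"], fallback)
        models[entry["type"]] = framework.create_model(
            entry["model"], entry["engine"], entry.get("parameters")
        )
    return models


gateway = TaggingFramework("gateway")
local = TaggingFramework("local")
entries = [
    {"type": "main", "engine": "my_engine", "model": "my-flagship-model",
     "parameters": {"temperature": 0.2}},
    {"type": "embeddings", "engine": "local_engine", "model": "small-model"},
]
models = build_models(entries, routes={"local_engine": local}, fallback=gateway)
print(models["main"]["framework"], models["embeddings"]["framework"])
# -> gateway local
```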

config.yml entries do not name the framework; they name a provider. The framework is implicit in whichever one is active.

models:
  - type: main
    engine: my_engine
    model: my-flagship-model
    parameters:
      temperature: 0.2

Reference Implementations#

Review these production-grade frameworks and the registry module that manages them:

nemoguardrails/llm/frameworks/default.py: DefaultFramework. Pools OpenAICompatibleClient instances keyed on (base_url, api_key, timeouts, headers, query). Splits lifecycle into aclose (HTTP teardown), clear_providers (registry teardown), and reset (both, used in tests).
nemoguardrails/integrations/langchain/llm_adapter.py: LangChainFramework. Defers to nemoguardrails.integrations.langchain.providers for registration, calls init_langchain_model for construction, wraps the result in LangChainLLMAdapter. Has a no-op reset because the LangChain side has no pooled state of its own.
nemoguardrails/llm/frameworks/registry.py: register_framework, get_framework, set_default_framework, get_default_framework, _areset_frameworks. Read this to understand the environment variable, lazy lookup, and registration behavior.

Failure Modes#

Registering a provider before any framework is active#

register_provider from nemoguardrails.llm.providers resolves the active framework with get_default_framework() and calls framework.register_provider on it. The registry has a built-in default framework that is constructed lazily on first access, so this almost always works without explicit setup. The failure mode appears only when the user sets NEMOGUARDRAILS_LLM_FRAMEWORK to a name that has not been registered yet:

```
export NEMOGUARDRAILS_LLM_FRAMEWORK=my
```

# config.py runs BEFORE `register_framework("my", ...)`
from nemoguardrails.llm.providers import register_provider

register_provider("echo", EchoLLMModel)
# KeyError: Unknown framework 'my'. Available frameworks: []

The fix is simple: register the framework before any provider, or keep NEMOGUARDRAILS_LLM_FRAMEWORK unset until after register_framework has run.
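The ordering problem can be reproduced with a toy registry. This is a simplified model of the behavior described in this section, not the library's registry: ToyRegistry and RecordingFramework are hypothetical, the environment is passed in as a plain mapping, and the toy has no built-in frameworks. The point it illustrates is that the active name is captured early while the lookup happens later, at first use.

```python
from typing import Any, Dict, Mapping, Optional

ENV_VAR = "NEMOGUARDRAILS_LLM_FRAMEWORK"


class ToyRegistry:
    """Toy model of the ordering behavior; not the library's registry."""

    def __init__(self, env: Mapping[str, str]):
        self._frameworks: Dict[str, Any] = {}
        # Only the *name* is captured up front; it is resolved on first use.
        self._active: Optional[str] = env.get(ENV_VAR)

    def register_framework(self, name: str, framework: Any) -> None:
        self._frameworks[name] = framework

    def get_default_framework(self) -> Any:
        if self._active not in self._frameworks:
            raise KeyError(
                f"Unknown framework {self._active!r}. "
                f"Available frameworks: {sorted(self._frameworks)}"
            )
        return self._frameworks[self._active]

    def register_provider(self, name: str, provider_cls: Any) -> None:
        # Resolves the active framework first, which is where the ordering bites.
        self.get_default_framework().register_provider(name, provider_cls)


class RecordingFramework:
    """Hypothetical framework that only records registrations."""

    def __init__(self) -> None:
        self.providers: Dict[str, Any] = {}

    def register_provider(self, name: str, provider_cls: Any) -> None:
        self.providers[name] = provider_cls


registry = ToyRegistry({ENV_VAR: "my"})
try:
    registry.register_provider("echo", object)  # framework "my" not registered yet
except KeyError as exc:
    print("registered too early:", exc)

registry.register_framework("my", RecordingFramework())
registry.register_provider("echo", object)  # same call now succeeds
print(sorted(registry.get_default_framework().providers))  # -> ['echo']
```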

Unknown framework on activation#

set_default_framework("typo")
# KeyError: Unknown framework 'typo'. Register it first or use one of: ['default', 'langchain']

The two built-in names always appear in this hint because the registry knows them by default. If you are working with only your own framework, register it first, then call set_default_framework.

Best Practices#

Treat reset as a hard contract, not a hint. Test it. Pooled HTTP connections that survive across tests cause surprising flakes elsewhere.
Prefer composition over inheritance. MyFramework does not need to subclass DefaultFramework. The protocol is small enough to implement from scratch.
Pool HTTP clients on the framework when multiple models: entries share a backend. create_model runs once per entry at LLMRails startup, so a model can safely build its own client. When two entries point at the same backend, only the framework can deduplicate them. DefaultFramework._get_or_create_client keys clients by (base_url, api_key, ...) for exactly this case.
Do not import LangChain in a default-framework-style implementation. The whole point of swapping the framework layer is to avoid pulling in dependencies you do not need. Keep your imports tight.
Document your framework’s provider taxonomy. get_provider_names is what nemoguardrails find_providers shows users.