Custom LLM Providers#

Note

This guide covers LangChain-based custom chat providers (BaseChatModel) and applies when NEMOGUARDRAILS_LLM_FRAMEWORK=langchain is set. It was the only extension path before 0.22. For the built-in client (the default since 0.22), implement the LLMModel Protocol instead; see Custom LLM Model.

Text completion providers (BaseLLM) and the register_llm_provider helper were removed in 0.23.0. Custom providers must subclass BaseChatModel.

NeMo Guardrails supports one type of custom LLM provider:

| Type | Base Class | Input | Output |
| --- | --- | --- | --- |
| Chat Model | BaseChatModel | List of messages | Message response |

Chat Models (BaseChatModel)#

For models that work with message-based conversations:

from typing import Any, List, Optional

from langchain_core.callbacks.manager import (
    AsyncCallbackManagerForLLMRun,
    CallbackManagerForLLMRun,
)
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.outputs import ChatGeneration, ChatResult

from nemoguardrails.llm.providers import register_chat_provider


class MyCustomChatModel(BaseChatModel):
    """Custom chat model."""

    @property
    def _llm_type(self) -> str:
        return "my_custom_chat"

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """Synchronous chat completion."""
        # Convert messages to your model's format
        response_text = "Generated chat response"

        message = AIMessage(content=response_text)
        generation = ChatGeneration(message=message)
        return ChatResult(generations=[generation])

    async def _agenerate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        # The async path receives the async variant of the callback manager.
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """Asynchronous chat completion (recommended)."""
        response_text = "Generated chat response"

        message = AIMessage(content=response_text)
        generation = ChatGeneration(message=message)
        return ChatResult(generations=[generation])


# Register the provider
register_chat_provider("my_custom_chat", MyCustomChatModel)
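
The placeholder comment in _generate ("Convert messages to your model's format") and the stop parameter can be handled along the following lines. This is a minimal sketch, not part of NeMo Guardrails or LangChain: the helper names to_role_dicts and apply_stop are hypothetical, and the OpenAI-style role/content dictionaries are an assumed target format. The conversion relies only on the type and content attributes that LangChain messages expose, so the demo uses simple stand-in objects and runs without LangChain installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Sequence

# LangChain message `type` values mapped to OpenAI-style roles (assumed target format).
ROLE_BY_TYPE = {"human": "user", "ai": "assistant", "system": "system"}


def to_role_dicts(messages: Sequence[Any]) -> List[Dict[str, str]]:
    """Convert objects exposing `.type` and `.content` into role/content dicts."""
    converted = []
    for message in messages:
        # Unknown message types fall back to "user" in this sketch.
        role = ROLE_BY_TYPE.get(message.type, "user")
        # Content may be a non-string value; this sketch simply stringifies it.
        converted.append({"role": role, "content": str(message.content)})
    return converted


def apply_stop(text: str, stop: Optional[List[str]]) -> str:
    """Truncate `text` at the earliest occurrence of any stop sequence."""
    if not stop:
        return text
    cut = len(text)
    for sequence in stop:
        if not sequence:
            continue  # ignore empty stop sequences
        index = text.find(sequence)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


if __name__ == "__main__":
    # Stand-ins for LangChain messages (hypothetical demo data).
    demo = [
        SimpleNamespace(type="system", content="Be brief."),
        SimpleNamespace(type="human", content="Hi"),
    ]
    print(to_role_dicts(demo))
    print(apply_stop("Hello\nUser: bye", ["\nUser:"]))
```

Inside _generate, the converted list would be sent to your backend, and apply_stop would be applied to the returned text before it is wrapped in AIMessage.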

Using Custom Providers#

After registering your custom provider in config.py, reference it in config.yml. The engine value must match the name passed to register_chat_provider:

models:
  - type: main
    engine: my_custom_chat
    model: optional-model-name

Required and Optional Methods#

BaseChatModel Methods#

| Method | Required | Description |
| --- | --- | --- |
| _generate | Yes | Synchronous chat completion |
| _llm_type | Yes | Returns the LLM type identifier |
| _agenerate | Recommended | Asynchronous chat completion |
| _stream | Optional | Streaming chat completion |
| _astream | Optional | Async streaming chat completion |
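
For the optional streaming methods, the core task is to emit the response in pieces rather than all at once. The sketch below shows only that splitting step, using a hypothetical helper named iter_text_pieces with an arbitrary fixed piece size; a real backend would normally deliver tokens itself. In a _stream implementation, each piece would be wrapped in a ChatGenerationChunk holding an AIMessageChunk before being yielded.

```python
from typing import Iterator


def iter_text_pieces(text: str, size: int = 8) -> Iterator[str]:
    """Yield consecutive slices of `text`, each at most `size` characters long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(text), size):
        # In _stream, this piece would become:
        #   ChatGenerationChunk(message=AIMessageChunk(content=piece))
        yield text[start:start + size]


if __name__ == "__main__":
    for piece in iter_text_pieces("Generated chat response", size=6):
        print(repr(piece))
```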

Best Practices#

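  1. Implement async methods: Implement _agenerate natively. Without it, LangChain's default falls back to running _generate in a worker thread, which adds overhead under concurrent load.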

  2. Import from langchain-core: Always import the base class from langchain_core.language_models.

  3. Use the chat registration helper: Call register_chat_provider() to register your BaseChatModel subclass.
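
To illustrate the first practice: when the backend call is awaited, several requests can overlap on one event loop instead of running one after another. The sketch below is independent of LangChain and NeMo Guardrails; fake_backend_call is a hypothetical stand-in for a real async client, and the sleep only simulates network latency.

```python
import asyncio
from typing import List


async def fake_backend_call(prompt: str) -> str:
    """Hypothetical async client call; the sleep stands in for network latency."""
    await asyncio.sleep(0.05)
    return f"echo: {prompt}"


async def answer_all(prompts: List[str]) -> List[str]:
    """Await all backend calls concurrently; results keep the input order."""
    return list(await asyncio.gather(*(fake_backend_call(p) for p in prompts)))


if __name__ == "__main__":
    print(asyncio.run(answer_all(["first", "second", "third"])))
```

A native _agenerate would await the real client in the same way that answer_all awaits fake_backend_call here.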