backend

Abstract generator backend.

Classes:

| Name | Description |
| --- | --- |
| GeneratorBackend | Abstract base class for generation backends. |

GeneratorBackend

Abstract base class for generation backends.

Lifecycle: initialize -> prepare_params -> generate [-> generate ...] -> teardown.

teardown must be idempotent and safe to call multiple times. Callers should use try/finally to guarantee teardown runs even if generate raises. Each cleanup step should be isolated so one failure doesn't prevent the next from running.

Subclasses must implement initialize, prepare_params, generate, and teardown. The _torn_down guard flag pattern is recommended for teardown implementations.
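The lifecycle and the try/finally guidance above can be sketched from the caller's side. This is a minimal illustration, not the project's code: the `GeneratorBackend` below is a reduced stand-in for the real base class, and `RecordingBackend` and `run_lifecycle` are hypothetical names introduced only for this example. Placing `initialize()` inside the `try` is a choice made here on the assumption that teardown tolerates a partially initialized backend.

```python
import abc


class GeneratorBackend(abc.ABC):
    """Stand-in for the real base class, reduced to its four lifecycle hooks."""

    @abc.abstractmethod
    def initialize(self) -> None: ...

    @abc.abstractmethod
    def prepare_params(self, **kwargs) -> None: ...

    @abc.abstractmethod
    def generate(self, data_actions_fn=None): ...

    @abc.abstractmethod
    def teardown(self) -> None: ...


class RecordingBackend(GeneratorBackend):
    """Hypothetical backend that only records the order of lifecycle calls."""

    def __init__(self, fail_on_generate: bool = False) -> None:
        self.calls: list[str] = []
        self.fail_on_generate = fail_on_generate
        self._torn_down = False

    def initialize(self) -> None:
        self.calls.append("initialize")

    def prepare_params(self, **kwargs) -> None:
        self.calls.append("prepare_params")

    def generate(self, data_actions_fn=None):
        self.calls.append("generate")
        if self.fail_on_generate:
            raise RuntimeError("simulated generation failure")
        return ["record"]

    def teardown(self) -> None:
        if self._torn_down:  # guard flag makes repeated calls a no-op
            return
        self._torn_down = True
        self.calls.append("teardown")


def run_lifecycle(backend: GeneratorBackend):
    """Drive the documented lifecycle; teardown runs even if a step raises."""
    try:
        backend.initialize()
        backend.prepare_params(temperature=0.7)
        return backend.generate()
    finally:
        backend.teardown()
```

Because teardown is idempotent, a caller that invokes `teardown()` again after `run_lifecycle` returns (for example from an outer cleanup handler) does no harm.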

Methods:

| Name | Description |
| --- | --- |
| initialize | Load the model and any required resources into memory. |
| prepare_params | Translate caller-supplied sampling parameters into a backend-native form. |
| generate | Run the batch generation loop and return aggregated results. |
| teardown | Release all resources held by this backend. |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| gen_method | Callable \| None | Callable used internally for LLM generation. |
| gen_results | GenerateJobResults | Results from the most recent generation run. |
| config | SafeSynthesizerParameters | Pipeline configuration. |
| model_metadata | ModelMetadata | Metadata for the fine-tuned model (prompt template, sequence length, adapter path, etc.). |
| remote | bool | Whether the backend calls a remote inference endpoint. |
| elapsed_time | float | Wall-clock duration of the last generation run in seconds. |
| workdir | Workdir | Working directory containing model artifacts. |

gen_method = None class-attribute instance-attribute

Callable used internally for LLM generation.

gen_results instance-attribute

Results from the most recent generation run.

config instance-attribute

Pipeline configuration.

model_metadata instance-attribute

Metadata for the fine-tuned model (prompt template, sequence length, adapter path, etc.).

remote instance-attribute

Whether the backend calls a remote inference endpoint.

elapsed_time instance-attribute

Wall-clock duration of the last generation run in seconds.

workdir instance-attribute

Working directory containing model artifacts.

initialize() abstractmethod

Load the model and any required resources into memory.

Called once before the first generate() invocation. Implementations should allocate GPU memory, instantiate the inference engine (e.g. vLLM), load LoRA adapters, and configure backend-specific settings such as attention backends or structured-output support.

After this method returns, the backend must be ready to accept prepare_params() and generate() calls.

Source code in src/nemo_safe_synthesizer/generation/backend.py
@abc.abstractmethod
def initialize(self) -> None:
    """Load the model and any required resources into memory.

    Called once before the first ``generate()`` invocation.
    Implementations should allocate GPU memory, instantiate the
    inference engine (e.g. vLLM), load LoRA adapters, and configure
    backend-specific settings such as attention backends or
    structured-output support.

    After this method returns, the backend must be ready to accept
    ``prepare_params()`` and ``generate()`` calls.
    """

prepare_params(**kwargs) abstractmethod

Translate caller-supplied sampling parameters into a backend-native form.

Resolves, validates, and transforms high-level generation parameters (temperature, top-p, max tokens, structured-output constraints, etc.) into the format expected by the underlying inference engine. The result is stored internally so that subsequent generate() calls use these settings.

Must be called after initialize() and before generate().

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `**kwargs` | | Sampling parameters such as temperature, top_p, max_new_tokens, repetition_penalty, and backend-specific options. | `{}` |

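The resolve, validate, transform, and store steps described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: `NativeSamplingParams` is a hypothetical engine-side type (not a real library class), `SketchBackend` is not a real backend, and the default values and validation bounds are placeholders rather than the project's actual rules.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class NativeSamplingParams:
    """Hypothetical engine-side parameter object (not a real library type)."""

    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: int = 256
    repetition_penalty: float = 1.0
    extra: dict[str, Any] = field(default_factory=dict)


class SketchBackend:
    """Illustrates only the prepare_params contract."""

    def __init__(self) -> None:
        self._sampling_params = None  # set by prepare_params()

    def prepare_params(self, **kwargs: Any) -> None:
        params = dict(kwargs)

        # Resolve: fall back to placeholder defaults for omitted values.
        temperature = float(params.pop("temperature", 1.0))
        top_p = float(params.pop("top_p", 1.0))
        max_new_tokens = int(params.pop("max_new_tokens", 256))
        repetition_penalty = float(params.pop("repetition_penalty", 1.0))

        # Validate: reject values the engine could not use.
        if temperature < 0.0:
            raise ValueError("temperature must be >= 0")
        if not 0.0 < top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        if max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be positive")
        if repetition_penalty <= 0.0:
            raise ValueError("repetition_penalty must be positive")

        # Transform and store: rename to the engine's vocabulary and keep
        # any remaining backend-specific options untouched.
        self._sampling_params = NativeSamplingParams(
            temperature=temperature,
            top_p=top_p,
            max_tokens=max_new_tokens,
            repetition_penalty=repetition_penalty,
            extra=params,
        )
```

Storing an immutable object means later `generate()` calls see exactly the settings that passed validation until `prepare_params()` is called again.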
Source code in src/nemo_safe_synthesizer/generation/backend.py
@abc.abstractmethod
def prepare_params(self, **kwargs) -> None:
    """Translate caller-supplied sampling parameters into a backend-native form.

    Resolves, validates, and transforms high-level generation
    parameters (temperature, top-p, max tokens, structured-output
    constraints, etc.) into the format expected by the underlying
    inference engine.  The result is stored internally so that
    subsequent ``generate()`` calls use these settings.

    Must be called after ``initialize()`` and before ``generate()``.

    Args:
        **kwargs: Sampling parameters such as ``temperature``,
            ``top_p``, ``max_new_tokens``, ``repetition_penalty``,
            and backend-specific options.
    """

generate(data_actions_fn=None) abstractmethod

Run the batch generation loop and return aggregated results.

Repeatedly prompts the model, processes each batch through the configured Processor, and accumulates valid records until the target count is reached or a stopping condition fires (e.g. too many consecutive invalid batches). Progress and error statistics are logged after each batch.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data_actions_fn | DataActionsFn \| None | Optional post-processing / validation function applied to each batch of generated records. Typically reverses training-time preprocessing and enforces user-specified data constraints. | None |

Returns:

| Type | Description |
| --- | --- |
| GenerateJobResults | Results containing the generated DataFrame, validity statistics, and timing information. |
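The batch loop described for generate() can be approximated as follows. This is a simplified sketch under stated assumptions: `LoopResults` is a hypothetical stand-in for GenerateJobResults, `sample_batch` and `is_valid` stand in for the model call and the Processor, records are plain Python values rather than DataFrame rows, and the `max_consecutive_invalid` threshold is an invented parameter name.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class LoopResults:
    """Hypothetical stand-in for GenerateJobResults."""

    records: list = field(default_factory=list)
    num_batches: int = 0
    num_invalid_records: int = 0
    stopped_early: bool = False


def generation_loop(
    sample_batch: Callable[[], list],
    is_valid: Callable[[Any], bool],
    target_count: int,
    max_consecutive_invalid: int = 3,
    data_actions_fn: Optional[Callable[[list], list]] = None,
) -> LoopResults:
    results = LoopResults()
    consecutive_invalid = 0

    while len(results.records) < target_count:
        batch = sample_batch()  # stands in for prompting the model
        results.num_batches += 1

        # Stands in for the Processor: keep only records that parse cleanly.
        valid = [r for r in batch if is_valid(r)]
        results.num_invalid_records += len(batch) - len(valid)

        # Optional post-processing / constraint enforcement per batch.
        if data_actions_fn is not None:
            valid = data_actions_fn(valid)

        if valid:
            consecutive_invalid = 0
            results.records.extend(valid)
        else:
            consecutive_invalid += 1
            if consecutive_invalid >= max_consecutive_invalid:
                results.stopped_early = True  # stopping condition fired
                break

    # Trim any overshoot from the final batch.
    del results.records[target_count:]
    return results
```

Resetting the counter on every productive batch means only an unbroken run of empty batches ends the loop early, which matches the "consecutive invalid batches" wording above.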

Source code in src/nemo_safe_synthesizer/generation/backend.py
@abc.abstractmethod
def generate(
    self,
    data_actions_fn: utils.DataActionsFn | None = None,
) -> GenerateJobResults:
    """Run the batch generation loop and return aggregated results.

    Repeatedly prompts the model, processes each batch through the
    configured
    [`Processor`][nemo_safe_synthesizer.generation.processors.Processor],
    and accumulates valid records until the target count is reached
    or a stopping condition fires (e.g. too many consecutive invalid
    batches).  Progress and error statistics are logged after each
    batch.

    Args:
        data_actions_fn: Optional post-processing / validation
            function applied to each batch of generated records.
            Typically reverses training-time preprocessing and
            enforces user-specified data constraints.

    Returns:
        Results containing the generated DataFrame, validity
        statistics, and timing information.
    """

teardown() abstractmethod

Release all resources held by this backend.

Frees GPU memory, destroys distributed process groups, and cleans up any temporary state. Must be idempotent -- safe to call multiple times. Implementations should use the _torn_down guard flag and isolate each cleanup step so one failure doesn't prevent subsequent cleanup.

Callers should wrap generate() in try/finally to guarantee this runs even when generation raises.
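One way to satisfy both requirements on teardown (idempotence and isolated cleanup steps) is sketched below. The class name `TeardownSketch` and the list of named cleanup callables are assumptions made for the example; a real backend would put its own steps, such as freeing GPU memory or destroying process groups, in their place.

```python
import logging
from typing import Callable, Iterable, Tuple

logger = logging.getLogger(__name__)


class TeardownSketch:
    """Illustrates the _torn_down guard with isolated cleanup steps."""

    def __init__(self, cleanup_steps: Iterable[Tuple[str, Callable[[], None]]]) -> None:
        # Each step is a (name, callable) pair; names are used only for logging.
        self._cleanup_steps = list(cleanup_steps)
        self._torn_down = False

    def teardown(self) -> None:
        if self._torn_down:
            return  # already cleaned up: repeated calls are a no-op
        # Set the flag first so a second call never re-runs a partial cleanup.
        self._torn_down = True

        for name, step in self._cleanup_steps:
            try:
                step()
            except Exception:
                # Isolate the failure so the remaining steps still run.
                logger.exception("teardown step %r failed; continuing", name)
```

Setting the flag before running the steps is a design choice: it trades the ability to retry a failed step for the guarantee that no resource is released twice.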

Source code in src/nemo_safe_synthesizer/generation/backend.py
@abc.abstractmethod
def teardown(self) -> None:
    """Release all resources held by this backend.

    Frees GPU memory, destroys distributed process groups, and
    cleans up any temporary state.  Must be idempotent -- safe to
    call multiple times.  Implementations should use the
    ``_torn_down`` guard flag and isolate each cleanup step so one
    failure doesn't prevent subsequent cleanup.

    Callers should wrap ``generate()`` in ``try/finally`` to
    guarantee this runs even when generation raises.
    """