parameters

Classes:

Name Description
SafeSynthesizerParameters

Main configuration class for the Safe Synthesizer pipeline.

SafeSynthesizerParameters pydantic-model

Bases: Parameters

Main configuration class for the Safe Synthesizer pipeline.

This is the top-level configuration class that orchestrates all aspects of synthetic data generation including training, generation, privacy, evaluation, and data handling. It provides validation to ensure parameter compatibility.

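Fields:

  • data
  • evaluation
  • training
  • generation
  • privacy
  • time_series
  • replace_pii
  • preflight
  • emit_telemetry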

Validators:

  • _validate_and_resolve_data_params
  • check_timeseries_group_column

data pydantic-field

Configuration controlling how input data is grouped and split for training and evaluation.

evaluation pydantic-field

Parameters for evaluating the quality of generated synthetic data.

training pydantic-field

Hyperparameters for model training such as learning rate, batch size, and LoRA adapter settings.

generation pydantic-field

Parameters governing synthetic data generation including temperature, top-p, and number of records to produce.

privacy pydantic-field

Differential-privacy hyperparameters. When None, differential privacy is disabled entirely.

time_series pydantic-field

Configuration for time-series mode. The time-series pipeline is currently experimental.

replace_pii pydantic-field

PII replacement configuration. When None, PII replacement is skipped.

preflight pydantic-field

Preflight validation overrides, including checks to skip via disabled_checks.

emit_telemetry pydantic-field

Whether to emit anonymous Safe Synthesizer telemetry events. When unset, defaults to the value of NEMO_TELEMETRY_ENABLED.

from_params(**kwargs) classmethod

Convert singular, flat parameters to a nested structure.

Takes a flat dictionary of parameters, where keys correspond to attributes of the nested parameter classes, and constructs a SafeSynthesizerParameters instance with the appropriate nested structure, using default values for any subgroup that is not explicitly provided.

Args:

  • **kwargs: Flat key-value pairs that map to attributes of the nested parameter classes (e.g., TrainingHyperparams, GenerateParameters).

Returns:

A fully initialized SafeSynthesizerParameters instance with nested sub-configurations populated from the provided values.

Example:

from nemo_safe_synthesizer.config import SafeSynthesizerParameters
SafeSynthesizerParameters.from_params(structured_generation={"enabled": True})
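As a rough illustration of the flat-to-nested conversion described above, the sketch below routes each flat keyword to the sub-model that declares a field with that name. Everything in it is hypothetical: `TrainingSketch`, `GenerationSketch`, `ConfigSketch`, `pick_section_values`, and `from_flat` are stand-ins, the field names and defaults are invented, and the routing rule is only an assumption about what the library's private `_section_values` helper might do. It assumes pydantic v2.

```python
from typing import Any

from pydantic import BaseModel, Field


# Hypothetical stand-ins for the real nested parameter classes.
class TrainingSketch(BaseModel):
    learning_rate: float = 1e-4
    batch_size: int = 8


class GenerationSketch(BaseModel):
    temperature: float = 1.0
    num_records: int = 100


class ConfigSketch(BaseModel):
    training: TrainingSketch = Field(default_factory=TrainingSketch)
    generation: GenerationSketch = Field(default_factory=GenerationSketch)


# Section name -> sub-model class (illustrative only).
SECTIONS: dict[str, type[BaseModel]] = {
    "training": TrainingSketch,
    "generation": GenerationSketch,
}


def pick_section_values(flat: dict[str, Any], model_cls: type[BaseModel]) -> dict[str, Any]:
    """Assumed routing rule: keep only the keys that the sub-model declares."""
    return {key: value for key, value in flat.items() if key in model_cls.model_fields}


def from_flat(**kwargs: Any) -> ConfigSketch:
    """Build the nested config; sections with no matching keys fall back to defaults."""
    nested = {
        name: model_cls.model_validate(pick_section_values(kwargs, model_cls))
        for name, model_cls in SECTIONS.items()
    }
    return ConfigSketch(**nested)


cfg = from_flat(learning_rate=3e-4, temperature=0.7)
print(cfg.training)
print(cfg.generation)
```

In this sketch a key that matches no section is silently ignored; whether the real method ignores, rejects, or warns about such keys is not stated in this page.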

Source code in src/nemo_safe_synthesizer/config/parameters.py
@classmethod
def from_params(cls, **kwargs) -> "SafeSynthesizerParameters":
    """Convert singular, flat parameters to a nested structure.

    Takes a flat dictionary of parameters, where keys correspond to
    attributes of the nested parameter classes, and constructs a
    ``SafeSynthesizerParameters`` instance with the appropriate nested
    structure, using default values for any subgroup that is not
    explicitly provided.

    Args:
        **kwargs: Flat key-value pairs that map to attributes of the
            nested parameter classes (e.g., ``TrainingHyperparams``,
            ``GenerateParameters``).

    Returns:
        A fully initialized ``SafeSynthesizerParameters`` instance with
        nested sub-configurations populated from the provided values.

    Example:
        >>> from nemo_safe_synthesizer.config import SafeSynthesizerParameters
        >>> SafeSynthesizerParameters.from_params(structured_generation={"enabled": True})
    """
    thp = TrainingHyperparams.model_validate(_section_values(kwargs, "training"))
    gp = GenerateParameters.model_validate(_section_values(kwargs, "generation"))
    ep = EvaluationParameters.model_validate(_section_values(kwargs, "evaluation"))
    pp = DifferentialPrivacyHyperparams.model_validate(_section_values(kwargs, "privacy"))
    dp = DataParameters.model_validate(_section_values(kwargs, "data"))
    tsp = TimeSeriesParameters.model_validate(_section_values(kwargs, "time_series"))

    extra: dict[str, Any] = {
        "training": thp,
        "generation": gp,
        "evaluation": ep,
        "privacy": pp,
        "data": dp,
        "time_series": tsp,
    }
    if "replace_pii" in kwargs:
        extra["replace_pii"] = kwargs["replace_pii"]
    if "preflight" in kwargs:
        extra["preflight"] = kwargs["preflight"]
    if "emit_telemetry" in kwargs:
        extra["emit_telemetry"] = kwargs["emit_telemetry"]
    return cls(**extra)

with_runtime_overrides(runtime)

Apply resume-time generation/evaluation/telemetry overrides onto a copy of self.

self is the saved training-run config. Only explicitly-set generation and evaluation fields from runtime are merged in, plus emit_telemetry when the caller set it. Training, data, privacy, and other sections are preserved so training provenance survives a generate-only resume.

Parameters:

Name Type Description Default
runtime SafeSynthesizerParameters

Config carrying resume-time CLI/SDK overrides. Typically sparse -- only the fields the caller set are applied.

required

Returns:

Type Description
SafeSynthesizerParameters

A new SafeSynthesizerParameters with overrides applied. The result is fully independent of self: sections that are not overridden are deep-copied, so later mutation of either object does not affect the other.
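The merge behavior described above can be sketched with pydantic v2's `model_fields_set` and `model_copy`. This is a minimal illustration, not the library's implementation: `GenerationSketch`, `TrainingSketch`, `ConfigSketch`, `overlay_set_fields`, and `with_overrides` are hypothetical names, the fields are invented, and the body of the real private `_overlay_set_fields` helper is not shown in this page, so the version here is an assumption.

```python
from typing import TypeVar

from pydantic import BaseModel, Field

M = TypeVar("M", bound=BaseModel)


# Hypothetical stand-ins for the real sub-configurations.
class GenerationSketch(BaseModel):
    temperature: float = 1.0
    num_records: int = 100


class TrainingSketch(BaseModel):
    learning_rate: float = 1e-4


class ConfigSketch(BaseModel):
    training: TrainingSketch = Field(default_factory=TrainingSketch)
    generation: GenerationSketch = Field(default_factory=GenerationSketch)


def overlay_set_fields(base: M, overrides: M) -> M:
    """Assumed helper: apply only the fields the caller explicitly set.

    Returns ``base`` itself when nothing was set, so callers can detect
    "no change" with an identity check.
    """
    changed = {name: getattr(overrides, name) for name in overrides.model_fields_set}
    if not changed:
        return base
    # model_copy(update=...) does not re-validate the updated values.
    return base.model_copy(update=changed)


def with_overrides(saved: ConfigSketch, runtime: ConfigSketch) -> ConfigSketch:
    """Merge explicitly-set generation fields; deep-copy everything else."""
    updates: dict[str, object] = {}
    generation = overlay_set_fields(saved.generation, runtime.generation)
    if generation is not saved.generation:
        updates["generation"] = generation
    return saved.model_copy(update=updates, deep=True)


saved = ConfigSketch(
    training=TrainingSketch(learning_rate=3e-4),
    generation=GenerationSketch(num_records=500),
)
runtime = ConfigSketch(generation=GenerationSketch(temperature=0.5))
resumed = with_overrides(saved, runtime)
print(resumed.generation)
print(resumed.training)
```

Here only `temperature` is taken from `runtime`, because it is the only field recorded in `runtime.generation.model_fields_set`; `num_records` and the whole `training` section come from the saved config, and the deep copy means `resumed.training` is a separate object from `saved.training`.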

Source code in src/nemo_safe_synthesizer/config/parameters.py
def with_runtime_overrides(self, runtime: SafeSynthesizerParameters) -> "SafeSynthesizerParameters":
    """Apply resume-time generation/evaluation/telemetry overrides onto a copy of self.

    ``self`` is the saved training-run config. Only explicitly-set
    ``generation`` and ``evaluation`` fields from ``runtime`` are merged in,
    plus ``emit_telemetry`` when the caller set it. Training, data, privacy,
    and other sections are preserved so training provenance survives a
    generate-only resume.

    Args:
        runtime: Config carrying resume-time CLI/SDK overrides. Typically
            sparse -- only the fields the caller set are applied.

    Returns:
        A new ``SafeSynthesizerParameters`` with overrides applied. The
        result is fully independent of ``self``: sections that are not
        overridden are deep-copied, so later mutation of either object does
        not affect the other.
    """
    updates: dict[str, object] = {}
    # Only record sections that actually changed; unchanged sections are
    # deep-copied by ``model_copy(deep=True)`` below so the returned config
    # never shares mutable sub-objects with ``self``.
    generation = _overlay_set_fields(self.generation, runtime.generation)
    if generation is not self.generation:
        updates["generation"] = generation
    evaluation = _overlay_set_fields(self.evaluation, runtime.evaluation)
    if evaluation is not self.evaluation:
        updates["evaluation"] = evaluation
    # emit_telemetry is a top-level scalar: detect explicit assignment,
    # since there is no sub-model to inspect for set fields.
    if "emit_telemetry" in runtime.__pydantic_fields_set__:
        updates["emit_telemetry"] = runtime.emit_telemetry
    return self.model_copy(update=updates, deep=True)