parameters

Classes:

| Name | Description |
| --- | --- |
| SafeSynthesizerParameters | Main configuration class for the Safe Synthesizer pipeline. |

SafeSynthesizerParameters pydantic-model

Bases: Parameters

Main configuration class for the Safe Synthesizer pipeline.

This is the top-level configuration class that orchestrates all aspects of synthetic data generation, including training, generation, privacy, evaluation, and data handling. Its validators check that settings in different sections are compatible; for example, check_dp_compatibility reconciles the privacy section with the data section.

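Fields: data, evaluation, training, generation, privacy, time_series, replace_pii

Validators: check_dp_compatibility (runs on privacy)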

data pydantic-field

Configuration controlling how input data is grouped and split for training and evaluation.

evaluation pydantic-field

Parameters for evaluating the quality of generated synthetic data.

training pydantic-field

Hyperparameters for model training such as learning rate, batch size, and LoRA adapter settings.

generation pydantic-field

Parameters governing synthetic data generation including temperature, top-p, and number of records to produce.

privacy pydantic-field

Differential-privacy hyperparameters. When None, differential privacy is disabled entirely.

time_series pydantic-field

Configuration for time-series mode. The time-series pipeline is currently experimental.

replace_pii pydantic-field

PII replacement configuration. When None, PII replacement is skipped.
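The sections above nest one sub-configuration per concern, and two of them (privacy and replace_pii) treat None as an off switch. A minimal sketch of that shape follows, assuming plain dataclasses in place of the real pydantic models. ParamsSketch, DataSketch, PrivacySketch, dp_active, and pii_active are hypothetical names, not part of nemo_safe_synthesizer; only the dp_enabled and max_sequences_per_example attributes are taken from the source shown on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Union


# Hypothetical stand-ins for the real pydantic sub-models.
@dataclass
class DataSketch:
    # "auto" is resolved later by the privacy validator.
    max_sequences_per_example: Union[int, str, None] = "auto"


@dataclass
class PrivacySketch:
    dp_enabled: bool = False


@dataclass
class ParamsSketch:
    data: DataSketch = field(default_factory=DataSketch)
    # None means differential privacy is disabled entirely.
    privacy: Optional[PrivacySketch] = None
    # None means PII replacement is skipped.
    replace_pii: Optional[dict[str, Any]] = None


def dp_active(cfg: ParamsSketch) -> bool:
    """DP runs only when a privacy section exists and it enables DP."""
    return cfg.privacy is not None and cfg.privacy.dp_enabled


def pii_active(cfg: ParamsSketch) -> bool:
    """PII replacement runs only when its section is present."""
    return cfg.replace_pii is not None


cfg = ParamsSketch()
print(dp_active(cfg), pii_active(cfg))  # False False
cfg = ParamsSketch(privacy=PrivacySketch(dp_enabled=True))
print(dp_active(cfg))  # True
```

Under these assumptions, a privacy section that is present but has dp_enabled=False also leaves DP off, which matches the dp_enabled check in the validator below.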

check_dp_compatibility(dp_params, info) pydantic-validator

Validate that DP-enabled configs have compatible data settings.

When DP is enabled, enforces that max_sequences_per_example is 1 (or "auto" or unset, both of which are resolved to 1) to bound per-example contribution. When DP is disabled but max_sequences_per_example is "auto", defaults it to 10.

The dp_enabled check runs before inspecting data so that an upstream data-section validation failure does not produce a misleading "Data parameters must be provided when DP is enabled" error when DP is actually disabled.

Raises:

| Type | Description |
| --- | --- |
| ParameterError | If DP is enabled and data parameters are missing, or max_sequences_per_example is set to a value other than 1 or "auto". |

Source code in src/nemo_safe_synthesizer/config/parameters.py
@field_validator("privacy", mode="after", check_fields=False)
def check_dp_compatibility(
    cls, dp_params: DifferentialPrivacyHyperparams | None, info: ValidationInfo
) -> DifferentialPrivacyHyperparams | None:
    """Validate that DP-enabled configs have compatible data settings.

    When DP is enabled, enforces that ``max_sequences_per_example``
    is ``1`` (or ``"auto"``, which is resolved to ``1``) to bound
    per-example contribution. When DP is disabled but
    ``max_sequences_per_example`` is ``"auto"``, defaults it to
    ``10``.

    The ``dp_enabled`` check runs before inspecting ``data`` so that
    an upstream data-section validation failure does not produce a
    misleading "Data parameters must be provided when DP is enabled"
    error when DP is actually disabled.

    Raises:
        ParameterError: If DP is enabled and ``data`` parameters are
            missing, or ``max_sequences_per_example`` is not ``1``.
    """
    if dp_params is None:
        return dp_params
    logger.debug("Checking DP compatibility for privacy parameters.")

    if not dp_params.dp_enabled:
        data: DataParameters | None = info.data.get("data")
        if data and data.max_sequences_per_example == AUTO_STR:
            logger.debug("setting max_sequences_per_example to the default of 10 because DP is disabled")
            data.max_sequences_per_example = 10
        return dp_params

    data = info.data.get("data")
    if not data:
        raise ParameterError("Data parameters must be provided when DP is enabled.")

    match data.max_sequences_per_example:
        case "auto" | None:
            logger.info("Setting max_sequences_per_example to 1 because DP is enabled.")
            data.max_sequences_per_example = 1
        case v if v not in [AUTO_STR, 1]:
            raise ParameterError(
                f"When enabling DP, max_sequences_per_example must be set to 1 or 'auto'. Received: {v}"
            )

    return dp_params
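The rules enforced by check_dp_compatibility can be restated as a plain function. This is a hypothetical sketch, not the library's code: it assumes AUTO_STR equals "auto", replaces ParameterError with a stand-in exception, and models a missing or failed data section as data_present=False. Per the source above, a privacy value of None returns before any of these rules run (so this validator leaves "auto" unresolved in that case); the sketch covers only the case where a privacy section exists.

```python
from typing import Union

AUTO = "auto"  # assumed value of AUTO_STR

Value = Union[int, str, None]


class ParameterErrorSketch(ValueError):
    """Hypothetical stand-in for ParameterError."""


def resolve_max_sequences(dp_enabled: bool, data_present: bool, value: Value) -> Value:
    """Return the resolved max_sequences_per_example, mirroring the validator's rules."""
    if not dp_enabled:
        # DP off: checked first, so a missing data section is never an error here.
        # Only "auto" is rewritten; explicit values pass through unchanged.
        return 10 if data_present and value == AUTO else value
    if not data_present:
        raise ParameterErrorSketch("Data parameters must be provided when DP is enabled.")
    if value in (AUTO, None):
        # One sequence per example bounds each record's contribution under DP.
        return 1
    if value != 1:
        raise ParameterErrorSketch(
            f"When enabling DP, max_sequences_per_example must be set to 1 or 'auto'. Received: {value}"
        )
    return value


print(resolve_max_sequences(False, True, "auto"))  # 10
print(resolve_max_sequences(True, True, "auto"))  # 1
```

Checking dp_enabled before data_present is what keeps a DP-disabled config with a broken data section from reporting the misleading "Data parameters must be provided" error.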

from_params(**kwargs) classmethod

Convert singular, flat parameters to nested structure.

Takes a flat dictionary of parameters, where keys correspond to attributes of the nested parameter classes, and constructs a SafeSynthesizerParameters instance with the appropriate nested structure. Each subgroup starts from its default values, which are overridden only by the values explicitly provided.

Parameters:

| Name | Description |
| --- | --- |
| `**kwargs` | Flat key-value pairs that map to attributes of the nested parameter classes (e.g., TrainingHyperparams, GenerateParameters). |

Returns:

| Type | Description |
| --- | --- |
| SafeSynthesizerParameters | A fully initialized SafeSynthesizerParameters instance with nested sub-configurations populated from the provided values. |

Example

>>> from nemo_safe_synthesizer.config import SafeSynthesizerParameters
>>> SafeSynthesizerParameters.from_params(use_structured_generation=True)

Source code in src/nemo_safe_synthesizer/config/parameters.py
@classmethod
def from_params(cls, **kwargs) -> "SafeSynthesizerParameters":
    """Convert singular, flat parameters to nested structure.

    Takes a flat dictionary of parameters, where keys correspond to
    attributes of the nested parameter classes, and constructs a
    ``SafeSynthesizerParameters`` instance with the appropriate nested
    structure. Each subgroup starts from its default values, which are
    overridden only by the values explicitly provided.

    Args:
        **kwargs: Flat key-value pairs that map to attributes of the
            nested parameter classes (e.g., ``TrainingHyperparams``,
            ``GenerateParameters``).

    Returns:
        A fully initialized ``SafeSynthesizerParameters`` instance with
        nested sub-configurations populated from the provided values.

    Example:
        >>> from nemo_safe_synthesizer.config import SafeSynthesizerParameters
        >>> SafeSynthesizerParameters.from_params(use_structured_generation=True)
    """
    thp = TrainingHyperparams().model_copy(update=kwargs)
    gp = GenerateParameters().model_copy(update=kwargs)
    ep = EvaluationParameters().model_copy(update=kwargs)
    pp = DifferentialPrivacyHyperparams().model_copy(update=kwargs)
    dp = DataParameters().model_copy(update=kwargs)
    tsp = TimeSeriesParameters().model_copy(update=kwargs)

    extra: dict[str, Any] = {
        "training": thp,
        "generation": gp,
        "evaluation": ep,
        "privacy": pp,
        "data": dp,
        "time_series": tsp,
    }
    if "replace_pii" in kwargs:
        extra["replace_pii"] = kwargs["replace_pii"]
    return cls(**extra)
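The flat-to-nested conversion in from_params can be illustrated with standard-library dataclasses. In this sketch, TrainingSketch, GenerationSketch, NestedSketch, section_from_flat, and from_flat are hypothetical, and the field defaults are illustrative values, not the library's. One deliberate difference: the source forwards the full kwargs to every section's model_copy(update=...), which in pydantic v2 does not validate the update values, whereas this sketch forwards to each section only the keys that section declares.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


# Hypothetical sections; illustrative defaults, not the library's values.
@dataclass
class TrainingSketch:
    learning_rate: float = 2e-4
    batch_size: int = 1


@dataclass
class GenerationSketch:
    temperature: float = 0.9
    use_structured_generation: bool = False


@dataclass
class NestedSketch:
    training: TrainingSketch
    generation: GenerationSketch


def section_from_flat(section_cls: type, flat: dict[str, Any]) -> Any:
    """Build one section from its defaults, overriding only the keys it declares."""
    names = {f.name for f in fields(section_cls)}
    return replace(section_cls(), **{k: v for k, v in flat.items() if k in names})


def from_flat(**kwargs: Any) -> NestedSketch:
    """Route each flat key to the section(s) that declare it."""
    return NestedSketch(
        training=section_from_flat(TrainingSketch, kwargs),
        generation=section_from_flat(GenerationSketch, kwargs),
    )


cfg = from_flat(use_structured_generation=True, batch_size=8)
print(cfg.generation.use_structured_generation, cfg.training.batch_size)  # True 8
```

A flat key that names a field in more than one section updates all of them, in both the sketch and the source; a key that matches no section is silently dropped by the sketch.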