
parameters

Classes:

| Name | Description |
| --- | --- |
| `SafeSynthesizerParameters` | Main configuration class for the Safe Synthesizer pipeline. |

SafeSynthesizerParameters pydantic-model

Bases: Parameters

Main configuration class for the Safe Synthesizer pipeline.

This is the top-level configuration class that orchestrates all aspects of synthetic data generation including training, generation, privacy, evaluation, and data handling. It provides validation to ensure parameter compatibility.

Fields:

- data
- evaluation
- training
- generation
- privacy
- time_series
- replace_pii

Validators:

- check_dp_compatibility

data pydantic-field

Configuration controlling how input data is grouped and split for training and evaluation.

evaluation pydantic-field

Parameters for evaluating the quality of generated synthetic data.

training pydantic-field

Hyperparameters for model training such as learning rate, batch size, and LoRA adapter settings.

generation pydantic-field

Parameters governing synthetic data generation including temperature, top-p, and number of records to produce.

privacy pydantic-field

Differential-privacy hyperparameters. When `None`, differential privacy is disabled entirely.

time_series pydantic-field

Configuration for time-series mode. The time-series pipeline is currently experimental.

replace_pii pydantic-field

PII replacement configuration. When `None`, PII replacement is skipped.

check_dp_compatibility(dp_params, info) pydantic-validator

Validate that DP-enabled configs have compatible data and training settings.

When DP is enabled, enforces that max_sequences_per_example is 1 (or "auto", which is resolved to 1) to bound per-example contribution, and that Unsloth is disabled since it is not yet compatible with DP-SGD. When DP is disabled but max_sequences_per_example is "auto", defaults it to 10.

Raises:

| Type | Description |
| --- | --- |
| ParameterError | If data or training parameters are missing, max_sequences_per_example is not 1, or Unsloth is enabled alongside DP. |

Source code in src/nemo_safe_synthesizer/config/parameters.py
@field_validator("privacy", mode="after", check_fields=False)
def check_dp_compatibility(
    cls, dp_params: DifferentialPrivacyHyperparams | None, info: ValidationInfo
) -> DifferentialPrivacyHyperparams | None:
    """Validate that DP-enabled configs have compatible data and training settings.

    When DP is enabled, enforces that ``max_sequences_per_example``
    is ``1`` (or ``"auto"``, which is resolved to ``1``) to bound
    per-example contribution, and that Unsloth is disabled since it
    is not yet compatible with DP-SGD. When DP is disabled but
    ``max_sequences_per_example`` is ``"auto"``, defaults it to
    ``10``.

    Raises:
        ParameterError: If ``data`` or ``training`` parameters are
            missing, ``max_sequences_per_example`` is not ``1``, or
            Unsloth is enabled alongside DP.
    """
    if dp_params is None:
        return dp_params
    logger.debug("Checking DP compatibility for privacy parameters.")
    data: DataParameters | None = info.data.get("data")
    if not data:
        raise ParameterError("Data parameters must be provided when DP is enabled.")

    if not dp_params.dp_enabled:
        # Equality with AUTO_STR already implies the value is not None.
        if data.max_sequences_per_example == AUTO_STR:
            logger.debug("Setting max_sequences_per_example to the default of 10 because DP is disabled.")
            data.max_sequences_per_example = 10
        return dp_params

    match data.max_sequences_per_example:
        # Valid values here: "auto", None, or an int.
        case "auto" | None:
            logger.info("Setting max_sequences_per_example to 1 because DP is enabled.")
            data.max_sequences_per_example = 1
        # "auto" was already handled above, so only the literal 1 remains valid.
        case v if v != 1:
            raise ParameterError(
                f"When enabling DP, max_sequences_per_example must be set to 1 or 'auto'. Received: {v}"
            )

    logger.debug("Checking Training compatibility for training parameters.")

    training: TrainingHyperparams | None = info.data.get("training")
    logger.debug(f"Training parameters: {training}")

    if not training:
        raise ParameterError("Training parameters must be provided when DP is enabled.")

    if training.use_unsloth not in [False, AUTO_STR]:
        raise ParameterError("Unsloth is currently not compatible with DP.")

    return dp_params
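The resolution rule enforced by this validator can be exercised in isolation with a small standalone sketch. This is plain Python that mirrors the logic above, not the library code; the function name is hypothetical:

```python
AUTO = "auto"

def resolve_max_sequences(dp_enabled: bool, max_sequences_per_example):
    """Mirror of the validator's max_sequences_per_example rule (illustrative)."""
    if not dp_enabled:
        # Without DP, "auto" falls back to the default of 10.
        return 10 if max_sequences_per_example == AUTO else max_sequences_per_example
    if max_sequences_per_example in (AUTO, None, 1):
        # With DP, per-example contribution is bounded by forcing the value to 1.
        return 1
    raise ValueError(
        f"When enabling DP, max_sequences_per_example must be 1 or 'auto'. "
        f"Received: {max_sequences_per_example}"
    )

print(resolve_max_sequences(False, "auto"))  # 10
print(resolve_max_sequences(True, "auto"))   # 1
```

Any other value under DP (for example, 5) raises, since a larger per-example contribution would invalidate the privacy accounting.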

from_params(**kwargs) classmethod

Convert singular, flat parameters to nested structure.

Takes a flat dictionary of parameters, where keys correspond to attributes of the nested parameter classes, and constructs a SafeSynthesizerParameters instance with the appropriate nested structure, using default values for any parameters that are not explicitly provided.

Args:

**kwargs: Flat key-value pairs that map to attributes of the nested parameter classes (e.g., TrainingHyperparams, GenerateParameters).

Returns:

A fully initialized SafeSynthesizerParameters instance with nested sub-configurations populated from the provided values.

Example

>>> from nemo_safe_synthesizer.config import SafeSynthesizerParameters
>>> SafeSynthesizerParameters.from_params(use_structured_generation=True)

Source code in src/nemo_safe_synthesizer/config/parameters.py
@classmethod
def from_params(cls, **kwargs) -> "SafeSynthesizerParameters":
    """Convert singular, flat parameters to nested structure.

    Takes a flat dictionary of parameters, where keys correspond to
    attributes of the nested parameter classes, and constructs a
    ``SafeSynthesizerParameters`` instance with the appropriate nested
    structure, using default values for any parameters that are not
    explicitly provided.

    Args:
        **kwargs: Flat key-value pairs that map to attributes of the
            nested parameter classes (e.g., ``TrainingHyperparams``,
            ``GenerateParameters``).

    Returns:
        A fully initialized ``SafeSynthesizerParameters`` instance with
        nested sub-configurations populated from the provided values.

    Example:
        >>> from nemo_safe_synthesizer.config import SafeSynthesizerParameters
        >>> SafeSynthesizerParameters.from_params(use_structured_generation=True)
    """
    thp = TrainingHyperparams().model_copy(update=kwargs)
    gp = GenerateParameters().model_copy(update=kwargs)
    ep = EvaluationParameters().model_copy(update=kwargs)
    pp = DifferentialPrivacyHyperparams().model_copy(update=kwargs)
    dp = DataParameters().model_copy(update=kwargs)
    tsp = TimeSeriesParameters().model_copy(update=kwargs)

    extra: dict[str, Any] = {
        "training": thp,
        "generation": gp,
        "evaluation": ep,
        "privacy": pp,
        "data": dp,
        "time_series": tsp,
    }
    if "replace_pii" in kwargs:
        extra["replace_pii"] = kwargs["replace_pii"]
    return cls(**extra)
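The flat-to-nested routing idea behind `from_params` can be sketched with stdlib dataclasses. The real implementation relies on pydantic's `model_copy(update=...)`; this sketch instead filters the flat dict down to each group's own fields, and its class names are illustrative stand-ins:

```python
from dataclasses import dataclass, fields, replace

@dataclass
class Training:            # illustrative stand-in
    learning_rate: float = 2e-4
    use_unsloth: bool = False

@dataclass
class Generation:          # illustrative stand-in
    temperature: float = 0.8

def route(group, **flat):
    """Copy a defaults instance, overriding only the keys this group owns."""
    known = {f.name for f in fields(group)}
    return replace(group, **{k: v for k, v in flat.items() if k in known})

# One flat dict fans out to the nested groups; each group keeps its
# defaults for every key it does not recognize.
params = {"learning_rate": 1e-4, "temperature": 0.2}
training = route(Training(), **params)
generation = route(Generation(), **params)
print(training.learning_rate, generation.temperature)  # 0.0001 0.2
```

This is why callers can pass a single flat keyword list to `from_params` without knowing which nested group each parameter belongs to.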