config_builder

Builder-pattern configuration layer for Safe Synthesizer.

Provides ConfigBuilder, the base builder that accumulates per-section configuration objects (training, generation, data, etc.) via fluent with_* methods before resolving them into a single SafeSynthesizerParameters.

Classes:

- ConfigBuilder: Fluent builder for assembling Safe Synthesizer configuration.

ConfigBuilder(config=None)

Bases: object

Fluent builder for assembling Safe Synthesizer configuration.

Accumulates per-section configuration objects (data, training, generation, evaluation, privacy, PII replacement, and time-series) via with_* methods. Call resolve() (or let SafeSynthesizer do it) to collapse them into a single SafeSynthesizerParameters.

Each with_* method accepts an optional typed config object or a plain dict, plus **kwargs overrides. kwargs always take precedence over fields in the config/dict. All with_* methods return self for chaining.
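The precedence rule above can be illustrated with a minimal, self-contained sketch. It does not use the real library: `TrainSection`, `merge_section`, and `SketchBuilder` are hypothetical stand-ins built on stdlib dataclasses, and the field names are illustrative. It only assumes the behavior stated above: a section may come from an object, a dict, or nothing, keyword overrides are applied last, and each method returns the builder.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainSection:
    """Hypothetical stand-in for a typed section such as TrainingHyperparams."""
    learning_rate: float = 1e-4
    num_epochs: int = 1


def merge_section(cls, values=None, **kwargs):
    """Build a section from an object, a dict, or None, then apply kwargs on top."""
    if values is None:
        base = cls()
    elif isinstance(values, cls):
        base = values
    elif isinstance(values, dict):
        base = cls(**values)
    else:
        raise ValueError(f"expected {cls.__name__}, dict, or None, got {values!r}")
    # kwargs are applied last, so they win over fields in the object or dict.
    return replace(base, **kwargs)


class SketchBuilder:
    """Hypothetical builder showing only the chaining and override pattern."""

    def __init__(self):
        self.train = TrainSection()

    def with_train(self, config=None, **kwargs):
        self.train = merge_section(TrainSection, config, **kwargs)
        return self  # returning self is what enables chaining


# The dict sets learning_rate to 5e-4, but the keyword override replaces it.
builder = SketchBuilder().with_train({"learning_rate": 5e-4, "num_epochs": 3}, learning_rate=1e-3)
```

In this sketch the dict supplies `num_epochs`, while the keyword argument decides `learning_rate`. How the real `_resolve_config` helper performs the merge is not shown on this page, so treat the helper above as an approximation of the documented contract only.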

Parameters:

- config (SafeSynthesizerParameters | None, default: None): Optional pre-built parameters. When supplied, the individual _*_config attributes are seeded from its sections.

Methods:

- with_data_source: Set the data source for synthetic data generation.
- with_data: Configure data processing settings.
- with_train: Configure training hyperparameters.
- with_generate: Configure generation settings.
- with_time_series: Configure time-series synthesis settings.
- with_differential_privacy: Configure differential privacy settings.
- with_replace_pii: Configure PII replacement settings.
- with_evaluate: Configure evaluation settings.
- resolve: Finalize configuration and data source.

Source code in src/nemo_safe_synthesizer/sdk/config_builder.py
def __init__(self, config: SafeSynthesizerParameters | None = None) -> None:
    self._nss_config: SafeSynthesizerParameters | None = config
    if self._nss_config is not None:
        self._evaluation_config = self._nss_config.evaluation
        self._replace_pii_config = self._nss_config.replace_pii
        self._privacy_config: DifferentialPrivacyHyperparams | None = self._nss_config.privacy
        self._training_config = self._nss_config.training
        self._generation_config = self._nss_config.generation
        self._data_config = self._nss_config.data
        self._time_series_config = self._nss_config.time_series
    else:
        self._data_config: DataParameters = DataParameters()
        self._evaluation_config: EvaluationParameters = EvaluationParameters()
        self._generation_config: GenerateParameters = GenerateParameters()
        self._replace_pii_config: PiiReplacerConfig | None = PiiReplacerConfig.get_default_config()
        self._privacy_config: DifferentialPrivacyHyperparams = DifferentialPrivacyHyperparams()
        self._training_config: TrainingHyperparams = TrainingHyperparams()
        self._time_series_config: TimeSeriesParameters = TimeSeriesParameters()

    self._data_source: DataSource | None = None
    self._classify_model_provider: str | None = None
    self._hf_token_secret: str | None = None
    self._nss_inputs: list[str] = [
        "_data_config",
        "_evaluation_config",
        "_generation_config",
        "_replace_pii_config",
        "_privacy_config",
        "_training_config",
        "_time_series_config",
    ]

with_data_source(df_source)

Set the data source for synthetic data generation.

Parameters:

- df_source (DataSource, required): Training dataset as a pandas DataFrame or a fetchable URL.

Returns:

- Self: This builder instance with the data source configured.

Source code in src/nemo_safe_synthesizer/sdk/config_builder.py
def with_data_source(self, df_source: DataSource) -> Self:
    """Set the data source for synthetic data generation.

    Args:
        df_source: Training dataset as a pandas DataFrame or a fetchable URL.

    Returns:
        This builder instance with the data source configured.
    """
    self._data_source = df_source
    return self

with_data(config=None, **kwargs)

Configure data processing settings.

Parameters:

- config (DataParameters | ParamDict | None, default: None): Data configuration object or dict.
- **kwargs (default: {}): Field-level overrides (e.g. holdout_size).

Returns:

- Self: This builder instance with data processing settings applied.

Source code in src/nemo_safe_synthesizer/sdk/config_builder.py
def with_data(self, config: DataParameters | ParamDict | None = None, **kwargs) -> Self:
    """Configure data processing settings.

    Args:
        config: Data configuration object or dict.
        **kwargs: Field-level overrides (e.g. ``holdout_size``).

    Returns:
        This builder instance with data processing settings applied.
    """
    self._data_config: DataParameters | None = self._resolve_config(values=config, cls=DataParameters, **kwargs)
    return self

with_train(config=None, **kwargs)

Configure training hyperparameters.

Parameters:

- config (TrainingHyperparams | ParamDict | None, default: None): Training configuration object or dict.
- **kwargs (default: {}): Field-level overrides (e.g. learning_rate).

Returns:

- Self: This builder instance with training hyperparameters applied.

Source code in src/nemo_safe_synthesizer/sdk/config_builder.py
def with_train(self, config: TrainingHyperparams | ParamDict | None = None, **kwargs) -> Self:
    """Configure training hyperparameters.

    Args:
        config: Training configuration object or dict.
        **kwargs: Field-level overrides (e.g. ``learning_rate``).

    Returns:
        This builder instance with training hyperparameters applied.
    """
    self._training_config: TrainingHyperparams | None = self._resolve_config(
        values=config, cls=TrainingHyperparams, **kwargs
    )
    return self

with_generate(config=None, **kwargs)

Configure generation settings.

Parameters:

- config (GenerateParameters | ParamDict | None, default: None): Generation configuration object or dict.
- **kwargs (default: {}): Field-level overrides (e.g. num_records).

Returns:

- Self: This builder instance with generation settings applied.

Source code in src/nemo_safe_synthesizer/sdk/config_builder.py
def with_generate(self, config: GenerateParameters | ParamDict | None = None, **kwargs) -> Self:
    """Configure generation settings.

    Args:
        config: Generation configuration object or dict.
        **kwargs: Field-level overrides (e.g. ``num_records``).

    Returns:
        This builder instance with generation settings applied.
    """
    self._generation_config: GenerateParameters | None = self._resolve_config(
        values=config, cls=GenerateParameters, **kwargs
    )
    return self

with_time_series(config=None, **kwargs)

Configure time-series synthesis settings.

Parameters:

- config (TimeSeriesParameters | ParamDict | None, default: None): Time-series configuration object or dict.
- **kwargs (default: {}): Field-level overrides (e.g. time_column).

Returns:

- Self: This builder instance with time-series synthesis settings applied.

Source code in src/nemo_safe_synthesizer/sdk/config_builder.py
def with_time_series(self, config: TimeSeriesParameters | ParamDict | None = None, **kwargs) -> Self:
    """Configure time-series synthesis settings.

    Args:
        config: Time-series configuration object or dict.
        **kwargs: Field-level overrides (e.g. ``time_column``).

    Returns:
        This builder instance with time-series synthesis settings applied.
    """
    self._time_series_config: TimeSeriesParameters | None = self._resolve_config(
        values=config, cls=TimeSeriesParameters, **kwargs
    )
    return self

with_differential_privacy(config=None, **kwargs)

Configure differential privacy settings.

Parameters:

- config (DifferentialPrivacyHyperparams | ParamDict | None, default: None): DP configuration object or dict.
- **kwargs (default: {}): Field-level overrides (e.g. epsilon).

Returns:

- Self: This builder instance with differential privacy settings applied.

Source code in src/nemo_safe_synthesizer/sdk/config_builder.py
def with_differential_privacy(
    self, config: DifferentialPrivacyHyperparams | ParamDict | None = None, **kwargs
) -> Self:
    """Configure differential privacy settings.

    Args:
        config: DP configuration object or dict.
        **kwargs: Field-level overrides (e.g. ``epsilon``).

    Returns:
        This builder instance with differential privacy settings applied.
    """
    self._privacy_config: DifferentialPrivacyHyperparams | None = self._resolve_config(
        values=config, cls=DifferentialPrivacyHyperparams, **kwargs
    )
    return self

with_replace_pii(config=None, *, enable=True, **kwargs)

Configure PII replacement settings.

Falls back to PiiReplacerConfig.get_default_config() when config is None. Pass enable=False to explicitly disable PII replacement for this run -- this sets replace_pii=None, which is the sole disabled signal.

Note: PII replacement uses replace_pii=None as the disabled signal rather than a PiiReplacerConfig.enabled boolean field. This differs from EvaluationConfig.enabled but is intentional: PiiReplacerConfig has a non-trivial default_factory that must fire when the field is absent from a YAML config. Adding an enabled boolean inside the sub-config would require a model_validator to reconcile the two signals and would not interact cleanly with Pydantic's exclude_unset semantics used in from_params.

Parameters:

- config (PiiReplacerConfig | ParamDict | None, default: None): PII replacement configuration object or dict.
- enable (bool, default: True): When False, disables PII replacement entirely and clears any previously set config.
- **kwargs (default: {}): Field-level overrides (e.g. classify).

Returns:

- Self: This builder instance with PII replacement configured.

Raises:

- ValueError: If config is not a PiiReplacerConfig, dict, or None.

Example:

builder = SafeSynthesizer().with_data_source(your_dataframe).with_replace_pii(config=custom_pii_config)

Source code in src/nemo_safe_synthesizer/sdk/config_builder.py
def with_replace_pii(
    self, config: PiiReplacerConfig | ParamDict | None = None, *, enable: bool = True, **kwargs
) -> Self:
    """Configure PII replacement settings.

    Falls back to ``PiiReplacerConfig.get_default_config()`` when
    ``config`` is ``None``.  Pass ``enable=False`` to explicitly
    disable PII replacement for this run -- this sets
    ``replace_pii=None``, which is the sole disabled signal.

    Note: PII replacement uses ``replace_pii=None`` as the disabled
    signal rather than a ``PiiReplacerConfig.enabled`` boolean field.
    This differs from ``EvaluationConfig.enabled`` but is intentional:
    ``PiiReplacerConfig`` has a non-trivial ``default_factory`` that
    must fire when the field is absent from a YAML config.  Adding an
    ``enabled`` boolean inside the sub-config would require a
    ``model_validator`` to reconcile the two signals and would not
    interact cleanly with Pydantic's ``exclude_unset`` semantics used
    in ``from_params``.

    Args:
        config: PII replacement configuration object or dict.
        enable: When ``False``, disables PII replacement entirely
            and clears any previously set config.
        **kwargs: Field-level overrides (e.g. ``classify``).

    Returns:
        This builder instance with PII replacement configured.

    Raises:
        ValueError: If ``config`` is not a ``PiiReplacerConfig``,
            dict, or ``None``.

    Example::

        builder = SafeSynthesizer().with_data_source(your_dataframe).with_replace_pii(config=custom_pii_config)
    """
    if not enable:
        self._replace_pii_config = None
        return self

    cfg = None
    match config:
        case PiiReplacerConfig() as m:
            cfg = m.model_copy(update=kwargs, deep=True)
        case dict() as d:
            cfg = PiiReplacerConfig.model_validate(d).model_copy(update=kwargs, deep=True)
        case None:
            cfg = PiiReplacerConfig.get_default_config().model_copy(update=kwargs, deep=True)
        case _:
            raise ValueError(f"Config must be a PiiReplacerConfig, dict, or None, got {config!r}")

    self._replace_pii_config = cfg
    return self
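
The "None means disabled" convention and the type dispatch in the listing above can be tried out with a small stand-in that needs neither Pydantic nor the real library. `PiiSection` and `SketchBuilder` are hypothetical names, the boolean `classify` field is purely illustrative, and the dispatch is written with `isinstance` rather than `match`, so this is a sketch of the pattern and not the library's code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PiiSection:
    """Hypothetical stand-in for PiiReplacerConfig; the field is illustrative."""
    classify: bool = True


class SketchBuilder:
    def __init__(self):
        # Mirrors the documented default: PII replacement starts out configured.
        self.replace_pii = PiiSection()

    def with_replace_pii(self, config=None, *, enable=True, **kwargs):
        if not enable:
            # None is the only "disabled" signal; config and kwargs are ignored.
            self.replace_pii = None
            return self
        if config is None:
            base = PiiSection()
        elif isinstance(config, PiiSection):
            base = config
        elif isinstance(config, dict):
            base = PiiSection(**config)
        else:
            raise ValueError(f"Config must be a PiiSection, dict, or None, got {config!r}")
        # Keyword overrides are applied on top of whichever base was chosen.
        self.replace_pii = replace(base, **kwargs)
        return self


disabled = SketchBuilder().with_replace_pii({"classify": False}, enable=False)
customized = SketchBuilder().with_replace_pii({"classify": True}, classify=False)
```

Here `disabled.replace_pii` ends up as `None` even though a dict was passed, because the `enable=False` branch returns before the config is inspected, which matches the early return in the listing above.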

with_evaluate(config=None, **kwargs)

Configure evaluation settings.

Parameters:

- config (EvaluationParameters | ParamDict | None, default: None): Evaluation configuration object or dict.
- **kwargs (default: {}): Field-level overrides (e.g. enabled).

Returns:

- Self: This builder instance with evaluation settings applied.

Source code in src/nemo_safe_synthesizer/sdk/config_builder.py
def with_evaluate(self, config: EvaluationParameters | ParamDict | None = None, **kwargs) -> Self:
    """Configure evaluation settings.

    Args:
        config: Evaluation configuration object or dict.
        **kwargs: Field-level overrides (e.g. ``enabled``).

    Returns:
        This builder instance with evaluation settings applied.
    """
    self._evaluation_config: EvaluationParameters | None = self._resolve_config(
        values=config, cls=EvaluationParameters, **kwargs
    )
    return self

resolve()

Finalize configuration and data source.

Assembles the individual _*_config sections into a single SafeSynthesizerParameters and converts the data source (URL string or DataFrame) into a DataFrame.

Returns:

- Self: This builder instance with all configuration sections finalized.

Source code in src/nemo_safe_synthesizer/sdk/config_builder.py
def resolve(self) -> Self:
    """Finalize configuration and data source.

    Assembles the individual ``_*_config`` sections into a single
    ``SafeSynthesizerParameters`` and converts the data source
    (URL string or DataFrame) into a ``DataFrame``.

    Returns:
        This builder instance with all configuration sections finalized.
    """
    self._resolve_nss_config()
    self._resolve_datasource()
    return self
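
The bodies of `_resolve_nss_config` and `_resolve_datasource` are not shown on this page. As a rough illustration of the assembly step only, the sketch below collects per-section attributes into one combined object. Every name in it (`CombinedParameters`, `_SECTIONS`, `params`, and the section classes) is hypothetical, and data-source loading is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class TrainSection:
    learning_rate: float = 1e-4


@dataclass
class GenerateSection:
    num_records: int = 100


@dataclass
class CombinedParameters:
    """Hypothetical stand-in for SafeSynthesizerParameters."""
    training: TrainSection
    generation: GenerateSection


class SketchBuilder:
    # Private attribute name -> section name on the combined object (assumed mapping).
    _SECTIONS = {"_training_config": "training", "_generation_config": "generation"}

    def __init__(self):
        self._training_config = TrainSection()
        self._generation_config = GenerateSection()
        self.params = None  # populated by resolve()

    def with_generate(self, **kwargs):
        self._generation_config = GenerateSection(**kwargs)
        return self

    def resolve(self):
        # Gather each accumulated section under its public section name.
        sections = {name: getattr(self, attr) for attr, name in self._SECTIONS.items()}
        self.params = CombinedParameters(**sections)
        return self


builder = SketchBuilder().with_generate(num_records=500).resolve()
```

After `resolve()`, `builder.params` holds the overridden generation section alongside the untouched training defaults. Because `resolve()` returns the builder, it can sit at the end of a `with_*` chain.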