library_builder
Executable pipeline for Safe Synthesizer.
Extends ConfigBuilder with the SafeSynthesizer class, which
adds artifact management (via Workdir) and stepwise pipeline
execution: process_data -> train -> generate -> evaluate.
Classes:
| Name | Description |
|---|---|
| SafeSynthesizer | Fluent builder and runner for Safe Synthesizer workflows. |
Functions:
| Name | Description |
|---|---|
| get_training_backend_class | Select the training backend class based on configuration. |
SafeSynthesizer(config=None, workdir=None, save_path=None)
Bases: ConfigBuilder
Fluent builder and runner for Safe Synthesizer workflows.
Extends ConfigBuilder with artifact management and stepwise
pipeline execution. Run all at once via run(), or step by
step:

    builder = SafeSynthesizer().with_data_source(df)
    builder.process_data().train().generate().evaluate()
    builder.save_results()
    results = builder.results
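
Stepwise chaining works because each stage returns the builder itself. The sketch below is a toy stand-in, not the library's implementation: `ToyPipeline`, its `completed` list, and the `_require` guard are hypothetical names used only to illustrate how chaining and a "process_data() before train()" ordering check might look.

```python
class ToyPipeline:
    """Hypothetical stand-in that mimics the chaining style of SafeSynthesizer."""

    def __init__(self):
        self.completed = []   # names of the stages that have run, in order
        self.results = None   # populated by evaluate()

    def _require(self, stage):
        # Assumed guard: refuse to run a stage before its prerequisite.
        if stage not in self.completed:
            raise RuntimeError(f"{stage}() must be called first")

    def process_data(self):
        self.completed.append("process_data")
        return self  # returning self is what enables chaining

    def train(self):
        self._require("process_data")
        self.completed.append("train")
        return self

    def generate(self):
        self._require("train")
        self.completed.append("generate")
        return self

    def evaluate(self):
        self._require("generate")
        self.completed.append("evaluate")
        self.results = {"stages": list(self.completed)}
        return self


pipeline = ToyPipeline().process_data().train().generate().evaluate()
print(pipeline.results["stages"])
```

Calling `ToyPipeline().train()` directly raises `RuntimeError` in this sketch; the exact conditions under which the real class raises are those listed in its own Raises sections.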
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | SafeSynthesizerParameters \| None | Optional pre-built parameters that seed every config section. | None |
| workdir | Workdir \| None | Explicit artifact directory layout. When … | None |
| save_path | Path \| str \| None | Root directory for artifacts when … | None |
Example:

    builder = (
        SafeSynthesizer()
        .with_data_source(df)
        .with_replace_pii()
        .with_train(learning_rate=0.0001)
        .with_generate(num_records=10000)
    )
    builder.run()
    results = builder.results
Methods:
| Name | Description |
|---|---|
| load_from_save_path | Load the Safe Synthesizer configuration from the save path. |
| process_data | Perform train/test split, auto-config resolution, and optional PII replacement. |
| train | Fine-tune the base model on the processed training data. |
| generate | Generate synthetic data using the trained model. |
| evaluate | Run quality and privacy evaluations and populate results. |
| run | Run the full pipeline and save results. |
| save_results | Save synthetic data, evaluation report, and metrics to the workdir. |
Attributes:
| Name | Type | Description |
|---|---|---|
| trainer | TrainingBackend | Training backend instance, populated after train(). |
| generator | GeneratorBackend | Generation backend instance, populated after generate(). |
| evaluator | Evaluator | Evaluator instance, populated after evaluate(). |
| results | SafeSynthesizerResults | Final pipeline results, populated after evaluate() or run(). |
Source code in src/nemo_safe_synthesizer/sdk/library_builder.py
trainer
instance-attribute
Training backend instance, populated after train().
generator
instance-attribute
Generation backend instance, populated after generate().
evaluator
instance-attribute
Evaluator instance, populated after evaluate().
results
instance-attribute
Final pipeline results, populated after evaluate() or run().
load_from_save_path()
Load the Safe Synthesizer configuration from the save path.
Loads the configuration from the source run directory's config file. When resuming from a trained model for generation, the source paths point to the parent workdir that contains the trained adapter.
Always prefers cached train/test splits from the training run to ensure evaluation metrics are consistent and privacy guarantees are maintained. Falls back to with_data_source() data only if cached files are missing.
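
The "prefer cached splits, fall back to supplied data" rule can be illustrated with a small standalone sketch. This is not the library's code: `load_split` and the `train.csv` file name are hypothetical, and plain CSV rows stand in for whatever format the real workdir uses.

```python
import csv
import tempfile
from pathlib import Path


def load_split(cache_file, fallback_rows):
    """Return (rows, source): cached rows when the file exists, else the fallback."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        with cache_file.open(newline="") as handle:
            return list(csv.DictReader(handle)), "cache"
    return fallback_rows, "fallback"


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "train.csv"  # hypothetical cache file name
    fallback = [{"id": "99"}]

    # No cache on disk yet, so the caller-supplied rows are used.
    rows_before, source_before = load_split(cached, fallback)

    cached.write_text("id\n1\n2\n")
    # Once the cached split exists it wins over the fallback.
    rows_after, source_after = load_split(cached, fallback)

print(source_before, source_after)
```

Reusing the cached split keeps the held-out rows identical across runs, which is the consistency property the paragraph above describes.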
Returns:
| Type | Description |
|---|---|
| SafeSynthesizer | Self for method chaining. |
process_data()
Perform train/test split, auto-config resolution, and optional PII replacement.
Splits the data via Holdout, runs AutoConfigResolver to
resolve "auto" parameters, applies PII replacement to the
training set when enabled, and persists the splits to the workdir.
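
As a rough illustration of two of these steps (the holdout split and PII replacement applied to the training split only), here is a standard-library sketch. It does not use Holdout or AutoConfigResolver; `holdout_split`, `redact_emails`, the 20% test fraction, and the email regex are all assumptions made for the example.

```python
import random
import re

# Deliberately simple email pattern; real PII detection is far broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split off a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def redact_emails(rows):
    """Replace email-like strings with a placeholder (a toy form of PII replacement)."""
    return [{k: EMAIL.sub("<email>", v) for k, v in row.items()} for row in rows]


rows = [{"note": f"contact user{i}@example.com"} for i in range(10)]
train, test = holdout_split(rows)
train = redact_emails(train)  # only the training split is transformed
print(len(train), len(test))
```

In this sketch the test rows keep their original values, mirroring the statement that replacement is applied to the training set.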
Returns:
| Type | Description |
|---|---|
| SafeSynthesizer | Self for method chaining. |
train()
Fine-tune the base model on the processed training data.
Creates the training backend (HuggingFace or Unsloth), loads
the base model, and runs fine-tuning. Requires
process_data() to have been called first.
Returns:
| Type | Description |
|---|---|
| SafeSynthesizer | Self for method chaining. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If called after … |
generate()
Generate synthetic data using the trained model.
Selects the appropriate backend (VllmBackend or
TimeseriesBackend), initializes it, and generates
synthetic records.
Returns:
| Type | Description |
|---|---|
| SafeSynthesizer | Self for method chaining. |
evaluate()
Run quality and privacy evaluations and populate results.
Returns:
| Type | Description |
|---|---|
| SafeSynthesizer | Self for method chaining. |
run(output_file=None)
Run the full pipeline and save results.
Executes process_data -> train -> generate ->
evaluate -> save_results. For step-by-step control,
call the individual methods instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_file | Path \| str \| None | Explicit output path for the synthetic data CSV. Falls back to … | None |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If called after … |
save_results(output_file=None)
Save synthetic data, evaluation report, and metrics to the workdir.
Writes synthetic_data.csv, evaluation_report.html (when
available), and evaluation_metrics.json into the generate
directory. Called automatically by run(). Call explicitly
after stepwise execution
(process_data().train().generate().evaluate()).
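
A minimal sketch of this kind of save step, using only the standard library, might look like the following. `save_outputs` is a hypothetical helper, not the library's method, and the fallback shown for `output_file` (the default CSV name inside the generate directory) is an assumption made for illustration. The HTML report is omitted.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_outputs(records, metrics, generate_dir, output_file=None):
    """Write the CSV and metrics JSON; return the CSV path that was used."""
    generate_dir = Path(generate_dir)
    generate_dir.mkdir(parents=True, exist_ok=True)

    # Assumed fallback: the default CSV name inside the generate directory.
    csv_path = Path(output_file) if output_file else generate_dir / "synthetic_data.csv"
    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)

    (generate_dir / "evaluation_metrics.json").write_text(json.dumps(metrics, indent=2))
    return csv_path


with tempfile.TemporaryDirectory() as tmp:
    out = save_outputs([{"a": 1, "b": 2}], {"score": 0.9}, Path(tmp) / "generate")
    written = sorted(p.name for p in out.parent.iterdir())

print(written)
```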
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_file | Path \| str \| None | Explicit output path for the CSV. Falls back to … | None |
get_training_backend_class(config)
Select the training backend class based on configuration.
Returns HuggingFaceBackend by default, or UnslothTrainer
when config.training.use_unsloth is True.
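
The selection rule can be sketched with stand-in types. The two backend classes, `TrainingConfig`, `Params`, and `pick_backend_class` below are hypothetical placeholders that only mirror the names in this section; they are not the library's definitions.

```python
from dataclasses import dataclass, field


class HuggingFaceBackend:  # stand-in, not the real class
    pass


class UnslothTrainer:  # stand-in, not the real class
    pass


@dataclass
class TrainingConfig:
    use_unsloth: bool = False


@dataclass
class Params:
    training: TrainingConfig = field(default_factory=TrainingConfig)


def pick_backend_class(config):
    """Return the class (not an instance) so the caller decides when to construct it."""
    return UnslothTrainer if config.training.use_unsloth else HuggingFaceBackend


default_cls = pick_backend_class(Params())
unsloth_cls = pick_backend_class(Params(TrainingConfig(use_unsloth=True)))
print(default_cls.__name__, unsloth_cls.__name__)
```

Returning a class rather than an instance matches the documented return type, type[TrainingBackend], and defers model loading until the caller instantiates the backend.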
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | SafeSynthesizerParameters | Resolved pipeline parameters. | required |
Returns:
| Type | Description |
|---|---|
| type[TrainingBackend] | The training backend class to instantiate. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the backend identifier is unrecognized. |