library_builder
Executable pipeline for Safe Synthesizer.
Classes:
| Name | Description |
|---|---|
| SafeSynthesizer | Fluent builder and runner for Safe Synthesizer workflows. |
SafeSynthesizer(config=None, workdir=None, save_path=None, emit_telemetry=None, deployment_type=None)
Bases: ConfigBuilder
Fluent builder and runner for Safe Synthesizer workflows.
Extends ConfigBuilder with artifact management and stepwise
pipeline execution. Run all at once via run(), or step by
step::

    builder = SafeSynthesizer().with_data_source(df)
    builder.process_data().train().generate().evaluate()
    builder.save_results()
    results = builder.results

train() uses HuggingFaceBackend. generate() chooses
TimeseriesBackend when config.time_series.is_timeseries is true and
VllmBackend otherwise. Stepwise callers must call save_results()
themselves after evaluate(); run() does this automatically.
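
The difference between stepwise execution and run() can be illustrated with a small stand-in class. This is only a sketch: ToyPipeline and its attributes are hypothetical and are not part of nemo_safe_synthesizer. It shows how returning self enables chaining, and why stepwise callers need an explicit save step.

```python
class ToyPipeline:
    """Hypothetical stand-in for a fluent pipeline; not the real SafeSynthesizer."""

    def __init__(self):
        self.steps = []     # names of the steps executed so far, in order
        self.saved = False  # set to True once save_results() has run

    def _record(self, name):
        self.steps.append(name)
        return self  # returning self is what makes method chaining possible

    def process_data(self):
        return self._record("process_data")

    def train(self):
        return self._record("train")

    def generate(self):
        return self._record("generate")

    def evaluate(self):
        return self._record("evaluate")

    def save_results(self):
        self.saved = True
        return self

    def run(self):
        # Full pipeline: every step in order, followed by the save step.
        self.process_data().train().generate().evaluate()
        self.save_results()


# Stepwise: nothing is saved until save_results() is called explicitly.
stepwise = ToyPipeline().process_data().train().generate().evaluate()

# All at once: run() performs the same steps and saves afterwards.
full = ToyPipeline()
full.run()
```

In this toy, both objects record the same four steps, but only the one driven through run() ends up saved.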
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | SafeSynthesizerParameters \| None | Optional pre-built parameters that seed every config section. | None |
| workdir | Workdir \| None | Explicit artifact directory layout. When … | None |
| save_path | Path \| str \| None | Root directory for artifacts when … | None |
Example::

    builder = (
        SafeSynthesizer()
        .with_data_source(df)
        .with_replace_pii()
        .with_train(learning_rate=0.0001)
        .with_generate(num_records=10000)
    )
    builder.run()
    results = builder.results

Methods:
| Name | Description |
|---|---|
| load_from_save_path | Load the Safe Synthesizer configuration from the save path. |
| process_data | Perform train/test split, auto-config resolution, and optional PII replacement. |
| train | Fine-tune the base model on the processed training data. |
| generate | Generate synthetic data using the trained model. |
| evaluate | Run quality and privacy evaluations and populate results. |
| run | Run the full pipeline and save results. |
| save_results | Save synthetic data, evaluation report, and metrics to the workdir. |
Attributes:
| Name | Type | Description |
|---|---|---|
| trainer | TrainingBackend | Training backend instance, populated after train(). |
| generator | GeneratorBackend | Generation backend instance, populated after generate(). |
| evaluator | Evaluator | Evaluator instance, populated after evaluate(). |
| results | SafeSynthesizerResults | Final pipeline results, populated after evaluate() or run(). |
Source code in src/nemo_safe_synthesizer/sdk/library_builder.py
trainer
instance-attribute
Training backend instance, populated after train().
generator
instance-attribute
Generation backend instance, populated after generate().
evaluator
instance-attribute
Evaluator instance, populated after evaluate().
results
instance-attribute
Final pipeline results, populated after evaluate() or run().
load_from_save_path()
Load the Safe Synthesizer configuration from the save path.
Loads the configuration from the source run directory's config file. When resuming from a trained model for generation, the source paths point to the parent workdir that contains the trained adapter.
Always prefers cached train/test splits from the training run to ensure evaluation metrics are consistent and privacy guarantees are maintained. Falls back to with_data_source() data only if cached files are missing.
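
The "prefer cached splits, fall back to the data source" rule described above might look roughly like the sketch below. The file names in CACHED_SPLITS, the function choose_split_source, and the error raised when nothing is available are all assumptions made for illustration; this page does not state the real file names or failure behaviour.

```python
import tempfile
from pathlib import Path

# Hypothetical names: the real cached split file names are not given here.
CACHED_SPLITS = ("train.csv", "test.csv")


def choose_split_source(run_dir: Path, has_data_source: bool) -> str:
    """Return "cached" when every cached split exists, else "data_source"."""
    if all((run_dir / name).is_file() for name in CACHED_SPLITS):
        # Reusing the training-run splits keeps evaluation consistent.
        return "cached"
    if has_data_source:
        # Fallback: use the data supplied through with_data_source().
        return "data_source"
    # Assumption: what happens in this case is not documented on this page.
    raise FileNotFoundError(f"no cached splits in {run_dir} and no data source")


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    before = choose_split_source(run_dir, has_data_source=True)
    for name in CACHED_SPLITS:
        (run_dir / name).write_text("a,b\n1,2\n")
    after = choose_split_source(run_dir, has_data_source=True)
```

Here `before` reflects the fallback path and `after` reflects the preferred cached path once both split files exist.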
Returns:
| Type | Description |
|---|---|
| SafeSynthesizer | Self for method chaining. |
process_data(check_only=False)
Perform train/test split, auto-config resolution, and optional PII replacement.
Validates configured grouping/ordering columns against the input
dataset, splits the data via Holdout, runs
AutoConfigResolver to resolve "auto" parameters, applies
PII replacement to the training set when enabled, and persists the
splits to the workdir.
When check_only is True (the --validate path), PII
replacement is intentionally skipped and CSV writes are elided; a
resolved config YAML is written instead. Preflight therefore sees
the pre-replacement training split, which is a known gap: PII
replacement can change token lengths, so a clean --validate
does not guarantee a full run will pass token-budget checks. See
the "--validate is best-effort" callout in
docs/user-guide/running.md.
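
To see why a clean --validate run does not guarantee the full run passes, consider a toy example in which replacing a short email address with a longer one pushes a record over a size budget. Everything here is an assumption for illustration: the regex, the approx_tokens proxy (a crude word-and-punctuation count, not a real tokenizer), the BUDGET value, and the replacement value are hypothetical and unrelated to the library's actual PII replacement or preflight checks.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

BUDGET = 8  # hypothetical per-record budget


def approx_tokens(text: str) -> int:
    """Count word runs and punctuation marks as a rough token proxy."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def fits_budget(text: str) -> bool:
    return approx_tokens(text) <= BUDGET


record = "contact: al@x.io"

# What a validate-only pass would see: the record before replacement.
passes_preflight = fits_budget(record)

# What a full run would see: the record after PII replacement.
replaced = EMAIL.sub("alexandra.montgomery@example-corp.com", record)
passes_full_run = fits_budget(replaced)
```

In this sketch the original record fits the budget while the replaced record does not, which is the gap the paragraph above describes.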
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| check_only | bool | If … | False |
Returns:
| Type | Description |
|---|---|
SafeSynthesizer
|
Self for method chaining. |
train()
Fine-tune the base model on the processed training data.
Creates the HuggingFace training backend, loads the base model,
and runs fine-tuning. Requires process_data() to have been
called first.
Returns:
| Type | Description |
|---|---|
| SafeSynthesizer | Self for method chaining. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If called after … |
generate()
Generate synthetic data using the trained model.
Selects the appropriate backend (VllmBackend or
TimeseriesBackend), initializes it, and generates
synthetic records.
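
The backend choice can be sketched as a single conditional on config.time_series.is_timeseries. The dataclasses and the choose_generation_backend function below are hypothetical stand-ins; only the attribute path and the two backend names come from this page, and the function returns names rather than real backend objects.

```python
from dataclasses import dataclass, field


@dataclass
class TimeSeriesConfig:
    """Hypothetical stand-in for the time-series config section."""
    is_timeseries: bool = False


@dataclass
class Config:
    """Hypothetical stand-in exposing only the field used for selection."""
    time_series: TimeSeriesConfig = field(default_factory=TimeSeriesConfig)


def choose_generation_backend(config: Config) -> str:
    """Return the name of the backend that generate() is described as using."""
    if config.time_series.is_timeseries:
        return "TimeseriesBackend"
    return "VllmBackend"


tabular_choice = choose_generation_backend(Config())
timeseries_choice = choose_generation_backend(
    Config(time_series=TimeSeriesConfig(is_timeseries=True))
)
```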
Returns:
| Type | Description |
|---|---|
| SafeSynthesizer | Self for method chaining. |
evaluate()
Run quality and privacy evaluations and populate results.
Returns:
| Type | Description |
|---|---|
| SafeSynthesizer | Self for method chaining. |
run(output_file=None)
Run the full pipeline and save results.
Executes process_data -> train -> generate ->
evaluate -> save_results. For step-by-step control,
call the individual methods instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_file | Path \| str \| None | Explicit output path for the synthetic data CSV. Falls back to … | None |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If called after … |
save_results(output_file=None)
Save synthetic data, evaluation report, and metrics to the workdir.
Writes synthetic_data.csv, evaluation_report.html (when
available), and evaluation_metrics.json into the generate
directory. Called automatically by run(). Call explicitly
after stepwise execution
(process_data().train().generate().evaluate()).
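
A rough sketch of the file layout described above, using only the standard library. The save_outputs function, its parameters, and the default CSV location are assumptions for illustration; in particular, the real fallback used when output_file is None is not fully stated on this page, so the sketch simply assumes synthetic_data.csv inside the generate directory.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_outputs(generate_dir, rows, metrics, report_html=None, output_file=None):
    """Write the CSV, the metrics JSON, and (when available) the HTML report."""
    generate_dir = Path(generate_dir)
    generate_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: without an explicit output_file, the CSV goes to generate_dir.
    csv_path = Path(output_file) if output_file else generate_dir / "synthetic_data.csv"
    with csv_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    metrics_path = generate_dir / "evaluation_metrics.json"
    metrics_path.write_text(json.dumps(metrics, indent=2))

    written = [csv_path, metrics_path]
    if report_html is not None:
        # The report is written only when one is available.
        report_path = generate_dir / "evaluation_report.html"
        report_path.write_text(report_html)
        written.append(report_path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    written = save_outputs(Path(tmp) / "generate", [{"a": 1, "b": 2}], {"score": 0.9})
    names = sorted(path.name for path in written)
    all_exist = all(path.is_file() for path in written)
```

Because no report is passed in this usage, only the CSV and the metrics JSON are written.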
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_file | Path \| str \| None | Explicit output path for the CSV. Falls back to … | None |