external_results
Public result models returned by the Safe Synthesizer pipeline.
Classes:
| Name | Description |
|---|---|
| SafeSynthesizerTiming | Wall-clock durations for each pipeline stage. |
| SafeSynthesizerSummary | Aggregated quality, privacy, and record-count metrics for a pipeline run. |
SafeSynthesizerTiming
pydantic-model
Bases: NSSBaseModel
Wall-clock durations for each pipeline stage.
Fields:
- total_time_sec (float | None)
- pii_replacer_time_sec (float | None)
- training_time_sec (float | None)
- generation_time_sec (float | None)
- evaluation_time_sec (float | None)
total_time_sec = None
pydantic-field
Total end-to-end pipeline duration in seconds.
pii_replacer_time_sec = None
pydantic-field
Time spent on PII replacement.
training_time_sec = None
pydantic-field
Time spent on model training.
generation_time_sec = None
pydantic-field
Time spent generating synthetic records.
evaluation_time_sec = None
pydantic-field
Time spent evaluating synthetic data quality.
log_timing(logger)
Emit all timing fields as a structured table via logger.
Source code in src/nemo_safe_synthesizer/config/external_results.py
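The table-style logging described for log_timing can be sketched with the standard library alone. In the sketch below, `TimingSketch`, `format_timing_table`, and `log_timing_sketch` are hypothetical stand-ins that only mirror the documented field names; the real method's layout, number formatting, and handling of unset fields may differ.

```python
import logging
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TimingSketch:
    """Hypothetical stand-in mirroring the documented SafeSynthesizerTiming fields."""

    total_time_sec: Optional[float] = None
    pii_replacer_time_sec: Optional[float] = None
    training_time_sec: Optional[float] = None
    generation_time_sec: Optional[float] = None
    evaluation_time_sec: Optional[float] = None


def format_timing_table(timing: TimingSketch) -> str:
    """Render one 'name | value' row per field; unset fields show as 'n/a'."""
    rows = []
    for f in fields(timing):
        value = getattr(timing, f.name)
        shown = "n/a" if value is None else f"{value:.2f}"
        rows.append(f"{f.name:<24} | {shown:>10}")
    return "\n".join(rows)


def log_timing_sketch(timing: TimingSketch, logger: logging.Logger) -> None:
    # Illustrative only: the real log_timing may format its output differently.
    logger.info("Pipeline timing (seconds)\n%s", format_timing_table(timing))


timing = TimingSketch(total_time_sec=120.5, training_time_sec=90.25)
table = format_timing_table(timing)
log_timing_sketch(timing, logging.getLogger("timing_sketch"))
```

Showing unset stages as `n/a` rather than dropping them keeps the table shape stable across runs, which is a design choice of this sketch and not a documented behavior.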
log_wandb(run=None)
Log timing metrics to an active Weights & Biases run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| run | Optional[Run] | W&B run instance. No-op when | None |
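A minimal sketch of the optional-run pattern follows. It assumes the no-op condition is `run` being `None`, which the parameter description suggests but does not state in full. `FakeRun`, `log_timing_to_run`, and the `timing/` key prefix are hypothetical; the only behavior borrowed from W&B is that a run accepts a dictionary through `log`.

```python
from typing import Any, Dict, List, Optional


class FakeRun:
    """Hypothetical stand-in for a W&B run; it only mimics a log(dict) method."""

    def __init__(self) -> None:
        self.logged: List[Dict[str, Any]] = []

    def log(self, data: Dict[str, Any]) -> None:
        self.logged.append(data)


def log_timing_to_run(
    timing: Dict[str, Optional[float]], run: Optional[FakeRun] = None
) -> bool:
    """Send the set timing values to `run`; do nothing and return False without a run."""
    if run is None:
        return False
    # The "timing/" prefix is an assumption made for this sketch.
    metrics = {f"timing/{name}": sec for name, sec in timing.items() if sec is not None}
    run.log(metrics)
    return True


timing = {"total_time_sec": 120.5, "training_time_sec": None}
run = FakeRun()
skipped = log_timing_to_run(timing)    # no run supplied: nothing is logged
sent = log_timing_to_run(timing, run)  # run supplied: set values are logged
```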
SafeSynthesizerSummary
pydantic-model
Bases: NSSBaseModel
Aggregated quality, privacy, and record-count metrics for a pipeline run.
Fields:
- synthetic_data_quality_score (float | None)
- column_correlation_stability_score (float | None)
- deep_structure_stability_score (float | None)
- column_distribution_stability_score (float | None)
- text_semantic_similarity_score (float | None)
- text_structure_similarity_score (float | None)
- data_privacy_score (float | None)
- membership_inference_protection_score (float | None)
- attribute_inference_protection_score (float | None)
- num_valid_records (int | None)
- num_invalid_records (int | None)
- num_prompts (int | None)
- valid_record_fraction (float | None)
- timing (SafeSynthesizerTiming)
synthetic_data_quality_score = None
pydantic-field
Weighted composite (SQS) of the five sub-scores below, on a 0-10 scale. Higher is better.
column_correlation_stability_score = None
pydantic-field
How closely pairwise column correlations in synthetic data match the original for numeric and categorical columns.
deep_structure_stability_score = None
pydantic-field
PCA-based comparison of multivariate structure between real and synthetic data for numeric and categorical columns.
column_distribution_stability_score = None
pydantic-field
Per-column Jensen-Shannon distance between training and synthetic distributions, averaged across all numeric and categorical columns.
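For readers unfamiliar with the metric, the Jensen-Shannon distance between two binned distributions can be sketched as below. This is the textbook definition with base-2 logarithms, not the library's implementation: how columns are binned, which log base is used, and how the distance is converted into a score are not stated here. `js_distance` is a hypothetical helper name.

```python
import math
from typing import List, Sequence


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance (base-2 logs) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    p_sum, q_sum = sum(p), sum(q)
    if p_sum <= 0 or q_sum <= 0:
        raise ValueError("each distribution needs positive total mass")
    # Normalize so that raw counts are accepted as well as probabilities.
    pn: List[float] = [x / p_sum for x in p]
    qn: List[float] = [x / q_sum for x in q]
    mid = [(a + b) / 2 for a, b in zip(pn, qn)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Bins where a is zero contribute nothing to the divergence.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(pn, mid) + 0.5 * kl(qn, mid)
    # The distance is the square root of the divergence.
    return math.sqrt(max(divergence, 0.0))


identical = js_distance([0.5, 0.5], [0.5, 0.5])
disjoint = js_distance([1, 0], [0, 1])
```

With base-2 logarithms the distance is bounded between 0 (identical distributions) and 1 (no overlap).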
text_semantic_similarity_score = None
pydantic-field
Embedding-based semantic closeness between real and synthetic free-text columns.
text_structure_similarity_score = None
pydantic-field
Jensen-Shannon divergence over sentence count, words-per-sentence, and characters-per-word distributions between real and synthetic free-text columns.
data_privacy_score = None
pydantic-field
Composite of the membership inference attack (MIA) and attribute inference attack (AIA) protection scores.
membership_inference_protection_score = None
pydantic-field
Resistance to attacks that try to determine whether a record was in the training set.
attribute_inference_protection_score = None
pydantic-field
Resistance to attacks that try to infer sensitive attributes from quasi-identifiers.
num_valid_records = None
pydantic-field
Count of synthetic records that passed schema and format validation.
num_invalid_records = None
pydantic-field
Count of synthetic records filtered out during validation.
num_prompts = None
pydantic-field
Total LLM generation prompts issued.
valid_record_fraction = None
pydantic-field
Ratio of valid records: num_valid_records / (num_valid_records + num_invalid_records).
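The ratio above can be reproduced from the two counts. The sketch below is a hypothetical helper, not the library's code; returning `None` when a count is missing or when no records were generated is an assumption made here to avoid a division by zero.

```python
from typing import Optional


def valid_record_fraction(
    num_valid: Optional[int], num_invalid: Optional[int]
) -> Optional[float]:
    """Compute num_valid / (num_valid + num_invalid), or None if undefined."""
    if num_valid is None or num_invalid is None:
        return None
    total = num_valid + num_invalid
    if total == 0:
        # Assumption: an empty run has no meaningful fraction.
        return None
    return num_valid / total


fraction = valid_record_fraction(90, 10)
```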
timing
pydantic-field
Per-stage wall-clock durations.
log_summary(logger)
Emit all summary metrics as a structured table via logger.
log_wandb()
Log all summary and timing metrics to the active W&B run.
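Because the summary nests a timing object, logging both together implies some flattening into a single metrics dictionary. One possible shape is sketched below on plain dictionaries; `flatten_metrics` and the `timing/` prefix are hypothetical, and the actual key names sent to W&B are not documented here.

```python
from typing import Any, Dict


def flatten_metrics(summary: Dict[str, Any]) -> Dict[str, Any]:
    """Merge top-level metrics and nested timing values into one flat dict."""
    flat: Dict[str, Any] = {}
    for key, value in summary.items():
        if key == "timing" and isinstance(value, dict):
            for stage, seconds in value.items():
                if seconds is not None:
                    # Assumed prefix, used only to keep timing keys distinct.
                    flat[f"timing/{stage}"] = seconds
        elif value is not None:
            flat[key] = value
    return flat


metrics = flatten_metrics(
    {
        "synthetic_data_quality_score": 8.1,
        "data_privacy_score": None,
        "num_valid_records": 950,
        "timing": {"total_time_sec": 120.5, "training_time_sec": None},
    }
)
```

Unset values are dropped in this sketch so that a dashboard does not receive empty points; whether the real method does the same is not confirmed by this page.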