external_results

Public result models returned by the Safe Synthesizer pipeline.

Classes:

Name                    Description
SafeSynthesizerTiming   Wall-clock durations for each pipeline stage.
SafeSynthesizerSummary  Aggregated quality, privacy, and record-count metrics for a pipeline run.

SafeSynthesizerTiming pydantic-model

Bases: NSSBaseModel

Wall-clock durations for each pipeline stage.

Fields:

total_time_sec = None pydantic-field

Total end-to-end pipeline duration in seconds.

pii_replacer_time_sec = None pydantic-field

Time spent on PII replacement.

training_time_sec = None pydantic-field

Time spent on model training.

generation_time_sec = None pydantic-field

Time spent generating synthetic records.

evaluation_time_sec = None pydantic-field

Time spent evaluating synthetic data quality.

log_timing(logger)

Emit all timing fields as a structured table via logger.

Source code in src/nemo_safe_synthesizer/config/external_results.py
def log_timing(self, logger: logging.Logger) -> None:
    """Emit all timing fields as a structured table via *logger*."""
    logger.info(
        "Safe Synthesizer timing",
        extra={"ctx": {"render_table": True, "tabular_data": self.model_dump(), "title": "Pipeline Timing"}},
    )
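
The call above only attaches a `ctx` dictionary to the log record; rendering the table is left to whatever handler is installed. The pipeline's own handler is not shown in this page, so the following is a minimal sketch using the standard `logging` module only. `TableHandler` and the literal `timing_dump` dictionary are hypothetical names standing in for the real handler and for `self.model_dump()`.

```python
import logging


class TableHandler(logging.Handler):
    """Hypothetical handler that renders ``record.ctx`` as a two-column table."""

    def __init__(self) -> None:
        super().__init__()
        self.rendered: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        # Keys passed through ``extra=`` become attributes on the LogRecord.
        ctx = getattr(record, "ctx", None)
        if not ctx or not ctx.get("render_table"):
            self.rendered.append(record.getMessage())
            return
        rows = ctx["tabular_data"]
        width = max((len(name) for name in rows), default=0)
        lines = [ctx.get("title", record.getMessage())]
        lines += [f"{name:<{width}}  {value}" for name, value in rows.items()]
        self.rendered.append("\n".join(lines))


logger = logging.getLogger("timing_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = TableHandler()
logger.addHandler(handler)

# Same call shape as log_timing; the dict stands in for self.model_dump().
timing_dump = {"total_time_sec": 12.5, "training_time_sec": 9.0, "evaluation_time_sec": None}
logger.info(
    "Safe Synthesizer timing",
    extra={"ctx": {"render_table": True, "tabular_data": timing_dump, "title": "Pipeline Timing"}},
)
print(handler.rendered[0])
```

Unset fields show up as `None` rows in such a table, because `model_dump()` includes them unless told otherwise.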

log_wandb(run=None)

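Log timing metrics to an active Weights & Biases run. An unset (or zero) evaluation_time_sec is logged as 0; the other four fields are logged as-is, including None.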

Parameters:

Name  Type        Description                         Default
run   Run | None  W&B run instance. No-op when None.  None

Source code in src/nemo_safe_synthesizer/config/external_results.py
def log_wandb(self, run: wandb.Run | None = None) -> None:
    """Log timing metrics to an active Weights & Biases run.

    Args:
        run: W&B run instance. No-op when ``None``.
    """
    if run is not None:
        run.log(
            {
                "total_time_sec": self.total_time_sec,
                "pii_replacer_time_sec": self.pii_replacer_time_sec,
                "training_time_sec": self.training_time_sec,
                "generation_time_sec": self.generation_time_sec,
                "evaluation_time_sec": self.evaluation_time_sec if self.evaluation_time_sec else 0,
            }
        )
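
To see what payload this method produces without a W&B account, the sketch below reproduces the same branching on a stand-in dataclass and records the call with a stub. `TimingStandIn` and `RecordingRun` are hypothetical names, and the `Optional[float]` field types are an assumption; the real model is a pydantic class and the real run object is a `wandb.Run`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TimingStandIn:
    """Hypothetical stand-in with the same five fields as SafeSynthesizerTiming."""

    total_time_sec: Optional[float] = None
    pii_replacer_time_sec: Optional[float] = None
    training_time_sec: Optional[float] = None
    generation_time_sec: Optional[float] = None
    evaluation_time_sec: Optional[float] = None

    def log_wandb(self, run: Any = None) -> None:
        # Same branching as the method shown above.
        if run is not None:
            run.log(
                {
                    "total_time_sec": self.total_time_sec,
                    "pii_replacer_time_sec": self.pii_replacer_time_sec,
                    "training_time_sec": self.training_time_sec,
                    "generation_time_sec": self.generation_time_sec,
                    "evaluation_time_sec": self.evaluation_time_sec if self.evaluation_time_sec else 0,
                }
            )


class RecordingRun:
    """Hypothetical stub exposing only ``log``, in place of a real wandb.Run."""

    def __init__(self) -> None:
        self.logged: list[dict] = []

    def log(self, data: dict) -> None:
        self.logged.append(data)


timing = TimingStandIn(total_time_sec=42.0, training_time_sec=30.0)
run = RecordingRun()
timing.log_wandb(run)   # records one payload
timing.log_wandb(None)  # no-op, nothing is recorded
payload = run.logged[0]
print(payload["evaluation_time_sec"])    # 0: the unset field falls back to 0
print(payload["pii_replacer_time_sec"])  # None: passed through unchanged
```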

SafeSynthesizerSummary pydantic-model

Bases: NSSBaseModel

Aggregated quality, privacy, and record-count metrics for a pipeline run.

Token-field invariants (when all referenced fields are populated):

  • num_non_record_tokens == num_completion_tokens - num_valid_record_tokens - num_invalid_record_tokens (clamped to 0 if slight tokenizer-boundary drift makes the subtraction negative).
  • tokens_per_prompt == num_completion_tokens / num_prompts.
  • valid_record_token_fraction == num_valid_record_tokens / num_completion_tokens.

Fields:

synthetic_data_quality_score = None pydantic-field

Synthetic data quality score (SQS): a weighted composite of the five sub-scores below. Higher is better (0-10 scale).

column_correlation_stability_score = None pydantic-field

How closely pairwise column correlations in synthetic data match the original for numeric and categorical columns.

deep_structure_stability_score = None pydantic-field

PCA-based comparison of multivariate structure between real and synthetic data for numeric and categorical columns.

column_distribution_stability_score = None pydantic-field

Per-column Jensen-Shannon distance between training and synthetic distributions averaged across all numeric and categorical columns.
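
For readers unfamiliar with the metric, here is a minimal sketch of the Jensen-Shannon distance between two discrete distributions and its average over columns. It uses the textbook definition with base-2 logarithms (distance in [0, 1]); the binning the library applies and the way it maps the averaged distance onto a score are not described in this page, and both helper names are hypothetical.

```python
import math
from typing import Sequence


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance (base 2) between two normalized histograms."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, divergence))


def mean_js_distance(columns: Sequence[tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average the per-column distance over (training, synthetic) histogram pairs."""
    return sum(js_distance(p, q) for p, q in columns) / len(columns)


identical = js_distance([0.5, 0.5], [0.5, 0.5])  # 0.0
disjoint = js_distance([1.0, 0.0], [0.0, 1.0])   # 1.0
average = mean_js_distance([([0.5, 0.5], [0.5, 0.5]), ([1.0, 0.0], [0.0, 1.0])])  # 0.5
```

A distance of 0 means the two histograms match exactly, so a lower averaged distance corresponds to closer column distributions.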

text_semantic_similarity_score = None pydantic-field

Embedding-based semantic closeness between real and synthetic free-text columns.

text_structure_similarity_score = None pydantic-field

Jensen-Shannon divergence over sentence count, words-per-sentence, and characters-per-word distributions between real and synthetic free-text columns.

data_privacy_score = None pydantic-field

Composite of the membership inference attack (MIA) and attribute inference attack (AIA) protection scores.

membership_inference_protection_score = None pydantic-field

Resistance to attacks that try to determine whether a record was in the training set.

attribute_inference_protection_score = None pydantic-field

Resistance to attacks that try to infer sensitive attributes from quasi-identifiers.

num_valid_records = None pydantic-field

Count of synthetic records that passed schema and format validation.

num_invalid_records = None pydantic-field

Count of synthetic records filtered out during validation.

num_prompts = None pydantic-field

Total LLM generation prompts issued.

valid_record_fraction = None pydantic-field

Ratio of valid records: num_valid_records / (num_valid_records + num_invalid_records).

num_completion_tokens = None pydantic-field

Total tokens generated by the LLM across all completions.

num_valid_record_tokens = None pydantic-field

Tokens in records that passed validation.

num_invalid_record_tokens = None pydantic-field

Tokens in records that failed validation.

num_non_record_tokens = None pydantic-field

Tokens not part of any recognized record.

valid_record_token_fraction = None pydantic-field

Fraction of total completion tokens in valid records.

tokens_per_prompt = None pydantic-field

Average completion tokens per prompt: num_completion_tokens / num_prompts.

tokens_per_second = None pydantic-field

Total completion tokens divided by generation wall-clock time.

valid_tokens_per_second = None pydantic-field

Valid record tokens divided by generation wall-clock time.
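
The two throughput fields above share one formula. The sketch below assumes that "generation wall-clock time" corresponds to `timing.generation_time_sec`, which this page does not confirm, and that a missing or zero duration yields `None`; `per_second` is a hypothetical helper.

```python
from typing import Optional


def per_second(token_count: Optional[int], generation_time_sec: Optional[float]) -> Optional[float]:
    """Hypothetical helper: tokens divided by generation wall-clock seconds."""
    # Assumption: report None when the count or the duration is unavailable.
    if token_count is None or not generation_time_sec:
        return None
    return token_count / generation_time_sec


tokens_per_second = per_second(1000, 20.0)       # 50.0
valid_tokens_per_second = per_second(700, 20.0)  # 35.0
skipped = per_second(1000, None)                 # None: generation was not timed
```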

tokenization_overhead_sec = None pydantic-field

Wall-clock seconds spent on tokenization for statistics tracking.

timing pydantic-field

Per-stage wall-clock durations.

log_summary(logger)

Emit all summary metrics as a structured table via logger.

Source code in src/nemo_safe_synthesizer/config/external_results.py
def log_summary(self, logger: logging.Logger) -> None:
    """Emit all summary metrics as a structured table via ``logger``."""
    logger.info(
        "Safe Synthesizer Summary",
        extra={"ctx": {"render_table": True, "tabular_data": self.model_dump(), "title": "Quality Metrics"}},
    )

log_wandb()

Log all summary and timing metrics to the active W&B run, plus a derived eval/success flag. No-op when wandb.run is None.

Source code in src/nemo_safe_synthesizer/config/external_results.py
def log_wandb(self) -> None:
    """Log all summary and timing metrics to the active W&B run."""
    import wandb

    if wandb.run is not None:
        metrics: dict[str, float | int | None] = {
            "gen/generation_time_sec": self.timing.generation_time_sec,
            "gen/evaluation_time_sec": self.timing.evaluation_time_sec,
            "eval/total_time_sec": self.timing.total_time_sec,
            "train/pii_replacer_time_sec": self.timing.pii_replacer_time_sec,
            "train/training_time_sec": self.timing.training_time_sec,
            "gen/num_valid_records": self.num_valid_records,
            "gen/num_invalid_records": self.num_invalid_records,
            "gen/num_prompts": self.num_prompts,
            "gen/valid_record_fraction": self.valid_record_fraction,
            "gen/num_completion_tokens": self.num_completion_tokens,
            "gen/num_valid_record_tokens": self.num_valid_record_tokens,
            "gen/num_invalid_record_tokens": self.num_invalid_record_tokens,
            "gen/num_non_record_tokens": self.num_non_record_tokens,
            "gen/valid_record_token_fraction": self.valid_record_token_fraction,
            "gen/tokens_per_prompt": self.tokens_per_prompt,
            "gen/tokens_per_second": self.tokens_per_second,
            "gen/valid_tokens_per_second": self.valid_tokens_per_second,
            "gen/tokenization_overhead_sec": self.tokenization_overhead_sec,
            "eval/data_privacy_score": self.data_privacy_score,
            "eval/membership_inference_protection_score": self.membership_inference_protection_score,
            "eval/attribute_inference_protection_score": self.attribute_inference_protection_score,
            "eval/synthetic_data_quality_score": self.synthetic_data_quality_score,
            "eval/column_correlation_stability_score": self.column_correlation_stability_score,
            "eval/deep_structure_stability_score": self.deep_structure_stability_score,
            "eval/column_distribution_stability_score": self.column_distribution_stability_score,
            "eval/text_semantic_similarity_score": self.text_semantic_similarity_score,
            "eval/text_structure_similarity_score": self.text_structure_similarity_score,
            "eval/success": 1
            if self.data_privacy_score is not None
            and self.synthetic_data_quality_score is not None
            and self.synthetic_data_quality_score > 0
            else 0,
        }
        # Log ``None`` values too (rather than filtering them out) so that
        # dashboard comparisons clearly show when a stage was skipped or a
        # metric was not collected for a given run, instead of the metric
        # silently carrying its last-logged value on the W&B chart.
        wandb.log(metrics)
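
The `eval/success` entry is the only derived value in the payload. Its condition can be isolated as below for testing; `success_flag` is a hypothetical helper that copies the expression from the method above and is not part of the library.

```python
from typing import Optional


def success_flag(
    data_privacy_score: Optional[float],
    synthetic_data_quality_score: Optional[float],
) -> int:
    """Return 1 only when both scores exist and the quality score is positive."""
    if (
        data_privacy_score is not None
        and synthetic_data_quality_score is not None
        and synthetic_data_quality_score > 0
    ):
        return 1
    return 0


print(success_flag(8.1, 7.4))   # 1
print(success_flag(None, 7.4))  # 0: privacy evaluation missing
print(success_flag(8.1, 0))     # 0: quality score is not positive
print(success_flag(8.1, None))  # 0: quality evaluation missing
```

Note that a privacy score of 0 still counts as success here, since only the quality score is compared against 0.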