`external_results` ¶

Public result models returned by the Safe Synthesizer pipeline.

Classes:

Name	Description
`SafeSynthesizerTiming`	Wall-clock durations for each pipeline stage.
`SafeSynthesizerSummary`	Aggregated quality, privacy, and record-count metrics for a pipeline run.

`SafeSynthesizerTiming` `pydantic-model` ¶

Bases: NSSBaseModel

Wall-clock durations for each pipeline stage.

Fields:

total_time_sec (float | None)
pii_replacer_time_sec (float | None)
training_time_sec (float | None)
generation_time_sec (float | None)
evaluation_time_sec (float | None)

`total_time_sec = None` `pydantic-field` ¶

Total end-to-end pipeline duration in seconds.

`pii_replacer_time_sec = None` `pydantic-field` ¶

Time spent on PII replacement.

`training_time_sec = None` `pydantic-field` ¶

Time spent on model training.

`generation_time_sec = None` `pydantic-field` ¶

Time spent generating synthetic records.

`evaluation_time_sec = None` `pydantic-field` ¶

Time spent evaluating synthetic data quality.

`log_timing(logger)` ¶

Emit all timing fields as a structured table via `logger`.

Source code in src/nemo_safe_synthesizer/config/external_results.py

def log_timing(self, logger: logging.Logger) -> None:
    """Emit all timing fields as a structured table via *logger*."""
    logger.info(
        "Safe Synthesizer timing",
        extra={"ctx": {"render_table": True, "tabular_data": self.model_dump(), "title": "Pipeline Timing"}},
    )
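
The listing above passes its payload through the standard `extra` argument of `logging`, so the `ctx` dictionary becomes an attribute on the emitted `LogRecord`. The following is a minimal sketch of how a handler could read that payload. `CtxCaptureHandler` and the timing values are hypothetical, and a plain dict stands in for `self.model_dump()`; how the real pipeline renders the table is not shown here.

```python
import logging


class CtxCaptureHandler(logging.Handler):
    """Hypothetical handler that stores the ``ctx`` payload of each record."""

    def __init__(self) -> None:
        super().__init__()
        self.payloads: list[dict] = []

    def emit(self, record: logging.LogRecord) -> None:
        # Keys passed via ``extra`` become attributes on the LogRecord.
        ctx = getattr(record, "ctx", None)
        if ctx is not None:
            self.payloads.append(ctx)


logger = logging.getLogger("timing_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output away from the root logger
handler = CtxCaptureHandler()
logger.addHandler(handler)

# Stand-in for SafeSynthesizerTiming.model_dump(); the values are made up.
timing_dump = {
    "total_time_sec": 120.5,
    "pii_replacer_time_sec": 4.2,
    "training_time_sec": 80.0,
    "generation_time_sec": 30.0,
    "evaluation_time_sec": None,
}

# Same call shape as log_timing above.
logger.info(
    "Safe Synthesizer timing",
    extra={"ctx": {"render_table": True, "tabular_data": timing_dump, "title": "Pipeline Timing"}},
)

captured = handler.payloads[0]
```

A handler or formatter that knows about `render_table` can then decide to draw `tabular_data` as a table, while handlers that ignore `ctx` still log the plain message.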

`log_wandb(run=None)` ¶

Log timing metrics to an active Weights & Biases run.

Parameters:

Name	Type	Description	Default
`run`	`Run \| None`	W&B run instance. No-op when `None`.	`None`

Source code in src/nemo_safe_synthesizer/config/external_results.py

def log_wandb(self, run: wandb.Run | None = None) -> None:
    """Log timing metrics to an active Weights & Biases run.

    Args:
        run: W&B run instance. No-op when ``None``.
    """
    if run is not None:
        run.log(
            {
                "total_time_sec": self.total_time_sec,
                "pii_replacer_time_sec": self.pii_replacer_time_sec,
                "training_time_sec": self.training_time_sec,
                "generation_time_sec": self.generation_time_sec,
                "evaluation_time_sec": self.evaluation_time_sec if self.evaluation_time_sec else 0,
            }
        )
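
One detail of the listing above is easy to miss: only `evaluation_time_sec` is coerced to `0` when it is unset, while the other fields are passed through as they are, including `None`. The sketch below reproduces that payload logic on a plain dict so the behavior can be checked without W&B installed. `FakeRun` and `log_timing_to_run` are hypothetical names used only for illustration.

```python
class FakeRun:
    """Hypothetical stand-in for a W&B run; it only records what is logged."""

    def __init__(self) -> None:
        self.logged: list[dict] = []

    def log(self, data: dict) -> None:
        self.logged.append(data)


def log_timing_to_run(timing: dict, run=None) -> None:
    """Build the same payload as the listing above, from a plain dict."""
    if run is not None:
        run.log(
            {
                "total_time_sec": timing["total_time_sec"],
                "pii_replacer_time_sec": timing["pii_replacer_time_sec"],
                "training_time_sec": timing["training_time_sec"],
                "generation_time_sec": timing["generation_time_sec"],
                # Falsy values (None or 0.0) are logged as 0 for this field only.
                "evaluation_time_sec": timing["evaluation_time_sec"] if timing["evaluation_time_sec"] else 0,
            }
        )


# Made-up timings for a run that skipped PII replacement and evaluation.
timing = {
    "total_time_sec": 95.0,
    "pii_replacer_time_sec": None,
    "training_time_sec": 70.0,
    "generation_time_sec": 25.0,
    "evaluation_time_sec": None,
}

run = FakeRun()
log_timing_to_run(timing, run)
log_timing_to_run(timing, None)  # no run: nothing is logged, nothing is raised
payload = run.logged[0]
```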

`SafeSynthesizerSummary` `pydantic-model` ¶

Bases: NSSBaseModel

Aggregated quality, privacy, and record-count metrics for a pipeline run.

Token-field invariants (when all referenced fields are populated):

`num_non_record_tokens == num_completion_tokens - num_valid_record_tokens - num_invalid_record_tokens` (clamped to 0 if slight tokenizer-boundary drift makes the subtraction negative).
`tokens_per_prompt == num_completion_tokens / num_prompts`.
`valid_record_token_fraction == num_valid_record_tokens / num_completion_tokens`.
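
The invariants above can be checked with a few lines of arithmetic. This is a minimal sketch, not the library's implementation: `derive_token_fields` is a hypothetical helper, the counts are made up, and it assumes `num_completion_tokens` and `num_prompts` are both positive.

```python
def derive_token_fields(
    num_completion_tokens: int,
    num_valid_record_tokens: int,
    num_invalid_record_tokens: int,
    num_prompts: int,
) -> dict:
    """Compute the derived token fields from the raw counts."""
    # Clamp to 0 so small tokenizer-boundary drift cannot yield a negative count.
    non_record = max(
        0,
        num_completion_tokens - num_valid_record_tokens - num_invalid_record_tokens,
    )
    return {
        "num_non_record_tokens": non_record,
        "tokens_per_prompt": num_completion_tokens / num_prompts,
        "valid_record_token_fraction": num_valid_record_tokens / num_completion_tokens,
    }


# 1000 completion tokens over 10 prompts: 800 valid, 150 invalid, 50 left over.
typical = derive_token_fields(1000, 800, 150, 10)

# Drift case: valid + invalid slightly exceeds the completion total.
drifted = derive_token_fields(1000, 900, 105, 10)
```

In the first case the leftover is 50 tokens, at 100 tokens per prompt and a valid-token fraction of 0.8. In the drift case the raw subtraction would give -5, which the clamp turns into 0.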

Fields:

synthetic_data_quality_score (float | None)
column_correlation_stability_score (float | None)
deep_structure_stability_score (float | None)
column_distribution_stability_score (float | None)
text_semantic_similarity_score (float | None)
text_structure_similarity_score (float | None)
data_privacy_score (float | None)
membership_inference_protection_score (float | None)
attribute_inference_protection_score (float | None)
num_valid_records (int | None)
num_invalid_records (int | None)
num_prompts (int | None)
valid_record_fraction (float | None)
num_completion_tokens (int | None)
num_valid_record_tokens (int | None)
num_invalid_record_tokens (int | None)
num_non_record_tokens (int | None)
valid_record_token_fraction (float | None)
tokens_per_prompt (float | None)
tokens_per_second (float | None)
valid_tokens_per_second (float | None)
tokenization_overhead_sec (float | None)
timing (SafeSynthesizerTiming)

`synthetic_data_quality_score = None` `pydantic-field` ¶

Weighted composite (SQS) of the five sub-scores below, on a 0 to 10 scale. Higher is better.

`column_correlation_stability_score = None` `pydantic-field` ¶

How closely pairwise column correlations in synthetic data match the original for numeric and categorical columns.

`deep_structure_stability_score = None` `pydantic-field` ¶

PCA-based comparison of multivariate structure between real and synthetic data for numeric and categorical columns.

`column_distribution_stability_score = None` `pydantic-field` ¶

Per-column Jensen-Shannon distance between training and synthetic distributions averaged across all numeric and categorical columns.
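
As a rough illustration of the quantity described above, the sketch below computes the Jensen-Shannon distance between two discrete distributions and averages it across columns. It is an assumption-laden example rather than the pipeline's code: `js_distance` is a hypothetical helper, the histograms are made up, base-2 logarithms are chosen so the distance lies in [0, 1], and how the distance is binned or mapped onto the final score is not specified here.

```python
import math


def js_distance(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon distance (square root of the divergence), base 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: list[float], y: list[float]) -> float:
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, divergence))


# Made-up per-column histograms: (training distribution, synthetic distribution).
columns = {
    "identical_column": ([0.5, 0.5], [0.5, 0.5]),
    "disjoint_column": ([1.0, 0.0], [0.0, 1.0]),
}

distances = {name: js_distance(p, q) for name, (p, q) in columns.items()}
mean_distance = sum(distances.values()) / len(distances)
```

Identical distributions give a distance of 0 and fully disjoint ones give 1, so the average over these two example columns is 0.5.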

`text_semantic_similarity_score = None` `pydantic-field` ¶

Embedding-based semantic closeness between real and synthetic free-text columns.

`text_structure_similarity_score = None` `pydantic-field` ¶

Jensen-Shannon divergence over sentence count, words-per-sentence, and characters-per-word distributions between real and synthetic free-text columns.

`data_privacy_score = None` `pydantic-field` ¶

Composite of the membership inference attack (MIA) and attribute inference attack (AIA) protection scores below.

`membership_inference_protection_score = None` `pydantic-field` ¶

Resistance to attacks that try to determine whether a record was in the training set.

`attribute_inference_protection_score = None` `pydantic-field` ¶

Resistance to attacks that try to infer sensitive attributes from quasi-identifiers.

`num_valid_records = None` `pydantic-field` ¶

Count of synthetic records that passed schema and format validation.

`num_invalid_records = None` `pydantic-field` ¶

Count of synthetic records filtered out during validation.

`num_prompts = None` `pydantic-field` ¶

Total LLM generation prompts issued.

`valid_record_fraction = None` `pydantic-field` ¶

Ratio of valid records: num_valid_records / (num_valid_records + num_invalid_records).

`num_completion_tokens = None` `pydantic-field` ¶

Total tokens generated by the LLM across all completions.

`num_valid_record_tokens = None` `pydantic-field` ¶

Tokens in records that passed validation.

`num_invalid_record_tokens = None` `pydantic-field` ¶

Tokens in records that failed validation.

`num_non_record_tokens = None` `pydantic-field` ¶

Tokens not part of any recognized record.

`valid_record_token_fraction = None` `pydantic-field` ¶

Fraction of total completion tokens in valid records.

`tokens_per_prompt = None` `pydantic-field` ¶

Average completion tokens per prompt: num_completion_tokens / num_prompts.

`tokens_per_second = None` `pydantic-field` ¶

Total completion tokens divided by generation wall-clock time.

`valid_tokens_per_second = None` `pydantic-field` ¶

Valid record tokens divided by generation wall-clock time.

`tokenization_overhead_sec = None` `pydantic-field` ¶

Wall-clock seconds spent on tokenization for statistics tracking.

`timing` `pydantic-field` ¶

Per-stage wall-clock durations.

`log_summary(logger)` ¶

Emit all summary metrics as a structured table via `logger`.

Source code in src/nemo_safe_synthesizer/config/external_results.py

def log_summary(self, logger: logging.Logger) -> None:
    """Emit all summary metrics as a structured table via ``logger``."""
    logger.info(
        "Safe Synthesizer Summary",
        extra={"ctx": {"render_table": True, "tabular_data": self.model_dump(), "title": "Quality Metrics"}},
    )

`log_wandb()` ¶

Log all summary and timing metrics to the active W&B run. Does nothing when no run is active.

Source code in src/nemo_safe_synthesizer/config/external_results.py

def log_wandb(self) -> None:
    """Log all summary and timing metrics to the active W&B run."""
    import wandb

    if wandb.run is not None:
        metrics: dict[str, float | int | None] = {
            "gen/generation_time_sec": self.timing.generation_time_sec,
            "gen/evaluation_time_sec": self.timing.evaluation_time_sec,
            "eval/total_time_sec": self.timing.total_time_sec,
            "train/pii_replacer_time_sec": self.timing.pii_replacer_time_sec,
            "train/training_time_sec": self.timing.training_time_sec,
            "gen/num_valid_records": self.num_valid_records,
            "gen/num_invalid_records": self.num_invalid_records,
            "gen/num_prompts": self.num_prompts,
            "gen/valid_record_fraction": self.valid_record_fraction,
            "gen/num_completion_tokens": self.num_completion_tokens,
            "gen/num_valid_record_tokens": self.num_valid_record_tokens,
            "gen/num_invalid_record_tokens": self.num_invalid_record_tokens,
            "gen/num_non_record_tokens": self.num_non_record_tokens,
            "gen/valid_record_token_fraction": self.valid_record_token_fraction,
            "gen/tokens_per_prompt": self.tokens_per_prompt,
            "gen/tokens_per_second": self.tokens_per_second,
            "gen/valid_tokens_per_second": self.valid_tokens_per_second,
            "gen/tokenization_overhead_sec": self.tokenization_overhead_sec,
            "eval/data_privacy_score": self.data_privacy_score,
            "eval/membership_inference_protection_score": self.membership_inference_protection_score,
            "eval/attribute_inference_protection_score": self.attribute_inference_protection_score,
            "eval/synthetic_data_quality_score": self.synthetic_data_quality_score,
            "eval/column_correlation_stability_score": self.column_correlation_stability_score,
            "eval/deep_structure_stability_score": self.deep_structure_stability_score,
            "eval/column_distribution_stability_score": self.column_distribution_stability_score,
            "eval/text_semantic_similarity_score": self.text_semantic_similarity_score,
            "eval/text_structure_similarity_score": self.text_structure_similarity_score,
            "eval/success": 1
            if self.data_privacy_score is not None
            and self.synthetic_data_quality_score is not None
            and self.synthetic_data_quality_score > 0
            else 0,
        }
        # Log ``None`` values too (rather than filtering them out) so that
        # dashboard comparisons clearly show when a stage was skipped or a
        # metric was not collected for a given run, instead of the metric
        # silently carrying its last-logged value on the W&B chart.
        wandb.log(metrics)
