column_distribution

Classes:

Name                       Description
ColumnDistributionPlotRow  A pair of side-by-side column distribution plots for the HTML report.
ColumnDistribution         Column Distribution Stability metric.

ColumnDistributionPlotRow pydantic-model

Bases: BaseModel

A pair of side-by-side column distribution plots for the HTML report.

Fields:

name1 pydantic-field

Name of the first column in the plot row.

name2 pydantic-field

Name of the second column in the plot row, if present.

figure pydantic-field

Rendered HTML of the side-by-side distribution plot.

ColumnDistribution pydantic-model

Bases: Component

Column Distribution Stability metric.

Computes the Jensen-Shannon divergence between the reference and output distributions of each column, averages the divergences across all tabular columns, and maps the average to a score from 0 to 10. The component also carries the data for the per-column histogram figures and the Reference Columns table in the HTML report.
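The computation described above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the function names `js_divergence`, `average_divergence`, and `to_score` are hypothetical, columns are treated as categorical samples (numeric columns would first need binning), and the linear mapping from average divergence to a 0 to 10 score is an assumption, since this page does not state the actual mapping.

```python
import math
from collections import Counter


def js_divergence(reference, output):
    """Jensen-Shannon divergence (log base 2) between two non-empty categorical samples."""
    ref_counts, out_counts = Counter(reference), Counter(output)
    categories = set(ref_counts) | set(out_counts)
    n_ref, n_out = len(reference), len(output)
    divergence = 0.0
    for category in categories:
        p = ref_counts[category] / n_ref  # reference probability
        q = out_counts[category] / n_out  # output probability
        m = 0.5 * (p + q)                 # mixture distribution
        # Terms with zero probability contribute nothing (0 * log 0 := 0).
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


def average_divergence(reference_columns, output_columns):
    """Mean per-column divergence over the columns named in reference_columns."""
    divergences = [
        js_divergence(reference_columns[name], output_columns[name])
        for name in reference_columns
    ]
    return sum(divergences) / len(divergences)


def to_score(avg_divergence):
    """ASSUMPTION: a linear map from divergence in [0, 1] to a score in [0, 10]."""
    return 10.0 * (1.0 - avg_divergence)


# Hypothetical data: one column matches exactly, the other shares no values.
reference = {"color": ["red", "blue"], "size": ["S", "S"]}
output = {"color": ["red", "blue"], "size": ["L", "L"]}
score = to_score(average_divergence(reference, output))
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared values), which is what makes a simple rescaling to a bounded score possible in this sketch.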

Fields:

column_statistics = None pydantic-field

Per-column PII entity and transform metadata.

evaluation_fields = list() pydantic-field

Per-column evaluation metadata and distribution scores.

jinja_context cached property

Template context with evaluation fields and column statistics for the report.

from_evaluation_dataset(evaluation_dataset, config=None) staticmethod

Compute column distribution stability from the evaluation dataset.

Source code in src/nemo_safe_synthesizer/evaluation/components/column_distribution.py
@staticmethod
def from_evaluation_dataset(
    evaluation_dataset: EvaluationDataset, config: SafeSynthesizerParameters | None = None
) -> ColumnDistribution:
    """Compute column distribution stability from the evaluation dataset."""
    tabular_columns = set(evaluation_dataset.get_tabular_columns())
    tabular_fields = [f for f in evaluation_dataset.evaluation_fields if f.name in tabular_columns]
    if tabular_fields:
        average_divergence = EvaluationField.get_average_divergence(tabular_fields)
        score = EvaluationField.get_field_distribution_stability(average_divergence)
        return ColumnDistribution(
            score=score,
            column_statistics=evaluation_dataset.column_statistics,
            evaluation_fields=evaluation_dataset.evaluation_fields,
        )
    else:
        return ColumnDistribution(
            score=EvaluationScore(notes="No tabular columns detected."),
            column_statistics=evaluation_dataset.column_statistics,
            evaluation_fields=evaluation_dataset.evaluation_fields,
        )