
correlation

Classes:

Name          Description
Correlation   Column Correlation Stability metric.

Correlation pydantic-model

Bases: Component

Column Correlation Stability metric.

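Computes a correlation for each pair of columns in both the training and synthetic dataframes. Depending on the column types, the measure is Pearson's r, Theil's U, or the correlation ratio. The score is the mean absolute difference between the two resulting correlation matrices.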
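The following rough sketch illustrates the three measures by building a pairwise association matrix with pandas and NumPy. It is not this module's implementation. Several parts are hypothetical: the helper names (`theils_u`, `correlation_ratio`, `association_matrix`), the rule for which measure applies to which column pair, and the assumption that columns contain no missing values.

```python
import numpy as np
import pandas as pd


def _entropy(values: pd.Series) -> float:
    """Shannon entropy (natural log) of a discrete series."""
    probs = values.value_counts(normalize=True).to_numpy()
    return float(-np.sum(probs * np.log(probs)))


def theils_u(x: pd.Series, y: pd.Series) -> float:
    """Uncertainty coefficient U(x|y): fraction of x's entropy explained by y."""
    h_x = _entropy(x)
    if h_x == 0.0:
        return 1.0  # a constant x is trivially fully determined
    # Conditional entropy H(x|y) = sum over y-groups of P(y) * H(x | y)
    h_x_given_y = sum(
        (len(group) / len(x)) * _entropy(group) for _, group in x.groupby(y)
    )
    return float((h_x - h_x_given_y) / h_x)


def correlation_ratio(categories: pd.Series, measurements: pd.Series) -> float:
    """Correlation ratio (eta): how much of a numeric column's variance the categories explain."""
    overall_mean = measurements.mean()
    groups = measurements.groupby(categories)
    between = float((groups.count() * (groups.mean() - overall_mean) ** 2).sum())
    total = float(((measurements - overall_mean) ** 2).sum())
    return 0.0 if total == 0.0 else float(np.sqrt(between / total))


def association_matrix(df: pd.DataFrame, nominal_columns: set[str]) -> pd.DataFrame:
    """Hypothetical dispatch: Theil's U for nominal/nominal pairs,
    correlation ratio for nominal/numeric pairs, Pearson for numeric/numeric pairs."""
    cols = list(df.columns)
    matrix = pd.DataFrame(np.nan, index=cols, columns=cols, dtype=float)
    for a in cols:
        for b in cols:
            a_nominal, b_nominal = a in nominal_columns, b in nominal_columns
            if a_nominal and b_nominal:
                value = theils_u(df[a], df[b])  # asymmetric measure
            elif a_nominal:
                value = correlation_ratio(df[a], df[b])
            elif b_nominal:
                value = correlation_ratio(df[b], df[a])
            else:
                value = df[a].corr(df[b])  # Pearson by default
            matrix.loc[a, b] = value
    return matrix
```

Theil's U is asymmetric because U(x|y) generally differs from U(y|x). As a result, a matrix built this way need not be symmetric.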

Config:

  • arbitrary_types_allowed: True

Fields:

training_correlation = None pydantic-field

Correlation matrix for the training data.

synthetic_correlation = None pydantic-field

Correlation matrix for the synthetic data.

correlation_difference = None pydantic-field

Element-wise absolute difference of the two matrices.
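The next minimal sketch computes the difference matrix and a mean-absolute-error summary, assuming both correlation matrices share the same row and column labels. This page does not say whether the real score averages every cell, only the off-diagonal cells, or a single triangle. As an assumption, the sketch averages the off-diagonal cells and skips NaN entries. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def correlation_difference(training: pd.DataFrame, synthetic: pd.DataFrame) -> pd.DataFrame:
    """Element-wise |training - synthetic|, aligned on row and column labels."""
    return (training - synthetic).abs()


def mean_off_diagonal_difference(difference: pd.DataFrame) -> float:
    """Assumed summary: mean of off-diagonal cells, ignoring NaN cells."""
    values = difference.to_numpy(dtype=float)
    off_diagonal = ~np.eye(values.shape[0], dtype=bool)  # mask out self-pairs
    return float(np.nanmean(values[off_diagonal]))
```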

jinja_context cached property

Template context with combined correlation heatmap figure.

from_evaluation_datasets(evaluation_datasets, config=None) staticmethod

Compute correlation matrices and the correlation stability score.

Source code in src/nemo_safe_synthesizer/evaluation/components/correlation.py
@staticmethod
def from_evaluation_datasets(
    evaluation_datasets: EvaluationDatasets, config: SafeSynthesizerParameters | None = None
) -> Correlation:
    """Compute correlation matrices and the correlation stability score."""
    # Only tabular columns are used for correlation.
    tabular_columns = evaluation_datasets.get_tabular_columns()
    # We use different calculations (Theil's U) for nominal columns.
    nominal_columns = evaluation_datasets.get_nominal_columns()

    (
        training_correlation,
        synthetic_correlation,
        correlation_difference,
        mean_absolute_error,
    ) = Correlation._get_correlation_calculations(
        training_df=evaluation_datasets.training[tabular_columns],
        synthetic_df=evaluation_datasets.synthetic[tabular_columns],
        nominal_columns=nominal_columns,
        fields=evaluation_datasets.evaluation_fields,
    )
    evaluation_score = Correlation._get_field_correlation_stability(mean_absolute_error)
    return Correlation(
        training_correlation=training_correlation,
        synthetic_correlation=synthetic_correlation,
        correlation_difference=correlation_difference,
        score=evaluation_score,
    )