dataset_statistics
Classes:
| Name | Description |
|---|---|
| DatasetStatistics | Summary statistics for the training and synthetic datasets. |
DatasetStatistics
pydantic-model
Bases: Component
Summary statistics for the training and synthetic datasets.
Reports row and column counts, missing-value percentages, and the number of memorized rows (rows repeated verbatim between the training and synthetic data). This component does not produce a numeric score; it provides context for the HTML report.
Fields:
- score (EvaluationScore)
- name (str)
- training_rows (int)
- training_cols (int)
- training_missing (int)
- synthetic_rows (int)
- synthetic_cols (int)
- synthetic_missing (int)
- memorized_lines (int)
training_rows = 0
pydantic-field
Row count of the training dataframe used for evaluation.
training_cols = 0
pydantic-field
Column count of the training dataframe used for evaluation.
training_missing = 0
pydantic-field
Percentage of missing values in the training dataframe.
synthetic_rows = 0
pydantic-field
Row count of the synthetic dataframe used for evaluation.
synthetic_cols = 0
pydantic-field
Column count of the synthetic dataframe used for evaluation.
synthetic_missing = 0
pydantic-field
Percentage of missing values in the synthetic dataframe.
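A missing-value percentage like the one described for `training_missing` and `synthetic_missing` could be computed roughly as follows. This is a minimal sketch, not the library's implementation: the helper name `missing_percentage` is hypothetical, and the choice of denominator (all cells in the dataframe) and the absence of rounding are assumptions.

```python
import pandas as pd


def missing_percentage(df: pd.DataFrame) -> float:
    """Hypothetical helper: share of missing cells in df, as a percentage."""
    total_cells = df.size  # rows * columns
    if total_cells == 0:
        # Assumption: an empty dataframe is reported as 0% missing.
        return 0.0
    missing_cells = int(df.isna().sum().sum())
    return 100.0 * missing_cells / total_cells


example = pd.DataFrame({"a": [1, None, 3, 4], "b": ["x", "y", None, None]})
print(missing_percentage(example))
```

Since the fields are typed as `int`, the actual component presumably rounds or truncates the value; the sketch leaves it as a float.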
memorized_lines = 0
pydantic-field
Number of exact row matches between the training and synthetic dataframes.
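One way to count exact row matches is an inner merge on every column. The sketch below is illustrative only: the helper name `count_memorized_rows` is hypothetical, and it assumes both dataframes share the same columns and that a "match" means a synthetic row equal to some training row in every column. The library may define or compute the count differently.

```python
import pandas as pd


def count_memorized_rows(training: pd.DataFrame, synthetic: pd.DataFrame) -> int:
    """Hypothetical helper: synthetic rows that also appear verbatim in training."""
    columns = list(training.columns)
    # Deduplicate training rows so each synthetic row matches at most once.
    unique_training = training[columns].drop_duplicates()
    matches = synthetic[columns].merge(unique_training, on=columns, how="inner")
    return len(matches)


training = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
synthetic = pd.DataFrame({"a": [1, 4, 3, 3], "b": ["x", "y", "z", "z"]})
print(count_memorized_rows(training, synthetic))
```

Under this definition a synthetic row that is repeated several times counts once per repetition; counting distinct memorized rows instead would require deduplicating `synthetic` as well.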
jinja_context
cached
property
Template context merging all dataset summary fields into the base context.
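The pattern of a cached property that merges a component's own fields into a base context can be sketched with the standard library alone. Everything here is hypothetical: the class names, the `_base_context` method, and the use of dataclasses in place of pydantic models are assumptions made for illustration, not the library's actual code.

```python
from dataclasses import asdict, dataclass
from functools import cached_property


@dataclass
class ComponentSketch:
    """Hypothetical stand-in for the Component base class."""

    name: str = "component"

    def _base_context(self) -> dict:
        # Assumption: the base context carries at least the component name.
        return {"name": self.name}


@dataclass
class DatasetStatisticsSketch(ComponentSketch):
    name: str = "Dataset Statistics"
    training_rows: int = 0
    synthetic_rows: int = 0
    memorized_lines: int = 0

    @cached_property
    def jinja_context(self) -> dict:
        # Merge every summary field into the base context; computed once per instance.
        return {**self._base_context(), **asdict(self)}


stats = DatasetStatisticsSketch(training_rows=10, synthetic_rows=8)
context = stats.jinja_context
print(context["training_rows"], context["synthetic_rows"])
```

Because the property is cached, repeated template renders reuse the same dictionary instead of rebuilding it on each access.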
from_evaluation_datasets(evaluation_datasets, config=None)
staticmethod
Compute summary statistics from the evaluation datasets.
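The structure of `evaluation_datasets` and `config` is not shown on this page, so the following sketch does not call the real method. It illustrates the general shape of a static factory that fills the row and column fields from two pandas dataframes; the class `StatsSketch` and its `from_frames` method are hypothetical names.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class StatsSketch:
    """Hypothetical container mirroring a subset of the documented fields."""

    training_rows: int = 0
    training_cols: int = 0
    synthetic_rows: int = 0
    synthetic_cols: int = 0

    @staticmethod
    def from_frames(training: pd.DataFrame, synthetic: pd.DataFrame) -> "StatsSketch":
        # DataFrame.shape is a (rows, columns) tuple.
        return StatsSketch(
            training_rows=training.shape[0],
            training_cols=training.shape[1],
            synthetic_rows=synthetic.shape[0],
            synthetic_cols=synthetic.shape[1],
        )


train_df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
synth_df = pd.DataFrame({"a": [1, 4, 3, 3], "b": ["x", "y", "z", "z"]})
summary = StatsSketch.from_frames(train_df, synth_df)
print(summary)
```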