dataset_statistics
Classes:
| Name | Description |
|---|---|
| DatasetStatistics | Summary statistics for the reference and output datasets. |
DatasetStatistics
pydantic-model
Bases: Component
Summary statistics for the reference and output datasets.
Reports row and column counts, missing-value percentages, and the number of memorized (verbatim-repeated) rows. This component does not produce a numeric score; it provides context for the HTML report.
Fields:
- score (EvaluationScore)
- name (str)
- reference_rows (int)
- reference_cols (int)
- reference_missing (int)
- output_rows (int)
- output_cols (int)
- output_missing (int)
- memorized_lines (int)
reference_rows = 0
pydantic-field
Row count of the reference dataframe used for evaluation.
reference_cols = 0
pydantic-field
Column count of the reference dataframe used for evaluation.
reference_missing = 0
pydantic-field
Percentage of missing values in the reference dataframe.
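As a rough illustration of what a missing-value percentage can mean, the sketch below computes the share of missing cells among all cells of a small table. It is plain Python and not this component's implementation: the helper name `missing_percentage`, the list-of-dicts table layout, and the choice to treat `None` (or an absent key) as missing are all assumptions, and the page does not say how the value is rounded to fit the `int` field. The same idea would apply to `output_missing`.

```python
from typing import Any, Mapping, Sequence


def missing_percentage(rows: Sequence[Mapping[str, Any]], columns: Sequence[str]) -> float:
    """Return the share of missing cells (None or absent keys) as a percentage.

    Hypothetical helper for illustration only; not part of the documented API.
    """
    total_cells = len(rows) * len(columns)
    if total_cells == 0:
        return 0.0  # an empty table has no missing cells
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    return 100.0 * missing / total_cells


# Made-up example table: 2 of its 8 cells are missing.
reference = [
    {"age": 34, "city": "Oslo"},
    {"age": None, "city": "Lima"},
    {"age": 51, "city": None},
    {"age": 29, "city": "Pune"},
]
print(missing_percentage(reference, ["age", "city"]))  # 25.0
```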
output_rows = 0
pydantic-field
Row count of the output dataframe used for evaluation.
output_cols = 0
pydantic-field
Column count of the output dataframe used for evaluation.
output_missing = 0
pydantic-field
Percentage of missing values in the output dataframe.
memorized_lines = 0
pydantic-field
Number of exact row matches between reference and output.
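One way to picture an exact-row-match count is sketched below in plain Python. It is an assumption-laden illustration, not the component's code: the helper `count_memorized` is hypothetical, rows are modeled as tuples, and the sketch counts every output row found in the reference (so a repeated output row counts each time). The page does not specify whether duplicates are counted once or per occurrence.

```python
from typing import Hashable, Sequence, Tuple

Row = Tuple[Hashable, ...]


def count_memorized(reference: Sequence[Row], output: Sequence[Row]) -> int:
    """Count output rows that also appear, value for value, in the reference.

    Hypothetical helper for illustration only; not part of the documented API.
    """
    reference_set = set(reference)  # set lookup keeps the scan linear in len(output)
    return sum(1 for row in output if row in reference_set)


# Made-up data: three output rows repeat a reference row verbatim.
reference_rows = [("a", 1), ("b", 2), ("c", 3)]
output_rows = [("a", 1), ("x", 9), ("a", 1), ("c", 3)]
print(count_memorized(reference_rows, output_rows))  # 3
```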
jinja_context
cached
property
Template context merging all dataset summary fields into the base context.
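The general pattern of a cached property that merges summary fields into a base context can be sketched as follows. The classes `BaseSketch` and `StatsSketch`, the `base_context` property, and the context keys are hypothetical stand-ins built on dataclasses; the real component is a pydantic model whose base context is not shown on this page.

```python
from dataclasses import dataclass
from functools import cached_property
from typing import Any, Dict


@dataclass
class BaseSketch:
    """Hypothetical stand-in for a base component."""
    name: str = "component"

    @property
    def base_context(self) -> Dict[str, Any]:
        return {"name": self.name}


@dataclass
class StatsSketch(BaseSketch):
    """Hypothetical stand-in carrying two of the summary fields."""
    reference_rows: int = 0
    output_rows: int = 0

    @cached_property
    def jinja_context(self) -> Dict[str, Any]:
        # Start from the base context, then layer the summary fields on top.
        return {
            **self.base_context,
            "reference_rows": self.reference_rows,
            "output_rows": self.output_rows,
        }


stats = StatsSketch(name="Dataset Statistics", reference_rows=100, output_rows=80)
context = stats.jinja_context
print(context["name"], context["reference_rows"], context["output_rows"])
```

Because the property is cached, the dictionary is built on first access and reused afterwards, so later changes to the fields would not be reflected in it.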
from_evaluation_dataset(evaluation_dataset, config=None)
staticmethod
Compute summary statistics from the evaluation dataset.
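The static-factory shape of this method can be illustrated with a small stand-in. Everything here besides the method name and its `(evaluation_dataset, config=None)` signature is an assumption: `EvaluationDataSketch`, `SummarySketch`, and the list-of-dicts table layout are hypothetical, and the sketch only derives row and column counts.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

Table = List[Dict[str, Any]]


@dataclass
class EvaluationDataSketch:
    """Hypothetical stand-in holding the two tables being compared."""
    reference: Table
    output: Table


def _shape(table: Table) -> Tuple[int, int]:
    """Return (row count, number of distinct column names)."""
    columns = {key for row in table for key in row}
    return len(table), len(columns)


@dataclass
class SummarySketch:
    """Hypothetical stand-in for a summary-statistics component."""
    reference_rows: int = 0
    reference_cols: int = 0
    output_rows: int = 0
    output_cols: int = 0

    @staticmethod
    def from_evaluation_dataset(
        evaluation_dataset: EvaluationDataSketch,
        config: Optional[dict] = None,
    ) -> "SummarySketch":
        # config mirrors the documented signature; this sketch ignores it.
        ref_rows, ref_cols = _shape(evaluation_dataset.reference)
        out_rows, out_cols = _shape(evaluation_dataset.output)
        return SummarySketch(ref_rows, ref_cols, out_rows, out_cols)


data = EvaluationDataSketch(
    reference=[{"age": 34, "city": "Oslo"}, {"age": 51, "city": "Lima"}],
    output=[{"age": 40, "city": "Pune"}],
)
summary = SummarySketch.from_evaluation_dataset(data)
print(summary)
```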