fields
Data models for field type classification and per-column statistics.
Classes:
FieldType: Enum of column types recognized by the field analyzer.
FieldFeatures: Statistical profile of a single DataFrame column.
Classes:
| Name | Description |
|---|---|
| FieldType | Column type classification assigned by the field analyzer. |
| FieldFeatures | Statistical profile of a single DataFrame column. |
FieldType
Bases: StrEnum
Column type classification assigned by the field analyzer.
Used by evaluation and pii_replacer to dispatch type-specific
processing logic (e.g., numeric metrics vs. text similarity).
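The dispatch pattern described above can be sketched roughly as follows. This is an illustration only: the members of FieldType are not listed on this page, so `DemoFieldType`, its members `NUMERIC` and `TEXT`, and both handler functions are hypothetical stand-ins.

```python
try:
    from enum import StrEnum  # available in Python 3.11+
except ImportError:  # fallback for older interpreters
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class DemoFieldType(StrEnum):
    # Hypothetical members, for illustration only.
    NUMERIC = "numeric"
    TEXT = "text"


def numeric_metric(values):
    # Placeholder for a numeric metric: the arithmetic mean.
    return sum(values) / len(values)


def text_metric(values):
    # Placeholder for a text metric: the mean string length.
    return sum(len(v) for v in values) / len(values)


# Map each column type to its type-specific handler.
HANDLERS = {
    DemoFieldType.NUMERIC: numeric_metric,
    DemoFieldType.TEXT: text_metric,
}


def evaluate_column(field_type, values):
    # Accept either an enum member or its raw string value.
    return HANDLERS[DemoFieldType(field_type)](values)
```

Because the enum derives from `str`, a member compares equal to its plain string value, so the type survives a round trip through JSON or a config file and can still be used for lookup.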
FieldFeatures
pydantic-model
Bases: BaseModel
Statistical profile of a single DataFrame column.
Captures type classification, value distribution, missing-data rates,
string-length statistics, and optional numeric precision. Produced by
describe_field in the analyzers.field_features module.
Fields:
- name (str)
- type (FieldType)
- count (int)
- unique_values_list (list[Any])
- unique_count (int)
- unique_percent (float)
- missing_count (int)
- missing_percent (float)
- min_str_length (int)
- max_str_length (int)
- avg_str_length (float)
- min_value (int | float | None)
- max_value (int | float | None)
- min_precision (int | None)
- max_precision (int | None)
- space_count (int | None)
- classification (dict | None)
name
pydantic-field
Column name in the source DataFrame.
type
pydantic-field
Inferred column type.
count
pydantic-field
Number of non-null values.
unique_values_list
pydantic-field
Deduplicated list of non-null values.
unique_count
pydantic-field
Number of unique non-null values.
unique_percent
pydantic-field
Percentage of values that are unique, relative to non-null count.
missing_count
pydantic-field
Number of null/missing values.
missing_percent
pydantic-field
Percentage of values that are missing, relative to total count.
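To show how the counting fields relate to one another, here is a minimal sketch over a plain Python list. It is not the library's implementation: `count_stats` is a hypothetical helper, `None` is assumed to be the only missing marker, "total count" is assumed to mean non-null plus missing values, and percentages are assumed to be on a 0 to 100 scale.

```python
def count_stats(values):
    # Non-null values only; None is the assumed missing marker.
    non_null = [v for v in values if v is not None]
    count = len(non_null)
    missing_count = len(values) - count

    # Order-preserving deduplication (requires hashable values).
    unique_values_list = list(dict.fromkeys(non_null))
    unique_count = len(unique_values_list)

    # unique_percent is relative to the non-null count,
    # missing_percent is relative to the total number of values.
    unique_percent = 100.0 * unique_count / count if count else 0.0
    missing_percent = 100.0 * missing_count / len(values) if values else 0.0

    return {
        "count": count,
        "unique_values_list": unique_values_list,
        "unique_count": unique_count,
        "unique_percent": unique_percent,
        "missing_count": missing_count,
        "missing_percent": missing_percent,
    }


stats = count_stats(["a", "b", "a", None])
```

Note that the two percentages use different denominators, so they do not sum to 100 in general.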
min_str_length
pydantic-field
Minimum string-representation length among non-null values.
max_str_length
pydantic-field
Maximum string-representation length among non-null values.
avg_str_length
pydantic-field
Mean string-representation length among non-null values.
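The three string-length fields can be illustrated with a short sketch. It assumes that "string representation" means the result of `str(value)` and that `None` marks a missing value; `str_length_stats` is a hypothetical helper, not part of the documented module.

```python
def str_length_stats(values):
    # Length of str(value) for every non-null value.
    lengths = [len(str(v)) for v in values if v is not None]
    if not lengths:
        return None  # no non-null values to measure
    return {
        "min_str_length": min(lengths),
        "max_str_length": max(lengths),
        "avg_str_length": sum(lengths) / len(lengths),
    }


lengths = str_length_stats([12, "abc", None, 4.5])
```

Here `12` contributes 2 characters, while `"abc"` and `4.5` contribute 3 each.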
min_value = None
pydantic-field
Floor power-of-10 of the column minimum (numeric columns only).
max_value = None
pydantic-field
Floor power-of-10 of the column maximum (numeric columns only).
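"Floor power-of-10" admits two readings: the exponent floor(log10(|x|)), or the value 10 raised to that exponent. The sketch below takes the first reading as an assumption and applies it to the absolute value so that negative numbers are handled. `floor_power_of_10` is a hypothetical helper, and the behavior for zero and non-finite inputs is a guess.

```python
import math


def floor_power_of_10(x):
    # No order of magnitude for missing, zero, or non-finite values.
    if x is None or x == 0 or not math.isfinite(x):
        return None
    # Exponent of the largest power of 10 not exceeding |x|.
    return math.floor(math.log10(abs(x)))


column = [4321, 0.05, 17]
min_value = floor_power_of_10(min(column))  # based on 0.05
max_value = floor_power_of_10(max(column))  # based on 4321
```

Under the second reading, the stored value would be `10 ** floor_power_of_10(x)` instead.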
min_precision = None
pydantic-field
Minimum decimal digit count across float values.
max_precision = None
pydantic-field
Maximum decimal digit count across float values.
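One way to count decimal digits is to go through `decimal.Decimal` using the float's shortest `repr`, as sketched below. This is an assumption about the method, not a description of the library's code. In particular, trailing zeros are dropped here, so `3.0` counts as 0 decimal digits. `decimal_digits` and `precision_range` are hypothetical helpers.

```python
import math
from decimal import Decimal


def decimal_digits(x):
    # str(x) yields the shortest repr that round-trips; normalize()
    # strips trailing zeros, so 3.0 has exponent 0 and 2.25 has -2.
    exponent = Decimal(str(x)).normalize().as_tuple().exponent
    return max(0, -exponent)


def precision_range(values):
    # Only finite floats contribute; ints and missing values are skipped.
    digits = [
        decimal_digits(v)
        for v in values
        if isinstance(v, float) and math.isfinite(v)
    ]
    if not digits:
        return None, None
    return min(digits), max(digits)


min_precision, max_precision = precision_range([1.5, 2.25, 3.0])
```

A column with no float values yields `(None, None)`, which matches the `int | None` type of both fields.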
space_count = None
pydantic-field
Total number of space characters across all non-null values.
classification = None
pydantic-field
NER-based classification metadata, when available.