fields

Data models for field type classification and per-column statistics.

Classes:

FieldType: Enum of column types recognized by the field analyzer.
FieldFeatures: Statistical profile of a single DataFrame column.

FieldType

Bases: StrEnum

Column type classification assigned by the field analyzer.

Used by evaluation and pii_replacer to dispatch type-specific processing logic (e.g., numeric metrics vs. text similarity).

FieldFeatures pydantic-model

Bases: BaseModel

Statistical profile of a single DataFrame column.

Captures type classification, value distribution, missing-data rates, string-length statistics, and optional numeric precision. Produced by describe_field in the analyzers.field_features module.

Fields:

name pydantic-field

Column name in the source DataFrame.

type pydantic-field

Inferred column type.

count pydantic-field

Number of non-null values.

unique_values_list pydantic-field

Deduplicated list of non-null values.

unique_count pydantic-field

Number of unique non-null values.

unique_percent pydantic-field

Percentage of values that are unique, relative to non-null count.

missing_count pydantic-field

Number of null/missing values.

missing_percent pydantic-field

Percentage of values that are missing, relative to total count.

min_str_length pydantic-field

Minimum string-representation length among non-null values.

max_str_length pydantic-field

Maximum string-representation length among non-null values.

avg_str_length pydantic-field

Mean string-representation length among non-null values.

min_value = None pydantic-field

Floor power-of-10 of the column minimum (numeric columns only).

max_value = None pydantic-field

Floor power-of-10 of the column maximum (numeric columns only).

min_precision = None pydantic-field

Minimum decimal digit count across float values.

max_precision = None pydantic-field

Maximum decimal digit count across float values.

space_count = None pydantic-field

Total number of space characters across all non-null values.

classification = None pydantic-field

NER-based classification metadata, when available.

to_dict(**kwargs)

Serialize to a dict, excluding unset and None fields.

Source code in src/nemo_safe_synthesizer/artifacts/base/fields.py
def to_dict(self, **kwargs) -> dict:
    """Serialize to a dict, excluding unset and None fields."""
    return self.model_dump(exclude_unset=True, exclude_none=True)
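To show how the two exclusion flags interact, here is a small sketch assuming pydantic v2 (where `model_dump` is available). `MiniFeatures` is a hypothetical, cut-down model and not the real FieldFeatures class. Note that, as listed above, to_dict accepts `**kwargs` but does not forward them to `model_dump`.

```python
from typing import Optional

from pydantic import BaseModel


class MiniFeatures(BaseModel):
    """Hypothetical reduced model used only to illustrate to_dict."""

    name: str
    count: int
    min_value: Optional[int] = None
    space_count: Optional[int] = None

    def to_dict(self) -> dict:
        # Same call as the listed source: drop unset fields and None values.
        return self.model_dump(exclude_unset=True, exclude_none=True)


# min_value is set explicitly to None; space_count is never set.
features = MiniFeatures(name="age", count=3, min_value=None)
result = features.to_dict()
print(result)  # {'name': 'age', 'count': 3}
```

Here `min_value` is dropped by `exclude_none` even though it was passed explicitly, and `space_count` is dropped by `exclude_unset` because it was never provided.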