field_features
Feature extraction for DataFrame columns.
Analyzes each column to produce a FieldFeatures profile containing type
classification, value distribution, missing-data rates, and numeric precision.
The evaluation and pii_replacer packages import describe_field,
FieldFeatures, and FieldType from this module.
Functions:
| Name | Description |
|---|---|
| float_precision | Compute the range of decimal-digit counts in a float Series. |
| describe_field | Build a statistical profile for a single DataFrame column. |
| floor_power_of_10 | Return the largest power of 10 that does not exceed abs(value). |
float_precision(data)
Compute the range of decimal-digit counts in a float Series.
Converts each value to its string representation, counts digits after the decimal point, and returns the (min, max) across the Series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Series | A pandas Series to analyze. Non-float dtypes return | required |
Returns:
| Type | Description |
|---|---|
| int \| None | |
| int \| None | Series is not float-typed or contains no meaningful decimal digits. |
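
The string-based counting described above can be sketched as follows. This is an illustrative approximation, not the module's source: the name float_precision_sketch is hypothetical, and the code assumes that whole-number floats such as 3.0 count as zero decimal digits, that values rendered in scientific notation are skipped, and that the "no result" case is a pair of None values.

```python
from __future__ import annotations

import pandas as pd


def float_precision_sketch(data: pd.Series) -> tuple[int | None, int | None]:
    """Hypothetical sketch: (min, max) count of digits after the decimal point."""
    # Assumption: non-float dtypes yield a pair of None values.
    if not pd.api.types.is_float_dtype(data):
        return (None, None)

    counts = []
    for value in data.dropna():
        text = repr(float(value))
        # Skip inf and scientific notation such as '1e-07' (assumption).
        if "e" in text or "." not in text:
            continue
        decimals = text.split(".", 1)[1]
        # Assumption: '3.0' carries zero meaningful decimal digits.
        counts.append(0 if decimals == "0" else len(decimals))

    # Assumption: a Series of only whole-number floats has no meaningful digits.
    if not counts or max(counts) == 0:
        return (None, None)
    return (min(counts), max(counts))
```

Under these assumptions, a Series holding 1.5, 2.25, and 3.125 yields (1, 3), while an integer-typed Series yields (None, None).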
Source code in src/nemo_safe_synthesizer/artifacts/analyzers/field_features.py
describe_field(field_name, data)
Build a statistical profile for a single DataFrame column.
Computes value counts, uniqueness, missing-data rates, string-length
statistics, and numeric precision. Infers a FieldType using the
following heuristic priority:
1. EMPTY -- all values are null.
2. BINARY -- exactly two unique non-null values.
3. NUMERIC -- numeric dtype (float or int); integer columns with <= 10 non-negative unique values are classified as CATEGORICAL.
4. CATEGORICAL -- high duplicate ratio (>= 90%, or >= 70% when the sample has <= 50 rows).
5. TEXT -- average space count per value exceeds the threshold.
6. OTHER -- none of the above rules matched.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| field_name | str | Column name used as the | required |
| data | Series | The column data as a pandas Series. | required |
Returns:
| Type | Description |
|---|---|
| FieldFeatures | A |
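
The type-inference priority listed for describe_field can be sketched as a chain of early returns. This is a rough illustration only, not the module's implementation: infer_field_type_sketch and SPACE_THRESHOLD are hypothetical names, the threshold value is invented because the documentation does not state it, the types are returned as plain strings rather than FieldType members, and "duplicate ratio" is assumed to mean one minus the share of unique non-null values.

```python
from __future__ import annotations

import pandas as pd

# Hypothetical value; the documentation only says "the threshold".
SPACE_THRESHOLD = 1.0


def infer_field_type_sketch(data: pd.Series) -> str:
    """Hypothetical sketch of the documented rule order, returning a type name."""
    non_null = data.dropna()
    if non_null.empty:
        return "EMPTY"

    unique_count = non_null.nunique()
    if unique_count == 2:
        return "BINARY"

    if pd.api.types.is_numeric_dtype(non_null):
        # Small sets of non-negative integers behave like category codes.
        if (
            pd.api.types.is_integer_dtype(non_null)
            and unique_count <= 10
            and bool((non_null >= 0).all())
        ):
            return "CATEGORICAL"
        return "NUMERIC"

    # Assumption: duplicate ratio = 1 - unique / total over non-null values.
    duplicate_ratio = 1 - unique_count / len(non_null)
    cutoff = 0.7 if len(data) <= 50 else 0.9
    if duplicate_ratio >= cutoff:
        return "CATEGORICAL"

    # Average number of spaces per value as a rough free-text signal.
    avg_spaces = non_null.astype(str).str.count(" ").mean()
    if avg_spaces > SPACE_THRESHOLD:
        return "TEXT"
    return "OTHER"
```

Because each rule returns as soon as it matches, the order of the checks is what encodes the priority: a two-valued integer column is reported as BINARY before the numeric rule is ever consulted.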
Source code in src/nemo_safe_synthesizer/artifacts/analyzers/field_features.py
floor_power_of_10(value)
Return the largest power of 10 that does not exceed abs(value).
Special cases: returns 1 for zero, and passes through infinity and NaN unchanged. Negative values return the negative of the result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | int \| float | The number to compute the floor power-of-10 for. | required |
Returns:
| Type | Description |
|---|---|
| int \| float | A power of 10 with the same sign as |
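
The behavior described for floor_power_of_10, including its special cases, can be approximated with a base-10 logarithm. This is a minimal sketch under that assumption, not the module's code; floor_power_of_10_sketch is a hypothetical name, and the real function may compute the result differently.

```python
import math


def floor_power_of_10_sketch(value):
    """Hypothetical sketch: largest power of 10 not exceeding abs(value), sign kept."""
    # Documented special case: zero maps to 1.
    if value == 0:
        return 1
    # Documented special case: infinity and NaN pass through unchanged.
    if math.isinf(value) or math.isnan(value):
        return value
    # floor(log10(|value|)) is the exponent of the largest power of 10 <= |value|.
    exponent = math.floor(math.log10(abs(value)))
    magnitude = 10 ** exponent
    # Negative inputs return the negated magnitude.
    return -magnitude if value < 0 else magnitude
```

With this sketch, 1234 maps to 1000, 0.05 maps to 0.01, and -250 maps to -100. Note that a negative exponent produces a float while a non-negative one produces an int, which matches the documented int | float return type.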