validation
Shared column-validation primitives and compound checks.
Two layers:
- Single-purpose primitives (`check_column_present`, `check_column_has_no_nulls`, `check_no_pseudo_column_collision`) raise one specific error for one specific failure mode. Preflight checks call these directly so each `collector.error("code", ...)` sits next to the check that owns the issue code, with no shared exception-to-code mapping.
- Compound checks (`check_groupby_column`, `check_orderby_column`, `check_timestamp_column`) are 2-3 line orchestrations of the primitives. They give non-preflight callers (SDK pipeline, holdout, assembler, time-series preprocessing) a single fail-fast gate per column.
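The two layers can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the exception classes, the `Collector` class, the issue codes, and the message wording are all hypothetical, and a plain dict of column lists stands in for the DataFrame so the sketch has no dependencies.

```python
# Illustrative stand-ins only: the real ParameterError / DataError classes
# and function bodies live in the library and may differ.
class ParameterError(ValueError):
    """Hypothetical stand-in: the configuration names something invalid."""


class DataError(ValueError):
    """Hypothetical stand-in: the data itself violates a requirement."""


# A plain dict of column name -> list of values stands in for a DataFrame.
Table = dict


def check_column_present(data: Table, column: str, *, role: str) -> None:
    """Primitive: one failure mode (column missing), one error type."""
    if column not in data:
        raise ParameterError(f"{role} column {column!r} not found in the data.")


def check_column_has_no_nulls(data: Table, column: str, *, role: str) -> None:
    """Primitive: one failure mode (nulls present), one error type."""
    if any(value is None for value in data[column]):
        raise DataError(f"{role} column {column!r} contains missing values.")


def check_groupby_column(data: Table, group_by: str) -> None:
    """Compound check: a short fail-fast orchestration of the primitives."""
    check_column_present(data, group_by, role="Group by")
    check_column_has_no_nulls(data, group_by, role="Group by")


class Collector:
    """Hypothetical issue collector used by a preflight pass."""

    def __init__(self) -> None:
        self.issues = []

    def error(self, code: str, message: str) -> None:
        self.issues.append((code, message))


def preflight(data: Table, group_by: str, collector: Collector) -> None:
    """Preflight calls primitives directly so each issue code sits by its check."""
    try:
        check_column_present(data, group_by, role="Group by")
    except ParameterError as exc:
        collector.error("group_by_missing", str(exc))  # hypothetical issue code
        return
    try:
        check_column_has_no_nulls(data, group_by, role="Group by")
    except DataError as exc:
        collector.error("group_by_has_nulls", str(exc))  # hypothetical issue code
```

In this sketch a non-preflight caller would call `check_groupby_column` once and let the first exception propagate, while the preflight pass records an issue code beside each primitive instead of translating exceptions through a shared lookup table.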
Functions:
| Name | Description |
|---|---|
| `check_column_present` | Raise `ParameterError` if `column` is not in `data`'s columns. |
| `check_column_has_no_nulls` | Raise `DataError` if `column` contains any null values. |
| `check_no_pseudo_column_collision` | Validate that the reserved pseudo-group column name is not already in use. |
| `check_groupby_column` | Validate the configured group-by column exists and has no missing values. |
| `check_orderby_column` | Validate the configured order-by column exists. |
| `check_timestamp_column` | Validate the configured timestamp column exists and has no missing values. |
check_column_present(data, column, *, role, hint=None)
Raise ParameterError if column is not in data's columns.
role is an English label for the column's purpose (e.g.
"Group by", "Order by", "Timestamp") used in the error
message. hint is an optional trailing sentence callers can use
to tell the user how to resolve the error (e.g. "Please set X to
null to disable Y."). The "Group by" role has a built-in hint
for comma-in-name that always takes precedence over hint.
Source code in src/nemo_safe_synthesizer/data_processing/validation.py
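How `role`, `hint`, and the built-in "Group by" hint might combine can be sketched as below. This is an assumption-laden illustration: the `ParameterError` class is a stand-in, and the message and hint wording are invented here, since the actual text is not shown above. Only the precedence rule (the comma-in-name hint overrides the caller's `hint` for the "Group by" role) comes from the description.

```python
class ParameterError(ValueError):
    """Hypothetical stand-in for the library's ParameterError."""


def check_column_present(data, column, *, role, hint=None):
    """Raise ParameterError if `column` is not among `data`'s column names."""
    # `in` tests column names for both a dict of columns and a DataFrame.
    if column in data:
        return

    # Assumed message wording; only the use of `role` is documented.
    message = f"{role} column {column!r} not found in the input data."

    if role == "Group by" and "," in column:
        # The built-in hint replaces any caller-supplied hint (assumed wording).
        hint = "The column name contains a comma; a single column name is expected."

    if hint:
        message = f"{message} {hint}"
    raise ParameterError(message)
```

For example, calling this sketch with `role="Order by"` and `hint="Please set X to null to disable Y."` on a missing column yields a message ending in that hint, whereas a missing "Group by" column named `"a,b"` ends with the comma hint regardless of what the caller passed.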
check_column_has_no_nulls(data, column, *, role)
Raise DataError if column contains any null values.
The input must already be a DataFrame so null checks inspect real row contents instead of only column-name metadata.
check_no_pseudo_column_collision(data)
Validate that the reserved pseudo-group column name is not already in use.
Raises:
| Type | Description |
|---|---|
| `ParameterError` | If |
| `DataError` | If the reserved pseudo-group column is already present in |
check_groupby_column(data, group_by)
Validate the configured group-by column exists and has no missing values.
Raises:
| Type | Description |
|---|---|
| `ParameterError` | If |
| `DataError` | If |
check_orderby_column(data, order_by, *, is_timeseries=False, timestamp_column=None)
Validate the configured order-by column exists.
In time-series mode without an explicit timestamp column, ordering is deferred until preprocessing synthesizes a timestamp, so this check is skipped.
Raises:
| Type | Description |
|---|---|
| `ParameterError` | If |
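The skip rule for `check_orderby_column` can be sketched as a single early return. This is a minimal illustration rather than the actual source: `ParameterError` is a stand-in class, the message wording is invented, a dict of column lists stands in for the DataFrame, and `order_by` is assumed to be a plain column name.

```python
class ParameterError(ValueError):
    """Hypothetical stand-in for the library's ParameterError."""


def check_orderby_column(data, order_by, *, is_timeseries=False, timestamp_column=None):
    """Validate that `order_by` names an existing column, unless deferred."""
    if is_timeseries and timestamp_column is None:
        # Preprocessing will synthesize a timestamp later, so the ordering
        # column may not exist yet; nothing to validate at this point.
        return
    if order_by not in data:
        # Assumed message wording.
        raise ParameterError(f"Order by column {order_by!r} not found in the input data.")
```

In this sketch the same missing column passes silently when `is_timeseries=True` and no timestamp column is configured, but raises once a `timestamp_column` is supplied or time-series mode is off.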
check_timestamp_column(data, timestamp_column)
Validate the configured timestamp column exists and has no missing values.
Raises:
| Type | Description |
|---|---|
| `ParameterError` | If |
| `DataError` | If |