processors
Processors that parse raw LLM text into validated records.
Classes:
| Name | Description |
|---|---|
| `ParsedResponse` | Parsed result of a single LLM prompt response. |
| `Processor` | Abstract class for processing text generation results from the LLM. |
| `TabularDataProcessor` | Processor for standard (non-grouped, non-time-series) tabular data. |
| `TimeSeriesDataProcessor` | Processor for time-series data generation tasks. |
| `GroupedDataProcessor` | Processor for grouped data generation tasks. |
Functions:
| Name | Description |
|---|---|
| `create_processor` | Create the appropriate record processor for the current pipeline mode. |
ParsedResponse(valid_records, invalid_records, errors, prompt_number=None)
dataclass
Parsed result of a single LLM prompt response.
Attributes:
| Name | Type | Description |
|---|---|---|
| `valid_records` | `list[dict]` | Records that passed schema validation (as dicts). |
| `invalid_records` | `list[str]` | Raw text of records that failed validation. |
| `errors` | `list[tuple[str, str]]` | |
| `prompt_number` | `int \| None` | Index of the prompt in the batch. |
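
As an illustration of how these fields might be consumed, the sketch below defines a stand-in dataclass with the same four fields and computes the share of valid records across a batch. `ParsedResponseSketch` and `valid_fraction` are hypothetical names, not part of the library, and because the contents of each `errors` tuple are not described above, the sample values are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ParsedResponseSketch:
    """Stand-in mirroring the documented fields; not the library class."""

    valid_records: list[dict]
    invalid_records: list[str]
    errors: list[tuple[str, str]]
    prompt_number: int | None = None


def valid_fraction(responses: list[ParsedResponseSketch]) -> float:
    """Share of records across all responses that passed validation."""
    valid = sum(len(r.valid_records) for r in responses)
    total = valid + sum(len(r.invalid_records) for r in responses)
    return valid / total if total else 0.0


# Placeholder data: the meaning of the two strings in an errors tuple is assumed.
batch = [
    ParsedResponseSketch(
        valid_records=[{"id": 1}, {"id": 2}],
        invalid_records=['{"id": '],
        errors=[("placeholder", "placeholder")],
        prompt_number=0,
    ),
    ParsedResponseSketch(
        valid_records=[{"id": 3}],
        invalid_records=[],
        errors=[],
        prompt_number=1,
    ),
]

print(valid_fraction(batch))
```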
Processor(schema, config)
Bases: ABC
Abstract class for processing text generation results from the LLM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `schema` | `dict` | JSON schema as a dictionary. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | | The processor's name with spaces, for logging. |
Source code in src/nemo_safe_synthesizer/generation/processors.py
name
property
The processor's name with spaces, for logging.
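
The following is a minimal sketch of an abstract base class with a `name` property of this kind. It assumes the name is derived from the class name by inserting spaces before capital letters, which this page does not confirm. `ProcessorSketch`, `TabularSketch`, and the abstract `parse` method are hypothetical and only serve to make the example runnable.

```python
import re
from abc import ABC, abstractmethod


class ProcessorSketch(ABC):
    """Illustrative base class; not the library's Processor."""

    def __init__(self, schema: dict, config: object) -> None:
        self.schema = schema
        self.config = config

    @property
    def name(self) -> str:
        # Assumption: split the CamelCase class name into words for log output.
        return re.sub(r"(?<!^)(?=[A-Z])", " ", type(self).__name__)

    @abstractmethod
    def parse(self, text: str) -> list:
        """Hypothetical hook that subclasses must implement."""


class TabularSketch(ProcessorSketch):
    def parse(self, text: str) -> list:
        # Trivial placeholder: treat each non-empty line as one record.
        return [line for line in text.splitlines() if line.strip()]


processor = TabularSketch(schema={}, config=None)
print(processor.name)
```

Because `parse` is abstract, instantiating `ProcessorSketch` directly raises `TypeError`; only concrete subclasses can be created.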
TabularDataProcessor(schema, config)
Bases: Processor
Processor for standard (non-grouped, non-time-series) tabular data.
TimeSeriesDataProcessor(schema, config, time_column, interval_seconds, time_format)
Bases: Processor
Processor for time-series data generation tasks.
Validates chronological ordering and timestamp intervals in addition to the standard schema checks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `schema` | `dict` | JSON schema as a dictionary. | required |
| `config` | `ValidationParameters` | Validation parameters. | required |
| `time_column` | `str \| None` | Name of the timestamp column. | required |
| `interval_seconds` | `int \| None` | Expected interval between consecutive timestamps, or | required |
| `time_format` | `str \| None` | Timestamp format string ( | required |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If |
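
To make the chronological-ordering and interval checks concrete, here is a minimal sketch written independently of the library. It assumes that `time_format` is a `datetime.strptime`-style format string, that timestamps must be strictly increasing, and that an `interval_seconds` of `None` skips the interval check; none of these details are confirmed on this page. The function name `check_timestamps` is hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def check_timestamps(
    records: list[dict],
    time_column: str,
    interval_seconds: int | None,
    time_format: str,
) -> list[str]:
    """Return one message per ordering or interval violation (empty if none)."""
    times = [datetime.strptime(r[time_column], time_format) for r in records]
    errors = []
    for i in range(1, len(times)):
        delta = (times[i] - times[i - 1]).total_seconds()
        if delta <= 0:
            # Assumption: timestamps must be strictly increasing.
            errors.append(f"record {i}: timestamp is not after the previous one")
        elif interval_seconds is not None and delta != interval_seconds:
            errors.append(
                f"record {i}: interval {delta:.0f}s, expected {interval_seconds}s"
            )
    return errors


FMT = "%Y-%m-%d %H:%M:%S"
records = [
    {"ts": "2024-01-01 00:00:00"},
    {"ts": "2024-01-01 00:01:00"},
    {"ts": "2024-01-01 00:03:00"},  # 120 s after the previous record
]

for message in check_timestamps(records, "ts", 60, FMT):
    print(message)
```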
GroupedDataProcessor(schema, config, bos_token, eos_token, group_by, order_by=None)
Bases: Processor
Processor for grouped data generation tasks.
Used when training examples are grouped (and optionally ordered) by
one or more columns. Validates that each group has a unique
group_by value and respects the order_by ordering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `schema` | `dict` | JSON schema as a dictionary. | required |
| `config` | `ValidationParameters` | Validation parameters controlling tolerance for invalid records, non-unique group values, etc. | required |
| `bos_token` | `str` | Token delimiting the beginning of a group sequence. | required |
| `eos_token` | `str` | Token delimiting the end of a group sequence. | required |
| `group_by` | `str \| list[str]` | Column name, or list of column names, that defines groups. | required |
| `order_by` | `str \| None` | Column name to enforce ordering within a group, or | `None` |
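
The group checks described above can be sketched roughly as follows. This is an independent illustration, not the library's implementation: it assumes each group is a run of JSON Lines records between the BOS and EOS tokens, that "unique" means all records in a group share a single `group_by` value, and that `order_by` values must be non-decreasing. The helper names and the `<s>` / `</s>` tokens are hypothetical.

```python
from __future__ import annotations

import json


def split_groups(text: str, bos_token: str, eos_token: str) -> list[list[dict]]:
    """Extract complete BOS...EOS spans and parse each line as a JSON record."""
    groups = []
    for chunk in text.split(bos_token)[1:]:
        body, found, _ = chunk.partition(eos_token)
        if found:  # skip a trailing group that was never closed
            groups.append(
                [json.loads(line) for line in body.splitlines() if line.strip()]
            )
    return groups


def check_group(
    records: list[dict], group_by: str | list[str], order_by: str | None = None
) -> list[str]:
    """Return messages for group-value and ordering violations."""
    keys = [group_by] if isinstance(group_by, str) else list(group_by)
    errors = []
    # Assumption: every record in one group carries the same group_by value.
    if len({tuple(r[k] for k in keys) for r in records}) > 1:
        errors.append("group contains more than one group_by value")
    if order_by is not None:
        values = [r[order_by] for r in records]
        if any(a > b for a, b in zip(values, values[1:])):
            errors.append(f"records are not ordered by {order_by}")
    return errors


text = (
    '<s>{"user": "a", "t": 1}\n{"user": "a", "t": 2}</s>'
    '<s>{"user": "b", "t": 5}\n{"user": "c", "t": 3}</s>'
)
groups = split_groups(text, "<s>", "</s>")
for group in groups:
    print(check_group(group, "user", order_by="t"))
```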
create_processor(schema, metadata, config)
Create the appropriate record processor for the current pipeline mode.
Selects TimeSeriesDataProcessor, GroupedDataProcessor, or
TabularDataProcessor based on the pipeline configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `schema` | `dict` | JSON schema describing the expected record format. | required |
| `metadata` | `ModelMetadata` | Model metadata (prompt template, BOS/EOS tokens, etc.). | required |
| `config` | `SafeSynthesizerParameters` | Pipeline configuration determining the generation mode. | required |
Returns:
| Type | Description |
|---|---|
| `Processor` | Processor instance matching the configured generation mode. |
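
The selection logic can be pictured as a simple dispatch on the configured mode. The sketch below only illustrates that pattern: `ModeSketch` and its two fields are hypothetical stand-ins for whatever `SafeSynthesizerParameters` actually exposes, and the precedence (time series first, then grouped, then tabular) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModeSketch:
    """Hypothetical stand-in for the relevant parts of the pipeline config."""

    time_column: str | None = None
    group_by: str | None = None


def pick_processor_name(mode: ModeSketch) -> str:
    """Return the name of the processor class that would handle this mode."""
    if mode.time_column is not None:
        # Assumption: time-series settings take precedence over grouping.
        return "TimeSeriesDataProcessor"
    if mode.group_by is not None:
        return "GroupedDataProcessor"
    return "TabularDataProcessor"


print(pick_processor_name(ModeSketch()))
print(pick_processor_name(ModeSketch(group_by="user_id")))
print(pick_processor_name(ModeSketch(time_column="ts", group_by="user_id")))
```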