Processors
The processors module defines configuration objects for post-generation data transformations. Processors run after column generation and can modify the dataset schema or content before output.
Classes:
| Name | Description |
|---|---|
DropColumnsProcessorConfig |
Configuration for dropping columns from the output dataset. |
ProcessorConfig |
Abstract base class for all processor configuration types. |
ProcessorType |
Enumeration of available processor types. |
SchemaTransformProcessorConfig |
Configuration for transforming the dataset schema using Jinja2 templates. |
Functions:
| Name | Description |
|---|---|
get_processor_config_from_kwargs |
Create a processor configuration from a processor type and keyword arguments. |
DropColumnsProcessorConfig
Bases: ProcessorConfig
Configuration for dropping columns from the output dataset.
This processor removes specified columns from the generated dataset. The dropped
columns are saved separately in a dropped-columns directory for reference.
When this processor is added via the config builder, the corresponding column
configs are automatically marked with drop = True.
Alternatively, you can set drop = True when configuring a column.
Attributes:
| Name | Type | Description |
|---|---|---|
column_names |
list[str]
|
List of column names to remove from the output dataset. |
processor_type |
Literal[DROP_COLUMNS]
|
Discriminator field, always |
ProcessorConfig
Bases: ConfigBase, ABC
Abstract base class for all processor configuration types.
Processors are transformations that run before or after columns are generated. They can modify, reshape, or augment the dataset before it's saved.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Unique name of the processor, used to identify the processor in results and to name output artifacts on disk. |
build_stage |
BuildStage
|
The stage at which the processor runs. Currently only |
ProcessorType
Bases: str, Enum
Enumeration of available processor types.
Attributes:
| Name | Type | Description |
|---|---|---|
DROP_COLUMNS |
Processor that removes specified columns from the output dataset. |
|
SCHEMA_TRANSFORM |
Processor that creates a new dataset with a transformed schema using Jinja2 templates. |
SchemaTransformProcessorConfig
Bases: ProcessorConfig
Configuration for transforming the dataset schema using Jinja2 templates.
This processor creates a new dataset with a transformed schema. Each key in the
template becomes a column in the output, and values are Jinja2 templates that
can reference any column in the batch. The transformed dataset is written to
a processors-outputs/{processor_name}/ directory alongside the main dataset.
Attributes:
| Name | Type | Description |
|---|---|---|
template |
dict[str, Any]
|
Dictionary defining the output schema. Keys are new column names, values are Jinja2 templates (strings, lists, or nested structures). Must be JSON-serializable. |
processor_type |
Literal[SCHEMA_TRANSFORM]
|
Discriminator field, always |
get_processor_config_from_kwargs(processor_type, **kwargs)
Create a processor configuration from a processor type and keyword arguments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
processor_type
|
ProcessorType
|
The type of processor to create. |
required |
**kwargs
|
Any
|
Additional keyword arguments passed to the processor constructor. |
{}
|
Returns:
| Type | Description |
|---|---|
ProcessorConfig
|
A processor configuration object of the specified type. |
Source code in src/data_designer/config/processors.py
62 63 64 65 66 67 68 69 70 71 72 73 74 75 | |