Processors
The processors module defines configuration objects for post-generation data transformations. Processors run after column generation and can modify the dataset schema or content before output.
Classes:
| Name | Description |
|---|---|
DropColumnsProcessorConfig |
Drop columns from the output dataset (prefer |
ProcessorType |
Enumeration of available processor types. |
SchemaTransformProcessorConfig |
Configuration for transforming the dataset schema using Jinja2 templates. |
Functions:
| Name | Description |
|---|---|
get_processor_config_from_kwargs |
Create a processor configuration from a processor type and keyword arguments. |
DropColumnsProcessorConfig
Bases: ProcessorConfig
Drop columns from the output dataset (prefer drop=True in the column config).
This processor removes specified columns from the generated dataset. The dropped
columns are saved separately in a dropped-columns directory for reference.
When this processor is added via the config builder, the corresponding column
configs are automatically marked with drop = True.
Attributes:
| Name | Type | Description |
|---|---|---|
column_names |
required
|
List of column names to remove from the output dataset. |
Inherited Attributes
name (required): Name of the processor.
ProcessorType
Bases: str, Enum
Enumeration of available processor types.
Attributes:
| Name | Type | Description |
|---|---|---|
DROP_COLUMNS |
Processor that removes specified columns from the output dataset. |
|
SCHEMA_TRANSFORM |
Processor that creates a new dataset with a transformed schema using Jinja2 templates. |
SchemaTransformProcessorConfig
Bases: ProcessorConfig
Configuration for transforming the dataset schema using Jinja2 templates.
This processor creates a new dataset with a transformed schema. Each key in the
template becomes a column in the output, and values are Jinja2 templates that
can reference any column in the batch. The transformed dataset is written to
a processors-outputs/{processor_name}/ directory alongside the main dataset.
Attributes:
| Name | Type | Description |
|---|---|---|
template |
required
|
Dictionary defining the output schema. Keys are new column names, values are Jinja2 templates (strings, lists, or nested structures). Must be JSON-serializable. |
Inherited Attributes
name (required): Name of the processor.
get_processor_config_from_kwargs(processor_type, **kwargs)
Create a processor configuration from a processor type and keyword arguments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
processor_type
|
ProcessorType
|
The type of processor to create. |
required |
**kwargs
|
Any
|
Additional keyword arguments passed to the processor constructor. |
{}
|
Returns:
| Type | Description |
|---|---|
ProcessorConfig
|
A processor configuration object of the specified type. |
Source code in packages/data-designer-config/src/data_designer/config/processors.py
28 29 30 31 32 33 34 35 36 37 38 39 40 41 | |