Skip to content

Processors

The processors module defines configuration objects for post-generation data transformations. Processors run after column generation and can modify the dataset schema or content before output.

Classes:

Name Description
DropColumnsProcessorConfig

Drop columns from the output dataset (prefer drop=True in the column config).

ProcessorType

Enumeration of available processor types.

SchemaTransformProcessorConfig

Configuration for transforming the dataset schema using Jinja2 templates.

Functions:

Name Description
get_processor_config_from_kwargs

Create a processor configuration from a processor type and keyword arguments.

DropColumnsProcessorConfig

Bases: ProcessorConfig

Drop columns from the output dataset (prefer drop=True in the column config).

This processor removes specified columns from the generated dataset. The dropped columns are saved separately in a dropped-columns directory for reference. When this processor is added via the config builder, the corresponding column configs are automatically marked with drop = True.

Attributes:

Name Type Description
column_names required

List of column names to remove from the output dataset.

Inherited Attributes

name (required): Name of the processor.

ProcessorType

Bases: str, Enum

Enumeration of available processor types.

Attributes:

Name Type Description
DROP_COLUMNS

Processor that removes specified columns from the output dataset.

SCHEMA_TRANSFORM

Processor that creates a new dataset with a transformed schema using Jinja2 templates.

SchemaTransformProcessorConfig

Bases: ProcessorConfig

Configuration for transforming the dataset schema using Jinja2 templates.

This processor creates a new dataset with a transformed schema. Each key in the template becomes a column in the output, and values are Jinja2 templates that can reference any column in the batch. The transformed dataset is written to a processors-outputs/{processor_name}/ directory alongside the main dataset.

Attributes:

Name Type Description
template required

Dictionary defining the output schema. Keys are new column names, values are Jinja2 templates (strings, lists, or nested structures). Must be JSON-serializable.

Inherited Attributes

name (required): Name of the processor.

get_processor_config_from_kwargs(processor_type, **kwargs)

Create a processor configuration from a processor type and keyword arguments.

Parameters:

Name Type Description Default
processor_type ProcessorType

The type of processor to create.

required
**kwargs Any

Additional keyword arguments passed to the processor constructor.

{}

Returns:

Type Description
ProcessorConfig

A processor configuration object of the specified type.

Source code in packages/data-designer-config/src/data_designer/config/processors.py
28
29
30
31
32
33
34
35
36
37
38
39
40
41
def get_processor_config_from_kwargs(processor_type: ProcessorType, **kwargs: Any) -> ProcessorConfig:
    """Create a processor configuration from a processor type and keyword arguments.

    Args:
        processor_type: The type of processor to create.
        **kwargs: Additional keyword arguments passed to the processor constructor.

    Returns:
        A processor configuration object of the specified type.
    """
    if processor_type == ProcessorType.DROP_COLUMNS:
        return DropColumnsProcessorConfig(**kwargs)
    elif processor_type == ProcessorType.SCHEMA_TRANSFORM:
        return SchemaTransformProcessorConfig(**kwargs)