Custom Columns
Custom columns let you implement your own generation logic using Python functions. Use them for multi-step LLM workflows, external API integration, or any scenario requiring full programmatic control. For reusable, distributable components, see Plugins instead.
Quick Start
import data_designer.config as dd

@dd.custom_column_generator(required_columns=["name"])
def create_greeting(row: dict) -> dict:
    row["greeting"] = f"Hello, {row['name']}!"
    return row

config_builder.add_column(
    dd.CustomColumnConfig(
        name="greeting",
        generator_function=create_greeting,
    )
)
Function Signatures
Three signatures are supported. Parameter names are validated, so use the names shown here:
| Args | Signature | Use Case |
|---|---|---|
| 1 | fn(row) -> dict | Simple transforms |
| 2 | fn(row, generator_params) -> dict | With typed params |
| 3 | fn(row, generator_params, models) -> dict | LLM access via models dict |
For full_column strategy, use df instead of row.
For LLM access without params, use generator_params: None:
@dd.custom_column_generator(required_columns=["name"], model_aliases=["my-model"])
def generate_message(row: dict, generator_params: None, models: dict) -> dict:
    response, _ = models["my-model"].generate(prompt=f"Greet {row['name']}")
    row["greeting"] = response
    return row
Model aliases are validated before generation starts. If an alias doesn't exist in your config, an error is raised during the health check.
Generation Strategies
| Strategy | Input | Use Case |
|---|---|---|
| cell_by_cell (default) | row: dict | LLM calls, row-by-row logic |
| full_column | df: DataFrame | Vectorized DataFrame operations |
Recommendation: Use cell_by_cell for LLM calls. The framework handles parallelization automatically. Use full_column only for vectorized operations that don't involve LLM calls.
For full_column, set generation_strategy=dd.GenerationStrategy.FULL_COLUMN.
The Decorator
@dd.custom_column_generator(
    required_columns=["col1"],      # DAG ordering
    side_effect_columns=["extra"],  # Additional columns created
    model_aliases=["model1"],       # Required for LLM access
)
Models Dict
The third argument is a dict of ModelFacade instances, keyed by alias. Declare every model your generator uses in model_aliases: this populates the models dict and enables health checks before generation starts.
@dd.custom_column_generator(model_aliases=["my-model"])
def my_generator(row: dict, generator_params: None, models: dict) -> dict:
    model = models["my-model"]
    response, trace = model.generate(
        prompt="...",
        parser=my_custom_parser,  # optional, defaults to identity
        system_prompt="...",
        max_correction_steps=3,
    )
    row["result"] = response
    return row
This gives you direct access to all ModelFacade capabilities: custom parsers, correction loops, structured output, tool use, etc.
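As an illustration of the parser argument, here is a minimal sketch of a custom parser that pulls a JSON object out of a model response. It assumes the parser is a plain callable that receives the raw response text and returns the parsed value; the exact parser contract is not spelled out on this page, and the name parse_json_object is hypothetical.

```python
import json
import re


def parse_json_object(raw: str) -> dict:
    """Extract a JSON object from a model response.

    Assumption: the parser is called with the raw response text, and its
    return value becomes the `response` returned by generate().
    """
    # Models often wrap JSON in surrounding prose, so take the span from the
    # first "{" to the last "}" instead of parsing the whole string.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model response")
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on bad JSON.
    return json.loads(match.group(0))
```

Under that assumption, it would be passed as parser=parse_json_object in the model.generate call above. Raising ValueError for unparseable text keeps failures explicit rather than returning partial data.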
Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Column name |
| generator_function | Callable | Yes | Decorated function |
| generation_strategy | GenerationStrategy | No | CELL_BY_CELL or FULL_COLUMN |
| generator_params | BaseModel | No | Typed params passed to function |
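The two-argument signature with generator_params can be sketched as follows. GreetingParams is a hypothetical params class. It uses a standard-library dataclass as a stand-in so the sketch runs without extra dependencies, whereas a real config would subclass Pydantic's BaseModel, as the table indicates. The decorator is omitted here for the same reason.

```python
from dataclasses import dataclass


@dataclass
class GreetingParams:
    # Hypothetical params; a real config would subclass pydantic.BaseModel.
    salutation: str = "Hello"
    punctuation: str = "!"


# In a real pipeline this function would be decorated with
# @dd.custom_column_generator(required_columns=["name"]).
def create_greeting(row: dict, generator_params: GreetingParams) -> dict:
    # Read settings from the typed params object instead of hard-coding them.
    row["greeting"] = (
        f"{generator_params.salutation}, {row['name']}{generator_params.punctuation}"
    )
    return row


row = create_greeting({"name": "Alice"}, GreetingParams(salutation="Welcome"))
print(row["greeting"])  # Welcome, Alice!
```

In a column config, the params instance would be supplied through the generator_params field of dd.CustomColumnConfig, alongside name and generator_function.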
Multi-Turn Example
@dd.custom_column_generator(
    required_columns=["topic"],
    side_effect_columns=["draft", "critique"],
    model_aliases=["writer", "editor"],
)
def writer_editor(row: dict, generator_params: None, models: dict) -> dict:
    draft, _ = models["writer"].generate(prompt=f"Write about '{row['topic']}'")
    critique, _ = models["editor"].generate(prompt=f"Critique: {draft}")
    revised, _ = models["writer"].generate(prompt=f"Revise based on: {critique}\n\nOriginal: {draft}")
    row["final_text"] = revised
    row["draft"] = draft
    row["critique"] = critique
    return row
Development Testing
Test generators with real LLM calls without running the full pipeline:
data_designer = DataDesigner()
models = data_designer.get_models(["my-model"])
result = my_generator({"name": "Alice"}, None, models)
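For checks that should not reach a real endpoint, one option is to pass a stub in place of the models dict. The sketch below assumes only what the examples on this page show: each model exposes generate(prompt=...) and returns a (response, trace) tuple. StubModel is a hypothetical helper, not part of the library, and the generator is an undecorated copy so the sketch stays standalone.

```python
class StubModel:
    """Hypothetical stand-in for a ModelFacade; not part of the library."""

    def __init__(self, reply: str):
        self.reply = reply
        self.prompts = []  # record prompts so tests can inspect them

    def generate(self, prompt: str, **kwargs):
        # Mirror the (response, trace) tuple shape used in the examples above.
        self.prompts.append(prompt)
        return self.reply, None


# Undecorated copy of a three-argument generator from earlier on this page.
def generate_message(row: dict, generator_params: None, models: dict) -> dict:
    response, _ = models["my-model"].generate(prompt=f"Greet {row['name']}")
    row["greeting"] = response
    return row


stub = StubModel(reply="Hi, Alice!")
result = generate_message({"name": "Alice"}, None, {"my-model": stub})
```

Because the stub records every prompt it receives, a test can assert on both the output row and the prompt text the generator built.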