Experimental Feature
The plugin system is currently experimental and under active development. The documentation, examples, and plugin interface are subject to significant changes in future releases. If you encounter any issues, have questions, or have ideas for improvement, please consider starting a discussion on GitHub.
Example Plugin: Index Multiplier
In this guide, we will build a simple plugin that generates values by multiplying the row index by a user-specified multiplier. Admittedly, not the most useful plugin, but it demonstrates the required steps 😜.
A Data Designer plugin is implemented as a Python package with three main components:
- Configuration Class: Defines the parameters users can configure
- Task Class: Contains the core implementation of the plugin
- Plugin Object: Connects the config and task classes to make the plugin discoverable
Let's build the data-designer-index-multiplier plugin step by step.
Step 1: Create a Python package
Data Designer plugins are implemented as Python packages. We recommend using a standard structure for your plugin package.
For example, here is the structure of a data-designer-index-multiplier plugin:
data-designer-index-multiplier/
├── pyproject.toml
└── src/
└── data_designer_index_multiplier/
├── __init__.py
└── plugin.py
Step 2: Create the config class
The configuration class defines what parameters users can set when using your plugin. For column generator plugins, it must inherit from SingleColumnConfig and include a discriminator field.
from typing import Literal
from data_designer.config.column_configs import SingleColumnConfig
class IndexMultiplierColumnConfig(SingleColumnConfig):
"""Configuration for the index multiplier column generator."""
# Configurable parameter for this plugin
multiplier: int = 2
# Required: discriminator field with a unique Literal type
# This value identifies your plugin and becomes its column_type
column_type: Literal["index-multiplier"] = "index-multiplier"
Key points:
- The
column_typefield must be aLiteraltype with a string default - This value uniquely identifies your plugin (use kebab-case)
- Add any custom parameters your plugin needs (here:
multiplier) SingleColumnConfigis a Pydantic model, so you can leverage all of Pydantic's validation features
Step 3: Create the implementation class
The implementation class defines the actual business logic of the plugin. For column generator plugins, it inherits from ColumnGenerator and must implement a metadata static method and generate method:
import logging
import pandas as pd
from data_designer.engine.column_generators.generators.base import ColumnGeneratorFullColumn, GenerationStrategy
# Data Designer uses the standard Python logging module for logging
logger = logging.getLogger(__name__)
class IndexMultiplierColumnGenerator(ColumnGeneratorFullColumn[IndexMultiplierColumnConfig]):
def generate(self, data: pd.DataFrame) -> pd.DataFrame:
"""Generate the column data.
Args:
data: The current DataFrame being built
Returns:
The DataFrame with the new column added
"""
logger.info(
f"Generating column {self.config.name} "
f"with multiplier {self.config.multiplier}"
)
# Access config via self.config
data[self.config.name] = data.index * self.config.multiplier
return data
Key points:
- Generic type
ColumnGeneratorFullColumn[IndexMultiplierColumnConfig]connects the task to its config - You have access to the configuration parameters via
self.config
Understanding generation_strategy
The generation_strategy specifies how the column generator will generate data.
-
FULL_COLUMN: Generates the full column (at the batch level) in a single call togenerategeneratemust take as input apd.DataFramewith all previous columns and return apd.DataFramewith the generated column appended.- Inherit from
ColumnGeneratorFullColumnfor this strategy, as we do in the example above.
-
CELL_BY_CELL: Generates one cell at a timegeneratemust take as input adictwith key/value pairs for all previous columns and return adictwith an additional key/value for the generated cell- Supports concurrent workers via a
max_parallel_requestsparameter on the configuration - Inherit from
ColumnGeneratorCellByCellfor this strategy.
Step 4: Create the plugin object
Create a Plugin object that makes the plugin discoverable and connects the task and config classes.
from data_designer.plugins import Plugin, PluginType
# Plugin instance - this is what gets loaded via entry point
plugin = Plugin(
impl_qualified_name="data_designer_index_multiplier.plugin.IndexMultiplierColumnGenerator",
config_qualified_name="data_designer_index_multiplier.plugin.IndexMultiplierColumnConfig",
plugin_type=PluginType.COLUMN_GENERATOR,
emoji="🔌",
)
Complete plugin code
Pulling it all together, here is the complete plugin code for src/data_designer_index_multiplier/plugin.py:
import logging
from typing import Literal
import pandas as pd
from data_designer.config.column_configs import SingleColumnConfig
from data_designer.engine.column_generators.generators.base import ColumnGeneratorFullColumn
from data_designer.plugins import Plugin, PluginType
# Data Designer uses the standard Python logging module for logging
logger = logging.getLogger(__name__)
class IndexMultiplierColumnConfig(SingleColumnConfig):
"""Configuration for the index multiplier column generator."""
# Configurable parameter for this plugin
multiplier: int = 2
# Required: discriminator field with a unique Literal type
# This value identifies your plugin and becomes its column_type
column_type: Literal["index-multiplier"] = "index-multiplier"
class IndexMultiplierColumnGenerator(ColumnGeneratorFullColumn[IndexMultiplierColumnConfig]):
def generate(self, data: pd.DataFrame) -> pd.DataFrame:
"""Generate the column data.
Args:
data: The current DataFrame being built
Returns:
The DataFrame with the new column added
"""
logger.info(
f"Generating column {self.config.name} "
f"with multiplier {self.config.multiplier}"
)
# Access config via self.config
data[self.config.name] = data.index * self.config.multiplier
return data
# Plugin instance - this is what gets loaded via entry point
plugin = Plugin(
impl_qualified_name="data_designer_index_multiplier.plugin.IndexMultiplierColumnGenerator",
config_qualified_name="data_designer_index_multiplier.plugin.IndexMultiplierColumnConfig",
plugin_type=PluginType.COLUMN_GENERATOR,
emoji="🔌",
)
Step 5: Package your plugin
Create a pyproject.toml file to define your package and register the entry point:
[project]
name = "data-designer-index-multiplier"
version = "1.0.0"
description = "Data Designer index multiplier plugin"
requires-python = ">=3.10"
dependencies = [
"data-designer",
]
# Register this plugin via entry points
[project.entry-points."data_designer.plugins"]
index-multiplier = "data_designer_index_multiplier.plugin:plugin"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["src/data_designer_index_multiplier"]
Entry Point Registration
Plugins are discovered automatically using Python entry points. It is important to register your plugin as an entry point under the data_designer.plugins group.
The entry point format is:
[project.entry-points."data_designer.plugins"]
<entry-point-name> = "<module.path>:<plugin-instance-name>"
Step 6: Use your plugin
Install your plugin in editable mode for testing:
# From the plugin directory
uv pip install -e .
Once installed, your plugin works just like built-in column types:
from data_designer_index_multiplier.plugin import IndexMultiplierColumnConfig
import data_designer.config as dd
from data_designer.interface import DataDesigner
data_designer = DataDesigner()
builder = dd.DataDesignerConfigBuilder()
# Add a regular column
builder.add_column(
dd.SamplerColumnConfig(
name="category",
sampler_type="category",
params=dd.CategorySamplerParams(values=["A", "B", "C"]),
)
)
# Add your custom plugin column
builder.add_column(
IndexMultiplierColumnConfig(
name="v",
multiplier=5,
)
)
# Generate data
results = data_designer.create(builder, num_records=10)
print(results.load_dataset())
Output:
category multiplied-index
0 B 0
1 A 5
2 C 10
3 A 15
4 B 20
...
That's it! You have now created and used your first Data Designer plugin. The last step is to package your plugin and share it with the community 🚀