Data Designer's Config Builder

The config_builder module provides a high-level interface for constructing Data Designer configurations through the DataDesignerConfigBuilder class, enabling programmatic creation of DataDesignerConfig objects by incrementally adding column configurations, constraints, processors, and profilers.

You can use the builder to create Data Designer configurations from scratch or from existing configurations stored in YAML/JSON files via from_config(). The builder includes validation capabilities to catch configuration errors early and can work with seed datasets from local sources or external datastores. Once configured, use build() to generate the final configuration object or write_config() to serialize it to disk.

Model configs are required

DataDesignerConfigBuilder needs model configurations at initialization. Pass a list of ModelConfig objects or a path to a model configuration file, or leave model_configs as None to use the default model configurations in local mode. These configurations tell the builder which model aliases can be referenced by LLM-generated columns (such as LLMTextColumnConfig, LLMCodeColumnConfig, LLMStructuredColumnConfig, and LLMJudgeColumnConfig). Each model configuration specifies the model alias, model provider, model ID, and inference parameters used during data generation.

Classes:

| Name | Description |
| --- | --- |
| BuilderConfig | Configuration container for Data Designer builder. |
| DataDesignerConfigBuilder | Config builder for Data Designer configurations. |

BuilderConfig

Bases: ExportableConfigBase

Configuration container for Data Designer builder.

This class holds the main Data Designer configuration along with optional datastore settings needed for seed dataset operations.

Attributes:

Name Type Description
data_designer DataDesignerConfig

The main Data Designer configuration containing columns, constraints, profilers, and other settings.

DataDesignerConfigBuilder(model_configs=None, tool_configs=None)

Config builder for Data Designer configurations.

This class provides a high-level interface for building Data Designer configurations.

Initialize a new DataDesignerConfigBuilder instance.

Parameters:

Name Type Description Default
model_configs list[ModelConfig] | str | Path | None

Model configurations. Can be None (to use the default model configurations in local mode), a list of ModelConfig objects, or a string or Path to a model configuration file.

None
tool_configs list[ToolConfig] | None

Tool configurations for MCP tool calling. Can be None (if no tool configs are needed) or a list of ToolConfig objects.

None

Methods:

| Name | Description |
| --- | --- |
| add_column | Add a Data Designer column configuration to the current Data Designer configuration. |
| add_constraint | Add a constraint to the current Data Designer configuration. |
| add_model_config | Add a model configuration to the current Data Designer configuration. |
| add_processor | Add a processor to the current Data Designer configuration. |
| add_profiler | Add a profiler to the current Data Designer configuration. |
| add_tool_config | Add a tool configuration to the current Data Designer configuration. |
| build | Build a DataDesignerConfig instance based on the current builder configuration. |
| delete_column | Delete the column with the given name. |
| delete_constraints | Delete all constraints for the given target column. |
| delete_model_config | Delete a model configuration from the current Data Designer configuration by alias. |
| delete_tool_config | Delete a tool configuration from the current Data Designer configuration by alias. |
| from_config | Create a DataDesignerConfigBuilder from an existing configuration. |
| get_builder_config | Get the builder config for the current Data Designer configuration. |
| get_column_config | Get a column configuration by name. |
| get_column_configs | Get all column configurations. |
| get_columns_excluding_type | Get all column configurations excluding the specified type. |
| get_columns_of_type | Get all column configurations of the specified type. |
| get_constraints | Get all constraints for the given target column. |
| get_processor_configs | Get processor configuration objects. |
| get_profilers | Get all profilers. |
| get_seed_config | Get the seed config for the current Data Designer configuration. |
| get_tool_config | Get a tool configuration by alias. |
| num_columns_of_type | Get the count of columns of the specified type. |
| with_seed_dataset | Add a seed dataset to the current Data Designer configuration. |
| write_config | Write the current configuration to a file. |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| allowed_references | list[str] | Get all referenceable variables allowed in prompt templates and expressions. |
| info | ConfigBuilderInfo | Get the ConfigBuilderInfo object for this builder. |
| model_configs | list[ModelConfig] | Get the model configurations for this builder. |
| tool_configs | list[ToolConfig] | Get the tool configurations for this builder. |

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def __init__(
    self,
    model_configs: list[ModelConfig] | str | Path | None = None,
    tool_configs: list[ToolConfig] | None = None,
):
    """Initialize a new DataDesignerConfigBuilder instance.

    Args:
        model_configs: Model configurations. Can be:
            - None to use default model configurations in local mode
            - A list of ModelConfig objects
            - A string or Path to a model configuration file
        tool_configs: Tool configurations for MCP tool calling. Can be:
            - None if no tool configs are needed
            - A list of ToolConfig objects
    """
    self._column_configs = {}
    self._model_configs = _load_model_configs(model_configs)
    self._tool_configs: list[ToolConfig] = tool_configs or []
    self._processor_configs: list[ProcessorConfigT] = []
    self._seed_config: SeedConfig | None = None
    self._constraints: list[ColumnConstraintT] = []
    self._profilers: list[ColumnProfilerConfigT] = []

allowed_references property

Get all referenceable variables allowed in prompt templates and expressions.

This includes all column names and their side-effect columns that can be referenced in prompt templates and expressions within the configuration.

Returns:

Type Description
list[str]

A list of variable names that can be referenced in templates and expressions.

info property

Get the ConfigBuilderInfo object for this builder.

Returns:

Type Description
ConfigBuilderInfo

An object containing information about the configuration.

model_configs property

Get the model configurations for this builder.

Returns:

Type Description
list[ModelConfig]

A list of ModelConfig objects used for data generation.

tool_configs property

Get the tool configurations for this builder.

Returns:

Type Description
list[ToolConfig]

A list of ToolConfig objects used for MCP tool calling.

add_column(column_config=None, *, name=None, column_type=None, **kwargs)

Add a Data Designer column configuration to the current Data Designer configuration.

If no column config object is provided, you must provide the name, column_type, and any additional keyword arguments that are required by the column config constructor.

Parameters:

Name Type Description Default
column_config ColumnConfigT | None

Data Designer column config object to add.

None
name str | None

Name of the column to add. This is only used if column_config is not provided.

None
column_type DataDesignerColumnType | None

Column type to add. This is only used if column_config is not provided.

None
**kwargs

Additional keyword arguments to pass to the column constructor.

{}

Returns:

Type Description
Self

The current Data Designer config builder instance.

Raises:

Type Description
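BuilderConfigurationError

If neither column_config nor both name and column_type are provided.

InvalidColumnTypeError

If the resulting column config is not one of the supported column config types.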

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def add_column(
    self,
    column_config: ColumnConfigT | None = None,
    *,
    name: str | None = None,
    column_type: DataDesignerColumnType | None = None,
    **kwargs,
) -> Self:
    """Add a Data Designer column configuration to the current Data Designer configuration.

    If no column config object is provided, you must provide the `name`, `column_type`, and any
    additional keyword arguments that are required by the column config constructor.

    Args:
        column_config: Data Designer column config object to add.
        name: Name of the column to add. This is only used if `column_config` is not provided.
        column_type: Column type to add. This is only used if `column_config` is not provided.
        **kwargs: Additional keyword arguments to pass to the column constructor.

    Returns:
        The current Data Designer config builder instance.

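    Raises:
        BuilderConfigurationError: If neither `column_config` nor both `name` and `column_type` are provided.
        InvalidColumnTypeError: If the resulting column config is not a supported column config type.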
    """
    if column_config is None:
        if name is None or column_type is None:
            raise BuilderConfigurationError(
                "🛑 You must provide either a 'column_config' object or 'name' *and* 'column_type' "
                f"with additional keyword arguments. You provided {column_config=}, {name=}, and {column_type=}."
            )
        column_config = get_column_config_from_kwargs(name=name, column_type=column_type, **kwargs)

    allowed_column_configs = ColumnConfigT.__args__
    if not any(isinstance(column_config, t) for t in allowed_column_configs):
        raise InvalidColumnTypeError(
            f"🛑 Invalid column config object: '{column_config}'. Valid column config options are: "
            f"{', '.join([t.__name__ for t in allowed_column_configs])}"
        )

    self._column_configs[column_config.name] = column_config
    return self
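
The two call styles can be illustrated with a minimal sketch. It uses hypothetical stand-ins (`TinyBuilder`, `SamplerColumn`, `make_column`) instead of the real Data Designer classes, and the keyword arguments (`values`, `low`, `high`) are invented for illustration. Only the control flow follows the source above: build a config from keyword arguments when no object is given, store it under its name, and return the builder.

```python
from dataclasses import dataclass, field


@dataclass
class SamplerColumn:
    """Hypothetical stand-in for a real column config class."""
    name: str
    column_type: str = "sampler"
    params: dict = field(default_factory=dict)


def make_column(name: str, column_type: str, **kwargs) -> SamplerColumn:
    """Hypothetical stand-in for get_column_config_from_kwargs."""
    return SamplerColumn(name=name, column_type=column_type, params=kwargs)


class TinyBuilder:
    def __init__(self) -> None:
        self._column_configs = {}

    def add_column(self, column_config=None, *, name=None, column_type=None, **kwargs):
        if column_config is None:
            if name is None or column_type is None:
                raise ValueError("provide column_config, or name and column_type")
            column_config = make_column(name, column_type, **kwargs)
        # Columns are keyed by name, so re-adding a name replaces the earlier entry.
        self._column_configs[column_config.name] = column_config
        return self  # returning self is what allows chaining


builder = (
    TinyBuilder()
    .add_column(SamplerColumn(name="age"))                                    # object style
    .add_column(name="city", column_type="sampler", values=["Oslo", "Lima"])  # keyword style
    .add_column(name="age", column_type="sampler", low=18, high=65)           # replaces "age"
)
print(list(builder._column_configs))  # ['age', 'city']
```

Because the columns live in a dictionary keyed by name, adding a column whose name already exists overwrites the earlier configuration rather than raising.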

add_constraint(constraint=None, *, constraint_type=None, **kwargs)

Add a constraint to the current Data Designer configuration.

Currently, constraints are only supported for numerical samplers.

You can either provide a constraint object directly, or provide a constraint type and additional keyword arguments to construct the constraint object. Valid constraint types are:

- "scalar_inequality": Constraint between a column and a scalar value.
- "column_inequality": Constraint between two columns.

Parameters:

Name Type Description Default
constraint ColumnConstraintT | None

Constraint object to add.

None
constraint_type ConstraintType | None

Constraint type to add. Ignored when constraint is provided.

None
**kwargs

Additional keyword arguments to pass to the constraint constructor.

{}

Returns:

Type Description
Self

The current Data Designer config builder instance.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def add_constraint(
    self,
    constraint: ColumnConstraintT | None = None,
    *,
    constraint_type: ConstraintType | None = None,
    **kwargs,
) -> Self:
    """Add a constraint to the current Data Designer configuration.

    Currently, constraints are only supported for numerical samplers.

    You can either provide a constraint object directly, or provide a constraint type and
    additional keyword arguments to construct the constraint object. Valid constraint types are:
        - "scalar_inequality": Constraint between a column and a scalar value.
        - "column_inequality": Constraint between two columns.

    Args:
        constraint: Constraint object to add.
        constraint_type: Constraint type to add. Ignored when `constraint` is provided.
        **kwargs: Additional keyword arguments to pass to the constraint constructor.

    Returns:
        The current Data Designer config builder instance.
    """
    if constraint is None:
        if constraint_type is None:
            raise BuilderConfigurationError(
                "🛑 You must provide either a 'constraint' object or 'constraint_type' "
                "with additional keyword arguments."
            )
        try:
            constraint_type = ConstraintType(constraint_type)
        except Exception:
            raise BuilderConfigurationError(
                f"🛑 Invalid constraint type: {constraint_type}. Valid options are: "
                f"{', '.join([t.value for t in ConstraintType])}"
            )
        if constraint_type == ConstraintType.SCALAR_INEQUALITY:
            constraint = ScalarInequalityConstraint(**kwargs)
        elif constraint_type == ConstraintType.COLUMN_INEQUALITY:
            constraint = ColumnInequalityConstraint(**kwargs)

    allowed_constraint_types = ColumnConstraintT.__args__
    if not any(isinstance(constraint, t) for t in allowed_constraint_types):
        raise BuilderConfigurationError(
            "🛑 Invalid constraint object. Valid constraint options are: "
            f"{', '.join([t.__name__ for t in allowed_constraint_types])}"
        )

    self._constraints.append(constraint)
    return self
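
The type coercion and dispatch in the source above can be sketched in isolation. Everything here is a hypothetical stand-in: `ConstraintKind` plays the role of ConstraintType, and the field names on `ScalarRule` and `ColumnRule` (`operator`, `rhs`) are assumptions, not the real constraint constructors' parameters.

```python
from dataclasses import dataclass
from enum import Enum


class ConstraintKind(str, Enum):
    """Hypothetical stand-in for ConstraintType."""
    SCALAR_INEQUALITY = "scalar_inequality"
    COLUMN_INEQUALITY = "column_inequality"


@dataclass
class ScalarRule:
    target_column: str
    operator: str
    rhs: float


@dataclass
class ColumnRule:
    target_column: str
    operator: str
    rhs: str


def make_constraint(constraint_type, **kwargs):
    try:
        # Calling the enum accepts either a member or its string value.
        kind = ConstraintKind(constraint_type)
    except ValueError as exc:
        valid = ", ".join(k.value for k in ConstraintKind)
        raise ValueError(
            f"Invalid constraint type: {constraint_type}. Valid options are: {valid}"
        ) from exc
    if kind is ConstraintKind.SCALAR_INEQUALITY:
        return ScalarRule(**kwargs)
    return ColumnRule(**kwargs)


rule = make_constraint("scalar_inequality", target_column="age", operator="ge", rhs=18)
print(rule)
```

Passing the plain string and passing the enum member resolve to the same constraint class, which is why the keyword style works with either.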

add_model_config(model_config)

Add a model configuration to the current Data Designer configuration.

Parameters:

Name Type Description Default
model_config ModelConfig

The model configuration to add.

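required

Returns:

Type Description
Self

The current Data Designer config builder instance.

Raises:

Type Description
BuilderConfigurationError

If a model configuration with the same alias already exists.
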
Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def add_model_config(self, model_config: ModelConfig) -> Self:
    """Add a model configuration to the current Data Designer configuration.

    Args:
        model_config: The model configuration to add.
    """
    if model_config.alias in [mc.alias for mc in self._model_configs]:
        raise BuilderConfigurationError(
            f"🛑 Model configuration with alias {model_config.alias} already exists. Please delete the existing model configuration or choose a different alias."
        )
    self._model_configs.append(model_config)
    return self
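
Since a duplicate alias raises instead of overwriting, replacing a model configuration takes a delete followed by an add. A minimal sketch of that pattern, using hypothetical stand-ins (`Model`, `Registry`) rather than ModelConfig and the real builder:

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical stand-in for ModelConfig; only the alias matters here."""
    alias: str
    model: str


class Registry:
    def __init__(self) -> None:
        self._models = []

    def add(self, cfg: Model) -> "Registry":
        if cfg.alias in [m.alias for m in self._models]:
            raise ValueError(f"Model configuration with alias {cfg.alias} already exists.")
        self._models.append(cfg)
        return self

    def delete(self, alias: str) -> "Registry":
        self._models = [m for m in self._models if m.alias != alias]
        return self


reg = Registry().add(Model("text", "model-a"))
try:
    reg.add(Model("text", "model-b"))  # same alias: rejected
except ValueError as err:
    print(err)

# Swap the configuration behind an alias: delete first, then add.
reg.delete("text").add(Model("text", "model-b"))
```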

add_processor(processor_config=None, *, processor_type=None, **kwargs)

Add a processor to the current Data Designer configuration.

If a processor with the same name already exists, it is replaced (upsert), making notebook cells safely re-runnable.

You can either provide a processor config object directly, or provide a processor type and additional keyword arguments to construct the processor config object.

Parameters:

Name Type Description Default
processor_config ProcessorConfigT | None

The processor configuration object to add.

None
processor_type ProcessorType | None

The type of processor to add.

None
**kwargs

Additional keyword arguments to pass to the processor constructor.

{}

Returns:

Type Description
Self

The current Data Designer config builder instance.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def add_processor(
    self,
    processor_config: ProcessorConfigT | None = None,
    *,
    processor_type: ProcessorType | None = None,
    **kwargs,
) -> Self:
    """Add a processor to the current Data Designer configuration.

    If a processor with the same name already exists, it is replaced (upsert),
    making notebook cells safely re-runnable.

    You can either provide a processor config object directly, or provide a processor type and
    additional keyword arguments to construct the processor config object.

    Args:
        processor_config: The processor configuration object to add.
        processor_type: The type of processor to add.
        **kwargs: Additional keyword arguments to pass to the processor constructor.

    Returns:
        The current Data Designer config builder instance.
    """
    if processor_config is None:
        if processor_type is None:
            raise BuilderConfigurationError(
                "🛑 You must provide either a 'processor_config' object or 'processor_type' "
                "with additional keyword arguments."
            )
        processor_config = get_processor_config_from_kwargs(processor_type=processor_type, **kwargs)

    self._remove_processor_by_name(processor_config.name)

    # Checks elsewhere fail if DropColumnsProcessor drops a column but it is not marked for drop
    if processor_config.processor_type == ProcessorType.DROP_COLUMNS:
        for col in self._resolve_drop_column_names(processor_config.column_names):
            self._column_configs[col].drop = True

    self._processor_configs.append(processor_config)
    return self
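
The upsert behavior can be shown with a small sketch. `Proc` and `ProcList` are hypothetical stand-ins, and the name-based filter is an assumption about what `_remove_processor_by_name` does, since its body is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Hypothetical stand-in for a processor config."""
    name: str
    template: str


class ProcList:
    def __init__(self) -> None:
        self._procs = []

    def add(self, proc: Proc) -> "ProcList":
        # Drop any same-named entry first, then append: an upsert.
        self._procs = [p for p in self._procs if p.name != proc.name]
        self._procs.append(proc)
        return self


procs = ProcList()
for _ in range(3):  # simulates re-running the same notebook cell
    procs.add(Proc(name="fmt", template="{{ a }}"))
procs.add(Proc(name="fmt", template="{{ b }}"))  # later definition wins
print(procs._procs)
```

Under this remove-then-append scheme a replaced processor moves to the end of the list, which may matter if processor order is significant.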

add_profiler(profiler_config)

Add a profiler to the current Data Designer configuration.

Parameters:

Name Type Description Default
profiler_config ColumnProfilerConfigT

The profiler configuration object to add.

required

Returns:

Type Description
Self

The current Data Designer config builder instance.

Raises:

Type Description
BuilderConfigurationError

If the profiler configuration is of an invalid type.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def add_profiler(self, profiler_config: ColumnProfilerConfigT) -> Self:
    """Add a profiler to the current Data Designer configuration.

    Args:
        profiler_config: The profiler configuration object to add.

    Returns:
        The current Data Designer config builder instance.

    Raises:
        BuilderConfigurationError: If the profiler configuration is of an invalid type.
    """
    if not isinstance(profiler_config, ColumnProfilerConfigT):
        if hasattr(ColumnProfilerConfigT, "__args__"):
            valid_options = ", ".join([t.__name__ for t in ColumnProfilerConfigT.__args__])
        else:
            valid_options = ColumnProfilerConfigT.__name__
        raise BuilderConfigurationError(f"🛑 Invalid profiler object. Valid profiler options are: {valid_options}")
    self._profilers.append(profiler_config)
    return self
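
The `hasattr(..., "__args__")` branch in the source above exists because a type alias may be either a union of several classes or a single class. The sketch below illustrates that distinction with `typing.get_args`; the profiler class names are hypothetical.

```python
from typing import Union, get_args


class StatsProfiler: ...


class JudgeProfiler: ...


# Hypothetical alias: a union exposes its member types, a bare class does not.
ProfilerT = Union[StatsProfiler, JudgeProfiler]


def valid_options(alias) -> str:
    """Return a comma-separated list of the class names an alias allows."""
    args = get_args(alias)
    if args:
        return ", ".join(t.__name__ for t in args)
    return alias.__name__  # the alias is a single class


print(valid_options(ProfilerT))             # StatsProfiler, JudgeProfiler
print(valid_options(Union[StatsProfiler]))  # StatsProfiler
```

A one-member `Union` collapses to the class itself, so the fallback branch is what keeps the error message working when only one profiler type is defined.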

add_tool_config(tool_config)

Add a tool configuration to the current Data Designer configuration.

Parameters:

Name Type Description Default
tool_config ToolConfig

The tool configuration to add.

required

Returns:

Type Description
Self

The current Data Designer config builder instance.

Raises:

Type Description
BuilderConfigurationError

If a tool configuration with the same alias already exists.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def add_tool_config(self, tool_config: ToolConfig) -> Self:
    """Add a tool configuration to the current Data Designer configuration.

    Args:
        tool_config: The tool configuration to add.

    Returns:
        The current Data Designer config builder instance.

    Raises:
        BuilderConfigurationError: If a tool configuration with the same alias already exists.
    """
    if tool_config.tool_alias in {tc.tool_alias for tc in self._tool_configs}:
        raise BuilderConfigurationError(
            f"Tool configuration with alias {tool_config.tool_alias} already exists. "
            "Please delete the existing tool configuration or choose a different alias."
        )
    self._tool_configs.append(tool_config)
    return self

build()

Build a DataDesignerConfig instance based on the current builder configuration.

Returns:

Type Description
DataDesignerConfig

The current Data Designer config object.

Raises:

Type Description
BuilderConfigurationError

If any ToolConfig has duplicate tool names in its allow_tools list.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def build(self) -> DataDesignerConfig:
    """Build a DataDesignerConfig instance based on the current builder configuration.

    Returns:
        The current Data Designer config object.

    Raises:
        BuilderConfigurationError: If any ToolConfig has duplicate tool names in its allow_tools list.
    """
    self._validate_tool_configs_no_duplicates()
    return DataDesignerConfig(
        model_configs=self._model_configs,
        tool_configs=self._tool_configs,
        seed_config=self._seed_config,
        columns=list(self._column_configs.values()),
        constraints=self._constraints or None,
        profilers=self._profilers or None,
        processors=self._processor_configs or None,
    )
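
Note the `or None` idiom in the source above: constraints, profilers, and processors are passed as None when their lists are empty, while columns, model configs, and tool configs are always passed as lists. A minimal sketch of the idiom on its own (the helper name `or_none` is made up for illustration):

```python
from typing import Optional


def or_none(items: list) -> Optional[list]:
    """An empty list is falsy, so `items or None` turns it into None."""
    return items or None


config_kwargs = {
    "constraints": or_none([]),
    "profilers": or_none(["profiler-a"]),
}
print(config_kwargs)  # {'constraints': None, 'profilers': ['profiler-a']}
```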

delete_column(column_name)

Delete the column with the given name.

Parameters:

Name Type Description Default
column_name str

Name of the column to delete.

required

Returns:

Type Description
Self

The current Data Designer config builder instance.

Raises:

Type Description
BuilderConfigurationError

If trying to delete a seed dataset column.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def delete_column(self, column_name: str) -> Self:
    """Delete the column with the given name.

    Args:
        column_name: Name of the column to delete.

    Returns:
        The current Data Designer config builder instance.

    Raises:
        BuilderConfigurationError: If trying to delete a seed dataset column.
    """
    if isinstance(self._column_configs.get(column_name), SeedDatasetColumnConfig):
        raise BuilderConfigurationError("Seed columns cannot be deleted. Please update the seed dataset instead.")
    self._column_configs.pop(column_name, None)
    return self
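
Two behaviors follow from the source above: deleting a name that does not exist is a silent no-op, and seed columns are protected. A minimal sketch with hypothetical stand-in classes in place of the real column configs:

```python
class SeedColumn:
    """Hypothetical stand-in for SeedDatasetColumnConfig."""


class GeneratedColumn:
    """Hypothetical stand-in for any non-seed column config."""


columns = {"id": SeedColumn(), "bio": GeneratedColumn()}


def delete_column(name: str) -> None:
    if isinstance(columns.get(name), SeedColumn):
        raise ValueError("Seed columns cannot be deleted.")
    columns.pop(name, None)  # pop with a default never raises for unknown names


delete_column("bio")        # removes the generated column
delete_column("not_there")  # silently does nothing
try:
    delete_column("id")     # seed column: rejected
except ValueError as err:
    print(err)
```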

delete_constraints(target_column)

Delete all constraints for the given target column.

Parameters:

Name Type Description Default
target_column str

Name of the column to remove constraints for.

required

Returns:

Type Description
Self

The current Data Designer config builder instance.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def delete_constraints(self, target_column: str) -> Self:
    """Delete all constraints for the given target column.

    Args:
        target_column: Name of the column to remove constraints for.

    Returns:
        The current Data Designer config builder instance.
    """
    self._constraints = [c for c in self._constraints if c.target_column != target_column]
    return self

delete_model_config(alias)

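Delete a model configuration from the current Data Designer configuration by alias. A warning is logged if no model configurations remain after the deletion.

Parameters:

Name Type Description Default
alias str

The alias of the model configuration to delete.

required

Returns:

Type Description
Self

The current Data Designer config builder instance.
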
Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def delete_model_config(self, alias: str) -> Self:
    """Delete a model configuration from the current Data Designer configuration by alias.

    Args:
        alias: The alias of the model configuration to delete.
    """
    self._model_configs = [mc for mc in self._model_configs if mc.alias != alias]
    if len(self._model_configs) == 0:
        logger.warning(
            f"⚠️ No model configurations found after deleting model configuration with alias {alias}. Please add a model configuration before building the configuration."
        )
    return self

delete_tool_config(alias)

Delete a tool configuration from the current Data Designer configuration by alias.

Parameters:

Name Type Description Default
alias str

The alias of the tool configuration to delete.

required

Returns:

Type Description
Self

The current Data Designer config builder instance.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def delete_tool_config(self, alias: str) -> Self:
    """Delete a tool configuration from the current Data Designer configuration by alias.

    Args:
        alias: The alias of the tool configuration to delete.

    Returns:
        The current Data Designer config builder instance.
    """
    self._tool_configs = [tc for tc in self._tool_configs if tc.tool_alias != alias]
    return self

from_config(config) classmethod

Create a DataDesignerConfigBuilder from an existing configuration.

Accepts both the full BuilderConfig format (with a top-level data_designer key) and the shorthand DataDesignerConfig format (columns, model_configs, etc. at the top level). When the shorthand format is detected it is automatically normalized into a full BuilderConfig.

Parameters:

Name Type Description Default
config dict | str | Path | BuilderConfig

Configuration source. Can be a dictionary containing the configuration, a string or Path to a local YAML/JSON configuration file, an HTTP(S) URL string to a YAML/JSON configuration file, or a BuilderConfig object.

required

Returns:

Type Description
Self

A new instance populated with the configuration from the provided source.

Raises:

Type Description
ValueError

If the config format is invalid.

ValidationError

If the builder config loaded from the config is invalid.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
@classmethod
def from_config(cls, config: dict | str | Path | BuilderConfig) -> Self:
    """Create a DataDesignerConfigBuilder from an existing configuration.

    Accepts both the full ``BuilderConfig`` format (with a top-level
    ``data_designer`` key) and the shorthand ``DataDesignerConfig`` format
    (``columns``, ``model_configs``, etc. at the top level). When the
    shorthand format is detected it is automatically normalized into a
    full ``BuilderConfig``.

    Args:
        config: Configuration source. Can be:
            - A dictionary containing the configuration
            - A string or Path to a local YAML/JSON configuration file
            - An HTTP(S) URL string to a YAML/JSON configuration file
            - A BuilderConfig object

    Returns:
        A new instance populated with the configuration from the provided source.

    Raises:
        ValueError: If the config format is invalid.
        ValidationError: If the builder config loaded from the config is invalid.
    """
    if isinstance(config, BuilderConfig):
        builder_config = config
    else:
        json_config = json.loads(serialize_data(smart_load_yaml(config)))
        # Normalize shorthand DataDesignerConfig into full BuilderConfig
        if "columns" in json_config and "data_designer" not in json_config:
            json_config = {"data_designer": json_config}
        builder_config = BuilderConfig.model_validate(json_config)

    builder = cls(
        model_configs=builder_config.data_designer.model_configs,
        tool_configs=builder_config.data_designer.tool_configs,
    )
    data_designer_config = builder_config.data_designer

    for col in data_designer_config.columns:
        builder.add_column(col)

    for constraint in data_designer_config.constraints or []:
        builder.add_constraint(constraint=constraint)

    if (seed_config := data_designer_config.seed_config) is not None:
        builder.with_seed_dataset(
            seed_config.source,
            sampling_strategy=seed_config.sampling_strategy,
            selection_strategy=seed_config.selection_strategy,
        )

    return builder
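
The shorthand normalization step can be sketched with plain dictionaries. The function name `normalize` and the sample column entries are illustrative only; the condition is the one used in the source above.

```python
def normalize(config: dict) -> dict:
    """Wrap a shorthand config (top-level "columns") under a "data_designer" key."""
    if "columns" in config and "data_designer" not in config:
        return {"data_designer": config}
    return config


shorthand = {"columns": [{"name": "age"}], "model_configs": []}
full = {"data_designer": {"columns": [{"name": "age"}], "model_configs": []}}

print(normalize(shorthand) == full)  # True
print(normalize(full) is full)       # True: already in the full format
```

Detection keys on the presence of a top-level `columns` entry, so a dictionary with neither `columns` nor `data_designer` is passed to validation unchanged.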

get_builder_config()

Get the builder config for the current Data Designer configuration.

Returns:

Type Description
BuilderConfig

The builder config.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def get_builder_config(self) -> BuilderConfig:
    """Get the builder config for the current Data Designer configuration.

    Returns:
        The builder config.
    """
    return BuilderConfig(data_designer=self.build())

get_column_config(name)

Get a column configuration by name.

Parameters:

Name Type Description Default
name str

Name of the column to retrieve the config for.

required

Returns:

Type Description
ColumnConfigT

The column configuration object.

Raises:

Type Description
KeyError

If no column with the given name exists.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def get_column_config(self, name: str) -> ColumnConfigT:
    """Get a column configuration by name.

    Args:
        name: Name of the column to retrieve the config for.

    Returns:
        The column configuration object.

    Raises:
        KeyError: If no column with the given name exists.
    """
    return self._column_configs[name]

get_column_configs()

Get all column configurations.

Returns:

Type Description
list[ColumnConfigT]

A list of all column configuration objects.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def get_column_configs(self) -> list[ColumnConfigT]:
    """Get all column configurations.

    Returns:
        A list of all column configuration objects.
    """
    return list(self._column_configs.values())

get_columns_excluding_type(column_type)

Get all column configurations excluding the specified type.

Parameters:

Name Type Description Default
column_type DataDesignerColumnType

The type of columns to exclude.

required

Returns:

Type Description
list[ColumnConfigT]

A list of column configurations that do not match the specified type.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def get_columns_excluding_type(self, column_type: DataDesignerColumnType) -> list[ColumnConfigT]:
    """Get all column configurations excluding the specified type.

    Args:
        column_type: The type of columns to exclude.

    Returns:
        A list of column configurations that do not match the specified type.
    """
    column_type = resolve_string_enum(column_type, DataDesignerColumnType)
    return [c for c in self._column_configs.values() if c.column_type != column_type]

get_columns_of_type(column_type)

Get all column configurations of the specified type.

Parameters:

Name Type Description Default
column_type DataDesignerColumnType

The type of columns to filter by.

required

Returns:

Type Description
list[ColumnConfigT]

A list of column configurations matching the specified type.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def get_columns_of_type(self, column_type: DataDesignerColumnType) -> list[ColumnConfigT]:
    """Get all column configurations of the specified type.

    Args:
        column_type: The type of columns to filter by.

    Returns:
        A list of column configurations matching the specified type.
    """
    column_type = resolve_string_enum(column_type, DataDesignerColumnType)
    return [c for c in self._column_configs.values() if c.column_type == column_type]
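
Both type filters resolve their argument before comparing, so a string or an enum member should work equally well. The sketch below assumes that behavior for `resolve_string_enum`, whose body is not shown here; `ColType`, its values, and the sample columns are hypothetical.

```python
from enum import Enum


class ColType(str, Enum):
    """Hypothetical stand-in for DataDesignerColumnType."""
    SAMPLER = "sampler"
    LLM_TEXT = "llm-text"


def resolve(value, enum_cls):
    """Hypothetical stand-in for resolve_string_enum."""
    return value if isinstance(value, enum_cls) else enum_cls(value)


columns = [("age", ColType.SAMPLER), ("bio", ColType.LLM_TEXT), ("city", ColType.SAMPLER)]


def of_type(column_type):
    column_type = resolve(column_type, ColType)
    return [name for name, ct in columns if ct == column_type]


def excluding_type(column_type):
    column_type = resolve(column_type, ColType)
    return [name for name, ct in columns if ct != column_type]


print(of_type("sampler"))                # ['age', 'city']
print(excluding_type(ColType.SAMPLER))   # ['bio']
```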

get_constraints(target_column)

Get all constraints for the given target column.

Parameters:

Name Type Description Default
target_column str

Name of the column to get constraints for.

required

Returns:

Type Description
list[ColumnConstraintT]

A list of constraint objects targeting the specified column.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def get_constraints(self, target_column: str) -> list[ColumnConstraintT]:
    """Get all constraints for the given target column.

    Args:
        target_column: Name of the column to get constraints for.

    Returns:
        A list of constraint objects targeting the specified column.
    """
    return [c for c in self._constraints if c.target_column == target_column]

get_processor_configs()

Get processor configuration objects.

Returns:

Type Description
list[ProcessorConfigT]

A list of processor configuration objects.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def get_processor_configs(self) -> list[ProcessorConfigT]:
    """Get processor configuration objects.

    Returns:
        A list of processor configuration objects.
    """
    return self._processor_configs

get_profilers()

Get all profilers.

Returns:

Type Description
list[ColumnProfilerConfigT]

A list of profiler configuration objects.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def get_profilers(self) -> list[ColumnProfilerConfigT]:
    """Get all profilers.

    Returns:
        A list of profiler configuration objects.
    """
    return self._profilers
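As the source shows, get_processor_configs() and get_profilers() return the builder's internal list rather than a copy, so mutating the returned list also changes the builder's state. The sketch below demonstrates that aliasing with a hypothetical `BuilderSketch` class; it is a stand-in written for illustration, not the real builder.

```python
class BuilderSketch:
    """Hypothetical stand-in that mirrors only the getter shown above."""

    def __init__(self) -> None:
        self._profilers: list[str] = []

    def get_profilers(self) -> list[str]:
        # Returns the internal list itself, not a copy.
        return self._profilers


builder = BuilderSketch()

# Appending to the returned list changes the builder's own state.
profilers = builder.get_profilers()
profilers.append("extra-profiler")
leaked = builder.get_profilers()

# Copying first keeps later edits local to the caller.
snapshot = list(builder.get_profilers())
snapshot.append("local-only")

print(leaked, snapshot)
```

If the caller only needs to inspect or filter the result, taking a copy with list(...) first avoids accidental changes to the builder.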

get_seed_config()

Get the seed config for the current Data Designer configuration.

Returns:

Type Description
SeedConfig | None

The seed config if configured, None otherwise.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def get_seed_config(self) -> SeedConfig | None:
    """Get the seed config for the current Data Designer configuration.

    Returns:
        The seed config if configured, None otherwise.
    """
    return self._seed_config

get_tool_config(alias)

Get a tool configuration by alias.

Parameters:

Name Type Description Default
alias str

The alias of the tool configuration to retrieve.

required

Returns:

Type Description
ToolConfig

The tool configuration object.

Raises:

Type Description
KeyError

If no tool configuration with the given alias exists.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def get_tool_config(self, alias: str) -> ToolConfig:
    """Get a tool configuration by alias.

    Args:
        alias: The alias of the tool configuration to retrieve.

    Returns:
        The tool configuration object.

    Raises:
        KeyError: If no tool configuration with the given alias exists.
    """
    for tc in self._tool_configs:
        if tc.tool_alias == alias:
            return tc
    raise KeyError(f"No tool configuration with alias {alias!r} found")
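The lookup methods differ in how they report a miss: get_tool_config raises KeyError for an unknown alias, while get_constraints returns an empty list for a column that has no constraints. The sketch below reproduces both behaviors with hypothetical stand-ins (`Tool`, `Constraint`, and module-level lists), so it runs without Data Designer installed.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """Hypothetical stand-in for ToolConfig."""

    tool_alias: str


@dataclass
class Constraint:
    """Hypothetical stand-in for a column constraint."""

    target_column: str


tool_configs = [Tool("search"), Tool("calculator")]
constraints = [Constraint("age")]


def get_tool_config(alias: str) -> Tool:
    # Linear scan; the first matching alias wins.
    for tc in tool_configs:
        if tc.tool_alias == alias:
            return tc
    raise KeyError(f"No tool configuration with alias {alias!r} found")


def get_constraints(target_column: str) -> list[Constraint]:
    # A filter never fails; no match simply yields an empty list.
    return [c for c in constraints if c.target_column == target_column]


found = get_tool_config("search")

try:
    get_tool_config("missing")
    missing_raised = False
except KeyError:
    missing_raised = True

no_constraints = get_constraints("height")
print(found.tool_alias, missing_raised, no_constraints)
```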

num_columns_of_type(column_type)

Get the count of columns of the specified type.

Parameters:

Name Type Description Default
column_type DataDesignerColumnType

The type of columns to count.

required

Returns:

Type Description
int

The number of columns matching the specified type.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def num_columns_of_type(self, column_type: DataDesignerColumnType) -> int:
    """Get the count of columns of the specified type.

    Args:
        column_type: The type of columns to count.

    Returns:
        The number of columns matching the specified type.
    """
    return len(self.get_columns_of_type(column_type))

with_seed_dataset(seed_source, *, sampling_strategy=SamplingStrategy.ORDERED, selection_strategy=None)

Add a seed dataset to the current Data Designer configuration.

This method sets the seed dataset for the configuration, but columns are not resolved until compilation (including validation) is performed by the engine using a SeedReader.

Parameters:

Name Type Description Default
seed_source SeedSourceT

The pointer to the seed dataset.

required
sampling_strategy SamplingStrategy

The sampling strategy to use when generating data from the seed dataset. Defaults to ORDERED sampling.

ORDERED
selection_strategy IndexRange | PartitionBlock | None

An optional selection strategy to use when generating data from the seed dataset. Defaults to None.

None

Returns:

Type Description
Self

The current Data Designer config builder instance.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def with_seed_dataset(
    self,
    seed_source: SeedSourceT,
    *,
    sampling_strategy: SamplingStrategy = SamplingStrategy.ORDERED,
    selection_strategy: IndexRange | PartitionBlock | None = None,
) -> Self:
    """Add a seed dataset to the current Data Designer configuration.

    This method sets the seed dataset for the configuration, but columns are not resolved until
    compilation (including validation) is performed by the engine using a SeedReader.

    Args:
        seed_source: The pointer to the seed dataset.
        sampling_strategy: The sampling strategy to use when generating data from the seed dataset.
            Defaults to ORDERED sampling.
        selection_strategy: An optional selection strategy to use when generating data from the seed dataset.
            Defaults to None.

    Returns:
        The current Data Designer config builder instance.
    """
    self._seed_config = SeedConfig(
        source=seed_source,
        sampling_strategy=sampling_strategy,
        selection_strategy=selection_strategy,
    )
    return self
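Two consequences follow from the source above: because the method returns self, calls can be chained, and because it assigns a fresh SeedConfig each time, a second call replaces the earlier seed dataset rather than adding to it. The sketch below shows both using hypothetical stand-ins for the builder, SeedConfig, and SamplingStrategy; the file names are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class SamplingStrategy(str, Enum):
    """Hypothetical stand-in; only ORDERED is confirmed by the source above."""

    ORDERED = "ordered"


@dataclass
class SeedConfig:
    """Hypothetical stand-in with the three fields set in the source above."""

    source: Any
    sampling_strategy: SamplingStrategy = SamplingStrategy.ORDERED
    selection_strategy: Optional[Any] = None


class BuilderSketch:
    """Hypothetical stand-in mirroring with_seed_dataset and get_seed_config."""

    def __init__(self) -> None:
        self._seed_config: Optional[SeedConfig] = None

    def with_seed_dataset(
        self,
        seed_source: Any,
        *,
        sampling_strategy: SamplingStrategy = SamplingStrategy.ORDERED,
        selection_strategy: Optional[Any] = None,
    ) -> "BuilderSketch":
        # Plain assignment: any earlier seed config is overwritten.
        self._seed_config = SeedConfig(
            source=seed_source,
            sampling_strategy=sampling_strategy,
            selection_strategy=selection_strategy,
        )
        return self

    def get_seed_config(self) -> Optional[SeedConfig]:
        return self._seed_config


builder = BuilderSketch()
before = builder.get_seed_config()

# Chained calls return the same builder; the last seed source wins.
same = builder.with_seed_dataset("first.parquet").with_seed_dataset("second.parquet")
print(before, builder.get_seed_config())
```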

write_config(path, indent=2, **kwargs)

Write the current configuration to a file.

Parameters:

Name Type Description Default
path str | Path

Path to the file to write the configuration to.

required
indent int | None

Indentation level for the output file (default: 2).

2
**kwargs

Additional keyword arguments passed to the serialization methods used.

{}

Raises:

Type Description
BuilderConfigurationError

If the file extension is not .yaml, .yml, or .json.

BuilderSerializationError

If the configuration cannot be serialized, for example when the seed dataset is a DataFrameSeedSource.

Source code in packages/data-designer-config/src/data_designer/config/config_builder.py
def write_config(self, path: str | Path, indent: int | None = 2, **kwargs) -> None:
    """Write the current configuration to a file.

    Args:
        path: Path to the file to write the configuration to.
        indent: Indentation level for the output file (default: 2).
        **kwargs: Additional keyword arguments passed to the serialization methods used.

    Raises:
        BuilderConfigurationError: If the file format is unsupported.
        BuilderSerializationError: If the configuration cannot be serialized.
    """
    if (seed_config := self.get_seed_config()) is not None and isinstance(seed_config.source, DataFrameSeedSource):
        raise BuilderSerializationError(
            "This builder was configured with a DataFrame seed dataset. "
            "DataFrame seeds cannot be serialized to config files. "
            "To serialize this configuration, change your seed dataset to a more persistent, serializable source format. "
            "For example, you could make a local file seed source from the dataframe:\n\n"
            "LocalFileSeedSource.from_dataframe(my_dataframe, '/path/to/data.parquet')"
        )

    cfg = self.get_builder_config()
    suffix = Path(path).suffix
    if suffix in {".yaml", ".yml"}:
        cfg.to_yaml(path, indent=indent, **kwargs)
    elif suffix == ".json":
        cfg.to_json(path, indent=indent, **kwargs)
    else:
        raise BuilderConfigurationError(f"🛑 Unsupported file type: {suffix}. Must be `.yaml`, `.yml` or `.json`.")
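The output format is chosen from the final path suffix with an exact string match, so the check is case-sensitive and a path without an extension is rejected. The sketch below isolates that dispatch in a hypothetical helper, `pick_format`, which raises ValueError in place of BuilderConfigurationError so that it runs on its own.

```python
from pathlib import Path


def pick_format(path: str) -> str:
    """Hypothetical helper mirroring the suffix dispatch in write_config."""
    suffix = Path(path).suffix  # final extension only, including the dot
    if suffix in {".yaml", ".yml"}:
        return "yaml"
    if suffix == ".json":
        return "json"
    # ValueError stands in for BuilderConfigurationError in this sketch.
    raise ValueError(f"Unsupported file type: {suffix!r}")


def is_supported(path: str) -> bool:
    try:
        pick_format(path)
    except ValueError:
        return False
    return True


for candidate in ["config.yaml", "config.backup.json", "config.YAML", "config"]:
    print(candidate, is_supported(candidate))
```

Under this logic an upper-case extension such as `.YAML` is treated as unsupported, so lower-case extensions are the safe choice when naming the output file.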