Data Designer's Config Builder

The config_builder module provides a high-level interface for constructing Data Designer configurations through the DataDesignerConfigBuilder class. The builder creates DataDesignerConfig objects programmatically by incrementally adding column configurations, constraints, processors, and profilers.

You can use the builder to create Data Designer configurations from scratch or from existing configurations stored in YAML/JSON files via from_config(). The builder includes validation capabilities to catch configuration errors early and can work with seed datasets from local sources or external datastores. Once configured, use build() to generate the final configuration object or write_config() to serialize it to disk.

Model configs are required

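DataDesignerConfigBuilder needs model configurations at initialization: pass a list of ModelConfig objects or a path to a model configuration file. If model_configs is omitted, default model configurations are used when Data Designer can run locally; otherwise a BuilderConfigurationError is raised. The model configurations tell the builder which model aliases can be referenced by LLM-generated columns (such as LLMTextColumnConfig, LLMCodeColumnConfig, LLMStructuredColumnConfig, and LLMJudgeColumnConfig). Each model configuration specifies the model alias, model provider, model ID, and inference parameters that will be used during data generation.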

Classes:

Name Description
BuilderConfig

Configuration container for Data Designer builder.

DataDesignerConfigBuilder

Config builder for Data Designer configurations.

BuilderConfig

Bases: ExportableConfigBase

Configuration container for Data Designer builder.

This class holds the main Data Designer configuration along with optional datastore settings needed for seed dataset operations.

Attributes:

Name Type Description
data_designer DataDesignerConfig

The main Data Designer configuration containing columns, constraints, profilers, and other settings.

datastore_settings Optional[DatastoreSettings]

Optional datastore settings for accessing external datasets.

DataDesignerConfigBuilder(model_configs=None)

Config builder for Data Designer configurations.

This class provides a high-level interface for building Data Designer configurations.

Initialize a new DataDesignerConfigBuilder instance.

Parameters:

Name Type Description Default
model_configs Optional[Union[list[ModelConfig], str, Path]]

Model configurations. Can be:
- None to use default model configurations in local mode
- A list of ModelConfig objects
- A string or Path to a model configuration file

None

Methods:

Name Description
add_column

Add a Data Designer column configuration to the current Data Designer configuration.

add_constraint

Add a constraint to the current Data Designer configuration.

add_model_config

Add a model configuration to the current Data Designer configuration.

add_processor

Add a processor to the current Data Designer configuration.

add_profiler

Add a profiler to the current Data Designer configuration.

build

Build a DataDesignerConfig instance based on the current builder configuration.

delete_column

Delete the column with the given name.

delete_constraints

Delete all constraints for the given target column.

delete_model_config

Delete a model configuration from the current Data Designer configuration by alias.

from_config

Create a DataDesignerConfigBuilder from an existing configuration.

get_builder_config

Get the builder config for the current Data Designer configuration.

get_column_config

Get a column configuration by name.

get_column_configs

Get all column configurations.

get_columns_excluding_type

Get all column configurations excluding the specified type.

get_columns_of_type

Get all column configurations of the specified type.

get_constraints

Get all constraints for the given target column.

get_llm_gen_columns

Get all LLM-generated column configurations.

get_processor_configs

Get processor configuration objects.

get_profilers

Get all profilers.

get_seed_config

Get the seed config for the current Data Designer configuration.

get_seed_datastore_settings

Get most recent datastore settings for the current Data Designer configuration.

num_columns_of_type

Get the count of columns of the specified type.

set_seed_datastore_settings

Set the datastore settings for the seed dataset.

validate

Validate the current Data Designer configuration.

with_seed_dataset

Add a seed dataset to the current Data Designer configuration.

write_config

Write the current configuration to a file.

Attributes:

Name Type Description
allowed_references list[str]

Get all referenceable variables allowed in prompt templates and expressions.

info ConfigBuilderInfo

Get the ConfigBuilderInfo object for this builder.

model_configs list[ModelConfig]

Get the model configurations for this builder.

Source code in src/data_designer/config/config_builder.py
def __init__(self, model_configs: Optional[Union[list[ModelConfig], str, Path]] = None):
    """Initialize a new DataDesignerConfigBuilder instance.

    Args:
        model_configs: Model configurations. Can be:
            - None to use default model configurations in local mode
            - A list of ModelConfig objects
            - A string or Path to a model configuration file
    """
    # `not model_configs` covers None and empty lists; unlike len(), it is also safe for Path inputs.
    if not can_run_data_designer_locally() and not model_configs:
        raise BuilderConfigurationError("🛑 Model configurations are required!")

    self._column_configs = {}
    self._model_configs = load_model_configs(model_configs or get_default_model_configs())
    self._processor_configs: list[ProcessorConfig] = []
    self._seed_config: Optional[SeedConfig] = None
    self._constraints: list[ColumnConstraintT] = []
    self._profilers: list[ColumnProfilerConfigT] = []
    self._datastore_settings: Optional[DatastoreSettings] = None
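
The guard at the top of the constructor can be read as a small decision rule. The sketch below restates it with plain stand-ins so the three outcomes are easy to see. `resolve_model_configs`, `local_mode`, and `DEFAULTS` are hypothetical names used only for illustration; they are not part of the library, and the real constructor raises BuilderConfigurationError rather than ValueError.

```python
from typing import Optional

# Hypothetical stand-in for whatever get_default_model_configs() returns.
DEFAULTS = ["default-model"]


def resolve_model_configs(model_configs: Optional[list], local_mode: bool) -> list:
    """Mirror the constructor's rule: defaults are only a fallback in local mode."""
    if not local_mode and not model_configs:
        # Outside local mode there is nothing to fall back on.
        raise ValueError("Model configurations are required!")
    # Explicit configs win; otherwise use the defaults.
    return list(model_configs or DEFAULTS)
```

Under these assumptions, explicit configurations are always used as given, and the defaults only apply when nothing was passed and local execution is available.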

allowed_references property

Get all referenceable variables allowed in prompt templates and expressions.

This includes all column names and their side effect columns that can be referenced in prompt templates and expressions within the configuration.

Returns:

Type Description
list[str]

A list of variable names that can be referenced in templates and expressions.

info property

Get the ConfigBuilderInfo object for this builder.

Returns:

Type Description
ConfigBuilderInfo

An object containing information about the configuration.

model_configs property

Get the model configurations for this builder.

Returns:

Type Description
list[ModelConfig]

A list of ModelConfig objects used for data generation.

add_column(column_config=None, *, name=None, column_type=None, **kwargs)

Add a Data Designer column configuration to the current Data Designer configuration.

If no column config object is provided, you must provide the name, column_type, and any additional keyword arguments that are required by the column config constructor.

Parameters:

Name Type Description Default
column_config Optional[ColumnConfigT]

Data Designer column config object to add.

None
name Optional[str]

Name of the column to add. This is only used if column_config is not provided.

None
column_type Optional[DataDesignerColumnType]

Column type to add. This is only used if column_config is not provided.

None
**kwargs

Additional keyword arguments to pass to the column constructor.

{}

Returns:

Type Description
Self

The current Data Designer config builder instance.

Source code in src/data_designer/config/config_builder.py
def add_column(
    self,
    column_config: Optional[ColumnConfigT] = None,
    *,
    name: Optional[str] = None,
    column_type: Optional[DataDesignerColumnType] = None,
    **kwargs,
) -> Self:
    """Add a Data Designer column configuration to the current Data Designer configuration.

    If no column config object is provided, you must provide the `name`, `column_type`, and any
    additional keyword arguments that are required by the column config constructor.

    Args:
        column_config: Data Designer column config object to add.
        name: Name of the column to add. This is only used if `column_config` is not provided.
        column_type: Column type to add. This is only used if `column_config` is not provided.
        **kwargs: Additional keyword arguments to pass to the column constructor.

    Returns:
        The current Data Designer config builder instance.
    """
    if column_config is None:
        if name is None or column_type is None:
            raise BuilderConfigurationError(
                "🛑 You must provide either a 'column_config' object or 'name' *and* 'column_type' "
                f"with additional keyword arguments. You provided {column_config=}, {name=}, and {column_type=}."
            )
        column_config = get_column_config_from_kwargs(name=name, column_type=column_type, **kwargs)

    allowed_column_configs = ColumnConfigT.__args__
    if not any(isinstance(column_config, t) for t in allowed_column_configs):
        raise InvalidColumnTypeError(
            f"🛑 Invalid column config object: '{column_config}'. Valid column config options are: "
            f"{', '.join([t.__name__ for t in allowed_column_configs])}"
        )

    self._column_configs[column_config.name] = column_config
    return self
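
Two details of the method above are worth seeing in isolation: columns are stored in a dictionary keyed by name, and the method returns the builder itself. The following is a minimal sketch of that bookkeeping only, using hypothetical stand-ins (`ToyColumn`, `ToyBuilder`) rather than the library's classes.

```python
from dataclasses import dataclass


@dataclass
class ToyColumn:
    """Hypothetical stand-in for a column config; only `name` matters here."""
    name: str
    prompt: str = ""


class ToyBuilder:
    """Hypothetical miniature of the builder's column bookkeeping."""

    def __init__(self) -> None:
        self._column_configs: dict[str, ToyColumn] = {}

    def add_column(self, column: ToyColumn) -> "ToyBuilder":
        # Same storage rule as above: the column name is the dictionary key.
        self._column_configs[column.name] = column
        return self  # returning self is what makes chaining possible


builder = (
    ToyBuilder()
    .add_column(ToyColumn("topic", prompt="v1"))
    .add_column(ToyColumn("summary"))
    .add_column(ToyColumn("topic", prompt="v2"))  # replaces the first "topic"
)
column_names = list(builder._column_configs)
```

With this storage scheme, adding a column whose name already exists replaces the earlier entry instead of creating a duplicate, and the replaced column keeps its original position in the dictionary.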

add_constraint(constraint=None, *, constraint_type=None, **kwargs)

Add a constraint to the current Data Designer configuration.

Currently, constraints are only supported for numerical samplers.

You can either provide a constraint object directly, or provide a constraint type and additional keyword arguments to construct the constraint object. Valid constraint types are:
- "scalar_inequality": Constraint between a column and a scalar value.
- "column_inequality": Constraint between two columns.

Parameters:

Name Type Description Default
constraint Optional[ColumnConstraintT]

Constraint object to add.

None
constraint_type Optional[ConstraintType]

Constraint type to add. Ignored when constraint is provided.

None
**kwargs

Additional keyword arguments to pass to the constraint constructor.

{}

Returns:

Type Description
Self

The current Data Designer config builder instance.

Source code in src/data_designer/config/config_builder.py
def add_constraint(
    self,
    constraint: Optional[ColumnConstraintT] = None,
    *,
    constraint_type: Optional[ConstraintType] = None,
    **kwargs,
) -> Self:
    """Add a constraint to the current Data Designer configuration.

    Currently, constraints are only supported for numerical samplers.

    You can either provide a constraint object directly, or provide a constraint type and
    additional keyword arguments to construct the constraint object. Valid constraint types are:
        - "scalar_inequality": Constraint between a column and a scalar value.
        - "column_inequality": Constraint between two columns.

    Args:
        constraint: Constraint object to add.
        constraint_type: Constraint type to add. Ignored when `constraint` is provided.
        **kwargs: Additional keyword arguments to pass to the constraint constructor.

    Returns:
        The current Data Designer config builder instance.
    """
    if constraint is None:
        if constraint_type is None:
            raise BuilderConfigurationError(
                "🛑 You must provide either a 'constraint' object or 'constraint_type' "
                "with additional keyword arguments."
            )
        try:
            constraint_type = ConstraintType(constraint_type)
        except Exception:
            raise BuilderConfigurationError(
                f"🛑 Invalid constraint type: {constraint_type}. Valid options are: "
                f"{', '.join([t.value for t in ConstraintType])}"
            )
        if constraint_type == ConstraintType.SCALAR_INEQUALITY:
            constraint = ScalarInequalityConstraint(**kwargs)
        elif constraint_type == ConstraintType.COLUMN_INEQUALITY:
            constraint = ColumnInequalityConstraint(**kwargs)

    allowed_constraint_types = ColumnConstraintT.__args__
    if not any(isinstance(constraint, t) for t in allowed_constraint_types):
        raise BuilderConfigurationError(
            "🛑 Invalid constraint object. Valid constraint options are: "
            f"{', '.join([t.__name__ for t in allowed_constraint_types])}"
        )

    self._constraints.append(constraint)
    return self
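
The `ConstraintType(constraint_type)` call above is what lets callers pass either an enum member or its string value. The sketch below shows that coercion with a hypothetical stand-in enum (`ToyConstraintType`) that reuses the two documented type names. Whether the real ConstraintType is a string-valued enum is an assumption here.

```python
from enum import Enum


class ToyConstraintType(str, Enum):
    """Hypothetical stand-in enum using the two documented type names."""
    SCALAR_INEQUALITY = "scalar_inequality"
    COLUMN_INEQUALITY = "column_inequality"


def resolve_constraint_type(value) -> ToyConstraintType:
    """Accept either an enum member or its string value."""
    try:
        # Enum lookup by value; passing a member returns that same member.
        return ToyConstraintType(value)
    except ValueError:
        valid = ", ".join(t.value for t in ToyConstraintType)
        raise ValueError(f"Invalid constraint type: {value}. Valid options are: {valid}") from None
```

Catching only ValueError, rather than every Exception, keeps unrelated failures visible; that is a design choice of this sketch, not a description of the library.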

add_model_config(model_config)

Add a model configuration to the current Data Designer configuration.

Parameters:

Name Type Description Default
model_config ModelConfig

The model configuration to add.

required
Source code in src/data_designer/config/config_builder.py
def add_model_config(self, model_config: ModelConfig) -> Self:
    """Add a model configuration to the current Data Designer configuration.

    Args:
        model_config: The model configuration to add.
    """
    if model_config.alias in [mc.alias for mc in self._model_configs]:
        raise BuilderConfigurationError(
            f"🛑 Model configuration with alias {model_config.alias} already exists. Please delete the existing model configuration or choose a different alias."
        )
    self._model_configs.append(model_config)
    return self

add_processor(processor_config=None, *, processor_type=None, **kwargs)

Add a processor to the current Data Designer configuration.

You can either provide a processor config object directly, or provide a processor type and additional keyword arguments to construct the processor config object.

Parameters:

Name Type Description Default
processor_config Optional[ProcessorConfig]

The processor configuration object to add.

None
processor_type Optional[ProcessorType]

The type of processor to add.

None
**kwargs

Additional keyword arguments to pass to the processor constructor.

{}

Returns:

Type Description
Self

The current Data Designer config builder instance.

Source code in src/data_designer/config/config_builder.py
def add_processor(
    self,
    processor_config: Optional[ProcessorConfig] = None,
    *,
    processor_type: Optional[ProcessorType] = None,
    **kwargs,
) -> Self:
    """Add a processor to the current Data Designer configuration.

    You can either provide a processor config object directly, or provide a processor type and
    additional keyword arguments to construct the processor config object.

    Args:
        processor_config: The processor configuration object to add.
        processor_type: The type of processor to add.
        **kwargs: Additional keyword arguments to pass to the processor constructor.

    Returns:
        The current Data Designer config builder instance.
    """
    if processor_config is None:
        if processor_type is None:
            raise BuilderConfigurationError(
                "🛑 You must provide either a 'processor_config' object or 'processor_type' "
                "with additional keyword arguments."
            )
        processor_config = get_processor_config_from_kwargs(processor_type=processor_type, **kwargs)

    # Checks elsewhere fail if DropColumnsProcessor drops a column but it is not marked for drop
    if processor_config.processor_type == ProcessorType.DROP_COLUMNS:
        for column in processor_config.column_names:
            if column in self._column_configs:
                self._column_configs[column].drop = True

    self._processor_configs.append(processor_config)
    return self
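
The drop-columns branch above has a side effect that is easy to miss: it flips the `drop` flag on columns that are already registered. A minimal sketch of just that loop, with a hypothetical `ToyColumn` stand-in and a hypothetical helper name `mark_dropped`:

```python
from dataclasses import dataclass


@dataclass
class ToyColumn:
    """Hypothetical stand-in for a column config with a drop flag."""
    name: str
    drop: bool = False


def mark_dropped(columns: dict[str, ToyColumn], column_names: list[str]) -> None:
    """Set drop=True on every known column named by a drop-columns processor."""
    for name in column_names:
        if name in columns:  # names that are not registered yet are skipped
            columns[name].drop = True


columns = {"draft": ToyColumn("draft"), "final": ToyColumn("final")}
mark_dropped(columns, ["draft", "not_added_yet"])
```

As in the source above, the loop only sees columns that exist at the time the processor is added, so in this sketch a name that is registered later would not be marked.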

add_profiler(profiler_config)

Add a profiler to the current Data Designer configuration.

Parameters:

Name Type Description Default
profiler_config ColumnProfilerConfigT

The profiler configuration object to add.

required

Returns:

Type Description
Self

The current Data Designer config builder instance.

Raises:

Type Description
BuilderConfigurationError

If the profiler configuration is of an invalid type.

Source code in src/data_designer/config/config_builder.py
def add_profiler(self, profiler_config: ColumnProfilerConfigT) -> Self:
    """Add a profiler to the current Data Designer configuration.

    Args:
        profiler_config: The profiler configuration object to add.

    Returns:
        The current Data Designer config builder instance.

    Raises:
        BuilderConfigurationError: If the profiler configuration is of an invalid type.
    """
    if not isinstance(profiler_config, ColumnProfilerConfigT):
        if hasattr(ColumnProfilerConfigT, "__args__"):
            valid_options = ", ".join([t.__name__ for t in ColumnProfilerConfigT.__args__])
        else:
            valid_options = ColumnProfilerConfigT.__name__
        raise BuilderConfigurationError(f"🛑 Invalid profiler object. Valid profiler options are: {valid_options}")
    self._profilers.append(profiler_config)
    return self
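
The `hasattr(..., "__args__")` check above exists because a type alias may be either a single class or a union of classes. The sketch below illustrates the same idea with a hypothetical alias `NumberT`; the helper names are illustrative and not part of the library.

```python
from typing import Union

# Hypothetical alias standing in for a union such as ColumnProfilerConfigT.
NumberT = Union[int, float]


def describe_valid_options(type_or_union) -> str:
    """List the class names a caller may pass for the given type or union."""
    if hasattr(type_or_union, "__args__"):
        # A union exposes its member classes through __args__.
        return ", ".join(t.__name__ for t in type_or_union.__args__)
    return type_or_union.__name__


def is_allowed(value, type_or_union) -> bool:
    """Membership test that works for a single class and for a union alike."""
    members = getattr(type_or_union, "__args__", (type_or_union,))
    return isinstance(value, members)
```

Passing a tuple of classes to `isinstance` sidesteps any differences in how union objects themselves behave across Python versions.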

build(*, skip_validation=False, raise_exceptions=False)

Build a DataDesignerConfig instance based on the current builder configuration.

Parameters:

Name Type Description Default
skip_validation bool

Whether to skip validation of the configuration.

False
raise_exceptions bool

Whether to raise an exception if the configuration is invalid.

False

Returns:

Type Description
DataDesignerConfig

The current Data Designer config object.

Source code in src/data_designer/config/config_builder.py
def build(self, *, skip_validation: bool = False, raise_exceptions: bool = False) -> DataDesignerConfig:
    """Build a DataDesignerConfig instance based on the current builder configuration.

    Args:
        skip_validation: Whether to skip validation of the configuration.
        raise_exceptions: Whether to raise an exception if the configuration is invalid.

    Returns:
        The current Data Designer config object.
    """
    if not skip_validation:
        self.validate(raise_exceptions=raise_exceptions)

    return DataDesignerConfig(
        model_configs=self._model_configs,
        seed_config=self._seed_config,
        columns=list(self._column_configs.values()),
        constraints=self._constraints or None,
        profilers=self._profilers or None,
        processors=self._processor_configs or None,
    )
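
The `or None` expressions in the return statement above mean that a builder with no constraints, profilers, or processors passes None for those fields rather than an empty list. A tiny sketch of the idiom, using a hypothetical helper name:

```python
def none_if_empty(items: list):
    """Return the list unchanged, or None when it is empty."""
    # An empty list is falsy, so `or` falls through to None.
    return items or None
```

Code that reads these fields from a built configuration may therefore need to handle None as well as a populated list, as from_config does with `config.constraints or []`.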

delete_column(column_name)

Delete the column with the given name.

Parameters:

Name Type Description Default
column_name str

Name of the column to delete.

required

Returns:

Type Description
Self

The current Data Designer config builder instance.

Raises:

Type Description
BuilderConfigurationError

If trying to delete a seed dataset column.

Source code in src/data_designer/config/config_builder.py
def delete_column(self, column_name: str) -> Self:
    """Delete the column with the given name.

    Args:
        column_name: Name of the column to delete.

    Returns:
        The current Data Designer config builder instance.

    Raises:
        BuilderConfigurationError: If trying to delete a seed dataset column.
    """
    if isinstance(self._column_configs.get(column_name), SeedDatasetColumnConfig):
        raise BuilderConfigurationError("Seed columns cannot be deleted. Please update the seed dataset instead.")
    self._column_configs.pop(column_name, None)
    return self
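
Note that delete_column and get_column_config treat unknown names differently: deletion uses `dict.pop(name, None)` and ignores a missing name, while lookup uses plain indexing and raises KeyError. The sketch below shows both behaviors on an ordinary dictionary; the column name and placeholder value are illustrative.

```python
# A plain dictionary standing in for the builder's column storage.
column_configs = {"topic": object()}

# Deletion style: a missing name is ignored and None is returned.
removed = column_configs.pop("no_such_column", None)

# Lookup style: a missing name raises KeyError.
try:
    column_configs["no_such_column"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

A typo in a column name passed to delete_column would therefore go unnoticed, so checking the name against get_column_configs() first may be worthwhile.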

delete_constraints(target_column)

Delete all constraints for the given target column.

Parameters:

Name Type Description Default
target_column str

Name of the column to remove constraints for.

required

Returns:

Type Description
Self

The current Data Designer config builder instance.

Source code in src/data_designer/config/config_builder.py
def delete_constraints(self, target_column: str) -> Self:
    """Delete all constraints for the given target column.

    Args:
        target_column: Name of the column to remove constraints for.

    Returns:
        The current Data Designer config builder instance.
    """
    self._constraints = [c for c in self._constraints if c.target_column != target_column]
    return self

delete_model_config(alias)

Delete a model configuration from the current Data Designer configuration by alias.

Parameters:

Name Type Description Default
alias str

The alias of the model configuration to delete.

required
Source code in src/data_designer/config/config_builder.py
def delete_model_config(self, alias: str) -> Self:
    """Delete a model configuration from the current Data Designer configuration by alias.

    Args:
        alias: The alias of the model configuration to delete.
    """
    self._model_configs = [mc for mc in self._model_configs if mc.alias != alias]
    if len(self._model_configs) == 0:
        logger.warning(
            f"⚠️ No model configurations found after deleting model configuration with alias {alias}. Please add a model configuration before building the configuration."
        )
    return self
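
Because add_model_config rejects an alias that already exists, replacing the model behind an alias takes two steps: delete, then add. The following sketch shows that sequence with hypothetical stand-ins (`ToyModelConfig`, `ToyRegistry`); the real ModelConfig has more fields, and the real builder raises BuilderConfigurationError rather than ValueError.

```python
from dataclasses import dataclass


@dataclass
class ToyModelConfig:
    """Hypothetical stand-in; the real ModelConfig carries more fields."""
    alias: str
    model: str


class ToyRegistry:
    """Hypothetical miniature of the builder's model-config bookkeeping."""

    def __init__(self, configs: list[ToyModelConfig]) -> None:
        self._model_configs = list(configs)

    def add_model_config(self, config: ToyModelConfig) -> "ToyRegistry":
        if config.alias in [mc.alias for mc in self._model_configs]:
            raise ValueError(f"alias {config.alias} already exists")
        self._model_configs.append(config)
        return self

    def delete_model_config(self, alias: str) -> "ToyRegistry":
        # Filtering never raises, even when the alias is absent.
        self._model_configs = [mc for mc in self._model_configs if mc.alias != alias]
        return self


registry = ToyRegistry([ToyModelConfig("text", "model-a")])
# Replace the model behind an alias: delete first, then add.
registry.delete_model_config("text").add_model_config(ToyModelConfig("text", "model-b"))
```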

from_config(config) classmethod

Create a DataDesignerConfigBuilder from an existing configuration.

Parameters:

Name Type Description Default
config Union[dict, str, Path, BuilderConfig]

Configuration source. Can be:
- A dictionary containing the configuration
- A string or Path to a YAML/JSON configuration file
- A BuilderConfig object

required

Returns:

Type Description
Self

A new instance populated with the configuration from the provided source.

Raises:

Type Description
ValueError

If the config format is invalid.

ValidationError

If the builder config loaded from the config is invalid.

Source code in src/data_designer/config/config_builder.py
@classmethod
def from_config(cls, config: Union[dict, str, Path, BuilderConfig]) -> Self:
    """Create a DataDesignerConfigBuilder from an existing configuration.

    Args:
        config: Configuration source. Can be:
            - A dictionary containing the configuration
            - A string or Path to a YAML/JSON configuration file
            - A BuilderConfig object

    Returns:
        A new instance populated with the configuration from the provided source.

    Raises:
        ValueError: If the config format is invalid.
        ValidationError: If the builder config loaded from the config is invalid.
    """
    if isinstance(config, BuilderConfig):
        builder_config = config
    else:
        json_config = json.loads(serialize_data(smart_load_yaml(config)))
        builder_config = BuilderConfig.model_validate(json_config)

    builder = cls(model_configs=builder_config.data_designer.model_configs)
    config = builder_config.data_designer

    for col in config.columns:
        builder.add_column(col)

    for constraint in config.constraints or []:
        builder.add_constraint(constraint=constraint)

    if config.seed_config:
        if builder_config.datastore_settings is None:
            if can_run_data_designer_locally():
                seed_dataset_reference = LocalSeedDatasetReference(dataset=config.seed_config.dataset)
            else:
                raise BuilderConfigurationError("🛑 Datastore settings are required.")
        else:
            seed_dataset_reference = DatastoreSeedDatasetReference(
                dataset=config.seed_config.dataset,
                datastore_settings=builder_config.datastore_settings,
            )
            builder.set_seed_datastore_settings(builder_config.datastore_settings)
        builder.with_seed_dataset(
            seed_dataset_reference,
            sampling_strategy=config.seed_config.sampling_strategy,
            selection_strategy=config.seed_config.selection_strategy,
        )

    return builder
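
The nested branches that pick a seed dataset reference reduce to three outcomes. The sketch below restates them as a flat function; `choose_seed_reference_kind` and its string return values are hypothetical and only label which branch would run, and the real code raises BuilderConfigurationError rather than ValueError.

```python
from typing import Optional


def choose_seed_reference_kind(datastore_settings: Optional[dict], local_mode: bool) -> str:
    """Return which kind of seed dataset reference the branches above would create."""
    if datastore_settings is not None:
        return "datastore"  # settings present: always a datastore-backed reference
    if local_mode:
        return "local"  # no settings, but local execution is available
    raise ValueError("Datastore settings are required.")
```

Read this way, datastore settings take precedence whenever they are present, and local execution only matters as a fallback.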

get_builder_config()

Get the builder config for the current Data Designer configuration.

Returns:

Type Description
BuilderConfig

The builder config.

Source code in src/data_designer/config/config_builder.py
def get_builder_config(self) -> BuilderConfig:
    """Get the builder config for the current Data Designer configuration.

    Returns:
        The builder config.
    """
    return BuilderConfig(data_designer=self.build(), datastore_settings=self._datastore_settings)

get_column_config(name)

Get a column configuration by name.

Parameters:

Name Type Description Default
name str

Name of the column to retrieve the config for.

required

Returns:

Type Description
ColumnConfigT

The column configuration object.

Raises:

Type Description
KeyError

If no column with the given name exists.

Source code in src/data_designer/config/config_builder.py
def get_column_config(self, name: str) -> ColumnConfigT:
    """Get a column configuration by name.

    Args:
        name: Name of the column to retrieve the config for.

    Returns:
        The column configuration object.

    Raises:
        KeyError: If no column with the given name exists.
    """
    return self._column_configs[name]

get_column_configs()

Get all column configurations.

Returns:

Type Description
list[ColumnConfigT]

A list of all column configuration objects.

Source code in src/data_designer/config/config_builder.py
def get_column_configs(self) -> list[ColumnConfigT]:
    """Get all column configurations.

    Returns:
        A list of all column configuration objects.
    """
    return list(self._column_configs.values())

get_columns_excluding_type(column_type)

Get all column configurations excluding the specified type.

Parameters:

Name Type Description Default
column_type DataDesignerColumnType

The type of columns to exclude.

required

Returns:

Type Description
list[ColumnConfigT]

A list of column configurations that do not match the specified type.

Source code in src/data_designer/config/config_builder.py
def get_columns_excluding_type(self, column_type: DataDesignerColumnType) -> list[ColumnConfigT]:
    """Get all column configurations excluding the specified type.

    Args:
        column_type: The type of columns to exclude.

    Returns:
        A list of column configurations that do not match the specified type.
    """
    column_type = resolve_string_enum(column_type, DataDesignerColumnType)
    return [c for c in self._column_configs.values() if c.column_type != column_type]

get_columns_of_type(column_type)

Get all column configurations of the specified type.

Parameters:

Name Type Description Default
column_type DataDesignerColumnType

The type of columns to filter by.

required

Returns:

Type Description
list[ColumnConfigT]

A list of column configurations matching the specified type.

Source code in src/data_designer/config/config_builder.py
def get_columns_of_type(self, column_type: DataDesignerColumnType) -> list[ColumnConfigT]:
    """Get all column configurations of the specified type.

    Args:
        column_type: The type of columns to filter by.

    Returns:
        A list of column configurations matching the specified type.
    """
    column_type = resolve_string_enum(column_type, DataDesignerColumnType)
    return [c for c in self._column_configs.values() if c.column_type == column_type]
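
get_columns_of_type and get_columns_excluding_type split the columns into two complementary groups. The sketch below shows that partition, assuming resolve_string_enum turns a string into the matching enum member, as its name suggests. `ToyColumnType`, its members, and the sample columns are hypothetical.

```python
from enum import Enum


class ToyColumnType(str, Enum):
    """Hypothetical column types; the real DataDesignerColumnType has its own members."""
    SAMPLER = "sampler"
    LLM_TEXT = "llm-text"


# (name, type) pairs standing in for registered column configs.
columns = [
    ("age", ToyColumnType.SAMPLER),
    ("bio", ToyColumnType.LLM_TEXT),
    ("city", ToyColumnType.SAMPLER),
]


def columns_of_type(column_type) -> list[str]:
    wanted = ToyColumnType(column_type)  # accepts a member or its string value
    return [name for name, kind in columns if kind == wanted]


def columns_excluding_type(column_type) -> list[str]:
    wanted = ToyColumnType(column_type)
    return [name for name, kind in columns if kind != wanted]
```

Every column lands in exactly one of the two results, so their lengths always add up to the total number of columns.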

get_constraints(target_column)

Get all constraints for the given target column.

Parameters:

Name Type Description Default
target_column str

Name of the column to get constraints for.

required

Returns:

Type Description
list[ColumnConstraintT]

A list of constraint objects targeting the specified column.

Source code in src/data_designer/config/config_builder.py
def get_constraints(self, target_column: str) -> list[ColumnConstraintT]:
    """Get all constraints for the given target column.

    Args:
        target_column: Name of the column to get constraints for.

    Returns:
        A list of constraint objects targeting the specified column.
    """
    return [c for c in self._constraints if c.target_column == target_column]
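
get_constraints and delete_constraints are mirror-image filters on `target_column`: one keeps the matches, the other keeps everything else. A minimal sketch with a hypothetical `ToyConstraint` stand-in follows; the `operator` and `rhs` fields are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ToyConstraint:
    """Hypothetical stand-in; only target_column is taken from the source above."""
    target_column: str
    operator: str
    rhs: float


constraints = [
    ToyConstraint("age", "ge", 18),
    ToyConstraint("age", "lt", 65),
    ToyConstraint("height", "gt", 0),
]


def get_constraints(target_column: str) -> list[ToyConstraint]:
    # Keep only the constraints that target the given column.
    return [c for c in constraints if c.target_column == target_column]


def remaining_after_delete(target_column: str) -> list[ToyConstraint]:
    # Keep everything except the constraints that target the given column.
    return [c for c in constraints if c.target_column != target_column]
```

Neither filter raises for a column name without constraints; the first returns an empty list and the second returns the full list.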

get_llm_gen_columns()

Get all LLM-generated column configurations.

Returns:

Type Description
list[ColumnConfigT]

A list of column configurations that use LLM generation.

Source code in src/data_designer/config/config_builder.py
def get_llm_gen_columns(self) -> list[ColumnConfigT]:
    """Get all LLM-generated column configurations.

    Returns:
        A list of column configurations that use LLM generation.
    """
    return [c for c in self._column_configs.values() if column_type_is_llm_generated(c.column_type)]

get_processor_configs()

Get processor configuration objects.

Returns:

Type Description
list[ProcessorConfig]

A list of processor configuration objects, in the order they were added.

Source code in src/data_designer/config/config_builder.py
def get_processor_configs(self) -> list[ProcessorConfig]:
    """Get processor configuration objects.

    Returns:
        A list of processor configuration objects, in the order they were added.
    """
    return self._processor_configs

get_profilers()

Get all profilers.

Returns:

Type Description
list[ColumnProfilerConfigT]

A list of profiler configuration objects.

Source code in src/data_designer/config/config_builder.py
def get_profilers(self) -> list[ColumnProfilerConfigT]:
    """Get all profilers.

    Returns:
        A list of profiler configuration objects.
    """
    return self._profilers

get_seed_config()

Get the seed config for the current Data Designer configuration.

Returns:

Type Description
Optional[SeedConfig]

The seed config if configured, None otherwise.

Source code in src/data_designer/config/config_builder.py
def get_seed_config(self) -> Optional[SeedConfig]:
    """Get the seed config for the current Data Designer configuration.

    Returns:
        The seed config if configured, None otherwise.
    """
    return self._seed_config

get_seed_datastore_settings()

Get most recent datastore settings for the current Data Designer configuration.

Returns:

Type Description
Optional[DatastoreSettings]

The datastore settings if configured, None otherwise.

Source code in src/data_designer/config/config_builder.py
def get_seed_datastore_settings(self) -> Optional[DatastoreSettings]:
    """Get most recent datastore settings for the current Data Designer configuration.

    Returns:
        The datastore settings if configured, None otherwise.
    """
    return None if not self._datastore_settings else DatastoreSettings.model_validate(self._datastore_settings)

num_columns_of_type(column_type)

Get the count of columns of the specified type.

Parameters:

Name Type Description Default
column_type DataDesignerColumnType

The type of columns to count.

required

Returns:

Type Description
int

The number of columns matching the specified type.

Source code in src/data_designer/config/config_builder.py
def num_columns_of_type(self, column_type: DataDesignerColumnType) -> int:
    """Get the count of columns of the specified type.

    Args:
        column_type: The type of columns to count.

    Returns:
        The number of columns matching the specified type.
    """
    return len(self.get_columns_of_type(column_type))

set_seed_datastore_settings(datastore_settings)

Set the datastore settings for the seed dataset.

Parameters:

Name Type Description Default
datastore_settings Optional[DatastoreSettings]

The datastore settings to use for the seed dataset.

required

Returns:

Type Description
Self

The current Data Designer config builder instance.

Source code in src/data_designer/config/config_builder.py
def set_seed_datastore_settings(self, datastore_settings: Optional[DatastoreSettings]) -> Self:
    """Set the datastore settings for the seed dataset.

    Args:
        datastore_settings: The datastore settings to use for the seed dataset.

    Returns:
        The current Data Designer config builder instance.
    """
    self._datastore_settings = datastore_settings
    return self
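
Because the setter returns the builder itself, it can be chained with other calls. The sketch below illustrates that pattern together with the `None` handling of the getter. `TinyBuilder` and the `endpoint` key are hypothetical stand-ins for illustration only; a plain dict takes the place of `DatastoreSettings`.

```python
from typing import Optional


class TinyBuilder:  # hypothetical stand-in, not the real DataDesignerConfigBuilder
    def __init__(self) -> None:
        self._datastore_settings: Optional[dict] = None

    def set_seed_datastore_settings(self, datastore_settings: Optional[dict]) -> "TinyBuilder":
        self._datastore_settings = datastore_settings
        return self  # returning self is what makes chaining possible

    def get_seed_datastore_settings(self) -> Optional[dict]:
        # The real getter re-validates into DatastoreSettings; a dict copy stands in for that.
        return None if not self._datastore_settings else dict(self._datastore_settings)


builder = TinyBuilder()

# "endpoint" is a made-up key used only for this illustration.
settings = (
    builder
    .set_seed_datastore_settings({"endpoint": "https://example.invalid"})
    .get_seed_datastore_settings()
)
print(settings)

# Passing None clears any previously stored settings.
cleared = builder.set_seed_datastore_settings(None).get_seed_datastore_settings()
print(cleared)
```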

validate(*, raise_exceptions=False)

Validate the current Data Designer configuration.

Parameters:

Name Type Description Default
raise_exceptions bool

Whether to raise an exception if the configuration is invalid.

False

Returns:

Type Description
Self

The current Data Designer config builder instance.

Raises:

Type Description
InvalidConfigError

If the configuration is invalid and raise_exceptions is True.

Source code in src/data_designer/config/config_builder.py
def validate(self, *, raise_exceptions: bool = False) -> Self:
    """Validate the current Data Designer configuration.

    Args:
        raise_exceptions: Whether to raise an exception if the configuration is invalid.

    Returns:
        The current Data Designer config builder instance.

    Raises:
        InvalidConfigError: If the configuration is invalid and raise_exceptions is True.
    """

    violations = validate_data_designer_config(
        columns=list(self._column_configs.values()),
        processor_configs=self._processor_configs,
        allowed_references=self.allowed_references,
    )
    rich_print_violations(violations)
    if raise_exceptions and any(v.level == ViolationLevel.ERROR for v in violations):
        raise InvalidConfigError(
            "🛑 Your configuration contains validation errors. Please address the indicated issues and try again."
        )
    if len(violations) == 0:
        logger.info("✅ Validation passed")
    return self
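
The interaction between violation levels and `raise_exceptions` can be illustrated with a small self-contained sketch. `Level`, `Violation`, `InvalidConfig`, and `check` are hypothetical stand-ins, not part of Data Designer; only the control flow follows the source above.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):  # hypothetical stand-in for ViolationLevel
    WARNING = "warning"
    ERROR = "error"


@dataclass
class Violation:  # hypothetical stand-in for a validation violation
    message: str
    level: Level


class InvalidConfig(Exception):  # hypothetical stand-in for InvalidConfigError
    pass


def check(violations: list[Violation], raise_exceptions: bool = False) -> str:
    """Follow the control flow of validate(): raise only on ERROR-level violations."""
    if raise_exceptions and any(v.level == Level.ERROR for v in violations):
        raise InvalidConfig("configuration contains validation errors")
    # "passed" corresponds to the branch that logs the success message.
    return "passed" if not violations else "reported"


warning = Violation("unused column", Level.WARNING)
error = Violation("unknown reference", Level.ERROR)

print(check([]))                                # no violations at all
print(check([warning], raise_exceptions=True))  # warnings alone never raise
print(check([error]))                           # errors are only reported by default
```

Note that, as in the source, the success message is tied to an empty violation list, so a configuration with warnings only neither raises nor logs "Validation passed".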

with_seed_dataset(dataset_reference, *, sampling_strategy=SamplingStrategy.ORDERED, selection_strategy=None)

Add a seed dataset to the current Data Designer configuration.

This method sets the seed dataset for the configuration and automatically creates SeedDatasetColumnConfig objects for each column found in the dataset. The column names are fetched from the dataset source (Hugging Face Hub or NeMo Microservices Datastore).

Parameters:

Name Type Description Default
dataset_reference SeedDatasetReference

Seed dataset reference for fetching from the datastore.

required
sampling_strategy SamplingStrategy

The sampling strategy to use when generating data from the seed dataset. Defaults to ORDERED sampling.

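ORDERED
selection_strategy Optional[Union[IndexRange, PartitionBlock]]

Optional selection strategy (IndexRange or PartitionBlock) passed to the seed config. Defaults to None.

None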

Returns:

Type Description
Self

The current Data Designer config builder instance.

Source code in src/data_designer/config/config_builder.py
def with_seed_dataset(
    self,
    dataset_reference: SeedDatasetReference,
    *,
    sampling_strategy: SamplingStrategy = SamplingStrategy.ORDERED,
    selection_strategy: Optional[Union[IndexRange, PartitionBlock]] = None,
) -> Self:
    """Add a seed dataset to the current Data Designer configuration.

    This method sets the seed dataset for the configuration and automatically creates
    SeedDatasetColumnConfig objects for each column found in the dataset. The column
    names are fetched from the dataset source (Hugging Face Hub or NeMo Microservices Datastore).

    Args:
        dataset_reference: Seed dataset reference for fetching from the datastore.
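        sampling_strategy: The sampling strategy to use when generating data from the seed dataset.
            Defaults to ORDERED sampling.
        selection_strategy: Optional selection strategy (IndexRange or PartitionBlock) passed to
            the seed config. Defaults to None.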

    Returns:
        The current Data Designer config builder instance.
    """
    self._seed_config = SeedConfig(
        dataset=dataset_reference.dataset,
        sampling_strategy=sampling_strategy,
        selection_strategy=selection_strategy,
    )
    self.set_seed_datastore_settings(getattr(dataset_reference, "datastore_settings", None))
    for column_name in fetch_seed_dataset_column_names(dataset_reference):
        self._column_configs[column_name] = SeedDatasetColumnConfig(name=column_name)
    return self
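
The final loop assigns one seed column config per fetched column name. Assuming `_column_configs` behaves like a plain dict keyed by column name (the source suggests this but does not show its definition), a seed column replaces any column already registered under the same name. A minimal sketch with hypothetical stand-ins:

```python
from dataclasses import dataclass


@dataclass
class SeedColumn:  # hypothetical stand-in for SeedDatasetColumnConfig
    name: str


def register_seed_columns(column_configs: dict, seed_column_names: list[str]) -> dict:
    """Follow the loop at the end of with_seed_dataset()."""
    for column_name in seed_column_names:
        column_configs[column_name] = SeedColumn(name=column_name)
    return column_configs


# "topic" is already configured as some other column before the seed dataset is added.
configs = {"topic": "previously configured column"}
register_seed_columns(configs, ["topic", "question"])
print(configs)
```

Under that assumption, adding the seed dataset before defining other columns avoids silently replacing a column that shares a name with a seed column.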

write_config(path, indent=2, **kwargs)

Write the current configuration to a file.

Parameters:

Name Type Description Default
path Union[str, Path]

Path to the file to write the configuration to.

required
indent Optional[int]

Indentation level for the output file (default: 2).

2
**kwargs

Additional keyword arguments passed to the serialization methods used.

{}

Raises:

Type Description
BuilderConfigurationError

If the file format is unsupported.

Source code in src/data_designer/config/config_builder.py
def write_config(self, path: Union[str, Path], indent: Optional[int] = 2, **kwargs) -> None:
    """Write the current configuration to a file.

    Args:
        path: Path to the file to write the configuration to.
        indent: Indentation level for the output file (default: 2).
        **kwargs: Additional keyword arguments passed to the serialization methods used.

    Raises:
        BuilderConfigurationError: If the file format is unsupported.
    """
    cfg = self.get_builder_config()
    suffix = Path(path).suffix
    if suffix in {".yaml", ".yml"}:
        cfg.to_yaml(path, indent=indent, **kwargs)
    elif suffix == ".json":
        cfg.to_json(path, indent=indent, **kwargs)
    else:
        raise BuilderConfigurationError(f"🛑 Unsupported file type: {suffix}. Must be `.yaml`, `.yml` or `.json`.")
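
The output format is chosen purely from the file suffix. The sketch below reproduces that dispatch with the standard library to show which paths are accepted. `pick_format` is a hypothetical helper, and `ValueError` stands in for `BuilderConfigurationError`.

```python
from pathlib import Path


def pick_format(path) -> str:
    """Hypothetical helper following the suffix dispatch in write_config()."""
    suffix = Path(path).suffix
    if suffix in {".yaml", ".yml"}:
        return "yaml"
    if suffix == ".json":
        return "json"
    raise ValueError(f"Unsupported file type: {suffix!r}")


for name in ["config.yaml", "config.yml", "out/config.json"]:
    print(name, "->", pick_format(name))

# Path.suffix keeps the original casing and returns only the last suffix,
# so each of these paths falls through to the error branch.
for name in ["config.YAML", "config.json.bak", "config"]:
    try:
        pick_format(name)
    except ValueError as exc:
        print(name, "->", exc)
```

Since the comparison is case-sensitive, lowercase `.yaml`, `.yml`, or `.json` suffixes are the safe choice.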