
Data Designer Configuration

DataDesignerConfig is the main configuration object for building datasets with Data Designer. It is a declarative configuration that defines the dataset you want to generate column by column, including options for dataset post-processing, validation, and profiling.

In most cases, use the DataDesignerConfigBuilder to build your configuration. You can also build it manually by instantiating the DataDesignerConfig class directly.

Classes:

DataDesignerConfig: Configuration for NeMo Data Designer.

DataDesignerConfig

Bases: ExportableConfigBase

Configuration for NeMo Data Designer.

This class defines the main configuration structure for NeMo Data Designer, which orchestrates the generation of synthetic data.

Attributes:

columns (list[Annotated[ColumnConfigT, Field(discriminator='column_type')]]): Required list of column configurations defining how each column should be generated. Must contain at least one column.

model_configs (Optional[list[ModelConfig]]): Optional list of model configurations for LLM-based generation. Each model config defines the model, provider, and inference parameters.

seed_config (Optional[SeedConfig]): Optional seed dataset settings to use for generation.

constraints (Optional[list[ColumnConstraintT]]): Optional list of column constraints.

profilers (Optional[list[ColumnProfilerConfigT]]): Optional list of column profilers for analyzing generated data characteristics.
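
The attribute list above implies two rules that are easy to miss: each entry in columns is routed to a concrete column configuration class by its column_type tag (the discriminator), and columns may not be empty. The sketch below illustrates both rules using only the standard library. It does not use the Data Designer package or its API: SamplerColumn, LLMTextColumn, ConfigSketch, parse_column, load_config, and the tag values "sampler" and "llm-text" are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-ins for two column config types. The real classes,
# their fields, and their column_type values are not shown in this section.
@dataclass
class SamplerColumn:
    name: str
    column_type: str = "sampler"


@dataclass
class LLMTextColumn:
    name: str
    prompt: str
    column_type: str = "llm-text"


# Maps each discriminator value to the class that should parse it.
COLUMN_TYPES = {"sampler": SamplerColumn, "llm-text": LLMTextColumn}


def parse_column(raw: dict[str, Any]):
    """Select the column class from the 'column_type' tag, as a discriminated union would."""
    tag = raw.get("column_type")
    if tag not in COLUMN_TYPES:
        raise ValueError(f"unknown column_type: {tag!r}")
    return COLUMN_TYPES[tag](**raw)


@dataclass
class ConfigSketch:
    """Mirrors the attribute names listed above; only columns is required."""

    columns: list
    model_configs: Optional[list] = None
    seed_config: Optional[dict] = None
    constraints: Optional[list] = None
    profilers: Optional[list] = None

    def __post_init__(self):
        # Enforce the "at least one column" rule.
        if not self.columns:
            raise ValueError("columns must contain at least one column")


def load_config(raw: dict[str, Any]) -> ConfigSketch:
    """Build a ConfigSketch from a plain dict, parsing each column by its tag."""
    columns = [parse_column(c) for c in raw.get("columns", [])]
    optional_keys = ("model_configs", "seed_config", "constraints", "profilers")
    optional = {key: raw.get(key) for key in optional_keys}
    return ConfigSketch(columns=columns, **optional)


config = load_config(
    {
        "columns": [
            {"column_type": "sampler", "name": "topic"},
            {"column_type": "llm-text", "name": "question", "prompt": "Ask about the topic."},
        ]
    }
)
```

Here config.columns holds one SamplerColumn and one LLMTextColumn, and every optional attribute defaults to None. Passing an empty columns list or an unrecognized column_type raises ValueError, which is roughly the behavior a validated configuration class would be expected to have.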