
Models

The models module defines configuration objects for model-based generation. ModelProvider specifies connection and authentication details for custom providers. ModelConfig encapsulates model details, including the model alias, identifier, and inference parameters. Inference parameters control model behavior through settings such as temperature, top_p, and max_tokens, with support for both fixed values and distribution-based sampling. The module also includes ImageContext for providing image inputs to multimodal models, and ImageInferenceParams for configuring image generation models.

For more information on how they are used, see below:

Classes:

Name Description
BaseInferenceParams

Base configuration for inference parameters.

ChatCompletionInferenceParams

Configuration for LLM inference parameters.

DistributionType

Types of distributions for sampling inference parameters.

EmbeddingInferenceParams

Configuration for embedding generation parameters.

ImageContext

Configuration for providing image context to multimodal models.

ImageInferenceParams

Configuration for image generation models.

ManualDistribution

Manual (discrete) distribution for sampling inference parameters.

ManualDistributionParams

Parameters for manual distribution sampling.

Modality

Supported modality types for multimodal model data.

ModalityDataType

Data type formats for multimodal data.

ModelConfig

Configuration for a model used for generation.

ModelProvider

Configuration for a custom model provider.

UniformDistribution

Uniform distribution for sampling inference parameters.

UniformDistributionParams

Parameters for uniform distribution sampling.

BaseInferenceParams

Bases: ConfigBase, ABC

Base configuration for inference parameters.

Attributes:

Name Type Description
generation_type GenerationType

Type of generation (chat-completion or embedding). Acts as the discriminator field.

max_parallel_requests int

Maximum number of parallel requests to the model API.

timeout int | None

Timeout in seconds for each request.

extra_body dict[str, Any] | None

Additional parameters to pass to the model API.

Methods:

Name Description
format_for_display

Format inference parameters for display as a single line.

get_formatted_params

Get a list of formatted parameter strings.

generate_kwargs property

Get the generate kwargs for the inference parameters.

Returns:

Type Description
dict[str, Any]

A dictionary of the generate kwargs.

format_for_display()

Format inference parameters for display as a single line.

Returns:

Type Description
str

Formatted string of inference parameters

Source code in packages/data-designer-config/src/data_designer/config/models.py
def format_for_display(self) -> str:
    """Format inference parameters for display as a single line.

    Returns:
        Formatted string of inference parameters
    """
    parts = self.get_formatted_params()
    if not parts:
        return "(none)"
    return ", ".join(parts)

get_formatted_params()

Get a list of formatted parameter strings.

Returns:

Type Description
list[str]

List of formatted parameter strings (e.g., ["temperature=0.70", "max_tokens=100"])

Source code in packages/data-designer-config/src/data_designer/config/models.py
def get_formatted_params(self) -> list[str]:
    """Get a list of formatted parameter strings.

    Returns:
        List of formatted parameter strings (e.g., ["temperature=0.70", "max_tokens=100"])
    """
    params_dict = self.model_dump(exclude_none=True, mode="json")
    parts = []
    for key, value in params_dict.items():
        formatted_value = self._format_value(key, value)
        parts.append(f"{key}={formatted_value}")
    return parts
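
Taken together, the two methods above turn a parameters object into a compact display string. The following standalone sketch mirrors that flow with plain dictionaries; the two-decimal float formatting is an assumption inferred from the "temperature=0.70" example, since the private `_format_value` helper is not shown on this page:

```python
# Standalone sketch of the display-formatting flow described above.
# Assumption: floats render with two decimal places, inferred from the
# "temperature=0.70" example; the real _format_value rules may differ.

def format_value(value):
    if isinstance(value, float):
        return f"{value:.2f}"
    return str(value)

def get_formatted_params(params: dict) -> list[str]:
    return [f"{key}={format_value(value)}"
            for key, value in params.items() if value is not None]

def format_for_display(params: dict) -> str:
    parts = get_formatted_params(params)
    return ", ".join(parts) if parts else "(none)"

print(format_for_display({"temperature": 0.7, "max_tokens": 100}))
# temperature=0.70, max_tokens=100
print(format_for_display({}))
# (none)
```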

ChatCompletionInferenceParams

Bases: BaseInferenceParams

Configuration for LLM inference parameters.

Attributes:

Name Type Description
generation_type Literal[CHAT_COMPLETION]

Type of generation, always "chat-completion" for this class.

temperature float | DistributionT | None

Sampling temperature (0.0-2.0). Can be a fixed value or a distribution for dynamic sampling.

top_p float | DistributionT | None

Nucleus sampling probability (0.0-1.0). Can be a fixed value or a distribution for dynamic sampling.

max_tokens int | None

Maximum number of tokens to generate in the response.
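
Both temperature and top_p accept either a fixed float or a distribution object. A hedged sketch of the two styles, assuming the package is imported as `dd` (matching the ImageInferenceParams example on this page) and that distributions are built by passing a params object, as the attribute listings suggest:

```python
# Sketch only: the `dd` import alias and the constructor keyword forms
# are assumptions based on the attribute listings on this page.

# Fixed values
fixed = dd.ChatCompletionInferenceParams(
    temperature=0.7,
    top_p=0.95,
    max_tokens=1024,
)

# Distribution-based sampling: temperature drawn per request
sampled = dd.ChatCompletionInferenceParams(
    temperature=dd.UniformDistribution(
        params=dd.UniformDistributionParams(low=0.5, high=1.0)
    ),
    max_tokens=1024,
)
```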

DistributionType

Bases: str, Enum

Types of distributions for sampling inference parameters.

EmbeddingInferenceParams

Bases: BaseInferenceParams

Configuration for embedding generation parameters.

Attributes:

Name Type Description
generation_type Literal[EMBEDDING]

Type of generation, always "embedding" for this class.

encoding_format Literal['float', 'base64']

Format of the embedding encoding ("float" or "base64").

dimensions int | None

Number of dimensions for the embedding.
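
A minimal construction sketch, again assuming the `dd` alias; the dimension count is purely illustrative:

```python
# Sketch only: the `dd` alias and the 768-dimension value are assumptions.
embedding_params = dd.EmbeddingInferenceParams(
    encoding_format="float",
    dimensions=768,
)
```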

ImageContext

Bases: ModalityContext

Configuration for providing image context to multimodal models.

Attributes:

Name Type Description
modality Modality

The modality type (always "image").

column_name str

Name of the column containing image data.

data_type ModalityDataType | None

Format of the image data ("url", "base64", or None for auto-detection). When None, the format is auto-detected: URLs are passed through, file paths that exist under base_path are loaded as base64, and other values are assumed to be base64.

image_format ImageFormat | None

Image format (required when data_type is explicitly "base64").

Methods:

Name Description
get_contexts

Get the contexts for the image modality.

get_contexts(record, *, base_path=None)

Get the contexts for the image modality.

Parameters:

Name Type Description Default
record dict

The record containing the image data. The data can be:

- A JSON serialized list of strings
- A list of strings
- A single string

required
base_path str | None

Optional base path for resolving relative file paths. When provided, file paths that exist under this directory are loaded and converted to base64. This enables generated images (stored as relative paths in create mode) to be sent to remote model endpoints.

None

Returns:

Type Description
list[dict[str, Any]]

A list of image contexts.

Source code in packages/data-designer-config/src/data_designer/config/models.py
def get_contexts(self, record: dict, *, base_path: str | None = None) -> list[dict[str, Any]]:
    """Get the contexts for the image modality.

    Args:
        record: The record containing the image data. The data can be:
            - A JSON serialized list of strings
            - A list of strings
            - A single string
        base_path: Optional base path for resolving relative file paths.
            When provided, file paths that exist under this directory are loaded
            and converted to base64. This enables generated images (stored as relative
            paths in create mode) to be sent to remote model endpoints.

    Returns:
        A list of image contexts.
    """
    raw_value = record[self.column_name]

    # Normalize to list of strings
    if isinstance(raw_value, str):
        # Try to parse as JSON first
        try:
            parsed_value = json.loads(raw_value)
            if isinstance(parsed_value, list):
                context_values = parsed_value
            else:
                context_values = [raw_value]
        except (json.JSONDecodeError, TypeError):
            context_values = [raw_value]
    elif isinstance(raw_value, list):
        context_values = raw_value
    elif hasattr(raw_value, "__iter__") and not isinstance(raw_value, (str, bytes, dict)):
        # Handle array-like objects (numpy arrays, pandas Series, etc.)
        context_values = list(raw_value)
    else:
        context_values = [raw_value]

    # Build context list
    contexts = []
    for context_value in context_values:
        context = dict(type="image_url")
        if self.data_type is not None:
            # Explicit data_type: use existing behavior
            if self.data_type == ModalityDataType.URL:
                context["image_url"] = context_value
            else:
                context["image_url"] = {
                    "url": f"data:image/{self.image_format.value};base64,{context_value}",
                    "format": self.image_format.value,
                }
        else:
            # Auto-detect: resolve file paths, pass through URLs, assume base64 otherwise
            context["image_url"] = self._auto_resolve_context_value(context_value, base_path)
        contexts.append(context)

    return contexts
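
The normalization step in `get_contexts` accepts three input shapes for the column value. This standalone sketch isolates that logic so each shape can be seen collapsing to a list of strings:

```python
import json

def normalize_image_values(raw_value) -> list:
    """Collapse a JSON-serialized list, a plain list, or a single
    string into a list of strings, mirroring get_contexts above."""
    if isinstance(raw_value, str):
        try:
            parsed = json.loads(raw_value)
        except (json.JSONDecodeError, TypeError):
            return [raw_value]
        return parsed if isinstance(parsed, list) else [raw_value]
    if isinstance(raw_value, list):
        return raw_value
    return [raw_value]

print(normalize_image_values('["a.png", "b.png"]'))  # ['a.png', 'b.png']
print(normalize_image_values("a.png"))               # ['a.png']
print(normalize_image_values("42"))                  # ['42'] (valid JSON, but not a list)
```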

ImageInferenceParams

Bases: BaseInferenceParams

Configuration for image generation models.

Works for both diffusion and autoregressive image generation models. Pass all model-specific image options via extra_body.

Attributes:

Name Type Description
generation_type Literal[IMAGE]

Type of generation, always "image" for this class.

Example
# OpenAI-style (DALL·E): quality and size in extra_body or as top-level kwargs
dd.ImageInferenceParams(
    extra_body={"size": "1024x1024", "quality": "hd"}
)

# Gemini-style: generationConfig.imageConfig
dd.ImageInferenceParams(
    extra_body={
        "generationConfig": {
            "imageConfig": {
                "aspectRatio": "1:1",
                "imageSize": "1024"
            }
        }
    }
)

ManualDistribution

Bases: Distribution[ManualDistributionParams]

Manual (discrete) distribution for sampling inference parameters.

Samples from a discrete set of values with optional weights. Useful for testing specific values or creating custom probability distributions for temperature or top_p.

Attributes:

Name Type Description
distribution_type DistributionType | None

Type of distribution ("manual").

params ManualDistributionParams

Distribution parameters (values, weights).

Methods:

Name Description
sample

Sample a value from the manual distribution.

sample()

Sample a value from the manual distribution.

Returns:

Type Description
float

A float value sampled from the manual distribution.

Source code in packages/data-designer-config/src/data_designer/config/models.py
def sample(self) -> float:
    """Sample a value from the manual distribution.

    Returns:
        A float value sampled from the manual distribution.
    """
    return float(lazy.np.random.choice(self.params.values, p=self.params.weights))
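
The np.random.choice call above can be mirrored with the standard library for a self-contained illustration of weighted discrete sampling; the values and weights here are hypothetical:

```python
import random

# Hypothetical discrete temperature values and weights; weights must
# align one-to-one with values and, unlike np.random.choice's `p`,
# need not sum to 1 for random.choices.
values = [0.2, 0.7, 1.0]
weights = [0.2, 0.5, 0.3]

# Mirrors np.random.choice(values, p=weights) from sample() above.
sample = random.choices(values, weights=weights, k=1)[0]
assert sample in values
```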

ManualDistributionParams

Bases: ConfigBase

Parameters for manual distribution sampling.

Attributes:

Name Type Description
values list[float]

List of possible values to sample from.

weights list[float] | None

Optional list of weights for each value. If not provided, all values have equal probability.

Modality

Bases: str, Enum

Supported modality types for multimodal model data.

ModalityDataType

Bases: str, Enum

Data type formats for multimodal data.

ModelConfig

Bases: ConfigBase

Configuration for a model used for generation.

Attributes:

Name Type Description
alias str

User-defined alias to reference in column configurations.

model str

Model identifier (e.g., from build.nvidia.com or other providers).

inference_parameters InferenceParamsT

Inference parameters for the model (temperature, top_p, max_tokens, etc.). The generation_type is determined by the type of inference_parameters.

provider str | None

Optional model provider name if using custom providers.

skip_health_check bool

Whether to skip the health check for this model. Defaults to False.

generation_type property

Get the generation type from the inference parameters.
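
A hedged construction sketch, assuming the `dd` alias used elsewhere on this page; the alias name and model identifier are illustrative:

```python
# Sketch only: the alias and model id below are illustrative, not prescribed.
config = dd.ModelConfig(
    alias="creative-writer",
    model="meta/llama-3.1-8b-instruct",
    inference_parameters=dd.ChatCompletionInferenceParams(
        temperature=0.9,
        max_tokens=512,
    ),
)
# generation_type is derived from the type of inference_parameters.
```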

ModelProvider

Bases: ConfigBase

Configuration for a custom model provider.

Attributes:

Name Type Description
name str

Name of the model provider.

endpoint str

API endpoint URL for the provider.

provider_type str

Provider type (default: "openai"). Determines the API format to use.

api_key str | None

Optional API key for authentication.

extra_body dict[str, Any] | None

Additional parameters to pass in API requests.

extra_headers dict[str, str] | None

Additional headers to pass in API requests.
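
A hedged sketch of a custom provider definition, assuming the `dd` alias; the endpoint URL, header, and environment variable name are all illustrative assumptions:

```python
import os

# Sketch only: every concrete value below is an illustrative assumption.
provider = dd.ModelProvider(
    name="my-openai-compatible",
    endpoint="https://llm.example.com/v1",
    provider_type="openai",
    api_key=os.environ.get("MY_PROVIDER_API_KEY"),
    extra_headers={"X-Org": "research"},
)
```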

UniformDistribution

Bases: Distribution[UniformDistributionParams]

Uniform distribution for sampling inference parameters.

Samples values uniformly between low and high bounds. Useful for exploring a continuous range of values for temperature or top_p.

Attributes:

Name Type Description
distribution_type DistributionType | None

Type of distribution ("uniform").

params UniformDistributionParams

Distribution parameters (low, high).

Methods:

Name Description
sample

Sample a value from the uniform distribution.

sample()

Sample a value from the uniform distribution.

Returns:

Type Description
float

A float value sampled from the uniform distribution.

Source code in packages/data-designer-config/src/data_designer/config/models.py
def sample(self) -> float:
    """Sample a value from the uniform distribution.

    Returns:
        A float value sampled from the uniform distribution.
    """
    return float(lazy.np.random.uniform(low=self.params.low, high=self.params.high, size=1)[0])
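
As with the manual distribution, the call above can be mirrored with the standard library. One caveat: np.random.uniform samples the half-open interval [low, high), while random.uniform may occasionally return high exactly:

```python
import random

low, high = 0.5, 1.0  # hypothetical bounds for a temperature range

# Mirrors np.random.uniform(low=low, high=high); note random.uniform
# can return `high` exactly, unlike the half-open NumPy version.
samples = [random.uniform(low, high) for _ in range(1000)]
assert all(low <= s <= high for s in samples)
```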

UniformDistributionParams

Bases: ConfigBase

Parameters for uniform distribution sampling.

Attributes:

Name Type Description
low float

Lower bound (inclusive).

high float

Upper bound (exclusive).