Models

The models module defines configuration objects for model-based generation. ModelProvider specifies connection and authentication details for custom providers. ModelConfig encapsulates model details, including the model alias, identifier, and inference parameters. Inference parameters control model behavior through settings like temperature, top_p, and max_tokens, with support for both fixed values and distribution-based sampling. The module also includes ImageContext for providing image inputs to multimodal models.

Each class in the module is documented in detail below.

Classes:

Name Description
BaseInferenceParams

Base configuration for inference parameters.

ChatCompletionInferenceParams

Configuration for LLM inference parameters.

DistributionType

Types of distributions for sampling inference parameters.

EmbeddingInferenceParams

Configuration for embedding generation parameters.

ImageContext

Configuration for providing image context to multimodal models.

ImageFormat

Supported image formats for image modality.

ManualDistribution

Manual (discrete) distribution for sampling inference parameters.

ManualDistributionParams

Parameters for manual distribution sampling.

Modality

Supported modality types for multimodal model data.

ModalityDataType

Data type formats for multimodal data.

ModelConfig

Configuration for a model used for generation.

ModelProvider

Configuration for a custom model provider.

UniformDistribution

Uniform distribution for sampling inference parameters.

UniformDistributionParams

Parameters for uniform distribution sampling.

BaseInferenceParams

Bases: ConfigBase, ABC

Base configuration for inference parameters.

Attributes:

Name Type Description
generation_type GenerationType

Type of generation (chat-completion or embedding). Acts as the discriminator between the inference parameter classes.

max_parallel_requests int

Maximum number of parallel requests to the model API.

timeout int | None

Timeout in seconds for each request.

extra_body dict[str, Any] | None

Additional parameters to pass to the model API.

Methods:

Name Description
format_for_display

Format inference parameters for display.

generate_kwargs property

Get the generate kwargs for the inference parameters.

Returns:

Type Description
dict[str, Any]

A dictionary of the generate kwargs.

format_for_display()

Format inference parameters for display.

Returns:

Type Description
str

Formatted string of inference parameters

Source code in packages/data-designer-config/src/data_designer/config/models.py
def format_for_display(self) -> str:
    """Format inference parameters for display.

    Returns:
        Formatted string of inference parameters
    """
    params_dict = self.model_dump(exclude_none=True, mode="json")

    if not params_dict:
        return "(none)"

    parts = []
    for key, value in params_dict.items():
        formatted_value = self._format_value(key, value)
        parts.append(f"{key}={formatted_value}")
    return ", ".join(parts)
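
To make the output shape concrete, here is a minimal standalone sketch of the same logic. It operates on a plain dict instead of a Pydantic model, and `format_value` is a hypothetical stand-in for the private `_format_value` helper, which is not shown in this reference; the real per-value formatting may differ.

```python
from typing import Any


def format_value(key: str, value: Any) -> str:
    # Hypothetical stand-in for the private _format_value helper;
    # the library's actual formatting rules are not shown here.
    return str(value)


def format_for_display(params: dict[str, Any]) -> str:
    # Mimic model_dump(exclude_none=True): drop fields that are None.
    params_dict = {k: v for k, v in params.items() if v is not None}
    if not params_dict:
        return "(none)"
    return ", ".join(f"{k}={format_value(k, v)}" for k, v in params_dict.items())


example = format_for_display({"temperature": 0.7, "top_p": None, "max_tokens": 256})
empty = format_for_display({"temperature": None})
```

Under these assumptions, unset fields are omitted from the string and a configuration with nothing set renders as "(none)".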

ChatCompletionInferenceParams

Bases: BaseInferenceParams

Configuration for LLM inference parameters.

Attributes:

Name Type Description
generation_type Literal[CHAT_COMPLETION]

Type of generation, always "chat-completion" for this class.

temperature float | DistributionT | None

Sampling temperature (0.0-2.0). Can be a fixed value or a distribution for dynamic sampling.

top_p float | DistributionT | None

Nucleus sampling probability (0.0-1.0). Can be a fixed value or a distribution for dynamic sampling.

max_tokens int | None

Maximum number of tokens to generate in the response.
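
Because temperature and top_p accept either a fixed value or a distribution, something has to turn that union into a concrete number for each request. The sketch below shows one plausible way to do that. `UniformStandIn` and `resolve` are hypothetical names for illustration, and how the library itself resolves distributions at request time is not shown in this reference.

```python
import random
from typing import Optional, Protocol, Union


class Samplable(Protocol):
    # Anything exposing sample() -> float, like the distribution classes here.
    def sample(self) -> float: ...


class UniformStandIn:
    # Hypothetical stand-in for a distribution object; not the library class.
    def __init__(self, low: float, high: float) -> None:
        self.low = low
        self.high = high

    def sample(self) -> float:
        return random.uniform(self.low, self.high)


def resolve(value: Union[float, Samplable, None]) -> Optional[float]:
    # Fixed values and None pass through unchanged;
    # distribution-like objects are sampled once per call.
    if value is None or isinstance(value, (int, float)):
        return value
    return value.sample()


fixed = resolve(0.7)
sampled = resolve(UniformStandIn(0.5, 1.0))
unset = resolve(None)
```

A fixed value gives the same setting on every request, while a distribution yields a fresh draw each time `resolve` is called.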

DistributionType

Bases: str, Enum

Types of distributions for sampling inference parameters.

EmbeddingInferenceParams

Bases: BaseInferenceParams

Configuration for embedding generation parameters.

Attributes:

Name Type Description
generation_type Literal[EMBEDDING]

Type of generation, always "embedding" for this class.

encoding_format Literal['float', 'base64']

Format of the embedding encoding ("float" or "base64").

dimensions int | None

Number of dimensions for the embedding.

ImageContext

Bases: ModalityContext

Configuration for providing image context to multimodal models.

Attributes:

Name Type Description
modality Modality

The modality type (always "image").

column_name str

Name of the column containing image data.

data_type ModalityDataType

Format of the image data ("url" or "base64").

image_format ImageFormat | None

Image format (required for base64 data).

Methods:

Name Description
get_contexts

Get the contexts for the image modality.

get_contexts(record)

Get the contexts for the image modality.

Parameters:

Name Type Description Default
record dict

The record containing the image data. The data can be a JSON-serialized list of strings, a list of strings, or a single string.

required

Returns:

Type Description
list[dict[str, Any]]

A list of image contexts.

Source code in packages/data-designer-config/src/data_designer/config/models.py
def get_contexts(self, record: dict) -> list[dict[str, Any]]:
    """Get the contexts for the image modality.

    Args:
        record: The record containing the image data. The data can be:
            - A JSON serialized list of strings
            - A list of strings
            - A single string

    Returns:
        A list of image contexts.
    """
    raw_value = record[self.column_name]

    # Normalize to list of strings
    if isinstance(raw_value, str):
        # Try to parse as JSON first
        try:
            parsed_value = json.loads(raw_value)
            if isinstance(parsed_value, list):
                context_values = parsed_value
            else:
                context_values = [raw_value]
        except (json.JSONDecodeError, TypeError):
            context_values = [raw_value]
    elif isinstance(raw_value, list):
        context_values = raw_value
    elif hasattr(raw_value, "__iter__") and not isinstance(raw_value, (str, bytes, dict)):
        # Handle array-like objects (numpy arrays, pandas Series, etc.)
        context_values = list(raw_value)
    else:
        context_values = [raw_value]

    # Build context list
    contexts = []
    for context_value in context_values:
        context = dict(type="image_url")
        if self.data_type == ModalityDataType.URL:
            context["image_url"] = context_value
        else:
            context["image_url"] = {
                "url": f"data:image/{self.image_format.value};base64,{context_value}",
                "format": self.image_format.value,
            }
        contexts.append(context)

    return contexts
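
The following standalone sketch reproduces the main branches of the method so the resulting payloads are easy to inspect. It is a simplification, not the library code: it takes the column value directly, uses plain strings ("url", "base64") in place of ModalityDataType, omits the array-like branch, and assumes "png" is a valid ImageFormat value, which this reference does not list.

```python
import json
from typing import Any, Optional


def build_image_contexts(
    raw_value: Any, data_type: str, image_format: Optional[str] = None
) -> list[dict[str, Any]]:
    # Normalize the column value to a list, following the branches above.
    if isinstance(raw_value, str):
        try:
            parsed = json.loads(raw_value)
            values = parsed if isinstance(parsed, list) else [raw_value]
        except (json.JSONDecodeError, TypeError):
            # Not JSON (for example a bare URL): treat as a single value.
            values = [raw_value]
    elif isinstance(raw_value, list):
        values = raw_value
    else:
        values = [raw_value]

    contexts = []
    for value in values:
        if data_type == "url":
            contexts.append({"type": "image_url", "image_url": value})
        else:
            # Base64 data is wrapped in a data URI that carries the image format.
            data_uri = f"data:image/{image_format};base64,{value}"
            contexts.append(
                {"type": "image_url", "image_url": {"url": data_uri, "format": image_format}}
            )
    return contexts


single = build_image_contexts("https://example.com/cat.png", "url")
many = build_image_contexts('["https://example.com/a.png", "https://example.com/b.png"]', "url")
encoded = build_image_contexts("aGVsbG8=", "base64", image_format="png")
```

A bare URL fails JSON parsing and is kept as one value, a JSON-serialized list expands to one context per element, and base64 input is the only case that needs an image format.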

ImageFormat

Bases: str, Enum

Supported image formats for image modality.

ManualDistribution

Bases: Distribution[ManualDistributionParams]

Manual (discrete) distribution for sampling inference parameters.

Samples from a discrete set of values with optional weights. Useful for testing specific values or creating custom probability distributions for temperature or top_p.

Attributes:

Name Type Description
distribution_type DistributionType | None

Type of distribution ("manual").

params ManualDistributionParams

Distribution parameters (values, weights).

Methods:

Name Description
sample

Sample a value from the manual distribution.

sample()

Sample a value from the manual distribution.

Returns:

Type Description
float

A float value sampled from the manual distribution.

Source code in packages/data-designer-config/src/data_designer/config/models.py
def sample(self) -> float:
    """Sample a value from the manual distribution.

    Returns:
        A float value sampled from the manual distribution.
    """
    return float(np.random.choice(self.params.values, p=self.params.weights))
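
For readers without NumPy at hand, the same idea can be sketched with the standard library. `sample_manual` is a hypothetical helper, not part of the module. One difference worth knowing: `random.choices` treats weights as relative, whereas `np.random.choice` requires `p` to sum to 1; whether the library normalizes weights before sampling is not shown in this reference.

```python
import random
from typing import Optional


def sample_manual(values: list[float], weights: Optional[list[float]] = None) -> float:
    # random.choices accepts relative weights and falls back to
    # equal probability for every value when weights is None.
    return float(random.choices(values, weights=weights, k=1)[0])


# Draw temperatures from three candidate values, favoring the lowest.
draws = [sample_manual([0.2, 0.7, 1.0], weights=[0.5, 0.3, 0.2]) for _ in range(1000)]
```

Every draw is one of the listed values, so a manual distribution is a convenient way to compare a handful of specific settings.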

ManualDistributionParams

Bases: ConfigBase

Parameters for manual distribution sampling.

Attributes:

Name Type Description
values list[float]

List of possible values to sample from.

weights list[float] | None

Optional list of weights for each value. If not provided, all values have equal probability.
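
The relationship between values and weights can be sketched as a small function that returns the probability assigned to each value. `effective_probabilities` is a hypothetical helper; the length check and the normalization step are assumptions, since the validation this class performs is not shown in this reference.

```python
from typing import Optional


def effective_probabilities(
    values: list[float], weights: Optional[list[float]] = None
) -> list[float]:
    # No weights: every value is equally likely.
    if weights is None:
        return [1.0 / len(values)] * len(values)
    # Assumed check: one weight per value.
    if len(weights) != len(values):
        raise ValueError("weights must have the same length as values")
    # Assumed normalization so the probabilities sum to 1.
    total = sum(weights)
    return [w / total for w in weights]


equal = effective_probabilities([0.1, 0.9])
weighted = effective_probabilities([0.1, 0.9], weights=[1.0, 3.0])
```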

Modality

Bases: str, Enum

Supported modality types for multimodal model data.

ModalityDataType

Bases: str, Enum

Data type formats for multimodal data.

ModelConfig

Bases: ConfigBase

Configuration for a model used for generation.

Attributes:

Name Type Description
alias str

User-defined alias to reference in column configurations.

model str

Model identifier (e.g., from build.nvidia.com or other providers).

inference_parameters InferenceParamsT

Inference parameters for the model (temperature, top_p, max_tokens, etc.). The generation_type is determined by the type of inference_parameters.

provider str | None

Optional model provider name if using custom providers.

skip_health_check bool

Whether to skip the health check for this model. Defaults to False.

generation_type property

Get the generation type from the inference parameters.
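
A minimal sketch of this delegation, using plain dataclasses rather than the library's Pydantic models. `ChatParams`, `EmbeddingParams`, and `ModelConfigSketch` are hypothetical stand-ins, and the model identifiers are placeholders; only the "chat-completion" and "embedding" strings come from the class descriptions in this reference.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class ChatParams:
    # Stand-in for ChatCompletionInferenceParams.
    temperature: Optional[float] = None
    generation_type: str = field(default="chat-completion", init=False)


@dataclass
class EmbeddingParams:
    # Stand-in for EmbeddingInferenceParams.
    encoding_format: str = "float"
    generation_type: str = field(default="embedding", init=False)


@dataclass
class ModelConfigSketch:
    alias: str
    model: str
    inference_parameters: Union[ChatParams, EmbeddingParams]

    @property
    def generation_type(self) -> str:
        # The config does not store its own generation type;
        # it reads it from whichever parameter object it holds.
        return self.inference_parameters.generation_type


chat = ModelConfigSketch("writer", "example/chat-model", ChatParams(temperature=0.7))
embed = ModelConfigSketch("embedder", "example/embedding-model", EmbeddingParams())
```

Keeping the discriminator on the parameter object means the generation type cannot drift out of sync with the kind of parameters a config carries.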

ModelProvider

Bases: ConfigBase

Configuration for a custom model provider.

Attributes:

Name Type Description
name str

Name of the model provider.

endpoint str

API endpoint URL for the provider.

provider_type str

Provider type (default: "openai"). Determines the API format to use.

api_key str | None

Optional API key for authentication.

extra_body dict[str, Any] | None

Additional parameters to pass in API requests.

extra_headers dict[str, str] | None

Additional headers to pass in API requests.

UniformDistribution

Bases: Distribution[UniformDistributionParams]

Uniform distribution for sampling inference parameters.

Samples values uniformly between low and high bounds. Useful for exploring a continuous range of values for temperature or top_p.

Attributes:

Name Type Description
distribution_type DistributionType | None

Type of distribution ("uniform").

params UniformDistributionParams

Distribution parameters (low, high).

Methods:

Name Description
sample

Sample a value from the uniform distribution.

sample()

Sample a value from the uniform distribution.

Returns:

Type Description
float

A float value sampled from the uniform distribution.

Source code in packages/data-designer-config/src/data_designer/config/models.py
def sample(self) -> float:
    """Sample a value from the uniform distribution.

    Returns:
        A float value sampled from the uniform distribution.
    """
    return float(np.random.uniform(low=self.params.low, high=self.params.high, size=1)[0])

UniformDistributionParams

Bases: ConfigBase

Parameters for uniform distribution sampling.

Attributes:

Name Type Description
low float

Lower bound (inclusive).

high float

Upper bound (exclusive).