
Models

The models module defines configuration objects for model-based generation. ModelProvider specifies connection and authentication details for custom providers. ModelConfig encapsulates model details, including the model alias, identifier, and inference parameters. Inference parameters control model behavior through settings such as temperature, top_p, and max_tokens, with support for both fixed values and distribution-based sampling. The module also includes ImageContext for providing image inputs to multimodal models.

For more information on how they are used, see below:

Classes:

BaseInferenceParams: Base configuration for inference parameters.
ChatCompletionInferenceParams: Configuration for LLM inference parameters.
DistributionType: Types of distributions for sampling inference parameters.
EmbeddingInferenceParams: Configuration for embedding generation parameters.
ImageContext: Configuration for providing image context to multimodal models.
ImageFormat: Supported image formats for image modality.
ManualDistribution: Manual (discrete) distribution for sampling inference parameters.
ManualDistributionParams: Parameters for manual distribution sampling.
Modality: Supported modality types for multimodal model data.
ModalityDataType: Data type formats for multimodal data.
ModelConfig: Configuration for a model used for generation.
ModelProvider: Configuration for a custom model provider.
UniformDistribution: Uniform distribution for sampling inference parameters.
UniformDistributionParams: Parameters for uniform distribution sampling.

BaseInferenceParams

Bases: ConfigBase, ABC

Base configuration for inference parameters.

Attributes:

generation_type (GenerationType): Type of generation (chat-completion or embedding). Acts as discriminator.
max_parallel_requests (int): Maximum number of parallel requests to the model API.
timeout (int | None): Timeout in seconds for each request.
extra_body (dict[str, Any] | None): Additional parameters to pass to the model API.

Methods:

format_for_display: Format inference parameters for display.

generate_kwargs property

Get the generate kwargs for the inference parameters.

Returns:

dict[str, Any]: A dictionary of the generate kwargs.

format_for_display()

Format inference parameters for display.

Returns:

str: Formatted string of inference parameters

Source code in src/data_designer/config/models.py
def format_for_display(self) -> str:
    """Format inference parameters for display.

    Returns:
        Formatted string of inference parameters
    """
    params_dict = self.model_dump(exclude_none=True, mode="json")

    if not params_dict:
        return "(none)"

    parts = []
    for key, value in params_dict.items():
        formatted_value = self._format_value(key, value)
        parts.append(f"{key}={formatted_value}")
    return ", ".join(parts)
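
The display logic above can be tried out on a plain dictionary. The following is a minimal sketch, not the library's code: `format_params` is a hypothetical stand-in, and because the private `_format_value` helper is not shown in this reference, plain `str()` formatting is assumed.

```python
from __future__ import annotations

from typing import Any


def format_params(params: dict[str, Any]) -> str:
    """Hypothetical stand-in that mirrors the flow of format_for_display."""
    # Drop unset values, as model_dump(exclude_none=True) does in the original.
    present = {key: value for key, value in params.items() if value is not None}
    if not present:
        return "(none)"
    # Assumption: str() replaces the private _format_value helper.
    return ", ".join(f"{key}={value}" for key, value in present.items())


print(format_params({"temperature": 0.7, "top_p": None, "max_tokens": 256}))
# prints: temperature=0.7, max_tokens=256
print(format_params({"timeout": None}))
# prints: (none)
```

Parameters set to None are omitted entirely, so an object with nothing set is rendered as "(none)" rather than as an empty string.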

ChatCompletionInferenceParams

Bases: BaseInferenceParams

Configuration for LLM inference parameters.

Attributes:

generation_type (Literal[CHAT_COMPLETION]): Type of generation, always "chat-completion" for this class.
temperature (float | DistributionT | None): Sampling temperature (0.0-2.0). Can be a fixed value or a distribution for dynamic sampling.
top_p (float | DistributionT | None): Nucleus sampling probability (0.0-1.0). Can be a fixed value or a distribution for dynamic sampling.
max_tokens (int | None): Maximum number of tokens to generate in the response.
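
Since temperature and top_p accept either a fixed value or a distribution, some step has to turn the setting into a concrete number. The sketch below shows one plausible way to do that. `resolve` and `UniformStandIn` are hypothetical names, and when the library actually draws a sample is not documented here, so this illustrates the idea only.

```python
from __future__ import annotations

import random
from typing import Protocol


class SupportsSample(Protocol):
    """Anything that exposes sample() -> float, like the distribution classes."""

    def sample(self) -> float: ...


class UniformStandIn:
    """Hypothetical stand-in for a distribution object; not the library class."""

    def __init__(self, low: float, high: float) -> None:
        self.low = low
        self.high = high

    def sample(self) -> float:
        return self.low + (self.high - self.low) * random.random()


def resolve(value: float | SupportsSample | None) -> float | None:
    """Return a concrete value: fixed values pass through, distributions are sampled."""
    if value is None or isinstance(value, (int, float)):
        return value
    return value.sample()


print(resolve(0.7))                       # fixed value, returned unchanged
print(resolve(UniformStandIn(0.5, 1.0)))  # a fresh draw on every call
```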

DistributionType

Bases: str, Enum

Types of distributions for sampling inference parameters.

EmbeddingInferenceParams

Bases: BaseInferenceParams

Configuration for embedding generation parameters.

Attributes:

generation_type (Literal[EMBEDDING]): Type of generation, always "embedding" for this class.
encoding_format (Literal['float', 'base64']): Format of the embedding encoding ("float" or "base64").
dimensions (int | None): Number of dimensions for the embedding.

ImageContext

Bases: ModalityContext

Configuration for providing image context to multimodal models.

Attributes:

modality (Modality): The modality type (always "image").
column_name (str): Name of the column containing image data.
data_type (ModalityDataType): Format of the image data ("url" or "base64").
image_format (ImageFormat | None): Image format (required for base64 data).

Methods:

get_context: Get the context for the image modality.

get_context(record)

Get the context for the image modality.

Parameters:

record (dict, required): The record containing the image data.

Returns:

dict[str, Any]: The context for the image modality.

Source code in src/data_designer/config/models.py
def get_context(self, record: dict) -> dict[str, Any]:
    """Get the context for the image modality.

    Args:
        record: The record containing the image data.

    Returns:
        The context for the image modality.
    """
    context = dict(type="image_url")
    context_value = record[self.column_name]
    if self.data_type == ModalityDataType.URL:
        context["image_url"] = context_value
    else:
        context["image_url"] = {
            "url": f"data:image/{self.image_format.value};base64,{context_value}",
            "format": self.image_format.value,
        }
    return context
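
To make the two output shapes concrete, here is a self-contained sketch that reproduces the branching of `get_context` with plain strings in place of the enums. `image_context` is a hypothetical helper, `"png"` is an assumed format value (the members of ImageFormat are not listed in this reference), and the explicit check for a missing format is an addition that the original method does not perform.

```python
from __future__ import annotations

from typing import Any


def image_context(
    record: dict[str, Any],
    column_name: str,
    data_type: str,
    image_format: str | None = None,
) -> dict[str, Any]:
    """Hypothetical stand-in mirroring ImageContext.get_context."""
    value = record[column_name]
    if data_type == "url":
        # URL data is passed through unchanged.
        return {"type": "image_url", "image_url": value}
    if image_format is None:
        # Added guard: the documented attribute is required for base64 data.
        raise ValueError("image_format is required for base64 data")
    # Base64 data is wrapped in a data URI that carries the image format.
    return {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/{image_format};base64,{value}",
            "format": image_format,
        },
    }


url_ctx = image_context({"img": "https://example.com/cat.png"}, "img", "url")
b64_ctx = image_context({"img": "iVBORw0KGgo="}, "img", "base64", "png")
print(url_ctx)
print(b64_ctx)
```

Note that the URL case stores a plain string under "image_url", while the base64 case stores a nested dictionary.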

ImageFormat

Bases: str, Enum

Supported image formats for image modality.

ManualDistribution

Bases: Distribution[ManualDistributionParams]

Manual (discrete) distribution for sampling inference parameters.

Samples from a discrete set of values with optional weights. Useful for testing specific values or creating custom probability distributions for temperature or top_p.

Attributes:

distribution_type (DistributionType | None): Type of distribution ("manual").
params (ManualDistributionParams): Distribution parameters (values, weights).

Methods:

sample: Sample a value from the manual distribution.

sample()

Sample a value from the manual distribution.

Returns:

float: A float value sampled from the manual distribution.

Source code in src/data_designer/config/models.py
def sample(self) -> float:
    """Sample a value from the manual distribution.

    Returns:
        A float value sampled from the manual distribution.
    """
    return float(np.random.choice(self.params.values, p=self.params.weights))
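
The same kind of discrete sampling can be sketched with the standard library alone. `sample_manual` is a hypothetical helper, not part of the module. One difference worth knowing: `np.random.choice` raises a ValueError when `p` does not sum to 1, whereas `random.choices` accepts relative weights. Whether the library normalizes weights before sampling is not shown in this reference.

```python
from __future__ import annotations

import random


def sample_manual(values: list[float], weights: list[float] | None = None) -> float:
    """Hypothetical stdlib counterpart of ManualDistribution.sample."""
    if weights is not None and len(weights) != len(values):
        raise ValueError("weights must have the same length as values")
    # weights=None gives every value the same probability.
    return float(random.choices(values, weights=weights, k=1)[0])


# Three candidate temperatures, with the middle one drawn most often.
print(sample_manual([0.2, 0.7, 1.0], weights=[0.25, 0.5, 0.25]))
# No weights: each value is equally likely.
print(sample_manual([0.2, 0.7, 1.0]))
```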

ManualDistributionParams

Bases: ConfigBase

Parameters for manual distribution sampling.

Attributes:

values (list[float]): List of possible values to sample from.
weights (list[float] | None): Optional list of weights for each value. If not provided, all values have equal probability.

Modality

Bases: str, Enum

Supported modality types for multimodal model data.

ModalityDataType

Bases: str, Enum

Data type formats for multimodal data.

ModelConfig

Bases: ConfigBase

Configuration for a model used for generation.

Attributes:

alias (str): User-defined alias to reference in column configurations.
model (str): Model identifier (e.g., from build.nvidia.com or other providers).
inference_parameters (InferenceParamsT): Inference parameters for the model (temperature, top_p, max_tokens, etc.). The generation_type is determined by the type of inference_parameters.
provider (str | None): Optional model provider name if using custom providers.

generation_type property

Get the generation type from the inference parameters.
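
The way generation_type follows from the inference parameters can be illustrated with a few stand-in dataclasses. Everything below is hypothetical: `ChatParams`, `EmbeddingParams`, and `Config` only imitate the documented fields, and the model identifiers are placeholders rather than real model names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChatParams:
    """Stand-in for ChatCompletionInferenceParams."""
    temperature: float | None = None
    generation_type: str = "chat-completion"


@dataclass
class EmbeddingParams:
    """Stand-in for EmbeddingInferenceParams."""
    dimensions: int | None = None
    generation_type: str = "embedding"


@dataclass
class Config:
    """Stand-in for ModelConfig."""
    alias: str
    model: str
    inference_parameters: ChatParams | EmbeddingParams
    provider: str | None = None

    @property
    def generation_type(self) -> str:
        # The config stores no generation type of its own; it delegates.
        return self.inference_parameters.generation_type


chat = Config("writer", "example/chat-model", ChatParams(temperature=0.7))
embed = Config("embedder", "example/embedding-model", EmbeddingParams(dimensions=256))
print(chat.generation_type, embed.generation_type)
```

Keeping the discriminator on the parameters object means a config cannot claim one generation type while carrying parameters for the other.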

ModelProvider

Bases: ConfigBase

Configuration for a custom model provider.

Attributes:

name (str): Name of the model provider.
endpoint (str): API endpoint URL for the provider.
provider_type (str): Provider type (default: "openai"). Determines the API format to use.
api_key (str | None): Optional API key for authentication.
extra_body (dict[str, Any] | None): Additional parameters to pass in API requests.
extra_headers (dict[str, str] | None): Additional headers to pass in API requests.
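
A model configuration refers to a provider by name, so something has to match that name against the configured providers. The sketch below shows one simple way this lookup could work. `Provider` and `find_provider` are hypothetical, the endpoint is a placeholder, and the library's actual resolution logic (including what happens when no provider is named) is not described in this reference.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Provider:
    """Stand-in carrying a subset of the documented ModelProvider fields."""
    name: str
    endpoint: str
    provider_type: str = "openai"
    api_key: str | None = None


def find_provider(providers: list[Provider], name: str | None) -> Provider | None:
    """Return the provider whose name matches, or None when no name is given."""
    if name is None:
        return None
    for provider in providers:
        if provider.name == name:
            return provider
    raise KeyError(f"no provider named {name!r}")


providers = [Provider(name="local", endpoint="http://localhost:8000/v1")]
match = find_provider(providers, "local")
print(match.endpoint, match.provider_type)
```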

UniformDistribution

Bases: Distribution[UniformDistributionParams]

Uniform distribution for sampling inference parameters.

Samples values uniformly between low and high bounds. Useful for exploring a continuous range of values for temperature or top_p.

Attributes:

distribution_type (DistributionType | None): Type of distribution ("uniform").
params (UniformDistributionParams): Distribution parameters (low, high).

Methods:

sample: Sample a value from the uniform distribution.

sample()

Sample a value from the uniform distribution.

Returns:

float: A float value sampled from the uniform distribution.

Source code in src/data_designer/config/models.py
def sample(self) -> float:
    """Sample a value from the uniform distribution.

    Returns:
        A float value sampled from the uniform distribution.
    """
    return float(np.random.uniform(low=self.params.low, high=self.params.high, size=1)[0])
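
For readers without NumPy at hand, the same draw can be sketched with the standard library. `sample_uniform` is a hypothetical helper. It scales `random.random()`, which returns values in [0.0, 1.0), so the result follows the same inclusive-low, exclusive-high convention, apart from floating-point rounding at the upper edge.

```python
import random


def sample_uniform(low: float, high: float) -> float:
    """Hypothetical stdlib counterpart of UniformDistribution.sample."""
    # random.random() is in [0.0, 1.0), so the result lies in [low, high).
    return low + (high - low) * random.random()


# Explore temperatures across a continuous range.
draws = [sample_uniform(0.5, 1.0) for _ in range(5)]
print(draws)
```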

UniformDistributionParams

Bases: ConfigBase

Parameters for uniform distribution sampling.

Attributes:

low (float): Lower bound (inclusive).
high (float): Upper bound (exclusive).