Skip to content

Models

The models module defines configuration objects for model-based generation. ModelProvider, specifies connection and authentication details for custom providers. ModelConfig encapsulates model details including the model alias, identifier, and inference parameters. InferenceParameters controls model behavior through settings like temperature, top_p, and max_tokens, with support for both fixed values and distribution-based sampling. The module includes ImageContext for providing image inputs to multimodal models.

For more information on how they are used, see below:

Classes:

Name Description
DistributionType

Types of distributions for sampling inference parameters.

ImageContext

Configuration for providing image context to multimodal models.

ImageFormat

Supported image formats for image modality.

InferenceParameters

Configuration for LLM inference parameters.

ManualDistribution

Manual (discrete) distribution for sampling inference parameters.

ManualDistributionParams

Parameters for manual distribution sampling.

Modality

Supported modality types for multimodal model data.

ModalityDataType

Data type formats for multimodal data.

ModelConfig

Configuration for a model used for generation.

ModelProvider

Configuration for a custom model provider.

UniformDistribution

Uniform distribution for sampling inference parameters.

UniformDistributionParams

Parameters for uniform distribution sampling.

DistributionType

Bases: str, Enum

Types of distributions for sampling inference parameters.

ImageContext

Bases: ModalityContext

Configuration for providing image context to multimodal models.

Attributes:

Name Type Description
modality Modality

The modality type (always "image").

column_name str

Name of the column containing image data.

data_type ModalityDataType

Format of the image data ("url" or "base64").

image_format Optional[ImageFormat]

Image format (required for base64 data).

Methods:

Name Description
get_context

Get the context for the image modality.

get_context(record)

Get the context for the image modality.

Parameters:

Name Type Description Default
record dict

The record containing the image data.

required

Returns:

Type Description
dict[str, Any]

The context for the image modality.

Source code in src/data_designer/config/models.py
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
def get_context(self, record: dict) -> dict[str, Any]:
    """Get the context for the image modality.

    Args:
        record: The record containing the image data.

    Returns:
        The context for the image modality.
    """
    context = dict(type="image_url")
    context_value = record[self.column_name]
    if self.data_type == ModalityDataType.URL:
        context["image_url"] = context_value
    else:
        context["image_url"] = {
            "url": f"data:image/{self.image_format.value};base64,{context_value}",
            "format": self.image_format.value,
        }
    return context

ImageFormat

Bases: str, Enum

Supported image formats for image modality.

InferenceParameters

Bases: ConfigBase

Configuration for LLM inference parameters.

Attributes:

Name Type Description
temperature Optional[Union[float, DistributionT]]

Sampling temperature (0.0-2.0). Can be a fixed value or a distribution for dynamic sampling.

top_p Optional[Union[float, DistributionT]]

Nucleus sampling probability (0.0-1.0). Can be a fixed value or a distribution for dynamic sampling.

max_tokens Optional[int]

Maximum number of tokens (includes both input and output tokens).

max_parallel_requests int

Maximum number of parallel requests to the model API.

timeout Optional[int]

Timeout in seconds for each request.

extra_body Optional[dict[str, Any]]

Additional parameters to pass to the model API.

generate_kwargs property

Get the generate kwargs for the inference parameters.

Returns:

Type Description
dict[str, Union[float, int]]

A dictionary of the generate kwargs.

ManualDistribution

Bases: Distribution[ManualDistributionParams]

Manual (discrete) distribution for sampling inference parameters.

Samples from a discrete set of values with optional weights. Useful for testing specific values or creating custom probability distributions for temperature or top_p.

Attributes:

Name Type Description
distribution_type Optional[DistributionType]

Type of distribution ("manual").

params ManualDistributionParams

Distribution parameters (values, weights).

Methods:

Name Description
sample

Sample a value from the manual distribution.

sample()

Sample a value from the manual distribution.

Returns:

Type Description
float

A float value sampled from the manual distribution.

Source code in src/data_designer/config/models.py
155
156
157
158
159
160
161
def sample(self) -> float:
    """Sample a value from the manual distribution.

    Returns:
        A float value sampled from the manual distribution.
    """
    return float(np.random.choice(self.params.values, p=self.params.weights))

ManualDistributionParams

Bases: ConfigBase

Parameters for manual distribution sampling.

Attributes:

Name Type Description
values List[float]

List of possible values to sample from.

weights Optional[List[float]]

Optional list of weights for each value. If not provided, all values have equal probability.

Modality

Bases: str, Enum

Supported modality types for multimodal model data.

ModalityDataType

Bases: str, Enum

Data type formats for multimodal data.

ModelConfig

Bases: ConfigBase

Configuration for a model used for generation.

Attributes:

Name Type Description
alias str

User-defined alias to reference in column configurations.

model str

Model identifier (e.g., from build.nvidia.com or other providers).

inference_parameters InferenceParameters

Inference parameters for the model (temperature, top_p, max_tokens, etc.).

provider Optional[str]

Optional model provider name if using custom providers.

ModelProvider

Bases: ConfigBase

Configuration for a custom model provider.

Attributes:

Name Type Description
name str

Name of the model provider.

endpoint str

API endpoint URL for the provider.

provider_type str

Provider type (default: "openai"). Determines the API format to use.

api_key Optional[str]

Optional API key for authentication.

extra_body Optional[dict[str, Any]]

Additional parameters to pass in API requests.

UniformDistribution

Bases: Distribution[UniformDistributionParams]

Uniform distribution for sampling inference parameters.

Samples values uniformly between low and high bounds. Useful for exploring a continuous range of values for temperature or top_p.

Attributes:

Name Type Description
distribution_type Optional[DistributionType]

Type of distribution ("uniform").

params UniformDistributionParams

Distribution parameters (low, high).

Methods:

Name Description
sample

Sample a value from the uniform distribution.

sample()

Sample a value from the uniform distribution.

Returns:

Type Description
float

A float value sampled from the uniform distribution.

Source code in src/data_designer/config/models.py
196
197
198
199
200
201
202
def sample(self) -> float:
    """Sample a value from the uniform distribution.

    Returns:
        A float value sampled from the uniform distribution.
    """
    return float(np.random.uniform(low=self.params.low, high=self.params.high, size=1)[0])

UniformDistributionParams

Bases: ConfigBase

Parameters for uniform distribution sampling.

Attributes:

Name Type Description
low float

Lower bound (inclusive).

high float

Upper bound (exclusive).