
Models

The models module defines configuration objects for model-based generation. ModelProvider specifies connection and authentication details for custom providers. ModelConfig encapsulates model details, including the model alias, identifier, and inference parameters. Inference parameters control model behavior through settings such as temperature, top_p, and max_tokens, with support for both fixed values and distribution-based sampling. The module also includes ImageContext for providing image inputs to multimodal models, and ImageInferenceParams for configuring image generation models.

For more information on how they are used, see below:

Classes:

Name Description
BaseInferenceParams

Base configuration for inference parameters.

ChatCompletionInferenceParams

Configuration for LLM inference parameters.

DistributionType

Types of distributions for sampling inference parameters.

EmbeddingInferenceParams

Configuration for embedding generation parameters.

ImageContext

Configuration for providing image context to multimodal models.

ImageInferenceParams

Configuration for image generation models.

ManualDistribution

Manual (discrete) distribution for sampling inference parameters.

ManualDistributionParams

Parameters for manual distribution sampling.

Modality

Supported modality types for multimodal model data.

ModalityDataType

Data type formats for multimodal data.

ModelConfig

Configuration for a model used for generation.

ModelProvider

Configuration for a custom model provider.

UniformDistribution

Uniform distribution for sampling inference parameters.

UniformDistributionParams

Parameters for uniform distribution sampling.

BaseInferenceParams

Bases: ConfigBase, ABC

Base configuration for inference parameters.

Attributes:

Name Type Description
generation_type GenerationType

Type of generation (chat-completion or embedding). Acts as the discriminator field.

max_parallel_requests int

Maximum number of parallel requests to the model API.

timeout int | None

Timeout in seconds for each request.

extra_body dict[str, Any] | None

Additional parameters to pass to the model API.

Methods:

Name Description
format_for_display

Format inference parameters for display as a single line.

get_formatted_params

Get a list of formatted parameter strings.

generate_kwargs property

Get the generate kwargs for the inference parameters.

Returns:

Type Description
dict[str, Any]

A dictionary of the generate kwargs.

format_for_display()

Format inference parameters for display as a single line.

Returns:

Type Description
str

Formatted string of inference parameters

Source code in packages/data-designer-config/src/data_designer/config/models.py
def format_for_display(self) -> str:
    """Format inference parameters for display as a single line.

    Returns:
        Formatted string of inference parameters
    """
    parts = self.get_formatted_params()
    if not parts:
        return "(none)"
    return ", ".join(parts)

get_formatted_params()

Get a list of formatted parameter strings.

Returns:

Type Description
list[str]

List of formatted parameter strings (e.g., ["temperature=0.70", "max_tokens=100"])

Source code in packages/data-designer-config/src/data_designer/config/models.py
def get_formatted_params(self) -> list[str]:
    """Get a list of formatted parameter strings.

    Returns:
        List of formatted parameter strings (e.g., ["temperature=0.70", "max_tokens=100"])
    """
    params_dict = self.model_dump(exclude_none=True, mode="json")
    parts = []
    for key, value in params_dict.items():
        formatted_value = self._format_value(key, value)
        parts.append(f"{key}={formatted_value}")
    return parts
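
Taken together, the two methods above turn a parameters object into a compact display string. The following standalone sketch mirrors that flow with plain dictionaries; the two-decimal float formatting is an assumption inferred from the "temperature=0.70" example, since the private `_format_value` helper is not shown on this page:

```python
# Standalone sketch of the display-formatting flow described above.
# Assumption: floats render with two decimal places, inferred from the
# "temperature=0.70" example; the real _format_value rules may differ.

def format_value(value):
    if isinstance(value, float):
        return f"{value:.2f}"
    return str(value)

def get_formatted_params(params: dict) -> list[str]:
    return [f"{key}={format_value(value)}"
            for key, value in params.items() if value is not None]

def format_for_display(params: dict) -> str:
    parts = get_formatted_params(params)
    return ", ".join(parts) if parts else "(none)"

print(format_for_display({"temperature": 0.7, "max_tokens": 100}))
# temperature=0.70, max_tokens=100
print(format_for_display({}))
# (none)
```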

ChatCompletionInferenceParams

Bases: BaseInferenceParams

Configuration for LLM inference parameters.

Attributes:

Name Type Description
generation_type Literal[CHAT_COMPLETION]

Type of generation, always "chat-completion" for this class.

temperature float | DistributionT | None

Sampling temperature (0.0-2.0). Can be a fixed value or a distribution for dynamic sampling.

top_p float | DistributionT | None

Nucleus sampling probability (0.0-1.0). Can be a fixed value or a distribution for dynamic sampling.

max_tokens int | None

Maximum number of tokens to generate in the response.
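
Both temperature and top_p accept either a fixed float or a distribution object. A hedged sketch of the two styles, assuming the package is imported as `dd` (matching the ImageInferenceParams example on this page) and that distributions are built by passing a params object, as the attribute listings suggest:

```python
# Sketch only: the `dd` import alias and the constructor keyword forms
# are assumptions based on the attribute listings on this page.

# Fixed values
fixed = dd.ChatCompletionInferenceParams(
    temperature=0.7,
    top_p=0.95,
    max_tokens=1024,
)

# Distribution-based sampling: temperature drawn per request
sampled = dd.ChatCompletionInferenceParams(
    temperature=dd.UniformDistribution(
        params=dd.UniformDistributionParams(low=0.5, high=1.0)
    ),
    max_tokens=1024,
)
```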

DistributionType

Bases: str, Enum

Types of distributions for sampling inference parameters.

EmbeddingInferenceParams

Bases: BaseInferenceParams

Configuration for embedding generation parameters.

Attributes:

Name Type Description
generation_type Literal[EMBEDDING]

Type of generation, always "embedding" for this class.

encoding_format Literal['float', 'base64']

Format of the embedding encoding ("float" or "base64").

dimensions int | None

Number of dimensions for the embedding.
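
A minimal construction sketch, again assuming the `dd` alias; the dimension count is purely illustrative:

```python
# Sketch only: the `dd` alias and the 768-dimension value are assumptions.
embedding_params = dd.EmbeddingInferenceParams(
    encoding_format="float",
    dimensions=768,
)
```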

ImageContext

Bases: ModalityContext

Configuration for providing image context to multimodal models.

Attributes:

Name Type Description
modality Modality

The modality type (always "image").

column_name str

Name of the column containing image data.

data_type ModalityDataType | None

Format of the image data ("url", "base64", or None for auto-detection). When None, the format is auto-detected: URLs are passed through, file paths that exist under base_path are loaded as base64, and other values are assumed to be base64.

image_format ImageFormat | None

Image format (required when data_type is explicitly "base64").

Methods:

Name Description
get_contexts

Get the contexts for the image modality.

get_contexts(record, *, base_path=None)

Get the contexts for the image modality.

Parameters:

Name Type Description Default
record dict

The record containing the image data. The data can be:

- A JSON serialized list of strings
- A list of strings
- A single string

required
base_path str | None

Optional base path for resolving relative file paths. When provided, file paths that exist under this directory are loaded and converted to base64. This enables generated images (stored as relative paths in create mode) to be sent to remote model endpoints.

None

Returns:

Type Description
list[dict[str, Any]]

A list of image contexts.

Source code in packages/data-designer-config/src/data_designer/config/models.py
def get_contexts(self, record: dict, *, base_path: str | None = None) -> list[dict[str, Any]]:
    """Get the contexts for the image modality.

    Args:
        record: The record containing the image data. The data can be:
            - A JSON serialized list of strings
            - A list of strings
            - A single string
        base_path: Optional base path for resolving relative file paths.
            When provided, file paths that exist under this directory are loaded
            and converted to base64. This enables generated images (stored as relative
            paths in create mode) to be sent to remote model endpoints.

    Returns:
        A list of image contexts.
    """
    raw_value = record[self.column_name]

    # Normalize to list of strings
    if isinstance(raw_value, str):
        # Try to parse as JSON first
        try:
            parsed_value = json.loads(raw_value)
            if isinstance(parsed_value, list):
                context_values = parsed_value
            else:
                context_values = [raw_value]
        except (json.JSONDecodeError, TypeError):
            context_values = [raw_value]
    elif isinstance(raw_value, list):
        context_values = raw_value
    elif hasattr(raw_value, "__iter__") and not isinstance(raw_value, (str, bytes, dict)):
        # Handle array-like objects (numpy arrays, pandas Series, etc.)
        context_values = list(raw_value)
    else:
        context_values = [raw_value]

    # Build context list
    contexts = []
    for context_value in context_values:
        context = dict(type="image_url")
        if self.data_type is not None:
            # Explicit data_type: use existing behavior
            if self.data_type == ModalityDataType.URL:
                context["image_url"] = context_value
            else:
                context["image_url"] = {
                    "url": f"data:image/{self.image_format.value};base64,{context_value}",
                    "format": self.image_format.value,
                }
        else:
            # Auto-detect: resolve file paths, pass through URLs, assume base64 otherwise
            context["image_url"] = self._auto_resolve_context_value(context_value, base_path)
        contexts.append(context)

    return contexts
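
The normalization step in `get_contexts` accepts three input shapes for the column value. This standalone sketch isolates that logic so each shape can be seen collapsing to a list of strings:

```python
import json

def normalize_image_values(raw_value) -> list:
    """Collapse a JSON-serialized list, a plain list, or a single
    string into a list of strings, mirroring get_contexts above."""
    if isinstance(raw_value, str):
        try:
            parsed = json.loads(raw_value)
        except (json.JSONDecodeError, TypeError):
            return [raw_value]
        return parsed if isinstance(parsed, list) else [raw_value]
    if isinstance(raw_value, list):
        return raw_value
    return [raw_value]

print(normalize_image_values('["a.png", "b.png"]'))  # ['a.png', 'b.png']
print(normalize_image_values("a.png"))               # ['a.png']
print(normalize_image_values("42"))                  # ['42'] (valid JSON, but not a list)
```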

ImageInferenceParams

Bases: BaseInferenceParams

Configuration for image generation models.

Works for both diffusion and autoregressive image generation models. Pass all model-specific image options via extra_body.

Attributes:

Name Type Description
generation_type Literal[IMAGE]

Type of generation, always "image" for this class.

Example
# OpenAI-style (DALL·E): quality and size in extra_body or as top-level kwargs
dd.ImageInferenceParams(
    extra_body={"size": "1024x1024", "quality": "hd"}
)

# Gemini-style: generationConfig.imageConfig
dd.ImageInferenceParams(
    extra_body={
        "generationConfig": {
            "imageConfig": {
                "aspectRatio": "1:1",
                "imageSize": "1024"
            }
        }
    }
)

ManualDistribution

Bases: Distribution[ManualDistributionParams]

Manual (discrete) distribution for sampling inference parameters.

Samples from a discrete set of values with optional weights. Useful for testing specific values or creating custom probability distributions for temperature or top_p.

Attributes:

Name Type Description
distribution_type DistributionType | None

Type of distribution ("manual").

params ManualDistributionParams

Distribution parameters (values, weights).

Methods:

Name Description
sample

Sample a value from the manual distribution.

sample()

Sample a value from the manual distribution.

Returns:

Type Description
float

A float value sampled from the manual distribution.

Source code in packages/data-designer-config/src/data_designer/config/models.py
def sample(self) -> float:
    """Sample a value from the manual distribution.

    Returns:
        A float value sampled from the manual distribution.
    """
    return float(lazy.np.random.choice(self.params.values, p=self.params.weights))
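
The np.random.choice call above can be mirrored with the standard library for a self-contained illustration of weighted discrete sampling; the values and weights here are hypothetical:

```python
import random

# Hypothetical discrete temperature values and weights; weights must
# align one-to-one with values and, unlike np.random.choice's `p`,
# need not sum to 1 for random.choices.
values = [0.2, 0.7, 1.0]
weights = [0.2, 0.5, 0.3]

# Mirrors np.random.choice(values, p=weights) from sample() above.
sample = random.choices(values, weights=weights, k=1)[0]
assert sample in values
```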

ManualDistributionParams

Bases: ConfigBase

Parameters for manual distribution sampling.

Attributes:

Name Type Description
values list[float]

List of possible values to sample from.

weights list[float] | None

Optional list of weights for each value. If not provided, all values have equal probability.

Modality

Bases: str, Enum

Supported modality types for multimodal model data.

ModalityDataType

Bases: str, Enum

Data type formats for multimodal data.

ModelConfig

Bases: ConfigBase

Configuration for a model used for generation.

Attributes:

Name Type Description
alias str

User-defined alias to reference in column configurations.

model str

Model identifier (e.g., from build.nvidia.com or other providers).

inference_parameters InferenceParamsT

Inference parameters for the model (temperature, top_p, max_tokens, etc.). The generation_type is determined by the type of inference_parameters.

provider str | None

Optional model provider name if using custom providers.

skip_health_check bool

Whether to skip the health check for this model. Defaults to False.

generation_type property

Get the generation type from the inference parameters.
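
A hedged construction sketch, assuming the `dd` alias used elsewhere on this page; the alias name and model identifier are illustrative:

```python
# Sketch only: the alias and model id below are illustrative, not prescribed.
config = dd.ModelConfig(
    alias="creative-writer",
    model="meta/llama-3.1-8b-instruct",
    inference_parameters=dd.ChatCompletionInferenceParams(
        temperature=0.9,
        max_tokens=512,
    ),
)
# generation_type is derived from the type of inference_parameters.
```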

ModelProvider

Bases: ConfigBase

Configuration for a custom model provider.

Attributes:

Name Type Description
name str

Name of the model provider.

endpoint str

API endpoint URL for the provider.

provider_type str

Provider type (default: "openai"). Determines the API format to use.

api_key str | None

Optional API key for authentication.

extra_body dict[str, Any] | None

Additional parameters to pass in API requests.

extra_headers dict[str, str] | None

Additional headers to pass in API requests.
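
A hedged sketch of a custom provider definition, assuming the `dd` alias; the endpoint URL, header, and environment variable name are all illustrative assumptions:

```python
import os

# Sketch only: every concrete value below is an illustrative assumption.
provider = dd.ModelProvider(
    name="my-openai-compatible",
    endpoint="https://llm.example.com/v1",
    provider_type="openai",
    api_key=os.environ.get("MY_PROVIDER_API_KEY"),
    extra_headers={"X-Org": "research"},
)
```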

UniformDistribution

Bases: Distribution[UniformDistributionParams]

Uniform distribution for sampling inference parameters.

Samples values uniformly between low and high bounds. Useful for exploring a continuous range of values for temperature or top_p.

Attributes:

Name Type Description
distribution_type DistributionType | None

Type of distribution ("uniform").

params UniformDistributionParams

Distribution parameters (low, high).

Methods:

Name Description
sample

Sample a value from the uniform distribution.

sample()

Sample a value from the uniform distribution.

Returns:

Type Description
float

A float value sampled from the uniform distribution.

Source code in packages/data-designer-config/src/data_designer/config/models.py
def sample(self) -> float:
    """Sample a value from the uniform distribution.

    Returns:
        A float value sampled from the uniform distribution.
    """
    return float(lazy.np.random.uniform(low=self.params.low, high=self.params.high, size=1)[0])
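
As with the manual distribution, the call above can be mirrored with the standard library. One caveat: np.random.uniform samples the half-open interval [low, high), while random.uniform may occasionally return high exactly:

```python
import random

low, high = 0.5, 1.0  # hypothetical bounds for a temperature range

# Mirrors np.random.uniform(low=low, high=high); note random.uniform
# can return `high` exactly, unlike the half-open NumPy version.
samples = [random.uniform(low, high) for _ in range(1000)]
assert all(low <= s <= high for s in samples)
```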

UniformDistributionParams

Bases: ConfigBase

Parameters for uniform distribution sampling.

Attributes:

Name Type Description
low float

Lower bound (inclusive).

high float

Upper bound (exclusive).