generate

Classes:

Name                  Description
ValidationParameters  Configuration for record and sequence validation.
GenerateParameters    Configuration parameters for synthetic data generation.

ValidationParameters pydantic-model

Bases: Parameters, BaseModel

Configuration for record and sequence validation.

These parameters control the validation and automatic fixes when going from LLM output to tabular data.

Fields:

group_by_accept_no_delineator pydantic-field

Whether to accept completions that lack the beginning-of-sequence and end-of-sequence delineators, treating each such completion as a single sequence.

group_by_ignore_invalid_records pydantic-field

Whether to ignore invalid records in a sequence and proceed with the valid records.

group_by_fix_non_unique_value pydantic-field

Whether to automatically fix non-unique group-by values in a sequence by using the first unique value for all records.

group_by_fix_unordered_records pydantic-field

Whether to automatically fix unordered records in a sequence by sorting the records.
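The two automatic fixes described above can be illustrated with a minimal sketch. It does not show the library's implementation: the function name `fix_sequence`, the `order_by` argument, and the column names are hypothetical, and "first unique value" is assumed here to mean the group-by value of the first record in the sequence.

```python
from typing import Any

Record = dict[str, Any]


def fix_sequence(records: list[Record], group_by: str, order_by: str) -> list[Record]:
    """Apply two illustrative fixes to one sequence of records (hypothetical helper)."""
    if not records:
        return []
    # Non-unique group-by value: reuse the first record's value for every record
    # (an assumed reading of "first unique value").
    first_value = records[0][group_by]
    fixed = [{**record, group_by: first_value} for record in records]
    # Unordered records: sort by the ordering column (sorted() is stable).
    return sorted(fixed, key=lambda record: record[order_by])


# Hypothetical sequence with a stray group-by value and out-of-order rows.
sequence = [
    {"patient_id": "A", "visit": 2},
    {"patient_id": "B", "visit": 1},
    {"patient_id": "A", "visit": 3},
]
fixed_sequence = fix_sequence(sequence, group_by="patient_id", order_by="visit")
```

After the call, every record in `fixed_sequence` carries the group-by value `"A"` and the rows are ordered by `visit`.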

GenerateParameters pydantic-model

Bases: Parameters, BaseModel

Configuration parameters for synthetic data generation.

These parameters control how synthetic data is generated after the model is trained. They affect the quality, diversity, and validity of the generated synthetic records.

Fields:

num_records pydantic-field

Number of records to generate.

temperature pydantic-field

Sampling temperature for controlling randomness (higher = more random).

repetition_penalty pydantic-field

The value used to control the likelihood of the model repeating the same token. Must be > 0.

top_p pydantic-field

Nucleus sampling probability for token selection. Must be in (0, 1].
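To make the three sampling controls above concrete, here is a small pure-Python sketch of how they are commonly defined. It is an illustration under assumptions, not the generation engine's code: the helper names are hypothetical, and the repetition penalty follows the widely used formulation in which positive logits of already-seen tokens are divided by the penalty and negative ones are multiplied by it.

```python
import math


def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Make already-seen tokens less likely when penalty > 1 (common formulation)."""
    out = list(logits)
    for token_id in set(seen_token_ids):
        out[token_id] = out[token_id] / penalty if out[token_id] > 0 else out[token_id] * penalty
    return out


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


logits = [2.0, 1.0, 0.0, -1.0]  # made-up logits for a four-token vocabulary
penalized = apply_repetition_penalty(logits, seen_token_ids=[0], penalty=2.0)
probs = softmax_with_temperature(penalized, temperature=1.0)
filtered = top_p_filter(probs, top_p=0.9)
```

In this example the penalty halves the logit of token 0, and the nucleus filter drops the least likely token before renormalizing the remaining three.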

patience pydantic-field

Number of consecutive generations in which invalid_fraction_threshold must be reached before generation stops. Must be >= 1.

invalid_fraction_threshold pydantic-field

The fraction of invalid records that will stop generation after the patience limit is reached. Must be in [0, 1].
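The interaction between patience and invalid_fraction_threshold can be sketched as a simple counter. This is a reading of the two descriptions above, not the library's code: the function name `stop_index` is hypothetical, "reached" is assumed to mean greater than or equal to the threshold, and the counter is assumed to reset after any generation that stays below it.

```python
def stop_index(invalid_fractions, invalid_fraction_threshold, patience):
    """Return the index of the generation at which generation would stop, or None."""
    consecutive = 0
    for index, fraction in enumerate(invalid_fractions):
        if fraction >= invalid_fraction_threshold:
            consecutive += 1
            if consecutive >= patience:
                return index
        else:
            # A generation below the threshold breaks the streak.
            consecutive = 0
    return None


# Made-up invalid fractions for five consecutive generations.
history = [0.9, 0.2, 0.85, 0.95, 0.9]
stopped_at = stop_index(history, invalid_fraction_threshold=0.8, patience=3)
```

With these values the streak is broken at the second generation, so the stop only triggers at the fifth one (index 4).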

use_structured_generation pydantic-field

Whether to use structured generation for better format control.

structured_generation_backend pydantic-field

The backend used by vLLM when use_structured_generation is True. Supported backends: 'outlines', 'guidance', 'xgrammar', 'lm-format-enforcer'. 'auto' will allow vLLM to choose the backend.

structured_generation_schema_method pydantic-field

The method used to generate the schema from your dataset and pass it to the generation backend. 'regex' uses a custom regex construction method that tends to be more comprehensive than 'json_schema' at the cost of speed.

structured_generation_use_single_sequence pydantic-field

Whether to use a regex that matches exactly one sequence or record when max_sequences_per_example is 1.

enforce_timeseries_fidelity pydantic-field

Whether to enforce time-series fidelity by constraining the order, intervals, and start and end times of the records.

validation pydantic-field

Validation parameters controlling validation logic and automatic fixes when parsing LLM output and converting to tabular data.

attention_backend pydantic-field

The attention backend for the vLLM engine. Common values: 'FLASHINFER', 'FLASH_ATTN', 'TRITON_ATTN', 'FLEX_ATTENTION'. If None or 'auto', vLLM will auto-select the best available backend.
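The numeric ranges documented for the fields above can be summarized in a small standalone check. The sketch below uses a standard-library dataclass as a stand-in and is not the GenerateParameters model itself: the class name `SamplingLimitsSketch` is hypothetical, only the four fields with documented ranges are included, and the example values are illustrative rather than the library's defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingLimitsSketch:
    """Hypothetical stand-in that mirrors the documented numeric ranges."""

    repetition_penalty: float
    top_p: float
    patience: int
    invalid_fraction_threshold: float

    def __post_init__(self):
        if not self.repetition_penalty > 0:
            raise ValueError("repetition_penalty must be > 0")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")
        if not self.patience >= 1:
            raise ValueError("patience must be >= 1")
        if not 0 <= self.invalid_fraction_threshold <= 1:
            raise ValueError("invalid_fraction_threshold must be in [0, 1]")


# Illustrative values only; they are not the library's defaults.
limits = SamplingLimitsSketch(
    repetition_penalty=1.1,
    top_p=0.95,
    patience=3,
    invalid_fraction_threshold=0.8,
)
```

Constructing the stand-in with an out-of-range value, such as `top_p=0`, raises a `ValueError`.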