generate

Classes:

Name	Description
`ValidationParameters`	Configuration for record and sequence validation.
`GenerateParameters`	Configuration parameters for synthetic data generation.

`ValidationParameters` `pydantic-model`

Bases: Parameters, BaseModel

Configuration for record and sequence validation.

These parameters control validation and the automatic fixes applied when converting LLM output to tabular data.

Fields:

group_by_accept_no_delineator (bool)
group_by_ignore_invalid_records (bool)
group_by_fix_non_unique_value (bool)
group_by_fix_unordered_records (bool)

`group_by_accept_no_delineator` `pydantic-field`

Whether to accept a completion that does not contain both the beginning-of-sequence and end-of-sequence delineators, treating it as a single sequence.

`group_by_ignore_invalid_records` `pydantic-field`

Whether to ignore invalid records in a sequence and proceed with the valid records.
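The difference between the two settings of `group_by_ignore_invalid_records` can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the record representation (a list of dicts), the `is_valid` callback, and the assumption that the whole sequence is discarded when the flag is off are all hypothetical.

```python
from typing import Callable, Optional

Record = dict[str, object]


def validate_sequence(
    records: list[Record],
    is_valid: Callable[[Record], bool],
    ignore_invalid_records: bool,
) -> Optional[list[Record]]:
    """Return the records to keep, or None if the whole sequence is rejected."""
    valid = [r for r in records if is_valid(r)]
    if len(valid) == len(records):
        return records  # every record is valid; nothing to do
    if ignore_invalid_records:
        return valid  # drop the invalid records, keep the valid ones
    return None  # assumption: without the flag, the sequence is discarded


def has_x(record: Record) -> bool:
    """Hypothetical validity rule: the "x" column must be filled in."""
    return record["x"] is not None


sequence = [{"id": 1, "x": 5}, {"id": 1, "x": None}, {"id": 1, "x": 7}]
print(validate_sequence(sequence, has_x, ignore_invalid_records=True))
# [{'id': 1, 'x': 5}, {'id': 1, 'x': 7}]
```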

`group_by_fix_non_unique_value` `pydantic-field`

Whether to automatically fix non-unique group-by values within a sequence by assigning the sequence's first group-by value to all of its records.
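A minimal sketch of the non-unique group-by fix, assuming records are dicts and that the value kept is the one on the first record of the sequence. The function name and the record layout are hypothetical.

```python
Record = dict[str, object]


def fix_non_unique_group_by(records: list[Record], group_by: str) -> list[Record]:
    """Give every record in one sequence the group-by value of its first record."""
    if not records:
        return []
    first_value = records[0][group_by]
    # Build new dicts so the caller's records are not mutated.
    return [{**record, group_by: first_value} for record in records]


sequence = [
    {"patient_id": "A", "visit": 1},
    {"patient_id": "B", "visit": 2},  # stray value produced by the model
    {"patient_id": "A", "visit": 3},
]
print(fix_non_unique_group_by(sequence, "patient_id"))
```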

`group_by_fix_unordered_records` `pydantic-field`

Whether to automatically fix unordered records in a sequence by sorting the records.
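A minimal sketch of the unordered-records fix. The field description does not say which key the records are sorted by, so the `order_by` column here is a hypothetical stand-in; the function name is also illustrative.

```python
from operator import itemgetter

Record = dict[str, object]


def fix_unordered_records(records: list[Record], order_by: str) -> list[Record]:
    """Return a sequence's records sorted by a (hypothetical) order-by column."""
    # sorted() is stable: records with equal keys keep their original order.
    return sorted(records, key=itemgetter(order_by))


sequence = [{"t": 3, "v": "c"}, {"t": 1, "v": "a"}, {"t": 2, "v": "b"}]
print([r["v"] for r in fix_unordered_records(sequence, "t")])  # ['a', 'b', 'c']
```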

`GenerateParameters` `pydantic-model`

Bases: Parameters, BaseModel

Configuration parameters for synthetic data generation.

These parameters control how synthetic data is generated after the model is trained. They affect the quality, diversity, and validity of the generated synthetic records.

Fields:

num_records (int)
temperature (float)
repetition_penalty (float)
top_p (float)
patience (int)
invalid_fraction_threshold (float)
use_structured_generation (bool)
structured_generation_backend (Literal['auto', 'xgrammar', 'guidance', 'outlines', 'lm-format-enforcer'])
structured_generation_schema_method (Literal['regex', 'json_schema'])
structured_generation_use_single_sequence (bool)
enforce_timeseries_fidelity (bool)
validation (ValidationParameters)
attention_backend (str | None)
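Several of the field descriptions state value constraints. The sketch below collects them into a plain-Python checker so the rules are visible in one place. It is a hypothetical helper, not part of the library, which presumably enforces its rules through pydantic validation (and may apply checks beyond these). Only fields present in the input dict are checked, so no default values are assumed.

```python
from typing import Any, Callable

# Allowed values copied from the Literal types in the field list above.
BACKENDS = {"auto", "xgrammar", "guidance", "outlines", "lm-format-enforcer"}
SCHEMA_METHODS = {"regex", "json_schema"}

# Field name -> (predicate the value must satisfy, message when it does not).
RULES: dict[str, tuple[Callable[[Any], bool], str]] = {
    "repetition_penalty": (lambda v: v > 0, "must be > 0"),
    "top_p": (lambda v: 0 < v <= 1, "must be in (0, 1]"),
    "patience": (lambda v: v >= 1, "must be >= 1"),
    "invalid_fraction_threshold": (lambda v: 0 <= v <= 1, "must be in [0, 1]"),
    "structured_generation_backend": (
        lambda v: v in BACKENDS,
        f"must be one of {sorted(BACKENDS)}",
    ),
    "structured_generation_schema_method": (
        lambda v: v in SCHEMA_METHODS,
        f"must be one of {sorted(SCHEMA_METHODS)}",
    ),
}


def check_generate_params(params: dict[str, Any]) -> list[str]:
    """Check only the fields present in `params`; return one message per violation."""
    return [
        f"{name} {message}"
        for name, (is_ok, message) in RULES.items()
        if name in params and not is_ok(params[name])
    ]


print(check_generate_params({"top_p": 1.5, "patience": 0}))
# ['top_p must be in (0, 1]', 'patience must be >= 1']
```

In practice you would construct `GenerateParameters` directly and let pydantic report invalid values; this helper only restates the documented ranges.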

`num_records` `pydantic-field`

Number of records to generate.

`temperature` `pydantic-field`

Sampling temperature for controlling randomness (higher = more random).
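How temperature reshapes the next-token distribution can be shown with a softmax over temperature-scaled logits, the standard definition p_i = exp(z_i / T) / Σ_j exp(z_j / T). The logits below are made up, the function name is hypothetical, and the sketch rejects T <= 0 rather than modeling greedy decoding.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("this sketch requires temperature > 0")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three tokens
print(softmax_with_temperature(logits, 0.5))  # sharper: most mass on token 0
print(softmax_with_temperature(logits, 2.0))  # flatter: closer to uniform
```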

`repetition_penalty` `pydantic-field`

Penalty that controls how likely the model is to repeat the same token. Must be > 0.
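Hugging Face Transformers and vLLM both implement a repetition penalty with the rule introduced in the CTRL paper: for each token that has already appeared, a positive logit is divided by the penalty and a negative logit is multiplied by it. Assuming this field is passed to vLLM's sampler unchanged, a penalty above 1 discourages repetition, exactly 1 leaves the logits untouched, and a value between 0 and 1 encourages repetition. The sketch uses plain lists and a hypothetical function name.

```python
def apply_repetition_penalty(
    logits: list[float], seen_token_ids: set[int], penalty: float
) -> list[float]:
    """CTRL-style repetition penalty applied to tokens that were already seen."""
    if penalty <= 0:
        raise ValueError("penalty must be > 0")
    out = list(logits)
    for tok in seen_token_ids:
        # With penalty > 1, dividing a positive logit or multiplying a negative
        # one both lower the token's score; penalty < 1 raises it instead.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


print(apply_repetition_penalty([2.0, -1.0, 0.5], {0, 1}, 2.0))  # [1.0, -2.0, 0.5]
```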

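`top_p` `pydantic-field`

Cumulative probability cutoff for nucleus (top-p) sampling: tokens are sampled from the smallest set of most likely tokens whose combined probability reaches this value. Must be in (0, 1].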
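Nucleus sampling keeps the most likely tokens, in descending order of probability, until their cumulative probability reaches `top_p`, then renormalizes and samples from that set. A minimal sketch of the filtering step over an explicit probability list (the function name is hypothetical):

```python
def top_p_filter(probs: list[float], top_p: float) -> list[float]:
    """Keep the smallest set of most likely tokens whose total probability >= top_p."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: set[int] = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Zero out the tail and renormalize the surviving tokens.
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


print(top_p_filter([0.5, 0.3, 0.15, 0.05], 0.75))  # only the first two tokens survive
```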

`patience` `pydantic-field`

Number of consecutive generations in which the fraction of invalid records reaches `invalid_fraction_threshold` before generation is stopped. Must be >= 1.

`invalid_fraction_threshold` `pydantic-field`

Fraction of invalid records in a generation that counts toward the `patience` limit; generation stops once this fraction is reached in `patience` consecutive generations. Must be in [0, 1].
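One reading of how `patience` and `invalid_fraction_threshold` interact, sketched as a counter over per-generation invalid fractions. The assumptions are hypothetical: "reached" means the fraction is greater than or equal to the threshold, a generation below the threshold resets the count (implied by "consecutive"), and the input list stands in for whatever unit of generation the library actually measures.

```python
from typing import Optional


def generations_before_stop(
    invalid_fractions: list[float], patience: int, threshold: float
) -> Optional[int]:
    """Return how many generations run before stopping, or None if it never stops."""
    strikes = 0
    for i, fraction in enumerate(invalid_fractions, start=1):
        # Assumption: a generation "reaches" the threshold when fraction >= threshold,
        # and any generation below it resets the consecutive count.
        strikes = strikes + 1 if fraction >= threshold else 0
        if strikes >= patience:
            return i
    return None


print(generations_before_stop([0.9, 0.2, 0.9, 0.95], patience=2, threshold=0.8))  # 4
```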

`use_structured_generation` `pydantic-field`

Whether to use structured generation, which constrains the model output to a schema derived from your dataset, for better format control.

`structured_generation_backend` `pydantic-field`

The backend that vLLM uses for structured generation when `use_structured_generation` is True. Supported backends: 'outlines', 'guidance', 'xgrammar', and 'lm-format-enforcer'. With 'auto', vLLM chooses the backend.

`structured_generation_schema_method` `pydantic-field`

The method used to build a schema from your dataset and pass it to the generation backend. 'regex' uses a custom regular-expression construction that tends to be more comprehensive than 'json_schema', at the cost of speed.
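To make the two `structured_generation_schema_method` options concrete, the sketch builds both kinds of constraint for a tiny hypothetical table, assuming records are emitted as JSON objects. The library's own regex construction is not documented here; this simplified version (no escaped quotes, floats, or nulls) only shows the shape of each input a backend might receive, and every name in it is illustrative.

```python
import json
import re

# Hypothetical column spec: column name -> Python type.
COLUMNS = {"id": int, "city": str}

# Simplified regex fragments and JSON Schema types for each supported Python type.
VALUE_PATTERNS = {int: r"-?\d+", str: r'"[^"\\\n]*"'}
JSON_TYPES = {int: "integer", str: "string"}


def build_record_regex(columns: dict[str, type]) -> str:
    """Regex for one JSON object with the columns in a fixed order."""
    fields = [f'"{re.escape(name)}": {VALUE_PATTERNS[typ]}' for name, typ in columns.items()]
    return r"\{" + ", ".join(fields) + r"\}"


def build_record_json_schema(columns: dict[str, type]) -> dict:
    """JSON Schema describing the same record."""
    return {
        "type": "object",
        "properties": {name: {"type": JSON_TYPES[typ]} for name, typ in columns.items()},
        "required": list(columns),
        "additionalProperties": False,
    }


pattern = build_record_regex(COLUMNS)
line = json.dumps({"id": 7, "city": "Oslo"})  # '{"id": 7, "city": "Oslo"}'
print(bool(re.fullmatch(pattern, line)))
print(build_record_json_schema(COLUMNS))
```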

`structured_generation_use_single_sequence` `pydantic-field`

Whether to use a regex that matches exactly one sequence or record when `max_sequences_per_example` is 1.

`enforce_timeseries_fidelity` `pydantic-field`

Whether to enforce time-series fidelity, that is, the order, intervals, and start and end times of the records.
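A sketch of the properties that `enforce_timeseries_fidelity` refers to (order, intervals, start and end times), written as a validator that reports violations. How the library actually enforces these properties is not stated here; the strictly increasing timestamps, the single fixed interval, and the function name are assumptions.

```python
from datetime import datetime, timedelta


def timeseries_violations(
    timestamps: list[datetime],
    start: datetime,
    end: datetime,
    interval: timedelta,
) -> list[str]:
    """List the ways a generated sequence breaks the order, interval, or start/end rules."""
    if not timestamps:
        return ["empty sequence"]
    problems = []
    if timestamps[0] != start:
        problems.append("first record does not match the start time")
    if timestamps[-1] != end:
        problems.append("last record does not match the end time")
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur <= prev:
            problems.append(f"out of order at {cur.isoformat()}")
        elif cur - prev != interval:
            problems.append(f"irregular interval before {cur.isoformat()}")
    return problems


t0 = datetime(2024, 1, 1)
hourly = [t0 + timedelta(hours=i) for i in range(4)]
print(timeseries_violations(hourly, t0, hourly[-1], timedelta(hours=1)))  # []
```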

`validation` `pydantic-field`

Validation parameters (a `ValidationParameters` instance) that control validation logic and automatic fixes when parsing LLM output and converting it to tabular data.

`attention_backend` `pydantic-field`

The attention backend for the vLLM engine. Common values: 'FLASHINFER', 'FLASH_ATTN', 'TRITON_ATTN', 'FLEX_ATTENTION'. If None or 'auto', vLLM automatically selects the best available backend.
