generate
Classes:
| Name | Description |
|---|---|
| ValidationParameters | Configuration for record and sequence validation. |
| StructuredGenerationParameters | Configuration for vLLM structured generation. |
| GenerateParameters | Configuration parameters for synthetic data generation. |
Functions:
| Name | Description |
|---|---|
| resolve_structured_generation_schema_method | Resolve the 'auto' schema method from the configured structured-output backend. |
| structural_tag_backend_error_message | Return an error message when backend cannot serve 'structural_tag'. |
ValidationParameters
pydantic-model
Bases: Parameters, BaseModel
Configuration for record and sequence validation.
These parameters control the validation and automatic fixes when going from LLM output to tabular data.
Fields:
- group_by_accept_no_delineator (bool)
- group_by_ignore_invalid_records (bool)
- group_by_fix_non_unique_value (bool)
- group_by_fix_unordered_records (bool)
group_by_accept_no_delineator
pydantic-field
Whether to accept completions without both beginning and end of sequence delineators as a single sequence.
group_by_ignore_invalid_records
pydantic-field
Whether to ignore invalid records in a sequence and proceed with the valid records.
group_by_fix_non_unique_value
pydantic-field
Whether to automatically fix non-unique group-by values in a sequence by using the first unique value for all records.
group_by_fix_unordered_records
pydantic-field
Whether to automatically fix unordered records in a sequence by sorting the records.
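The two automatic fixes described above can be pictured with a small standalone sketch. It is not the library's implementation: the function name `fix_sequence`, the `order_by` column, and the reading of "first unique value" as "the first value encountered" are all assumptions made for illustration.

```python
from typing import Any

Record = dict[str, Any]


def fix_sequence(records: list[Record], group_by: str, order_by: str) -> list[Record]:
    """Hypothetical stand-in for the group-by fixes; not library code."""
    if not records:
        return []
    # Non-unique group-by values: reuse the first value seen for every record.
    first_value = records[0][group_by]
    fixed = [{**record, group_by: first_value} for record in records]
    # Unordered records: sort by the (assumed) ordering column.
    return sorted(fixed, key=lambda record: record[order_by])


sequence = [
    {"patient": "A", "visit": 2},
    {"patient": "B", "visit": 1},
    {"patient": "A", "visit": 3},
]
fixed_sequence = fix_sequence(sequence, group_by="patient", order_by="visit")
```

In this sketch the input list is left untouched and a corrected copy is returned, so the original completion stays available for inspection.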
StructuredGenerationParameters
pydantic-model
Bases: Parameters, BaseModel
Configuration for vLLM structured generation.
These parameters control whether generation is constrained to schema-shaped output, which backend enforces the constraint, and how the constraint schema is built.
Fields:
- enabled (bool)
- backend (StructuredGenerationBackend)
- schema_method (StructuredGenerationSchemaMethod)
- use_single_sequence (bool)

Validators:
- _validate_structural_tag_backend
enabled
pydantic-field
Whether to use structured generation for better format control.
backend
pydantic-field
The backend used by vLLM when structured generation is enabled. Supported backends: 'outlines', 'guidance', 'xgrammar', 'lm-format-enforcer'. 'auto' will allow vLLM to choose the backend.
schema_method
pydantic-field
The method used to generate the schema from your dataset and pass it to the generation backend. 'auto' picks 'structural_tag' on xgrammar-capable backends and 'regex' otherwise. 'regex' uses a custom regex construction method that tends to be more comprehensive than 'json_schema' at the cost of speed. 'structural_tag' uses XGrammar Structural Tag to compose schema-constrained JSONL output.
use_single_sequence
pydantic-field
Whether to use a regex that matches exactly one sequence or record if max_sequences_per_example is 1.
GenerateParameters
pydantic-model
Bases: Parameters, BaseModel
Configuration parameters for synthetic data generation.
These parameters control how synthetic data is generated after the model is trained. They affect the quality, diversity, and validity of the generated synthetic records.
Fields:
- num_records (int)
- temperature (float)
- repetition_penalty (float)
- top_p (float)
- patience (int)
- invalid_fraction_threshold (float)
- structured_generation (StructuredGenerationParameters)
- enforce_timeseries_fidelity (bool)
- validation (ValidationParameters)
- attention_backend (str | None)

Validators:
- _migrate_legacy_structured_generation_fields
num_records
pydantic-field
Number of records to generate.
temperature
pydantic-field
Sampling temperature for controlling randomness (higher = more random).
repetition_penalty
pydantic-field
The value used to control the likelihood of the model repeating the same token. Must be > 0.
top_p
pydantic-field
Nucleus sampling probability for token selection. Must be in (0, 1].
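For readers unfamiliar with nucleus sampling, the sketch below shows the general technique on a toy distribution: keep the smallest set of most-probable tokens whose cumulative probability reaches top_p, then renormalize. It illustrates the idea only and is not how the generation engine implements it; `nucleus_filter` is a hypothetical helper.

```python
def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest high-probability token set whose mass reaches top_p."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    kept: dict[str, float] = {}
    cumulative = 0.0
    # Walk tokens from most to least probable until the mass reaches top_p.
    for token, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {token: p / cumulative for token, p in kept.items()}


filtered = nucleus_filter({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, top_p=0.75)
```

With top_p set to 1.0 every token survives, which is why the valid range is closed at 1 and open at 0.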
patience
pydantic-field
Number of consecutive generations where the invalid_fraction_threshold is reached before stopping generation. Must be >= 1.
invalid_fraction_threshold
pydantic-field
The fraction of invalid records that will stop generation after the patience limit is reached. Must be in [0, 1].
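The interaction between patience and invalid_fraction_threshold can be sketched as a simple counter over per-generation invalid fractions. This is a minimal illustration under stated assumptions, not the library's stopping logic: `should_stop` is a hypothetical name, and treating "reached" as "greater than or equal to" is a guess.

```python
def should_stop(invalid_fractions: list[float], threshold: float, patience: int) -> bool:
    """Return True once `patience` consecutive generations meet the threshold."""
    consecutive = 0
    for fraction in invalid_fractions:
        # Assumption: "reached" means greater than or equal to the threshold.
        # A generation below the threshold resets the streak.
        consecutive = consecutive + 1 if fraction >= threshold else 0
        if consecutive >= patience:
            return True
    return False


stops = should_stop([0.9, 0.1, 0.9, 0.9], threshold=0.8, patience=2)
keeps_going = should_stop([0.9, 0.1, 0.9, 0.1, 0.9], threshold=0.8, patience=2)
```

Under this reading, a single good generation is enough to reset the count, so only a sustained run of poor output ends generation early.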
structured_generation
pydantic-field
Structured generation parameters controlling schema-constrained output.
enforce_timeseries_fidelity
pydantic-field
Enforce time-series fidelity by enforcing the order, intervals, and start and end times of the records.
validation
pydantic-field
Validation parameters controlling validation logic and automatic fixes when parsing LLM output and converting to tabular data.
attention_backend
pydantic-field
The attention backend for the vLLM engine. Common values: 'FLASHINFER', 'FLASH_ATTN', 'TRITON_ATTN', 'FLEX_ATTENTION'. If None or 'auto', vLLM will auto-select the best available backend.
use_structured_generation
property
writable
Deprecated flat alias for structured_generation.enabled.
structured_generation_backend
property
writable
Deprecated flat alias for structured_generation.backend.
structured_generation_schema_method
property
writable
Deprecated flat alias for structured_generation.schema_method.
structured_generation_use_single_sequence
property
writable
Deprecated flat alias for structured_generation.use_single_sequence.
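The mapping from the four deprecated flat aliases to their nested counterparts can be sketched on plain dictionaries. This is only an illustration of the kind of migration a validator such as _migrate_legacy_structured_generation_fields might perform; `migrate_legacy_fields` is hypothetical, and the rule that an explicit nested value wins over a legacy alias is an assumption.

```python
LEGACY_TO_NESTED = {
    "use_structured_generation": "enabled",
    "structured_generation_backend": "backend",
    "structured_generation_schema_method": "schema_method",
    "structured_generation_use_single_sequence": "use_single_sequence",
}


def migrate_legacy_fields(config: dict) -> dict:
    """Move deprecated flat keys into the nested structured_generation dict."""
    migrated = {k: v for k, v in config.items() if k not in LEGACY_TO_NESTED}
    nested = dict(migrated.get("structured_generation", {}))
    for legacy_key, nested_key in LEGACY_TO_NESTED.items():
        if legacy_key in config:
            # Assumption: an explicit nested value wins over the legacy alias.
            nested.setdefault(nested_key, config[legacy_key])
    if nested:
        migrated["structured_generation"] = nested
    return migrated


legacy = {
    "num_records": 100,
    "use_structured_generation": True,
    "structured_generation_backend": "xgrammar",
}
migrated = migrate_legacy_fields(legacy)
```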
resolve_structured_generation_schema_method(schema_method, backend)
Resolve the 'auto' schema method from the configured structured-output backend.
'auto' picks 'structural_tag' on xgrammar-capable backends and 'regex' elsewhere, preserving legacy behavior for outlines/guidance configs that omit an explicit schema method.
Source code in src/nemo_safe_synthesizer/config/generate.py
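The resolution rule can be approximated in a few lines. The sketch below is not the source of resolve_structured_generation_schema_method: the function name `resolve_schema_method` is hypothetical, and treating 'xgrammar' and 'auto' as the xgrammar-capable backends is an assumption drawn from the descriptions on this page.

```python
# Assumption: these are the backends that count as "xgrammar-capable".
XGRAMMAR_CAPABLE = {"xgrammar", "auto"}


def resolve_schema_method(schema_method: str, backend: str) -> str:
    """Resolve 'auto' to a concrete schema method; pass explicit choices through."""
    if schema_method != "auto":
        return schema_method
    return "structural_tag" if backend in XGRAMMAR_CAPABLE else "regex"


on_xgrammar = resolve_schema_method("auto", "xgrammar")
on_outlines = resolve_schema_method("auto", "outlines")
explicit = resolve_schema_method("json_schema", "xgrammar")
```

An explicit schema method is returned unchanged in this sketch, so only configurations that leave the method on 'auto' are affected by the backend.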
structural_tag_backend_error_message(backend)
Return an error message when backend cannot serve 'structural_tag'.
vLLM only supports XGrammar Structural Tag constraints when the guided decoding backend is 'xgrammar' or 'auto' (which selects xgrammar for this schema method).
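A compatibility check of this kind might look like the following sketch. The function name `structural_tag_error` and the message wording are hypothetical; only the rule that 'structural_tag' needs the 'xgrammar' or 'auto' backend comes from the description above.

```python
from typing import Optional


def structural_tag_error(schema_method: str, backend: str) -> Optional[str]:
    """Return an error message for an unsupported pairing, or None if it is fine."""
    if schema_method == "structural_tag" and backend not in {"xgrammar", "auto"}:
        # Hypothetical wording; the real message text is not shown on this page.
        return (
            "schema_method='structural_tag' requires backend 'xgrammar' or 'auto', "
            f"got {backend!r}."
        )
    return None


problem = structural_tag_error("structural_tag", "outlines")
no_problem = structural_tag_error("structural_tag", "xgrammar")
```

Returning a message rather than raising keeps the check reusable, for example from a model validator that decides how to surface the error.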