results
Generation result containers and multi-batch accumulator.
Classes:
| Name | Description |
|---|---|
| `GenerateJobResults` | Results of a complete generation job. |
| `GenerationBatches` | Accumulator that tracks batches during the generation phase. |
Functions:
| Name | Description |
|---|---|
| `rejected_record_to_error` | Convert a rejected record into a `(detailed, summary)` error tuple. |
Attributes:
| Name | Type | Description |
|---|---|---|
| `NUM_PROMPT_BUFFER` | | Extra prompts added on top of the records-per-prompt estimate to absorb invalid completions. |
| `INITIAL_PROBE_PROMPTS` | | Prompt count used for the first batch when no records-per-prompt history exists yet. |
| `ADAPTIVE_MAX_PROMPTS_CEILING` | | Upper bound for the `max_num_prompts_per_batch` value derived from `target_num_records`. |
NUM_PROMPT_BUFFER = 10
module-attribute
Extra prompts added on top of the records-per-prompt estimate to absorb invalid completions.
INITIAL_PROBE_PROMPTS = 10
module-attribute
Prompt count used for the first batch when no records-per-prompt history exists yet.
Sending a small probe lets the accumulator measure the records-per-prompt
ratio cheaply. Subsequent batches escalate to the full max_num_prompts_per_batch
once at least one prompt has been processed (regardless of whether it produced
valid records), avoiding the overshoot that an upfront full-batch causes when
the target count is much larger than the per-prompt yield.
ADAPTIVE_MAX_PROMPTS_CEILING = 2000
module-attribute
Upper bound for the `max_num_prompts_per_batch` value derived from `target_num_records`.
GenerateJobResults(df, status, num_valid_records, num_invalid_records, num_prompts, valid_record_fraction, batch_valid_record_fractions, elapsed_time=None, num_completion_tokens=None, num_valid_record_tokens=None, num_invalid_record_tokens=None, num_non_record_tokens=None, tokens_per_prompt=None, tokens_per_second=None, valid_tokens_per_second=None, tokenization_overhead_sec=None)
dataclass
Results of a complete generation job.
Encapsulates the generated DataFrame along with validity statistics,
prompt counts, and timing information. Built from a `GenerationBatches`
accumulator via `from_batches`.
Methods:
| Name | Description |
|---|---|
| `from_batches` | Build results from a completed `GenerationBatches` accumulator. |
Attributes:
| Name | Type | Description |
|---|---|---|
| `df` | `DataFrame` | DataFrame containing the generated records. |
| `status` | `GenerationStatus` | Overall generation status derived from the processed batches. |
| `num_valid_records` | `int` | Total number of records that passed validation. |
| `num_invalid_records` | `int` | Total number of records that failed validation. |
| `num_prompts` | `int` | Total number of prompts processed during generation. |
| `valid_record_fraction` | `float` | Fraction of valid records among all generated records. |
| `batch_valid_record_fractions` | `list[float]` | Per-batch valid record fractions, in batch order. |
| `elapsed_time` | `float \| None` | Wall-clock generation duration in seconds, or `None` if not yet set. |
| `num_completion_tokens` | `int \| None` | Total tokens generated by the LLM across all completions. |
| `num_valid_record_tokens` | `int \| None` | Tokens in records that passed validation. |
| `num_invalid_record_tokens` | `int \| None` | Tokens in records that failed validation. |
| `num_non_record_tokens` | `int \| None` | Tokens not part of any recognized record. |
| `tokens_per_prompt` | `float \| None` | Average completion tokens per prompt (`num_completion_tokens / num_prompts`). |
| `tokens_per_second` | `float \| None` | Total completion tokens divided by generation wall-clock time. |
| `valid_tokens_per_second` | `float \| None` | Valid record tokens divided by generation wall-clock time. |
| `tokenization_overhead_sec` | `float \| None` | Wall-clock seconds spent tokenizing records for statistics. |
df
instance-attribute
DataFrame containing the generated records.
status
instance-attribute
Overall generation status derived from the processed batches.
num_valid_records
instance-attribute
Total number of records that passed validation.
num_invalid_records
instance-attribute
Total number of records that failed validation.
num_prompts
instance-attribute
Total number of prompts processed during generation.
valid_record_fraction
instance-attribute
Fraction of valid records among all generated records.
batch_valid_record_fractions
instance-attribute
Per-batch valid record fractions, in batch order.
elapsed_time = None
class-attribute
instance-attribute
Wall-clock generation duration in seconds, or None if not yet set.
num_completion_tokens = None
class-attribute
instance-attribute
Total tokens generated by the LLM across all completions.
num_valid_record_tokens = None
class-attribute
instance-attribute
Tokens in records that passed validation.
num_invalid_record_tokens = None
class-attribute
instance-attribute
Tokens in records that failed validation.
num_non_record_tokens = None
class-attribute
instance-attribute
Tokens not part of any recognized record.
tokens_per_prompt = None
class-attribute
instance-attribute
Average completion tokens per prompt (num_completion_tokens / num_prompts).
tokens_per_second = None
class-attribute
instance-attribute
Total completion tokens divided by generation wall-clock time.
valid_tokens_per_second = None
class-attribute
instance-attribute
Valid record tokens divided by generation wall-clock time.
tokenization_overhead_sec = None
class-attribute
instance-attribute
Wall-clock seconds spent tokenizing records for statistics.
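The derived token statistics above are simple ratios. The sketch below shows one way they could be computed from the raw counts; it is illustrative only, not the library's code. The helper name `derive_token_rates` is hypothetical, it assumes "generation wall-clock time" is the same quantity as `elapsed_time`, and returning `None` for a zero denominator is an assumption of this sketch.

```python
from __future__ import annotations


def derive_token_rates(
    num_completion_tokens: int,
    num_valid_record_tokens: int,
    num_prompts: int,
    elapsed_time: float,
) -> dict[str, float | None]:
    """Hypothetical helper: compute the three derived token ratios."""

    def ratio(numerator: float, denominator: float) -> float | None:
        # Assumption: a zero denominator yields None instead of raising.
        return numerator / denominator if denominator else None

    return {
        "tokens_per_prompt": ratio(num_completion_tokens, num_prompts),
        "tokens_per_second": ratio(num_completion_tokens, elapsed_time),
        "valid_tokens_per_second": ratio(num_valid_record_tokens, elapsed_time),
    }


# 12,000 completion tokens over 40 prompts in 60 seconds, 9,000 of them valid.
rates = derive_token_rates(12_000, 9_000, 40, 60.0)
print(rates)
```

Comparing `tokens_per_second` with `valid_tokens_per_second` gives a rough sense of how much generation throughput ends up in records that pass validation.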
from_batches(batches, max_num_records, columns, elapsed_time)
classmethod
Build results from a completed `GenerationBatches` accumulator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `batches` | `GenerationBatches` | Accumulated generation batches. | required |
| `max_num_records` | `int \| None` | If set, truncate the output DataFrame to this many rows. | required |
| `columns` | `list[str]` | Column names to select from the generated records. | required |
| `elapsed_time` | `float` | Wall-clock generation duration in seconds. | required |
Returns:
| Type | Description |
|---|---|
| `Self` | Populated results instance. |
Source code in src/nemo_safe_synthesizer/generation/results.py
GenerationBatches(target_num_records=None, batches=None, max_num_prompts_per_batch=None, invalid_fraction_threshold=None, patience=None, data_actions_fn=None)
Accumulator that tracks batches during the generation phase.
Manages the stopping condition, running statistics, and optional
post-processing via data_actions_fn. The first batch uses a small
probe when a target record count is available; later batches estimate the
required prompt count from num_valid_records / num_prompts with
NUM_PROMPT_BUFFER added to absorb invalid completions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `target_num_records` | `int \| None` | Target number of valid records to generate. | `None` |
| `batches` | `list[Batch] \| None` | Pre-existing batches to seed the accumulator with. | `None` |
| `max_num_prompts_per_batch` | `int \| None` | Maximum prompts per LLM generation call. | `None` |
| `invalid_fraction_threshold` | `float \| None` | Fraction of invalid records that triggers stopping after `patience` consecutive batches. | `None` |
| `patience` | `int \| None` | Consecutive batch count before the threshold triggers a stop. | `None` |
| `data_actions_fn` | `DataActionsFn \| None` | Optional function that post-processes and validates records from each batch. | `None` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `status` | | Current generation status. |
| `running_stopping_metric` | | Exponential running average of the invalid-record fraction. |
| `stop_condition` | | The patience-based stopping condition, or `None` if no stopping condition is configured. |
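To make the interplay between `running_stopping_metric`, `invalid_fraction_threshold`, and `patience` concrete, here is a minimal sketch of a patience-based stop driven by an exponential running average. It is not the library's implementation: the class name `PatienceStop`, the smoothing factor `alpha`, and the choice to reset the counter when the average drops back under the threshold are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PatienceStop:
    """Hypothetical stand-in for a patience-based stopping condition."""

    threshold: float             # invalid-fraction level that counts as a bad batch
    patience: int                # consecutive bad batches required before stopping
    alpha: float = 0.5           # smoothing factor (assumed, not from the library)
    running: float | None = None  # exponential running average so far
    strikes: int = 0             # consecutive batches above the threshold

    def update(self, invalid_fraction: float) -> bool:
        """Fold in one batch's invalid fraction; return True when it is time to stop."""
        if self.running is None:
            self.running = invalid_fraction
        else:
            self.running = self.alpha * invalid_fraction + (1 - self.alpha) * self.running
        # Assumption: the counter resets as soon as the average recovers.
        self.strikes = self.strikes + 1 if self.running > self.threshold else 0
        return self.strikes >= self.patience


stop = PatienceStop(threshold=0.75, patience=2)
decisions = [stop.update(f) for f in (0.2, 1.0, 1.0, 1.0)]
print(decisions)
```

Smoothing means a single bad batch after a good start does not immediately count against the job; only a sustained run of poor batches pushes the average over the threshold for `patience` batches in a row.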
Methods:
| Name | Description |
|---|---|
| `add_batch` | Add a batch and update the generation status. |
| `get_next_num_prompts` | Return an estimate of the optimal number of prompts to process in the next batch. |
| `job_complete` | Update the generation job status to a finished state and log the results. |
| `log_status` | Log the current status of the generation process. |
| `to_dataframe` | Combine valid records from all batches into a single DataFrame. |
num_batches
property
The number of batches in the generation job.
num_prompts
property
The total number of prompts processed in the generation job.
num_invalid_records
property
The total number of invalid records generated in the generation job.
num_valid_records
property
The total number of valid records generated in the generation job.
total_completion_tokens
property
Total tokens across all completions in all batches.
total_valid_record_tokens
property
Sum of token counts for valid records across all batches.
total_invalid_record_tokens
property
Sum of token counts for invalid records across all batches.
total_non_record_tokens
property
Tokens not part of any recognized record across all batches.
total_tokenization_time_sec
property
Wall-clock seconds spent tokenizing records across all batches.
num_length_truncated_completions
property
Total completions that stopped because they reached max_tokens.
add_batch(batch)
Add a batch and update the generation status.
Stopping rules:
- The very first batch producing zero valid records normally triggers `STOP_NO_RECORDS`. A fully length-truncated probe is the exception: every completion may have been cut off before it could emit a complete record, so a patience-based `stop_condition` gets to decide whether to continue.
- When a `stop_condition` is configured, subsequent batches with zero valid records are tolerated until the patience-based threshold is reached.
- Without a `stop_condition`, any batch with zero valid records triggers `STOP_NO_RECORDS`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `batch` | `Batch` | The completed batch to add. | required |
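The stopping rules listed for `add_batch` can be read as a small decision function. The sketch below is one interpretation, not the method's actual code: the function name and its boolean arguments are hypothetical, and it assumes that a fully length-truncated first batch still stops the job when no `stop_condition` exists, since nothing is left to make the decision.

```python
def triggers_stop_no_records(
    num_valid_records: int,
    is_first_batch: bool,
    fully_length_truncated: bool,
    has_stop_condition: bool,
) -> bool:
    """Hypothetical helper: should this batch trigger STOP_NO_RECORDS?"""
    if num_valid_records > 0:
        # Batches that produced valid records never trigger this status.
        return False
    if not has_stop_condition:
        # Without a stop_condition, any zero-record batch stops the job.
        return True
    if is_first_batch:
        # A zero-record first batch stops the job unless every completion
        # was cut off by the token limit.
        return not fully_length_truncated
    # Later zero-record batches are left to the patience-based condition.
    return False


print(triggers_stop_no_records(0, True, False, True))
print(triggers_stop_no_records(0, True, True, True))
```

A return value of `False` here only means that `STOP_NO_RECORDS` is not raised by this batch; a configured `stop_condition` may still end the job once its patience runs out.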
get_next_num_prompts()
Return an estimate of the optimal number of prompts to process in the next batch.
The accumulator scales the per-batch prompt count through three regimes:
- Truly-first batch (no prompts ever sent) -- send a small probe batch of `INITIAL_PROBE_PROMPTS` so the records-per-prompt ratio can be measured before committing to a full batch.
- Prompts sent but no valid records yet -- escalate to the full `max_num_prompts_per_batch` budget unless the latest batch was fully length-truncated, in which case keep probing conservatively instead of expanding GPU work.
- Have valid records -- size the next batch from the observed records-per-prompt ratio, plus `NUM_PROMPT_BUFFER` to absorb invalid completions.
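The three regimes can be sketched as a plain function. This is an approximation under stated assumptions, not the method's source: the function name and argument list are hypothetical, "probing conservatively" is assumed to mean reusing `INITIAL_PROBE_PROMPTS`, the estimate is assumed to cover only the records still missing from the target, and the result is assumed to be capped at `max_num_prompts_per_batch`.

```python
import math

NUM_PROMPT_BUFFER = 10
INITIAL_PROBE_PROMPTS = 10


def next_num_prompts(
    target_num_records: int,
    num_valid_records: int,
    num_prompts: int,
    max_num_prompts_per_batch: int,
    last_batch_fully_truncated: bool = False,
) -> int:
    """Hypothetical helper mirroring the three regimes described above."""
    probe = min(INITIAL_PROBE_PROMPTS, max_num_prompts_per_batch)
    if num_prompts == 0:
        # Regime 1: nothing sent yet, so measure the yield with a small probe.
        return probe
    if num_valid_records == 0:
        # Regime 2: escalate, unless the last batch hit the token limit everywhere.
        return probe if last_batch_fully_truncated else max_num_prompts_per_batch
    # Regime 3: size the batch from the observed records-per-prompt ratio.
    records_per_prompt = num_valid_records / num_prompts
    remaining = max(target_num_records - num_valid_records, 0)
    estimate = math.ceil(remaining / records_per_prompt) + NUM_PROMPT_BUFFER
    return min(estimate, max_num_prompts_per_batch)


# 50 valid records from 10 prompts, 950 still needed: ceil(950 / 5) + 10 prompts.
print(next_num_prompts(1000, 50, 10, 500))
```

With a yield of five records per prompt, the probe's measurement keeps the second batch at 200 prompts instead of the full 500-prompt budget, which is the overshoot the probe is meant to avoid.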
job_complete()
Update the generation job status to a finished state and log the results.
log_status()
Log the current status of the generation process.
to_dataframe(columns, max_num_records=None)
Combine valid records from all batches into a single DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `columns` | `list[str]` | Column names to include in the output. | required |
| `max_num_records` | `int \| None` | If set, truncate to this many rows. | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame of valid records, or an empty DataFrame if none were generated. |
rejected_record_to_error(record)
Convert a rejected record into a (detailed, summary) error tuple.
Both elements are identical so that log output is consistent
regardless of the detailed_errors setting.