results
Generation result containers and multi-batch accumulator.
Classes:
| Name | Description |
|---|---|
| GenerateJobResults | Results of a complete generation job. |
| GenerationBatches | Accumulator that tracks batches during the generation phase. |
Functions:
| Name | Description |
|---|---|
| rejected_record_to_error | Convert a rejected record into a (detailed, summary) error tuple. |
GenerateJobResults(df, status, num_valid_records, num_invalid_records, num_prompts, valid_record_fraction, batch_valid_record_fractions, elapsed_time=None)
dataclass
Results of a complete generation job.
Encapsulates the generated DataFrame along with validity statistics,
prompt counts, and timing information. Built from a GenerationBatches
accumulator via from_batches.
Methods:
| Name | Description |
|---|---|
| from_batches | Build results from a completed GenerationBatches accumulator. |
Attributes:
| Name | Type | Description |
|---|---|---|
| df | DataFrame | DataFrame containing the generated records. |
| status | GenerationStatus | Overall generation status derived from the processed batches. |
| num_valid_records | int | Total number of records that passed validation. |
| num_invalid_records | int | Total number of records that failed validation. |
| num_prompts | int | Total number of prompts processed during generation. |
| valid_record_fraction | float | Fraction of valid records among all generated records. |
| batch_valid_record_fractions | list[float] | Per-batch valid record fractions, in batch order. |
| elapsed_time | float \| None | Wall-clock generation duration in seconds, or None if not yet set. |
df
instance-attribute
DataFrame containing the generated records.
status
instance-attribute
Overall generation status derived from the processed batches.
num_valid_records
instance-attribute
Total number of records that passed validation.
num_invalid_records
instance-attribute
Total number of records that failed validation.
num_prompts
instance-attribute
Total number of prompts processed during generation.
valid_record_fraction
instance-attribute
Fraction of valid records among all generated records.
batch_valid_record_fractions
instance-attribute
Per-batch valid record fractions, in batch order.
elapsed_time = None
class-attribute
instance-attribute
Wall-clock generation duration in seconds, or None if not yet set.
from_batches(batches, max_num_records, columns, elapsed_time)
classmethod
Build results from a completed GenerationBatches accumulator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| batches | GenerationBatches | Accumulated generation batches. | required |
| max_num_records | int \| None | If set, truncate the output DataFrame to this many rows. | required |
| columns | list[str] | Column names to select from the generated records. | required |
| elapsed_time | float | Wall-clock generation duration in seconds. | required |
Returns:
| Type | Description |
|---|---|
| Self | Populated results instance. |
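As a rough illustration of how the validity statistics on GenerateJobResults relate to one another, the sketch below derives the overall and per-batch valid record fractions from per-batch counts. BatchCounts, valid_fraction, and summarize are hypothetical stand-ins and not part of nemo_safe_synthesizer; returning 0.0 for a batch with no records is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class BatchCounts:
    """Hypothetical stand-in for the record counts of one batch."""

    num_valid: int
    num_invalid: int


def valid_fraction(num_valid: int, num_invalid: int) -> float:
    """Fraction of valid records among all records (assumed 0.0 when empty)."""
    total = num_valid + num_invalid
    return num_valid / total if total else 0.0


def summarize(batches: list[BatchCounts]) -> dict:
    """Aggregate per-batch counts into the statistics named in the table above."""
    num_valid = sum(b.num_valid for b in batches)
    num_invalid = sum(b.num_invalid for b in batches)
    return {
        "num_valid_records": num_valid,
        "num_invalid_records": num_invalid,
        "valid_record_fraction": valid_fraction(num_valid, num_invalid),
        # One entry per batch, kept in batch order.
        "batch_valid_record_fractions": [
            valid_fraction(b.num_valid, b.num_invalid) for b in batches
        ],
    }


summary = summarize([BatchCounts(8, 2), BatchCounts(3, 1)])
```

Note that the overall fraction is computed from the summed counts, so it is not generally equal to the mean of the per-batch fractions.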
Source code in src/nemo_safe_synthesizer/generation/results.py
GenerationBatches(target_num_records=None, batches=None, max_num_prompts_per_batch=MAX_NUM_PROMPTS_PER_BATCH, invalid_fraction_threshold=None, patience=None, data_actions_fn=None)
Accumulator that tracks batches during the generation phase.
Manages the stopping condition, running statistics, and optional
post-processing via data_actions_fn.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_num_records | int \| None | Target number of valid records to generate. | None |
| batches | list[Batch] \| None | Pre-existing batches to seed the accumulator with. | None |
| max_num_prompts_per_batch | int | Maximum prompts per LLM generation call. | MAX_NUM_PROMPTS_PER_BATCH |
| invalid_fraction_threshold | float \| None | Fraction of invalid records that triggers stopping after patience consecutive batches. | None |
| patience | int \| None | Consecutive batch count before the threshold triggers a stop. | None |
| data_actions_fn | DataActionsFn \| None | Optional function that post-processes and validates records from each batch. | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| status | | Current generation status. |
| running_stopping_metric | | Exponential running average of the invalid-record fraction. |
| stop_condition | | The patience-based stopping condition, or None if none is configured. |
Methods:
| Name | Description |
|---|---|
| add_batch | Add a batch and update the generation status. |
| get_next_num_prompts | Return an estimate of the optimal number of prompts to process in the next batch. |
| job_complete | Update the generation job status to a finished state and log the results. |
| log_status | Log the current status of the generation process. |
| to_dataframe | Combine valid records from all batches into a single DataFrame. |
num_batches
property
The number of batches in the generation job.
num_prompts
property
The total number of prompts processed in the generation job.
num_invalid_records
property
The total number of invalid records generated in the generation job.
num_valid_records
property
The total number of valid records generated in the generation job.
add_batch(batch)
Add a batch and update the generation status.
Stopping rules:
- The very first batch producing zero valid records always triggers STOP_NO_RECORDS.
- When a stop_condition is configured, subsequent batches with zero valid records are tolerated until the patience-based threshold is reached.
- Without a stop_condition, any batch with zero valid records triggers STOP_NO_RECORDS.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| batch | Batch | The completed batch to add. | required |
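The stopping rules for add_batch can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: PatienceStop and decide are hypothetical names, the smoothing factor alpha is an assumed parameter, and the status strings other than STOP_NO_RECORDS are placeholders invented for the example.

```python
from typing import Optional


class PatienceStop:
    """Hypothetical patience-based rule over a running invalid-record fraction."""

    def __init__(self, threshold: float, patience: int, alpha: float = 0.5) -> None:
        self.threshold = threshold
        self.patience = patience
        self.alpha = alpha  # assumed smoothing factor for the running average
        self.running_metric: Optional[float] = None
        self.num_over = 0  # consecutive batches with the metric above threshold

    def update(self, invalid_fraction: float) -> bool:
        """Fold in one batch and report whether the patience limit is reached."""
        if self.running_metric is None:
            self.running_metric = invalid_fraction
        else:
            self.running_metric = (
                self.alpha * invalid_fraction
                + (1 - self.alpha) * self.running_metric
            )
        if self.running_metric > self.threshold:
            self.num_over += 1
        else:
            self.num_over = 0
        return self.num_over >= self.patience


def decide(
    batch_index: int,
    num_valid: int,
    invalid_fraction: float,
    stop: Optional[PatienceStop],
) -> str:
    """Return a status string for one batch, following the three rules above."""
    # Rules 1 and 3: a first batch with no valid records, or any such batch
    # when no stop condition is configured, stops the job immediately.
    if num_valid == 0 and (batch_index == 0 or stop is None):
        return "STOP_NO_RECORDS"
    # Rule 2: with a stop condition, empty batches are tolerated until the
    # patience-based threshold is reached.
    if stop is not None and stop.update(invalid_fraction):
        return "STOP_THRESHOLD"  # placeholder name, not from the library
    return "CONTINUE"  # placeholder name, not from the library


stop = PatienceStop(threshold=0.5, patience=2)
statuses = [
    decide(0, 8, 0.2, stop),
    decide(1, 0, 1.0, stop),
    decide(2, 0, 1.0, stop),
]
```

In this run the second batch has no valid records but is tolerated because a stop condition is present; the job stops only once the running metric has exceeded the threshold for two consecutive batches.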
get_next_num_prompts()
Return an estimate of the optimal number of prompts to process in the next batch.
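The estimation logic itself is not described on this page. One plausible heuristic, shown here purely as an assumption, scales the number of records still needed by the observed yield of valid records per prompt and caps the result at the per-batch maximum. The function name and its parameters are hypothetical.

```python
def estimate_next_num_prompts(
    target_num_records: int,
    num_valid_records: int,
    num_prompts: int,
    max_num_prompts_per_batch: int,
) -> int:
    """Hypothetical estimate of how many prompts the next batch should use."""
    remaining = target_num_records - num_valid_records
    if remaining <= 0:
        return 0  # target already met
    if num_prompts == 0 or num_valid_records == 0:
        # No yield information yet, so fall back to the largest allowed batch.
        return max_num_prompts_per_batch
    # Integer ceiling of remaining / (num_valid_records / num_prompts).
    needed = -(-(remaining * num_prompts) // num_valid_records)
    return max(1, min(needed, max_num_prompts_per_batch))


next_prompts = estimate_next_num_prompts(
    target_num_records=100,
    num_valid_records=90,
    num_prompts=45,
    max_num_prompts_per_batch=64,
)
```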
job_complete()
Update the generation job status to a finished state and log the results.
log_status()
Log the current status of the generation process.
to_dataframe(columns, max_num_records=None)
Combine valid records from all batches into a single DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| columns | list[str] | Column names to include in the output. | required |
| max_num_records | int \| None | If set, truncate to this many rows. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame of valid records, or an empty DataFrame if none were generated. |
rejected_record_to_error(record)
Convert a rejected record into a (detailed, summary) error tuple.
Both elements are identical so that log output is consistent
regardless of the detailed_errors setting.
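A minimal sketch of the idea, assuming a rejected record carries its raw text and a rejection reason. RejectedRecord and to_error_tuple are hypothetical; the real record type and message format are not shown on this page.

```python
from dataclasses import dataclass


@dataclass
class RejectedRecord:
    """Hypothetical shape of a rejected record (assumed fields)."""

    text: str
    reason: str


def to_error_tuple(record: RejectedRecord) -> tuple[str, str]:
    """Return a (detailed, summary) pair whose two elements are identical."""
    message = f"{record.reason}: {record.text}"
    # The same string fills both slots, so callers that log either the
    # detailed or the summary element produce the same output.
    return message, message


detailed, summary_msg = to_error_tuple(
    RejectedRecord(text='{"age": "abc"}', reason="invalid field type")
)
```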