telemetry
Telemetry handler for NeMo products.
Environment variables:
- `NEMO_TELEMETRY_ENABLED`: Whether telemetry is enabled.
- `NEMO_DEPLOYMENT_TYPE`: The deployment type the event came from.
- `NEMO_TELEMETRY_ENDPOINT`: The endpoint to send the telemetry events to.
- `NEMO_SESSION_PREFIX`: Optional prefix to add to session IDs.
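As a rough illustration of how these variables might be consumed, the sketch below reads them into a plain dictionary. The helper name `read_telemetry_env`, the set of truthy strings, and the behavior when a variable is unset are all assumptions made for this example; the module's actual parsing rules and defaults are not documented here.

```python
import os
from typing import Mapping, Optional


def read_telemetry_env(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: collect the telemetry-related environment variables."""
    raw_enabled = env.get("NEMO_TELEMETRY_ENABLED", "")
    endpoint: Optional[str] = env.get("NEMO_TELEMETRY_ENDPOINT")
    return {
        # Assumption: "1", "true", and "yes" (case-insensitive) mean enabled.
        "enabled": raw_enabled.strip().lower() in {"1", "true", "yes"},
        "deployment_type": env.get("NEMO_DEPLOYMENT_TYPE"),
        "endpoint": endpoint,
        # Assumption: a missing prefix is treated as an empty string.
        "session_prefix": env.get("NEMO_SESSION_PREFIX", ""),
    }


example = read_telemetry_env(
    {"NEMO_TELEMETRY_ENABLED": "true", "NEMO_SESSION_PREFIX": "ci-"}
)
```

Passing a mapping explicitly, as in `example`, keeps the helper easy to test without touching the real process environment.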
Classes:
| Name | Description |
|---|---|
| `NSSTrainingAndGenerationEvent` | |
| `TelemetryHandler` | Handles telemetry event batching, flushing, and retry logic for NeMo products. |
Functions:
| Name | Description |
|---|---|
| `sanitize_model_for_telemetry` | Return a telemetry-safe pretrained model label. |
| `bucket_records` | Return a bucketed string label for a count of input records. |
| `bucket_columns` | Return a bucketed string label for a count of input columns. |
NSSTrainingAndGenerationEvent
pydantic-model
Bases: BaseModel
Config:
default:{'populate_by_name': True}
Fields:
- `nemo_source` (`NemoSourceEnum`)
- `task` (`str`)
- `task_status` (`TaskStatusEnum`)
- `deployment_type` (`DeploymentTypeEnum`)
- `job_duration_sec` (`float`)
- `num_records_generated` (`int`)
- `num_tokens_generated` (`int`)
- `replace_pii_enabled` (`bool`)
- `differential_privacy_enabled` (`bool`)
- `time_series_enabled` (`bool`)
- `group_by_enabled` (`bool`)
- `input_records_bucket` (`str`)
- `input_columns_bucket` (`str`)
- `synthetic_quality_score` (`float`)
- `data_privacy_score` (`float`)
- `model` (`str`)
- `gpu` (`str`)
nemo_source = NemoSourceEnum.SAFE_SYNTHESIZER
pydantic-field
The NeMo product that created the event.
task
pydantic-field
The type of task that was performed (e.g. train, generate, evaluate, run).
task_status
pydantic-field
The final status of the task.
deployment_type
pydantic-field
How Safe Synthesizer was invoked (cli, sdk, nmp).
job_duration_sec = -1.0
pydantic-field
Wall-clock duration of the job in seconds. -1.0 if not available.
num_records_generated = -1
pydantic-field
Number of valid synthetic records produced. -1 if not available.
num_tokens_generated = -1
pydantic-field
Number of tokens generated by the model. -1 if not available.
replace_pii_enabled = False
pydantic-field
Whether PII replacement was enabled for this run.
differential_privacy_enabled = False
pydantic-field
Whether differential privacy training was enabled for this run.
time_series_enabled = False
pydantic-field
Whether time-series mode was enabled for this run.
group_by_enabled = False
pydantic-field
Whether group-by was set on the input data for this run.
input_records_bucket = 'undefined'
pydantic-field
Bucketed count of input training records (e.g. '101-1000'). Use bucket_records().
input_columns_bucket = 'undefined'
pydantic-field
Bucketed count of input columns (e.g. '6-10'). Use bucket_columns().
synthetic_quality_score = -1.0
pydantic-field
Top-level Synthetic Quality Score from the evaluation report. -1.0 if not available.
data_privacy_score = -1.0
pydantic-field
Top-level Data Privacy Score from the evaluation report. -1.0 if not available.
model = 'undefined'
pydantic-field
The pretrained model used for training/generation.
gpu = 'undefined'
pydantic-field
GPU device name (e.g. 'NVIDIA A100 80GB PCIe'). 'undefined' if not on GPU.
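Several fields above use sentinel defaults (`-1`, `-1.0`, or `'undefined'`) to mean "not available". A consumer of these events may want to drop such fields before aggregating. The sketch below assumes the event is available as a plain dictionary (for instance from pydantic's `model_dump()`); the helper name `drop_unavailable` is hypothetical and not part of this module.

```python
def drop_unavailable(event: dict) -> dict:
    """Hypothetical helper: keep only fields that carry a real value."""

    def is_sentinel(value) -> bool:
        # Booleans are real values; False must not be confused with a sentinel.
        if isinstance(value, bool):
            return False
        # -1 == -1.0 in Python, so one comparison covers int and float sentinels.
        return value == -1 or value == "undefined"

    return {name: value for name, value in event.items() if not is_sentinel(value)}


raw_event = {
    "task": "train",
    "job_duration_sec": 12.5,
    "num_records_generated": -1,      # not available
    "synthetic_quality_score": -1.0,  # not available
    "replace_pii_enabled": False,
    "gpu": "undefined",               # not on GPU
}
clean_event = drop_unavailable(raw_event)
```

Here `clean_event` keeps `task`, `job_duration_sec`, and `replace_pii_enabled`, and omits the three sentinel-valued fields.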
TelemetryHandler(flush_interval_seconds=120.0, max_queue_size=50, max_retries=MAX_RETRIES, source_client_version='undefined', session_id='undefined')
Handles telemetry event batching, flushing, and retry logic for NeMo products.
Supports two usage patterns:
- Background mode: call `start()` (or use `with handler:`) to spawn a daemon thread with its own event loop that drives periodic flushing. `stop()` schedules a final flush, then stops the loop and joins the thread.
- Fire-and-flush mode: skip `start()`, `enqueue()` events, then call `stop()` to flush once via `asyncio.run`. No background thread is created.
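To make the two patterns concrete, here is a minimal, self-contained sketch of a handler with the same shape. `SketchHandler` is a hypothetical stand-in written for this example, not the real `TelemetryHandler`: it appends flushed events to an in-memory list instead of sending them, omits retries and queue-size limits, and its `enqueue()` signature is an assumption.

```python
import asyncio
import threading


class SketchHandler:
    """Hypothetical stand-in that mimics the two usage patterns."""

    def __init__(self, flush_interval_seconds: float = 120.0):
        self.flush_interval_seconds = flush_interval_seconds
        self.sent = []  # stands in for the telemetry endpoint
        self._queue = []
        self._lock = threading.Lock()
        self._loop = None
        self._thread = None
        self._timer_task = None

    def enqueue(self, event) -> None:
        with self._lock:
            self._queue.append(event)

    async def _aflush(self) -> None:
        # Swap the queue out under the lock so each event is flushed once.
        with self._lock:
            batch, self._queue = self._queue, []
        self.sent.extend(batch)  # a real handler would send the batch here

    async def _timer(self) -> None:
        while True:
            await asyncio.sleep(self.flush_interval_seconds)
            await self._aflush()

    async def _shutdown(self) -> None:
        # Cancel the periodic timer, then perform one final flush.
        self._timer_task.cancel()
        try:
            await self._timer_task
        except asyncio.CancelledError:
            pass
        await self._aflush()

    def _run_loop(self) -> None:
        asyncio.set_event_loop(self._loop)
        self._timer_task = self._loop.create_task(self._timer())
        self._loop.run_forever()

    def start(self) -> None:
        # Background mode: a daemon thread owns a persistent event loop.
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run_loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        if self._thread is None:
            # Fire-and-flush mode: no thread, flush once on a temporary loop.
            asyncio.run(self._aflush())
            return
        asyncio.run_coroutine_threadsafe(self._shutdown(), self._loop).result()
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()
        self._loop = self._thread = None

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc_info) -> None:
        self.stop()


# Fire-and-flush mode: enqueue, then stop() flushes once.
handler = SketchHandler()
handler.enqueue({"task": "train"})
handler.stop()

# Background mode: the context manager calls start() and stop().
with SketchHandler(flush_interval_seconds=0.05) as bg:
    bg.enqueue({"task": "generate"})
```

In both cases every enqueued event ends up in `sent` exactly once; the only difference is whether a background thread performs periodic flushes before the final one.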
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `flush_interval_seconds` | `float` | The interval, in seconds, between periodic flushes. | `120.0` |
| `max_queue_size` | `int` | The maximum number of events to queue before flushing. | `50` |
| `max_retries` | `int` | The maximum number of times to retry sending an event. | `MAX_RETRIES` |
| `source_client_version` | `str` | The version of the source client. This should be the version of the NeMo product that sends the events, typically the version of the PyPI package a user would install. | `'undefined'` |
| `session_id` | `str` | An optional session ID used to group events together. It should be a unique identifier for the session, such as a UUID. | `'undefined'` |
Methods:
| Name | Description |
|---|---|
| `astart` | Start the background timer task on the current event loop. |
| `astop` | Cancel the timer task and flush any remaining events. |
| `aflush` | Flush all queued events immediately and await completion. |
| `start` | Spawn a daemon thread with a persistent event loop for periodic flushing. |
| `stop` | Flush pending events. If a background thread is running, shut it down and join. |
| `flush` | Flush all queued events immediately and wait for completion. |
Source code in src/nemo_safe_synthesizer/telemetry.py
astart()
async
Start the background timer task on the current event loop.
astop()
async
Cancel the timer task and flush any remaining events.
aflush()
async
Flush all queued events immediately and await completion.
start()
Spawn a daemon thread with a persistent event loop for periodic flushing.
stop()
Flush pending events. If a background thread is running, shut it down and join.
flush()
Flush all queued events immediately and wait for completion.
sanitize_model_for_telemetry(model)
Return a telemetry-safe pretrained model label.
Hugging Face repo IDs are safe to report, but local model paths may embed user or machine details. Prefer the coarse local label when the value looks path-like or does not satisfy Hugging Face repo ID syntax.
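The heuristic described above could look roughly like the sketch below. It is not the module's implementation: the function name `sketch_sanitize_model`, the `"local"` label, the path checks, and the regular expression (a simplified approximation of Hugging Face repo ID syntax) are all assumptions chosen for illustration.

```python
import re

# Assumption: simplified repo ID syntax, an optional "namespace/" followed by a
# name of up to 96 characters drawn from letters, digits, "_", "." and "-".
_REPO_ID = re.compile(
    r"^(?:[A-Za-z0-9][\w.-]*/)?[A-Za-z0-9][\w.-]{0,95}$", re.ASCII
)

LOCAL_LABEL = "local"  # hypothetical coarse label for local paths


def sketch_sanitize_model(model: str) -> str:
    """Return the repo ID unchanged, or a coarse label for anything path-like."""
    looks_like_path = (
        model.startswith(("/", "./", "../", "~"))  # absolute, relative, or home paths
        or "\\" in model                           # Windows-style separators
        or model.count("/") > 1                    # deeper than namespace/name
    )
    has_bad_sequence = ".." in model or "--" in model
    if looks_like_path or has_bad_sequence or not _REPO_ID.match(model):
        return LOCAL_LABEL
    return model


repo_label = sketch_sanitize_model("some-org/some-model")
path_label = sketch_sanitize_model("/home/alice/checkpoints/run1")
```

A relative path with a single separator, such as `models/run1`, is syntactically indistinguishable from a repo ID under these checks, so a syntax-only approach cannot catch every local path.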
bucket_records(n)
Return a bucketed string label for a count of input records.
Used to avoid transmitting exact record counts in telemetry.
bucket_columns(n)
Return a bucketed string label for a count of input columns.
Used to avoid transmitting exact column counts in telemetry.
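Both bucketing functions map an exact count to a coarse range label. The sketch below shows one way such a mapping might work. The helper `sketch_bucket`, the boundary lists, the open-ended top label, and the `"0"` label for non-positive counts are hypothetical; the boundaries were chosen only so that the labels `'101-1000'` and `'6-10'` mentioned in the field descriptions appear, and the real functions may use different edges.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds for each bucket.
RECORD_BOUNDS = [100, 1_000, 10_000, 100_000]
COLUMN_BOUNDS = [5, 10, 20, 50]


def sketch_bucket(n: int, upper_bounds: list) -> str:
    """Map a count to a range label such as '101-1000'."""
    if n <= 0:
        return "0"  # assumption: non-positive counts share one label
    # Index of the first bucket whose upper bound is >= n.
    i = bisect_left(upper_bounds, n)
    if i == len(upper_bounds):
        return f"{upper_bounds[-1] + 1}+"  # open-ended top bucket
    lower = 1 if i == 0 else upper_bounds[i - 1] + 1
    return f"{lower}-{upper_bounds[i]}"


records_label = sketch_bucket(500, RECORD_BOUNDS)
columns_label = sketch_bucket(8, COLUMN_BOUNDS)
```

With these assumed boundaries, 500 records fall in `'101-1000'` and 8 columns fall in `'6-10'`, so the exact counts never leave the process.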