dp_utils
DP training utilities for Hugging Face Trainer and data collation.
Provides OpacusDPTrainer (DP-aware Trainer with entity-level sampling and
Opacus optimizer), DPCallback for Trainer hooks, data collators that
expose position_ids for per-sample gradients, and GradSampleModule
wrapper with no_sync support.
Classes:
| Name | Description |
|---|---|
| DPCallback | Trainer callback that integrates Opacus DP-SGD with transformers.Trainer. |
| DataCollatorForPrivateCausalLanguageModeling | Adds position_ids for Opacus per-sample gradients. |
| DataCollatorForPrivateTokenClassification | Collator for token classification that adds position_ids for Opacus. |
| GradSampleModule | Opacus GradSampleModule with no_sync for Hugging Face Trainer. |
| OpacusDPTrainer | DP-aware Trainer for PEFT/LoRA fine-tuning with Opacus. |
Functions:
| Name | Description |
|---|---|
| create_entity_mapping | Build a mapping from each entity to its dataset indices. |
DPCallback(noise_multiplier, sampling_probability, accountant, max_epsilon=float('inf'))
Bases: TrainerCallback
Trainer callback that integrates Opacus DP-SGD with transformers.Trainer.
Handles per-step optimizer behavior (skip signal, step, zero_grad), optional
RDP step accounting, and early stopping when max_epsilon is exceeded.
Used with OpacusDPTrainer; the trainer injects this callback when
privacy arguments are enabled.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| noise_multiplier | float | Gaussian noise scale for gradients. | required |
| sampling_probability | float | Probability of a record being in a batch. | required |
| accountant | SafeSynthesizerAccountant | Privacy accountant for epsilon computation and (if RDP) step tracking. | required |
| max_epsilon | float | Stop training when computed epsilon exceeds this value. | float('inf') |
Methods:
| Name | Description |
|---|---|
| on_substep_end | Run DP optimizer step at the end of each gradient-accumulation substep. |
| on_step_end | Clear gradients and update RDP accountant at the end of each optimizer step. |
| on_save | Called when the Trainer is about to save a checkpoint. Ensures training stops before saving if the privacy budget would be exceeded. |
| on_evaluate | Check epsilon budget and stop training if max_epsilon is exceeded. |
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/dp_utils.py
on_substep_end(args, state, control, optimizer=None, **kwargs)
Run DP optimizer step at the end of each gradient-accumulation substep.
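Signals the Opacus optimizer to skip the parameter update, then calls
step() and zero_grad() on the underlying DP optimizer (or on the optimizer
itself if it is not wrapped by Accelerate). Required when using gradient
accumulation so that the DP optimizer's step() runs once per micro-batch,
clipping and accumulating that micro-batch's gradients while the parameter
update itself is skipped.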
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| args | TrainingArguments | HF Trainer arguments. | required |
| state | TrainerState | Current trainer state. | required |
| control | TrainerControl | Trainer control object (not modified). | required |
| optimizer | Optimizer \| None | The Trainer's optimizer (Opacus DP optimizer or AcceleratedOptimizer wrapping it). | None |
| **kwargs | | Additional callback keyword arguments. | {} |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If optimizer is None (callback cannot access optimizer). |
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/dp_utils.py
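The call order described for on_substep_end can be sketched with stand-in objects. This is an illustration only, not the module's source: FakeDPOptimizer and FakeWrapper are hypothetical classes that record calls, and reaching the inner optimizer through an attribute named optimizer is an assumption about how the Accelerate wrapper exposes it.

```python
class FakeDPOptimizer:
    """Hypothetical stand-in that records calls instead of updating weights."""

    def __init__(self):
        self.calls = []

    def signal_skip_step(self, do_skip=True):
        self.calls.append(("signal_skip_step", do_skip))

    def step(self):
        self.calls.append(("step",))

    def zero_grad(self):
        self.calls.append(("zero_grad",))


class FakeWrapper:
    """Hypothetical wrapper that exposes the inner optimizer as .optimizer."""

    def __init__(self, optimizer):
        self.optimizer = optimizer


def substep_end(optimizer=None):
    """Sketch of the per-substep sequence: skip signal, step, zero_grad."""
    if optimizer is None:
        raise RuntimeError("callback cannot access optimizer")
    # Assumption: a wrapper exposes the DP optimizer as `.optimizer`;
    # otherwise the object passed in is used directly.
    dp_optimizer = getattr(optimizer, "optimizer", optimizer)
    dp_optimizer.signal_skip_step(True)  # ask for the update to be skipped
    dp_optimizer.step()
    dp_optimizer.zero_grad()
    return dp_optimizer


inner = FakeDPOptimizer()
substep_end(FakeWrapper(inner))
print(inner.calls)
```

The same function works when the DP optimizer is passed directly, because getattr falls back to the object itself.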
on_step_end(args, state, control, optimizer=None, **kwargs)
Clear gradients and update RDP accountant at the end of each optimizer step.
Calls zero_grad() on the optimizer (Opacus expects this; Trainer does not
call it by default). When using the RDP accountant (not PRV), increments the
accountant step for accurate epsilon calculation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| args | TrainingArguments | Trainer training arguments (used to check gradient_accumulation_steps). | required |
| state | TrainerState | Current trainer state. | required |
| control | TrainerControl | Trainer control object (not modified). | required |
| optimizer | Optimizer \| None | The Trainer's optimizer (required for | None |
| **kwargs | | Additional callback keyword arguments. | {} |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If gradient accumulation is used but |
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/dp_utils.py
on_save(args, state, control, **kwargs)
Called when the Trainer is about to save a checkpoint. Ensures training stops before saving if the privacy budget would be exceeded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| args | TrainingArguments | HF Trainer arguments. | required |
| state | TrainerState | Current trainer state (used for global_step). | required |
| control | TrainerControl | Trainer control object; may be updated to stop training. | required |
| **kwargs | | Additional callback keyword arguments. | {} |
Returns:
| Type | Description |
|---|---|
| TrainerControl | TrainerControl, updated to stop training when the computed epsilon exceeds max_epsilon. |
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/dp_utils.py
on_evaluate(args, state, control, **kwargs)
Check epsilon budget and stop training if max_epsilon is exceeded.
Called when the Trainer runs evaluation. Ensures training stops before further steps if the privacy budget would be exceeded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| args | TrainingArguments | HF Trainer arguments. | required |
| state | TrainerState | Current trainer state (used for global_step). | required |
| control | TrainerControl | Trainer control object; may be updated to stop training. | required |
| **kwargs | | Additional callback keyword arguments. | {} |
Returns:
| Type | Description |
|---|---|
| TrainerControl | TrainerControl, updated to stop training when the computed epsilon exceeds max_epsilon. |
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/dp_utils.py
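The budget check shared by on_save and on_evaluate can be sketched as follows. FakeControl and check_budget are hypothetical names used for illustration; should_training_stop is a real field on transformers.TrainerControl, but that this module sets exactly that field is an assumption, and the epsilon values are made up.

```python
from dataclasses import dataclass


@dataclass
class FakeControl:
    """Hypothetical stand-in for transformers.TrainerControl."""

    should_training_stop: bool = False


def check_budget(epsilon, max_epsilon, control):
    """Request a stop once the spent epsilon exceeds the budget."""
    if epsilon > max_epsilon:
        control.should_training_stop = True
    return control


under = check_budget(7.9, 8.0, FakeControl())
over = check_budget(8.1, 8.0, FakeControl())
unbounded = check_budget(8.1, float("inf"), FakeControl())
print(under.should_training_stop, over.should_training_stop, unbounded.should_training_stop)
```

With the default max_epsilon of float('inf') the comparison never succeeds, so the callback never stops training on budget grounds.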
DataCollatorForPrivateCausalLanguageModeling(tokenizer)
Bases: DataCollatorForLanguageModeling
Adds position_ids for Opacus per-sample gradients.
Trainer and model code often create position_ids inside the model
forward pass, which Opacus cannot see. This collator builds position_ids
during batching so they are present in the batch and available for
per-sample gradient computation. See https://github.com/huggingface/transformers/blob/5c1c72be5f864d10d0efe8ece0768d9ed6ee4fdd/src/transformers/models/mistral/modeling_mistral.py#L379
for an example.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tokenizer | PreTrainedTokenizer | Tokenizer for padding and encoding. | required |
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/dp_utils.py
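As a rough sketch of what building position_ids during batching means, the function below attaches one row of positions per example to an already padded batch. It is dependency-free and uses plain lists, whereas a real collator would return tensors; add_position_ids is a hypothetical helper, and numbering positions 0 to seq_len - 1 regardless of padding is an assumption.

```python
def add_position_ids(batch):
    """Attach position_ids (0..seq_len-1 per row) to a padded batch.

    `batch` is a dict whose "input_ids" value is a list of equal-length
    token-id lists. The input dict is not mutated.
    """
    seq_len = len(batch["input_ids"][0])
    out = dict(batch)
    out["position_ids"] = [list(range(seq_len)) for _ in batch["input_ids"]]
    return out


batch = {"input_ids": [[5, 6, 7, 0], [8, 9, 0, 0]]}
collated = add_position_ids(batch)
print(collated["position_ids"])
```

Because the positions travel with the batch, they are an explicit model input with a leading batch dimension instead of a value created inside the forward pass.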
DataCollatorForPrivateTokenClassification(tokenizer)
Bases: DataCollatorForTokenClassification
Collator for token classification that adds position_ids for Opacus.
Same rationale as DataCollatorForPrivateCausalLanguageModeling: ensures
position_ids are in the batch for per-sample gradient computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tokenizer | PreTrainedTokenizer | Tokenizer for padding and encoding. | required |
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/dp_utils.py
GradSampleModule
Bases: GradSampleModule
Opacus GradSampleModule with no_sync for Hugging Face Trainer.
Trainer expects a no_sync context manager to defer gradient sync in
distributed settings. This wrapper provides a no-op no_sync so the
Trainer API is satisfied.
Methods:
| Name | Description |
|---|---|
| no_sync | Context manager that does nothing; required by Trainer's expected API. |
no_sync()
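A no-op context manager of this kind can be written with contextlib. The sketch below uses a hypothetical NoSyncSketch class so that it runs without Opacus installed; the real class subclasses the Opacus GradSampleModule.

```python
from contextlib import contextmanager


class NoSyncSketch:
    """Hypothetical class; stands in for the GradSampleModule subclass."""

    @contextmanager
    def no_sync(self):
        # Nothing to set up or tear down: just run the wrapped block.
        yield


module = NoSyncSketch()
ran = []
with module.no_sync():
    ran.append("backward")
print(ran)
```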
OpacusDPTrainer(train_dataset, model, args=None, privacy_args=None, data_fraction=None, true_dataset_size=None, entity_column_values=None, callbacks=None, secure_mode=True, **kwargs)
Bases: Trainer
DP-aware Trainer for PEFT/LoRA fine-tuning with Opacus.
Adapts Hugging Face Trainer for differential privacy: uses entity-level
(or record-level) sampling, wraps the model in GradSampleModule and
the optimizer in Opacus DPOptimizer, and avoids double-scaling of
loss by gradient accumulation. Saves only the PEFT/LoRA adapter weights.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| train_dataset | Dataset | Dataset for training. | required |
| model | PreTrainedModel \| Module | Base model (will be wrapped with GradSampleModule). | required |
| args | TrainingArguments \| None | Training arguments (e.g. | None |
| privacy_args | PrivacyArguments \| None | DP parameters (epsilon, delta, noise, clipping). Required. | None |
| data_fraction | float \| None | If set, scales effective number of epochs for privacy math. | None |
| true_dataset_size | int \| None | Override number of entities/records for privacy accounting. | None |
| entity_column_values | list \| None | If set, entity-level DP; each value is the entity ID for the corresponding dataset row. If None, record-level DP (one entity per row). | None |
| callbacks | list[TrainerCallback] \| None | Additional Trainer callbacks. | None |
| secure_mode | bool \| None | If True, use secure RNG for noise (recommended). | True |
| **kwargs | Any | Passed to | {} |
Attributes:
| Name | Type | Description |
|---|---|---|
| accountant | | Privacy accountant used for epsilon computation. |
| entity_mapping | | For entity i, list of dataset indices in that entity. |
Methods:
| Name | Description |
|---|---|
| get_epsilon | Calculate the epsilon after model training completes. |
| create_optimizer | Create the base optimizer then wrap it with Opacus DPOptimizer. |
| training_step | Run one training step and return the loss scaled for logging. |
| get_train_dataloader | Returns a torch DataLoader that uses an entity-level sampler. |
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/dp_utils.py
sampling_probability
property
Probability that an entity is included in a batch (capped at 1.0).
For record-level DP (one entity per row), it is \(\min\left(1, \frac{\text{per\_device\_batch\_size} \times \text{gradient\_accumulation\_steps}}{n_{\text{entities}}}\right)\). For entity-level DP, n_entities can be small, so the ratio may exceed 1; the result is capped at 1.0. Used as the sampling probability in the privacy accountant for ε computation.
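The capped ratio can be checked with a few lines of Python. The function below is a standalone sketch that takes the three quantities as plain arguments (the property itself reads them from the trainer), and the entity counts are made-up inputs.

```python
def sampling_probability(per_device_batch_size, gradient_accumulation_steps, n_entities):
    """Effective batch size over entity count, capped at 1.0."""
    effective_batch = per_device_batch_size * gradient_accumulation_steps
    return min(1.0, effective_batch / n_entities)


many_entities = sampling_probability(4, 8, 1000)
few_entities = sampling_probability(4, 8, 20)
print(many_entities, few_entities)
```

With an effective batch of 4 * 8 = 32, a dataset of 1,000 entities gives 0.032, while 20 entities would give a ratio of 1.6 and is therefore capped at 1.0.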
num_steps
property
The number of optimizer steps used for privacy accounting.
Either user-supplied (via max_steps when true_num_epochs == -1)
or determined from num_train_epochs. When the user specifies
num_train_epochs, we determine num_steps from
sampling_probability so we pass over each entity roughly once per
epoch, similarly to passing over each record once per epoch in
record-level training.
The result is always at least 1 because 1 is added to
1 / sampling_probability. This matters when there are fewer entities than
batch_size * gradient_accumulation_steps (e.g. 4 * 8 = 32).
Used to determine the privacy budget (noise multiplier and epsilon)
during training.
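The exact expression is not shown on this page, so the following is only a sketch under a stated assumption: steps per epoch are taken to be 1 / sampling_probability + 1, multiplied by the number of epochs and truncated to an integer. The function name and its arguments are hypothetical.

```python
def num_steps_sketch(sampling_probability, num_train_epochs=None, max_steps=None):
    """Steps used for privacy accounting, under the assumed formula."""
    if max_steps is not None:
        # User-supplied step count takes precedence.
        return max_steps
    # Assumption: roughly one pass over the entities per epoch, plus 1.
    steps_per_epoch = 1 / sampling_probability + 1
    return int(num_train_epochs * steps_per_epoch)


one_epoch = num_steps_sketch(0.032, num_train_epochs=1)
capped = num_steps_sketch(1.0, num_train_epochs=1)
fixed = num_steps_sketch(0.032, max_steps=500)
print(one_epoch, capped, fixed)
```

Under this assumption, a sampling probability of 0.032 gives 32 steps for one epoch, and a capped probability of 1.0 still gives 2 steps.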
get_epsilon()
create_optimizer()
Create the base optimizer then wrap it with Opacus DPOptimizer.
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/dp_utils.py
training_step(model, inputs, num_items_in_batch=None)
Run one training step and return the loss scaled for logging.
Forward pass and backward are performed as usual. Loss is not scaled by
batch size or per-sample factors here: Opacus handles per-sample gradient
scaling. The returned value is the raw loss divided by
gradient_accumulation_steps so that the logged loss matches the
effective per-step loss (averaged over accumulation steps).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Module | The model to train (wrapped in GradSampleModule). | required |
| inputs | dict[str, Tensor \| Any] | Batch of inputs (e.g. | required |
| num_items_in_batch | Tensor \| None | Unused; passed for API compatibility. Opacus handles scaling; we pass | None |
Returns:
| Type | Description |
|---|---|
| Tensor | Detached loss tensor scaled by 1 / gradient_accumulation_steps, for logging only (optimizer step is driven by the callback). |
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/dp_utils.py
get_train_dataloader()
Returns a torch DataLoader that uses an entity-level sampler.
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/dp_utils.py
create_entity_mapping(entity_column_values)
Build a mapping from each entity to its dataset indices.
Groups rows by the entity column; each group's indices are the dataset positions for that entity. Entity order follows groupby sort; order within a group is preserved.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| entity_column_values | list | List of entity IDs aligned with dataset rows (e.g. one value per row in the same order). | required |
Returns:
| Type | Description |
|---|---|
| Sequence[Sequence[int]] | Sequence of sequences: for entity i, result[i] is the list of dataset indices belonging to that entity. |
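The grouping behaviour can be approximated in plain Python. This sketch is not the function's source: entity_mapping_sketch is a hypothetical name, and ordering entities by sorted key is an assumption meant to mirror the groupby sort mentioned above.

```python
from collections import defaultdict


def entity_mapping_sketch(entity_column_values):
    """Group dataset row indices by entity ID."""
    groups = defaultdict(list)
    for index, entity in enumerate(entity_column_values):
        groups[entity].append(index)  # row order within an entity is kept
    # Assumption: entities are ordered by sorted key, like a sorted groupby.
    return [groups[key] for key in sorted(groups)]


mapping = entity_mapping_sketch(["b", "a", "b", "c", "a"])
print(mapping)
```

Here entity "a" owns rows 1 and 4, "b" owns rows 0 and 2, and "c" owns row 3, so the result has three groups. For record-level DP, where every row has its own entity ID, each group would contain a single index.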