sampler
Samplers for differentially private (DP) batch creation.
Provides entity-level and record-level samplers: ShuffledEntitySampler
(shuffles entities, fixed batch size), PoissonEntitySampler (Poisson
sampling for proper DP accounting), and UniformWithReplacementNonNullSampler
(never yields an empty batch).
Classes:
| Name | Description |
|---|---|
| ShuffledEntitySampler | Sample batches of entities at random, one sample per entity per batch. |
| PoissonEntitySampler | Sample entities with Poisson (per-entity) sampling for correct DP accounting. |
| UniformWithReplacementNonNullSampler | Uniform-with-replacement sampler that skips empty batches but counts them. |
ShuffledEntitySampler(entity_mapping, batch_size)
Bases: _EntitySampler
Sample batches of entities at random, one sample per entity per batch.
Uses RandomSampler to shuffle entities and BatchSampler to form batches of
batch_size entities. Each batch contains one sample from each of the
chosen entities, so no single entity dominates a step (important for
entity-level DP and when training for less than one epoch).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| entity_mapping | Sequence[Sequence[int]] | For entity i, entity_mapping[i] is the list of dataset indices for that entity; dataset[entity_mapping[i][j]] is the j-th sample of entity i. | required |
| batch_size | int | Number of entities (and thus samples) per batch. | required |
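
The batching behaviour described above can be illustrated with a minimal, pure-Python sketch. It is not the library's implementation: `random.shuffle` and list slicing stand in for RandomSampler and BatchSampler, the function name `shuffled_entity_batches` and its `seed` argument are hypothetical, and two details are assumptions made only for illustration (the sample for each entity is picked uniformly at random, and a final partial batch is kept).

```python
import random
from typing import Iterator, List, Sequence


def shuffled_entity_batches(
    entity_mapping: Sequence[Sequence[int]],
    batch_size: int,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield batches of dataset indices, one index per chosen entity."""
    rng = random.Random(seed)
    entity_ids = list(range(len(entity_mapping)))
    rng.shuffle(entity_ids)  # stand-in for RandomSampler over entities
    # Stand-in for BatchSampler: consecutive chunks of batch_size entities.
    for start in range(0, len(entity_ids), batch_size):
        chosen = entity_ids[start:start + batch_size]
        # Assumption: one sample per entity, drawn uniformly at random.
        yield [rng.choice(entity_mapping[e]) for e in chosen]


# Hypothetical toy mapping: 3 entities owning 6 dataset rows.
entity_mapping = [[0, 1, 2], [3], [4, 5]]
batches = list(shuffled_entity_batches(entity_mapping, batch_size=2))
```

Because every batch draws from distinct entities, an entity with many rows (entity 0 here) contributes no more to a step than an entity with a single row.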
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/sampler.py
PoissonEntitySampler(entity_mapping, sample_rate)
Bases: _EntitySampler
Sample entities with Poisson (per-entity) sampling for correct DP accounting.
Each entity is included in a batch with probability sample_rate.
Batch size varies from step to step; on average it equals len(entities) * sample_rate.
Empty batches are skipped but counted toward the step budget.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| entity_mapping | Sequence[Sequence[int]] | For entity i, entity_mapping[i] is the list of dataset indices for that entity. | required |
| sample_rate | float | Probability of each entity being included in a batch. | required |
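
The per-entity Poisson scheme can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the class's actual code: the function name `poisson_entity_batches` and the `num_steps` and `seed` arguments are hypothetical, and drawing one uniformly random sample per included entity is an assumption.

```python
import random
from typing import List, Sequence, Tuple


def poisson_entity_batches(
    entity_mapping: Sequence[Sequence[int]],
    sample_rate: float,
    num_steps: int,
    seed: int = 0,
) -> Tuple[List[List[int]], int]:
    """Return (non-empty batches, number of empty batches skipped)."""
    rng = random.Random(seed)
    batches: List[List[int]] = []
    empty_batches = 0
    for _ in range(num_steps):
        # Each entity is included independently with probability sample_rate.
        chosen = [e for e in range(len(entity_mapping))
                  if rng.random() < sample_rate]
        if not chosen:
            # Skipped, but the step is still consumed from the budget.
            empty_batches += 1
            continue
        # Assumption: one sample per included entity, chosen uniformly.
        batches.append([rng.choice(entity_mapping[e]) for e in chosen])
    return batches, empty_batches


# Hypothetical toy mapping: 3 entities owning 6 dataset rows.
entity_mapping = [[0, 1, 2], [3], [4, 5]]
batches, skipped = poisson_entity_batches(entity_mapping, 0.5, num_steps=20)
```

The number of yielded batches plus the number of skipped ones always equals `num_steps`, which is what keeps step-based accounting aligned with the sampling that actually took place.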
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/sampler.py
UniformWithReplacementNonNullSampler(*args, **kwargs)
Bases: UniformWithReplacementSampler
Uniform-with-replacement sampler that skips empty batches but counts them.
Same as Opacus UniformWithReplacementSampler except batches with zero
samples are not yielded. Empty batches are still counted toward the total
number of steps so that step-based privacy accounting (e.g. ε composition)
remains correct. Used by PoissonEntitySampler for Poisson sampling.
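
To get a feel for how often a batch comes out empty, the following sketch works through the arithmetic. It assumes each of `num_items` items is included independently with probability `sample_rate`, so the chance that none is included is `(1 - sample_rate) ** num_items`. Both helper names are hypothetical and are not part of the library or of Opacus.

```python
def empty_batch_probability(num_items: int, sample_rate: float) -> float:
    """Probability that independent sampling selects no item at all."""
    return (1.0 - sample_rate) ** num_items


def expected_yielded_steps(
    num_items: int, sample_rate: float, total_steps: int
) -> float:
    """Expected number of non-empty batches out of total_steps counted steps."""
    return total_steps * (1.0 - empty_batch_probability(num_items, sample_rate))


# With 4 items and a rate of 0.5, one step in sixteen is expected to be empty.
p_empty = empty_batch_probability(4, 0.5)  # 0.0625
yielded = expected_yielded_steps(4, 0.5, total_steps=1000)
```

For realistic dataset sizes the probability is tiny, but with few entities or a very small rate it becomes noticeable, which is when counting the skipped steps matters.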
Attributes:
| Name | Type | Description |
|---|---|---|
| empty_batches | | Number of empty batches skipped so far (reset at the start of each |