sampler

Samplers for DP batch creation.

Provides entity-level and record-level samplers: ShuffledEntitySampler (shuffle entities, fixed batch size), PoissonEntitySampler (Poisson sampling for proper DP accounting), and UniformWithReplacementNonNullSampler (no empty batches).

Classes:

Name | Description
ShuffledEntitySampler | Sample batches of entities at random, one sample per entity per batch.
PoissonEntitySampler | Sample entities with Poisson (per-entity) sampling for correct DP accounting.
UniformWithReplacementNonNullSampler | Uniform-with-replacement sampler that skips empty batches but counts them.

ShuffledEntitySampler(entity_mapping, batch_size)

Bases: _EntitySampler

Sample batches of entities at random, one sample per entity per batch.

Uses RandomSampler to shuffle entities and BatchSampler to form batches of batch_size entities. Each batch contains one sample from each of the chosen entities, so no single entity dominates a step (important for entity-level DP and when training for less than one epoch).

Parameters:

Name | Type | Description | Default
entity_mapping | Sequence[Sequence[int]] | For entity i, entity_mapping[i] is the list of dataset indices for that entity; dataset[entity_mapping[i][j]] is the j-th sample of entity i. | required
batch_size | int | Number of entities (and thus samples) per batch. | required
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/sampler.py
def __init__(self, entity_mapping: Sequence[Sequence[int]], batch_size: int) -> None:
    entity_sampler = BatchSampler(RandomSampler(entity_mapping), batch_size=batch_size, drop_last=True)
    super().__init__(entity_sampler, entity_mapping)
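
To make the entity_mapping layout and the batching behavior concrete, here is a minimal standard-library sketch. It does not use the library's classes: build_entity_mapping and shuffled_entity_batches are hypothetical helpers, and picking each entity's sample uniformly at random is an assumption, since the internals of _EntitySampler are not shown on this page.

```python
import random
from collections import defaultdict


def build_entity_mapping(entity_ids):
    """Group dataset indices by entity, in order of first appearance."""
    groups = defaultdict(list)
    for dataset_index, entity_id in enumerate(entity_ids):
        groups[entity_id].append(dataset_index)
    return list(groups.values())


def shuffled_entity_batches(entity_mapping, batch_size, rng):
    """Yield batches of dataset indices, one sample per chosen entity."""
    order = list(range(len(entity_mapping)))
    rng.shuffle(order)
    # Mirror drop_last=True: discard a trailing batch smaller than batch_size.
    usable = len(order) - len(order) % batch_size
    for start in range(0, usable, batch_size):
        chosen_entities = order[start:start + batch_size]
        # Assumption: one sample per entity, drawn uniformly at random.
        yield [rng.choice(entity_mapping[e]) for e in chosen_entities]


# Record i of the dataset belongs to entity entity_ids[i].
entity_ids = ["a", "b", "a", "c", "b", "a", "d", "e"]
entity_mapping = build_entity_mapping(entity_ids)
batches = list(shuffled_entity_batches(entity_mapping, 2, random.Random(0)))
```

With five entities and a batch size of 2, the sketch yields two full batches and drops the leftover entity, so no entity contributes more than one record per pass.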

PoissonEntitySampler(entity_mapping, sample_rate)

Bases: _EntitySampler

Sample entities with Poisson (per-entity) sampling for correct DP accounting.

Each entity is included in a batch independently with probability sample_rate. Batch size therefore varies from step to step; its expected value is len(entity_mapping) * sample_rate. Empty batches are skipped but still counted toward the step budget.

Parameters:

Name | Type | Description | Default
entity_mapping | Sequence[Sequence[int]] | For entity i, entity_mapping[i] is the list of dataset indices for that entity. | required
sample_rate | float | Probability of each entity being included in a batch. | required
Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/sampler.py
def __init__(self, entity_mapping: Sequence[Sequence[int]], sample_rate: float) -> None:
    entity_sampler = UniformWithReplacementNonNullSampler(
        num_samples=len(entity_mapping),
        sample_rate=sample_rate,
    )
    super().__init__(entity_sampler, entity_mapping)
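
The per-entity inclusion rule can be illustrated with a short standard-library simulation. This is a sketch of Poisson sampling in general, not the library's or Opacus's code; poisson_entity_batch and the chosen numbers are hypothetical.

```python
import random


def poisson_entity_batch(num_entities, sample_rate, rng):
    """Include each entity independently with probability sample_rate."""
    return [e for e in range(num_entities) if rng.random() < sample_rate]


rng = random.Random(0)
num_entities, sample_rate, steps = 200, 0.05, 500

# Draw many batches and record their sizes; sizes differ from step to step.
sizes = [len(poisson_entity_batch(num_entities, sample_rate, rng))
         for _ in range(steps)]

# The empirical mean should be close to num_entities * sample_rate (10 here).
mean_size = sum(sizes) / steps
```

Because the batch size is random, downstream code should not assume a fixed number of samples per step.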

UniformWithReplacementNonNullSampler(*args, **kwargs)

Bases: UniformWithReplacementSampler

Uniform-with-replacement sampler that skips empty batches but counts them.

Behaves like the Opacus UniformWithReplacementSampler, except that batches with zero samples are not yielded. Empty batches are still counted toward the total number of steps so that step-based privacy accounting (e.g. ε composition) remains correct. PoissonEntitySampler uses this class to implement Poisson sampling.

Attributes:

Name | Type | Description
empty_batches | | Number of empty batches skipped so far (reset at the start of each __iter__; only meaningful during iteration).

Source code in src/nemo_safe_synthesizer/privacy/dp_transformers/sampler.py
def __init__(self, *args, **kwargs):
    # NOTE: we might want to log empty_batches for debugging purposes
    self.empty_batches = 0
    super().__init__(*args, **kwargs)
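
The skip-but-count behavior described above can be sketched as follows. This is an illustrative stand-in written with the standard library only; NonNullPoissonBatches, num_steps, and seed are hypothetical names, and the real class inherits its iteration logic from Opacus rather than implementing it this way.

```python
import random


class NonNullPoissonBatches:
    """Yield only non-empty Poisson batches while counting the empty ones."""

    def __init__(self, num_samples, sample_rate, num_steps, seed=0):
        self.num_samples = num_samples
        self.sample_rate = sample_rate
        self.num_steps = num_steps  # total steps, including empty ones
        self.seed = seed
        self.empty_batches = 0

    def __iter__(self):
        # Reset the counter at the start of every pass.
        self.empty_batches = 0
        rng = random.Random(self.seed)
        for _ in range(self.num_steps):
            batch = [i for i in range(self.num_samples)
                     if rng.random() < self.sample_rate]
            if batch:
                yield batch
            else:
                # The step still happened; it just produced no samples.
                self.empty_batches += 1


sampler = NonNullPoissonBatches(num_samples=3, sample_rate=0.1, num_steps=200)
yielded = list(sampler)
```

After a full pass, len(yielded) + sampler.empty_batches equals num_steps, which is the quantity a step-based accountant would need.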