environment
Classes:
| Name | Description |
|---|---|
| PersonaProvider | Faker provider that yields a consistent persona (first/last name, email, gender) per row seed. |
| Faker | Thin wrapper around Faker with optional seeding; supports `maybe_seed` for deterministic per-row data. |
| SafeSynthesizerFakerMethodNotFound | Raised when no Faker method exists for an entity type (e.g. in `fake_entities_fn` with `on_error='raise'`). |
| Environment | Jinja sandbox with Faker, date filters, and entity filters (detect/redact/label/hash/fake). |
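Functions:
| Name | Description |
|---|---|
| lookup_country | Resolve country name or code to pycountry `Data` (with alias map for e.g. Russia, UK). |
| lookup_locales | Return Faker locales matching the country for `value` (e.g. `en_GB` for UK), or `None` if none. |
| tld | Return the TLD for the country (e.g. `.uk` for GB). |
| normalize | Transliterate to ASCII and remove characters not in `\w` or `allow`. |
| sha256 | Return SHA-256 hex digest of `salt + value`; salt defaults to `default_salt` (e.g. from Environment). |
| redact_entities_fn | Return the entity label in angle brackets (e.g. `<first_name>`). |
| label_entities_fn | Return an XML-like tag with `type` and `value` (and optionally source/score if `extended`). |
| hash_entities_fn | Return first 9 chars of SHA-256 hash of entity text (with optional `salt`). |
| fake_entities_fn | Replace entity with a faked value of the same type; fall back per `on_error` if no Faker method exists. |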
PersonaProvider
Bases: BaseProvider
Faker provider that yields a consistent persona (first/last name, email, gender) per row seed.
Methods:
| Name | Description |
|---|---|
| persona | Return a dict with `first_name`, `last_name`, `email`, `gender`. |
persona(row_index=1, email_format='first_name.last_name', domain_type='all_domains', gender=None)
Return a dict with first_name, last_name, email, gender.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| row_index | int | Seed for deterministic persona (default 1). | 1 |
| email_format | str | One of | 'first_name.last_name' |
| domain_type | str | | 'all_domains' |
| gender | Optional[str] | | None |

Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dict with keys `first_name`, `last_name`, `email`, `gender`. |
Source code in src/nemo_safe_synthesizer/pii_replacer/data_editor/environment.py
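The per-row determinism described for `persona` can be illustrated without Faker. The following is a minimal stdlib sketch, not the provider's implementation: `persona_sketch`, the name pools, and the `example.com` domain are hypothetical, and only an assumed `first_name.last_name` email layout is modelled.

```python
import random
from typing import Any, Optional

# Hypothetical name pools; the real provider draws its names from Faker.
FIRST_NAMES = {"female": ["Ada", "Grace"], "male": ["Alan", "Linus"]}
LAST_NAMES = ["Hopper", "Lovelace", "Turing"]


def persona_sketch(row_index: int = 1, gender: Optional[str] = None) -> dict[str, Any]:
    """Build the same persona every time the same row_index is given."""
    rng = random.Random(row_index)  # private generator seeded by the row
    chosen_gender = gender if gender is not None else rng.choice(sorted(FIRST_NAMES))
    first_name = rng.choice(FIRST_NAMES[chosen_gender])
    last_name = rng.choice(LAST_NAMES)
    # Assumed layout for email_format='first_name.last_name'.
    email = f"{first_name}.{last_name}@example.com".lower()
    return {
        "first_name": first_name,
        "last_name": last_name,
        "email": email,
        "gender": chosen_gender,
    }
```

Because the generator is created from `row_index` on every call, two calls with the same index return equal dicts regardless of what was generated in between.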
Faker(locale=None, seed=None)
Thin wrapper around Faker with optional seeding; supports maybe_seed for deterministic per-row data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| locale | Optional[list[str]] | Faker locale(s); single locale enables | None |
| seed | Optional[SeedType] | Global seed for reproducibility; if set, | None |

Attributes:
| Name | Type | Description |
|---|---|---|
| global_seed | SeedType | Seed passed at construction (if any); used when |
SafeSynthesizerFakerMethodNotFound
Bases: Exception
Raised when no Faker method exists for an entity type (e.g. in fake_entities_fn with on_error='raise').
Environment(locales, seed, globals_config=None, entity_extractor=None)
Jinja sandbox with Faker, date filters, and entity filters (detect/redact/label/hash/fake).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| locales | Optional[list[str]] | Faker locale(s); passed to | required |
| seed | SeedType | Seed for Faker and hash filter. | required |
| globals_config | Optional[dict[str, Any]] | Optional dict exposed as | None |
| entity_extractor | Optional[EntityExtractor] | Extractor for NER filters; default | None |

Attributes:
| Name | Type | Description |
|---|---|---|
| entity_extractor | EntityExtractor | The NER extractor used by entity filters. |
| ner_cacheable_filters | | Set of filter names that benefit from NER cache prefill. |
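Methods:
| Name | Description |
|---|---|
| maybe_seed | Set Faker instance seed for deterministic output (e.g. per row). |
| template_to_fnames | Parse the template's AST and return the set of filter/function names used (e.g. `fake`, `hash`). |
| make_template | Build a Jinja template from the string (wrapped so empty/missing renders as the literal string). |
| date_shift | Return a random date in the interval defined by `value` and offsets, then subtract delta to preserve relative position. |
| date_time_shift | Return a random datetime in the interval (like `date_shift` but with time); preserves relative position from today. |
| date_format | Format date/datetime as string; delegates to `date_time_format` with default date-only format. |
| date_time_format | Parse if string, then return `strftime` with the given format. |
| fake_filter | Return a faked value for the given type (e.g. `birthdate`, `email_address`); uses mapping or `getattr(fake, type)`. |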
maybe_seed(instance_seed)
Set Faker instance seed for deterministic output (e.g. per row).
template_to_fnames(template_str)
Parse the template's AST and return the set of filter/function names used (e.g. fake, hash).
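One way to collect filter and function names from a template's AST is sketched here with Jinja2's public parsing API (`Environment.parse`, `Node.find_all`, `nodes.Filter`, `nodes.Call`). The helper name `template_to_fnames_sketch` is hypothetical, and whether the library walks the tree in exactly this way is an assumption.

```python
from jinja2 import nodes
from jinja2.sandbox import SandboxedEnvironment


def template_to_fnames_sketch(env: SandboxedEnvironment, template_str: str) -> set[str]:
    """Return the names of filters and directly called functions in a template."""
    ast = env.parse(template_str)  # parse only; filters need not be registered
    names = {filter_node.name for filter_node in ast.find_all(nodes.Filter)}
    for call in ast.find_all(nodes.Call):
        # Only plain calls such as fake('email'); attribute calls are skipped.
        if isinstance(call.node, nodes.Name):
            names.add(call.node.name)
    return names


env = SandboxedEnvironment()
used = template_to_fnames_sketch(env, "{{ this | hash }} {{ fake('email') }}")
```

Here `used` contains `hash` (a filter) and `fake` (a called function), which is the kind of set a caller could compare against a list of filters that need extra setup.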
make_template(template_str)
Build a Jinja template from the string (wrapped so empty/missing renders as the literal string).
date_shift(value, min_offset='-30y', max_offset='today')
Return a random date in the interval defined by value and offsets, then subtract delta to preserve relative position.
E.g. 2000-01-01 | date_shift('-1y', '+1y') picks a date between 1999-01-01 and 2001-01-01 (Faker),
then adjusts so the result is in the same relative position from today as value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | date \| datetime \| str | Base date (or parseable string). | required |
| min_offset | date \| datetime \| timedelta \| str \| int | Minimum offset (e.g. | '-30y' |
| max_offset | date \| datetime \| timedelta \| str \| int | Maximum offset. | 'today' |

Returns:
| Type | Description |
|---|---|
| datetime | Shifted date as datetime. |
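The two steps in the `date_shift` description (pick a random date in a window, then subtract a delta) can be read as follows. This is a stdlib-only sketch of one plausible interpretation: `date_shift_sketch` is a hypothetical name, offsets are simplified to `timedelta` values rather than strings like `'-1y'`, `random.Random` stands in for Faker, and the sketch returns a `date` rather than a `datetime`.

```python
import random
from datetime import date, timedelta


def date_shift_sketch(
    value: date,
    min_offset: timedelta,
    max_offset: timedelta,
    rng: random.Random,
) -> date:
    """Pick a date in a window around today, then move the window back to value."""
    today = date.today()
    # Step 1: random date in [today + min_offset, today + max_offset].
    span_days = (max_offset - min_offset).days
    picked = today + min_offset + timedelta(days=rng.randint(0, span_days))
    # Step 2: subtract the distance between today and value, so the result
    # sits relative to value exactly as the picked date sits relative to today.
    delta = today - value
    return picked - delta


shifted = date_shift_sketch(
    date(2000, 1, 1), timedelta(days=-365), timedelta(days=366), random.Random(0)
)
```

With these offsets `shifted` always falls between 1999-01-01 and 2001-01-01, matching the window in the example above, and the result does not depend on the day the code runs.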
date_time_shift(value, min_offset='-30y', max_offset='now')
Return a random datetime in the interval (like date_shift but with time); preserves relative position from today.
E.g. 2000-01-01 00:00 | date_time_shift('-1y', '+1y') picks between 1999-01-01 00:00 and 2001-01-01 00:00.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | date \| datetime \| str | Base datetime (or parseable string). | required |
| min_offset | date \| datetime \| timedelta \| str \| int | Minimum offset (e.g. | '-30y' |
| max_offset | date \| datetime \| timedelta \| str \| int | Maximum offset. | 'now' |

Returns:
| Type | Description |
|---|---|
| datetime | Shifted datetime. |
date_format(value, format='%Y-%m-%d')
Format date/datetime as string; delegates to date_time_format with default date-only format.
date_time_format(value, format='%Y-%m-%d %H:%M:%S')
Parse if string, then return strftime with the given format.
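The delegation between the two format filters might look like the sketch below. The function names are hypothetical, and `datetime.fromisoformat` is used as a stand-in parser; the library may well accept a wider range of string formats.

```python
from datetime import date, datetime
from typing import Union

DateLike = Union[date, datetime, str]


def date_time_format_sketch(value: DateLike, format: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Parse strings first, then format with strftime."""
    if isinstance(value, str):
        # Assumption: ISO 8601 input only.
        value = datetime.fromisoformat(value)
    return value.strftime(format)


def date_format_sketch(value: DateLike, format: str = "%Y-%m-%d") -> str:
    """Same behaviour with a date-only default format."""
    return date_time_format_sketch(value, format)


date_only = date_format_sketch("2000-01-02 03:04:05")
with_time = date_time_format_sketch(datetime(2000, 1, 2, 3, 4, 5))
```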
fake_filter(fake_type)
Return a faked value for the given type (e.g. birthdate, email_address); uses mapping or getattr(fake, type).
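The "mapping or `getattr`" lookup can be sketched with a stub object in place of a Faker instance. `StubFaker`, `TYPE_TO_METHOD`, and `fake_filter_sketch` are hypothetical; the two mapping entries are guesses at how the example types could map onto Faker's `date_of_birth` and `email` methods.

```python
class StubFaker:
    """Stand-in for a Faker instance with a few fixed-value methods."""

    def email(self) -> str:
        return "user@example.com"

    def date_of_birth(self) -> str:
        return "1990-01-01"

    def city(self) -> str:
        return "Springfield"


# Assumed mapping from entity type names to Faker method names.
TYPE_TO_METHOD = {"birthdate": "date_of_birth", "email_address": "email"}


def fake_filter_sketch(fake: StubFaker, fake_type: str) -> str:
    """Use the mapping when the type is listed, else try the type name directly."""
    method_name = TYPE_TO_METHOD.get(fake_type, fake_type)
    method = getattr(fake, method_name, None)
    if method is None:
        raise AttributeError(f"no fake method for type {fake_type!r}")
    return method()


fake = StubFaker()
mapped = fake_filter_sketch(fake, "email_address")  # resolved through the mapping
direct = fake_filter_sketch(fake, "city")  # resolved through getattr
```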
lookup_country(value)
Resolve country name or code to pycountry Data (with alias map for e.g. Russia, UK).
lookup_locales(value)
Return Faker locales matching the country for value (e.g. en_GB for UK), or None if none.
tld(value)
Return the TLD for the country (e.g. .uk for GB).
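The alias map and the TLD rule can be pictured with a tiny in-memory table instead of pycountry. Everything named here (`COUNTRIES`, `ALIASES`, `TLD_OVERRIDES`, and both functions) is hypothetical; the sketch returns ISO alpha-2 codes as plain strings rather than pycountry objects.

```python
from typing import Optional

# Tiny stand-in for the pycountry database: alpha-2 code -> official name.
COUNTRIES = {"GB": "United Kingdom", "RU": "Russian Federation", "DE": "Germany"}
# Common names that match neither the official name nor the code.
ALIASES = {"uk": "GB", "russia": "RU"}
# Country-code TLDs usually equal the lowercased alpha-2 code; GB uses .uk.
TLD_OVERRIDES = {"GB": ".uk"}


def lookup_country_sketch(value: str) -> Optional[str]:
    """Resolve an alias, code, or official name to an alpha-2 code."""
    key = value.strip()
    if key.lower() in ALIASES:
        return ALIASES[key.lower()]
    if key.upper() in COUNTRIES:
        return key.upper()
    for code, name in COUNTRIES.items():
        if name.lower() == key.lower():
            return code
    return None


def tld_sketch(value: str) -> Optional[str]:
    """Return the country-code TLD, or None for an unknown country."""
    code = lookup_country_sketch(value)
    if code is None:
        return None
    return TLD_OVERRIDES.get(code, "." + code.lower())
```

For example, `tld_sketch("UK")` resolves the alias to `GB` and then applies the override to give `.uk`, while `tld_sketch("Germany")` falls through to the default rule.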
normalize(value, allow='')
Transliterate to ASCII and remove characters not in \w or allow.
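A stdlib approximation of this behaviour is sketched here. `normalize_sketch` is a hypothetical name, and NFKD decomposition followed by dropping non-ASCII bytes is a swapped-in technique that only handles accented Latin letters; the library may use a fuller transliteration.

```python
import re
import unicodedata


def normalize_sketch(value: str, allow: str = "") -> str:
    """Strip accents, then drop every character outside \\w and the allow list."""
    # Decompose accented letters, then discard the non-ASCII combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep word characters plus any explicitly allowed characters.
    pattern = rf"[^\w{re.escape(allow)}]"
    return re.sub(pattern, "", ascii_text)


compact = normalize_sketch("José Núñez")
spaced = normalize_sketch("José Núñez", allow=" ")
```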
sha256(default_salt, value, salt=None)
Return SHA-256 hex digest of salt + value; salt defaults to default_salt (e.g. from Environment).
In templates, this | hash uses the default salt; this | hash(salt="ABC") overrides it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| default_salt | str | Salt used when | required |
| value | Any | Value to hash (stringified). | required |
| salt | Optional[str] | Optional override; if | None |

Returns:
| Type | Description |
|---|---|
| str | Hexadecimal digest string. |
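The salt handling can be sketched with `hashlib`. `sha256_sketch` and `hash_filter` are hypothetical names; the salt-then-value concatenation follows the description above, while the UTF-8 encoding and the use of `functools.partial` to bind the default salt are assumptions about how a template filter could be wired up.

```python
import hashlib
from functools import partial
from typing import Any, Optional


def sha256_sketch(default_salt: str, value: Any, salt: Optional[str] = None) -> str:
    """Hash salt + value, preferring an explicit salt over the default."""
    effective_salt = default_salt if salt is None else salt
    payload = f"{effective_salt}{value}".encode("utf-8")  # assumed encoding
    return hashlib.sha256(payload).hexdigest()


# Binding default_salt up front leaves a callable shaped like a template
# filter: hash_filter(value) or hash_filter(value, salt="ABC").
hash_filter = partial(sha256_sketch, "environment-salt")

default_digest = hash_filter("alice@example.com")
custom_digest = hash_filter("alice@example.com", salt="ABC")
```

The two digests differ because only the second call overrides the bound salt, which mirrors the difference between `this | hash` and `this | hash(salt="ABC")`.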
redact_entities_fn(entity)
Return the entity label in angle brackets (e.g. <first_name>).
label_entities_fn(entity, extended=False)
Return an XML-like tag with type and value (and optionally source/score if extended).
hash_entities_fn(default_salt, entity, salt=None)
Return first 9 chars of SHA-256 hash of entity text (with optional salt).
fake_entities_fn(hash_salt, fake, entity, on_error=None, extended=False)
Replace entity with a faked value of the same type; fall back per on_error if no Faker method exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hash_salt | str | Salt for hash fallback. | required |
| fake | Faker | Faker instance. | required |
| entity | NERPrediction | NER prediction to fake. | required |
| on_error | Optional[str] | If no Faker method: | None |
| extended | Optional[bool] | Passed to | False |

Returns:
| Type | Description |
|---|---|
| str | Faked string or fallback (redact/label/hash). |

Raises:
| Type | Description |
|---|---|
| SafeSynthesizerFakerMethodNotFound | When |
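The fallback flow can be sketched as a small dispatch. This is an illustration under several assumptions: `Entity` is a hypothetical stand-in for `NERPrediction` (its `label` and `text` attributes are guesses), `FakerMethodNotFound` stands in for `SafeSynthesizerFakerMethodNotFound`, a plain dict of callables replaces the Faker instance, the accepted `on_error` strings are inferred from the return description, and the label fallback is omitted.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class Entity:
    """Hypothetical stand-in for NERPrediction."""

    label: str
    text: str


class FakerMethodNotFound(Exception):
    """Stand-in for SafeSynthesizerFakerMethodNotFound."""


def fake_entity_sketch(
    hash_salt: str,
    fakers: dict[str, Callable[[], str]],
    entity: Entity,
    on_error: str,
) -> str:
    """Fake the entity when a generator exists, else apply the chosen fallback."""
    method = fakers.get(entity.label)
    if method is not None:
        return method()
    if on_error == "redact":
        return f"<{entity.label}>"  # label in angle brackets
    if on_error == "hash":
        digest = hashlib.sha256((hash_salt + entity.text).encode("utf-8"))
        return digest.hexdigest()[:9]  # first 9 hex chars
    raise FakerMethodNotFound(entity.label)


fakers = {"email": lambda: "user@example.com"}
faked = fake_entity_sketch("salt", fakers, Entity("email", "a@b.org"), "raise")
redacted = fake_entity_sketch("salt", fakers, Entity("ssn", "123-45-6789"), "redact")
```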