Replace
Replace mode replaces each detected entity with an alternative token. Anonymizer provides four strategies.
Strategy comparison

| Strategy | Example output for "Alice" (first_name) | Deterministic | Generated by LLM |
|---|---|---|---|
| Substitute | `Maya` | No | Yes |
| Redact | `[REDACTED_FIRST_NAME]` | Yes | No |
| Annotate | `<Alice, first_name>` | Yes | No |
| Hash | `<HASH_FIRST_NAME_3bc51062973c>` | Yes | No |

Substitute
Replaces entities with LLM-generated synthetic values that are contextually plausible. This is the only replacement strategy that requires an LLM call.
```python
from anonymizer import AnonymizerConfig, Substitute

# Default: uses the replacement_generator model
AnonymizerConfig(replace=Substitute())

# With additional instructions
AnonymizerConfig(replace=Substitute(
    instructions="Replacement IDs must always start with the same 4 characters as the original."
))
```

| Field | Default | Description |
|---|---|---|
| `instructions` | `None` | Additional instructions for the LLM replacement generator. |

Model role
Substitute uses the replacement_generator role. To use a different model, override the role in your model config:
```yaml
selected_models:
  replace:
    replacement_generator: your-model-alias
```
Redact
Replaces entities with a label-based marker. The original text is removed entirely.
```python
from anonymizer import AnonymizerConfig, Redact

# Default: [REDACTED_FIRST_NAME]
AnonymizerConfig(replace=Redact())

# Custom template
AnonymizerConfig(replace=Redact(format_template="****"))

# Template with label only
AnonymizerConfig(replace=Redact(format_template="[{label}]"))
```

| Field | Default | Description |
|---|---|---|
| `format_template` | `[REDACTED_{label}]` | Template with optional `{label}` placeholder. |
| `normalize_label` | `True` | Uppercase and clean the label before substitution. |

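
To make the interaction between `format_template` and `normalize_label` concrete, here is a minimal sketch of how such a marker could be rendered. It is not the library's implementation: `render_redaction` is a hypothetical helper, and the cleaning rule (runs of non-alphanumeric characters become a single underscore) is an assumption, since the table only says the label is "cleaned".

```python
import re


def render_redaction(label: str,
                     format_template: str = "[REDACTED_{label}]",
                     normalize_label: bool = True) -> str:
    """Hypothetical helper: build the marker that replaces an entity."""
    if normalize_label:
        # Assumption: "clean" means collapsing non-alphanumeric runs to "_".
        label = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_").upper()
    # {label} is optional, so a template without it is returned unchanged.
    return format_template.replace("{label}", label)


print(render_redaction("first_name"))               # [REDACTED_FIRST_NAME]
print(render_redaction("first_name", "****"))       # ****
print(render_redaction("first name", "[{label}]"))  # [FIRST_NAME]
```

Because the output depends only on the label and the template, every entity with the same label collapses to the same marker, which is why the original text cannot be recovered from redacted output.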
Annotate
Tags entities with their label but preserves the original text. Useful for review and debugging.
```python
from anonymizer import AnonymizerConfig, Annotate

# Default: <Alice, first_name>
AnonymizerConfig(replace=Annotate())

# Custom template
AnonymizerConfig(replace=Annotate(format_template="[{label}: {text}]"))
```

| Field | Default | Description |
|---|---|---|
| `format_template` | `<{text}, {label}>` | Template with `{text}` and `{label}` placeholders. Both are required. |

Note
The original text is still present, so this is not privacy-safe on its own.
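
As an illustration of the "both placeholders are required" rule, the sketch below renders an annotation and rejects templates that omit either placeholder. `render_annotation` is a hypothetical helper written for this example, not part of the anonymizer API, and the real library may validate templates differently.

```python
def render_annotation(text: str,
                      label: str,
                      format_template: str = "<{text}, {label}>") -> str:
    """Hypothetical helper: tag an entity while keeping its original text."""
    # Both placeholders must appear, otherwise the annotation would lose
    # either the entity text or its label.
    for placeholder in ("{text}", "{label}"):
        if placeholder not in format_template:
            raise ValueError(f"format_template is missing {placeholder}")
    return format_template.format(text=text, label=label)


print(render_annotation("Alice", "first_name"))
# <Alice, first_name>
print(render_annotation("Alice", "first_name", "[{label}: {text}]"))
# [first_name: Alice]
```

The output still contains "Alice" verbatim, which is the reason annotated text is suited to review rather than release.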
Hash
Replaces entities with a deterministic hash digest. The same entity text always produces the same hash, enabling consistent de-identification across records.
```python
from anonymizer import AnonymizerConfig, Hash

# Default: <HASH_FIRST_NAME_3bc51062973c>
AnonymizerConfig(replace=Hash())

# Short digest with SHA-1
AnonymizerConfig(replace=Hash(algorithm="sha1", digest_length=8))
```

| Field | Default | Description |
|---|---|---|
| `algorithm` | `sha256` | Hash algorithm (`sha256`, `sha1`, or `md5`). |
| `digest_length` | `12` | Number of hex characters to keep (6 to 64). |
| `format_template` | `<HASH_{label}_{digest}>` | Template with `{digest}` required; `{label}` optional. |
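
The sketch below shows one way a deterministic hash token of this shape could be produced with Python's standard `hashlib`. It is an illustration under stated assumptions, not the library's code: `hash_token` is a hypothetical helper, the digest is assumed to be a plain hash of the UTF-8 entity text truncated to `digest_length` hex characters, and the label is assumed to be uppercased as in the default example. The real implementation may normalize or salt the input, so digests from this sketch should not be expected to match the library's output.

```python
import hashlib

ALLOWED_ALGORITHMS = ("sha256", "sha1", "md5")


def hash_token(text: str,
               label: str,
               algorithm: str = "sha256",
               digest_length: int = 12,
               format_template: str = "<HASH_{label}_{digest}>") -> str:
    """Hypothetical helper: replace an entity with a truncated hash digest."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    if not 6 <= digest_length <= 64:
        raise ValueError("digest_length must be between 6 and 64")
    if "{digest}" not in format_template:
        raise ValueError("format_template must contain {digest}")
    # Assumption: hash the raw UTF-8 text, then keep the leading hex chars.
    full_digest = hashlib.new(algorithm, text.encode("utf-8")).hexdigest()
    digest = full_digest[:digest_length]
    # {label} is optional; replace() leaves the template alone if it is absent.
    return (format_template
            .replace("{label}", label.upper())
            .replace("{digest}", digest))


# The same input always yields the same token, so records stay linkable.
first = hash_token("Alice", "first_name")
second = hash_token("Alice", "first_name")
print(first == second)  # True
print(hash_token("Alice", "first_name", algorithm="sha1", digest_length=8))
```

Shorter digests are easier to read but raise the chance that two different entities share a token, so `digest_length` is a trade-off between compactness and collision risk.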