Replace

Replace mode swaps each detected entity for a replacement token. Anonymizer provides four strategies: Substitute, Redact, Annotate, and Hash.


Strategy comparison

| Strategy | Example output for "Alice" (first_name) | Deterministic | Generated by LLM |
| --- | --- | --- | --- |
| Substitute | Maya | No | Yes |
| Redact | [REDACTED_FIRST_NAME] | Yes | No |
| Annotate | <Alice, first_name> | Yes | No |
| Hash | <HASH_FIRST_NAME_3bc51062973c> | Yes | No |

Substitute

Replaces entities with LLM-generated synthetic values that are contextually plausible. This is the only replacement strategy that requires an LLM call.

from anonymizer import AnonymizerConfig, Substitute

# Default: uses the replacement_generator model
AnonymizerConfig(replace=Substitute())

# With additional instructions
AnonymizerConfig(replace=Substitute(
    instructions="Replacement IDs must always start with the same 4 characters as the original."
))

| Field | Default | Description |
| --- | --- | --- |
| instructions | None | Additional instructions for the LLM replacement generator. |

Model role

Substitute uses the replacement_generator role. To use a different model, override the role in your model config:

selected_models:
  replace:
    replacement_generator: your-model-alias

Redact

Replaces entities with a label-based marker. The original text is removed entirely.

from anonymizer import AnonymizerConfig, Redact

# Default: [REDACTED_FIRST_NAME]
AnonymizerConfig(replace=Redact())

# Custom template
AnonymizerConfig(replace=Redact(format_template="****"))

# Template with label only
AnonymizerConfig(replace=Redact(format_template="[{label}]"))

| Field | Default | Description |
| --- | --- | --- |
| format_template | [REDACTED_{label}] | Template with optional {label} placeholder. |
| normalize_label | True | Uppercase and clean the label before substitution. |
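
The way format_template and normalize_label interact can be illustrated with a small standalone sketch. This is not Anonymizer's implementation: the `redact` and `normalize` helpers are hypothetical, and the cleaning rule (collapsing non-alphanumeric runs into underscores) is an assumption, since the exact rule is not specified here.

```python
import re


def normalize(label: str) -> str:
    # Assumed cleaning rule: collapse runs of non-alphanumeric characters
    # into "_", trim stray underscores, then uppercase.
    return re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_").upper()


def redact(label: str,
           format_template: str = "[REDACTED_{label}]",
           normalize_label: bool = True) -> str:
    # Hypothetical helper that mimics the documented fields of Redact.
    if normalize_label:
        label = normalize(label)
    # A template without {label} simply ignores the label.
    return format_template.format(label=label)


print(redact("first_name"))                               # [REDACTED_FIRST_NAME]
print(redact("first_name", format_template="****"))       # ****
print(redact("first_name", format_template="[{label}]"))  # [FIRST_NAME]
```

Because the output depends only on the label and the template, every entity of the same type collapses to the same marker, which is why the original text cannot be recovered from redacted output.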

Annotate

Tags entities with their label but preserves the original text. Useful for review and debugging.

from anonymizer import AnonymizerConfig, Annotate

# Default: <Alice, first_name>
AnonymizerConfig(replace=Annotate())

# Custom template
AnonymizerConfig(replace=Annotate(format_template="[{label}: {text}]"))

| Field | Default | Description |
| --- | --- | --- |
| format_template | <{text}, {label}> | Template with {text} and {label} placeholders. Both are required. |

Note

The original text is still present, so this is not privacy-safe on its own.
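
The requirement that an Annotate template contain both placeholders can be checked up front. The following is a minimal sketch, assuming Python `str.format` semantics for templates; `template_fields` and `annotate` are hypothetical helpers, not part of the Anonymizer API.

```python
from string import Formatter


def template_fields(format_template: str) -> set:
    # Collect the placeholder names that appear in the template.
    return {name for _, name, _, _ in Formatter().parse(format_template) if name}


def annotate(text: str, label: str,
             format_template: str = "<{text}, {label}>") -> str:
    # Hypothetical helper: reject templates missing either placeholder.
    missing = {"text", "label"} - template_fields(format_template)
    if missing:
        raise ValueError(f"template is missing placeholders: {sorted(missing)}")
    return format_template.format(text=text, label=label)


print(annotate("Alice", "first_name"))                       # <Alice, first_name>
print(annotate("Alice", "first_name", "[{label}: {text}]"))  # [first_name: Alice]
```

Validating the template before formatting surfaces a misconfigured template as an immediate error rather than as silently incomplete annotations.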


Hash

Replaces entities with a deterministic hash digest. The same entity text always produces the same hash, enabling consistent de-identification across records.

from anonymizer import AnonymizerConfig, Hash

# Default: <HASH_FIRST_NAME_3bc51062973c>
AnonymizerConfig(replace=Hash())

# Short digest with SHA-1
AnonymizerConfig(replace=Hash(algorithm="sha1", digest_length=8))

| Field | Default | Description |
| --- | --- | --- |
| algorithm | sha256 | Hash algorithm (sha256, sha1, or md5). |
| digest_length | 12 | Number of hex characters to keep (6 to 64). |
| format_template | <HASH_{label}_{digest}> | Template with {digest} required; {label} optional. |
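
To make the determinism concrete, here is a rough sketch of how a truncated hash token could be built with Python's standard hashlib. It is an illustration under stated assumptions, not Anonymizer's implementation: `hash_token` is a hypothetical helper, and the UTF-8 encoding, the absence of a salt, and the uppercased label are all assumptions.

```python
import hashlib

ALLOWED_ALGORITHMS = {"sha256", "sha1", "md5"}


def hash_token(text: str, label: str,
               algorithm: str = "sha256",
               digest_length: int = 12,
               format_template: str = "<HASH_{label}_{digest}>") -> str:
    # Hypothetical helper that mirrors the documented fields of Hash.
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    if not 6 <= digest_length <= 64:
        raise ValueError("digest_length must be between 6 and 64")
    if "{digest}" not in format_template:
        raise ValueError("format_template must contain {digest}")
    # Assumption: hash the UTF-8 bytes of the entity text, with no salt.
    full_digest = hashlib.new(algorithm, text.encode("utf-8")).hexdigest()
    # Slicing keeps the leading hex characters; sha1 (40 chars) and md5
    # (32 chars) digests are shorter than the 64-character maximum.
    digest = full_digest[:digest_length]
    return format_template.format(label=label.upper(), digest=digest)


first = hash_token("Alice", "first_name")
second = hash_token("Alice", "first_name")
print(first == second)                       # True: same text, same token
print(hash_token("Alice", "first_name", algorithm="sha1", digest_length=8))
```

A shorter digest_length gives more compact tokens but raises the chance that two different entities share a token, so very short digests are best reserved for small datasets.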