
anonymizer_config


Classes:

Name              Description
AnonymizerInput   Input source definition for the anonymizer pipeline.
Detect            Configuration for the entity detection stage.
Rewrite           Configuration for rewrite-mode execution.
AnonymizerConfig  Primary user-facing config for anonymization behavior.

Functions:

Name                        Description
is_remote_input_source      Return True when the input source is an HTTP(S) URL.
has_unsupported_url_scheme  Return True when the input looks like a URL but uses an unsupported scheme.
infer_input_source_suffix   Infer the lowercase file suffix from a local path or remote URL path.

AnonymizerInput pydantic-model

Bases: BaseModel

Input source definition for the anonymizer pipeline.

Format is inferred from the file extension of a local path or HTTP(S) URL.

Fields:

  • source
  • text_column
  • id_column
  • data_summary

Validators:

  • validate_source_path → source

source pydantic-field

Local path or HTTP(S) URL for a .csv or .parquet input file.
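
The source constraints described above (HTTP(S) or local path, .csv or .parquet suffix) can be sketched with the standard library alone. This is a minimal illustration, not the library's validator: check_source and SUPPORTED_SUFFIXES are hypothetical names, and the real validate_source_path may apply further checks that this page does not describe.

```python
from pathlib import Path
from urllib.parse import urlparse

# Taken from the field description; the constant name itself is hypothetical.
SUPPORTED_SUFFIXES = {".csv", ".parquet"}


def check_source(value: str) -> str:
    """Hypothetical stand-in for the source validator; returns the inferred suffix."""
    parsed = urlparse(value)
    is_remote = parsed.scheme in {"http", "https"}
    # Reject things that look like URLs but are not HTTP(S), e.g. s3:// or ftp://.
    if "://" in value and parsed.scheme and not is_remote:
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
    # For remote sources only the URL path matters, so query strings are ignored.
    suffix = Path(parsed.path if is_remote else value).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported input format: {suffix or '(none)'}")
    return suffix


for candidate in [
    "data/records.csv",
    "https://example.com/records.PARQUET?sig=1",
    "s3://bucket/records.csv",
    "data/records.json",
]:
    try:
        print(candidate, "->", check_source(candidate))
    except ValueError as exc:
        print(candidate, "-> rejected:", exc)
```

The first two candidates are accepted and the last two are rejected, one for its scheme and one for its suffix.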

text_column = 'text' pydantic-field

Column containing the text to anonymize.

id_column = None pydantic-field

Optional column to use as record identifier.

data_summary = None pydantic-field

Short description of the data. Improves LLM detection accuracy.

Detect pydantic-model

Bases: BaseModel

Configuration for the entity detection stage.

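Fields:

  • entity_labels
  • gliner_threshold
  • validation_max_entities_per_call
  • validation_excerpt_window_chars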

Validators:

entity_labels = None pydantic-field

Labels to detect. None uses the built-in default detection label set. To inspect the default set, use from anonymizer import DEFAULT_ENTITY_LABELS.

gliner_threshold = 0.3 pydantic-field

GLiNER detection confidence threshold (0.0-1.0).

validation_max_entities_per_call = 100 pydantic-field

Maximum number of candidate entities included in a single validator LLM call. When a row has more candidates than this, validation is split into chunks that are dispatched (round-robin) across the validator pool.
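
The chunk-and-dispatch behavior described above can be sketched as follows. This is only an illustration under stated assumptions: chunk_candidates, assign_round_robin, and the validator names are hypothetical, and the library's actual scheduling code is not shown on this page.

```python
from itertools import cycle


def chunk_candidates(candidates: list, max_per_call: int) -> list:
    """Split candidates into consecutive chunks of at most max_per_call items."""
    return [
        candidates[i:i + max_per_call]
        for i in range(0, len(candidates), max_per_call)
    ]


def assign_round_robin(chunks: list, validators: list) -> list:
    """Pair each chunk with the next validator in the pool, wrapping around."""
    pool = cycle(validators)
    return [(next(pool), chunk) for chunk in chunks]


# Hypothetical row with 250 candidate entities and a pool of two validators.
candidates = [f"entity_{i}" for i in range(250)]
chunks = chunk_candidates(candidates, max_per_call=100)
plan = assign_round_robin(chunks, ["validator_a", "validator_b"])

for validator, chunk in plan:
    print(validator, len(chunk))
```

With the default limit of 100, the 250 candidates become three chunks of 100, 100, and 50 entities, and the third chunk wraps back to the first validator.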

validation_excerpt_window_chars = 500 pydantic-field

Number of characters to include before and after a chunk's entity span when building the text excerpt sent to the validator. Bounds the prompt context the validator sees per chunk; it is NOT the LLM's context window limit.
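
As a rough sketch of how such a window bounds the excerpt, the slice below pads a span by window_chars on each side and clamps at the text boundaries. The function name build_excerpt and the sample text are hypothetical, and the library may compute the span or trim the excerpt differently.

```python
def build_excerpt(text: str, span_start: int, span_end: int, window_chars: int = 500) -> str:
    """Return the span plus up to window_chars of context on each side."""
    start = max(0, span_start - window_chars)      # clamp at the start of the text
    end = min(len(text), span_end + window_chars)  # clamp at the end of the text
    return text[start:end]


# Hypothetical document: an 11-character entity surrounded by filler text.
text = "x" * 1000 + "Alice Smith" + "y" * 1000
excerpt = build_excerpt(text, span_start=1000, span_end=1011)
print(len(excerpt))  # 500 before + 11 in the span + 500 after

# Near the start of a short text the window is simply truncated.
short_excerpt = build_excerpt("Dr. Alice Smith", span_start=4, span_end=15)
print(short_excerpt)
```

The excerpt length is therefore at most the span length plus twice the window, regardless of how long the full row is.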

Rewrite pydantic-model

Bases: BaseModel

Configuration for rewrite-mode execution.

Fields:

  • privacy_goal
  • instructions
  • risk_tolerance
  • max_repair_iterations
  • strict_entity_protection

Validators:

  • populate_default_privacy_goal

privacy_goal = None pydantic-field

Structured privacy goal. Auto-populated with defaults if not provided.

instructions = None pydantic-field

Additional instructions for the rewrite LLM.

risk_tolerance = RiskTolerance.low pydantic-field

Preset controlling repair thresholds and review flagging.

max_repair_iterations = 3 pydantic-field

Maximum repair rounds. Set to 0 to disable repair.

strict_entity_protection = False pydantic-field

If True, requires every entity to receive a protective disposition during sensitivity analysis.

evaluation property

Construct EvaluationCriteria from this Rewrite config for the engine.

Rewrite and EvaluationCriteria both carry max_repair_iterations. This property keeps them in sync: it passes through self.risk_tolerance and self.max_repair_iterations. Leakage thresholds and repair parameters are derived from risk_tolerance via _RiskToleranceBundle (see rewrite.py).

Production code that starts from a user-facing Rewrite should pass rewrite.evaluation into the engine — never duplicate the mapping manually. Tests and engine-internal callers may construct EvaluationCriteria directly when they aren't routing through a user-facing Rewrite.
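
The pass-through pattern described above can be illustrated with plain dataclasses. These classes are simplified stand-ins, not the library's pydantic models: only the two fields named in the text are modeled, the enum value string is an assumption, and the threshold derivation via _RiskToleranceBundle is left out.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTolerance(Enum):
    # Only the default member named on this page; the string value is an assumption.
    low = "low"


@dataclass(frozen=True)
class EvaluationCriteria:
    """Stand-in for the engine-facing criteria object."""
    risk_tolerance: RiskTolerance
    max_repair_iterations: int


@dataclass
class Rewrite:
    """Stand-in for the user-facing config, reduced to two fields."""
    risk_tolerance: RiskTolerance = RiskTolerance.low
    max_repair_iterations: int = 3

    @property
    def evaluation(self) -> EvaluationCriteria:
        # Built on every access, so it always reflects the current field values.
        return EvaluationCriteria(
            risk_tolerance=self.risk_tolerance,
            max_repair_iterations=self.max_repair_iterations,
        )


rewrite = Rewrite(max_repair_iterations=0)  # 0 disables repair
criteria = rewrite.evaluation
print(criteria.max_repair_iterations)
```

Because the criteria object is derived from the config on access, there is a single place where the two representations are mapped to each other.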

AnonymizerConfig pydantic-model

Bases: BaseModel

Primary user-facing config for anonymization behavior.

Fields:

  • detect
  • replace
  • rewrite
  • emit_telemetry

Validators:

  • validate_exactly_one_mode

detect pydantic-field

Entity detection configuration.

replace = None pydantic-field

Replacement method (Substitute(), Redact(), Annotate(), or Hash()).

rewrite = None pydantic-field

Optional rewrite-mode parameters.

emit_telemetry = True pydantic-field

Whether to emit anonymous Anonymizer telemetry events. See the Telemetry section in the README for what is collected and how to opt out at the environment or CLI level.
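
The validator name validate_exactly_one_mode suggests that a config must select either replace or rewrite, but not both and not neither. The sketch below illustrates that reading only; it is an assumption based on the name, check_exactly_one_mode is a hypothetical helper, and the library's real validator and error messages may differ.

```python
from typing import Optional


def check_exactly_one_mode(replace: Optional[object], rewrite: Optional[object]) -> str:
    """Return the name of the single configured mode, or raise ValueError."""
    configured = [
        name
        for name, value in (("replace", replace), ("rewrite", rewrite))
        if value is not None
    ]
    if len(configured) != 1:
        raise ValueError(
            f"expected exactly one of 'replace' or 'rewrite', got {configured or 'neither'}"
        )
    return configured[0]


# Placeholder objects stand in for real replacement and rewrite configs.
print(check_exactly_one_mode(replace=object(), rewrite=None))
print(check_exactly_one_mode(replace=None, rewrite=object()))
```

Passing both arguments, or neither, raises ValueError under this reading.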

is_remote_input_source(value)

Return True when the input source is an HTTP(S) URL.

Source code in src/anonymizer/config/anonymizer_config.py
from urllib.parse import urlparse


def is_remote_input_source(value: str) -> bool:
    """Return True when the input source is an HTTP(S) URL."""
    # urlparse lowercases the scheme, so "HTTP://..." also matches.
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https"}
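
A short usage sketch for this helper follows. The function body is repeated from the source above so the snippet runs on its own; the sample inputs are hypothetical.

```python
from urllib.parse import urlparse


def is_remote_input_source(value: str) -> bool:
    """Return True when the input source is an HTTP(S) URL."""
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https"}


# Hypothetical sample inputs, chosen only to illustrate the scheme check.
samples = [
    "https://example.com/data/records.csv",
    "HTTP://EXAMPLE.COM/records.parquet",  # urlparse lowercases the scheme
    "data/records.csv",                    # plain relative path, no scheme
    "s3://bucket/records.parquet",         # a URL, but not HTTP(S)
]
results = {source: is_remote_input_source(source) for source in samples}

for source, is_remote in results.items():
    print(f"{source!r}: {is_remote}")
```

Only the first two samples count as remote; the relative path and the s3:// URL both return False.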

has_unsupported_url_scheme(value)

Return True when the input looks like a URL but uses an unsupported scheme.

Source code in src/anonymizer/config/anonymizer_config.py
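from urllib.parse import urlparse


def has_unsupported_url_scheme(value: str) -> bool:
    """Return True when the input looks like a URL but uses an unsupported scheme."""
    # The "://" check keeps drive-letter paths such as C:\data.csv (parsed scheme "c") from being flagged.
    parsed = urlparse(value)
    return "://" in value and bool(parsed.scheme) and parsed.scheme not in {"http", "https"}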
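
The following usage sketch shows which inputs this check flags. The function body is repeated from the source above so the snippet is self-contained; the sample inputs are hypothetical.

```python
from urllib.parse import urlparse


def has_unsupported_url_scheme(value: str) -> bool:
    """Return True when the input looks like a URL but uses an unsupported scheme."""
    parsed = urlparse(value)
    return "://" in value and bool(parsed.scheme) and parsed.scheme not in {"http", "https"}


# Hypothetical sample inputs covering each branch of the check.
samples = [
    "s3://bucket/records.parquet",       # URL-like, scheme not supported
    "ftp://host/records.csv",            # URL-like, scheme not supported
    "https://example.com/records.csv",   # supported scheme
    "C:\\data\\records.csv",             # drive letter parses as a scheme, but no "://"
    "data/records.csv",                  # plain local path
]
flags = {source: has_unsupported_url_scheme(source) for source in samples}

for source, flagged in flags.items():
    print(f"{source!r}: {flagged}")
```

Only the s3:// and ftp:// samples are flagged; the Windows-style path is treated as a local path because it contains no "://".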

infer_input_source_suffix(value)

Infer the lowercase file suffix from a local path or remote URL path.

Source code in src/anonymizer/config/anonymizer_config.py
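from pathlib import Path
from urllib.parse import urlparse


def infer_input_source_suffix(value: str) -> str:
    """Infer the lowercase file suffix from a local path or remote URL path."""
    if is_remote_input_source(value):
        # Use only the URL path so query strings and fragments are ignored.
        return Path(urlparse(value).path).suffix.lower()
    return Path(value).suffix.lower()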
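
A brief usage sketch for suffix inference follows. Both helpers are repeated from the source above so the snippet runs on its own; the sample inputs are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_remote_input_source(value: str) -> bool:
    """Return True when the input source is an HTTP(S) URL."""
    return urlparse(value).scheme in {"http", "https"}


def infer_input_source_suffix(value: str) -> str:
    """Infer the lowercase file suffix from a local path or remote URL path."""
    if is_remote_input_source(value):
        return Path(urlparse(value).path).suffix.lower()
    return Path(value).suffix.lower()


# Hypothetical sample inputs.
samples = [
    "https://example.com/exports/Records.CSV?token=abc",  # query string is ignored
    "data/input.PARQUET",                                  # suffix is lowercased
    "https://example.com/download",                        # no suffix at all
    "notes.tar.gz",                                        # only the last suffix counts
]
suffixes = {source: infer_input_source_suffix(source) for source in samples}

for source, suffix in suffixes.items():
    print(f"{source!r}: {suffix!r}")
```

A URL without a file extension yields an empty string, so callers that require .csv or .parquet need to treat the empty suffix as unsupported.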