anonymizer_config

Classes:

Name              Description
AnonymizerInput   Input source definition for the anonymizer pipeline.
Detect            Configuration for the entity detection stage.
Rewrite           Configuration for rewrite-mode execution.
AnonymizerConfig  Primary user-facing config for anonymization behavior.

Functions:

Name                        Description
is_remote_input_source      Return True when the input source is an HTTP(S) URL.
has_unsupported_url_scheme  Return True when the input looks like a URL but uses an unsupported scheme.
infer_input_source_suffix   Infer the lowercase file suffix from a local path or remote URL path.

AnonymizerInput pydantic-model

Bases: BaseModel

Input source definition for the anonymizer pipeline.

Format is inferred from the file extension of a local path or HTTP(S) URL.

Fields:

Validators:

  • validate_source_path → source

source pydantic-field

Local path or HTTP(S) URL for a .csv or .parquet input file.

text_column = 'text' pydantic-field

Column containing the text to anonymize.

id_column = None pydantic-field

Optional column to use as record identifier.

data_summary = None pydantic-field

Short description of the data. Improves LLM detection accuracy.
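
The constraints on source described above (a local path or HTTP(S) URL ending in .csv or .parquet) can be sketched as a standalone check. This is a minimal illustration using only the standard library; check_source and SUPPORTED_SUFFIXES are hypothetical names, and the real validate_source_path validator may apply further checks that this page does not describe.

```python
from pathlib import Path
from urllib.parse import urlparse

# Formats named in the field description; the constant name is hypothetical.
SUPPORTED_SUFFIXES = {".csv", ".parquet"}


def check_source(value: str) -> str:
    """Hypothetical stand-in for validate_source_path, not the library's code."""
    parsed = urlparse(value)
    is_remote = parsed.scheme in {"http", "https"}
    # Reject URL-like inputs whose scheme is neither http nor https.
    if "://" in value and parsed.scheme and not is_remote:
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
    # For remote inputs, only the URL path contributes to the suffix.
    path = parsed.path if is_remote else value
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported input format: {suffix!r}")
    return value


if __name__ == "__main__":
    for candidate in ["data/reviews.csv", "s3://bucket/reviews.csv", "notes.txt"]:
        try:
            print(candidate, "-> ok:", check_source(candidate))
        except ValueError as exc:
            print(candidate, "-> rejected:", exc)
```

Raising ValueError mirrors how Pydantic validators usually report bad input, but that choice is an assumption here as well.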

Detect pydantic-model

Bases: BaseModel

Configuration for the entity detection stage.

Fields:

Validators:

entity_labels = None pydantic-field

Entity labels to detect. None selects the built-in default label set. To inspect the defaults, use from anonymizer import DEFAULT_ENTITY_LABELS.

gliner_threshold = 0.3 pydantic-field

GLiNER detection confidence threshold (0.0-1.0).
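
To illustrate what a confidence threshold in the 0.0-1.0 range does, the sketch below filters a list of scored detections. The Detection tuple and filter_by_threshold function are hypothetical and are not part of the anonymizer or GLiNER APIs; whether the library's comparison is inclusive (>=) is also an assumption.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """Hypothetical scored entity span, used only for this illustration."""
    text: str
    label: str
    score: float


def filter_by_threshold(detections: List[Detection], threshold: float = 0.3) -> List[Detection]:
    """Keep detections whose score meets the threshold (inclusive by assumption)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [d for d in detections if d.score >= threshold]


if __name__ == "__main__":
    found = [
        Detection("Alice", "first_name", 0.92),
        Detection("Paris", "city", 0.41),
        Detection("blue", "organization", 0.12),
    ]
    for d in filter_by_threshold(found):
        print(d)
```

A lower threshold keeps more low-confidence candidates, and a higher one keeps fewer.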

Rewrite pydantic-model

Bases: BaseModel

Configuration for rewrite-mode execution.

Fields:

Validators:

  • populate_default_privacy_goal

privacy_goal = None pydantic-field

Structured privacy goal. Auto-populated with defaults if not provided.

instructions = None pydantic-field

Additional instructions for the rewrite LLM.

risk_tolerance = RiskTolerance.low pydantic-field

Preset controlling repair thresholds and review flagging.

max_repair_iterations = 2 pydantic-field

Maximum repair rounds. Set to 0 to disable repair.
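
The effect of a bounded repair budget, including 0 disabling repair, can be sketched as a simple loop. This is not the engine's implementation: run_repair, needs_repair, and repair are hypothetical names, and the real engine's stopping criteria are not described on this page.

```python
from typing import Callable, Tuple


def run_repair(
    text: str,
    needs_repair: Callable[[str], bool],
    repair: Callable[[str], str],
    max_repair_iterations: int = 2,
) -> Tuple[str, int]:
    """Apply repair() until the text passes or the round budget is spent."""
    rounds = 0
    # With max_repair_iterations == 0 the loop body never runs.
    while rounds < max_repair_iterations and needs_repair(text):
        text = repair(text)
        rounds += 1
    return text, rounds


if __name__ == "__main__":
    def leaks_name(t: str) -> bool:
        return "Alice" in t

    def mask_one(t: str) -> str:
        return t.replace("Alice", "[NAME]", 1)

    print(run_repair("Alice met Alice and Alice", leaks_name, mask_one))
    print(run_repair("Alice met Alice", leaks_name, mask_one, max_repair_iterations=0))
```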

evaluation property

Internal: construct EvaluationCriteria for the engine.

AnonymizerConfig pydantic-model

Bases: BaseModel

Primary user-facing config for anonymization behavior.

Fields:

Validators:

  • validate_exactly_one_mode

detect pydantic-field

Entity detection configuration.

replace = None pydantic-field

Replacement method (Substitute(), Redact(), Annotate(), or Hash()).

rewrite = None pydantic-field

Optional rewrite-mode parameters.
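
The validator name validate_exactly_one_mode suggests that exactly one of replace and rewrite must be set. The sketch below shows that rule with a plain dataclass standing in for the Pydantic model. ConfigSketch is a hypothetical name, and treating replace and rewrite as the two modes is an assumption drawn from the field list above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for AnonymizerConfig, for illustration only."""
    replace: Optional[Any] = None
    rewrite: Optional[Any] = None

    def __post_init__(self) -> None:
        # Both unset or both set violate the "exactly one mode" rule.
        if (self.replace is None) == (self.rewrite is None):
            raise ValueError("set exactly one of 'replace' or 'rewrite'")


if __name__ == "__main__":
    print(ConfigSketch(replace="redact"))
    try:
        ConfigSketch()
    except ValueError as exc:
        print("rejected:", exc)
```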

is_remote_input_source(value)

Return True when the input source is an HTTP(S) URL.

Source code in src/anonymizer/config/anonymizer_config.py
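from urllib.parse import urlparse  # import required by this excerpt


def is_remote_input_source(value: str) -> bool:
    """Return True when the input source is an HTTP(S) URL."""
    # urlparse lowercases the scheme, so "HTTPS://..." also matches.
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https"}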
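
A few sample inputs make the behavior of is_remote_input_source concrete. The function body is copied from the listing above so the snippet runs on its own; the example URLs and paths are made up.

```python
from urllib.parse import urlparse


def is_remote_input_source(value: str) -> bool:
    """Return True when the input source is an HTTP(S) URL."""
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https"}


if __name__ == "__main__":
    samples = [
        "https://example.com/data.csv",  # remote
        "HTTP://example.com/data.csv",   # remote: the parsed scheme is lowercased
        "s3://bucket/data.csv",          # not remote: scheme is not http or https
        "data/input.csv",                # not remote: local relative path
    ]
    for sample in samples:
        print(sample, "->", is_remote_input_source(sample))
```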

has_unsupported_url_scheme(value)

Return True when the input looks like a URL but uses an unsupported scheme.

Source code in src/anonymizer/config/anonymizer_config.py
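from urllib.parse import urlparse  # import required by this excerpt


def has_unsupported_url_scheme(value: str) -> bool:
    """Return True when the input looks like a URL but uses an unsupported scheme."""
    # Requiring "://" keeps Windows drive paths such as C:\data.csv from being flagged.
    parsed = urlparse(value)
    return "://" in value and bool(parsed.scheme) and parsed.scheme not in {"http", "https"}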
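
The three conditions in has_unsupported_url_scheme are easiest to see on sample inputs. The function body is copied from the listing above; the sample values are illustrative only.

```python
from urllib.parse import urlparse


def has_unsupported_url_scheme(value: str) -> bool:
    """Return True when the input looks like a URL but uses an unsupported scheme."""
    parsed = urlparse(value)
    return "://" in value and bool(parsed.scheme) and parsed.scheme not in {"http", "https"}


if __name__ == "__main__":
    samples = [
        "s3://bucket/data.parquet",      # URL-like with a non-HTTP scheme
        "ftp://host/data.csv",           # URL-like with a non-HTTP scheme
        "https://example.com/data.csv",  # supported scheme
        "C:\\data\\input.csv",           # drive letter, but no "://"
        "data/input.csv",                # plain local path
    ]
    for sample in samples:
        print(sample, "->", has_unsupported_url_scheme(sample))
```

The explicit "://" test matters because a Windows drive letter followed by a colon can otherwise be parsed as a URL scheme.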

infer_input_source_suffix(value)

Infer the lowercase file suffix from a local path or remote URL path.

Source code in src/anonymizer/config/anonymizer_config.py
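from pathlib import Path
from urllib.parse import urlparse


def infer_input_source_suffix(value: str) -> str:
    """Infer the lowercase file suffix from a local path or remote URL path."""
    if is_remote_input_source(value):
        # Use only the URL path so a query string or fragment is not read as part of the suffix.
        return Path(urlparse(value).path).suffix.lower()
    return Path(value).suffix.lower()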
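
The following self-contained snippet exercises infer_input_source_suffix on local paths and URLs. Both function bodies are copied from the listings above, and the sample inputs are made up.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_remote_input_source(value: str) -> bool:
    """Return True when the input source is an HTTP(S) URL."""
    return urlparse(value).scheme in {"http", "https"}


def infer_input_source_suffix(value: str) -> str:
    """Infer the lowercase file suffix from a local path or remote URL path."""
    if is_remote_input_source(value):
        return Path(urlparse(value).path).suffix.lower()
    return Path(value).suffix.lower()


if __name__ == "__main__":
    samples = [
        "data/records.PARQUET",                          # suffix is lowercased
        "https://example.com/files/Data.CSV?token=abc",  # query string is ignored
        "https://example.com/export.csv#sheet1",         # fragment is ignored
        "backup.tar.gz",                                 # only the last suffix is used
        "README",                                        # no suffix: empty string
    ]
    for sample in samples:
        print(repr(sample), "->", repr(infer_input_source_suffix(sample)))
```

Only the final suffix is returned, so a compressed file such as backup.tar.gz is reported as .gz.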