anonymizer_config

`anonymizer_config` ¶

Classes:

Name	Description
`AnonymizerInput`	Input source definition for the anonymizer pipeline.
`Detect`	Configuration for the entity detection stage.
`Rewrite`	Configuration for rewrite-mode execution.
`AnonymizerConfig`	Primary user-facing config for anonymization behavior.
`EvaluateConfig`	Optional knobs for `Anonymizer.evaluate`.

Functions:

Name	Description
`is_remote_input_source`	Return True when the input source is an HTTP(S) URL.
`has_unsupported_url_scheme`	Return True when the input looks like a URL but uses an unsupported scheme.
`infer_input_source_suffix`	Infer the lowercase file suffix from a local path or remote URL path.

`AnonymizerInput` `pydantic-model` ¶

Bases: BaseModel

Input source definition for the anonymizer pipeline.

Format is inferred from the file extension of a local path or HTTP(S) URL.

Fields:

source (str)
text_column (str)
id_column (str | None)
data_summary (str | None)

Validators:

validate_source_path → source

`source` `pydantic-field` ¶

Local path or HTTP(S) URL for a .csv or .parquet input file.

`text_column = 'text'` `pydantic-field` ¶

Column containing the text to anonymize.

`id_column = None` `pydantic-field` ¶

Optional column to use as record identifier.

`data_summary = None` `pydantic-field` ¶

Short description of the data. Improves LLM detection accuracy.
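The format inference described for `source` can be sketched with the standard library alone. This is a hypothetical stand-in for the `validate_source_path` validator. It assumes the validator accepts only HTTP(S) URLs or local paths and maps the `.csv` and `.parquet` suffixes to formats. The helper name `infer_input_format` and the error messages are illustrative and are not part of the package.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical suffix-to-format mapping, based on the documented .csv/.parquet support.
SUPPORTED_FORMATS = {".csv": "csv", ".parquet": "parquet"}


def infer_input_format(source: str) -> str:
    """Hypothetical sketch: map a local path or HTTP(S) URL to 'csv' or 'parquet'."""
    parsed = urlparse(source)
    is_remote = parsed.scheme in {"http", "https"}
    if "://" in source and parsed.scheme and not is_remote:
        # Looks like a URL, but only HTTP(S) URLs are documented as supported.
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
    # Use the URL path for remote sources so query strings do not affect the suffix.
    suffix = Path(parsed.path if is_remote else source).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file suffix: {suffix!r}")
    return SUPPORTED_FORMATS[suffix]


print(infer_input_format("data/records.CSV"))  # csv
print(infer_input_format("https://example.com/export.parquet?v=2"))  # parquet
```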

`Detect` `pydantic-model` ¶

Bases: BaseModel

Configuration for the entity detection stage.

Fields:

entity_labels (list[str] | None)
gliner_threshold (float)
validation_max_entities_per_call (int)
validation_excerpt_window_chars (int)

Validators:

validate_entity_labels → entity_labels

`entity_labels = None` `pydantic-field` ¶

Labels to detect. `None` uses the built-in default detection label set. To inspect the default set, use `from anonymizer import DEFAULT_ENTITY_LABELS`.

`gliner_threshold = 0.3` `pydantic-field` ¶

GLiNER detection confidence threshold (0.0-1.0).

`validation_max_entities_per_call = 100` `pydantic-field` ¶

Maximum number of candidate entities included in a single validator LLM call. When a row has more candidates than this, validation is split into chunks that are dispatched (round-robin) across the validator pool.
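The chunking and round-robin dispatch can be sketched as follows. The candidate list, the validator names, and the helper `plan_validation_calls` are hypothetical. This sketch assumes candidates are split into consecutive slices, and the real pipeline may order or group them differently.

```python
from itertools import cycle


def plan_validation_calls(candidates, validators, max_entities_per_call=100):
    """Hypothetical sketch: split candidates into chunks and assign validators round-robin."""
    if max_entities_per_call < 1:
        raise ValueError("max_entities_per_call must be >= 1")
    # Consecutive slices holding at most max_entities_per_call candidates each.
    chunks = [
        candidates[i : i + max_entities_per_call]
        for i in range(0, len(candidates), max_entities_per_call)
    ]
    # Pair each chunk with the next validator in the pool, wrapping around the pool.
    return list(zip(cycle(validators), chunks))


candidates = [f"entity_{n}" for n in range(250)]
plan = plan_validation_calls(candidates, ["validator_a", "validator_b"])
print([(name, len(chunk)) for name, chunk in plan])
# [('validator_a', 100), ('validator_b', 100), ('validator_a', 50)]
```

With the default of 100, a row with 250 candidates would need three validator calls rather than one.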

`validation_excerpt_window_chars = 500` `pydantic-field` ¶

Number of characters to include before and after a chunk's entity span when building the text excerpt sent to the validator. Bounds the prompt context the validator sees per chunk; it is NOT the LLM's context window limit.
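A minimal sketch of the excerpt bounds, assuming a chunk's "entity span" is the smallest character range covering every entity in that chunk. The helper `excerpt_for_chunk` and the `(start, end)` span tuples are hypothetical.

```python
def excerpt_for_chunk(text, entity_spans, window_chars=500):
    """Hypothetical sketch: return (start, end, excerpt) sent to the validator for one chunk."""
    # Smallest range covering all entities in the chunk (assumption).
    chunk_start = min(start for start, _ in entity_spans)
    chunk_end = max(end for _, end in entity_spans)
    # Pad by window_chars on each side, clamped to the text boundaries.
    start = max(0, chunk_start - window_chars)
    end = min(len(text), chunk_end + window_chars)
    return start, end, text[start:end]


text = "x" * 2000
start, end, excerpt = excerpt_for_chunk(text, [(600, 610), (900, 920)])
print(start, end, len(excerpt))  # 100 1420 1320
```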

`Rewrite` `pydantic-model` ¶

Bases: BaseModel

Configuration for rewrite-mode execution.

Fields:

privacy_goal (PrivacyGoal | None)
instructions (str | None)
risk_tolerance (RiskTolerance)
max_repair_iterations (int)
strict_entity_protection (bool)

Validators:

populate_default_privacy_goal

`privacy_goal = None` `pydantic-field` ¶

Structured privacy goal. Auto-populated with defaults if not provided.

`instructions = None` `pydantic-field` ¶

Additional instructions for the rewrite LLM.

`risk_tolerance = RiskTolerance.low` `pydantic-field` ¶

Preset controlling repair thresholds and review flagging.

`max_repair_iterations = 3` `pydantic-field` ¶

Maximum repair rounds. Set to 0 to disable repair.

`strict_entity_protection = False` `pydantic-field` ¶

If True, requires every entity to receive a protective disposition during sensitivity analysis.

`evaluation` `property` ¶

Construct `EvaluationCriteria` from this `Rewrite` config for the engine.

`Rewrite` and `EvaluationCriteria` both carry `max_repair_iterations`. This property keeps them in sync by passing through `self.risk_tolerance` and `self.max_repair_iterations`. Leakage thresholds and repair parameters are derived from `risk_tolerance` via `_RiskToleranceBundle` (see `rewrite.py`).

Production code that starts from a user-facing `Rewrite` should pass `rewrite.evaluation` into the engine rather than duplicating the mapping manually. Tests and engine-internal callers may construct `EvaluationCriteria` directly when they are not routing through a user-facing `Rewrite`.
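The pass-through pattern can be illustrated with plain dataclasses. `Rewrite`, `EvaluationCriteria`, and `RiskTolerance` below are simplified, hypothetical stand-ins. The real `Rewrite` is a pydantic model with more fields, and the threshold derivation through `_RiskToleranceBundle` is omitted.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTolerance(str, Enum):
    # Hypothetical stand-in; only the `low` member is documented on this page.
    low = "low"


@dataclass(frozen=True)
class EvaluationCriteria:
    # Simplified stand-in carrying only the two pass-through fields.
    risk_tolerance: RiskTolerance
    max_repair_iterations: int


@dataclass
class Rewrite:
    risk_tolerance: RiskTolerance = RiskTolerance.low
    max_repair_iterations: int = 3

    @property
    def evaluation(self) -> EvaluationCriteria:
        # Rebuilt from the user-facing config on every access, so the two
        # copies of max_repair_iterations cannot drift apart.
        return EvaluationCriteria(
            risk_tolerance=self.risk_tolerance,
            max_repair_iterations=self.max_repair_iterations,
        )


rewrite = Rewrite(max_repair_iterations=0)
print(rewrite.evaluation.max_repair_iterations)  # 0
```

Because the property is computed rather than stored, a later change to `rewrite.max_repair_iterations` is reflected the next time `rewrite.evaluation` is read.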

`AnonymizerConfig` `pydantic-model` ¶

Bases: BaseModel

Primary user-facing config for anonymization behavior.

Fields:

detect (Detect)
replace (ReplaceMethod | None)
rewrite (Rewrite | None)
emit_telemetry (bool)

Validators:

validate_exactly_one_mode

`detect` `pydantic-field` ¶

Entity detection configuration.

`replace = None` `pydantic-field` ¶

Replacement method (Substitute(), Redact(), Annotate(), or Hash()).

`rewrite = None` `pydantic-field` ¶

Optional rewrite-mode parameters.

`emit_telemetry = True` `pydantic-field` ¶

Whether to emit anonymous Anonymizer telemetry events. See the Telemetry section in the README for what is collected and how to opt out at the environment or CLI level.
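The `validate_exactly_one_mode` validator suggests that a config selects exactly one of `replace` or `rewrite`. Under that reading, the check reduces to the sketch below. `check_exactly_one_mode` is a hypothetical helper, and the string `"redact"` stands in for a real replacement method object. The actual validator's error messages may differ.

```python
def check_exactly_one_mode(replace, rewrite):
    """Hypothetical sketch: require exactly one of `replace` / `rewrite` to be set."""
    selected = [
        name
        for name, value in (("replace", replace), ("rewrite", rewrite))
        if value is not None
    ]
    if len(selected) != 1:
        # Neither mode, or both modes, were provided.
        raise ValueError(
            f"expected exactly one of 'replace' or 'rewrite', got {selected or 'neither'}"
        )
    return selected[0]


print(check_exactly_one_mode(replace="redact", rewrite=None))  # replace
```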

`EvaluateConfig` `pydantic-model` ¶

Bases: BaseModel

Optional knobs for `Anonymizer.evaluate()`.

Reserved for genuinely evaluation-specific configuration, such as metric selection, per-judge model/prompt overrides, and scoring thresholds. The anonymization mode is not configured here. It travels on the `AnonymizerResult` / `PreviewResult` produced by `run()` / `preview()` and is read directly by `evaluate()`, so users neither need to restate it nor can misstate it.

Today this is an empty placeholder; fields will be added as evaluation knobs are introduced.

`is_remote_input_source(value)` ¶

Return True when the input source is an HTTP(S) URL.

Source code in src/anonymizer/config/anonymizer_config.py

```python
from urllib.parse import urlparse


def is_remote_input_source(value: str) -> bool:
    """Return True when the input source is an HTTP(S) URL."""
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https"}
```
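A quick check of which inputs count as remote. Only the `http` and `https` schemes qualify. Cloud URIs such as `s3://` and local paths do not, including Windows drive-letter paths, which `urlparse` reports with a one-letter scheme. The example inputs are illustrative.

```python
from urllib.parse import urlparse


def is_remote_input_source(value: str) -> bool:
    """Return True when the input source is an HTTP(S) URL."""
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https"}


for source in [
    "https://example.com/data.csv",
    "s3://bucket/data.csv",
    "data/local.parquet",
    "C:\\data\\local.csv",
]:
    print(source, is_remote_input_source(source))
```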

`has_unsupported_url_scheme(value)` ¶

Return True when the input looks like a URL but uses an unsupported scheme.

Source code in src/anonymizer/config/anonymizer_config.py

```python
from urllib.parse import urlparse


def has_unsupported_url_scheme(value: str) -> bool:
    """Return True when the input looks like a URL but uses an unsupported scheme."""
    parsed = urlparse(value)
    return "://" in value and bool(parsed.scheme) and parsed.scheme not in {"http", "https"}
```
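One consequence of the `"://"` check is that Windows drive-letter paths are not flagged. `urlparse` treats the drive letter as a scheme, so a scheme check alone would misreport `C:\data\file.csv` as an unsupported URL. The example inputs are illustrative.

```python
from urllib.parse import urlparse


def has_unsupported_url_scheme(value: str) -> bool:
    """Return True when the input looks like a URL but uses an unsupported scheme."""
    parsed = urlparse(value)
    return "://" in value and bool(parsed.scheme) and parsed.scheme not in {"http", "https"}


print(urlparse("C:\\data\\file.csv").scheme)  # c
print(has_unsupported_url_scheme("C:\\data\\file.csv"))  # False
print(has_unsupported_url_scheme("s3://bucket/file.csv"))  # True
print(has_unsupported_url_scheme("https://example.com/file.csv"))  # False
```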

`infer_input_source_suffix(value)` ¶

Infer the lowercase file suffix from a local path or remote URL path.

Source code in src/anonymizer/config/anonymizer_config.py

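```python
from pathlib import Path
from urllib.parse import urlparse


def infer_input_source_suffix(value: str) -> str:
    """Infer the lowercase file suffix from a local path or remote URL path."""
    # is_remote_input_source is defined earlier in the same module.
    if is_remote_input_source(value):
        return Path(urlparse(value).path).suffix.lower()
    return Path(value).suffix.lower()
```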
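Taking the suffix from the parsed URL path means query strings and fragments do not affect the result, and lowercasing makes `.CSV` and `.csv` equivalent. The example inputs are illustrative.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_remote_input_source(value: str) -> bool:
    """Return True when the input source is an HTTP(S) URL."""
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https"}


def infer_input_source_suffix(value: str) -> str:
    """Infer the lowercase file suffix from a local path or remote URL path."""
    if is_remote_input_source(value):
        return Path(urlparse(value).path).suffix.lower()
    return Path(value).suffix.lower()


print(infer_input_source_suffix("data/Records.CSV"))  # .csv
print(infer_input_source_suffix("https://example.com/export.parquet?version=2"))  # .parquet
print(repr(infer_input_source_suffix("https://example.com/download")))  # ''
```

Because `Path.suffix` returns only the final suffix, a compressed file such as `data.csv.gz` yields `.gz`, not `.csv`.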
