models

`models`

Classes:

Name	Description
`DetectionModelSelection`	Model aliases for the entity detection pipeline.
`ReplaceModelSelection`	Model aliases for the replacement pipeline.
`RewriteModelSelection`	Model aliases for the rewrite pipeline.
`EvaluateModelSelection`	Model aliases for the LLM-as-judge evaluation step.
`ModelSelection`	Model alias selections for all pipelines, loaded from YAML defaults via `load_default_model_selection()`.

`DetectionModelSelection` `pydantic-model`

Bases: BaseModel

Model aliases for the entity detection pipeline.

`entity_validator` accepts either a single alias or a list of aliases. A list forms a validator pool: chunked validation rotates calls across the pool in round-robin order, which spreads load so that no single alias runs into its tokens-per-minute (TPM) or requests-per-minute (RPM) limit. A single scalar is normalized to a one-element list.
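The round-robin rotation described above can be illustrated with a minimal sketch. The function name `assign_round_robin` and the chunk and alias values are hypothetical; the pipeline's actual scheduling code is not shown on this page and may differ.

```python
from itertools import cycle


def assign_round_robin(chunks: list[str], validator_pool: list[str]) -> list[tuple[str, str]]:
    """Pair each chunk with the next alias in the pool, wrapping around at the end.

    Hypothetical helper for illustration only; not part of the documented API.
    """
    if not validator_pool:
        raise ValueError("validator_pool must contain at least one alias")
    rotation = cycle(validator_pool)  # endlessly repeats the pool in order
    return [(chunk, next(rotation)) for chunk in chunks]


assignments = assign_round_robin(
    ["chunk-0", "chunk-1", "chunk-2", "chunk-3", "chunk-4"],
    ["alias-a", "alias-b"],
)
for chunk, alias in assignments:
    print(f"{chunk} -> {alias}")
```

With a two-alias pool, consecutive chunks alternate between the aliases, so each alias receives roughly half of the validation calls. With a one-element pool, every chunk goes to the same alias, which matches the scalar case.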

Fields:

entity_detector (str)
entity_validator (list[str])
entity_augmenter (str)
latent_detector (str)

Validators:

normalize_entity_validator → entity_validator

`normalize_entity_validator(value)` `pydantic-validator`

Accept a scalar alias, a list of aliases, or a tuple of aliases; return a non-empty deduplicated list.

Normalizing at parse time keeps every downstream consumer on the same shape (`list[str]`) regardless of whether the user wrote `entity_validator: some-alias` or `entity_validator: [alias-a, alias-b]`. Tuples are accepted for parity with Pydantic v2's default coercion for `list[str]` fields, which lets programmatic callers pass either `DetectionModelSelection(entity_validator=["a", "b"])` or `DetectionModelSelection(entity_validator=("a", "b"))` without caring about the concrete sequence type. Any other input type raises `TypeError`.

Duplicate aliases are collapsed to the first occurrence (order preserved) and a warning is logged. A duplicate in the pool would burn a failover attempt on an already-exhausted endpoint, which almost certainly isn't what the user wants.

Source code in src/anonymizer/config/models.py

@field_validator("entity_validator", mode="before")
@classmethod
def normalize_entity_validator(cls, value: Any) -> list[str]:
    """Accept a scalar alias, a list of aliases, or a tuple of aliases; return a non-empty deduplicated list.

    Normalizing at parse time keeps every downstream consumer on the
    same shape (``list[str]``) regardless of whether the user wrote
    ``entity_validator: some-alias`` or
    ``entity_validator: [alias-a, alias-b]``. Tuples are accepted for
    parity with Pydantic v2's default coercion for ``list[str]`` fields,
    which lets programmatic callers pass either
    ``DetectionModelSelection(entity_validator=["a", "b"])`` or
    ``DetectionModelSelection(entity_validator=("a", "b"))`` without
    caring about the concrete sequence type. Any other input type
    raises ``TypeError``.

    Duplicate aliases are collapsed to the first occurrence (order
    preserved) and a warning is logged. A duplicate in the pool would
    burn a failover attempt on an already-exhausted endpoint, which
    almost certainly isn't what the user wants.
    """
    if isinstance(value, str):
        aliases: list[str] = [value]
    elif isinstance(value, (list, tuple)):
        aliases = [str(item) for item in value]
    else:
        raise TypeError(f"entity_validator must be a string or list of strings, got {type(value).__name__}")
    cleaned = [alias.strip() for alias in aliases if alias.strip()]
    if not cleaned:
        raise ValueError("entity_validator must name at least one model alias.")
    seen: set[str] = set()
    deduped: list[str] = []
    for alias in cleaned:
        if alias in seen:
            continue
        seen.add(alias)
        deduped.append(alias)
    if len(deduped) != len(cleaned):
        removed = [alias for alias in cleaned if cleaned.count(alias) > 1]
        logger.warning(
            "entity_validator pool contained duplicate aliases %s; collapsing to %s. "
            "Duplicates burn a failover attempt on an already-exhausted endpoint.",
            sorted(set(removed)),
            deduped,
        )
    return deduped
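
To try the normalization rules without installing Pydantic, the same steps can be reproduced in a standalone function. This is a sketch, not the library's code: `normalize_aliases` and the logger name are hypothetical, and the order-preserving deduplication uses `dict.fromkeys` in place of the explicit `seen` set above, which yields the same result.

```python
import logging
from typing import Any

logger = logging.getLogger("entity_validator_sketch")  # hypothetical logger name


def normalize_aliases(value: Any) -> list[str]:
    """Standalone approximation of the validator's normalization steps."""
    if isinstance(value, str):
        aliases = [value]
    elif isinstance(value, (list, tuple)):
        aliases = [str(item) for item in value]
    else:
        raise TypeError(f"expected a string or list of strings, got {type(value).__name__}")
    # Drop blank entries and surrounding whitespace.
    cleaned = [alias.strip() for alias in aliases if alias.strip()]
    if not cleaned:
        raise ValueError("at least one model alias is required")
    # dict keys keep insertion order, so the first occurrence of each alias wins.
    deduped = list(dict.fromkeys(cleaned))
    if len(deduped) != len(cleaned):
        logger.warning("duplicate aliases collapsed to %s", deduped)
    return deduped


print(normalize_aliases("some-alias"))                         # ['some-alias']
print(normalize_aliases(("alias-a", " alias-b ", "alias-a")))  # ['alias-a', 'alias-b']
```

An empty list or a list of blank strings raises `ValueError`, and a non-sequence input such as an integer raises `TypeError`, mirroring the two error paths in the source above.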

`ReplaceModelSelection` `pydantic-model`

Bases: BaseModel

Model aliases for the replacement pipeline.

Fields:

replacement_generator (str)

`RewriteModelSelection` `pydantic-model`

Bases: BaseModel

Model aliases for the rewrite pipeline.

Fields:

domain_classifier (str)
disposition_analyzer (str)
meaning_extractor (str)
qa_generator (str)
rewriter (str)
evaluator (str)
repairer (str)
judge (str)

`EvaluateModelSelection` `pydantic-model`

Bases: BaseModel

Model aliases for the LLM-as-judge evaluation step.

These roles are consumed only by `Anonymizer.evaluate`; they are not needed at anonymization time. Keeping them in their own section lets `preview()` / `run()` validate only the roles that produce anonymized output, while `evaluate(...)` validates the roles that score it.
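The benefit of the separate section can be sketched as a per-section check. Everything below is an assumption made for illustration: `missing_aliases`, the alias values, and the idea of a set of configured aliases are hypothetical, and the library's real validation logic is not shown on this page.

```python
def missing_aliases(roles: dict[str, str], configured_aliases: set[str]) -> dict[str, str]:
    """Return the role -> alias pairs whose alias is not in the configured set."""
    return {role: alias for role, alias in roles.items() if alias not in configured_aliases}


# Placeholder role assignments; the role names come from the field lists on this page.
rewrite_roles = {"rewriter": "alias-a", "judge": "alias-a"}
evaluate_roles = {
    "detection_validity_judge": "alias-judge",
    "replace_type_fidelity_judge": "alias-judge",
}
configured = {"alias-a"}  # "alias-judge" has deliberately not been configured

# An anonymization-time check looks only at the roles that produce output.
print(missing_aliases(rewrite_roles, configured))   # {}

# An evaluation-time check looks at the judge roles and reports the gap.
print(missing_aliases(evaluate_roles, configured))
```

Under these assumptions, a user who never calls `evaluate(...)` is not forced to configure judge aliases just to anonymize data.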

Fields:

detection_validity_judge (str)
replace_type_fidelity_judge (str)
replace_relational_consistency_judge (str)
replace_attribute_fidelity_judge (str)

`ModelSelection` `pydantic-model`

Bases: BaseModel

Model alias selections for all pipelines, loaded from YAML defaults via `load_default_model_selection()`.

Fields:

detection (DetectionModelSelection)
replace (ReplaceModelSelection)
rewrite (RewriteModelSelection)
evaluate (EvaluateModelSelection)
