predictor

Classes:

ContextSpan: Searches for surrounding context given an input string and start/end offsets within that string.

PredictorContext: Base class for an arbitrary context object that can be passed into a predictor.

Predictor: Base class for managing an entity prediction.

ContextSpan(pattern_list, span=DEFAULT_CONTEXT_SPAN_SIZE) dataclass

This class searches for surrounding context given an input string and start/end offsets within that string. To create one, provide a list of literal strings or regex patterns to match, and how many characters to the left and right of the target string to search for those patterns.

The example below searches for context to the left and right of a phone number:

tgt = "Please give me a call at 867-5309"

Create a ContextSpan that uses the string "call" as context:

c = ContextSpan(pattern_list=["call"])
# tgt[25:33] is "867-5309", so 25 and 33 are the start/end offsets of the target
assert c.is_match(tgt, 25, 33)

Parameters:

pattern_list (list[str | Pattern], required): A list of strings or regex Patterns to use for matching.

span (int, default DEFAULT_CONTEXT_SPAN_SIZE): How many characters left of the start index and right of the end index to search for any matches from the pattern_list objects.
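The windowing behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: SketchContextSpan and SKETCH_SPAN_SIZE are hypothetical names, the default of 32 is an assumption (the real DEFAULT_CONTEXT_SPAN_SIZE value is not shown on this page), and plain strings are assumed to match as substrings.

```python
import re
from dataclasses import dataclass
from typing import Union

# Hypothetical default; the real DEFAULT_CONTEXT_SPAN_SIZE value is not shown here.
SKETCH_SPAN_SIZE = 32


@dataclass
class SketchContextSpan:
    """Illustrative stand-in for ContextSpan, not the library class."""

    pattern_list: list[Union[str, re.Pattern]]
    span: int = SKETCH_SPAN_SIZE

    def is_match(self, text: str, start: int, end: int) -> bool:
        # Take up to `span` characters on each side of the target offsets.
        left = text[max(0, start - self.span):start]
        right = text[end:end + self.span]
        for pattern in self.pattern_list:
            for window in (left, right):
                if isinstance(pattern, str):
                    # Assumption: plain strings are treated as substrings.
                    if pattern in window:
                        return True
                elif pattern.search(window):
                    return True
        return False


tgt = "Please give me a call at 867-5309"
start = tgt.index("867-5309")
end = start + len("867-5309")

near = SketchContextSpan(pattern_list=["call", re.compile(r"\bphone\b")])
far = SketchContextSpan(pattern_list=["call"], span=3)
```

With the default window, near finds "call" to the left of the number. With span=3, far only sees "at " on the left and nothing on the right, so it does not match.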

PredictorContext() dataclass

Bases: ABC

Base class for an arbitrary context object that can be passed into a predictor. Custom contexts can subclass this class and be passed into Predictor objects.

This is useful when predictors share the same business logic but differ in settings, such as their contexts.
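A minimal sketch of that pattern is shown below, assuming the base class carries no fields of its own. SketchPredictorContext stands in for PredictorContext, and PhoneContext with its keywords and span fields is hypothetical; none of these names come from the library.

```python
from abc import ABC
from dataclasses import dataclass, field


@dataclass
class SketchPredictorContext(ABC):
    """Stand-in for PredictorContext (assumed to carry no fields of its own)."""


@dataclass
class PhoneContext(SketchPredictorContext):
    # Hypothetical settings; neither field is part of the documented API.
    keywords: list[str] = field(default_factory=lambda: ["call", "phone"])
    span: int = 32


# Two configurations that the same predictor logic could consume.
strict = PhoneContext(keywords=["phone"], span=8)
relaxed = PhoneContext()
```

Here the predictor's matching logic would stay the same, and only the context object passed in would change.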

Predictor(name, namespace=None, predictor_context=None)

Bases: ABC

Base class for managing an entity prediction.

Predictors operate at the record level and might be managed via a PredictionPipeline parent class. For an NLP pipeline, a Predictor might represent a model. In pattern-based pipelines, a Predictor might represent a single entity matcher, such as one for IP addresses.

Methods:

evaluate: This MUST be implemented by each Predictor.

header_has_context: Checks to see if the field has a label match.

Attributes:

default_name (str): Subclasses can set a default name that can be accessed directly as a class attribute.

Source code in src/nemo_safe_synthesizer/pii_replacer/ner/predictor.py
def __init__(
    self,
    name: str,
    namespace: Optional[str] = None,
    predictor_context: Optional[PredictorContext] = None,
):
    if namespace is None:
        namespace = self.default_namespace

    if not name:
        raise ValueError("name required")

    self.source = f"{namespace.lower()}/{name.lower()}"
    self._context = predictor_context

default_name = None class-attribute instance-attribute

Subclasses can set a default name that can be accessed directly as a class attribute if need be.

evaluate(in_data) abstractmethod

This MUST be implemented by each Predictor. It receives a JSONRecord and returns a list of NERPrediction objects.

Source code in src/nemo_safe_synthesizer/pii_replacer/ner/predictor.py
@abstractmethod
def evaluate(self, in_data: JSONRecord) -> list[NERPrediction]:
    """This MUST be implemented by each Predictor"""
    pass
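The sketch below shows what a concrete subclass might look like. It is self-contained, so it uses stand-ins rather than the library's types: SketchPredictor mirrors the __init__ shown earlier, the default_namespace value is hypothetical, a plain dict stands in for JSONRecord, and plain dicts stand in for NERPrediction, whose fields are not shown on this page.

```python
import re
from abc import ABC, abstractmethod
from typing import Any, Optional


class SketchPredictor(ABC):
    """Stand-in mirroring the __init__ shown above; not the library class."""

    default_namespace = "sketch"  # hypothetical value

    def __init__(
        self,
        name: str,
        namespace: Optional[str] = None,
        predictor_context: Any = None,
    ):
        if namespace is None:
            namespace = self.default_namespace
        if not name:
            raise ValueError("name required")
        self.source = f"{namespace.lower()}/{name.lower()}"
        self._context = predictor_context

    @abstractmethod
    def evaluate(self, in_data: dict) -> list[dict]:
        """Return predictions for one record."""


class IPv4Predictor(SketchPredictor):
    # Loose pattern for illustration; it does not validate octet ranges.
    _ipv4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

    def evaluate(self, in_data: dict) -> list[dict]:
        # Plain dicts stand in for NERPrediction objects.
        found = []
        for key, value in in_data.items():
            for m in self._ipv4.finditer(str(value)):
                found.append(
                    {
                        "field": key,
                        "text": m.group(),
                        "start": m.start(),
                        "end": m.end(),
                        "source": self.source,
                    }
                )
        return found


predictor = IPv4Predictor(name="IPv4", namespace="Demo")
predictions = predictor.evaluate({"note": "host 10.0.0.1 is down", "id": 7})
```

Note that source is built from the lowercased namespace and name, so this predictor reports its matches under "demo/ipv4".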

header_has_context(field_pair, header_context_source, token_patterns=None, regex_patterns=None)

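Checks whether the field has a label match. The search string is built from the field name, the value, or both, depending on header_context_source, and is case-folded. The method returns True if regex_patterns finds a match in that string or if token_patterns matches any token in field_pair.field_tokens, and False otherwise. It returns False immediately when the field name is requested and is None.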

Source code in src/nemo_safe_synthesizer/pii_replacer/ner/predictor.py
def header_has_context(
    self,
    field_pair: KVPair,
    header_context_source: int,
    token_patterns: Optional[Pattern] = None,
    regex_patterns: Optional[Pattern] = None,
) -> bool:
    """Checks to see if the field has a label match."""
    _field = field_pair
    if header_context_source == self.BOTH:
        search_string = (_field.field + " " + _field.value if _field.field else _field.value).casefold()
    elif header_context_source == self.VALUE:
        search_string = _field.value.casefold()
    else:
        if _field.field is None:
            return False
        search_string = _field.field.casefold()

    if regex_patterns is not None and regex_patterns.search(search_string):
        return True

    if token_patterns is not None:
        for token in field_pair.field_tokens:
            if token_patterns.match(token):
                return True

    return False
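To see how the branches above behave, the following standalone sketch reproduces the same logic as a free function. Several pieces are assumptions made for illustration: the FIELD, VALUE and BOTH constants use hypothetical values (the real ones are not shown here), SketchKVPair stands in for KVPair, and its field_tokens tokenization (lowercase, split on non-alphanumerics) is a guess.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical constants; the real values of BOTH and VALUE are not shown above.
FIELD, VALUE, BOTH = 0, 1, 2


@dataclass
class SketchKVPair:
    """Stand-in for KVPair with only the attributes the method reads."""

    field: Optional[str]
    value: str

    @property
    def field_tokens(self) -> list[str]:
        # Assumed tokenization: case-fold the field name, split on non-alphanumerics.
        return [t for t in re.split(r"[^a-z0-9]+", (self.field or "").casefold()) if t]


def header_has_context(pair, source, token_patterns=None, regex_patterns=None) -> bool:
    # Build the case-folded search string from the field, the value, or both.
    if source == BOTH:
        search = (pair.field + " " + pair.value if pair.field else pair.value).casefold()
    elif source == VALUE:
        search = pair.value.casefold()
    else:
        if pair.field is None:
            return False
        search = pair.field.casefold()

    if regex_patterns is not None and regex_patterns.search(search):
        return True
    if token_patterns is not None:
        # Token patterns are anchored at the start of each field token.
        return any(token_patterns.match(tok) for tok in pair.field_tokens)
    return False


pair = SketchKVPair(field="Home_Phone", value="867-5309")
phone_token = re.compile(r"phone$")
digits = re.compile(r"\d{3}-\d{4}")

by_token = header_has_context(pair, FIELD, token_patterns=phone_token)
by_value = header_has_context(pair, VALUE, regex_patterns=digits)
no_match = header_has_context(pair, FIELD, regex_patterns=digits)
no_field = header_has_context(SketchKVPair(field=None, value="x"), FIELD, regex_patterns=digits)
```

One detail worth noticing in the original method: token_patterns is always checked against field_pair.field_tokens, regardless of which header_context_source was selected, while regex_patterns is checked against the selected search string.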