nlp

`nlp` ¶

Classes:

Name	Description
`FieldStr`	String-optimized field representation for NLP prediction pipelines.

`FieldStr(field, value_path, offset, text)` `dataclass` ¶

String-optimized field representation for NLP prediction pipelines.

Methods:

Name	Description
`from_kv_pair`	Returns a string-optimized input for NLP predictions.
`spacy_doc_to_ner_prediction`	Given a prediction document, return a list of `NERPrediction` objects.

`from_kv_pair(pair)` `classmethod` ¶

Returns a string-optimized input for NLP predictions.

For example, given the key-value pair

{"location": "united states"}

this function merges the pair into the string

"location is united states"

These merged strings produce better prediction results from our NLP pipeline.

Parameters:

Name	Type	Description	Default
`pair`	`KVPair`	`KVPair` from a `JSONRecord` to merge.	required

Returns:

Type	Description
`FieldStr`	An instance of `FieldStr`.
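The merge described above can be sketched without the library itself. In the sketch below, `KVPairSketch` and `FieldStrSketch` are hypothetical stand-ins that carry only the attributes `from_kv_pair` reads, and the value `" is "` for `SPACY_DELIM` is an assumption inferred from the example string; the real constant is not shown on this page.

```python
from dataclasses import dataclass
from typing import Any

# Assumption: the real SPACY_DELIM is not shown here; " is " is inferred
# from the documented example "location is united states".
SPACY_DELIM = " is "


@dataclass
class KVPairSketch:
    """Hypothetical stand-in for KVPair (only the attributes used below)."""

    field: str
    value: Any
    value_path: str
    field_tokens: list[str]


@dataclass
class FieldStrSketch:
    """Hypothetical stand-in for FieldStr."""

    field: str
    value_path: str
    offset: int
    text: str

    @classmethod
    def from_kv_pair(cls, pair: KVPairSketch) -> "FieldStrSketch":
        # Build "<field tokens><delimiter>" only when a field name exists.
        prefix = " ".join(pair.field_tokens) + SPACY_DELIM if pair.field else ""
        return cls(
            field=pair.field,
            value_path=pair.value_path,
            offset=len(prefix),  # where the original value starts in `text`
            text=prefix + str(pair.value),
        )


pair = KVPairSketch(
    field="location",
    value="united states",
    value_path="location",
    field_tokens=["location"],
)
fs = FieldStrSketch.from_kv_pair(pair)
print(fs.text)              # location is united states
print(fs.offset)            # 12
print(fs.text[fs.offset:])  # united states
```

Storing `offset` alongside `text` is what later allows entity positions found in the merged string to be translated back to positions in the original value.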

Source code in src/nemo_safe_synthesizer/pii_replacer/ner/nlp.py

@classmethod
def from_kv_pair(cls, pair: KVPair) -> FieldStr:
    """Returns a string optimized input for NLP predictions.

    For example, given a k,v pair

        {"location": "united states"} this function will

    merge the pair into a string

        "location is united states"

    These merged strings produce better prediction results from our NLP pipeline.

    Args:
        pair: ``KVPair`` from a ``JSONRecord`` to merge.

    Returns:
        An instance of ``FieldStr``.
    """
    prefix = " ".join(pair.field_tokens) + SPACY_DELIM if pair.field else ""
    return cls(
        field=pair.field,
        value_path=pair.value_path,
        offset=len(prefix),
        text=prefix + str(pair.value),
    )

`spacy_doc_to_ner_prediction(doc, source, validator=None)` ¶

Given a prediction document, return a list of `NERPrediction` objects.

This function applies a set of rules to a spaCy doc and extracts predictions based on those rules. Certain predictions are filtered out based on score and entity type.

This function is also responsible for mapping each entity back to its source `KVPair`. Because the doc text includes the field prefix built by `from_kv_pair`, entity character positions are shifted by `offset` so that they index into the original value, and entities that start inside the prefix are skipped.

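Parameters:

Name	Type	Description	Default
`doc`	`Doc`	The spaCy doc to extract entities from.	required
`source`	`str`	The model used to create predictions.	required
`validator`	`Optional[Callable]`	Optional callable invoked on each entity; entities for which it returns a falsy value are skipped.	`None`

Returns:

Type	Description
`list[NERPrediction]`	Predictions for entities found in the value portion of the text.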
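The offset arithmetic used during reconstruction can be illustrated with plain integers and strings. The helper `to_value_span` below is hypothetical and is not part of the module; it mirrors only the start, end, and `substring_match` computations from the method body and assumes the same `"location is "` prefix as the documented example.

```python
from typing import Optional


def to_value_span(
    start_char: int, end_char: int, offset: int, doc_text: str
) -> Optional[tuple[int, int, bool]]:
    """Map an entity span on the merged text back onto the original value.

    Returns None when the entity starts inside the prefix.
    """
    if start_char - offset < 0:
        return None
    start = start_char - offset
    end = end_char - offset
    # True when the entity does not span the whole value.
    substring_match = end - start + offset != len(doc_text)
    return start, end, substring_match


text = "location is united states"
offset = len("location is ")  # 12

print(to_value_span(12, 25, offset, text))  # (0, 13, False): whole value
print(to_value_span(19, 25, offset, text))  # (7, 13, True): only "states"
print(to_value_span(0, 8, offset, text))    # None: entity lies in the prefix
```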

Source code in src/nemo_safe_synthesizer/pii_replacer/ner/nlp.py

def spacy_doc_to_ner_prediction(
    self, doc: Doc, source: str, validator: Optional[Callable] = None
) -> list[NERPrediction]:
    """Given a prediction document, return an NERPrediction.

    This function will apply a set of rules on a Spacy doc and extract predictions
    based on those rules. Certain predictions are filtered out based on score
    and entity type.

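    This function is also responsible for reconstructing the input string into its
    source KVPair. Since spaCy creates spans on texts of different lengths, we account
    for those lengths during reconstruction.

    Args:
        doc: The spaCy doc to extract entities from.
        source: The model used to create predictions.
        validator: Optional callable invoked on each entity; entities for
            which it returns a falsy value are skipped.

    Returns:
        A list of ``NERPrediction`` objects.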
    """
    preds = []
    for ent in doc.ents:
        # Don't create predictions for entities that were found inside the tokenized prefix.
        if ent.start_char - self.offset >= 0:
            if validator is None or validator(ent):
                label = NLP_ENTITY_MAP.get(getattr(ent, "label_"))
                start = ent.start_char - self.offset
                end = ent.end_char - self.offset
                substring_match = end - start + self.offset != len(doc.text)
                if label:
                    pred = NERPrediction(
                        text=ent.text.strip(),
                        start=start,
                        end=end,
                        field=self.field,
                        value_path=self.value_path,
                        label=label.tag,
                        score=doc._.ent_score(label),
                        source=source,  # todo: get source from model instead of cls
                        substring_match=substring_match,
                    )
                    preds.append(pred)
    return preds
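The `validator` argument in the signature above is called once per entity, and an entity is kept only when the validator is `None` or returns a truthy value. A minimal sketch of such a callable follows; `min_length_validator` is a hypothetical example, and `SimpleNamespace` objects stand in for spaCy spans, using only their `text` and `label_` attributes.

```python
from types import SimpleNamespace


def min_length_validator(ent, min_chars: int = 3) -> bool:
    """Hypothetical validator: reject very short entity texts."""
    return len(ent.text.strip()) >= min_chars


# Stand-ins for spaCy entity spans (assumption: only .text and .label_ are needed).
ents = [
    SimpleNamespace(text="US", label_="GPE"),
    SimpleNamespace(text="united states", label_="GPE"),
]

validator = min_length_validator
# Same gating expression as in the method body.
kept = [ent.text for ent in ents if validator is None or validator(ent)]
print(kept)  # ['united states']
```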
