nlp
Classes:
| Name | Description |
|---|---|
| FieldStr | String-optimized field representation for NLP prediction pipelines. |
FieldStr(field, value_path, offset, text)
dataclass
String-optimized field representation for NLP prediction pipelines.
Methods:
| Name | Description |
|---|---|
| from_kv_pair | Returns a string-optimized input for NLP predictions. |
| spacy_doc_to_ner_prediction | Given a prediction document, return an NERPrediction. |
from_kv_pair(pair)
classmethod
Returns a string-optimized input for NLP predictions.
For example, given the key-value pair {"location": "united states"}, this function merges the pair into the string "location is united states".
These merged strings produce better prediction results from our NLP pipeline.
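The merge described above can be sketched in a few lines. This is only an illustration, not the library's implementation: `FieldStrSketch` and `merge_kv_pair` are hypothetical names, the pair is passed as a plain key and value rather than a `KVPair`, and the meanings given to `value_path` and `offset` are assumptions based on the `FieldStr(field, value_path, offset, text)` signature.

```python
from dataclasses import dataclass


@dataclass
class FieldStrSketch:
    """Hypothetical stand-in for FieldStr; field meanings are assumptions."""

    field: str
    value_path: str  # assumed: where the value lives in the source record
    offset: int      # assumed: number of characters prepended before the value
    text: str


def merge_kv_pair(key: str, value: str) -> FieldStrSketch:
    """Merge a key and a value into the string '<key> is <value>'."""
    prefix = f"{key} is "
    return FieldStrSketch(
        field=key,
        value_path=key,
        offset=len(prefix),
        text=prefix + value,
    )


merged = merge_kv_pair("location", "united states")
print(merged.text)                  # location is united states
print(merged.text[merged.offset:])  # united states
```

Recording the prefix length alongside the merged text is one way to let later steps locate the original value inside the merged string.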
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pair | KVPair | | required |
Returns:
| Type | Description |
|---|---|
| FieldStr | An instance of |
Source code in src/nemo_safe_synthesizer/pii_replacer/ner/nlp.py
spacy_doc_to_ner_prediction(doc, source, validator=None)
Given a prediction document, return an NERPrediction.
This function applies a set of rules to a spaCy doc and extracts predictions based on those rules. Certain predictions are filtered out based on score and entity type.
This function is also responsible for reconstructing the input string into its source KVPair. Because spaCy creates spans over texts of different lengths, those lengths are accounted for during reconstruction.
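The filtering and length accounting described above might look roughly like the following. It is a plain-Python sketch that does not call spaCy; `SpanSketch`, `remap_span`, the `min_score` threshold, and the rule that spans overlapping the prefix are dropped are all assumptions made for illustration, not the module's actual rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanSketch:
    """Hypothetical entity span, in character offsets of the merged text."""

    start: int
    end: int
    label: str
    score: float


def remap_span(
    span: SpanSketch, prefix_len: int, min_score: float = 0.5
) -> Optional[SpanSketch]:
    """Shift a span from merged-text coordinates into value coordinates.

    Returns None when the span scores below min_score (an assumed threshold)
    or starts inside the prepended "<key> is " prefix.
    """
    if span.score < min_score:
        return None
    if span.start < prefix_len:
        return None
    return SpanSketch(
        span.start - prefix_len, span.end - prefix_len, span.label, span.score
    )


text = "location is united states"
prefix_len = len("location is ")
value = text[prefix_len:]

# An illustrative span covering "united states" in the merged text.
span = SpanSketch(start=12, end=25, label="GPE", score=0.9)
remapped = remap_span(span, prefix_len)
print(value[remapped.start:remapped.end])  # united states
```

Subtracting the prefix length is what keeps the reported positions valid against the original value rather than the merged string.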
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The spaCy doc to extract entities from. | required |
| source | str | The model used to create predictions. | required |