fasttext
Classes:

| Name | Description |
|---|---|
| `FTEntityMatcher` | |
Attributes:

| Name | Type | Description |
|---|---|---|
| `manifest` | `ModelManifest` | Defines the `FTEntityMatcher` model files |
```python
manifest = ModelManifest(
    model='fasttext',
    version='3',
    sources=[
        ObjectRef(key='pos_neg_terms', file_name='FT_posneg_terms.pickle'),
        ObjectRef(key='ft_word_vecs', file_name='FTwordvecsnormPCA.pickle'),
        ObjectRef(key='ft_ngram_vecs', file_name='FTngramvecsPCA.pickle'),
    ],
    visibility=Visibility.INTERNAL,
)
```
module-attribute
Defines FTEntityMatcher model files
Changelog
- v1: initial model release
- v2: update pos_neg_terms to include state and county tags (GC-59)
- v3: false-positive (FP) fixes: `capital_loss` and `capital_gain` are no longer marked as locations
FTEntityMatcher(*, pos_neg_terms, ft_word_vecs, ft_ngram_vecs)
Methods:

| Name | Description |
|---|---|
| `compute_ngrams_bytes` | Compute the n-grams of a word as byte strings (adapted from fastText). |
| `ft_hash_bytes` | Reproduces the hash function used by the fastText dictionary. |
| `norm` | Normalize a vector. |
| `get_ft_vec` | Get the fastText vector for a word. If the word is out of vocabulary (OOV), average the vectors of its n-grams. |
| `vec_sim` | Compute the cosine similarity between two vectors. |
| `ent_score` | Determine `NERPrediction.score`. All fastText (FT) models should have the same `max_score`. |
Attributes:

| Name | Type | Description |
|---|---|---|
| `max_score` | `float` | All fastText predictions are assigned `max_score`. |
| `VEC_SIM_SCORE` | `str` | Default key on the spaCy doc where the vector similarity score is stored. |
Source code in src/nemo_safe_synthesizer/pii_replacer/ner/fasttext.py
max_score = 0.8
class-attribute
instance-attribute
All fastText predictions are assigned `max_score`.
VEC_SIM_SCORE = 'VEC_SIM_SCORE'
class-attribute
instance-attribute
Default key on the spaCy doc where the vector similarity score is stored.
compute_ngrams_bytes(word, min_n, max_n)
Compute the n-grams of `word`, with lengths from `min_n` to `max_n`, as byte strings. Adapted from fastText.
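A minimal sketch of how fastText itself extracts subword n-grams (`Dictionary::computeSubwords` in `dictionary.cc`), assuming this method mirrors it. The word is wrapped in `<` and `>` boundary markers and encoded as UTF-8; n-grams are counted in characters, not bytes, so UTF-8 continuation bytes are never split; and a single-character n-gram that is only a boundary marker is skipped. This is an illustration, not the module's actual source.

```python
def compute_ngrams_bytes(word: str, min_n: int, max_n: int) -> list[bytes]:
    """Return fastText-style character n-grams of `word` as UTF-8 byte strings.

    Sketch assuming the behavior of fastText's Dictionary::computeSubwords.
    """
    utf8_word = f"<{word}>".encode("utf-8")  # add boundary markers
    num_bytes = len(utf8_word)
    ngrams: list[bytes] = []
    for i in range(num_bytes):
        # Skip UTF-8 continuation bytes (10xxxxxx) so every n-gram starts on a character.
        if (utf8_word[i] & 0xC0) == 0x80:
            continue
        j, n = i, 1
        while j < num_bytes and n <= max_n:
            j += 1
            # Extend j past the continuation bytes of the current character.
            while j < num_bytes and (utf8_word[j] & 0xC0) == 0x80:
                j += 1
            # Keep n-grams within [min_n, max_n], but drop a lone "<" or ">".
            if n >= min_n and not (n == 1 and (i == 0 or j == num_bytes)):
                ngrams.append(utf8_word[i:j])
            n += 1
    return ngrams
```

For example, `compute_ngrams_bytes("ab", 3, 3)` returns `[b"<ab", b"ab>"]`.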
ft_hash_bytes(bytez)
Reproduces the hash function used by the fastText dictionary.
Source: https://github.com/facebookresearch/fastText/blob/master/src/dictionary.cc
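fastText's `Dictionary::hash` in `dictionary.cc` is a 32-bit FNV-1a hash with one quirk: each byte is cast to a signed `int8_t` before being widened to `uint32_t`, so bytes above 127 are sign-extended. A pure-Python sketch of that hash, assuming `ft_hash_bytes` reproduces it exactly:

```python
def ft_hash_bytes(bytez: bytes) -> int:
    """32-bit FNV-1a variant from fastText's Dictionary::hash (sketch)."""
    h = 2166136261  # FNV-1a 32-bit offset basis
    for b in bytez:
        # Reinterpret the byte as int8_t, then widen to uint32_t (sign extension).
        signed = b - 256 if b > 127 else b
        h ^= signed & 0xFFFFFFFF
        # Multiply by the 32-bit FNV prime and wrap like uint32_t overflow.
        h = (h * 16777619) & 0xFFFFFFFF
    return h
```

For pure-ASCII input the quirk has no effect, so the result equals standard FNV-1a; for example, `ft_hash_bytes(b"a")` is `0xE40C292C`. Only non-ASCII bytes (such as those in accented words) diverge from the textbook hash.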
norm(vec)
staticmethod
Normalize a vector.
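The documentation says only that `norm` normalizes a vector. Scaling to unit L2 length is the usual choice when vectors are later compared by cosine similarity, but that is an assumption here, as is the zero-vector handling. A standard-library sketch over plain lists (the real module may operate on NumPy arrays instead):

```python
import math
from typing import Sequence


def norm(vec: Sequence[float]) -> list[float]:
    """Scale `vec` to unit L2 length (assumed semantics of `norm`)."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0.0:
        # A zero vector has no direction; return it unchanged rather than divide by zero.
        return [float(x) for x in vec]
    return [x / length for x in vec]
```

For example, `norm([3.0, 4.0])` returns `[0.6, 0.8]`.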
get_ft_vec(word)
Get the fastText vector for a word. If the word is out of vocabulary (OOV), gather the vectors for its character n-grams and average them.
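A self-contained sketch of the OOV fallback, assuming the fastText convention that n-gram vectors live in a fixed table of hash buckets indexed by `hash(ngram) % num_buckets`. The names `word_vecs`, `ngram_buckets`, `char_ngrams`, and `fasttext_hash` are hypothetical stand-ins for the module's `ft_word_vecs` and `ft_ngram_vecs` data and its helpers, and `min_n=3`, `max_n=6` are fastText's defaults, not values confirmed for this model.

```python
from typing import Optional, Sequence

Vector = list[float]


def fasttext_hash(data: bytes) -> int:
    """fastText's FNV-1a variant (bytes sign-extended as int8_t)."""
    h = 2166136261
    for b in data:
        h ^= (b - 256 if b > 127 else b) & 0xFFFFFFFF
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def char_ngrams(word: str, min_n: int, max_n: int) -> list[str]:
    """Character n-grams of '<word>', skipping a lone boundary marker."""
    w = f"<{word}>"
    grams = []
    for i in range(len(w)):
        for n in range(min_n, max_n + 1):
            if i + n > len(w):
                break
            if n == 1 and (i == 0 or i + n == len(w)):
                continue  # "<" or ">" on its own
            grams.append(w[i : i + n])
    return grams


def get_ft_vec(
    word: str,
    word_vecs: dict[str, Vector],
    ngram_buckets: Sequence[Vector],
    min_n: int = 3,
    max_n: int = 6,
) -> Optional[Vector]:
    """Return the in-vocabulary vector, else the mean of the word's n-gram vectors."""
    if word in word_vecs:
        return word_vecs[word]
    # OOV: find each n-gram's bucket by hashing its UTF-8 bytes.
    vecs = [
        ngram_buckets[fasttext_hash(g.encode("utf-8")) % len(ngram_buckets)]
        for g in char_ngrams(word, min_n, max_n)
    ]
    if not vecs:
        return None  # no n-grams in range (e.g. an empty word with min_n=3)
    dim = len(vecs[0])
    # Element-wise average of the gathered n-gram vectors.
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]
```

Hashing into buckets keeps memory fixed no matter how many distinct n-grams occur, at the cost of occasional collisions between unrelated n-grams.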
vec_sim(a, b)
Compute the cosine similarity between two vectors.
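Cosine similarity is cos θ = a·b / (‖a‖ ‖b‖), which ranges from -1 to 1. A standard-library sketch; returning 0.0 when either vector is all zeros is an assumption, since the documentation does not say how that case is handled.

```python
import math
from typing import Sequence


def vec_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: similarity with a zero vector is treated as 0
    return dot / (norm_a * norm_b)
```

If both inputs are already unit length, the denominator is 1 and the result is just the dot product.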
ent_score(doc, ent)
staticmethod
Determine `NERPrediction.score`. All fastText (FT) models should have the same `max_score`.