
pipeline


Classes:

Name Description
PredictionSource

This enum stores default source tags for NLP and other more complex predictors.

Pipeline

A lightweight container class for managing prediction pipelines.

Functions:

Name Description
regex_pipeline

Returns a pipeline with regex predictors

default_pipeline

Returns a pipeline with all regex predictors plus the DateTime, BirthDateTime, and PersonName predictors

create_default_ner

Helper function that creates a NER instance already configured with the default pipeline

Attributes:

Name Type Description
CUSTOM_CONFIG

User defined custom regex predictors and patterns

CUSTOM_CONFIG = 'config.yml' (module attribute)

User defined custom regex predictors and patterns

PredictionSource

Bases: Enum

This enum stores default source tags for NLP and other more complex predictors: those with associated "models" that must be downloaded, or that are not loaded automatically from a sub-package structure the way regex predictors are.
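A minimal sketch of how a source-tag enum like this might be defined and used. The member names and tag strings below (NAME_MODEL, LOCATION_MODEL) are hypothetical; the actual members of PredictionSource are not listed on this page.

```python
from enum import Enum


class PredictionSource(Enum):
    """Hypothetical stand-in: member names and values are illustrative only."""

    NAME_MODEL = "name_model"
    LOCATION_MODEL = "location_model"


# Each member carries a stable string tag that can be compared against a
# predictor's ``source`` attribute, e.g. when calling get_predictor.
tag = PredictionSource.NAME_MODEL.value
print(tag)  # name_model

# Looking up a member by its tag value returns the matching member.
member = PredictionSource("location_model")
```

Keeping tags in an enum rather than as scattered string literals gives one place to rename or add a source.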

Pipeline(predictors=None)

A lightweight container class for managing prediction pipelines.

Methods:

Name Description
get_predictor

Returns the first predictor in the pipeline whose source matches the given name

add_predictors_from_yaml

Look for a custom config file with regex predictor data and load them into the pipeline

Attributes:

Name Type Description
predictor_list list[str]

All namespaced predictors currently loaded on the pipeline.

Source code in src/nemo_safe_synthesizer/pii_replacer/ner/pipeline.py
def __init__(self, predictors: list[Predictor] | None = None):
    # Use a fresh list per instance when no predictors are given
    self.predictors = predictors or []
    self.load_timings = {}
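The predictors or [] idiom gives each instance its own list when no predictors are passed, avoiding the shared-mutable-default pitfall of writing predictors=[] in the signature. The sketch below uses a hypothetical Pipeline stand-in (not the library class) to show the effect:

```python
from typing import Any, Optional


class Pipeline:
    """Hypothetical stand-in mirroring the constructor shown above."""

    def __init__(self, predictors: Optional[list[Any]] = None):
        # ``or []`` creates a new empty list for each instance when
        # predictors is None (or any other falsy value, such as []).
        self.predictors = predictors or []
        self.load_timings = {}


a = Pipeline()
b = Pipeline()
a.predictors.append("regex")

# b is unaffected because it received its own empty list.
print(len(a.predictors), len(b.predictors))  # 1 0
```

One side effect of this idiom: an explicitly passed empty list is also replaced by a new list, so the pipeline does not keep a reference to the caller's empty list object.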

predictor_list property

All namespaced predictors currently loaded on the pipeline.

get_predictor(source)

Returns the first predictor in the pipeline whose source matches the given name

Parameters:

Name Type Description Default
source str

Source name to match against each predictor's source attribute

required

Returns:

Type Description
Predictor | None

The first matching predictor, or None if no predictor has that source

Source code in src/nemo_safe_synthesizer/pii_replacer/ner/pipeline.py
def get_predictor(self, source: str) -> Predictor | None:
    """Returns the first predictor in the pipeline whose source matches

    Args:
        source: source name to match against each predictor's ``source``

    Returns:
        the first matching predictor, or None if no predictor matches
    """
    # next() with a default returns None instead of raising StopIteration
    return next((p for p in self.predictors if p.source == source), None)
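A small usage sketch of the first-match lookup. StubPredictor, the standalone get_predictor function, and the source strings are hypothetical stand-ins used only to show the behavior: the earliest predictor with a matching source wins, and an unknown source yields None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubPredictor:
    """Hypothetical predictor exposing only a ``source`` and a label."""

    source: str
    name: str


def get_predictor(predictors: list[StubPredictor], source: str) -> Optional[StubPredictor]:
    # Same first-match logic as Pipeline.get_predictor, with None as fallback.
    return next((p for p in predictors if p.source == source), None)


predictors = [
    StubPredictor(source="regex", name="email"),
    StubPredictor(source="regex", name="phone"),
    StubPredictor(source="datetime", name="dates"),
]

print(get_predictor(predictors, "regex").name)  # email (first match wins)
print(get_predictor(predictors, "location"))    # None
```

Because only the first match is returned, predictor order in the pipeline matters when several predictors share a source.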

add_predictors_from_yaml(file_path=CUSTOM_CONFIG)

Look for a custom config file with regex predictor data and load them into the pipeline

Source code in src/nemo_safe_synthesizer/pii_replacer/ner/pipeline.py
def add_predictors_from_yaml(self, file_path: str = CUSTOM_CONFIG):
    """Look for a custom config file with regex predictor data and
    load them into the pipeline
    """
    _path = Path(file_path)
    if not _path.is_file():
        logger.info("Custom Predictors: Not Found, skipping")
        return
    logger.info("Custom Predictors: loading from %s", file_path)
    self.add_predictors(get_predictors_from_yaml(_path))
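The method treats a missing config file as a normal case: it logs and returns without changing the pipeline. The sketch below isolates that skip-if-missing pattern. load_custom_predictors and fake_loader are hypothetical helpers; fake_loader stands in for get_predictors_from_yaml, whose YAML schema is not documented on this page, so it simply treats each non-empty line as one "predictor".

```python
import logging
import tempfile
from pathlib import Path
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_custom_predictors(
    file_path: str, loader: Callable[[Path], list[Any]]
) -> list[Any]:
    """Hypothetical helper: return loaded predictors, or [] if the file is absent."""
    path = Path(file_path)
    if not path.is_file():
        # A missing config is not an error; log it and load nothing.
        logger.info("Custom Predictors: Not Found, skipping")
        return []
    logger.info("Custom Predictors: loading from %s", file_path)
    return loader(path)


def fake_loader(path: Path) -> list[str]:
    # Hypothetical stand-in for get_predictors_from_yaml.
    return [line for line in path.read_text().splitlines() if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    missing = load_custom_predictors(str(Path(tmp) / "missing.yml"), fake_loader)
    cfg = Path(tmp) / "config.yml"
    cfg.write_text("employee_id\nbadge_number\n")
    found = load_custom_predictors(str(cfg), fake_loader)

print(missing)  # []
print(found)    # ['employee_id', 'badge_number']
```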

regex_pipeline()

Returns a pipeline with regex predictors

Source code in src/nemo_safe_synthesizer/pii_replacer/ner/pipeline.py
def regex_pipeline() -> Pipeline:
    """Returns a pipeline with regex predictors"""
    return Pipeline.from_class_refs(rules)
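Pipeline.from_class_refs and rules are not documented on this page. A plausible reading is that rules is a list of regex predictor classes and from_class_refs instantiates each one. The sketch below implements that assumption with hypothetical classes (EmailRegex, PhoneRegex) and a hypothetical Pipeline stand-in; it is not the library's implementation.

```python
from typing import Any, Optional


class EmailRegex:
    """Hypothetical regex predictor class."""

    source = "regex"


class PhoneRegex:
    """Hypothetical regex predictor class."""

    source = "regex"


class Pipeline:
    """Hypothetical stand-in; not the library's Pipeline."""

    def __init__(self, predictors: Optional[list[Any]] = None):
        self.predictors = predictors or []

    @classmethod
    def from_class_refs(cls, class_refs: list[type]) -> "Pipeline":
        # Assumption: each class reference is instantiated with no arguments.
        return cls([ref() for ref in class_refs])


rules = [EmailRegex, PhoneRegex]  # hypothetical list of predictor classes
pipe = Pipeline.from_class_refs(rules)
print([type(p).__name__ for p in pipe.predictors])  # ['EmailRegex', 'PhoneRegex']
```

Passing classes rather than instances lets the pipeline decide when predictors are constructed.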

default_pipeline()

Returns a pipeline with the following predictors:

- All regexes
- DateTime
- BirthDateTime
- PersonName
- Locations (ER + FT)

Source code in src/nemo_safe_synthesizer/pii_replacer/ner/pipeline.py
def default_pipeline() -> Pipeline:
    """Returns a pipeline with the following predictors:
    - All regexes
    - DateTime
    - BirthDateTime
    - PersonName
    - Locations (ER + FT)
    """
    pipe = regex_pipeline().add_predictors(
        [
            DateTime(),
            BirthDateTime(),
            person_name.PersonNamePredictor(),
        ]
    )
    return pipe
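The assignment pipe = regex_pipeline().add_predictors([...]) only yields a pipeline if add_predictors returns the pipeline itself. The body of add_predictors is not shown on this page, so the sketch below assumes a chainable version; the Pipeline stand-in and the placeholder predictor strings are hypothetical.

```python
from typing import Any, Optional


class Pipeline:
    """Hypothetical stand-in showing a chainable add_predictors."""

    def __init__(self, predictors: Optional[list[Any]] = None):
        self.predictors = predictors or []

    def add_predictors(self, predictors: list[Any]) -> "Pipeline":
        self.predictors.extend(predictors)
        # Returning self is what makes ``pipe = base().add_predictors([...])`` work.
        return self


def regex_pipeline() -> Pipeline:
    # Placeholder strings stand in for real regex predictor instances.
    return Pipeline(["email_regex", "phone_regex"])


pipe = regex_pipeline().add_predictors(["DateTime", "BirthDateTime", "PersonName"])
print(pipe.predictors)
```

If add_predictors returned None instead of self, pipe would be None and default_pipeline would return None.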

create_default_ner(full=False)

Helper function that creates a NER instance already configured with the default pipeline. The full argument is accepted but not used by this implementation.

Source code in src/nemo_safe_synthesizer/pii_replacer/ner/pipeline.py
def create_default_ner(full: bool = False) -> NER:
    """Helper function that creates a NER
    instance already configured with the default pipeline
    """
    pipe = default_pipeline()
    return NER(pipeline=pipe)