replace_pii

Classes:

Name Description
Column

Rule matcher for selecting columns by name, position, condition, entity, or type.

ColumnActions

Container for column add, drop, and rename operations.

Row

Rule matcher for selecting rows by name, condition, entity, or type.

RowActions

Container for row drop and update operations.

StepDefinition

Single transformation step with optional variables, column actions, and row actions.

GlinerConfig

Configuration for the GLiNER named-entity recognition model.

NERConfig

Configuration for Named Entity Recognition.

ClassifyConfig

Configuration for column classification using an LLM.

Globals

Global settings for the PII replacer including locales, seed, NER, and classification.

PiiReplacerConfig

Configuration for PII replacer.

Column pydantic-model

Bases: NSSBaseModel

Rule matcher for selecting columns by name, position, condition, entity, or type.

Fields: name, position, condition, value, entity, type

Validators: identifier_required

name = None pydantic-field

Column name to match.

position = None pydantic-field

Column position to match.

condition = None pydantic-field

Condition a column must satisfy to match.

value = None pydantic-field

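New column name when the rule is used to rename a column. Unlike the other fields, value does not identify a column, so a rule cannot set it on its own.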

entity = None pydantic-field

Match columns by entity.

type = None pydantic-field

Match columns by type.

identifier_required(values) pydantic-validator

Ensure at least one column identifier field is provided.

Source code in src/nemo_safe_synthesizer/config/replace_pii.py
@model_validator(mode="before")
@classmethod
def identifier_required(cls, values):
    """Ensure at least one column identifier field is provided."""
    # Handle both dict and model instance cases (Pydantic v2 compatibility)
    if not isinstance(values, dict):
        return values
    if (
        values.get("name") is None
        and values.get("condition") is None
        and values.get("entity") is None
        and values.get("position") is None
        and values.get("type") is None
    ):
        raise ValueError("column rule must contain one of name, position, entity, type or condition.")
    return values
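The check above accepts a rule as soon as any one of name, position, condition, entity, or type is not None, so position=0 counts as an identifier, while a rule that only sets value is rejected. The following is a minimal stdlib sketch of the same logic, for example to pre-check rule dictionaries loaded from YAML before building models; the helper name column_rule_is_valid is hypothetical and not part of nemo_safe_synthesizer.

```python
# Hypothetical helper mirroring Column.identifier_required; not part of nemo_safe_synthesizer.
COLUMN_IDENTIFIERS = ("name", "position", "condition", "entity", "type")


def column_rule_is_valid(rule: dict) -> bool:
    """Return True if at least one column identifier field is set (not None)."""
    return any(rule.get(key) is not None for key in COLUMN_IDENTIFIERS)


print(column_rule_is_valid({"name": "email"}))           # True
print(column_rule_is_valid({"position": 0}))             # True: 0 is not None
print(column_rule_is_valid({"value": "contact_email"}))  # False: value is not an identifier
```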

ColumnActions pydantic-model

Bases: NSSBaseModel

Container for column add, drop, and rename operations.

Fields:

add = None pydantic-field

Columns to add.

drop = None pydantic-field

Columns to drop.

rename = None pydantic-field

Columns to rename.
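A minimal sketch of what drop and rename do to tabular records. It assumes that rules match by exact column name only (position, condition, entity, and type matching are omitted), that drop runs before rename, and that a rename rule's value holds the new column name, as the Column.value description states. apply_column_actions is a hypothetical helper, not part of the package.

```python
# Hypothetical sketch of ColumnActions semantics on a list of dict records.
# Assumptions: rules match only by exact column name, and drop runs before rename.
from typing import Optional


def apply_column_actions(
    records: list,
    drop: Optional[list] = None,
    rename: Optional[list] = None,
) -> list:
    drop_names = {rule["name"] for rule in drop or []}
    # For rename rules, `value` holds the new column name.
    renames = {rule["name"]: rule["value"] for rule in rename or []}
    out = []
    for record in records:
        kept = {k: v for k, v in record.items() if k not in drop_names}
        out.append({renames.get(k, k): v for k, v in kept.items()})
    return out


rows = [{"id": 1, "ssn": "123-45-6789", "mail": "a@example.com"}]
print(apply_column_actions(rows, drop=[{"name": "ssn"}], rename=[{"name": "mail", "value": "email"}]))
# [{'id': 1, 'email': 'a@example.com'}]
```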

Row pydantic-model

Bases: NSSBaseModel

Rule matcher for selecting rows by name, condition, entity, or type.

Fields:

Validators:

name = None pydantic-field

Row name.

condition = None pydantic-field

Condition a row must satisfy to match.

foreach = None pydantic-field

Foreach expression. A rule that sets foreach must also set value.

value = None pydantic-field

Value definition for matched rows.

entity = None pydantic-field

Match rows by entity.

type = None pydantic-field

Match rows by type.

fallback_value = None pydantic-field

Row fallback value.

description = None pydantic-field

Human-readable description of the rule.

identifier_required(values) pydantic-validator

Ensure at least one row identifier field is provided.

Source code in src/nemo_safe_synthesizer/config/replace_pii.py
@model_validator(mode="before")
@classmethod
def identifier_required(cls, values):
    """Ensure at least one row identifier field is provided."""
    # Handle both dict and model instance cases (Pydantic v2 compatibility)
    if not isinstance(values, dict):
        return values
    if (
        values.get("name") is None
        and values.get("condition") is None
        and values.get("entity") is None
        and values.get("type") is None
    ):
        raise ValueError("row rule must contain one of name, entity, type or condition.")

    if values.get("foreach") is not None and values.get("value") is None:
        raise ValueError(
            "foreach without value field. If a rule contains foreach, it must also "
            "include a value field to iterate on."
        )
    return values
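Row rules differ from column rules in two ways: position is not an accepted identifier, and setting foreach without value is an error. A stdlib sketch mirroring both checks follows; validate_row_rule is a hypothetical helper, and the foreach string is a placeholder because the expression syntax is not documented on this page.

```python
# Hypothetical helper mirroring Row.identifier_required; not part of nemo_safe_synthesizer.
ROW_IDENTIFIERS = ("name", "condition", "entity", "type")


def validate_row_rule(rule: dict) -> dict:
    """Raise ValueError if the row rule would be rejected; otherwise return it."""
    if all(rule.get(key) is None for key in ROW_IDENTIFIERS):
        raise ValueError("row rule must contain one of name, entity, type or condition.")
    if rule.get("foreach") is not None and rule.get("value") is None:
        raise ValueError("foreach without value field.")
    return rule


validate_row_rule({"entity": "email", "value": "<redacted>"})  # accepted
try:
    validate_row_rule({"name": "tags", "foreach": "<expr>"})  # placeholder expression
except ValueError as err:
    print(err)  # foreach without value field.
```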

RowActions pydantic-model

Bases: NSSBaseModel

Container for row drop and update operations.

Fields:

drop = None pydantic-field

Rows to drop.

update = None pydantic-field

Rows to update.
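A hypothetical sketch of the effect of row drop and update actions on dict records. Plain Python predicates stand in for the package's condition expressions, whose syntax is not shown here, and evaluating drop before update is an assumption. apply_row_actions and its parameters are illustrative names only.

```python
# Hypothetical sketch of RowActions semantics on dict records; not the package's API.
# Assumptions: predicates stand in for condition expressions; drop is checked first.
from typing import Any, Callable

Record = dict
Predicate = Callable[[Record], bool]


def apply_row_actions(
    records: list,
    drop_if: Predicate,
    update_if: Predicate,
    column: str,
    new_value: Callable[[Record], Any],
) -> list:
    result = []
    for record in records:
        if drop_if(record):
            continue  # matched a drop rule: remove the row entirely
        if update_if(record):
            # matched an update rule: copy the row with one column replaced
            record = {**record, column: new_value(record)}
        result.append(record)
    return result


people = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "", "email": None},
]
print(apply_row_actions(
    people,
    drop_if=lambda r: not r["name"],
    update_if=lambda r: r["email"] is not None,
    column="email",
    new_value=lambda r: "user@example.invalid",
))
# [{'name': 'Ada', 'email': 'user@example.invalid'}]
```

Copying the record on update, instead of mutating it, leaves the input list unchanged.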

StepDefinition pydantic-model

Bases: NSSBaseModel

Single transformation step with optional variables, column actions, and row actions.

Fields:

vars = None pydantic-field

Variable names and templates.

columns = None pydantic-field

Column transformation configuration (column add, drop, and rename actions).

rows = None pydantic-field

Row transformation configuration (row drop and update actions).
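Putting the models together, a single step could be written as the nested dictionary below. The field names (columns.drop, columns.rename, rows.update, name, value, entity) come from the models on this page; the column names, the entity label, the "<replacement template>" placeholder, and the assumption that rows.update holds Row rules are illustrative. Whether NSSBaseModel rejects unknown keys is not documented here, so unknown_keys is only a hypothetical pre-check.

```python
# Illustrative StepDefinition-shaped dict; field names come from the models above,
# but the column names and "<...>" placeholders are hypothetical.
ALLOWED_KEYS = {
    "step": {"vars", "columns", "rows"},
    "columns": {"add", "drop", "rename"},
    "rows": {"drop", "update"},
}

step = {
    "columns": {
        "drop": [{"name": "internal_id"}],
        "rename": [{"name": "mail", "value": "email"}],
    },
    "rows": {
        "update": [{"entity": "email", "value": "<replacement template>"}],
    },
}


def unknown_keys(step: dict) -> list:
    """List keys that StepDefinition, ColumnActions, or RowActions do not define."""
    bad = [f"step.{k}" for k in step if k not in ALLOWED_KEYS["step"]]
    for section in ("columns", "rows"):
        for key in step.get(section) or {}:
            if key not in ALLOWED_KEYS[section]:
                bad.append(f"{section}.{key}")
    return bad


print(unknown_keys(step))                         # []
print(unknown_keys({"columns": {"remove": []}}))  # ['columns.remove']
```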

GlinerConfig pydantic-model

Bases: NSSBaseModel

Configuration for the GLiNER named-entity recognition model.

Fields:

enable_gliner = True pydantic-field

Enable GLiNER NER module.

enable_batch_mode = True pydantic-field

Enable GLiNER batch mode.

batch_size = 8 pydantic-field

GLiNER batch size.

chunk_length = 512 pydantic-field

Length of each GLiNER input chunk, in characters.

gliner_model = 'nvidia/gliner-PII' pydantic-field

GLiNER model name.
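As a rough illustration of how chunk_length (characters per chunk) and batch_size (chunks per model call) relate, the sketch below splits text into fixed 512-character windows and groups them eight at a time, matching the defaults above. The windowing is an assumption: the package may split on word or token boundaries or use overlapping chunks.

```python
# Minimal sketch of how chunk_length and batch_size could interact.
# Assumption: fixed-size character windows with no overlap; the package's
# actual chunking strategy may differ.
from typing import Iterator


def chunk_text(text: str, chunk_length: int = 512) -> list:
    return [text[i : i + chunk_length] for i in range(0, len(text), chunk_length)]


def batches(chunks: list, batch_size: int = 8) -> Iterator[list]:
    for i in range(0, len(chunks), batch_size):
        yield chunks[i : i + batch_size]


chunks = chunk_text("x" * 5000)           # 10 chunks: 9 of 512 chars + 1 of 392 chars
print([len(b) for b in batches(chunks)])  # [8, 2]
```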

NERConfig pydantic-model

Bases: NSSBaseModel

Configuration for Named Entity Recognition.

Fields:

ner_threshold = 0.3 pydantic-field

Confidence score threshold for NER model predictions.

enable_regexps = False pydantic-field

Enable NER regular expressions (experimental).

gliner = GlinerConfig() pydantic-field

GLiNER NER configuration.

ner_entities = None pydantic-field

List of entity types to recognize. If unset, the entity types from the classification configuration are used.
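The fallback described for ner_entities, together with ner_threshold, can be sketched as follows. The (text, label, score) prediction format, the entity labels, treating only None as "unset", and keeping scores equal to the threshold are assumptions for illustration, not documented package behavior.

```python
# Hypothetical sketch: resolving the NER entity list and applying ner_threshold.
# Assumptions: predictions are (text, label, score) tuples; score == threshold is kept.
from typing import Optional


def resolve_entities(ner_entities: Optional[list], classify_entities: Optional[list]) -> list:
    # ner_entities wins when set; otherwise fall back to the classification entity types.
    if ner_entities is not None:
        return ner_entities
    return list(classify_entities or [])


def filter_predictions(preds: list, entities: list, threshold: float = 0.3) -> list:
    wanted = set(entities)
    return [p for p in preds if p[1] in wanted and p[2] >= threshold]


entities = resolve_entities(None, ["email", "phone_number"])
preds = [
    ("a@example.com", "email", 0.91),
    ("555-0100", "phone_number", 0.12),  # below the 0.3 default threshold
    ("Ada", "first_name", 0.8),          # label not in the entity list
]
print(filter_predictions(preds, entities))  # [('a@example.com', 'email', 0.91)]
```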

ClassifyConfig pydantic-model

Bases: NSSBaseModel

Configuration for column classification using an LLM.

Fields:

enable_classify = None pydantic-field

Enable column classification.

entities = None pydantic-field

List of entity types to classify.

num_samples = 3 pydantic-field

Number of column values to sample for classification.

classify_model_provider = None pydantic-field

Name of the model provider in the Inference Gateway to use for column classification. The job compiler resolves this name to the appropriate endpoint URL.
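num_samples sets how many values from a column are sampled for classification. A hypothetical sketch of that sampling step follows. It assumes that empty values are skipped, that duplicates are removed, and that a seed (for example Globals.seed) makes the choice reproducible. None of these details are documented on this page.

```python
# Hypothetical sketch of sampling column values for LLM classification.
# Assumptions: skip None/"" values, de-duplicate, seed for reproducibility.
import random
from typing import Optional


def sample_column_values(values: list, num_samples: int = 3, seed: Optional[int] = None) -> list:
    # dict.fromkeys de-duplicates while preserving first-seen order
    candidates = list(dict.fromkeys(v for v in values if v not in (None, "")))
    if len(candidates) <= num_samples:
        return candidates
    return random.Random(seed).sample(candidates, num_samples)


column = ["a@example.com", None, "b@example.com", "a@example.com", "", "c@example.com", "d@example.com"]
print(sample_column_values(column, num_samples=3, seed=42))
```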

Globals pydantic-model

Bases: NSSBaseModel

Global settings for the PII replacer including locales, seed, NER, and classification.

Fields: locales, seed, classify, ner, lock_columns

Validators:

locales = None pydantic-field

List of locales.

seed = None pydantic-field

Optional random seed.

classify pydantic-field

Column classification configuration.

ner pydantic-field

Named Entity Recognition configuration.

lock_columns = None pydantic-field

List of columns to preserve as immutable across all transformations.
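lock_columns keeps the listed columns immutable across all transformations. A minimal sketch of that contract on a single dict record follows. transform_record and the redaction lambda are hypothetical, and this page does not show how the package enforces the lock internally.

```python
# Hypothetical sketch of honoring lock_columns: every column may be transformed
# except the locked ones, which keep their original values.
from typing import Any, Callable, Optional


def transform_record(
    record: dict,
    transform: Callable[[str, Any], Any],
    lock_columns: Optional[list] = None,
) -> dict:
    locked = set(lock_columns or [])
    return {col: val if col in locked else transform(col, val) for col, val in record.items()}


record = {"id": "u-001", "email": "ada@example.com"}
print(transform_record(record, lambda col, val: "<redacted>", lock_columns=["id"]))
# {'id': 'u-001', 'email': '<redacted>'}
```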

PiiReplacerConfig pydantic-model

Bases: Parameters

Configuration for PII replacer.

Defines how PII in a dataset is detected and replaced.

Fields:

globals pydantic-field

Global configuration options.

steps pydantic-field

List of transformation steps to perform on input data.

get_default_config() classmethod

Return a default configuration loaded from the embedded YAML template.

Source code in src/nemo_safe_synthesizer/config/replace_pii.py
@classmethod
def get_default_config(cls) -> Self:
    """Return a default configuration loaded from the embedded YAML template."""
    return cls.from_yaml_str(DEFAULT_PII_TRANSFORM_CONFIG)