replace_pii
Classes:
| Name | Description |
|---|---|
| Column | Rule matcher for selecting columns by name, position, condition, entity, or type. |
| ColumnActions | Container for column add, drop, and rename operations. |
| Row | Rule matcher for selecting rows by name, condition, entity, or type. |
| RowActions | Container for row drop and update operations. |
| StepDefinition | Single transformation step with optional variables, column actions, and row actions. |
| GlinerConfig | Configuration for the GLiNER named-entity recognition model. |
| ClassifyConfig | Configuration for column classification using an LLM. |
| Globals | Global settings for the PII replacer including locales, seed, NER, and classification. |
| PiiReplacerConfig | Configuration for PII replacer. |
Column
pydantic-model
Bases: NSSBaseModel
Rule matcher for selecting columns by name, position, condition, entity, or type.
Fields:
- name (str | None)
- position (OptionalListOrInt)
- condition (str | None)
- value (str | None)
- entity (OptionalListOrStr)
- type (OptionalListOrStr)
Validators:
- identifier_required
name = None
pydantic-field
Column name.
position = None
pydantic-field
Column position.
condition = None
pydantic-field
Column condition.
value = None
pydantic-field
Value to rename the column to.
entity = None
pydantic-field
Column entity match.
type = None
pydantic-field
Column type match.
identifier_required(values)
pydantic-validator
Ensure at least one column identifier field is provided.
Source code in src/nemo_safe_synthesizer/config/replace_pii.py
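The "at least one identifier" check can be sketched with a standard-library dataclass so that it runs on its own. `ColumnRule` and `IDENTIFIER_FIELDS` are hypothetical stand-ins, not the library's code, and treating every field except `value` as an identifier is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Assumption: these fields identify a column; `value` is treated as a payload,
# not an identifier. The real validator may use a different set.
IDENTIFIER_FIELDS = ("name", "position", "condition", "entity", "type")


@dataclass
class ColumnRule:
    """Hypothetical stand-in for the documented Column model."""

    name: Optional[str] = None
    position: Union[int, list, None] = None
    condition: Optional[str] = None
    value: Optional[str] = None
    entity: Union[str, list, None] = None
    type: Union[str, list, None] = None

    def __post_init__(self) -> None:
        # Reject rules that give no way to select a column.
        if all(getattr(self, field) is None for field in IDENTIFIER_FIELDS):
            raise ValueError(
                "at least one of " + ", ".join(IDENTIFIER_FIELDS) + " is required"
            )
```

With this sketch, `ColumnRule(name="email")` and `ColumnRule(position=0)` are accepted, while `ColumnRule(value="x")` raises `ValueError` because no selector is given.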
ColumnActions
pydantic-model
Container for column add, drop, and rename operations.
Row
pydantic-model
Bases: NSSBaseModel
Rule matcher for selecting rows by name, condition, entity, or type.
Fields:
- name (OptionalListOrStr)
- condition (str | None)
- foreach (str | None)
- value (str | None)
- entity (OptionalListOrStr)
- type (OptionalListOrStr)
- fallback_value (str | None)
- description (str | None)
Validators:
- identifier_required
name = None
pydantic-field
Row name.
condition = None
pydantic-field
Row condition match.
foreach = None
pydantic-field
Foreach expression.
value = None
pydantic-field
Row value definition.
entity = None
pydantic-field
Row entity match.
type = None
pydantic-field
Row type match.
fallback_value = None
pydantic-field
Row fallback value.
description = None
pydantic-field
Rule description for human consumption.
identifier_required(values)
pydantic-validator
Ensure at least one row identifier field is provided.
Source code in src/nemo_safe_synthesizer/config/replace_pii.py
RowActions
pydantic-model
Container for row drop and update operations.
StepDefinition
pydantic-model
Bases: NSSBaseModel
Single transformation step with optional variables, column actions, and row actions.
Fields:
- vars (dict[str, str | dict | list] | None)
- columns (ColumnActions | None)
- rows (RowActions | None)
GlinerConfig
pydantic-model
Bases: NSSBaseModel
Configuration for the GLiNER named-entity recognition model.
Fields:
- enable_gliner (bool)
- enable_batch_mode (bool)
- batch_size (int)
- chunk_length (int)
- gliner_model (str)
enable_gliner = True
pydantic-field
Enable GLiNER NER module.
enable_batch_mode = True
pydantic-field
Enable GLiNER batch mode.
batch_size = 8
pydantic-field
GLiNER batch size.
chunk_length = 512
pydantic-field
GLiNER batch chunk length in characters.
gliner_model = 'nvidia/gliner-PII'
pydantic-field
GLiNER model name.
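To make `chunk_length` and `batch_size` concrete, here is a rough sketch that splits text into character chunks and groups them into batches using the documented defaults. `chunk_text` and `batched` are hypothetical helpers. How the replacer actually chunks text (for example, whether chunks overlap or respect word boundaries) is not stated here, so the non-overlapping split is an assumption.

```python
from typing import Iterator, List

CHUNK_LENGTH = 512  # documented default, in characters
BATCH_SIZE = 8      # documented default


def chunk_text(text: str, chunk_length: int = CHUNK_LENGTH) -> List[str]:
    """Split text into consecutive, non-overlapping character chunks."""
    return [text[i:i + chunk_length] for i in range(0, len(text), chunk_length)]


def batched(items: List[str], batch_size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive groups of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# A 5000-character input yields 10 chunks, grouped into batches of 8 and 2.
chunks = chunk_text("x" * 5000)
batches = list(batched(chunks))
```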
NERConfig
pydantic-model
Bases: NSSBaseModel
Configuration for Named Entity Recognition.
Fields:
- ner_threshold (float)
- enable_regexps (bool)
- gliner (GlinerConfig)
- ner_entities (OptionalStrList)
ner_threshold = 0.3
pydantic-field
NER model threshold.
enable_regexps = False
pydantic-field
Enable NER regular expressions (experimental).
gliner = GlinerConfig()
pydantic-field
GLiNER NER configuration.
ner_entities = None
pydantic-field
List of entity types to recognize. If unset, classification entity types are used.
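The `ner_entities` fallback and the score threshold can be illustrated with two small helpers. Both `resolve_entities` and `keep_confident` are hypothetical names, and whether the real comparison is `>=` or `>` is an assumption. This is a sketch of the documented behavior, not the library's implementation.

```python
from typing import List, Optional, Tuple

NER_THRESHOLD = 0.3  # documented default

Prediction = Tuple[str, str, float]  # (matched text, entity type, score)


def resolve_entities(
    ner_entities: Optional[List[str]],
    classify_entities: Optional[List[str]],
) -> List[str]:
    """Use ner_entities when set; otherwise fall back to the classification entities."""
    if ner_entities is not None:
        return ner_entities
    return classify_entities or []


def keep_confident(
    predictions: List[Prediction], threshold: float = NER_THRESHOLD
) -> List[Prediction]:
    """Keep predictions whose score is at or above the threshold (assumed inclusive)."""
    return [p for p in predictions if p[2] >= threshold]
```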
ClassifyConfig
pydantic-model
Bases: NSSBaseModel
Configuration for column classification using an LLM.
Fields:
- enable_classify (bool | None)
- entities (OptionalStrList)
- num_samples (int | None)
- classify_model_provider (str | None)
enable_classify = None
pydantic-field
Enable column classification.
entities = None
pydantic-field
List of entity types to classify.
num_samples = 3
pydantic-field
Number of column values to sample for classification.
classify_model_provider = None
pydantic-field
Name of the model provider in the Inference Gateway for column classification. The job compiler will resolve this to the appropriate endpoint URL.
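As an illustration of what `num_samples` controls, the sketch below draws a few values from a column to hand to a classifier. `sample_column_values` is a hypothetical helper. Deduplicating, skipping empty values, and seeding the draw are assumptions, since the sampling strategy is not documented here.

```python
import random
from typing import List, Optional, Sequence


def sample_column_values(
    values: Sequence[str], num_samples: int = 3, seed: Optional[int] = None
) -> List[str]:
    """Pick up to num_samples distinct, non-empty values from a column."""
    # Deduplicate and sort so the same seed always sees the same candidate order.
    candidates = sorted({v for v in values if v})
    rng = random.Random(seed)
    return rng.sample(candidates, min(num_samples, len(candidates)))
```

When the column holds fewer distinct values than `num_samples`, the helper returns all of them instead of raising.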
Globals
pydantic-model
Bases: NSSBaseModel
Global settings for the PII replacer including locales, seed, NER, and classification.
Fields:
- locales (list[str] | None)
- seed (int | None)
- classify (ClassifyConfig)
- ner (NERConfig)
- lock_columns (OptionalStrList)
Validators:
- _validate_locale → locales
locales = None
pydantic-field
List of locales.
seed = None
pydantic-field
Optional random seed.
classify
pydantic-field
Column classification configuration.
ner
pydantic-field
Named Entity Recognition configuration.
lock_columns = None
pydantic-field
List of columns to preserve as immutable across all transformations.
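The effect of `lock_columns` can be sketched as a transformation that skips the listed columns. `apply_unlocked` and the `Record` alias are hypothetical. The library's own enforcement mechanism is not described here, so treat this only as an illustration of "immutable across all transformations".

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]  # one data record: column name -> value


def apply_unlocked(
    records: List[Record],
    transform: Callable[[str, str], str],
    lock_columns: Optional[List[str]] = None,
) -> List[Record]:
    """Apply transform(column, value) to every column except the locked ones."""
    locked = set(lock_columns or [])
    return [
        {col: (val if col in locked else transform(col, val)) for col, val in rec.items()}
        for rec in records
    ]


records = [{"id": "1", "email": "a@b.c"}]
redacted = apply_unlocked(records, lambda col, val: "<redacted>", lock_columns=["id"])
```

Here `id` passes through unchanged while `email` is replaced.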
PiiReplacerConfig
pydantic-model
Bases: Parameters
Configuration for PII replacer.
Defines how PII data should be detected and replaced in a dataset.
Fields:
- globals (Globals)
- steps (list[StepDefinition])
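To show how the models nest, the following sketch mirrors the documented fields and defaults as a plain dictionary. The locale string, seed, locked column, and step variables are made-up illustration values, and whether `PiiReplacerConfig` accepts exactly this shape as input is an assumption.

```python
import json

# Plain-dict mirror of the documented models. Keys and defaults are copied from
# the field listings; values marked "illustrative" are not documented defaults.
config = {
    "globals": {
        "locales": ["en_US"],  # illustrative; documented default is None
        "seed": 42,            # illustrative; documented default is None
        "classify": {
            "enable_classify": None,
            "entities": None,
            "num_samples": 3,
            "classify_model_provider": None,
        },
        "ner": {
            "ner_threshold": 0.3,
            "enable_regexps": False,
            "gliner": {
                "enable_gliner": True,
                "enable_batch_mode": True,
                "batch_size": 8,
                "chunk_length": 512,
                "gliner_model": "nvidia/gliner-PII",
            },
            "ner_entities": None,
        },
        "lock_columns": ["id"],  # illustrative; documented default is None
    },
    "steps": [
        # One step with only variables set; columns and rows are optional.
        {"vars": {"domain": "example.com"}, "columns": None, "rows": None},
    ],
}

print(json.dumps(config, indent=2))
```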