edit
Classes:
| Name | Description |
|---|---|
| TransformFnAccounting | Tracks which transform functions or filters are applied to each column for reporting. |
| ProgressStatus | Mutable progress counters and labels for transformation steps (step, rule, row, column). |
| ProgressLog | Throttled progress logging for transformation; logs to `logger.user` at most every `log_duration` seconds. |
| Step | Single transformation step: applies column/row add/drop/rename/update rules to a DataFrame. |
| Editor | Applies a sequence of transformation steps to a DataFrame (columns/rows add, drop, rename, update). |
Functions:
| Name | Description |
|---|---|
| instantiate_vars | Recursively render template strings in `var_value` and evaluate them to Python types. |
TransformFnAccounting(included_fns)
Tracks which transform functions or filters are applied to each column for reporting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| included_fns | list[str] | Function/filter names to track; others are ignored (or recorded as …) | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| included_fns | set[str] | Set of names that are included in accounting. |
| column_fns | dict[str, set[str]] | Map of column name to the set of function/filter names applied to that column. |
Methods:
| Name | Description |
|---|---|
| update | Record that the given functions/filters were applied to the given columns. |
Source code in src/nemo_safe_synthesizer/pii_replacer/data_editor/edit.py
update(column_names, fns)
Record that the given functions/filters were applied to the given columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| column_names | str \| Iterable[str] | Column name(s) to record; a single string or an iterable of strings. | required |
| fns | str \| set[str] | Name(s) of functions or filters applied; intersected with `included_fns`. | required |
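The accounting behaviour described above can be sketched as follows. This is a minimal illustration, not the module's source: it assumes names outside `included_fns` are simply dropped (the description of what happens to them is cut off above), and it stores `column_fns` as a plain dict of sets.

```python
from __future__ import annotations

from typing import Iterable


class TransformFnAccounting:
    """Sketch: record which tracked functions/filters touched each column."""

    def __init__(self, included_fns: list[str]) -> None:
        # Only these names are ever recorded.
        self.included_fns: set[str] = set(included_fns)
        # column name -> names of tracked functions/filters applied to it
        self.column_fns: dict[str, set[str]] = {}

    def update(self, column_names: str | Iterable[str], fns: str | set[str]) -> None:
        # Normalise single values so both call styles share one code path.
        if isinstance(column_names, str):
            column_names = [column_names]
        if isinstance(fns, str):
            fns = {fns}
        # Assumption: untracked names are dropped rather than recorded.
        tracked = set(fns) & self.included_fns
        for name in column_names:
            self.column_fns.setdefault(name, set()).update(tracked)


acc = TransformFnAccounting(["fake", "hash"])
acc.update("email", "fake")
acc.update(["email", "phone"], {"hash", "upper"})
# acc.column_fns == {"email": {"fake", "hash"}, "phone": {"hash"}}
```

Intersecting with `included_fns` keeps the report limited to the functions a caller cares about, so helper filters such as `upper` never show up.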
ProgressStatus(step_n=0, step_n_total=0, update_rule_n=0, update_rule_n_total=0, update_rule_description='', row_n=0, row_n_total=0, column_n=0, column_n_total=0, column_name='')
dataclass
Mutable progress counters and labels for transformation steps (step, rule, row, column).
Attributes:
| Name | Type | Description |
|---|---|---|
| step_n | int | Current step index (0-based). |
| step_n_total | int | Total number of steps. |
| update_rule_n | int | Current update rule index (0-based). |
| update_rule_n_total | int | Total number of update rules in the current step. |
| update_rule_description | str | Description of the current update rule (for logging). |
| row_n | int | Number of rows processed so far. |
| row_n_total | int | Total number of rows to process for the current column. |
| column_n | int | Current column index (0-based). |
| column_n_total | int | Total number of columns in the current update rule. |
| column_name | str | Name of the column currently being processed. |
step_n = 0 (class-attribute, instance-attribute): Current step index (0-based).
step_n_total = 0 (class-attribute, instance-attribute): Total number of steps.
update_rule_n = 0 (class-attribute, instance-attribute): Current update rule index (0-based).
update_rule_n_total = 0 (class-attribute, instance-attribute): Total number of update rules in the current step.
update_rule_description = '' (class-attribute, instance-attribute): Description of the current update rule (for logging).
row_n = 0 (class-attribute, instance-attribute): Number of rows processed so far.
row_n_total = 0 (class-attribute, instance-attribute): Total number of rows to process for the current column.
column_n = 0 (class-attribute, instance-attribute): Current column index (0-based).
column_n_total = 0 (class-attribute, instance-attribute): Total number of columns in the current update rule.
column_name = '' (class-attribute, instance-attribute): Name of the column currently being processed.
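A minimal sketch of the dataclass, assuming only the fields and defaults listed above. The `describe` method is a hypothetical helper, not part of the documented API; it shows how the 0-based indices and the row count (which is a count, not an index) might be turned into a human-readable progress line.

```python
from dataclasses import dataclass


@dataclass
class ProgressStatus:
    step_n: int = 0  # current step index (0-based)
    step_n_total: int = 0
    update_rule_n: int = 0  # current update rule index (0-based)
    update_rule_n_total: int = 0
    update_rule_description: str = ""
    row_n: int = 0  # rows processed so far (a count, not an index)
    row_n_total: int = 0
    column_n: int = 0  # current column index (0-based)
    column_n_total: int = 0
    column_name: str = ""

    def describe(self) -> str:
        # Hypothetical helper: 0-based indices are shown 1-based for readers.
        return (
            f"step {self.step_n + 1}/{self.step_n_total}, "
            f"rule {self.update_rule_n + 1}/{self.update_rule_n_total} "
            f"({self.update_rule_description}), "
            f"column {self.column_n + 1}/{self.column_n_total} {self.column_name!r}, "
            f"row {self.row_n}/{self.row_n_total}"
        )


# The counters are mutated in place as the editor advances.
status = ProgressStatus(step_n_total=2, update_rule_n_total=3, column_n_total=1, row_n_total=100)
status.update_rule_n = 1
status.update_rule_description = "mask emails"
status.column_name = "email"
status.row_n = 10
# status.describe() -> "step 1/2, rule 2/3 (mask emails), column 1/1 'email', row 10/100"
```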
ProgressLog(log_duration)
Throttled progress logging for transformation; logs to logger.user at most every log_duration seconds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log_duration | float | Minimum number of seconds between log emissions. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| status | ProgressStatus | Current progress counters and labels. |
| start_time | float | Monotonic time when logging started. |
| last_log | float | Monotonic time of the last log emission. |
| log_duration | float | Minimum interval between logs, in seconds. |
Methods:
| Name | Description |
|---|---|
| log_throttled | Emit a progress log if at least `log_duration` seconds have passed since the last log, or if `force` is True. |
log_throttled(force=False)
Emit a progress log if at least log_duration seconds have passed since the last log, or if force is True.
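The throttling rule can be sketched with a monotonic clock. This is an illustration under stated assumptions, not the module's code: `emit` stands in for `logger.user`, the injectable `clock` exists only to make the sketch testable, `last_log` is assumed to start at `start_time`, and the message is passed in rather than formatted from a `ProgressStatus` as the real method presumably does.

```python
import time
from typing import Callable


class ProgressLog:
    def __init__(
        self,
        log_duration: float,
        emit: Callable[[str], None] = print,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.log_duration = log_duration
        self.emit = emit  # stand-in for logger.user
        self.clock = clock  # injectable so the throttling can be tested
        self.start_time = clock()
        self.last_log = self.start_time  # assumption: first log waits a full interval

    def log_throttled(self, message: str, force: bool = False) -> bool:
        now = self.clock()
        if not force and now - self.last_log < self.log_duration:
            return False  # too soon since the last emission; skip this update
        self.emit(f"[{now - self.start_time:.1f}s] {message}")
        self.last_log = now
        return True


ticks = iter([0.0, 0.5, 1.5, 1.6])
lines: list[str] = []
log = ProgressLog(1.0, emit=lines.append, clock=lambda: next(ticks))
log.log_throttled("a")              # 0.5s since last log: suppressed
log.log_throttled("b")              # 1.5s since last log: emitted
log.log_throttled("c", force=True)  # only 0.1s gap, but forced
# lines == ["[1.5s] b", "[1.6s] c"]
```

Calling `log_throttled` on every row is then cheap: most calls return after one clock read and a comparison.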
Step
Single transformation step: applies column/row add/drop/rename/update rules to a DataFrame.
Used via Step.execute; holds _env (the Jinja environment with faker) and _vars (the step's variables).
Methods:
| Name | Description |
|---|---|
| do_make_template | Build a Jinja template from the string (may raise `TemplateError`). |
| make_template | Build a Jinja template; raise with `error_id='param'` on failure. |
| template_to_fnames | Return the set of filter/function names referenced in the template (e.g. `fake`, `re`). |
| update_ner_cache | Pre-fill the entity extractor cache for the given text series (e.g. before row updates). |
| execute | Run one transformation step: apply column add/drop/rename and row drop/update from `step_config`. |
do_make_template(template_str)
Build a Jinja template from the string (may raise TemplateError).
make_template(template_str)
Build a Jinja template; raise with error_id='param' on failure.
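The split between `do_make_template` and `make_template` can be sketched with Jinja2's `Environment.from_string`, which compiles a template and raises a `jinja2.TemplateError` subclass on failure. `ParamError` is a hypothetical exception class standing in for whatever error type carries `error_id` in the real module; the functions take `env` explicitly here instead of reading it from the step.

```python
from jinja2 import Environment, Template, TemplateError


class ParamError(ValueError):
    """Hypothetical error type carrying an error_id, for illustration only."""

    def __init__(self, message: str, error_id: str) -> None:
        super().__init__(message)
        self.error_id = error_id


def do_make_template(env: Environment, template_str: str) -> Template:
    # Compiles the template; may raise jinja2.TemplateError (e.g. TemplateSyntaxError).
    return env.from_string(template_str)


def make_template(env: Environment, template_str: str) -> Template:
    try:
        return do_make_template(env, template_str)
    except TemplateError as exc:
        # Re-raise as a user-facing parameter error, keeping the original cause.
        raise ParamError(f"invalid template {template_str!r}: {exc}", error_id="param") from exc
```

Keeping the raw builder separate lets internal callers handle `TemplateError` themselves, while config-facing code gets an error tagged as a bad parameter.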
template_to_fnames(template_str)
Return the set of filter/function names referenced in the template (e.g. fake, re).
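One way to collect these names is to parse the template with Jinja2's `Environment.parse` and walk the resulting AST. This sketch assumes that "names" means filter names plus the leading name of any called expression, so `fake.first_name()` yields `fake` and `re.sub(...)` yields `re`; the real method may collect names differently.

```python
from __future__ import annotations

from jinja2 import Environment, nodes


def _root_name(node: nodes.Node) -> str | None:
    # Walk attribute chains like fake.person.name back to the leading Name.
    while isinstance(node, nodes.Getattr):
        node = node.node
    return node.name if isinstance(node, nodes.Name) else None


def template_to_fnames(env: Environment, template_str: str) -> set[str]:
    tree = env.parse(template_str)  # parse to an AST without compiling
    # Filters: {{ value | lower }} -> "lower"
    names = {f.name for f in tree.find_all(nodes.Filter)}
    # Calls: {{ hash(x) }} -> "hash", {{ fake.first_name() }} -> "fake"
    for call in tree.find_all(nodes.Call):
        root = _root_name(call.node)
        if root is not None:
            names.add(root)
    return names
```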
update_ner_cache(texts, entities=None)
Pre-fill the entity extractor cache for the given text series (e.g. before row updates).
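The point of pre-filling is to run entity extraction once, in a batch, over the distinct texts of a column before the per-row updates look them up. The sketch below is a hypothetical caching wrapper (`CachedExtractor` and its batch-extractor signature are assumptions, not the real `EntityExtractor` API); it accepts any iterable of strings, including a pandas Series.

```python
from __future__ import annotations

from typing import Callable, Iterable


class CachedExtractor:
    def __init__(self, extract_batch: Callable[[list[str], list[str] | None], list]) -> None:
        # extract_batch(texts, entities) must return one result per input text.
        self._extract_batch = extract_batch
        self._cache: dict = {}

    @staticmethod
    def _key(text: str, entities: list[str] | None) -> tuple:
        return (text, frozenset(entities) if entities is not None else None)

    def update_ner_cache(self, texts: Iterable[str], entities: Iterable[str] | None = None) -> None:
        ents = list(entities) if entities is not None else None
        # De-duplicate (order-preserving) and skip texts that are already cached.
        missing = [t for t in dict.fromkeys(texts) if self._key(t, ents) not in self._cache]
        if missing:
            # One batched call instead of one call per row.
            for text, result in zip(missing, self._extract_batch(missing, ents)):
                self._cache[self._key(text, ents)] = result

    def extract(self, text: str, entities: Iterable[str] | None = None):
        ents = list(entities) if entities is not None else None
        key = self._key(text, ents)
        if key not in self._cache:  # cache miss: fall back to a batch of one
            self._cache[key] = self._extract_batch([text], ents)[0]
        return self._cache[key]


calls: list[list[str]] = []
extractor = CachedExtractor(lambda texts, ents: calls.append(list(texts)) or [t.upper() for t in texts])
extractor.update_ner_cache(["a@x.com", "b@y.com", "a@x.com"])
extractor.extract("b@y.com")  # served from the cache, no new batch
# calls == [["a@x.com", "b@y.com"]]
```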
execute(df, entities, column_types, step_config, env, progress, fnreport)
classmethod
Run one transformation step: apply column add/drop/rename and row drop/update from step_config.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame to transform (modified in place). | required |
| entities | dict[str, str] | Mapping of column name to entity type. | required |
| column_types | dict[str, str] | Mapping of column name to column type. | required |
| step_config | dict[str, dict] | Step config with optional `vars`, `columns`, and `rows` sections. | required |
| env | Environment | Environment (Jinja, faker, entity extractor). | required |
| progress | ProgressLog | Progress logger for throttled output. | required |
| fnreport | TransformFnAccounting \| None | Optional accounting for which functions were applied per column. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | The same DataFrame after applying the step (index reset). |
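The in-place contract above (mutate `df`, reset the index, return the same object) can be sketched with pandas. The `step_config` schema here is entirely hypothetical: `add`/`drop`/`rename` lists of `{"name", "value"}` rules under `columns`, and row-drop conditions given as Python callables that return a boolean mask, standing in for whatever condition syntax the real config uses. Row update rules, entities, and progress reporting are omitted.

```python
from typing import Any

import pandas as pd


def execute_step(df: pd.DataFrame, step_config: dict[str, Any]) -> pd.DataFrame:
    """Apply one step in place. The step_config schema here is hypothetical."""
    columns = step_config.get("columns", {})
    # In this sketch, column rules run in a fixed order: add, then drop, then rename.
    for rule in columns.get("add", []):
        df[rule["name"]] = rule.get("value")  # scalar broadcast to every row
    dropped = [rule["name"] for rule in columns.get("drop", [])]
    if dropped:
        df.drop(columns=dropped, inplace=True)
    renames = {rule["name"]: rule["value"] for rule in columns.get("rename", [])}
    if renames:
        df.rename(columns=renames, inplace=True)

    # Row drops: each condition is a callable returning a boolean mask over df.
    for rule in step_config.get("rows", {}).get("drop", []):
        mask = rule["condition"](df)
        df.drop(index=df.index[mask.to_numpy()], inplace=True)

    # Dropping rows leaves gaps in the index; renumber it from 0.
    df.reset_index(drop=True, inplace=True)
    return df


frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
config = {
    "columns": {
        "add": [{"name": "c", "value": 0}],
        "drop": [{"name": "b"}],
        "rename": [{"name": "a", "value": "x"}],
    },
    "rows": {"drop": [{"condition": lambda d: d["x"] < 2}]},
}
out = execute_step(frame, config)
```

Here `out is frame` holds, which is why `Editor.process_df` works on a copy rather than on the caller's DataFrame.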
Editor(config, entity_extractor)
Applies a sequence of transformation steps to a DataFrame (columns/rows add, drop, rename, update).
The config is a dict with steps; each step has optional vars, columns, and rows sections.
Uses an Environment for Jinja templating and entity extraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | dict[str, dict] | Editor config (e.g. loaded from YAML) with `steps`. | required |
| entity_extractor | EntityExtractor \| None | Optional extractor for NER in templates; … | required |
Methods:
| Name | Description |
|---|---|
| load_yaml | Build an `Editor` from a YAML string. |
| process_df | Apply all transformation steps to a deep copy of `df` and return the result. |
load_yaml(yaml_str)
classmethod
Build an Editor from a YAML string; the string is parsed into the config dict (e.g. via yaml.safe_load(yaml_str)).
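The parsing half of `load_yaml` can be sketched with PyYAML's `yaml.safe_load`, which builds plain dicts, lists, and scalars without constructing arbitrary Python objects. `load_editor_config` is a hypothetical helper: it returns the config dict rather than an `Editor`, and it assumes `steps` is a list whose items default any missing `vars`, `columns`, or `rows` section to an empty mapping.

```python
from typing import Any

import yaml


def load_editor_config(yaml_str: str) -> dict[str, Any]:
    config = yaml.safe_load(yaml_str)  # plain data only; no arbitrary Python objects
    if not isinstance(config, dict) or not isinstance(config.get("steps"), list):
        raise ValueError("editor config must be a mapping with a 'steps' list")
    for step in config["steps"]:
        # Each section is optional; default missing ones to empty mappings.
        for section in ("vars", "columns", "rows"):
            step.setdefault(section, {})
    return config
```

The resulting dict is what would then be handed to `Editor(config, entity_extractor)`.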
process_df(df, entities, column_types, fnreport=None)
Apply all transformation steps to a deep copy of df and return the result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Source DataFrame (not modified). | required |
| entities | dict[str, str] | Mapping of column name to entity type. | required |
| column_types | dict[str, str] | Mapping of column name to column type. | required |
| fnreport | TransformFnAccounting \| None | Optional accounting for which functions were applied per column. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | Transformed DataFrame. |
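Because each step mutates its input in place, copying once up front is what keeps the caller's `df` untouched. A minimal sketch, assuming steps can be modelled as plain callables (the real method threads `entities`, `column_types`, progress, and `fnreport` through `Step.execute`). The docs say "deep copy" without naming the mechanism; `DataFrame.copy(deep=True)` copies the data but not Python objects stored in object columns, whereas `copy.deepcopy` would.

```python
from typing import Callable, Sequence

import pandas as pd


def process_df(df: pd.DataFrame, steps: Sequence[Callable[[pd.DataFrame], pd.DataFrame]]) -> pd.DataFrame:
    # Work on a copy so in-place steps never touch the caller's frame.
    out = df.copy(deep=True)
    for step in steps:
        out = step(out)  # each step may mutate and/or return a new frame
    return out


def add_scaled(frame: pd.DataFrame) -> pd.DataFrame:
    frame["b"] = frame["a"] * 10  # mutates its argument in place
    return frame


source = pd.DataFrame({"a": [1, 2]})
result = process_df(source, [add_scaled])
```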
instantiate_vars(var_name, var_value, step, df)
Recursively render template strings in var_value and evaluate them to Python types.
Strings are rendered with step and df, then ast.literal_eval is attempted on the rendered result.
Dicts and lists are processed recursively. A template error while rendering var_name raises with
error_id='param'. The order of vars in the config can affect which values are available during rendering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| var_name | str | Name of the variable (used in error messages). | required |
| var_value | dict \| list \| str | Current value (string, list, or dict) to render and optionally evaluate. | required |
| step | Step | Step with … | required |
| df | DataFrame | DataFrame available as … | required |
Returns:
| Type | Description |
|---|---|
| Any | Rendered value, with strings possibly converted to bool/int/float/list/dict via `ast.literal_eval`. |
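The render-then-`literal_eval` recursion can be sketched with Jinja2 and the standard `ast` module. Assumptions: a plain `context` dict replaces the real `step` and `df` arguments, non-string scalars pass through unchanged, and `ValueError` stands in for the module's `error_id='param'` error.

```python
import ast
from typing import Any

from jinja2 import Environment, TemplateError


def instantiate_vars(var_name: str, var_value: Any, env: Environment, context: dict[str, Any]) -> Any:
    if isinstance(var_value, dict):
        return {k: instantiate_vars(var_name, v, env, context) for k, v in var_value.items()}
    if isinstance(var_value, list):
        return [instantiate_vars(var_name, v, env, context) for v in var_value]
    if not isinstance(var_value, str):
        return var_value  # numbers, booleans, None pass through unchanged
    try:
        rendered = env.from_string(var_value).render(**context)
    except TemplateError as exc:
        # Stand-in for the module's error_id='param' error.
        raise ValueError(f"cannot render var {var_name!r}: {exc}") from exc
    try:
        return ast.literal_eval(rendered)  # "6" -> 6, "True" -> True, "[1, 2]" -> [1, 2]
    except (ValueError, SyntaxError, TypeError):
        return rendered  # not a Python literal: keep the rendered string


env = Environment()
values = instantiate_vars(
    "limits",
    {"n": "{{ 2 * 3 }}", "flags": ["True", "name"], "who": "{{ user }}"},
    env,
    {"user": "alice"},
)
# values == {"n": 6, "flags": [True, "name"], "who": "alice"}
```

`ast.literal_eval` only accepts Python literals, so rendered text such as `alice` falls back to the string instead of being evaluated as code.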