utils

Shared utilities for the data actions framework.

Provides ActionCtx (execution context with state and dependency injection), TransformsUtil (wrapper around the transforms_v2 engine), helper types (MetadataColumns, TransformsUpdate), and subclass-discovery functions.

Classes:

  • MetadataColumns: Internal column names injected during validation phases.
  • TransformsUpdate: Typed wrapper for a single transforms_v2 update step.
  • TransformsUtil: Wrapper around a transforms_v2 Environment for executing column updates and drop conditions.
  • DataSource: Abstract base for pluggable data sources used by GenDataSource actions.
  • ActionCtx: Execution context shared across all action invocations.

Functions:

  • type_alias_fn: Pydantic alias generator that maps type_ to type for YAML compatibility.
  • remove_metadata_columns_from_df: Drop all MetadataColumns from the DataFrame in-place.
  • remove_metadata_columns_from_records: Return a copy of each record dict with MetadataColumns keys removed.
  • is_abstract: Return True if the class has abstract methods or directly inherits ABC.
  • all_subclasses: Recursively collect all subclasses of klass.
  • concrete_subclasses: Return all non-abstract recursive subclasses of klass.
  • guess_datetime_format: Infer a strftime-compatible format string from a date string, or None.

MetadataColumns

Bases: StrEnum

Internal column names injected during validation phases.

Attributes:

  • INDEX = '__nss__idx' (class attribute): Temporary index for mapping back to pre-transformed records.
  • REJECT_REASON = '__nss_reject_reason' (class attribute): Reason a row was rejected during batch validation.

TransformsUpdate (pydantic model)

Bases: BaseModel

Typed wrapper for a single transforms_v2 update step.

Fields:

  • name: Target column name for the update.
  • value: Jinja expression evaluated by the transforms_v2 engine.
  • position = None: Column insertion index when adding a new column.

TransformsUtil(seed=None)

Wrapper around a transforms_v2 Environment for executing column updates and drop conditions.

Parameters:

  • seed (Optional[int], default None): Random seed passed to the underlying Environment.
Source code in src/nemo_safe_synthesizer/data_processing/actions/utils.py
def __init__(self, seed: Optional[int] = None) -> None:
    from ...pii_replacer.data_editor.edit import (
        Environment,
    )

    self.env = Environment(locales=None, seed=seed, globals_config={}, entity_extractor=None)

DataSource (pydantic model)

Bases: BaseModel, ABC

Abstract base for pluggable data sources used by GenDataSource actions.

Subclasses implement generate_data to populate a column in an existing DataFrame. generate_records is a convenience wrapper that creates an empty DataFrame first.

Config:

  • alias_generator: type_alias_fn

with_ctx(ctx)

Attach an ActionCtx and return self for chaining.

Source code in src/nemo_safe_synthesizer/data_processing/actions/utils.py
def with_ctx(self, ctx: ActionCtx) -> Self:
    """Attach an ``ActionCtx`` and return self for chaining."""
    self._ctx = ctx
    return self

generate_records(num_records, col='newcol')

Generate records as a list of dicts without an existing DataFrame.

Source code in src/nemo_safe_synthesizer/data_processing/actions/utils.py
def generate_records(self, num_records: int, col: str = "newcol") -> list[dict[Hashable, Any]]:
    """Generate records as a list of dicts without an existing DataFrame."""
    df = pd.DataFrame(index=range(num_records))
    return self.generate_data(df, col).to_dict("records")
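The wiring can be seen with a minimal sketch. Note this is an illustration only: the real DataSource is a pydantic BaseModel combined with ABC, while the stand-in below uses plain ABC, and CounterSource is a hypothetical subclass invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Any, Hashable

import pandas as pd


class DataSource(ABC):
    """Simplified stand-in for the framework's DataSource base."""

    @abstractmethod
    def generate_data(self, df: pd.DataFrame, col: str) -> pd.DataFrame: ...

    def generate_records(self, num_records: int, col: str = "newcol") -> list[dict[Hashable, Any]]:
        # Same shape as the documented method: build an empty frame,
        # delegate to generate_data, and flatten to record dicts.
        df = pd.DataFrame(index=range(num_records))
        return self.generate_data(df, col).to_dict("records")


class CounterSource(DataSource):
    """Hypothetical source that fills the column with sequential integers."""

    def generate_data(self, df: pd.DataFrame, col: str) -> pd.DataFrame:
        df[col] = range(len(df))
        return df


records = CounterSource().generate_records(3, col="id")
# records == [{"id": 0}, {"id": 1}, {"id": 2}]
```

A subclass only has to implement generate_data; the base class provides the record-oriented entry point for free.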

ActionCtx(**data) (pydantic model)

Bases: BaseModel

Execution context shared across all action invocations.

Provides a random seed, a state dictionary for cross-phase communication, and a lazily-initialized TransformsUtil for expression evaluation.

Fields:

  • seed (Optional[int])
  • state (dict[str, str])
Source code in src/nemo_safe_synthesizer/data_processing/actions/utils.py
def __init__(self, /, **data: Any) -> None:
    super().__init__(**data)
    np.random.seed(seed=self.seed)
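Because the constructor calls np.random.seed, it reseeds numpy's global RNG, so two contexts created with the same seed yield identical draws. A quick sketch of that behavior (using np.random.seed directly rather than the class itself):

```python
import numpy as np

# Mirrors what ActionCtx.__init__ does with its seed field.
np.random.seed(7)
first = np.random.randint(0, 100, size=5)

np.random.seed(7)  # as a second ActionCtx(seed=7) would
second = np.random.randint(0, 100, size=5)

assert (first == second).all()
```

Seeding the global RNG keeps runs reproducible, but it also affects any other code in the process that uses np.random, which is worth keeping in mind when composing actions.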

seed = None (pydantic field)

Seed used for all random generation tasks.

state = {} (pydantic field)

Per-action state persisted across phases (keyed by BaseAction.hash()).

type_alias_fn(field_name)

Pydantic alias generator that maps type_ to type for YAML compatibility.

Source code in src/nemo_safe_synthesizer/data_processing/actions/utils.py
def type_alias_fn(field_name: str) -> str:
    """Pydantic alias generator that maps ``type_`` to ``type`` for YAML compatibility."""
    if field_name == "type_":
        return "type"

    return field_name
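The alias exists because `type` shadows a Python builtin, so models store the field as `type_` while YAML/JSON payloads keep the natural key `type`. Reproducing the function from the source above makes the mapping easy to check:

```python
def type_alias_fn(field_name: str) -> str:
    # Only the reserved-looking name is remapped; everything else passes through.
    if field_name == "type_":
        return "type"
    return field_name


assert type_alias_fn("type_") == "type"
assert type_alias_fn("value") == "value"
```

In pydantic this would typically be wired in via the model config's alias_generator, as the DataSource config above shows.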

remove_metadata_columns_from_df(df)

Drop all MetadataColumns from the DataFrame in-place.

Source code in src/nemo_safe_synthesizer/data_processing/actions/utils.py
def remove_metadata_columns_from_df(df: pd.DataFrame) -> pd.DataFrame:
    """Drop all ``MetadataColumns`` from the DataFrame in-place."""
    metadata_cols = [col.value for col in MetadataColumns]

    columns_to_drop = [col for col in metadata_cols if col in df.columns]
    if columns_to_drop:
        df.drop(columns=columns_to_drop, inplace=True)

    return df

remove_metadata_columns_from_records(records)

Return a copy of each record dict with MetadataColumns keys removed.

Source code in src/nemo_safe_synthesizer/data_processing/actions/utils.py
def remove_metadata_columns_from_records(records: list[dict]) -> list[dict]:
    """Return a copy of each record dict with ``MetadataColumns`` keys removed."""
    metadata_cols = [col.value for col in MetadataColumns]

    new_records: list[dict] = []
    for record in records:
        new_records.append({k: v for k, v in record.items() if k not in metadata_cols})

    return new_records
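A self-contained run of this function, with the enum values copied from the MetadataColumns documentation above (using a `str`-mixin Enum so the sketch also works before Python 3.11's StrEnum):

```python
from enum import Enum


class MetadataColumns(str, Enum):
    # Values copied from the enum documented above.
    INDEX = "__nss__idx"
    REJECT_REASON = "__nss_reject_reason"


def remove_metadata_columns_from_records(records: list[dict]) -> list[dict]:
    metadata_cols = [col.value for col in MetadataColumns]
    return [{k: v for k, v in r.items() if k not in metadata_cols} for r in records]


records = [{"name": "a", "__nss__idx": 0, "__nss_reject_reason": "too long"}]
clean = remove_metadata_columns_from_records(records)
# clean == [{"name": "a"}]; the input records are left untouched
```

Unlike the DataFrame variant, this one copies rather than mutates, which matters when the original records are still needed downstream.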

is_abstract(c)

Return True if the class has abstract methods or directly inherits ABC.

Source code in src/nemo_safe_synthesizer/data_processing/actions/utils.py
def is_abstract(c: Any) -> bool:
    """Return True if the class has abstract methods or directly inherits ``ABC``."""
    return inspect.isabstract(c) or ABC in c.__bases__
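The two clauses cover different cases: inspect.isabstract only reports classes with unimplemented abstract methods, so a marker class that inherits ABC without declaring any would slip through without the second check. A small demonstration with hypothetical classes:

```python
import inspect
from abc import ABC, abstractmethod


def is_abstract(c) -> bool:
    return inspect.isabstract(c) or ABC in c.__bases__


class Base(ABC):
    @abstractmethod
    def run(self): ...


class Incomplete(Base):  # does not implement run(); still abstract
    pass


class Done(Base):
    def run(self):
        return "ok"


class Marker(ABC):  # no abstract methods, but ABC is a direct base
    pass


assert is_abstract(Base)        # caught by inspect.isabstract
assert is_abstract(Incomplete)  # inherited unimplemented abstract method
assert is_abstract(Marker)      # caught only by the ABC-in-bases check
assert not is_abstract(Done)
```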

all_subclasses(klass)

Recursively collect all subclasses of klass.

Source code in src/nemo_safe_synthesizer/data_processing/actions/utils.py
def all_subclasses(klass: type[T]) -> set[type[T]]:
    """Recursively collect all subclasses of ``klass``."""
    subclasses: set[type[T]] = set()
    subclass_queue = [klass]
    while subclass_queue:
        parent = subclass_queue.pop()
        for subclass in parent.__subclasses__():
            if subclass not in subclasses:
                subclasses.add(subclass)
                subclass_queue.append(subclass)
    return subclasses

concrete_subclasses(klass)

Return all non-abstract recursive subclasses of klass.

Used by pydantic discriminated unions (e.g., ActionT) to auto-discover instantiable action types for validation and schema generation.

Source code in src/nemo_safe_synthesizer/data_processing/actions/utils.py
def concrete_subclasses(klass: type[T]) -> set[type[T]]:
    """Return all non-abstract recursive subclasses of ``klass``.

    Used by pydantic discriminated unions (e.g., ``ActionT``) to
    auto-discover instantiable action types for validation and schema
    generation.
    """
    return set(c for c in all_subclasses(klass) if not is_abstract(c))
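Putting the three discovery helpers together shows how an intermediate abstract layer is traversed but excluded from the final set. The helpers below are copied from the source above; the Action hierarchy is a hypothetical example:

```python
import inspect
from abc import ABC, abstractmethod


def is_abstract(c) -> bool:
    return inspect.isabstract(c) or ABC in c.__bases__


def all_subclasses(klass):
    subclasses = set()
    queue = [klass]
    while queue:
        parent = queue.pop()
        for sub in parent.__subclasses__():
            if sub not in subclasses:
                subclasses.add(sub)
                queue.append(sub)
    return subclasses


def concrete_subclasses(klass):
    return {c for c in all_subclasses(klass) if not is_abstract(c)}


class Action(ABC):
    @abstractmethod
    def apply(self): ...


class FilterAction(Action):
    def apply(self):
        return "filter"


class BaseGen(Action, ABC):  # intermediate abstract layer
    pass


class GenAction(BaseGen):
    def apply(self):
        return "gen"


names = {c.__name__ for c in concrete_subclasses(Action)}
# names == {"FilterAction", "GenAction"}; BaseGen is filtered out
```

This is what lets a discriminated union pick up every instantiable action type without a manual registry.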

guess_datetime_format(datetime_str)

Infer a strftime-compatible format string from a date string, or None.

Source code in src/nemo_safe_synthesizer/data_processing/actions/utils.py
def guess_datetime_format(datetime_str: str) -> Optional[str]:
    """Infer a ``strftime``-compatible format string from a date string, or None."""
    # TODO: use `pandas.tseries.api.guess_datetime_format` in the future?
    format = parse_date(datetime_str)
    if format is None:
        return None
    return format.fmt_str
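The real helper delegates to an internal parse_date; as an illustrative stand-in only, the same contract (return a strftime format or None) can be sketched with the standard library by trying candidate formats:

```python
from datetime import datetime
from typing import Optional

# Hypothetical candidate list; the real parse_date is more capable.
_CANDIDATES = ["%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%dT%H:%M:%S", "%d %b %Y"]


def guess_datetime_format(datetime_str: str) -> Optional[str]:
    for fmt in _CANDIDATES:
        try:
            datetime.strptime(datetime_str, fmt)
            return fmt
        except ValueError:
            continue
    return None


assert guess_datetime_format("2024-01-31") == "%Y-%m-%d"
assert guess_datetime_format("not a date") is None
```

The TODO in the source points at pandas.tseries.api.guess_datetime_format as a possible future replacement for this logic.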