artifact_structure
Artifact directory structure for Safe Synthesizer.
Defines the on-disk layout produced by each pipeline run using a declarative
descriptor pattern. FileNode and DirNode descriptors declare the
tree shape on Workdir; at runtime they resolve to Path and BoundDir
objects respectively, giving typed access to every artifact path without
hard-coding strings throughout the CLI.
Typical directory tree:
<base_path>/<config>---<dataset>/<run_name>/
- train/ ...
- generate/ ...
- dataset/ ...
See Workdir for the full structure.
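The descriptor pattern described above can be sketched roughly as follows. This is a minimal illustration and not the module's actual implementation: the DemoRun owner class and its path attribute are hypothetical, and the real Workdir presumably resolves paths against its own run directory.

```python
from pathlib import Path


class FileNode:
    """Illustrative file descriptor (not the library's implementation)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class access: return the descriptor itself
        return Path(instance.path) / self.name  # instance access: full Path


class BoundDir:
    """Illustrative runtime view of a directory and its declared children."""

    def __init__(self, path: Path, children: dict) -> None:
        self.path = path
        self._children = children

    def __getattr__(self, attr: str):
        # Called only when normal lookup fails, so `path` is served directly.
        children = self.__dict__.get("_children", {})
        if attr not in children:
            raise AttributeError(attr)
        # Resolve the child descriptor against this directory.
        return children[attr].__get__(self, type(self))


class DirNode:
    """Illustrative directory descriptor with nested children."""

    def __init__(self, name: str, **children) -> None:
        self.name = name
        self.children = children

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return BoundDir(Path(instance.path) / self.name, self.children)


class DemoRun:
    """Hypothetical owner class; assumes the root lives in a `path` attribute."""

    train = DirNode(
        "train",
        cache=DirNode("cache"),
        adapter=DirNode("adapter", metadata=FileNode("metadata_v2.json")),
    )

    def __init__(self, path) -> None:
        self.path = Path(path)


run = DemoRun("/tmp/run")
metadata_path = run.train.adapter.metadata
print(metadata_path)
```

Declaring the tree once on the class keeps file names in a single place, and attribute access then composes the full path at runtime.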
Classes:
| Name | Description |
|---|---|
| RunName | Run name for artifact directories. |
| FileNode | Descriptor for file paths within a directory structure. |
| DirNode | Descriptor for directory paths within a directory structure. |
| BoundDir | Runtime class representing a bound directory path. |
| Workdir | Working directory structure for Safe Synthesizer artifacts. |
Attributes:
| Name | Type | Description |
|---|---|---|
| RUN_NAME_DATE_FORMAT | | Format string for auto-generated timestamp-based run names. |
| PROJECT_NAME_DELIMITER | | Delimiter used to separate config_name and dataset_name in project names. |
RUN_NAME_DATE_FORMAT = '%Y-%m-%dT%H:%M:%S'
module-attribute
Format string for auto-generated timestamp-based run names.
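As an illustration of how this format string behaves with the standard datetime module (the constant is redefined locally here so the snippet is self-contained):

```python
from datetime import datetime

# Redefined locally for the example; the value is the one documented above.
RUN_NAME_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"

stamp = datetime(2026, 1, 15, 12, 0, 0)
run_name = stamp.strftime(RUN_NAME_DATE_FORMAT)  # format a timestamp as a run name
parsed = datetime.strptime(run_name, RUN_NAME_DATE_FORMAT)  # and parse it back
print(run_name)
```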
PROJECT_NAME_DELIMITER = '---'
module-attribute
Delimiter used to separate config_name and dataset_name in project names.
Uses triple-dash to avoid ambiguity with single dashes that commonly appear in config and dataset filenames (e.g., my-config.yaml, training-data.csv).
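A short sketch of why the triple dash helps. The two helper functions are hypothetical and exist only for this example; the point is that a name can be split back into its parts even when both parts contain single dashes.

```python
# Redefined locally for the example; the value is the one documented above.
PROJECT_NAME_DELIMITER = "---"


def make_project_name(config_name: str, dataset_name: str) -> str:
    """Hypothetical helper: join the two stems with the delimiter."""
    return f"{config_name}{PROJECT_NAME_DELIMITER}{dataset_name}"


def split_project_name(project_name: str) -> tuple[str, str]:
    """Hypothetical helper: split at the first occurrence of the delimiter."""
    config_name, _, dataset_name = project_name.partition(PROJECT_NAME_DELIMITER)
    return config_name, dataset_name


name = make_project_name("my-config", "training-data")
parts = split_project_name(name)
print(name, parts)
```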
RunName(_value='', _timestamp=None)
dataclass
Run name for artifact directories.
Supports two modes: (1) an auto-generated name based on a timestamp, or (2) an arbitrary string name provided by the user (from --run-path).
Examples:
- Auto-generated: "2026-01-15T12:00:00"
- Explicit: "unsloth_adult_0", "my-experiment-run"
Methods:
| Name | Description |
|---|---|
| to_string | Convert the run name to a string for use in directory names. |
| from_string | Parse a run name string into a RunName object. |
Attributes:
| Name | Type | Description |
|---|---|---|
| is_timestamp_based | bool | Whether this run name was generated from or parsed as a timestamp. |
| timestamp | datetime \| None | Parsed timestamp, or None for non-timestamp-based run names. |
is_timestamp_based
property
Whether this run name was generated from or parsed as a timestamp.
timestamp
property
Parsed timestamp, or None for non-timestamp-based run names.
to_string()
Convert the run name to a string for use in directory names.
from_string(name)
classmethod
Parse a run name string into a RunName object.
Accepts any valid string. If the string matches the timestamp format, the timestamp is also stored for potential use.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Run name string (e.g., "2026-01-15T12:00:00" or "unsloth_adult_0"). | required |

Returns:
| Type | Description |
|---|---|
| Self | RunName with the provided name and optional parsed timestamp. |
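The parsing behaviour described for from_string could look roughly like the sketch below. RunNameSketch is a hypothetical stand-in, not the real RunName: its field names and the use of strptime with a ValueError fallback are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

RUN_NAME_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


@dataclass
class RunNameSketch:
    """Hypothetical stand-in for RunName; field names are assumptions."""

    value: str = ""
    timestamp: datetime | None = None

    @property
    def is_timestamp_based(self) -> bool:
        return self.timestamp is not None

    def to_string(self) -> str:
        return self.value

    @classmethod
    def from_string(cls, name: str) -> RunNameSketch:
        # Any string is accepted; a timestamp is stored only if it parses.
        try:
            parsed = datetime.strptime(name, RUN_NAME_DATE_FORMAT)
        except ValueError:
            parsed = None
        return cls(value=name, timestamp=parsed)


auto = RunNameSketch.from_string("2026-01-15T12:00:00")
explicit = RunNameSketch.from_string("unsloth_adult_0")
print(auto.is_timestamp_based, explicit.is_timestamp_based)
```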
Source code in src/nemo_safe_synthesizer/cli/artifact_structure.py
FileNode(name)
Descriptor for file paths within a directory structure.
When accessed on a class, returns the descriptor itself. When accessed on an instance, returns the full Path to the file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The filename (e.g., "config.json"). | required |
DirNode(name, **children)
Descriptor for directory paths within a directory structure.
Supports nested children (both FileNode and DirNode). When accessed on a class, returns the descriptor itself. When accessed on an instance, returns a BoundDir with the resolved path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The directory name (e.g., "train"). | required |
| **children | FileNode \| DirNode | Child nodes (FileNode or DirNode instances). | {} |
BoundDir(path, children)
Bases: PathLike[str]
Runtime class representing a bound directory path.
Provides access to child FileNode and DirNode descriptors as attributes.
Implements os.PathLike[str] so instances can be used wherever paths are expected.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | The resolved directory path. | required |
| children | dict[str, FileNode \| DirNode] | Child nodes from the DirNode. | required |

Attributes:
| Name | Type | Description |
|---|---|---|
| path | Path | The resolved directory path. |
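To illustrate what implementing os.PathLike buys in practice, here is a small sketch. BoundDirSketch is a hypothetical, reduced stand-in that only carries a path; it shows that any object defining __fspath__ is accepted by standard path-consuming functions.

```python
import os
from pathlib import Path


class BoundDirSketch(os.PathLike):
    """Hypothetical stand-in: only the PathLike aspect of BoundDir."""

    def __init__(self, path) -> None:
        self.path = Path(path)

    def __fspath__(self) -> str:
        # os.fspath(), os.path.join(), open() and Path() all call this hook.
        return str(self.path)


bound = BoundDirSketch("/tmp/run/train")
joined = os.path.join(bound, "cache")  # accepted because of __fspath__
as_path = Path(bound)  # Path() also accepts PathLike objects
print(joined)
```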
path
property
The resolved directory path.
Workdir(base_path, config_name, dataset_name, run_name=None, _run_name_obj=RunName(), _current_phase='unknown', _parent_workdir=None, _explicit_run_path=None)
dataclass
Working directory structure for Safe Synthesizer artifacts.
This class defines the complete directory layout and provides typed access to all paths within the structure. It uses FileNode and DirNode descriptors for declarative path definitions.
Full directory structure:
<base_path>/<config>---<dataset>/<run_name>/
- train/
  - safe-synthesizer-config.json
  - cache/
  - adapter/ (trained PEFT adapter)
    - adapter_config.json
    - adapter_model.safetensors
    - metadata_v2.json
    - dataset_schema.json
- generate/
  - logs.jsonl (generate-only workflow)
  - info.json (generate-only workflow)
  - synthetic_data.csv
  - evaluation_report.html
  - evaluation_metrics.json (machine-readable metrics)
- dataset/
  - training.csv
  - test.csv
  - validation.csv (when training.validation_ratio > 0)
  - transformed_training.csv (when PII replacement transforms the data)
- logs/
  - <phase>.jsonl (e.g. end_to_end.jsonl or train.jsonl)
Methods:
| Name | Description |
|---|---|
| phase_dir | Get the phase directory path. |
| ensure_directories | Create directories based on the current phase. |
| new_generation_run | Create a new Workdir for a generation run from this workdir. |
| from_explicit_run_path | Create Workdir from an explicit run path (no auto-generated nesting). |
| from_path | Load a Workdir from an existing path. |

Attributes:
| Name | Type | Description |
|---|---|---|
| base_path | Path | Root directory under which project and run directories are created. |
| config_name | str | Stem of the config file name, used in the project directory name. |
| dataset_name | str | Stem of the dataset file name, used in the project directory name. |
| run_name | str \| None | Run name (auto-generated timestamp or explicit name from CLI). |
| config | | Location for NSS config file. |
| wandb_run_id_file | | Location for WandB run ID file. |
| train | | Location and contents of train directory structure. |
| generate | | Location and contents of generate directory structure. |
| dataset | | Location and contents of dataset directory structure. |
| project_name | str | Project name in <config>---<dataset> format. |
| project_dir | Path | Project directory path (<base_path>/<config>---<dataset>/). |
| run_dir | Path | Run directory path (<base_path>/<config>---<dataset>/<run_name>/). |
| log_file | Path | Log file path for the current phase. |
| adapter_path | Path | Shortcut to train.adapter.path (adapter directory). |
| metadata_file | Path | Shortcut to train.adapter.metadata. |
| schema_file | Path | Shortcut to train.adapter.schema. |
| dataset_schema_file | Path | Alias for schema_file (backwards compatibility). |
| output_file | Path | Shortcut to generate.output. |
| evaluation_report | Path | Shortcut to generate.report. |
| evaluation_metrics | Path | Shortcut to generate.evaluation_metrics. |
| source_run_dir | Path | Source run directory (parent's run_dir for child generation runs). |
| source_config | Path | Source config file path (from parent workdir if available). |
| source_adapter_path | Path | Source adapter path (from parent workdir if available). |
| source_dataset | BoundDir | Source dataset directory (from parent workdir if available). |
| source_schema_file | Path | Source schema file path (from parent workdir if available). |
base_path
instance-attribute
Root directory under which project and run directories are created.
config_name
instance-attribute
Stem of the config file name, used in the project directory name.
dataset_name
instance-attribute
Stem of the dataset file name, used in the project directory name.
run_name = None
class-attribute
instance-attribute
Run name (auto-generated timestamp or explicit name from CLI).
When None, a timestamp-based name is generated in __post_init__.
config = FileNode('safe-synthesizer-config.json')
class-attribute
instance-attribute
Location for NSS config file.
wandb_run_id_file = FileNode('wandb_run_id.txt')
class-attribute
instance-attribute
Location for WandB run ID file.
train = DirNode('train', config=(FileNode('safe-synthesizer-config.json')), cache=(DirNode('cache')), adapter=(DirNode('adapter', adapter_config=(FileNode('adapter_config.json')), metadata=(FileNode('metadata_v2.json')), schema=(FileNode('dataset_schema.json')))))
class-attribute
instance-attribute
Location and contents of train directory structure.
generate = DirNode('generate', logs=(FileNode('logs.jsonl')), output=(FileNode('synthetic_data.csv')), report=(FileNode('evaluation_report.html')), evaluation_metrics=(FileNode('evaluation_metrics.json')), info=(FileNode('info.json')))
class-attribute
instance-attribute
Location and contents of generate directory structure.
dataset = DirNode('dataset', training=(FileNode('training.csv')), test=(FileNode('test.csv')), validation=(FileNode('validation.csv')), transformed_training=(FileNode('transformed_training.csv')))
class-attribute
instance-attribute
Location and contents of dataset directory structure.
project_name
property
Project name in <config>---<dataset> format.
project_dir
property
Project directory path (<base_path>/<config>---<dataset>/).
Falls back to the parent of _explicit_run_path when one was provided.
run_dir
property
Run directory path (<base_path>/<config>---<dataset>/<run_name>/).
Uses _explicit_run_path directly when one is provided.
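The resolution rule for run_dir can be sketched as a plain function. resolve_run_dir and its parameter names are hypothetical; the sketch only mirrors the documented behaviour that an explicit run path, when given, is used as-is instead of the nested layout.

```python
from __future__ import annotations

from pathlib import Path

PROJECT_NAME_DELIMITER = "---"


def resolve_run_dir(
    base_path: Path,
    config_name: str,
    dataset_name: str,
    run_name: str,
    explicit_run_path: Path | None = None,
) -> Path:
    """Hypothetical helper mirroring the documented run_dir behaviour."""
    if explicit_run_path is not None:
        return Path(explicit_run_path)  # used directly, no nesting
    project_name = f"{config_name}{PROJECT_NAME_DELIMITER}{dataset_name}"
    return Path(base_path) / project_name / run_name


nested = resolve_run_dir(Path("/artifacts"), "my-config", "training-data", "run_0")
explicit = resolve_run_dir(
    Path("/artifacts"), "my-config", "training-data", "run_0",
    explicit_run_path=Path("/scratch/my-experiment-run"),
)
print(nested, explicit)
```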
log_file
property
Log file path for the current phase.
adapter_path
property
Shortcut to train.adapter.path (adapter directory).
When this workdir has a parent (e.g., a generation run spawned from training), returns the parent's adapter path since that's where the trained adapter lives.
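A minimal sketch of the parent fallback, assuming a simplified WorkdirSketch class with hypothetical run_dir and parent fields (the real class stores the parent in _parent_workdir and resolves the path through its descriptors):

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class WorkdirSketch:
    """Hypothetical, reduced model of a workdir with an optional parent."""

    run_dir: Path
    parent: WorkdirSketch | None = None

    @property
    def adapter_path(self) -> Path:
        # A child generation run reads the adapter from the run that trained it.
        owner = self.parent if self.parent is not None else self
        return owner.run_dir / "train" / "adapter"


training = WorkdirSketch(Path("/runs/train_run"))
generation = WorkdirSketch(Path("/runs/gen_run"), parent=training)
print(generation.adapter_path)
```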
metadata_file
property
Shortcut to train.adapter.metadata.
Uses parent workdir's path when available.
schema_file
property
Shortcut to train.adapter.schema.
Uses parent workdir's path when available.
dataset_schema_file
property
Alias for schema_file (backwards compatibility).
output_file
property
Shortcut to generate.output.
evaluation_report
property
Shortcut to generate.report.
evaluation_metrics
property
Shortcut to generate.evaluation_metrics.
source_run_dir
property
Source run directory (parent's run_dir for child generation runs).
source_config
property
Source config file path (from parent workdir if available).
Checks multiple locations for backwards compatibility:
1. Root level config: <run_dir>/safe-synthesizer-config.json
2. Train config: <run_dir>/train/safe-synthesizer-config.json
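The two-location lookup could be sketched as below. find_source_config is a hypothetical helper; checking in the listed order and falling back to the root-level path when neither file exists are both assumptions, since the page does not state them.

```python
import tempfile
from pathlib import Path

CONFIG_FILENAME = "safe-synthesizer-config.json"


def find_source_config(run_dir: Path) -> Path:
    """Hypothetical helper: return the first existing candidate location."""
    candidates = [
        run_dir / CONFIG_FILENAME,            # 1. root-level config
        run_dir / "train" / CONFIG_FILENAME,  # 2. train config
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    # Assumption: default to the root-level location when neither exists.
    return candidates[0]


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    train_config = run_dir / "train" / CONFIG_FILENAME
    train_config.parent.mkdir()
    train_config.write_text("{}")
    # Only the train copy exists, so it is the one returned.
    found_train_only = find_source_config(run_dir) == train_config

    root_config = run_dir / CONFIG_FILENAME
    root_config.write_text("{}")
    # Both exist now; the root-level copy is checked first.
    found_root_first = find_source_config(run_dir) == root_config
```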
source_adapter_path
property
Source adapter path (from parent workdir if available).
source_dataset
property
Source dataset directory (from parent workdir if available).
source_schema_file
property
Source schema file path (from parent workdir if available).
phase_dir(phase=None)
Get the phase directory path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| phase | str \| None | Phase name (train, generate, etc.). Defaults to _current_phase. | None |

Returns:
| Type | Description |
|---|---|
| Path | Path to the phase directory |
ensure_directories()
Create directories based on the current phase.
For training runs, creates the train/, generate/, and dataset/ directories.
For generation-only runs, creates only the generate/ directory and writes info.json.
Returns:
| Type | Description |
|---|---|
| Self | self for method chaining |
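The phase-dependent directory creation might be sketched like this. ensure_directories_sketch is hypothetical, and treating the phase name "generate" as the generation-only case (with every other phase handled as a training run) is an assumption; writing the info file is omitted because its contents are not documented here.

```python
import tempfile
from pathlib import Path


def ensure_directories_sketch(run_dir: Path, phase: str) -> list[Path]:
    """Hypothetical helper: create the directories a given phase needs."""
    # Assumption: "generate" marks a generation-only run; any other phase
    # is treated as a training run.
    if phase == "generate":
        names = ["generate"]
    else:
        names = ["train", "generate", "dataset"]
    created = []
    for name in names:
        directory = run_dir / name
        directory.mkdir(parents=True, exist_ok=True)  # idempotent creation
        created.append(directory)
    return created


with tempfile.TemporaryDirectory() as tmp:
    train_dirs = ensure_directories_sketch(Path(tmp) / "train_run", "train")
    generate_dirs = ensure_directories_sketch(Path(tmp) / "gen_run", "generate")
    train_names = sorted(d.name for d in train_dirs if d.is_dir())
    generate_names = sorted(d.name for d in generate_dirs if d.is_dir())
```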
new_generation_run()
Create a new Workdir for a generation run from this workdir.
This method is used when resuming from a trained model to run generation. The new Workdir shares the same project but gets a new run_name, and references this workdir as the parent for loading config/data/adapter.
Returns:
| Type | Description |
|---|---|
| Self | New Workdir instance with a fresh timestamp-based run_name and this workdir as parent |
from_explicit_run_path(run_path, config_name, dataset_name, current_phase='unknown')
classmethod
Create Workdir from an explicit run path (no auto-generated nesting).
Used when --run-path is provided on the CLI. The path is used directly
as the run directory, without the normal auto-generated nesting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| run_path | Path | Explicit path to use as the run directory | required |
| config_name | str | Name of the config (used for project naming) | required |
| dataset_name | str | Name of the dataset (used for project naming) | required |
| current_phase | str | The current phase (train, generate, end_to_end) | 'unknown' |

Returns:
| Type | Description |
|---|---|
| Workdir | Workdir with run_dir set to run_path |

Raises:
| Type | Description |
|---|---|
| ValueError | If run_path already contains a trained adapter |
from_path(path)
classmethod
Load a Workdir from an existing path.
This method handles three scenarios:
1. Path is a run_dir (contains train/adapter/ with safetensors): use it directly.
2. Path is a project_dir: find the latest run within that project.
3. Path is a base_path: find the latest run across all projects.

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to run_dir, project_dir, or base_path | required |

Returns:
| Type | Description |
|---|---|
| Workdir | Workdir pointing to the existing run |

Raises:
| Type | Description |
|---|---|
| ValueError | If path doesn't exist or no valid run found |
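The three scenarios can be sketched as a path search. Both helpers are hypothetical, and two details are assumptions rather than documented behaviour: a run directory is recognised by any *.safetensors file under train/adapter/, and "latest" is taken to be the greatest run directory name (which matches chronological order for timestamp-based names).

```python
import tempfile
from pathlib import Path


def is_run_dir(path: Path) -> bool:
    """Hypothetical check: train/adapter/ holds at least one safetensors file."""
    adapter = path / "train" / "adapter"
    return adapter.is_dir() and any(adapter.glob("*.safetensors"))


def find_run_dir(path: Path) -> Path:
    """Hypothetical helper covering run_dir, project_dir and base_path inputs."""
    if not path.exists():
        raise ValueError(f"Path does not exist: {path}")
    if is_run_dir(path):
        return path  # scenario 1: already a run directory
    runs = [p for p in path.glob("*") if is_run_dir(p)]  # scenario 2
    if not runs:
        runs = [p for p in path.glob("*/*") if is_run_dir(p)]  # scenario 3
    if not runs:
        raise ValueError(f"No valid run found under: {path}")
    # Assumption: the latest run is the one with the greatest directory name.
    return max(runs, key=lambda p: p.name)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    project = base / "my-config---training-data"
    for run_name in ("run_001", "run_002"):
        adapter = project / run_name / "train" / "adapter"
        adapter.mkdir(parents=True)
        (adapter / "adapter_model.safetensors").touch()
    newest = project / "run_002"
    from_run = find_run_dir(newest) == newest
    from_project = find_run_dir(project) == newest
    from_base = find_run_dir(base) == newest
    try:
        find_run_dir(base / "missing")
        missing_raises = False
    except ValueError:
        missing_raises = True
```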