Nemotron Kit#
The nemotron.kit module is a lightweight framework powering Nemotron training recipes. It provides the infrastructure for building reproducible ML training pipelines with artifact versioning, configuration management, and lineage tracking.
Lightweight by Design: Kit handles recipe orchestration—not training. All heavy lifting is done by the NVIDIA AI Stack: Megatron-Core for distributed training primitives, Megatron-Bridge for model training, and NeMo-RL for reinforcement learning.
Overview#
Kit handles four core responsibilities:
Component |
Purpose |
|---|---|
Path-centric data and model versioning with typed metadata |
|
Configuration |
YAML/TOML config loading with Hydra-style overrides |
Batteries-included CLI with optional MLOps execution via |
|
Experiment tracking and artifact provenance via W&B |
|
Optional NeMo-Run integration for distributed training |
Architecture#
%%{init: {'theme': 'base', 'themeVariables': { 'primaryBorderColor': '#333333', 'lineColor': '#333333', 'primaryTextColor': '#333333', 'clusterBkg': '#ffffff', 'clusterBorder': '#333333'}}}%%
flowchart LR
subgraph cli["CLI"]
direction TB
CMD1["nemotron ... data prep"]
CMD2["nemotron ... train/sft/rl"]
end
subgraph recipes["Recipe Scripts"]
direction TB
DataPrep["data_prep.py"]
Train["train.py"]
end
subgraph kit["nemotron.kit"]
direction TB
Config["Configuration"]
Run["NeMo-Run"]
Artifact["Artifacts"]
Tracker["Lineage"]
end
subgraph execution["Execution Backends"]
direction TB
Slurm["Slurm ✓"]
Docker["Docker (planned)"]
Cloud["Cloud (planned)"]
end
subgraph tracking["Logging & Artifact Backends"]
direction LR
WandB["W&B ✓"]
More1["(planned)"]
More2["(planned)"]
end
CMD1 --> DataPrep
CMD2 --> Train
DataPrep --> kit
Train --> kit
Run --> execution
Tracker --> tracking
style cli fill:#fce4ec,stroke:#c2185b
style recipes fill:#e8f5e9,stroke:#388e3c
style kit fill:#e3f2fd,stroke:#1976d2
style execution fill:#fff3e0,stroke:#f57c00
style tracking fill:#f3e5f5,stroke:#7b1fa2
Quick Example#
A minimal training recipe using kit:
from nemotron.kit import cli, PretrainBlendsArtifact
from dataclasses import dataclass
from pathlib import Path
@dataclass
class Config:
data_path: Path
output_dir: Path
learning_rate: float = 1e-4
# Load config with CLI override support
config = cli(Config, config_file="config.yaml")
# Load data artifact
data = PretrainBlendsArtifact.load(config.data_path)
print(f"Training on {data.total_tokens:,} tokens")
# ... training code ...
# Save model artifact
model = ModelArtifact(path=config.output_dir, step=10000, loss=2.5)
model.save(name="ModelArtifact-pretrain")
Run with overrides:
python train.py --config config.yaml learning_rate=5e-5
Key Concepts#
Artifacts#
Artifacts are path-centric objects with typed metadata. The core field is always path—the filesystem location of the data. See Artifact Lineage for details.
from nemotron.kit import PretrainBlendsArtifact
# Load from semantic URI
artifact = PretrainBlendsArtifact.from_uri("art://PretrainBlendsArtifact:latest")
print(f"Path: {artifact.path}")
print(f"Tokens: {artifact.total_tokens:,}")
Configuration#
Kit supports YAML, TOML, and JSON configs with automatic format detection and Hydra-style overrides:
# config.yaml
data_path: /data/pretrain
output_dir: ${oc.env:PWD}/output
learning_rate: 1e-4
Override from CLI:
python train.py --config config.yaml output_dir=/scratch/output
Execution#
Kit integrates with NeMo-Run for distributed execution across Slurm, Docker, and cloud environments:
# Run on Slurm cluster
uv run nemotron nano3 pretrain --run YOUR-CLUSTER
# Run in Docker
uv run nemotron nano3 pretrain --run docker-profile
Lineage Tracking#
Kit tracks artifact lineage through W&B. See W&B Integration for automatic credential handling and Artifact Lineage for the lineage graph.
Module Structure#
nemotron.kit
├── artifacts # Artifact classes and registry
├── cli # CLI building and env.toml parsing
├── config # Configuration loading
├── run # NeMo-Run executor building
├── trackers # Lineage tracking backends
└── wandb # W&B integration utilities
API Reference#
Artifacts#
Export |
Description |
|---|---|
|
Base artifact class |
|
Pretrain data with train/valid/test splits |
|
Packed SFT sequences |
|
RL JSONL data |
|
Model checkpoints |
|
Source dataset metadata |
Configuration#
Export |
Description |
|---|---|
|
Config loader with CLI support |
|
Load YAML with OmegaConf |
|
Apply key=value overrides |
Execution#
Export |
Description |
|---|---|
|
Build NeMo-Run executor from profile |
|
Load execution profile from env.toml |
Tracking#
Export |
Description |
|---|---|
|
Conditional W&B initialization |
|
Add tags to W&B runs |
|
Clean W&B shutdown |
Further Reading#
NVIDIA AI Stack — Megatron-Core, Megatron-Bridge, NeMo-RL
OmegaConf Configuration — Artifact interpolations and unified W&B logging
Artifact Lineage — Artifact versioning and W&B lineage
W&B Integration — Automatic credential handling
Execution through NeMo-Run — Execution profiles and packagers
CLI Framework — Building recipe CLIs
Data Preparation — Data prep module