Stage 1: Supervised Fine-Tuning (SFT)#
This stage fine-tunes the pretrained model for instruction following using Megatron-Bridge.
Open-Source Data Only: This recipe uses exclusively open-sourced SFT data from the Nemotron Post-training Datasets collection, which is a subset of the full data used to train the released model. The recipe includes datasets from Nemotron-Science-v1, Nemotron-Instruction-Following-Chat-v1, Nemotron-Math-Proofs-v1, Nemotron-SWE-v1, Nemotron-Agentic-v1, and Nemotron-Competitive-Programming-v1. Results will differ from the benchmarks in the tech report. Use this recipe as a reference implementation to apply the methodology with your own data.
Quick Start#
# 1. Prepare data (apply chat templates, tokenize to .npy)
uv run nemotron nano3 data prep sft --run YOUR-CLUSTER
# 2. Run SFT
uv run nemotron nano3 sft --run YOUR-CLUSTER
Note: The `--run YOUR-CLUSTER` flag submits jobs via NeMo-Run. See Execution through NeMo-Run for setup.
Direct Script Execution#
Inside a container on a compute node:
# Data preparation
uv run python data_prep.py --config config/data_prep.yaml
# Training (single node)
uv run python train.py --config config/default.yaml
# Training (distributed)
uv run torchrun --nproc_per_node=8 train.py --config config/default.yaml
Configuration#
| File | Purpose |
|---|---|
| `config/default.yaml` | Production configuration |
| `config/data_prep.yaml` | Data preparation settings |
| | Dataset blend definition |
Data Preparation#
The data_prep.py script processes OpenAI-format chat data into packed sequences with role-based loss masking. See Data Preparation Module for detailed documentation.
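The core idea, packing chat examples into fixed-length sequences with a loss mask that covers only assistant tokens, can be sketched as follows. This is a minimal illustration, not the actual `data_prep.py` implementation: the stand-in tokenizer, `mask_example`, and `pack` are hypothetical names, and the real pipeline applies the model's chat template before tokenizing.

```python
# Minimal sketch of role-based loss masking and sequence packing for
# OpenAI-format chat data. The tokenizer is a whitespace stand-in; the
# real pipeline uses the model tokenizer and chat template.

PACK_SIZE = 4096  # matches the pack_size hyperparameter in this recipe

def tokenize(text):
    """Stand-in tokenizer: one integer id per whitespace token."""
    return [hash(w) % 50000 for w in text.split()]

def mask_example(messages):
    """Return (token_ids, loss_mask); loss is computed on assistant tokens only."""
    ids, mask = [], []
    for msg in messages:
        toks = tokenize(msg["content"])
        ids.extend(toks)
        # 1 -> contributes to the loss, 0 -> masked out
        mask.extend([1 if msg["role"] == "assistant" else 0] * len(toks))
    return ids, mask

def pack(examples, pack_size=PACK_SIZE):
    """Greedily concatenate tokenized examples into fixed-size packs."""
    packs, cur_ids, cur_mask = [], [], []
    for ids, mask in examples:
        if cur_ids and len(cur_ids) + len(ids) > pack_size:
            packs.append((cur_ids, cur_mask))
            cur_ids, cur_mask = [], []
        cur_ids.extend(ids)
        cur_mask.extend(mask)
    if cur_ids:
        packs.append((cur_ids, cur_mask))
    return packs

chat = [
    {"role": "user", "content": "What is 2 + 2 ?"},
    {"role": "assistant", "content": "2 + 2 = 4"},
]
ids, mask = mask_example(chat)
print(sum(mask))  # 5 -- only the five assistant tokens are unmasked
```

The key property is that prompt and system tokens still condition the model but never contribute gradient signal.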
CLI Command#
uv run nemotron nano3 data prep sft [options]
| Option | Description |
|---|---|
| `--run YOUR-CLUSTER` | Execute on Slurm via NeMo-Run |
| | Limit rows per dataset (for testing) |
| | Force re-run, ignoring cache |
Output#
output/stage1_sft/
├── training.npy
├── validation.npy
├── test.npy
└── metadata.json
The output is registered as a W&B Artifact (DataBlendsArtifact-sft) for lineage tracking.
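A short script can inspect these outputs. This is a sketch only: the array layout and the metadata key shown are assumptions, not the recipe's actual schema; consult `metadata.json` for the real fields.

```python
# Sketch: load data-prep outputs. The dtype/shape and metadata keys are
# assumptions; check metadata.json for the actual schema.
import json
import numpy as np
from pathlib import Path

def load_split(out_dir, split="training"):
    """Memory-map a packed split and read its metadata."""
    out_dir = Path(out_dir)
    # mmap_mode avoids reading a large packed dataset into RAM at once
    tokens = np.load(out_dir / f"{split}.npy", mmap_mode="r")
    meta = json.loads((out_dir / "metadata.json").read_text())
    return tokens, meta
```

Memory-mapping keeps startup cheap even when the packed training split is many gigabytes.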
Training#
CLI Command#
uv run nemotron nano3 sft [options] [overrides...]
| Option | Description |
|---|---|
| | Attached: submits and waits, streaming logs (NeMo-Run) |
| | Detached: submits and exits immediately (NeMo-Run) |
| | Preview execution plan |
| `key=value` | Override config values (CLI Framework) |
Override Examples#
# More training iterations
uv run nemotron nano3 sft train.train_iters=5000
# Different learning rate
uv run nemotron nano3 sft optimizer.lr=1e-5
# Load specific pretrained checkpoint
uv run nemotron nano3 sft checkpoint.load=/path/to/pretrain/checkpoint
Running with NeMo-Run#
Configure execution profiles in env.toml:
[wandb]
project = "nemotron"
entity = "YOUR-TEAM"
[YOUR-CLUSTER]
executor = "slurm"
account = "YOUR-ACCOUNT"
partition = "batch"
nodes = 2
ntasks_per_node = 8
gpus_per_node = 8
mounts = ["/lustre:/lustre"]
See Execution through NeMo-Run for complete configuration options.
Artifact Lineage#
%%{init: {'theme': 'base', 'themeVariables': { 'primaryBorderColor': '#333333', 'lineColor': '#333333', 'primaryTextColor': '#333333'}}}%%
flowchart TB
prev["ModelArtifact-pretrain<br/>(from Stage 0)"] --> train
inst["Instruction Datasets<br/>(OpenAI chat format)"] --> dp["data_prep.py"]
dp --> data["DataBlendsArtifact-sft<br/>(packed .npy files)"]
data --> train["train.py"]
train --> model["ModelArtifact-sft<br/>(fine-tuned checkpoint)"]
model --> next["Stage 2: RL"]
style prev fill:#e1f5fe,stroke:#2196f3
style inst fill:#f3e5f5,stroke:#9c27b0
style dp fill:#f3e5f5,stroke:#9c27b0
style data fill:#f3e5f5,stroke:#9c27b0
style train fill:#f3e5f5,stroke:#9c27b0
style model fill:#f3e5f5,stroke:#9c27b0
style next fill:#e8f5e9,stroke:#4caf50
Methodology#
For complete methodology, see Tech Report Section 3.1.
Chat Template#
Nemotron 3 Nano supports both reasoning and non-reasoning modes:
- Multi-Step: existing reasoning tokens are preserved for reuse in subsequent steps
- Multi-Turn: reasoning from previous turns is dropped when a new user message is introduced
- Tool Calling: XML-style special tags reduce character escaping
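The multi-turn rule can be sketched as follows. The `<think>...</think>` delimiters and the `drop_stale_reasoning` helper are illustrative placeholders, not the actual Nemotron template tags.

```python
# Sketch of the multi-turn rule: when a conversation continues, reasoning
# from *previous* assistant turns is dropped. The <think>...</think>
# delimiters are placeholders, not the real chat-template tags.
import re

THINK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def drop_stale_reasoning(messages):
    """Strip reasoning from every assistant turn except the most recent one."""
    last_asst = max(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"),
        default=None,
    )
    out = []
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and i != last_asst:
            m = {**m, "content": THINK.sub("", m["content"])}
        out.append(m)
    return out

history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "<think>plan</think>Hello!"},
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "<think>reply</think>Fine."},
]
print(drop_stale_reasoning(history)[1]["content"])  # Hello!
```

Dropping stale reasoning keeps context length bounded while the latest turn's reasoning remains available.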
SFT Data Domains#
| Domain | Description |
|---|---|
| Competition Math | Tool-integrated reasoning with GPT-OSS teachers |
| Competition Code | OpenCodeReasoning solutions with obfuscation/complication |
| InfinityByte | Cross-domain code synthesis at model capability boundaries |
| STEM Reasoning (RQA) | Reasoning Q&A from undergraduate/graduate STEM content |
| Conversational Tool Use | Multi-turn trajectories with simulated tool execution |
| Long Context | 128k mean token length, 256k hard limit |
| Formal Proofs | Lean theorem proving with 300k examples |
| Multilingual | French, Spanish, Italian, German, Japanese |
| Terminal Use | Terminal operations from Terminal Bench |
| General Chat | Multi-turn responses from LMSYS and WildChat |
| Instruction Following | Tülu 3 methodology with verifier filtering |
| Safety | Refusal behaviors from safety datasets |
| Software Engineering | GitHub issue resolution trajectories |
| Science | Physics, chemistry, biology via NeMo Data Designer |
For detailed data generation pipelines, see Tech Report Section 3.1.
Data Filtering#
The pipeline applies:
- Structural checks: discard malformed examples
- Pathological repetition filtering: remove repeated n-grams
- Consistency filtering: judge-based action consistency verification
- Narrative filtering: remove political/nationalistic narratives
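The repetition check can be sketched as flagging text whose most frequent n-gram dominates. The choice of `n=3` and the `0.3` threshold below are illustrative assumptions, not values from the recipe.

```python
# Sketch of pathological-repetition filtering: flag text whose most
# frequent n-gram accounts for too large a share of all n-grams.
# n=3 and threshold=0.3 are illustrative, not the recipe's values.
from collections import Counter

def repetition_ratio(text, n=3):
    """Share of n-grams taken up by the single most frequent n-gram."""
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    most_common = Counter(ngrams).most_common(1)[0][1]
    return most_common / len(ngrams)

def is_pathological(text, n=3, threshold=0.3):
    return repetition_ratio(text, n) > threshold

print(is_pathological("the cat sat on the mat and then slept soundly"))  # False
print(is_pathological("go on go on go on go on go on go on"))  # True
```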
Hyperparameters#
| Parameter | Value |
|---|---|
| Learning Rate | 1e-5 |
| Sequence Length | 4096 tokens (pack_size) |
| Loss Masking | Role-based (assistant tokens only) |
| Optimizer | AdamW |
| Total Samples | 18M+ |
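At training time, role-based loss masking means the cross-entropy is averaged over assistant tokens only. The pure-Python function below is a toy stand-in for the real Megatron loss, with made-up log-probabilities, shown only to make the averaging rule concrete.

```python
# Toy stand-in for the masked SFT loss: mean negative log-likelihood
# over unmasked (assistant) positions only. Log-probs here are made up.
def masked_nll(logprobs, loss_mask):
    """Mean NLL over positions where loss_mask is 1."""
    assert len(logprobs) == len(loss_mask)
    total = sum(-lp for lp, m in zip(logprobs, loss_mask) if m)
    count = sum(loss_mask)
    return total / max(count, 1)

# 4 positions: two prompt tokens (masked) and two assistant tokens
logprobs = [-0.1, -0.2, -0.5, -1.5]
mask = [0, 0, 1, 1]
print(masked_nll(logprobs, mask))  # 1.0
```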
Open-Source Data#
Note: This recipe trains exclusively on the open-sourced subset of SFT data. Results will differ from the tech report benchmarks, which used additional proprietary data.
NVIDIA AI Stack#
This stage uses the following components from the NVIDIA AI Stack:
| Component | Role | Documentation |
|---|---|---|
| Megatron-Core | Distributed training primitives (TP, PP, DP, EP) | |
| Megatron-Bridge | Fine-tuning loop, checkpoint loading, loss masking | |
Key Features Used#
| Feature | Purpose |
|---|---|
| | SFT training with pre-loaded checkpoint |
| Role-based loss masking | Only compute loss on assistant tokens |
| Mixed precision (BF16) | Memory-efficient training |
| Gradient checkpointing | Reduce memory footprint |
Container#
nvcr.io/nvidia/nemo:25.11.nemotron_3_nano
Next Steps#
After SFT completes, proceed to Stage 2: RL for alignment training.
Reference#
Tech Report Section 3.1 — SFT methodology
NVIDIA AI Stack — Megatron-Core, Megatron-Bridge documentation
Artifact Lineage — W&B artifact system
Stage 0: Pretraining — Pretrain the base model
Recipe Source:
`src/nemotron/recipes/nano3/stage1_sft/` — Implementation details