Nemotron Training Recipes#

Open and efficient models for agentic AI — reproducible training pipelines with fully transparent data, techniques, and weights.

Quick Start#

# Install the Nemotron training recipes
$ git clone https://github.com/NVIDIA/nemotron
$ cd nemotron && uv sync

# Run the full Nano3 pipeline: data prep, then training, for each stage
$ uv run nemotron nano3 data prep pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 data prep sft --run YOUR-CLUSTER
$ uv run nemotron nano3 sft --run YOUR-CLUSTER
$ uv run nemotron nano3 data prep rl --run YOUR-CLUSTER
$ uv run nemotron nano3 rl --run YOUR-CLUSTER

Note: The --run YOUR-CLUSTER flag submits jobs to your configured Slurm cluster via NeMo-Run. See Execution through NeMo-Run for setup instructions.

Usage Cookbook & Examples#

Usage Cookbook

Deployment guides for Nemotron models: TensorRT-LLM, vLLM, SGLang, NIM, and Hugging Face.

Use Case Examples

End-to-end applications: RAG agents, ML agents, and multi-agent systems.

Available Training Recipes#

Nemotron 3 Nano

31.6B total / 3.6B active parameters, 25T tokens, up to 1M context. Hybrid Mamba-Transformer with sparse MoE.

Stages: Pretraining → SFT → RL

Training Pipeline#

The Nemotron training pipeline follows a three-stage approach with full artifact lineage tracking:

Stage  Name         Description
0      Pretraining  Base model training on a large text corpus
1      SFT          Supervised fine-tuning for instruction following
2      RL           Reinforcement learning for alignment
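To make "artifact lineage" concrete, here is a minimal, framework-free sketch of how each stage's output records the artifacts it was derived from, so any model can be traced back to its data. This is illustrative only; the actual recipes track lineage through W&B, and the artifact names below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:
    """A dataset or checkpoint, plus the artifacts it was derived from."""
    name: str
    parents: list = field(default_factory=list)

    def lineage(self):
        """Walk parent links back to the root, giving full traceability."""
        chain, node = [self.name], self
        while node.parents:
            node = node.parents[0]
            chain.append(node.name)
        return list(reversed(chain))

# Stage 0: pretraining consumes the prepared corpus, produces a base model
corpus = Artifact("pretrain-data")
base = Artifact("base-model", parents=[corpus])

# Stage 1: SFT fine-tunes the base model on instruction data
sft = Artifact("sft-model", parents=[base, Artifact("sft-data")])

# Stage 2: RL aligns the SFT model
final = Artifact("rl-model", parents=[sft, Artifact("rl-data")])

print(final.lineage())  # data → base → SFT → RL chain
```

Each `data prep` command in the Quick Start produces the data artifact that the matching training stage then consumes, which is what lets a final checkpoint be traced back to its exact data blend.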

Why Nemotron?#

Open Models

Transparent training data, techniques, and weights for community innovation

Compute Efficiency

Model pruning for higher throughput when deployed via TensorRT-LLM

High Accuracy

Built on frontier open models with human-aligned reasoning

Flexible Deployment

Deploy anywhere — edge, single GPU, or data center with NIM

Key Features#

  • Complete Pipelines — From raw data to deployment-ready models

  • Artifact Lineage — Full traceability via W&B from data to model

  • Production-Grade — Built on NVIDIA’s NeMo stack (Megatron-Bridge, NeMo-RL)

  • Reproducible — Versioned configs, data blends, and checkpoints

Resources#