# Usage Cookbook

Examples showing how to get started with Nemotron models.

## What's Inside
This directory contains cookbook-style guides showing how to deploy and use the models directly:
- **TensorRT-LLM Launch Guide** - Running Nemotron models efficiently with TensorRT-LLM.
- **vLLM Integration** - Fast inference and scalable serving of Nemotron models with vLLM.
- **SGLang Deployment** - Serving and interacting with Nemotron via SGLang.
- **NIM Microservice** - Deploying Nemotron as scalable, production-ready endpoints using NVIDIA Inference Microservices (NIM).
- **Hugging Face Transformers** - Loading and running Nemotron models directly with Hugging Face Transformers.
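As a quick taste of the vLLM path above, the following sketch launches an OpenAI-compatible server and sends it a chat request. The model ID `nvidia/Nemotron-Mini-4B-Instruct` is an assumption here; substitute whichever Nemotron checkpoint you intend to serve (see the vLLM guide for details).

```shell
# Serve a Nemotron model with vLLM's OpenAI-compatible server
# (model ID is an assumption -- replace with your chosen checkpoint).
vllm serve nvidia/Nemotron-Mini-4B-Instruct --port 8000

# In another terminal, query the server via the standard
# /v1/chat/completions endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/Nemotron-Mini-4B-Instruct",
        "messages": [{"role": "user", "content": "Hello, Nemotron!"}]
      }'
```

Because the server speaks the OpenAI API, any OpenAI-compatible client library can be pointed at `http://localhost:8000/v1` instead of hand-rolled `curl` calls.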