# Usage Cookbook

Examples showing how to get started with Nemotron models.

## What's Inside
This directory contains cookbook-style guides showing how to deploy and use the models directly:
- **TensorRT-LLM Launch Guide** - Running Nemotron models efficiently with TensorRT-LLM.
- **vLLM Integration** - Fast inference and scalable serving of Nemotron models with vLLM.
- **SGLang Deployment** - Serving and interacting with Nemotron via SGLang.
- **NIM Microservice** - Deploying Nemotron as scalable, production-ready endpoints using NVIDIA Inference Microservices (NIM).
- **Hugging Face Transformers** - Loading and running Nemotron models directly with Hugging Face Transformers.
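As a quick taste of the vLLM path above, the following sketch launches an OpenAI-compatible server and sends it a chat request. The model ID `nvidia/Nemotron-Mini-4B-Instruct` is an assumption here; substitute whichever Nemotron checkpoint you intend to serve (see the vLLM guide for details).

```shell
# Serve a Nemotron model with vLLM's OpenAI-compatible server
# (model ID is an assumption -- replace with your chosen checkpoint).
vllm serve nvidia/Nemotron-Mini-4B-Instruct --port 8000

# In another terminal, query the server via the standard
# /v1/chat/completions endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/Nemotron-Mini-4B-Instruct",
        "messages": [{"role": "user", "content": "Hello, Nemotron!"}]
      }'
```

Because the server speaks the OpenAI API, any OpenAI-compatible client library can be pointed at `http://localhost:8000/v1` instead of hand-rolled `curl` calls.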