Usage Cookbook#

Examples showing how to get started with Nemotron models.


What’s Inside#

This directory contains cookbook-style guides showing how to deploy and use the models directly:

  • TensorRT-LLM Launch Guide - Running Nemotron models efficiently with TensorRT-LLM.

  • vLLM Integration - Fast inference and scalable serving of Nemotron models with vLLM.

  • SGLang Deployment - Serving and interacting with Nemotron via SGLang.

  • NIM Microservice - Deploying Nemotron as scalable, production-ready endpoints using NVIDIA Inference Microservices (NIM).

  • Hugging Face Transformers - Direct loading and inference of Nemotron models with Hugging Face Transformers.
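A common thread across the serving options above is that vLLM, SGLang, and NIM can all expose an OpenAI-compatible chat completions endpoint. The sketch below shows the shape of such a request body; the endpoint URL and model name are placeholders, not values from these guides — substitute the model ID and port from the guide you follow.

```python
import json

# Hypothetical local endpoint; vLLM, SGLang, and NIM can each serve an
# OpenAI-compatible /v1/chat/completions route on a port you choose.
BASE_URL = "http://localhost:8000/v1/chat/completions"  # placeholder


def build_chat_request(model: str, user_prompt: str, max_tokens: int = 256) -> str:
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)


# Placeholder model name for illustration only.
body = build_chat_request("nvidia/nemotron-model", "Hello!")
print(body)
```

You would POST this body (with a `Content-Type: application/json` header) to the running server; the response follows the standard chat completions schema, so client code written this way works unchanged across the serving backends listed above.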