Nemotron-Parse-v1.1 Notebooks

Notebooks demonstrating NVIDIA Nemotron-Parse-v1.1, a specialized vision-language model (VLM) for high-accuracy document ingestion.

Overview

These notebooks provide examples of using NVIDIA Nemotron-Parse-v1.1, a specialized Transformer-based VLM that functions as the “ingestion backbone” for AI agents. It excels at turning messy, unstructured documents (like PDFs) into clean, structured, and agent-ready data formats, including JSON, LaTeX, and Markdown.

Models

Document VLM (NIM): nvidia/nemotron-parse (Available on NVIDIA AI Endpoints)
Document VLM (Hugging Face): TBD

Key Features

Structured Data Extraction: Converts complex PDFs into structured JSONL, tables into LaTeX, and full pages into clean Markdown.
High-Accuracy Parsing: Specialized for document intelligence, achieving industry-leading performance on benchmarks like PubTables-1M.
Reading Order Preservation: Intelligently extracts text, lists, and formulas in the correct semantic reading order.
Precise Bounding Boxes: Returns accurate, normalized bounding boxes for every extracted element (titles, text, figures, etc.), ideal for grounding.
9K Token Context: Features an extended context window for improved cross-page coherence and parsing of large, complex tables.
Agent-Ready Data: Drastically reduces post-processing and hallucinations by providing reliable, structured output for RAG and agent pipelines.
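
The normalized bounding boxes and reading-order output described above usually need a small amount of glue code before they can be drawn on a page image or passed to a downstream agent. The following is a minimal sketch, not the model's actual response schema: it assumes boxes are normalized to the range [0, 1] and that each extracted element is a dict with the hypothetical keys `type`, `text`, and `bbox` (holding `xmin`, `ymin`, `xmax`, `ymax`). The helper names `to_pixel_box` and `elements_to_markdown` are also made up for illustration; check the notebooks for the real field names.

```python
from typing import Dict, List, Tuple


def to_pixel_box(bbox: Dict[str, float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a box normalized to [0, 1] into integer pixel coordinates."""

    # Clamp first so slightly out-of-range values still land on the page.
    def clamp(value: float) -> float:
        return min(max(value, 0.0), 1.0)

    left = round(clamp(bbox["xmin"]) * width)
    top = round(clamp(bbox["ymin"]) * height)
    right = round(clamp(bbox["xmax"]) * width)
    bottom = round(clamp(bbox["ymax"]) * height)
    return left, top, right, bottom


def elements_to_markdown(elements: List[dict]) -> str:
    """Join element text in list order, which is assumed to be reading order."""
    return "\n\n".join(e["text"] for e in elements if e.get("text"))


# Hypothetical elements, invented for illustration; not real model output.
elements = [
    {
        "type": "Title",
        "text": "# Quarterly Report",
        "bbox": {"xmin": 0.1, "ymin": 0.05, "xmax": 0.9, "ymax": 0.1},
    },
    {
        "type": "Text",
        "text": "Revenue grew in Q3.",
        "bbox": {"xmin": 0.1, "ymin": 0.15, "xmax": 0.9, "ymax": 0.25},
    },
]

page_width, page_height = 1000, 2000  # size of the rendered page image in pixels
pixel_boxes = [to_pixel_box(e["bbox"], page_width, page_height) for e in elements]
markdown = elements_to_markdown(elements)
```

Keeping the scaling step separate from the parsing call means the same boxes can be reused for page images rendered at different resolutions.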

Requirements

NVIDIA API key (get one here)
GPU recommended for local deployment (e.g., a single H100)
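
One way to supply the API key without hard-coding it in a notebook is to read it from the environment. This is a minimal sketch under stated assumptions: the variable name `NVIDIA_API_KEY` is a common convention but is assumed here, the key is assumed to be sent as a bearer token, and the helper name `build_auth_headers` is hypothetical.

```python
import os
from typing import Dict, Mapping


def build_auth_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build HTTP headers carrying the API key as a bearer token."""
    # NVIDIA_API_KEY is an assumed variable name; adjust it to match your setup.
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set NVIDIA_API_KEY before running the notebooks.")
    return {
        "Authorization": f"Bearer {key}",
        "Accept": "application/json",
    }
```

Passing the environment in as a parameter keeps the helper easy to test and fails early with a clear message when the key is missing.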