Nemotron-Parse-v1.1 Notebooks

Notebooks demonstrating NVIDIA Nemotron-Parse-v1.1, a specialized VLM for high-accuracy document ingestion.

Overview

These notebooks provide examples of using NVIDIA Nemotron-Parse-v1.1, a specialized Transformer-based VLM that functions as the “ingestion backbone” for AI agents. It excels at turning messy, unstructured documents (like PDFs) into clean, structured, and agent-ready data formats, including JSON, LaTeX, and Markdown.

Models

  • Document VLM (NIM): nvidia/nemotron-parse (Available on NVIDIA AI Endpoints)

  • Document VLM (Hugging Face): TBD

Key Features

  • Structured Data Extraction: Converts complex PDFs into structured JSONL, tables into LaTeX, and full pages into clean Markdown.

  • High-Accuracy Parsing: Specialized for document intelligence, achieving industry-leading performance on benchmarks like PubTables-1M.

  • Reading Order Preservation: Intelligently extracts text, lists, and formulas in the correct semantic reading order.

  • Precise Bounding Boxes: Returns accurate, normalized bounding boxes for every extracted element (titles, text, figures, etc.), ideal for grounding.

  • 9K Token Context: Features an extended context window for improved cross-page coherence and parsing of large, complex tables.

  • Agent-Ready Data: Drastically reduces post-processing and hallucinations by providing reliable, structured output for RAG and agent pipelines.
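
Since the bounding boxes are described as normalized, a consumer typically has to scale them to the rendered page size before cropping or drawing overlays. The following is a minimal sketch of that step, not the model's actual output schema: the `xmin`/`ymin`/`xmax`/`ymax` keys, the assumption that values lie in [0, 1], and the `PixelBox` and `to_pixel_box` names are all hypothetical and should be checked against the real response format.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PixelBox:
    """Axis-aligned box in integer pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def to_pixel_box(norm_box: Mapping[str, float], page_width: int, page_height: int) -> PixelBox:
    """Scale a normalized box to pixels for a page of the given size.

    Assumes (not confirmed by the source) that the box uses the keys
    xmin, ymin, xmax, ymax with values in the range [0, 1].
    """
    xmin, ymin, xmax, ymax = (norm_box[k] for k in ("xmin", "ymin", "xmax", "ymax"))
    for value in (xmin, ymin, xmax, ymax):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"expected a normalized coordinate in [0, 1], got {value}")
    if xmin > xmax or ymin > ymax:
        raise ValueError("box corners are out of order")
    return PixelBox(
        left=round(xmin * page_width),
        top=round(ymin * page_height),
        right=round(xmax * page_width),
        bottom=round(ymax * page_height),
    )


# Example: a box covering the lower middle of a 1000 x 800 pixel page image.
example = to_pixel_box({"xmin": 0.25, "ymin": 0.5, "xmax": 0.75, "ymax": 1.0}, 1000, 800)
```

The resulting `PixelBox` can be passed to an image library's crop or draw routine to ground an extracted element in the original page.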

Requirements

  • NVIDIA API key (get one here)

  • GPU recommended for local deployment (e.g., single H100)
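
For the hosted model, one common way to supply the API key is through an environment variable rather than hard-coding it in a notebook cell. The sketch below illustrates that pattern only; the variable name `NVIDIA_API_KEY`, the bearer-token header scheme, and the `build_auth_headers` helper are assumptions, not details confirmed by these notebooks.

```python
import os
from typing import Dict, Mapping


def build_auth_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build HTTP headers carrying the API key.

    Assumes the key is stored in NVIDIA_API_KEY and sent as a bearer token;
    both are illustrative choices, not taken from the source.
    """
    api_key = env.get("NVIDIA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set NVIDIA_API_KEY before calling the hosted model.")
    return {
        "Authorization": f"Bearer {api_key}",
        "Accept": "application/json",
    }


# Example with an explicit mapping, so no real key is needed to try it out.
headers = build_auth_headers({"NVIDIA_API_KEY": "example-key"})
```

Passing the environment as a parameter keeps the helper easy to exercise without touching real credentials.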