🎨 Data Designer Tutorial: Providing Images as Context for Vision-Based Data Generation¶

📚 What you'll learn¶

This notebook demonstrates how to provide images as context to generate text descriptions using vision-language models.

✨ Visual Document Processing: Converting images to chat-ready format for model consumption
🔍 Vision-Language Generation: Using vision models to generate detailed summaries from images

If this is your first time using Data Designer, we recommend starting with the first notebook in this tutorial series.

📦 Import Data Designer¶

data_designer.config provides access to the configuration API.
DataDesigner is the main interface for data generation.

In [1]:

Copied!





# Standard library imports
import base64
import io
import uuid

# Third-party imports
import pandas as pd
import rich
from datasets import load_dataset
from IPython.display import display
from rich.panel import Panel

# Data Designer imports
import data_designer.config as dd
from data_designer.interface import DataDesigner
# Standard library imports
import base64
import io
import uuid

# Third-party imports
import pandas as pd
import rich
from datasets import load_dataset
from IPython.display import display
from rich.panel import Panel

# Data Designer imports
import data_designer.config as dd
from data_designer.interface import DataDesigner

⚙️ Initialize the Data Designer interface¶

DataDesigner is the main object responsible for managing the data generation process.
When initialized without arguments, the default model providers are used.

In [2]:

Copied!

data_designer = DataDesigner()
data_designer = DataDesigner()

🎛️ Define model configurations¶

Each ModelConfig defines a model that can be used during the generation process.
The "model alias" is used to reference the model in the Data Designer config (as we will see below).
The "model provider" is the external service that hosts the model (see the model config docs for more details).
By default, we use build.nvidia.com as the model provider.

In [3]:

Copied!





# This name is set in the model provider configuration.
MODEL_PROVIDER = "nvidia"

model_configs = [
    dd.ModelConfig(
        alias="vision",
        model="nvidia/nemotron-nano-12b-v2-vl",
        provider=MODEL_PROVIDER,
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=0.60,
            top_p=0.95,
            max_tokens=2048,
        ),
    ),
]
# This name is set in the model provider configuration.
MODEL_PROVIDER = "nvidia"

model_configs = [
    dd.ModelConfig(
        alias="vision",
        model="nvidia/nemotron-nano-12b-v2-vl",
        provider=MODEL_PROVIDER,
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=0.60,
            top_p=0.95,
            max_tokens=2048,
        ),
    ),
]

🏗️ Initialize the Data Designer Config Builder¶

The Data Designer config defines the dataset schema and generation process.
The config builder provides an intuitive interface for building this configuration.
The list of model configs is provided to the builder at initialization.

In [4]:

Copied!

config_builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)
config_builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)

🌱 Seed Dataset Creation¶

In this section, we'll prepare our visual documents as a seed dataset for summarization:

Loading Visual Documents: We use the ColPali dataset containing document images
Image Processing: Convert images to base64 format for vision model consumption
Metadata Extraction: Preserve relevant document information (filename, page number, source, etc.)

The seed dataset will be used to generate detailed text summaries of each document image.

In [5]:

Copied!





# Dataset processing configuration
IMG_COUNT = 512  # Number of images to process
BASE64_IMAGE_HEIGHT = 512  # Standardized height for model input

# Load ColPali dataset for visual documents
img_dataset_cfg = {"path": "vidore/colpali_train_set", "split": "train", "streaming": True}
# Dataset processing configuration
IMG_COUNT = 512  # Number of images to process
BASE64_IMAGE_HEIGHT = 512  # Standardized height for model input

# Load ColPali dataset for visual documents
img_dataset_cfg = {"path": "vidore/colpali_train_set", "split": "train", "streaming": True}

In [6]:

Copied!





def resize_image(image, height: int):
    """
    Resize image while maintaining aspect ratio.

    Args:
        image: PIL Image object
        height: Target height in pixels

    Returns:
        Resized PIL Image object
    """
    original_width, original_height = image.size
    width = int(original_width * (height / original_height))
    return image.resize((width, height))


def convert_image_to_chat_format(record, height: int) -> dict:
    """
    Convert PIL image to base64 format for chat template usage.

    Args:
        record: Dataset record containing image and metadata
        height: Target height for image resizing

    Returns:
        Updated record with base64_image and uuid fields
    """
    # Resize image for consistent processing
    image = resize_image(record["image"], height)

    # Convert to base64 string
    img_buffer = io.BytesIO()
    image.save(img_buffer, format="PNG")
    byte_data = img_buffer.getvalue()
    base64_encoded_data = base64.b64encode(byte_data)
    base64_string = base64_encoded_data.decode("utf-8")

    # Return updated record
    return record | {"base64_image": base64_string, "uuid": str(uuid.uuid4())}
def resize_image(image, height: int):
    """
    Resize image while maintaining aspect ratio.

    Args:
        image: PIL Image object
        height: Target height in pixels

    Returns:
        Resized PIL Image object
    """
    original_width, original_height = image.size
    width = int(original_width * (height / original_height))
    return image.resize((width, height))


def convert_image_to_chat_format(record, height: int) -> dict:
    """
    Convert PIL image to base64 format for chat template usage.

    Args:
        record: Dataset record containing image and metadata
        height: Target height for image resizing

    Returns:
        Updated record with base64_image and uuid fields
    """
    # Resize image for consistent processing
    image = resize_image(record["image"], height)

    # Convert to base64 string
    img_buffer = io.BytesIO()
    image.save(img_buffer, format="PNG")
    byte_data = img_buffer.getvalue()
    base64_encoded_data = base64.b64encode(byte_data)
    base64_string = base64_encoded_data.decode("utf-8")

    # Return updated record
    return record | {"base64_image": base64_string, "uuid": str(uuid.uuid4())}

In [7]:

Copied!





# Load and process the visual document dataset
print("📥 Loading and processing document images...")

img_dataset_iter = iter(
    load_dataset(**img_dataset_cfg).map(convert_image_to_chat_format, fn_kwargs={"height": BASE64_IMAGE_HEIGHT})
)
img_dataset = pd.DataFrame([next(img_dataset_iter) for _ in range(IMG_COUNT)])

print(f"✅ Loaded {len(img_dataset)} images with columns: {list(img_dataset.columns)}")
# Load and process the visual document dataset
print("📥 Loading and processing document images...")

img_dataset_iter = iter(
    load_dataset(**img_dataset_cfg).map(convert_image_to_chat_format, fn_kwargs={"height": BASE64_IMAGE_HEIGHT})
)
img_dataset = pd.DataFrame([next(img_dataset_iter) for _ in range(IMG_COUNT)])

print(f"✅ Loaded {len(img_dataset)} images with columns: {list(img_dataset.columns)}")

📥 Loading and processing document images...

✅ Loaded 512 images with columns: ['image', 'image_filename', 'query', 'answer', 'source', 'options', 'page', 'model', 'prompt', 'answer_type', 'base64_image', 'uuid']

In [8]:

Copied!

img_dataset.head()
img_dataset.head()

Out[8]:

	image	image_filename	query	answer	source	options	page	model	prompt	answer_type	base64_image	uuid
0	<PIL.JpegImagePlugin.JpegImageFile image mode=...	images/1810.07757_2.jpg	Comparing panels a, b, c, and d, which stateme...	D	arxiv_qa	['A. The variance of the data decreases from p...		gpt4V		None	iVBORw0KGgoAAAANSUhEUgAAAUAAAAIACAIAAAB8QiIMAA...	7080b790-1a13-4ad1-81dc-6bb2a2273c16
1	<PIL.JpegImagePlugin.JpegImageFile image mode=...	data/scrapped_pdfs_split/pages_extracted/energ...	What is the duration of the course mentioned i...	['five to ten hours, not including field trips']	pdf	None	9	sonnet	\n You are an assistant specialized in ...	None	iVBORw0KGgoAAAANSUhEUgAAAYsAAAIACAIAAAD8HddaAA...	7b2da4b0-9f36-465c-9711-d4b43f793968
2	<PIL.JpegImagePlugin.JpegImageFile image mode=...	data/scrapped_pdfs_split/pages_extracted/energ...	What is the primary purpose of the PTC in lith...	['protect against external short circuits']	pdf	None	414	sonnet	\n You are an assistant specialized in ...	None	iVBORw0KGgoAAAANSUhEUgAAAZgAAAIACAIAAAAwhO2xAA...	c4ba0a68-07cc-465e-9f74-57e88545c18b
3	<PIL.PngImagePlugin.PngImageFile image mode=L ...	0fd47b51ae9248ef36669b8619b1223f268edae3e7a44a...	What is the date?\nYour answer should be very ...	OCTOBER 17, 1995.	docvqa	None	None	None	None	None	iVBORw0KGgoAAAANSUhEUgAAAX0AAAIACAAAAABLRuMPAA...	05d54081-4037-4afc-acbf-4430de5fd07f
4	<PIL.PngImagePlugin.PngImageFile image mode=L ...	b335cfb9d442f8925ea41a064cb445a5395577f2345d52...	What is Bert Shulimson's title?\nYour response...	EXECUTIVE SECRETARY.	docvqa	None	None	None	None	None	iVBORw0KGgoAAAANSUhEUgAAAY8AAAIACAAAAABf/7+rAA...	6ebb260b-7256-4bb9-84b2-3b8256c26ca3

In [9]:

Copied!

# Add the seed dataset containing our processed images
df_seed = pd.DataFrame(img_dataset)[["uuid", "image_filename", "base64_image", "page", "options", "source"]]
config_builder.with_seed_dataset(dd.DataFrameSeedSource(df=df_seed))
# Add the seed dataset containing our processed images
df_seed = pd.DataFrame(img_dataset)[["uuid", "image_filename", "base64_image", "page", "options", "source"]]
config_builder.with_seed_dataset(dd.DataFrameSeedSource(df=df_seed))

Out[9]:

DataDesignerConfigBuilder()

In [10]:

Copied!





# Add a column to generate detailed document summaries
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="summary",
        model_alias="vision",
        prompt=(
            "Provide a detailed summary of the content in this image in Markdown format. "
            "Start from the top of the image and then describe it from top to bottom. "
            "Place a summary at the bottom."
        ),
        multi_modal_context=[dd.ImageContext(column_name="base64_image")],
    )
)

data_designer.validate(config_builder)
# Add a column to generate detailed document summaries
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="summary",
        model_alias="vision",
        prompt=(
            "Provide a detailed summary of the content in this image in Markdown format. "
            "Start from the top of the image and then describe it from top to bottom. "
            "Place a summary at the bottom."
        ),
        multi_modal_context=[dd.ImageContext(column_name="base64_image")],
    )
)

data_designer.validate(config_builder)

[12:09:15] [INFO] ✅ Validation passed

🔁 Iteration is key – preview the dataset!¶

Use the preview method to generate a sample of records quickly.
Inspect the results for quality and format issues.
Adjust column configurations, prompts, or parameters as needed.
Re-run the preview until satisfied.

In [11]:

Copied!

preview = data_designer.preview(config_builder, num_records=2)
preview = data_designer.preview(config_builder, num_records=2)

[12:09:15] [INFO] 👁️ Preview generation in progress

[12:09:15] [INFO] ✅ Validation passed

[12:09:15] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph

[12:09:15] [INFO] 🩺 Running health checks for models...

[12:09:15] [INFO]   |-- 👀 Checking 'nvidia/nemotron-nano-12b-v2-vl' in provider named 'nvidia' for model alias 'vision'...

[12:09:16] [INFO]   |-- ✅ Passed!

[12:09:16] [INFO] 🌱 Sampling 2 records from seed dataset

[12:09:16] [INFO]   |-- seed dataset size: 512 records

[12:09:16] [INFO]   |-- sampling strategy: ordered

[12:09:16] [INFO] 📝 llm-text model config for column 'summary'

[12:09:16] [INFO]   |-- model: 'nvidia/nemotron-nano-12b-v2-vl'

[12:09:16] [INFO]   |-- model alias: 'vision'

[12:09:16] [INFO]   |-- model provider: 'nvidia'

[12:09:16] [INFO]   |-- inference parameters:

[12:09:16] [INFO]   |  |-- generation_type=chat-completion

[12:09:16] [INFO]   |  |-- max_parallel_requests=4

[12:09:16] [INFO]   |  |-- temperature=0.60

[12:09:16] [INFO]   |  |-- top_p=0.95

[12:09:16] [INFO]   |  |-- max_tokens=2048

[12:09:16] [INFO] ⚡️ Processing llm-text column 'summary' with 4 concurrent workers

[12:09:16] [INFO] ⏱️ llm-text column 'summary' will report progress after each record

[12:09:20] [INFO]   |-- 😐 llm-text column 'summary' progress: 1/2 (50%) complete, 1 ok, 0 failed, 0.22 rec/s, eta 4.5s

[12:09:21] [INFO]   |-- 🤩 llm-text column 'summary' progress: 2/2 (100%) complete, 2 ok, 0 failed, 0.41 rec/s, eta 0.0s

[12:09:21] [INFO] 📊 Model usage summary:

[12:09:21] [INFO]   |-- model: nvidia/nemotron-nano-12b-v2-vl

[12:09:21] [INFO]   |-- tokens: input=5232, output=582, total=5814, tps=1119

[12:09:21] [INFO]   |-- requests: success=2, failed=0, total=2, rpm=23

[12:09:21] [INFO] 📐 Measuring dataset column statistics:

[12:09:21] [INFO]   |-- 📝 column: 'summary'

[12:09:21] [INFO] ☀️ Preview complete!

In [12]:

Copied!

# Run this cell multiple times to cycle through the 2 preview records.
preview.display_sample_record()
# Run this cell multiple times to cycle through the 2 preview records.
preview.display_sample_record()

                                                                                                              
                                                 Seed Columns                                                 
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name           ┃ Value                                                                                     ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ uuid           │ 7080b790-1a13-4ad1-81dc-6bb2a2273c16                                                      │
├────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┤
│ image_filename │ images/1810.07757_2.jpg                                                                   │
├────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┤
│ base64_image   │ iVBORw0KGgoAAAANSUhEUgAAAUAAAAIACAIAAAB8QiIMAAEAAElEQVR4nOy9edRt2VUX+vvNufY+53zNbauvVJdK… │
├────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┤
│ page           │                                                                                           │
├────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┤
│ options        │ ['A. The variance of the data decreases from panel a to panel d.', 'B. The variance of    │
│                │ the data increases from panel a to panel d.', 'C. The data presents no variance in any of │
│                │ the panels.', 'D. The variance of the data is inconsistent across the panels.', '-']      │
├────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┤
│ source         │ arxiv_qa                                                                                  │
└────────────────┴───────────────────────────────────────────────────────────────────────────────────────────┘
                                                                                                              
                                                                                                              
                                              Generated Columns                                               
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name    ┃ Value                                                                                            ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ summary │ In this figure, we present the results of the reconstruction process for the 3D array. We show   │
│         │ both the reconstructed trajectories and the corresponding 3D density map. Each row corresponds   │
│         │ to a different particle in the array, and each column shows a different stage of the             │
│         │ reconstruction process. The first column (a, b, c, d) shows the reconstructed trajectories,      │
│         │ while the second column (e, f, g, h) shows the corresponding 3D density map. The color scale     │
│         │ represents the density of particles in each bin, with blue indicating low density and red        │
│         │ indicating high density.                                                                         │
│         │ The figure demonstrates that the reconstruction process can accurately recover the trajectories  │
│         │ and density map of the particles in the array. The reconstructed trajectories closely match the  │
│         │ true trajectories, and the 3D density map provides a detailed visualization of the particle      │
│         │ distribution in the array.                                                                       │
│         │                                                                                                  │
│         │ This figure shows the results of the reconstruction process for a 3D array of particles. Each    │
│         │ row corresponds to a different particle in the array, and each column shows a different stage of │
│         │ the reconstruction process. The first column (a, b, c, d) shows the reconstructed trajectories,  │
│         │ while the second column (e, f, g, h) shows the corresponding 3D density map. The color scale     │
│         │ represents the density of particles in each bin, with blue indicating low density and red        │
│         │ indicating high density. The figure demonstrates that the reconstruction process can accurately  │
│         │ recover the trajectories and density map of the particles in the array.                          │
└─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────┘
                                                                                                              
                                                  [index: 0]

In [13]:

Copied!

# The preview dataset is available as a pandas DataFrame.
preview.dataset
# The preview dataset is available as a pandas DataFrame.
preview.dataset

Out[13]:

	uuid	image_filename	base64_image	page	options	source	summary
0	7080b790-1a13-4ad1-81dc-6bb2a2273c16	images/1810.07757_2.jpg	iVBORw0KGgoAAAANSUhEUgAAAUAAAAIACAIAAAB8QiIMAA...		['A. The variance of the data decreases from p...	arxiv_qa	In this figure, we present the results of the ...
1	7b2da4b0-9f36-465c-9711-d4b43f793968	data/scrapped_pdfs_split/pages_extracted/energ...	iVBORw0KGgoAAAANSUhEUgAAAYsAAAIACAIAAAD8HddaAA...	9	None	pdf	### How to Use These Materials\n\nThe enclosed...

📊 Analyze the generated data¶

Data Designer automatically generates a basic statistical analysis of the generated data.
This analysis is available via the analysis property of generation result objects.

In [14]:

Copied!

# Print the analysis as a table.
preview.analysis.to_report()
# Print the analysis as a table.
preview.analysis.to_report()

──────────────────────────────────────── 🎨 Data Designer Dataset Profile ─────────────────────────────────────────

                                                                                                                   
                                                 Dataset Overview                                                  
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ number of records               ┃ number of columns               ┃ percent complete records                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 2                               │ 1                               │ 100.0%                                      │
└─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                📝 LLM-Text Columns                                                
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                  ┃               ┃                              ┃       prompt tokens ┃       completion tokens ┃
┃ column name      ┃     data type ┃         number unique values ┃          per record ┃              per record ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ summary          │        string │                   2 (100.0%) │        38.0 +/- 0.0 │          286.5 +/- 10.6 │
└──────────────────┴───────────────┴──────────────────────────────┴─────────────────────┴─────────────────────────┘
                                                                                                                   
                                                                                                                   
╭────────────────────────────────────────────────── Table Notes ──────────────────────────────────────────────────╮
│                                                                                                                 │
│  1. All token statistics are based on a sample of max(1000, len(dataset)) records.                              │
│  2. Tokens are calculated using tiktoken's cl100k_base tokenizer.                                               │
│                                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                                                                                                                   
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

🔎 Visual Inspection¶

Let's compare the original document image with the generated summary to validate quality:

In [15]:

Copied!





# Compare original document with generated summary
index = 0  # Change this to view different examples

# Merge preview data with original images for comparison
comparison_dataset = preview.dataset.merge(pd.DataFrame(img_dataset)[["uuid", "image"]], how="left", on="uuid")

# Extract the record for display
record = comparison_dataset.iloc[index]

print("📄 Original Document Image:")
display(resize_image(record.image, BASE64_IMAGE_HEIGHT))

print("\n📝 Generated Summary:")
rich.print(Panel(record.summary, title="Document Summary", title_align="left"))
# Compare original document with generated summary
index = 0  # Change this to view different examples

# Merge preview data with original images for comparison
comparison_dataset = preview.dataset.merge(pd.DataFrame(img_dataset)[["uuid", "image"]], how="left", on="uuid")

# Extract the record for display
record = comparison_dataset.iloc[index]

print("📄 Original Document Image:")
display(resize_image(record.image, BASE64_IMAGE_HEIGHT))

print("\n📝 Generated Summary:")
rich.print(Panel(record.summary, title="Document Summary", title_align="left"))

📄 Original Document Image:

No description has been provided for this image

📝 Generated Summary:

╭─ Document Summary ──────────────────────────────────────────────────────────────────────────────────────────────╮
│ In this figure, we present the results of the reconstruction process for the 3D array. We show both the         │
│ reconstructed trajectories and the corresponding 3D density map. Each row corresponds to a different particle   │
│ in the array, and each column shows a different stage of the reconstruction process. The first column (a, b, c, │
│ d) shows the reconstructed trajectories, while the second column (e, f, g, h) shows the corresponding 3D        │
│ density map. The color scale represents the density of particles in each bin, with blue indicating low density  │
│ and red indicating high density.                                                                                │
│ The figure demonstrates that the reconstruction process can accurately recover the trajectories and density map │
│ of the particles in the array. The reconstructed trajectories closely match the true trajectories, and the 3D   │
│ density map provides a detailed visualization of the particle distribution in the array.                        │
│                                                                                                                 │
│ This figure shows the results of the reconstruction process for a 3D array of particles. Each row corresponds   │
│ to a different particle in the array, and each column shows a different stage of the reconstruction process.    │
│ The first column (a, b, c, d) shows the reconstructed trajectories, while the second column (e, f, g, h) shows  │
│ the corresponding 3D density map. The color scale represents the density of particles in each bin, with blue    │
│ indicating low density and red indicating high density. The figure demonstrates that the reconstruction process │
│ can accurately recover the trajectories and density map of the particles in the array.                          │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

🆙 Scale up!¶

Happy with your preview data?
Use the create method to submit larger Data Designer generation jobs.

In [16]:

Copied!

results = data_designer.create(config_builder, num_records=10, dataset_name="tutorial-4")
results = data_designer.create(config_builder, num_records=10, dataset_name="tutorial-4")

[12:09:21] [INFO] 🎨 Creating Data Designer dataset

[12:09:21] [INFO] ✅ Validation passed

[12:09:21] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph

[12:09:21] [INFO] 🩺 Running health checks for models...

[12:09:21] [INFO]   |-- 👀 Checking 'nvidia/nemotron-nano-12b-v2-vl' in provider named 'nvidia' for model alias 'vision'...

[12:09:22] [INFO]   |-- ✅ Passed!

[12:09:22] [INFO] ⏳ Processing batch 1 of 1

[12:09:22] [INFO] 🌱 Sampling 10 records from seed dataset

[12:09:22] [INFO]   |-- seed dataset size: 512 records

[12:09:22] [INFO]   |-- sampling strategy: ordered

[12:09:22] [INFO] 📝 llm-text model config for column 'summary'

[12:09:22] [INFO]   |-- model: 'nvidia/nemotron-nano-12b-v2-vl'

[12:09:22] [INFO]   |-- model alias: 'vision'

[12:09:22] [INFO]   |-- model provider: 'nvidia'

[12:09:22] [INFO]   |-- inference parameters:

[12:09:22] [INFO]   |  |-- generation_type=chat-completion

[12:09:22] [INFO]   |  |-- max_parallel_requests=4

[12:09:22] [INFO]   |  |-- temperature=0.60

[12:09:22] [INFO]   |  |-- top_p=0.95

[12:09:22] [INFO]   |  |-- max_tokens=2048

[12:09:22] [INFO] ⚡️ Processing llm-text column 'summary' with 4 concurrent workers

[12:09:22] [INFO] ⏱️ llm-text column 'summary' will report progress after each record

[12:09:24] [INFO]   |-- 🥚 llm-text column 'summary' progress: 1/10 (10%) complete, 1 ok, 0 failed, 0.45 rec/s, eta 19.9s

[12:09:25] [INFO]   |-- 🥚 llm-text column 'summary' progress: 2/10 (20%) complete, 2 ok, 0 failed, 0.69 rec/s, eta 11.6s

[12:09:27] [INFO]   |-- 🐣 llm-text column 'summary' progress: 3/10 (30%) complete, 3 ok, 0 failed, 0.62 rec/s, eta 11.3s

[12:09:27] [INFO]   |-- 🐣 llm-text column 'summary' progress: 4/10 (40%) complete, 4 ok, 0 failed, 0.70 rec/s, eta 8.5s

[12:09:28] [INFO]   |-- 🐥 llm-text column 'summary' progress: 5/10 (50%) complete, 5 ok, 0 failed, 0.83 rec/s, eta 6.0s

[12:09:29] [INFO]   |-- 🐥 llm-text column 'summary' progress: 6/10 (60%) complete, 6 ok, 0 failed, 0.87 rec/s, eta 4.6s

[12:09:31] [INFO]   |-- 🐥 llm-text column 'summary' progress: 7/10 (70%) complete, 7 ok, 0 failed, 0.77 rec/s, eta 3.9s

[12:09:32] [INFO]   |-- 🐤 llm-text column 'summary' progress: 8/10 (80%) complete, 8 ok, 0 failed, 0.82 rec/s, eta 2.4s

[12:09:33] [INFO]   |-- 🐤 llm-text column 'summary' progress: 9/10 (90%) complete, 9 ok, 0 failed, 0.77 rec/s, eta 1.3s

[12:09:34] [INFO]   |-- 🐔 llm-text column 'summary' progress: 10/10 (100%) complete, 10 ok, 0 failed, 0.83 rec/s, eta 0.0s

[12:09:34] [INFO] 📊 Model usage summary:

[12:09:34] [INFO]   |-- model: nvidia/nemotron-nano-12b-v2-vl

[12:09:34] [INFO]   |-- tokens: input=30768, output=3743, total=34511, tps=2794

[12:09:34] [INFO]   |-- requests: success=10, failed=0, total=10, rpm=48

[12:09:34] [INFO] 📐 Measuring dataset column statistics:

[12:09:34] [INFO]   |-- 📝 column: 'summary'

In [17]:

Copied!

# Load the generated dataset as a pandas DataFrame.
dataset = results.load_dataset()

dataset.head()
# Load the generated dataset as a pandas DataFrame.
dataset = results.load_dataset()

dataset.head()

Out[17]:

	uuid	image_filename	base64_image	page	options	source	summary
0	7080b790-1a13-4ad1-81dc-6bb2a2273c16	images/1810.07757_2.jpg	iVBORw0KGgoAAAANSUhEUgAAAUAAAAIACAIAAAB8QiIMAA...		['A. The variance of the data decreases from p...	arxiv_qa	a) b) c) d) e) f) g) h) T Colorbar ...
1	7b2da4b0-9f36-465c-9711-d4b43f793968	data/scrapped_pdfs_split/pages_extracted/energ...	iVBORw0KGgoAAAANSUhEUgAAAYsAAAIACAIAAAD8HddaAA...	9	<NA>	pdf	This image is a typewritten document providing...
2	c4ba0a68-07cc-465e-9f74-57e88545c18b	data/scrapped_pdfs_split/pages_extracted/energ...	iVBORw0KGgoAAAANSUhEUgAAAZgAAAIACAIAAAAwhO2xAA...	414	<NA>	pdf	## LITHIUM BATTERIES 14.87 The primary purpos...
3	05d54081-4037-4afc-acbf-4430de5fd07f	0fd47b51ae9248ef36669b8619b1223f268edae3e7a44a...	iVBORw0KGgoAAAANSUhEUgAAAX0AAAIACAAAAABLRuMPAA...	<NA>	<NA>	docvqa	This image shows a document titled "CONTINUOUS...
4	6ebb260b-7256-4bb9-84b2-3b8256c26ca3	b335cfb9d442f8925ea41a064cb445a5395577f2345d52...	iVBORw0KGgoAAAANSUhEUgAAAY8AAAIACAAAAABf/7+rAA...	<NA>	<NA>	docvqa	This image is a letter from the Missouri Assoc...

In [18]:

Copied!

# Load the analysis results into memory.
analysis = results.load_analysis()

analysis.to_report()
# Load the analysis results into memory.
analysis = results.load_analysis()

analysis.to_report()

──────────────────────────────────────── 🎨 Data Designer Dataset Profile ─────────────────────────────────────────

                                                                                                                   
                                                 Dataset Overview                                                  
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ number of records               ┃ number of columns               ┃ percent complete records                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 10                              │ 1                               │ 100.0%                                      │
└─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                📝 LLM-Text Columns                                                
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                  ┃               ┃                              ┃       prompt tokens ┃       completion tokens ┃
┃ column name      ┃     data type ┃         number unique values ┃          per record ┃              per record ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ summary          │        string │                  10 (100.0%) │        38.0 +/- 0.0 │         352.5 +/- 214.6 │
└──────────────────┴───────────────┴──────────────────────────────┴─────────────────────┴─────────────────────────┘
                                                                                                                   
                                                                                                                   
╭────────────────────────────────────────────────── Table Notes ──────────────────────────────────────────────────╮
│                                                                                                                 │
│  1. All token statistics are based on a sample of max(1000, len(dataset)) records.                              │
│  2. Tokens are calculated using tiktoken's cl100k_base tokenizer.                                               │
│                                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                                                                                                                   
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

⏭️ Next Steps¶

Now that you've learned how to use visual context for image summarization in Data Designer, explore more:

Experiment with different vision models for specific document types
Try different prompt variations to generate specialized descriptions (e.g., technical details, key findings)
Combine vision-based summaries with other column types for multi-modal workflows
Apply this pattern to other vision tasks like image captioning, OCR validation, or visual question answering
Generating images with Data Designer