๐จ Data Designer Tutorial: Providing Images as Context for Vision-Based Data Generationยถ
๐ What you'll learnยถ
This notebook demonstrates how to provide images as context to generate text descriptions using vision-language models.
- โจ Visual Document Processing: Converting images to chat-ready format for model consumption
- ๐ Vision-Language Generation: Using vision models to generate detailed summaries from images
If this is your first time using Data Designer, we recommend starting with the first notebook in this tutorial series.
โฌ๏ธ Install dependencies (if required)ยถ
!uv pip install pillow
Using Python 3.11.14 environment at: /home/runner/work/DataDesigner/DataDesigner/.venv โ Resolving dependencies... โ Resolving dependencies... โ Resolving dependencies... โ Resolving dependencies...
โ pillow==12.0.0
โ Resolved 1 package in 104ms โ Preparing packages... (0/0) โ Preparing packages... (0/1)
โ Preparing packages... (0/1) โ Preparing packages... (0/1) pillow ------------------------------ 0 B/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 14.91 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 30.91 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 46.91 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 62.91 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 78.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 94.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 110.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 126.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 142.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 158.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 174.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 190.71 KiB/6.71 MiB
โ Preparing packages... (0/1) pillow ------------------------------ 206.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 222.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 238.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 254.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 270.71 KiB/6.71 MiB
โ Preparing packages... (0/1) pillow ------------------------------ 966.71 KiB/6.71 MiB โ Preparing packages... (0/1) pillow ------------------------------ 3.39 MiB/6.71 MiB
โ Preparing packages... (0/1) Prepared 1 package in 193ms โโโโโโโโโโโโโโโโโโโโ [0/0] Installing wheels... โโโโโโโโโโโโโโโโโโโโ [0/1] Installing wheels... โโโโโโโโโโโโโโโโโโโโ [0/1] pillow==12.0.0 โโโโโโโโโโโโโโโโโโโโ [1/1] pillow==12.0.0 Installed 1 package in 3ms + pillow==12.0.0
๐ฆ Import the essentialsยถ
- The
essentialsmodule provides quick access to the most commonly used objects.
# Standard library imports
import base64
import io
import uuid
# Third-party imports
import pandas as pd
import rich
from datasets import load_dataset
from IPython.display import display
from rich.panel import Panel
# Data Designer imports
from data_designer.essentials import (
DataDesigner,
DataDesignerConfigBuilder,
ImageContext,
ImageFormat,
InferenceParameters,
LLMTextColumnConfig,
ModalityDataType,
ModelConfig,
)
โ๏ธ Initialize the Data Designer interfaceยถ
DataDesigneris the main object is responsible for managing the data generation process.When initialized without arguments, the default model providers are used.
data_designer = DataDesigner()
๐๏ธ Define model configurationsยถ
Each
ModelConfigdefines a model that can be used during the generation process.The "model alias" is used to reference the model in the Data Designer config (as we will see below).
The "model provider" is the external service that hosts the model (see the model config docs for more details).
By default, we use build.nvidia.com as the model provider.
# This name is set in the model provider configuration.
MODEL_PROVIDER = "nvidia"
model_configs = [
ModelConfig(
alias="vision",
model="meta/llama-4-scout-17b-16e-instruct",
provider=MODEL_PROVIDER,
inference_parameters=InferenceParameters(
temperature=0.60,
top_p=0.95,
max_tokens=2048,
),
),
]
๐๏ธ Initialize the Data Designer Config Builderยถ
The Data Designer config defines the dataset schema and generation process.
The config builder provides an intuitive interface for building this configuration.
The list of model configs is provided to the builder at initialization.
config_builder = DataDesignerConfigBuilder(model_configs=model_configs)
๐ฑ Seed Dataset Creationยถ
In this section, we'll prepare our visual documents as a seed dataset for summarization:
- Loading Visual Documents: We use the ColPali dataset containing document images
- Image Processing: Convert images to base64 format for vision model consumption
- Metadata Extraction: Preserve relevant document information (filename, page number, source, etc.)
The seed dataset will be used to generate detailed text summaries of each document image.
# Dataset processing configuration
IMG_COUNT = 512 # Number of images to process
BASE64_IMAGE_HEIGHT = 512 # Standardized height for model input
# Load ColPali dataset for visual documents
img_dataset_cfg = {"path": "vidore/colpali_train_set", "split": "train", "streaming": True}
def resize_image(image, height: int):
"""
Resize image while maintaining aspect ratio.
Args:
image: PIL Image object
height: Target height in pixels
Returns:
Resized PIL Image object
"""
original_width, original_height = image.size
width = int(original_width * (height / original_height))
return image.resize((width, height))
def convert_image_to_chat_format(record, height: int) -> dict:
"""
Convert PIL image to base64 format for chat template usage.
Args:
record: Dataset record containing image and metadata
height: Target height for image resizing
Returns:
Updated record with base64_image and uuid fields
"""
# Resize image for consistent processing
image = resize_image(record["image"], height)
# Convert to base64 string
img_buffer = io.BytesIO()
image.save(img_buffer, format="PNG")
byte_data = img_buffer.getvalue()
base64_encoded_data = base64.b64encode(byte_data)
base64_string = base64_encoded_data.decode("utf-8")
# Return updated record
return record | {"base64_image": base64_string, "uuid": str(uuid.uuid4())}
# Load and process the visual document dataset
print("๐ฅ Loading and processing document images...")
img_dataset_iter = iter(
load_dataset(**img_dataset_cfg).map(convert_image_to_chat_format, fn_kwargs={"height": BASE64_IMAGE_HEIGHT})
)
img_dataset = pd.DataFrame([next(img_dataset_iter) for _ in range(IMG_COUNT)])
print(f"โ
Loaded {len(img_dataset)} images with columns: {list(img_dataset.columns)}")
๐ฅ Loading and processing document images...
โ Loaded 512 images with columns: ['image', 'image_filename', 'query', 'answer', 'source', 'options', 'page', 'model', 'prompt', 'answer_type', 'base64_image', 'uuid']
img_dataset.head()
| image | image_filename | query | answer | source | options | page | model | prompt | answer_type | base64_image | uuid | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | <PIL.JpegImagePlugin.JpegImageFile image mode=... | images/1810.07757_2.jpg | Comparing panels a, b, c, and d, which stateme... | D | arxiv_qa | ['A. The variance of the data decreases from p... | gpt4V | None | iVBORw0KGgoAAAANSUhEUgAAAUAAAAIACAIAAAB8QiIMAA... | 8a9f2f31-c0d7-4d23-a5ad-6e89e2aab0ac | ||
| 1 | <PIL.JpegImagePlugin.JpegImageFile image mode=... | data/scrapped_pdfs_split/pages_extracted/energ... | What is the duration of the course mentioned i... | ['five to ten hours, not including field trips'] | None | 9 | sonnet | \n You are an assistant specialized in ... | None | iVBORw0KGgoAAAANSUhEUgAAAYsAAAIACAIAAAD8HddaAA... | 79c38132-aa11-4383-9ec6-fe497ca7d423 | |
| 2 | <PIL.JpegImagePlugin.JpegImageFile image mode=... | data/scrapped_pdfs_split/pages_extracted/energ... | What is the primary purpose of the PTC in lith... | ['protect against external short circuits'] | None | 414 | sonnet | \n You are an assistant specialized in ... | None | iVBORw0KGgoAAAANSUhEUgAAAZgAAAIACAIAAAAwhO2xAA... | 24e70b94-615d-4683-b2c3-e9cb7a19fb41 | |
| 3 | <PIL.PngImagePlugin.PngImageFile image mode=L ... | 0fd47b51ae9248ef36669b8619b1223f268edae3e7a44a... | What is the date?\nYour answer should be very ... | OCTOBER 17, 1995. | docvqa | None | None | None | None | None | iVBORw0KGgoAAAANSUhEUgAAAX0AAAIACAAAAABLRuMPAA... | f60aa282-3187-49b4-b140-3a07a7cd0a16 |
| 4 | <PIL.PngImagePlugin.PngImageFile image mode=L ... | b335cfb9d442f8925ea41a064cb445a5395577f2345d52... | What is Bert Shulimson's title?\nYour response... | EXECUTIVE SECRETARY. | docvqa | None | None | None | None | None | iVBORw0KGgoAAAANSUhEUgAAAY8AAAIACAAAAABf/7+rAA... | fb884f8a-ebd7-42a4-a006-c549ddddb580 |
# Add the seed dataset containing our processed images
df_seed = pd.DataFrame(img_dataset)[["uuid", "image_filename", "base64_image", "page", "options", "source"]]
config_builder.with_seed_dataset(
DataDesigner.make_seed_reference_from_dataframe(df_seed, file_path="colpali_train_set.csv")
)
[02:33:37] [INFO] ๐พ Saving seed dataset to colpali_train_set.csv
DataDesignerConfigBuilder( seed_dataset: 'colpali_train_set.csv' seed_dataset_columns: [ "uuid", "image_filename", "base64_image", "page", "options", "source" ] )
# Add a column to generate detailed document summaries
config_builder.add_column(
LLMTextColumnConfig(
name="summary",
model_alias="vision",
prompt=(
"Provide a detailed summary of the content in this image in Markdown format. "
"Start from the top of the image and then describe it from top to bottom. "
"Place a summary at the bottom."
),
multi_modal_context=[
ImageContext(
column_name="base64_image",
data_type=ModalityDataType.BASE64,
image_format=ImageFormat.PNG,
)
],
)
)
DataDesignerConfigBuilder( seed_dataset: 'colpali_train_set.csv' seed_dataset_columns: [ "uuid", "image_filename", "base64_image", "page", "options", "source" ] llm_text_columns: ['summary'] )
๐ Iteration is key โ preview the dataset!ยถ
Use the
previewmethod to generate a sample of records quickly.Inspect the results for quality and format issues.
Adjust column configurations, prompts, or parameters as needed.
Re-run the preview until satisfied.
preview = data_designer.preview(config_builder, num_records=2)
[02:33:38] [INFO] ๐ธ Preview generation in progress
[02:33:38] [INFO] โ Validation passed
[02:33:38] [INFO] โ๏ธ Sorting column configs into a Directed Acyclic Graph
[02:33:38] [INFO] ๐ฉบ Running health checks for models...
[02:33:38] [INFO] |-- ๐ Checking 'meta/llama-4-scout-17b-16e-instruct' in provider named 'nvidia' for model alias 'vision'...
[02:33:42] [INFO] |-- โ Passed!
[02:33:46] [INFO] ๐ฑ Sampling 2 records from seed dataset
[02:33:46] [INFO] |-- seed dataset size: 512 records
[02:33:46] [INFO] |-- sampling strategy: ordered
[02:33:46] [INFO] ๐ Preparing llm-text column generation
[02:33:46] [INFO] |-- column name: 'summary'
[02:33:46] [INFO] |-- model config:
{
"alias": "vision",
"model": "meta/llama-4-scout-17b-16e-instruct",
"inference_parameters": {
"temperature": 0.6,
"top_p": 0.95,
"max_tokens": 2048,
"max_parallel_requests": 4,
"timeout": null,
"extra_body": null
},
"provider": "nvidia"
}
[02:33:46] [INFO] ๐ Processing llm-text column 'summary' with 4 concurrent workers
[02:34:07] [INFO] ๐ Model usage summary:
{
"meta/llama-4-scout-17b-16e-instruct": {
"token_usage": {
"prompt_tokens": 1396,
"completion_tokens": 887,
"total_tokens": 2283
},
"request_usage": {
"successful_requests": 2,
"failed_requests": 0,
"total_requests": 2
},
"tokens_per_second": 96,
"requests_per_minute": 5
}
}
[02:34:07] [INFO] ๐ Measuring dataset column statistics:
[02:34:07] [INFO] |-- ๐ฑ column: 'uuid'
[02:34:07] [INFO] |-- ๐ฑ column: 'image_filename'
[02:34:07] [INFO] |-- ๐ฑ column: 'base64_image'
[02:34:07] [INFO] |-- ๐ฑ column: 'page'
[02:34:07] [INFO] |-- ๐ฑ column: 'options'
[02:34:07] [INFO] |-- ๐ฑ column: 'source'
[02:34:07] [INFO] |-- ๐ column: 'summary'
[02:34:07] [INFO] ๐ Preview complete!
# Run this cell multiple times to cycle through the 2 preview records.
preview.display_sample_record()
Seed Columns โโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ Name โ Value โ โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ โ uuid โ 8a9f2f31-c0d7-4d23-a5ad-6e89e2aab0ac โ โโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ image_filename โ images/1810.07757_2.jpg โ โโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ base64_image โ iVBORw0KGgoAAAANSUhEUgAAAUAAAAIACAIAAAB8QiIMAAEAAElEQVR4nOy9edRt2VUX+vvNufY+53zNbauvVJdKSEUQQโฆ โ โโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ page โ nan โ โโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ options โ ['A. The variance of the data decreases from panel a to panel d.', 'B. The variance of the โ โ โ data increases from panel a to panel d.', 'C. The data presents no variance in any of the โ โ โ panels.', 'D. The variance of the data is inconsistent across the panels.', '-'] โ โโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ source โ arxiv_qa โ โโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Generated Columns โโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ Name โ Value โ โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ โ summary โ ## Image Summary โ โ โ โ โ โ The image presents a collection of eight heatmap graphs, arranged in two columns and four rows, โ โ โ labeled from **a)** to **h)**. Each graph displays a similar structure, with the x-axis representing โ โ โ time in microseconds (**t(ฮผs)**) and the y-axis representing a change in frequency in megahertz ( โ โ โ **ฮf(MHz)** ). โ โ โ โ โ โ ### Top to Bottom Description โ โ โ โ โ โ 1. **Color Bar**: โ โ โ - At the top of the image, a color bar is provided, indicating the correlation coefficient (**ฯ**) โ โ โ values ranging from 0.2 to 1. โ โ โ - The color bar transitions from blue (for lower values) to yellow (for higher values). โ โ โ โ โ โ 2. **Graphs a) and e)**: โ โ โ - Graph **a)** shows a predominantly blue area, indicating a strong negative correlation (or a โ โ โ specific pattern) at certain times and frequency changes. โ โ โ - Graph **e)** displays a similar but less uniform pattern, with a notable dark blue area that โ โ โ suggests a significant correlation. โ โ โ โ โ โ 3. **Graphs b) and f)**: โ โ โ - Graph **b)** features a scattered distribution of colors, suggesting variability in the โ โ โ correlation across different times and frequency changes. โ โ โ - Graph **f)** presents a smoother transition of colors, with a clear trend of increasing โ โ โ correlation over time. โ โ โ โ โ โ 4. **Graphs c) and g)**: โ โ โ - Graph **c)** exhibits a highly variable and scattered pattern, similar to graph **b)**, but with โ โ โ a greater range of frequency changes. โ โ โ - Graph **g)** shows a more organized pattern, with a clear diagonal trend indicating how โ โ โ correlation changes over time. โ โ โ โ โ โ 5. **Graphs d) and h)**: โ โ โ - Graph **d)** displays a scattered pattern with a wide range of frequency changes, similar to โ โ โ graphs **b)** and **c)**. โ โ โ - Graph **h)** presents a smoother pattern, with a gradual change in correlation over time. โ โ โ โ โ โ ### Summary โ โ โ โ โ โ The image provides a visual comparison of eight different scenarios or conditions, labeled **a)** โ โ โ through **h)**, in terms of their correlation over time and frequency change. The heatmaps allow for โ โ โ the quick identification of patterns, trends, and variability across these conditions, with the color โ โ โ bar at the top serving as a reference for interpreting the correlation coefficient values. The graphs โ โ โ suggest that some conditions exhibit strong and organized patterns of correlation (e.g., **a)** and โ โ โ **e)**), while others show more variability and randomness (e.g., **b)**, **c)**, and **d)**). โ โ โ Overall, the image facilitates a detailed analysis of how different factors influence correlation โ โ โ over time and across different frequency changes. โ โโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ [index: 0]
# The preview dataset is available as a pandas DataFrame.
preview.dataset
| uuid | image_filename | base64_image | page | options | source | summary | |
|---|---|---|---|---|---|---|---|
| 0 | 8a9f2f31-c0d7-4d23-a5ad-6e89e2aab0ac | images/1810.07757_2.jpg | iVBORw0KGgoAAAANSUhEUgAAAUAAAAIACAIAAAB8QiIMAA... | NaN | ['A. The variance of the data decreases from p... | arxiv_qa | ## Image Summary\n\nThe image presents a colle... |
| 1 | 79c38132-aa11-4383-9ec6-fe497ca7d423 | data/scrapped_pdfs_split/pages_extracted/energ... | iVBORw0KGgoAAAANSUhEUgAAAYsAAAIACAIAAAD8HddaAA... | 9.0 | None | ## **How to Use These Materials**\nThe enclose... |
๐ Analyze the generated dataยถ
Data Designer automatically generates a basic statistical analysis of the generated data.
This analysis is available via the
analysisproperty of generation result objects.
# Print the analysis as a table.
preview.analysis.to_report()
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ ๐จ Data Designer Dataset Profile โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Dataset Overview โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ number of records โ number of columns โ percent complete records โ โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ โ 2 โ 7 โ 100.0% โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ ๐ฑ Seed-Dataset Columns โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ column name โ data type โ number unique values โ โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ โ uuid โ string โ 2 (100.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ image_filename โ string โ 2 (100.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ base64_image โ string โ 2 (100.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ page โ float โ 1 (50.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ options โ string โ 2 (100.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ source โ string โ 2 (100.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ ๐ LLM-Text Columns โโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ โ โ prompt tokens โ completion tokens โ โ column name โ data type โ number unique values โ per record โ per record โ โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ โ summary โ string โ 2 (100.0%) โ 38.0 +/- 0.0 โ 445.0 +/- 144.2 โ โโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโ โญโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Table Notes โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฎ โ โ โ 1. All token statistics are based on a sample of max(1000, len(dataset)) records. โ โ 2. Tokens are calculated using tiktoken's cl100k_base tokenizer. โ โ โ โฐโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฏ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
๐ Visual Inspectionยถ
Let's compare the original document image with the generated summary to validate quality:
# Compare original document with generated summary
index = 0 # Change this to view different examples
# Merge preview data with original images for comparison
comparison_dataset = preview.dataset.merge(pd.DataFrame(img_dataset)[["uuid", "image"]], how="left", on="uuid")
# Extract the record for display
record = comparison_dataset.iloc[index]
print("๐ Original Document Image:")
display(resize_image(record.image, BASE64_IMAGE_HEIGHT))
print("\n๐ Generated Summary:")
rich.print(Panel(record.summary, title="Document Summary", title_align="left"))
๐ Original Document Image:
๐ Generated Summary:
โญโ Document Summary โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฎ โ ## Image Summary โ โ โ โ The image presents a collection of eight heatmap graphs, arranged in two columns and four rows, labeled from โ โ **a)** to **h)**. Each graph displays a similar structure, with the x-axis representing time in microseconds โ โ (**t(ฮผs)**) and the y-axis representing a change in frequency in megahertz ( **ฮf(MHz)** ). โ โ โ โ ### Top to Bottom Description โ โ โ โ 1. **Color Bar**: โ โ - At the top of the image, a color bar is provided, indicating the correlation coefficient (**ฯ**) values โ โ ranging from 0.2 to 1. โ โ - The color bar transitions from blue (for lower values) to yellow (for higher values). โ โ โ โ 2. **Graphs a) and e)**: โ โ - Graph **a)** shows a predominantly blue area, indicating a strong negative correlation (or a specific โ โ pattern) at certain times and frequency changes. โ โ - Graph **e)** displays a similar but less uniform pattern, with a notable dark blue area that suggests a โ โ significant correlation. โ โ โ โ 3. **Graphs b) and f)**: โ โ - Graph **b)** features a scattered distribution of colors, suggesting variability in the correlation across โ โ different times and frequency changes. โ โ - Graph **f)** presents a smoother transition of colors, with a clear trend of increasing correlation over โ โ time. โ โ โ โ 4. **Graphs c) and g)**: โ โ - Graph **c)** exhibits a highly variable and scattered pattern, similar to graph **b)**, but with a greater โ โ range of frequency changes. โ โ - Graph **g)** shows a more organized pattern, with a clear diagonal trend indicating how correlation โ โ changes over time. โ โ โ โ 5. **Graphs d) and h)**: โ โ - Graph **d)** displays a scattered pattern with a wide range of frequency changes, similar to graphs **b)** โ โ and **c)**. โ โ - Graph **h)** presents a smoother pattern, with a gradual change in correlation over time. โ โ โ โ ### Summary โ โ โ โ The image provides a visual comparison of eight different scenarios or conditions, labeled **a)** through โ โ **h)**, in terms of their correlation over time and frequency change. The heatmaps allow for the quick โ โ identification of patterns, trends, and variability across these conditions, with the color bar at the top โ โ serving as a reference for interpreting the correlation coefficient values. The graphs suggest that some โ โ conditions exhibit strong and organized patterns of correlation (e.g., **a)** and **e)**), while others show โ โ more variability and randomness (e.g., **b)**, **c)**, and **d)**). Overall, the image facilitates a detailed โ โ analysis of how different factors influence correlation over time and across different frequency changes. โ โฐโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฏ
๐ Scale up!ยถ
Happy with your preview data?
Use the
createmethod to submit larger Data Designer generation jobs.
results = data_designer.create(config_builder, num_records=10)
[02:34:07] [INFO] ๐จ Creating Data Designer dataset
[02:34:07] [INFO] โ Validation passed
[02:34:07] [INFO] โ๏ธ Sorting column configs into a Directed Acyclic Graph
[02:34:07] [INFO] ๐ Dataset path '/home/runner/work/DataDesigner/DataDesigner/docs/notebook_source/artifacts/dataset' already exists. Dataset from this session will be saved to '/home/runner/work/DataDesigner/DataDesigner/docs/notebook_source/artifacts/dataset_12-11-2025_023407' instead.
[02:34:07] [INFO] ๐ฉบ Running health checks for models...
[02:34:07] [INFO] |-- ๐ Checking 'meta/llama-4-scout-17b-16e-instruct' in provider named 'nvidia' for model alias 'vision'...
[02:34:11] [INFO] |-- โ Passed!
[02:34:12] [INFO] โณ Processing batch 1 of 1
[02:34:15] [INFO] ๐ฑ Sampling 10 records from seed dataset
[02:34:15] [INFO] |-- seed dataset size: 512 records
[02:34:15] [INFO] |-- sampling strategy: ordered
[02:34:15] [INFO] ๐ Preparing llm-text column generation
[02:34:15] [INFO] |-- column name: 'summary'
[02:34:15] [INFO] |-- model config:
{
"alias": "vision",
"model": "meta/llama-4-scout-17b-16e-instruct",
"inference_parameters": {
"temperature": 0.6,
"top_p": 0.95,
"max_tokens": 2048,
"max_parallel_requests": 4,
"timeout": null,
"extra_body": null
},
"provider": "nvidia"
}
[02:34:15] [INFO] ๐ Processing llm-text column 'summary' with 4 concurrent workers
[02:34:46] [INFO] ๐ Model usage summary:
{
"meta/llama-4-scout-17b-16e-instruct": {
"token_usage": {
"prompt_tokens": 8140,
"completion_tokens": 4405,
"total_tokens": 12545
},
"request_usage": {
"successful_requests": 10,
"failed_requests": 0,
"total_requests": 10
},
"tokens_per_second": 372,
"requests_per_minute": 17
}
}
[02:34:46] [INFO] ๐ Measuring dataset column statistics:
[02:34:46] [INFO] |-- ๐ฑ column: 'uuid'
[02:34:46] [INFO] |-- ๐ฑ column: 'image_filename'
[02:34:46] [INFO] |-- ๐ฑ column: 'base64_image'
[02:34:46] [INFO] |-- ๐ฑ column: 'page'
[02:34:46] [INFO] |-- ๐ฑ column: 'options'
[02:34:46] [INFO] |-- ๐ฑ column: 'source'
[02:34:46] [INFO] |-- ๐ column: 'summary'
# Load the generated dataset as a pandas DataFrame.
dataset = results.load_dataset()
dataset.head()
| uuid | image_filename | base64_image | page | options | source | summary | |
|---|---|---|---|---|---|---|---|
| 0 | 8a9f2f31-c0d7-4d23-a5ad-6e89e2aab0ac | images/1810.07757_2.jpg | iVBORw0KGgoAAAANSUhEUgAAAUAAAAIACAIAAAB8QiIMAA... | <NA> | ['A. The variance of the data decreases from p... | arxiv_qa | ## Image Summary The image presents a collect... |
| 1 | 79c38132-aa11-4383-9ec6-fe497ca7d423 | data/scrapped_pdfs_split/pages_extracted/energ... | iVBORw0KGgoAAAANSUhEUgAAAYsAAAIACAIAAAD8HddaAA... | 9.0 | <NA> | ## Document Summary The document appears to be... | |
| 2 | 24e70b94-615d-4683-b2c3-e9cb7a19fb41 | data/scrapped_pdfs_split/pages_extracted/energ... | iVBORw0KGgoAAAANSUhEUgAAAZgAAAIACAIAAAAwhO2xAA... | 414.0 | <NA> | ## Lithium Batteries Page 1487 ### Overview o... | |
| 3 | f60aa282-3187-49b4-b140-3a07a7cd0a16 | 0fd47b51ae9248ef36669b8619b1223f268edae3e7a44a... | iVBORw0KGgoAAAANSUhEUgAAAX0AAAIACAAAAABLRuMPAA... | <NA> | <NA> | docvqa | ## Document Summary The image depicts the titl... |
| 4 | fb884f8a-ebd7-42a4-a006-c549ddddb580 | b335cfb9d442f8925ea41a064cb445a5395577f2345d52... | iVBORw0KGgoAAAANSUhEUgAAAY8AAAIACAAAAABf/7+rAA... | <NA> | <NA> | docvqa | ## Document Summary ### Header Section The doc... |
# Load the analysis results into memory.
analysis = results.load_analysis()
analysis.to_report()
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ ๐จ Data Designer Dataset Profile โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Dataset Overview โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ number of records โ number of columns โ percent complete records โ โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ โ 10 โ 7 โ 100.0% โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ ๐ฑ Seed-Dataset Columns โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ column name โ data type โ number unique values โ โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ โ uuid โ string โ 10 (100.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ image_filename โ string โ 10 (100.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ base64_image โ string โ 10 (100.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ page โ float โ 6 (60.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ options โ string โ 3 (30.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ source โ string โ 3 (30.0%) โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ ๐ LLM-Text Columns โโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ โ โ prompt tokens โ completion tokens โ โ column name โ data type โ number unique values โ per record โ per record โ โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ โ summary โ string โ 10 (100.0%) โ 38.0 +/- 0.0 โ 466.5 +/- 105.7 โ โโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโ โญโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Table Notes โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฎ โ โ โ 1. All token statistics are based on a sample of max(1000, len(dataset)) records. โ โ 2. Tokens are calculated using tiktoken's cl100k_base tokenizer. โ โ โ โฐโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฏ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โญ๏ธ Next Stepsยถ
Now that you've learned how to use visual context for image summarization in Data Designer, explore more:
- Experiment with different vision models for specific document types
- Try different prompt variations to generate specialized descriptions (e.g., technical details, key findings)
- Combine vision-based summaries with other column types for multi-modal workflows
- Apply this pattern to other vision tasks like image captioning, OCR validation, or visual question answering