🎨 Data Designer Tutorial: Generating Images
📚 What you'll learn
This notebook shows how to generate synthetic image data with Data Designer using image-generation models.
- 🖼️ Image generation columns: Add columns that produce images from text prompts
- 📝 Jinja2 prompts: Drive diversity by referencing other columns in your prompt template
- 💾 Preview vs create: Preview stores base64 in the dataframe; create saves images to disk and stores paths
Data Designer supports both diffusion (e.g. DALL·E, Stable Diffusion, Imagen) and autoregressive (e.g. Gemini image, GPT image) models.
Prerequisites: This tutorial uses OpenRouter with the Flux 2 Pro image model. Set OPENROUTER_API_KEY in your environment before running.
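A small check before any model call makes a missing key fail early with a readable message. The sketch below uses only the standard library; the helper name `require_env` is hypothetical and not part of Data Designer.

```python
import os


def require_env(name: str, environ=None) -> str:
    """Return the value of an environment variable or raise a clear error."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running this notebook.")
    return value


# Example with an explicit mapping so the sketch runs without a real key.
print(require_env("OPENROUTER_API_KEY", {"OPENROUTER_API_KEY": "example-key"}))
```

In the notebook itself, calling `require_env("OPENROUTER_API_KEY")` with no mapping would read from the real environment.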
If this is your first time using Data Designer, we recommend starting with the first notebook in this tutorial series.
📦 Import Data Designer
data_designer.config provides the configuration API. DataDesigner is the main interface for generation.
from IPython.display import Image as IPImage
from IPython.display import display
import data_designer.config as dd
from data_designer.interface import DataDesigner
⚙️ Initialize the Data Designer interface
We initialize Data Designer without arguments here—the image model is configured explicitly in the next cell. No default text model is needed for this tutorial.
data_designer = DataDesigner()
🎛️ Define an image-generation model
- Use ImageInferenceParams so Data Designer treats this model as an image generator.
- Image options (size, quality, aspect ratio, etc.) are model-specific; pass them via extra_body.
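Because these options are model-specific and handed to the provider, a mistyped or non-positive size may only surface when the request is rejected. A minimal sketch of a pre-check, assuming the `height` and `width` keys used in this tutorial; the helper `image_size_options` is hypothetical, and other models may expect different keys.

```python
def image_size_options(width: int, height: int) -> dict:
    """Build an extra_body-style dict after basic sanity checks."""
    for label, value in (("width", width), ("height", height)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{label} must be a positive integer, got {value!r}")
    return {"height": height, "width": width}


print(image_size_options(512, 512))  # {'height': 512, 'width': 512}
```

The returned dict has the same shape as the `extra_body` value passed to `dd.ImageInferenceParams` in this tutorial.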
MODEL_PROVIDER = "openrouter"
MODEL_ID = "black-forest-labs/flux.2-pro"
MODEL_ALIAS = "image-model"
model_configs = [
    dd.ModelConfig(
        alias=MODEL_ALIAS,
        model=MODEL_ID,
        provider=MODEL_PROVIDER,
        inference_parameters=dd.ImageInferenceParams(
            extra_body={"height": 512, "width": 512},
        ),
    )
]
🏗️ Build the config: samplers + image column
We'll generate diverse family pet portraits of a dog with a cat in the background: sampler columns drive the style and, for each animal, the breed, age, look direction, and emotion. The image-generation column uses a Jinja2 prompt that references all of them.
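To make the idea concrete, here is a minimal standard-library sketch of how sampled values end up in a prompt. It is a stand-in, not Data Designer's implementation: the regex handles only plain `{{ name }}` placeholders rather than full Jinja2, uniform sampling is an assumption, and the shortened value lists and the helpers `sample_row` and `render` are hypothetical.

```python
import random
import re

# Hypothetical, shortened value lists; the tutorial's samplers use longer ones.
COLUMNS = {
    "style": ["watercolor", "sketch"],
    "dog_breed": ["a Beagle", "a Corgi"],
    "dog_emotion": ["happy", "sleepy"],
}

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def sample_row(columns: dict, rng: random.Random) -> dict:
    """Pick one value per column, uniformly at random (an assumption)."""
    return {name: rng.choice(values) for name, values in columns.items()}


def render(template: str, row: dict) -> str:
    """Replace each {{ name }} with row[name]; raises KeyError on unknown names."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


template = "A {{ style }} portrait of {{dog_breed}} dog that appears {{ dog_emotion }}."
row = sample_row(COLUMNS, random.Random(0))
print(render(template, row))
```

Each record gets its own sampled row, so every record yields a differently worded prompt from the same template.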
config_builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="style",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=[
                "photorealistic",
                "oil painting",
                "watercolor",
                "digital art",
                "sketch",
                "anime",
            ],
        ),
    )
)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="dog_breed",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=[
                "a Golden Retriever",
                "a German Shepherd",
                "a Labrador Retriever",
                "a Bulldog",
                "a Beagle",
                "a Poodle",
                "a Corgi",
                "a Siberian Husky",
                "a Dalmatian",
                "a Yorkshire Terrier",
                "a Boxer",
                "a Dachshund",
                "a Doberman Pinscher",
                "a Shih Tzu",
                "a Chihuahua",
                "a Border Collie",
                "an Australian Shepherd",
                "a Cocker Spaniel",
                "a Maltese",
                "a Pomeranian",
                "a Saint Bernard",
                "a Great Dane",
                "an Akita",
                "a Samoyed",
                "a Boston Terrier",
            ],
        ),
    )
)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="cat_breed",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=[
                "a Persian",
                "a Maine Coon",
                "a Siamese",
                "a Ragdoll",
                "a Bengal",
                "an Abyssinian",
                "a British Shorthair",
                "a Sphynx",
                "a Scottish Fold",
                "a Russian Blue",
                "a Birman",
                "an Oriental Shorthair",
                "a Norwegian Forest Cat",
                "a Devon Rex",
                "a Burmese",
                "an Egyptian Mau",
                "a Tonkinese",
                "a Himalayan",
                "a Savannah",
                "a Chartreux",
                "a Somali",
                "a Manx",
                "a Turkish Angora",
                "a Balinese",
                "an American Shorthair",
            ],
        ),
    )
)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="dog_age",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["1-3", "3-6", "6-9", "9-12", "12-15"],
        ),
    )
)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="cat_age",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["1-3", "3-6", "6-9", "9-12", "12-18"],
        ),
    )
)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="dog_look_direction",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["left", "right", "front", "up", "down"],
        ),
    )
)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="cat_look_direction",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["left", "right", "front", "up", "down"],
        ),
    )
)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="dog_emotion",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["happy", "curious", "serious", "sleepy", "excited"],
        ),
    )
)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="cat_emotion",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["aloof", "curious", "content", "sleepy", "playful"],
        ),
    )
)
config_builder.add_column(
    dd.ImageColumnConfig(
        name="generated_image",
        # The breed values already carry their article ("a Beagle", "an Akita"),
        # so the template adds none, and it avoids "a"/"an" before the emotion.
        prompt=(
            """
            A {{ style }} family pet portrait of {{ dog_breed }} dog, {{ dog_age }} years old, looking {{ dog_look_direction }} and appearing {{ dog_emotion }}, with
            {{ cat_breed }} cat, {{ cat_age }} years old, looking {{ cat_look_direction }} and appearing {{ cat_emotion }}, in the background. Both subjects should be in focus.
            """
        ),
        model_alias=MODEL_ALIAS,
    )
)
data_designer.validate(config_builder)
[12:09:40] [INFO] ✅ Validation passed
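Since every prompt is determined by the nine sampled values, the number of distinct prompts the template can produce is the product of the per-column value counts. A small worked calculation, with the counts taken from the sampler definitions above:

```python
from math import prod

# Number of values per sampler column, counted from the config above.
value_counts = {
    "style": 6,
    "dog_breed": 25,
    "cat_breed": 25,
    "dog_age": 5,
    "cat_age": 5,
    "dog_look_direction": 5,
    "cat_look_direction": 5,
    "dog_emotion": 5,
    "cat_emotion": 5,
}

total_prompts = prod(value_counts.values())
print(f"{total_prompts:,}")  # 58,593,750
```

With tens of millions of combinations, small runs are very unlikely to repeat a prompt exactly.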
🔁 Preview: images as base64
In preview mode, generated images are stored as base64 strings in the dataframe. Run the next cell to step through each record (images are shown in the sample record display, but only in a notebook environment).
preview = data_designer.preview(config_builder, num_records=2)
[12:09:40] [INFO] 🔁 Preview generation in progress
[12:09:40] [INFO] ✅ Validation passed
[12:09:41] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[12:09:41] [INFO] 🩺 Running health checks for models...
[12:09:41] [INFO] |-- 👀 Checking 'black-forest-labs/flux.2-pro' in provider named 'openrouter' for model alias 'image-model'...
[12:09:49] [INFO] |-- ✅ Passed!
[12:09:49] [INFO] 🎲 Preparing samplers to generate 2 records across 9 columns
[12:09:49] [INFO] 🖼️ image model config for column 'generated_image'
[12:09:49] [INFO] |-- model: 'black-forest-labs/flux.2-pro'
[12:09:49] [INFO] |-- model alias: 'image-model'
[12:09:49] [INFO] |-- model provider: 'openrouter'
[12:09:49] [INFO] |-- inference parameters:
[12:09:49] [INFO] | |-- generation_type=image
[12:09:49] [INFO] | |-- max_parallel_requests=4
[12:09:49] [INFO] | |-- extra_body={'height': 512, 'width': 512}
[12:09:49] [INFO] ⚡️ Processing image column 'generated_image' with 4 concurrent workers
[12:09:49] [INFO] ⏱️ image column 'generated_image' will report progress after each record
[12:09:59] [INFO] |-- 🐥 image column 'generated_image' progress: 1/2 (50%) complete, 1 ok, 0 failed, 0.10 rec/s, eta 10.0s
[12:10:04] [INFO] |-- 🐔 image column 'generated_image' progress: 2/2 (100%) complete, 2 ok, 0 failed, 0.14 rec/s, eta 0.0s
[12:10:04] [INFO] 📊 Model usage summary:
[12:10:04] [INFO] |-- model: black-forest-labs/flux.2-pro
[12:10:04] [INFO] |-- tokens: input=0, output=0, total=0, tps=0
[12:10:04] [INFO] |-- requests: success=2, failed=0, total=2, rpm=8
[12:10:04] [INFO] |-- images: total=2
[12:10:04] [INFO] 📐 Measuring dataset column statistics:
[12:10:04] [INFO] |-- 🎲 column: 'style'
[12:10:04] [INFO] |-- 🎲 column: 'dog_breed'
[12:10:04] [INFO] |-- 🎲 column: 'cat_breed'
[12:10:04] [INFO] |-- 🎲 column: 'dog_age'
[12:10:04] [INFO] |-- 🎲 column: 'cat_age'
[12:10:04] [INFO] |-- 🎲 column: 'dog_look_direction'
[12:10:04] [INFO] |-- 🎲 column: 'cat_look_direction'
[12:10:04] [INFO] |-- 🎲 column: 'dog_emotion'
[12:10:04] [INFO] |-- 🎲 column: 'cat_emotion'
[12:10:04] [INFO] |-- 🖼️ column: 'generated_image'
[12:10:04] [INFO] 👏 Preview complete!
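The rate and ETA in the progress lines are consistent with a simple throughput estimate. The sketch below reproduces that arithmetic under the assumption that rate = completed / elapsed and ETA = remaining / rate. This is a guess at how the numbers relate, not Data Designer's actual implementation; the helper `progress_stats` is hypothetical, and the 10-second elapsed time is read off the log timestamps (12:09:49 to 12:09:59).

```python
def progress_stats(done: int, total: int, elapsed_s: float) -> tuple[float, float]:
    """Return (records per second, estimated seconds remaining)."""
    if done <= 0 or elapsed_s <= 0:
        raise ValueError("need at least one finished record and positive elapsed time")
    rate = done / elapsed_s
    eta = (total - done) / rate
    return rate, eta


rate, eta = progress_stats(done=1, total=2, elapsed_s=10.0)
print(f"{rate:.2f} rec/s, eta {eta:.1f}s")  # 0.10 rec/s, eta 10.0s
```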
# Each call displays the next record in the preview.
for _ in range(len(preview.dataset)):
    preview.display_sample_record()
Generated Columns
| Name | Value |
|---|---|
| style | oil painting |
| dog_breed | a Beagle |
| cat_breed | a Norwegian Forest Cat |
| dog_age | 12-15 |
| cat_age | 12-18 |
| dog_look_direction | right |
| cat_look_direction | left |
| dog_emotion | sleepy |
| cat_emotion | aloof |
Images
| Name | Preview |
|---|---|
| generated_image | [0] <base64, 2117608 chars> |
[index: 0]
Generated Columns
| Name | Value |
|---|---|
| style | digital art |
| dog_breed | a German Shepherd |
| cat_breed | a Russian Blue |
| dog_age | 3-6 |
| cat_age | 6-9 |
| dog_look_direction | down |
| cat_look_direction | right |
| dog_emotion | happy |
| cat_emotion | curious |
Images
| Name | Preview |
|---|---|
| generated_image | [0] <base64, 1588420 chars> |
[index: 1]
preview.dataset
| | style | dog_breed | cat_breed | dog_age | cat_age | dog_look_direction | cat_look_direction | dog_emotion | cat_emotion | generated_image |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | oil painting | a Beagle | a Norwegian Forest Cat | 12-15 | 12-18 | right | left | sleepy | aloof | [iVBORw0KGgoAAAANSUhEUgAABAAAAAMACAIAAAA12IJaA... |
| 1 | digital art | a German Shepherd | a Russian Blue | 3-6 | 6-9 | down | right | happy | curious | [iVBORw0KGgoAAAANSUhEUgAABAAAAAMACAIAAAA12IJaA... |
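Each `generated_image` cell holds base64 text, and the `iVBORw0KGgo` prefix visible above is what a PNG signature looks like once encoded. The sketch below uses only the standard library to decode just enough of such a string to read the image size from the PNG header. The function name is hypothetical, and the code assumes the data is a PNG.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_base64(b64_text: str) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk of base64-encoded PNG data."""
    # 32 base64 characters decode to the first 24 bytes:
    # 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    header = base64.b64decode(b64_text[:32])
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("input does not start with a PNG header")
    return struct.unpack(">II", header[16:24])


# The truncated prefix shown in the dataframe output above.
print(png_size_from_base64("iVBORw0KGgoAAAANSUhEUgAABAAAAAMACAIAAAA12IJaA"))  # (1024, 768)
```

That prefix decodes to 1024 by 768 rather than the 512 by 512 requested via extra_body, so it may be worth checking whether the provider honors those keys for this model.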
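🆙 Create: images saved to disk
In create mode, images are written to an images/ folder with UUID filenames, and the dataframe stores paths relative to the artifact output directory instead of base64 data. In the run below, each generated_image cell holds a list of paths under images/generated_image/.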
results = data_designer.create(config_builder, num_records=2, dataset_name="tutorial-5-images")
[12:10:04] [INFO] 🎨 Creating Data Designer dataset
[12:10:04] [INFO] ✅ Validation passed
[12:10:04] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[12:10:04] [INFO] 🩺 Running health checks for models...
[12:10:04] [INFO] |-- 👀 Checking 'black-forest-labs/flux.2-pro' in provider named 'openrouter' for model alias 'image-model'...
[12:10:13] [INFO] |-- ✅ Passed!
[12:10:13] [INFO] ⏳ Processing batch 1 of 1
[12:10:13] [INFO] 🎲 Preparing samplers to generate 2 records across 9 columns
[12:10:13] [INFO] 🖼️ image model config for column 'generated_image'
[12:10:13] [INFO] |-- model: 'black-forest-labs/flux.2-pro'
[12:10:13] [INFO] |-- model alias: 'image-model'
[12:10:13] [INFO] |-- model provider: 'openrouter'
[12:10:13] [INFO] |-- inference parameters:
[12:10:13] [INFO] | |-- generation_type=image
[12:10:13] [INFO] | |-- max_parallel_requests=4
[12:10:13] [INFO] | |-- extra_body={'height': 512, 'width': 512}
[12:10:13] [INFO] ⚡️ Processing image column 'generated_image' with 4 concurrent workers
[12:10:13] [INFO] ⏱️ image column 'generated_image' will report progress after each record
[12:10:26] [INFO] |-- 🚗 image column 'generated_image' progress: 1/2 (50%) complete, 1 ok, 0 failed, 0.08 rec/s, eta 12.7s
[12:10:26] [INFO] |-- 🚀 image column 'generated_image' progress: 2/2 (100%) complete, 2 ok, 0 failed, 0.16 rec/s, eta 0.0s
[12:10:26] [INFO] 📊 Model usage summary:
[12:10:26] [INFO] |-- model: black-forest-labs/flux.2-pro
[12:10:26] [INFO] |-- tokens: input=0, output=0, total=0, tps=0
[12:10:26] [INFO] |-- requests: success=2, failed=0, total=2, rpm=9
[12:10:26] [INFO] |-- images: total=2
[12:10:26] [INFO] 📐 Measuring dataset column statistics:
[12:10:26] [INFO] |-- 🎲 column: 'style'
[12:10:26] [INFO] |-- 🎲 column: 'dog_breed'
[12:10:26] [INFO] |-- 🎲 column: 'cat_breed'
[12:10:26] [INFO] |-- 🎲 column: 'dog_age'
[12:10:26] [INFO] |-- 🎲 column: 'cat_age'
[12:10:26] [INFO] |-- 🎲 column: 'dog_look_direction'
[12:10:26] [INFO] |-- 🎲 column: 'cat_look_direction'
[12:10:26] [INFO] |-- 🎲 column: 'dog_emotion'
[12:10:26] [INFO] |-- 🎲 column: 'cat_emotion'
[12:10:26] [INFO] |-- 🖼️ column: 'generated_image'
dataset = results.load_dataset()
dataset.head()
| | style | dog_breed | cat_breed | dog_age | cat_age | dog_look_direction | cat_look_direction | dog_emotion | cat_emotion | generated_image |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | anime | a Doberman Pinscher | a Turkish Angora | 3-6 | 9-12 | up | down | curious | sleepy | ['images/generated_image/8e027c73-e364-4aee-a6... |
| 1 | digital art | a Great Dane | a Turkish Angora | 12-15 | 3-6 | front | down | sleepy | playful | ['images/generated_image/104d3c11-e6a9-42f2-a4... |
# Display all images from the created dataset. Paths are relative to the artifact output directory.
for _, row in dataset.iterrows():
    path_or_list = row.get("generated_image")
    if path_or_list is not None:
        # A cell may hold a single path or a list of paths; normalize to a list.
        paths = path_or_list if not isinstance(path_or_list, str) else [path_or_list]
        for path in paths:
            full_path = results.artifact_storage.base_dataset_path / path
            display(IPImage(filename=str(full_path)))
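Outside a notebook, the same path handling can be used to check that the image files exist before further processing. A minimal sketch with pathlib; the helper `resolve_image_paths` is hypothetical, and a temporary directory stands in for the artifact output directory.

```python
from pathlib import Path
import tempfile


def resolve_image_paths(cell, base_dir: Path) -> list[Path]:
    """Normalize a cell (None, one path, or a list of paths) to paths under base_dir."""
    if cell is None:
        return []
    items = [cell] if isinstance(cell, str) else list(cell)
    return [base_dir / item for item in items]


# Demo with a temporary directory standing in for the artifact output directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    image = base / "images" / "generated_image" / "example.png"
    image.parent.mkdir(parents=True)
    image.write_bytes(b"")  # placeholder file, not a real image
    paths = resolve_image_paths(["images/generated_image/example.png"], base)
    print([p.exists() for p in paths])  # [True]
```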
⏭️ Next steps
- The basics: samplers and LLM text columns
- Structured outputs and Jinja
- Seeding with a dataset
- Providing images as context
- Image-to-image editing: edit existing images with seed datasets