🎨 NeMo Data Designer Library
👋 Welcome to the Data Designer community! We're excited to have you here.
Data Designer is a general framework for generating high-quality synthetic data, either from scratch or from your own seed data as a starting point for domain-grounded generation.
Why Data Designer?
Generating high-quality synthetic data requires much more than iteratively calling an LLM.
Data Designer is purpose-built to support large-scale, high-quality data generation, including:
- Diversity – statistical distributions and variety that reflect real-world data patterns, not repetitive LLM outputs
- Correlations – meaningful relationships between fields that LLMs cannot maintain across independent calls
- Steerability – flexible control over data characteristics throughout the generation process
- Validation – automated quality checks and verification that data meets specifications
- Reproducibility – shareable and reproducible generation workflows
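To make a few of these properties concrete, here is a minimal sketch in plain Python (standard library only). It does not use the Data Designer API; the field names, weights, and helper functions are hypothetical and exist only for illustration. It shows weighted sampling for diversity, a field sampled conditionally on another for correlation, a fixed seed for reproducibility, and a simple validation check.

```python
import random

# Hypothetical categories and weights, chosen only for illustration.
DEPARTMENT_WEIGHTS = {"engineering": 0.5, "sales": 0.3, "support": 0.2}

# Hypothetical mapping used to keep `title` consistent with `department`.
TITLES_BY_DEPARTMENT = {
    "engineering": ["software engineer", "data engineer"],
    "sales": ["account executive", "sales analyst"],
    "support": ["support specialist", "support lead"],
}


def generate_records(n, seed):
    """Generate n records with a seeded RNG so runs are repeatable."""
    rng = random.Random(seed)
    departments = list(DEPARTMENT_WEIGHTS)
    weights = list(DEPARTMENT_WEIGHTS.values())
    records = []
    for _ in range(n):
        # Diversity: draw from an explicit distribution instead of
        # hoping repeated free-form generation varies enough.
        department = rng.choices(departments, weights=weights, k=1)[0]
        # Correlation: the title is sampled conditionally on the department.
        title = rng.choice(TITLES_BY_DEPARTMENT[department])
        records.append({"department": department, "title": title})
    return records


def validate(records):
    """Validation: every title must belong to its record's department."""
    return all(
        r["title"] in TITLES_BY_DEPARTMENT[r["department"]] for r in records
    )


records = generate_records(1000, seed=7)
```

Calling `generate_records` twice with the same seed returns identical records, which is the property a shareable generation workflow relies on.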
How does it work?
Data Designer helps you create datasets through an intuitive, iterative process:
- ⚙️ Configure your model settings
  - Bring your own OpenAI-compatible model providers and models
  - Or use the default model providers and models to get started quickly
  - Learn more by reading the model docs
- 🏗️ Design your dataset
  - Iteratively design your dataset, column by column
  - Leverage tools like statistical samplers and LLMs to generate a variety of data types
  - Learn more by reading the column docs
- 🔁 Preview your results and iterate
  - Generate a preview dataset stored in memory for fast iteration
  - Inspect sample records and analysis results to refine your configuration
  - Try for yourself by running the tutorial notebooks
- 🖼️ Create your dataset
  - Generate your full dataset and save results to disk
  - Access the generated dataset and associated artifacts for downstream use
  - Give it a try by running the tutorial notebooks
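The design, preview, and create steps above can be pictured with a small, library-agnostic sketch. This is not the Data Designer API: the column names, the `generate` and `create` helpers, and the output file name are all hypothetical, and the `draft_review` function is a deterministic stand-in for what would be an LLM-generated column. The point is only the shape of the workflow: columns are defined one at a time, a small preview stays in memory, and the full run is written to disk.

```python
import csv
import random
import tempfile
from pathlib import Path


# Hypothetical sampler-style columns.
def sample_topic(rng, row):
    return rng.choice(["billing", "shipping", "returns"])


def sample_rating(rng, row):
    return rng.randint(1, 5)


def draft_review(rng, row):
    # Stand-in for an LLM column: it reads earlier columns in the same row,
    # which is how later columns can stay consistent with earlier ones.
    tone = "positive" if row["rating"] >= 4 else "critical"
    return f"A {tone} comment about {row['topic']}."


# Columns are evaluated in order, so each one can depend on those before it.
COLUMNS = [
    ("topic", sample_topic),
    ("rating", sample_rating),
    ("review", draft_review),
]


def generate(num_records, seed=0):
    """Build records column by column and keep them in memory."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_records):
        row = {}
        for name, column_fn in COLUMNS:
            row[name] = column_fn(rng, row)
        rows.append(row)
    return rows


def create(num_records, path, seed=0):
    """Generate the full dataset and save it to disk as CSV."""
    rows = generate(num_records, seed)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[name for name, _ in COLUMNS])
        writer.writeheader()
        writer.writerows(rows)
    return path


# Preview: a handful of in-memory records to inspect and iterate on.
preview = generate(5)

# Create: the full run, written to a temporary location for this sketch.
output_path = create(100, Path(tempfile.gettempdir()) / "sketch_dataset.csv")
```

Keeping the preview path free of disk writes is what makes iteration fast: only once the column definitions look right does the larger, persisted run happen.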
Library and Microservice
Data Designer is available as both an open-source library and a NeMo microservice.
- Open-source Library: Purpose-built for flexibility and customization, prioritizing UX excellence, modularity, and extensibility.
- NeMo Microservice: An enterprise-grade solution that offers a seamless transition from the library, allowing you to leverage other NeMo microservices and generate datasets at scale. See the microservice docs for more details.