NeMo Safe Synthesizer

NeMo Safe Synthesizer creates private, safe versions of sensitive tabular datasets -- entirely synthetic data with no one-to-one mapping to your original records. It fine-tunes an LLM on your data, optionally with differential privacy, and then generates high-quality datasets that preserve the statistical properties and utility of the original for downstream AI tasks while protecting sensitive information and supporting privacy compliance.

Key Features

Privacy-first synthetic data -- PII detection and replacement, plus optional differential privacy during fine-tuning via Opacus
LLM fine-tuning -- LoRA fine-tuning optimized for tabular data, including numeric, categorical, and text columns
Fast inference -- vLLM-powered generation with optional structured output enforcement
Comprehensive evaluation -- Privacy and quality metrics in an in-depth HTML report
Flexible interfaces -- CLI for scripting, Python SDK for programmatic workflows, YAML configuration
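
For context on the differential privacy feature above: Opacus implements DP-SGD, which clips each training example's gradient to a maximum norm and adds Gaussian noise before the optimizer step. The following is a minimal sketch of that mechanism in plain Python. It is not code from NeMo Safe Synthesizer or Opacus; the function names and parameters (`clip_gradient`, `dp_average_gradient`, `max_grad_norm`, `noise_multiplier`) are illustrative assumptions.

```python
import math
import random


def clip_gradient(grad, max_grad_norm):
    """Scale one example's gradient so its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_grad_norm, noise_multiplier, rng=None):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Illustrative only: the noise standard deviation is
    noise_multiplier * max_grad_norm, applied to the summed gradient.
    """
    rng = rng or random.Random()
    clipped = [clip_gradient(g, max_grad_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]
```

Clipping bounds how much any single record can influence an update, and the noise masks what influence remains. A larger `noise_multiplier` gives stronger privacy at the cost of noisier training.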

System Requirements

NeMo Safe Synthesizer requires a Linux machine with an NVIDIA GPU (A100 80GB+ recommended) and CUDA 12.8+ to run the training and generation pipeline. macOS, Windows, and Apple Silicon are not supported for pipeline execution. A CPU-only install is available for development and configuration validation -- see Getting Started.
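
The requirements above can be checked with a short preflight script before installing. This is a rough sketch that uses only the Python standard library and the `nvidia-smi` tool that ships with the NVIDIA driver; it is not part of NeMo Safe Synthesizer, and the helper names `parse_cuda_version` and `check_requirements` are hypothetical. Note that `nvidia-smi` reports the highest CUDA version the installed driver supports, which is an approximation of the requirement rather than a full check.

```python
import platform
import re
import shutil
import subprocess


def parse_cuda_version(nvidia_smi_output):
    """Return (major, minor) from the 'CUDA Version: X.Y' header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_requirements(min_cuda=(12, 8)):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if platform.system() != "Linux":
        problems.append(f"Linux is required, found {platform.system()}")
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found; no NVIDIA driver detected")
        return problems
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    version = parse_cuda_version(result.stdout)
    if version is None:
        problems.append("could not read a CUDA version from nvidia-smi")
    elif version < min_cuda:
        problems.append(
            f"CUDA {min_cuda[0]}.{min_cuda[1]}+ is required, "
            f"found {version[0]}.{version[1]}"
        )
    return problems
```

Calling `check_requirements()` on the target machine returns a list of human-readable problems. The script does not check GPU model or memory, so the A100 80GB+ recommendation still needs to be confirmed separately.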

Next Steps

Getting Started

Install the package, set up your environment, and run your first synthetic data pipeline in minutes.

Product Overview

Learn about the pipeline steps: replace PII, synthesize data, evaluate.

Tutorials

Follow hands-on tutorials to generate synthetic data.

User Guide

Configure and run the pipeline via YAML, CLI, SDK, or environment variables.

Developer Guide

Browse the auto-generated API reference and dive into the architecture details.

Developer Notes

Read developer blog posts and check release notes.


Contact

License

NeMo Safe Synthesizer is licensed under the Apache License 2.0.