NeMo Safe Synthesizer
NeMo Safe Synthesizer creates private, safe versions of sensitive tabular datasets: entirely synthetic data with no one-to-one mapping to your original records. It fine-tunes an LLM on your data, optionally with differential privacy, to produce high-quality synthetic datasets that preserve the statistical properties and utility of the original for downstream AI tasks while protecting sensitive information and supporting privacy compliance.
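One simple way to sanity-check the "no one-to-one mapping" property on your own outputs is an exact-match overlap check: the fraction of synthetic rows that reproduce some original row verbatim. The sketch below is a hypothetical helper (`exact_match_rate` is not part of the NeMo Safe Synthesizer API), and the toy rows are illustrative only. It catches only verbatim copies; a rate of zero does not by itself prove privacy.

```python
from typing import Iterable, Mapping, Sequence


def exact_match_rate(
    original: Iterable[Mapping[str, object]],
    synthetic: Sequence[Mapping[str, object]],
    columns: Sequence[str],
) -> float:
    """Fraction of synthetic rows that exactly reproduce some original row.

    Hypothetical helper: rows are compared only on `columns`, and values must
    be hashable. Near-duplicates are NOT detected.
    """
    if not synthetic:
        return 0.0
    # Index each original row as a tuple of its selected column values.
    seen = {tuple(row[c] for c in columns) for row in original}
    copies = sum(1 for row in synthetic if tuple(row[c] for c in columns) in seen)
    return copies / len(synthetic)


# Toy data for illustration only.
original = [
    {"age": 34, "state": "CA", "plan": "gold"},
    {"age": 51, "state": "NY", "plan": "silver"},
]
synthetic = [
    {"age": 36, "state": "CA", "plan": "gold"},
    {"age": 51, "state": "NY", "plan": "silver"},  # verbatim copy of a training row
    {"age": 29, "state": "TX", "plan": "bronze"},
    {"age": 47, "state": "WA", "plan": "gold"},
]
print(exact_match_rate(original, synthetic, ["age", "state", "plan"]))  # 0.25
```

Hashing the original rows into a set keeps the check linear in the number of rows, so it scales to large tables.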
Key Features
- Privacy-first synthetic data -- PII detection and replacement, plus optional differential privacy during fine-tuning via Opacus
- LLM fine-tuning -- LoRA fine-tuning optimized for tabular data, including numeric, categorical, and text columns
- Fast inference -- vLLM-powered generation with optional structured output enforcement
- Comprehensive evaluation -- Privacy and quality metrics in an in-depth HTML report
- Flexible interfaces -- CLI for scripting, Python SDK for programmatic workflows, YAML configuration
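The optional differential privacy relies on Opacus, which implements DP-SGD. To show what DP-SGD changes in each training step, here is a minimal pure-Python sketch of its gradient aggregation: clip each example's gradient to L2 norm C, sum the clipped gradients, add Gaussian noise with standard deviation σ·C, and average. This is not Opacus's API or NeMo Safe Synthesizer's code; the function names are hypothetical, and a real run also samples batches and tracks the privacy budget (ε, δ) across steps, which this sketch omits.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def clip(grad: Sequence[float], max_norm: float) -> Vector:
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(
    per_example_grads: Sequence[Sequence[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> Vector:
    """Hypothetical DP-SGD step: clip, sum, add Gaussian noise, average."""
    if not per_example_grads:
        raise ValueError("need at least one per-example gradient")
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise scale is tied to the clipping bound, so no single example dominates.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(per_example_grads) for x in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # per-example gradients for a batch of two
# noise_multiplier=0.0 disables noise so only the clipping effect is visible.
print([round(x, 4) for x in dp_sgd_gradient(grads, 1.0, 0.0, rng)])  # [0.45, 0.6]
```

The first gradient (norm 5) is scaled down to norm 1 while the second (norm 0.5) passes through unchanged; raising `noise_multiplier` trades utility for a stronger privacy guarantee.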
System Requirements
NeMo Safe Synthesizer requires a Linux machine with an NVIDIA GPU (A100 80GB+ recommended) and CUDA 12.8+ to run the training and generation pipeline. macOS, Windows, and Apple Silicon are not supported for pipeline execution. A CPU-only install is available for development and configuration validation -- see Getting Started.
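The OS and CUDA requirements above can be encoded as a small preflight check. This is a hypothetical sketch, not a tool shipped with NeMo Safe Synthesizer: the caller supplies the OS name (for example from `platform.system()`, which returns `"Linux"` on Linux) and a CUDA version string (for example `torch.version.cuda` from a PyTorch install, which is `None` on CPU-only builds). GPU model and memory are not checked.

```python
from typing import List, Optional, Tuple

MIN_CUDA: Tuple[int, int] = (12, 8)  # from the stated requirement: CUDA 12.8+


def parse_version(text: str) -> Tuple[int, int]:
    """Turn '12.8', '12.10.1', or '13' into a (major, minor) tuple."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def preflight_problems(system: str, cuda_version: Optional[str]) -> List[str]:
    """Hypothetical check: list reasons this host can't run the pipeline."""
    problems = []
    if system != "Linux":
        problems.append(f"unsupported OS for pipeline execution: {system}")
    if cuda_version is None:
        problems.append("no CUDA version available")
    elif parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 12.8")
    return problems


print(preflight_problems("Linux", "12.8"))  # []
print(preflight_problems("Darwin", None))
```

Versions are compared as integer tuples rather than floats, because as a float "12.10" would sort below "12.8".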
Next Steps
- Getting Started -- Install the package, set up your environment, and run your first synthetic data pipeline in minutes.
- Product Overview -- Learn about the pipeline steps: replace PII, synthesize data, evaluate.
- Tutorials -- Follow hands-on tutorials to generate synthetic data.
- User Guide -- Configure and run the pipeline via YAML, CLI, SDK, or environment variables.
- Developer Guide -- Browse the auto-generated API reference and dive into the architecture details.
- Developer Notes -- Read developer blog posts and check release notes.
Contact
License
NeMo Safe Synthesizer is licensed under the Apache License 2.0.