Skip to content

🎨✨ Contributing to NeMo Data Designer 🎨✨

Thank you for your interest in contributing to Data Designer!

We welcome contributions from the community and sincerely appreciate your efforts to improve the project. Whether you're fixing a typo, reporting a bug, proposing a new feature, or implementing a major enhancement, your work helps make Data Designer better for everyone πŸŽ‰.

This guide will help you get started with the contribution process.

Table of Contents

Getting Started

πŸ‘‹ Welcome to the Data Designer community! We're excited to have you here.

Whether you're new to the project or ready to dive in, the resources below will help you get oriented and productive quickly:

  1. README.md – best place to start to learn the basics of the project

  2. AGENTS.md – context and instructions to help AI coding agents work on Data Designer (it's also useful for human developers!)

  3. Documentation – detailed documentation on Data Designer's capabilities and usage

Ways to Contribute

There are many ways to contribute to Data Designer:

πŸ› Bug Fixes

Found a bug? Before reporting, please 1. Verify you're using the latest version: uv pip install --upgrade data-designer 2. Search for duplicates in the issue tracker

When creating a bug report, please include: - Data Designer version - Python version and operating system - Minimal reproducible example - Expected vs. actual behavior - Full error messages and stack traces

If you are interested in fixing the bug yourself, that's AWESOME! Please follow the development guide to get started.

✨ Feature Implementation

Want to add new functionality? Great! Please review our development approach and open a feature request to discuss the idea and get feedback before investing significant time on the implementation.

πŸ“– Documentation Improvements

Documentation is crucial for user adoption. Contributions that clarify usage, add examples, or fix typos are highly valued.

πŸ’‘ Examples and Tutorials

Share your use cases! Example notebooks and tutorials help others understand how to leverage Data Designer effectively.

πŸ§ͺ Test Coverage

Help us improve test coverage by adding tests for untested code paths or edge cases.

Feature Requests

Data Designer is designed to be as flexible and extensible as possible, and we welcome your ideas for pushing its capabilities even further! To keep the core library maintainable, while also supporting innovation, we take an incremental approach when adding new features – we explore what's already possible, extend through plugins when needed, and integrate the most broadly useful features into the core library:

How We Grow Data Designer

  1. πŸ§— Explore what's possible: Can your use case be achieved with current features? We've designed Data Designer to be composable – sometimes creative combinations of existing tools can accomplish what you need. Check out our examples or open an issue if you'd like help exploring this!

  2. πŸ”Œ Extend through plugins: If existing features aren't quite enough, consider implementing your idea as a plugin that extends the core library. Plugins let you experiment and share functionality while keeping the core library focused.

  3. βš™οΈ Integrate into the core library: If your feature or plugin proves broadly useful and aligns with Data Designer's goals, we'd love to integrate it into the core library! We're happy to discuss whether it's a good fit and how to move forward together.

This approach helps us grow thoughtfully while keeping Data Designer focused and maintainable.

Submitting a Feature Request

Open a new issue with:

  • Clear title: Concise description of the feature
  • Use case: Explain what problem this solves and why it's important
  • Proposed solution: Describe how you envision the feature working
  • Alternatives considered: Other approaches you've thought about
  • Examples: Code examples or mockups of how users would interact with the feature
  • Willingness to implement: Are you interested in implementing this yourself?

Development Guide

Data Designer uses uv for dependency management. If you don't have uv installed, follow their installation instructions.

Initial Setup

  1. Create or find an issue

    Before starting work, ensure there's an issue tracking your contribution:

  2. Fork and clone the repository

    Start by forking the Data Designer repository, then clone your fork and add the upstream remote:

    git clone https://github.com/YOUR_GITHUB_USERNAME/DataDesigner.git
    
    cd DataDesigner
    
    git remote add upstream https://github.com/NVIDIA-NeMo/DataDesigner.git
    
  3. Install dependencies

    # Install project with dev dependencies
    make install-dev
    
    # Or, if you use Jupyter / IPython for development
    make install-dev-notebooks
    
  4. Verify your setup

    make test && make check-all
    

    If no errors are reported, you're ready to develop πŸš€

Making Changes

  1. Create a feature branch

    git checkout main
    git pull upstream main
    git checkout -b <username>/<type-of-change>/<issue-number>-<short-description>
    

    Example types of change:

    • feat for new features
    • fix for bug fixes
    • docs for documentation updates
    • test for testing changes
    • refactor for code refactoring
    • chore for chore tasks
    • style for style changes
    • perf for performance improvements

    Example branch name:

    • johnnygreco/feat/123-add-xyz-generator for a new feature by @johnnygreco, addressing issue #123
  2. Develop your changes

    Please follow the patterns and conventions used throughout the codebase, as well as those outlined in AGENTS.md.

  3. Test and validate

    make check-all-fix  # Format code and fix linting issues
    make test           # Run all tests
    make coverage       # Check test coverage (must be >90%)
    

    Writing tests: Place tests in tests/ mirroring the source structure. Use fixtures from tests/conftest.py, mock external services with unittest.mock or pytest-httpx, and test both success and failure cases. See AGENTS.md for patterns and examples.

  4. Commit your work

    Write clear, descriptive commit messages, optionally including a brief summary (50 characters or less) and reference issue numbers when applicable (e.g., "Fixes #123").

    git commit -m "Add XYZ generator for synthetic data" -m "Fixes #123"
    
  5. Stay up to date

    Regularly sync your branch with upstream changes:

    git fetch upstream
    git merge upstream/main
    

Submitting Changes

Before Submitting

Ensure your changes meet the following criteria:

  • All tests pass (make test)
  • Code is formatted and linted (make check-all-fix)
  • New functionality includes tests
  • Documentation is updated (README, docstrings, examples)
  • License headers are present on all new files
  • Commit messages are clear and descriptive

Creating a Pull Request

  1. Push your changes to your fork:

    git push origin <username>/<type-of-change>/<issue-number>-<short-description>
    
  2. Open a pull request on GitHub from your fork to the main repository

  3. Respond to review feedback update your PR as needed

Pull Request Review Process

  • Maintainers will review your PR and may request changes
  • Address feedback by pushing additional commits to your branch
  • Reply to the feedback comment with a link to the commit that addresses it.
  • Once approved, a maintainer will merge your PR
  • Your contribution will be included in the next release!

Code of Conduct

Data Designer follows the Contributor Covenant Code of Conduct. We are committed to providing a welcoming and inclusive environment for all contributors.

Please read our complete Code of Conduct for full details on our standards and expectations.

License File Headers

All code files that are added to this repository must include the appropriate NVIDIA copyright header:

# SPDX-FileCopyrightText: Copyright (c) {YEAR} NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

Use make update-license-headers to add headers automatically.

Signing off on your work

When contributing to this project, you must agree that you have authored 100% of the content, that you have the necessary rights to the content and that the content you contribute may be provided under the project license. All contributors are asked to sign the Data Designer Developer Certificate of Origin (DCO) when submitting their first pull request. The process is automated by a bot that will comment on the pull request. Our DCO is the same as the Linux Foundation requires its contributors to sign.


Thank you for contributing to NeMo Data Designer! Your efforts help make synthetic data generation more accessible and powerful for everyone. 🎨✨