🕵️ Your First Anonymization

Detect sensitive entities and replace them with LLM-generated substitutes -- the simplest end-to-end example of Anonymizer.

📚 What you'll learn

Load a CSV dataset and configure Anonymizer in a few lines
Preview anonymized results on a small sample before committing to a full run
Inspect entity detection and replacement with display_record()
Process the full dataset with run()

Tip: First time running notebooks? Start with setup instructions.

⚙️ Setup

Make sure NVIDIA_API_KEY (issued at build.nvidia.com) is set in the environment; the first cell prompts for it if it is missing and stops if none is entered.
Import the core Anonymizer classes: Anonymizer, AnonymizerConfig, AnonymizerInput, and Substitute.
Anonymizer() initializes with the default model provider -- no extra config needed.

In [2]:

import getpass
import os

if not os.getenv("NVIDIA_API_KEY"):
    key = getpass.getpass("Enter NVIDIA_API_KEY from build.nvidia.com: ").strip()
    if not key:
        raise RuntimeError("NVIDIA_API_KEY is required to run these notebooks.")
    os.environ["NVIDIA_API_KEY"] = key

In [3]:

from anonymizer import Anonymizer, AnonymizerConfig, AnonymizerInput, Substitute

In [4]:

anonymizer = Anonymizer()

[13:31:16] [INFO] 🔧 Anonymizer initialized with 3 model configs

[13:31:16] [INFO]   |-- 🔎 detector:  gliner-pii-detector

[13:31:16] [INFO]   |-- ✅ validator: gpt-oss-120b

[13:31:16] [INFO]   |-- 🧩 augmenter: gpt-oss-120b

📦 Load data and configure

AnonymizerInput points to your CSV and names the text column. data_summary gives the LLM context about the kind of text it will process.
The default model configs handle records of up to 2,000 tokens each.
AnonymizerConfig with Substitute() tells Anonymizer to replace detected entities with LLM-generated synthetic values for names, cities, dates, etc.
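
Before loading a dataset, it can help to check which records are likely to exceed the 2,000-token limit mentioned above. The sketch below is a rough pre-check, not part of Anonymizer: `approx_token_count` and `find_long_records` are hypothetical helpers, and the ratio of about four characters per token is an assumption that varies by tokenizer and language.

```python
def approx_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. The chars_per_token ratio is an assumption,
    not the tokenizer that the default models actually use."""
    return int(len(text) / chars_per_token) + 1


def find_long_records(records: list, max_tokens: int = 2000) -> list:
    """Return the indices of records whose estimated size exceeds max_tokens."""
    return [
        index
        for index, text in enumerate(records)
        if approx_token_count(text) > max_tokens
    ]


# Stand-in records for illustration; in practice these would be the values
# of the text column in your CSV.
sample_records = [
    "Bobby Watford, a 40-year-old veterinarian living in Denver, Colorado.",
    "word " * 3000,  # about 15,000 characters, far above the estimated budget
]

too_long = find_long_records(sample_records)
print(too_long)
```

Records flagged this way are candidates for splitting or trimming before they reach the pipeline. Treat the estimate as a coarse filter only.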

In [5]:

input_data = AnonymizerInput(
    source="https://raw.githubusercontent.com/NVIDIA-NeMo/Anonymizer/refs/heads/main/docs/data/NVIDIA_synthetic_biographies.csv",
    text_column="biography",
    data_summary="Biographical profiles of individuals",
)

config = AnonymizerConfig(replace=Substitute())

👁️ Preview

preview() runs on a small sample so you can iterate quickly.
Always preview before processing the full dataset -- it's the fastest way to catch prompt or config issues early.

In [6]:

preview = anonymizer.preview(config=config, data=input_data, num_records=3)

[13:31:16] [INFO] 📂 Loaded 25 records from https://raw.githubusercontent.com/NVIDIA-NeMo/Anonymizer/refs/heads/main/docs/data/NVIDIA_synthetic_biographies.csv (column: 'biography')

[13:31:16] [INFO] detection labels in scope: (default: 65 labels; see anonymizer.DEFAULT_ENTITY_LABELS for list)

[13:31:16] [INFO]   |-- 👀 Preview mode: processing 3 of 25 records

[13:31:16] [INFO] 🔍 Running entity detection on 3 records

[13:32:00] [INFO]   |-- 📋 Detection complete — 78 entities found across 3 records (0 failed) [44.7s]

[13:32:00] [INFO]   |-- labels: first_name=22, organization_name=7, age=5, occupation=5, city=4, state=4, degree=4, university=4, field_of_study=4, last_name=3, race_ethnicity=3, political_view=3, language=2, religious_belief=2, street_address=2, place_name=1, date_of_birth=1, project_name=1, employment_status=1

[13:32:00] [INFO] 🔄 Running Substitute replacement

[13:32:20] [INFO]   |-- 📋 Replacement complete (0 failed) [19.8s]

[13:32:20] [INFO] 🎉 Pipeline complete — 3 records processed, 0 total failures
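
The `labels:` line in the log is a compact summary of what the detector found. As a quick sanity check, the sketch below parses that line into a dictionary and confirms that the per-label counts add up to the reported entity total. `parse_label_counts` is a hypothetical helper written for this example; it works on the log text copied from above, not on any Anonymizer API.

```python
import re

# The labels summary copied from the preview log above.
log_line = (
    "labels: first_name=22, organization_name=7, age=5, occupation=5, city=4, "
    "state=4, degree=4, university=4, field_of_study=4, last_name=3, "
    "race_ethnicity=3, political_view=3, language=2, religious_belief=2, "
    "street_address=2, place_name=1, date_of_birth=1, project_name=1, "
    "employment_status=1"
)


def parse_label_counts(line: str) -> dict:
    """Extract every `label=count` pair from a labels summary line."""
    return {label: int(count) for label, count in re.findall(r"(\w+)=(\d+)", line)}


label_counts = parse_label_counts(log_line)
total_entities = sum(label_counts.values())

# Most frequent labels first; ties keep the order they had in the log.
top_three = sorted(label_counts.items(), key=lambda item: item[1], reverse=True)[:3]

print(total_entities)  # 78, matching "78 entities found across 3 records"
print(top_three)
```

A label that dominates the counts, such as `first_name` here, is a good place to start when reviewing detection quality.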

🔍 Inspect

display_record() shows the original text with highlighted entities, the replacement map, and the anonymized output -- all in one view.
preview.dataframe holds the original text, the span-tagged text, the detected entities, and the substituted text side by side.

In [7]:

preview.display_record(0)

Anonymizer Preview (record 0)

Original

Bobby| first_name Watford| last_name, a 40| age‑year‑old Mexican| race_ethnicity veterinarian| occupation living in Denver| city, Colorado| state, grew up on the outskirts of the city and developed a love for animals early on. After graduating from Jefferson High| organization_name, he earned his DVM| degree at the University of Colorado Boulder| university, where he also completed a research stint in wildlife health| field_of_study. Fluent in English| language, Bobby| first_name has always described his upbringing as a blend of small‑town curiosity and the vibrant culture of his community, values that continue to shape his compassionate approach to animal care.

Since finishing his training, Bobby| first_name has worked at VCA Animal Hospital| organization_name and later at the Colorado Veterinary Clinic| organization_name, where he now leads a busy mixed‑practice team. He identifies as a Christian Democrat| political_view and often volunteers at local shelters, a habit encouraged by his wife, Maya| first_name, and their two teenage children, Aria and Leo| first_name. Outside the clinic, Bobby| first_name enjoys hiking the Rockies| place_name with his family and mentoring veterinary students from his alma mater.

Replaced

Ethan| first_name Kline| last_name, a 45| age‑year‑old Vietnamese| race_ethnicity wildlife rehabilitator| occupation living in Portland| city, Oregon| state, grew up on the outskirts of the city and developed a love for animals early on. After graduating from Lincoln Academy| organization_name, he earned his DDS| degree at the University of Washington Seattle| university, where he also completed a research stint in environmental epidemiology| field_of_study. Fluent in German| language, Ethan| first_name has always described his upbringing as a blend of small‑town curiosity and the vibrant culture of his community, values that continue to shape his compassionate approach to animal care.

Since finishing his training, Ethan| first_name has worked at Paws & Claws Veterinary Hospital| organization_name and later at the Cascade Veterinary Center| organization_name, where he now leads a busy mixed‑practice team. He identifies as a Libertarian| political_view and often volunteers at local shelters, a habit encouraged by his wife, Leila| first_name, and their two teenage children, Mia and Noah| first_name. Outside the clinic, Ethan| first_name enjoys hiking the Sierra Nevada| place_name with his family and mentoring veterinary students from his alma mater.

Replacement Map

Original	Label	Replacement
40	age	45
Aria and Leo	first_name	Mia and Noah
Bobby	first_name	Ethan
Christian Democrat	political_view	Libertarian
Colorado	state	Oregon
Colorado Veterinary Clinic	organization_name	Cascade Veterinary Center
DVM	degree	DDS
Denver	city	Portland
English	language	German
Jefferson High	organization_name	Lincoln Academy
Maya	first_name	Leila
Mexican	race_ethnicity	Vietnamese
Rockies	place_name	Sierra Nevada
University of Colorado Boulder	university	University of Washington Seattle
VCA Animal Hospital	organization_name	Paws & Claws Veterinary Hospital
Watford	last_name	Kline
veterinarian	occupation	wildlife rehabilitator
wildlife health	field_of_study	environmental epidemiology
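
A replacement map like the one above also makes a simple leak check possible: no original value should survive in the replaced text. The sketch below is one way to test that, using a few rows of the map and the opening sentence of the replaced record. `find_leaks` is a hypothetical helper, not an Anonymizer function, and whole-word, case-insensitive matching is an assumption about what should count as a leak.

```python
import re

# A few rows copied from the replacement map for record 0.
replacement_map = {
    "Bobby": "Ethan",
    "Watford": "Kline",
    "Denver": "Portland",
    "Colorado": "Oregon",
    "veterinarian": "wildlife rehabilitator",
}

# Opening sentence of the replaced text for record 0 (plain hyphens used here).
replaced_text = (
    "Ethan Kline, a 45-year-old Vietnamese wildlife rehabilitator living in "
    "Portland, Oregon, grew up on the outskirts of the city and developed a "
    "love for animals early on."
)


def find_leaks(text: str, originals) -> list:
    """Return the original values that still appear in text as whole words."""
    leaks = []
    for value in originals:
        pattern = r"\b" + re.escape(value) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            leaks.append(value)
    return leaks


print(find_leaks(replaced_text, replacement_map))  # [] means nothing leaked
```

Run the check per record rather than across the whole dataset. A substitute chosen for one record can legitimately equal an original from another; in this preview, record 1 maps New Jersey to Colorado while Colorado is an original value in record 0.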

In [8]:

preview.display_record(1)

Anonymizer Preview (record 1)

Original

Idilio| first_name Bell| last_name is a 37| age‑year‑old astronomer| occupation living in Edison| city, New Jersey| state. Born on November 21, 1988| date_of_birth, he grew up in a bilingual Italian| race_ethnicity household and speaks English| language at home and work. He earned his bachelor’s degree| degree in physics| field_of_study from the University of New Jersey| university and later completed a PhD| degree in astrophysics| field_of_study at Princeton| university, where his dissertation focused on exoplanet atmospheres. After graduation he spent three years at NASA’s Goddard Space Flight Center| organization_name before joining SpaceX| organization_name’s research division, where he now leads a team analyzing data from the Starlink telescope array| project_name. Idilio| first_name describes himself as secular| religious_belief and leans progressive| political_view on most political issues, often volunteering for science outreach programs in his community.

Outside the lab, Idilio| first_name shares a modest house on West Roberts Drive| street_address with his wife, Maya| first_name, and their two young daughters, Lina| first_name and Zara| first_name. His mother, Elena| first_name, lives nearby and still cooks the family’s favorite pasta on Sundays, while his father, Marco| first_name, retired| employment_status from an engineering firm in New York| state. Family gatherings are a mix of lively conversation and stargazing sessions on the backyard deck, where Idilio| first_name points out constellations and tells stories of the cosmos that inspire his children’s curiosity.

Replaced

Santiago| first_name Kumar| last_name is a 42| age‑year‑old planetary scientist| occupation living in Boulder| city, Colorado| state. Born on April 12, 1984| date_of_birth, he grew up in a bilingual Greek| race_ethnicity household and speaks Spanish| language at home and work. He earned his Bachelor of Arts| degree in chemical engineering| field_of_study from the University of Colorado| university and later completed a Doctor of Engineering| degree in marine biology| field_of_study at Stanford University| university, where his dissertation focused on exoplanet atmospheres. After graduation he spent three years at European Space Agency’s European Space Research and Technology Centre| organization_name before joining Blue Origin| organization_name’s research division, where he now leads a team analyzing data from the Aurora satellite constellation| project_name. Santiago| first_name describes himself as agnostic| religious_belief and leans moderate| political_view on most political issues, often volunteering for science outreach programs in his community.

Outside the lab, Santiago| first_name shares a modest house on East Monroe Avenue| street_address with his wife, Priya| first_name, and their two young daughters, Aisha| first_name and Nadia| first_name. His mother, Sofia| first_name, lives nearby and still cooks the family’s favorite pasta on Sundays, while his father, Victor| first_name, consultant| employment_status from an engineering firm in Illinois| state. Family gatherings are a mix of lively conversation and stargazing sessions on the backyard deck, where Santiago| first_name points out constellations and tells stories of the cosmos that inspire his children’s curiosity.

Replacement Map

Original	Label	Replacement
37	age	42
Bell	last_name	Kumar
Edison	city	Boulder
Elena	first_name	Sofia
English	language	Spanish
Idilio	first_name	Santiago
Italian	race_ethnicity	Greek
Lina	first_name	Aisha
Marco	first_name	Victor
Maya	first_name	Priya
NASA’s Goddard Space Flight Center	organization_name	European Space Agency’s European Space Research and Technology Centre
New Jersey	state	Colorado
New York	state	Illinois
November 21, 1988	date_of_birth	April 12, 1984
PhD	degree	Doctor of Engineering
Princeton	university	Stanford University
SpaceX	organization_name	Blue Origin
Starlink telescope array	project_name	Aurora satellite constellation
University of New Jersey	university	University of Colorado
West Roberts Drive	street_address	East Monroe Avenue
Zara	first_name	Nadia
astronomer	occupation	planetary scientist
bachelor’s degree	degree	Bachelor of Arts
in astrophysics	field_of_study	in marine biology
physics	field_of_study	chemical engineering
progressive	political_view	moderate
retired	employment_status	consultant
secular	religious_belief	agnostic

In [9]:

preview.dataframe

Out[9]:

	biography	biography_with_spans	final_entities	biography_replaced
0	Bobby Watford, a 40‑year‑old Mexican veterinar...	<first_name>Bobby</first_name> <last_name>Watf...	{'entities': [{'end_position': 5, 'id': 'first...	Ethan Kline, a 45‑year‑old Vietnamese wildlife...
1	Idilio Bell is a 37‑year‑old astronomer living...	<first_name>Idilio</first_name> <last_name>Bel...	{'entities': [{'end_position': 6, 'id': 'first...	Santiago Kumar is a 42‑year‑old planetary scie...
2	Jodi Allison, 36, lives at 204 Bluegrass in Cl...	<first_name>Jodi</first_name> <last_name>Allis...	{'entities': [{'end_position': 4, 'id': 'first...	Leah Keller, 42, lives at 317 Maplewood in Ale...
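
The `biography_with_spans` column appears to wrap each detected entity in a tag named after its label, as in `<first_name>Bobby</first_name>`. Assuming that tag style holds for the whole column (the output above is truncated, so this is an inference), a short regular expression can pull out the label and text pairs for a custom analysis. The sample string is hand-built from record 0 for illustration, and `extract_spans` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches <label>text</label>, requiring the closing tag to repeat the label.
SPAN_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def extract_spans(tagged_text: str) -> list:
    """Return (label, text) pairs in the order they appear in tagged_text."""
    return [(match.group(1), match.group(2)) for match in SPAN_PATTERN.finditer(tagged_text)]


# Hand-built stand-in that follows the tag style shown in the dataframe output.
tagged = (
    "<first_name>Bobby</first_name> <last_name>Watford</last_name>, "
    "a <age>40</age>-year-old <occupation>veterinarian</occupation>. "
    "<first_name>Bobby</first_name> enjoys hiking."
)

spans = extract_spans(tagged)
counts_by_label = Counter(label for label, _ in spans)

print(spans)
print(counts_by_label)
```

For anything beyond a quick look, the structured `final_entities` column is likely the better source, since it carries positions as well as labels.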

🚀 Full run

run() processes the entire dataset with the same config you previewed.
Access the output via result.dataframe.

In [10]:

result = anonymizer.run(config=config, data=input_data)
print(result)

[13:32:20] [INFO] 📂 Loaded 25 records from https://raw.githubusercontent.com/NVIDIA-NeMo/Anonymizer/refs/heads/main/docs/data/NVIDIA_synthetic_biographies.csv (column: 'biography')

[13:32:20] [INFO] detection labels in scope: (default: 65 labels; see anonymizer.DEFAULT_ENTITY_LABELS for list)

[13:32:20] [INFO] 🔍 Running entity detection on 25 records

[13:37:23] [INFO]   |-- 📋 Detection complete — 666 entities found across 25 records (0 failed) [303.2s]

[13:37:23] [INFO]   |-- labels: first_name=154, organization_name=62, occupation=47, city=41, university=36, field_of_study=34, race_ethnicity=30, last_name=27, state=27, age=26, degree=25, political_view=25, religious_belief=25, street_address=23, language=19, place_name=15, employment_status=11, county=11, date_of_birth=9, education_level=7, date=5, company_name=4, country=1, gender=1, postcode=1

[13:37:23] [INFO] 🔄 Running Substitute replacement

/Users/lramaswamy/Documents/github/Anonymizer/src/anonymizer/engine/row_partitioning.py:42: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  pd.concat(list(parts), ignore_index=True)
[13:39:33] [INFO]   |-- 📋 Replacement complete (0 failed) [129.3s]

[13:39:33] [INFO] 🎉 Pipeline complete — 25 records processed, 0 total failures

AnonymizerResult(rows=25, columns=4, trace_columns=21, failed_records=0)
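
The stage timings in the preview log can be used to budget a full run before starting it. The sketch below scales the preview timings linearly to the full record count and compares the result with the timings logged above. Linear scaling is an assumption: actual duration depends on record length, model latency, and any batching, so treat the estimate as a rough guide. `extrapolate_seconds` is a hypothetical helper.

```python
def extrapolate_seconds(preview_seconds: float, preview_records: int, total_records: int) -> float:
    """Scale a preview stage timing linearly to the full dataset size."""
    if preview_records <= 0:
        raise ValueError("preview_records must be positive")
    return preview_seconds / preview_records * total_records


# Timings copied from the logs above: 3 preview records, 25 records in total.
detection_estimate = extrapolate_seconds(44.7, preview_records=3, total_records=25)
replacement_estimate = extrapolate_seconds(19.8, preview_records=3, total_records=25)
estimated_total = detection_estimate + replacement_estimate

# Detection and replacement durations logged for the full run.
actual_total = 303.2 + 129.3

print(f"estimated: {estimated_total:.1f}s, actual: {actual_total:.1f}s")
```

For this run the linear estimate comes to about 537.5 s against roughly 432.5 s observed, so it erred on the high side.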

In [11]:

result.dataframe.head()

Out[11]:

	biography	biography_with_spans	final_entities	biography_replaced
0	Bobby Watford, a 40‑year‑old Mexican veterinar...	<first_name>Bobby</first_name> <last_name>Watf...	{'entities': array([{'end_position': 5, 'id': ...	Ethan Hawthorne, a 45‑year‑old Vietnamese mari...
1	Idilio Bell is a 37‑year‑old astronomer living...	<first_name>Idilio</first_name> <last_name>Bel...	{'entities': array([{'end_position': 6, 'id': ...	Mateo Kline is a 36‑year‑old geophysicist livi...
2	Jodi Allison, 36, lives at 204 Bluegrass in Cl...	<first_name>Jodi</first_name> <last_name>Allis...	{'entities': array([{'end_position': 4, 'id': ...	Leah Harper, 42, lives at 312 Magnolia in Sava...
3	James Mills is a 69‑year‑old paramedic who liv...	<first_name>James</first_name> <last_name>Mill...	{'entities': array([{'end_position': 5, 'id': ...	Robert Harper is a 71‑year‑old firefighter who...
4	Nancy Burton is a 21‑year‑old cashier who live...	<first_name>Nancy</first_name> <last_name>Burt...	{'entities': array([{'end_position': 5, 'id': ...	Aisha Khan is a 22‑year‑old stock clerk who li...

⏭️ Next steps

🔍 Inspecting Detected Entities -- dig into what the detection pipeline found and debug quality.
🎯 Choosing a Replacement Strategy -- compare Redact, Annotate, Hash, and Substitute side-by-side.
✏️ Rewriting Biographies -- generate privacy-safe paraphrases instead of token-level replacements.