Tutorials
These tutorials cover the two user-facing surfaces of the Anonymizer plugin: the streaming preview workflow for iterating on a configuration, and the run job for processing full datasets.
Library vs. Service
Anonymizer separates configuration (what to detect and how to replace it) from execution (where the work runs and how models are reached).
Part 1: Build the config (library)
Use `anonymizer.config` to define the rewrite or replacement strategy and the detection options. This code is identical whether you run Anonymizer standalone or through the NeMo Platform service.
```python
from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact

# Redact detected entities using a label-based placeholder template.
config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)
```
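To make the `format_template` concrete, here is a minimal sketch of what a `[REDACTED_{label}]` template does to a piece of text once entities have been detected. It assumes detections are non-overlapping `(start, end, label)` character spans and that `{label}` is filled in the way Python's `str.format` fills it; `Span` and `redact` are hypothetical helpers for illustration, not Anonymizer's implementation.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical detected entity: character offsets plus a label."""
    start: int
    end: int
    label: str


def redact(text: str, spans: list[Span], format_template: str = "[REDACTED_{label}]") -> str:
    """Replace each span with the template filled in with that span's label.

    Assumes spans do not overlap. Spans are applied right to left so the
    offsets of earlier spans stay valid after each replacement.
    """
    result = text
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        replacement = format_template.format(label=span.label)
        result = result[: span.start] + replacement + result[span.end :]
    return result


if __name__ == "__main__":
    text = "Alice Smith lives in Denver."
    spans = [Span(0, 11, "name"), Span(21, 27, "city")]
    print(redact(text, spans))  # [REDACTED_name] lives in [REDACTED_city].
```

Because the label is part of the placeholder, redacted output still shows what kind of value was removed, which is usually more useful downstream than a single generic mask.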
Part 2: Execute (platform)
Submit the config to the Anonymizer service. The plugin owns the request types (`PreviewRequest`, `AnonymizerRequest`), so a request can also describe the input source and model routing alongside the config:
```python
import os

from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact
from data_designer.config import ModelConfig
from nemo_anonymizer_plugin.app.input import AnonymizerInputSpec
from nemo_anonymizer_plugin.app.task_config import PreviewRequest
from nemo_platform import NeMoPlatform

# Workspace and provider names can be overridden through environment variables.
WORKSPACE = os.environ.get("NMP_WORKSPACE", "default")
MODEL_PROVIDER = os.environ.get("NMP_ANON_PROVIDER", "nvidia-build")

# Same library config as in Part 1.
config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)

# Models are referenced by alias and reached through the named inference provider.
model_configs = [
    ModelConfig(alias="gliner-pii-detector", provider=MODEL_PROVIDER, model="nvidia/gliner-pii"),
    ModelConfig(alias="gpt-oss-120b", provider=MODEL_PROVIDER, model="openai/gpt-oss-120b"),
    ModelConfig(alias="nemotron-30b-thinking", provider=MODEL_PROVIDER, model="nvidia/nemotron-3-nano-30b-a3b"),
]

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=WORKSPACE,
)
anonymizer = sdk.anonymizer

# Preview a small sample: read the "biography" column of the uploaded CSV,
# keyed by "id", and limit the preview to 10 records.
preview = anonymizer.preview(PreviewRequest(
    config=config,
    data=AnonymizerInputSpec(
        source=f"fileset://{WORKSPACE}/anonymizer-inputs#anonymizer-input.csv",
        text_column="biography",
        id_column="id",
    ),
    model_configs=model_configs,
    num_records=10,
))
```
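Per the preview tutorial listed below, preview output arrives as a stream of NDJSON frames, one JSON object per line. How the SDK hands those frames to your code is covered in that tutorial; the sketch here only shows generic, standard-library NDJSON decoding. The `iter_ndjson` helper is hypothetical, and the `type` and `record` fields in the sample frames are placeholders, not the plugin's frame schema.

```python
import io
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Decode an NDJSON stream: one JSON object per non-blank line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines between frames
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid NDJSON on line {line_number}: {exc}") from exc


if __name__ == "__main__":
    # Hypothetical frames; real field names come from the service.
    stream = io.StringIO(
        '{"type": "record", "record": {"id": 1, "biography": "[REDACTED_name] is a chef."}}\n'
        "\n"
        '{"type": "done"}\n'
    )
    for frame in iter_ndjson(stream):
        print(frame.get("type"), frame.get("record"))
```

Decoding line by line lets you act on each frame as it arrives instead of waiting for the whole preview to finish, which is the point of a streaming preview.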
Service-Specific Considerations
When using Anonymizer as a NeMo Platform service:
| Feature | Service behavior | Details |
|---|---|---|
| Inference | Routes through the Inference Gateway | Configure providers once and reference them by name (the `provider` field) in `model_configs`. |
| Input data | Filesets and HTTP(S) URLs; local file paths work only with local CLI execution | Create and populate a fileset with `sdk.files.filesets.create` and `sdk.files.upload`, then reference a file in it with a `#<path>` fragment, as in `fileset://<workspace>/<fileset>#<path>`. |
| Artifacts | Local or platform-managed | `nemo anonymizer run run` writes artifacts to `persistent/results/artifacts` locally; `nemo anonymizer run submit` stores them in NeMo Platform job storage. |
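Fileset references in the preview example take the form `fileset://<workspace>/<fileset>#<path>`. Assuming that reading (workspace as the authority, fileset name as the path, file path as the fragment), the sketch below splits such a reference with the standard library, which can help catch a malformed `source` before submitting a request. `FilesetRef` and `parse_fileset_ref` are hypothetical helpers; the platform performs its own parsing.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class FilesetRef:
    """Hypothetical parsed form of fileset://<workspace>/<fileset>#<path>."""
    workspace: str
    fileset: str
    path: str


def parse_fileset_ref(uri: str) -> FilesetRef:
    """Split a fileset URI into workspace, fileset name, and file path."""
    parts = urlsplit(uri)
    if parts.scheme != "fileset":
        raise ValueError(f"not a fileset URI: {uri!r}")
    fileset = parts.path.strip("/")
    if not parts.netloc or not fileset or not parts.fragment:
        raise ValueError(f"expected fileset://<workspace>/<fileset>#<path>, got {uri!r}")
    return FilesetRef(workspace=parts.netloc, fileset=fileset, path=parts.fragment)


if __name__ == "__main__":
    ref = parse_fileset_ref("fileset://default/anonymizer-inputs#anonymizer-input.csv")
    print(ref)
    # FilesetRef(workspace='default', fileset='anonymizer-inputs', path='anonymizer-input.csv')
```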
Prerequisites
Before starting these tutorials, complete the Quick Start to:
- Install the plugin and verify the `nemo anonymizer` CLI.
- Configure an inference provider to reference from `model_configs`.
- Create a fileset and upload a CSV containing PII.
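If you want a throwaway input for that fileset, the sketch below writes a tiny CSV with the `id` and `biography` columns that the preview example reads through `id_column` and `text_column`. The rows are fictional placeholder data, and `write_sample_csv` is a hypothetical helper, not part of the plugin.

```python
import csv
from pathlib import Path

# Fictional rows for testing only; the column names match the
# id_column / text_column values used in the preview example.
ROWS = [
    {"id": 1, "biography": "Maria Lopez, 34, works as a nurse in Austin and can be reached at maria.lopez@example.com."},
    {"id": 2, "biography": "James Chen was born on 1988-04-12 and lives at 42 Elm Street, Portland."},
    {"id": 3, "biography": "Call Priya Nair at 555-0142 to reschedule her dentist appointment."},
]


def write_sample_csv(path: Path) -> Path:
    """Write ROWS to `path` as CSV with an id,biography header row."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "biography"])
        writer.writeheader()
        writer.writerows(ROWS)  # fields containing commas are quoted automatically
    return path
```

For example, `write_sample_csv(Path("anonymizer-input.csv"))` produces a file whose name matches the fileset reference in the preview example; upload it following the Quick Start.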
Tutorials
- Stream a small anonymized sample to iterate on `AnonymizerConfig` and `model_configs`. Covers `sdk.anonymizer.preview`, `nemo anonymizer preview run` / `nemo anonymizer preview submit`, and the NDJSON frame stream. Level: beginner.
- Run the full pipeline locally with `nemo anonymizer run run`, or submit it to the Jobs worker with `nemo anonymizer run submit`. Covers loading the `dataset.parquet`, `trace.parquet`, and `failed_records.json` artifacts. Level: intermediate.