Tutorials

These tutorials cover the two user-facing surfaces of the Anonymizer plugin: the streaming preview workflow for iterating on a config, and the run job for processing full datasets.

Library vs. Service

Anonymizer separates configuration (what to detect and how to replace it) from execution (where the work runs and how models are reached).

Part 1: Build the config (library)

Use anonymizer.config to define the rewrite or replacement strategy and detection options. This code is identical whether you run Anonymizer standalone or through the NeMo Platform service.

from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact

config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)
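
To make the template concrete, the sketch below mimics how a `format_template` such as `[REDACTED_{label}]` could expand for a detected entity. It is plain Python rather than Anonymizer's own code: it assumes the `{label}` placeholder is filled with the entity label using `str.format`-style substitution, and `render_redaction` and the sample labels are hypothetical names used only for illustration.

```python
# Illustrative only: assumes {label} is filled with the detected entity label
# using str.format-style substitution. This is not the library's own code.
TEMPLATE = "[REDACTED_{label}]"


def render_redaction(template: str, label: str) -> str:
    """Return the replacement text for one detected entity label."""
    return template.format(label=label)


# Hypothetical labels; the real label set depends on the detector in use.
examples = [render_redaction(TEMPLATE, label) for label in ("EMAIL", "PHONE_NUMBER")]
print(examples)
```

Keeping the label inside the replacement token preserves the entity type in the redacted text, which can make previews easier to review.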

Part 2: Execute (platform)

Submit the config to the Anonymizer service. The plugin owns the request shape (PreviewRequest, AnonymizerRequest) so it can also describe the input source and model routing:

import os
from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact
from data_designer.config import ModelConfig
from nemo_anonymizer_plugin.app.input import AnonymizerInputSpec
from nemo_anonymizer_plugin.app.task_config import PreviewRequest
from nemo_platform import NeMoPlatform

# Workspace and inference provider come from the environment, with defaults.
WORKSPACE = os.environ.get("NMP_WORKSPACE", "default")
MODEL_PROVIDER = os.environ.get("NMP_ANON_PROVIDER", "nvidia-build")

# Same library config as in Part 1.
config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)

# Each model is referenced by alias and reached through the named provider.
model_configs = [
    ModelConfig(alias="gliner-pii-detector", provider=MODEL_PROVIDER, model="nvidia/gliner-pii"),
    ModelConfig(alias="gpt-oss-120b", provider=MODEL_PROVIDER, model="openai/gpt-oss-120b"),
    ModelConfig(alias="nemotron-30b-thinking", provider=MODEL_PROVIDER, model="nvidia/nemotron-3-nano-30b-a3b"),
]

# Connect to the platform and get the Anonymizer client.
sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=WORKSPACE,
)
anonymizer = sdk.anonymizer

# Request a 10-record preview of the "biography" column of a CSV in a fileset.
preview = anonymizer.preview(PreviewRequest(
    config=config,
    data=AnonymizerInputSpec(
        source=f"fileset://{WORKSPACE}/anonymizer-inputs#anonymizer-input.csv",
        text_column="biography",
        id_column="id",
    ),
    model_configs=model_configs,
    num_records=10,
))

Service-Specific Considerations

When using Anonymizer as a NeMo Platform service:

| Feature | Difference | Details |
| --- | --- | --- |
| Inference | Routes through the Inference Gateway | Configure providers once and reference them by name from `model_configs`. |
| Input data | Filesets and HTTP(S) URLs (local paths only in local CLI execution) | Use `sdk.files.filesets.create` / `sdk.files.upload`, then reference a file in the fileset by appending `#<path>` to the `fileset://` source. |
| Artifacts | Local or platform-managed | `nemo anonymizer run run` writes to `persistent/results/artifacts` locally; `nemo anonymizer run submit` stores artifacts in NeMo Platform job storage. |
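
The fileset reference form used in the preview example (`fileset://{WORKSPACE}/anonymizer-inputs#anonymizer-input.csv`) can be assembled and checked with a small helper. This is a sketch using only the standard library: `build_fileset_ref` and `split_fileset_ref` are hypothetical helpers, not part of the plugin or SDK, and the reading of the segment after the workspace as the fileset name is an assumption inferred from that example.

```python
from urllib.parse import urlsplit


def build_fileset_ref(workspace: str, fileset: str, path: str) -> str:
    """Assemble a reference shaped like fileset://<workspace>/<fileset>#<path>."""
    return f"fileset://{workspace}/{fileset}#{path}"


def split_fileset_ref(ref: str) -> tuple[str, str, str]:
    """Split a reference back into (workspace, fileset, path)."""
    parts = urlsplit(ref)
    if parts.scheme != "fileset" or not parts.fragment:
        raise ValueError(f"not a fileset reference with a #<path>: {ref!r}")
    # Assumption: the host part is the workspace and the URL path is the fileset.
    return parts.netloc, parts.path.lstrip("/"), parts.fragment


ref = build_fileset_ref("default", "anonymizer-inputs", "anonymizer-input.csv")
print(ref)
print(split_fileset_ref(ref))
```

Validating the reference before submitting a request surfaces a missing `#<path>` early instead of as a service-side error.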

Prerequisites

Before starting these tutorials, complete the Quick Start to:

Install the plugin and verify the nemo anonymizer CLI.
Configure an inference provider used in model_configs.
Create a fileset and upload a CSV containing PII.

Tutorials

Preview a Config

Stream a small anonymized sample to iterate on AnonymizerConfig and model_configs. Covers sdk.anonymizer.preview, nemo anonymizer preview run / preview submit, and the NDJSON frame stream.

beginner anonymizer
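
As background for the NDJSON frame stream mentioned above: NDJSON is one JSON object per line, so a consumer can decode frames as they arrive. The sketch below shows the general technique with the standard library only. The frame keys (`type`, `record`) and the sample lines are hypothetical placeholders, not the service's actual frame schema.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson_frames(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between frames
        yield json.loads(line)


# Hypothetical frames: the real keys are defined by the Anonymizer service.
sample_stream = [
    '{"type": "record", "record": {"id": 1}}',
    "",
    '{"type": "done"}',
]
frames = list(iter_ndjson_frames(sample_stream))
print(len(frames))
```

Because the helper accepts any iterable of strings, the same loop works over an open file or a line-by-line HTTP response body.
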
Run an Anonymizer Job

Run the full pipeline locally with nemo anonymizer run run or submit it to the Jobs worker with nemo anonymizer run submit. Covers loading the dataset.parquet, trace.parquet, and failed_records.json artifacts.

intermediate anonymizer
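
For orientation, loading the three artifacts from a local run might look like the sketch below. It assumes the files sit directly under `persistent/results/artifacts`, which is an assumption about the layout, and that pandas with a parquet engine is installed for the `.parquet` files. `locate_artifacts` and `load_artifacts` are hypothetical helpers, not part of the plugin.

```python
import json
from pathlib import Path

EXPECTED = ("dataset.parquet", "trace.parquet", "failed_records.json")


def locate_artifacts(artifact_dir: str) -> dict:
    """Map each expected artifact name to its path, skipping missing files."""
    root = Path(artifact_dir)
    return {name: root / name for name in EXPECTED if (root / name).is_file()}


def load_artifacts(artifact_dir: str) -> dict:
    """Load parquet files as DataFrames and JSON files as Python objects."""
    loaded = {}
    for name, path in locate_artifacts(artifact_dir).items():
        if path.suffix == ".parquet":
            import pandas as pd  # imported lazily; needs a parquet engine

            loaded[name] = pd.read_parquet(path)
        else:
            loaded[name] = json.loads(path.read_text(encoding="utf-8"))
    return loaded


# Assumed location for a local run; adjust to where the artifacts were written.
artifacts = load_artifacts("persistent/results/artifacts")
print(sorted(artifacts))
```

Missing files are skipped rather than raising, so the same call works whether or not a run produced failed records.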