
Tutorials

These tutorials cover the two user-facing surfaces of the Anonymizer plugin: the streaming preview workflow, for iterating on a small sample, and the run job, for processing full datasets.

Library vs. Service

Anonymizer separates configuration (what to detect and how to replace it) from execution (where the work runs and how models are reached).

Part 1: Build the config (library)

Use anonymizer.config to define the rewrite or replacement strategy and detection options. This code is identical whether you run Anonymizer standalone or through the NeMo Platform service.

from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact

config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)
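To make the effect of format_template concrete, the sketch below mimics how a redaction template might be filled in. It does not call Anonymizer. It assumes the {label} placeholder is substituted with the detected entity label using Python's str.format semantics; render_redaction, redact_spans, and the sample detections are hypothetical names used only for illustration.

```python
# Illustrative stand-in only; this is not Anonymizer's implementation.
# Assumption: {label} is replaced with the detected entity label.
TEMPLATE = "[REDACTED_{label}]"


def render_redaction(template: str, label: str) -> str:
    """Fill the {label} placeholder of a redaction template."""
    return template.format(label=label)


def redact_spans(text: str, spans: list[tuple[int, int, str]], template: str) -> str:
    """Replace each (start, end, label) span in text with its rendered template.

    Spans are applied right to left so that earlier offsets stay valid.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + render_redaction(template, label) + text[end:]
    return text


if __name__ == "__main__":
    sample = "Contact Ada at ada@example.com."
    # Hypothetical detections: (start offset, end offset, entity label).
    detections = [(8, 11, "NAME"), (15, 30, "EMAIL")]
    print(redact_spans(sample, detections, TEMPLATE))
    # prints: Contact [REDACTED_NAME] at [REDACTED_EMAIL].
```

Keeping the label in the replacement token preserves what kind of entity was removed, which can make redacted text easier to review than a single generic marker.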

Part 2: Execute (platform)

Submit the config to the Anonymizer service. The plugin defines the request types (PreviewRequest, AnonymizerRequest), which carry the config together with the input source and model routing:

import os
from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact
from data_designer.config import ModelConfig
from nemo_anonymizer_plugin.app.input import AnonymizerInputSpec
from nemo_anonymizer_plugin.app.task_config import PreviewRequest
from nemo_platform import NeMoPlatform

# Workspace and inference provider names; override them with environment variables.
WORKSPACE = os.environ.get("NMP_WORKSPACE", "default")
MODEL_PROVIDER = os.environ.get("NMP_ANON_PROVIDER", "nvidia-build")

# Same config as in Part 1.
config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)

# Each entry maps an alias to a model served by the configured provider.
model_configs = [
    ModelConfig(alias="gliner-pii-detector", provider=MODEL_PROVIDER, model="nvidia/gliner-pii"),
    ModelConfig(alias="gpt-oss-120b", provider=MODEL_PROVIDER, model="openai/gpt-oss-120b"),
    ModelConfig(alias="nemotron-30b-thinking", provider=MODEL_PROVIDER, model="nvidia/nemotron-3-nano-30b-a3b"),
]

# Connect to the NeMo Platform API and get the Anonymizer client.
sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=WORKSPACE,
)
anonymizer = sdk.anonymizer

# Request a 10-record preview of a CSV stored in a fileset.
preview = anonymizer.preview(PreviewRequest(
    config=config,
    data=AnonymizerInputSpec(
        source=f"fileset://{WORKSPACE}/anonymizer-inputs#anonymizer-input.csv",
        text_column="biography",
        id_column="id",
    ),
    model_configs=model_configs,
    num_records=10,
))

Service-Specific Considerations

When using Anonymizer as a NeMo Platform service:

| Feature | Difference | Details |
|---|---|---|
| Inference | Routes through the Inference Gateway | Configure providers once and reference them by name from model_configs. |
| Input data | Filesets and HTTP(S) URLs (local paths only in local CLI execution) | Use sdk.files.filesets.create / sdk.files.upload, then reference with #<path>. |
| Artifacts | Local or platform-managed | nemo anonymizer run run writes to persistent/results/artifacts locally; nemo anonymizer run submit stores artifacts in NeMo Platform job storage. |
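The fileset reference used as the input source can be assembled and checked with a small helper. The sketch below assumes the layout fileset://<workspace>/<fileset>#<path>, which is inferred from the single example on this page and is not a documented grammar; FilesetRef, build_fileset_uri, and parse_fileset_uri are hypothetical helpers that are not part of the SDK.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class FilesetRef(NamedTuple):
    """Parts of a fileset reference (layout assumed from the example URI)."""

    workspace: str
    fileset: str
    path: str


def build_fileset_uri(workspace: str, fileset: str, path: str) -> str:
    """Assemble fileset://<workspace>/<fileset>#<path>."""
    return f"fileset://{workspace}/{fileset}#{path}"


def parse_fileset_uri(uri: str) -> FilesetRef:
    """Split a fileset URI into workspace, fileset name, and file path."""
    parts = urlsplit(uri)
    if parts.scheme != "fileset":
        raise ValueError(f"not a fileset URI: {uri!r}")
    if not parts.fragment:
        raise ValueError(f"missing '#<path>' in {uri!r}")
    # The workspace sits in the authority position, the fileset name in the
    # path, and the file inside the fileset in the fragment after '#'.
    return FilesetRef(parts.netloc, parts.path.lstrip("/"), parts.fragment)


if __name__ == "__main__":
    uri = build_fileset_uri("default", "anonymizer-inputs", "anonymizer-input.csv")
    print(uri)
    print(parse_fileset_uri(uri))
```

Validating the reference before submitting a request surfaces a missing #<path> early, instead of after the request reaches the service.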

Prerequisites

Before starting these tutorials, complete the Quick Start to:

  • Install the plugin and verify the nemo anonymizer CLI.
  • Configure an inference provider used in model_configs.
  • Create a fileset and upload a CSV containing PII.

Tutorials

  • Preview a Config


    Stream a small anonymized sample to iterate on AnonymizerConfig and model_configs. Covers sdk.anonymizer.preview, nemo anonymizer preview run / preview submit, and the NDJSON frame stream.

    beginner anonymizer

  • Run an Anonymizer Job


    Run the full pipeline locally with nemo anonymizer run run or submit it to the Jobs worker with nemo anonymizer run submit. Load dataset.parquet, trace.parquet, and failed_records.json artifacts.

    intermediate anonymizer
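The preview tutorial works with an NDJSON frame stream, in which each non-empty line is one standalone JSON value. As a minimal sketch of consuming such a stream with the standard library only, the helper below decodes it line by line. The frame keys in the sample ("type", "index") are made up for illustration and are not the plugin's actual frame schema; iter_ndjson is a hypothetical helper.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one decoded JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            # Blank lines carry no frame; skip them.
            continue
        yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical frames; the real stream's fields may differ.
    stream = '{"type": "record", "index": 0}\n\n{"type": "done"}\n'
    for frame in iter_ndjson(stream.splitlines()):
        print(frame)
```

Because each line is decoded independently, frames can be handled as they arrive rather than after the whole response has been buffered.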