Quick Start

This guide walks through previewing and running an Anonymizer job on NeMo Platform.

Prerequisites

  • Access to a NeMo Platform deployment with the anonymizer plugin service enabled.
  • An API key for a model provider used by the Anonymizer pipeline.

Step 1: Install the Plugin

Follow the Setup guide to install NeMo Platform and complete nemo setup. From a repo checkout, run uv sync at the repo root. The root workspace already includes the Anonymizer plugin, so no separate editable plugin install is needed. When you start the platform with nemo services run, it picks up the plugin automatically and mounts /apis/anonymizer/... on the gateway.

Verify the CLI is registered:

nemo anonymizer --help

You should see the validate, preview, and run command groups.

Step 2: Initialize the SDK

import os
from nemo_platform import NeMoPlatform

base_url = os.environ.get("NMP_BASE_URL", "http://localhost:8080")
WORKSPACE = os.environ.get("NMP_WORKSPACE", "default")
sdk = NeMoPlatform(base_url=base_url, workspace=WORKSPACE)
anonymizer = sdk.anonymizer

Step 3: Configure Inference

Anonymizer routes inference through the Inference Gateway service. You need a model provider configured before running anything that uses model_configs.

nemo setup walks you through creating a provider secret and registering an Inference Gateway provider as part of the install flow. If you skipped that step or want to add another provider, re-run nemo setup — see the Setup guide for details.

Note

The platform pre-configures a system/nvidia-build model provider during startup. This provider routes inference requests to models hosted on build.nvidia.com using the API base URL https://integrate.api.nvidia.com and the NGC API key with Public API Endpoints permissions provided during deployment (automatically saved as the built-in system/ngc-api-key secret).

You can verify this provider exists by running nemo inference providers list --workspace system.

The tutorials in these docs use this provider for inference, but you can create your own provider and use it instead.

Step 4: Upload an Input Fileset

Create a small CSV containing PII and upload it to a fileset:

import os
import tempfile
from pathlib import Path

from nemo_platform._exceptions import ConflictError

WORKSPACE = os.environ.get("NMP_WORKSPACE", "default")
FILESET = "anonymizer-inputs"
INPUT_FILENAME = "anonymizer-input.csv"

with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write(
        "id,biography\n"
        "1,Alice Johnson lives in Seattle and works at NVIDIA.\n"
        "2,Bob Smith can be reached at bob.smith@example.com.\n"
    )
    input_path = Path(f.name)

try:
    sdk.files.filesets.create(
        name=FILESET,
        workspace=WORKSPACE,
        description="Anonymizer input files",
    )
except ConflictError:
    pass  # already exists

sdk.files.upload(
    local_path=str(input_path),
    fileset=FILESET,
    workspace=WORKSPACE,
    remote_path=INPUT_FILENAME,
)

The plugin accepts three input source forms:

  • Local path (local execution only): /tmp/anonymizer-input.csv
  • HTTP(S) URL: https://.../input.csv
  • Fileset reference: anonymizer-inputs#anonymizer-input.csv, default/anonymizer-inputs#anonymizer-input.csv, or fileset://default/anonymizer-inputs#anonymizer-input.csv
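The three fileset reference spellings above share one shape: an optional fileset:// scheme, an optional workspace, the fileset name, and a #<path> fragment. The following is a minimal sketch of splitting such a string into its parts. FilesetRef and parse_fileset_ref are hypothetical names used for illustration, not part of the plugin, and the plugin's own parsing rules may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilesetRef:
    """Hypothetical container for the parts of a fileset reference."""

    workspace: Optional[str]  # None when the reference omits the workspace
    fileset: str
    path: str


def parse_fileset_ref(source: str) -> FilesetRef:
    """Split '[fileset://][workspace/]fileset#path' into its parts."""
    body = source.removeprefix("fileset://")
    location, sep, path = body.partition("#")
    if not sep or not path:
        raise ValueError(f"missing '#<path>' fragment: {source!r}")
    # Everything before the last '/' is treated as the workspace.
    workspace, _, fileset = location.rpartition("/")
    if not fileset:
        raise ValueError(f"missing fileset name: {source!r}")
    return FilesetRef(workspace or None, fileset, path)


for ref in (
    "anonymizer-inputs#anonymizer-input.csv",
    "default/anonymizer-inputs#anonymizer-input.csv",
    "fileset://default/anonymizer-inputs#anonymizer-input.csv",
):
    print(parse_fileset_ref(ref))
```

The sketch leaves workspace as None when the reference omits it rather than guessing which workspace the platform would resolve it to.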

Step 5: Preview Anonymization

Preview streams a small anonymized sample so you can iterate on the config without running a full job. Build a PreviewRequest and call anonymizer.preview:

import os
from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact
from data_designer.config import ModelConfig
from nemo_anonymizer_plugin.app.input import AnonymizerInputSpec
from nemo_anonymizer_plugin.app.task_config import PreviewRequest

MODEL_PROVIDER = os.environ.get("NMP_ANON_PROVIDER", "nvidia-build")

config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)

model_configs = [
    ModelConfig(alias="gliner-pii-detector", provider=MODEL_PROVIDER, model="nvidia/gliner-pii"),
    ModelConfig(alias="gpt-oss-120b", provider=MODEL_PROVIDER, model="openai/gpt-oss-120b"),
    ModelConfig(alias="nemotron-30b-thinking", provider=MODEL_PROVIDER, model="nvidia/nemotron-3-nano-30b-a3b"),
]

request = PreviewRequest(
    config=config,
    data=AnonymizerInputSpec(
        source=f"fileset://{WORKSPACE}/{FILESET}#{INPUT_FILENAME}",
        text_column="biography",
        id_column="id",
    ),
    model_configs=model_configs,
    num_records=2,
)

preview = anonymizer.preview(request)

preview.dataset                   # pandas DataFrame of anonymized records
preview.trace_dataset             # detection trace
preview.failed_records            # list of per-record failures (usually empty)
preview.display_record(0)         # render a record with entity highlights

preview.dataset is a regular pandas DataFrame, so you can persist it with to_csv or to_parquet.
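To make the format_template in the config above concrete, here is a small standalone sketch of how a template such as "[REDACTED_{label}]" could be filled in for each detected entity. It assumes the {label} placeholder is substituted with the entity label, as its name suggests. The redact helper, the span offsets, and the PERSON and CITY labels are hypothetical and do not reflect the library's actual implementation or label set.

```python
def redact(text: str, entities: list[tuple[int, int, str]], template: str) -> str:
    """Replace each (start, end, label) span with the filled-in template."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(entities, reverse=True):
        text = text[:start] + template.format(label=label) + text[end:]
    return text


def span(text: str, needle: str, label: str) -> tuple[int, int, str]:
    """Build a (start, end, label) span for the first occurrence of needle."""
    start = text.index(needle)
    return (start, start + len(needle), label)


text = "Alice Johnson lives in Seattle."
# Hypothetical detections; a real run gets these from the detection models.
entities = [span(text, "Alice Johnson", "PERSON"), span(text, "Seattle", "CITY")]

print(redact(text, entities, "[REDACTED_{label}]"))
# [REDACTED_PERSON] lives in [REDACTED_CITY].
```

The sketch does not handle overlapping spans.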

Run preview from the CLI instead

The same flow is available from the CLI. Write the spec to YAML:

import yaml
from pathlib import Path

preview_spec_path = Path("/tmp/anonymizer-preview.yaml")
preview_spec_path.write_text(yaml.safe_dump(request.model_dump(mode="json", exclude_none=True)))

Then run either of the following commands:

nemo anonymizer preview run \
  --spec-file /tmp/anonymizer-preview.yaml \
  --workspace "${NMP_WORKSPACE:-default}"

nemo anonymizer preview submit \
  --spec-file /tmp/anonymizer-preview.yaml \
  --workspace "${NMP_WORKSPACE:-default}" \
  --base-url "${NMP_BASE_URL:-http://localhost:8080}"

The CLI streams newline-delimited JSON frames (preview_dataset, trace_dataset, failed_records, ...) to stdout. See the preview tutorial for the frame schema and jq recipes.
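Because the output is newline-delimited JSON, each line can be parsed on its own as it arrives. The sketch below is a generic reader that makes no assumption about the frame schema, which is documented in the preview tutorial. iter_frames is a hypothetical helper name.

```python
import json
from typing import Any, Iterable, Iterator


def iter_frames(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between frames
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
```

Passing sys.stdin as lines lets a script using this helper sit at the end of a pipe from nemo anonymizer preview run, handling frames one at a time instead of buffering the whole stream.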

Note

anonymizer.preview calls the plugin service, so it rejects local file paths in data.source and requires model_configs. The fileset reference and model_configs in the example above satisfy both constraints.

Step 6: Run a Full Job

When the preview looks correct, run the full pipeline. The anonymizer.run job can execute either locally in the CLI process (run run) or on the NeMo Platform Jobs worker (run submit / sdk.anonymizer.run()).

Build an AnonymizerRequest:

from nemo_anonymizer_plugin.app.task_config import AnonymizerRequest

run_request = AnonymizerRequest(
    config=config,
    data=AnonymizerInputSpec(
        source=f"fileset://{WORKSPACE}/{FILESET}#{INPUT_FILENAME}",
        text_column="biography",
        id_column="id",
    ),
    model_configs=model_configs,
)

Option A — submit to the Jobs worker:

job = sdk.anonymizer.run(run_request, wait_until_done=True)
results = job.download_artifacts()

dataset = results.load_dataset()
print(dataset.head())
print(f"records={len(dataset)} failures={len(results.load_failed_records())}")

sdk.anonymizer.run() returns an AnonymizerJobResource. wait_until_done=True blocks until the job reaches a terminal state; download_artifacts() fetches the job artifacts and returns an AnonymizerJobResults for in-memory access. See SDK Resources for the full surface.

The CLI equivalent submits the same spec. First write it to YAML:

import yaml
from pathlib import Path

run_spec_path = Path("/tmp/anonymizer-run.yaml")
run_spec_path.write_text(yaml.safe_dump(run_request.model_dump(mode="json", exclude_none=True)))

Then submit it:

nemo anonymizer run submit \
  --spec-file /tmp/anonymizer-run.yaml \
  --workspace "${NMP_WORKSPACE:-default}" \
  --base-url "${NMP_BASE_URL:-http://localhost:8080}"

Track the submitted job with nemo jobs get-status <job-name> --workspace "${NMP_WORKSPACE:-default}" and nemo jobs get-logs <job-name> --workspace "${NMP_WORKSPACE:-default}".

Option B — run locally in the CLI process:

import yaml
from pathlib import Path

spec_path = Path("/tmp/anonymizer-run.yaml")
spec_path.write_text(yaml.safe_dump(run_request.model_dump(mode="json", exclude_none=True)))

Then pass the spec file to run run:

nemo anonymizer run run --spec-file /tmp/anonymizer-run.yaml

The CLI prints {"exit_code": 0} on success and logs the artifact directory (file://.../persistent/results/artifacts) to stderr. The directory contains:

  • dataset.parquet: anonymized output.
  • trace.parquet: detection trace.
  • metadata.json: run metadata.
  • failed_records.json: per-record failures, only when there were failures.
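Before loading results, it can help to confirm that the expected files are present. The following sketch checks a local artifact directory against the file names listed above. check_artifacts is a hypothetical helper, not something the CLI or SDK provides.

```python
from pathlib import Path

# File names as listed above; failed_records.json appears only on failures.
REQUIRED_ARTIFACTS = ("dataset.parquet", "trace.parquet", "metadata.json")
OPTIONAL_ARTIFACTS = ("failed_records.json",)


def check_artifacts(artifacts_dir: Path) -> dict[str, bool]:
    """Report which artifact files exist; raise if a required one is missing."""
    names = REQUIRED_ARTIFACTS + OPTIONAL_ARTIFACTS
    present = {name: (artifacts_dir / name).is_file() for name in names}
    missing = [name for name in REQUIRED_ARTIFACTS if not present[name]]
    if missing:
        raise FileNotFoundError(f"missing in {artifacts_dir}: {', '.join(missing)}")
    return present
```

A True value for failed_records.json in the returned mapping indicates that at least one record failed and is worth inspecting.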

Differences between run run and run submit

run submit rejects local file paths in data.source (use a fileset reference or http(s) URL) and requires explicit model_configs referencing Inference Gateway providers. run run accepts local paths and can run without model_configs when the library defaults suffice.
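The two run submit constraints can be checked before submitting. The sketch below applies them to a spec dictionary shaped like the output of run_request.model_dump(mode="json"). remote_spec_problems is a hypothetical helper, and its local-path heuristic (anything that is neither an http(s) URL nor a fileset reference) is an assumption, not the plugin's actual validation logic.

```python
def remote_spec_problems(spec: dict) -> list[str]:
    """Return human-readable reasons a spec would not suit run submit."""
    problems = []
    source = str(spec.get("data", {}).get("source", ""))
    is_url = source.startswith(("http://", "https://"))
    # Assumption: fileset references either use fileset:// or carry a '#'.
    is_fileset = source.startswith("fileset://") or "#" in source
    if not (is_url or is_fileset):
        problems.append(f"data.source looks like a local path: {source!r}")
    if not spec.get("model_configs"):
        problems.append("model_configs is missing or empty")
    return problems


local_spec = {"data": {"source": "/tmp/anonymizer-input.csv"}}
print(remote_spec_problems(local_spec))
```

A spec that passes this check can still be rejected by the service, for example when the referenced provider does not exist.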

Step 7: Inspect Artifacts

For Option A (run submit), the AnonymizerJobResults returned by download_artifacts() already loads the parquet files lazily: results.load_dataset(), results.load_trace(), and results.load_failed_records() return pandas DataFrames or lists, as appropriate.

For Option B (run run), load the parquet files directly from the local artifact directory:

from pathlib import Path

import pandas as pd

ARTIFACTS_DIR = Path("/path/to/persistent/results/artifacts")  # from the stderr log

dataset = pd.read_parquet(ARTIFACTS_DIR / "dataset.parquet", dtype_backend="pyarrow")
trace   = pd.read_parquet(ARTIFACTS_DIR / "trace.parquet",   dtype_backend="pyarrow")

print(dataset.head())

The trace dataset (and the dataset itself for annotate / substitute strategies) contains pyarrow-backed struct<entities: list<...>> columns. Use pyarrow.parquet.read_table(...).to_pylist() if you need plain Python dict/list values for JSON output.

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| nemo anonymizer preview submit returns 404 | The anonymizer plugin service isn't mounted on the gateway | Confirm uv sync ran successfully at the repo root and re-run nemo services run so the plugin is discovered. See Step 1. |
| model_configs are required for remote execution | anonymizer.preview / preview submit requires explicit model_configs | Add model_configs referencing an Inference Gateway provider. |
| Input source ... is a local path | Plugin-service execution rejects local paths | Use an http(s) URL or a fileset reference. |
| Fileset input ... must resolve to a .csv or .parquet file | Fileset path is a directory or wrong extension | Point the #<path> fragment at a single .csv or .parquet file. |
| provider not found | Inference provider missing | Inspect or create the provider using the inference/model-provider docs, then reference it in model_configs. |

Next Steps

  • Tutorials: Walk through preview and run flows in detail in the tutorials.
  • SDK reference: See SDK Resources for the anonymizer accessor, preview result, and job result types.
  • CLI reference: See CLI Reference for spec-file fields and command flags.
  • Library docs: Detection, replacement strategy parameters, and rewrite mode are documented in the open-source library.