🕵️ Rewriting Legal Documents

Rewriting legal text (TAB dataset) with a domain-specific privacy goal and custom entity labels tailored for legal proceedings.

📚 What you'll learn

- Define domain-specific entity labels for legal text (case numbers, court names, etc.)
- Configure rewrite mode with legal-specific privacy goals
- Preview and run on court decision documents
- Triage flagged records with needs_human_review

Tip: First time running notebooks? Start with setup instructions.

⚙️ Setup

- Check if your NVIDIA_API_KEY from build.nvidia.com is registered for model access.
  - The default build.nvidia.com (NVIDIA Build) setup is a convenient way to try Anonymizer and iterate on previews. Use of NVIDIA Build is subject to NVIDIA Build's own terms of service and privacy practices, which are separate from and independent of the NeMo Framework library. NVIDIA Build is intended for evaluation and testing purposes only and may not be used in production environments. Do not upload any confidential information or personal data when using NVIDIA Build. Your use of NVIDIA Build is logged for security purposes and to improve NVIDIA products and services.
  - Request and token rate limits on build.nvidia.com vary by account and model access, and lower-volume development access can be slow for full-dataset runs. Start with preview() on a small sample, then move to your own endpoint for production data and usage.
- Import Detect (for custom entity labels), Rewrite, and its config classes.
- Anonymizer() initializes with the default model provider -- no extra config needed.
- configure_logging(LoggingConfig.default()) keeps logs at INFO. Switch to LoggingConfig.debug() when troubleshooting.

In [1]:

import getpass
import os

if not os.getenv("NVIDIA_API_KEY"):
    key = getpass.getpass("Enter NVIDIA_API_KEY from build.nvidia.com: ").strip()
    if not key:
        raise RuntimeError("NVIDIA_API_KEY is required to run these notebooks.")
    os.environ["NVIDIA_API_KEY"] = key

In [ ]:

from anonymizer import (
    Anonymizer,
    AnonymizerConfig,
    AnonymizerInput,
    Detect,
    LoggingConfig,
    PrivacyGoal,
    Rewrite,
    configure_logging,
)

configure_logging(LoggingConfig.default())

In [3]:

anonymizer = Anonymizer()

[16:41:39] [INFO] 🔧 Anonymizer initialized with 3 model configs
[16:41:39] [INFO]   |-- 🔎 detector:  gliner-pii-detector
[16:41:39] [INFO]   |-- ✅ validator: gpt-oss-120b
[16:41:39] [INFO]   |-- 🧩 augmenter: gpt-oss-120b

📦 Input data

TAB (Text Anonymization Benchmark) legal documents -- court decisions containing names, dates, case numbers, and other legal identifiers.
LEGAL_ENTITY_LABELS defines the domain-specific entity types to detect. This replaces the default label set with one tailored to legal text.

In [4]:

LEGAL_ENTITY_LABELS = [
    "first_name",
    "last_name",
    "court_name",
    "organization_name",
    "company_name",
    "prison_detention_facility",
    "street_address",
    "city",
    "state",
    "country",
    "date",
    "date_time",
    "time",
    "date_of_birth",
    "age",
    "email",
    "phone_number",
    "ssn",
    "unique_id",
    "legal_role",
    "case_number",
    "application_number",
    "monetary_amount",
    "sentence_duration",
    "nationality",
]

input_data = AnonymizerInput(
    source="https://raw.githubusercontent.com/NVIDIA-NeMo/Anonymizer/refs/heads/main/docs/data/TAB_legal_sample25.csv",
    text_column="text",
    data_summary="Legal court decisions containing personal identifiers, case numbers, and institutional references",
)
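
Before handing a custom label list to Detect, a quick sanity check can catch typos. The sketch below is plain Python: check_labels is a hypothetical helper (not part of Anonymizer), and the lowercase snake_case rule is an assumption based on the labels used in this notebook, not a documented requirement.

```python
import re
from collections import Counter

# Assumption: labels follow the lowercase snake_case style used in this notebook.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def check_labels(labels: list[str]) -> dict[str, list[str]]:
    """Return duplicated labels and labels that are not lowercase snake_case."""
    counts = Counter(labels)
    return {
        "duplicates": sorted(label for label, n in counts.items() if n > 1),
        "bad_format": sorted({label for label in labels if not SNAKE_CASE.fullmatch(label)}),
    }


# Short illustrative list with two deliberate mistakes.
sample_labels = ["case_number", "court_name", "Court Name", "case_number"]
print(check_labels(sample_labels))
```

Running the same check on LEGAL_ENTITY_LABELS should return two empty lists.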

🎛️ Configure

Detect(entity_labels=...) overrides the default entity set with legal-specific labels.
PrivacyGoal tells the rewriter what to protect (identifiers, case numbers, institutional references) and what to preserve (legal reasoning, statutory references, ruling structure).

In [5]:

config = AnonymizerConfig(
    detect=Detect(
        entity_labels=LEGAL_ENTITY_LABELS,
    ),
    rewrite=Rewrite(
        privacy_goal=PrivacyGoal(
            protect="All personal identifiers, case numbers, court names, and institutional references that could identify parties",
            preserve="Legal reasoning, procedural facts, statutory references, and the structure of the ruling",
        ),
        risk_tolerance="minimal",
        max_repair_iterations=3,
    ),
)

👁️ Preview

Preview on a few records to check that legal entities are detected and the rewrite preserves the ruling's structure.

In [6]:

preview = anonymizer.preview(
    config=config,
    data=input_data,
    num_records=3,
)

preview.display_record(0)

[16:41:39] [INFO] 📂 Loaded 25 records from https://raw.githubusercontent.com/NVIDIA-NeMo/Anonymizer/refs/heads/main/docs/data/TAB_legal_sample25.csv (column: 'text')
[16:41:39] [INFO] detection labels in scope: ['age', 'application_number', 'case_number', 'city', 'company_name', 'country', 'court_name', 'date', 'date_of_birth', 'date_time', 'email', 'first_name', 'last_name', 'legal_role', 'monetary_amount', 'nationality', 'organization_name', 'phone_number', 'prison_detention_facility', 'sentence_duration', 'ssn', 'state', 'street_address', 'time', 'unique_id']
[16:41:39] [INFO] 🔍 Running entity detection on 3 records
[16:42:17] [INFO]   |-- 📋 Detection complete — 141 entities found across 3 records (0 failed) [37.8s]
[16:42:17] [INFO]   |-- labels: date=51, court_name=35, legal_role=10, nationality=7, last_name=7, organization_name=6, country=5, first_name=5, city=5, application_number=3, date_of_birth=3, monetary_amount=2, case_number=1, sentence_duration=1
[16:42:17] [INFO] ✏️ Running rewrite pipeline
[16:45:15] [INFO] Evaluate-repair loop iteration 0: 2/3 rows need repair
[16:46:10] [INFO] Evaluate-repair loop iteration 1: 1/3 rows need repair
[16:46:56] [INFO] Evaluate-repair loop: all rows pass at iteration 2
[16:47:13] [INFO]   |-- 📋 Rewrite complete (0 failed) [296.2s]
[16:47:13] [INFO] 🎉 Pipeline complete — 3 records processed, 0 total failures

Anonymizer Rewrite Preview (record 0)

Original

PROCEDURE

The case originated in an application (no. 74463/01| application_number) against the Republic of Turkey| country lodged with the Court under Article 34 of the Convention for the Protection of Human Rights and Fundamental Freedoms (“the Convention”) by a Turkish| nationality national, Ms Feriştah| first_name Bahçeyaka| last_name, on 8 June 2001| date.

The applicant was represented by Mr E. Kuloğlu| last_name, a lawyer| legal_role practising in Aydın| city. The Turkish Government| organization_name (“the Government”) did not designate an Agent for the purposes of the proceedings before the Court.

On 14 June 2005| date the Court decided to communicate the application. Applying Article 29 § 3 of the Convention, it decided to rule on the admissibility and merits of the applications at the same time.

The applicant and the Government each filed observations on the admissibility and the merits.

THE FACTS

The applicant was born in 1958| date and lives in Wesel| city, Germany| country.

On 12 February 1980| date the applicant and her husband established a joint bank account with a German| nationality bank.

On an unspecified date, the applicant’s husband withdrew all of the money from their joint account without the applicant’s consent and placed it into another account in a Turkish| nationality bank.

On 23 October 1992| date_of_birth the applicant filed an action with the Aydın Civil Court of first-instance| court_name to recover half the money that her husband had withdrawn from their joint bank account.

On 14 September 1999| date the Aydın Civil Court of first-instance| court_name dismissed the applicant’s case on the ground that she had failed to substantiate her claims. The court reasoned that the applicant had not furnished any bank document, such as receipts indicating withdrawal of money, capable of supporting her allegations. It also noted that the documents kept by the bank had been destroyed at the end of six years’ retention period and that therefore there was no document available on which to conclude that the applicant was right in her assertions.

On 27 December 1999| date the applicant appealed.

On 5 April 2000| date the Court of Cassation| court_name dismissed the applicant’s request for appeal. It opined that the applicant had failed to prove that her husband had withdrawn all the money from their joint bank account and placed it into another bank account.

On 16 November 2000| date the Court of Cassation| court_name dismissed the applicant’s request for rectification.

On 15 December 2000| date the Court of Cassation| court_name’s decision was served on the applicant.

Rewritten

PROCEDURE

The case originated in an application (no. 86214/02) against the Republic of Turkey lodged with the Court under Article 34 of the Convention for the Protection of Human Rights and Fundamental Freedoms (“the Convention”) by a Turkish national, Ms Isabel García, in June 2001.

The applicant was represented by Mr E. Martínez, a lawyer practising in a Turkish city. The national government (“the Government”) did not designate an Agent for the purposes of the proceedings before the Court.

In June 2005 the Court decided to communicate the application. Applying Article 29 § 3 of the Convention, it decided to rule on the admissibility and merits of the applications at the same time.

The applicant and the Government each filed observations on the admissibility and the merits.

THE FACTS

The applicant was born in the 1950s and lives in a city in Germany.

In February 1980 the applicant and the other account holder established a joint bank account with a German bank.

On an unspecified date, the other account holder withdrew all of the money from their joint account without the applicant’s consent and placed it into another account in a Turkish bank.

In 1992 the applicant filed an action with a civil court of first instance in Turkey to recover half the money that the other account holder had withdrawn from their joint bank account.

In September 1999 the civil court of first instance in Turkey dismissed the applicant’s case on the ground that she had failed to substantiate her claims. The court reasoned that the applicant had not furnished any bank document, such as receipts indicating withdrawal of money, capable of supporting her allegations. It also noted that the documents kept by the bank had been destroyed after the retention period and that therefore there was no document available on which to conclude that the applicant was right in her assertions.

In December 1999 the applicant appealed.

In April 2000 the highest appellate court dismissed the applicant’s request for appeal. It opined that the applicant had failed to prove that the other account holder had withdrawn all the money from their joint bank account and placed it into another bank account.

In November 2000 the highest appellate court dismissed the applicant’s request for rectification.

In December 2000 the highest appellate court’s decision was served on the applicant.

Scores

Utility: 0.86 | Leakage: 0.54 | Weighted Leakage Rate: 0.04 | Needs Review: No | Judge: privacy: 9/10, quality: 9/10, naturalness: 9/10

Entity Disposition

Entity	Label	Sensitivity	Protection
12 February 1980	date	medium	generalize
14 June 2005	date	medium	generalize
14 September 1999	date	medium	generalize
15 December 2000	date	medium	generalize
16 November 2000	date	medium	generalize
1958	date	medium	generalize
23 October 1992	date_of_birth	medium	generalize
27 December 1999	date	medium	generalize
5 April 2000	date	medium	generalize
74463/01	application_number	high	replace
8 June 2001	date	medium	generalize
Aydın	city	low	generalize
Aydın Civil Court of first-instance	court_name	medium	generalize
Bahçeyaka	last_name	high	replace
Court of Cassation	court_name	medium	generalize
Feriştah	first_name	high	replace
German	nationality	low	leave_as_is
Germany	country	low	leave_as_is
Kuloğlu	last_name	high	replace
Republic of Turkey	country	low	leave_as_is
Turkish	nationality	low	leave_as_is
Turkish Government	organization_name	medium	generalize
Wesel	city	low	generalize
lawyer	legal_role	low	leave_as_is
married	marital_status	medium	suppress_inference
victim	financial_abuse_victim	high	suppress_inference
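
The Entity Disposition table can be skimmed faster with a small tally. The sketch below is plain Python over a few rows copied by hand from the table above. It is only an illustration of a review habit, not an Anonymizer API, and the rule "high sensitivity should never be leave_as_is" is an assumption about what a reviewer may want to check.

```python
from collections import Counter

# A few (entity, label, sensitivity, protection) rows copied from the table above.
rows = [
    ("74463/01", "application_number", "high", "replace"),
    ("Bahçeyaka", "last_name", "high", "replace"),
    ("8 June 2001", "date", "medium", "generalize"),
    ("Aydın", "city", "low", "generalize"),
    ("Court of Cassation", "court_name", "medium", "generalize"),
    ("Germany", "country", "low", "leave_as_is"),
    ("lawyer", "legal_role", "low", "leave_as_is"),
    ("married", "marital_status", "medium", "suppress_inference"),
]

# How often each protection method was chosen.
by_protection = Counter(protection for *_, protection in rows)

# Assumed review rule: a high-sensitivity entity left as-is deserves a closer look.
unprotected_high = [
    entity
    for entity, _label, sensitivity, protection in rows
    if sensitivity == "high" and protection == "leave_as_is"
]

print(dict(by_protection))
print(unprotected_high)
```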

In [7]:

preview.display_record(1)

Anonymizer Rewrite Preview (record 1)

Original

PROCEDURE

The case originated in an application (no. 29360/06| application_number) against the Republic of Poland| country lodged with the Court under Article 34 of the Convention for the Protection of Human Rights and Fundamental Freedoms (“the Convention”) by a Polish| nationality national, Ms Teresa| first_name Jerzak| last_name (“the applicant”), on 3 July 2006| date.

The Polish| nationality Government (“the Government”) were represented by their Agent| legal_role, Mr J. Wołąsiewicz| last_name of the Ministry of Foreign Affairs| organization_name.

On 25 September 2007| date the President of the Fourth Section| legal_role decided to give notice of the application to the Government. Applying Article 29 § 3 of the Convention, it was decided to rule on the admissibility and merits of the application at the same time.

THE FACTS

I. THE CIRCUMSTANCES OF THE CASE

The applicant was born in 1940| date_of_birth and lives in Sulejówek| city.

A. Civil proceedings for division of inheritance

On 8 March 1994| date the applicant lodged an application for division of an inheritance with the Warsaw District Court (Sąd Rejonowy)| court_name.

On 7 November 1997| date the Warsaw District Court| court_name stayed the proceedings. It referred to the fact that related criminal proceedings concerning fraudulent acquisition of land, which could affect the outcome of the case, had first to be terminated.

On 9 February 1998| date the applicant asked for the proceedings to be resumed.

On 3 July 1998| date the court refused that request.

On 3 July 1998| date the applicant complained to the Warsaw District Court| court_name about the delays in the proceedings. On 14 July 1998| date the court informed her that the proceedings would be resumed after the criminal proceedings had been terminated.

On 3 August 2000| date the court refused to resume the proceedings. The applicant lodged an interlocutory appeal against that decision. The applicant referred to the fact that the prosecution had discontinued the investigation.

On 15 January 2001| date the applicant asked once more for the proceedings to be resumed, to no avail.

The proceedings were resumed on 14 February 2002| date.

On 30 June 2004| date the Warsaw District Court| court_name ruled that some of the issues raised in the application, concerning the acquisition of property, should be examined in separate proceedings. Consequently, part of the claim, concerning the expropriation of property, was referred to the Warsaw District Court| court_name as a separate case. The applicant appealed. On 26 July 2004| date the Warsaw Regional Court (Sąd Okręgowy)| court_name dismissed the appeal.

On 25 October 2004| date the court stayed the proceedings pending the outcome of the parallel proceedings for the expropriation of property. The applicant appealed.

On 24 October 2006| date the Regional Court| court_name quashed that decision and resumed the examination of the case. It referred to the fact that the District Court| court_name had erroneously referred part of the claim to other proceedings. It relied on the need to examine both cases simultaneously within the scope of the same proceedings.

On 9 May 2007| date the court stayed the proceedings because the parallel proceedings for the expropriation of property were pending before the appellate court.

The case is still pending before the District Court| court_name.

B. Proceedings under the 2004| date Act

On 2 January 2006| date the applicant lodged a complaint with the Warsaw Regional Court| court_name, alleging a breach of her right to a hearing within a reasonable time. She relied on section 2 of the Act of 17 June 2004| date on complaints about a breach of the right to a trial within a reasonable time (Ustawa o skardze na naruszenie prawa strony do rozpoznania sprawy w postępowaniu sądowym bez nieuzasadnionej zwłoki) (“the 2004| date Act”), which entered into force on 17 September 2004| date.

On 9 May 2006| date the Warsaw Regional Court| court_name acknowledged the excessive length of the proceedings before the Warsaw District Court| court_name. It awarded the applicant 200 Polish zlotys| monetary_amount (PLN – approximately 50 euros| monetary_amount (EUR)) by way of just satisfaction. The court referred to the resolution of the Supreme Court| organization_name (Sąd Najwyższy| court_name) of 18 January 2005| date (no. III SPP 113/04| case_number) in which it ruled that while the 2004| date Act produced legal effects as from the date of its date of entry into force, its provisions applied retroactively to all proceedings in which delays had occurred before that date and had not yet been remedied. The Regional Court| court_name held that the overall length of the proceedings before the District Court| court_name had been excessive, there had been long periods of inactivity and the hearings had not been held on a regular basis. These delays had taken place before the date of entry into force the 2004| date Act and had not been remedied afterwards. Referring to the amount of just satisfaction, the court held that having analysed all the circumstances of the case it found this amount to be sufficient for the applicant.

Rewritten

PROCEDURE

The case originated in an application (no. 48712/09) against the Republic of Poland lodged with the Court under Article 34 of the Convention for the Protection of Human Rights and Fundamental Freedoms (“the Convention”) by a Polish national, the applicant, in mid‑2006.

The Polish Government (“the Government”) was represented by its Agent, Mr J. Kovács of the Department of International Relations.

In late 2007 the President of the Fourth Section decided to give notice of the application to the Government. Applying Article 29 § 3 of the Convention, it was decided to rule on the admissibility and merits of the application at the same time.

THE FACTS

I. THE CIRCUMSTANCES OF THE CASE

The applicant was born in the early 1940s and lives in a town near the capital.

A. Civil proceedings for division of inheritance

In early 1994 the applicant lodged an application for division of an inheritance with a first‑instance court.

In late 1997 that court stayed the proceedings, referring to related criminal proceedings concerning fraudulent acquisition of land that had to be terminated first.

In early 1998 the applicant asked for the proceedings to be resumed.

In mid‑1998 the court refused that request.

In mid‑1998 the applicant complained about the delays. The court replied that the proceedings would be resumed after the criminal matters were concluded.

In mid‑2000 the court again refused to resume the case. The applicant lodged an interlocutory appeal, noting that the prosecution had discontinued the investigation.

In early 2001 the applicant asked once more for the proceedings to be resumed, without success.

The proceedings were resumed in early 2002.

In mid‑2000s the first‑instance court ruled that some issues concerning the acquisition of property should be examined separately. Consequently, part of the claim concerning expropriation was referred to a separate case. The applicant appealed. In the same period an appellate court dismissed the appeal.

In the later part of the 2000s the court stayed the proceedings pending the outcome of the parallel expropriation case. The applicant appealed.

In the later part of the 2000s the appellate court quashed that decision and resumed examination, noting that the lower court had erroneously referred part of the claim elsewhere and emphasizing the need to consider both matters together.

In the mid‑2000s the court stayed the proceedings because the parallel expropriation matters were pending before a higher appellate body.

The case remains pending before the first‑instance court.

B. Proceedings under the early‑2000s Act

In early 2006 the applicant lodged a complaint with an appellate court, alleging a breach of the right to a hearing within a reasonable time. The applicant relied on section 2 of the Act of the mid‑2000s on complaints about a breach of the right to a trial within a reasonable time (“the Act”), which entered into force in the mid‑2000s.

In the mid‑2000s the appellate court acknowledged the excessive length of the proceedings before the first‑instance court. It awarded the applicant a modest monetary sum by way of just satisfaction. The court referred to the resolution of the High Court of Justice in early 2005 (no. IV SPP 219/07) in which it held that while the Act produced legal effects from its entry into force, its provisions applied retroactively to all proceedings in which delays had occurred before that date and had not yet been remedied. The appellate court found that the overall length of the earlier proceedings had been excessive, with long periods of inactivity and irregular hearings. These delays had taken place before the Act’s entry into force and had not been remedied afterwards. Considering all circumstances, the court found the modest monetary award to be sufficient.

Scores

Utility: 0.93 | Leakage: 0.48 | Weighted Leakage Rate: 0.02 | Needs Review: No | Judge: privacy: 6/10, quality: 8/10, naturalness: 7/10

Entity Disposition

Entity	Label	Sensitivity	Protection
29360/06	application_number	high	replace
Republic of Poland	country	low	leave_as_is
Polish	nationality	low	leave_as_is
Teresa	first_name	high	replace
Jerzak	last_name	high	replace
3 July 2006	date	medium	generalize
Agent	legal_role	low	leave_as_is
Wołąsiewicz	last_name	high	replace
Ministry of Foreign Affairs	organization_name	high	replace
25 September 2007	date	medium	generalize
President of the Fourth Section	legal_role	low	leave_as_is
1940	date_of_birth	medium	generalize
Sulejówek	city	medium	generalize
8 March 1994	date	medium	generalize
Warsaw District Court (Sąd Rejonowy)	court_name	medium	generalize
7 November 1997	date	medium	generalize
Warsaw District Court	court_name	medium	generalize
9 February 1998	date	medium	generalize
3 July 1998	date	medium	generalize
14 July 1998	date	medium	generalize
3 August 2000	date	medium	generalize
15 January 2001	date	medium	generalize
14 February 2002	date	medium	generalize
30 June 2004	date	medium	generalize
Warsaw Regional Court (Sąd Okręgowy)	court_name	medium	generalize
26 July 2004	date	medium	generalize
25 October 2004	date	medium	generalize
24 October 2006	date	medium	generalize
Regional Court	court_name	medium	generalize
9 May 2007	date	medium	generalize
District Court	court_name	medium	generalize
2004	date	medium	generalize
2 January 2006	date	medium	generalize
Warsaw Regional Court	court_name	medium	generalize
17 June 2004	date	medium	generalize
17 September 2004	date	medium	generalize
9 May 2006	date	medium	generalize
200 Polish zlotys	monetary_amount	medium	generalize
50 euros	monetary_amount	medium	generalize
Supreme Court	organization_name	high	replace
Sąd Najwyższy	court_name	medium	generalize
18 January 2005	date	medium	generalize
III SPP 113/04	case_number	high	replace
66	age	medium	suppress_inference
small town near Warsaw	residence	medium	suppress_inference
female	gender	high	remove
property inheritance dispute	legal_situation	low	leave_as_is
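
When reading a preview by eye, a crude verbatim check can help confirm that replaced or generalized entities no longer appear in the rewrite. The sketch below is plain Python with a hypothetical helper, find_verbatim_leaks. It is not how Anonymizer computes its leakage scores, and it only catches literal string matches, not inferential leaks.

```python
def find_verbatim_leaks(rewritten: str, entities: list[str]) -> list[str]:
    """Return the entity strings that still appear verbatim (case-insensitive)."""
    haystack = rewritten.casefold()
    return [entity for entity in entities if entity.casefold() in haystack]


# Entities from the table above whose protection is replace.
protected = ["29360/06", "Teresa", "Jerzak", "Wołąsiewicz", "Ministry of Foreign Affairs"]

# Shortened excerpts from the rewritten record 1 above.
rewritten_excerpt = (
    "The case originated in an application (no. 48712/09) against the Republic of Poland. "
    "The Polish Government was represented by its Agent, Mr J. Kovács of the "
    "Department of International Relations."
)

print(find_verbatim_leaks(rewritten_excerpt, protected))

# The same check on text that was not rewritten reports the names it finds.
print(find_verbatim_leaks("Ms Teresa Jerzak lodged an application.", protected))
```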

🚀 Full run

result.dataframe has user-facing columns: rewritten text, scores, and the review flag.

In [8]:

result = anonymizer.run(config=config, data=input_data)

result.dataframe.head()

[16:47:13] [INFO] 📂 Loaded 25 records from https://raw.githubusercontent.com/NVIDIA-NeMo/Anonymizer/refs/heads/main/docs/data/TAB_legal_sample25.csv (column: 'text')
[16:47:13] [INFO] detection labels in scope: ['age', 'application_number', 'case_number', 'city', 'company_name', 'country', 'court_name', 'date', 'date_of_birth', 'date_time', 'email', 'first_name', 'last_name', 'legal_role', 'monetary_amount', 'nationality', 'organization_name', 'phone_number', 'prison_detention_facility', 'sentence_duration', 'ssn', 'state', 'street_address', 'time', 'unique_id']
[16:47:13] [INFO] 🔍 Running entity detection on 25 records
[16:51:28] [INFO]   |-- 📋 Detection complete — 1285 entities found across 25 records (0 failed) [254.7s]
[16:51:28] [INFO]   |-- labels: date=418, court_name=241, legal_role=167, last_name=84, organization_name=76, first_name=62, city=47, nationality=46, country=43, application_number=26, date_of_birth=25, prison_detention_facility=17, sentence_duration=13, monetary_amount=10, state=4, unique_id=2, case_number=1, age=1, time=1, company_name=1
[16:51:28] [INFO] ✏️ Running rewrite pipeline
[17:05:12] [INFO] Evaluate-repair loop iteration 0: 16/25 rows need repair
[17:08:02] [INFO] Evaluate-repair loop iteration 1: 9/25 rows need repair
[17:09:34] [INFO] Evaluate-repair loop iteration 2: 7/25 rows need repair
[17:11:39] [INFO]   |-- 📋 Rewrite complete (0 failed) [1211.3s]
[17:11:39] [INFO] 🎉 Pipeline complete — 25 records processed, 0 total failures

Out[8]:

	text	text_rewritten	utility_score	leakage_mass	weighted_leakage_rate	any_high_leaked	needs_human_review
0	PROCEDURE The case originated in an applicati...	PROCEDURE The case originated in an applicati...	0.86	0.9	0.056962	True	True
1	PROCEDURE The case originated in an applicati...	PROCEDURE The case originated in an applicati...	0.841667	0.54	0.020769	False	False
2	PROCEDURE The case originated in an applicati...	PROCEDURE The case originated in an applicati...	0.857143	0.0	0.0	False	False
3	PROCEDURE The case originated in an applicati...	PROCEDURE The case originated in an applicati...	0.982353	0.0	0.0	False	False
4	PROCEDURE The case originated in an applicati...	PROCEDURE The case originated in an applicati...	0.984615	0.57	0.033529	False	False
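
The labels summary in the detection log above also shows which configured labels never fired on this sample. The sketch below is plain Python that parses a hand-copied version of that log line; it does not call Anonymizer. A label with zero hits on 25 records is not necessarily useless, so treat the result as a prompt to look closer rather than a reason to drop the label.

```python
# Labels configured in LEGAL_ENTITY_LABELS above.
configured = {
    "first_name", "last_name", "court_name", "organization_name", "company_name",
    "prison_detention_facility", "street_address", "city", "state", "country",
    "date", "date_time", "time", "date_of_birth", "age", "email", "phone_number",
    "ssn", "unique_id", "legal_role", "case_number", "application_number",
    "monetary_amount", "sentence_duration", "nationality",
}

# The "labels:" summary copied from the full-run detection log above.
log_summary = (
    "date=418, court_name=241, legal_role=167, last_name=84, organization_name=76, "
    "first_name=62, city=47, nationality=46, country=43, application_number=26, "
    "date_of_birth=25, prison_detention_facility=17, sentence_duration=13, "
    "monetary_amount=10, state=4, unique_id=2, case_number=1, age=1, time=1, company_name=1"
)

# Turn "name=count" pairs into a dict of integer counts.
counts = {name: int(n) for name, n in (item.split("=") for item in log_summary.split(", "))}

# Configured labels that do not appear in the summary at all.
never_detected = sorted(configured - set(counts))

print(sum(counts.values()))
print(never_detected)
```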

In [9]:

result.dataframe[["text_rewritten", "utility_score", "leakage_mass", "needs_human_review"]].head()

Out[9]:

	text_rewritten	utility_score	leakage_mass	needs_human_review
0	PROCEDURE The case originated in an applicati...	0.86	0.9	True
1	PROCEDURE The case originated in an applicati...	0.841667	0.54	False
2	PROCEDURE The case originated in an applicati...	0.857143	0.0	False
3	PROCEDURE The case originated in an applicati...	0.982353	0.0	False
4	PROCEDURE The case originated in an applicati...	0.984615	0.57	False

🚩 Filter by review flag

Records where automated metrics exceed thresholds are flagged for manual review.
Use this to prioritize human attention on the records that need it most.
See Working with flagged records for guidance on diagnosing and resolving flagged records.

In [10]:

df = result.dataframe
flagged = df[df["needs_human_review"] == True]  # noqa: E712
print(f"{len(flagged)} of {len(df)} records flagged for human review")
flagged.head()

7 of 25 records flagged for human review

Out[10]:

	text	text_rewritten	utility_score	leakage_mass	weighted_leakage_rate	any_high_leaked	needs_human_review
0	PROCEDURE The case originated in an applicati...	PROCEDURE The case originated in an applicati...	0.86	0.9	0.056962	True	True
6	PROCEDURE The case originated in an applicati...	PROCEDURE The case originated in an applicati...	0.966667	1.6	0.070796	True	True
10	PROCEDURE The case originated in an applicati...	PROCEDURE The case originated in an applicati...	0.873333	1.85	0.064685	True	True
12	PROCEDURE The case originated in an applicati...	PROCEDURE The case originated in an applicati...	0.9	4.3	0.119444	True	True
18	PROCEDURE The case originated in an applicati...	PROCEDURE The case originated in an applicati...	1.0	4.38	0.183264	True	True
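
One way to work through the flagged records is to review the highest weighted_leakage_rate first. The sketch below is plain Python over values copied by hand from the output above, so it runs without the library. Ordering by this column is an assumption about what to prioritize, not a documented recommendation.

```python
# Flagged rows copied from the output above: record index and weighted_leakage_rate.
flagged_rows = [
    {"index": 0, "weighted_leakage_rate": 0.056962},
    {"index": 6, "weighted_leakage_rate": 0.070796},
    {"index": 10, "weighted_leakage_rate": 0.064685},
    {"index": 12, "weighted_leakage_rate": 0.119444},
    {"index": 18, "weighted_leakage_rate": 0.183264},
]

# Highest leakage rate first, so the riskiest rewrites are reviewed before the rest.
review_queue = sorted(
    flagged_rows,
    key=lambda row: row["weighted_leakage_rate"],
    reverse=True,
)

print([row["index"] for row in review_queue])
```

On the DataFrame itself, flagged.sort_values("weighted_leakage_rate", ascending=False) gives the same ordering.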

⏭️ Next steps

🔍 Inspecting Detected Entities -- debug what the detection pipeline found before rewriting.
Try it on your own data! Swap in your CSV, define entity labels for your domain, and set a PrivacyGoal that fits -- you've got all the building blocks.