fragment
Metadata fragment assembly for NER-annotated records.
Provides Metadata and MetadataFragment for aggregating per-field NER
predictions, along with helpers to merge fragments, build entity maps, and
produce API-compatible response dicts.
Classes:
| Name | Description |
|---|---|
| MetadataError | Raised when metadata fragments cannot be merged (e.g., mismatched IDs). |
| Metadata | Merged record metadata aggregated from one or more MetadataFragment objects. |
| MetadataFragment | A single annotation pass over a record (e.g., one NER model's output). |
Functions:
| Name | Description |
|---|---|
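| merge_fragments | Merge one or more MetadataFragment objects into a single Metadata. |
| fragment_for_record | Create a new MetadataFragment timestamped to the current time. |
| predictions_to_dict | Aggregate NER predictions into per-field results and an entity map. |
| fragment_from_ner_predictions | Build a MetadataFragment and entity map from NER predictions. |
| build_ner_metadata | Construct a Metadata object from raw prediction dicts. |
| create_ner_api_response | Build an API-compatible list of {data, model_metadata} dicts. |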
MetadataError
Bases: Exception
Raised when metadata fragments cannot be merged (e.g., mismatched IDs).
Metadata(record_id, fields, entities, received_at)
dataclass
Merged record metadata aggregated from one or more MetadataFragment objects.
The fields dict has the structure:

    field_name -> fragment_name -> metadata_type -> [metadata_items]
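As an illustration of that nesting, the following sketch builds such a dict by hand. The field name, fragment name, metadata type, and item keys are made-up values chosen for the example; only the four-level shape comes from the description above.

```python
# Hypothetical values throughout; only the nesting
# field_name -> fragment_name -> metadata_type -> [metadata_items]
# mirrors the documented structure.
fields: dict = {}


def add_item(field_name: str, fragment_name: str, metadata_type: str, item: dict) -> None:
    """Append one metadata item at fields[field_name][fragment_name][metadata_type]."""
    by_fragment = fields.setdefault(field_name, {})
    by_type = by_fragment.setdefault(fragment_name, {})
    by_type.setdefault(metadata_type, []).append(item)


add_item("conn_str", "ner_pass_1", "ner", {"label": "ip_address", "score": 0.97})
add_item("conn_str", "ner_pass_1", "ner", {"label": "hostname", "score": 0.61})

# Walk the four levels to reach the list of items.
items = fields["conn_str"]["ner_pass_1"]["ner"]
```

Keying by fragment name below the field name keeps the output of each annotation pass separate after merging, so two passes over the same field do not overwrite each other.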
Methods:
| Name | Description |
|---|---|
| as_dict | Serialize to a plain dictionary. |
Attributes:
| Name | Type | Description |
|---|---|---|
| fields | dict | Nested dict of per-field, per-fragment metadata. |
| entities | dict | Entity map produced by … |
| received_at | str | ISO-8601 timestamp of the earliest fragment. |
MetadataFragment(record_id, fragment_ts, fragment_epoch, fragment_name)
dataclass
A single annotation pass over a record (e.g., one NER model's output).
Fragments are later merged via merge_fragments into a single
Metadata object per record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| record_id | str | Unique identifier for the source record. | required |
| fragment_ts | str | ISO-8601 timestamp string. | required |
| fragment_epoch | float | Unix epoch time at which the fragment was created. | required |
| fragment_name | str | Identifier for this annotation pass (e.g., …). | required |
Methods:
| Name | Description |
|---|---|
| add_field_data | Append metadata entries for a field. |
| as_dict | Serialize to a plain dictionary. |
Attributes:
| Name | Type | Description |
|---|---|---|
| fragment_datetime | datetime | Fragment creation time as a datetime object. |
fragment_datetime
property
Fragment creation time as a datetime object.
add_field_data(field_name, metadata_type, field_data)
Append metadata entries for a field.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| field_name | str | Name of the field to annotate. | required |
| metadata_type | str | Category of metadata (e.g., …). | required |
| field_data | dict \| list | A dict (single entry) or list of entries to add. | required |
Raises:
| Type | Description |
|---|---|
| TypeError | If … |
Source code in src/nemo_safe_synthesizer/data_processing/records/fragment.py
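The dict-or-list handling described for add_field_data can be sketched as follows. This is a minimal stand-in, not the library's MetadataFragment: the class name FieldDataSketch, its internal storage layout, and the exact condition that triggers TypeError (here, anything that is neither a dict nor a list) are assumptions made for illustration.

```python
from collections import defaultdict


class FieldDataSketch:
    """Hypothetical stand-in for a fragment; not the library class."""

    def __init__(self) -> None:
        # Assumed layout: field_name -> metadata_type -> list of entries.
        self.fields = defaultdict(lambda: defaultdict(list))

    def add_field_data(self, field_name, metadata_type, field_data):
        """Append one entry (dict) or several entries (list) for a field."""
        if isinstance(field_data, dict):
            entries = [field_data]  # a single entry is wrapped in a list
        elif isinstance(field_data, list):
            entries = field_data
        else:
            # Assumption: other types are rejected with TypeError.
            raise TypeError(
                f"field_data must be a dict or list, got {type(field_data).__name__}"
            )
        self.fields[field_name][metadata_type].extend(entries)


frag = FieldDataSketch()
frag.add_field_data("email", "ner", {"label": "email_address"})
frag.add_field_data("email", "ner", [{"label": "domain"}, {"label": "person"}])
```

Normalizing the single-dict case to a one-element list means the rest of the method only has to deal with lists.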
merge_fragments(*fragments, ts=None)
Merge one or more MetadataFragment objects into a single Metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *fragments | | Fragments to merge. All must share the same record_id. | () |
| ts | str \| None | Override timestamp for … | None |
Returns:
| Type | Description |
|---|---|
| Metadata | A single Metadata object. |
Raises:
| Type | Description |
|---|---|
| MetadataError | If the fragments have different record_id values. |
Source code in src/nemo_safe_synthesizer/data_processing/records/fragment.py
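A rough sketch of the merge step, using plain dicts in place of the real classes, might look like the following. Everything here is an assumption made for illustration: the names MergeErrorSketch and merge_sketch, the dict keys used for fragments, the rule that ts (when given) replaces the earliest fragment's timestamp in received_at, and the handling of an empty argument list.

```python
from typing import Optional


class MergeErrorSketch(Exception):
    """Hypothetical stand-in for MetadataError."""


def merge_sketch(*fragments: dict, ts: Optional[str] = None) -> dict:
    """Merge fragment dicts that share a record_id into one metadata dict."""
    if not fragments:
        # Assumption: merging nothing is treated as an error in this sketch.
        raise MergeErrorSketch("at least one fragment is required")

    record_ids = {frag["record_id"] for frag in fragments}
    if len(record_ids) != 1:
        raise MergeErrorSketch(f"mismatched record IDs: {sorted(record_ids)}")

    # The earliest fragment supplies the default received_at timestamp.
    earliest = min(fragments, key=lambda frag: frag["fragment_epoch"])

    # Build field_name -> fragment_name -> metadata_type -> [items].
    merged_fields: dict = {}
    for frag in fragments:
        for field_name, by_type in frag["fields"].items():
            merged_fields.setdefault(field_name, {})[frag["fragment_name"]] = by_type

    return {
        "record_id": record_ids.pop(),
        "fields": merged_fields,
        "received_at": ts or earliest["fragment_ts"],
    }


late = {
    "record_id": "rec-1", "fragment_name": "pass_b", "fragment_epoch": 200.0,
    "fragment_ts": "1970-01-01T00:03:20+00:00",
    "fields": {"email": {"ner": [{"label": "email_address"}]}},
}
early = {
    "record_id": "rec-1", "fragment_name": "pass_a", "fragment_epoch": 100.0,
    "fragment_ts": "1970-01-01T00:01:40+00:00",
    "fields": {"email": {"ner": [{"label": "person"}]}},
}
merged = merge_sketch(late, early)
```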
fragment_for_record(record_id, fragment_name)
Create a new MetadataFragment timestamped to the current time.
Source code in src/nemo_safe_synthesizer/data_processing/records/fragment.py
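One way such a timestamped fragment could be produced is sketched below. The dataclass TimestampedFragmentSketch and the function fragment_for_record_sketch are hypothetical stand-ins that reuse the four documented constructor parameters; the use of UTC and the derivation of fragment_datetime from fragment_epoch are assumptions, not confirmed library behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TimestampedFragmentSketch:
    """Hypothetical stand-in carrying the four documented constructor fields."""

    record_id: str
    fragment_ts: str       # ISO-8601 timestamp string
    fragment_epoch: float  # Unix epoch seconds
    fragment_name: str

    @property
    def fragment_datetime(self) -> datetime:
        # Assumption: the datetime is rebuilt from the epoch value, in UTC.
        return datetime.fromtimestamp(self.fragment_epoch, tz=timezone.utc)


def fragment_for_record_sketch(record_id: str, fragment_name: str) -> TimestampedFragmentSketch:
    """Create a fragment stamped with the current time (UTC assumed)."""
    now = datetime.now(timezone.utc)
    return TimestampedFragmentSketch(
        record_id=record_id,
        fragment_ts=now.isoformat(),
        fragment_epoch=now.timestamp(),
        fragment_name=fragment_name,
    )


frag = fragment_for_record_sketch("rec-1", "ner_pass_1")
```

Taking both the string and the epoch from a single call to datetime.now keeps the two representations in agreement.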
predictions_to_dict(predictions, *, high_score=Score.HIGH, med_score=Score.MED)
Aggregate NER predictions into per-field results and an entity map.
Groups predictions by field and builds a score-bucketed entity map:

    {
        "score_high": ["ip_address", ...],
        "score_med": [],
        "score_low": [],
        "fields_by_entity": {"ip_address": ["conn_str"]},
    }
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | list[NERPrediction] | List of NER prediction objects. | required |
| high_score | float | Minimum score threshold for the score_high bucket. | Score.HIGH |
| med_score | float | Minimum score threshold for the score_med bucket. | Score.MED |
Returns:
| Type | Description |
|---|---|
| tuple[dict, dict] | A tuple of (predictions_by_field, entity_map). |
Source code in src/nemo_safe_synthesizer/data_processing/records/fragment.py
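The grouping and bucketing described above could be sketched along these lines. The PredictionSketch type and its attribute names (field, label, score), the threshold values 0.8 and 0.5, and the rule that an entity is bucketed by its best score across all fields are all assumptions made for the example; the real NERPrediction type and Score constants are not shown in this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PredictionSketch:
    """Hypothetical stand-in for NERPrediction."""

    field: str
    label: str
    score: float


def predictions_to_dict_sketch(predictions, *, high_score=0.8, med_score=0.5):
    """Group predictions by field and bucket entity labels by score."""
    by_field = defaultdict(list)
    fields_by_entity = defaultdict(list)
    best_score = {}

    for pred in predictions:
        by_field[pred.field].append({"label": pred.label, "score": pred.score})
        if pred.field not in fields_by_entity[pred.label]:
            fields_by_entity[pred.label].append(pred.field)
        # Assumption: an entity is bucketed by its highest score.
        best_score[pred.label] = max(pred.score, best_score.get(pred.label, 0.0))

    entity_map = {
        "score_high": [],
        "score_med": [],
        "score_low": [],
        "fields_by_entity": dict(fields_by_entity),
    }
    for label, score in best_score.items():
        if score >= high_score:
            entity_map["score_high"].append(label)
        elif score >= med_score:
            entity_map["score_med"].append(label)
        else:
            entity_map["score_low"].append(label)

    return dict(by_field), entity_map


preds = [
    PredictionSketch("conn_str", "ip_address", 0.95),
    PredictionSketch("notes", "ip_address", 0.40),
    PredictionSketch("notes", "person", 0.60),
    PredictionSketch("notes", "city", 0.20),
]
by_field, entity_map = predictions_to_dict_sketch(preds)
```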
fragment_from_ner_predictions(fragment_name, predictions, record_id)
Build a MetadataFragment and entity map from NER predictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fragment_name | str | Identifier for this annotation pass (e.g., …). | required |
| predictions | list[NERPrediction] | List of NER predictions to aggregate. | required |
| record_id | str | Unique identifier for the source record. | required |
Returns:
| Type | Description |
|---|---|
| tuple[MetadataFragment, dict] | A tuple of (fragment, entity_map). |
Source code in src/nemo_safe_synthesizer/data_processing/records/fragment.py
build_ner_metadata(preds)
Construct a Metadata object from raw prediction dicts.
Source code in src/nemo_safe_synthesizer/data_processing/records/fragment.py
create_ner_api_response(records, predictions, pure_dict=False)
Build an API-compatible list of {data, model_metadata} dicts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| records | list[dict] | Raw record dictionaries. | required |
| predictions | list[dict] | Per-record NER prediction lists (parallel with records). | required |
| pure_dict | bool | If True, round-trip through JSON to eliminate non-dict types. | False |
Returns:
| Type | Description |
|---|---|
| list[dict] | List of dicts, each containing data and model_metadata. |
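The effect of a JSON round trip like the one pure_dict describes can be seen in a small standalone sketch. The helper name to_pure_dict and the sample payload are made up for illustration; how the library itself performs the round trip is not shown here.

```python
import json
from collections import OrderedDict


def to_pure_dict(obj):
    """Serialize to JSON and back so only plain dict/list/str/number types remain."""
    return json.loads(json.dumps(obj))


# Hypothetical response item containing non-plain container types.
response_item = {
    "data": OrderedDict(name="Ada"),
    "model_metadata": {"entities": ("person",)},
}
pure = to_pure_dict(response_item)
```

After the round trip the OrderedDict becomes a plain dict and the tuple becomes a list. This only works when every value is JSON-serializable; objects such as datetime would need converting to strings first.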