models
Classes:
| Name | Description |
|---|---|
| Visibility | Visibility identifiers indicating what access is opened for each model |
| ObjectRef | Maps individual model files to keys in memory |
| ModelManifest | Manages remote model files and versions. These configurations are model specific. |
| StorageConfig | Defines where and how ModelManifests are stored. These configurations are likely environment specific. |
| CacheManager | Handles downloading model files from the "opt" package repo. |
Functions:
| Name | Description |
|---|---|
| get_cache_manager | Returns a singleton instance of CacheManager. |
Attributes:
| Name | Type | Description |
|---|---|---|
| DEFAULT_BUCKET | | Default bucket environment variable. If it's not found, use dev. |
| DEFAULT_CACHE_DIR | | StorageConfig default cache directory. |
DEFAULT_BUCKET = os.getenv('NSS_OPT_BUCKET', 'nss-opt-dev-use2')
module-attribute
Default bucket environment variable. If it's not found, use dev.
DEFAULT_CACHE_DIR = os.getenv('NSS_OPT_CACHE_DIR', '.optcache')
module-attribute
StorageConfig default cache directory. Reads the NSS_OPT_CACHE_DIR environment
variable for a cache directory and falls back to .optcache when it is unset. If this
is set to disabled, files won't be cached to disk.
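As a rough illustration of how the two defaults resolve, the sketch below repeats the lookups shown above inside a helper that accepts an environment mapping. The helper name `read_defaults` is hypothetical and not part of this module; the module itself evaluates these constants when it is imported.

```python
import os


def read_defaults(env=None):
    """Hypothetical helper: resolve both defaults from an environment mapping."""
    env = os.environ if env is None else env
    # Same variable names and fallback values as the module attributes above.
    bucket = env.get("NSS_OPT_BUCKET", "nss-opt-dev-use2")
    cache_dir = env.get("NSS_OPT_CACHE_DIR", ".optcache")
    return bucket, cache_dir


# With neither variable set, the documented fallbacks apply.
print(read_defaults({}))
# An explicit value overrides the fallback.
print(read_defaults({"NSS_OPT_BUCKET": "my-bucket"}))
```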
Visibility
Bases: Enum
Visibility identifiers indicating what access is opened for each model
Attributes:
| Name | Type | Description |
|---|---|---|
| PUBLIC | | Packages that are open to the public with a public-read ACL |
| PRIVATE | | Packages available to customers via "paywall" behind an api key |
| INTERNAL | | Only available from internal infrastructure. |
PUBLIC = 'pub'
class-attribute
instance-attribute
Packages that are open to the public with a public-read ACL
PRIVATE = 'priv'
class-attribute
instance-attribute
Packages available to customers via "paywall" behind an api key
INTERNAL = 'int'
class-attribute
instance-attribute
Only available from internal infrastructure.
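For orientation, a minimal stand-in for the enum using the member names and values listed above. It is a sketch for experimenting outside the package, not the module's source.

```python
from enum import Enum


class Visibility(Enum):
    """Stand-in mirroring the documented members and values."""

    PUBLIC = "pub"
    PRIVATE = "priv"
    INTERNAL = "int"


# Look up a member from its stored string value.
vis = Visibility("priv")
print(vis.name, vis.value)
```

Because the values are short strings, a member can be recovered from a stored value with `Visibility(value)`, and an unknown value raises `ValueError`.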
ObjectRef(key, file_name)
dataclass
ModelManifest(model, version, sources, visibility)
dataclass
Manages remote model files and versions. These configurations are model specific.
Models are by default stored in opt, under the convention
/[vis]/models/[pkg]/[version]/[...files]
Attributes:
| Name | Type | Description |
|---|---|---|
| model | str | Model identifier, e.g. spacy, fasttext, entityruler |
| version | str | Model version |
| sources | list[ObjectRef] | A list of pickled objects to load into memory. |
| visibility | Visibility | Package visibility |
| key | str | A unique key that can be used to store or fetch model sources. |
model
instance-attribute
Model identifier, e.g. spacy, fasttext, entityruler
version
instance-attribute
Model version
sources
instance-attribute
A list of pickled objects to load into memory.
visibility
instance-attribute
Package visibility
key
property
A unique key that can be used to store or fetch model sources.
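To make the storage convention `/[vis]/models/[pkg]/[version]/[...files]` concrete, here is a small sketch built on stand-in dataclasses. Several things are assumptions: that `[vis]` is the enum's string value, that `[pkg]` is the `model` field, that the `ObjectRef` fields are strings, and the helper `remote_paths`, which is hypothetical and not part of the module. The version and file name are made up. The real `key` property may be computed differently.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):  # stand-in for the documented enum
    PUBLIC = "pub"
    PRIVATE = "priv"
    INTERNAL = "int"


@dataclass
class ObjectRef:  # stand-in; field types are assumed to be str
    key: str
    file_name: str


@dataclass
class ModelManifest:  # stand-in with the documented fields only
    model: str
    version: str
    sources: list
    visibility: Visibility


def remote_paths(manifest):
    """Hypothetical helper: one path per source, following the documented convention."""
    prefix = f"/{manifest.visibility.value}/models/{manifest.model}/{manifest.version}"
    return [f"{prefix}/{ref.file_name}" for ref in manifest.sources]


manifest = ModelManifest(
    model="spacy",
    version="1.0.0",  # illustrative version string
    sources=[ObjectRef(key="model", file_name="model.pkl")],
    visibility=Visibility.PUBLIC,
)
print(remote_paths(manifest))
```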
StorageConfig(bucket=DEFAULT_BUCKET, cache_dir=None)
dataclass
Defines where and how ModelManifests are stored. These configurations are likely
environment specific.
Methods:
| Name | Description |
|---|---|
| from_system | Return a default StorageConfig based on a system's environment variables. |
Attributes:
| Name | Type | Description |
|---|---|---|
| bucket | Optional[str] | Remote "opt" bucket model files are stored under |
| cache_dir | Optional[Path] | Local file system directory. If this value is None, ModelManifest files won't be cached through to disk. |
bucket = DEFAULT_BUCKET
class-attribute
instance-attribute
Remote "opt" bucket model files are stored under
cache_dir = None
class-attribute
instance-attribute
Local file system directory. If this value is None, ModelManifest files
won't be cached through to disk.
from_system()
classmethod
Return a default StorageConfig based on a system's environment variables.
By convention, it looks for the environment variable NSS_OPT_BUCKET as
the bucket location. The default settings from this function are appropriate
for development without additional configuration.
Source code in src/nemo_safe_synthesizer/pii_replacer/ner/models.py
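A minimal sketch of what an environment-driven factory like `from_system` could look like. The class `StorageConfigSketch` is a hypothetical stand-in. Two behaviors here are assumptions rather than documented facts: that the factory also reads `NSS_OPT_CACHE_DIR`, and that the literal value `disabled` turns disk caching off.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class StorageConfigSketch:
    """Hypothetical stand-in for StorageConfig."""

    bucket: Optional[str] = "nss-opt-dev-use2"
    cache_dir: Optional[Path] = None

    @classmethod
    def from_system(cls, env=None):
        env = os.environ if env is None else env
        bucket = env.get("NSS_OPT_BUCKET", "nss-opt-dev-use2")
        raw_dir = env.get("NSS_OPT_CACHE_DIR", ".optcache")
        # Assumption: the literal string "disabled" switches off disk caching.
        cache_dir = None if raw_dir == "disabled" else Path(raw_dir)
        return cls(bucket=bucket, cache_dir=cache_dir)


# An empty environment yields the development defaults.
print(StorageConfigSketch.from_system({}))
```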
CacheManager(storage_config=None)
Handles downloading model files from the "opt" package repo.
This class will also optionally cache these files to disk. This is useful for environments with local persistent state such as a local development laptop.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storage_config | StorageConfig | A storage config. | None |
Methods:
| Name | Description |
|---|---|
| get_instance | Returns a singleton instance of CacheManager. |
| register_manifest | Registers a manifest in the cache manager. This will not download the file. |
| set_storage_config | Apply a new StorageConfig. |
| download_and_cache_manifest_data | Load each registered manifest into memory |
| resolve | Given a manifest, will return its resolved data. If the manifest hasn't already been registered with the manager, it will be registered automatically. |
| obj_from_fs | Return the source object from the filesystem if it exists. If no file is found, return None. |
Attributes:
| Name | Type | Description |
|---|---|---|
| timings | dict[str, float] | Holds timings for each model manifest. Keyed by ModelManifest.model |
timings = {}
instance-attribute
Holds timings for each model manifest. Keyed by ModelManifest.model
get_instance(storage_config=None)
classmethod
Returns a singleton instance of CacheManager.
register_manifest(manifest)
Registers a manifest in the cache manager. This will not download the file.
set_storage_config(storage_config)
download_and_cache_manifest_data()
resolve(manifest, evict=False, skip_pickle=False)
Given a manifest, will return its resolved data. If the manifest hasn't already been registered with the manager, it will be registered automatically.
The load order is as follows:
1. From in-memory cache
2. From FS cache if enabled via a StorageConfig
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| manifest | ModelManifest | The manifest file to resolve and return | required |
| evict | bool | If | False |
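The documented load order (in-memory cache first, then the filesystem cache when one is configured) can be sketched as a two-tier lookup. `TieredCache` is a hypothetical stand-in, not the real `CacheManager`: it keys entries by file name instead of by manifest, and it leaves out registration, remote download, and the `evict` and `skip_pickle` options.

```python
import pickle
import tempfile
from pathlib import Path


class TieredCache:
    """Hypothetical stand-in illustrating the two documented lookup tiers."""

    def __init__(self, cache_dir=None):
        self.cache_dir = Path(cache_dir) if cache_dir is not None else None
        self._memory = {}

    def resolve(self, key):
        # 1. In-memory cache.
        if key in self._memory:
            return self._memory[key]
        # 2. Filesystem cache, only when a cache directory is configured.
        if self.cache_dir is not None:
            path = self.cache_dir / key
            if path.exists():
                with path.open("rb") as fh:
                    obj = pickle.load(fh)
                self._memory[key] = obj  # promote to the in-memory tier
                return obj
        return None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "spacy.pkl").write_bytes(pickle.dumps({"labels": ["PERSON"]}))
    cache = TieredCache(cache_dir=tmp)
    first = cache.resolve("spacy.pkl")   # read from disk, then kept in memory
    second = cache.resolve("spacy.pkl")  # served from memory
    print(first, first is second)
```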
obj_from_fs(manifest, obj_ref, skip_pickle=False)
Return the source object from the filesystem if it exists. If no file is found,
return None.
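The "return None when the file is missing" contract might look roughly like the sketch below. The helper `load_from_fs` is hypothetical and takes a plain path, unlike the real method, which takes a manifest and an `ObjectRef`. Reading `skip_pickle` as "return the raw bytes without unpickling" is an assumption.

```python
import pickle
import tempfile
from pathlib import Path


def load_from_fs(path, skip_pickle=False):
    """Hypothetical helper: return the cached object, or None if the file is absent."""
    path = Path(path)
    if not path.is_file():
        return None
    data = path.read_bytes()
    # Assumption: skip_pickle hands back the raw bytes instead of unpickling them.
    return data if skip_pickle else pickle.loads(data)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "ruler.pkl"
    missing = load_from_fs(target)                # no file yet
    target.write_bytes(pickle.dumps(["EMAIL", "PHONE"]))
    loaded = load_from_fs(target)                 # unpickled object
    raw = load_from_fs(target, skip_pickle=True)  # raw bytes
    print(missing, loaded, type(raw).__name__)
```

Unpickling can execute arbitrary code, so a loader like this should only read files from a cache directory you trust.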
get_cache_manager(storage_config=None)
Returns a singleton instance of CacheManager.
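One common way to build a class-level singleton with a module-level accessor is sketched below. `CacheManagerSketch` is a hypothetical stand-in. How the real `get_instance` treats a `storage_config` passed on later calls is not documented here, so this sketch simply ignores it once the instance exists.

```python
class CacheManagerSketch:
    """Hypothetical stand-in showing a lazily created shared instance."""

    _instance = None

    def __init__(self, storage_config=None):
        self.storage_config = storage_config

    @classmethod
    def get_instance(cls, storage_config=None):
        # Create the shared instance on first use, then keep returning it.
        if cls._instance is None:
            cls._instance = cls(storage_config)
        return cls._instance


def get_cache_manager(storage_config=None):
    """Module-level convenience wrapper around the classmethod."""
    return CacheManagerSketch.get_instance(storage_config)


a = get_cache_manager()
b = get_cache_manager()
print(a is b)
```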