utils
GPU memory management, quantization, device mapping, and tokenizer helpers for LLM loading.
Optional LLM dependencies are imported inside the helpers that need them so
lightweight utilities such as trust_remote_code_for_model remain usable
without installing the full training or inference stack.
Classes:
| Name | Description |
|---|---|
| ModelRef | Resolved model reference for local cache and trust policy decisions. |
Functions:
| Name | Description |
|---|---|
| trust_remote_code_for_model | Determine whether to trust remote code when loading a model. |
| cleanup_memory | Run garbage collection and empty the CUDA cache. |
| gpu_stats | Log current GPU memory reservation and total capacity. |
| get_max_vram | Calculate maximum memory allocation for each available GPU. |
| add_bos_eos_tokens_to_tokenizer | Enable BOS/EOS token injection and set a pad token if missing. |
| get_param_from_config | Read a single attribute from a HuggingFace AutoConfig. |
| get_device_map | Infer the device map for a model and optionally pin all layers to one device. |
| count_trainable_params | Count trainable and total parameters in a PEFT model. |
| get_quantization_config | Build a BitsAndBytesConfig for 4-bit or 8-bit quantization. |
| get_device_name | Get the name of the current device (first index). Returns 'undefined' if the device is not available. |
ModelRef(original, repo_id=None, revision='main', local_path=None, cache_root=None)
dataclass
Resolved model reference for local cache and trust policy decisions.
Intended public API:
- parse() normalizes a user-supplied model string or path without
contacting Hugging Face.
- target() returns the value that should be passed to
from_pretrained-style loaders: a local snapshot path when available,
otherwise the original model reference.
- trust_remote_code reports whether the reference belongs to a trusted
organization after accounting for resolved local HF cache paths.
- partial_cached_snapshot() returns HF's local snapshot path for the
repo/revision, even when the snapshot is incomplete.
- missing_required_components() reports whether a local model directory
has the components this project expects before an offline load.
- missing_remote_code_components() reports trusted remote-code files
referenced by Transformers auto_map metadata but absent locally.
Deliberate Hugging Face coupling: repo-id validation, cache-root resolution, cache scanning, snapshot layout, artifact names, tokenizer filenames, and sharded weight index parsing mirror current Hugging Face Hub and Transformers behavior. This is intentional so NSS decisions match the libraries that load the model. If model loading or cache preflight behavior changes after an upstream HF release, inspect this class first.
Internal helpers are not a generic model-layout abstraction. They should stay close to HF's implementation rather than grow compatibility shims for unrelated storage formats.
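The target() contract described above can be illustrated with a small stand-in. This is a sketch only: ToyModelRef is a hypothetical class that borrows two field names from the constructor signature shown in this section, and its body is an assumption that does not reproduce the project's cache resolution.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ToyModelRef:
    """Hypothetical stand-in for ModelRef; not the project's implementation."""

    original: str
    local_path: Path | None = None

    def target(self) -> str:
        # Prefer a resolved local snapshot, otherwise fall back to the input.
        if self.local_path is not None:
            return str(self.local_path)
        return self.original


# Placeholder identifiers and paths, chosen only for illustration.
remote_only = ToyModelRef("some-org/some-model")
cached = ToyModelRef("some-org/some-model", local_path=Path("/tmp/snapshots/abc123"))
```

In this sketch, remote_only.target() hands the original string to the loader, while cached.target() hands over the local path.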
Methods:
| Name | Description |
|---|---|
| parse | Parse a model identifier or path without contacting Hugging Face. |
| missing_required_components | Return local model components missing from model_dir. |
| missing_remote_code_components | Return trusted remote-code components referenced by config but absent locally. |
| partial_cached_snapshot | Return the local HF snapshot for this repo/revision, even if it is partial. |
| is_trusted_org | Return whether an organization is allowed to load remote code. |
| target | Return the local snapshot path when available, otherwise the original input. |
Attributes:
| Name | Type | Description |
|---|---|---|
| trust_remote_code | bool | Whether loaders should pass trust_remote_code=True for this model. |
trust_remote_code
property
Whether loaders should pass trust_remote_code=True for this model.
parse(model_name, *, revision='main', cache_root=None)
classmethod
Parse a model identifier or path without contacting Hugging Face.
This is safe to call in preflight and loader setup because it uses Hugging Face's local cache APIs only. Cached-model hits may still cost a few milliseconds because HF cache scanning walks cache metadata to confirm model artifacts exist.
Source code in src/nemo_safe_synthesizer/llm/utils.py
missing_required_components(model_dir)
classmethod
Return local model components missing from model_dir.
Source code in src/nemo_safe_synthesizer/llm/utils.py
missing_remote_code_components(model_dir)
classmethod
Return trusted remote-code components referenced by config but absent locally.
Source code in src/nemo_safe_synthesizer/llm/utils.py
partial_cached_snapshot()
Return the local HF snapshot for this repo/revision, even if it is partial.
Source code in src/nemo_safe_synthesizer/llm/utils.py
is_trusted_org(org)
classmethod
Return whether an organization is allowed to load remote code.
target()
Return the local snapshot path when available, otherwise the original input.
trust_remote_code_for_model(model_name, *, cache_root=None)
Determine whether to trust remote code when loading a model.
Returns True for model identifiers owned by trusted organizations,
including configured Hugging Face cache snapshots for those organizations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str \| Path | HuggingFace model identifier or local path. | required |
| cache_root | str \| Path \| None | Hugging Face Hub cache root. Defaults to the configured hub cache. | None |
Returns:
| Type | Description |
|---|---|
| bool | Whether to set |
Source code in src/nemo_safe_synthesizer/llm/utils.py
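The organization check can be pictured with a toy allowlist. This is a minimal sketch under stated assumptions: TRUSTED_ORGS, is_trusted_org_sketch, and trust_remote_code_sketch are hypothetical names, the allowlist contents are placeholders because this section does not list the trusted organizations, and the cache-snapshot handling described above is deliberately left out.

```python
# Hypothetical allowlist; the real trusted organizations are not listed here.
TRUSTED_ORGS = frozenset({"example-org"})


def is_trusted_org_sketch(org: str) -> bool:
    """Return True when the organization is on the placeholder allowlist."""
    return org in TRUSTED_ORGS


def trust_remote_code_sketch(model_name: str) -> bool:
    """Return True only for 'org/name' identifiers whose org is allowlisted."""
    parts = model_name.split("/")
    if len(parts) != 2 or not all(parts):
        # Bare names and filesystem-like paths are not trusted in this sketch.
        return False
    return is_trusted_org_sketch(parts[0])
```

Defaulting to False for anything that does not look like an org/name identifier keeps the sketch conservative; the documented function additionally recognizes Hugging Face cache snapshots for trusted organizations.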
cleanup_memory()
Run garbage collection and empty the CUDA cache.
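A helper of this kind might look like the following sketch, which also shows the lazy-import pattern mentioned in the module overview. The function name cleanup_memory_sketch and its return value are assumptions for illustration, not the project's code.

```python
import gc


def cleanup_memory_sketch() -> int:
    """Collect garbage, then release cached CUDA blocks if torch is usable."""
    collected = gc.collect()
    try:
        import torch  # optional dependency, imported only when needed
    except ImportError:
        return collected
    if torch.cuda.is_available():
        # Returns cached, unused blocks to the driver; live tensors are untouched.
        torch.cuda.empty_cache()
    return collected
```

Running gc.collect() first matters because the CUDA cache can only release blocks whose tensors have already been freed.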
gpu_stats()
Log current GPU memory reservation and total capacity.
Queries CUDA device 0 and logs the peak reserved memory and total available memory in GiB.
Source code in src/nemo_safe_synthesizer/llm/utils.py
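The query described above could be sketched as follows. The names gpu_stats_sketch and to_gib, the log format, and the return value are assumptions; only torch.cuda.max_memory_reserved and torch.cuda.get_device_properties are real PyTorch calls.

```python
import logging

logger = logging.getLogger(__name__)
GIB = 1024 ** 3


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB, rounded to three decimals."""
    return round(num_bytes / GIB, 3)


def gpu_stats_sketch():
    """Log peak reserved and total memory for CUDA device 0, if one is usable."""
    try:
        import torch  # optional dependency, imported lazily
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    reserved = to_gib(torch.cuda.max_memory_reserved(0))
    total = to_gib(torch.cuda.get_device_properties(0).total_memory)
    logger.info("GPU 0: %.3f GiB reserved (peak) of %.3f GiB total", reserved, total)
    return reserved, total
```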
get_max_vram(max_vram_fraction=None)
Calculate maximum memory allocation for each available GPU.
Reserves a 2 GiB safety buffer on each device, then applies
max_vram_fraction to the remaining free memory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_vram_fraction | float \| None | Fraction of total GPU memory to allocate. Defaults to | None |
Returns:
| Type | Description |
|---|---|
| dict[int, float] | Mapping of CUDA device index to the usable memory fraction. |
Source code in src/nemo_safe_synthesizer/llm/utils.py
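One plausible reading of the arithmetic is worked through below. This is a sketch under assumptions: the helper names are hypothetical, the per-device (free, total) byte counts are passed in rather than queried from CUDA, and expressing the result as a fraction of total memory is a guess based on the return description.

```python
GIB = 1024 ** 3
SAFETY_BUFFER_BYTES = 2 * GIB  # the 2 GiB buffer described above


def usable_fraction(free_bytes: int, total_bytes: int, max_vram_fraction: float) -> float:
    """Hypothetical helper: share of total memory left after buffer and scaling."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    remaining = max(free_bytes - SAFETY_BUFFER_BYTES, 0)
    return remaining * max_vram_fraction / total_bytes


def max_vram_sketch(devices: dict, max_vram_fraction: float) -> dict:
    """Map each device index to its usable fraction; devices holds (free, total) bytes."""
    return {
        index: usable_fraction(free, total, max_vram_fraction)
        for index, (free, total) in devices.items()
    }


# A 24 GiB card with 22 GiB free and a fraction of 0.8:
# (22 - 2) GiB * 0.8 = 16 GiB, which is two thirds of the card.
example = max_vram_sketch({0: (22 * GIB, 24 * GIB)}, 0.8)
```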
add_bos_eos_tokens_to_tokenizer(tokenizer)
Enable BOS/EOS token injection and set a pad token if missing.
Mutates tokenizer in-place to set add_bos_token and
add_eos_token to True. If no pad token is configured,
pad_token_id is set to eos_token_id.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tokenizer | PreTrainedTokenizer | The tokenizer to configure. | required |
Returns:
| Type | Description |
|---|---|
| PreTrainedTokenizer | The same tokenizer instance, modified in-place. |
Source code in src/nemo_safe_synthesizer/llm/utils.py
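The mutation described above is small enough to sketch directly. The function name add_bos_eos_sketch is hypothetical, and a SimpleNamespace stands in for a real tokenizer so the example runs without the transformers package.

```python
from types import SimpleNamespace


def add_bos_eos_sketch(tokenizer):
    """Enable BOS/EOS injection and fall back to EOS as the pad token."""
    tokenizer.add_bos_token = True
    tokenizer.add_eos_token = True
    if getattr(tokenizer, "pad_token_id", None) is None:
        # No pad token configured: reuse the EOS id for padding.
        tokenizer.pad_token_id = tokenizer.eos_token_id
    return tokenizer


# Stand-in tokenizer with an EOS id but no pad token.
fake_tokenizer = SimpleNamespace(eos_token_id=2, pad_token_id=None)
configured = add_bos_eos_sketch(fake_tokenizer)
```

Because the object is modified in place, configured and fake_tokenizer refer to the same instance.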
get_param_from_config(param, default_value=None, model_name=None, trust_remote_code=None, config=None)
Read a single attribute from a HuggingFace AutoConfig.
Either an existing config object or a model_name (used to
load one on the fly) must be provided.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| param | str | Name of the config attribute to retrieve. | required |
| default_value | Any \| None | Fallback value when the attribute is absent. | None |
| model_name | str \| None | HuggingFace model identifier. Required when | None |
| trust_remote_code | bool \| None | Passed through to | None |
| config | AutoConfig \| None | Pre-loaded | None |
Returns:
| Type | Description |
|---|---|
| str \| None | The attribute value, or default_value if it does not exist on the config. |
Raises:
| Type | Description |
|---|---|
| ValueError | If neither config nor model_name is provided. |
Source code in src/nemo_safe_synthesizer/llm/utils.py
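The lookup logic can be sketched as below. The name get_param_sketch and the error message are assumptions; AutoConfig.from_pretrained is the real Transformers call, imported lazily so the config-only path works without that package installed.

```python
from types import SimpleNamespace


def get_param_sketch(param, default_value=None, model_name=None,
                     trust_remote_code=None, config=None):
    """Read one attribute from a config, loading the config only if needed."""
    if config is None:
        if model_name is None:
            raise ValueError("Provide either config or model_name")
        from transformers import AutoConfig  # optional dependency, lazy import

        config = AutoConfig.from_pretrained(
            model_name, trust_remote_code=trust_remote_code
        )
    return getattr(config, param, default_value)


# A stand-in config object avoids any network or cache access.
toy_config = SimpleNamespace(hidden_size=4096)
hidden = get_param_sketch("hidden_size", config=toy_config)
fallback = get_param_sketch("rope_scaling", default_value="n/a", config=toy_config)
```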
get_device_map(model_target, autoconfig=None, revision=None, trust_remote_code=False, local_files_only=False, force_single_device=None)
Infer the device map for a model and optionally pin all layers to one device.
Uses accelerate.infer_auto_device_map on an empty-weight model
skeleton to determine layer-to-device assignments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_target | str | HuggingFace model identifier or local path. | required |
| autoconfig | AutoConfig \| None | Pre-loaded | None |
| revision | str \| None | Model revision (branch, tag, or commit hash). | None |
| trust_remote_code | bool | Whether to trust remote code when loading. | False |
| local_files_only | bool | Restrict loading to local files only. | False |
| force_single_device | int \| None | When set, every layer is assigned to this CUDA device index. | None |
Returns:
| Type | Description |
|---|---|
| str \| dict[str, int \| str] | Ordered dictionary mapping layer names to device identifiers. |
Source code in src/nemo_safe_synthesizer/llm/utils.py
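The force_single_device behavior can be illustrated on an already inferred map. This sketch covers only the pinning step: pin_to_single_device is a hypothetical helper, and the layer names are placeholders rather than output from accelerate.infer_auto_device_map.

```python
def pin_to_single_device(device_map: dict, device_index: int) -> dict:
    """Return a copy of device_map with every layer assigned to one device."""
    return {layer: device_index for layer in device_map}


# Placeholder map in the shape accelerate produces: layer name -> device.
inferred = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "lm_head": "cpu",
}
pinned = pin_to_single_device(inferred, 0)
```

Layer order is preserved, so the pinned map can be passed wherever the inferred one was expected.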
count_trainable_params(model)
Count trainable and total parameters in a PEFT model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | PeftModel | A | required |
Returns:
| Type | Description |
|---|---|
| tuple[int, int] | A tuple of |
Source code in src/nemo_safe_synthesizer/llm/utils.py
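Counting of this kind usually relies on the parameters(), numel(), and requires_grad members that PyTorch modules expose. The sketch below assumes a (trainable, total) return order, which this section does not confirm, and uses hypothetical stand-in classes so it runs without torch or peft.

```python
def count_params_sketch(model):
    """Return (trainable, total) parameter counts; the order is an assumption."""
    trainable = 0
    total = 0
    for param in model.parameters():
        count = param.numel()
        total += count
        if param.requires_grad:
            trainable += count
    return trainable, total


class FakeParam:
    """Stand-in exposing the two members the counter relies on."""

    def __init__(self, size, requires_grad):
        self._size = size
        self.requires_grad = requires_grad

    def numel(self):
        return self._size


class FakeModel:
    def parameters(self):
        # One frozen base weight plus two small trainable adapter weights.
        return [FakeParam(1000, False), FakeParam(16, True), FakeParam(8, True)]


counts = count_params_sketch(FakeModel())
```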
get_quantization_config(quantization_bits)
Build a BitsAndBytesConfig for 4-bit or 8-bit quantization.
Both configurations use NormalFloat quantization with double
quantization enabled and bfloat16 as the compute dtype.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| quantization_bits | Literal[4, 8] | Number of bits; must be 4 or 8. | required |
Returns:
| Type | Description |
|---|---|
| BitsAndBytesConfig | A |
Raises:
| Type | Description |
|---|---|
| ValueError | If quantization_bits is not 4 or 8. |
Source code in src/nemo_safe_synthesizer/llm/utils.py
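The settings named above map onto BitsAndBytesConfig keyword arguments roughly as follows. This is a sketch, not the project's code: quantization_kwargs and build_quantization_config are hypothetical names, and the exact arguments the project passes are an assumption inferred from the description.

```python
def quantization_kwargs(quantization_bits: int) -> dict:
    """Build keyword arguments for BitsAndBytesConfig for 4-bit or 8-bit loading."""
    if quantization_bits not in (4, 8):
        raise ValueError(f"quantization_bits must be 4 or 8, got {quantization_bits}")
    kwargs = {
        "bnb_4bit_quant_type": "nf4",          # NormalFloat quantization
        "bnb_4bit_use_double_quant": True,      # double quantization
        "bnb_4bit_compute_dtype": "bfloat16",   # resolved to torch.bfloat16 by the config
    }
    key = "load_in_4bit" if quantization_bits == 4 else "load_in_8bit"
    kwargs[key] = True
    return kwargs


def build_quantization_config(quantization_bits: int):
    """Construct the config; requires transformers and torch to be installed."""
    from transformers import BitsAndBytesConfig  # optional dependency, lazy import

    return BitsAndBytesConfig(**quantization_kwargs(quantization_bits))
```

Keeping the keyword arguments in a plain dict makes the validation testable without the quantization stack installed.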
get_device_name()
Get the name of the current device (first index). Returns 'undefined' if the device is not available.
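A lookup with this fallback might be written as in the sketch below. The name get_device_name_sketch is hypothetical, and treating a missing torch install the same as a missing device is an assumption; torch.cuda.get_device_name is the real PyTorch call.

```python
def get_device_name_sketch() -> str:
    """Return the name of CUDA device 0, or 'undefined' when none is usable."""
    try:
        import torch  # optional dependency, imported lazily
    except ImportError:
        return "undefined"
    if not torch.cuda.is_available():
        return "undefined"
    return torch.cuda.get_device_name(0)
```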