utils

`utils` ¶

GPU memory management, quantization, device mapping, and tokenizer helpers for LLM loading.

Optional LLM dependencies are imported inside the helpers that need them so lightweight utilities such as trust_remote_code_for_model remain usable without installing the full training or inference stack.

Classes:

Name	Description
`ModelRef`	Resolved model reference for local cache and trust policy decisions.

Functions:

Name	Description
`trust_remote_code_for_model`	Determine whether to trust remote code when loading a model.
`cleanup_memory`	Run garbage collection and empty the CUDA cache.
`gpu_stats`	Log current GPU memory reservation and total capacity.
`get_max_vram`	Calculate maximum memory allocation for each available GPU.
`add_bos_eos_tokens_to_tokenizer`	Enable BOS/EOS token injection and set a pad token if missing.
`get_param_from_config`	Read a single attribute from a HuggingFace `AutoConfig`.
`get_device_map`	Infer the device map for a model and optionally pin all layers to one device.
`count_trainable_params`	Count trainable and total parameters in a PEFT model.
`get_quantization_config`	Build a `BitsAndBytesConfig` for 4-bit or 8-bit quantization.
`get_device_name`	Get the name of the first CUDA device (index 0). Returns 'undefined' if the device cannot be queried.

`ModelRef(original, repo_id=None, revision='main', local_path=None, cache_root=None)` `dataclass` ¶

Resolved model reference for local cache and trust policy decisions.

Intended public API:

- parse() normalizes a user-supplied model string or path without contacting Hugging Face.
- target() returns the value that should be passed to from_pretrained-style loaders: a local snapshot path when available, otherwise the original model reference.
- trust_remote_code reports whether the reference belongs to a trusted organization after accounting for resolved local HF cache paths.
- partial_cached_snapshot() returns HF's local snapshot path for the repo/revision, even when the snapshot is incomplete.
- missing_required_components() reports whether a local model directory has the components this project expects before an offline load.
- missing_remote_code_components() reports trusted remote-code files referenced by Transformers auto_map metadata but absent locally.

Deliberate Hugging Face coupling: repo-id validation, cache-root resolution, cache scanning, snapshot layout, artifact names, tokenizer filenames, and sharded weight index parsing mirror current Hugging Face Hub and Transformers behavior. This is intentional so that NeMo Safe Synthesizer (NSS) decisions match the libraries that load the model. If model loading or cache preflight behavior changes after an upstream HF release, inspect this class first.

Internal helpers are not a generic model-layout abstraction. They should stay close to HF's implementation rather than grow compatibility shims for unrelated storage formats.

Methods:

Name	Description
`parse`	Parse a model identifier or path without contacting Hugging Face.
`missing_required_components`	Return local model components missing from `model_dir`.
`missing_remote_code_components`	Return trusted remote-code components referenced by config but absent locally.
`partial_cached_snapshot`	Return the local HF snapshot for this repo/revision, even if it is partial.
`is_trusted_org`	Return whether an organization is allowed to load remote code.
`target`	Return the local snapshot path when available, otherwise the original input.

Attributes:

Name	Type	Description
`trust_remote_code`	`bool`	Whether loaders should pass `trust_remote_code=True` for this model.

`trust_remote_code` `property` ¶

Whether loaders should pass trust_remote_code=True for this model.

`parse(model_name, *, revision='main', cache_root=None)` `classmethod` ¶

Parse a model identifier or path without contacting Hugging Face.

This is safe to call in preflight and loader setup because it uses Hugging Face's local cache APIs only. Cached-model hits may still cost a few milliseconds because HF cache scanning walks cache metadata to confirm model artifacts exist.

Source code in src/nemo_safe_synthesizer/llm/utils.py

@classmethod
def parse(
    cls,
    model_name: str | Path,
    *,
    revision: str = "main",
    cache_root: str | Path | None = None,
) -> Self:
    """Parse a model identifier or path without contacting Hugging Face.

    This is safe to call in preflight and loader setup because it uses
    Hugging Face's local cache APIs only. Cached-model hits may still cost a
    few milliseconds because HF cache scanning walks cache metadata to
    confirm model artifacts exist.
    """
    cache_root_path = Path(cache_root) if cache_root is not None else cls._default_hf_cache_root()
    model_ref = str(model_name)
    if not model_ref:
        return cls(original=model_name, revision=revision, cache_root=cache_root_path)

    model_path = Path(model_name)
    if model_path.exists():
        repo_id = cls._repo_id_from_hf_cache_path(model_path, cache_root_path)
        return cls(
            original=model_name,
            repo_id=repo_id,
            revision=revision,
            local_path=model_path,
            cache_root=cache_root_path,
        )

    repo_id = cls._repo_id_from_hub_identifier(model_ref)
    local_path = cls._cached_snapshot_for_repo(repo_id, revision, cache_root_path) if repo_id else None
    return cls(
        original=model_name,
        repo_id=repo_id,
        revision=revision,
        local_path=local_path,
        cache_root=cache_root_path,
    )

`missing_required_components(model_dir)` `classmethod` ¶

Return local model components missing from model_dir.

Source code in src/nemo_safe_synthesizer/llm/utils.py

@classmethod
def missing_required_components(cls, model_dir: Path) -> list[str]:
    """Return local model components missing from ``model_dir``."""
    return [name for name, present in cls._required_component_status(model_dir).items() if not present]
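
The comprehension above keeps only the names whose status is falsy. A minimal sketch of that pattern follows, assuming a hypothetical status helper and hypothetical component names; the project's `_required_component_status` is not shown on this page and may check different files.

```python
import tempfile
from pathlib import Path

# Hypothetical component-to-filename mapping, for illustration only.
EXAMPLE_COMPONENTS = {"config": "config.json", "tokenizer": "tokenizer.json"}


def example_component_status(model_dir: Path) -> dict:
    """Map each component name to whether its file exists in model_dir."""
    return {name: (model_dir / filename).is_file() for name, filename in EXAMPLE_COMPONENTS.items()}


def example_missing_components(model_dir: Path) -> list:
    """Return the names of components whose file is absent."""
    return [name for name, present in example_component_status(model_dir).items() if not present]


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    (model_dir / "config.json").write_text("{}")
    missing = example_missing_components(model_dir)
    print(missing)  # ['tokenizer']
```

An empty result means every expected component was found, so a caller could gate an offline load on `not missing`.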

`missing_remote_code_components(model_dir)` `classmethod` ¶

Return trusted remote-code components referenced by config but absent locally.

Source code in src/nemo_safe_synthesizer/llm/utils.py

@classmethod
def missing_remote_code_components(cls, model_dir: Path) -> list[str]:
    """Return trusted remote-code components referenced by config but absent locally."""
    required = cls._remote_code_components(model_dir)
    missing: list[str] = []
    for component, local_path in required:
        if local_path is None or not (model_dir / local_path).is_file():
            missing.append(component)
    return sorted(missing)
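
To make the `auto_map` check concrete, here is a rough sketch that derives Python file names from `config.json` and reports the ones that are absent. It assumes `auto_map` values are `"module.ClassName"` strings (optionally prefixed with `"repo--"`) or lists of such strings; the function names are hypothetical, and the project's `_remote_code_components` helper returns `(component, local_path)` pairs and may behave differently.

```python
import json
import tempfile
from pathlib import Path


def example_remote_code_files(model_dir: Path) -> list:
    """List the .py files named by auto_map entries in config.json."""
    config = json.loads((model_dir / "config.json").read_text())
    files = set()
    for reference in config.get("auto_map", {}).values():
        # An entry is one "module.ClassName" string or a list of them (items may be None).
        references = reference if isinstance(reference, list) else [reference]
        for ref in references:
            if not ref:
                continue
            # Drop an optional "repo--" prefix, then keep the module part before the class name.
            module = ref.split("--")[-1].rsplit(".", 1)[0]
            files.add(f"{module}.py")
    return sorted(files)


def example_missing_remote_code(model_dir: Path) -> list:
    """Return referenced remote-code files that do not exist in model_dir."""
    return [name for name in example_remote_code_files(model_dir) if not (model_dir / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    config = {
        "auto_map": {
            "AutoConfig": "configuration_demo.DemoConfig",
            "AutoModelForCausalLM": "modeling_demo.DemoForCausalLM",
        }
    }
    (model_dir / "config.json").write_text(json.dumps(config))
    (model_dir / "configuration_demo.py").write_text("# placeholder\n")
    missing_files = example_missing_remote_code(model_dir)
    print(missing_files)  # ['modeling_demo.py']
```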

`partial_cached_snapshot()` ¶

Return the local HF snapshot for this repo/revision, even if it is partial.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def partial_cached_snapshot(self) -> Path | None:
    """Return the local HF snapshot for this repo/revision, even if it is partial."""
    if self.repo_id is None or self.cache_root is None:
        return None
    return self._local_snapshot_for_repo(self.repo_id, self.revision, self.cache_root)
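
For orientation, the Hugging Face Hub cache stores each model under `models--<org>--<name>`, with `refs/<revision>` holding a commit hash and `snapshots/<commit>/` holding the files. The sketch below resolves that path by hand with the standard library; it is a simplification with a hypothetical function name, not the project's `_local_snapshot_for_repo`, and it does not check whether the snapshot is complete.

```python
import tempfile
from pathlib import Path
from typing import Optional


def example_local_snapshot(cache_root: Path, repo_id: str, revision: str = "main") -> Optional[Path]:
    """Resolve <cache_root>/models--org--name/snapshots/<commit> without any network call."""
    repo_dir = cache_root / ("models--" + repo_id.replace("/", "--"))
    ref_file = repo_dir / "refs" / revision
    # A branch or tag is stored as a small file containing the commit hash;
    # otherwise this sketch treats the revision itself as the commit hash.
    commit = ref_file.read_text().strip() if ref_file.is_file() else revision
    snapshot = repo_dir / "snapshots" / commit
    return snapshot if snapshot.is_dir() else None


with tempfile.TemporaryDirectory() as tmp:
    cache_root = Path(tmp)
    repo_dir = cache_root / "models--example-org--example-model"
    (repo_dir / "refs").mkdir(parents=True)
    (repo_dir / "refs" / "main").write_text("abc123")
    (repo_dir / "snapshots" / "abc123").mkdir(parents=True)

    found = example_local_snapshot(cache_root, "example-org/example-model")
    absent = example_local_snapshot(cache_root, "example-org/other-model")
    found_name = found.name if found else None
    print(found_name, absent)  # abc123 None
```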

`is_trusted_org(org)` `classmethod` ¶

Return whether an organization is allowed to load remote code.

Source code in src/nemo_safe_synthesizer/llm/utils.py

@classmethod
def is_trusted_org(cls, org: str) -> bool:
    """Return whether an organization is allowed to load remote code."""
    return org.casefold() in cls.trusted_orgs
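
A small sketch of the same case-insensitive allow-list check, applied to a full `org/name` identifier. The allow-list contents and the function name are hypothetical; the project's real `trusted_orgs` is not shown on this page. Note that `casefold()` on the input only helps if the stored names are already casefolded.

```python
# Hypothetical allow-list for illustration; entries are stored casefolded.
EXAMPLE_TRUSTED_ORGS = frozenset({"example-org"})


def example_is_trusted(repo_id: str) -> bool:
    """Return True when the org part of "org/name" is in the allow-list, ignoring case."""
    org, sep, _name = repo_id.partition("/")
    # Identifiers without an org prefix (no "/") are never trusted in this sketch.
    return bool(sep) and org.casefold() in EXAMPLE_TRUSTED_ORGS


results = [example_is_trusted(r) for r in ("Example-Org/model-a", "other-org/model-b", "model-c")]
print(results)  # [True, False, False]
```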

`target()` ¶

Return the local snapshot path when available, otherwise the original input.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def target(self) -> str:
    """Return the local snapshot path when available, otherwise the original input."""
    return str(self.local_path or self.original)

`trust_remote_code_for_model(model_name, *, cache_root=None)` ¶

Determine whether to trust remote code when loading a model.

Returns True for model identifiers owned by trusted organizations, including configured Hugging Face cache snapshots for those organizations.

Parameters:

Name	Type	Description	Default
`model_name`	`str \| Path`	HuggingFace model identifier or local path.	required
`cache_root`	`str \| Path \| None`	Hugging Face Hub cache root. Defaults to the configured hub cache.	`None`

Returns:

Type	Description
`bool`	Whether to set `trust_remote_code=True` when loading the model.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def trust_remote_code_for_model(model_name: str | Path, *, cache_root: str | Path | None = None) -> bool:
    """Determine whether to trust remote code when loading a model.

    Returns ``True`` for model identifiers owned by trusted organizations,
    including configured Hugging Face cache snapshots for those organizations.

    Args:
        model_name: HuggingFace model identifier or local path.
        cache_root: Hugging Face Hub cache root. Defaults to the configured hub cache.

    Returns:
        Whether to set ``trust_remote_code=True`` when loading the model.
    """
    return ModelRef.parse(model_name, cache_root=cache_root).trust_remote_code

`cleanup_memory()` ¶

Run garbage collection and empty the CUDA cache.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def cleanup_memory() -> None:
    """Run garbage collection and empty the CUDA cache."""
    import torch

    gc.collect()
    with torch.no_grad():
        torch.cuda.empty_cache()

`gpu_stats()` ¶

Log peak GPU memory reservation and total capacity.

Queries CUDA device 0 and logs the peak reserved memory and the device's total memory. Values are computed in GiB (bytes divided by 1024³), although the log messages label them "GB".

Source code in src/nemo_safe_synthesizer/llm/utils.py

def gpu_stats() -> None:
    """Log current GPU memory reservation and total capacity.

    Queries CUDA device 0 and logs the peak reserved memory and total
    available memory in GiB.
    """
    import torch

    def round_gb(value: float) -> float:
        return round(value / 1024 / 1024 / 1024, 3)

    gpu_stats = torch.cuda.get_device_properties(0)
    start_gpu_memory = round_gb(torch.cuda.max_memory_reserved())
    max_memory = round_gb(gpu_stats.total_memory)
    logger.info(f"{start_gpu_memory} GB of memory reserved.")
    logger.info(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")

`get_max_vram(max_vram_fraction=None)` ¶

Calculate maximum memory allocation for each available GPU.

For each device, subtracts a 2 GiB safety buffer from the currently free memory and returns the smaller of max_vram_fraction and that buffered free memory expressed as a fraction of total memory.

Parameters:

Name	Type	Description	Default
`max_vram_fraction`	`float \| None`	Upper bound on the fraction of total GPU memory to allocate. Defaults to `0.8` (80 %).	`None`

Returns:

Type	Description
`dict[int, float]`	Mapping of CUDA device index to the usable memory fraction.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def get_max_vram(max_vram_fraction: float | None = None) -> dict[int, float]:
    """Calculate maximum memory allocation for each available GPU.

    Reserves a 2 GiB safety buffer on each device, then applies
    ``max_vram_fraction`` to the remaining free memory.

    Args:
        max_vram_fraction: Fraction of total GPU memory to allocate.
            Defaults to ``0.8`` (80 %).

    Returns:
        Mapping of CUDA device index to the usable memory fraction.
    """
    import torch

    if max_vram_fraction is None:
        max_vram_fraction = 0.8
    max_memory = {}

    if torch.cuda.is_available():
        num_gpus = torch.cuda.device_count()
        for i in range(num_gpus):
            free, total = torch.cuda.mem_get_info(device=i)
            safe_free = free - (2 * 1024**3)
            gpu_memory_utilization = min(max_vram_fraction, safe_free / total)
            memory_gib = gpu_memory_utilization * total / (1024**3)
            max_memory[i] = gpu_memory_utilization
            # Report the fraction actually applied, which may be lower than max_vram_fraction.
            logger.info(
                f"GPU {i}: Will allocate {memory_gib:.2f}GiB "
                f"({gpu_memory_utilization * 100:.1f}% of {total / (1024**3):.2f}GiB)"
            )

    return max_memory
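
The per-device arithmetic can be checked without a GPU. The sketch below copies the two lines that compute the fraction into a standalone function (the name `example_usable_fraction` is made up for this example) and evaluates it for an idle and a busy 80 GiB device.

```python
GIB = 1024**3


def example_usable_fraction(free: int, total: int, max_vram_fraction: float = 0.8) -> float:
    """Mirror the per-device arithmetic: min(requested fraction, (free - 2 GiB) / total)."""
    safe_free = free - 2 * GIB
    return min(max_vram_fraction, safe_free / total)


# Mostly idle 80 GiB device: 76 GiB remain after the buffer, so the requested 0.8 is the limit.
idle = example_usable_fraction(free=78 * GIB, total=80 * GIB)
# Busy 80 GiB device with 42 GiB free: 40 GiB remain after the buffer, so 0.5 is the limit.
busy = example_usable_fraction(free=42 * GIB, total=80 * GIB)
print(idle, busy)  # 0.8 0.5
```

When less than 2 GiB is free, the expression goes negative, so a caller may want to treat a non-positive result as "no capacity on this device".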

`add_bos_eos_tokens_to_tokenizer(tokenizer)` ¶

Enable BOS/EOS token injection and set a pad token if missing.

Mutates tokenizer in-place to set add_bos_token and add_eos_token to True. If no pad token is configured, pad_token_id is set to eos_token_id.

Parameters:

Name	Type	Description	Default
`tokenizer`	`PreTrainedTokenizer`	The tokenizer to configure.	required

Returns:

Type	Description
`PreTrainedTokenizer`	The same tokenizer instance, modified in-place.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def add_bos_eos_tokens_to_tokenizer(tokenizer: PreTrainedTokenizer) -> PreTrainedTokenizer:
    """Enable BOS/EOS token injection and set a pad token if missing.

    Mutates ``tokenizer`` in-place to set ``add_bos_token`` and
    ``add_eos_token`` to ``True``.  If no pad token is configured,
    ``pad_token_id`` is set to ``eos_token_id``.

    Args:
        tokenizer: The tokenizer to configure.

    Returns:
        The same tokenizer instance, modified in-place.
    """
    tokenizer.add_bos_token = True
    tokenizer.add_eos_token = True
    # Compare against None: a pad_token_id of 0 is a valid id and must not be overwritten.
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token_id = tokenizer.eos_token_id
    return tokenizer
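
The effect of the pad-token fallback can be illustrated with stand-in objects instead of real Hugging Face tokenizers. This sketch (with a hypothetical `example_configure` helper) tests `pad_token_id is None` explicitly, which is an assumption of the sketch: it keeps a legitimate pad id of 0, whereas a plain truthiness test would replace it.

```python
from types import SimpleNamespace


def example_configure(tokenizer):
    """Apply the same three attribute updates to any object with these fields."""
    tokenizer.add_bos_token = True
    tokenizer.add_eos_token = True
    # `is None` preserves a pad id of 0; `not tokenizer.pad_token_id` would not.
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token_id = tokenizer.eos_token_id
    return tokenizer


# Stand-in objects, not real tokenizers.
no_pad = example_configure(SimpleNamespace(pad_token_id=None, eos_token_id=2))
zero_pad = example_configure(SimpleNamespace(pad_token_id=0, eos_token_id=2))
print(no_pad.pad_token_id, zero_pad.pad_token_id)  # 2 0
```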

`get_param_from_config(param, default_value=None, model_name=None, trust_remote_code=None, config=None)` ¶

Read a single attribute from a HuggingFace AutoConfig.

Either an existing config object or a model_name (used to load one on the fly) must be provided.

Parameters:

Name	Type	Description	Default
`param`	`str`	Name of the config attribute to retrieve.	required
`default_value`	`Any \| None`	Fallback value when the attribute is absent.	`None`
`model_name`	`str \| None`	HuggingFace model identifier. Required when `config` is not supplied.	`None`
`trust_remote_code`	`bool \| None`	Passed through to `AutoConfig.from_pretrained` when loading a config.	`None`
`config`	`AutoConfig \| None`	Pre-loaded `AutoConfig`. Takes precedence over `model_name`.	`None`

Returns:

Type	Description
`str \| None`	The attribute value, or `default_value` if the attribute does not exist on the config.

Raises:

Type	Description
`ValueError`	If neither `model_name` nor `config` is provided.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def get_param_from_config(
    param: str,
    default_value: Any | None = None,
    model_name: str | None = None,
    trust_remote_code: bool | None = None,
    config: AutoConfig | None = None,
) -> str | None:
    """Read a single attribute from a HuggingFace ``AutoConfig``.

    Either an existing ``config`` object or a ``model_name`` (used to
    load one on the fly) must be provided.

    Args:
        param: Name of the config attribute to retrieve.
        default_value: Fallback value when the attribute is absent.
        model_name: HuggingFace model identifier.  Required when
            ``config`` is not supplied.
        trust_remote_code: Passed through to
            ``AutoConfig.from_pretrained`` when loading a config.
        config: Pre-loaded ``AutoConfig``.  Takes precedence over
            ``model_name``.

    Returns:
        The attribute value, or ``default_value`` if the attribute does
        not exist on the config.

    Raises:
        ValueError: If neither ``model_name`` nor ``config`` is provided.
    """
    from transformers import AutoConfig

    if config is None:
        if model_name is None:
            raise ValueError("model_name is required if config is not provided")
        config = AutoConfig.from_pretrained(model_name, trust_remote_code=trust_remote_code)

    return getattr(config, param, default_value)
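
The lookup itself is a plain `getattr` with a default. A minimal sketch with a stand-in config object (the helper name, attribute names, and values are illustrative, not taken from a real model):

```python
from types import SimpleNamespace

# Stand-in for a loaded config; not a real AutoConfig.
example_config = SimpleNamespace(max_position_embeddings=4096, rope_scaling=None)


def example_get_param(config, param, default_value=None):
    """Return config.<param> if the attribute exists, else default_value."""
    return getattr(config, param, default_value)


present = example_get_param(example_config, "max_position_embeddings")
absent = example_get_param(example_config, "sliding_window", default_value="n/a")
explicit_none = example_get_param(example_config, "rope_scaling", default_value="n/a")
print(present, absent, explicit_none)  # 4096 n/a None
```

The last call shows a detail worth remembering: an attribute that exists but is set to `None` returns `None`, not the default.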

`get_device_map(model_target, autoconfig=None, revision=None, trust_remote_code=False, local_files_only=False, force_single_device=None)` ¶

Infer the device map for a model and optionally pin all layers to one device.

Uses accelerate.infer_auto_device_map on an empty-weight model skeleton to determine layer-to-device assignments.

Parameters:

Name	Type	Description	Default
`model_target`	`str`	HuggingFace model identifier or local path.	required
`autoconfig`	`AutoConfig \| None`	Pre-loaded `AutoConfig`. If `None`, one is loaded from `model_target`.	`None`
`revision`	`str \| None`	Model revision (branch, tag, or commit hash).	`None`
`trust_remote_code`	`bool`	Whether to trust remote code when loading.	`False`
`local_files_only`	`bool`	Restrict loading to local files only.	`False`
`force_single_device`	`int \| None`	When set, every layer is assigned to this CUDA device index.	`None`

Returns:

Type	Description
`str \| dict[str, int \| str]`	Ordered dictionary mapping layer names to device identifiers.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def get_device_map(
    model_target: str,
    autoconfig: AutoConfig | None = None,
    revision: str | None = None,
    trust_remote_code: bool = False,
    local_files_only: bool = False,
    force_single_device: int | None = None,
) -> str | dict[str, int | str]:
    """Infer the device map for a model and optionally pin all layers to one device.

    Uses ``accelerate.infer_auto_device_map`` on an empty-weight model
    skeleton to determine layer-to-device assignments.

    Args:
        model_target: HuggingFace model identifier or local path.
        autoconfig: Pre-loaded ``AutoConfig``.  If ``None``, one is
            loaded from ``model_target``.
        revision: Model revision (branch, tag, or commit hash).
        trust_remote_code: Whether to trust remote code when loading.
        local_files_only: Restrict loading to local files only.
        force_single_device: When set, every layer is assigned to this
            CUDA device index.

    Returns:
        Ordered dictionary mapping layer names to device identifiers.
    """
    from accelerate import infer_auto_device_map, init_empty_weights
    from transformers import AutoConfig, AutoModelForCausalLM

    config = autoconfig or AutoConfig.from_pretrained(
        model_target,
        revision=revision,
        trust_remote_code=trust_remote_code,
        local_files_only=local_files_only,
    )
    # Create an empty model with the configuration
    with init_empty_weights():
        model = AutoModelForCausalLM.from_config(config, trust_remote_code=trust_remote_code)
    device_map = infer_auto_device_map(model)
    if force_single_device is not None:
        for key in device_map:
            device_map[key] = force_single_device
    return device_map
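
The effect of `force_single_device` can be shown on a hand-written map. The layer names and devices below are illustrative, not the output of a real inference run, and unlike the listing above this sketch returns a new dictionary instead of mutating its input.

```python
# Illustrative layer-to-device map of the kind automatic placement might produce.
example_device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "lm_head": "cpu",
}


def example_pin_to_device(device_map: dict, device: int) -> dict:
    """Return a copy of device_map with every entry assigned to `device`."""
    return {name: device for name in device_map}


pinned = example_pin_to_device(example_device_map, 0)
print(pinned)
```

Pinning discards any CPU or multi-GPU placement, so it only makes sense when the whole model fits on the chosen device.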

`count_trainable_params(model)` ¶

Count trainable and total parameters in a PEFT model.

Parameters:

Name	Type	Description	Default
`model`	`PeftModel`	A `PeftModel` (or any `torch.nn.Module`) to inspect.	required

Returns:

Type	Description
`tuple[int, int]`	A tuple of `(trainable_params, all_params)`.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def count_trainable_params(model: PeftModel) -> tuple[int, int]:
    """Count trainable and total parameters in a PEFT model.

    Args:
        model: A ``PeftModel`` (or any ``torch.nn.Module``) to inspect.

    Returns:
        A tuple of ``(trainable_params, all_params)``.
    """
    trainable_params = 0
    all_params = 0
    for _, param in model.named_parameters():
        all_params += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    return trainable_params, all_params
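
The counting loop works on anything that yields `(name, parameter)` pairs whose parameters expose `numel()` and `requires_grad`. The sketch below uses a small stand-in class rather than real tensors, with made-up sizes resembling a frozen base weight plus two trainable adapter matrices.

```python
from dataclasses import dataclass


@dataclass
class ExampleParam:
    """Stand-in for a tensor parameter: just a size and a requires_grad flag."""

    size: int
    requires_grad: bool

    def numel(self) -> int:
        return self.size


def example_count(named_params) -> tuple:
    """Sum the sizes of trainable parameters and of all parameters."""
    trainable = total = 0
    for _name, param in named_params:
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    return trainable, total


params = [
    ("base.weight", ExampleParam(1_000_000, False)),
    ("adapter.A", ExampleParam(8_000, True)),
    ("adapter.B", ExampleParam(8_000, True)),
]
trainable, total = example_count(params)
print(trainable, total, f"{100 * trainable / total:.2f}%")  # 16000 1016000 1.57%
```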

`get_quantization_config(quantization_bits)` ¶

Build a BitsAndBytesConfig for 4-bit or 8-bit quantization.

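The 4-bit configuration uses NormalFloat (`nf4`) quantization with double quantization enabled and bfloat16 as the compute dtype. The 8-bit configuration sets `load_in_8bit=True` and passes analogous `bnb_8bit_*` keyword arguments; these are not among the parameters documented for `BitsAndBytesConfig`, so they may have no effect.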

Parameters:

Name	Type	Description	Default
`quantization_bits`	`Literal[4, 8]`	Number of bits — must be `4` or `8`.	required

Returns:

Type	Description
`BitsAndBytesConfig`	A `BitsAndBytesConfig` ready to pass to model loading.

Raises:

Type	Description
`ValueError`	If `quantization_bits` is not 4 or 8.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def get_quantization_config(quantization_bits: Literal[4, 8]) -> BitsAndBytesConfig:
    """Build a ``BitsAndBytesConfig`` for 4-bit or 8-bit quantization.

    Both configurations use NormalFloat quantization with double
    quantization enabled and ``bfloat16`` as the compute dtype.

    Args:
        quantization_bits: Number of bits — must be ``4`` or ``8``.

    Returns:
        A ``BitsAndBytesConfig`` ready to pass to model loading.

    Raises:
        ValueError: If ``quantization_bits`` is not 4 or 8.
    """
    import torch
    from transformers import BitsAndBytesConfig

    if quantization_bits == 4:
        return BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
    elif quantization_bits == 8:
        return BitsAndBytesConfig(
            load_in_8bit=True,
            bnb_8bit_quant_type="nf8",
            bnb_8bit_use_double_quant=True,
            bnb_8bit_compute_dtype=torch.bfloat16,
        )
    else:
        raise ValueError(f"Invalid quantization bits: {quantization_bits}")

`get_device_name()` ¶

Get the name of the first CUDA device (index 0). Returns 'undefined' if the device cannot be queried, for example when PyTorch or CUDA is unavailable.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def get_device_name() -> str:
    """Get the name of the current device (first index). Returns 'undefined' if the device is not available."""
    try:
        import torch

        return torch.cuda.get_device_properties(0).name
    except Exception:
        return "undefined"
