utils

`utils`

GPU memory management, quantization, device mapping, and tokenizer helpers for LLM loading.

Optional LLM dependencies are imported inside the helpers that need them, so lightweight utilities such as `trust_remote_code_for_model` remain usable without installing the full training or inference stack.

Classes:

Name	Description
`ModelRef`	Resolved model reference for local cache and trust policy decisions.

Functions:

Name	Description
`trust_remote_code_for_model`	Determine whether to trust remote code when loading a model.
`cleanup_memory`	Run garbage collection and empty the CUDA cache.
`gpu_stats`	Log peak GPU memory reservation and total capacity.
`get_max_vram`	Return vLLM-style GPU utilization fractions for each available GPU.
`get_max_memory_map`	Return Hugging Face `max_memory` byte limits for each available GPU.
`add_bos_eos_tokens_to_tokenizer`	Enable BOS/EOS token injection and set a pad token if missing.
`get_param_from_config`	Read a single attribute from a HuggingFace `AutoConfig`.
`load_fast_tokenizer`	Load a tokenizer, preferring the Rust `tokenizers` backend.
`get_device_name`	Return the name of CUDA device 0, or `'undefined'` if it cannot be resolved.
`get_device_map`	Infer the device map for a model and optionally pin all layers to one device.
`count_trainable_params`	Count trainable and total parameters in a PEFT model.
`get_quantization_config`	Compatibility wrapper for building a transformers v5 quantization config.

`ModelRef(original, repo_id=None, revision='main', local_path=None, cache_root=None)` `dataclass`

Resolved model reference for local cache and trust policy decisions.

Intended public API:

- `parse()` normalizes a user-supplied model string or path without contacting Hugging Face.
- `target()` returns the value to pass to `from_pretrained`-style loaders: a local snapshot path when available, otherwise the original model reference.
- `trust_remote_code` reports whether the reference belongs to a trusted organization, after accounting for resolved local HF cache paths.
- `partial_cached_snapshot()` returns HF's local snapshot path for the repo/revision, even when the snapshot is incomplete.
- `missing_required_components()` returns the components this project expects that are absent from a local model directory, for checking before an offline load.
- `missing_remote_code_components()` reports trusted remote-code files referenced by Transformers `auto_map` metadata but absent locally.

Deliberate Hugging Face coupling: repo-id validation, cache-root resolution, cache scanning, snapshot layout, artifact names, tokenizer filenames, and sharded weight index parsing mirror current Hugging Face Hub and Transformers behavior. This is intentional, so that NeMo Safe Synthesizer (NSS) decisions match the libraries that actually load the model. If model loading or cache preflight behavior changes after an upstream HF release, inspect this class first.

Internal helpers are not a generic model-layout abstraction. They should stay close to HF's implementation rather than grow compatibility shims for unrelated storage formats.

Methods:

Name	Description
`parse`	Parse a model identifier or path without contacting Hugging Face.
`missing_required_components`	Return local model components missing from `model_dir`.
`missing_remote_code_components`	Return trusted remote-code components referenced by config but absent locally.
`partial_cached_snapshot`	Return the local HF snapshot for this repo/revision, even if it is partial.
`is_trusted_org`	Return whether an organization is allowed to load remote code.
`target`	Return the local snapshot path when available, otherwise the original input.

Attributes:

Name	Type	Description
`trust_remote_code`	`bool`	Whether loaders should pass `trust_remote_code=True` for this model.

`trust_remote_code` `property`

Whether loaders should pass `trust_remote_code=True` for this model.

`parse(model_name, *, revision='main', cache_root=None)` `classmethod`

Parse a model identifier or path without contacting Hugging Face.

This is safe to call in preflight and loader setup because it uses Hugging Face's local cache APIs only. Cached-model hits may still cost a few milliseconds because HF cache scanning walks cache metadata to confirm model artifacts exist.

Source code in src/nemo_safe_synthesizer/llm/utils.py

@classmethod
def parse(
    cls,
    model_name: str | Path,
    *,
    revision: str = "main",
    cache_root: str | Path | None = None,
) -> Self:
    """Parse a model identifier or path without contacting Hugging Face.

    This is safe to call in preflight and loader setup because it uses
    Hugging Face's local cache APIs only. Cached-model hits may still cost a
    few milliseconds because HF cache scanning walks cache metadata to
    confirm model artifacts exist.
    """
    cache_root_path = Path(cache_root) if cache_root is not None else cls._default_hf_cache_root()
    model_ref = str(model_name)
    if not model_ref:
        return cls(original=model_name, revision=revision, cache_root=cache_root_path)

    model_path = Path(model_name)
    if model_path.exists():
        repo_id = cls._repo_id_from_hf_cache_path(model_path, cache_root_path)
        return cls(
            original=model_name,
            repo_id=repo_id,
            revision=revision,
            local_path=model_path,
            cache_root=cache_root_path,
        )

    repo_id = cls._repo_id_from_hub_identifier(model_ref)
    local_path = cls._cached_snapshot_for_repo(repo_id, revision, cache_root_path) if repo_id else None
    return cls(
        original=model_name,
        repo_id=repo_id,
        revision=revision,
        local_path=local_path,
        cache_root=cache_root_path,
    )

`missing_required_components(model_dir)` `classmethod`

Return local model components missing from `model_dir`.

Source code in src/nemo_safe_synthesizer/llm/utils.py

@classmethod
def missing_required_components(cls, model_dir: Path) -> list[str]:
    """Return local model components missing from ``model_dir``."""
    return [name for name, present in cls._required_component_status(model_dir).items() if not present]
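The class description notes that `ModelRef` mirrors Hugging Face's sharded weight index parsing. A minimal sketch of one such check, assuming the standard `model.safetensors.index.json` format whose `weight_map` maps tensor names to shard filenames. `missing_weight_shards` is hypothetical and is not the project's `_required_component_status`.

```python
import json
from pathlib import Path


def missing_weight_shards(model_dir: Path, index_name: str = "model.safetensors.index.json") -> list[str]:
    """Hypothetical helper: list shard files named in a sharded-weights index but absent locally.

    Returns an empty list when there is no index (for example, a single
    unsharded model.safetensors file).
    """
    index_path = model_dir / index_name
    if not index_path.is_file():
        return []
    weight_map: dict[str, str] = json.loads(index_path.read_text())["weight_map"]
    # Many tensors share one shard, so deduplicate before touching the filesystem.
    shards = set(weight_map.values())
    return sorted(name for name in shards if not (model_dir / name).is_file())
```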

`missing_remote_code_components(model_dir)` `classmethod`

Return trusted remote-code components referenced by config but absent locally.

Source code in src/nemo_safe_synthesizer/llm/utils.py

@classmethod
def missing_remote_code_components(cls, model_dir: Path) -> list[str]:
    """Return trusted remote-code components referenced by config but absent locally."""
    required = cls._remote_code_components(model_dir)
    missing: list[str] = []
    for component, local_path in required:
        if local_path is None or not (model_dir / local_path).is_file():
            missing.append(component)
    return sorted(missing)

`partial_cached_snapshot()`

Return the local HF snapshot for this repo/revision, even if it is partial.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def partial_cached_snapshot(self) -> Path | None:
    """Return the local HF snapshot for this repo/revision, even if it is partial."""
    if self.repo_id is None or self.cache_root is None:
        return None
    return self._local_snapshot_for_repo(self.repo_id, self.revision, self.cache_root)

`is_trusted_org(org)` `classmethod`

Return whether an organization is allowed to load remote code.

Source code in src/nemo_safe_synthesizer/llm/utils.py

@classmethod
def is_trusted_org(cls, org: str) -> bool:
    """Return whether an organization is allowed to load remote code."""
    return org.casefold() in cls.trusted_orgs

`target()`

Return the local snapshot path when available, otherwise the original input.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def target(self) -> str:
    """Return the local snapshot path when available, otherwise the original input."""
    return str(self.local_path or self.original)

`trust_remote_code_for_model(model_name, *, cache_root=None)`

Determine whether to trust remote code when loading a model.

Returns `True` for model identifiers owned by trusted organizations, including local snapshots of those organizations' models in the configured Hugging Face cache.

Parameters:

Name	Type	Description	Default
`model_name`	`str \| Path`	HuggingFace model identifier or local path.	required
`cache_root`	`str \| Path \| None`	Hugging Face Hub cache root. Defaults to the configured hub cache.	`None`

Returns:

Type	Description
`bool`	Whether to set `trust_remote_code=True` when loading the model.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def trust_remote_code_for_model(model_name: str | Path, *, cache_root: str | Path | None = None) -> bool:
    """Determine whether to trust remote code when loading a model.

    Returns ``True`` for model identifiers owned by trusted organizations,
    including configured Hugging Face cache snapshots for those organizations.

    Args:
        model_name: HuggingFace model identifier or local path.
        cache_root: Hugging Face Hub cache root. Defaults to the configured hub cache.

    Returns:
        Whether to set ``trust_remote_code=True`` when loading the model.
    """
    return ModelRef.parse(model_name, cache_root=cache_root).trust_remote_code

`cleanup_memory()`

Run garbage collection and empty the CUDA cache.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def cleanup_memory() -> None:
    """Run garbage collection and empty the CUDA cache."""
    import torch

    gc.collect()
    with torch.no_grad():
        torch.cuda.empty_cache()
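`cleanup_memory` imports `torch` inside the function body, following the module's deferred-import approach for optional dependencies. The sketch below is a hypothetical variant, `cleanup_memory_if_available`, that additionally tolerates a missing `torch` install. It relies on `torch.cuda.empty_cache()` being a no-op when CUDA was never initialized, so it is also safe on CPU-only machines.

```python
import gc
import importlib.util


def cleanup_memory_if_available() -> bool:
    """Hypothetical variant of cleanup_memory that degrades gracefully without torch.

    Returns True when torch was importable and empty_cache() was called.
    """
    gc.collect()  # always reclaim unreachable Python objects first
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # deferred, so importing this module never requires torch

    torch.cuda.empty_cache()  # releases cached allocator blocks; no-op if CUDA is uninitialized
    return True
```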

`gpu_stats()`

Log peak GPU memory reservation and total capacity.

Queries CUDA device 0 and logs the peak reserved memory and the device's total memory, in GiB.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def gpu_stats() -> None:
    """Log peak GPU memory reservation and total capacity.

    Queries CUDA device 0 and logs the peak reserved memory and the
    device's total memory, in GiB.
    """
    import torch

    def round_gb(value: float) -> float:
        # Bytes to GiB (1024**3 bytes), rounded to 3 decimal places.
        return round(value / 1024 / 1024 / 1024, 3)

    # `props` avoids shadowing the enclosing function's name.
    props = torch.cuda.get_device_properties(0)
    # Pass device 0 explicitly so both figures describe the same GPU,
    # regardless of which device is current.
    peak_reserved = round_gb(torch.cuda.max_memory_reserved(0))
    total_memory = round_gb(props.total_memory)
    logger.info(f"{peak_reserved} GiB of memory reserved (peak).")
    logger.info(f"GPU = {props.name}. Max memory = {total_memory} GiB.")
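The nested rounding helper in `gpu_stats` divides by 1024 three times, so the logged figures are gibibytes (2^30 bytes), about 7% larger than decimal gigabytes. A worked sketch of the same conversion; `bytes_to_gib` is a hypothetical standalone copy of that helper.

```python
def bytes_to_gib(value: float, ndigits: int = 3) -> float:
    """Hypothetical: convert a byte count to GiB (2**30 bytes), rounded like gpu_stats does."""
    return round(value / 1024 / 1024 / 1024, ndigits)


# An 80 GiB device reports total_memory == 80 * 2**30 bytes.
print(bytes_to_gib(80 * 2**30))     # 80.0
# 1.5e9 bytes is 1.5 decimal GB but only about 1.397 GiB.
print(bytes_to_gib(1_500_000_000))  # 1.397
```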

`get_max_vram(max_vram_fraction=None)`

Return vLLM-style GPU utilization fractions for each available GPU.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def get_max_vram(max_vram_fraction: float | None = None) -> dict[int, float]:
    """Return vLLM-style GPU utilization fractions for each available GPU."""
    return {device: allocation.utilization for device, allocation in _get_vram_allocations(max_vram_fraction).items()}

`get_max_memory_map(max_vram_fraction=None)`

Return Hugging Face `max_memory` byte limits for each available GPU.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def get_max_memory_map(max_vram_fraction: float | None = None) -> dict[int, int]:
    """Return Hugging Face ``max_memory`` byte limits for each available GPU."""
    return {device: allocation.memory_bytes for device, allocation in _get_vram_allocations(max_vram_fraction).items()}
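`get_max_vram` and `get_max_memory_map` are two views of the same per-device allocation: a utilization fraction (the form vLLM's `gpu_memory_utilization` takes) and an absolute byte limit (the form Hugging Face's `max_memory` takes). The private `_get_vram_allocations` is not shown on this page, so the sketch below assumes the byte limit is simply each device's total memory times the fraction; `VramAllocation` and `allocations_from_totals` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VramAllocation:
    utilization: float  # fraction of the device's memory, vLLM-style
    memory_bytes: int   # absolute byte limit, Hugging Face max_memory-style


def allocations_from_totals(totals: dict[int, int], fraction: float) -> dict[int, VramAllocation]:
    """Hypothetical: derive both views from per-device total memory in bytes."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError(f"fraction must be in (0, 1], got {fraction}")
    return {dev: VramAllocation(fraction, int(total * fraction)) for dev, total in totals.items()}


allocs = allocations_from_totals({0: 80 * 2**30, 1: 80 * 2**30}, 0.5)
max_vram = {dev: a.utilization for dev, a in allocs.items()}     # {0: 0.5, 1: 0.5}
max_memory = {dev: a.memory_bytes for dev, a in allocs.items()}  # {0: 42949672960, 1: 42949672960}
```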

`add_bos_eos_tokens_to_tokenizer(tokenizer)`

Enable BOS/EOS token injection and set a pad token if missing.

Mutates `tokenizer` in-place to set `add_bos_token` and `add_eos_token` to `True`. If no pad token is configured, `pad_token_id` is set to `eos_token_id`.

Parameters:

Name	Type	Description	Default
`tokenizer`	`PreTrainedTokenizer`	The tokenizer to configure.	required

Returns:

Type	Description
`PreTrainedTokenizer`	The same tokenizer instance, modified in-place.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def add_bos_eos_tokens_to_tokenizer(tokenizer: PreTrainedTokenizer) -> PreTrainedTokenizer:
    """Enable BOS/EOS token injection and set a pad token if missing.

    Mutates ``tokenizer`` in-place to set ``add_bos_token`` and
    ``add_eos_token`` to ``True``.  If no pad token is configured,
    ``pad_token_id`` is set to ``eos_token_id``.

    Args:
        tokenizer: The tokenizer to configure.

    Returns:
        The same tokenizer instance, modified in-place.
    """
    tokenizer.add_bos_token = True
    tokenizer.add_eos_token = True
    # Compare with None: 0 is a valid pad token ID and must not be overwritten.
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token_id = tokenizer.eos_token_id
    return tokenizer

`get_param_from_config(param, default_value=None, model_name=None, trust_remote_code=None, config=None)`

Read a single attribute from a HuggingFace `AutoConfig`.

Either an existing `config` object or a `model_name` (used to load one on the fly) must be provided.

Parameters:

Name	Type	Description	Default
`param`	`str`	Name of the config attribute to retrieve.	required
`default_value`	`Any | None`	Fallback value when the attribute is absent.	`None`
`model_name`	`str | None`	HuggingFace model identifier. Required when `config` is not supplied.	`None`
`trust_remote_code`	`bool | None`	Passed through to `AutoConfig.from_pretrained` when loading a config.	`None`
`config`	`AutoConfig | None`	Pre-loaded `AutoConfig`. Takes precedence over `model_name`.	`None`

Returns:

Type	Description
`str | None`	The attribute value, or `default_value` if the attribute does not exist on the config.

Raises:

Type	Description
`ValueError`	If neither `model_name` nor `config` is provided.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def get_param_from_config(
    param: str,
    default_value: Any | None = None,
    model_name: str | None = None,
    trust_remote_code: bool | None = None,
    config: AutoConfig | None = None,
) -> str | None:
    """Read a single attribute from a HuggingFace ``AutoConfig``.

    Either an existing ``config`` object or a ``model_name`` (used to
    load one on the fly) must be provided.

    Args:
        param: Name of the config attribute to retrieve.
        default_value: Fallback value when the attribute is absent.
        model_name: HuggingFace model identifier.  Required when
            ``config`` is not supplied.
        trust_remote_code: Passed through to
            ``AutoConfig.from_pretrained`` when loading a config.
        config: Pre-loaded ``AutoConfig``.  Takes precedence over
            ``model_name``.

    Returns:
        The attribute value, or ``default_value`` if the attribute does
        not exist on the config.

    Raises:
        ValueError: If neither ``model_name`` nor ``config`` is provided.
    """
    from transformers import AutoConfig

    if config is None:
        if model_name is None:
            raise ValueError("model_name is required if config is not provided")
        config = AutoConfig.from_pretrained(model_name, trust_remote_code=trust_remote_code)

    return getattr(config, param, default_value)

`load_fast_tokenizer(model_name_or_path, **kwargs)`

Load a tokenizer, preferring the Rust `tokenizers` backend.

Centralizes our tokenizer loads so we consistently request the fast (Rust) backend that transformers v5 auto-selects, and log when the selected backend falls back to the slow Python implementation.

Why this matters under v5: transformers v5 consolidated the previously split `tokenization_*.py` / `tokenization_*_fast.py` modules into a single file per model with automatic backend selection. `use_fast` defaults to `True`, but a small set of models with no Rust port (older SentencePiece-only checkpoints) still resolve to the slow backend. Logging that fallback gives operators a clear signal when tokenization is on the slow path, which matters in our data-prep pipeline because assembling training examples is tokenizer-bound.

Parameters:

Name	Type	Description	Default
`model_name_or_path`	`Path | str`	HuggingFace model id or local path.	required
`**kwargs`	`Any`	Forwarded to `AutoTokenizer.from_pretrained` (e.g. `model_max_length`, `trust_remote_code`). `use_fast` is forced to `True`.	`{}`

Returns:

Type	Description
`PreTrainedTokenizer`	Loaded `PreTrainedTokenizer` (Rust-backed when available).

Source code in src/nemo_safe_synthesizer/llm/utils.py

def load_fast_tokenizer(model_name_or_path: Path | str, **kwargs: Any) -> PreTrainedTokenizer:
    """Load a tokenizer, preferring the Rust ``tokenizers`` backend.

    Centralizes our tokenizer loads so we consistently request the fast
    (Rust) backend that transformers v5 auto-selects, and log when the
    selected backend falls back to the slow Python implementation.

    Why this matters under v5: transformers v5 consolidated the previously
    split ``tokenization_*.py`` / ``tokenization_*_fast.py`` modules into
    a single file per model with automatic backend selection. ``use_fast``
    defaults to ``True``, but a small set of models with no Rust port
    (older SentencePiece-only checkpoints) still resolve to the slow
    backend. Surfacing that fallback gives operators a clear signal when
    tokenization is on the slow path — meaningful in our data-prep
    pipeline where assembling training examples is tokenizer-bound.

    Args:
        model_name_or_path: HuggingFace model id or local path.
        **kwargs: Forwarded to ``AutoTokenizer.from_pretrained`` (e.g.
            ``model_max_length``, ``trust_remote_code``). ``use_fast`` is
            forced to ``True``.

    Returns:
        Loaded ``PreTrainedTokenizer`` (Rust-backed when available).
    """
    from transformers import AutoTokenizer, PreTrainedTokenizer

    kwargs["use_fast"] = True
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, **kwargs)
    if not getattr(tokenizer, "is_fast", False):
        logger.warning(
            "Loaded slow (Python) tokenizer for %r — no Rust backend available. "
            "Data-prep tokenization will be ~5-10x slower than the fast path.",
            str(model_name_or_path),
        )
    return cast(PreTrainedTokenizer, tokenizer)
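The loader's policy is: force `use_fast=True`, load, then read the tokenizer's `is_fast` attribute and warn on fallback. A dependency-free sketch with an injected loader; `load_preferring_fast` and `fake_loader` are hypothetical, and the fake loader only imitates the single attribute the check reads.

```python
import logging
from types import SimpleNamespace
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_preferring_fast(loader: Callable[..., Any], name: str, **kwargs: Any) -> Any:
    """Hypothetical: call ``loader`` with use_fast forced on and warn on a slow fallback."""
    kwargs["use_fast"] = True  # overrides any caller-supplied use_fast=False
    tokenizer = loader(name, **kwargs)
    if not getattr(tokenizer, "is_fast", False):
        logger.warning("Loaded slow (Python) tokenizer for %r; no Rust backend available.", name)
    return tokenizer


def fake_loader(name: str, **kwargs: Any) -> SimpleNamespace:
    # Stand-in for AutoTokenizer.from_pretrained: pretends "legacy-spm" has no Rust port.
    return SimpleNamespace(name=name, is_fast=name != "legacy-spm", kwargs=kwargs)
```

Calling `load_preferring_fast(fake_loader, "legacy-spm")` returns a tokenizer with `is_fast == False` and emits the warning, while any other name loads silently.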

`get_device_name()`

Return the name of CUDA device 0, or `'undefined'` if it cannot be resolved (for example, when `torch` is not installed or no CUDA device is available).

Source code in src/nemo_safe_synthesizer/llm/utils.py

def get_device_name() -> str:
    """Return the name of CUDA device 0, or 'undefined' if it cannot be resolved."""
    # torch may be absent (CPU-only install); CUDA/driver problems surface as
    # RuntimeError/AssertionError from get_device_properties. Anything else is
    # unexpected and should propagate rather than masquerade as 'undefined'.
    try:
        import torch

        return torch.cuda.get_device_properties(0).name
    except (ImportError, RuntimeError, AssertionError):
        logger.debug("Could not resolve CUDA device name; reporting 'undefined'.", exc_info=True)
        return "undefined"

`get_device_map(model_target, autoconfig=None, revision=None, trust_remote_code=False, local_files_only=False, force_single_device=None)`

Infer the device map for a model and optionally pin all layers to one device.

Uses `accelerate.infer_auto_device_map` on an empty-weight model skeleton to determine layer-to-device assignments.

Parameters:

Name	Type	Description	Default
`model_target`	`str`	HuggingFace model identifier or local path.	required
`autoconfig`	`AutoConfig | None`	Pre-loaded `AutoConfig`. If `None`, one is loaded from `model_target`.	`None`
`revision`	`str | None`	Model revision (branch, tag, or commit hash).	`None`
`trust_remote_code`	`bool`	Whether to trust remote code when loading.	`False`
`local_files_only`	`bool`	Restrict loading to local files only.	`False`
`force_single_device`	`int | None`	When set, every layer is assigned to this CUDA device index.	`None`

Returns:

Type	Description
`str | dict[str, int | str]`	Ordered dictionary mapping layer names to device identifiers.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def get_device_map(
    model_target: str,
    autoconfig: AutoConfig | None = None,
    revision: str | None = None,
    trust_remote_code: bool = False,
    local_files_only: bool = False,
    force_single_device: int | None = None,
) -> str | dict[str, int | str]:
    """Infer the device map for a model and optionally pin all layers to one device.

    Uses ``accelerate.infer_auto_device_map`` on an empty-weight model
    skeleton to determine layer-to-device assignments.

    Args:
        model_target: HuggingFace model identifier or local path.
        autoconfig: Pre-loaded ``AutoConfig``.  If ``None``, one is
            loaded from ``model_target``.
        revision: Model revision (branch, tag, or commit hash).
        trust_remote_code: Whether to trust remote code when loading.
        local_files_only: Restrict loading to local files only.
        force_single_device: When set, every layer is assigned to this
            CUDA device index.

    Returns:
        Ordered dictionary mapping layer names to device identifiers.
    """
    from accelerate import infer_auto_device_map, init_empty_weights
    from transformers import AutoConfig, AutoModelForCausalLM

    config = autoconfig or AutoConfig.from_pretrained(
        model_target,
        revision=revision,
        trust_remote_code=trust_remote_code,
        local_files_only=local_files_only,
    )
    # Create an empty model with the configuration
    with init_empty_weights():
        model = AutoModelForCausalLM.from_config(config, trust_remote_code=trust_remote_code)
    device_map = infer_auto_device_map(model)
    if force_single_device is not None:
        for key in device_map:
            device_map[key] = force_single_device
    return device_map

`count_trainable_params(model)`

Count trainable and total parameters in a PEFT model.

Parameters:

Name	Type	Description	Default
`model`	`PeftModel`	A `PeftModel` (or any `torch.nn.Module`) to inspect.	required

Returns:

Type	Description
`tuple[int, int]`	A tuple of `(trainable_params, all_params)`.

Source code in src/nemo_safe_synthesizer/llm/utils.py

def count_trainable_params(model: PeftModel) -> tuple[int, int]:
    """Count trainable and total parameters in a PEFT model.

    Args:
        model: A ``PeftModel`` (or any ``torch.nn.Module``) to inspect.

    Returns:
        A tuple of ``(trainable_params, all_params)``.
    """
    trainable_params = 0
    all_params = 0
    for _, param in model.named_parameters():
        all_params += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    return trainable_params, all_params
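Training logs usually report the trainable percentage alongside the two counts (PEFT's own `PeftModel.print_trainable_parameters()` prints a similar summary). A sketch that reuses the same counting loop on hypothetical `FakeParam` stand-ins (anything with `numel()` and `requires_grad`), so it runs without `torch` or `peft`; the LoRA-style parameter names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter."""
    size: int
    requires_grad: bool

    def numel(self) -> int:
        return self.size


def trainable_summary(named_params: list[tuple[str, FakeParam]]) -> tuple[int, int, float]:
    trainable = total = 0
    for _, param in named_params:
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    pct = 100.0 * trainable / total if total else 0.0  # guard against an empty model
    return trainable, total, pct


params = [
    ("base.layer.weight", FakeParam(49_152, False)),      # frozen base weights
    ("base.layer.lora_A.weight", FakeParam(8_192, True)),  # trainable adapter
    ("base.layer.lora_B.weight", FakeParam(8_192, True)),
]
print(trainable_summary(params))  # (16384, 65536, 25.0)
```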

`get_quantization_config(scheme)`

Compatibility wrapper for building a transformers v5 quantization config.

Accepts a `QuantizationScheme` (or its string value) for new callers, or an integer `4` / `8` for backward compatibility with the legacy `quantization_bits` field (4 → `bnb-4bit`, 8 → `bnb-8bit`). New code should prefer `nemo_safe_synthesizer.config.training.QuantizationScheme.to_transformers_config`.

Parameters:

Name	Type	Description	Default
`scheme`	`QuantizationScheme | str | Literal[4, 8]`	A `QuantizationScheme` value, its string equivalent (e.g. `"nvfp4"`), or the legacy bit-count alias.	required

Returns:

Type	Description
`QuantizationConfigMixin`	A transformers `QuantizationConfigMixin` subclass instance (`BitsAndBytesConfig`, `FineGrainedFP8Config`, `TorchAoConfig`, or `Mxfp4Config`) ready to pass to `from_pretrained()` via the `quantization_config=` kwarg.

Raises:

Type	Description
`ValueError`	If `scheme` is not a recognized scheme name or bit count.
`ImportError`	If the underlying quantization backend is not installed (e.g. torchao for NVFP4 / MXFP4).

Source code in src/nemo_safe_synthesizer/llm/utils.py

def get_quantization_config(scheme: QuantizationScheme | str | Literal[4, 8]) -> QuantizationConfigMixin:
    """Compatibility wrapper for building a transformers v5 quantization config.

    Accepts a :class:`QuantizationScheme` (or its string value) for new
    callers, or an integer ``4`` / ``8`` for backward compatibility with the
    legacy ``quantization_bits`` field (4 → ``bnb-4bit``, 8 → ``bnb-8bit``).
    New code should prefer
    :meth:`nemo_safe_synthesizer.config.training.QuantizationScheme.to_transformers_config`.

    Args:
        scheme: A ``QuantizationScheme`` value, its string equivalent
            (e.g. ``"nvfp4"``), or the legacy bit-count alias.

    Returns:
        A transformers ``QuantizationConfigMixin`` subclass instance
        (``BitsAndBytesConfig``, ``FineGrainedFP8Config``, ``TorchAoConfig``,
        or ``Mxfp4Config``) ready to pass to ``from_pretrained()`` via the
        ``quantization_config=`` kwarg.

    Raises:
        ValueError: If ``scheme`` is not a recognized scheme name or bit count.
        ImportError: If the underlying quantization backend is not installed
            (e.g. torchao for NVFP4 / MXFP4).
    """
    from ..config.training import QuantizationScheme

    return QuantizationScheme.from_alias(scheme).to_transformers_config()
