unsloth_backend

Optimized training backend using Unsloth.

Classes:

| Name | Description |
| --- | --- |
| `UnslothTrainer` | Training backend using Unsloth for optimized LLM fine-tuning. |

UnslothTrainer(*args, **kwargs)

Bases: HuggingFaceBackend

Training backend using Unsloth for optimized LLM fine-tuning.

Extends HuggingFaceBackend to leverage Unsloth's optimized training routines, providing faster training speeds and reduced memory usage compared to standard HuggingFace implementations.

In addition to the arguments accepted by the parent class, **kwargs may include:

  • rope_scaling -- RoPE scaling configuration from model metadata.
  • torch_dtype -- Data type for model weights.
  • quantization_config -- Configuration for model quantization.
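These extra keyword arguments might be assembled as follows. The keys come from the list above; the values are illustrative assumptions only, and the commented-out construction requires a CUDA GPU and an Unsloth installation.

```python
# Hypothetical extra kwargs for UnslothTrainer beyond its parent class.
# Keys are documented above; the values shown here are assumptions.
extra_kwargs = {
    "rope_scaling": {"type": "linear", "factor": 2.0},  # RoPE scaling config
    "torch_dtype": "bfloat16",                          # model weight dtype
    "quantization_config": {"load_in_4bit": True},      # quantization setup
}
# trainer = UnslothTrainer(**extra_kwargs)  # needs CUDA + unsloth installed
```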
See Also

HuggingFaceBackend: Parent class providing base training functionality.

Raises:

| Type | Description |
| --- | --- |
| `RuntimeError` | If CUDA is not available. |
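The CUDA guard in the constructor can be illustrated in isolation. The sketch below mirrors the check from the source, but takes the availability flag as a parameter (a stand-in for `torch.cuda.is_available()`, so the example runs without `torch`); `check_cuda` is a hypothetical helper, not part of the library.

```python
def check_cuda(cuda_available: bool) -> None:
    """Raise if no GPU is present, mirroring UnslothTrainer.__init__'s guard."""
    if not cuda_available:
        raise RuntimeError("Cannot use unsloth without GPU.")

try:
    check_cuda(False)
except RuntimeError as e:
    print(e)  # → Cannot use unsloth without GPU.
```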

Methods:

| Name | Description |
| --- | --- |
| `maybe_quantize` | Apply PEFT wrapping via Unsloth's `FastLanguageModel.get_peft_model`. |
| `load_model` | Load a pretrained model using Unsloth's `FastLanguageModel`. |

Source code in `src/nemo_safe_synthesizer/training/unsloth_backend.py`

```python
def __init__(self, *args, **kwargs):
    from unsloth import FastLanguageModel  # ty: ignore[unresolved-import]

    super().__init__(*args, **kwargs)
    self.model_loader_type = FastLanguageModel

    if not torch.cuda.is_available():
        raise RuntimeError("Cannot use unsloth without GPU.")
    self.prepare_config(**kwargs)
    self._update_for_unsloth(**kwargs)
```

maybe_quantize()

Apply PEFT wrapping via Unsloth's FastLanguageModel.get_peft_model.

This method configures and applies Parameter-Efficient Fine-Tuning (PEFT) using Unsloth's optimized implementation. The PEFT wrapping is always applied to ensure the adapter is saved correctly.

Note

Unlike the parent class implementation, this method uses Unsloth's FastLanguageModel.get_peft_model.

Source code in `src/nemo_safe_synthesizer/training/unsloth_backend.py`

```python
def maybe_quantize(self):
    """Apply PEFT wrapping via Unsloth's ``FastLanguageModel.get_peft_model``.

    This method configures and applies Parameter-Efficient Fine-Tuning (PEFT)
    using Unsloth's optimized implementation. The PEFT wrapping is always
    applied to ensure the adapter is saved correctly.

    Note:
        Unlike the parent class implementation, this method uses Unsloth's
        ``FastLanguageModel.get_peft_model``.
    """
    from unsloth import FastLanguageModel  # ty: ignore[unresolved-import]

    self._prepare_quantize_base()
    qparams = self.quant_params.copy()
    # unsloth infers the task type from the model, so we need to remove it
    # from the quant params
    qparams.pop("task_type", None)
    # Always wrap the model as a PEFT model to ensure adapter is saved correctly
    self.model = FastLanguageModel.get_peft_model(self.model, **qparams)
```
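The `task_type`-stripping step can be demonstrated with a plain dict standing in for the real quantization parameters (the keys and values below are illustrative assumptions, not the trainer's actual defaults):

```python
# Sketch of maybe_quantize's parameter filtering, using illustrative values.
quant_params = {
    "r": 16,
    "lora_alpha": 32,
    "task_type": "CAUSAL_LM",  # unsloth infers this from the model
}
qparams = quant_params.copy()
qparams.pop("task_type", None)  # so it must be dropped before the call
# FastLanguageModel.get_peft_model(model, **qparams) would then receive
# only the remaining keys.
print(sorted(qparams))  # → ['lora_alpha', 'r']
```

Copying before popping keeps `self.quant_params` intact for any later use.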

load_model(**model_args)

Load a pretrained model using Unsloth's FastLanguageModel.

Applies a workaround that disables Unsloth's LLAMA32 support check to prevent unnecessary HuggingFace Hub requests, then calls `prepare_config`, `_load_pretrained_model`, and `maybe_quantize` in sequence.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `**model_args` | | Additional keyword arguments for model configuration. | `{}` |
Note

This method applies a workaround that disables Unsloth's LLAMA32 support check to prevent unnecessary HuggingFace Hub requests. See: https://github.com/unslothai/unsloth/blob/main/unsloth/models/loader.py#L235

Source code in `src/nemo_safe_synthesizer/training/unsloth_backend.py`

```python
def load_model(self, **model_args):
    """Load a pretrained model using Unsloth's ``FastLanguageModel``.

    Applies a workaround that disables Unsloth's LLAMA32 support
    check to prevent unnecessary HuggingFace Hub requests, then
    calls :meth:`prepare_config`, :meth:`_load_pretrained_model`,
    and :meth:`maybe_quantize` in sequence.

    Args:
        **model_args: Additional keyword arguments for model configuration.

    Note:
        This method applies a workaround that disables Unsloth's LLAMA32
        support check to prevent unnecessary HuggingFace Hub requests.
        See: https://github.com/unslothai/unsloth/blob/main/unsloth/models/loader.py#L235
    """
    # NOTE: this hack stops unsloth from reaching out to huggingface, see
    # https://github.com/unslothai/unsloth/blob/main/unsloth/models/loader.py#L235
    from unsloth.models import loader  # ty: ignore[unresolved-import]

    loader.SUPPORTS_LLAMA32 = False
    logger.info(f"load_model: Loading model {self.params.training.pretrained_model} with args: {model_args}")

    self.prepare_config(**model_args)
    self._load_pretrained_model(**model_args)

    # maybe_quantize takes no arguments (see its signature above), so it is
    # called without forwarding model_args.
    self.maybe_quantize()
```
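The call order in `load_model` can be sketched with stub methods standing in for the real implementations. The class below is a hypothetical demo, not library code; only the method names are taken from the source above.

```python
# Minimal sketch of load_model's call sequence using recording stubs.
class _LoadOrderDemo:
    def __init__(self):
        self.calls = []

    def prepare_config(self, **kw):
        self.calls.append("prepare_config")

    def _load_pretrained_model(self, **kw):
        self.calls.append("_load_pretrained_model")

    def maybe_quantize(self):
        self.calls.append("maybe_quantize")

    def load_model(self, **model_args):
        # Same three-step sequence as UnslothTrainer.load_model.
        self.prepare_config(**model_args)
        self._load_pretrained_model(**model_args)
        self.maybe_quantize()

demo = _LoadOrderDemo()
demo.load_model()
print(demo.calls)  # → ['prepare_config', '_load_pretrained_model', 'maybe_quantize']
```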