Fine-Tune Hugging Face Models Instantly with Day-0 Support in NVIDIA NeMo AutoModel
As organizations strive to maximize the value of their generative AI investments, access to the latest model developments is crucial for continued success. By adopting state-of-the-art models on Day-0, teams can harness these innovations efficiently, maintain relevance, and stay competitive.
The past year has seen a flurry of exciting model releases in the open-source community, including Meta Llama, Google Gemma, Mistral's Codestral, Codestral Mamba, Mistral Large 2, and Mixtral, Qwen 2, 2.5, and 3, DeepSeek-R1, NVIDIA Nemotron, and NVIDIA Llama Nemotron. These models are often made available on the Hugging Face Hub, providing the broader community with easy access.
Shortly after a release, many users focus on evaluating model capabilities and exploring potential applications. Fine-tuning for specific use cases often becomes a key priority, both to understand a model's potential and to identify opportunities for innovation.
The NVIDIA NeMo Framework uses the NVIDIA Megatron-Core and Transformer Engine (TE) backends to achieve high throughput and Model FLOPs Utilization (MFU) on thousands of NVIDIA GPUs, driving exceptional performance. However, integrating a new model architecture into the NeMo Framework requires a multi-stage model conversion using Megatron-Core primitives, followed by validation of several phases, including supervised and parameter-efficient fine-tuning, model evaluation, and Hugging Face-to-NeMo conversion. This introduces a delay between a model's release and the availability of optimal training and post-training recipes.
To ensure Day-0 support for the latest models, the NeMo Framework introduces the Automatic Model (AutoModel) feature.
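To make the idea concrete, here is a minimal sketch of what Day-0 fine-tuning of a Hugging Face checkpoint through AutoModel can look like. It assumes the NeMo 2.0 Python API (the `llm.HFAutoModelForCausalLM` wrapper and `llm.finetune` entry point from the NeMo LLM collection); the model name, dataset, and hyperparameters below are illustrative assumptions, not a tested recipe.

```python
# Sketch: fine-tuning a Hugging Face checkpoint via NeMo AutoModel.
# Assumes NeMo 2.0 is installed and exposes llm.HFAutoModelForCausalLM,
# llm.HFDatasetDataModule, and llm.finetune; names and values below are
# illustrative, not an official recipe.

MODEL_NAME = "meta-llama/Llama-3.2-1B"  # any causal-LM checkpoint on the Hub
MAX_STEPS = 100                          # short demo run, not a full schedule
DEVICES = 1                              # number of GPUs


def finetune_automodel(model_name: str = MODEL_NAME) -> None:
    """Launch a short supervised fine-tuning job through NeMo AutoModel."""
    # Local imports so the sketch can be read without NeMo installed.
    from nemo import lightning as nl
    from nemo.collections import llm

    llm.finetune(
        # Wraps the Hugging Face checkpoint directly -- no Megatron-Core
        # conversion step, which is the point of AutoModel's Day-0 support.
        model=llm.HFAutoModelForCausalLM(model_name=model_name),
        # Stream a Hub dataset straight into the training loop (assumed API).
        data=llm.HFDatasetDataModule("rajpurkar/squad", split="train"),
        trainer=nl.Trainer(
            devices=DEVICES,
            max_steps=MAX_STEPS,
            accelerator="gpu",
        ),
    )


# On a GPU machine with NeMo installed, launch with:
# finetune_automodel()
```

Because AutoModel trains against the Hugging Face checkpoint directly, swapping in a newly released model is, in the ideal case, a one-line change to `MODEL_NAME`.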