Guide to Fine-Tuning NVIDIA NeMo Models with Granary Data
Introduction
The Granary dataset is one of the largest and most diverse open-source collections of European speech data available today. Designed to advance research and development in automatic speech recognition (ASR) and automatic speech translation (AST), it provides approximately 643,000 hours of audio paired with transcripts for ASR and around 351,000 hours of aligned translation pairs for AST. Its recordings are sourced from several Creative Commons corpora, such as YODAS2, YouTube-Commons, VoxPopuli, and Libri-Light, and each sample is carefully reviewed so that only clear, high-quality audio and accurate transcripts are included. Because the dataset provides consistent segment boundaries and normalized text across 25 European languages (including Italian), it removes much of the preprocessing burden and lets you focus on model development or evaluation.