Nemotron-Math-v2

Nemotron-Math-v2 Dataset

Using our pipelines, we created the Nemotron-Math-v2 dataset. It contains:

  • 350K unique mathematical problems sourced from AoPS forums, Math Stack Exchange, and MathOverflow
  • 7.5M natural language solutions generated by gpt-oss-120b
    • with and without Python tool use
    • in 3 reasoning regimes: high, medium, and low
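
To make the dataset's shape concrete, the following minimal sketch enumerates the generation settings implied by the list above and computes the average number of solutions per problem. It assumes that every combination of tool use and reasoning regime is present, which the list does not state explicitly. The names `python_tool` and `reasoning` are illustrative only and are not the dataset's actual field names.

```python
from itertools import product

# Settings taken from the description above.
TOOL_USE = [True, False]                  # with and without Python tool use
REASONING = ["high", "medium", "low"]     # the 3 reasoning regimes

# Assumption: every tool-use setting is paired with every reasoning regime.
# The key names are hypothetical, chosen for illustration only.
configs = [
    {"python_tool": tool, "reasoning": regime}
    for tool, regime in product(TOOL_USE, REASONING)
]

# Rounded figures from the description: 7.5M solutions over 350K problems.
avg_solutions_per_problem = 7_500_000 / 350_000

if __name__ == "__main__":
    for config in configs:
        print(config)
    print(f"{len(configs)} generation settings")
    print(f"about {avg_solutions_per_problem:.1f} solutions per problem on average")
```

Because the counts are rounded, the average is only an estimate, and individual problems may have more or fewer solutions.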

We used Qwen2.5-32B-Instruct to preprocess problems and gpt-oss-120b to generate solutions.

See our paper to learn more details!

How to reproduce our results

Browse the sections below to see all commands needed to fully reproduce our results.

Please note that unless you have access to a large GPU cluster, some of the commands might take a very long time to complete!