Training using verl

Info

The pipeline starting script is

All extra parameters are passed to

PPO with verl

Here is an example of running a PPO job with verl. You can use nemo_skills/training/verl/prepare_data.py to convert our standard SFT data format into parquet.

from nemo_skills.pipeline.cli import wrap_arguments, ppo_verl

ppo_verl(
    # Any ++ arguments are forwarded to the underlying verl trainer config
    ctx=wrap_arguments(
        '++trainer.save_freq=10 '
        '++data.train_batch_size=32 '
        '++data.filter_prompts=False '
        '++actor_rollout_ref.rollout.gpu_memory_utilization=0.7 '
        '++data.max_response_length=12000 '
        '++actor_rollout_ref.rollout.n=64 '
        '++actor_rollout_ref.rollout.tensor_model_parallel_size=2 '
    ),
    cluster="slurm",                               # cluster config to run on
    expname="test-verl-ppo",                       # experiment name
    output_dir="/workspace/test-verl-ppo",         # where checkpoints and logs are saved
    hf_model="/hf_models/Qwen2.5-1.5B-Instruct",   # starting HF checkpoint
    prompt_data="/data/rl-data.parquet",           # prompts prepared by prepare_data.py
    num_gpus=8,
    num_nodes=2,
)
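To illustrate the kind of conversion prepare_data.py performs, here is a minimal sketch that turns SFT-style JSONL records into RL prompt records. The field names (input, expected_answer, prompt) are assumptions for illustration; the actual schema is defined by nemo_skills/training/verl/prepare_data.py and may differ.

```python
import json

def sft_to_rl_records(jsonl_lines):
    """Convert SFT-style JSONL lines into prompt records for RL training.

    Field names here are hypothetical; consult prepare_data.py for the
    real schema expected by verl.
    """
    records = []
    for line in jsonl_lines:
        example = json.loads(line)
        records.append({
            "prompt": example["input"],
            "expected_answer": example.get("expected_answer", ""),
        })
    return records

lines = ['{"input": "What is 2+2?", "expected_answer": "4"}']
print(sft_to_rl_records(lines))
# → [{'prompt': 'What is 2+2?', 'expected_answer': '4'}]
```

The resulting records would then be written to parquet (e.g. with pandas or pyarrow) before being passed as prompt_data; the real script handles this step for you.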