# Training using verl
!!! info

    The pipeline starting script is
    All extra parameters are passed to
## PPO with verl
Here is an example of running a PPO job with verl. You can use `nemo_skills/training/verl/prepare_data.py` to convert our standard SFT data format into parquet.
```python
from nemo_skills.pipeline.cli import wrap_arguments, ppo_verl

ppo_verl(
    ctx=wrap_arguments(
        '++trainer.save_freq=10 '
        '++data.train_batch_size=32 '
        '++data.filter_prompts=False '
        '++actor_rollout_ref.rollout.gpu_memory_utilization=0.7 '
        '++data.max_response_length=12000 '
        '++actor_rollout_ref.rollout.n=64 '
        '++actor_rollout_ref.rollout.tensor_model_parallel_size=2 '
    ),
    cluster="slurm",
    expname="test-verl-ppo",
    output_dir="/workspace/test-verl-ppo",
    hf_model="/hf_models/Qwen2.5-1.5B-Instruct",
    prompt_data="/data/rl-data.parquet",
    num_gpus=8,
    num_nodes=2,
)
```