v0.3.1: Multi-modal DAPO

@hiyouga

What's Changed

[example] fix runtime env by @hiyouga in #224
Update Awesome Work using EasyR1 by @RainBowLuoCS in #240
Update Awesome Work using EasyR1 by @xyliugo in #239
[trainer] support async reward by @hiyouga in #252
[readme] add baselines by @hiyouga in #253
[script] fix merge script by @hiyouga in #254
[misc] update baselines & docker image by @hiyouga in #256
[readme] update baseline by @hiyouga in #258
[data] support custom chat template by @hiyouga in #270
[reward] support batch reward by @hiyouga in #271
[example] change env vars by @hiyouga in #272
[Readme] Add awesome work using EasyR1 by @Wangbiao2 in #273
[model] add qwen3 support by @hiyouga in #276
[example] update script by @hiyouga in #277
[readme] update wechat by @hiyouga in #280
[misc] fix logger by @hiyouga in #288
[readme] update wechat by @hiyouga in #292
add get model from modelscope by @Saigyouji-Yuyuko1000 in #297
[readme] update wechat by @hiyouga in #301
Add a new work based on EasyR1 by @LiuRicky in #303
add new work based on EasyR1 by @waltonfuture in #313
[logger] fix tensorboard by @hiyouga in #316
Add a new work based on EasyR1 by @Gabesarch in #325
[misc] fix console hanging by @hiyouga in #293
[misc] several update by @hiyouga in #329
Update README.md by @CSfufu in #330
[perf] pass raw image data between workers by @tongxiao2002 in #318
[readme] add our work using EasyR1 by @kxfan2002 in #331
add our work using EasyR1 by @YutingLi0606 in #337
[data] fix position ids for qwen2vl mrope & add test by @hiyouga in #339
[worker] colocate actor and ref model by @hiyouga in #342
[trainer] save best checkpoint by @hiyouga in #343
[trainer] fix bug by @hiyouga in #344
[utils] update data protocol by @hiyouga in #345
[trainer] repeat rollout and prepare filter by @hiyouga in #346
[worker] expose rollout manager by @hiyouga in #347
[worker] fix vllm sharding manager by @hiyouga in #348
fix: bug by @gdw439 in #350
[trainer] fix progress bar by @hiyouga in #355
[readme] update docker image by @hiyouga in #357
[trainer] add online filtering by @Saigyouji-Yuyuko1000 in #358
[worker] update reward manager by @hiyouga in #360
Fix/vllm processor cache for text only model by @cyc00518 in #359
[breaking] support text-image mixed data by @hiyouga in #361
[model] fix qwen2vl bug by @hiyouga in #363
[tracking] add tensorboard exp name by @hiyouga in #365
[worker] do not load ref if kl is disabled by @hiyouga in #366
[worker] fix skip ref model by @hiyouga in #367
[examples] add qwen3_14b_dapo17k_dapo by @Saigyouji-Yuyuko1000 in #369
[release] 0.3.1 by @hiyouga in #370

New Contributors

@RainBowLuoCS made their first contribution in #240
@xyliugo made their first contribution in #239
@Saigyouji-Yuyuko1000 made their first contribution in #297
@waltonfuture made their first contribution in #313
@Gabesarch made their first contribution in #325
@CSfufu made their first contribution in #330
@tongxiao2002 made their first contribution in #318
@kxfan2002 made their first contribution in #331
@YutingLi0606 made their first contribution in #337
@gdw439 made their first contribution in #350
@cyc00518 made their first contribution in #359

Full Changelog: v0.3.0...v0.3.1
