Tags: mbrukman/sail-sg-oat
feat: fix deps, refactor apis, allow resume training (sail-sg#39)
* fix deps, refactor apis
* bump version
* updates
* actor identity
* fix ref offload
* training resume
* bump version

Upgrade to vllm V1 (0.8.4) and use actor api init() (sail-sg#38)
* updates
* bump version

Upgrade vllm for more efficient collocation (sail-sg#34)
* upgrade vllm & adopt collective_rpc
* use .float() for kl & increase timeout to 60m
* speed up minibatch training
* add constant lr scheduler
* update
* updates
* fix non_eos detection
* changes
* minor
* update
* ratio
* updates

Refactor and add PPO for math reasoning (sail-sg#25)
* huge refactor to make structure clearer and more extendable
* sync
* fix
* update docs
* bump version
* update logo
* minor
* minor
* fix images
* minor