v0.0.9
* upgrade vllm & adopt collective_rpc
* use .float() for kl & increase timeout to 60m
* speed up minibatch training
* add constant lr scheduler
* update
* updates
* fix non_eos detection
* changes
* minor
* update
* ratio
* updates
8000