fangyuan-ksgk/Tiny-GRPO
Tiny_po

Minimal implementation of Group Relative Policy Optimization (GRPO, from DeepSeek) from scratch. No complicated file structure: just a simple, hackable implementation with a few scripts for a better understanding of the algorithm.
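The core idea of GRPO is to score a group of sampled completions per prompt and normalize each reward against the group's own mean and standard deviation, in place of a learned value baseline. A minimal sketch of that group-relative advantage (function and variable names here are illustrative, not this repo's API):

```python
# Group-relative advantage, the heart of GRPO: sample G completions
# for one prompt, reward each, then normalize rewards within the group.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of scalar rewards to zero mean, unit std."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 completions for one prompt, rewarded 1.0 for a correct
# final answer and 0.0 otherwise (as on gsm8k-style math tasks).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions get positive advantages and incorrect ones negative, so the policy gradient pushes probability toward the better answers within each group.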

Inspired by the implementation by @aburkov. This implementation optimizes memory usage during training by:

  • Using chunk-wise softmax operations
  • Leveraging mixed precision training

Together, these techniques reduce memory usage by roughly 50%, enabling GRPO to run on a single GPU while achieving strong results on math datasets.
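The chunk-wise trick avoids materializing a full log-softmax over the entire (sequence length × vocabulary) logit matrix at once; only the log-probability of each chosen token is kept. A pure-Python stand-in for what the repo does on GPU tensors (names and chunking granularity are illustrative):

```python
# Compute log-probs of the chosen tokens chunk by chunk over the
# sequence, so at most chunk_size softmax rows are live at a time.
import math

def selected_logprobs_chunked(logits, targets, chunk_size=2):
    """logits: list of per-position vocab score lists; targets: chosen ids."""
    out = []
    for start in range(0, len(logits), chunk_size):
        for row, t in zip(logits[start:start + chunk_size],
                          targets[start:start + chunk_size]):
            m = max(row)                       # stabilize the exponentials
            logz = m + math.log(sum(math.exp(x - m) for x in row))
            out.append(row[t] - logz)          # log-softmax at the target id
    return out

logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
lps = selected_logprobs_chunked(logits, targets=[0, 1], chunk_size=1)
```

The result is identical for any chunk size; only peak memory changes. In the actual training code the same idea applies along the sequence dimension of the model's logit tensor, and mixed precision halves the size of those live chunks again.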

Set up the environment:

bash set.sh 

Train GRPO on the gsm8k dataset (Qwen-2.5-Instruct-1.5B):

python train.py

Test model output:

python test.py

🤝 Contributing Feel free to submit issues, PRs, or suggestions to improve the implementation!

⚡ Acknowledgments Inspired by @aburkov's work in The LM Book (https://github.com/aburkov/theLMbook).
