Minimal implementation of Group Relative Policy Optimization (GRPO), the algorithm introduced by DeepSeek, from scratch. No complicated file structure: just a simple, hackable implementation with a few scripts for a better understanding of the algorithm.
Inspired by @aburkov's implementation. This version reduces memory usage during training by:
- Using chunk-wise softmax operations
- Leveraging mixed precision training
Together, these techniques cut memory usage by 50%, enabling GRPO to run on a single GPU while achieving strong results on math datasets.
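For intuition, here is a hedged sketch of both tricks together: per-token log-probabilities computed chunk by chunk over the rows of the flattened logits, with the forward pass wrapped in `torch.autocast`. The function name, tensor shapes, and `chunk_size` are illustrative assumptions, not the exact code in `train.py`.

```python
# Sketch only: chunked log-prob computation under mixed precision.
# Shapes, names, and chunk_size are illustrative assumptions.
import torch
import torch.nn.functional as F

def selected_log_probs(logits: torch.Tensor,
                       token_ids: torch.Tensor,
                       chunk_size: int = 1024) -> torch.Tensor:
    """Return log p(token_ids) without materializing a second full
    (batch*seq, vocab) log-softmax tensor all at once.

    logits:    (batch, seq, vocab)
    token_ids: (batch, seq)
    """
    flat_logits = logits.reshape(-1, logits.size(-1))   # (N, vocab)
    flat_ids = token_ids.reshape(-1)                    # (N,)
    out = torch.empty_like(flat_ids, dtype=logits.dtype)
    for start in range(0, flat_logits.size(0), chunk_size):
        chunk = flat_logits[start:start + chunk_size]
        # Softmax is per-row, so chunking over rows is exact and keeps
        # the peak intermediate activation small.
        log_probs = F.log_softmax(chunk, dim=-1)
        out[start:start + chunk_size] = log_probs.gather(
            1, flat_ids[start:start + chunk_size].unsqueeze(1)).squeeze(1)
    return out.reshape(token_ids.shape)

# Mixed precision: run the forward pass in bf16 via autocast, e.g.:
# with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
#     logits = model(input_ids).logits
#     token_log_probs = selected_log_probs(logits, labels)
```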
## 🚀 Quick Start

Set up the environment:

```bash
bash set.sh
```
Train GRPO on the GSM8K dataset (Qwen-2.5-Instruct-1.5B):

```bash
python train.py
```
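As background for what the training step optimizes: GRPO samples a group of completions per prompt and standardizes each completion's reward against the group's mean and standard deviation, so no learned value network is needed. Below is a minimal sketch of that group-relative advantage computation; the function name and example rewards are assumptions for illustration, not the repo's API.

```python
# Hedged sketch of GRPO's group-relative advantage (not train.py's exact code).
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """rewards: (num_prompts, group_size), one scalar reward per sampled completion.
    Each completion's advantage is its reward standardized within its own group."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 1 prompt, 4 sampled completions, binary correctness rewards.
adv = group_relative_advantages(torch.tensor([[1.0, 0.0, 0.0, 1.0]]))
# Above-average completions get positive advantage, below-average negative.
```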
Test the model output:

```bash
python test.py
```
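If you want a quick manual check outside of `test.py`, a minimal generation snippet with Hugging Face transformers might look like the following; the model id, prompt, and decoding settings here are assumptions, and `test.py`'s actual logic may differ.

```python
# Hypothetical quick check with transformers; test.py may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # assumed base model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")

# A GSM8K-style word problem as the prompt.
prompt = ("Natalia sold clips to 48 of her friends in April, and then she "
          "sold half as many clips in May. How many clips did Natalia sell altogether?")
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
# Print only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```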
## 🤝 Contributing

Feel free to submit issues, PRs, or suggestions to improve the implementation!
## ⚡ Acknowledgments

Inspired by @aburkov's work in The LM Book (https://github.com/aburkov/theLMbook).