
Tina: Tiny Reasoning Models via LoRA


GitHub | Website | Hugging Face Collection | Weights & Biases

Overview

This repository contains the code for the Tina project, accompanying the paper Tina: Tiny Reasoning Models via LoRA. In this project, we try to answer the question: "How cost-effectively can one perform reinforcement learning to efficiently instill reasoning abilities in language models?" Specifically, we explore enhancing the reasoning capabilities of tiny language models by applying low-rank adaptation (LoRA) during reinforcement learning.
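
As a rough illustration of the approach (not the repo's training entry point, which is ./scripts/training/post_train_grpo.sh below), the following sketch shows how LoRA can be combined with GRPO using the Hugging Face trl and peft libraries. The reward function, toy dataset, and hyperparameters here are illustrative placeholders; the actual Tina recipes live under ./recipes/.

# Minimal sketch of LoRA-based GRPO post-training with trl + peft.
# Everything below (reward, dataset, hyperparameters) is an illustrative
# placeholder, not the Tina recipe.
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset; the real runs use the open reasoning datasets listed in the Acknowledgements.
train_dataset = Dataset.from_dict({"prompt": ["What is 2 + 2?", "Factor 91 into primes."]})

def length_penalty_reward(completions, **kwargs):
    # Placeholder reward that prefers shorter completions; real recipes use
    # task-specific rewards (e.g., answer correctness and formatting).
    return [-len(c) / 1000.0 for c in completions]

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = GRPOConfig(output_dir="grpo_lora_demo", num_generations=4,
                           max_completion_length=256, logging_steps=1)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # base model used in Tina
    reward_funcs=length_penalty_reward,
    args=training_args,
    train_dataset=train_dataset,
    peft_config=lora_config,  # only the low-rank adapters are trained
)
trainer.train()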

(Figure: Overall Comparison)

We show that our Tina models achieve performance competitive with, and in some cases superior to, SOTA baseline models built on the same base model with full-parameter training. In particular, the best Tina model achieves a >20% performance increase and 43.33% Pass@1 accuracy on AIME24. Notably, reproducing the best Tina checkpoint costs only $9, and reproducing all of our experiments from scratch costs $526.

(Figure: Cost Breakdown)

Quick Start

File Setup

  • ./scripts/set/set_vars.sh: contains the main environment variables we use; change the paths (marked with a TODO) to match your own setup.
  • ./recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/: contains the recipes for each experiment in this project; change the HF hub ID (marked with a TODO) to match your own setup.
  • ./tina/config.py: contains the main configurations for this project; set default values here.
  • ./tina/utils/constant.py: contains the main datasets for each experiment in this project.

Env Setup

Run the following commands to install the dependencies.

# update conda and install mamba for faster dependency resolution
conda update -n base -c defaults conda -y
conda install -n base -c conda-forge mamba -y

# training environment
mamba create -n tina python=3.10 -y && mamba activate tina
./scripts/set/set_env.sh && mamba deactivate

# evaluation environment
mamba create -n tina_eval python=3.11 -y && mamba activate tina_eval
./scripts/set/set_env_eval.sh && mamba deactivate

# download the pre-trained models to the `CKPT_DIR` directory.
./scripts/set/prepare.sh
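
As a quick sanity check (not part of the repo's scripts), the base model should load from CKPT_DIR once prepare.sh finishes, assuming it is placed under CKPT_DIR/models/DeepSeek-R1-Distill-Qwen-1.5B/base/ as in the file structure shown in the next section.

# Hedged sanity check: verify the downloaded base model loads from CKPT_DIR.
import os

from transformers import AutoModelForCausalLM, AutoTokenizer

# CKPT_DIR is the checkpoint directory configured in ./scripts/set/set_vars.sh
base_path = os.path.join(
    os.environ["CKPT_DIR"], "models", "DeepSeek-R1-Distill-Qwen-1.5B", "base"
)

tokenizer = AutoTokenizer.from_pretrained(base_path)
model = AutoModelForCausalLM.from_pretrained(base_path)
print(f"Loaded {model.config.model_type} with {model.num_parameters():,} parameters")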

Training & Evaluation

  • LoRA-based RL with GRPO: ./scripts/training/post_train_grpo.sh
    (Figure: Ablation)

After that, we have the following file structure in the CKPT_DIR directory.

CKPT_DIR/
│
├── models/
│   └── DeepSeek-R1-Distill-Qwen-1.5B/
│       ├── base/                   # pre-trained model
│       ├── grpo_PT_DATASET_I/      # post-trained via GRPO using PT_DATASET_I
│       │   ├── checkpoint-i/       # checkpoints are kept stepwise during post-training
│       │   └── ...
│       ├── grpo_PT_DATASET_II/     # post-trained via GRPO using PT_DATASET_II
│       │   ├── checkpoint-i/
│       │   └── ...
│       └── ...
  • Re-evaluate baseline models: ./scripts/training/post_train_eval_baselines.sh
    (Figure: Baseline Re-evaluation)
  • Evaluate post-trained models: ./scripts/training/post_train_eval_local.sh (see the adapter-loading sketch below)
    (Figure: Tina Evaluation)
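
The GRPO checkpoints above come from LoRA-based training, so for ad-hoc inspection a checkpoint can be loaded onto the base model as a LoRA adapter; the sketch below assumes the checkpoints are saved in peft's adapter format and uses placeholder paths from the tree above. The actual evaluation is handled by the scripts listed above.

# Hedged sketch: load a GRPO checkpoint as a LoRA adapter on top of the base model.
# Paths follow the CKPT_DIR layout above; the dataset name and checkpoint index
# are placeholders, and the adapter format is an assumption.
import os

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

model_root = os.path.join(os.environ["CKPT_DIR"], "models", "DeepSeek-R1-Distill-Qwen-1.5B")
base_path = os.path.join(model_root, "base")
adapter_path = os.path.join(model_root, "grpo_PT_DATASET_I", "checkpoint-i")  # replace i with an actual step

tokenizer = AutoTokenizer.from_pretrained(base_path)
base_model = AutoModelForCausalLM.from_pretrained(base_path)

# Attach the LoRA adapter and merge it into the base weights for plain inference.
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload()

prompt = "What is 17 * 24? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))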

Acknowledgements

We thank Hugging Face for open-sourcing the amazing open-r1 project, which served as the starting codebase for our Tina project. We also appreciate all the researchers who released their open-source reasoning datasets, including open-r1/OpenR1-Math-220k, bethgelab/CuratedThoughts, agentica-org/DeepScaleR-Preview-Dataset, RUC-AIBOX/STILL-3-Preview-RL-Data, knoveleng/open-rs, knoveleng/open-s1, and GAIR/LIMR, which we used for training.

Tina's avatar was generated by GPT-4o, based on KYNE's girls and the following prompt.

Hi, I’m Tina — an INTJ who’s all about getting to the essence of things. I study reasoning models because I’m fascinated by how structured thinking and logic can emerge from data. Outside of that, I recharge with movies, music, and the occasional gaming session. I believe in strategic effort: minimal input, maximum impact — whether it’s in research or everyday life, I’m always looking for the most efficient path to meaningful results.

Citation

@misc{wang2025tinatinyreasoningmodels,
      title={Tina: Tiny Reasoning Models via LoRA}, 
      author={Shangshang Wang and Julian Asilis and Ömer Faruk Akgül and Enes Burak Bilgin and Ollie Liu and Willie Neiswanger},
      year={2025},
      eprint={2504.15777},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.15777}, 
}
