- May 30, 2025: PromptCoT-Mamba released! Introducing an attention-free foundation model for reasoning tasks.
- Apr 11, 2025: PromptCoT-QwQ-32B model and its training data released, achieving new state-of-the-art results.
- Mar 7, 2025: PromptCoT project launched, including the problem generation model, distilled models (PromptCoT-DS series), and associated datasets.
This repository unifies two synergistic projects aimed at advancing the frontiers of mathematical and code reasoning in Large Language Models (LLMs): PromptCoT and PromptCoT-Mamba.
PromptCoT (Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models) addresses the critical challenge of acquiring high-quality, complex problems for training advanced LLMs. It introduces a novel methodology to systematically generate Olympiad-level mathematical problems by modeling the rationale behind expert problem design. This approach not only enhances problem diversity and difficulty but also ensures logical consistency in problem construction, providing a scalable solution for creating robust training datasets.
PromptCoT-Mamba (Scaling Reasoning without Attention) leverages the problem generation capabilities of the PromptCoT pipeline to train PromptCoT-Mamba-7B, the first attention-free foundation model based on the Mamba-2 architecture. This model demonstrates that structured training curricula can enable attention-free models to surpass strong Transformer baselines on a wide array of competition-level math and code reasoning tasks, all while maintaining constant-memory inference without KV caching.
Together, these projects offer a powerful suite of tools, models, and datasets for researchers and developers working on the cutting edge of AI reasoning.
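To make the workflow concrete, here is a minimal sketch of prompting a PromptCoT-style problem generation model with a few concepts to synthesize a new Olympiad-style problem. The checkpoint path, prompt template, and decoding settings are illustrative assumptions, not the exact interface used in this repo.

```python
# Minimal sketch: synthesizing a problem from a list of concepts with a
# PromptCoT-style problem generation model. The checkpoint path and prompt
# template below are placeholders, not the repo's canonical pipeline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/promptcot-problem-generation-model"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

concepts = [
    "modular arithmetic",
    "pigeonhole principle",
    "properties of prime numbers",
]
prompt = (
    "Given the following mathematical concepts, first reason about how an "
    "expert might combine them, then state a single Olympiad-level problem.\n"
    f"Concepts: {'; '.join(concepts)}\n"
    "Rationale and problem:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
# Strip the prompt tokens and keep only the newly generated rationale + problem.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Sampling many such completions over different concept combinations is how a prompt set of arbitrary size can be assembled for SFT or RL post-training.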
- ✨ The Missing Piece for Test-Time Scaling: A lightweight yet powerful problem generation model that lets you construct prompt sets of arbitrary size without sacrificing quality, making it well suited for SFT or RL post-training.
- 📖 A Fully Open Project: All models (generation, distilled LLMs) and datasets (generation inputs, SFT data) are open-sourced.
- 🏆 Superior Performance of Distilled Models:
- PromptCoT-DS-7B consistently surpasses its base model, DeepSeek-R1-Distill-Qwen-7B, with significant gains:
  - +0.9% on MATH-500 (93.7%)
  - +3.2% on AIME2024 (58.7%)
  - +9.2% on AIME2025 (49.2%)
- PromptCoT-DS-7B (7B parameters) achieves results comparable to larger 32B models like S1-32B and LIMO-32B.
- PromptCoT-QwQ-32B sets a new standard, outperforming other 32B models by a significant margin:
  - MATH-500: 96.7% ± 0.5%
  - AIME2024: 83.8% ± 2.8%
  - AIME2025: 75.4% ± 4.7%
- PromptCoT-DS-1.5B remains competitive with RL-based models despite being trained purely through distillation.
- ⚡ Efficiency Without Compromise: PromptCoT-DS-1.5B reaches 40%+ AIME scores while using over 15× fewer A100 GPU hours than models such as DeepScaleR-1.5B-Preview.
- 🚀 First Attention-Free SOTA: PromptCoT-Mamba-7B is the first attention-free model (Mamba-2 architecture) to outperform strong Transformer baselines in math and code reasoning.
- 🧠 Trained with PromptCoT Pipeline: Utilizes a structured, two-stage curriculum with data generated by PromptCoT.
- 💪 Strong General Performance: PromptCoT-Mamba-7B consistently outperforms 7B-scale Transformer and hybrid Mamba-Transformer baselines.
  - MATH-500: 84.6%
  - AIME 2024: 35.2%
  - AIME 2025: 24.6%
  - LiveCodeBench: 29.9%
- 🎯 Math Specialization: The math-specialized variant, PromptCoT-Mamba-Math-7B, further boosts math performance:
  - MATH-500: 88.0%
  - AIME 2024: 42.9% (+7.7% over generalist)
  - AIME 2025: 30.8% (+6.2% over generalist)
- ⚡ Inference Efficiency: Offers substantial speedups (e.g., 3.66× faster on a 24GB GPU for long sequences) and constant-memory inference with no KV cache, making it ideal for cost-sensitive or long-context workloads (see the inference sketch below).
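Because the backbone is attention-free, decoding does not accumulate a KV cache, so memory stays roughly constant as outputs grow. The sketch below shows what inference might look like if the checkpoint is published as a standard Hugging Face causal LM; the path, dtype, and decoding settings are placeholder assumptions, and depending on how the Mamba-2 weights are packaged, loading may require `trust_remote_code=True` or a Mamba-2-aware serving stack instead.

```python
# Minimal inference sketch for PromptCoT-Mamba-7B. The checkpoint path is a
# placeholder; adjust loading options to match how the weights are released.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/PromptCoT-Mamba-7B"  # hypothetical checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

problem = "Find all positive integers n such that n^2 + 1 is divisible by n + 1."
inputs = tokenizer(problem, return_tensors="pt").to(model.device)

# No attention layers means no growing KV cache: long chains of thought can be
# decoded in roughly constant memory.
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```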
Model | GSM8K | MATH-500 | AIME2024 | AIME2025 |
---|---|---|---|---|
🔹 1.5B Models | ||||
DeepSeek-R1-Distill-Qwen-1.5B | - | 83.9% | 28.9% | 28.1% |
STILL-3-1.5B-preview | - | 85.5% | 39.3% | - |
DeepScaleR-1.5B-Preview | - | 🟢 87.8% | 🟢 43.1% | 🟢 37.1% |
PromptCoT-DS-1.5B (ours) | 🟢 87.6% ± 0.5% | 85.3% ± 1.1% | 41.2% ± 6.9% | 36.7% ± 6.2% |
🔹 7B Models | ||||
DeepSeek-R1-Distill-Qwen-7B | - | 92.8% | 55.5% | 40.0% |
Qwen2.5-7B-SimpleRL | - | 82.4% | 26.7% | - |
OpenThinker-7B | - | 89.6% | 30.0% | 33.3% |
OpenR1-Qwen-7B | - | 90.6% | 36.7% | 40.0% |
PromptCoT-DS-7B (ours) | 🔥 92.8% ± 0.5% | 🔥 93.7% ± 0.7% | 🔥 58.7% ± 3.1% | 🔥 49.2% ± 7.9% |
🔹 32B Models | ||||
DeepSeek-R1-Distill-Qwen-32B | - | 94.3% | 72.6% | - |
S1-32B | - | 93.0% | 56.7% | 26.6% |
LIMO-32B | - | 94.8% | 57.1% | 46.6% |
QwQ-32B | - | - | 82.1% | 70.8% |
PromptCoT-QwQ-32B (ours) | 🔥🔥 96.4% ± 0.2% | 🔥🔥 96.7% ± 0.5% | 🔥🔥 83.8% ± 2.8% | 🔥🔥 75.4% ± 4.7% |
General Performance:
Model | MATH-500 | AIME 24 | AIME 25 | OlympiadBench | HumanEval | HumanEval+ | LiveCodeBench |
---|---|---|---|---|---|---|---|
PromptCoT-Mamba-7B | 84.6 | 🔥🔥35.2 | 🔥🔥24.6 | 50.7 | 81.7 | 75.0 | 🔥🔥29.9 |
Gemma3-27B | 89.0 | 32.6 | 24.0 | 54.2 | 86.0 | 78.0 | 26.9 |
Gemma3-12B | 83.8 | 22.9 | 19.2 | 49.9 | 81.1 | 73.2 | 22.2 |
Sky-T1-7B | 85.0 | 19.2 | 19.2 | 49.2 | 41.5 | 37.2 | 18.3 |
S1.1-7B | 82.0 | 19.2 | 17.5 | 43.1 | 64.0 | 56.7 | 13.3 |
Bespoke-Stratos-7B | 81.2 | 18.3 | 16.3 | 45.0 | 73.2 | 68.3 | 8.6 |
Nemotron-H-8B | 77.6 | -- | -- | -- | 79.3 | 74.4 | -- |
M1-3B | 81.7 | 23.0 | 22.0 | 43.6 | -- | -- | -- |
Math Specialization vs. Generalist:
Model | MATH-500 | AIME 24 | AIME 25 | OlympiadBench | HumanEval | HumanEval+ | LiveCodeBench |
---|---|---|---|---|---|---|---|
PromptCoT-Mamba-Math-7B | 🔥🔥88.0 | 🔥🔥42.9 | 🔥🔥30.8 | 🔥🔥52.1 | 71.3 | 66.5 | 20.3 |
PromptCoT-Mamba-7B | 84.6 | 35.2 | 24.6 | 50.7 | 81.7 | 75.0 | 29.9 |
If you find PromptCoT or PromptCoT-Mamba useful in your research, please consider citing the respective papers:
For PromptCoT:
@article{zhao2025promptcot,
  author  = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Kong, Lingpeng},
  title   = {PromptCoT: Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models},
  journal = {arXiv preprint arXiv:2503.02324},
  year    = {2025},
  url     = {https://arxiv.org/abs/2503.02324}
}
For PromptCoT-Mamba:
@article{zhao2025scaling,
  author  = {Zhao, Xueliang and Wu, Wei and Kong, Lingpeng},
  title   = {Scaling Reasoning without Attention},
  journal = {arXiv preprint arXiv:2505.22425},
  year    = {2025},
  url     = {https://arxiv.org/abs/2505.22425}
}