PromptCoT & PromptCoT-Mamba: Advancing the Frontiers of Reasoning


News

  • May 30, 2025: PromptCoT-Mamba released! Introducing an attention-free foundation model for reasoning tasks.
  • Apr 11, 2025: PromptCoT-QwQ-32B model and its training data released, achieving new state-of-the-art results.
  • Mar 7, 2025: PromptCoT project launched, including the problem generation model, distilled models (PromptCoT-DS series), and associated datasets.

Overview

This repository unifies two synergistic projects aimed at advancing the frontiers of mathematical and code reasoning in Large Language Models (LLMs): PromptCoT and PromptCoT-Mamba.

PromptCoT (Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models) addresses the critical challenge of acquiring high-quality, complex problems for training advanced LLMs. It introduces a novel methodology to systematically generate Olympiad-level mathematical problems by modeling the rationale behind expert problem design. This approach not only enhances problem diversity and difficulty but also ensures logical consistency in problem construction, providing a scalable solution for creating robust training datasets.
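
To make the generation pipeline concrete, here is a minimal sketch of driving a problem-generation model with Hugging Face transformers. The model ID and the concept-then-rationale prompt format are illustrative assumptions, not the repository's exact template; consult the released checkpoints and data for the real interface.

```python
# Sketch: sampling an Olympiad-level problem from a PromptCoT-style
# generation model. MODEL_ID and the prompt wording are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "inclusionAI/PromptCoT-Problem-Generation"  # hypothetical ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# PromptCoT conditions generation on mathematical concepts and models the
# expert's design rationale before emitting the problem itself.
concepts = ["modular arithmetic", "pigeonhole principle"]
prompt = (
    "Given the concepts: " + ", ".join(concepts) + "\n"
    "First write a rationale for how an expert would combine them, "
    "then state an Olympiad-level problem."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```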

PromptCoT-Mamba (Scaling Reasoning without Attention) leverages the problem generation capabilities of the PromptCoT pipeline to train PromptCoT-Mamba-7B, the first attention-free foundation model based on the Mamba-2 architecture. This model demonstrates that structured training curricula can enable attention-free models to surpass strong Transformer baselines on a wide array of competition-level math and code reasoning tasks, all while maintaining constant-memory inference without KV caching.
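
Because Mamba-2 layers carry a fixed-size recurrent state instead of a growing key-value cache, per-step generation memory stays constant in sequence length. A minimal inference sketch follows, assuming the checkpoint loads through transformers; the model ID is hypothetical and the checkpoint may additionally require the mamba-ssm kernels for efficient decoding.

```python
# Sketch: generation with an attention-free Mamba-2 checkpoint.
# MODEL_ID is an assumption for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "inclusionAI/PromptCoT-Mamba-7B"  # hypothetical ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # custom architectures often ship their own modeling code
)

prompt = "Prove that the sum of two odd integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Unlike a Transformer, memory here does not grow with the number of
# generated tokens: each Mamba-2 layer updates a fixed-size SSM state
# rather than appending to a KV cache.
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```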

Together, these projects offer a powerful suite of tools, models, and datasets for researchers and developers working on the cutting edge of AI reasoning.


Highlights & Key Results

1. PromptCoT: Problem Generation & Distilled Models

  • ✨ The Missing Piece for Test-Time Scaling: A lightweight yet powerful problem generation model that enables building high-quality prompt sets at any scale, well suited to SFT or RL post-training (see the data-formatting sketch after this list).
  • 📖 A Fully Open Project: All models (generation, distilled LLMs) and datasets (generation inputs, SFT data) are open-sourced.
  • 🏆 Superior Performance of Distilled Models:
    • PromptCoT-DS-7B consistently surpasses its base model, DeepSeek-R1-Distill-Qwen-7B, with significant gains:
      • +0.9% on MATH-500 (93.7%)
      • +3.2% on AIME2024 (58.7%)
      • +9.2% on AIME2025 (49.2%)
    • PromptCoT-DS-7B (7B parameters) achieves results comparable to larger 32B models like S1-32B and LIMO-32B.
    • PromptCoT-QwQ-32B sets a new standard, outperforming other 32B models by a significant margin:
      • MATH-500: 96.7% ± 0.5%
      • AIME2024: 83.8% ± 2.8%
      • AIME2025: 75.4% ± 4.7%
    • PromptCoT-DS-1.5B demonstrates competitive performance against RL-based models purely through distillation.
  • ⚡ Efficiency Without Compromise: PromptCoT-DS-1.5B achieves AIME scores above 40% while using over 15× fewer A100 GPU hours than models such as DeepScaleR-1.5B-Preview.
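
To make the post-training use concrete, here is a minimal sketch of packaging generated problem-solution pairs into a JSONL file for SFT. The record schema is illustrative only, not the repository's actual data format; match the fields to your training framework.

```python
# Sketch: writing generated (problem, solution) pairs to JSONL for SFT.
# The field names "prompt" / "completion" are illustrative assumptions.
import json

pairs = [
    {
        "prompt": "Let n be a positive integer such that ...",  # generated problem
        "completion": "<think>...</think> The answer is 42.",   # long-form solution
    },
]

with open("promptcot_sft.jsonl", "w", encoding="utf-8") as f:
    for record in pairs:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```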

2. PromptCoT-Mamba: Attention-Free Reasoning

  • 🚀 First Attention-Free SOTA: PromptCoT-Mamba-7B is the first attention-free model (Mamba-2 architecture) to outperform strong Transformer baselines in math and code reasoning.
  • 🧠 Trained with PromptCoT Pipeline: Utilizes a structured, two-stage curriculum with data generated by PromptCoT.
  • 💪 Strong General Performance: PromptCoT-Mamba-7B consistently outperforms 7B-scale Transformer and hybrid Mamba-Transformer baselines.
    • MATH-500: 84.6%
    • AIME 2024: 35.2%
    • AIME 2025: 24.6%
    • LiveCodeBench: 29.9%
  • 🎯 Math Specialization: The math-specialized variant, PromptCoT-Mamba-Math-7B, further boosts math performance:
    • MATH-500: 88.0%
    • AIME 2024: 42.9% (+7.7% over generalist)
    • AIME 2025: 30.8% (+6.2% over generalist)
  • Inference Efficiency: Offers substantial speedups (e.g., 3.66× faster on a 24GB GPU for long sequences) and constant-memory inference, making it well suited to cost-sensitive or long-context workloads.

Performance Details

PromptCoT Series Performance

| Model | GSM8K | MATH-500 | AIME2024 | AIME2025 |
| --- | --- | --- | --- | --- |
| 🔹 1.5B Models | | | | |
| DeepSeek-R1-Distill-Qwen-1.5B | - | 83.9% | 28.9% | 28.1% |
| STILL-3-1.5B-preview | - | 85.5% | 39.3% | - |
| DeepScaleR-1.5B-Preview | - | 🟢 87.8% | 🟢 43.1% | 🟢 37.1% |
| PromptCoT-DS-1.5B (ours) | 🟢 87.6% ± 0.5% | 85.3% ± 1.1% | 41.2% ± 6.9% | 36.7% ± 6.2% |
| 🔹 7B Models | | | | |
| DeepSeek-R1-Distill-Qwen-7B | - | 92.8% | 55.5% | 40.0% |
| Qwen2.5-7B-SimpleRL | - | 82.4% | 26.7% | - |
| OpenThinker-7B | - | 89.6% | 30.0% | 33.3% |
| OpenR1-Qwen-7B | - | 90.6% | 36.7% | 40.0% |
| PromptCoT-DS-7B (ours) | 🔥 92.8% ± 0.5% | 🔥 93.7% ± 0.7% | 🔥 58.7% ± 3.1% | 🔥 49.2% ± 7.9% |
| 🔹 32B Models | | | | |
| DeepSeek-R1-Distill-Qwen-32B | - | 94.3% | 72.6% | - |
| S1-32B | - | 93.0% | 56.7% | 26.6% |
| LIMO-32B | - | 94.8% | 57.1% | 46.6% |
| QwQ-32B | - | - | 82.1% | 70.8% |
| PromptCoT-QwQ-32B (ours) | 🔥🔥 96.4% ± 0.2% | 🔥🔥 96.7% ± 0.5% | 🔥🔥 83.8% ± 2.8% | 🔥🔥 75.4% ± 4.7% |

PromptCoT-Mamba Performance

General Performance:

| Model | MATH-500 | AIME 24 | AIME 25 | OlympiadBench | HumanEval | HumanEval+ | LiveCodeBench |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PromptCoT-Mamba-7B | 84.6 | 🔥🔥 35.2 | 🔥🔥 24.6 | 50.7 | 81.7 | 75.0 | 🔥🔥 29.9 |
| Gemma3-27B | 89.0 | 32.6 | 24.0 | 54.2 | 86.0 | 78.0 | 26.9 |
| Gemma3-12B | 83.8 | 22.9 | 19.2 | 49.9 | 81.1 | 73.2 | 22.2 |
| Sky-T1-7B | 85.0 | 19.2 | 19.2 | 49.2 | 41.5 | 37.2 | 18.3 |
| S1.1-7B | 82.0 | 19.2 | 17.5 | 43.1 | 64.0 | 56.7 | 13.3 |
| Bespoke-Stratos-7B | 81.2 | 18.3 | 16.3 | 45.0 | 73.2 | 68.3 | 8.6 |
| Nemotron-H-8B | 77.6 | -- | -- | -- | 79.3 | 74.4 | -- |
| M1-3B | 81.7 | 23.0 | 22.0 | 43.6 | -- | -- | -- |

Math Specialization vs. Generalist:

| Model | MATH-500 | AIME 24 | AIME 25 | OlympiadBench | HumanEval | HumanEval+ | LiveCodeBench |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PromptCoT-Mamba-Math-7B | 🔥🔥 88.0 | 🔥🔥 42.9 | 🔥🔥 30.8 | 🔥🔥 52.1 | 71.3 | 66.5 | 20.3 |
| PromptCoT-Mamba-7B | 84.6 | 35.2 | 24.6 | 50.7 | 81.7 | 75.0 | 29.9 |

Citation

If you find PromptCoT or PromptCoT-Mamba useful in your research, please consider citing the respective papers:

For PromptCoT:

@article{zhao2025promptcot,
  author    = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Kong, Lingpeng},
  title     = {PromptCoT: Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models},
  journal   = {arXiv preprint arXiv:2503.02324},
  year      = {2025},
  url       = {https://arxiv.org/abs/2503.02324}
}

For PromptCoT-Mamba:

@article{zhao2025scaling,
  author    = {Zhao, Xueliang and Wu, Wei and Kong, Lingpeng},
  title     = {Scaling Reasoning without Attention},
  journal   = {arXiv preprint arXiv:2505.22425},
  year      = {2025},
  url       = {https://arxiv.org/abs/2505.22425}
}
