Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

This is the official implementation of the paper: Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

📂 Directory Structure

soft_thinking/
├── datasets/
│   ├── aime2024.json
│   └── ... (other datasets)
├── models/
│   └── download.py
├── scripts/
│   ├── baseline/
│   └── st/
├── sglang_soft_thinking_pkg/
│   └── (sglang files)
├── configure.sh
├── codeeval.py
├── convert_livecodebench.py
├── humanevaleval.py
├── mbppeval.py
├── matheval.py
├── run_sglang_softthinking.py
├── run_sglang_nothinking.py
└── ... (other files)

⚙️ Environment Setup

To set up the virtual environment for SGLang Soft Thinking inference, run each line of configure.sh in order:

conda create -n st python=3.11 -y && conda activate st
pip install --upgrade pip
pip install torch transformers accelerate jsonlines math_verify openai torch_memory_saver
pip install flash_attn --no-build-isolation # the build can take about 20 minutes; if you hit an undefined-symbol error, try `pip install flash_attn==2.7.3 --no-build-isolation`

# Install the SGLang build (0.4.6.post1) tailored for Soft Thinking
cd sglang_soft_thinking_pkg
pip install -e "python[all]"
cd ..
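
After the install finishes, a quick import check can catch a failed build early. The following is a minimal sketch, not part of the repository; it assumes each package above is importable under the module name listed in `REQUIRED_MODULES`, and the helper name `find_missing` is hypothetical.

```python
import importlib.util

# Assumption: these import names match the packages installed above.
REQUIRED_MODULES = [
    "torch", "transformers", "accelerate", "jsonlines",
    "math_verify", "openai", "torch_memory_saver", "flash_attn", "sglang",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the activated `st` environment. The check only confirms that each module can be located; it does not verify versions or GPU support.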

🚀 Quick Start

Clone the repository:

git clone https://github.com/eric-ai-lab/Soft-Thinking.git
cd Soft-Thinking

Set up the environment: Follow the Environment Setup instructions.
Run a baseline test:
```
bash scripts/baseline/qwq32b.sh
```

🔄 Reproduction Instructions

1. Baseline

Run the baseline script:

bash scripts/baseline/qwq32b.sh

📥 Download the Model

First, download the model to the models/ directory:

python ./models/download.py --model_name "Qwen/QwQ-32B"

🧠 Run Inference

Then, run the baseline inference:

python run_sglang_softthinking.py \
    --dataset "aime2024" \
    --model_name "./models/Qwen/QwQ-32B" \
    --max_generated_tokens 32768 \
    --temperature 0.6 \
    --top_p 0.95 \
    --top_k 30 \
    --min_p 0.0 \
    --mem_fraction_static 0.8 \
    --start_idx 0 \
    --end_idx 10000 \
    --num_gpus 8 \
    --num_samples 16  \
    --use_llm_judge \
    --api_base "<replace it>" \
    --deployment_name "<replace it>" \
    --api_version "<replace it>" \
    --api_key "<replace it>" \
    --push_results_to_hf \
    --hf_repo_id "<replace it>" \
    --hf_token "<replace it>"

Note:

If you use the LLM judge or wish to upload results to Hugging Face, remember to provide the required API information.

2. Soft Thinking

Run the Soft Thinking script:

bash scripts/st/qwq32b.sh

Or directly execute:

python run_sglang_softthinking.py \
    --dataset "aime2024" \
    --model_name "./models/Qwen/QwQ-32B" \
    --max_topk 15 \
    --max_generated_tokens 32768 \
    --temperature 0.6 \
    --top_p 0.95 \
    --top_k 30 \
    --min_p 0.0 \
    --after_thinking_temperature 0.6 \
    --after_thinking_top_p 0.95 \
    --after_thinking_top_k 30 \
    --after_thinking_min_p 0.0 \
    --early_stopping_entropy_threshold 0.1 \
    --early_stopping_length_threshold 256 \
    --mem_fraction_static 0.8 \
    --start_idx 0 \
    --end_idx 10000 \
    --num_gpus 8 \
    --num_samples 1 \
    --enable_soft_thinking \
    --use_llm_judge \
    --api_base "<replace it>" \
    --deployment_name "<replace it>" \
    --api_version "<replace it>" \
    --api_key "<replace it>" \
    --push_results_to_hf \
    --hf_repo_id "<replace it>" \
    --hf_token "<replace it>"
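
This README does not describe what `--max_topk`, `--early_stopping_entropy_threshold`, and `--early_stopping_length_threshold` control. The sketch below is an illustration only, under the assumption that they correspond to the mechanisms in the paper: each reasoning step feeds the model a probability-weighted mix of the top-k token embeddings, and thinking ends once the entropy stays below the threshold for a given number of consecutive steps. All function names are hypothetical, and this is not the code in sglang_soft_thinking_pkg.

```python
import math


def concept_embedding(probs, embeddings, max_topk):
    """Mix the embeddings of the max_topk most likely tokens,
    weighted by their renormalized probabilities."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:max_topk]
    total = sum(probs[i] for i in top)
    dim = len(embeddings[0])
    mixed = [0.0] * dim
    for i in top:
        weight = probs[i] / total
        for d in range(dim):
            mixed[d] += weight * embeddings[i][d]
    return mixed


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_stop(entropies, entropy_threshold, length_threshold):
    """True once the last length_threshold entropies are all below the threshold."""
    if len(entropies) < length_threshold:
        return False
    return all(h < entropy_threshold for h in entropies[-length_threshold:])
```

Under this reading, a larger `max_topk` keeps more candidate tokens in each mixed embedding, while a lower entropy threshold or a longer length threshold makes early stopping more conservative.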

🔍 Hyperparameter Search

To achieve optimal results, tune the following hyperparameters:

max_topk: {5, 10, 15, 20}
min_p: {0.005, 0.01, 0.02}
early_stopping_entropy_threshold: {0.01, 0.05, 0.1, 0.3}
early_stopping_length_threshold: {128, 256, 512, 1024}
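
The grid above can be enumerated programmatically. The following is a minimal sketch, not part of the repository: it builds one command line per combination using only flags that appear in the Soft Thinking command, and the helper name `build_commands` is hypothetical. Only a subset of the fixed flags is included in `BASE_ARGS`; add the remaining ones (sampling, GPU, and judge settings) as needed.

```python
import itertools
import shlex

# Search space copied from the list above.
GRID = {
    "max_topk": [5, 10, 15, 20],
    "min_p": [0.005, 0.01, 0.02],
    "early_stopping_entropy_threshold": [0.01, 0.05, 0.1, 0.3],
    "early_stopping_length_threshold": [128, 256, 512, 1024],
}

# Fixed flags taken from the Soft Thinking command (subset only).
BASE_ARGS = [
    "python", "run_sglang_softthinking.py",
    "--dataset", "aime2024",
    "--model_name", "./models/Qwen/QwQ-32B",
    "--enable_soft_thinking",
]


def build_commands(grid=GRID, base_args=BASE_ARGS):
    """Return one argument list per point in the Cartesian product of the grid."""
    names = list(grid)
    commands = []
    for values in itertools.product(*(grid[name] for name in names)):
        args = list(base_args)
        for name, value in zip(names, values):
            args += [f"--{name}", str(value)]
        commands.append(args)
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed or piped to a job scheduler.
    for args in build_commands():
        print(shlex.join(args))
```

The full grid contains 4 × 3 × 4 × 4 = 192 configurations, so in practice it may be worth sweeping one or two hyperparameters at a time.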

Note:

Results may vary across different devices even with the same hyperparameters, due to differences in computation precision.
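
One common source of such differences is that floating-point addition is not associative, so hardware or kernels that accumulate values in a different order can produce slightly different numbers. The snippet below is a generic illustration of that effect in plain Python, not a diagnosis of any specific device or of this codebase.

```python
def sum_in_order(values):
    """Accumulate values left to right, as a sequential reduction would."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values)         # (0.1 + 0.2) + 0.3
backward = sum_in_order(values[::-1])  # (0.3 + 0.2) + 0.1

# The two orderings differ in the last bit of the result.
print(forward == backward)  # prints False
```

Small discrepancies like this can change which token is sampled at some step, after which two runs may diverge.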

You can change the model (model_name) and dataset (dataset) to experiment with other configurations.

📜 Citation

If you use this code or dataset, please cite our paper:

@misc{zhang2025softthinking,
    title={Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space}, 
    author={Zhen Zhang and Xuehai He and Weixiang Yan and Ao Shen and Chenyang Zhao and Shuohang Wang and Yelong Shen and Xin Eric Wang},
    year={2025},
    eprint={2505.15778},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2505.15778}, 
}
