🚧 Note: This project is evolving rapidly. Join the community by opening issues, submitting PRs, leaving comments, or ⭐ starring the repo to help build a leading resource for agentic search.
- Research Collection: Curate and categorize comprehensive research work in agentic search, including papers, code implementations, and empirical findings
- Interactive Demos: Build demonstration pages to showcase different agentic search methods and allow hands-on exploration of their capabilities
- Evaluation Arena: Develop a Python toolkit for systematic evaluation and benchmarking of agentic search methods across diverse tasks and metrics
- Training Gym: Create a Python framework for training and optimizing agentic search models, including reinforcement learning and other approaches
For each paper, we provide the following information:
👨‍🎓 First Author · 📧 Corresponding Author (Last Author if not specified) · 🏛️ First Organization · 📊 Dataset
Note: Please submit a PR if we missed anything!
📊 Dataset Types:
General QA: NQ, TriviaQA, PopQA
Multi-Hop QA: HotpotQA, 2WikiMultiHopQA, MuSiQue, Bamboogle
Complex Task: GPQA, GAIA, WebWalkerQA, Humanity's Last Exam (HLE)
Report Generation: Glaive
Math & Coding: AIME, MATH500, AMC, LiveCodeBench
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
👨‍🎓 Bowen Jin · 📧 Jiawei Han · 🏛️ UIUC
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Qwen-2.5-3B / 7B · 🎯 Training: GRPO, PPO
An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents
👨‍🎓 Bowen Jin · 📧 Jiawei Han · 🏛️ UIUC
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Qwen-2.5-3B / 7B / 14B · 🎯 Training: GRPO, PPO
Note: an updated version of Search-R1.
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
👨‍🎓 Xiaoxi Li · 📧 Zhicheng Dou · 🏛️ GSAI, RUC
📊 Dataset: Complex Task, Report Generation · 🤖 Model: QwQ-32B · 🎯 Training: SFT, DPO
DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments
👨‍🎓 Yuxiang Zheng · 📧 Pengfei Liu · 🏛️ SJTU
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Qwen-2.5-7B · 🎯 Training: GRPO
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
👨‍🎓 Huatong Song · 📧 Wayne Xin Zhao · 🏛️ GSAI, RUC
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Qwen-2.5-7B, Llama-3.1-8B · 🎯 Training: SFT, GRPO, REINFORCE++
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
👨‍🎓 Huatong Song · 📧 Wayne Xin Zhao · 🏛️ GSAI, RUC
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Qwen-2.5-7B · 🎯 Training: SFT, GRPO, REINFORCE++
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis
👨‍🎓 Shuang Sun · 📧 Wayne Xin Zhao · 🏛️ GSAI, RUC
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Qwen-2.5-7B / 32B, QwQ-32B · 🎯 Training: SFT, DPO, REINFORCE++
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
👨‍🎓 Hao Sun · 📧 Zile Qiao, Jiayan Guo, Yan Zhang · 🏛️ Tongyi Lab
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Qwen-2.5-3B / 7B, LLaMA-3.2-3B · 🎯 Training: REINFORCE, GRPO, PPO
Chain-of-Retrieval Augmented Generation
👨‍🎓 Liang Wang · 📧 Furu Wei · 🏛️ MSRA
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Llama-3.1-8B-Instruct · 🎯 Training: REINFORCE, GRPO, PPO
IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
👨‍🎓 Ziyang Huang · 📧 Kang Liu · 🏛️ IA, CAS
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Qwen-2.5-3B / 7B · 🎯 Training: GRPO
Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging
👨‍🎓 Hongjin Qian · 📧 Zheng Liu · 🏛️ BAAI
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Qwen-2.5-3B / 7B · 🎯 Training: GRPO, PPO
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs
👨‍🎓 Yaorui Shi · 📧 Xiang Wang · 🏛️ USTC
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Qwen-2.5-3B · 🎯 Training: GRPO
ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning
👨‍🎓 Changtai Zhu · 📧 Xipeng Qiu · 🏛️ FDU
📊 Dataset: Conversational QA · 🤖 Model: Qwen-2.5-3B / Llama-3.2-3B · 🎯 Training: SFT, GRPO
Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning
👨‍🎓 Wenlin Zhang · 📧 Xiangyu Zhao · 🏛️ CityUHK
📊 Dataset: General QA, Multi-Hop QA · 🤖 Model: Qwen-2.5-7B · 🎯 Training: DPO
WebDancer: Towards Autonomous Information Seeking Agency
👨‍🎓 Jialong Wu · 📧 Wenbiao Yin, Yong Jiang · 🏛️ Tongyi Lab
📊 Dataset: Complex Task · 🤖 Model: Qwen-2.5-7B / 32B, QwQ-32B · 🎯 Training: DAPO
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
👨‍🎓 Mingyang Chen · 📧 Fan Yang · 🏛️ Baichuan
📊 Dataset: Multi-Hop QA · 🤖 Model: Qwen-2.5-7B / 32B · 🎯 Training: GRPO
Search-o1: Agentic Search-Enhanced Large Reasoning Models
👨‍🎓 Xiaoxi Li · 📧 Zhicheng Dou · 🏛️ GSAI, RUC
📊 Dataset: General QA, Multi-Hop QA, Complex Task, Math & Coding · 🤖 Model: QwQ-32B-Preview
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
👨‍🎓 Junde Wu · 📧 Yuyuan Liu · 🏛️ Oxford University
📊 Dataset: Complex Task · 🤖 Model: APIs
Coding Agents with Multimodal Browsing are Generalist Problem Solvers
👨‍🎓 Aditya Bharat Soni · 📧 Graham Neubig · 🏛️ CMU
📊 Dataset: Complex Task · 🤖 Model: claude-3-7-sonnet
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
👨‍🎓 Guanting Dong · 📧 Zhicheng Dou · 🏛️ GSAI, RUC
📊 Dataset: General QA, Multi-Hop QA, Math & Coding · 🤖 Model: Qwen-2.5-3B · 🎯 Training: SFT, GRPO, PPO
OTC: Optimal Tool Calls via Reinforcement Learning
👨‍🎓 Hongru Wang · 📧 Heng Ji · 🏛️ CUHK
📊 Dataset: General QA, Multi-Hop QA, Math & Coding · 🤖 Model: Qwen-2.5-3B / 7B · 🎯 Training: GRPO, PPO
Multimodal-Search-R1: Incentivizing LMMs to Search
👨‍🎓 Jinming Wu · 📧 Zejun Ma · 🏛️ BUPT
📊 Dataset: VQA · 🤖 Model: Qwen2.5-VL-Instruct-3B / 7B · 🎯 Training: GRPO
👨‍🎓 Zhaorui Yang · 📧 Bo Zhang · 🏛️ ZJU
📊 Dataset: Report Generation
InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation
👨‍🎓 Yunjia Xi · 📧 Jianghao Lin · 🏛️ SJTU
📊 Dataset: General QA, Multi-Hop QA
BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
👨‍🎓 Jason Wei · 📧 Amelia Glaese · 🏛️ OpenAI
📊 Dataset: Web Browsing
HealthBench: Evaluating Large Language Models Towards Improved Human Health
👨‍🎓 Rahul K. Arora · 📧 Karan Singhal · 🏛️ OpenAI
📊 Dataset: Multi-turn Medical QA
👨‍🎓 Lisheng Huang · 📧 Wayne Xin Zhao · 🏛️ GSAI, RUC
📊 Dataset: Web Browsing
WebWalker: Benchmarking LLMs in Web Traversal
👨‍🎓 Jialong Wu · 📧 Deyu Zhou, Yong Jiang · 🏛️ SEU, Tongyi Lab
📊 Dataset: Web Browsing
👨‍🎓 Weinan Zhang · 🏛️ SJTU
OpenAI's Deep Research: https://openai.com/index/introducing-deep-research/
Google's Gemini Pro: https://www.google.com/search/about/
X's Grok 3: https://x.ai/news/grok-3
Perplexity: https://www.perplexity.ai/
Jina AI: https://jina.ai/deepsearch/
Metasota: https://metaso.cn/
We are building a demo page to showcase different agentic search methods and allow hands-on exploration of their capabilities. Each demo will be integrated into a standardized retrieval and web browser interface with comparable settings, enabling comprehensive and fair comparisons across various approaches. This systematic evaluation will help identify strengths and limitations of different methods and advance the state-of-the-art in agentic search.
Currently, it looks like this:
You can run the demo by serving the models via vLLM:
vllm serve path_to_your_model --port 25900 --host 127.0.0.1
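Once the server is up, you can sanity-check it through vLLM's OpenAI-compatible chat endpoint. The snippet below only constructs the request; `path_to_your_model` is the placeholder from the command above, and the port matches it:

```python
import json

# Endpoint of the vLLM server started with `vllm serve ... --port 25900 --host 127.0.0.1`.
VLLM_URL = "http://127.0.0.1:25900/v1/chat/completions"

def build_request(question: str, model: str = "path_to_your_model") -> dict:
    """Return an OpenAI-style chat payload for the local vLLM server."""
    return {
        "model": model,  # must match the model name/path you served
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.7,
        "max_tokens": 512,
    }

payload = build_request("Who wrote 'The Old Man and the Sea'?")
body = json.dumps(payload).encode("utf-8")

# To actually query the running server:
#   import urllib.request
#   req = urllib.request.Request(VLLM_URL, data=body,
#                                headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```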
Then, start a search server, for example:
bash retrieval_launch.sh
Configure your server address in config/demo_config.json and update the model list there.
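For orientation, a minimal config might look like the sketch below. The field names (`serve_address`, `models`, `search_server`) are illustrative assumptions, not the repository's actual schema; check the shipped demo_config.json for the real keys:

```json
{
  "serve_address": "http://127.0.0.1:25900/v1",
  "models": [
    {"name": "my-search-agent", "path": "path_to_your_model"}
  ],
  "search_server": "http://127.0.0.1:8000"
}
```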
Run the demo by:
streamlit run demo/app.py
We maintain a collection of 📑 paper presentation slides on Overleaf to facilitate learning and knowledge sharing in the agentic search community. Each presentation consists of 3-5 slides that concisely introduce key aspects of a paper, including motivation, methodology, and main results. These slides serve as quick references for understanding important works in the field and can be used for self-study, teaching, or research presentations.
👉 Check out our slides collection: Agentic Search Paper Slides
We are building an arena page to benchmark different agentic search methods in a unified evaluation framework. All methods will be integrated into standardized retrieval and web browser interfaces with comparable settings, enabling comprehensive and fair comparisons across various approaches. This systematic evaluation will help identify strengths and limitations of different methods and advance the state-of-the-art in agentic search.
We are organizing a collection of optimization frameworks and training approaches used in agentic search, including reinforcement learning methods like GRPO and PPO, as well as supervised fine-tuning techniques. This will help researchers understand and implement effective training strategies for their agentic search models.
Stay tuned for detailed tutorials and code examples on training agentic search systems!
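As a small taste of the training side, here is a minimal sketch of the group-relative advantage at the heart of GRPO: each query gets a group of sampled rollouts, and every rollout's scalar reward is normalized against its own group, so no learned value model is needed. The function names are ours for illustration, not from any specific framework:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: z-score each rollout's reward within its group.

    `rewards` holds one scalar reward per sampled rollout for the same query
    (e.g. exact-match on the final answer). `eps` guards against a zero-variance
    group where every rollout earned the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four rollouts for one query; two found the right answer (reward 1.0).
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# Correct rollouts get a positive advantage, incorrect ones negative,
# and the advantages sum to zero within the group.
```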
We welcome contributions to this repository! If you have any suggestions or feedback, please feel free to open an issue or submit a pull request.
If you find this repository useful, please consider citing it as follows:
@misc{awesome-agentic-search,
  author = {Hongjin Qian and Zheng Liu},
  title = {Awesome Agentic Search},
  year = {2025},
  publisher = {GitHub},
}