SearchAgent-X is a highly efficient system for reasoning-search interleaved large language model (LLM) agents.
Compared to the popular LLM inference framework vLLM combined with HNSW-based retrieval, it achieves 1.3–3.4× higher throughput at only 0.2–0.6× the latency. See our paper for the detailed techniques.
🔔 When to Use SearchAgent-X:
- Serving: when you need low-latency, high-throughput LLM search agents;
- Post-training (e.g., reinforcement learning): when you need to mitigate time-consuming, multi-turn LLM rollouts.
- Retriever (and Encoder)

```bash
conda create -n retriever_env python=3.12.9
pip install -r retriever_requirements.txt
```

- Generator

```bash
conda create -n SearchAgent-X python=3.9
pip install -r generator_requirements.txt
```
SearchAgent-X requires the following datasets and models to run interleaved search and reasoning. We list our experimental settings here; you can substitute your own datasets and models. Remember where you store them for later configuration.
- Corpus: wiki-18-corpus
- Embedding Model: all-MiniLM-L6-v2
- ANN Index: Our HNSW Index
- LLM Reasoning Model: 7B model; 14B model
- Request Dataset: Musique
😄 You can easily find them all in one HF Collection.
- Modify the paths to your downloaded embedding model, HNSW index, and corpus in `config.py`.
- Start the retriever server:

```bash
conda activate retriever_env
python vllm/entrypoints/emb_ret_server.py
```

- Modify the paths to your downloaded datasets and models in `config.py`.
- Run the experiments:

```bash
conda activate SearchAgent-X
python vllm/entrypoints/searchagent-x.py
```

The experimental results are stored by default in the `experiments/output/` directory.
The `datasets` directory contains scripts for processing your corpus: `embedding.py` for generating sentence embeddings and `build_hnsw.py` for constructing the HNSW index.
Follow these steps to prepare your corpus and build the search index:

- Encode Corpus: Use `embedding.py` to convert the corpus into embeddings with a specified Sentence Transformer model.

  ```bash
  python ./datasets/embedding.py <SentenceTransformer_model_path> <data_file_path> <embedding_save_path>
  ```

  - `<SentenceTransformer_model_path>`: Path to your Sentence Transformer model.
  - `<data_file_path>`: Path to your input data file (e.g., a `.jsonl` corpus).
  - `<embedding_save_path>`: Desired path to save the generated embeddings.
- Build HNSW Index: Use `build_hnsw.py` to create an HNSW index for retrieval. You need to set `num_elements` and `data_dim` within the `build_hnsw.py` script based on your generated embeddings.

  ```bash
  python ./datasets/build_hnsw.py <embeddings_data_path> <hnsw_index_path>
  ```

  - `<embeddings_data_path>`: Path to the embeddings file generated in the previous step.
  - `<hnsw_index_path>`: Desired path to save the HNSW index file.
You can integrate different reasoning models by editing `config.py`. Specifically, you'll need to:

- Set the `MODEL` path to your desired reasoning model.
- Configure the appropriate prompt template for that model within `config.py`.
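As a hedged illustration, those entries in `config.py` might look like the fragment below. Only `MODEL` is named by this README; the `PROMPT_TEMPLATE` variable name and the template text are assumptions about the real file.

```python
# Hypothetical config.py fragment (illustrative; only MODEL is named above).
MODEL = "/path/to/your/reasoning-model"

# Prompt template for the chosen model -- name and format are assumptions.
PROMPT_TEMPLATE = (
    "Answer the question by interleaving step-by-step reasoning with search.\n"
    "Question: {question}\n"
)
```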
- Offline Deployment: Ideal for batch processing or scenarios where rate limiting isn't needed. Set `REQUEST_RATE = 'inf'` in `config.py`.
- Online Deployment: Designed for real-time applications where you need to manage the request rate. Set `REQUEST_RATE` (requests per second) to a specific numerical value (e.g., `5`) in `config.py`.
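The two deployment modes differ only in this one setting; a sketch of the corresponding `config.py` lines (everything beyond `REQUEST_RATE` itself is illustrative):

```python
# Hypothetical config.py fragment for the deployment mode.

# Offline (batch) deployment: send all requests at once, no rate limiting.
REQUEST_RATE = 'inf'

# Online deployment: uncomment to limit to, e.g., 5 requests per second.
# REQUEST_RATE = 5
```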
Then, simply execute SearchAgent-X.
- Integrating SearchAgent-X into post-training frameworks such as Search-R1, ReSearch, and R1-Searcher, and measuring end-to-end training benefits.
- Supporting more commonly used retrieval methods, such as IVF_PQ and ScaNN.
- ... (Expecting your feedback 😄!)
SearchAgent-X is built upon vLLM for its high-performance PagedAttention, and HNSWLib for its favorable tradeoff between retrieval speed and accuracy. Thanks for their awesome work! In addition, our motivation for addressing search agent efficiency comes from these pioneering search agent models: Search-R1, ReSearch, and R1-Searcher. We believe this agentic paradigm will be the next generation of RAG.