SearchAgent-X is a highly efficient system for reasoning-search interleaved large language model (LLM) agents.
Compared to the popular LLM inference framework vLLM combined with HNSW-based retrieval, it achieves 1.3–3.4× higher throughput at only 0.2–0.6× the latency. See our paper for the detailed techniques.
🔔 When to Use SearchAgent-X:
- Serving: when you need low-latency, high-throughput LLM search agents;
- Post-training (e.g., reinforcement learning): when you need to mitigate time-consuming, multi-turn LLM rollouts.
- Retriever (and Encoder)

```bash
conda create -n retriever_env python=3.12.9
pip install -r retriever_requirements.txt
```

- Generator

```bash
conda create -n SearchAgent-X python=3.9
pip install -r generator_requirements.txt
```
SearchAgent-X requires the following datasets and models to run interleaved search and reasoning. We list our experimental settings here; you can substitute your own datasets and models. Remember where you store them for later configuration.
- Corpus: wiki-18-corpus
- Embedding Model: all-MiniLM-L6-v2
- ANN Index: Our HNSW Index
- LLM Reasoning Model: 7B model; 14B model
- Request Dataset: Musique
😄 You can easily find them all in one HF Collection.
- Modify the paths to your downloaded embedding model, HNSW index, and corpus in `config.py`.
- Start the retriever server:

```bash
conda activate retriever_env
python vllm/entrypoints/emb_ret_server.py
```

- Modify the paths to your downloaded datasets and models in `config.py`.
- Run the experiments:

```bash
conda activate SearchAgent-X
python vllm/entrypoints/searchagent-x.py
```

The experimental results are stored by default in the `experiments/output/` directory.
The `datasets` directory contains scripts for processing your corpus: `embedding.py` for generating sentence embeddings and `build_hnsw.py` for constructing the HNSW index.
Follow these steps to prepare your corpus and build the search index:

- Encode Corpus: Use `embedding.py` to convert the corpus into embeddings with a specified Sentence Transformer model.

  ```bash
  python ./datasets/embedding.py <SentenceTransformer_model_path> <data_file_path> <embedding_save_path>
  ```

  - `<SentenceTransformer_model_path>`: Path to your Sentence Transformer model.
  - `<data_file_path>`: Path to your input data file (e.g., a `.jsonl` corpus).
  - `<embedding_save_path>`: Desired path to save the generated embeddings.
- Build HNSW Index: Use `build_hnsw.py` to create an HNSW index for retrieval. You need to set `num_elements` and `data_dim` within the `build_hnsw.py` script based on your generated embeddings.

  ```bash
  python ./datasets/build_hnsw.py <embeddings_data_path> <hnsw_index_path>
  ```

  - `<embeddings_data_path>`: Path to the embeddings file generated in the previous step.
  - `<hnsw_index_path>`: Desired path to save the HNSW index file.
You can integrate different reasoning models by editing `config.py`. Specifically, you'll need to:

- Set the `MODEL` path to your desired reasoning model.
- Configure the appropriate prompt template for that model within `config.py`.
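As a hedged illustration, those entries in `config.py` might look like the fragment below. Only `MODEL` is named by this README; the `PROMPT_TEMPLATE` variable name and the template text are assumptions about the real file.

```python
# Hypothetical config.py fragment (illustrative; only MODEL is named above).
MODEL = "/path/to/your/reasoning-model"

# Prompt template for the chosen model -- name and format are assumptions.
PROMPT_TEMPLATE = (
    "Answer the question by interleaving step-by-step reasoning with search.\n"
    "Question: {question}\n"
)
```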
- Offline Deployment: Ideal for batch processing or scenarios where rate limiting isn't needed. Set `REQUEST_RATE = 'inf'` in `config.py`.
- Online Deployment: Designed for real-time applications where you need to manage the request rate. Set `REQUEST_RATE` (requests per second) to a specific numerical value (e.g., `5`) in `config.py`.
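The two deployment modes differ only in this one setting; a sketch of the corresponding `config.py` lines (everything beyond `REQUEST_RATE` itself is illustrative):

```python
# Hypothetical config.py fragment for the deployment mode.

# Offline (batch) deployment: send all requests at once, no rate limiting.
REQUEST_RATE = 'inf'

# Online deployment: uncomment to limit to, e.g., 5 requests per second.
# REQUEST_RATE = 5
```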
Then, simply execute SearchAgent-X.
- Integrating SearchAgent-X into post-training frameworks such as Search-R1, ReSearch, and R1-Searcher, and measuring end-to-end training benefits.
- Supporting more commonly used retrieval methods, such as IVF_PQ and ScaNN.
- ... (Expecting your feedback 😄!)
SearchAgent-X is built upon vLLM for its high-performance PagedAttention, and HNSWLib for its favorable tradeoff between retrieval speed and accuracy. Thanks for their awesome work! In addition, our motivation for addressing search agent efficiency comes from these pioneering search agent models: Search-R1, ReSearch, and R1-Searcher. We believe this agentic paradigm will be the next generation of RAG.