SearchAgent-X: A High-Efficiency System of Large Language Model Based Search Agents

SearchAgent-X is a highly efficient system for reasoning-search interleaved large language model (LLM) agents.
Compared to the popular LLM inference framework vLLM combined with HNSW-based retrieval, it achieves 1.3–3.4× higher throughput with only 0.2–0.6× the latency. See our paper for the detailed techniques.

(Figure: SearchAgent-X performance comparison.)

🔔 When to Use SearchAgent-X:

  • Serving: when you need low-latency, high-throughput LLM search agents;
  • Post-training (e.g., reinforcement learning): when you need to speed up time-consuming, multi-turn LLM rollouts.

🚀 Quick Start

Environment

  • Retriever (and Encoder)
    conda create -n retriever_env python=3.12.9
    pip install -r retriever_requirements.txt
  • Generator
    conda create -n SearchAgent-X python=3.9
    pip install -r generator_requirements.txt

Datasets & Models

SearchAgent-X requires several datasets and models for running interleaved search and reasoning. We describe our experimental settings here; you can of course substitute your own datasets and models. Note where you store them, since the paths are needed for later configuration.

😄 You can easily find them all in one HF Collection.
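
If you manage the downloads with the huggingface_hub client, a minimal sketch could look like the one below. The repository IDs are placeholders for whatever entries you pick from the collection, not names confirmed by this README.

    from huggingface_hub import snapshot_download

    # Placeholder repo IDs -- replace them with the actual entries you pick
    # from the HF Collection linked above.
    DOWNLOADS = [
        ("your-org/your-embedding-model", "model"),    # encoder for the retriever
        ("your-org/your-reasoning-model", "model"),    # generator LLM
        ("your-org/your-qa-dataset", "dataset"),       # evaluation questions
    ]

    for repo_id, repo_type in DOWNLOADS:
        local_path = snapshot_download(
            repo_id=repo_id,
            repo_type=repo_type,
            local_dir=f"./downloads/{repo_id.split('/')[-1]}",
        )
        print(f"{repo_id} -> {local_path}")   # record this path for config.py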

Run SearchAgent-X

  • Modify the paths to your downloaded embedding model, HNSW index, and corpus in config.py (an illustrative config.py sketch follows this list)
  • Start Retriever Server
    conda activate retriever_env
    python vllm/entrypoints/emb_ret_server.py
  • Modify the paths to your downloaded datasets and models in config.py
  • Run experiments
    conda activate SearchAgent-X
    python vllm/entrypoints/searchagent-x.py
    The experimental results will be stored by default in the directory experiments/output/.
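
For orientation, here is an illustrative sketch of the path-related entries in config.py. Only MODEL and REQUEST_RATE are named elsewhere in this README; the other variable names are assumptions, so match them to the actual keys in the shipped config.py.

    # config.py -- illustrative sketch only; check the real key names in the repository.
    # Retriever side
    EMBEDDING_MODEL_PATH = "/data/models/your-embedding-model"   # Sentence Transformer used by the encoder
    HNSW_INDEX_PATH = "/data/index/corpus_hnsw.bin"              # index built by build_hnsw.py
    CORPUS_PATH = "/data/corpus/passages.jsonl"                  # passages returned to the agent

    # Generator side
    MODEL = "/data/models/your-reasoning-model"                  # reasoning LLM (see "How To Use Other Reasoning Models?")
    DATASET_PATH = "/data/datasets/your-questions.jsonl"         # questions fed to the agent

    # Serving mode (see "How To Deploy SearchAgent-X in Offline/Online Scenarios?")
    REQUEST_RATE = 'inf'                                         # offline batch mode; set a number (e.g., 5) for online serving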

👨‍💻 For Developers

How To Encode And Index My Own Corpus?

The datasets directory contains scripts for processing your corpus: embedding.py for generating sentence embeddings and build_hnsw.py for constructing the HNSW index.

Follow these steps to prepare your corpus and build the search index:

  1. Encode Corpus: Use embedding.py to convert the corpus into embeddings using a specified Sentence Transformer model.

    python ./datasets/embedding.py <SentenceTransformer_model_path> <data_file_path> <embedding_save_path>
    • <SentenceTransformer_model_path>: Path to your specified Sentence Transformer model.
    • <data_file_path>: Path to your input data file (e.g., a .jsonl corpus).
    • <embedding_save_path>: Desired path to save the generated embeddings.
  2. Build HNSW Index: Use build_hnsw.py to create an HNSW index for retrieval. You need to specify num_elements and data_dim within the build_hnsw.py script based on your generated embeddings (a standalone sketch of both steps follows this list).

    python ./datasets/build_hnsw.py <embeddings_data_path> <hnsw_index_path>
    • <embeddings_data_path>: Path to the embeddings file generated in the previous step.
    • <hnsw_index_path>: Desired path to save the HNSW index file.
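
To make the two steps concrete, here is a self-contained sketch of the same pipeline written directly against sentence-transformers and hnswlib. It approximates what embedding.py and build_hnsw.py do rather than reproducing their code, and the HNSW parameters (M, ef_construction) and the "text" field name are illustrative assumptions.

    import json
    import numpy as np
    import hnswlib
    from sentence_transformers import SentenceTransformer

    # 1) Encode the corpus (rough equivalent of datasets/embedding.py).
    model = SentenceTransformer("/path/to/sentence-transformer")   # <SentenceTransformer_model_path>
    with open("/path/to/corpus.jsonl") as f:                       # <data_file_path>
        texts = [json.loads(line)["text"] for line in f]           # assumes one JSON object with a "text" field per line
    embeddings = model.encode(texts, batch_size=256, show_progress_bar=True)
    np.save("/path/to/embeddings.npy", embeddings)                 # <embedding_save_path>

    # 2) Build the HNSW index (rough equivalent of datasets/build_hnsw.py).
    num_elements, data_dim = embeddings.shape                      # the values configured inside build_hnsw.py
    index = hnswlib.Index(space="ip", dim=data_dim)                # inner product; use "l2" or "cosine" if that matches your encoder
    index.init_index(max_elements=num_elements, M=32, ef_construction=200)
    index.add_items(embeddings, np.arange(num_elements))
    index.save_index("/path/to/corpus_hnsw.bin")                   # <hnsw_index_path>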

How To Use Other Reasoning Models?

You can integrate different reasoning models by editing config.py. Specifically, you'll need to:

  1. Set the MODEL path to your desired reasoning model.
  2. Configure the appropriate prompt template for that model within config.py (an illustrative template sketch follows this list).
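
As an illustration of step 2, the prompt template is typically a format string that marks where the question goes and which tags the model uses for search calls and answers. The variable name PROMPT_TEMPLATE and the tag conventions below are assumptions for illustration; mirror whatever the shipped config.py and your chosen model actually expect.

    # Illustrative only -- the real key names and special tags depend on config.py
    # and on the reasoning model you plug in.
    MODEL = "/path/to/your-reasoning-model"

    PROMPT_TEMPLATE = (
        "Answer the question by reasoning step by step.\n"
        "When you need external knowledge, emit a query between <search> and </search>;\n"
        "retrieved passages will be returned between <information> and </information>.\n"
        "Give the final answer between <answer> and </answer>.\n\n"
        "Question: {question}\n"
    )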

How To Deploy SearchAgent-X in Offline/Online Scenarios?

  • Offline Deployment: Ideal for batch processing or scenarios where rate limiting isn't needed. Set REQUEST_RATE = 'inf' in config.py.

  • Online Deployment: Designed for real-time applications where you need to manage the request rate. Set REQUEST_RATE (requests per second) to a specific numerical value (e.g., 5) in config.py (see the client-loop sketch below for intuition).

Then, simply execute SearchAgent-X.
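
For intuition on what the online setting models, request-rate-limited clients are commonly simulated with exponential inter-arrival times, as in typical LLM serving benchmarks. The sketch below shows that pattern with a hypothetical send_request helper; it is not the repository's actual driver code.

    import asyncio
    import random

    REQUEST_RATE = 5.0   # requests per second; use float("inf") for the offline setting

    async def send_request(prompt: str) -> None:
        ...   # hypothetical: submit one prompt to the running SearchAgent-X entrypoint

    async def run(prompts: list[str]) -> None:
        tasks = []
        for prompt in prompts:
            tasks.append(asyncio.create_task(send_request(prompt)))
            if REQUEST_RATE != float("inf"):
                # Poisson arrivals: exponential gaps with mean 1/REQUEST_RATE seconds.
                await asyncio.sleep(random.expovariate(REQUEST_RATE))
        await asyncio.gather(*tasks)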

📋 What's Next?

  1. Integrating SearchAgent-X into post-training frameworks like Search-R1, ReSearch, and R1-Searcher, and measuring the end-to-end training benefits.
  2. Supporting more commonly used retrieval methods, such as IVF_PQ and SCANN.
  3. ... (Expecting Your Feedback 😄!)

Acknowledgments

SearchAgent-X is built upon vLLM for its high-performance PagedAttention and HNSWLib for its favorable tradeoff between retrieval speed and accuracy. Thanks for their awesome work! In addition, our motivation for addressing search agent efficiency comes from these pioneering search agent models: Search-R1, ReSearch, and R1-Searcher. We believe this agentic paradigm will be the next generation of RAG.
