📘 Zhihu Blog • 📚 Social Media • 📝 Arxiv Paper
The official implementation of CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control.
CtrlA is an inherent control-based adaptive RAG framework that enhances retrieval-augmented generation for LLMs by balancing their internal and external knowledge. It characterizes the LLM's internal states and intervenes in generation from two perspectives: honesty steering and confidence monitoring, via simple yet effective feature direction representations.
Oct 7, 2024 🎉 Version 2 of CtrlA is now available on arXiv.
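To make the two interventions described above concrete, below is a minimal conceptual sketch (not the repo's implementation): it steers generation by adding a feature direction to a mid-layer residual stream and scores confidence by projecting hidden states onto another direction. The layer index, steering strength, and random placeholder directions are illustrative assumptions; in practice the directions would come from `trained_probe/`.

```python
# Conceptual sketch only: honesty steering + confidence monitoring via
# feature directions. The layer choice, strength, and random directions
# below are placeholders, not the values CtrlA actually uses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)

layer, alpha = 16, 8.0                                   # illustrative
honesty_dir = torch.randn(model.config.hidden_size)      # placeholder probe
confidence_dir = torch.randn(model.config.hidden_size)   # placeholder probe

def steer(module, inputs, output):
    # Add the honesty direction to this layer's residual stream.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * honesty_dir.to(hidden.device, hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.model.layers[layer].register_forward_hook(steer)
enc = tok("Who wrote Hamlet?", return_tensors="pt")
out = model.generate(**enc, max_new_tokens=16,
                     output_hidden_states=True, return_dict_in_generate=True)
handle.remove()

# Confidence monitoring: project each step's last-layer hidden state onto the
# confidence direction; a low score would trigger retrieval in adaptive RAG.
scores = [step[-1][0, -1].float() @ confidence_dir for step in out.hidden_states]
```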
Install dependencies by running the command below.
pip install -r requirements.txt
The dataset used for training the confidence and honesty probes, as well as for our evaluation, is available here. Please create an `eval_data/` directory and place all the data files in it.
Please download the model file from mistralai/Mistral-7B-Instruct-v0.1 on Hugging Face and place it in the `model/` directory.
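If you prefer to script the download, here is a minimal sketch using `huggingface_hub` (the subdirectory name under `model/` is an assumption):

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.1",
    local_dir="model/Mistral-7B-Instruct-v0.1",  # assumed target path
)
```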
The pre-trained feature directions are stored in the `trained_probe/` directory.
To extract the features yourself, refer to the `train_confidence_probe.ipynb` notebook for the confidence feature and the `train_honesty_probe.ipynb` notebook for the honesty feature.
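For intuition, the core idea is to turn contrastive activations into a single direction. Below is a minimal sketch; the notebooks' actual procedure may differ (e.g., using a trained linear probe rather than a difference of means):

```python
import torch

def feature_direction(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """Difference-of-means direction from contrastive activations.

    pos_acts / neg_acts: [num_examples, hidden_size] hidden states collected
    at one layer for positive (e.g., honest/confident) and negative examples.
    """
    direction = pos_acts.mean(dim=0) - neg_acts.mean(dim=0)
    return direction / direction.norm()
```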
All code related to the retriever setup is in the `code/retrievers` directory. We provide the two retrieval services reported in our paper:
- BM25 Retrieval Service using ElasticSearch
- BGE Retrieval Service using FAISS
Prerequisites:
- Wikipedia 2018 snippets: `wget https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz`
- BGE embedding model weights: https://huggingface.co/BAAI/bge-large-en-v1.5
- FAISS: https://github.com/facebookresearch/faiss or https://pypi.org/project/faiss/
- SentenceTransformers: https://github.com/UKPLab/sentence-transformers
- Flask
- PyTorch
- ElasticSearch
cd code/retrievers/bge_retrieval_service # go to the target directory
python encode_wiki_bge.py # encode snippets into embeddings
python bge_faiss.py # set up bge-retrieval service
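For intuition, here is a condensed sketch of what these two steps amount to, assuming `BAAI/bge-large-en-v1.5` and an inner-product FAISS index over normalized embeddings (file and variable names are illustrative):

```python
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("BAAI/bge-large-en-v1.5")
passages = ["Passage one ...", "Passage two ..."]  # read from psgs_w100.tsv in practice
emb = encoder.encode(passages, normalize_embeddings=True)  # unit-norm vectors

index = faiss.IndexFlatIP(emb.shape[1])  # inner product == cosine on unit vectors
index.add(emb)
faiss.write_index(index, "wiki_bge.index")  # illustrative file name
```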
Sample code to call the BGE retrieval service:
python send_req_bge_wiki.py -q <query> -k <stop_k> --use_prefix
`--use_prefix` is optional; it prepends `Represent this sentence for searching relevant passages: ` to each query, for asymmetric encoding of queries and passages.
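Continuing the sketch above, the query side with `--use_prefix` would look roughly like this:

```python
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("BAAI/bge-large-en-v1.5")
index = faiss.read_index("wiki_bge.index")

# Queries get the BGE instruction prefix; passages were encoded without it.
prefix = "Represent this sentence for searching relevant passages: "
q_emb = encoder.encode([prefix + "who wrote hamlet"], normalize_embeddings=True)
scores, ids = index.search(q_emb, 5)  # top-5 similarities and passage indices
```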
cd code/retrievers/es_retrieval_service # go to the target directory
python es_dictionary.py # convert passages in tsv to desired dictionary format.
python es_service.py # set up Elasticsearch Retrieval Service
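For reference, here is a minimal BM25 index-and-search sketch using the Elasticsearch Python client (v8 API); the index name and field name are assumptions, not necessarily what `es_service.py` uses:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# Bulk-index the passages produced by es_dictionary.py (schema assumed here).
actions = [{"_index": "wiki", "_source": {"text": "Passage one ..."}}]
helpers.bulk(es, actions)

# BM25 is Elasticsearch's default similarity for match queries.
resp = es.search(index="wiki", query={"match": {"text": "who wrote hamlet"}}, size=5)
passages = [hit["_source"]["text"] for hit in resp["hits"]["hits"]]
```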
Sample code to call the ES retrieval service:
python send_es_req.py -q <query> -k <stop_k>
After deploying the retrieval services, please complete the corresponding retrieval functions in `code/retrieval.py`.
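As a rough illustration, such a function might simply wrap the deployed service over HTTP; the endpoint path and JSON schema below are hypothetical and must match what your service scripts actually expose:

```python
import requests

def bge_retrieve(query: str, top_k: int = 5) -> list[str]:
    """Hypothetical wrapper around the deployed BGE retrieval service."""
    resp = requests.post(
        "http://localhost:5000/search",       # hypothetical endpoint
        json={"query": query, "k": top_k},    # hypothetical request schema
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["passages"]            # hypothetical response field
```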
All the commands can be found in `./run.sh`.
python run.py --config configs/run.json --model run_short_form --dataset triviaqa --task triviaqa --max_new_tokens 1024 --retrieve_method bge_serper --metric match --use_tvq
python run.py --config configs/run.json --model run_short_form --dataset popqa --task popqa --max_new_tokens 1024 --retrieve_method bge_serper --metric match --use_tvq --continue_gen_without_contents
python run.py --config configs/run.json --model run_long_form --dataset asqa --task asqa --max_new_tokens 130 --retrieve_method bge --use_tvq
ALCE/ASQA offers a thorough evaluation of long-form QA using various metrics. To run the initial evaluation, clone the ALCE repository and download the necessary data:
git clone https://github.com/princeton-nlp/ALCE.git
python3 -m venv alce_env
cd ALCE
bash download_data.sh
python run.py --config configs/run.json --model run_long_form --dataset fact --task fact --max_new_tokens 300 --retrieve_method bge_serper --use_tvq
Please follow the instructions in the FactScore official repository to set up your environment. Since the original repository is no longer maintained, consider using alternative sources like wj210's fork or armingh2000's FactScoreLite for evaluations. To proceed, use the command below:
python -m factscore.factscorer --data_path <output_file> --model_name retrieval+ChatGPT --cache_dir <cache_dir> --openai_key <openai_key> --verbose
python run.py --config configs/run.json --model run_long_form --dataset fresh --task fresh --max_new_tokens 1024 --retrieve_method serper --use_tvq
Please follow the instructions provided in the freshllms/freshqa repository, which includes complete data and codes of FreshLLMs, to conduct your evaluation.
If you find this work helpful, please cite it as follows:
@misc{liu2024ctrla,
  title={CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control},
  author={Huanshuo Liu and Hao Zhang and Zhijiang Guo and Kuicai Dong and Xiangyang Li and Yi Quan Lee and Cong Zhang and Yong Liu},
  year={2024},
  eprint={2405.18727},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
If you have questions, feel free to send an email to huanshuo.liu[at]u.nus.edu.