GitHub - thongnt99/DyVo: EMNLP 2024: Dynamic Vocabularies For Learned Sparse Retrieval with Entities


DyVo

Codebase for the paper:
"DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities"
(EMNLP 2024)


Steps to Run the Code

1. Create Conda Environment and Install Dependencies

Create and activate the environment:

conda create --name lsr python=3.9.12
conda activate lsr

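Install the required packages. requirements.txt ships with the repository, so run this from the DyVo directory after cloning it (step 2.1):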

pip install -r requirements.txt

2. Download Codebase and Data

2.1 Clone the DyVo Repository

git clone https://github.com/thongnt99/DyVo

2.2 Create a Data Directory

cd DyVo
mkdir dyvo_data
cd dyvo_data

2.3 Download Entity Data from Hugging Face

Make sure the Hugging Face CLI is installed:

pip install huggingface_hub

Then download the data:

huggingface-cli download lsr42/dyvo_data

Note:

  • You may need to log in to Hugging Face before downloading:
    huggingface-cli login
  • The downloaded files will be cached locally. Refer to the Hugging Face CLI documentation for cache settings if needed.
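The same download can also be scripted from Python. The following is a minimal sketch, not part of the repository: it assumes the goal is to place the files in the current directory (for example the dyvo_data directory created in step 2.2) rather than only in the cache, and the helper names download_kwargs and download_entity_data are hypothetical. Like the CLI command above, it relies on the default repository type; if the repository is registered as a dataset, repo_type="dataset" would have to be added.

```python
from pathlib import Path

REPO_ID = "lsr42/dyvo_data"  # repository id taken from the CLI command above


def download_kwargs(target_dir="."):
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    return {
        "repo_id": REPO_ID,
        # Assumption: write the files into target_dir instead of only the cache.
        "local_dir": str(Path(target_dir)),
    }


def download_entity_data(target_dir="."):
    """Download the repository into target_dir and return the local path."""
    # Imported lazily so download_kwargs works without the package installed.
    from huggingface_hub import snapshot_download

    Path(target_dir).mkdir(parents=True, exist_ok=True)
    return snapshot_download(**download_kwargs(target_dir))
```

Calling download_entity_data() from inside dyvo_data starts the download; whether the training code expects the files at that location is not stated here, so check the experiment configs before relying on it.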

2.4 Query and Document Collections (Wapo, Codec, Robust04)

Queries and documents are available through ir-datasets.
Refer to the ir-datasets website for download instructions for each collection.

Dataset     ir_datasets Key
Wapo        wapo/v2/trec-core-2018
Robust04    disks45/nocr/trec-robust-2004
Codec       codec
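As an illustration of how the keys above are used, here is a minimal sketch with the ir_datasets Python package. The helper names resolve_key, load_collection, and preview_queries are hypothetical and not part of this repository. The sketch assumes the source files for each collection have already been obtained as described on the ir-datasets website.

```python
import itertools

# Keys copied from the table above.
IR_DATASET_KEYS = {
    "wapo": "wapo/v2/trec-core-2018",
    "robust04": "disks45/nocr/trec-robust-2004",
    "codec": "codec",
}


def resolve_key(name):
    """Map a dataset name from the table (case-insensitive) to its key."""
    return IR_DATASET_KEYS[name.lower()]


def load_collection(name):
    """Return the ir_datasets dataset object for a name from the table."""
    import ir_datasets  # imported lazily; requires `pip install ir_datasets`

    return ir_datasets.load(resolve_key(name))


def preview_queries(name, limit=3):
    """Return the first `limit` query records of the collection."""
    dataset = load_collection(name)
    return list(itertools.islice(dataset.queries_iter(), limit))
```

Each record returned by preview_queries is a named tuple with a query_id field; the remaining fields differ between collections, so inspect them before use.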

3. Train and Evaluate a Model

Example command to start training:

python -m lsr.train +experiment=qmlp_dmlm_emlm_laque_wapo_msmarco_pretrained_inparsv2_monot53b_distillation_l1_0.0_0.001_entw_0.05.yaml training_arguments.fp16=True

  • The list of experiment configuration files can be found in the lsr/configs/experiment/ directory.
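To pick a different experiment, it can help to list the available configuration files and assemble the command programmatically. The sketch below is a hypothetical helper, not part of the repository: it assumes it is run from the repository root and that experiment configs are .yaml files, as the example file name suggests. It reproduces only the two arguments shown in the example command.

```python
from pathlib import Path


def list_experiments(config_dir="lsr/configs/experiment"):
    """Return the sorted names of .yaml experiment configs in config_dir."""
    root = Path(config_dir)
    if not root.is_dir():
        return []  # e.g. when not run from the repository root
    return sorted(p.name for p in root.glob("*.yaml"))


def build_train_command(experiment, fp16=True):
    """Assemble the training command as an argument list."""
    return [
        "python", "-m", "lsr.train",
        f"+experiment={experiment}",
        f"training_arguments.fp16={fp16}",
    ]
```

The returned list can be passed to subprocess.run(build_train_command(name), check=True) to start training with the chosen config.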

Citing DyVo

If you find this repository helpful, please cite our paper:

@inproceedings{nguyen-etal-2024-dyvo,
    title = "DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities",
    author = "Nguyen, Thong  and
              Chatterjee, Shubham  and
              MacAvaney, Sean  and
              Mackie, Iain  and
              Dalton, Jeff  and
              Yates, Andrew",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024"
}
