DyVo

Codebase for the paper:
"DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities"
(EMNLP 2024)

Steps to Run the Code

1. Create Conda Environment and Install Dependencies

Create and activate the environment:

conda create --name lsr python=3.9.12
conda activate lsr

Install the required packages (run this from the root of the cloned repository, where requirements.txt lives; see step 2.1):

pip install -r requirements.txt

2. Download Codebase and Data

2.1 Clone the DyVo Repository

git clone https://github.com/thongnt99/DyVo

2.2 Create a Data Directory

cd DyVo
mkdir dyvo_data
cd dyvo_data

2.3 Download Entity Data from Hugging Face

Make sure the Hugging Face CLI is installed:

pip install huggingface_hub

Then download the data:

huggingface-cli download lsr42/dyvo_data

Note:

You may need to log in to Hugging Face before downloading:
```
huggingface-cli login
```
The downloaded files will be cached locally. Refer to the Hugging Face CLI documentation for cache settings if needed.
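
For scripted setups, the same download can be sketched in Python with `huggingface_hub.snapshot_download`. This is a minimal sketch, not part of the repository: the helper name `download_entity_data` is hypothetical, and it assumes you want the files placed in an explicit local directory rather than only in the cache. Whether `lsr42/dyvo_data` is registered as a model or a dataset repository is not stated here, so `repo_type` is left as a parameter.

```python
from pathlib import Path
from typing import Optional

# Repository id taken from the CLI command above.
REPO_ID = "lsr42/dyvo_data"


def download_entity_data(local_dir: str = "dyvo_data",
                         repo_type: Optional[str] = None) -> Path:
    """Download the full repository into local_dir and return its path.

    Hypothetical helper. repo_type=None uses the library default; pass
    "dataset" if the repository turns out to be a dataset repository
    (an assumption to verify on the Hugging Face page).
    """
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type=repo_type,
        local_dir=local_dir,  # write files here instead of only the cache
    )
    return Path(path)


# Example (requires network access and, possibly, a prior login):
# print(download_entity_data("dyvo_data"))
```

Passing `local_dir` makes the location of the files explicit, which may be convenient if the experiment configs expect them under the data directory created in step 2.2.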

2.4 Query and Document Collections (Wapo, Codec, Robust04)

Queries and documents for all three collections are available through ir-datasets under the keys listed below.
Please refer to the ir-datasets website for instructions on how to download each collection.

| Dataset  | `ir_datasets` Key             |
|----------|-------------------------------|
| Wapo     | wapo/v2/trec-core-2018        |
| Robust04 | disks45/nocr/trec-robust-2004 |
| Codec    | codec                         |
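
As a quick check that a collection is reachable, the keys above can be loaded with the `ir_datasets` Python package. The following is a minimal sketch: `peek_queries` is a hypothetical helper, not part of this repository, and some collections may require obtaining the source files separately, as described on the ir-datasets website.

```python
from itertools import islice

# Dataset names mapped to the ir_datasets keys listed in the table above.
IR_DATASETS_KEYS = {
    "Wapo": "wapo/v2/trec-core-2018",
    "Robust04": "disks45/nocr/trec-robust-2004",
    "Codec": "codec",
}


def peek_queries(name: str, n: int = 3):
    """Return the first n queries of the named collection (hypothetical helper)."""
    # Imported lazily; install with `pip install ir_datasets`.
    import ir_datasets

    dataset = ir_datasets.load(IR_DATASETS_KEYS[name])
    # queries_iter() yields one record per query; take only the first n.
    return list(islice(dataset.queries_iter(), n))


# Example (triggers a download or asks for source files on first use):
# for query in peek_queries("Codec"):
#     print(query.query_id)
```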

3. Train and Evaluate a Model

Example command to start training:

python -m lsr.train +experiment=qmlp_dmlm_emlm_laque_wapo_msmarco_pretrained_inparsv2_monot53b_distillation_l1_0.0_0.001_entw_0.05.yaml training_arguments.fp16=True

All available experiment configuration files are in the lsr/configs/experiment/ directory.
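
To pick an experiment without browsing the directory by hand, the configs can be listed and the training command assembled programmatically. This is a minimal sketch with hypothetical helper names (`list_experiments`, `build_train_command`); it assumes the script is run from the repository root and that configs use the `.yaml` extension, as in the example command above.

```python
import sys
from pathlib import Path
from typing import List


def list_experiments(config_dir: str = "lsr/configs/experiment") -> List[str]:
    """Return the sorted file names of all .yaml experiment configs."""
    return sorted(p.name for p in Path(config_dir).glob("*.yaml"))


def build_train_command(experiment: str, fp16: bool = True) -> List[str]:
    """Build the argument list for the training command shown above."""
    return [
        sys.executable, "-m", "lsr.train",
        f"+experiment={experiment}",
        f"training_arguments.fp16={fp16}",
    ]


# Example: print one ready-to-run command per available experiment.
# for name in list_experiments():
#     print(" ".join(build_train_command(name)))
```

Returning the command as a list lets it be passed directly to `subprocess.run` without shell quoting.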

Citing DyVo

If you find this repository helpful, please cite our paper:

@inproceedings{nguyen-etal-2024-dyvo,
    title = "DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities",
    author = "Nguyen, Thong  and
              Chatterjee, Shubham  and
              MacAvaney, Sean  and
              Mackie, Iain  and
              Dalton, Jeff  and
              Yates, Andrew",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024"
}
