Add metadata for prebuilt indexes for Pyserini alignment by lilyjge · Pull Request #2853 · castorini/anserini

Add metadata for prebuilt indexes for Pyserini alignment #2853

Merged: 2 commits, Jun 24, 2025

Conversation

lilyjge (Member) commented Jun 23, 2025

Addresses #2852. This adds all the metadata removed by castorini/pyserini#2159 to Anserini's prebuilt indexes. However, the metadata is not available in Pyserini for every index: some Anserini indexes, such as MSMARCO v2 shard, don't exist in Pyserini; some are new, like SPLADEv3 on BEIR; and some, like BEIR BGE, were already ported over from Anserini. Does this metadata exist anywhere? Specifically: file size (compressed), total number of terms, number of documents, and number of unique terms.
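
For illustration only, here is a minimal Python sketch of how the four metadata fields named above could be modeled so that gaps are easy to spot. The class name, field names, index names, and numbers are all hypothetical placeholders; they are not taken from Anserini or Pyserini.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PrebuiltIndexMetadata:
    """Hypothetical record for one prebuilt index (names are illustrative)."""
    name: str
    compressed_size_bytes: Optional[int] = None  # file size (compressed)
    total_terms: Optional[int] = None            # total number of terms
    documents: Optional[int] = None              # number of documents
    unique_terms: Optional[int] = None           # number of unique terms


def missing_fields(meta: PrebuiltIndexMetadata) -> List[str]:
    """Return the names of metadata fields that are still unset (None)."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) is None]


# Placeholder values, not real index statistics.
complete = PrebuiltIndexMetadata("example-index-a", 1_000, 5_000, 100, 800)
partial = PrebuiltIndexMetadata("example-index-b", documents=100)

print(missing_fields(complete))
print(missing_fields(partial))
```

A check like this would list which of the four values still need to be sourced for each index that has no Pyserini counterpart.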

lintool (Member) commented Jun 23, 2025

@lilyjge thanks for working on this - I think this is good for now, for v1.1.0 - let's circle back and do more reconciliation later.

lilyjge merged commit 68a6284 into castorini:master on Jun 24, 2025
1 check passed
lilyjge deleted the metadata branch on June 24, 2025 at 00:15