Added msmarco v2.1 doc segmented splade-v3 bindings by clides · Pull Request #2890 · castorini/anserini · GitHub

Added msmarco v2.1 doc segmented splade-v3 bindings #2890


Merged: clides merged 21 commits into castorini:master on Jul 9, 2025


@clides (Member) commented Jul 9, 2025

This pull request introduces regression tests and documentation updates for the SPLADE-v3 model applied to the MS MARCO V2.1 segmented document corpus, supporting both cached query encoding and ONNX-based on-the-fly query encoding. Key changes include the addition of new regression configurations, templates, and documentation, as well as updates to the IndexInfo enumeration and associated tests.

Regression Tests and Documentation Updates:

Cached Query Encoding:

  • Added a new regression configuration file rag24-doc-segmented-test-umbrela.splade-v3.cached.yaml for cached query encoding with SPLADE-v3, including metrics, topics, and model parameters.
  • Created a documentation template rag24-doc-segmented-test-umbrela.splade-v3.cached.template for generating regression test pages for cached queries.
  • Generated documentation page regressions-rag24-doc-segmented-test-umbrela.splade-v3.cached.md based on the template.

ONNX Query Encoding:

  • Added a new regression configuration file rag24-doc-segmented-test-umbrela.splade-v3.onnx.yaml for ONNX-based query encoding with SPLADE-v3, including metrics, topics, and model parameters.
  • Created a documentation template rag24-doc-segmented-test-umbrela.splade-v3.onnx.template for generating regression test pages for ONNX-based queries.
  • Generated documentation page regressions-rag24-doc-segmented-test-umbrela.splade-v3.onnx.md based on the template.
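The two regression variants differ only in where the query representation comes from: the cached variant reads SPLADE-v3 query encodings that were computed ahead of time, while the ONNX variant runs the encoder when the query is issued. The sketch below illustrates that distinction under stated assumptions. The names QueryEncoder, CachedEncoder, and OnTheFlyEncoder are hypothetical and are not Anserini's API, and the "model" is a stand-in function rather than a real ONNX session.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types exist in Anserini; they only contrast
// cached versus on-the-fly query encoding.
public class QueryEncodingSketch {
  // A sparse query representation: term -> weight.
  public interface QueryEncoder {
    Map<String, Float> encode(String qid, String queryText);
  }

  // Cached: look up a representation computed offline, keyed by query id.
  public static final class CachedEncoder implements QueryEncoder {
    private final Map<String, Map<String, Float>> cache;

    public CachedEncoder(Map<String, Map<String, Float>> cache) {
      this.cache = cache;
    }

    public Map<String, Float> encode(String qid, String queryText) {
      Map<String, Float> vec = cache.get(qid);
      if (vec == null) {
        throw new IllegalArgumentException("no cached encoding for " + qid);
      }
      return vec; // queryText is ignored: the work was done ahead of time
    }
  }

  // On-the-fly: invoke a model at query time (here, any text -> weights function).
  public static final class OnTheFlyEncoder implements QueryEncoder {
    private final Function<String, Map<String, Float>> model;

    public OnTheFlyEncoder(Function<String, Map<String, Float>> model) {
      this.model = model;
    }

    public Map<String, Float> encode(String qid, String queryText) {
      return model.apply(queryText); // qid is ignored: encoding depends only on the text
    }
  }

  public static void main(String[] args) {
    QueryEncoder cached = new CachedEncoder(Map.of("q1", Map.of("splade", 1.5f)));
    QueryEncoder live = new OnTheFlyEncoder(text -> Map.of(text.toLowerCase(), 1.0f));
    System.out.println(cached.encode("q1", "ignored")); // {splade=1.5}
    System.out.println(live.encode("q2", "SPLADE"));    // {splade=1.0}
  }
}
```

The trade-off the two YAML variants capture is roughly this: cached encodings make the regression independent of the encoder at run time, while the ONNX variant exercises the full query-time pipeline.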

Codebase Updates:

  • Added a new entry MSMARCO_V21_DOC_SEGMENTED_SPLADE_V3 to the IndexInfo enumeration in IndexInfo.java to define the SPLADE-v3 index metadata.
  • Updated the prebuilt index count in the test testNumPrebuiltIndexes to reflect the addition of the new index.
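The two codebase changes are coupled: adding an enum constant changes the number of registered prebuilt indexes, so a test that pins that number has to be bumped in the same change. A minimal sketch of that pattern follows. The enum PrebuiltIndex, its single indexName field, the placeholder entry, and the index name string are all hypothetical simplifications; Anserini's real IndexInfo carries more metadata and uses a different constructor.

```java
// Hypothetical, simplified stand-in for a prebuilt-index registry such as IndexInfo.
public class PrebuiltIndexSketch {
  public enum PrebuiltIndex {
    // Placeholder entry standing in for the indexes that already existed.
    PLACEHOLDER_EXISTING_INDEX("placeholder-existing-index"),
    // The newly added entry; the name string here is illustrative.
    MSMARCO_V21_DOC_SEGMENTED_SPLADE_V3("msmarco-v2.1-doc-segmented.splade-v3");

    public final String indexName;

    PrebuiltIndex(String indexName) {
      this.indexName = indexName;
    }
  }

  // The count a test like testNumPrebuiltIndexes pins; it must change whenever an entry is added.
  public static final int EXPECTED_NUM_INDEXES = 2;

  public static void main(String[] args) {
    int n = PrebuiltIndex.values().length;
    if (n != EXPECTED_NUM_INDEXES) {
      throw new AssertionError("expected " + EXPECTED_NUM_INDEXES + " indexes, found " + n);
    }
    System.out.println("prebuilt indexes: " + n); // prebuilt indexes: 2
  }
}
```

Pinning the count is a blunt check, but it forces every new entry to be acknowledged explicitly in the tests rather than slipping in unnoticed.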

@lintool self-requested a review July 9, 2025 12:46
@lintool (Member) left a comment


Hi @clides, a few items:

  • you need to update tools and commit the update so I can get access to tools/topics-and-qrels/topics.rag24.test.splade-v3.tsv.gz
  • please create the *nist variants of the YAML
  • please add the README in the hgf repo

@clides merged commit 35ab54d into castorini:master Jul 9, 2025
1 check passed