castorini · clides · Jul 9, 2025 · Jul 6, 2025 · Jul 7, 2025 · Jul 7, 2025
diff --git a/docs/regressions/regressions-msmarco-v2.1-doc-segmented.splade-v3.cached.md b/docs/regressions/regressions-msmarco-v2.1-doc-segmented.splade-v3.cached.md
@@ -0,0 +1,92 @@
+# Anserini Regressions: MS MARCO V2.1 Document Ranking
+
+**Model**: [SPLADE-v3](https://arxiv.org/abs/2403.06789) (using cached queries)
+
+This page describes regression experiments for document ranking _on the segmented version_ of the MS MARCO V2.1 document corpus using the dev queries; these experiments are integrated into Anserini's regression testing framework.
+This corpus was derived from the MS MARCO V2 _segmented_ document corpus and prepared for the TREC 2024 RAG Track.
+
+The model itself can be downloaded [here](https://huggingface.co/naver/splade-v3).
+See the [official SPLADE repo](https://github.com/naver/splade) and the following paper for more details:
+
+> Carlos Lassance, Hervé Déjean, Thibault Formal, and Stéphane Clinchant. [SPLADE-v3: New baselines for SPLADE.](https://arxiv.org/abs/2403.06789) _arXiv:2403.06789_.
+
+In these experiments, we are using cached queries (i.e., cached results of query encoding).
+
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2.1-doc-segmented.splade-v3.cached.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2.1-doc-segmented.splade-v3.cached.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression msmarco-v2.1-doc-segmented.splade-v3.cached
+```
+
+## Indexing
+
+Typical indexing command:
+
+```
+bin/run.sh io.anserini.index.IndexCollection \
+  -threads 24 \
+  -collection JsonVectorCollection \
+  -input /path/to/msmarco-v2.1-doc-segmented-splade-v3 \
+  -generator DefaultLuceneDocumentGenerator \
+  -index indexes/lucene-inverted.msmarco-v2.1-doc-segmented.splade-v3/ \
+  -impact -pretokenized \
+  >& logs/log.msmarco-v2.1-doc-segmented-splade-v3 &
+```
+
+The `-input` option should point to a directory containing the compressed `jsonl` files that comprise the corpus.
+
+For additional details, see the explanation of [common indexing options](../../docs/common-indexing-options.md).
+
+## Retrieval
+
+Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
+These evaluation resources are from the original V2 corpus, but have been "projected" over to the V2.1 corpus.
+
+After indexing has completed, you should be able to perform retrieval as follows:
+
+```
+bin/run.sh io.anserini.search.SearchCollection \
+  -index indexes/lucene-inverted.msmarco-v2.1-doc-segmented.splade-v3/ \
+  -topics tools/topics-and-qrels/topics.msmarco-v2-doc.dev.tsv.gz \
+  -topicReader TsvString \
+  -output runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.msmarco-v2-doc.dev.txt \
+  -impact -pretokenized -removeQuery -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 &
+bin/run.sh io.anserini.search.SearchCollection \
+  -index indexes/lucene-inverted.msmarco-v2.1-doc-segmented.splade-v3/ \
+  -topics tools/topics-and-qrels/topics.msmarco-v2-doc.dev2.tsv.gz \
+  -topicReader TsvString \
+  -output runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.msmarco-v2-doc.dev2.txt \
+  -impact -pretokenized -removeQuery -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 &
+```
+
+Evaluation can be performed using `trec_eval`:
+
+```
+bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.msmarco-v2-doc.dev.txt
+bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.msmarco-v2-doc.dev.txt
+bin/trec_eval -c -M 100 -m map -c -M 100 -m recip_rank tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.msmarco-v2-doc.dev.txt
+bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev2.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.msmarco-v2-doc.dev2.txt
+bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev2.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.msmarco-v2-doc.dev2.txt
+bin/trec_eval -c -M 100 -m map -c -M 100 -m recip_rank tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev2.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.msmarco-v2-doc.dev2.txt
+```
+
+## Effectiveness
+
+With the above commands, you should be able to reproduce the following results:
+
+| **MAP@100**                                                                                                  | **SPLADE-v3**|
+|:-------------------------------------------------------------------------------------------------------------|-----------|
+| [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                          | 0.2846    |
+| [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                         | 0.2836    |
+| **MRR@100**                                                                                                  | **SPLADE-v3**|
+| [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                          | 0.2874    |
+| [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                         | 0.2869    |
+| **R@100**                                                                                                    | **SPLADE-v3**|
+| [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                          | 0.8446    |
+| [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                         | 0.8462    |
+| **R@1000**                                                                                                   | **SPLADE-v3**|
+| [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                          | 0.9390    |
+| [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                         | 0.9407    |
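The retrieval commands in this page rank segments but report documents: they pass `-hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000`. The following is a minimal sketch of that kind of max-passage aggregation, not Anserini's implementation. It assumes that segment ids have the form `<docid>#<suffix>` and that each document keeps the score of its best segment; the function name `select_max_passage` and the sample ids are hypothetical.

```python
def select_max_passage(hits, delimiter="#", max_docs=1000):
    """Collapse segment-level hits into document-level hits.

    hits: iterable of (segment_id, score) pairs.
    Returns a list of (doc_id, best_score), highest score first,
    truncated to max_docs entries.
    """
    best = {}
    for seg_id, score in hits:
        # Assumption: everything before the first delimiter is the document id.
        doc_id = seg_id.split(delimiter, 1)[0]
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    # sorted() is stable, so tied documents keep first-seen order.
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:max_docs]


if __name__ == "__main__":
    sample = [("d1#0", 9.0), ("d2#3", 8.5), ("d1#4", 7.0), ("d3#1", 6.0)]
    for doc_id, score in select_max_passage(sample):
        print(doc_id, score)
```

Under these assumptions, retrieving more segments (10000) than the number of documents reported (1000) leaves room for several segments of the same document to collapse into one entry.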
diff --git a/docs/regressions/regressions-msmarco-v2.1-doc-segmented.splade-v3.onnx.md b/docs/regressions/regressions-msmarco-v2.1-doc-segmented.splade-v3.onnx.md
@@ -0,0 +1,92 @@
+# Anserini Regressions: MS MARCO V2.1 Document Ranking
+
+**Model**: [SPLADE-v3](https://arxiv.org/abs/2403.06789) (using ONNX for on-the-fly query encoding)
+
+This page describes regression experiments for document ranking _on the segmented version_ of the MS MARCO V2.1 document corpus using the dev queries; these experiments are integrated into Anserini's regression testing framework.
+This corpus was derived from the MS MARCO V2 _segmented_ document corpus and prepared for the TREC 2024 RAG Track.
+
+The model itself can be downloaded [here](https://huggingface.co/naver/splade-v3).
+See the [official SPLADE repo](https://github.com/naver/splade) and the following paper for more details:
+
+> Carlos Lassance, Hervé Déjean, Thibault Formal, and Stéphane Clinchant. [SPLADE-v3: New baselines for SPLADE.](https://arxiv.org/abs/2403.06789) _arXiv:2403.06789_.
+
+In these experiments, we are using ONNX to perform query encoding on the fly.
+
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/msmarco-v2.1-doc-segmented.splade-v3.onnx.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/msmarco-v2.1-doc-segmented.splade-v3.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
+
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression msmarco-v2.1-doc-segmented.splade-v3.onnx
+```
+
+## Indexing
+
+Typical indexing command:
+
+```
+bin/run.sh io.anserini.index.IndexCollection \
+  -threads 24 \
+  -collection JsonVectorCollection \
+  -input /path/to/msmarco-v2.1-doc-segmented-splade-v3 \
+  -generator DefaultLuceneDocumentGenerator \
+  -index indexes/lucene-inverted.msmarco-v2.1-doc-segmented.splade-v3/ \
+  -impact -pretokenized \
+  >& logs/log.msmarco-v2.1-doc-segmented-splade-v3 &
+```
+
+The `-input` option should point to a directory containing the compressed `jsonl` files that comprise the corpus.
+
+For additional details, see the explanation of [common indexing options](../../docs/common-indexing-options.md).
+
+## Retrieval
+
+Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
+These evaluation resources are from the original V2 corpus, but have been "projected" over to the V2.1 corpus.
+
+After indexing has completed, you should be able to perform retrieval as follows:
+
+```
+bin/run.sh io.anserini.search.SearchCollection \
+  -index indexes/lucene-inverted.msmarco-v2.1-doc-segmented.splade-v3/ \
+  -topics tools/topics-and-qrels/topics.msmarco-v2-doc.dev.txt \
+  -topicReader TsvString \
+  -output runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-onnx.topics.msmarco-v2-doc.dev.txt \
+  -impact -pretokenized -removeQuery -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 -encoder SpladeV3 &
+bin/run.sh io.anserini.search.SearchCollection \
+  -index indexes/lucene-inverted.msmarco-v2.1-doc-segmented.splade-v3/ \
+  -topics tools/topics-and-qrels/topics.msmarco-v2-doc.dev2.txt \
+  -topicReader TsvString \
+  -output runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-onnx.topics.msmarco-v2-doc.dev2.txt \
+  -impact -pretokenized -removeQuery -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000 -encoder SpladeV3 &
+```
+
+Evaluation can be performed using `trec_eval`:
+
+```
+bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-onnx.topics.msmarco-v2-doc.dev.txt
+bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-onnx.topics.msmarco-v2-doc.dev.txt
+bin/trec_eval -c -M 100 -m map -c -M 100 -m recip_rank tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-onnx.topics.msmarco-v2-doc.dev.txt
+bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev2.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-onnx.topics.msmarco-v2-doc.dev2.txt
+bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev2.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-onnx.topics.msmarco-v2-doc.dev2.txt
+bin/trec_eval -c -M 100 -m map -c -M 100 -m recip_rank tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev2.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-onnx.topics.msmarco-v2-doc.dev2.txt
+```
+
+## Effectiveness
+
+With the above commands, you should be able to reproduce the following results:
+
+| **MAP@100**                                                                                                  | **SPLADE-v3**|
+|:-------------------------------------------------------------------------------------------------------------|-----------|
+| [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                          | 0.2846    |
+| [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                         | 0.2836    |
+| **MRR@100**                                                                                                  | **SPLADE-v3**|
+| [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                          | 0.2874    |
+| [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                         | 0.2869    |
+| **R@100**                                                                                                    | **SPLADE-v3**|
+| [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                          | 0.8446    |
+| [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                         | 0.8462    |
+| **R@1000**                                                                                                   | **SPLADE-v3**|
+| [MS MARCO V2 Doc: Dev](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                          | 0.9390    |
+| [MS MARCO V2 Doc: Dev2](https://microsoft.github.io/msmarco/TREC-Deep-Learning.html)                         | 0.9407    |
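The effectiveness tables above report MRR@100 and recall at fixed cutoffs. As a rough illustration of what those per-topic numbers mean, here is a minimal sketch; it is not `trec_eval`'s code. It assumes binary relevance, a ranking without duplicate document ids, and hypothetical helper names `reciprocal_rank` and `recall_at_k`. Averaging over topics (which `-c` extends to every topic in the qrels) is left out.

```python
def reciprocal_rank(ranked_doc_ids, relevant, cutoff=100):
    """1 / rank of the first relevant document within the cutoff, else 0."""
    for rank, doc_id in enumerate(ranked_doc_ids[:cutoff], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_doc_ids, relevant, k):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    found = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant)
    return found / len(relevant)


if __name__ == "__main__":
    ranking = ["a", "b", "c"]
    print(reciprocal_rank(ranking, {"b"}))
    print(recall_at_k(ranking, {"b", "z"}, k=2))
```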
diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.splade-v3.cached.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.splade-v3.cached.md
@@ -0,0 +1,82 @@
+# Anserini Regressions: TREC 2024 RAG Track Test Topics
+
+**Model**: [SPLADE-v3](https://arxiv.org/abs/2403.06789) (using cached queries)
+
+This page describes regression experiments for ranking _on the segmented version_ of the MS MARCO V2.1 document corpus using the test topics (= queries in TREC parlance); these experiments are integrated into Anserini's regression testing framework.
+This corpus was derived from the MS MARCO V2 _segmented_ document corpus and prepared for the TREC 2024 RAG Track.
+
+The model itself can be downloaded [here](https://huggingface.co/naver/splade-v3).
+See the [official SPLADE repo](https://github.com/naver/splade) and the following paper for more details:
+
+> Carlos Lassance, Hervé Déjean, Thibault Formal, and Stéphane Clinchant. [SPLADE-v3: New baselines for SPLADE.](https://arxiv.org/abs/2403.06789) _arXiv:2403.06789_.
+
+In these experiments, we are using cached queries (i.e., cached results of query encoding).
+
+Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set.
+These qrels represent manual relevance judgments from NIST assessors, as opposed to automatically generated UMBRELA judgments.
+See the following paper for more details:
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+
+The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.splade-v3.cached.yaml).
+Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.splade-v3.cached.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.
+
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression rag24-doc-segmented-test-nist.splade-v3.cached
+```
+
+## Indexing
+
+Sample indexing command:
+
+```
+bin/run.sh io.anserini.index.IndexCollection \
+  -threads 24 \
+  -collection JsonVectorCollection \
+  -input /path/to/msmarco-v2.1-doc-segmented-splade-v3 \
+  -generator DefaultLuceneDocumentGenerator \
+  -index indexes/lucene-inverted.msmarco-v2.1-doc-segmented.splade-v3/ \
+  -impact -pretokenized \
+  >& logs/log.msmarco-v2.1-doc-segmented-splade-v3 &
+```
+
+The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens.
+For additional details, see the explanation of [common indexing options](../../docs/common-indexing-options.md).
+
+## Retrieval
+
+Here, we are using 89 test topics from the TREC 2024 RAG Track with manual relevance judgments from NIST assessors.
+Topics and qrels are stored [here](https://github.com/castorini/anserini-tools/tree/master/topics-and-qrels), which is linked to the Anserini repo as a submodule.
+
+After indexing has completed, you should be able to perform retrieval as follows:
+
+```
+bin/run.sh io.anserini.search.SearchCollection \
+  -index indexes/lucene-inverted.msmarco-v2.1-doc-segmented.splade-v3/ \
+  -topics tools/topics-and-qrels/topics.rag24.test.splade-v3.tsv.gz \
+  -topicReader TsvString \
+  -output runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.rag24.test.splade-v3.txt \
+  -impact -pretokenized -removeQuery -hits 1000 &
+```
+
+Evaluation can be performed using `trec_eval`:
+
+```
+bin/trec_eval -c -m ndcg_cut.20 tools/topics-and-qrels/qrels.rag24.test.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.rag24.test.splade-v3.txt
+bin/trec_eval -c -m ndcg_cut.100 tools/topics-and-qrels/qrels.rag24.test.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.rag24.test.splade-v3.txt
+bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.rag24.test.txt runs/run.msmarco-v2.1-doc-segmented-splade-v3.splade-v3-cached.topics.rag24.test.splade-v3.txt
+```
+
+## Effectiveness
+
+With the above commands, you should be able to reproduce the following results:
+
+| **nDCG@20**                                                                                                  | **SPLADE-v3**|
+|:-------------------------------------------------------------------------------------------------------------|-----------|
+| RAG 24: Test queries                                                                                         | 0.4642    |
+| **nDCG@100**                                                                                                 | **SPLADE-v3**|
+| RAG 24: Test queries                                                                                         | 0.4349    |
+| **R@100**                                                                                                    | **SPLADE-v3**|
+| RAG 24: Test queries                                                                                         | 0.3198    |
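The RAG 24 table above reports nDCG@20 and nDCG@100 over graded NIST judgments. The sketch below shows one common formulation of nDCG for a single topic. It assumes linear gain (the relevance grade itself) and a log2 rank discount, and it treats unjudged or negatively judged documents as gain 0. It is an illustration, not `trec_eval`'s implementation, and the names `dcg` and `ndcg_at_k` are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_doc_ids, qrels, k):
    """nDCG@k for one topic; qrels maps doc_id -> integer relevance grade."""
    gains = [max(qrels.get(doc_id, 0), 0) for doc_id in ranked_doc_ids[:k]]
    # Ideal ranking: all positively judged documents, best grade first.
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    judgments = {"a": 3, "b": 1}
    print(ndcg_at_k(["b", "a"], judgments, k=20))
```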