8000 Adds fusion feature for Anserini by Stefan824 · Pull Request #2590 · castorini/anserini · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Adds fusion feature for Anserini #2590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Sep 8, 2024
Merged

Conversation

Stefan824
Copy link
Contributor

The design is following :

  • One TrecRun class that handles TrecRun-related methods including file-read, rescore, clone...
  • One main class TrecRunFuser that calls methods from TrecRun and actually does the fusion
  • One entry point class FuseTrecRuns that is in charge of only arguments and dispatches

Did a number of tests on codes but not thorough enough. Will test more if the design is accepted.

Should close #2355 and the results align with pyserini.

pom.xml Outdated
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

algin indentation with above

Copy link
Member
@lintool lintool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a need for a new package src/main/java/io/anserini/trectools/ - just dump everything into io.anserini.fusion for now?


// Constructor with reSort parameter
public TrecRun(Path filepath, Boolean reSort) throws IOException {
this.resetData();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consistent two space indentation?

}

private void resetData() {
runData = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -0,0 +1,3 @@
1 Q0 pyeb86on 1 24 reciprocal_rank_fusion_k=60
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put all these files under a fusion/ directory please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test files were actually from Daniel early on; I didn't include my test files on the commit. Of course, I could move these to a fusion/ directory or just delete them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these files are actually used, then please move. If they're not, then remove.

@@ -0,0 +1,76 @@
package io.anserini.trectools;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add boilerplate to here also?

@lintool
Copy link
Member
lintool commented Sep 7, 2024

Oh, I see, you're modeling the structure after pyserini... yea, let's do a from-ground-up redesign.

Copy link
Member
@lintool lintool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please look more carefully at the test cases?


@Before
public void setUp() throws IOException {
sampleFilePath = Paths.get("runs/testlong/run.neuclir22-zh-en-splade.splade.topics.neuclir22-en.splade.original-desc_title.txt");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check? This file doesn't exist...?

Please verify that the test cases are actually doing something...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So sorry. The file seemed a bit too big for git. The plan was to bring this up when doing more tests after the code style/structure is accepted.

Copy link
Member
@lintool lintool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good enough for now...

@lintool lintool merged commit 4ddc640 into castorini:master Sep 8, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0