-
Notifications
You must be signed in to change notification settings - Fork 487
Adds run fusion to anserini #2489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails. 8000
Already on GitHub? Sign in to your account
Conversation
…yserini convention
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
package io.anserini.fusion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra blank line
this.runtag = runtag; | ||
} | ||
|
||
private class Document implements Comparable<Document>{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private class Document implements Comparable<Document>{ | |
private class Document implements Comparable<Document> { |
String docid; | ||
double score; | ||
|
||
public Document(String docid, double score) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
House style, brace on previous line
this.score = score; | ||
} | ||
|
||
@Override public int compareTo(Document a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
House style, annotation on separate line.
|
||
} | ||
|
||
public void writeTopic(String qid, HashMap<String,Double> results) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public void writeTopic(String qid, HashMap<String,Double> results) { | |
public void writeTopic(String qid, Map<String,Double> results) { |
No need to specify impl
|
||
public void writeTopic(String qid, HashMap<String,Double> results) { | ||
int rank = 1; | ||
ArrayList<Document> documents = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayList<Document> documents = new ArrayList<>(); | |
List<Document> documents = new ArrayList<>(); |
ditto
} | ||
} | ||
|
||
public class FuseRuns { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Python, it doesn't violate best practice to have multiple classes in a single file.
In Java it's different... best practice is one class per file. If you need extra classes, organize as inner classes.
So above class should be it's own file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And FusedRunOutputWriter
also.
|
||
} | ||
|
||
public static TreeMap<String, HashMap<String,DocScore>> createRunMap(String filename) throws FileNotFoundException, IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, think about when you really need to specify impl?
|
||
TreeMap<String, HashMap<String, Double>> finalRun; | ||
|
||
if (fuseArgs.method.equals(FusionMethods.AVERAGE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm... see high-level comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm... I'm not sure I like the design.
Should there be like an actual RunsFuser
class (I don't like the name but we can quibble about it separately) that performs the actual fusion. Right now, you have FusionMethods
that's dispatched from a main class.
So, I'm thinking - move the logic of fusion into RunsFuser
?
I like the design pattern of a lightweight main class that really just parses args, and dispatches to class that handles the heavy lifting.
E.g., HnswDenseSearcher
is the main searcher class https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/HnswDenseSearcher.java
And SearchHnswDenseVectors.java
does arg parsing and dispatches:
https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/SearchHnswDenseVectors.java
Thoughts? Open to discussion...
(Updated, fixed links)
Working on it now. |
Pinged on Slack, Aug 28th, @DanielKohn1208 doesn't appear to be responsive. Moving on... |
Closed by #2590 |
This adds run fusion to anserini to match pyserini's implementation and closes #2355. To fuse 2 runs we use the following command
Currently the supported methods are
rrf
,interpolation
, andaverage