Open
Description
I was looking at your code and attempting to recreate your results.
If this is how the results quoted in the paper were obtained, it seems a bit strange that you are fine-tuning your threshold on the test set. That is notwithstanding the fact that the threshold is tuned per dataset in the benchmark (which is mentioned in the paper).
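To make the concern concrete, here is a minimal sketch of the alternative: pick the threshold on a held-out validation split, then report the metric on the test split once. Everything in it is an assumption for illustration and not taken from the repository: the function names `f1_at_threshold` and `pick_threshold`, the choice of F1 as the metric, the candidate thresholds, and the toy scores and labels.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when every score >= threshold is predicted positive (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pick_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F1 on the given split."""
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Hypothetical toy data; a real run would use the model's scores per split.
val_scores = [0.1, 0.4, 0.35, 0.8]
val_labels = [0, 0, 1, 1]
test_scores = [0.2, 0.3, 0.6, 0.9]
test_labels = [0, 1, 1, 1]

candidates = [0.25, 0.5, 0.75]

# Tune on the validation split only ...
threshold = pick_threshold(val_scores, val_labels, candidates)
# ... then evaluate on the test split with that threshold held fixed.
test_f1 = f1_at_threshold(test_scores, test_labels, threshold)
```

Calling `pick_threshold(test_scores, test_labels, candidates)` instead would select the threshold using the test labels, so the reported number would no longer be an unbiased estimate of held-out performance.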