8000 GitHub - fmaglia/BoI: Multi-index hashing for the resolution of ANN search problem on large datasets
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fmaglia/BoI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 

Repository files navigation

Bag of Indexes (BoI)

[paper] [project] [poster]

Bag of Indexes (BoI) is a multi-index hashing C++ library, used for solving Approximate Nearest Neighbors (ANN) search problems. In the belove figure, an example of BoI's working is presented.

BoI_overview

LSH and multi-probe LSH are the only, at the moment, projection functions implemented.

Datasets

The original dataset files are converted in binary through the application of a simple C++ script:

After downloading the dat files you need to create a folder called dataset and then put in the uncompressed version.

Installation

  • Requirements:
    • G++ 5.4 or greater
    • openmp
  • Build: g++ -o BoI BoI.cpp -lstdc++fs -std=c++14 -fopenmp -O2

Test

./BoI SIFT1M 16 25 500 true ..

In the previous command the values of the parameters are:

  • δ = 16;
  • L = 25;
  • topN = 500;
  • fast re-ranking = true (loading descriptors from RAM).

Results

SIFT1M

The following table represents the results obtained through the application of BoI adaptive multi-probe LSH, trying to change some parameters:

  • L: number of hash tables;
  • δ: number of buckets per hash table (hash dimension);
  • topN: number of top elements to re-rank based on original distance (usually Euclidean distance).

In every test, the neighborhood is setted to 3.

Configuration 1 10 100 avg retrieval time
δ = 16, L = 25, top500 90.50% 91.57% 91.57% 6 msec
δ = 15, L = 50, top500 98.36% 99.19% 99.19% 18 msec
δ = 15, L = 50, top10k 99.20% 100% 100% 18 msec

The reduced average retrieval time is obtained through the multi-threading and the cut-off (an unsuperivesd reduction of the BoI structure). For more info see the pape 5C77 r cited in the Reference section. The reason behind the same retrieval time changing the top elements to re-rank is due to the speed of loading files from RAM memory.

GIST1M

Configuration 1 10 100 avg retrieval time
δ = 16, L = 100, top500 79.80% 80.80% 80.80% 60 msec
δ = 16, L = 100, top10k 97.40% 98.50% 98.50% 100 msec

All the experiments are evaluated following the Recall@R, that is the average rate of queries for which the 1-nearest neighbor is ranked in the top R positions.

Reference

@article{magliani2018efficient,
  title={Efficient Nearest Neighbors Search for Large-Scale Landmark Recognition},
  author={Magliani, Federico and Fontanini, Tomaso and Prati, Andrea},
  journal={arXiv preprint arXiv:1806.05946},
  year={2018}
}

About

Multi-index hashing for the resolution of ANN search problem on large datasets

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

0