8000 Release Lean version · jourlin/FELTS · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Lean version

Pre-release
Pre-release
Compare
Choose a tag to compare
@jourlin jourlin released this 03 Dec 16:45
· 87 commits to master since this release

This version is entirely based on a perfect minimal Jenkin's hash function for terms.
It was tested on a dictionary of wikipedia titles for french + english + spanish, i.e. over 9.5 millions of distinct terms composed of over 4.5 millions of words. When asked to extract terms on the dictionnary itself (9.5 millions of distinct terms), it requires 500 Mb of RAM and can process 9.5 millions of terms in less than 30 minutes on a single core of a Intel® Core™ i7-2670QM CPU @ 2.20GHz × 8.

0