Describe the bug
I'm trying to match 3M lines of emails against a dataset of 700M lines (roughly 50GB). Everything appears to run smoothly, but after a bunch of tests I can't get a single match returned, even for emails/users that I know for sure are in my dataset.
All the processes are running on an AWS instance (so I followed the server deployment steps). I tried both building from source and using the released version, and I tried splitting my data into smaller files, but still no results. I also tried launching the server version and querying it through HTTP requests.
The really weird thing is that when I run a search on the test folder you provide, it works properly with your provided indexes.
But when I try to regenerate the indexes for small.txt following the docs on the wiki, I don't get any results. When I diff my generated index against the one you provide, they differ, so I'm guessing it has something to do with how the index is generated/sorted.
To Reproduce
Steps to reproduce the behavior:
./leakdb-curator --format colon-newline --recursive --target ./large-folder-containing-all --output normalized.json
./leakdb-curator --json normalized.json
./leakdb-curator search -i leakdb/email.idx -j leakdb/bloomed.json -v "xxx@gmail.com"
Response :
Found 0 results ..
grep -F "xxx@gmail.com" bloomed.json
Response :
{"email": "xxx", "user": "xxx", "domain": "gmail.com", "password": "xxx"}
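Since grep -F only does a substring match on the raw line, the same check can also be done as an exact match on the "email" field. This is a minimal sketch, assuming bloomed.json holds one JSON object per line with an "email" key (as the grep output above suggests); find_email is a hypothetical helper written for this report, not part of LeakDB.

```python
import json


def find_email(path, email):
    """Return every record whose "email" field equals `email` exactly.

    Assumes one JSON object per line; blank or malformed lines are skipped.
    """
    matches = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # not valid JSON, ignore this line
            if isinstance(record, dict) and record.get("email") == email:
                matches.append(record)
    return matches
```

Calling find_email("bloomed.json", "xxx@gmail.com") reads the file line by line, so it should work on large files without loading everything into memory. If it returns records while the search command reports 0 results, that would point at the index rather than the data.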
I really wish I could get this to work because it looks amazing, I'm at your disposal for any questions/tests you want me to run.
Enzyro