Add ability to serialize to alternate formats #19
Comments
Sounds like a great feature. Totally agree it would be very useful.
OK. I'll start picking away. Based on my first read-through, it seems this can be done in a separate "contrib" library, to keep the extensions separate from the main lunr code. I could be wrong when it comes to invoking the pipeline at query time, but I can at least start with an alternate serialization that's disk-based and go from there. Most likely an alternate search extension method will be required to do the deferred lookups.
@bleroy sorry for the long delay on this -- I've had some time to look through the codebase, and it seems to me that the most tractable approach for alternate storage would be allowing the user to swap the currently fixed inproc This way, logically anything that implements This puts the burden of an efficient alternative on the implementer of the dictionary interface, but it seems tractable (for example, implementing a fast-read LMDB dictionary is fairly trivial). From here I think it is just implementing similar shims for fields, vectors, and tokens. Didn't want to get too far without you having a think.
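To make the dictionary-swap idea concrete, here is a minimal sketch in Python rather than the project's own language. It assumes the inverted index can be treated as a read-only mapping from term to posting data. The class name `SqliteInvertedIndex`, the `put` helper, and the posting layout are all hypothetical, and the standard-library `sqlite3` module stands in for LMDB only so that the example runs without extra dependencies.

```python
import json
import sqlite3
from collections.abc import Mapping


class SqliteInvertedIndex(Mapping):
    """Read-only term -> posting mapping that fetches each entry on demand.

    Hypothetical stand-in for an LMDB-backed dictionary: nothing is held in
    process memory except the rows a caller actually asks for.
    """

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS postings (term TEXT PRIMARY KEY, body TEXT)"
        )

    def put(self, term, posting):
        # Write path used while building the index; not part of Mapping.
        self._db.execute(
            "INSERT OR REPLACE INTO postings VALUES (?, ?)",
            (term, json.dumps(posting)),
        )
        self._db.commit()

    def __getitem__(self, term):
        # Deferred lookup: one keyed read per requested term.
        row = self._db.execute(
            "SELECT body FROM postings WHERE term = ?", (term,)
        ).fetchone()
        if row is None:
            raise KeyError(term)
        return json.loads(row[0])

    def __iter__(self):
        for (term,) in self._db.execute("SELECT term FROM postings ORDER BY term"):
            yield term

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM postings").fetchone()[0]


index = SqliteInvertedIndex()
index.put("lmdb", {"title": ["doc1"], "body": ["doc1", "doc3"]})
index.put("vector", {"body": ["doc2"]})
```

Because the class only implements the standard mapping protocol, any code written against a plain dictionary (`index["lmdb"]`, `"vector" in index`, `index.get(...)`) should work unchanged, which is the property the comment above relies on. Shims for fields, vectors, and tokens would presumably follow the same pattern.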
That sounds good. I’ll take a look at your draft but your description sounds fine. Thanks!
Is your feature request related to a problem? Please describe.
I'd like to offload larger indexes so that they don't have to be fully present in memory via the Index type. For my use case, for example, I want to store the reverse indexes, pipelines, vectors, etc., in LMDB, so that I only need to pull the relevant indexes for a range of data records.
Describe the solution you'd like
I want the ability to "plug in" an alternate serialization interface that can defer lookups rather than require indices readily available from in-process memory.
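As a rough illustration of what "deferring lookups" could mean on the query side, the Python sketch below runs a search against anything that behaves like a mapping and only touches the terms the query mentions. `LazyPostings`, its `load` callable, and the `search` function are hypothetical names invented for this example, and the simple AND matching is not lunr's actual scoring.

```python
from collections.abc import Mapping


class LazyPostings(Mapping):
    """Wraps a loader callable and records which terms were actually fetched."""

    def __init__(self, terms, load):
        self._terms = set(terms)
        self._load = load  # hypothetical: e.g. a keyed read from LMDB or disk
        self.fetched = []

    def __getitem__(self, term):
        if term not in self._terms:
            raise KeyError(term)
        self.fetched.append(term)
        return self._load(term)

    def __iter__(self):
        return iter(sorted(self._terms))

    def __len__(self):
        return len(self._terms)


def search(postings, query):
    """Return ids of documents containing every query term (simple AND)."""
    result = None
    for term in query.lower().split():
        # Only terms that appear in the query are ever looked up.
        docs = set(postings.get(term, ()))
        result = docs if result is None else result & docs
    return sorted(result or [])


# A plain dict plays the role of the external store in this example.
on_disk = {"lmdb": ["d1", "d2"], "index": ["d2", "d3"], "vector": ["d4"]}
lazy = LazyPostings(on_disk.keys(), on_disk.__getitem__)
hits = search(lazy, "LMDB index")
```

The same `search` function accepts either the in-memory dict or the lazy wrapper, and `lazy.fetched` shows that the "vector" entry was never loaded for this query.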
Additional context