Add ability to serialize to alternate formats · Issue #19 · bleroy/lunr-core · GitHub

Add ability to serialize to alternate formats #19

Closed
danielcrenna opened this issue Aug 20, 2020 · 4 comments

@danielcrenna
Contributor

Is your feature request related to a problem? Please describe.
I'd like to offload larger indexes so they don't have to be held in memory via the Index type. For my use case, for example, I want to store the reverse indices, pipelines, vectors, etc., in LMDB, so that I only need to pull the relevant indices for a range of data records.

Describe the solution you'd like
I want the ability to "plug in" an alternate serialization interface that can defer lookups rather than require indices readily available from in-process memory.

Additional context

  • I'm willing to do the work for this, but wanted to file a request first as it might not be a goal you have, as it's orthogonal to lunr.js' feature set.
@bleroy
Owner
bleroy commented Aug 20, 2020

Sounds like a great feature. Totally agree it would be very useful.

@danielcrenna
Contributor Author

> Sounds like a great feature. Totally agree it would be very useful.

OK. I'll start picking away. Based on my first read-through, it seems this can be done in a separate "contrib" library, which would keep the extensions separate from the main lunr code. I could be wrong about that when it comes to invoking the pipeline at query time, but I can at least start with an alternate, disk-based serialization and go from there. Most likely an alternate search extension method will be required to do the deferred lookups.
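
To illustrate what an alternate search extension method with deferred lookups might look like, here is a minimal sketch. It is not code from lunr-core: `SearchDeferred` is a hypothetical name, the postings are simplified to plain lists of document refs, and the pipeline and scoring are left out entirely. The only point is that the query reads just the entries for its own terms from whatever store sits behind the dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSearchExtensions
{
    // Hypothetical extension method: reads only the query's terms from the
    // store, then ranks document refs by how many query terms they contain.
    public static IReadOnlyList<string> SearchDeferred(
        this IDictionary<string, List<string>> postings,
        params string[] terms)
    {
        var hits = new Dictionary<string, int>();
        foreach (string term in terms.Distinct())
        {
            // One store read per distinct query term; unknown terms are skipped.
            if (!postings.TryGetValue(term, out var refs)) continue;

            foreach (string docRef in refs.Distinct())
            {
                hits.TryGetValue(docRef, out int count); // count stays 0 if absent
                hits[docRef] = count + 1;
            }
        }

        return hits.OrderByDescending(h => h.Value)
                   .ThenBy(h => h.Key, StringComparer.Ordinal)
                   .Select(h => h.Key)
                   .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // An in-memory dictionary stands in for a disk-based store here.
        IDictionary<string, List<string>> postings = new Dictionary<string, List<string>>
        {
            ["disk"] = new List<string> { "doc1", "doc2" },
            ["index"] = new List<string> { "doc2" },
        };

        Console.WriteLine(string.Join(", ", postings.SearchDeferred("disk", "index")));
        // prints: doc2, doc1
    }
}
```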

@danielcrenna
Contributor Author
danielcrenna commented Oct 8, 2020

@bleroy sorry for the long delay on this. I've had some time to look through the codebase, and it seems to me that the most tractable approach for alternate storage would be to let the user swap the currently fixed in-process InvertedIndex (which at the moment is just a converter-holding surrogate for serializing a dictionary) for an interface equivalent, and pass a custom one in the Builder ctor.

This way, logically anything that implements IDictionary<string, InvertedIndexEntry> could be plugged in there. I started an early draft of this here: danielcrenna@c7e5d8d for book-keeping and thoughts.
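
As a rough illustration of the idea (not the linked draft, and not lunr-core's actual API), the sketch below shows a builder that takes any `IDictionary<string, ...>` in its constructor and falls back to an in-memory `Dictionary` otherwise. `SketchBuilder` and `SketchEntry` are hypothetical stand-ins for the real `Builder` and `InvertedIndexEntry`, which carry more data than shown here.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for lunr-core's InvertedIndexEntry (assumption: the
// real type holds richer per-field match data than a list of refs).
public sealed class SketchEntry
{
    public int Index { get; set; }

    // Field name -> refs of the documents whose field contains the term.
    public Dictionary<string, List<string>> Fields { get; } =
        new Dictionary<string, List<string>>();
}

// Hypothetical builder that only depends on the dictionary interface.
public sealed class SketchBuilder
{
    private readonly IDictionary<string, SketchEntry> _invertedIndex;

    // Default: the usual in-process dictionary.
    public SketchBuilder() : this(new Dictionary<string, SketchEntry>()) { }

    // Alternate storage: the caller supplies the implementation.
    public SketchBuilder(IDictionary<string, SketchEntry> invertedIndex)
    {
        _invertedIndex = invertedIndex ?? throw new ArgumentNullException(nameof(invertedIndex));
    }

    public void Add(string docRef, string field, IEnumerable<string> terms)
    {
        foreach (string term in terms)
        {
            if (!_invertedIndex.TryGetValue(term, out var entry))
            {
                entry = new SketchEntry { Index = _invertedIndex.Count };
                _invertedIndex[term] = entry;
            }

            if (!entry.Fields.TryGetValue(field, out var refs))
            {
                refs = new List<string>();
                entry.Fields[field] = refs;
            }

            refs.Add(docRef);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Any IDictionary works; a SortedDictionary is used to show the swap.
        var store = new SortedDictionary<string, SketchEntry>(StringComparer.Ordinal);
        var builder = new SketchBuilder(store);

        builder.Add("doc1", "title", new[] { "alternate", "format" });
        builder.Add("doc2", "title", new[] { "format" });

        Console.WriteLine(store["format"].Fields["title"].Count); // prints 2
    }
}
```

One caveat with this shape: the builder mutates entries after fetching them, so a store that hands out deserialized copies would need the entry written back after each change.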

This puts the burden of an efficient alternative on the implementer of the dictionary interface, but it seems tractable (for example, implementing a fast-read LMDB dictionary is fairly trivial).
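
To make the "burden on the implementer" concrete, here is a minimal sketch of a read-only `IDictionary<string, TValue>` whose values are fetched on demand. It makes no assumptions about any LMDB binding: the store is represented by a plain `Func<string, TValue>` delegate, the key set is assumed to be small enough to keep in memory, and `DeferredDictionary` is a hypothetical name. Writes are out of scope and throw `NotSupportedException`. The sketch assumes C# 8 or later.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Read-only dictionary that keeps only the keys in memory and loads each
// value through a delegate (which a real implementation would back by LMDB
// or another store).
public sealed class DeferredDictionary<TValue> : IDictionary<string, TValue>
{
    private readonly HashSet<string> _keys;
    private readonly Func<string, TValue> _load;

    public DeferredDictionary(IEnumerable<string> keys, Func<string, TValue> load)
    {
        _keys = new HashSet<string>(keys, StringComparer.Ordinal);
        _load = load ?? throw new ArgumentNullException(nameof(load));
    }

    public int Count => _keys.Count;
    public bool IsReadOnly => true;
    public ICollection<string> Keys => _keys.ToList();
    public ICollection<TValue> Values => _keys.Select(_load).ToList(); // loads everything

    public TValue this[string key]
    {
        get => _keys.Contains(key) ? _load(key) : throw new KeyNotFoundException(key);
        set => throw new NotSupportedException();
    }

    public bool ContainsKey(string key) => _keys.Contains(key); // no store access

    public bool TryGetValue(string key, out TValue value)
    {
        if (_keys.Contains(key))
        {
            value = _load(key); // the deferred lookup happens here
            return true;
        }
        value = default!;
        return false;
    }

    public bool Contains(KeyValuePair<string, TValue> item) =>
        TryGetValue(item.Key, out var found) &&
        EqualityComparer<TValue>.Default.Equals(found, item.Value);

    public void CopyTo(KeyValuePair<string, TValue>[] array, int arrayIndex)
    {
        foreach (var pair in this) array[arrayIndex++] = pair;
    }

    public IEnumerator<KeyValuePair<string, TValue>> GetEnumerator() =>
        _keys.Select(k => new KeyValuePair<string, TValue>(k, _load(k))).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Mutation is not supported in this read-only sketch.
    public void Add(string key, TValue value) => throw new NotSupportedException();
    public void Add(KeyValuePair<string, TValue> item) => throw new NotSupportedException();
    public bool Remove(string key) => throw new NotSupportedException();
    public bool Remove(KeyValuePair<string, TValue> item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
}

public static class Program
{
    public static void Main()
    {
        int loads = 0;
        var backing = new Dictionary<string, string>
        {
            ["lunr"] = "postings for lunr",
            ["index"] = "postings for index",
        };
        var index = new DeferredDictionary<string>(
            backing.Keys, key => { loads++; return backing[key]; });

        Console.WriteLine(index.ContainsKey("lunr")); // prints True
        Console.WriteLine(loads);                     // prints 0
        Console.WriteLine(index["index"]);            // prints postings for index
        Console.WriteLine(loads);                     // prints 1
    }
}
```

A real store would likely add caching and a lazily enumerated key set; both are left out to keep the sketch short.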

From here I think it is just implementing similar shims for fields, vectors, and tokens.

Didn't want to get too far without you having a think.

@bleroy
Owner
bleroy commented Oct 8, 2020

That sounds good. I’ll take a look at your draft but your description sounds fine. Thanks!
