-
-
Notifications
You must be signed in to change notification settings - Fork 25
Proposal: DelegatedIndex #26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Oooh, cool, thanks for the contrib. Will review ASAP. |
There's some hiccup in the tests I'll investigate if this is deemed a viable pathway -- looks like a linux LMDB issue. |
It looks like they only provide a runtime for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty cool, thanks for the awesome contribution. My biggest feedback is on the delegate vs. interface thing.
LunrCore.sln.DotSettings
Outdated
@@ -0,0 +1,6 @@ | |||
<wpf:ResourceDictionary xml:space="preserve" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:s="clr-namespace:System;assembly=mscorlib" xmlns:ss="urn:shemas-jetbrains-com:settings-storage-xaml" xmlns:wpf="http://schemas.microsoft.com/winfx/2006/xaml/presentation"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also exclude the Resharper dotsettings files.
LunrCoreLmdb/DelegatedIndex.cs
Outdated
foreach (string expandedTerm in expandedTerms) | ||
{ | ||
// For each term get the posting and termIndex, this is required for building the query vector. | ||
InvertedIndexEntry posting = GetInvertedIndexEntryByKey(expandedTerm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've got a bunch of possible null value warnings here. I know there are some in the original code as well, that I'll take care of, but it would be great to fix those as well before I merge
{ | ||
unsafe | ||
{ | ||
fixed(byte* buf = &buffer.GetPinnableReference()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious how much better this is than using Span<T>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to answer this via: 4482cfd
Note that the benchmark implementation is kind of wonky because I couldn't find an API that allowed me to construct a Stream
from a Span<byte>
, so I had to call the primitive methods directly.
It does look like Span is the clear winner, but I'm not confident in my implementation since it requires manually slicing the buffer.
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1256 (1909/November2018Update/19H2)
Intel Core i9-9820X CPU 3.30GHz, 1 CPU, 20 logical and 10 physical cores
.NET Core SDK=5.0.101
[Host] : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT
DefaultJob : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT
Method | Mean | Error | StdDev |
---|---|---|---|
GetPinnableReference | 583.48 ns | 2.811 ns | 2.629 ns |
Span | 81.90 ns | 0.356 ns | 0.297 ns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's pretty spectacular. Your implementation looks fine to me, but piece of mind should just be a matter of writing enough tests ;)
@bleroy I believe everything is in place and ready for your review.
Here are the current search performance comparisons between Index BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1256 (1909/November2018Update/19H2)
Intel Core i9-9820X CPU 3.30GHz, 1 CPU, 20 logical and 10 physical cores
.NET Core SDK=5.0.101
[Host] : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT
DefaultJob : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT
LmdbIndex BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1256 (1909/November2018Update/19H2)
Intel Core i9-9820X CPU 3.30GHz, 1 CPU, 20 logical and 10 physical cores
.NET Core SDK=5.0.101
[Host] : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT
DefaultJob : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT
|
@bleroy review fixes are in. You can assign any issues that come up for LMDB to me. I will also issue a follow-up PR with markdown docs for LMDB usage once merged, and once I use it in anger for my use case. I'm still thinking through how to unlock the feature of adding new documents to the index after an index has been built, without having the user need to do it themselves via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing we need to decide: packaging. I think we should put LMDB support in a separate package that has a dependency on the core package, if only because its other dependencies are a little more complex and I don't want to impose additional constraints to use in-memory features. It's your call whether you want to publish the new package yourself on NuGet or if you want me to continue managing the publication of both. I'm fine with either.
Thanks for the hard work and great features you put into this PR! |
Thanks for the existence of this project and for being responsive over a holiday! I'm looking forward to using it to provide full text search for reference data without the need to manage an ES cluster, will come in handy in all kinds of scenarios. |
Finally, here is a starter proof of concept for a delegated index.
This is in support of #13 and #19. So that indexes can be saved to disk, added to incrementally, et. al.
The reference implementation shows an LMDB-backed (memory mapped files on disk)
DelegatedIndex
that defers the five core functions of the in-memory index, so that any read-only index implementation could be plugged in.Then, the mutable indexing features are added via regular methods.
The performance test indicates the delegated index variant has negligible performance impacts (for some reason it's faster when I run it locally, but it's within the tolerance for the original tests).
It's done as additive-only to not touch any of the original code.
Before I go too far, I wanted to get your feedback on this approach.