Sequential disk writer #475
Signed-off-by: Samuel Herman <sherman8915@gmail.com>
Looks really good! A few relatively minor comments here and there.
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.file.Path;
    import java.util.zip.CRC32;
I believe this import can be removed, since we don't seem to use CRC32.
done
} | ||
} | ||
|
||
// write sparse levels |
Can we move the writing of the sparse levels and the separated features, which is duplicated in OnDiskSequentialGraphIndexWriter and OnDiskGraphIndexWriter, into shared methods in AbstractGraphIndexWriter? The two implementations appear to be identical, and I'd like to avoid code repetition.
done
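To illustrate the refactor suggested above, here is a minimal sketch of moving a duplicated serialization step into one shared base-class method. Everything in it is hypothetical: SparseNode, AbstractWriterSketch, SequentialWriterSketch, and the byte layout (a level count, then for each level a node count followed by each node's id, neighbor count, and neighbor ids) are assumptions for illustration, not the project's actual classes or on-disk format.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical node in an upper (sparse) level: an id plus its neighbor ids.
final class SparseNode {
    final int id;
    final int[] neighbors;

    SparseNode(int id, int[] neighbors) {
        this.id = id;
        this.neighbors = neighbors;
    }
}

// Hypothetical base class: the sparse-level serialization is written once here,
// so every concrete writer reuses it instead of carrying an identical copy.
abstract class AbstractWriterSketch {
    protected final DataOutputStream out;

    protected AbstractWriterSketch(DataOutputStream out) {
        this.out = out;
    }

    // Shared step (assumed layout): level count, then per level the node count
    // followed by each node's id, neighbor count, and neighbor ids.
    protected final void writeSparseLevels(List<List<SparseNode>> levels) {
        try {
            out.writeInt(levels.size());
            for (List<SparseNode> level : levels) {
                out.writeInt(level.size());
                for (SparseNode node : level) {
                    out.writeInt(node.id);
                    out.writeInt(node.neighbors.length);
                    for (int neighbor : node.neighbors) {
                        out.writeInt(neighbor);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Subclasses implement only what actually differs between the writers.
    abstract void writeIndex(List<List<SparseNode>> sparseLevels);
}

// Hypothetical stand-in for one concrete writer; a random-access writer
// would extend the same base class and call the same helper.
final class SequentialWriterSketch extends AbstractWriterSketch {
    SequentialWriterSketch(DataOutputStream out) {
        super(out);
    }

    @Override
    void writeIndex(List<List<SparseNode>> sparseLevels) {
        // Writer-specific sections (header, base layer) would be written here.
        writeSparseLevels(sparseLevels);
    }
}
```

Under these assumptions, OnDiskGraphIndexWriter and OnDiskSequentialGraphIndexWriter would each extend the base class and call the shared helper, so any change to the sparse-level layout is made in exactly one place.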
} | ||
|
||
/** | ||
* Builder for OnDiskGraphIndexWriter, with optional features. |
The comment should probably reference OnDiskSequentialGraphIndexWriter.
done
/** | ||
* Builder for OnDiskGraphIndexWriter, with optional features. | ||
*/ | ||
public static class Builder { |
Should we merge most of this code into an abstract class, perhaps AbstractGraphIndexWriter.Builder? Apart from startOffset in OnDiskGraphIndexWriter.Builder, the rest looks the same, and I'd like to avoid code repetition.
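One common way to share builder code while letting a single subclass add an extra option is a self-typed generic builder. The sketch below is an assumption-laden illustration: the class names, the config types, and the withVersion option are hypothetical; only startOffset comes from the review comment, and none of this is the project's real Builder API.

```java
import java.nio.file.Path;

// Hypothetical shared builder: options common to both writers live here once.
// The self-type parameter B lets inherited setters return the concrete builder,
// so chained calls can still reach subclass-only setters.
abstract class AbstractBuilderSketch<W, B extends AbstractBuilderSketch<W, B>> {
    protected final Path outputPath;
    protected int version = 1; // illustrative default only

    protected AbstractBuilderSketch(Path outputPath) {
        this.outputPath = outputPath;
    }

    public B withVersion(int version) {
        this.version = version;
        return self();
    }

    protected abstract B self();

    public abstract W build();
}

// Stand-in products, just to show what each builder produces.
final class RandomAccessWriterConfig {
    final Path path;
    final int version;
    final long startOffset;

    RandomAccessWriterConfig(Path path, int version, long startOffset) {
        this.path = path;
        this.version = version;
        this.startOffset = startOffset;
    }
}

final class SequentialWriterConfig {
    final Path path;
    final int version;

    SequentialWriterConfig(Path path, int version) {
        this.path = path;
        this.version = version;
    }
}

// Only the random-access builder adds startOffset, mirroring the review comment.
final class RandomAccessBuilderSketch
        extends AbstractBuilderSketch<RandomAccessWriterConfig, RandomAccessBuilderSketch> {
    private long startOffset = 0L;

    RandomAccessBuilderSketch(Path outputPath) {
        super(outputPath);
    }

    public RandomAccessBuilderSketch withStartOffset(long startOffset) {
        this.startOffset = startOffset;
        return this;
    }

    @Override
    protected RandomAccessBuilderSketch self() {
        return this;
    }

    @Override
    public RandomAccessWriterConfig build() {
        return new RandomAccessWriterConfig(outputPath, version, startOffset);
    }
}

// The sequential builder inherits everything and adds nothing.
final class SequentialBuilderSketch
        extends AbstractBuilderSketch<SequentialWriterConfig, SequentialBuilderSketch> {
    SequentialBuilderSketch(Path outputPath) {
        super(outputPath);
    }

    @Override
    protected SequentialBuilderSketch self() {
        return this;
    }

    @Override
    public SequentialWriterConfig build() {
        return new SequentialWriterConfig(outputPath, version);
    }
}
```

The self() indirection is what allows a call such as `withVersion(2).withStartOffset(128L)` to type-check on the random-access builder without overriding every shared setter.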
     * - Base layer max degree
     * - ID upper bound
     * - Number of layers
     * - Layer info (size and degree for each layer)
Should we do something about the name of this variable in CommonHeader?
    private static final int V4_MAX_LAYERS
Thinking about this one a bit more, I'm not sure. We haven't changed anything about max layers, so it might be best to keep the name as is and perhaps add a comment next to it?
Open for suggestions on this one.
Sounds good. We can leave it as is.
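The Javadoc excerpt quoted above lists part of the header layout. As a rough sketch, assuming each field is a big-endian int written in the listed order (any fields that precede the excerpt are omitted) and a hypothetical MAX_LAYERS cap standing in for a constant such as V4_MAX_LAYERS, the layer-dependent portion of such a header could be serialized like this. The class names and the cap value are illustrative, not taken from the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical per-layer descriptor: node count and max degree of one layer.
final class LayerInfoSketch {
    final int size;
    final int degree;

    LayerInfoSketch(int size, int degree) {
        this.size = size;
        this.degree = degree;
    }
}

final class HeaderSketch {
    // Hypothetical cap standing in for a constant like V4_MAX_LAYERS; the value is illustrative.
    static final int MAX_LAYERS = 32;

    // Writes only the fields listed in the quoted Javadoc excerpt, in that order.
    static byte[] write(int baseLayerMaxDegree, int idUpperBound, LayerInfoSketch[] layers) {
        if (layers.length > MAX_LAYERS) {
            throw new IllegalArgumentException(
                    "too many layers: " + layers.length + " > " + MAX_LAYERS);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(baseLayerMaxDegree);
            out.writeInt(idUpperBound);
            out.writeInt(layers.length);
            // Layer info: size and degree for each layer.
            for (LayerInfoSketch layer : layers) {
                out.writeInt(layer.size);
                out.writeInt(layer.degree);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Whatever the constant ends up being called, validating the layer count against it before writing keeps an out-of-range graph from producing a header that readers of that format version cannot parse.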
LGTM. Thanks for the contribution!
Description
The main motivation for this change is to introduce a disk writer that preserves immutability and writes purely sequentially.
This is essential for integrating with frameworks such as Lucene and OpenSearch.
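A purely sequential writer must emit every byte exactly once, in order, without seeking back to patch earlier data; this matters for append-only outputs (Lucene's IndexOutput, for example, has no seek method). The sketch below shows one way to achieve that under stated assumptions: the node count and maximum degree are known before writing begins, nodes arrive in ordinal order, and each node is stored as a fixed-size record padded with -1. The class name and layout are hypothetical, not the format this PR introduces.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical append-only writer: every byte is written once, in order,
// and nothing already written is ever revisited.
final class SequentialNodeWriterSketch implements AutoCloseable {
    static final int HEADER_BYTES = 2 * Integer.BYTES;

    private final DataOutputStream out;
    private final int nodeCount;
    private final int maxDegree;
    private int nextOrdinal = 0;

    SequentialNodeWriterSketch(OutputStream sink, int nodeCount, int maxDegree) {
        this.out = new DataOutputStream(sink);
        this.nodeCount = nodeCount;
        this.maxDegree = maxDegree;
        try {
            // Header values must be known up front: there is no going back to fill them in.
            out.writeInt(nodeCount);
            out.writeInt(maxDegree);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fixed record size: neighbor count plus maxDegree neighbor slots.
    static int recordBytes(int maxDegree) {
        return Integer.BYTES * (1 + maxDegree);
    }

    // Offsets are computed arithmetically, so no offset table needs back-filling.
    static long offsetOf(int ordinal, int maxDegree) {
        return HEADER_BYTES + (long) ordinal * recordBytes(maxDegree);
    }

    void writeNode(int ordinal, int[] neighbors) {
        if (ordinal != nextOrdinal) {
            throw new IllegalStateException(
                    "nodes must be written in ordinal order; expected " + nextOrdinal + " but got " + ordinal);
        }
        if (neighbors.length > maxDegree) {
            throw new IllegalArgumentException("node " + ordinal + " has more than " + maxDegree + " neighbors");
        }
        try {
            out.writeInt(neighbors.length);
            // Pad unused neighbor slots with -1 to keep every record the same size.
            for (int i = 0; i < maxDegree; i++) {
                out.writeInt(i < neighbors.length ? neighbors[i] : -1);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        nextOrdinal++;
    }

    @Override
    public void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Writing fewer nodes than the header promised would leave the file inconsistent.
        if (nextOrdinal != nodeCount) {
            throw new IllegalStateException("expected " + nodeCount + " nodes but wrote " + nextOrdinal);
        }
    }
}
```

Because records are fixed-size, a reader can locate any node with offsetOf instead of relying on an offset table that the writer would otherwise have to revisit after the fact.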
Changes
Additional changes in this PR
OnDiskSequentialGraphIndexWriter and abstract the GraphWriter interface
getNodes
Tests
Introduce tests in TestOnDiskSequentialGraphIndexWriter and add proper log4j configuration for debug logging in tests.