-
Notifications
You must be signed in to change notification settings - Fork 9
using B-trees instead of RB trees #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thank you for the feedback! Is there any technical value in moving to a B-tree at this point? I've already solved the mutation problems with the RB tree and due to its balancing guarantees, it creates a lot of nice invariants currently. Perhaps if someone was willing to create a PR utilizing a B-tree instead of an RB tree and had some compelling data to show the switch is worthwhile, I would definitely make the switch. As is, I have no immediate plans to swap major data structures like this without strong evidence to suggest the move is worth the effort. |
One additional benefit is that a range remove in a b-tree can be as cheap as a single element remove with at most 2 nodes allocated per layer, instead of what I'm seeing in the current RB tree with an iterated remove with associated rebalancing. Handy if you select a large region with heavy edits and then delete it. That will cut down on the allocations needed and associated heap churn. I presume the fredbuf-test.cpp is the minimal it needs to support? Or does a replacement need to fully support what's in all the headers? |
Yes, fredbuf-test.cpp should be the most minimal support needed, but the other APIs should be reasonably exercised in that test anyway. |
https://github.com/ratchetfreak/fredbuf I got all your tests passing, but I still need to do some hardening passes, see which apis your tests didn't excercise and implement them, and avoid visiting nodes when I don't need to. Also need to figure out how to best implement coalescing pieces when it's just after the last insert. |
I've got it as good as I can get it without having access to code that actually exercises it heavily in ways I can't think of. I added 2 tests One that will test the proper functioning of the iterators, your treewalkers fail those btw. The other test is a (unreasonably large) bulk remove (remove most of 1000 pieces starting from the second ranging to second to last) that in my b-tree implementation is O(depth) in both time and allocation but in your rb-tree would need O(n*depth) time and allocations. |
I think this is an interesting design point. Some standard library iterators will advance if you dereference them (e.g. stream iterators). I chose the path of moving the iterator forward after a dereference since it's more common for me to inspect a result and move on rather than inspect using
Excellent! Is there a reasonable set of numbers you can provide here? e.g. the worst-case for my original implementation, O(n*depth), exists if the system was not able to 'chunk' newly added piece additions, so I'm curious how often that comes up in practice. Though it does seem like a B-tree might edge out the RB-tree when multi-cursors are enabled, but I have yet to test any of that assumption. |
sure, but this isn't inspecting a stream, this is looping through a data structure. None of those have increment behavior on dereference.
I didn't do any timings. But I know that my implementation does at most 4 allocations per level on remove (no matter how many piece get yeeted), and only then if the remove is internal to a piece splitting it up and all the nodes above are stuffed to capacity. Whereas your implementation loops removing one Piece at a time until the offending Pieces are removed and you only need to trim the last piece. BTW is you do make some timings make sure to turn off the algoMark statements in The Tree api is not really designed around efficient multi cursor though, it sounds like you are doing an |
Thanks for the info! I will try to swap out my implementation for yours some time after next week (week of 2/10, I'm currently on a business trip). Feel free to join up on the Discord as I'll likely be posting some investigations there: https://fred-dev.tech. |
so I did a basic counter on the constructor of nodes in each implementation for each test :
(MaxChildren for the B-tree is 10) of course none of these are reasonable use patterns |
Reading your article you had some issues with getting an RedBlack tree working in a pure functional style.
However a B-tree would work a lot better in a pure functional style.
Every subtree is automatically balanced because growing the height only happens at the root. And splits and joins only affect the current node and its siblings.
The text was updated successfully, but these errors were encountered: