8000 using B-trees instead of RB trees · Issue #3 · cdacamar/fredbuf · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

using B-trees instead of RB trees #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ratchetfreak opened this issue Jan 28, 2025 · 9 comments
Open

using B-trees instead of RB trees #3

ratchetfreak opened this issue Jan 28, 2025 · 9 comments

Comments

@ratchetfreak
Copy link

Reading your article you had some issues with getting an RedBlack tree working in a pure functional style.

However a B-tree would work a lot better in a pure functional style.

Every subtree is automatically balanced because growing the height only happens at the root. And splits and joins only affect the current node and its siblings.

@cdacamar
Copy link
Owner

Thank you for the feedback!

Is there any technical value in moving to a B-tree at this point? I've already solved the mutation problems with the RB tree and due to its balancing guarantees, it creates a lot of nice invariants currently.

Perhaps if someone was willing to create a PR utilizing a B-tree instead of an RB tree and had some compelling data to show the switch is worthwhile, I would definitely make the switch. As is, I have no immediate plans to swap major data structures like this without strong evidence to suggest the move is worth the effort.

@ratchetfreak
Copy link
Author

One additional benefit is that a range remove in a b-tree can be as cheap as a single element remove with at most 2 nodes allocated per layer, instead of what I'm seeing in the current RB tree with an iterated remove with associated rebalancing. Handy if you select a large region with heavy edits and then delete it. That will cut down on the allocations needed and associated heap churn.

I presume the fredbuf-test.cpp is the minimal it needs to support? Or does a replacement need to fully support what's in all the headers?

@cdacamar
Copy link
Owner

Yes, fredbuf-test.cpp should be the most minimal support needed, but the other APIs should be reasonably exercised in that test anyway.

@ratchetfreak
Copy link
Author

https://github.com/ratchetfreak/fredbuf

I got all your tests passing, but I still need to do some hardening passes, see which apis your tests didn't excercise and implement them, and avoid visiting nodes when I don't need to.

Also need to figure out how to best implement coalescing pieces when it's just after the last insert.

@ratchetfreak
Copy link
Author

I've got it as good as I can get it without having access to code that actually exercises it heavily in ways I can't think of.

I added 2 tests

One that will test the proper functioning of the iterators, your treewalkers fail those btw. operator* shouldn't advance the iterator.

The other test is a (unreasonably large) bulk remove (remove most of 1000 pieces starting from the second ranging to second to last) that in my b-tree implementation is O(depth) in both time and allocation but in your rb-tree would need O(n*depth) time and allocations.

@cdacamar
Copy link
Owner
cdacamar commented Feb 9, 2025

One that will test the proper functioning of the iterators, your treewalkers fail those btw. operator* shouldn't advance the iterator.

I think this is an interesting design point. Some standard library iterators will advance if you dereference them (e.g. stream iterators). I chose the path of moving the iterator forward after a dereference since it's more common for me to inspect a result and move on rather than inspect using operator* multiple times. In the latter case, I would just cache the result anyway.

The other test is a (unreasonably large) bulk remove (remove most of 1000 pieces starting from the second ranging to second to last) that in my b-tree implementation is O(depth) in both time and allocation but in your rb-tree would need O(n*depth) time and allocations.

Excellent! Is there a reasonable set of numbers you can provide here? e.g. the worst-case for my original implementation, O(n*depth), exists if the system was not able to 'chunk' newly added piece additions, so I'm curious how often that comes up in practice. Though it does seem like a B-tree might edge out the RB-tree when multi-cursors are enabled, but I have yet to test any of that assumption.

@ratchetfreak
Copy link
Author

I think this is an interesting design point. Some standard library iterators will advance if you dereference them (e.g. stream iterators). I chose the path of moving the iterator forward after a dereference since it's more common for me to inspect a result and move on rather than inspect using operator* multiple times. In the latter case, I would just cache the result anyway.

sure, but this isn't inspecting a stream, this is looping through a data structure. None of those have increment behavior on dereference.

Excellent! Is there a reasonable set of numbers you can provide here? e.g. the worst-case for my original implementation, O(n*depth), exists if the system was not able to 'chunk' newly added piece additions, so I'm curious how often that comes up in practice. Though it does seem like a B-tree might edge out the RB-tree when multi-cursors are enabled, but I have yet to test any of that assumption.

I didn't do any timings. But I know that my implementation does at most 4 allocations per level on remove (no matter how many piece get yeeted), and only then if the remove is internal to a piece splitting it up and all the nodes above are stuffed to capacity. Whereas your implementation loops removing one Piece at a time until the offending Pieces are removed and you only need to trim the last piece.

BTW is you do make some timings make sure to turn off the algoMark statements in ratbuf_btree.h at line 202. I added those to help me visualize the algorithms in a test program. That vector will grow until you reset it.

The Tree api is not really designed around efficient multi cursor though, it sounds like you are doing an tree.insert() per cursor. That will lose the ModBuffer coherency needed to coalesce Pieces in either implementation. There's also only a single last_insert offset that you keep track of.

@cdacamar
Copy link
Owner
cdacamar commented Feb 9, 2025

Thanks for the info!

I will try to swap out my implementation for yours some time after next week (week of 2/10, I'm currently on a business trip).

Feel free to join up on the Discord as I'll likely be posting some investigations there: https://fred-dev.tech.

@ratchetfreak
Copy link
Author

so I did a basic counter on the constructor of nodes in each implementation for each test :

redblack b-tree
test1 12 3
test2 573 97
test3 187 12
test4 8 3
test5 2 1
test6 20 6
test7 10 3
test8 24 8
test9 12 4
test10 11 1
test11 40366 112

(MaxChildren for the B-tree is 10)

of course none of these are reasonable use patterns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants
0