-
Notifications
You must be signed in to change notification settings - Fork 126
Perform immut/{set, sparse_array} #2272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Path comparison in Node::contains might cause infinite loopCategory (Branch(children), path) => {
if path.is_last() { return false }
let idx = path.idx()
if children[idx] is Some(child) {
continue (child, path.next())
}
false
} Reasoning Unnecessary allocations in union operationCategory fn go(node1, node2) {
match (node1, node2) {
(node, Flat(key, path)) | (Flat(key, path), node) =>
if node.contains(key, path) { node } else { node.add_with_path(key, path) } Reasoning Path implementation details are not well documentedCategory const SEGMENT_LENGTH : Int = 5 ///| A Path represents a sequence of 5-bit segments in a 32-bit integer.
/// The format is: HEAD_TAG | segment_n | ... | segment_1 | segment_0
/// where each segment is 5 bits and HEAD_TAG ensures the upper bits are set.
pub(all) type Path UInt derive(Eq) Reasoning |
Pull Request Test Coverage Report for Build 166Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it to the experts for the implementation of the algorithm, but here's something I've observed.
44df2f7
to
ad77ba3
Compare
btw from #2205 the optimized set operations seem quite independent, I think we should include that part in this PR too, so that there would be no performance degeneration in those operations |
3d148b8
to
1c87007
Compare
const SEGMENT_NUM : Int = 32 / SEGMENT_LENGTH | ||
|
||
///| | ||
const HEAD_TAG : UInt = 0xffffffffU << (SEGMENT_LENGTH * SEGMENT_NUM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the purpose of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path::of(x)
always has the form 0b11_xxxxx_xxxxx_xxxxx_xxxxx_xxxxx
.
Path::of(x).next().next().next()
is0b11_xxxxx_xxxxx
, not satisfy<= 0b11_11111
Path::of(x).next().next().next().next()
is0b11_xxxxx
, satisfy<= 0b11_11111
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I understand correctly, the new version only utilize the lower 30 bits of the hash value, and the highest 2 bits are used to identify end of hash value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I understand correctly, the new version only utilize the lower 30 bits of the hash value, and the highest 2 bits are used to identify end of hash value?
Yes
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a note: maybe it's possible to also do path compression for Branch
(i.e. add an extra path segment (Path
+ length) parameter) too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a note: maybe it's possible to also do path compression for
Branch
(i.e. add an extra path segment (Path
+ length) parameter) too?
I think the benefit might be limited. With a good enough hash function, for k in 1~5, the chance of n different elements all have the same kth Path segment is about 1/(32^(n-1)) (assuming 5 bits per segment).
immut/hashset/HAMT.mbt
Outdated
(Collision(xs), Collision(ys)) => | ||
xs.size() == ys.size() && xs.iter().all(fn(x) { ys.find(x) }) | ||
(Leaf(x, xs), Leaf(y, ys)) => | ||
xs.length() == ys.length() && xs.add(x).iter().all(ys.add(y).contains(_)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ys.add(y)
will be run multiple times here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the expected behavior for me. I thought f(exp, _)
would desugar to
{
let v1 = exp
hole => f(v1, hole)
}
I'll fix it. And I suggest documenting this behavior in the language's documentation.
558deab
to
ca818e3
Compare
(using arrow function) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR overall looks good to me, remaining TODO:
- accidental modification of
@immut/hashmap
below - add more comments to the
@path
package, explaining how the length of path is represented
m.add(kv.0, kv.1) | ||
} else { | ||
m | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR should only affect immut/hashset
, is this accidentally introduced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not accidental. Since the types of SparseArray::difference
and SparseArray::intersection
were changed
-fn[X] SparseArray::difference(Self[X], Self[X]) -> Self[X]
+fn[X] SparseArray::difference(Self[X], Self[X], (X, X) -> X?) -> Self[X]?
-fn[X] SparseArray::intersection(Self[X], Self[X], (X, X) -> X raise?) -> Self[X] raise?
+fn[X] SparseArray::intersection(Self[X], Self[X], (X, X) -> X? raise?) -> Self[X]? raise?
the corresponding calls in immut/hashmap were also affected.
Done. |
cleanup the |
How should I clean up? I replied above, this is not an accidental change. |
I cannot see your comment. If it is under my review comment, is it still pending and not submitted? |
Copy here: Not accidental. Since the types of -fn[X] SparseArray::difference(Self[X], Self[X]) -> Self[X]
+fn[X] SparseArray::difference(Self[X], Self[X], (X, X) -> X?) -> Self[X]?
-fn[X] SparseArray::intersection(Self[X], Self[X], (X, X) -> X raise?) -> Self[X] raise?
+fn[X] SparseArray::intersection(Self[X], Self[X], (X, X) -> X? raise?) -> Self[X]? raise? the corresponding calls in immut/hashmap were also affected. It seems I still need to learn how to use GitHub, XD OK, now I know how to handle it. This "pending review" design is seriously user-unfriendly |
This PR is the first part of the split #2205, with clearer git history.
Path
to avoid passing depthCollision(Bucket[A])
withLeaf(A, @list.T[A])
T
toNode
, removeNode::empty
, defineT
asNode?
Branch({ data:[Branch(...)], .. })
byFlat(key, path)
Parts from #2205 not included in this PR:
add
/remove
to reuse old nodes in special cases (maybe we don't need this)union
,intersection
,difference
MyInt
, which only use the lower 8 bits for hashOverview