Speed-up contains by using memchr on every iteration
#16484
Merged
+43
−82
This PR reworks the implementation of contains (which the optimizer also generates as part of LIKE '%str%'). Previously we had three different implementations: one for small aligned needles (2/4/8 bytes), one for small unaligned needles (3/5/6/7 bytes), and a generic fallback for needles larger than 8 bytes. These implementations were quite distinct and not entirely optimal - they did a lot of work for every byte, even when an early-out could have sped things up in many cases.
This PR unifies these three implementations and leans on memchr to find the start of each possible match, followed by an actual matching comparison. This works by first comparing against the largest unsigned integer that fits in the needle (so either 2/4/8 bytes), followed by a memcmp for the remaining bytes (if any). Running this query on hits shows that this improves performance in all cases, especially for small unaligned searches and large searches.
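The approach described above can be sketched as follows. This is an illustrative C sketch, not DuckDB's actual code: the function name, the prefix-width selection, and the exact loop structure are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Sketch of the unified contains: memchr skips ahead to the next candidate
// position, then we verify with an unsigned-integer compare of the needle's
// prefix plus a memcmp for any remaining bytes.
static bool contains_memchr(const char *haystack, size_t haystack_len,
                            const char *needle, size_t needle_len) {
    if (needle_len == 0) {
        return true;
    }
    if (needle_len > haystack_len) {
        return false;
    }
    // Width of the integer prefix compare: the largest of 1/2/4/8 bytes
    // that fits within the needle.
    size_t prefix = needle_len >= 8 ? 8 : needle_len >= 4 ? 4
                  : needle_len >= 2 ? 2 : 1;
    uint64_t needle_prefix = 0;
    memcpy(&needle_prefix, needle, prefix);

    const char *pos = haystack;
    // Last position where the needle could still fit, plus one.
    const char *end = haystack + haystack_len - needle_len + 1;
    while (pos < end) {
        // memchr finds the next occurrence of the needle's first byte,
        // skipping over non-candidates in bulk on every iteration.
        pos = memchr(pos, needle[0], (size_t)(end - pos));
        if (!pos) {
            return false;
        }
        uint64_t hay_prefix = 0;
        memcpy(&hay_prefix, pos, prefix);
        if (hay_prefix == needle_prefix &&
            (needle_len <= prefix ||
             memcmp(pos + prefix, needle + prefix, needle_len - prefix) == 0)) {
            return true;
        }
        pos++;
    }
    return false;
}
```

Because memchr is typically a vectorized library routine, the per-candidate verification work only happens at positions where the first byte already matches, which is what makes the early-out cheap for both small and large needles.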