Speed-up contains by using memchr on every iteration
#16484
Merged
+43
−82
This PR reworks the implementation of contains (which the optimizer also generates as part of LIKE '%str%'). Previously we had three different implementations: one for small aligned needles (2/4/8 bytes), one for small unaligned needles (3/5/6/7 bytes), and a generic fallback for needles larger than 8 bytes. These implementations were quite distinct and not entirely optimal - they did a lot of work for every byte, even when an early-out could have sped things up in many cases.
This PR unifies these three implementations and leans on memchr to find the start of each possible match, followed by an actual matching comparison. This works by first comparing against the largest unsigned integer that fits in the needle (so either 2/4/8 bytes), followed by a memcmp for the remaining bytes (if any). Running this query on hits shows that this improves performance in all cases, especially for small unaligned searches and large searches.
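The approach described above can be sketched as follows. This is an illustrative C sketch, not DuckDB's actual code: the function name, the prefix-width selection, and the exact loop structure are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Sketch of the unified contains: memchr skips ahead to the next candidate
// position, then we verify with an unsigned-integer compare of the needle's
// prefix plus a memcmp for any remaining bytes.
static bool contains_memchr(const char *haystack, size_t haystack_len,
                            const char *needle, size_t needle_len) {
    if (needle_len == 0) {
        return true;
    }
    if (needle_len > haystack_len) {
        return false;
    }
    // Width of the integer prefix compare: the largest of 1/2/4/8 bytes
    // that fits within the needle.
    size_t prefix = needle_len >= 8 ? 8 : needle_len >= 4 ? 4
                  : needle_len >= 2 ? 2 : 1;
    uint64_t needle_prefix = 0;
    memcpy(&needle_prefix, needle, prefix);

    const char *pos = haystack;
    // Last position where the needle could still fit, plus one.
    const char *end = haystack + haystack_len - needle_len + 1;
    while (pos < end) {
        // memchr finds the next occurrence of the needle's first byte,
        // skipping over non-candidates in bulk on every iteration.
        pos = memchr(pos, needle[0], (size_t)(end - pos));
        if (!pos) {
            return false;
        }
        uint64_t hay_prefix = 0;
        memcpy(&hay_prefix, pos, prefix);
        if (hay_prefix == needle_prefix &&
            (needle_len <= prefix ||
             memcmp(pos + prefix, needle + prefix, needle_len - prefix) == 0)) {
            return true;
        }
        pos++;
    }
    return false;
}
```

Because memchr is typically a vectorized library routine, the per-candidate verification work only happens at positions where the first byte already matches, which is what makes the early-out cheap for both small and large needles.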