8000 Hoist all bounds checks out of loops by jub0bs · Pull Request #31 · agnivade/levenshtein · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Hoist all bounds checks out of loops #31

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 19, 2025
Merged

Conversation

jub0bs
Copy link
Contributor
@jub0bs jub0bs commented May 17, 2025

Some benchmark results (no changes in terms of allocations):

$ benchstat -filter '.name:Simple' old new
goos: darwin
goarch: amd64
pkg: github.com/agnivade/levenshtein
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
                     │     old     │                 new                 │
                     │   sec/op    │   sec/op     vs base                │
Simple/ASCII-8         167.5n ± 3%   142.4n ± 4%  -14.96% (p=0.000 n=20)
Simple/French-8        267.1n ± 2%   220.3n ± 3%  -17.49% (p=0.000 n=20)
Simple/Nordic-8        432.6n ± 2%   346.7n ± 3%  -19.86% (p=0.000 n=20)
Simple/Long_lead-8     326.2n ± 0%   275.9n ± 0%  -15.41% (p=0.000 n=20)
Simple/Long_middle-8   520.4n ± 0%   441.6n ± 0%  -15.15% (p=0.000 n=20)
Simple/Long_trail-8    721.6n ± 0%   608.3n ± 0%  -15.70% (p=0.000 n=20)
Simple/Long_diff-8     8.549µ ± 1%   6.537µ ± 0%  -23.53% (p=0.000 n=20)
Simple/Tibetan-8       562.1n ± 2%   493.3n ± 3%  -12.25% (p=0.000 n=20)
geomean                571.6n        475.2n       -16.86%

@jub0bs jub0bs changed the title Hoist all but one bounds check out of loops Optimize performance May 17, 2025
@jub0bs jub0bs changed the title Optimize performance Hoist all bounds checks out of loops May 18, 2025
@jub0bs
Copy link
Contributor Author
jub0bs commented May 18, 2025

Comparison to dgryski's implementation

Before

$ benchstat -col '/impl@(dgryski agniva)' old
goos: darwin
goarch: amd64
pkg: github.com/agnivade/levenshtein
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
                   │   dgryski    │               agniva                │
                   │    sec/op    │   sec/op     vs base                │
All/case=ASCII-8      335.6n ± 0%   163.1n ± 2%  -51.41% (p=0.000 n=20)
All/case=French-8     616.0n ± 0%   267.0n ± 2%  -56.66% (p=0.000 n=20)
All/case=Nordic-8    1184.0n ± 1%   429.3n ± 1%  -63.74% (p=0.000 n=20)
All/case=Tibetan-8    952.8n ± 2%   555.7n ± 1%  -41.67% (p=0.000 n=20)
geomean               694.9n        319.2n       -54.06%

After

$ benchstat -col '/impl@(dgryski agniva)' new
goos: darwin
goarch: amd64
pkg: github.com/agnivade/levenshtein
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
                   │   dgryski    │               agniva                │
                   │    sec/op    │   sec/op     vs base                │
All/case=ASCII-8      342.1n ± 1%   131.2n ± 3%  -61.63% (p=0.000 n=20)
All/case=French-8     628.1n ± 0%   221.9n ± 2%  -64.68% (p=0.000 n=20)
All/case=Nordic-8    1185.5n ± 1%   335.0n ± 2%  -71.74% (p=0.000 n=20)
All/case=Tibetan-8    974.5n ± 1%   493.4n ± 1%  -49.37% (p=0.000 n=20)
geomean               705.8n        263.4n       -62.68%

Comparison to arbovm's implementation

Before

$ benchstat -col '/impl@(arbovm agniva)' old
goos: darwin
goarch: amd64
pkg: github.com/agnivade/levenshtein
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
                   │    arbovm    │               agniva                │
                   │    sec/op    │   sec/op     vs base                │
All/case=ASCII-8      356.8n ± 1%   163.1n ± 2%  -54.30% (p=0.000 n=20)
All/case=French-8     644.8n ± 0%   267.0n ± 2%  -58.60% (p=0.000 n=20)
All/case=Nordic-8    1208.5n ± 1%   429.3n ± 1%  -64.48% (p=0.000 n=20)
All/case=Tibetan-8    976.7n ± 3%   555.7n ± 1%  -43.10% (p=0.000 n=20)
geomean               721.9n        319.2n       -55.78%

After

$ benchstat -col '/impl@(arbovm agniva)' new
goos: darwin
goarch: amd64
pkg: github.com/agnivade/levenshtein
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
                   │    arbovm    │               agniva                │
                   │    sec/op    │   sec/op     vs base                │
All/case=ASCII-8      358.6n ± 1%   131.2n ± 3%  -63.40% (p=0.000 n=20)
All/case=French-8     653.3n ± 1%   221.9n ± 2%  -66.04% (p=0.000 n=20)
All/case=Nordic-8    1219.0n ± 1%   335.0n ± 2%  -72.52% (p=0.000 n=20)
All/case=Tibetan-8    980.0n ± 2%   493.4n ± 1%  -49.65% (p=0.000 n=20)
geomean               727.3n        263.4n       -63.79%

Copy link
Owner
@agnivade agnivade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very clever, thank you! Hoisting the bounds checks from the main loops looks good to me.

I wonder how much performance are we getting in hoisting out the checks in the trimLongestCommonSuffix and trimLongestCommonPrefix functions. Because it looks like we have to pay quite a large cost to readability.

Could you re-run the benchmarks by reverting the logic of the two functions?

@jub0bs
Copy link
Contributor Author
jub0bs commented May 19, 2025

I wonder how much performance are we getting in hoisting out the checks in the trimLongestCommonSuffix and trimLongestCommonPrefix functions.

I guess the performance gain largely depends on the two input strings: if they have a long common prefix and/or a long common suffix, avoiding unnecessary bounds checks in those loops is likely to have a non-negligible benefit on performance.

[...] it looks like we have to pay quite a large cost to readability.

I don't think readability suffers that much, esp. since I've abstracted the two loops to functions with clarifying names.

Could you re-run the benchmarks by reverting the logic of the two functions?

Here are some results after reverting the commit in question:

goos: darwin
goarch: amd64
pkg: github.com/agnivade/levenshtein
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
                     │     new     │                new2                 │
                     │   sec/op    │   sec/op     vs base                │
Simple/ASCII-8         142.4n ± 4%   143.4n ± 4%        ~ (p=0.344 n=20)
Simple/French-8        220.3n ± 3%   226.4n ± 1%   +2.75% (p=0.000 n=20)
Simple/Nordic-8        346.7n ± 3%   336.0n ± 1%   -3.07% (p=0.007 n=20)
Simple/Long_lead-8     275.9n ± 0%   338.1n ± 0%  +22.56% (p=0.000 n=20)
Simple/Long_middle-8   441.6n ± 0%   501.2n ± 0%  +13.51% (p=0.000 n=20)
Simple/Long_trail-8    608.3n ± 0%   631.7n ± 0%   +3.83% (p=0.000 n=20)
Simple/Long_diff-8     6.537µ ± 0%   5.925µ ± 1%   -9.37% (p=0.000 n=20)
Simple/Tibetan-8       493.3n ± 3%   509.9n ± 1%   +3.35% (p=0.001 n=20)
geomean                475.2n        493.7n        +3.88%

The commit is worth keeping, in my opinion. What do you think?

@jub0bs
Copy link
Contributor Author
jub0bs commented May 19, 2025

I've changed the names of the sub-benchmarks in order for the results to be easier to analyse with benchstat.

@agnivade
Copy link
Owner

I guess the performance gain largely depends on the two input strings: if they have a long common prefix and/or a long common suffix,

Yes. Could you add the additional benchmarks you added for the common prefix and suffixes? I don't see them in the PR.

@jub0bs
Copy link
Contributor Author
jub0bs commented May 19, 2025

Yes. Could you add the additional benchmarks you added for the common prefix and suffixes? I don't see them in the PR.

I haven't added any new cases yet, but I can.

Edit: Actually, aren't the cases named "Long lead" and "Long trail" already testing exactly this? Are new benchmark cases needed?

jub0bs added 4 commits May 19, 2025 09:17
Note: functions trimLongestCommonPrefix and trimLongestCommonSuffix are
implemented in such a way as to make them eligible for inlining.
@agnivade
Copy link
Owner

My bad there. Somehow I forgot that the long lead and long trail cases already exist. Thanks!

@agnivade agnivade merged commit 4208675 into agnivade:master May 19, 2025
6 checks passed
@jub0bs
Copy link
Contributor Author
jub0bs commented May 19, 2025

Great! Thanks for taking the time to review and accept my changes.

@agnivade
Copy link
Owner

Thanks for spending the time to improve the library! Hope it's useful to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0