-
Notifications
You must be signed in to change notification settings - Fork 511
aarch64 performance fix: replace long double with double #790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Use lround instead of lroundl, for perf improvements on aarch64 machines.
Use lround instead of lroundl, for perf improvements on aarch64 machines.
Use lround instead of lroundl, for perf improvements on aarch64 machines.
Update copyright year.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This might provide a small perf boost on other platforms since it looks like all uses of long double get casted back into regular double, and none of them look like the extra 16 bits should impact the intermediate values except maybe the branches in _hex2dToCoordIJK
? (But the comparisons of long-double intermediate values look like they are always against regular doubles, so I presume they get casted during the comparison and the extra bits of precision are dropped.)
FWIW, h3o only uses |
Thanks for the PR! I remember removing some uses of |
Everything was |
I measured a 40% performance uplift on an Ampere Altra QS80-30 on the benchmark cmdline below. Building with gcc-12 -O3, runtime dropped from 481s to 340s.
|
@@ -346,8 +346,8 @@ H3Error _upAp7Checked(CoordIJK *ijk) { | |||
} | |||
|
|||
// TODO: Do the int math parts here in long double? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little late here, but it looks like there are some TODO comments that are now out of date
On aarch64 systems, we are seeing a significant overhead in libgcc - __multf3.
This overhead is due to the usage of lroundl in coordijk.c. To prevent this overhead, we are proposing to replace long double type with double type and use lround instead of lroundl. This should have no impact on other systems than aarch64. We have verified the behavior matches on some test cases. For coherency sake, we also changed long double constants to double type.
This was found by the SPEC CPU committee, where h3 is being considered as a candidate benchmark for SPEC CPU v8 and undergoing testing across a wide variety of systems and compilers.
Feel free to squash these commits.