pull pos
out to a local in getU4
#8824
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation
When you have a class member that's repeatedly accessed in a non-
const
function -- likepos
is ingetU4
-- the compiler is not smart enough to see that you could potentially load the member once for the whole function and then store it back at the end (inlining, exceptions, etc. complicate this picture, so it's not entirely the compiler's fault).This PR essentially performs the optimization we wish the compiler could see, since we know there's no potential side-effects the compiler needs to be concerned about here. I would show you the assembly, but since this change makes
getU4
simple enough that the compi A5C3 ler thinks it's worth inlining (!), there no easy side-by-side comparison.I don't have numbers at hand, but ISTR
getU4
taking something in the range of 10-15% of deserialization, and this optimization speeding upgetU4
by 10% or so (roughly, since it's inlined; I was getting my numbers out of valgrind's cachegrind tool), which would be only 1%, but wins are wins, right? (Maybe the numbers are off; I can try to remeasure if people would like that.)Test plan
Existing tests.