Add benchmarkIsValidCell.c #518


Merged: 10 commits merged into uber:master on Oct 9, 2021

Conversation

@ajfriend (Contributor)

Adding in benchmarks before making changes to isValidCell (#496).

@coveralls commented Sep 27, 2021

Coverage remained the same at 98.685% when pulling 63a0850 on ajfriend:bench_isValidCell into 46ff097 on uber:master.

@dfellis (Collaborator) left a comment

Can we also get a branch that checks invalid data, possibly of different types of incorrectness?

I think we should optimize the valid path, but it can still be useful to see how big the performance difference is, especially for those wanting to put this check in a hot loop.

@ajfriend (Contributor, Author)

> Can we also get a branch that checks invalid data, possibly of different types of incorrectness?
>
> I think we should optimize the valid path, but it can still be useful to see how big the performance difference is, especially for those wanting to put this check in a hot loop.

Makes sense. Any thoughts on what kinds of incorrectness it would make sense to check for? There are lots of ways to be incorrect, so I'm not sure off the top of my head what a benchmark suite with "good" coverage might look like.

@dfellis (Collaborator) commented Sep 27, 2021

> > Can we also get a branch that checks invalid data, possibly of different types of incorrectness?
> >
> > I think we should optimize the valid path, but it can still be useful to see how big the performance difference is, especially for those wanting to put this check in a hot loop.
>
> Makes sense. Any thoughts on what kinds of incorrectness it would make sense to check for? There are lots of ways to be incorrect, so I'm not sure off the top of my head what a benchmark suite with "good" coverage might look like.

The kinds of invalid cells:

  • sign bit or other reserved bits set
  • invalid mode selected
  • invalid base cell
  • invalid child selection
    • setting a '7' before the chosen resolution
    • setting the deleted child of a pentagon

I don't know which would have what effect, though I suspect the last two would be the slowest to recognize, so they might be the ones to prioritize?
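
To exercise those cases in a benchmark, one option is to corrupt known-valid indexes directly. Here is a minimal sketch, assuming the documented H3 index bit layout (bit 63 reserved, bits 59-62 mode, bits 45-51 base cell, 3-bit child digits below that); the helper names are hypothetical, not part of the H3 API, and the deleted-pentagon-child case is omitted since it needs the list of pentagon base cells:

#include <stdint.h>

typedef uint64_t H3Index;

// Hypothetical helpers that corrupt a valid cell index in each of the
// ways listed above. Bit positions follow the documented H3 index layout.
static H3Index setReservedBit(H3Index h) {
    return h | (UINT64_C(1) << 63);  // the reserved high bit must be 0
}

static H3Index setInvalidMode(H3Index h) {
    return h & ~(UINT64_C(15) << 59);  // mode 0 is not the cell mode (1)
}

static H3Index setInvalidBaseCell(H3Index h) {
    return h | (UINT64_C(127) << 45);  // base cells only go up to 121
}

static H3Index setDigitSeven(H3Index h, int digit) {
    // Force child digit `digit` (1-based) to 7; invalid when that digit
    // is at or before the cell's resolution.
    return h | (UINT64_C(7) << (45 - 3 * digit));
}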

@ajfriend (Contributor, Author)

> I don't know which would have what effect, though I suspect the last two would be the slowest to recognize, so they might be the ones to prioritize?

They'd be recognized in the order you stated. Thinking about it now, a benchmark like this would basically be measuring "doing only the first n out of N steps" of the validation. The "valid data" case then sort of includes all of the "invalid data" cases implicitly (assuming isValidCell works correctly, but that's covered by the test suite).

I'm now leaning towards just keeping it like we have it now, with just the "valid data" tests, since the speed of identifying invalid data (of any of the types listed above) should basically correspond to how quickly the function runs through the full sequence of validity checks. I guess we would miss the exact n/N breakdown, but I'm not sure that's providing much more information, unless we have a specific hypothesis we'd like to look into.

@dfellis (Collaborator) commented Sep 27, 2021

> > I don't know which would have what effect, though I suspect the last two would be the slowest to recognize, so they might be the ones to prioritize?
>
> They'd be recognized in the order you stated. Thinking about it now, a benchmark like this would basically be measuring "doing only the first n out of N steps" of the validation. The "valid data" case then sort of includes all of the "invalid data" cases implicitly (assuming isValidCell works correctly, but that's covered by the test suite).
>
> I'm now leaning towards just keeping it like we have it now, with just the "valid data" tests, since the speed of identifying invalid data (of any of the types listed above) should basically correspond to how quickly the function runs through the full sequence of validity checks. I guess we would miss the exact n/N breakdown, but I'm not sure that's providing much more information, unless we have a specific hypothesis we'd like to look into.

Mixing valid and invalid cells may be slower than either 100% valid or 100% invalid because of branch prediction failures?

cells = calloc(N, sizeof(H3Index));
H3_EXPORT(cellToChildren)(p, childRes, cells);

BENCHMARK(pentagonChildren_2_8, 1000, {
Collaborator:

Why are we only benchmarking pentagon parent cells?

Contributor (Author):

Just an easy way to generate data. At least with pentagon parents, you get one pentagon child cell.

@nrabinowitz (Collaborator)

FWIW - in my experience the most common case here is a mix of valid H3 indexes and nulls. So it might be worth benchmarking that, e.g. with alternating indexes. But I'm not sure the branch prediction failure question makes sense to benchmark - won't that vary widely over platforms, not to mention across bindings?
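
A mixed valid/H3_NULL benchmark along those lines could reuse the setup pattern from this PR's diff; here is a sketch, where the benchmark name and the exact null spacing are illustrative, and `cells`/`N` are assumed to be populated via cellToChildren as in the diff above:

// Sketch: null out every 2nd index so the benchmark sees alternating
// valid cells and H3_NULL. Assumes `cells` holds N valid children, as
// in the setup shown earlier in this diff.
for (int64_t i = 0; i < N; i += 2) {
    cells[i] = H3_NULL;
}

BENCHMARK(isValidCell_alternatingNulls, 1000, {
    for (int64_t i = 0; i < N; i++) {
        H3_EXPORT(isValidCell)(cells[i]);
    }
});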

Comment on lines 56 to 63
// pentagon 2->8
//
// Starting with a parent *pentagon* cell of resolution 2,
// loop through all its children at resolution 8, applying `isValidCell`
parentRes = 2;
childRes = 8;
p = pentagonAtRes(parentRes);
H3_EXPORT(cellToChildrenSize)(p, childRes, &N);
Collaborator:

Total nit, but consider doing all setup before all the BENCHMARK macros. I don't expect it to affect results.

Contributor (Author):

I'd also like to organize it that way, but I couldn't figure out how to do the problem setup for each benchmark without it being timed.

I'd like to set up and free any memory right before and after each timing, but I don't see an easy way to do that with the current benchmarking code. Is there a way I'm not seeing?
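
For reference, one generic pattern (a standalone sketch, not H3's BENCHMARK macro) is to take timestamps around the body only, inside the iteration loop, so per-iteration allocation and freeing stay untimed:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Standalone sketch: time only the body of each iteration, leaving the
// per-iteration malloc/free outside the measured region.
static double elapsedUs(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
}

int main(void) {
    const int iters = 1000;
    const size_t n = 100000;
    double totalUs = 0.0;

    for (int i = 0; i < iters; i++) {
        double *data = malloc(n * sizeof *data);  // untimed setup

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t j = 0; j < n; j++) data[j] = (double)j * 0.5;  // timed body
        clock_gettime(CLOCK_MONOTONIC, &t1);
        totalUs += elapsedUs(t0, t1);

        free(data);  // untimed teardown
    }

    printf("%f microseconds per iteration (%d iterations)\n",
           totalUs / iters, iters);
    return 0;
}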

@isaacbrodsky (Collaborator)

> FWIW - in my experience the most common case here is a mix of valid H3 indexes and nulls. So it might be worth benchmarking that, e.g. with alternating indexes. But I'm not sure the branch prediction failure question makes sense to benchmark - won't that vary widely over platforms, not to mention across bindings?

I am also curious about performance across valid cells and H3_NULL.

I think the branch prediction question applies to our entire benchmarking approach, both in that we test the same data repeatedly and in that we don't have a performance reference.

@ajfriend (Contributor, Author) commented Oct 2, 2021

Here are the numbers I got from running on my macbook air (so take them with a grain of salt). The last test is the same as the first, just to give an idea of how much the numbers can vary based on how hot my laptop is running (I'll remove the first one before landing).

build/bin/benchmarkIsValidCell
    -- pentagonChildren_8_14_null_100: 1844.567000 microseconds per iteration (1000 iterations)
    -- pentagonChildren_8_14: 1938.043000 microseconds per iteration (1000 iterations)
    -- pentagonChildren_2_8: 1663.274000 microseconds per iteration (1000 iterations)
    -- pentagonChildren_8_14_null_2: 1104.476000 microseconds per iteration (1000 iterations)
    -- pentagonChildren_8_14_null_10: 2010.307000 microseconds per iteration (1000 iterations)
    -- pentagonChildren_8_14_null_100: 1988.599000 microseconds per iteration (1000 iterations)

@ajfriend (Contributor, Author) commented Oct 2, 2021

I added the H3_NULL tests above. I kicked the other branch prediction benchmark questions to #520, just because I'd like to close this out and get back to #496.

Commit: I probably won't keep this in the final, but adding it here just to have the code in the git history and report at least one run.
@ajfriend (Contributor, Author) commented Oct 2, 2021

Run with all cells at resolution 4:

build/bin/benchmarkIsValidCell
	-- allCellsAtRes4: 12272.413000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14: 5825.150000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8: 4668.005000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_2: 2515.056000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_10: 4762.763000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_100: 5794.260000 microseconds per iteration (1000 iterations)

@ajfriend (Contributor, Author) commented Oct 2, 2021

Run with the latest code:

build/bin/benchmarkIsValidCell
	-- pentagonChildren_2_8: 4460.855000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14: 4377.846000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_2: 2168.223000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_10: 4188.899000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_100: 4368.239000 microseconds per iteration (1000 iterations)

@ajfriend (Contributor, Author) commented Oct 9, 2021

Mostly just for my future reference: the commit above is some code where I compare various ways to code up the benchmarks. It looks like each of these approaches is roughly equivalent, at least when building with cmake -DCMAKE_BUILD_TYPE=Release.

s = """
	-- pentagonChildren_2_8_func: 1414.681000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_no_struct: 1376.121000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_struct: 1406.939000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_func: 1412.431000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_no_struct: 1406.760000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_struct: 1422.283000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_func: 1436.422000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_no_struct: 1433.394000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_struct: 1439.210000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_func: 1478.305000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_no_struct: 1454.012000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_struct: 1516.242000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_func: 1493.511000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_no_struct: 1518.663000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_struct: 1497.821000 microseconds per iteration (1000 iterations)
"""

# Parse the benchmark output into (name, microseconds) pairs.
s = s.split('\n')
s = [e.strip('- \t') for e in s if e]
s = [e.split()[:2] for e in s]
s = [(name[:-1], float(val)) for name, val in s]  # name[:-1] drops the ':'

import toolz

# Group the timings by benchmark name: {name: [run1, run2, ...]}.
s = toolz.groupby(lambda x: x[0], s)
s = toolz.valmap(lambda e: [x[1] for x in e], s)

import matplotlib.pyplot as plt

# Plot each variant's timing across the repeated runs.
fig, ax = plt.subplots()
for name, vals in s.items():
    ax.plot(vals, '-o', label=name)
ax.legend()
plt.show()

[Screenshot, 2021-10-08: plot of the per-run timings for the three benchmark variants, roughly overlapping]

@ajfriend (Contributor, Author) commented Oct 9, 2021

And here are the latest benchmarks (still on my MacBook Air). Also, the previous benchmark numbers are off because I'd forgotten to build with cmake -DCMAKE_BUILD_TYPE=Release. That's why these numbers are lower.

build/bin/benchmarkIsValidCell
	-- pentagonChildren_2_8_func: 1636.138000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14: 2254.166000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_2: 1229.321000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_10: 2064.721000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_100: 2220.962000 microseconds per iteration (1000 iterations)

@ajfriend (Contributor, Author) commented Oct 9, 2021
build/bin/benchmarkIsValidCell
	-- pentagonChildren_2_8: 1602.169000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14: 1939.921000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_2: 1071.529000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_10: 1828.599000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_100: 1878.061000 microseconds per iteration (1000 iterations)

@ajfriend merged commit f2786a4 into uber:master on Oct 9, 2021
@ajfriend deleted the bench_isValidCell branch on October 9, 2021 04:05