Add benchmarkIsValidCell.c #518
Conversation
Can we also get a branch that checks invalid data, possibly of different types of incorrectness?
I think we should optimize the valid path, but it can still be useful to see how big the performance difference is, especially for those wanting to put this check in a hot loop.
Makes sense. Any thoughts on what kinds of incorrectness it would make sense to check for? There are lots of ways to be incorrect, so I'm not sure off the top of my head what a benchmark suite with "good" coverage might look like.
The kinds of invalid cells:
I don't know which would have what effect, though I suspect the last two would be the slowest to recognize, so they might be the ones to prioritize?
They'd be recognized in the order you stated. Thinking about it now, a benchmark like this would basically be "doing only the first n out of N steps" of the validation, so the "valid data" case sort of includes all of the "invalid data" cases implicitly. I'm now leaning towards just keeping it like we have it now, with just the "valid data" tests, since the speed of identifying invalid data (of any of the types listed above) should basically correspond to how quickly the function runs through the full sequence of validity tests. I guess we would miss the exact n/N breakdown, but I'm not sure that's providing much more information, unless we have a specific hypothesis we'd like to look into.
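To make the n-of-N point concrete: if validation is an ordered sequence of checks with early exit, invalid input only pays for the first n stages, so the all-valid case is the time upper bound. A minimal sketch in Python; the bit fields are loosely modeled on the H3 index layout, but the offsets and stages here are illustrative, not the real `isValidCell`:

```python
# Hypothetical staged validator with early exit. Each check inspects a
# field of a 64-bit index; field offsets are illustrative only.
CHECKS = [
    ("high bit clear",  lambda x: (x >> 63) & 0x1 == 0),
    ("mode is cell",    lambda x: (x >> 59) & 0xF == 1),
    ("base cell range", lambda x: (x >> 45) & 0x7F < 122),
]

def stages_run(x):
    """Return how many checks execute before the first failure
    (all of them, if the index passes every stage)."""
    for i, (_, check) in enumerate(CHECKS):
        if not check(x):
            return i + 1  # early exit: invalid input stops here
    return len(CHECKS)    # valid input pays for every stage
```

So timing the all-valid case exercises the full pipeline, while each kind of invalid input only measures a prefix of it.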
Mixing valid and invalid cells may be slower than either 100% valid or 100% invalid because of branch prediction failures?
```c
cells = calloc(N, sizeof(H3Index));
H3_EXPORT(cellToChildren)(p, childRes, cells);

BENCHMARK(pentagonChildren_2_8, 1000, {
```
Why are we only benchmarking pentagon parent cells?
Just an easy way to generate data. At least with pentagon parents, you get one pentagon child cell.
FWIW - in my experience the most common case here is a mix of valid H3 indexes and nulls. So it might be worth benchmarking that, e.g. with alternating indexes. But I'm not sure the branch prediction failure question makes sense to benchmark - won't that vary widely over platforms, not to mention across bindings?
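As a concrete version of the alternating suggestion, here's a small hypothetical helper (not part of the benchmark code) that interleaves real cells with 0 (`H3_NULL`), so a validity check flips its outcome on every element:

```python
def make_alternating(cells):
    """Interleave each index with 0 (H3_NULL), e.g. [a, b] -> [a, 0, b, 0].
    Feeding this to isValidCell alternates valid/invalid outcomes, which is
    the mixed workload described above."""
    out = []
    for c in cells:
        out.append(c)
        out.append(0)  # null index: fails validation immediately
    return out
```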
```c
// pentagon 2->8
//
// Starting with a parent *pentagon* cell of resolution 2,
// loop through all its children at resolution 8, applying `isValidCell`
parentRes = 2;
childRes = 8;
p = pentagonAtRes(parentRes);
H3_EXPORT(cellToChildrenSize)(p, childRes, &N);
```
Total nit, but you could consider doing all setup before all BENCHMARK macros. I don't expect it to affect results.
I'd also like to organize it that way, but I couldn't figure out how to do per-benchmark setup without it being timed.
I'd like to set up and free any memory right before and after each timing, but I don't see an easy way to do that with the current benchmarking code. Is there a way I'm not seeing?
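One possible shape for untimed per-benchmark setup/teardown, sketched in Python rather than the C macros (the `benchmark` signature here is hypothetical, not part of the existing harness): pass setup, body, and teardown as separate callables and only time the body.

```python
import time

def benchmark(name, iterations, setup, body, teardown):
    """Run `body` `iterations` times, timing only the body.
    Setup and teardown run outside the timed region."""
    state = setup()                  # untimed: allocate/populate inputs
    start = time.perf_counter()
    for _ in range(iterations):      # timed: only the workload itself
        body(state)
    elapsed = time.perf_counter() - start
    teardown(state)                  # untimed: free resources
    return elapsed / iterations      # seconds per iteration
```

A C version of this would amount to a BENCHMARK-style macro taking setup/teardown blocks, with the timer started after setup and stopped before teardown.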
I am also curious about performance across valid cells, and I think the branch prediction question is applicable to our entire benchmarking approach: both that we test the same data repeatedly, and that we don't have a performance reference.
Here are the numbers I got from running on my macbook air (so take them with a grain of salt). The last test is the same as the first, just to give an idea of how much the numbers can vary based on how hot my laptop is running (I'll remove the first one before landing).
I probably won't keep this in the final, but adding it here just to have the code in the git history and report at least one run.
Run with all cells at resolution 4:
Run with the latest code:
Mostly just for my future reference, the commit above is some code where I compare the `func`, `loop_no_struct`, and `loop_struct` variants. The script below parses the benchmark output and plots the runs:

```python
s = """
-- pentagonChildren_2_8_func: 1414.681000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_loop_no_struct: 1376.121000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_loop_struct: 1406.939000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_func: 1412.431000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_loop_no_struct: 1406.760000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_loop_struct: 1422.283000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_func: 1436.422000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_loop_no_struct: 1433.394000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_loop_struct: 1439.210000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_func: 1478.305000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_loop_no_struct: 1454.012000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_loop_struct: 1516.242000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_func: 1493.511000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_loop_no_struct: 1518.663000 microseconds per iteration (1000 iterations)
-- pentagonChildren_2_8_loop_struct: 1497.821000 microseconds per iteration (1000 iterations)
"""
# Parse "-- name: value ..." lines into (name, microseconds) pairs
s = s.split('\n')
s = [e.strip('- \t') for e in s if e]
s = [e.split()[:2] for e in s]
s = [(name[:-1], float(val)) for name, val in s]
# Group the values by benchmark name and plot each series
import toolz
s = toolz.groupby(lambda x: x[0], s)
s = toolz.valmap(lambda e: [x[1] for x in e], s)
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
for name, vals in s.items():
    ax.plot(vals, '-o', label=name)
ax.legend()
plt.show()
```
And here are the latest benchmarks (still on my macbook air). Also, the previous benchmark numbers are off because I'd forgotten to
Adding in benchmarks before making changes to isValidCell (#496).