Add benchmarkIsValidCell.c #518


Merged: 10 commits merged into uber:master on Oct 9, 2021

Conversation

@ajfriend (Contributor)

Adding in benchmarks before making changes to isValidCell (#496).

@coveralls commented Sep 27, 2021

Coverage remained the same at 98.685% when pulling 63a0850 on ajfriend:bench_isValidCell into 46ff097 on uber:master.

@dfellis (Collaborator) left a comment

Can we also get a branch that checks invalid data, possibly of different types of incorrectness?

I think we should optimize the valid path, but it can still be useful to see how big the performance difference is, especially for those wanting to put this check in a hot loop.

@ajfriend (Contributor, Author)

> Can we also get a branch that checks invalid data, possibly of different types of incorrectness?
>
> I think we should optimize the valid path, but it can still be useful to see how big the performance difference is, especially for those wanting to put this check in a hot loop.

Makes sense. Any thoughts on what kinds of incorrectness it would make sense to check for? There are lots of ways to be incorrect, so I'm not sure off the top of my head what a benchmark suite with "good" coverage might look like.

@dfellis (Collaborator) commented Sep 27, 2021

> > Can we also get a branch that checks invalid data, possibly of different types of incorrectness?
> >
> > I think we should optimize the valid path, but it can still be useful to see how big the performance difference is, especially for those wanting to put this check in a hot loop.
>
> Makes sense. Any thoughts on what kinds of incorrectness it would make sense to check for? There are lots of ways to be incorrect, so I'm not sure off the top of my head what a benchmark suite with "good" coverage might look like.

The kinds of invalid cells:

  • sign bit or other reserved bits set
  • invalid mode selected
  • invalid base cell
  • invalid child selection
    • setting a '7' before the chosen resolution
    • setting the deleted child of a pentagon

I don't know which would have what effect, though I suspect the last two would be the slowest to recognize, so they might be the ones to prioritize?
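
To exercise those cases in a benchmark, one option is to corrupt known-valid indexes directly. Here is a minimal sketch, assuming the documented H3 index bit layout (bit 63 reserved, bits 59-62 mode, bits 45-51 base cell, 3-bit child digits below that); the helper names are hypothetical, not part of the H3 API, and the deleted-pentagon-child case is omitted since it needs the list of pentagon base cells:

#include <stdint.h>

typedef uint64_t H3Index;

// Hypothetical helpers that corrupt a valid cell index in each of the
// ways listed above. Bit positions follow the documented H3 index layout.
static H3Index setReservedBit(H3Index h) {
    return h | (UINT64_C(1) << 63);  // the reserved high bit must be 0
}

static H3Index setInvalidMode(H3Index h) {
    return h & ~(UINT64_C(15) << 59);  // mode 0 is not the cell mode (1)
}

static H3Index setInvalidBaseCell(H3Index h) {
    return h | (UINT64_C(127) << 45);  // base cells only go up to 121
}

static H3Index setDigitSeven(H3Index h, int digit) {
    // Force child digit `digit` (1-based) to 7; invalid when that digit
    // is at or before the cell's resolution.
    return h | (UINT64_C(7) << (45 - 3 * digit));
}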

@ajfriend (Contributor, Author)

> I don't know which would have what effect, though I suspect the last two would be the slowest to recognize, so they might be the ones to prioritize?

They'd be recognized in the order you stated. Thinking about it now, a benchmark like this would basically be measuring "doing only the first n out of N steps" of the validation. The "valid data" case then sort of includes all of the "invalid data" cases implicitly (assuming isValidCell works correctly, but that's covered by the test suite).

I'm now leaning towards just keeping it like we have it now, with just the "valid data" tests, since the speed of identifying invalid data (of any of the types listed above) should basically correspond to how quickly the function runs through the full sequence of validity checks. I guess we would miss the exact n/N breakdown, but I'm not sure that's providing much more information, unless we have a specific hypothesis we'd like to look into.

@dfellis (Collaborator) commented Sep 27, 2021

> > I don't know which would have what effect, though I suspect the last two would be the slowest to recognize, so they might be the ones to prioritize?
>
> They'd be recognized in the order you stated. Thinking about it now, a benchmark like this would basically be measuring "doing only the first n out of N steps" of the validation. The "valid data" case then sort of includes all of the "invalid data" cases implicitly (assuming isValidCell works correctly, but that's covered by the test suite).
>
> I'm now leaning towards just keeping it like we have it now, with just the "valid data" tests, since the speed of identifying invalid data (of any of the types listed above) should basically correspond to how quickly the function runs through the full sequence of validity checks. I guess we would miss the exact n/N breakdown, but I'm not sure that's providing much more information, unless we have a specific hypothesis we'd like to look into.

Mixing valid and invalid cells may be slower than either 100% valid or 100% invalid because of branch prediction failures?

cells = calloc(N, sizeof(H3Index));
H3_EXPORT(cellToChildren)(p, childRes, cells);

BENCHMARK(pentagonChildren_2_8, 1000, {
Collaborator:

Why are we only benchmarking pentagon parent cells?

Contributor (Author):

Just an easy way to generate data. At least with pentagon parents, you get one pentagon child cell.

@nrabinowitz (Collaborator)

FWIW - in my experience the most common case here is a mix of valid H3 indexes and nulls. So it might be worth benchmarking that, e.g. with alternating indexes. But I'm not sure the branch prediction failure question makes sense to benchmark - won't that vary widely over platforms, not to mention across bindings?
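
A mixed valid/H3_NULL benchmark along those lines could reuse the setup pattern from this PR's diff; here is a sketch, where the benchmark name and the exact null spacing are illustrative, and `cells`/`N` are assumed to be populated via cellToChildren as in the diff above:

// Sketch: null out every 2nd index so the benchmark sees alternating
// valid cells and H3_NULL. Assumes `cells` holds N valid children, as
// in the setup shown earlier in this diff.
for (int64_t i = 0; i < N; i += 2) {
    cells[i] = H3_NULL;
}

BENCHMARK(isValidCell_alternatingNulls, 1000, {
    for (int64_t i = 0; i < N; i++) {
        H3_EXPORT(isValidCell)(cells[i]);
    }
});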

Comment on lines 56 to 63
// pentagon 2->8
//
// Starting with a parent *pentagon* cell of resolution 2,
// loop through all its children at resolution 8, applying `isValidCell`
parentRes = 2;
childRes = 8;
p = pentagonAtRes(parentRes);
H3_EXPORT(cellToChildrenSize)(p, childRes, &N);
Collaborator:

Total nit, but consider doing all setup before all the BENCHMARK macros. I don't expect it to affect results.

Contributor (Author):

I'd also like to organize it that way, but I couldn't figure out how to do the problem setup for each benchmark without it being timed.

I'd like to set up and free any memory right before and after each timing, but I don't see an easy way to do that with the current benchmarking code. Is there a way I'm not seeing?
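
For reference, one generic pattern (a standalone sketch, not H3's BENCHMARK macro) is to take timestamps around the body only, inside the iteration loop, so per-iteration allocation and freeing stay untimed:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Standalone sketch: time only the body of each iteration, leaving the
// per-iteration malloc/free outside the measured region.
static double elapsedUs(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
}

int main(void) {
    const int iters = 1000;
    const size_t n = 100000;
    double totalUs = 0.0;

    for (int i = 0; i < iters; i++) {
        double *data = malloc(n * sizeof *data);  // untimed setup

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t j = 0; j < n; j++) data[j] = (double)j * 0.5;  // timed body
        clock_gettime(CLOCK_MONOTONIC, &t1);
        totalUs += elapsedUs(t0, t1);

        free(data);  // untimed teardown
    }

    printf("%f microseconds per iteration (%d iterations)\n",
           totalUs / iters, iters);
    return 0;
}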

@isaacbrodsky (Collaborator)

> FWIW - in my experience the most common case here is a mix of valid H3 indexes and nulls. So it might be worth benchmarking that, e.g. with alternating indexes. But I'm not sure the branch prediction failure question makes sense to benchmark - won't that vary widely over platforms, not to mention across bindings?

I am also curious about performance across valid cells and H3_NULL.

I think the branch prediction question applies to our entire benchmarking approach, both in that we test the same data repeatedly and in that we don't have a performance reference.

@ajfriend (Contributor, Author) commented Oct 2, 2021

Here are the numbers I got from running on my macbook air (so take them with a grain of salt). The last test is the same as the first, just to give an idea of how much the numbers can vary based on how hot my laptop is running (I'll remove the first one before landing).

build/bin/benchmarkIsValidCell
    -- pentagonChildren_8_14_null_100: 1844.567000 microseconds per iteration (1000 iterations)
    -- pentagonChildren_8_14: 1938.043000 microseconds per iteration (1000 iterations)
    -- pentagonChildren_2_8: 1663.274000 microseconds per iteration (1000 iterations)
    -- pentagonChildren_8_14_null_2: 1104.476000 microseconds per iteration (1000 iterations)
    -- pentagonChildren_8_14_null_10: 2010.307000 microseconds per iteration (1000 iterations)
    -- pentagonChildren_8_14_null_100: 1988.599000 microseconds per iteration (1000 iterations)

@ajfriend (Contributor, Author) commented Oct 2, 2021

I added the H3_NULL tests above. I kicked the other branch prediction benchmark questions to #520, just because I'd like to close this out and get back to #496.

Commit: I probably won't keep this in the final, but adding it here just to have the code in the git history and report at least one run.
@ajfriend (Contributor, Author) commented Oct 2, 2021

Run with all cells at resolution 4:

build/bin/benchmarkIsValidCell
	-- allCellsAtRes4: 12272.413000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14: 5825.150000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8: 4668.005000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_2: 2515.056000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_10: 4762.763000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_100: 5794.260000 microseconds per iteration (1000 iterations)

@ajfriend (Contributor, Author) commented Oct 2, 2021

Run with the latest code:

build/bin/benchmarkIsValidCell
	-- pentagonChildren_2_8: 4460.855000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14: 4377.846000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_2: 2168.223000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_10: 4188.899000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_100: 4368.239000 microseconds per iteration (1000 iterations)

@ajfriend (Contributor, Author) commented Oct 9, 2021

Mostly just for my future reference: the commit above is some code where I compare various ways to code up the benchmarks. It looks like each of these approaches is roughly equivalent, at least when building with cmake -DCMAKE_BUILD_TYPE=Release.

s = """
	-- pentagonChildren_2_8_func: 1414.681000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_no_struct: 1376.121000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_struct: 1406.939000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_func: 1412.431000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_no_struct: 1406.760000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_struct: 1422.283000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_func: 1436.422000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_no_struct: 1433.394000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_struct: 1439.210000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_func: 1478.305000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_no_struct: 1454.012000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_struct: 1516.242000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_func: 1493.511000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_no_struct: 1518.663000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_2_8_loop_struct: 1497.821000 microseconds per iteration (1000 iterations)
"""

# Parse the benchmark output into (name, microseconds) pairs.
s = s.split('\n')
s = [e.strip('- \t') for e in s if e]
s = [e.split()[:2] for e in s]
s = [(name[:-1], float(val)) for name, val in s]  # name[:-1] drops the ':'

import toolz

# Group the timings by benchmark name: {name: [run1, run2, ...]}.
s = toolz.groupby(lambda x: x[0], s)
s = toolz.valmap(lambda e: [x[1] for x in e], s)

import matplotlib.pyplot as plt

# Plot each variant's timing across the repeated runs.
fig, ax = plt.subplots()
for name, vals in s.items():
    ax.plot(vals, '-o', label=name)
ax.legend()
plt.show()

[Screenshot, 2021-10-08: plot of the per-run timings for the three benchmark variants, roughly overlapping]

@ajfriend (Contributor, Author) commented Oct 9, 2021

And here are the latest benchmarks (still on my MacBook Air). Also, the previous benchmark numbers are off because I'd forgotten to build with cmake -DCMAKE_BUILD_TYPE=Release. That's why these numbers are lower.

build/bin/benchmarkIsValidCell
	-- pentagonChildren_2_8_func: 1636.138000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14: 2254.166000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_2: 1229.321000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_10: 2064.721000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_100: 2220.962000 microseconds per iteration (1000 iterations)

@ajfriend (Contributor, Author) commented Oct 9, 2021
build/bin/benchmarkIsValidCell
	-- pentagonChildren_2_8: 1602.169000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14: 1939.921000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_2: 1071.529000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_10: 1828.599000 microseconds per iteration (1000 iterations)
	-- pentagonChildren_8_14_null_100: 1878.061000 microseconds per iteration (1000 iterations)

@ajfriend merged commit f2786a4 into uber:master on Oct 9, 2021
@ajfriend deleted the bench_isValidCell branch on October 9, 2021 04:05