[Feature] callables for merge_tensordicts by vmoens · Pull Request #1033 · pytorch/tensordict · GitHub

[Feature] callables for merge_tensordicts #1033


Merged: 1 commit, Oct 7, 2024


vmoens (Collaborator) commented Oct 7, 2024

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Oct 7, 2024
ghstack-source-id: ff5ade8
Pull Request resolved: #1033
facebook-github-bot added the CLA Signed label Oct 7, 2024
@vmoens vmoens linked an issue Oct 7, 2024 that may be closed by this pull request
3 tasks
github-actions bot commented Oct 7, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}29$. Worsened: $\large\color{#d91a1a}12$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 50.7250μs 24.7321μs 40.4333 KOps/s 38.2005 KOps/s $\textbf{\color{#35bf28}+5.84\%}$
test_plain_set_stack_nested 57.3470μs 24.7340μs 40.4302 KOps/s 38.2797 KOps/s $\textbf{\color{#35bf28}+5.62\%}$
test_plain_set_nested_inplace 67.1760μs 26.8507μs 37.2430 KOps/s 35.2032 KOps/s $\textbf{\color{#35bf28}+5.79\%}$
test_plain_set_stack_nested_inplace 90.8300μs 26.8859μs 37.1942 KOps/s 35.1307 KOps/s $\textbf{\color{#35bf28}+5.87\%}$
test_items 45.3350μs 4.0772μs 245.2639 KOps/s 235.5507 KOps/s $\color{#35bf28}+4.12\%$
test_items_nested 0.6743ms 0.3847ms 2.5994 KOps/s 2.5871 KOps/s $\color{#35bf28}+0.48\%$
test_items_nested_locked 0.5877ms 0.3844ms 2.6011 KOps/s 2.5928 KOps/s $\color{#35bf28}+0.32\%$
test_items_nested_leaf 0.2233ms 82.0964μs 12.1808 KOps/s 12.3260 KOps/s $\color{#d91a1a}-1.18\%$
test_items_stack_nested 0.7910ms 0.3931ms 2.5436 KOps/s 2.5407 KOps/s $\color{#35bf28}+0.12\%$
test_items_stack_nested_leaf 0.1700ms 83.5683μs 11.9663 KOps/s 11.7204 KOps/s $\color{#35bf28}+2.10\%$
test_items_stack_nested_locked 0.6179ms 0.3849ms 2.5979 KOps/s 2.5772 KOps/s $\color{#35bf28}+0.80\%$
test_keys 39.2840μs 6.1595μs 162.3510 KOps/s 287.7866 KOps/s $\textbf{\color{#d91a1a}-43.59\%}$
test_keys_nested 0.2298ms 0.1352ms 7.3952 KOps/s 7.3985 KOps/s $\color{#d91a1a}-0.04\%$
test_keys_nested_locked 1.6569ms 0.1407ms 7.1066 KOps/s 7.1465 KOps/s $\color{#d91a1a}-0.56\%$
test_keys_nested_leaf 0.1915ms 0.1170ms 8.5445 KOps/s 8.4694 KOps/s $\color{#35bf28}+0.89\%$
test_keys_stack_nested 0.2533ms 0.1364ms 7.3302 KOps/s 7.3498 KOps/s $\color{#d91a1a}-0.27\%$
test_keys_stack_nested_leaf 0.1938ms 0.1176ms 8.4998 KOps/s 8.5434 KOps/s $\color{#d91a1a}-0.51\%$
test_keys_stack_nested_locked 0.2660ms 0.1417ms 7.0594 KOps/s 7.2210 KOps/s $\color{#d91a1a}-2.24\%$
test_values 16.9598μs 1.0379μs 963.5061 KOps/s 948.5958 KOps/s $\color{#35bf28}+1.57\%$
test_values_nested 0.1543ms 94.5345μs 10.5782 KOps/s 10.6194 KOps/s $\color{#d91a1a}-0.39\%$
test_values_nested_locked 0.1561ms 94.2439μs 10.6108 KOps/s 10.3902 KOps/s $\color{#35bf28}+2.12\%$
test_values_nested_leaf 0.1327ms 78.7920μs 12.6916 KOps/s 12.3562 KOps/s $\color{#35bf28}+2.71\%$
test_values_stack_nested 0.1886ms 95.4640μs 10.4752 KOps/s 10.8739 KOps/s $\color{#d91a1a}-3.67\%$
test_values_stack_nested_leaf 0.1432ms 79.7131μs 12.5450 KOps/s 12.8134 KOps/s $\color{#d91a1a}-2.09\%$
test_values_stack_nested_locked 0.1879ms 93.9849μs 10.6400 KOps/s 10.6908 KOps/s $\color{#d91a1a}-0.48\%$
test_membership 24.1050μs 0.8993μs 1.1119 MOps/s 1.0534 MOps/s $\textbf{\color{#35bf28}+5.55\%}$
test_membership_nested 18.6650μs 2.7515μs 363.4375 KOps/s 353.8287 KOps/s $\color{#35bf28}+2.72\%$
test_membership_nested_leaf 33.4430μs 2.7440μs 364.4369 KOps/s 359.5750 KOps/s $\color{#35bf28}+1.35\%$
test_membership_stacked_nested 22.6520μs 2.8227μs 354.2705 KOps/s 363.1403 KOps/s $\color{#d91a1a}-2.44\%$
test_membership_stacked_nested_leaf 23.8940μs 2.7819μs 359.4660 KOps/s 360.6205 KOps/s $\color{#d91a1a}-0.32\%$
test_membership_nested_last 38.9530μs 4.2827μs 233.5000 KOps/s 237.3322 KOps/s $\color{#d91a1a}-1.61\%$
test_membership_nested_leaf_last 41.0770μs 4.3019μs 232.4530 KOps/s 237.3298 KOps/s $\color{#d91a1a}-2.05\%$
test_membership_stacked_nested_last 30.1470μs 5.1032μs 195.9560 KOps/s 74.1634 KOps/s $\textbf{\color{#35bf28}+164.22\%}$
test_membership_stacked_nested_leaf_last 29.6050μs 5.2588μs 190.1567 KOps/s 74.6697 KOps/s $\textbf{\color{#35bf28}+154.66\%}$
test_nested_getleaf 69.9910μs 10.7198μs 93.2853 KOps/s 93.2055 KOps/s $\color{#35bf28}+0.09\%$
test_nested_get 46.5780μs 10.3574μs 96.5497 KOps/s 98.9442 KOps/s $\color{#d91a1a}-2.42\%$
test_stacked_getleaf 35.9470μs 10.8156μs 92.4591 KOps/s 93.4178 KOps/s $\color{#d91a1a}-1.03\%$
test_stacked_get 34.0740μs 10.2790μs 97.2857 KOps/s 97.9178 KOps/s $\color{#d91a1a}-0.65\%$
test_nested_getitemleaf 42.9310μs 11.2802μs 88.6507 KOps/s 89.0281 KOps/s $\color{#d91a1a}-0.42\%$
test_nested_getitem 47.4390μs 10.3049μs 97.0416 KOps/s 94.4079 KOps/s $\color{#35bf28}+2.79\%$
test_stacked_getitemleaf 43.9330μs 11.1663μs 89.5551 KOps/s 89.7049 KOps/s $\color{#d91a1a}-0.17\%$
test_stacked_getitem 39.4540μs 10.4847μs 95.3767 KOps/s 95.8271 KOps/s $\color{#d91a1a}-0.47\%$
test_lock_nested 97.0165ms 0.6250ms 1.6001 KOps/s 1.9080 KOps/s $\textbf{\color{#d91a1a}-16.14\%}$
test_lock_stack_nested 0.7609ms 0.4865ms 2.0553 KOps/s 2.1576 KOps/s $\color{#d91a1a}-4.74\%$
test_unlock_nested 98.4856ms 0.5390ms 1.8553 KOps/s 2.3015 KOps/s $\textbf{\color{#d91a1a}-19.39\%}$
test_unlock_stack_nested 0.6490ms 0.3972ms 2.5175 KOps/s 2.6556 KOps/s $\textbf{\color{#d91a1a}-5.20\%}$
test_flatten_speed 0.1846ms 0.1018ms 9.8184 KOps/s 9.7821 KOps/s $\color{#35bf28}+0.37\%$
test_unflatten_speed 0.7542ms 0.5255ms 1.9031 KOps/s 1.8857 KOps/s $\color{#35bf28}+0.92\%$
test_common_ops 4.3847ms 1.1719ms 853.2875 Ops/s 817.5560 Ops/s $\color{#35bf28}+4.37\%$
test_creation 29.6060μs 2.0826μs 480.1596 KOps/s 479.5498 KOps/s $\color{#35bf28}+0.13\%$
test_creation_empty 55.8450μs 18.1913μs 54.9714 KOps/s 49.1433 KOps/s $\textbf{\color{#35bf28}+11.86\%}$
test_creation_nested_1 62.0970μs 21.3797μs 46.7734 KOps/s 40.3618 KOps/s $\textbf{\color{#35bf28}+15.89\%}$
test_creation_nested_2 63.5590μs 25.7360μs 38.8560 KOps/s 34.5487 KOps/s $\textbf{\color{#35bf28}+12.47\%}$
test_clone 0.1544ms 17.8283μs 56.0907 KOps/s 58.4774 KOps/s $\color{#d91a1a}-4.08\%$
test_getitem[int] 1.0733ms 17.1163μs 58.4239 KOps/s 60.3459 KOps/s $\color{#d91a1a}-3.18\%$
test_getitem[slice_int] 0.1388ms 31.2426μs 32.0075 KOps/s 33.4814 KOps/s $\color{#d91a1a}-4.40\%$
test_getitem[range] 0.3845ms 57.5391μs 17.3795 KOps/s 16.6606 KOps/s $\color{#35bf28}+4.31\%$
test_getitem[tuple] 0.1376ms 24.9729μs 40.0434 KOps/s 38.2064 KOps/s $\color{#35bf28}+4.81\%$
test_getitem[list] 0.4440ms 53.8262μs 18.5783 KOps/s 18.4555 KOps/s $\color{#35bf28}+0.67\%$
test_setitem_dim[int] 60.8840μs 33.5902μs 29.7706 KOps/s 29.6779 KOps/s $\color{#35bf28}+0.31\%$
test_setitem_dim[slice_int] 0.1004ms 62.1766μs 16.0832 KOps/s 16.3061 KOps/s $\color{#d91a1a}-1.37\%$
test_setitem_dim[range] 0.1389ms 83.7582μs 11.9391 KOps/s 11.7298 KOps/s $\color{#35bf28}+1.78\%$
test_setitem_dim[tuple] 97.4530μs 51.0203μs 19.6001 KOps/s 19.9552 KOps/s $\color{#d91a1a}-1.78\%$
test_setitem 0.2469ms 30.6183μs 32.6603 KOps/s 31.8568 KOps/s $\color{#35bf28}+2.52\%$
test_set 0.2227ms 29.5847μs 33.8013 KOps/s 32.3753 KOps/s $\color{#35bf28}+4.40\%$
test_set_shared 2.6073ms 0.2263ms 4.4194 KOps/s 4.4405 KOps/s $\color{#d91a1a}-0.47\%$
test_update 0.2318ms 39.2350μs 25.4875 KOps/s 25.0048 KOps/s $\color{#35bf28}+1.93\%$
test_update_nested 1.0693ms 49.8707μs 20.0518 KOps/s 19.4088 KOps/s $\color{#35bf28}+3.31\%$
test_update__nested 0.2491ms 37.5426μs 26.6364 KOps/s 27.4276 KOps/s $\color{#d91a1a}-2.88\%$
test_set_nested 0.2362ms 32.3390μs 30.9224 KOps/s 29.9744 KOps/s $\color{#35bf28}+3.16\%$
test_set_nested_new 0.2291ms 37.2567μs 26.8408 KOps/s 26.3672 KOps/s $\color{#35bf28}+1.80\%$
test_select 0.2675ms 56.0445μs 17.8430 KOps/s 17.7811 KOps/s $\color{#35bf28}+0.35\%$
test_select_nested 0.1353ms 60.2562μs 16.5958 KOps/s 16.7317 KOps/s $\color{#d91a1a}-0.81\%$
test_exclude_nested 0.1466ms 75.1319μs 13.3099 KOps/s 13.3904 KOps/s $\color{#d91a1a}-0.60\%$
test_empty[True] 0.7103ms 0.3501ms 2.8566 KOps/s 2.8552 KOps/s $\color{#35bf28}+0.05\%$
test_empty[False] 9.3527μs 1.2365μs 808.7494 KOps/s 784.9200 KOps/s $\color{#35bf28}+3.04\%$
test_unbind_speed 0.7131ms 0.3067ms 3.2601 KOps/s 3.2788 KOps/s $\color{#d91a1a}-0.57\%$
test_unbind_speed_stack0 0.4985ms 0.3064ms 3.2640 KOps/s 3.4338 KOps/s $\color{#d91a1a}-4.94\%$
test_unbind_speed_stack1 98.5829ms 0.8517ms 1.1742 KOps/s 1.3841 KOps/s $\textbf{\color{#d91a1a}-15.17\%}$
test_split 2.3077ms 2.0203ms 494.9744 Ops/s 454.3066 Ops/s $\textbf{\color{#35bf28}+8.95\%}$
test_chunk 0.1017s 2.2384ms 446.7435 Ops/s 442.7684 Ops/s $\color{#35bf28}+0.90\%$
test_creation[device0] 0.2666ms 0.1175ms 8.5139 KOps/s 8.4758 KOps/s $\color{#35bf28}+0.45\%$
test_creation_from_tensor 3.6900ms 0.1188ms 8.4166 KOps/s 8.3258 KOps/s $\color{#35bf28}+1.09\%$
test_add_one[memmap_tensor0] 0.4172ms 7.6446μs 130.8121 KOps/s 140.1591 KOps/s $\textbf{\color{#d91a1a}-6.67\%}$
test_contiguous[memmap_tensor0] 27.1310μs 1.9127μs 522.8305 KOps/s 511.9061 KOps/s $\color{#35bf28}+2.13\%$
test_stack[memmap_tensor0] 70.9240μs 5.5814μs 179.1666 KOps/s 181.9950 KOps/s $\color{#d91a1a}-1.55\%$
test_memmaptd_index 1.2028ms 0.4050ms 2.4693 KOps/s 2.4082 KOps/s $\color{#35bf28}+2.54\%$
test_memmaptd_index_astensor 1.1269ms 0.5094ms 1.9631 KOps/s 1.9373 KOps/s $\color{#35bf28}+1.33\%$
test_memmaptd_index_op 1.8519ms 1.0630ms 940.7448 Ops/s 915.9545 Ops/s $\color{#35bf28}+2.71\%$
test_serialize_model 0.2205s 0.1363s 7.3385 Ops/s 8.3629 Ops/s $\textbf{\color{#d91a1a}-12.25\%}$
test_serialize_model_pickle 0.4413s 0.3930s 2.5443 Ops/s 2.4968 Ops/s $\color{#35bf28}+1.90\%$
test_serialize_weights 0.1271s 0.1193s 8.3800 Ops/s 7.7332 Ops/s $\textbf{\color{#35bf28}+8.36\%}$
test_serialize_weights_returnearly 0.1895s 0.1648s 6.0677 Ops/s 6.3029 Ops/s $\color{#d91a1a}-3.73\%$
test_serialize_weights_pickle 1.1474s 0.7569s 1.3212 Ops/s 2.4651 Ops/s $\textbf{\color{#d91a1a}-46.40\%}$
test_serialize_weights_filesystem 0.1550s 0.1438s 6.9529 Ops/s 6.9483 Ops/s $\color{#35bf28}+0.06\%$
test_serialize_model_filesystem 0.1580s 0.1498s 6.6769 Ops/s 5.9948 Ops/s $\textbf{\color{#35bf28}+11.38\%}$
test_reshape_pytree 90.9510μs 38.3180μs 26.0974 KOps/s 25.4177 KOps/s $\color{#35bf28}+2.67\%$
test_reshape_td 0.1212ms 46.9174μs 21.3141 KOps/s 21.1950 KOps/s $\color{#35bf28}+0.56\%$
test_view_pytree 0.1921ms 39.3034μs 25.4431 KOps/s 25.4710 KOps/s $\color{#d91a1a}-0.11\%$
test_view_td 0.1123ms 52.1095μs 19.1904 KOps/s 19.3364 KOps/s $\color{#d91a1a}-0.76\%$
test_unbind_pytree 77.7660μs 36.6070μs 27.3172 KOps/s 27.5695 KOps/s $\color{#d91a1a}-0.92\%$
test_unbind_td 0.3154ms 45.7971μs 21.8355 KOps/s 22.0167 KOps/s $\color{#d91a1a}-0.82\%$
test_split_pytree 83.9970μs 37.9591μs 26.3442 KOps/s 26.4366 KOps/s $\color{#d91a1a}-0.35\%$
test_split_td 0.4709ms 58.1233μs 17.2048 KOps/s 17.4518 KOps/s $\color{#d91a1a}-1.42\%$
test_add_pytree 99.2160μs 46.0406μs 21.7199 KOps/s 22.5159 KOps/s $\color{#d91a1a}-3.54\%$
test_add_td 0.2175ms 86.4711μs 11.5646 KOps/s 11.5340 KOps/s $\color{#35bf28}+0.27\%$
test_compile_add_one_nested[tensordict-compile] 0.1360ms 57.8580μs 17.2837 KOps/s 17.2925 KOps/s $\color{#d91a1a}-0.05\%$
test_compile_add_one_nested[tensordict-eager] 0.3595ms 0.1981ms 5.0483 KOps/s 5.0673 KOps/s $\color{#d91a1a}-0.38\%$
test_compile_add_one_nested[pytree-compile] 0.1307ms 57.0640μs 17.5242 KOps/s 17.7612 KOps/s $\color{#d91a1a}-1.33\%$
test_compile_add_one_nested[pytree-eager] 0.2710ms 0.1459ms 6.8545 KOps/s 7.2485 KOps/s $\textbf{\color{#d91a1a}-5.44\%}$
test_compile_copy_nested[tensordict-compile] 74.1990μs 23.0375μs 43.4074 KOps/s 42.9758 KOps/s $\color{#35bf28}+1.00\%$
test_compile_copy_nested[tensordict-eager] 0.1673ms 74.9326μs 13.3453 KOps/s 13.3110 KOps/s $\color{#35bf28}+0.26\%$
test_compile_copy_nested[pytree-compile] 0.1437ms 75.5956μs 13.2283 KOps/s 12.9962 KOps/s $\color{#35bf28}+1.79\%$
test_compile_copy_nested[pytree-eager] 0.2123ms 68.7499μs 14.5455 KOps/s 14.6103 KOps/s $\color{#d91a1a}-0.44\%$
test_compile_add_one_flat[tensordict-compile] 0.3263ms 0.1831ms 5.4615 KOps/s 5.3797 KOps/s $\color{#35bf28}+1.52\%$
test_compile_add_one_flat[tensordict-eager] 0.5052ms 0.2439ms 4.0999 KOps/s 4.0794 KOps/s $\color{#35bf28}+0.50\%$
test_compile_add_one_flat[tensorclass-compile] 0.1163ms 47.6281μs 20.9960 KOps/s 20.7594 KOps/s $\color{#35bf28}+1.14\%$
test_compile_add_one_flat[tensorclass-eager] 0.1706ms 76.2600μs 13.1130 KOps/s 12.8261 KOps/s $\color{#35bf28}+2.24\%$
test_compile_add_one_flat[pytree-compile] 0.4021ms 0.1778ms 5.6228 KOps/s 5.6753 KOps/s $\color{#d91a1a}-0.93\%$
test_compile_add_one_flat[pytree-eager] 0.5794ms 0.3032ms 3.2981 KOps/s 3.5011 KOps/s $\textbf{\color{#d91a1a}-5.80\%}$
test_compile_add_self_flat[tensordict-eager] 0.4683ms 0.2799ms 3.5722 KOps/s 3.5540 KOps/s $\color{#35bf28}+0.51\%$
test_compile_add_self_flat[tensordict-compile] 0.3624ms 0.1828ms 5.4696 KOps/s 5.4377 KOps/s $\color{#35bf28}+0.59\%$
test_compile_add_self_flat[tensorclass-eager] 0.1992ms 74.1262μs 13.4905 KOps/s 13.7038 KOps/s $\color{#d91a1a}-1.56\%$
test_compile_add_self_flat[tensorclass-compile] 0.1087ms 47.6172μs 21.0008 KOps/s 20.2836 KOps/s $\color{#35bf28}+3.54\%$
test_compile_add_self_flat[pytree-eager] 0.4347ms 0.2420ms 4.1327 KOps/s 4.3355 KOps/s $\color{#d91a1a}-4.68\%$
test_compile_add_self_flat[pytree-compile] 0.2826ms 0.1745ms 5.7318 KOps/s 5.4705 KOps/s $\color{#35bf28}+4.78\%$
test_compile_copy_flat[tensordict-compile] 0.2567ms 0.1124ms 8.8951 KOps/s 8.8610 KOps/s $\color{#35bf28}+0.38\%$
test_compile_copy_flat[tensordict-eager] 0.1419ms 78.4763μs 12.7427 KOps/s 12.4719 KOps/s $\color{#35bf28}+2.17\%$
test_compile_copy_flat[pytree-compile] 0.1630ms 78.4753μs 12.7429 KOps/s 12.9868 KOps/s $\color{#d91a1a}-1.88\%$
test_compile_copy_flat[pytree-eager] 0.1267ms 68.4658μs 14.6058 KOps/s 14.3004 KOps/s $\color{#35bf28}+2.14\%$
test_compile_assign_and_add[tensordict-compile] 0.3798ms 0.1956ms 5.1132 KOps/s 5.1206 KOps/s $\color{#d91a1a}-0.14\%$
test_compile_assign_and_add[tensordict-eager] 2.4806ms 1.7639ms 566.9239 Ops/s 560.4942 Ops/s $\color{#35bf28}+1.15\%$
test_compile_assign_and_add[pytree-compile] 0.2883ms 0.1944ms 5.1436 KOps/s 5.0579 KOps/s $\color{#35bf28}+1.69\%$
test_compile_assign_and_add[pytree-eager] 1.4093ms 1.1248ms 889.0675 Ops/s 901.2215 Ops/s $\color{#d91a1a}-1.35\%$
test_compile_assign_and_add_stack[compile] 0.5202ms 0.4320ms 2.3149 KOps/s 2.3283 KOps/s $\color{#d91a1a}-0.58\%$
test_compile_assign_and_add_stack[eager] 4.2511ms 4.1031ms 243.7175 Ops/s 232.0005 Ops/s $\textbf{\color{#35bf28}+5.05\%}$
test_compile_indexing[tensor-tensordict-compile] 0.1054ms 33.7168μs 29.6588 KOps/s 29.5461 KOps/s $\color{#35bf28}+0.38\%$
test_compile_indexing[tensor-tensordict-eager] 0.7987ms 47.9683μs 20.8471 KOps/s 20.1993 KOps/s $\color{#35bf28}+3.21\%$
test_compile_indexing[tensor-tensorclass-compile] 70.1120μs 28.9667μs 34.5224 KOps/s 32.5657 KOps/s $\textbf{\color{#35bf28}+6.01\%}$
test_compile_indexing[tensor-tensorclass-eager] 66.7350μs 27.9508μs 35.7772 KOps/s 35.5710 KOps/s $\color{#35bf28}+0.58\%$
test_compile_indexing[tensor-pytree-compile] 73.3980μs 29.4147μs 33.9966 KOps/s 33.2196 KOps/s $\color{#35bf28}+2.34\%$
test_compile_indexing[tensor-pytree-eager] 70.4030μs 28.4331μs 35.1702 KOps/s 35.4824 KOps/s $\color{#d91a1a}-0.88\%$
test_compile_indexing[slice-tensordict-compile] 0.1351ms 73.1111μs 13.6778 KOps/s 13.6145 KOps/s $\color{#35bf28}+0.47\%$
test_compile_indexing[slice-tensordict-eager] 0.5754ms 27.3047μs 36.6238 KOps/s 36.6569 KOps/s $\color{#d91a1a}-0.09\%$
test_compile_indexing[slice-tensorclass-compile] 0.1249ms 68.7272μs 14.5503 KOps/s 14.6018 KOps/s $\color{#d91a1a}-0.35\%$
test_compile_indexing[slice-tensorclass-eager] 59.8220μs 23.4326μs 42.6755 KOps/s 43.9083 KOps/s $\color{#d91a1a}-2.81\%$
test_compile_indexing[slice-pytree-compile] 0.1411ms 69.0763μs 14.4767 KOps/s 14.7681 KOps/s $\color{#d91a1a}-1.97\%$
test_compile_indexing[slice-pytree-eager] 68.0380μs 23.3470μs 42.8321 KOps/s 44.1311 KOps/s $\color{#d91a1a}-2.94\%$
test_compile_indexing[int-tensordict-compile] 0.1962ms 73.4123μs 13.6217 KOps/s 13.8175 KOps/s $\color{#d91a1a}-1.42\%$
test_compile_indexing[int-tensordict-eager] 1.0275ms 26.9715μs 37.0761 KOps/s 37.0889 KOps/s $\color{#d91a1a}-0.03\%$
test_compile_indexing[int-tensorclass-compile] 0.1340ms 68.6012μs 14.5770 KOps/s 14.4873 KOps/s $\color{#35bf28}+0.62\%$
test_compile_indexing[int-tensorclass-eager] 65.6430μs 22.9312μs 43.6088 KOps/s 43.5573 KOps/s $\color{#35bf28}+0.12\%$
test_compile_indexing[int-pytree-compile] 0.1242ms 67.9646μs 14.7135 KOps/s 14.6336 KOps/s $\color{#35bf28}+0.55\%$
test_compile_indexing[int-pytree-eager] 72.5160μs 23.1636μs 43.1711 KOps/s 44.2496 KOps/s $\color{#d91a1a}-2.44\%$
test_mod_add[eager] 75.8120μs 26.1014μs 38.3121 KOps/s 37.8522 KOps/s $\color{#35bf28}+1.21\%$
test_mod_add[compile] 0.1263ms 40.1790μs 24.8886 KOps/s 26.7379 KOps/s $\textbf{\color{#d91a1a}-6.92\%}$
test_mod_add[compile-overhead] 0.1209ms 38.5789μs 25.9209 KOps/s 26.0201 KOps/s $\color{#d91a1a}-0.38\%$
test_mod_wrap[eager] 0.4147ms 0.2089ms 4.7861 KOps/s 4.7907 KOps/s $\color{#d91a1a}-0.10\%$
test_mod_wrap[compile] 0.3253ms 0.2320ms 4.3100 KOps/s 4.2883 KOps/s $\color{#35bf28}+0.50\%$
test_mod_wrap[compile-overhead] 0.6509ms 0.2315ms 4.3205 KOps/s 4.3512 KOps/s $\color{#d91a1a}-0.71\%$
test_mod_wrap_and_backward[eager] 12.1062ms 10.8577ms 92.1004 Ops/s 83.6593 Ops/s $\textbf{\color{#35bf28}+10.09\%}$
test_mod_wrap_and_backward[compile] 12.1715ms 10.8735ms 91.9667 Ops/s 81.6723 Ops/s $\textbf{\color{#35bf28}+12.60\%}$
test_mod_wrap_and_backward[compile-overhead] 12.5860ms 10.9130ms 91.6340 Ops/s 84.1718 Ops/s $\textbf{\color{#35bf28}+8.87\%}$
test_seq_add[eager] 0.1656ms 92.5382μs 10.8063 KOps/s 10.5228 KOps/s $\color{#35bf28}+2.69\%$
test_seq_add[compile] 0.1484ms 64.1784μs 15.5816 KOps/s 15.2971 KOps/s $\color{#35bf28}+1.86\%$
test_seq_add[compile-overhead] 0.1142ms 62.1953μs 16.0784 KOps/s 15.4164 KOps/s $\color{#35bf28}+4.29\%$
test_seq_wrap[eager] 0.6627ms 0.3829ms 2.6116 KOps/s 2.4833 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_seq_wrap[compile] 1.3605ms 0.2696ms 3.7098 KOps/s 3.6133 KOps/s $\color{#35bf28}+2.67\%$
test_seq_wrap[compile-overhead] 1.3520ms 0.2709ms 3.6912 KOps/s 3.6154 KOps/s $\color{#35bf28}+2.10\%$
test_func_call_runtime[False-eager] 0.7535ms 0.5180ms 1.9306 KOps/s 1.8895 KOps/s $\color{#35bf28}+2.17\%$
test_func_call_runtime[False-compile] 0.8952ms 0.5046ms 1.9819 KOps/s 1.9711 KOps/s $\color{#35bf28}+0.55\%$
test_func_call_runtime[False-compile-overhead] 0.6094ms 0.5059ms 1.9766 KOps/s 1.9972 KOps/s $\color{#d91a1a}-1.03\%$
test_func_call_runtime[True-eager] 1.5492ms 0.7370ms 1.3568 KOps/s 1.3555 KOps/s $\color{#35bf28}+0.09\%$
test_func_call_runtime[True-compile] 0.6114ms 0.5155ms 1.9397 KOps/s 1.9523 KOps/s $\color{#d91a1a}-0.64\%$
test_func_call_runtime[True-compile-overhead] 0.9748ms 0.5188ms 1.9277 KOps/s 1.9306 KOps/s $\color{#d91a1a}-0.15\%$
test_func_call_cm_runtime[False-eager] 0.9337ms 0.5069ms 1.9728 KOps/s 1.9050 KOps/s $\color{#35bf28}+3.56\%$
test_func_call_cm_runtime[False-compile] 0.9253ms 0.5093ms 1.9636 KOps/s 1.9846 KOps/s $\color{#d91a1a}-1.06\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6648ms 0.5081ms 1.9682 KOps/s 1.9775 KOps/s $\color{#d91a1a}-0.47\%$
test_func_call_cm_runtime[True-eager] 1.0197ms 0.8810ms 1.1351 KOps/s 1.1227 KOps/s $\color{#35bf28}+1.11\%$
test_func_call_cm_runtime[True-compile] 0.8487ms 0.7290ms 1.3717 KOps/s 1.3531 KOps/s $\color{#35bf28}+1.37\%$
test_func_call_cm_runtime[True-compile-overhead] 1.0164ms 0.7344ms 1.3617 KOps/s 1.3378 KOps/s $\color{#35bf28}+1.78\%$
test_vmap_func_call_cm_runtime[eager] 2.5678ms 1.9164ms 521.8232 Ops/s 506.0233 Ops/s $\color{#35bf28}+3.12\%$
test_vmap_func_call_cm_runtime[compile] 2.6852ms 1.9873ms 503.2073 Ops/s 496.3027 Ops/s $\color{#35bf28}+1.39\%$
test_vmap_func_call_cm_runtime[compile-overhead] 2.7592ms 1.9795ms 505.1893 Ops/s 494.8159 Ops/s $\color{#35bf28}+2.10\%$
test_distributed 0.2733ms 0.1245ms 8.0330 KOps/s 7.6380 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_tdmodule 0.1059ms 18.5901μs 53.7921 KOps/s 50.1135 KOps/s $\textbf{\color{#35bf28}+7.34\%}$
test_tdmodule_dispatch 56.2450μs 36.3103μs 27.5404 KOps/s 25.3087 KOps/s $\textbf{\color{#35bf28}+8.82\%}$
test_tdseq 41.0870μs 20.8872μs 47.8761 KOps/s 44.1199 KOps/s $\textbf{\color{#35bf28}+8.51\%}$
test_tdseq_dispatch 66.2840μs 41.9659μs 23.8289 KOps/s 21.1382 KOps/s $\textbf{\color{#35bf28}+12.73\%}$
test_instantiation_functorch 1.8260ms 1.5809ms 632.5567 Ops/s 629.2471 Ops/s $\color{#35bf28}+0.53\%$
test_instantiation_td 2.3642ms 1.1962ms 835.9559 Ops/s 835.5925 Ops/s $\color{#35bf28}+0.04\%$
test_exec_functorch 0.2976ms 0.1852ms 5.3992 KOps/s 5.4590 KOps/s $\color{#d91a1a}-1.10\%$
test_exec_functional_call 0.3331ms 0.1743ms 5.7381 KOps/s 5.8915 KOps/s $\color{#d91a1a}-2.60\%$
test_exec_td 0.3115ms 0.1985ms 5.0376 KOps/s 4.8977 KOps/s $\color{#35bf28}+2.86\%$
test_exec_td_decorator 0.6032ms 0.2342ms 4.2691 KOps/s 4.2868 KOps/s $\color{#d91a1a}-0.41\%$
test_vmap_mlp_speed[True-True] 0.8885ms 0.6753ms 1.4807 KOps/s 1.4386 KOps/s $\color{#35bf28}+2.93\%$
test_vmap_mlp_speed[True-False] 1.1293ms 0.6900ms 1.4492 KOps/s 1.4461 KOps/s $\color{#35bf28}+0.21\%$
test_vmap_mlp_speed[False-True] 0.8114ms 0.5343ms 1.8717 KOps/s 1.8685 KOps/s $\color{#35bf28}+0.17\%$
test_vmap_mlp_speed[False-False] 0.7371ms 0.5325ms 1.8779 KOps/s 1.8492 KOps/s $\color{#35bf28}+1.55\%$
test_vmap_mlp_speed_decorator[True-True] 1.6793ms 0.6399ms 1.5628 KOps/s 1.5318 KOps/s $\color{#35bf28}+2.02\%$
test_vmap_mlp_speed_decorator[True-False] 0.9538ms 0.6385ms 1.5662 KOps/s 1.5308 KOps/s $\color{#35bf28}+2.31\%$
test_vmap_mlp_speed_decorator[False-True] 0.7179ms 0.5288ms 1.8909 KOps/s 1.8852 KOps/s $\color{#35bf28}+0.31\%$
test_vmap_mlp_speed_decorator[False-False] 0.9435ms 0.5346ms 1.8705 KOps/s 1.8710 KOps/s $\color{#d91a1a}-0.03\%$
test_to_module_speed[True] 2.0893ms 1.4010ms 713.7827 Ops/s 696.7265 Ops/s $\color{#35bf28}+2.45\%$
test_to_module_speed[False] 2.1869ms 1.3799ms 724.6729 Ops/s 719.6450 Ops/s $\color{#35bf28}+0.70\%$
test_tc_init 81.1520μs 43.6377μs 22.9160 KOps/s 20.2776 KOps/s $\textbf{\color{#35bf28}+13.01\%}$
test_tc_init_nested 0.1581ms 90.1148μs 11.0970 KOps/s 10.1091 KOps/s $\textbf{\color{#35bf28}+9.77\%}$
test_tc_first_layer_tensor 29.9360μs 1.5618μs 640.2869 KOps/s 656.5668 KOps/s $\color{#d91a1a}-2.48\%$
test_tc_first_layer_nontensor 26.1290μs 4.9059μs 203.8365 KOps/s 210.9792 KOps/s $\color{#d91a1a}-3.39\%$
test_tc_second_layer_tensor 24.9970μs 2.8541μs 350.3750 KOps/s 345.2721 KOps/s $\color{#35bf28}+1.48\%$
test_tc_second_layer_nontensor 25.7880μs 6.1931μs 161.4709 KOps/s 163.9566 KOps/s $\color{#d91a1a}-1.52\%$
test_unbind 0.4691s 15.4528ms 64.7131 Ops/s 73.4882 Ops/s $\textbf{\color{#d91a1a}-11.94\%}$
test_full_like 8.5406ms 7.6838ms 130.1435 Ops/s 77.8152 Ops/s $\textbf{\color{#35bf28}+67.25\%}$
test_zeros_like 14.4877ms 6.5324ms 153.0826 Ops/s 132.5572 Ops/s $\textbf{\color{#35bf28}+15.48\%}$
test_ones_like 13.2111ms 7.5634ms 132.2164 Ops/s 129.7934 Ops/s $\color{#35bf28}+1.87\%$
test_clone 19.7182ms 9.6600ms 103.5195 Ops/s 104.5763 Ops/s $\color{#d91a1a}-1.01\%$
test_squeeze 93.3650μs 12.7201μs 78.6157 KOps/s 79.3251 KOps/s $\color{#d91a1a}-0.89\%$
test_unsqueeze 0.2315ms 96.0480μs 10.4115 KOps/s 10.7071 KOps/s $\color{#d91a1a}-2.76\%$
test_split 0.4948ms 0.1958ms 5.1072 KOps/s 5.0553 KOps/s $\color{#35bf28}+1.03\%$
test_permute 0.3416ms 0.2141ms 4.6711 KOps/s 4.4255 KOps/s $\textbf{\color{#35bf28}+5.55\%}$
test_stack 32.8707ms 25.4836ms 39.2410 Ops/s 38.3034 Ops/s $\color{#35bf28}+2.45\%$
test_cat 31.6061ms 25.0060ms 39.9904 Ops/s 38.6659 Ops/s $\color{#35bf28}+3.43\%$
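
For readers checking the table by hand: the Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below is not part of the benchmark workflow. The helper name `percent_change` is hypothetical, and the formula is inferred from the reported numbers rather than taken from the CI script. The rows reuse values copied from the CPU table above.

```python
def percent_change(ops: float, ops_on_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent.

    Both arguments must use the same unit (Ops/s, KOps/s or MOps/s),
    which is the case within each row of the table.
    """
    return (ops / ops_on_head - 1.0) * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table above.
rows = {
    "test_keys": (162.3510, 287.7866),                            # KOps/s
    "test_membership_stacked_nested_last": (195.9560, 74.1634),   # KOps/s
    "test_serialize_weights_pickle": (1.3212, 2.4651),            # Ops/s
}

# Assumed formula: (Ops / Ops on HEAD - 1) * 100, rounded to two decimals.
changes = {name: percent_change(ops, head) for name, (ops, head) in rows.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

A positive value means higher throughput than HEAD. Max and Mean are timings, so they move in the opposite direction.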

github-actions bot commented Oct 7, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 228. Improved: $\large\color{#35bf28}43$. Worsened: $\large\color{#d91a1a}16$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1446ms 16.1356μs 61.9748 KOps/s 57.1225 KOps/s $\textbf{\color{#35bf28}+8.49\%}$
test_plain_set_stack_nested 45.5620μs 16.2137μs 61.6761 KOps/s 57.2139 KOps/s $\textbf{\color{#35bf28}+7.80\%}$
test_plain_set_nested_inplace 52.5130μs 17.3932μs 57.4938 KOps/s 54.1558 KOps/s $\textbf{\color{#35bf28}+6.16\%}$
test_plain_set_stack_nested_inplace 48.2320μs 17.2389μs 58.0083 KOps/s 54.2941 KOps/s $\textbf{\color{#35bf28}+6.84\%}$
test_items 38.9420μs 2.9448μs 339.5827 KOps/s 340.7759 KOps/s $\color{#d91a1a}-0.35\%$
test_items_nested 0.3992ms 0.3439ms 2.9078 KOps/s 2.8710 KOps/s $\color{#35bf28}+1.28\%$
test_items_nested_locked 0.4187ms 0.3434ms 2.9118 KOps/s 2.8365 KOps/s $\color{#35bf28}+2.66\%$
test_items_nested_leaf 0.1022ms 64.3694μs 15.5353 KOps/s 15.4840 KOps/s $\color{#35bf28}+0.33\%$
test_items_stack_nested 0.3759ms 0.3424ms 2.9207 KOps/s 2.8579 KOps/s $\color{#35bf28}+2.20\%$
test_items_stack_nested_leaf 0.1174ms 64.0707μs 15.6077 KOps/s 15.4697 KOps/s $\color{#35bf28}+0.89\%$
test_items_stack_nested_locked 0.3761ms 0.3436ms 2.9101 KOps/s 2.8224 KOps/s $\color{#35bf28}+3.11\%$
test_keys 28.6410μs 3.4409μs 290.6211 KOps/s 287.9930 KOps/s $\color{#35bf28}+0.91\%$
test_keys_nested 0.1352ms 70.7293μs 14.1384 KOps/s 14.3410 KOps/s $\color{#d91a1a}-1.41\%$
test_keys_nested_locked 2.3992ms 77.0756μs 12.9743 KOps/s 12.8519 KOps/s $\color{#35bf28}+0.95\%$
test_keys_nested_leaf 0.1027ms 61.2353μs 16.3304 KOps/s 16.2688 KOps/s $\color{#35bf28}+0.38\%$
test_keys_stack_nested 0.1010ms 70.5416μs 14.1760 KOps/s 14.0935 KOps/s $\color{#35bf28}+0.59\%$
test_keys_stack_nested_leaf 85.4840μs 60.5354μs 16.5193 KOps/s 15.7977 KOps/s $\color{#35bf28}+4.57\%$
test_keys_stack_nested_locked 0.1073ms 76.1793μs 13.1269 KOps/s 13.0225 KOps/s $\color{#35bf28}+0.80\%$
test_values 5.5235μs 0.8455μs 1.1828 MOps/s 1.1904 MOps/s $\color{#d91a1a}-0.64\%$
test_values_nested 85.8440μs 48.7425μs 20.5160 KOps/s 20.5056 KOps/s $\color{#35bf28}+0.05\%$
test_values_nested_locked 86.5440μs 50.5535μs 19.7810 KOps/s 19.9573 KOps/s $\color{#d91a1a}-0.88\%$
test_values_nested_leaf 76.2540μs 42.8740μs 23.3242 KOps/s 23.6056 KOps/s $\color{#d91a1a}-1.19\%$
test_values_stack_nested 80.3640μs 49.0865μs 20.3722 KOps/s 20.3029 KOps/s $\color{#35bf28}+0.34\%$
test_values_stack_nested_leaf 82.0140μs 43.1318μs 23.1848 KOps/s 23.0738 KOps/s $\color{#35bf28}+0.48\%$
test_values_stack_nested_locked 83.2540μs 50.3579μs 19.8578 KOps/s 19.6714 KOps/s $\color{#35bf28}+0.95\%$
test_membership 2.0216μs 0.5012μs 1.9951 MOps/s 1.9988 MOps/s $\color{#d91a1a}-0.19\%$
test_membership_nested 15.5955μs 1.8840μs 530.7955 KOps/s 522.9668 KOps/s $\color{#35bf28}+1.50\%$
test_membership_nested_leaf 9.9937μs 1.8604μs 537.5194 KOps/s 538.0020 KOps/s $\color{#d91a1a}-0.09\%$
test_membership_stacked_nested 30.2210μs 1.9825μs 504.4028 KOps/s 523.6422 KOps/s $\color{#d91a1a}-3.67\%$
test_membership_stacked_nested_leaf 24.0010μs 1.9577μs 510.7991 KOps/s 518.8093 KOps/s $\color{#d91a1a}-1.54\%$
test_membership_nested_last 31.4420μs 2.9689μs 336.8280 KOps/s 337.5475 KOps/s $\color{#d91a1a}-0.21\%$
test_membership_nested_leaf_last 32.7810μs 2.9952μs 333.8629 KOps/s 329.9465 KOps/s $\color{#35bf28}+1.19\%$
test_membership_stacked_nested_last 28.5620μs 2.9900μs 334.4447 KOps/s 328.3241 KOps/s $\color{#35bf28}+1.86\%$
test_membership_stacked_nested_leaf_last 28.8820μs 3.0092μs 332.3088 KOps/s 330.2855 KOps/s $\color{#35bf28}+0.61\%$
test_nested_getleaf 33.8920μs 5.9926μs 166.8712 KOps/s 166.0064 KOps/s $\color{#35bf28}+0.52\%$
test_nested_get 67.1330μs 5.7212μs 174.7873 KOps/s 173.5556 KOps/s $\color{#35bf28}+0.71\%$
test_stacked_getleaf 29.3820μs 6.1051μs 163.7980 KOps/s 165.7352 KOps/s $\color{#d91a1a}-1.17\%$
test_stacked_get 31.2710μs 5.6644μs 176.5401 KOps/s 176.9183 KOps/s $\color{#d91a1a}-0.21\%$
test_nested_getitemleaf 34.4820μs 6.1464μs 162.6961 KOps/s 163.1264 KOps/s $\color{#d91a1a}-0.26\%$
test_nested_getitem 28.4610μs 5.8136μs 172.0111 KOps/s 171.6859 KOps/s $\color{#35bf28}+0.19\%$
test_stacked_getitemleaf 33.5810μs 6.1888μs 161.5823 KOps/s 162.8203 KOps/s $\color{#d91a1a}-0.76\%$
test_stacked_getitem 44.4420μs 5.7988μs 172.4498 KOps/s 175.6152 KOps/s $\color{#d91a1a}-1.80\%$
test_lock_nested 4.5733ms 0.4272ms 2.3410 KOps/s 2.3055 KOps/s $\color{#35bf28}+1.54\%$
test_lock_stack_nested 0.4360ms 0.3926ms 2.5468 KOps/s 2.5105 KOps/s $\color{#35bf28}+1.45\%$
test_unlock_nested 0.7533ms 0.3607ms 2.7721 KOps/s 2.7085 KOps/s $\color{#35bf28}+2.35\%$
test_unlock_stack_nested 0.3706ms 0.3297ms 3.0333 KOps/s 2.9900 KOps/s $\color{#35bf28}+1.45\%$
test_flatten_speed 0.1600ms 78.1270μs 12.7997 KOps/s 12.7920 KOps/s $\color{#35bf28}+0.06\%$
test_unflatten_speed 0.3635ms 0.3208ms 3.1177 KOps/s 3.1185 KOps/s $\color{#d91a1a}-0.03\%$
test_common_ops 1.5143ms 1.2250ms 816.3118 Ops/s 730.8770 Ops/s $\textbf{\color{#35bf28}+11.69\%}$
test_creation 30.6410μs 1.4964μs 668.2687 KOps/s 676.2183 KOps/s $\color{#d91a1a}-1.18\%$
test_creation_empty 42.7310μs 14.5164μs 68.8874 KOps/s 60.2923 KOps/s $\textbf{\color{#35bf28}+14.26\%}$
test_creation_nested_1 45.6720μs 15.9990μs 62.5039 KOps/s 52.9094 KOps/s $\textbf{\color{#35bf28}+18.13\%}$
test_creation_nested_2 48.9820μs 19.0047μs 52.6185 KOps/s 45.3436 KOps/s $\textbf{\color{#35bf28}+16.04\%}$
test_clone 71.6330μs 27.9488μs 35.7797 KOps/s 32.1693 KOps/s $\textbf{\color{#35bf28}+11.22\%}$
test_getitem[int] 91.5988ms 23.5459μs 42.4702 KOps/s 62.5557 KOps/s $\textbf{\color{#d91a1a}-32.11\%}$
test_getitem[slice_int] 0.1278ms 28.0825μs 35.6094 KOps/s 34.8012 KOps/s $\color{#35bf28}+2.32\%$
test_getitem[range] 0.2244ms 0.1141ms 8.7656 KOps/s 8.9878 KOps/s $\color{#d91a1a}-2.47\%$
test_getitem[tuple] 0.1188ms 24.4917μs 40.8301 KOps/s 40.5893 KOps/s $\color{#35bf28}+0.59\%$
test_getitem[list] 0.2063ms 0.1093ms 9.1480 KOps/s 9.4444 KOps/s $\color{#d91a1a}-3.14\%$
test_setitem_dim[int] 65.9530μs 44.7480μs 22.3474 KOps/s 20.4413 KOps/s $\textbf{\color{#35bf28}+9.32\%}$
test_setitem_dim[slice_int] 94.5840μs 68.0471μs 14.6957 KOps/s 13.7793 KOps/s $\textbf{\color{#35bf28}+6.65\%}$
test_setitem_dim[range] 0.1824ms 0.1313ms 7.6167 KOps/s 7.3609 KOps/s $\color{#35bf28}+3.47\%$
test_setitem_dim[tuple] 0.1009ms 63.0142μs 15.8694 KOps/s 15.1361 KOps/s $\color{#35bf28}+4.84\%$
test_setitem 83.4840μs 44.3908μs 22.5272 KOps/s 21.9752 KOps/s $\color{#35bf28}+2.51\%$
test_set 91.8740μs 43.1105μs 23.1962 KOps/s 22.2056 KOps/s $\color{#35bf28}+4.46\%$
test_set_shared 0.3661ms 56.0150μs 17.8524 KOps/s 17.1119 KOps/s $\color{#35bf28}+4.33\%$
test_update 0.1320ms 49.0738μs 20.3775 KOps/s 18.1660 KOps/s $\textbf{\color{#35bf28}+12.17\%}$
test_update_nested 96.1350μs 57.4985μs 17.3918 KOps/s 15.7102 KOps/s $\textbf{\color{#35bf28}+10.70\%}$
test_update__nested 0.3989ms 66.8695μs 14.9545 KOps/s 14.8813 KOps/s $\color{#35bf28}+0.49\%$
test_set_nested 88.6340μs 45.9416μs 21.7668 KOps/s 20.4215 KOps/s $\textbf{\color{#35bf28}+6.59\%}$
test_set_nested_new 96.3640μs 49.4156μs 20.2365 KOps/s 19.4171 KOps/s $\color{#35bf28}+4.22\%$
test_select 0.1298ms 62.2082μs 16.0751 KOps/s 15.4840 KOps/s $\color{#35bf28}+3.82\%$
test_select_nested 0.2265ms 41.8544μs 23.8923 KOps/s 23.4343 KOps/s $\color{#35bf28}+1.95\%$
test_exclude_nested 82.1930μs 58.6922μs 17.0380 KOps/s 16.6233 KOps/s $\color{#35bf28}+2.49\%$
test_empty[True] 0.3281ms 0.2576ms 3.8822 KOps/s 3.8727 KOps/s $\color{#35bf28}+0.25\%$
test_empty[False] 4.9502μs 0.7363μs 1.3581 MOps/s 1.3263 MOps/s $\color{#35bf28}+2.40\%$
test_to 54.4420μs 26.8682μs 37.2187 KOps/s 35.7306 KOps/s $\color{#35bf28}+4.16\%$
test_to_nonblocking 57.6020μs 24.8700μs 40.2091 KOps/s 40.4588 KOps/s $\color{#d91a1a}-0.62\%$
test_unbind_speed 1.5893ms 0.2776ms 3.6029 KOps/s 3.4705 KOps/s $\color{#35bf28}+3.82\%$
test_unbind_speed_stack0 0.3203ms 0.2740ms 3.6499 KOps/s 3.4359 KOps/s $\textbf{\color{#35bf28}+6.23\%}$
test_unbind_speed_stack1 91.2175ms 0.7121ms 1.4043 KOps/s 1.3962 KOps/s $\color{#35bf28}+0.59\%$
test_split 93.2305ms 2.2135ms 451.7698 Ops/s 454.9368 Ops/s $\color{#d91a1a}-0.70\%$
test_chunk 94.0592ms 2.2109ms 452.3093 Ops/s 450.2364 Ops/s $\color{#35bf28}+0.46\%$
test_creation[device0] 0.3407ms 0.1295ms 7.7234 KOps/s 7.5816 KOps/s $\color{#35bf28}+1.87\%$
test_creation_from_tensor 0.4754ms 0.1307ms 7.6531 KOps/s 7.4637 KOps/s $\color{#35bf28}+2.54\%$
test_add_one[memmap_tensor0] 0.2341ms 8.5775μs 116.5838 KOps/s 107.2992 KOps/s $\textbf{\color{#35bf28}+8.65\%}$
test_contiguous[memmap_tensor0] 37.5920μs 2.1776μs 459.2191 KOps/s 443.5552 KOps/s $\color{#35bf28}+3.53\%$
test_stack[memmap_tensor0] 40.1520μs 6.7070μs 149.0974 KOps/s 147.7446 KOps/s $\color{#35bf28}+0.92\%$
test_memmaptd_index 1.2685ms 0.4318ms 2.3158 KOps/s 2.3221 KOps/s $\color{#d91a1a}-0.27\%$
test_memmaptd_index_astensor 0.7622ms 0.5018ms 1.9927 KOps/s 1.9791 KOps/s $\color{#35bf28}+0.69\%$
test_memmaptd_index_op 1.4036ms 1.0183ms 981.9938 Ops/s 927.3950 Ops/s $\textbf{\color{#35bf28}+5.89\%}$
test_serialize_model 0.1316s 0.1303s 7.6744 Ops/s 7.6669 Ops/s $\color{#35bf28}+0.10\%$
test_serialize_model_pickle 1.3613s 1.2144s 0.8234 Ops/s 0.8235 Ops/s $-0.01\%$
test_serialize_weights 0.2277s 0.1436s 6.9624 Ops/s 7.7160 Ops/s $\textbf{\color{#d91a1a}-9.77\%}$
test_serialize_weights_returnearly 0.2105s 55.1783ms 18.1231 Ops/s 17.6211 Ops/s $\color{#35bf28}+2.85\%$
test_serialize_weights_pickle 1.3753s 1.2189s 0.8204 Ops/s 0.7221 Ops/s $\textbf{\color{#35bf28}+13.62\%}$
test_reshape_pytree 78.1030μs 35.7044μs 28.0078 KOps/s 28.4746 KOps/s $\color{#d91a1a}-1.64\%$
test_reshape_td 78.0140μs 42.0331μs 23.7908 KOps/s 24.3149 KOps/s $\color{#d91a1a}-2.16\%$
test_view_pytree 62.8730μs 35.4501μs 28.2087 KOps/s 28.3682 KOps/s $\color{#d91a1a}-0.56\%$
test_view_td 88.9440μs 47.4737μs 21.0643 KOps/s 22.3874 KOps/s $\textbf{\color{#d91a1a}-5.91\%}$
test_unbind_pytree 60.9430μs 33.6828μs 29.6887 KOps/s 29.5862 KOps/s $\color{#35bf28}+0.35\%$
test_unbind_td 0.3997ms 42.5154μs 23.5209 KOps/s 24.2655 KOps/s $\color{#d91a1a}-3.07\%$
test_split_pytree 90.5840μs 48.1713μs 20.7593 KOps/s 22.4498 KOps/s $\textbf{\color{#d91a1a}-7.53\%}$
test_split_td 93.5714ms 65.5710μs 15.2506 KOps/s 18.2028 KOps/s $\textbf{\color{#d91a1a}-16.22\%}$
test_add_pytree 0.1015ms 59.6343μs 16.7689 KOps/s 17.5215 KOps/s $\color{#d91a1a}-4.30\%$
test_add_td 0.1357ms 96.8964μs 10.3203 KOps/s 10.7563 KOps/s $\color{#d91a1a}-4.05\%$
test_compile_add_one_nested[tensordict-compile] 0.2150ms 0.1601ms 6.2463 KOps/s 6.1086 KOps/s $\color{#35bf28}+2.25\%$
test_compile_add_one_nested[tensordict-eager] 0.2182ms 0.1617ms 6.1830 KOps/s 5.9952 KOps/s $\color{#35bf28}+3.13\%$
test_compile_add_one_nested[pytree-compile] 0.1910ms 0.1433ms 6.9782 KOps/s 6.8495 KOps/s $\color{#35bf28}+1.88\%$
test_compile_add_one_nested[pytree-eager] 0.2330ms 0.1846ms 5.4172 KOps/s 5.2868 KOps/s $\color{#35bf28}+2.47\%$
test_compile_copy_nested[tensordict-compile] 60.2130μs 21.7891μs 45.8945 KOps/s 49.3932 KOps/s $\textbf{\color{#d91a1a}-7.08\%}$
test_compile_copy_nested[tensordict-eager] 75.4440μs 49.1062μs 20.3640 KOps/s 20.7918 KOps/s $\color{#d91a1a}-2.06\%$
test_compile_copy_nested[pytree-compile] 0.2667ms 64.7334μs 15.4480 KOps/s 15.6874 KOps/s $\color{#d91a1a}-1.53\%$
test_compile_copy_nested[pytree-eager] 97.7440μs 49.8723μs 20.0512 KOps/s 20.4150 KOps/s $\color{#d91a1a}-1.78\%$
test_compile_add_one_flat[tensordict-compile] 0.3714ms 0.3185ms 3.1394 KOps/s 3.1351 KOps/s $\color{#35bf28}+0.14\%$
test_compile_add_one_flat[tensordict-eager] 0.3547ms 0.2335ms 4.2818 KOps/s 4.2156 KOps/s $\color{#35bf28}+1.57\%$
test_compile_add_one_flat[tensorclass-compile] 0.1662ms 0.1274ms 7.8504 KOps/s 7.8270 KOps/s $\color{#35bf28}+0.30\%$
test_compile_add_one_flat[tensorclass-eager] 0.1462ms 65.9187μs 15.1702 KOps/s 14.9693 KOps/s $\color{#35bf28}+1.34\%$
test_compile_add_one_flat[pytree-compile] 0.3726ms 0.3178ms 3.1467 KOps/s 3.0541 KOps/s $\color{#35bf28}+3.03\%$
test_compile_add_one_flat[pytree-eager] 0.6995ms 0.6230ms 1.6050 KOps/s 1.5344 KOps/s $\color{#35bf28}+4.60\%$
test_compile_add_self_flat[tensordict-eager] 0.3361ms 0.2815ms 3.5518 KOps/s 3.4368 KOps/s $\color{#35bf28}+3.35\%$
test_compile_add_self_flat[tensordict-compile] 0.3721ms 0.3205ms 3.1202 KOps/s 3.0404 KOps/s $\color{#35bf28}+2.62\%$
test_compile_add_self_flat[tensorclass-eager] 0.1185ms 75.6073μs 13.2262 KOps/s 12.9825 KOps/s $\color{#35bf28}+1.88\%$
test_compile_add_self_flat[tensorclass-compile] 0.1870ms 0.1306ms 7.6543 KOps/s 7.7493 KOps/s $\color{#d91a1a}-1.23\%$
test_compile_add_self_flat[pytree-eager] 0.6774ms 0.5363ms 1.8645 KOps/s 1.8316 KOps/s $\color{#35bf28}+1.80\%$
test_compile_add_self_flat[pytree-compile] 0.4355ms 0.3177ms 3.1473 KOps/s 3.0850 KOps/s $\color{#35bf28}+2.02\%$
test_compile_copy_flat[tensordict-compile] 67.4330μs 19.6744μs 50.8274 KOps/s 53.4664 KOps/s $\color{#d91a1a}-4.94\%$
test_compile_copy_flat[tensordict-eager] 73.8130μs 38.1912μs 26.1841 KOps/s 25.8570 KOps/s $\color{#35bf28}+1.26\%$
test_compile_copy_flat[pytree-compile] 0.1085ms 69.3425μs 14.4212 KOps/s 14.2266 KOps/s $\color{#35bf28}+1.37\%$
test_compile_copy_flat[pytree-eager] 0.1161ms 50.8664μs 19.6593 KOps/s 19.2256 KOps/s $\color{#35bf28}+2.26\%$
test_compile_assign_and_add[tensordict-compile] 2.3443ms 0.7763ms 1.2882 KOps/s 1.2098 KOps/s $\textbf{\color{#35bf28}+6.49\%}$
test_compile_assign_and_add[tensordict-eager] 3.6253ms 3.2907ms 303.8866 Ops/s 294.5961 Ops/s $\color{#35bf28}+3.15\%$
test_compile_assign_and_add[pytree-compile] 2.3194ms 0.8143ms 1.2281 KOps/s 1.1328 KOps/s $\textbf{\color{#35bf28}+8.41\%}$
test_compile_assign_and_add[pytree-eager] 3.4140ms 3.2209ms 310.4699 Ops/s 299.6608 Ops/s $\color{#35bf28}+3.61\%$
test_compile_indexing[tensor-tensordict-compile] 0.5220ms 0.1123ms 8.9012 KOps/s 9.2307 KOps/s $\color{#d91a1a}-3.57\%$
test_compile_indexing[tensor-tensordict-eager] 0.5555ms 67.3338μs 14.8514 KOps/s 16.1357 KOps/s $\textbf{\color{#d91a1a}-7.96\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.1546ms 0.1083ms 9.2309 KOps/s 9.7112 KOps/s $\color{#d91a1a}-4.95\%$
test_compile_indexing[tensor-tensorclass-eager] 0.4333ms 48.4713μs 20.6308 KOps/s 22.5972 KOps/s $\textbf{\color{#d91a1a}-8.70\%}$
test_compile_indexing[tensor-pytree-compile] 0.5227ms 0.1101ms 9.0831 KOps/s 9.3859 KOps/s $\color{#d91a1a}-3.23\%$
test_compile_indexing[tensor-pytree-eager] 0.4324ms 48.5024μs 20.6175 KOps/s 21.3304 KOps/s $\color{#d91a1a}-3.34\%$
test_compile_indexing[slice-tensordict-compile] 0.1961ms 0.1441ms 6.9393 KOps/s 7.2861 KOps/s $\color{#d91a1a}-4.76\%$
test_compile_indexing[slice-tensordict-eager] 0.4277ms 27.0634μs 36.9503 KOps/s 38.8872 KOps/s $\color{#d91a1a}-4.98\%$
test_compile_indexing[slice-tensorclass-compile] 0.1809ms 0.1319ms 7.5793 KOps/s 7.6476 KOps/s $\color{#d91a1a}-0.89\%$
test_compile_indexing[slice-tensorclass-eager] 0.4338ms 23.6769μs 42.2352 KOps/s 48.1760 KOps/s $\textbf{\color{#d91a1a}-12.33\%}$
test_compile_indexing[slice-pytree-compile] 0.5461ms 0.1415ms 7.0659 KOps/s 7.3514 KOps/s $\color{#d91a1a}-3.88\%$
test_compile_indexing[slice-pytree-eager] 0.4266ms 23.1595μs 43.1788 KOps/s 47.8237 KOps/s $\textbf{\color{#d91a1a}-9.71\%}$
test_compile_indexing[int-tensordict-compile] 0.5528ms 0.1484ms 6.7378 KOps/s 7.2401 KOps/s $\textbf{\color{#d91a1a}-6.94\%}$
test_compile_indexing[int-tensordict-eager] 0.4983ms 26.5589μs 37.6522 KOps/s 39.5214 KOps/s $\color{#d91a1a}-4.73\%$
test_compile_indexing[int-tensorclass-compile] 0.5412ms 0.1412ms 7.0836 KOps/s 7.5413 KOps/s $\textbf{\color{#d91a1a}-6.07\%}$
test_compile_indexing[int-tensorclass-eager] 0.4196ms 22.5162μs 44.4124 KOps/s 46.0160 KOps/s $\color{#d91a1a}-3.48\%$
test_compile_indexing[int-pytree-compile] 0.5342ms 0.1416ms 7.0600 KOps/s 7.5301 KOps/s $\textbf{\color{#d91a1a}-6.24\%}$
test_compile_indexing[int-pytree-eager] 0.4161ms 22.3239μs 44.7951 KOps/s 47.6181 KOps/s $\textbf{\color{#d91a1a}-5.93\%}$
test_mod_add[eager] 0.4517ms 33.7340μs 29.6436 KOps/s 29.3292 KOps/s $\color{#35bf28}+1.07\%$
test_mod_add[compile] 0.4836ms 70.2691μs 14.2310 KOps/s 13.8665 KOps/s $\color{#35bf28}+2.63\%$
test_mod_add[compile-overhead] 0.2569ms 0.1302ms 7.6816 KOps/s 7.2711 KOps/s $\textbf{\color{#35bf28}+5.65\%}$
test_mod_wrap[eager] 0.3180ms 0.2406ms 4.1560 KOps/s 4.0414 KOps/s $\color{#35bf28}+2.84\%$
test_mod_wrap[compile] 1.4278ms 0.3029ms 3.3018 KOps/s 3.2383 KOps/s $\color{#35bf28}+1.96\%$
test_mod_wrap[compile-overhead] 7.6272ms 4.0612ms 246.2328 Ops/s 251.0922 Ops/s $\color{#d91a1a}-1.94\%$
test_mod_wrap_and_backward[eager] 1.5563ms 1.3594ms 735.6117 Ops/s 686.2701 Ops/s $\textbf{\color{#35bf28}+7.19\%}$
test_mod_wrap_and_backward[compile] 1.5820ms 1.3195ms 757.8431 Ops/s 683.7936 Ops/s $\textbf{\color{#35bf28}+10.83\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3713ms 0.9207ms 1.0861 KOps/s 913.1926 Ops/s $\textbf{\color{#35bf28}+18.93\%}$
test_seq_add[eager] 0.1568ms 98.6106μs 10.1409 KOps/s 9.2648 KOps/s $\textbf{\color{#35bf28}+9.46\%}$
test_seq_add[compile] 0.1426ms 82.8717μs 12.0668 KOps/s 11.4551 KOps/s $\textbf{\color{#35bf28}+5.34\%}$
test_seq_add[compile-overhead] 0.1634ms 0.1133ms 8.8246 KOps/s 8.2253 KOps/s $\textbf{\color{#35bf28}+7.29\%}$
test_seq_wrap[eager] 0.4630ms 0.3799ms 2.6322 KOps/s 2.4818 KOps/s $\textbf{\color{#35bf28}+6.06\%}$
test_seq_wrap[compile] 0.3988ms 0.3099ms 3.2273 KOps/s 3.1157 KOps/s $\color{#35bf28}+3.58\%$
test_seq_wrap[compile-overhead] 0.2853ms 0.2217ms 4.5100 KOps/s 4.5033 KOps/s $\color{#35bf28}+0.15\%$
test_func_call_runtime[False-eager] 0.9309ms 0.7821ms 1.2786 KOps/s 1.3162 KOps/s $\color{#d91a1a}-2.86\%$
test_func_call_runtime[False-compile] 0.8763ms 0.7732ms 1.2933 KOps/s 1.2358 KOps/s $\color{#35bf28}+4.65\%$
test_func_call_runtime[False-compile-overhead] 0.4272ms 0.3593ms 2.7830 KOps/s 2.7639 KOps/s $\color{#35bf28}+0.69\%$
test_func_call_runtime[True-eager] 1.0087ms 0.9077ms 1.1017 KOps/s 1.0836 KOps/s $\color{#35bf28}+1.67\%$
test_func_call_runtime[True-compile] 0.8467ms 0.7934ms 1.2604 KOps/s 1.1664 KOps/s $\textbf{\color{#35bf28}+8.06\%}$
test_func_call_runtime[True-compile-overhead] 0.4424ms 0.3816ms 2.6208 KOps/s 2.5743 KOps/s $\color{#35bf28}+1.81\%$
test_func_call_cm_runtime[False-eager] 0.8129ms 0.7377ms 1.3556 KOps/s 1.3101 KOps/s $\color{#35bf28}+3.47\%$
test_func_call_cm_runtime[False-compile] 0.8333ms 0.7735ms 1.2928 KOps/s 1.2200 KOps/s $\textbf{\color{#35bf28}+5.96\%}$
test_func_call_cm_runtime[False-compile-overhead] 0.4206ms 0.3610ms 2.7700 KOps/s 2.7143 KOps/s $\color{#35bf28}+2.05\%$
test_func_call_cm_runtime[True-eager] 1.1074ms 1.0140ms 986.2151 Ops/s 923.3088 Ops/s $\textbf{\color{#35bf28}+6.81\%}$
test_func_call_cm_runtime[True-compile] 1.0283ms 0.8249ms 1.2123 KOps/s 1.1642 KOps/s $\color{#35bf28}+4.13\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4499ms 0.4071ms 2.4565 KOps/s 2.4390 KOps/s $\color{#35bf28}+0.72\%$
test_vmap_func_call_cm_runtime[eager] 2.5326ms 2.0938ms 477.6090 Ops/s 472.3964 Ops/s $\color{#35bf28}+1.10\%$
test_vmap_func_call_cm_runtime[compile] 0.9353ms 0.8468ms 1.1809 KOps/s 1.1505 KOps/s $\color{#35bf28}+2.65\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4623ms 0.4142ms 2.4142 KOps/s 2.4241 KOps/s $\color{#d91a1a}-0.41\%$
test_distributed 5.0651ms 0.1706ms 5.8619 KOps/s 8.5851 KOps/s $\textbf{\color{#d91a1a}-31.72\%}$
test_tdmodule 0.5029ms 14.6183μs 68.4073 KOps/s 61.3333 KOps/s $\textbf{\color{#35bf28}+11.53\%}$
test_tdmodule_dispatch 48.8120μs 27.5487μs 36.2994 KOps/s 31.5798 KOps/s $\textbf{\color{#35bf28}+14.94\%}$
test_tdseq 36.8020μs 14.9470μs 66.9030 KOps/s 58.5923 KOps/s $\textbf{\color{#35bf28}+14.18\%}$
test_tdseq_dispatch 51.2330μs 29.7287μs 33.6375 KOps/s 30.2374 KOps/s $\textbf{\color{#35bf28}+11.24\%}$
test_instantiation_functorch 2.1438ms 1.8520ms 539.9678 Ops/s 531.5665 Ops/s $\color{#35bf28}+1.58\%$
test_instantiation_td 1.7749ms 1.1844ms 844.3299 Ops/s 820.6673 Ops/s $\color{#35bf28}+2.88\%$
test_exec_functorch 0.2629ms 0.2168ms 4.6119 KOps/s 4.7545 KOps/s $\color{#d91a1a}-3.00\%$
test_exec_functional_call 0.2712ms 0.2194ms 4.5579 KOps/s 4.7575 KOps/s $\color{#d91a1a}-4.19\%$
test_exec_td 0.2858ms 0.2444ms 4.0921 KOps/s 4.2401 KOps/s $\color{#d91a1a}-3.49\%$
test_exec_td_decorator 1.0526ms 0.2727ms 3.6666 KOps/s 3.5821 KOps/s $\color{#35bf28}+2.36\%$
test_vmap_mlp_speed[True-True] 0.8087ms 0.7066ms 1.4152 KOps/s 1.3451 KOps/s $\textbf{\color{#35bf28}+5.21\%}$
test_vmap_mlp_speed[True-False] 0.7891ms 0.7039ms 1.4206 KOps/s 1.3526 KOps/s $\textbf{\color{#35bf28}+5.03\%}$
test_vmap_mlp_speed[False-True] 0.6837ms 0.6001ms 1.6664 KOps/s 1.5836 KOps/s $\textbf{\color{#35bf28}+5.23\%}$
test_vmap_mlp_speed[False-False] 0.6610ms 0.6005ms 1.6654 KOps/s 1.5850 KOps/s $\textbf{\color{#35bf28}+5.07\%}$
test_vmap_mlp_speed_decorator[True-True] 1.3063ms 0.6771ms 1.4769 KOps/s 1.4031 KOps/s $\textbf{\color{#35bf28}+5.26\%}$
test_vmap_mlp_speed_decorator[True-False] 0.8006ms 0.6772ms 1.4766 KOps/s 1.4354 KOps/s $\color{#35bf28}+2.87\%$
test_vmap_mlp_speed_decorator[False-True] 0.7293ms 0.6003ms 1.6659 KOps/s 1.6648 KOps/s $\color{#35bf28}+0.07\%$
test_vmap_mlp_speed_decorator[False-False] 0.6812ms 0.5982ms 1.6716 KOps/s 1.6499 KOps/s $\color{#35bf28}+1.31\%$
test_vmap_transformer_speed[True-True] 8.8073ms 8.5022ms 117.6159 Ops/s 116.6748 Ops/s $\color{#35bf28}+0.81\%$
test_vmap_transformer_speed[True-False] 8.5557ms 8.4818ms 117.8996 Ops/s 117.5326 Ops/s $\color{#35bf28}+0.31\%$
test_vmap_transformer_speed[False-True] 8.7518ms 8.3507ms 119.7506 Ops/s 119.6910 Ops/s $\color{#35bf28}+0.05\%$
test_vmap_transformer_speed[False-False] 8.4011ms 8.3061ms 120.3935 Ops/s 119.6322 Ops/s $\color{#35bf28}+0.64\%$
test_vmap_transformer_speed_decorator[True-True] 19.7072ms 19.6110ms 50.9918 Ops/s 50.7112 Ops/s $\color{#35bf28}+0.55\%$
test_vmap_transformer_speed_decorator[True-False] 20.3317ms 19.6375ms 50.9231 Ops/s 50.7943 Ops/s $\color{#35bf28}+0.25\%$
test_vmap_transformer_speed_decorator[False-True] 19.5750ms 19.5210ms 51.2269 Ops/s 51.4851 Ops/s $\color{#d91a1a}-0.50\%$
test_vmap_transformer_speed_decorator[False-False] 20.7537ms 19.5944ms 51.0349 Ops/s 51.4066 Ops/s $\color{#d91a1a}-0.72\%$
test_to_module_speed[True] 1.0819ms 0.9925ms 1.0075 KOps/s 984.0650 Ops/s $\color{#35bf28}+2.39\%$
test_to_module_speed[False] 1.0540ms 0.9575ms 1.0444 KOps/s 1.0184 KOps/s $\color{#35bf28}+2.55\%$
test_tc_init 61.6330μs 32.8911μs 30.4034 KOps/s 28.7251 KOps/s $\textbf{\color{#35bf28}+5.84\%}$
test_tc_init_nested 0.1089ms 66.7334μs 14.9850 KOps/s 14.0367 KOps/s $\textbf{\color{#35bf28}+6.76\%}$
test_tc_first_layer_tensor 4.3401μs 0.6767μs 1.4777 MOps/s 1.4834 MOps/s $\color{#d91a1a}-0.38\%$
test_tc_first_layer_nontensor 47.9920μs 2.1494μs 465.2556 KOps/s 443.7072 KOps/s $\color{#35bf28}+4.86\%$
test_tc_second_layer_tensor 27.8313μs 1.3796μs 724.8274 KOps/s 728.4522 KOps/s $\color{#d91a1a}-0.50\%$
test_tc_second_layer_nontensor 29.6710μs 2.9425μs 339.8428 KOps/s 340.0758 KOps/s $\color{#d91a1a}-0.07\%$
test_unbind 0.1891s 10.9452ms 91.3643 Ops/s 101.4344 Ops/s $\textbf{\color{#d91a1a}-9.93\%}$
test_full_like 0.6578ms 0.5723ms 1.7472 KOps/s 1.7460 KOps/s $\color{#35bf28}+0.07\%$
test_zeros_like 0.2782ms 0.1977ms 5.0577 KOps/s 5.0540 KOps/s $\color{#35bf28}+0.07\%$
test_ones_like 0.2430ms 0.1974ms 5.0653 KOps/s 5.0598 KOps/s $\color{#35bf28}+0.11\%$
test_clone 0.4529ms 0.4145ms 2.4125 KOps/s 2.4137 KOps/s $\color{#d91a1a}-0.05\%$
test_squeeze 33.6510μs 9.4000μs 106.3830 KOps/s 100.5203 KOps/s $\textbf{\color{#35bf28}+5.83\%}$
test_unsqueeze 0.2703ms 75.6207μs 13.2239 KOps/s 13.4720 KOps/s $\color{#d91a1a}-1.84\%$
test_split 0.3009ms 0.1594ms 6.2730 KOps/s 6.1623 KOps/s $\color{#35bf28}+1.80\%$
test_permute 0.3098ms 0.1853ms 5.3963 KOps/s 5.6074 KOps/s $\color{#d91a1a}-3.76\%$
test_stack 1.2478ms 0.8622ms 1.1598 KOps/s 1.1896 KOps/s $\color{#d91a1a}-2.51\%$
test_cat 1.2511ms 1.2319ms 811.7643 Ops/s 812.0282 Ops/s $\color{#d91a1a}-0.03\%$
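
For readers checking the numbers: the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column, and Ops appears to be the reciprocal of Mean. The sketch below reproduces a couple of rows under that assumption. The function names are hypothetical and are not part of the benchmark tooling. The 5% cutoff for bold entries is inferred from the rows above (for example, +5.03% is bold while -4.98% is not), not taken from the bot's source.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change_pct(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def is_highlighted(change_pct: float, threshold_pct: float = 5.0) -> bool:
    """Assumed rule: entries beyond +/- threshold_pct are shown in bold."""
    return abs(change_pct) > threshold_pct


# test_clone row: Mean 27.9488 us, 35.7797 KOps/s vs 32.1693 KOps/s on HEAD
clone_kops = ops_per_second(27.9488e-6) / 1e3
clone_change = relative_change_pct(35.7797, 32.1693)

# test_getitem[int] row: 42.4702 KOps/s vs 62.5557 KOps/s on HEAD
getitem_change = relative_change_pct(42.4702, 62.5557)

print(f"{clone_kops:.4f} KOps/s, {clone_change:+.2f}%, {getitem_change:+.2f}%")
```

Both units cancel in the ratio, so rows reported in Ops/s, KOps/s, or MOps/s can be compared the same way as long as the two throughput values in a row share a unit.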

vmoens merged commit 5c32c8b into gh/vmoens/28/base on Oct 7, 2024
33 of 51 checks passed
vmoens pushed a commit that referenced this pull request Oct 7, 2024
ghstack-source-id: ff5ade8
Pull Request resolved: #1033
vmoens deleted the gh/vmoens/28/head branch on October 7, 2024 at 13:51
Development

Successfully merging this pull request may close these issues.

[BUG] Undefined behavior with torch.cat
2 participants