[Feature] give a `__name__` to TDModules by vmoens · Pull Request #1045 · pytorch/tensordict · GitHub

[Feature] give a __name__ to TDModules #1045

Merged: 1 commit, Oct 16, 2024

vmoens (Collaborator) commented Oct 16, 2024

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Oct 16, 2024
ghstack-source-id: 95a9f41
Pull Request resolved: #1045
@facebook-github-bot added the CLA Signed label Oct 16, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}12$. Worsened: $\large\color{#d91a1a}22$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 44.9830μs 25.0409μs 39.9347 KOps/s 40.8933 KOps/s $\color{#d91a1a}-2.34\%$
test_plain_set_stack_nested 52.1670μs 25.1637μs 39.7398 KOps/s 40.3343 KOps/s $\color{#d91a1a}-1.47\%$
test_plain_set_nested_inplace 65.5620μs 27.1815μs 36.7897 KOps/s 36.4600 KOps/s $\color{#35bf28}+0.90\%$
test_plain_set_stack_nested_inplace 81.9020μs 27.2700μs 36.6704 KOps/s 37.6965 KOps/s $\color{#d91a1a}-2.72\%$
test_items 20.1270μs 4.2339μs 236.1916 KOps/s 238.7874 KOps/s $\color{#d91a1a}-1.09\%$
test_items_nested 0.4464ms 0.3810ms 2.6246 KOps/s 2.5804 KOps/s $\color{#35bf28}+1.71\%$
test_items_nested_locked 0.6845ms 0.3832ms 2.6095 KOps/s 2.5751 KOps/s $\color{#35bf28}+1.34\%$
test_items_nested_leaf 0.1558ms 80.4003μs 12.4378 KOps/s 12.2628 KOps/s $\color{#35bf28}+1.43\%$
test_items_stack_nested 0.4345ms 0.3839ms 2.6046 KOps/s 2.5397 KOps/s $\color{#35bf28}+2.55\%$
test_items_stack_nested_leaf 0.1155ms 80.3846μs 12.4402 KOps/s 11.9205 KOps/s $\color{#35bf28}+4.36\%$
test_items_stack_nested_locked 0.4684ms 0.3856ms 2.5931 KOps/s 2.5398 KOps/s $\color{#35bf28}+2.10\%$
test_keys 23.4030μs 3.5037μs 285.4112 KOps/s 285.2164 KOps/s $\color{#35bf28}+0.07\%$
test_keys_nested 0.2333ms 0.1342ms 7.4521 KOps/s 7.4389 KOps/s $\color{#35bf28}+0.18\%$
test_keys_nested_locked 1.7781ms 0.1388ms 7.2038 KOps/s 7.1114 KOps/s $\color{#35bf28}+1.30\%$
test_keys_nested_leaf 0.2037ms 0.1172ms 8.5311 KOps/s 8.4737 KOps/s $\color{#35bf28}+0.68\%$
test_keys_stack_nested 0.2371ms 0.1340ms 7.4648 KOps/s 7.3573 KOps/s $\color{#35bf28}+1.46\%$
test_keys_stack_nested_leaf 0.1869ms 0.1164ms 8.5943 KOps/s 8.5018 KOps/s $\color{#35bf28}+1.09\%$
test_keys_stack_nested_locked 0.2122ms 0.1387ms 7.2082 KOps/s 7.0888 KOps/s $\color{#35bf28}+1.68\%$
test_values 4.5584μs 1.0470μs 955.0976 KOps/s 960.0891 KOps/s $\color{#d91a1a}-0.52\%$
test_values_nested 0.1607ms 94.7216μs 10.5573 KOps/s 10.5093 KOps/s $\color{#35bf28}+0.46\%$
test_values_nested_locked 0.1656ms 94.2026μs 10.6154 KOps/s 10.5757 KOps/s $\color{#35bf28}+0.38\%$
test_values_nested_leaf 0.1447ms 80.2187μs 12.4659 KOps/s 12.5374 KOps/s $\color{#d91a1a}-0.57\%$
test_values_stack_nested 0.1708ms 94.2612μs 10.6088 KOps/s 10.6173 KOps/s $\color{#d91a1a}-0.08\%$
test_values_stack_nested_leaf 0.1432ms 79.4132μs 12.5924 KOps/s 12.0336 KOps/s $\color{#35bf28}+4.64\%$
test_values_stack_nested_locked 0.1530ms 91.4777μs 10.9316 KOps/s 10.7009 KOps/s $\color{#35bf28}+2.16\%$
test_membership 20.1980μs 0.9374μs 1.0668 MOps/s 1.2994 MOps/s $\textbf{\color{#d91a1a}-17.90\%}$
test_membership_nested 28.3520μs 2.7248μs 366.9948 KOps/s 364.0260 KOps/s $\color{#35bf28}+0.82\%$
test_membership_nested_leaf 28.2130μs 2.7072μs 369.3898 KOps/s 360.9347 KOps/s $\color{#35bf28}+2.34\%$
test_membership_stacked_nested 15.6390μs 2.6910μs 371.6077 KOps/s 364.5771 KOps/s $\color{#35bf28}+1.93\%$
test_membership_stacked_nested_leaf 31.2380μs 2.7077μs 369.3168 KOps/s 363.0918 KOps/s $\color{#35bf28}+1.71\%$
test_membership_nested_last 26.9000μs 4.1459μs 241.2009 KOps/s 239.3024 KOps/s $\color{#35bf28}+0.79\%$
test_membership_nested_leaf_last 28.4520μs 4.1523μs 240.8302 KOps/s 237.0821 KOps/s $\color{#35bf28}+1.58\%$
test_membership_stacked_nested_last 19.2760μs 4.0977μs 244.0391 KOps/s 237.7569 KOps/s $\color{#35bf28}+2.64\%$
test_membership_stacked_nested_leaf_last 31.4180μs 4.1041μs 243.6561 KOps/s 234.5497 KOps/s $\color{#35bf28}+3.88\%$
test_nested_getleaf 29.4540μs 10.4666μs 95.5417 KOps/s 93.2341 KOps/s $\color{#35bf28}+2.48\%$
test_nested_get 46.6700μs 9.9122μs 100.8859 KOps/s 98.2494 KOps/s $\color{#35bf28}+2.68\%$
test_stacked_getleaf 35.7570μs 10.2712μs 97.3598 KOps/s 92.4231 KOps/s $\textbf{\color{#35bf28}+5.34\%}$
test_stacked_get 38.7620μs 9.8856μs 101.1576 KOps/s 97.0857 KOps/s $\color{#35bf28}+4.19\%$
test_nested_getitemleaf 31.6190μs 10.9829μs 91.0509 KOps/s 88.2319 KOps/s $\color{#35bf28}+3.19\%$
test_nested_getitem 28.9540μs 10.3498μs 96.6203 KOps/s 95.2620 KOps/s $\color{#35bf28}+1.43\%$
test_stacked_getitemleaf 38.3510μs 11.0817μs 90.2385 KOps/s 89.7451 KOps/s $\color{#35bf28}+0.55\%$
test_stacked_getitem 48.3370μs 10.3005μs 97.0829 KOps/s 94.5254 KOps/s $\color{#35bf28}+2.71\%$
test_lock_nested 82.2851ms 0.5885ms 1.6993 KOps/s 1.9766 KOps/s $\textbf{\color{#d91a1a}-14.03\%}$
test_lock_stack_nested 0.8913ms 0.4754ms 2.1036 KOps/s 2.1262 KOps/s $\color{#d91a1a}-1.06\%$
test_unlock_nested 85.6553ms 0.5093ms 1.9634 KOps/s 2.3783 KOps/s $\textbf{\color{#d91a1a}-17.45\%}$
test_unlock_stack_nested 0.5845ms 0.3884ms 2.5744 KOps/s 2.5960 KOps/s $\color{#d91a1a}-0.83\%$
test_flatten_speed 0.2068ms 0.1012ms 9.8855 KOps/s 9.9570 KOps/s $\color{#d91a1a}-0.72\%$
test_unflatten_speed 0.6826ms 0.5042ms 1.9833 KOps/s 1.9347 KOps/s $\color{#35bf28}+2.51\%$
test_common_ops 5.9753ms 1.1828ms 845.4194 Ops/s 906.3947 Ops/s $\textbf{\color{#d91a1a}-6.73\%}$
test_creation 21.9410μs 2.1593μs 463.1067 KOps/s 470.0578 KOps/s $\color{#d91a1a}-1.48\%$
test_creation_empty 45.2540μs 19.7812μs 50.5531 KOps/s 56.8303 KOps/s $\textbf{\color{#d91a1a}-11.05\%}$
test_creation_nested_1 77.4230μs 22.8387μs 43.7853 KOps/s 48.6191 KOps/s $\textbf{\color{#d91a1a}-9.94\%}$
test_creation_nested_2 85.1690μs 27.4351μs 36.4497 KOps/s 39.4387 KOps/s $\textbf{\color{#d91a1a}-7.58\%}$
test_clone 67.6950μs 17.1244μs 58.3963 KOps/s 59.6673 KOps/s $\color{#d91a1a}-2.13\%$
test_getitem[int] 1.2705ms 17.1250μs 58.3941 KOps/s 62.1287 KOps/s $\textbf{\color{#d91a1a}-6.01\%}$
test_getitem[slice_int] 0.1389ms 31.4476μs 31.7989 KOps/s 32.9747 KOps/s $\color{#d91a1a}-3.57\%$
test_getitem[range] 0.1696ms 59.2550μs 16.8762 KOps/s 17.3928 KOps/s $\color{#d91a1a}-2.97\%$
test_getitem[tuple] 0.1352ms 25.7366μs 38.8551 KOps/s 39.9457 KOps/s $\color{#d91a1a}-2.73\%$
test_getitem[list] 0.1799ms 54.7243μs 18.2734 KOps/s 19.1041 KOps/s $\color{#d91a1a}-4.35\%$
test_setitem_dim[int] 60.2710μs 33.3383μs 29.9955 KOps/s 30.8994 KOps/s $\color{#d91a1a}-2.93\%$
test_setitem_dim[slice_int] 93.9540μs 61.5083μs 16.2580 KOps/s 16.4333 KOps/s $\color{#d91a1a}-1.07\%$
test_setitem_dim[range] 0.1274ms 83.6503μs 11.9545 KOps/s 12.0666 KOps/s $\color{#d91a1a}-0.93\%$
test_setitem_dim[tuple] 87.4920μs 49.2091μs 20.3214 KOps/s 20.2875 KOps/s $\color{#35bf28}+0.17\%$
test_setitem 89.4960μs 30.5970μs 32.6830 KOps/s 33.8663 KOps/s $\color{#d91a1a}-3.49\%$
test_set 85.3480μs 30.4121μs 32.8816 KOps/s 34.4815 KOps/s $\color{#d91a1a}-4.64\%$
test_set_shared 2.8819ms 0.2158ms 4.6340 KOps/s 4.6062 KOps/s $\color{#35bf28}+0.60\%$
test_update 0.1394ms 39.9705μs 25.0184 KOps/s 26.8332 KOps/s $\textbf{\color{#d91a1a}-6.76\%}$
test_update_nested 0.1238ms 51.4081μs 19.4522 KOps/s 20.8205 KOps/s $\textbf{\color{#d91a1a}-6.57\%}$
test_update__nested 0.5208ms 46.2360μs 21.6281 KOps/s 22.2562 KOps/s $\color{#d91a1a}-2.82\%$
test_set_nested 84.9370μs 33.8168μs 29.5711 KOps/s 31.4653 KOps/s $\textbf{\color{#d91a1a}-6.02\%}$
test_set_nested_new 0.1113ms 38.8270μs 25.7553 KOps/s 26.9662 KOps/s $\color{#d91a1a}-4.49\%$
test_select 0.1315ms 55.4032μs 18.0495 KOps/s 18.2516 KOps/s $\color{#d91a1a}-1.11\%$
test_select_nested 0.1382ms 59.4460μs 16.8220 KOps/s 16.4096 KOps/s $\color{#35bf28}+2.51\%$
test_exclude_nested 0.2979ms 76.1477μs 13.1324 KOps/s 13.2202 KOps/s $\color{#d91a1a}-0.66\%$
test_empty[True] 0.5392ms 0.3465ms 2.8862 KOps/s 2.8472 KOps/s $\color{#35bf28}+1.37\%$
test_empty[False] 27.3883μs 1.2249μs 816.3722 KOps/s 801.9669 KOps/s $\color{#35bf28}+1.80\%$
test_unbind_speed 0.5208ms 0.2961ms 3.3773 KOps/s 3.2896 KOps/s $\color{#35bf28}+2.67\%$
test_unbind_speed_stack0 0.4298ms 0.2973ms 3.3635 KOps/s 3.3904 KOps/s $\color{#d91a1a}-0.79\%$
test_unbind_speed_stack1 91.8529ms 0.8090ms 1.2360 KOps/s 1.3315 KOps/s $\textbf{\color{#d91a1a}-7.17\%}$
test_split 87.1848ms 2.2022ms 454.0926 Ops/s 474.2084 Ops/s $\color{#d91a1a}-4.24\%$
test_chunk 2.2379ms 2.0338ms 491.6912 Ops/s 468.9826 Ops/s $\color{#35bf28}+4.84\%$
test_creation[device0] 0.3549ms 0.1174ms 8.5146 KOps/s 8.7246 KOps/s $\color{#d91a1a}-2.41\%$
test_creation_from_tensor 4.2443ms 0.1172ms 8.5290 KOps/s 8.5614 KOps/s $\color{#d91a1a}-0.38\%$
test_add_one[memmap_tensor0] 0.1659ms 7.3957μs 135.2134 KOps/s 140.1172 KOps/s $\color{#d91a1a}-3.50\%$
test_contiguous[memmap_tensor0] 20.8580μs 1.9625μs 509.5461 KOps/s 532.9019 KOps/s $\color{#d91a1a}-4.38\%$
test_stack[memmap_tensor0] 41.5780μs 5.8878μs 169.8424 KOps/s 180.9387 KOps/s $\textbf{\color{#d91a1a}-6.13\%}$
test_memmaptd_index 1.2123ms 0.4206ms 2.3777 KOps/s 2.4127 KOps/s $\color{#d91a1a}-1.45\%$
test_memmaptd_index_astensor 0.9079ms 0.5196ms 1.9247 KOps/s 1.9328 KOps/s $\color{#d91a1a}-0.42\%$
test_memmaptd_index_op 1.5253ms 1.0907ms 916.8421 Ops/s 954.9775 Ops/s $\color{#d91a1a}-3.99\%$
test_serialize_model 0.1313s 0.1173s 8.5260 Ops/s 7.5364 Ops/s $\textbf{\color{#35bf28}+13.13\%}$
test_serialize_model_pickle 0.4479s 0.3935s 2.5410 Ops/s 2.4680 Ops/s $\color{#35bf28}+2.96\%$
test_serialize_weights 0.2112s 0.1311s 7.6287 Ops/s 8.5884 Ops/s $\textbf{\color{#d91a1a}-11.17\%}$
test_serialize_weights_returnearly 0.1744s 0.1598s 6.2592 Ops/s 6.4580 Ops/s $\color{#d91a1a}-3.08\%$
test_serialize_weights_pickle 0.5004s 0.4478s 2.2332 Ops/s 2.3607 Ops/s $\textbf{\color{#d91a1a}-5.40\%}$
test_serialize_weights_filesystem 0.1461s 0.1395s 7.1700 Ops/s 7.0618 Ops/s $\color{#35bf28}+1.53\%$
test_serialize_model_filesystem 0.2305s 0.1592s 6.2820 Ops/s 6.6360 Ops/s $\textbf{\color{#d91a1a}-5.33\%}$
test_reshape_pytree 98.7030μs 39.3687μs 25.4009 KOps/s 25.0860 KOps/s $\color{#35bf28}+1.26\%$
test_reshape_td 87.0920μs 45.2111μs 22.1184 KOps/s 21.2857 KOps/s $\color{#35bf28}+3.91\%$
test_view_pytree 77.3140μs 39.1802μs 25.5231 KOps/s 25.1192 KOps/s $\color{#35bf28}+1.61\%$
test_view_td 0.1209ms 51.5149μs 19.4119 KOps/s 18.0773 KOps/s $\textbf{\color{#35bf28}+7.38\%}$
test_unbind_pytree 0.1357ms 36.4704μs 27.4195 KOps/s 27.4958 KOps/s $\color{#d91a1a}-0.28\%$
test_unbind_td 0.3293ms 44.6631μs 22.3898 KOps/s 21.6918 KOps/s $\color{#35bf28}+3.22\%$
test_split_pytree 72.9150μs 37.9548μs 26.3471 KOps/s 26.2017 KOps/s $\color{#35bf28}+0.56\%$
test_split_td 0.5123ms 58.1026μs 17.2109 KOps/s 17.4679 KOps/s $\color{#d91a1a}-1.47\%$
test_add_pytree 96.0190μs 46.1974μs 21.6462 KOps/s 21.8733 KOps/s $\color{#d91a1a}-1.04\%$
test_add_td 0.1618ms 91.1379μs 10.9724 KOps/s 11.9807 KOps/s $\textbf{\color{#d91a1a}-8.42\%}$
test_compile_add_one_nested[tensordict-compile] 0.1485ms 57.9553μs 17.2547 KOps/s 16.7455 KOps/s $\color{#35bf28}+3.04\%$
test_compile_add_one_nested[tensordict-eager] 0.4053ms 0.1984ms 5.0415 KOps/s 4.9014 KOps/s $\color{#35bf28}+2.86\%$
test_compile_add_one_nested[pytree-compile] 0.1080ms 56.6058μs 17.6660 KOps/s 17.3935 KOps/s $\color{#35bf28}+1.57\%$
test_compile_add_one_nested[pytree-eager] 0.2751ms 0.1434ms 6.9721 KOps/s 7.0234 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_copy_nested[tensordict-compile] 54.3900μs 23.3426μs 42.8401 KOps/s 42.1666 KOps/s $\color{#35bf28}+1.60\%$
test_compile_copy_nested[tensordict-eager] 0.1288ms 73.4996μs 13.6055 KOps/s 13.4783 KOps/s $\color{#35bf28}+0.94\%$
test_compile_copy_nested[pytree-compile] 0.1288ms 75.5322μs 13.2394 KOps/s 12.9382 KOps/s $\color{#35bf28}+2.33\%$
test_compile_copy_nested[pytree-eager] 0.1802ms 68.5283μs 14.5925 KOps/s 14.3124 KOps/s $\color{#35bf28}+1.96\%$
test_compile_add_one_flat[tensordict-compile] 0.3491ms 0.1843ms 5.4245 KOps/s 5.3831 KOps/s $\color{#35bf28}+0.77\%$
test_compile_add_one_flat[tensordict-eager] 0.3857ms 0.2401ms 4.1648 KOps/s 4.1635 KOps/s $\color{#35bf28}+0.03\%$
test_compile_add_one_flat[tensorclass-compile] 90.4280μs 47.1513μs 21.2083 KOps/s 20.1523 KOps/s $\textbf{\color{#35bf28}+5.24\%}$
test_compile_add_one_flat[tensorclass-eager] 0.4262ms 78.4063μs 12.7541 KOps/s 12.4471 KOps/s $\color{#35bf28}+2.47\%$
test_compile_add_one_flat[pytree-compile] 0.3106ms 0.1765ms 5.6645 KOps/s 5.6424 KOps/s $\color{#35bf28}+0.39\%$
test_compile_add_one_flat[pytree-eager] 0.6156ms 0.2972ms 3.3653 KOps/s 3.5233 KOps/s $\color{#d91a1a}-4.48\%$
test_compile_add_self_flat[tensordict-eager] 0.3738ms 0.2726ms 3.6681 KOps/s 3.6108 KOps/s $\color{#35bf28}+1.59\%$
test_compile_add_self_flat[tensordict-compile] 0.3131ms 0.1852ms 5.3996 KOps/s 5.2412 KOps/s $\color{#35bf28}+3.02\%$
test_compile_add_self_flat[tensorclass-eager] 0.9629ms 73.8834μs 13.5348 KOps/s 13.1517 KOps/s $\color{#35bf28}+2.91\%$
test_compile_add_self_flat[tensorclass-compile] 0.1066ms 48.4717μs 20.6306 KOps/s 19.1246 KOps/s $\textbf{\color{#35bf28}+7.87\%}$
test_compile_add_self_flat[pytree-eager] 0.3157ms 0.2389ms 4.1861 KOps/s 4.3416 KOps/s $\color{#d91a1a}-3.58\%$
test_compile_add_self_flat[pytree-compile] 0.3277ms 0.1803ms 5.5451 KOps/s 5.6056 KOps/s $\color{#d91a1a}-1.08\%$
test_compile_copy_flat[tensordict-compile] 0.1965ms 0.1139ms 8.7830 KOps/s 8.9406 KOps/s $\color{#d91a1a}-1.76\%$
test_compile_copy_flat[tensordict-eager] 0.1524ms 78.2073μs 12.7865 KOps/s 12.9342 KOps/s $\color{#d91a1a}-1.14\%$
test_compile_copy_flat[pytree-compile] 0.1325ms 78.7417μs 12.6997 KOps/s 12.8258 KOps/s $\color{#d91a1a}-0.98\%$
test_compile_copy_flat[pytree-eager] 0.1439ms 70.0430μs 14.2770 KOps/s 14.2991 KOps/s $\color{#d91a1a}-0.16\%$
test_compile_assign_and_add[tensordict-compile] 0.3208ms 0.1958ms 5.1072 KOps/s 5.0929 KOps/s $\color{#35bf28}+0.28\%$
test_compile_assign_and_add[tensordict-eager] 2.7013ms 1.7587ms 568.6033 Ops/s 575.2090 Ops/s $\color{#d91a1a}-1.15\%$
test_compile_assign_and_add[pytree-compile] 0.2856ms 0.1954ms 5.1184 KOps/s 5.0687 KOps/s $\color{#35bf28}+0.98\%$
test_compile_assign_and_add[pytree-eager] 2.1323ms 1.1434ms 874.5625 Ops/s 930.1695 Ops/s $\textbf{\color{#d91a1a}-5.98\%}$
test_compile_assign_and_add_stack[compile] 0.7372ms 0.4331ms 2.3089 KOps/s 2.3350 KOps/s $\color{#d91a1a}-1.12\%$
test_compile_assign_and_add_stack[eager] 4.5229ms 4.2144ms 237.2794 Ops/s 249.9409 Ops/s $\textbf{\color{#d91a1a}-5.07\%}$
test_compile_indexing[tensor-tensordict-compile] 95.8880μs 34.1366μs 29.2941 KOps/s 28.0127 KOps/s $\color{#35bf28}+4.57\%$
test_compile_indexing[tensor-tensordict-eager] 1.0424ms 49.7362μs 20.1061 KOps/s 19.8972 KOps/s $\color{#35bf28}+1.05\%$
test_compile_indexing[tensor-tensorclass-compile] 82.8730μs 29.6801μs 33.6926 KOps/s 32.1212 KOps/s $\color{#35bf28}+4.89\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1055ms 29.0047μs 34.4771 KOps/s 34.0407 KOps/s $\color{#35bf28}+1.28\%$
test_compile_indexing[tensor-pytree-compile] 82.4430μs 30.2858μs 33.0188 KOps/s 32.0209 KOps/s $\color{#35bf28}+3.12\%$
test_compile_indexing[tensor-pytree-eager] 82.5830μs 28.6001μs 34.9649 KOps/s 34.7820 KOps/s $\color{#35bf28}+0.53\%$
test_compile_indexing[slice-tensordict-compile] 0.1746ms 76.8630μs 13.0102 KOps/s 13.2596 KOps/s $\color{#d91a1a}-1.88\%$
test_compile_indexing[slice-tensordict-eager] 0.6135ms 28.2824μs 35.3576 KOps/s 35.3002 KOps/s $\color{#35bf28}+0.16\%$
test_compile_indexing[slice-tensorclass-compile] 0.1343ms 69.1768μs 14.4557 KOps/s 14.0711 KOps/s $\color{#35bf28}+2.73\%$
test_compile_indexing[slice-tensorclass-eager] 77.6440μs 23.2782μs 42.9587 KOps/s 41.8318 KOps/s $\color{#35bf28}+2.69\%$
test_compile_indexing[slice-pytree-compile] 0.1506ms 69.4402μs 14.4009 KOps/s 14.0344 KOps/s $\color{#35bf28}+2.61\%$
test_compile_indexing[slice-pytree-eager] 82.4630μs 23.6738μs 42.2407 KOps/s 41.5808 KOps/s $\color{#35bf28}+1.59\%$
test_compile_indexing[int-tensordict-compile] 0.1379ms 75.4984μs 13.2453 KOps/s 13.1849 KOps/s $\color{#35bf28}+0.46\%$
test_compile_indexing[int-tensordict-eager] 0.8564ms 27.8242μs 35.9399 KOps/s 35.3151 KOps/s $\color{#35bf28}+1.77\%$
test_compile_indexing[int-tensorclass-compile] 0.1734ms 69.1357μs 14.4643 KOps/s 14.1028 KOps/s $\color{#35bf28}+2.56\%$
test_compile_indexing[int-tensorclass-eager] 62.4960μs 23.1869μs 43.1278 KOps/s 42.7307 KOps/s $\color{#35bf28}+0.93\%$
test_compile_indexing[int-pytree-compile] 0.1582ms 69.6451μs 14.3585 KOps/s 14.1369 KOps/s $\color{#35bf28}+1.57\%$
test_compile_indexing[int-pytree-eager] 93.0220μs 23.4515μs 42.6412 KOps/s 42.9668 KOps/s $\color{#d91a1a}-0.76\%$
test_mod_add[eager] 95.1360μs 26.1785μs 38.1993 KOps/s 39.7739 KOps/s $\color{#d91a1a}-3.96\%$
test_mod_add[compile] 0.1125ms 39.0384μs 25.6158 KOps/s 24.8420 KOps/s $\color{#35bf28}+3.11\%$
test_mod_add[compile-overhead] 86.0800μs 38.9305μs 25.6868 KOps/s 23.8227 KOps/s $\textbf{\color{#35bf28}+7.82\%}$
test_mod_wrap[eager] 0.4210ms 0.2092ms 4.7800 KOps/s 4.7111 KOps/s $\color{#35bf28}+1.46\%$
test_mod_wrap[compile] 0.4733ms 0.2401ms 4.1651 KOps/s 4.1771 KOps/s $\color{#d91a1a}-0.29\%$
test_mod_wrap[compile-overhead] 0.3723ms 0.2354ms 4.2473 KOps/s 4.0176 KOps/s $\textbf{\color{#35bf28}+5.72\%}$
test_mod_wrap_and_backward[eager] 14.7227ms 11.4106ms 87.6379 Ops/s 87.6769 Ops/s $\color{#d91a1a}-0.04\%$
test_mod_wrap_and_backward[compile] 14.6283ms 11.9198ms 83.8940 Ops/s 86.2631 Ops/s $\color{#d91a1a}-2.75\%$
test_mod_wrap_and_backward[compile-overhead] 14.4007ms 11.7855ms 84.8499 Ops/s 77.5255 Ops/s $\textbf{\color{#35bf28}+9.45\%}$
test_seq_add[eager] 0.2306ms 96.4960μs 10.3631 KOps/s 10.7399 KOps/s $\color{#d91a1a}-3.51\%$
test_seq_add[compile] 0.1728ms 66.2942μs 15.0843 KOps/s 14.7094 KOps/s $\color{#35bf28}+2.55\%$
test_seq_add[compile-overhead] 0.1368ms 65.1246μs 15.3552 KOps/s 14.8748 KOps/s $\color{#35bf28}+3.23\%$
test_seq_wrap[eager] 0.5702ms 0.3926ms 2.5474 KOps/s 2.5448 KOps/s $\color{#35bf28}+0.10\%$
test_seq_wrap[compile] 0.3697ms 0.2776ms 3.6028 KOps/s 3.5715 KOps/s $\color{#35bf28}+0.88\%$
test_seq_wrap[compile-overhead] 0.3883ms 0.2775ms 3.6037 KOps/s 3.5609 KOps/s $\color{#35bf28}+1.20\%$
test_func_call_runtime[False-eager] 0.7038ms 0.5283ms 1.8928 KOps/s 1.8725 KOps/s $\color{#35bf28}+1.08\%$
test_func_call_runtime[False-compile] 0.8970ms 0.5101ms 1.9603 KOps/s 1.9765 KOps/s $\color{#d91a1a}-0.82\%$
test_func_call_runtime[False-compile-overhead] 1.0595ms 0.5132ms 1.9486 KOps/s 1.9742 KOps/s $\color{#d91a1a}-1.30\%$
test_func_call_runtime[True-eager] 0.9031ms 0.7474ms 1.3380 KOps/s 1.3268 KOps/s $\color{#35bf28}+0.85\%$
test_func_call_runtime[True-compile] 1.0574ms 0.5235ms 1.9103 KOps/s 1.9142 KOps/s $\color{#d91a1a}-0.21\%$
test_func_call_runtime[True-compile-overhead] 0.6494ms 0.5202ms 1.9225 KOps/s 1.9148 KOps/s $\color{#35bf28}+0.40\%$
test_func_call_cm_runtime[False-eager] 0.9033ms 0.5199ms 1.9236 KOps/s 1.8946 KOps/s $\color{#35bf28}+1.53\%$
test_func_call_cm_runtime[False-compile] 0.8272ms 0.5073ms 1.9711 KOps/s 1.9734 KOps/s $\color{#d91a1a}-0.11\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6140ms 0.5047ms 1.9815 KOps/s 1.9726 KOps/s $\color{#35bf28}+0.45\%$
test_func_call_cm_runtime[True-eager] 1.5729ms 0.8984ms 1.1131 KOps/s 1.0979 KOps/s $\color{#35bf28}+1.39\%$
test_func_call_cm_runtime[True-compile] 0.8726ms 0.7462ms 1.3401 KOps/s 1.3140 KOps/s $\color{#35bf28}+1.99\%$
test_func_call_cm_runtime[True-compile-overhead] 1.1891ms 0.7540ms 1.3263 KOps/s 1.3247 KOps/s $\color{#35bf28}+0.12\%$
test_vmap_func_call_cm_runtime[eager] 2.4652ms 1.9335ms 517.2060 Ops/s 524.9193 Ops/s $\color{#d91a1a}-1.47\%$
test_vmap_func_call_cm_runtime[compile] 2.7386ms 1.9906ms 502.3615 Ops/s 507.1286 Ops/s $\color{#d91a1a}-0.94\%$
test_vmap_func_call_cm_runtime[compile-overhead] 2.6384ms 1.9950ms 501.2435 Ops/s 510.4564 Ops/s $\color{#d91a1a}-1.80\%$
test_distributed 0.2347ms 0.1278ms 7.8239 KOps/s 7.7707 KOps/s $\color{#35bf28}+0.68\%$
test_tdmodule 46.0860μs 18.8016μs 53.1870 KOps/s 57.2166 KOps/s $\textbf{\color{#d91a1a}-7.04\%}$
test_tdmodule_dispatch 86.5310μs 38.0147μs 26.3056 KOps/s 27.9114 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_tdseq 36.4380μs 21.6280μs 46.2364 KOps/s 48.1689 KOps/s $\color{#d91a1a}-4.01\%$
test_tdseq_dispatch 71.5930μs 43.3513μs 23.0673 KOps/s 24.7576 KOps/s $\textbf{\color{#d91a1a}-6.83\%}$
test_instantiation_functorch 2.3816ms 1.6026ms 623.9997 Ops/s 605.5599 Ops/s $\color{#35bf28}+3.05\%$
test_exec_functorch 0.2439ms 0.1861ms 5.3729 KOps/s 5.3209 KOps/s $\color{#35bf28}+0.98\%$
test_exec_functional_call 0.3356ms 0.1731ms 5.7781 KOps/s 5.4883 KOps/s $\textbf{\color{#35bf28}+5.28\%}$
test_exec_td_decorator 0.5449ms 0.2327ms 4.2980 KOps/s 4.2063 KOps/s $\color{#35bf28}+2.18\%$
test_vmap_mlp_speed_decorator[True-True] 0.8170ms 0.6520ms 1.5338 KOps/s 1.5752 KOps/s $\color{#d91a1a}-2.63\%$
test_vmap_mlp_speed_decorator[True-False] 1.0102ms 0.6524ms 1.5327 KOps/s 1.5723 KOps/s $\color{#d91a1a}-2.52\%$
test_vmap_mlp_speed_decorator[False-True] 0.7799ms 0.5329ms 1.8765 KOps/s 1.8935 KOps/s $\color{#d91a1a}-0.90\%$
test_vmap_mlp_speed_decorator[False-False] 0.7258ms 0.5294ms 1.8888 KOps/s 1.8883 KOps/s $\color{#35bf28}+0.03\%$
test_to_module_speed[True] 1.9706ms 1.4027ms 712.8984 Ops/s 695.6653 Ops/s $\color{#35bf28}+2.48\%$
test_to_module_speed[False] 1.9541ms 1.3639ms 733.1997 Ops/s 717.5945 Ops/s $\color{#35bf28}+2.17\%$
test_tc_init 0.1025ms 46.2314μs 21.6303 KOps/s 21.9872 KOps/s $\color{#d91a1a}-1.62\%$
test_tc_init_nested 0.1767ms 92.9642μs 10.7568 KOps/s 10.7932 KOps/s $\color{#d91a1a}-0.34\%$
test_tc_first_layer_tensor 21.0890μs 1.5060μs 664.0242 KOps/s 648.5643 KOps/s $\color{#35bf28}+2.38\%$
test_tc_first_layer_nontensor 31.4680μs 4.6601μs 214.5884 KOps/s 211.4131 KOps/s $\color{#35bf28}+1.50\%$
test_tc_second_layer_tensor 21.4500μs 2.7836μs 359.2449 KOps/s 358.9963 KOps/s $\color{#35bf28}+0.07\%$
test_tc_second_layer_nontensor 27.3900μs 6.1284μs 163.1734 KOps/s 164.9545 KOps/s $\color{#d91a1a}-1.08\%$
test_unbind 0.4580s 13.2609ms 75.4094 Ops/s 74.5515 Ops/s $\color{#35bf28}+1.15\%$
test_full_like 7.7610ms 6.7821ms 147.4473 Ops/s 139.5052 Ops/s $\textbf{\color{#35bf28}+5.69\%}$
test_zeros_like 2.9072ms 2.6215ms 381.4538 Ops/s 365.8882 Ops/s $\color{#35bf28}+4.25\%$
test_ones_like 3.4673ms 3.0755ms 325.1530 Ops/s 322.5531 Ops/s $\color{#35bf28}+0.81\%$
test_clone 5.3131ms 4.7700ms 209.6423 Ops/s 205.3944 Ops/s $\color{#35bf28}+2.07\%$
test_squeeze 62.7870μs 12.5280μs 79.8210 KOps/s 77.8201 KOps/s $\color{#35bf28}+2.57\%$
test_unsqueeze 0.1670ms 92.1261μs 10.8547 KOps/s 10.3598 KOps/s $\color{#35bf28}+4.78\%$
test_split 0.5512ms 0.2029ms 4.9294 KOps/s 4.9791 KOps/s $\color{#d91a1a}-1.00\%$
test_permute 0.3831ms 0.2209ms 4.5271 KOps/s 4.4436 KOps/s $\color{#35bf28}+1.88\%$
test_stack 29.5502ms 24.0818ms 41.5251 Ops/s 38.5319 Ops/s $\textbf{\color{#35bf28}+7.77\%}$
test_cat 28.3388ms 24.1058ms 41.4838 Ops/s 39.3080 Ops/s $\textbf{\color{#35bf28}+5.54\%}$
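
For readers checking the table above by hand: the Change column is consistent with the relative difference between the Ops value measured on this PR and the Ops value on repo HEAD. The bold rows are the ones whose change is at least 5% in either direction, and counting them gives the 12 improved and 22 worsened reported in the summary. The sketch below recomputes a few rows under that reading. The 5% threshold and the function names are assumptions inferred from the table, not taken from the benchmark tooling.

```python
# Illustrative only: recompute the "Change" column from two throughput values.
# ASSUMPTION: the 5% threshold is inferred from which rows are bold in the
# table; the benchmark report itself does not state it.
THRESHOLD_PCT = 5.0


def relative_change_pct(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR against repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a change as improved, worsened, or unchanged (hypothetical helper)."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# Rows copied from the CPU table: (name, Ops on PR, Ops on HEAD).
# Units differ between rows but match within a row, so the ratio is unitless.
rows = [
    ("test_membership", 1.0668, 1.2994),       # MOps/s
    ("test_tdmodule", 53.1870, 57.2166),       # KOps/s
    ("test_serialize_model", 8.5260, 7.5364),  # Ops/s
    ("test_keys", 285.4112, 285.2164),         # KOps/s
]

for name, ops_pr, ops_head in rows:
    change = relative_change_pct(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Many of the flagged rows are microsecond-scale benchmarks with a Max far above the Mean (for example test_lock_nested), so a single run like this one may reflect timing noise as much as a real effect of the change.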

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}23$. Worsened: $\large\color{#d91a1a}6$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 45.1010μs 17.0677μs 58.5902 KOps/s 57.2866 KOps/s $\color{#35bf28}+2.28\%$
test_plain_set_stack_nested 44.4210μs 17.3703μs 57.5694 KOps/s 56.7396 KOps/s $\color{#35bf28}+1.46\%$
test_plain_set_nested_inplace 62.3210μs 18.2634μs 54.7544 KOps/s 52.8649 KOps/s $\color{#35bf28}+3.57\%$
test_plain_set_stack_nested_inplace 47.8000μs 18.2271μs 54.8633 KOps/s 53.2276 KOps/s $\color{#35bf28}+3.07\%$
test_items 21.9800μs 2.8758μs 347.7348 KOps/s 343.0792 KOps/s $\color{#35bf28}+1.36\%$
test_items_nested 0.3817ms 0.3411ms 2.9316 KOps/s 2.9855 KOps/s $\color{#d91a1a}-1.80\%$
test_items_nested_locked 0.4501ms 0.3410ms 2.9322 KOps/s 2.9255 KOps/s $\color{#35bf28}+0.23\%$
test_items_nested_leaf 0.1199ms 62.1758μs 16.0834 KOps/s 15.9221 KOps/s $\color{#35bf28}+1.01\%$
test_items_stack_nested 0.4032ms 0.3438ms 2.9083 KOps/s 2.9208 KOps/s $\color{#d91a1a}-0.43\%$
test_items_stack_nested_leaf 0.1018ms 64.5204μs 15.4990 KOps/s 15.4374 KOps/s $\color{#35bf28}+0.40\%$
test_items_stack_nested_locked 0.3812ms 0.3420ms 2.9236 KOps/s 2.9123 KOps/s $\color{#35bf28}+0.39\%$
test_keys 22.5310μs 3.4083μs 293.3993 KOps/s 291.4494 KOps/s $\color{#35bf28}+0.67\%$
test_keys_nested 0.1133ms 71.2300μs 14.0390 KOps/s 13.9402 KOps/s $\color{#35bf28}+0.71\%$
test_keys_nested_locked 2.5095ms 76.9257μs 12.9996 KOps/s 13.0324 KOps/s $\color{#d91a1a}-0.25\%$
test_keys_nested_leaf 99.9620μs 60.3612μs 16.5669 KOps/s 15.9986 KOps/s $\color{#35bf28}+3.55\%$
test_keys_stack_nested 0.1234ms 72.0346μs 13.8822 KOps/s 13.8515 KOps/s $\color{#35bf28}+0.22\%$
test_keys_stack_nested_leaf 99.8020μs 62.3945μs 16.0271 KOps/s 15.7163 KOps/s $\color{#35bf28}+1.98\%$
test_keys_stack_nested_locked 0.1134ms 78.3466μs 12.7638 KOps/s 13.0329 KOps/s $\color{#d91a1a}-2.07\%$
test_values 5.1583μs 0.8313μs 1.2029 MOps/s 1.1939 MOps/s $\color{#35bf28}+0.76\%$
test_values_nested 78.5010μs 48.4515μs 20.6392 KOps/s 20.5739 KOps/s $\color{#35bf28}+0.32\%$
test_values_nested_locked 79.9420μs 49.8106μs 20.0760 KOps/s 20.0836 KOps/s $\color{#d91a1a}-0.04\%$
test_values_nested_leaf 81.4920μs 42.5015μs 23.5286 KOps/s 23.4143 KOps/s $\color{#35bf28}+0.49\%$
test_values_stack_nested 84.6210μs 49.8164μs 20.0737 KOps/s 19.8499 KOps/s $\color{#35bf28}+1.13\%$
test_values_stack_nested_leaf 75.6520μs 43.1989μs 23.1487 KOps/s 22.8630 KOps/s $\color{#35bf28}+1.25\%$
test_values_stack_nested_locked 0.1293ms 51.5406μs 19.4022 KOps/s 19.3846 KOps/s $\color{#35bf28}+0.09\%$
test_membership 2.0341μs 0.5019μs 1.9926 MOps/s 1.9712 MOps/s $\color{#35bf28}+1.09\%$
test_membership_nested 13.6950μs 1.9067μs 524.4685 KOps/s 530.0506 KOps/s $\color{#d91a1a}-1.05\%$
test_membership_nested_leaf 12.5155μs 1.9099μs 523.5938 KOps/s 527.4165 KOps/s $\color{#d91a1a}-0.72\%$
test_membership_stacked_nested 20.3900μs 1.9356μs 516.6284 KOps/s 512.3133 KOps/s $\color{#35bf28}+0.84\%$
test_membership_stacked_nested_leaf 24.7010μs 1.9486μs 513.1850 KOps/s 513.0325 KOps/s $\color{#35bf28}+0.03\%$
test_membership_nested_last 26.5900μs 3.0263μs 330.4321 KOps/s 331.6190 KOps/s $\color{#d91a1a}-0.36\%$
test_membership_nested_leaf_last 23.4510μs 2.9867μs 334.8161 KOps/s 333.7097 KOps/s $\color{#35bf28}+0.33\%$
test_membership_stacked_nested_last 24.5710μs 3.5972μs 277.9964 KOps/s 177.8184 KOps/s $\textbf{\color{#35bf28}+56.34\%}$
test_membership_stacked_nested_leaf_last 33.8810μs 3.5399μs 282.4907 KOps/s 179.2664 KOps/s $\textbf{\color{#35bf28}+57.58\%}$
test_nested_getleaf 26.4910μs 6.0785μs 164.5146 KOps/s 165.9365 KOps/s $\color{#d91a1a}-0.86\%$
test_nested_get 35.8410μs 5.7462μs 174.0277 KOps/s 175.5001 KOps/s $\color{#d91a1a}-0.84\%$
test_stacked_getleaf 35.1210μs 6.0230μs 166.0306 KOps/s 164.7197 KOps/s $\color{#35bf28}+0.80\%$
test_stacked_get 34.0300μs 5.6349μs 177.4650 KOps/s 178.2000 KOps/s $\color{#d91a1a}-0.41\%$
test_nested_getitemleaf 28.7500μs 6.1142μs 163.5540 KOps/s 163.9656 KOps/s $\color{#d91a1a}-0.25\%$
test_nested_getitem 27.8900μs 5.7232μs 174.7269 KOps/s 172.6813 KOps/s $\color{#35bf28}+1.18\%$
test_stacked_getitemleaf 33.0410μs 6.0809μs 164.4505 KOps/s 163.6266 KOps/s $\color{#35bf28}+0.50\%$
test_stacked_getitem 32.7600μs 5.7027μs 175.3546 KOps/s 175.6955 KOps/s $\color{#d91a1a}-0.19\%$
test_lock_nested 7.6985ms 0.4367ms 2.2897 KOps/s 2.3332 KOps/s $\color{#d91a1a}-1.86\%$
test_lock_stack_nested 0.4373ms 0.3949ms 2.5323 KOps/s 2.5623 KOps/s $\color{#d91a1a}-1.17\%$
test_unlock_nested 0.7825ms 0.3669ms 2.7258 KOps/s 2.7086 KOps/s $\color{#35bf28}+0.64\%$
test_unlock_stack_nested 0.3650ms 0.3309ms 3.0224 KOps/s 3.0380 KOps/s $\color{#d91a1a}-0.51\%$
test_flatten_speed 0.1600ms 76.5248μs 13.0677 KOps/s 12.9458 KOps/s $\color{#35bf28}+0.94\%$
test_unflatten_speed 0.3675ms 0.3193ms 3.1322 KOps/s 3.1087 KOps/s $\color{#35bf28}+0.76\%$
test_common_ops 1.6339ms 1.2908ms 774.7406 Ops/s 773.4362 Ops/s $\color{#35bf28}+0.17\%$
test_creation 23.6500μs 1.5045μs 664.6910 KOps/s 679.2022 KOps/s $\color{#d91a1a}-2.14\%$
test_creation_empty 43.3000μs 16.5866μs 60.2897 KOps/s 59.2829 KOps/s $\color{#35bf28}+1.70\%$
test_creation_nested_1 42.7500μs 18.8449μs 53.0647 KOps/s 53.8314 KOps/s $\color{#d91a1a}-1.42\%$
test_creation_nested_2 50.9610μs 21.1640μs 47.2500 KOps/s 47.6806 KOps/s $\color{#d91a1a}-0.90\%$
test_clone 61.3910μs 29.4625μs 33.9414 KOps/s 33.9350 KOps/s $\color{#35bf28}+0.02\%$
test_getitem[int] 0.9628ms 16.0472μs 62.3162 KOps/s 61.2267 KOps/s $\color{#35bf28}+1.78\%$
test_getitem[slice_int] 0.1200ms 27.8150μs 35.9519 KOps/s 35.9724 KOps/s $\color{#d91a1a}-0.06\%$
test_getitem[range] 0.1476ms 0.1076ms 9.2905 KOps/s 9.1343 KOps/s $\color{#35bf28}+1.71\%$
test_getitem[tuple] 0.1224ms 24.2279μs 41.2748 KOps/s 41.8020 KOps/s $\color{#d91a1a}-1.26\%$
test_getitem[list] 0.2086ms 98.6419μs 10.1377 KOps/s 9.9947 KOps/s $\color{#35bf28}+1.43\%$
test_setitem_dim[int] 78.7510μs 46.3393μs 21.5800 KOps/s 22.0215 KOps/s $\color{#d91a1a}-2.01\%$
test_setitem_dim[slice_int] 0.1125ms 68.6510μs 14.5664 KOps/s 14.5069 KOps/s $\color{#35bf28}+0.41\%$
test_setitem_dim[range] 0.1563ms 0.1274ms 7.8485 KOps/s 7.6648 KOps/s $\color{#35bf28}+2.40\%$
test_setitem_dim[tuple] 93.2110μs 62.3062μs 16.0498 KOps/s 15.0958 KOps/s $\textbf{\color{#35bf28}+6.32\%}$
test_setitem 74.4210μs 43.0653μs 23.2206 KOps/s 21.7943 KOps/s $\textbf{\color{#35bf28}+6.54\%}$
test_set 74.4210μs 41.8476μs 23.8962 KOps/s 23.5050 KOps/s $\color{#35bf28}+1.66\%$
test_set_shared 0.3581ms 54.2328μs 18.4390 KOps/s 18.3848 KOps/s $\color{#35bf28}+0.29\%$
test_update 94.7810μs 52.3777μs 19.0921 KOps/s 19.1678 KOps/s $\color{#d91a1a}-0.39\%$
test_update_nested 96.0820μs 59.7707μs 16.7306 KOps/s 16.7449 KOps/s $\color{#d91a1a}-0.09\%$
test_update__nested 0.2147ms 63.5569μs 15.7339 KOps/s 14.4930 KOps/s $\textbf{\color{#35bf28}+8.56\%}$
test_set_nested 81.2120μs 44.9462μs 22.2488 KOps/s 21.6851 KOps/s $\color{#35bf28}+2.60\%$
test_set_nested_new 90.6420μs 48.2801μs 20.7125 KOps/s 20.0186 KOps/s $\color{#35bf28}+3.47\%$
test_select 0.1000ms 64.4934μs 15.5055 KOps/s 15.5660 KOps/s $\color{#d91a1a}-0.39\%$
test_select_nested 70.4410μs 41.3293μs 24.1959 KOps/s 23.9148 KOps/s $\color{#35bf28}+1.18\%$
test_exclude_nested 85.8020μs 57.8372μs 17.2899 KOps/s 16.7715 KOps/s $\color{#35bf28}+3.09\%$
test_empty[True] 0.2940ms 0.2602ms 3.8432 KOps/s 3.8811 KOps/s $\color{#d91a1a}-0.98\%$
test_empty[False] 3.5971μs 0.7386μs 1.3539 MOps/s 1.3793 MOps/s $\color{#d91a1a}-1.85\%$
test_to 52.6210μs 27.4418μs 36.4408 KOps/s 37.1106 KOps/s $\color{#d91a1a}-1.80\%$
test_to_nonblocking 55.2410μs 26.2912μs 38.0356 KOps/s 40.0513 KOps/s $\textbf{\color{#d91a1a}-5.03\%}$
test_unbind_speed 1.4130ms 0.2759ms 3.6239 KOps/s 3.5765 KOps/s $\color{#35bf28}+1.32\%$
test_unbind_speed_stack0 0.3304ms 0.2791ms 3.5828 KOps/s 3.5722 KOps/s $\color{#35bf28}+0.30\%$
test_unbind_speed_stack1 92.4068ms 0.7189ms 1.3909 KOps/s 1.4018 KOps/s $\color{#d91a1a}-0.77\%$
test_split 94.7748ms 2.2412ms 446.1912 Ops/s 446.6917 Ops/s $\color{#d91a1a}-0.11\%$
test_chunk 94.6603ms 2.2455ms 445.3288 Ops/s 447.3553 Ops/s $\color{#d91a1a}-0.45\%$
test_to[False] 3.5998ms 3.4755ms 287.7319 Ops/s 283.4002 Ops/s $\color{#35bf28}+1.53\%$
test_to[True] 4.8754ms 4.5545ms 219.5622 Ops/s 218.9963 Ops/s $\color{#35bf28}+0.26\%$
test_to_njt[False] 0.3268s 0.2496s 4.0056 Ops/s 4.2944 Ops/s $\textbf{\color{#d91a1a}-6.72\%}$
test_to_njt[True] 0.3637s 0.2793s 3.5806 Ops/s 3.5463 Ops/s $\color{#35bf28}+0.97\%$
test_creation[device0] 0.3882ms 0.1272ms 7.8605 KOps/s 7.8343 KOps/s $\color{#35bf28}+0.34\%$
test_creation_from_tensor 0.4176ms 0.1342ms 7.4537 KOps/s 7.5556 KOps/s $\color{#d91a1a}-1.35\%$
test_add_one[memmap_tensor0] 0.1850ms 9.2807μs 107.7504 KOps/s 113.5731 KOps/s $\textbf{\color{#d91a1a}-5.13\%}$
test_contiguous[memmap_tensor0] 30.6800μs 2.1966μs 455.2404 KOps/s 455.5398 KOps/s $\color{#d91a1a}-0.07\%$
test_stack[memmap_tensor0] 0.1375ms 7.0003μs 142.8501 KOps/s 147.4174 KOps/s $\color{#d91a1a}-3.10\%$
test_memmaptd_index 1.0731ms 0.4387ms 2.2796 KOps/s 2.2435 KOps/s $\color{#35bf28}+1.61\%$
test_memmaptd_index_astensor 0.7603ms 0.5072ms 1.9717 KOps/s 1.9336 KOps/s $\color{#35bf28}+1.97\%$
test_memmaptd_index_op 1.4413ms 1.0700ms 934.5513 Ops/s 923.9067 Ops/s $\color{#35bf28}+1.15\%$
test_serialize_model 0.1314s 0.1305s 7.6620 Ops/s 7.7125 Ops/s $\color{#d91a1a}-0.65\%$
test_serialize_model_pickle 1.3546s 1.1918s 0.8391 Ops/s 0.8207 Ops/s $\color{#35bf28}+2.24\%$
test_serialize_weights 0.1320s 0.1299s 7.6996 Ops/s 7.7104 Ops/s $\color{#d91a1a}-0.14\%$
test_serialize_weights_returnearly 0.2065s 55.3947ms 18.0523 Ops/s 16.2117 Ops/s $\textbf{\color{#35bf28}+11.35\%}$
test_serialize_weights_pickle 1.3733s 1.1906s 0.8399 Ops/s 0.8246 Ops/s $\color{#35bf28}+1.86\%$
test_reshape_pytree 71.0010μs 36.1439μs 27.6672 KOps/s 28.1445 KOps/s $\color{#d91a1a}-1.70\%$
test_reshape_td 84.5410μs 44.1336μs 22.6585 KOps/s 23.4381 KOps/s $\color{#d91a1a}-3.33\%$
test_view_pytree 66.1610μs 35.0095μs 28.5637 KOps/s 27.9293 KOps/s $\color{#35bf28}+2.27\%$
test_view_td 95.6120μs 47.0337μs 21.2613 KOps/s 22.5970 KOps/s $\textbf{\color{#d91a1a}-5.91\%}$
test_unbind_pytree 66.0610μs 33.8479μs 29.5439 KOps/s 28.4773 KOps/s $\color{#35bf28}+3.75\%$
test_unbind_td 0.5044ms 43.8836μs 22.7875 KOps/s 22.3770 KOps/s $\color{#35bf28}+1.83\%$
test_split_pytree 93.7410μs 46.7564μs 21.3874 KOps/s 21.9837 KOps/s $\color{#d91a1a}-2.71\%$
test_split_td 0.7061ms 56.6596μs 17.6492 KOps/s 16.6942 KOps/s $\textbf{\color{#35bf28}+5.72\%}$
test_add_pytree 0.1164ms 57.8207μs 17.2948 KOps/s 16.3137 KOps/s $\textbf{\color{#35bf28}+6.01\%}$
test_add_td 0.1780ms 0.1016ms 9.8409 KOps/s 9.8046 KOps/s $\color{#35bf28}+0.37\%$
test_compile_add_one_nested[tensordict-compile] 0.2123ms 0.1620ms 6.1709 KOps/s 6.0581 KOps/s $\color{#35bf28}+1.86\%$
test_compile_add_one_nested[tensordict-eager] 0.2927ms 0.1626ms 6.1498 KOps/s 6.2695 KOps/s $\color{#d91a1a}-1.91\%$
test_compile_add_one_nested[pytree-compile] 0.2158ms 0.1542ms 6.4831 KOps/s 6.2442 KOps/s $\color{#35bf28}+3.83\%$
test_compile_add_one_nested[pytree-eager] 0.2679ms 0.1876ms 5.3301 KOps/s 5.4032 KOps/s $\color{#d91a1a}-1.35\%$
test_compile_copy_nested[tensordict-compile] 66.8410μs 22.0132μs 45.4272 KOps/s 46.6349 KOps/s $\color{#d91a1a}-2.59\%$
test_compile_copy_nested[tensordict-eager] 93.6220μs 48.8940μs 20.4524 KOps/s 20.4154 KOps/s $\color{#35bf28}+0.18\%$
test_compile_copy_nested[pytree-compile] 0.4360ms 64.6728μs 15.4624 KOps/s 15.4786 KOps/s $\color{#d91a1a}-0.10\%$
test_compile_copy_nested[pytree-eager] 0.1184ms 50.0389μs 19.9844 KOps/s 20.1730 KOps/s $\color{#d91a1a}-0.93\%$
test_compile_add_one_flat[tensordict-compile] 0.3741ms 0.3207ms 3.1180 KOps/s 3.0912 KOps/s $\color{#35bf28}+0.87\%$
test_compile_add_one_flat[tensordict-eager] 0.3464ms 0.2313ms 4.3243 KOps/s 4.3128 KOps/s $\color{#35bf28}+0.27\%$
test_compile_add_one_flat[tensorclass-compile] 0.1789ms 0.1276ms 7.8382 KOps/s 7.6683 KOps/s $\color{#35bf28}+2.22\%$
test_compile_add_one_flat[tensorclass-eager] 0.1218ms 65.8792μs 15.1793 KOps/s 15.5847 KOps/s $\color{#d91a1a}-2.60\%$
test_compile_add_one_flat[pytree-compile] 0.3876ms 0.3291ms 3.0382 KOps/s 3.0194 KOps/s $\color{#35bf28}+0.62\%$
test_compile_add_one_flat[pytree-eager] 0.7356ms 0.6395ms 1.5637 KOps/s 1.5748 KOps/s $\color{#d91a1a}-0.70\%$
test_compile_add_self_flat[tensordict-eager] 0.3936ms 0.2829ms 3.5347 KOps/s 3.5316 KOps/s $\color{#35bf28}+0.09\%$
test_compile_add_self_flat[tensordict-compile] 0.3769ms 0.3216ms 3.1096 KOps/s 3.0698 KOps/s $\color{#35bf28}+1.29\%$
test_compile_add_self_flat[tensorclass-eager] 0.1503ms 78.9206μs 12.6710 KOps/s 13.0330 KOps/s $\color{#d91a1a}-2.78\%$
test_compile_add_self_flat[tensorclass-compile] 0.1805ms 0.1290ms 7.7504 KOps/s 7.6325 KOps/s $\color{#35bf28}+1.54\%$
test_compile_add_self_flat[pytree-eager] 0.6955ms 0.5302ms 1.8860 KOps/s 1.8801 KOps/s $\color{#35bf28}+0.31\%$
test_compile_add_self_flat[pytree-compile] 0.3758ms 0.3280ms 3.0488 KOps/s 2.9958 KOps/s $\color{#35bf28}+1.77\%$
test_compile_copy_flat[tensordict-compile] 55.1710μs 18.9540μs 52.7593 KOps/s 51.6714 KOps/s $\color{#35bf28}+2.11\%$
test_compile_copy_flat[tensordict-eager] 79.3810μs 38.0564μs 26.2768 KOps/s 26.2866 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_copy_flat[pytree-compile] 0.1109ms 70.1253μs 14.2602 KOps/s 14.4394 KOps/s $\color{#d91a1a}-1.24\%$
test_compile_copy_flat[pytree-eager] 88.6220μs 51.2377μs 19.5169 KOps/s 19.6281 KOps/s $\color{#d91a1a}-0.57\%$
test_compile_assign_and_add[tensordict-compile] 2.3624ms 0.8294ms 1.2056 KOps/s 1.1137 KOps/s $\textbf{\color{#35bf28}+8.25\%}$
test_compile_assign_and_add[tensordict-eager] 3.4861ms 3.2306ms 309.5425 Ops/s 299.5417 Ops/s $\color{#35bf28}+3.34\%$
test_compile_assign_and_add[pytree-compile] 2.4234ms 0.8446ms 1.1840 KOps/s 1.0813 KOps/s $\textbf{\color{#35bf28}+9.50\%}$
test_compile_assign_and_add[pytree-eager] 3.5075ms 3.2868ms 304.2498 Ops/s 307.7813 Ops/s $\color{#d91a1a}-1.15\%$
test_compile_indexing[tensor-tensordict-compile] 0.1610ms 0.1178ms 8.4881 KOps/s 8.2814 KOps/s $\color{#35bf28}+2.50\%$
test_compile_indexing[tensor-tensordict-eager] 0.1836ms 63.3180μs 15.7933 KOps/s 15.3184 KOps/s $\color{#35bf28}+3.10\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1708ms 0.1156ms 8.6497 KOps/s 8.5495 KOps/s $\color{#35bf28}+1.17\%$
test_compile_indexing[tensor-tensorclass-eager] 96.8210μs 46.5416μs 21.4862 KOps/s 22.6276 KOps/s $\textbf{\color{#d91a1a}-5.04\%}$
test_compile_indexing[tensor-pytree-compile] 0.1611ms 0.1164ms 8.5925 KOps/s 8.6034 KOps/s $\color{#d91a1a}-0.13\%$
test_compile_indexing[tensor-pytree-eager] 82.9010μs 46.7973μs 21.3688 KOps/s 22.3349 KOps/s $\color{#d91a1a}-4.33\%$
test_compile_indexing[slice-tensordict-compile] 0.1883ms 0.1457ms 6.8630 KOps/s 6.6686 KOps/s $\color{#35bf28}+2.92\%$
test_compile_indexing[slice-tensordict-eager] 0.1539ms 25.2780μs 39.5601 KOps/s 36.8440 KOps/s $\textbf{\color{#35bf28}+7.37\%}$
test_compile_indexing[slice-tensorclass-compile] 0.1873ms 0.1390ms 7.1942 KOps/s 6.7164 KOps/s $\textbf{\color{#35bf28}+7.11\%}$
test_compile_indexing[slice-tensorclass-eager] 57.8710μs 21.2323μs 47.0981 KOps/s 47.0350 KOps/s $\color{#35bf28}+0.13\%$
test_compile_indexing[slice-pytree-compile] 0.1927ms 0.1425ms 7.0187 KOps/s 6.9250 KOps/s $\color{#35bf28}+1.35\%$
test_compile_indexing[slice-pytree-eager] 55.7310μs 21.0207μs 47.5722 KOps/s 47.4520 KOps/s $\color{#35bf28}+0.25\%$
test_compile_indexing[int-tensordict-compile] 0.2898ms 0.1532ms 6.5284 KOps/s 6.5420 KOps/s $\color{#d91a1a}-0.21\%$
test_compile_indexing[int-tensordict-eager] 0.5053ms 25.4981μs 39.2186 KOps/s 39.3655 KOps/s $\color{#d91a1a}-0.37\%$
test_compile_indexing[int-tensorclass-compile] 0.2577ms 0.1406ms 7.1104 KOps/s 6.7737 KOps/s $\color{#35bf28}+4.97\%$
test_compile_indexing[int-tensorclass-eager] 47.6200μs 20.9591μs 47.7119 KOps/s 47.0560 KOps/s $\color{#35bf28}+1.39\%$
test_compile_indexing[int-pytree-compile] 0.1893ms 0.1402ms 7.1327 KOps/s 6.8042 KOps/s $\color{#35bf28}+4.83\%$
test_compile_indexing[int-pytree-eager] 50.1110μs 20.9703μs 47.6864 KOps/s 46.3424 KOps/s $\color{#35bf28}+2.90\%$
test_mod_add[eager] 70.2910μs 32.7347μs 30.5486 KOps/s 29.5729 KOps/s $\color{#35bf28}+3.30\%$
test_mod_add[compile] 0.1678ms 82.7317μs 12.0873 KOps/s 12.0221 KOps/s $\color{#35bf28}+0.54\%$
test_mod_add[compile-overhead] 0.2987ms 0.1498ms 6.6760 KOps/s 5.8834 KOps/s $\textbf{\color{#35bf28}+13.47\%}$
test_mod_wrap[eager] 0.3331ms 0.2443ms 4.0936 KOps/s 3.8394 KOps/s $\textbf{\color{#35bf28}+6.62\%}$
test_mod_wrap[compile] 1.5071ms 0.2996ms 3.3376 KOps/s 3.2838 KOps/s $\color{#35bf28}+1.64\%$
test_mod_wrap[compile-overhead] 10.9223ms 3.9670ms 252.0783 Ops/s 244.6414 Ops/s $\color{#35bf28}+3.04\%$
test_mod_wrap_and_backward[eager] 1.5532ms 1.3652ms 732.5104 Ops/s 683.7393 Ops/s $\textbf{\color{#35bf28}+7.13\%}$
test_mod_wrap_and_backward[compile] 1.6119ms 1.3504ms 740.5361 Ops/s 685.5100 Ops/s $\textbf{\color{#35bf28}+8.03\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3314ms 0.9030ms 1.1075 KOps/s 917.8416 Ops/s $\textbf{\color{#35bf28}+20.66\%}$
test_seq_add[eager] 0.1354ms 0.1004ms 9.9636 KOps/s 9.4141 KOps/s $\textbf{\color{#35bf28}+5.84\%}$
test_seq_add[compile] 0.1738ms 92.5820μs 10.8012 KOps/s 10.3300 KOps/s $\color{#35bf28}+4.56\%$
test_seq_add[compile-overhead] 0.1813ms 0.1242ms 8.0504 KOps/s 7.8033 KOps/s $\color{#35bf28}+3.17\%$
test_seq_wrap[eager] 0.4520ms 0.3891ms 2.5698 KOps/s 2.5476 KOps/s $\color{#35bf28}+0.87\%$
test_seq_wrap[compile] 0.3689ms 0.3176ms 3.1484 KOps/s 3.0439 KOps/s $\color{#35bf28}+3.44\%$
test_seq_wrap[compile-overhead] 0.2654ms 0.2213ms 4.5194 KOps/s 4.4545 KOps/s $\color{#35bf28}+1.46\%$
test_func_call_runtime[False-eager] 0.9284ms 0.7480ms 1.3369 KOps/s 1.3209 KOps/s $\color{#35bf28}+1.21\%$
test_func_call_runtime[False-compile] 0.8357ms 0.7990ms 1.2516 KOps/s 1.2361 KOps/s $\color{#35bf28}+1.25\%$
test_func_call_runtime[False-compile-overhead] 0.4051ms 0.3594ms 2.7820 KOps/s 2.7486 KOps/s $\color{#35bf28}+1.22\%$
test_func_call_runtime[True-eager] 1.0799ms 0.9010ms 1.1098 KOps/s 1.0788 KOps/s $\color{#35bf28}+2.88\%$
test_func_call_runtime[True-compile] 0.9225ms 0.8258ms 1.2110 KOps/s 1.1952 KOps/s $\color{#35bf28}+1.32\%$
test_func_call_runtime[True-compile-overhead] 0.4329ms 0.3819ms 2.6185 KOps/s 2.6019 KOps/s $\color{#35bf28}+0.64\%$
test_func_call_cm_runtime[False-eager] 0.7886ms 0.7424ms 1.3470 KOps/s 1.3284 KOps/s $\color{#35bf28}+1.40\%$
test_func_call_cm_runtime[False-compile] 0.8868ms 0.7990ms 1.2516 KOps/s 1.2325 KOps/s $\color{#35bf28}+1.55\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4161ms 0.3609ms 2.7709 KOps/s 2.7428 KOps/s $\color{#35bf28}+1.02\%$
test_func_call_cm_runtime[True-eager] 1.1211ms 1.0197ms 980.7216 Ops/s 969.9685 Ops/s $\color{#35bf28}+1.11\%$
test_func_call_cm_runtime[True-compile] 0.8992ms 0.8489ms 1.1780 KOps/s 1.1540 KOps/s $\color{#35bf28}+2.08\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4540ms 0.4054ms 2.4669 KOps/s 2.4485 KOps/s $\color{#35bf28}+0.75\%$
test_vmap_func_call_cm_runtime[eager] 2.5399ms 2.0907ms 478.2990 Ops/s 470.6774 Ops/s $\color{#35bf28}+1.62\%$
test_vmap_func_call_cm_runtime[compile] 0.9692ms 0.8824ms 1.1333 KOps/s 1.1415 KOps/s $\color{#d91a1a}-0.71\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4654ms 0.4133ms 2.4194 KOps/s 2.4174 KOps/s $\color{#35bf28}+0.08\%$
test_distributed 3.0709ms 0.1852ms 5.4003 KOps/s 8.8876 KOps/s $\textbf{\color{#d91a1a}-39.24\%}$
test_tdmodule 0.3806ms 15.9749μs 62.5983 KOps/s 60.8930 KOps/s $\color{#35bf28}+2.80\%$
test_tdmodule_dispatch 59.8210μs 31.3561μs 31.8918 KOps/s 31.0041 KOps/s $\color{#35bf28}+2.86\%$
test_tdseq 35.9210μs 16.2656μs 61.4794 KOps/s 56.3335 KOps/s $\textbf{\color{#35bf28}+9.13\%}$
test_tdseq_dispatch 58.3110μs 33.0386μs 30.2677 KOps/s 28.3236 KOps/s $\textbf{\color{#35bf28}+6.86\%}$
test_instantiation_functorch 2.0194ms 1.8723ms 534.1113 Ops/s 523.5087 Ops/s $\color{#35bf28}+2.03\%$
test_exec_functorch 0.2630ms 0.2112ms 4.7347 KOps/s 4.5510 KOps/s $\color{#35bf28}+4.04\%$
test_exec_functional_call 0.2832ms 0.2106ms 4.7482 KOps/s 4.5526 KOps/s $\color{#35bf28}+4.30\%$
test_exec_td_decorator 0.4519ms 0.2623ms 3.8121 KOps/s 3.7917 KOps/s $\color{#35bf28}+0.54\%$
test_vmap_mlp_speed_decorator[True-True] 0.8145ms 0.6856ms 1.4585 KOps/s 1.4302 KOps/s $\color{#35bf28}+1.98\%$
test_vmap_mlp_speed_decorator[True-False] 0.8093ms 0.6854ms 1.4590 KOps/s 1.4430 KOps/s $\color{#35bf28}+1.11\%$
test_vmap_mlp_speed_decorator[False-True] 0.7213ms 0.6011ms 1.6636 KOps/s 1.6501 KOps/s $\color{#35bf28}+0.82\%$
test_vmap_mlp_speed_decorator[False-False] 0.7109ms 0.6018ms 1.6617 KOps/s 1.6508 KOps/s $\color{#35bf28}+0.66\%$
test_vmap_transformer_speed_decorator[True-True] 19.7343ms 19.6605ms 50.8634 Ops/s 50.6414 Ops/s $\color{#35bf28}+0.44\%$
test_vmap_transformer_speed_decorator[True-False] 19.7569ms 19.6427ms 50.9095 Ops/s 50.6177 Ops/s $\color{#35bf28}+0.58\%$
test_vmap_transformer_speed_decorator[False-True] 19.6155ms 19.5110ms 51.2531 Ops/s 51.0521 Ops/s $\color{#35bf28}+0.39\%$
test_vmap_transformer_speed_decorator[False-False] 20.6934ms 19.5444ms 51.1656 Ops/s 50.5382 Ops/s $\color{#35bf28}+1.24\%$
test_to_module_speed[True] 1.4373ms 1.0055ms 994.5667 Ops/s 996.9961 Ops/s $\color{#d91a1a}-0.24\%$
test_to_module_speed[False] 1.3895ms 0.9767ms 1.0239 KOps/s 1.0180 KOps/s $\color{#35bf28}+0.58\%$
test_tc_init 71.8420μs 36.0620μs 27.7300 KOps/s 27.5297 KOps/s $\color{#35bf28}+0.73\%$
test_tc_init_nested 0.1131ms 72.2691μs 13.8372 KOps/s 13.6734 KOps/s $\color{#35bf28}+1.20\%$
test_tc_first_layer_tensor 3.8186μs 0.6889μs 1.4516 MOps/s 1.4612 MOps/s $\color{#d91a1a}-0.66\%$
test_tc_first_layer_nontensor 22.5000μs 2.2521μs 444.0363 KOps/s 446.8611 KOps/s $\color{#d91a1a}-0.63\%$
test_tc_second_layer_tensor 7.7850μs 1.3865μs 721.2640 KOps/s 730.2274 KOps/s $\color{#d91a1a}-1.23\%$
test_tc_second_layer_nontensor 31.4000μs 2.9503μs 338.9480 KOps/s 336.7318 KOps/s $\color{#35bf28}+0.66\%$
test_unbind 0.1922s 9.5556ms 104.6511 Ops/s 92.2043 Ops/s $\textbf{\color{#35bf28}+13.50\%}$
test_full_like 0.6587ms 0.5736ms 1.7434 KOps/s 1.7400 KOps/s $\color{#35bf28}+0.19\%$
test_zeros_like 0.2745ms 0.1980ms 5.0502 KOps/s 5.0509 KOps/s $\color{#d91a1a}-0.01\%$
test_ones_like 0.2339ms 0.1978ms 5.0559 KOps/s 5.0578 KOps/s $\color{#d91a1a}-0.04\%$
test_clone 0.4485ms 0.4148ms 2.4106 KOps/s 2.4105 KOps/s $+0.00\%$
test_squeeze 41.6210μs 10.0469μs 99.5337 KOps/s 93.5048 KOps/s $\textbf{\color{#35bf28}+6.45\%}$
test_unsqueeze 0.2254ms 73.5893μs 13.5889 KOps/s 12.8028 KOps/s $\textbf{\color{#35bf28}+6.14\%}$
test_split 0.4031ms 0.1557ms 6.4244 KOps/s 6.1731 KOps/s $\color{#35bf28}+4.07\%$
test_permute 0.2219ms 0.1815ms 5.5096 KOps/s 5.4868 KOps/s $\color{#35bf28}+0.42\%$
test_stack 1.3316ms 0.8685ms 1.1514 KOps/s 1.1403 KOps/s $\color{#35bf28}+0.97\%$
test_cat 1.2448ms 1.2312ms 812.2327 Ops/s 812.1977 Ops/s $+0.00\%$
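
For readers checking the Change column by hand: the reported figures appear consistent with the relative difference between the Ops column and the Ops on Repo HEAD column. The sketch below is not part of the benchmark tooling; `relative_change` and `to_ops_per_second` are hypothetical helper names, and the formula is inferred from the rows above rather than taken from the benchmark source. Units are normalized first because some rows mix KOps/s and Ops/s.

```python
# Hypothetical helpers (not from the benchmark tooling): reproduce the
# "Change" column from the "Ops" and "Ops on Repo HEAD" columns.

UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops_per_second(value: float, unit: str) -> float:
    """Convert a throughput such as 1.1075 KOps/s to plain Ops/s."""
    return value * UNIT_SCALE[unit]


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD (assumed formula)."""
    return (ops - ops_head) / ops_head * 100.0


# Values copied from three rows of the table above.
rows = {
    "test_distributed": ((5.4003, "KOps/s"), (8.8876, "KOps/s")),
    "test_setitem": ((23.2206, "KOps/s"), (21.7943, "KOps/s")),
    "test_mod_wrap_and_backward[compile-overhead]": (
        (1.1075, "KOps/s"),
        (917.8416, "Ops/s"),
    ),
}

for name, (pr, head) in rows.items():
    change = relative_change(to_ops_per_second(*pr), to_ops_per_second(*head))
    print(f"{name}: {change:+.2f}%")
```

Because the table shows rounded throughputs, a value recomputed this way can differ from the reported one in the last digit.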

@vmoens vmoens added the enhancement New feature or request label Oct 16, 2024
@vmoens vmoens merged commit 1622253 into gh/vmoens/32/base Oct 16, 2024
50 of 55 checks passed
vmoens pushed a commit that referenced this pull request Oct 16, 2024
ghstack-source-id: 95a9f41
Pull Request resolved: #1045
@vmoens vmoens deleted the gh/vmoens/32/head branch October 16, 2024 20:01