[Refactor] Use default device instead of CPU in losses #2687

Open. vmoens wants to merge 3 commits into base: gh/vmoens/65/base

@vmoens (Contributor) commented Jan 10, 2025

[ghstack-poisoned]

pytorch-bot bot commented Jan 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2687

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 14 New Failures, 2 Pending, 2 Unrelated Failures

As of commit 399b618 with merge base ed656a1:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Jan 10, 2025
vmoens added a commit that referenced this pull request Jan 10, 2025
ghstack-source-id: 52a013a04a763bdb8c1c77a43a0984babe32bd77
Pull Request resolved: #2687

github-actions bot commented Jan 10, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}4$.

Name Max Mean Ops Ops on Repo HEAD Change
test_simple 0.5306s 0.4424s 2.2605 Ops/s 2.2498 Ops/s $\color{#35bf28}+0.47\%$
test_transformed 0.7054s 0.6275s 1.5937 Ops/s 1.6045 Ops/s $\color{#d91a1a}-0.68\%$
test_serial 1.4605s 1.3621s 0.7342 Ops/s 0.7341 Ops/s $+0.01\%$
test_parallel 1.2987s 1.2043s 0.8304 Ops/s 0.8209 Ops/s $\color{#35bf28}+1.15\%$
test_step_mdp_speed[True-True-True-True-True] 0.1797ms 30.1480μs 33.1697 KOps/s 32.6609 KOps/s $\color{#35bf28}+1.56\%$
test_step_mdp_speed[True-True-True-True-False] 57.7180μs 17.8041μs 56.1669 KOps/s 56.6727 KOps/s $\color{#d91a1a}-0.89\%$
test_step_mdp_speed[True-True-True-False-True] 70.7950μs 16.9071μs 59.1468 KOps/s 58.5266 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[True-True-True-False-False] 38.7430μs 10.0560μs 99.4432 KOps/s 100.5743 KOps/s $\color{#d91a1a}-1.12\%$
test_step_mdp_speed[True-True-False-True-True] 84.0470μs 32.2010μs 31.0549 KOps/s 30.8567 KOps/s $\color{#35bf28}+0.64\%$
test_step_mdp_speed[True-True-False-True-False] 54.2710μs 19.6683μs 50.8433 KOps/s 50.9987 KOps/s $\color{#d91a1a}-0.30\%$
test_step_mdp_speed[True-True-False-False-True] 56.2450μs 18.8837μs 52.9558 KOps/s 52.8340 KOps/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[True-True-False-False-False] 37.2790μs 11.9027μs 84.0149 KOps/s 83.7804 KOps/s $\color{#35bf28}+0.28\%$
test_step_mdp_speed[True-False-True-True-True] 77.0430μs 34.3208μs 29.1368 KOps/s 29.2041 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[True-False-True-True-False] 49.9030μs 21.7799μs 45.9139 KOps/s 46.6179 KOps/s $\color{#d91a1a}-1.51\%$
test_step_mdp_speed[True-False-True-False-True] 45.4450μs 18.9351μs 52.8119 KOps/s 52.4966 KOps/s $\color{#35bf28}+0.60\%$
test_step_mdp_speed[True-False-True-False-False] 38.4120μs 12.0051μs 83.2981 KOps/s 83.4644 KOps/s $\color{#d91a1a}-0.20\%$
test_step_mdp_speed[True-False-False-True-True] 74.3180μs 35.6019μs 28.0884 KOps/s 27.9015 KOps/s $\color{#35bf28}+0.67\%$
test_step_mdp_speed[True-False-False-True-False] 56.0540μs 23.4665μs 42.6139 KOps/s 42.8229 KOps/s $\color{#d91a1a}-0.49\%$
test_step_mdp_speed[True-False-False-False-True] 53.3700μs 20.5776μs 48.5966 KOps/s 48.1171 KOps/s $\color{#35bf28}+1.00\%$
test_step_mdp_speed[True-False-False-False-False] 41.7980μs 13.6456μs 73.2838 KOps/s 72.9557 KOps/s $\color{#35bf28}+0.45\%$
test_step_mdp_speed[False-True-True-True-True] 76.8630μs 34.2565μs 29.1916 KOps/s 29.2118 KOps/s $\color{#d91a1a}-0.07\%$
test_step_mdp_speed[False-True-True-True-False] 58.2990μs 21.7344μs 46.0101 KOps/s 46.4598 KOps/s $\color{#d91a1a}-0.97\%$
test_step_mdp_speed[False-True-True-False-True] 52.1770μs 21.4790μs 46.5570 KOps/s 46.6690 KOps/s $\color{#d91a1a}-0.24\%$
test_step_mdp_speed[False-True-True-False-False] 41.2670μs 13.1568μs 76.0062 KOps/s 76.0053 KOps/s $+0.00\%$
test_step_mdp_speed[False-True-False-True-True] 98.0230μs 35.6669μs 28.0372 KOps/s 27.8837 KOps/s $\color{#35bf28}+0.55\%$
test_step_mdp_speed[False-True-False-True-False] 68.1770μs 23.3686μs 42.7924 KOps/s 43.0705 KOps/s $\color{#d91a1a}-0.65\%$
test_step_mdp_speed[False-True-False-False-True] 2.4996ms 23.1756μs 43.1487 KOps/s 42.2969 KOps/s $\color{#35bf28}+2.01\%$
test_step_mdp_speed[False-True-False-False-False] 43.5220μs 15.0318μs 66.5257 KOps/s 66.4160 KOps/s $\color{#35bf28}+0.17\%$
test_step_mdp_speed[False-False-True-True-True] 83.5160μs 37.7501μs 26.4900 KOps/s 26.5012 KOps/s $\color{#d91a1a}-0.04\%$
test_step_mdp_speed[False-False-True-True-False] 57.6780μs 25.3278μs 39.4824 KOps/s 39.6839 KOps/s $\color{#d91a1a}-0.51\%$
test_step_mdp_speed[False-False-True-False-True] 64.8210μs 22.9590μs 43.5558 KOps/s 42.8793 KOps/s $\color{#35bf28}+1.58\%$
test_step_mdp_speed[False-False-True-False-False] 41.2970μs 15.0321μs 66.5241 KOps/s 66.8401 KOps/s $\color{#d91a1a}-0.47\%$
test_step_mdp_speed[False-False-False-True-True] 87.3130μs 38.9878μs 25.6491 KOps/s 25.6583 KOps/s $\color{#d91a1a}-0.04\%$
test_step_mdp_speed[False-False-False-True-False] 61.4640μs 26.9796μs 37.0650 KOps/s 37.3186 KOps/s $\color{#d91a1a}-0.68\%$
test_step_mdp_speed[False-False-False-False-True] 58.2080μs 24.7139μs 40.4630 KOps/s 40.3159 KOps/s $\color{#35bf28}+0.36\%$
test_step_mdp_speed[False-False-False-False-False] 69.7170μs 16.6198μs 60.1692 KOps/s 60.0916 KOps/s $\color{#35bf28}+0.13\%$
test_values[generalized_advantage_estimate-True-True] 11.1910ms 9.9621ms 100.3809 Ops/s 102.5281 Ops/s $\color{#d91a1a}-2.09\%$
test_values[vec_generalized_advantage_estimate-True-True] 41.7908ms 33.7932ms 29.5918 Ops/s 29.9345 Ops/s $\color{#d91a1a}-1.14\%$
test_values[td0_return_estimate-False-False] 0.2365ms 0.1747ms 5.7249 KOps/s 5.8554 KOps/s $\color{#d91a1a}-2.23\%$
test_values[td1_return_estimate-False-False] 27.0814ms 24.2163ms 41.2945 Ops/s 41.2623 Ops/s $\color{#35bf28}+0.08\%$
test_values[vec_td1_return_estimate-False-False] 37.0144ms 33.5652ms 29.7928 Ops/s 29.8408 Ops/s $\color{#d91a1a}-0.16\%$
test_values[td_lambda_return_estimate-True-False] 41.0218ms 35.2145ms 28.3974 Ops/s 28.1357 Ops/s $\color{#35bf28}+0.93\%$
test_values[vec_td_lambda_return_estimate-True-False] 42.1603ms 33.8206ms 29.5678 Ops/s 29.6107 Ops/s $\color{#d91a1a}-0.15\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.7963ms 8.5448ms 117.0302 Ops/s 114.1576 Ops/s $\color{#35bf28}+2.52\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.1517ms 1.8671ms 535.5995 Ops/s 518.6288 Ops/s $\color{#35bf28}+3.27\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6040ms 0.3665ms 2.7288 KOps/s 2.7679 KOps/s $\color{#d91a1a}-1.41\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 48.1164ms 41.3940ms 24.1581 Ops/s 26.3774 Ops/s $\textbf{\color{#d91a1a}-8.41\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 4.0244ms 3.0314ms 329.8835 Ops/s 328.8393 Ops/s $\color{#35bf28}+0.32\%$
test_dqn_speed[False-None] 5.8785ms 1.4267ms 700.8947 Ops/s 703.6774 Ops/s $\color{#d91a1a}-0.40\%$
test_dqn_speed[False-backward] 2.0286ms 1.9286ms 518.5028 Ops/s 522.6890 Ops/s $\color{#d91a1a}-0.80\%$
test_dqn_speed[True-None] 0.1597s 0.5795ms 1.7255 KOps/s 2.0429 KOps/s $\textbf{\color{#d91a1a}-15.54\%}$
test_dqn_speed[True-backward] 1.1029ms 0.9637ms 1.0377 KOps/s 1.0782 KOps/s $\color{#d91a1a}-3.75\%$
test_dqn_speed[reduce-overhead-None] 0.6030ms 0.4820ms 2.0746 KOps/s 2.0652 KOps/s $\color{#35bf28}+0.45\%$
test_dqn_speed[reduce-overhead-backward] 1.4984ms 1.0989ms 910.0066 Ops/s 1.0932 KOps/s $\textbf{\color{#d91a1a}-16.76\%}$
test_ddpg_speed[False-None] 3.5792ms 2.9063ms 344.0789 Ops/s 342.4450 Ops/s $\color{#35bf28}+0.48\%$
test_ddpg_speed[False-backward] 5.1755ms 4.1298ms 242.1449 Ops/s 246.7769 Ops/s $\color{#d91a1a}-1.88\%$
test_ddpg_speed[True-None] 1.3122ms 1.0170ms 983.3175 Ops/s 976.0122 Ops/s $\color{#35bf28}+0.75\%$
test_ddpg_speed[True-backward] 2.0923ms 1.9290ms 518.4090 Ops/s 519.9714 Ops/s $\color{#d91a1a}-0.30\%$
test_ddpg_speed[reduce-overhead-None] 1.6944ms 1.0251ms 975.5554 Ops/s 969.1795 Ops/s $\color{#35bf28}+0.66\%$
test_ddpg_speed[reduce-overhead-backward] 2.0606ms 1.9043ms 525.1358 Ops/s 506.3836 Ops/s $\color{#35bf28}+3.70\%$
test_sac_speed[False-None] 8.6470ms 8.1332ms 122.9524 Ops/s 117.3285 Ops/s $\color{#35bf28}+4.79\%$
test_sac_speed[False-backward] 11.4524ms 10.9090ms 91.6676 Ops/s 90.8172 Ops/s $\color{#35bf28}+0.94\%$
test_sac_speed[True-None] 2.1740ms 1.8392ms 543.7058 Ops/s 539.3048 Ops/s $\color{#35bf28}+0.82\%$
test_sac_speed[True-backward] 4.4707ms 3.5914ms 278.4461 Ops/s 261.7017 Ops/s $\textbf{\color{#35bf28}+6.40\%}$
test_sac_speed[reduce-overhead-None] 2.3081ms 1.8516ms 540.0650 Ops/s 536.9291 Ops/s $\color{#35bf28}+0.58\%$
test_sac_speed[reduce-overhead-backward] 3.7381ms 3.5418ms 282.3439 Ops/s 277.0206 Ops/s $\color{#35bf28}+1.92\%$
test_redq_speed[False-None] 14.7163ms 12.9438ms 77.2572 Ops/s 77.1990 Ops/s $\color{#35bf28}+0.08\%$
test_redq_speed[False-backward] 24.2201ms 22.4478ms 44.5478 Ops/s 44.6773 Ops/s $\color{#d91a1a}-0.29\%$
test_redq_speed[True-None] 5.5593ms 4.6889ms 213.2716 Ops/s 210.8362 Ops/s $\color{#35bf28}+1.16\%$
test_redq_speed[True-backward] 12.6464ms 12.0488ms 82.9957 Ops/s 83.1270 Ops/s $\color{#d91a1a}-0.16\%$
test_redq_speed[reduce-overhead-None] 5.3663ms 4.6307ms 215.9500 Ops/s 212.7854 Ops/s $\color{#35bf28}+1.49\%$
test_redq_speed[reduce-overhead-backward] 13.3801ms 12.1823ms 82.0864 Ops/s 82.3967 Ops/s $\color{#d91a1a}-0.38\%$
test_redq_deprec_speed[False-None] 14.3715ms 13.0370ms 76.7046 Ops/s 76.0837 Ops/s $\color{#35bf28}+0.82\%$
test_redq_deprec_speed[False-backward] 22.3973ms 18.9440ms 52.7872 Ops/s 53.0083 Ops/s $\color{#d91a1a}-0.42\%$
test_redq_deprec_speed[True-None] 4.3360ms 3.5898ms 278.5700 Ops/s 274.2815 Ops/s $\color{#35bf28}+1.56\%$
test_redq_deprec_speed[True-backward] 9.1276ms 8.0662ms 123.9736 Ops/s 125.4371 Ops/s $\color{#d91a1a}-1.17\%$
test_redq_deprec_speed[reduce-overhead-None] 4.3974ms 3.6418ms 274.5914 Ops/s 279.0390 Ops/s $\color{#d91a1a}-1.59\%$
test_redq_deprec_speed[reduce-overhead-backward] 9.6019ms 8.0996ms 123.4630 Ops/s 126.5189 Ops/s $\color{#d91a1a}-2.42\%$
test_td3_speed[False-None] 10.8418ms 8.1941ms 122.0396 Ops/s 122.3722 Ops/s $\color{#d91a1a}-0.27\%$
test_td3_speed[False-backward] 12.0703ms 10.6420ms 93.9673 Ops/s 93.7880 Ops/s $\color{#35bf28}+0.19\%$
test_td3_speed[True-None] 1.9911ms 1.7469ms 572.4276 Ops/s 574.3595 Ops/s $\color{#d91a1a}-0.34\%$
test_td3_speed[True-backward] 4.2599ms 3.3954ms 294.5181 Ops/s 294.8850 Ops/s $\color{#d91a1a}-0.12\%$
test_td3_speed[reduce-overhead-None] 2.1825ms 1.7458ms 572.8137 Ops/s 572.1913 Ops/s $\color{#35bf28}+0.11\%$
test_td3_speed[reduce-overhead-backward] 3.4747ms 3.3383ms 299.5519 Ops/s 300.0253 Ops/s $\color{#d91a1a}-0.16\%$
test_cql_speed[False-None] 39.5550ms 36.5648ms 27.3487 Ops/s 27.1927 Ops/s $\color{#35bf28}+0.57\%$
test_cql_speed[False-backward] 49.1373ms 46.4687ms 21.5199 Ops/s 20.9161 Ops/s $\color{#35bf28}+2.89\%$
test_cql_speed[True-None] 18.0198ms 15.6412ms 63.9337 Ops/s 63.6001 Ops/s $\color{#35bf28}+0.52\%$
test_cql_speed[True-backward] 24.6169ms 22.2157ms 45.0133 Ops/s 44.5740 Ops/s $\color{#35bf28}+0.99\%$
test_cql_speed[reduce-overhead-None] 19.2025ms 15.9611ms 62.6522 Ops/s 63.4155 Ops/s $\color{#d91a1a}-1.20\%$
test_cql_speed[reduce-overhead-backward] 25.1812ms 22.2740ms 44.8955 Ops/s 44.1684 Ops/s $\color{#35bf28}+1.65\%$
test_a2c_speed[False-None] 9.8242ms 7.2743ms 137.4694 Ops/s 136.4039 Ops/s $\color{#35bf28}+0.78\%$
test_a2c_speed[False-backward] 15.6005ms 14.5751ms 68.6100 Ops/s 68.6209 Ops/s $\color{#d91a1a}-0.02\%$
test_a2c_speed[True-None] 4.8255ms 4.2218ms 236.8675 Ops/s 235.2513 Ops/s $\color{#35bf28}+0.69\%$
test_a2c_speed[True-backward] 11.5862ms 10.7756ms 92.8019 Ops/s 92.0012 Ops/s $\color{#35bf28}+0.87\%$
test_a2c_speed[reduce-overhead-None] 6.4565ms 4.2647ms 234.4825 Ops/s 235.3481 Ops/s $\color{#d91a1a}-0.37\%$
test_a2c_speed[reduce-overhead-backward] 11.1693ms 10.7300ms 93.1963 Ops/s 92.0910 Ops/s $\color{#35bf28}+1.20\%$
test_ppo_speed[False-None] 9.1705ms 7.4548ms 134.1425 Ops/s 132.8750 Ops/s $\color{#35bf28}+0.95\%$
test_ppo_speed[False-backward] 16.9591ms 14.8960ms 67.1322 Ops/s 67.4527 Ops/s $\color{#d91a1a}-0.48\%$
test_ppo_speed[True-None] 4.0988ms 3.7143ms 269.2273 Ops/s 268.4383 Ops/s $\color{#35bf28}+0.29\%$
test_ppo_speed[True-backward] 10.3746ms 9.6630ms 103.4880 Ops/s 103.1195 Ops/s $\color{#35bf28}+0.36\%$
test_ppo_speed[reduce-overhead-None] 4.5647ms 3.7058ms 269.8475 Ops/s 268.7609 Ops/s $\color{#35bf28}+0.40\%$
test_ppo_speed[reduce-overhead-backward] 9.9868ms 9.5900ms 104.2748 Ops/s 101.8931 Ops/s $\color{#35bf28}+2.34\%$
test_reinforce_speed[False-None] 7.8927ms 6.6019ms 151.4709 Ops/s 150.5846 Ops/s $\color{#35bf28}+0.59\%$
test_reinforce_speed[False-backward] 11.1325ms 9.8344ms 101.6843 Ops/s 101.0116 Ops/s $\color{#35bf28}+0.67\%$
test_reinforce_speed[True-None] 4.9305ms 2.6697ms 374.5781 Ops/s 369.4488 Ops/s $\color{#35bf28}+1.39\%$
test_reinforce_speed[True-backward] 9.0349ms 8.5701ms 116.6851 Ops/s 115.6103 Ops/s $\color{#35bf28}+0.93\%$
test_reinforce_speed[reduce-overhead-None] 3.1177ms 2.6701ms 374.5214 Ops/s 365.8012 Ops/s $\color{#35bf28}+2.38\%$
test_reinforce_speed[reduce-overhead-backward] 8.9877ms 8.5909ms 116.4026 Ops/s 115.2261 Ops/s $\color{#35bf28}+1.02\%$
test_iql_speed[False-None] 34.3480ms 32.3447ms 30.9170 Ops/s 29.9205 Ops/s $\color{#35bf28}+3.33\%$
test_iql_speed[False-backward] 47.9863ms 45.4502ms 22.0021 Ops/s 15.0920 Ops/s $\textbf{\color{#35bf28}+45.79\%}$
test_iql_speed[True-None] 12.0498ms 10.6870ms 93.5713 Ops/s 92.5960 Ops/s $\color{#35bf28}+1.05\%$
test_iql_speed[True-backward] 26.2303ms 21.7493ms 45.9785 Ops/s 46.0030 Ops/s $\color{#d91a1a}-0.05\%$
test_iql_speed[reduce-overhead-None] 11.9423ms 10.7297ms 93.1994 Ops/s 92.8201 Ops/s $\color{#35bf28}+0.41\%$
test_iql_speed[reduce-overhead-backward] 22.6576ms 21.6884ms 46.1075 Ops/s 45.7275 Ops/s $\color{#35bf28}+0.83\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.7507ms 4.9837ms 200.6524 Ops/s 202.2804 Ops/s $\color{#d91a1a}-0.80\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8010ms 0.5172ms 1.9336 KOps/s 1.9085 KOps/s $\color{#35bf28}+1.31\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7547ms 0.4973ms 2.0108 KOps/s 1.9894 KOps/s $\color{#35bf28}+1.08\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 7.5730ms 4.7085ms 212.3821 Ops/s 211.9432 Ops/s $\color{#35bf28}+0.21\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.2483ms 0.5068ms 1.9731 KOps/s 1.9701 KOps/s $\color{#35bf28}+0.15\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7208ms 0.4837ms 2.0674 KOps/s 2.0449 KOps/s $\color{#35bf28}+1.10\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.1008ms 1.6471ms 607.1364 Ops/s 600.2724 Ops/s $\color{#35bf28}+1.14\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.1400ms 1.5669ms 638.2110 Ops/s 626.1464 Ops/s $\color{#35bf28}+1.93\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.9171ms 4.8494ms 206.2119 Ops/s 207.4052 Ops/s $\color{#d91a1a}-0.58\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0577ms 0.6445ms 1.5516 KOps/s 1.5416 KOps/s $\color{#35bf28}+0.65\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9296ms 0.6214ms 1.6091 KOps/s 1.5837 KOps/s $\color{#35bf28}+1.61\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0388ms 4.7906ms 208.7407 Ops/s 211.8221 Ops/s $\color{#d91a1a}-1.45\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9421ms 0.5137ms 1.9467 KOps/s 1.9185 KOps/s $\color{#35bf28}+1.47\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6805ms 0.4926ms 2.0299 KOps/s 1.9769 KOps/s $\color{#35bf28}+2.68\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.3814ms 4.7003ms 212.7519 Ops/s 216.0029 Ops/s $\color{#d91a1a}-1.51\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0812ms 0.5047ms 1.9813 KOps/s 514.9251 Ops/s $\textbf{\color{#35bf28}+284.77\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7011ms 0.4796ms 2.0850 KOps/s 2.0325 KOps/s $\color{#35bf28}+2.58\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 7.8453ms 4.8452ms 206.3877 Ops/s 207.3833 Ops/s $\color{#d91a1a}-0.48\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0041ms 0.6480ms 1.5431 KOps/s 1.5131 KOps/s $\color{#35bf28}+1.98\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.1879ms 0.6236ms 1.6035 KOps/s 1.5898 KOps/s $\color{#35bf28}+0.86\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 5.6374ms 4.1782ms 239.3352 Ops/s 225.8404 Ops/s $\textbf{\color{#35bf28}+5.98\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 8.8292ms 2.3601ms 423.7076 Ops/s 419.6287 Ops/s $\color{#35bf28}+0.97\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.7755ms 1.4163ms 706.0698 Ops/s 691.4880 Ops/s $\color{#35bf28}+2.11\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.3858s 11.7958ms 84.7760 Ops/s 241.5330 Ops/s $\textbf{\color{#d91a1a}-64.90\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 6.7505ms 2.4366ms 410.4000 Ops/s 423.7250 Ops/s $\color{#d91a1a}-3.14\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.5076ms 1.6722ms 598.0080 Ops/s 552.9388 Ops/s $\textbf{\color{#35bf28}+8.15\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 5.9992ms 4.3036ms 232.3642 Ops/s 234.2427 Ops/s $\color{#d91a1a}-0.80\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 7.2939ms 2.4733ms 404.3234 Ops/s 406.5869 Ops/s $\color{#d91a1a}-0.56\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 6.2028ms 1.6369ms 610.8953 Ops/s 611.4505 Ops/s $\color{#d91a1a}-0.09\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.3575ms 13.1411ms 76.0973 Ops/s 72.7778 Ops/s $\color{#35bf28}+4.56\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 16.0483ms 15.1269ms 66.1073 Ops/s 64.5866 Ops/s $\color{#35bf28}+2.35\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 23.0805ms 22.0606ms 45.3298 Ops/s 44.6649 Ops/s $\color{#35bf28}+1.49\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 17.6898ms 15.4135ms 64.8784 Ops/s 64.0454 Ops/s $\color{#35bf28}+1.30\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 24.5644ms 21.9218ms 45.6167 Ops/s 44.9989 Ops/s $\color{#35bf28}+1.37\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 18.2255ms 16.7498ms 59.7021 Ops/s 59.6885 Ops/s $\color{#35bf28}+0.02\%$
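
For readers cross-checking the table above, the columns appear to be related as follows: Ops looks like the reciprocal of Mean, and Change looks like the relative difference between Ops and Ops on Repo HEAD. This is inferred from the reported numbers, not from the benchmark tooling itself, and the helper names below are hypothetical. A minimal Python sketch under those assumptions:

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumption: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD
    (assumption: Change = (Ops - Ops on HEAD) / Ops on HEAD)."""
    return (ops_pr - ops_head) / ops_head * 100.0


# test_simple: Mean 0.4424s, reported Ops 2.2605 Ops/s
print(f"{ops_per_second(0.4424):.4f} Ops/s")

# test_dqn_speed[True-None]: 1.7255 KOps/s vs 2.0429 KOps/s on HEAD,
# both converted to Ops/s before comparing
print(f"{relative_change(1725.5, 2042.9):+.2f}%")
```

Because the table shows rounded values, recomputed figures may differ from the reported Change column in the last digit. Rows that mix units (Ops/s against KOps/s) need converting to a common unit first.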

github-actions bot commented Jan 10, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}21$. Worsened: $\large\color{#d91a1a}6$.

Name Max Mean Ops Ops on Repo HEAD Change
test_simple 0.8099s 0.7152s 1.3982 Ops/s 1.3922 Ops/s $\color{#35bf28}+0.43\%$
test_transformed 0.9543s 0.9401s 1.0637 Ops/s 1.0370 Ops/s $\color{#35bf28}+2.58\%$
test_serial 2.1215s 2.0836s 0.4799 Ops/s 0.4802 Ops/s $\color{#d91a1a}-0.06\%$
test_parallel 1.8604s 1.8216s 0.5490 Ops/s 0.5537 Ops/s $\color{#d91a1a}-0.86\%$
test_step_mdp_speed[True-True-True-True-True] 0.1881ms 38.4183μs 26.0293 KOps/s 26.3094 KOps/s $\color{#d91a1a}-1.06\%$
test_step_mdp_speed[True-True-True-True-False] 54.9330μs 22.5391μs 44.3673 KOps/s 45.1751 KOps/s $\color{#d91a1a}-1.79\%$
test_step_mdp_speed[True-True-True-False-True] 50.7830μs 20.9562μs 47.7186 KOps/s 47.5764 KOps/s $\color{#35bf28}+0.30\%$
test_step_mdp_speed[True-True-True-False-False] 38.6120μs 12.5106μs 79.9319 KOps/s 80.8848 KOps/s $\color{#d91a1a}-1.18\%$
test_step_mdp_speed[True-True-False-True-True] 0.1212ms 40.3842μs 24.7622 KOps/s 24.3968 KOps/s $\color{#35bf28}+1.50\%$
test_step_mdp_speed[True-True-False-True-False] 55.3520μs 24.5134μs 40.7941 KOps/s 40.8907 KOps/s $\color{#d91a1a}-0.24\%$
test_step_mdp_speed[True-True-False-False-True] 58.8230μs 23.3584μs 42.8112 KOps/s 42.3923 KOps/s $\color{#35bf28}+0.99\%$
test_step_mdp_speed[True-True-False-False-False] 44.9330μs 14.7278μs 67.8986 KOps/s 68.3464 KOps/s $\color{#d91a1a}-0.66\%$
test_step_mdp_speed[True-False-True-True-True] 82.3340μs 43.4193μs 23.0312 KOps/s 23.2290 KOps/s $\color{#d91a1a}-0.85\%$
test_step_mdp_speed[True-False-True-True-False] 59.7930μs 26.7671μs 37.3593 KOps/s 36.9763 KOps/s $\color{#35bf28}+1.04\%$
test_step_mdp_speed[True-False-True-False-True] 55.6430μs 23.8098μs 41.9995 KOps/s 42.4060 KOps/s $\color{#d91a1a}-0.96\%$
test_step_mdp_speed[True-False-True-False-False] 48.0620μs 14.6722μs 68.1559 KOps/s 68.0592 KOps/s $\color{#35bf28}+0.14\%$
test_step_mdp_speed[True-False-False-True-True] 77.9230μs 45.2753μs 22.0871 KOps/s 21.9068 KOps/s $\color{#35bf28}+0.82\%$
test_step_mdp_speed[True-False-False-True-False] 59.2630μs 29.3375μs 34.0861 KOps/s 34.5528 KOps/s $\color{#d91a1a}-1.35\%$
test_step_mdp_speed[True-False-False-False-True] 61.0330μs 25.5117μs 39.1976 KOps/s 39.7726 KOps/s $\color{#d91a1a}-1.45\%$
test_step_mdp_speed[True-False-False-False-False] 45.8420μs 16.8123μs 59.4802 KOps/s 59.7061 KOps/s $\color{#d91a1a}-0.38\%$
test_step_mdp_speed[False-True-True-True-True] 73.8530μs 43.1896μs 23.1537 KOps/s 23.1223 KOps/s $\color{#35bf28}+0.14\%$
test_step_mdp_speed[False-True-True-True-False] 52.2430μs 27.1375μs 36.8494 KOps/s 37.8282 KOps/s $\color{#d91a1a}-2.59\%$
test_step_mdp_speed[False-True-True-False-True] 61.6130μs 27.7373μs 36.0525 KOps/s 37.2867 KOps/s $\color{#d91a1a}-3.31\%$
test_step_mdp_speed[False-True-True-False-False] 44.4420μs 16.4371μs 60.8380 KOps/s 61.0610 KOps/s $\color{#d91a1a}-0.37\%$
test_step_mdp_speed[False-True-False-True-True] 80.6130μs 45.7149μs 21.8747 KOps/s 21.9851 KOps/s $\color{#d91a1a}-0.50\%$
test_step_mdp_speed[False-True-False-True-False] 60.3830μs 29.2610μs 34.1752 KOps/s 34.4279 KOps/s $\color{#d91a1a}-0.73\%$
test_step_mdp_speed[False-True-False-False-True] 3.3036ms 29.9950μs 33.3389 KOps/s 33.1202 KOps/s $\color{#35bf28}+0.66\%$
test_step_mdp_speed[False-True-False-False-False] 49.9320μs 18.7479μs 53.3393 KOps/s 53.7889 KOps/s $\color{#d91a1a}-0.84\%$
test_step_mdp_speed[False-False-True-True-True] 88.9950μs 47.4709μs 21.0655 KOps/s 21.1028 KOps/s $\color{#d91a1a}-0.18\%$
test_step_mdp_speed[False-False-True-True-False] 62.3530μs 31.5166μs 31.7293 KOps/s 32.1188 KOps/s $\color{#d91a1a}-1.21\%$
test_step_mdp_speed[False-False-True-False-True] 59.2120μs 29.3558μs 34.0648 KOps/s 34.3226 KOps/s $\color{#d91a1a}-0.75\%$
test_step_mdp_speed[False-False-True-False-False] 50.9920μs 18.7484μs 53.3378 KOps/s 54.1032 KOps/s $\color{#d91a1a}-1.41\%$
test_step_mdp_speed[False-False-False-True-True] 81.5840μs 49.8125μs 20.0753 KOps/s 20.4066 KOps/s $\color{#d91a1a}-1.62\%$
test_step_mdp_speed[False-False-False-True-False] 59.9630μs 33.7423μs 29.6364 KOps/s 29.7426 KOps/s $\color{#d91a1a}-0.36\%$
test_step_mdp_speed[False-False-False-False-True] 68.2230μs 31.1478μs 32.1050 KOps/s 31.2768 KOps/s $\color{#35bf28}+2.65\%$
test_step_mdp_speed[False-False-False-False-False] 62.0130μs 20.6863μs 48.3412 KOps/s 48.5401 KOps/s $\color{#d91a1a}-0.41\%$
test_values[generalized_advantage_estimate-True-True] 24.6157ms 24.1878ms 41.3431 Ops/s 41.5803 Ops/s $\color{#d91a1a}-0.57\%$
test_values[vec_generalized_advantage_estimate-True-True] 96.1099ms 2.8085ms 356.0609 Ops/s 342.2796 Ops/s $\color{#35bf28}+4.03\%$
test_values[td0_return_estimate-False-False] 0.1001ms 76.6345μs 13.0490 KOps/s 12.7685 KOps/s $\color{#35bf28}+2.20\%$
test_values[td1_return_estimate-False-False] 54.1326ms 53.4797ms 18.6987 Ops/s 18.5245 Ops/s $\color{#35bf28}+0.94\%$
test_values[vec_td1_return_estimate-False-False] 1.3598ms 1.0704ms 934.2017 Ops/s 933.5861 Ops/s $\color{#35bf28}+0.07\%$
test_values[td_lambda_return_estimate-True-False] 85.0902ms 84.6944ms 11.8072 Ops/s 11.7243 Ops/s $\color{#35bf28}+0.71\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3380ms 1.0616ms 942.0025 Ops/s 938.7834 Ops/s $\color{#35bf28}+0.34\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 24.1469ms 23.8661ms 41.9005 Ops/s 42.2568 Ops/s $\color{#d91a1a}-0.84\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0707ms 0.7313ms 1.3675 KOps/s 1.3544 KOps/s $\color{#35bf28}+0.96\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7589ms 0.6570ms 1.5220 KOps/s 1.5221 KOps/s $-0.00\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5103ms 1.4599ms 684.9757 Ops/s 683.0893 Ops/s $\color{#35bf28}+0.28\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7173ms 0.6711ms 1.4901 KOps/s 1.4890 KOps/s $\color{#35bf28}+0.07\%$
test_dqn_speed[False-None] 6.9373ms 1.4700ms 680.2581 Ops/s 681.0168 Ops/s $\color{#d91a1a}-0.11\%$
test_dqn_speed[False-backward] 2.1089ms 2.0580ms 485.9164 Ops/s 482.1171 Ops/s $\color{#35bf28}+0.79\%$
test_dqn_speed[True-None] 0.6213ms 0.5374ms 1.8610 KOps/s 1.7989 KOps/s $\color{#35bf28}+3.45\%$
test_dqn_speed[True-backward] 1.1520ms 1.0782ms 927.4800 Ops/s 910.1628 Ops/s $\color{#35bf28}+1.90\%$
test_dqn_speed[reduce-overhead-None] 0.6191ms 0.5536ms 1.8063 KOps/s 1.7231 KOps/s $\color{#35bf28}+4.83\%$
test_dqn_speed[reduce-overhead-backward] 1.0267ms 0.9400ms 1.0639 KOps/s 1.0498 KOps/s $\color{#35bf28}+1.34\%$
test_ddpg_speed[False-None] 3.1153ms 2.7874ms 358.7543 Ops/s 357.1125 Ops/s $\color{#35bf28}+0.46\%$
test_ddpg_speed[False-backward] 4.4283ms 4.0140ms 249.1260 Ops/s 248.6317 Ops/s $\color{#35bf28}+0.20\%$
test_ddpg_speed[True-None] 1.1135ms 1.0553ms 947.5599 Ops/s 934.6058 Ops/s $\color{#35bf28}+1.39\%$
test_ddpg_speed[True-backward] 2.1472ms 2.1044ms 475.2017 Ops/s 468.0585 Ops/s $\color{#35bf28}+1.53\%$
test_ddpg_speed[reduce-overhead-None] 1.2437ms 1.1181ms 894.3825 Ops/s 924.5523 Ops/s $\color{#d91a1a}-3.26\%$
test_ddpg_speed[reduce-overhead-backward] 1.6262ms 1.5919ms 628.1715 Ops/s 615.3804 Ops/s $\color{#35bf28}+2.08\%$
test_sac_speed[False-None] 8.2780ms 7.8650ms 127.1461 Ops/s 127.4961 Ops/s $\color{#d91a1a}-0.27\%$
test_sac_speed[False-backward] 11.4829ms 10.7660ms 92.8847 Ops/s 93.0113 Ops/s $\color{#d91a1a}-0.14\%$
test_sac_speed[True-None] 1.5859ms 1.4971ms 667.9589 Ops/s 658.5958 Ops/s $\color{#35bf28}+1.42\%$
test_sac_speed[True-backward] 3.1796ms 3.1356ms 318.9215 Ops/s 299.4768 Ops/s $\textbf{\color{#35bf28}+6.49\%}$
test_sac_speed[reduce-overhead-None] 22.4133ms 12.4455ms 80.3505 Ops/s 80.0368 Ops/s $\color{#35bf28}+0.39\%$
test_sac_speed[reduce-overhead-backward] 1.3817ms 1.3209ms 757.0351 Ops/s 750.3526 Ops/s $\color{#35bf28}+0.89\%$
test_redq_speed[False-None] 8.1018ms 7.3329ms 136.3721 Ops/s 134.1740 Ops/s $\color{#35bf28}+1.64\%$
test_redq_speed[False-backward] 12.1033ms 11.1011ms 90.0809 Ops/s 88.8068 Ops/s $\color{#35bf28}+1.43\%$
test_redq_speed[True-None] 2.0030ms 1.9330ms 517.3271 Ops/s 510.8450 Ops/s $\color{#35bf28}+1.27\%$
test_redq_speed[True-backward] 3.7369ms 3.5459ms 282.0192 Ops/s 262.7969 Ops/s $\textbf{\color{#35bf28}+7.31\%}$
test_redq_speed[reduce-overhead-None] 2.1308ms 1.9580ms 510.7244 Ops/s 510.5652 Ops/s $\color{#35bf28}+0.03\%$
test_redq_speed[reduce-overhead-backward] 4.0112ms 3.5509ms 281.6156 Ops/s 277.3704 Ops/s $\color{#35bf28}+1.53\%$
test_redq_deprec_speed[False-None] 9.4277ms 8.8721ms 112.7132 Ops/s 110.2068 Ops/s $\color{#35bf28}+2.27\%$
test_redq_deprec_speed[False-backward] 12.2557ms 11.7953ms 84.7792 Ops/s 83.4380 Ops/s $\color{#35bf28}+1.61\%$
test_redq_deprec_speed[True-None] 2.4432ms 2.2798ms 438.6434 Ops/s 435.8419 Ops/s $\color{#35bf28}+0.64\%$
test_redq_deprec_speed[True-backward] 4.3365ms 3.9443ms 253.5307 Ops/s 243.6590 Ops/s $\color{#35bf28}+4.05\%$
test_redq_deprec_speed[reduce-overhead-None] 2.3459ms 2.2767ms 439.2409 Ops/s 434.0132 Ops/s $\color{#35bf28}+1.20\%$
test_redq_deprec_speed[reduce-overhead-backward] 3.9800ms 3.9141ms 255.4881 Ops/s 245.0830 Ops/s $\color{#35bf28}+4.25\%$
test_td3_speed[False-None] 7.8099ms 7.7669ms 128.7522 Ops/s 127.4631 Ops/s $\color{#35bf28}+1.01\%$
test_td3_speed[False-backward] 10.6388ms 10.1262ms 98.7540 Ops/s 96.0811 Ops/s $\color{#35bf28}+2.78\%$
test_td3_speed[True-None] 1.5767ms 1.5527ms 644.0586 Ops/s 625.4516 Ops/s $\color{#35bf28}+2.97\%$
test_td3_speed[True-backward] 3.1006ms 3.0500ms 327.8688 Ops/s 305.4953 Ops/s $\textbf{\color{#35bf28}+7.32\%}$
test_td3_speed[reduce-overhead-None] 55.2740ms 24.9968ms 40.0051 Ops/s 40.2949 Ops/s $\color{#d91a1a}-0.72\%$
test_td3_speed[reduce-overhead-backward] 1.3584ms 1.2873ms 776.8379 Ops/s 689.3220 Ops/s $\textbf{\color{#35bf28}+12.70\%}$
test_cql_speed[False-None] 17.0114ms 16.4078ms 60.9467 Ops/s 60.4738 Ops/s $\color{#35bf28}+0.78\%$
test_cql_speed[False-backward] 21.9382ms 21.4529ms 46.6137 Ops/s 45.4180 Ops/s $\color{#35bf28}+2.63\%$
test_cql_speed[True-None] 3.0266ms 2.8509ms 350.7621 Ops/s 346.0878 Ops/s $\color{#35bf28}+1.35\%$
test_cql_speed[True-backward] 4.9874ms 4.9135ms 203.5195 Ops/s 199.1635 Ops/s $\color{#35bf28}+2.19\%$
test_cql_speed[reduce-overhead-None] 0.3689s 14.6636ms 68.1961 Ops/s 77.3314 Ops/s $\textbf{\color{#d91a1a}-11.81\%}$
test_cql_speed[reduce-overhead-backward] 1.5738ms 1.5004ms 666.4782 Ops/s 589.8128 Ops/s $\textbf{\color{#35bf28}+13.00\%}$
test_a2c_speed[False-None] 3.3327ms 3.1351ms 318.9713 Ops/s 314.8994 Ops/s $\color{#35bf28}+1.29\%$
test_a2c_speed[False-backward] 6.4533ms 5.9923ms 166.8815 Ops/s 156.6318 Ops/s $\textbf{\color{#35bf28}+6.54\%}$
test_a2c_speed[True-None] 1.1006ms 0.9993ms 1.0007 KOps/s 992.2764 Ops/s $\color{#35bf28}+0.85\%$
test_a2c_speed[True-backward] 2.6073ms 2.5450ms 392.9227 Ops/s 391.7975 Ops/s $\color{#35bf28}+0.29\%$
test_a2c_speed[reduce-overhead-None] 20.9295ms 11.3136ms 88.3889 Ops/s 89.3130 Ops/s $\color{#d91a1a}-1.03\%$
test_a2c_speed[reduce-overhead-backward] 1.0470ms 0.9577ms 1.0442 KOps/s 1.0042 KOps/s $\color{#35bf28}+3.98\%$
test_ppo_speed[False-None] 3.7203ms 3.6000ms 277.7806 Ops/s 274.9605 Ops/s $\color{#35bf28}+1.03\%$
test_ppo_speed[False-backward] 7.1018ms 6.6568ms 150.2230 Ops/s 147.7293 Ops/s $\color{#35bf28}+1.69\%$
test_ppo_speed[True-None] 1.0850ms 0.9393ms 1.0646 KOps/s 1.0465 KOps/s $\color{#35bf28}+1.73\%$
test_ppo_speed[True-backward] 2.5835ms 2.4980ms 400.3166 Ops/s 394.1417 Ops/s $\color{#35bf28}+1.57\%$
test_ppo_speed[reduce-overhead-None] 0.5852ms 0.5254ms 1.9033 KOps/s 70.7858 Ops/s $\textbf{\color{#35bf28}+2588.75\%}$
test_ppo_speed[reduce-overhead-backward] 1.0060ms 0.9566ms 1.0454 KOps/s 992.3505 Ops/s $\textbf{\color{#35bf28}+5.35\%}$
test_reinforce_speed[False-None] 2.3605ms 2.2241ms 449.6242 Ops/s 448.2040 Ops/s $\color{#35bf28}+0.32\%$
test_reinforce_speed[False-backward] 3.6658ms 3.2331ms 309.3015 Ops/s 309.4993 Ops/s $\color{#d91a1a}-0.06\%$
test_reinforce_speed[True-None] 0.9397ms 0.8295ms 1.2055 KOps/s 1.1569 KOps/s $\color{#35bf28}+4.21\%$
test_reinforce_speed[True-backward] 2.3993ms 2.3532ms 424.9452 Ops/s 414.0171 Ops/s $\color{#35bf28}+2.64\%$
test_reinforce_speed[reduce-overhead-None] 0.2925s 11.8413ms 84.4499 Ops/s 90.6692 Ops/s $\textbf{\color{#d91a1a}-6.86\%}$
test_reinforce_speed[reduce-overhead-backward] 1.1260ms 1.0192ms 981.1859 Ops/s 890.8744 Ops/s $\textbf{\color{#35bf28}+10.14\%}$
test_iql_speed[False-None] 9.7157ms 9.2523ms 108.0815 Ops/s 108.6788 Ops/s $\color{#d91a1a}-0.55\%$
test_iql_speed[False-backward] 13.7945ms 12.8187ms 78.0111 Ops/s 76.6429 Ops/s $\color{#35bf28}+1.79\%$
test_iql_speed[True-None] 2.0072ms 1.8038ms 554.3919 Ops/s 571.8593 Ops/s $\color{#d91a1a}-3.05\%$
test_iql_speed[True-backward] 4.2324ms 4.1199ms 242.7272 Ops/s 236.7248 Ops/s $\color{#35bf28}+2.54\%$
test_iql_speed[reduce-overhead-None] 19.2833ms 11.1027ms 90.0684 Ops/s 70.3226 Ops/s $\textbf{\color{#35bf28}+28.08\%}$
test_iql_speed[reduce-overhead-backward] 1.4766ms 1.4014ms 713.5826 Ops/s 700.8322 Ops/s $\color{#35bf28}+1.82\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.8103ms 6.2572ms 159.8160 Ops/s 157.9212 Ops/s $\color{#35bf28}+1.20\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.5323ms 0.3391ms 2.9489 KOps/s 3.0247 KOps/s $\color{#d91a1a}-2.51\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5255ms 0.3231ms 3.0953 KOps/s 3.9250 KOps/s $\textbf{\color{#d91a1a}-21.14\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2676ms 6.0231ms 166.0282 Ops/s 165.1683 Ops/s $\color{#35bf28}+0.52\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9056ms 0.2962ms 3.3758 KOps/s 3.1957 KOps/s $\textbf{\color{#35bf28}+5.63\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5224ms 0.2922ms 3.4226 KOps/s 3.4728 KOps/s $\color{#d91a1a}-1.45\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5047ms 1.2646ms 790.7929 Ops/s 709.9252 Ops/s $\textbf{\color{#35bf28}+11.39\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4835ms 1.1834ms 845.0327 Ops/s 771.9917 Ops/s $\textbf{\color{#35bf28}+9.46\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2823ms 6.1600ms 162.3383 Ops/s 161.1858 Ops/s $\color{#35bf28}+0.72\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.7261ms 0.4194ms 2.3844 KOps/s 2.2666 KOps/s $\textbf{\color{#35bf28}+5.20\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6234ms 0.3943ms 2.5359 KOps/s 2.3071 KOps/s $\textbf{\color{#35bf28}+9.92\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1088ms 5.9905ms 166.9302 Ops/s 165.2002 Ops/s $\color{#35bf28}+1.05\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7697ms 0.3568ms 2.8026 KOps/s 3.1210 KOps/s $\textbf{\color{#d91a1a}-10.20\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4923ms 0.2974ms 3.3619 KOps/s 2.9675 KOps/s $\textbf{\color{#35bf28}+13.29\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1734ms 5.9421ms 168.2920 Ops/s 166.3614 Ops/s $\color{#35bf28}+1.16\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0873ms 0.2658ms 3.7622 KOps/s 3.1168 KOps/s $\textbf{\color{#35bf28}+20.71\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4507ms 0.2807ms 3.5629 KOps/s 3.6383 KOps/s $\color{#d91a1a}-2.07\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.3206ms 6.1403ms 162.8593 Ops/s 161.4966 Ops/s $\color{#35bf28}+0.84\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7000ms 0.4130ms 2.4211 KOps/s 2.2277 KOps/s $\textbf{\color{#35bf28}+8.68\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.5944ms 0.3918ms 2.5525 KOps/s 2.3652 KOps/s $\textbf{\color{#35bf28}+7.92\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 9.4790ms 5.8732ms 170.2649 Ops/s 186.6908 Ops/s $\textbf{\color{#d91a1a}-8.80\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 8.6575ms 2.2812ms 438.3748 Ops/s 444.7347 Ops/s $\color{#d91a1a}-1.43\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.8796ms 1.2310ms 812.3406 Ops/s 774.1616 Ops/s $\color{#35bf28}+4.93\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.2753ms 5.3108ms 188.2961 Ops/s 185.5597 Ops/s $\color{#35bf28}+1.47\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.3461ms 1.9946ms 501.3591 Ops/s 432.4691 Ops/s $\textbf{\color{#35bf28}+15.93\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.2957ms 1.2925ms 773.7123 Ops/s 825.9422 Ops/s $\textbf{\color{#d91a1a}-6.32\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.4872s 15.2429ms 65.6043 Ops/s 32.7960 Ops/s $\textbf{\color{#35bf28}+100.04\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 11.1780ms 2.3015ms 434.5059 Ops/s 435.6053 Ops/s $\color{#d91a1a}-0.25\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 6.8973ms 1.4242ms 702.1365 Ops/s 722.1334 Ops/s $\color{#d91a1a}-2.77\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 15.6934ms 15.0022ms 66.6568 Ops/s 66.0807 Ops/s $\color{#35bf28}+0.87\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.0454ms 17.3537ms 57.6246 Ops/s 58.6743 Ops/s $\color{#d91a1a}-1.79\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 19.3923ms 19.1025ms 52.3493 Ops/s 50.0971 Ops/s $\color{#35bf28}+4.50\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.7950ms 17.4067ms 57.4491 Ops/s 57.8693 Ops/s $\color{#d91a1a}-0.73\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 19.8807ms 19.0639ms 52.4552 Ops/s 51.3242 Ops/s $\color{#35bf28}+2.20\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.2474ms 18.8808ms 52.9638 Ops/s 52.9624 Ops/s $+0.00\%$
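
The Change column above appears to be the relative difference between the "Ops" and "Ops on Repo HEAD" columns. The sketch below recomputes it from the printed strings. It assumes only the two units that appear in this table (`Ops/s` and `KOps/s`). The helper names `parse_ops` and `percent_change` are hypothetical and are not part of the benchmark tooling.

```python
# Hypothetical helpers for sanity-checking the "Change" column; not part of
# the benchmark bot or of torchrl.

# Units observed in the table above, expressed in operations per second.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(text: str) -> float:
    """Convert a throughput string such as '1.9033 KOps/s' to ops per second."""
    value, unit = text.split()
    return float(value) * _UNIT_SCALE[unit]


def percent_change(ops: str, ops_head: str) -> float:
    """Relative change of `ops` with respect to `ops_head`, in percent."""
    new = parse_ops(ops)
    base = parse_ops(ops_head)
    return (new - base) / base * 100.0


if __name__ == "__main__":
    # Values copied from the test_cql_speed[reduce-overhead-None] row.
    print(f"{percent_change('68.1961 Ops/s', '77.3314 Ops/s'):+.2f}%")
```

Because the table prints rounded throughputs, a recomputed value can differ from the reported one in the last digits. The effect is most visible on rows that mix units, such as `test_ppo_speed[reduce-overhead-None]`.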

vmoens added a commit that referenced this pull request Jan 10, 2025
ghstack-source-id: dfcb987806f7dfc4d1d9a1ef6a5161a35284fdf0
Pull Request resolved: #2687
@vmoens vmoens added the Refactoring Refactoring of an existing feature label Jan 10, 2025