Commit
Showing 4 changed files with 60 additions and 62 deletions.
bdda313
@JuliaRegistrator register
bdda313
Registration pull request created: JuliaRegistries/General/115386
Tip: Release Notes
Did you know you can add release notes too? Just add Markdown-formatted text underneath the comment after the text "Release notes:" and it will be added to the registry PR. If TagBot is installed, it will also be added to the release that TagBot creates, i.e.
To add them here just re-invoke and the PR will be updated.
Tagging
After the above pull request is merged, it is recommended that a tag be created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:
bdda313
Lux Benchmarks
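Each entry below lists a benchmark name, two timings in nanoseconds, and a ratio. The comment does not say which timing is the current run and which is the baseline. The ratio appears to be the first timing divided by the second, so a value below 1 would mean the first run was faster. The following is a minimal sketch that checks this reading against three entries copied from the list. The names `first_ns` and `second_ns` are placeholders, not labels from the benchmark tool.

```python
# Entries copied from the benchmark list:
# (name, first timing in ns, second timing in ns, reported ratio).
ROWS = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)", 243792, 322667, 0.76),
    ("Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s)", 16356854.5, 13630167, 1.20),
    ("Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s)", 1283084, 2614271, 0.49),
]


def ratio(first_ns, second_ns):
    """Assumed definition: first timing divided by second, rounded to two decimals."""
    return round(first_ns / second_ns, 2)


def matches_reported(first_ns, second_ns, reported):
    """True when the assumed definition reproduces the ratio shown in the list."""
    return abs(ratio(first_ns, second_ns) - reported) < 1e-9


for name, first_ns, second_ns, reported in ROWS:
    status = "ok" if matches_reported(first_ns, second_ns, reported) else "mismatch"
    print(f"{name}: computed {ratio(first_ns, second_ns):.2f}, reported {reported:.2f} ({status})")
```

Under this reading, ratios far from 1.00 on the multi-threaded CPU entries are the ones worth a second look. Timing noise alone may explain some of them.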
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 415000 ns | 412937.5 ns | 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 243792 ns | 322667 ns | 0.76
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 244000 ns | 323104.5 ns | 0.76
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 739583 ns | 739750 ns | 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43539.5 ns | 43577 ns | 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1274291 ns | 1320395.5 ns | 0.97
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 1231896 ns | 2436708 ns | 0.51
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 16356854.5 ns | 13630167 ns | 1.20
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2269041 ns | 2195250 ns | 1.03
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 203819.5 ns | 203168.5 ns | 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1367999.5 ns | 1394666 ns | 0.98
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 1283084 ns | 2614271 ns | 0.49
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 16339834 ns | 13809542 ns | 1.18
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2229354.5 ns | 2256125 ns | 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1774104 ns | 1655084 ns | 1.07
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1088209 ns | 1103916 ns | 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1549542 ns | 1549791 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3006521.5 ns | 2999729.5 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 206533 ns | 207221 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12172854.5 ns | 12143833.5 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8832479 ns | 8785708 ns | 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9239625.5 ns | 9239167 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18613312.5 ns | 18591208 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1489717 ns | 1485675 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17302083.5 ns | 17317333 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14002833 ns | 13967208 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14486750 ns | 14514354 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21734667 ns | 21818416 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250085250 ns | 250042270.5 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148902834 ns | 148555750 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116494728.5 ns | 115889000 ns | 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447939292 ns | 447187584 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5485971 ns | 5452362 ns | 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1224661000 ns | 1224923291 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 934365500 ns | 928030208 ns | 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 833255520.5 ns | 825911895.5 ns | 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1629736875 ns | 1633435667 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31255272 ns | 31214910.5 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1147235708 ns | 1134846000 ns | 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1001812750 ns | 982157791.5 ns | 1.02
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1307261375 ns | 1328335541.5 ns | 0.98
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1734762104 ns | 1734630541.5 ns | 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1129937.5 ns | 1097854 ns | 1.03
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1633541.5 ns | 1625083.5 ns | 1.01
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3618417 ns | 3841334 ns | 0.94
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 781771 ns | 778042 ns | 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA | 261687.5 ns | 263538 ns | 0.99
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2989729.5 ns | 2979417 ns | 1.00
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4144709 ns | 4119104.5 ns | 1.01
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 9495937 ns | 11207896 ns | 0.85
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3189125 ns | 3132750 ns | 1.02
lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1088544 ns | 1091322.5 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2337041.5 ns | 2334729 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1336875 ns | 1437000 ns | 0.93
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1558583 ns | 1665458.5 ns | 0.94
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4218770.5 ns | 4198334 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 207631 ns | 207913 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19455000 ns | 19383125 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16101833.5 ns | 16092916.5 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17388416 ns | 17269063 ns | 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25913875 ns | 25856687.5 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1590005 ns | 1585334 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 34246750 ns | 34322667 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 31092604 ns | 30864666.5 ns | 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31249500 ns | 31132250 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 36490792 ns | 36963875 ns | 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4538334 ns | 4524500 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2564249.5 ns | 2779000 ns | 0.92
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2673688 ns | 2902854 ns | 0.92
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8398125 ns | 8387541.5 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 422799 ns | 420101 ns | 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 39119187 ns | 38904229 ns | 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32288959 ns | 32105979 ns | 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32519209 ns | 32346959 ns | 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 51939791 ns | 51945541 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2619605 ns | 2624775 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 89409416 ns | 88746333.5 ns | 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 115467459 ns | 114006959 ns | 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 221528542 ns | 224259542 ns | 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74507479 ns | 74608375 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 268539958 ns | 267333208 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 156524479 ns | 159214292 ns | 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 123385041.5 ns | 126745542 ns | 0.97
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 486000125 ns | 487494166 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 6937063 ns | 7012704 ns | 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1476618521 ns | 1472344083.5 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1175367083 ns | 1138687375 ns | 1.03
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1067186396 ns | 1071038854 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2004426208.5 ns | 2002947479.5 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34818111 ns | 34854968.5 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1724168958 ns | 1712616292 ns | 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1536515396 ns | 1536070562.5 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1869553708 ns | 1863636167 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2206120083 ns | 2213962958 ns | 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2081499.5 ns | 2080042 ns | 1.00
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 3049542 ns | 2936917 ns | 1.04
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 8519417 ns | 8042334 ns | 1.06
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2503499.5 ns | 2431104 ns | 1.03
lenet(28, 28, 1, 128)/forward/GPU/CUDA | 264802 ns | 278095 ns | 0.95
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9682063 ns | 9677209 ns | 1.00
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 12056041 ns | 12036500 ns | 1.00
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 24527250 ns | 24751583.5 ns | 0.99
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11752062.5 ns | 11606292 ns | 1.01
lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1163895 ns | 1201527 ns | 0.97
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 384081437.5 ns | 379827708 ns | 1.01
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 312823416.5 ns | 286677271 ns | 1.09
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 256685166 ns | 240261834 ns | 1.07
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 452363520.5 ns | 451256520.5 ns | 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4828920.5 ns | 4858918 ns | 0.99
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1156564042 ns | 1157780125 ns | 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 943789584 ns | 905233917 ns | 1.04
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 959368416 ns | 987524666 ns | 0.97
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1405462958 ns | 1579543625 ns | 0.89
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 19179831 ns | 17849892 ns | 1.07
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1041166.5 ns | 1058583.5 ns | 0.98
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 1660937.5 ns | 1671187.5 ns | 0.99
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 6773750 ns | 5011708 ns | 1.35
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1302396 ns | 1300437.5 ns | 1.00
lenet(28, 28, 1, 64)/forward/GPU/CUDA | 265345 ns | 274747.5 ns | 0.97
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6515979.5 ns | 6254041 ns | 1.04
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 13193417 ns | 13149791.5 ns | 1.00
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 19482917 ns | 18860833 ns | 1.03
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6040709 ns | 5852333 ns | 1.03
lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1201565 ns | 1238555 ns | 0.97
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70558729.5 ns | 70498000 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43775917 ns | 43638250 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39575708 ns | 39557666 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132776896 ns | 132574187 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1873567 ns | 1944256 ns | 0.96
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 356452895.5 ns | 356301646 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 271260020.5 ns | 269549208 ns | 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 254204208 ns | 253732875 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 534878271 ns | 534920187.5 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12315083 ns | 12320196.5 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 399335750 ns | 395172666 ns | 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 382828229 ns | 377158375 ns | 1.02
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 696801291.5 ns | 657754625 ns | 1.06
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 711815250 ns | 709829333 ns | 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1196786667 ns | 1189792833 ns | 1.01
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 840316417 ns | 691561166.5 ns | 1.22
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 646428312 ns | 626986833 ns | 1.03
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1770755354.5 ns | 1860884792 ns | 0.95
vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12304606 ns | 12309151 ns | 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3647916604 ns | 3633655916 ns | 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2830461833 ns | 2828990458 ns | 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2740115583 ns | 2702591209 ns | 1.01
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5008028167 ns | 5056811500 ns | 0.99
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49712678.5 ns | 49201169.5 ns | 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3430083 ns | 3425625 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2069729.5 ns | 2072958.5 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2510750 ns | 2525541 ns | 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6046916.5 ns | 6028666.5 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 339536 ns | 322034 ns | 1.05
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 26138042 ns | 25910625 ns | 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18987625 ns | 18853584 ns | 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19921353.5 ns | 19458875 ns | 1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39376375 ns | 39298645.5 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2472544 ns | 2474706.5 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 54782125 ns | 54292500 ns | 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 82825584 ns | 81331292 ns | 1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 176027917 ns | 170565562 ns | 1.03
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45641958 ns | 45567333 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1792417 ns | 1782916 ns | 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1107270.5 ns | 1103709 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1575042 ns | 1548917 ns | 1.02
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3035584 ns | 3027375 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 210834 ns | 210691.5 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12551417 ns | 12525854 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9225583 ns | 9206541.5 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9692084 ns | 9628792 ns | 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 19021208.5 ns | 19005604.5 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1545625 ns | 1537547 ns | 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17641833.5 ns | 17655854 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14341584 ns | 14331645.5 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14582062.5 ns | 14600583 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22215521 ns | 22163250 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70599125 ns | 70499459 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43507917 ns | 43573833 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39557208 ns | 39479542 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132632458.5 ns | 132481104.5 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1903889.5 ns | 1867593 ns | 1.02
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 360661853.5 ns | 360531229 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 349138333 ns | 345233354 ns | 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 306947875 ns | 303345083 ns | 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 723089875 ns | 722647875 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13383173 ns | 13388759.5 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 424456604.5 ns | 418893124.5 ns | 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 427863792 ns | 418550083 ns | 1.02
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 785018416.5 ns | 733622021 ns | 1.07
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 715835208 ns | 714074250 ns | 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1596583.5 ns | 1662791 ns | 0.96
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1159458 ns | 1326395.5 ns | 0.87
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1148708 ns | 1266458.5 ns | 0.91
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2311250 ns | 2293875 ns | 1.01
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 580091 ns | 584223 ns | 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8847333 ns | 8911021 ns | 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 13545750 ns | 12871250 ns | 1.05
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 33507812.5 ns | 31057917 ns | 1.08
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9857583 ns | 9825729.5 ns | 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1429244 ns | 1434469 ns | 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 16580916.5 ns | 16503167 ns | 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 23509292 ns | 20919875 ns | 1.12
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 49483041 ns | 44942437.5 ns | 1.10
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 13148812.5 ns | 13103167 ns | 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 838271 ns | 789458 ns | 1.06
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 628167 ns | 538437.5 ns | 1.17
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1073104 ns | 1024041.5 ns | 1.05
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 724833.5 ns | 725041 ns | 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47723 ns | 47144.5 ns | 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1518875 ns | 1463416 ns | 1.04
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1003959 ns | 1040312 ns | 0.97
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1878958 ns | 1411187.5 ns | 1.33
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2254542 ns | 2257916 ns | 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 235472.5 ns | 234270.5 ns | 1.01
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1574750 ns | 1530583 ns | 1.03
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1064562 ns | 1024209 ns | 1.04
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 2010625 ns | 1524333 ns | 1.32
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2260979.5 ns | 2201771 ns | 1.03
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3405459 ns | 3406354 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2065167 ns | 2052854.5 ns | 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2509625 ns | 2507959 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6033354.5 ns | 6001333 ns | 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 282898 ns | 287105.5 ns | 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24076541 ns | 24055375 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17229771 ns | 17211646 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17134667 ns | 17114333 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37637875 ns | 37572333 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2400944 ns | 2401548.5 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 52973250 ns | 52614646 ns | 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 83207042 ns | 82221646 ns | 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 170182021 ns | 169582250 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44655333 ns | 44570125 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250702791 ns | 250290791.5 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148728229.5 ns | 148276667 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116178542 ns | 115710770.5 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447727458.5 ns | 447663770.5 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5450463 ns | 5443484 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1105083958 ns | 1105632500 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 856949208.5 ns | 854893979 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 828942146 ns | 827018271 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1750357541 ns | 1767047166 ns | 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 28967174 ns | 28898282.5 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1027883729.5 ns | 1021345729.5 ns | 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 982128875 ns | 974787791 ns | 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1326112792 ns | 1329964041.5 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1722634604 ns | 1731428604.5 ns | 0.99
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1231729.5 ns | 1243062.5 ns | 0.99
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 782291 ns | 955375 ns | 0.82
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 778459 ns | 906875 ns | 0.86
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2020000 ns | 2048500 ns | 0.99
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 562638 ns | 563206.5 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5893271 ns | 5919958 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 9134459 ns | 6419604 ns | 1.42
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 26527458 ns | 23873812 ns | 1.11
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7115333 ns | 7097854.5 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1395487 ns | 1364575 ns | 1.02
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 9708334 ns | 9591542 ns | 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 16006479.5 ns | 13052166.5 ns | 1.23
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 34574625 ns | 31360875 ns | 1.10
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 7611479.5 ns | 7260167 ns | 1.05
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 529958.5 ns | 481625 ns | 1.10
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 478041 ns | 443500 ns | 1.08
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 2960583.5 ns | 1999500 ns | 1.48
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 88458 ns | 87833 ns | 1.01
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 27957 ns | 27760 ns | 1.01
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 381541 ns | 377333 ns | 1.01
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 438792 ns | 439500 ns | 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4910417 ns | 4505250 ns | 1.09
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 258791 ns | 258291 ns | 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 223833.5 ns | 219213 ns | 1.02
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 413542 ns | 408166.5 ns | 1.01
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 469459 ns | 470125 ns | 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 4892312.5 ns | 4495000 ns | 1.09
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 271125 ns | 271000 ns | 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 476333 ns | 427792 ns | 1.11
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 419750 ns | 376896 ns | 1.11
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 773166 ns | 733562.5 ns | 1.05
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 54583 ns | 52417 ns | 1.04
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28075 ns | 28102 ns | 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 341146 ns | 336500 ns | 1.01
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 333854 ns | 333854 ns | 1
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 490125 ns | 419229.5 ns | 1.17
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151750 ns | 151562.5 ns | 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 210240.5 ns | 204281.5 ns | 1.03
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 355771 ns | 351375 ns | 1.01
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 347438 ns | 348375 ns | 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 497291 ns | 899625 ns | 0.55
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 150959 ns | 150667 ns | 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 605471333 ns | 603094958 ns | 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 434427292 ns | 428615062.5 ns | 1.01
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 396215292 ns | 384740459 ns | 1.03
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 871673167 ns | 873854208.5 ns | 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7029497 ns | 7027277 ns | 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 2010340292 ns | 2002711146 ns | 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1626784792 ns | 1606403375 ns | 1.01
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1578107124.5 ns | 1551092146 ns | 1.02
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2633522708 ns | 2631708917 ns | 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 26046548 ns | 26123824 ns | 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 524125 ns | 521396 ns | 1.01
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 393083 ns | 431875 ns | 0.91
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 2705354.5 ns | 1926416 ns | 1.40
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 880750 ns | 866417 ns | 1.02
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47587 ns | 47024 ns | 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1872979 ns | 1855270.5 ns | 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1766354.5 ns | 2793583 ns | 0.63
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 16551583.5 ns | 14609250 ns | 1.13
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2708895.5 ns | 2648521 ns | 1.02
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 251708.5 ns | 246347 ns | 1.02
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 1951396 ns | 1974875 ns | 0.99
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 1841000.5 ns | 5038917 ns | 0.37
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 16533458 ns | 15177854.5 ns | 1.09
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 2782374.5 ns | 2744270.5 ns | 1.01
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1493542 ns | 1512729 ns | 0.99
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 932833 ns | 1178292 ns | 0.79
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1063125 ns | 1180084 ns | 0.90
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2333792 ns | 2300375 ns | 1.01
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 588210.5 ns | 589242.5 ns | 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5938333.5 ns | 5245791 ns | 1.13
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 8498791.5 ns | 4733604 ns | 1.80
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 26209458 ns | 24184833 ns | 1.08
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7349083.5 ns | 7316583 ns | 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1387138.5 ns | 1392514 ns | 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 11708833 ns | 11607209 ns | 1.01
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 17834979 ns | 16305271 ns | 1.09
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 38988875 ns | 35977250 ns | 1.08
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 9549499.5 ns | 9550875 ns | 1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2209 ns | 2333 ns | 0.95
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2250 ns | 2542 ns | 0.89
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3584 ns | 3083 ns | 1.16
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2458.5 ns | 2458 ns | 1.00
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 25040 ns | 25059 ns | 1.00
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7250 ns | 7417 ns | 0.98
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7020.5 ns | 7042 ns | 1.00
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7375 ns | 7209 ns | 1.02
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7000 ns | 7333 ns | 0.95
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 215926 ns | 214253 ns | 1.01
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8542 ns | 8333 ns | 1.03
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8209 ns | 8083 ns | 1.02
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8417 ns | 8437.5 ns | 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 5958 ns | 6125 ns | 0.97
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10958 ns | 10667 ns | 1.03 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 12959 ns | 13791 ns | 0.94 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10395.5 ns | 11208 ns | 0.93 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7333 ns | 7375 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 25049.5 ns | 25157 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 20083 ns | 20062.5 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 19833 ns | 19833 ns | 1 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 20250 ns | 20041 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 19979.5 ns | 20000 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 235054 ns | 235669 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 23541 ns | 23562.5 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 23500 ns | 23417 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 23792 ns | 23625 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 21375 ns | 21458 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28833.5 ns | 28584 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 29084 ns | 28916 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28375 ns | 29417 ns | 0.96 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46208 ns | 46000 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 26563 ns | 26406 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 229666.5 ns | 221667 ns | 1.04 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 272334 ns | 278604.5 ns | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4450292 ns | 4081750 ns | 1.09 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 146042 ns | 145833 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 207954 ns | 208494.5 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 247416 ns | 237333 ns | 1.04 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 290292 ns | 295625 ns | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 4062333.5 ns | 4027625 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 145604 ns | 145875 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1875 ns | 2083.5 ns | 0.90 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 2042 ns | 1917 ns | 1.07 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2833 ns | 2458 ns | 1.15 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1917 ns | 1791 ns | 1.07 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23298 ns | 23206 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5291 ns | 5333 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5209 ns | 5167 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5375 ns | 5375 ns | 1 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5250 ns | 5250 ns | 1 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 240994.5 ns | 238219 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 7625 ns | 7292 ns | 1.05 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 7500 ns | 7291 ns | 1.03 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 7583 ns | 7542 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 5104.5 ns | 5333 ns | 0.96 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 80254791.5 ns | 79904000 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 48077979 ns | 49166750 ns | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 43208687.5 ns | 44974542 ns | 0.96 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151606750 ns | 151504667 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2668752 ns | 2718498 ns | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 608153250 ns | 496218625 ns | 1.23 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 416713916 ns | 410097125 ns | 1.02 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 401894583 ns | 397607667 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 688243625 ns | 684031750 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14712224 ns | 14583158 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 717290334 ns | 709703166.5 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 671671750 ns | 675407250 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 1014211208 ns | 1001028958 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 1000127792 ns | 995697250 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.