SVE Backend #842

Open: jeremylt wants to merge 1 commit into main

jeremylt
Member

No description provided.

@jeremylt
Member Author

jeremylt commented Nov 12, 2021

This now compiles and passes the t3 tests on Ookami.

ToDo:

  • Performance comparisons
  • Makefile flag fix (not sure how the AVX one even works)
  • Improved vectorization instructions?
  • Add SVE backend to README
  • Do we want manual unrolling for opt? The unrolling is pretty straightforward.

@LeilaGhaffari
Member

I remember from Ookami's talk that performance with GCC was the worst among all compilers. I did a brief experiment before losing access to Ookami:

  • sve/blocked with armclang: DoFs/Sec in CG: 1.43289 (1.43289) million
  • opt/blocked with gcc: DoFs/Sec in CG: 0.729637 (0.729637) million

I am not sure why these numbers are so small compared to what we had on Friday, but I think the poor performance with sve might partly be down to the compiler. I have to apply for an account to run more experiments, though.

@jedbrown
Member

jedbrown commented Sep 6, 2022

Is this ready for review? Should we include it in v0.11?

@jeremylt
Member Author

jeremylt commented Sep 6, 2022

The two big todos are fixing the makefile magic and seeing whether this actually performs any differently from OPT.

jeremylt force-pushed the jeremy/sve branch 3 times, most recently from d3cd77e to e57bab0 (September 12, 2022 15:53)
jeremylt marked this pull request as ready for review (September 12, 2022 15:54)
@jedbrown
Member

I noticed that libxsmm contains aarch64/SVE code and it's announced as supported for the next release.

@jedbrown
Member

Do we have a place where we can measure performance? There is a machine at Sandia that you can access if you put in a Sarape request, and AWS c7g also has SVE. JLSE also has a system that I could try requesting.

jeremylt force-pushed the jeremy/sve branch 2 times, most recently from 2ddda2a to e9e9d45 (March 13, 2023 17:14)
jeremylt force-pushed the jeremy/sve branch 5 times, most recently from 3e3b319 to c51269f (April 27, 2023 16:25)
jeremylt force-pushed the jeremy/sve branch 3 times, most recently from d35b987 to 7d887ac (February 13, 2024 17:14)
jeremylt force-pushed the jeremy/sve branch 2 times, most recently from 3475cec to a112875 (September 27, 2024 22:26)