WeeklyTelcon_20220614
- Dialup Info: (Do not post to public mailing list or public wiki)
- Akshay Venkatesh (NVIDIA)
- Austen Lauria (IBM)
- Brendan Cunningham (Cornelis Networks)
- Brian Barrett (AWS)
- Christoph Niethammer (HLRS)
- Edgar Gabriel (UoH)
- Geoffrey Paulsen (IBM)
- Hessam Mirsadeghi (UCX/nVidia)
- Joseph Schuchart
- Josh Fisher (Cornelis Networks)
- Josh Hursey (IBM)
- Matthew Dosanjh (Sandia)
- Todd Kordenbrock (Sandia)
- Tommy Janjusic (nVidia)
- William Zhang (AWS)
- Artem Polyakov (nVidia)
- Aurelien Bouteiller (UTK)
- Brandon Yates (Intel)
- Charles Shereda (LLNL)
- David Bernhold (ORNL)
- Erik Zeiske
- Geoffroy Vallee (ARM)
- George Bosilca (UTK)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Joshua Ladd (nVidia)
- Marisa Roman (Cornelis Networks)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Michael Heinz (Cornelis Networks)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Raghu Raja (AWS)
- Ralph Castain (Intel)
- Sam Gutierrez (LLNL)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Thomas Naughton (ORNL)
- Xin Zhao (nVidia)
- v4.1.5
- Schedule: targeting ~6 months out (Nov 1)
- No driver on schedule yet.
-
Updated PMIx and PRRTE submodule pointers.
- Issue 10437 - We hope this is resolved by updated pointers.
- Austen couldn't reproduce, can anyone give confirmation that this is resolved?
-
Issue 10468 - Doc to-do list.
-
Issue 10459 - a bunch of issues with ompi-master.
- Compiler issues with Q-Threads
- Not sure who the owner of qthreads is.
-
Discussions about new
-
Mellanox still have some use-cases for sm_cuda btl.
-
Any idea how mature the accelerator framework is?
- nVidia commits to testing the framework on `main`.
- Still some discussion on the Pull Request.
-
A couple of critical new issues.
- Issue 10435 - a Regression from v4.1
- No update.
-
Progress being made on missing Sessions symbols.
- Howard has a PR open that needs a bit more work.
-
Call to Prte / PMIx
- Longest Pole in the tent right now.
- If you want OMPI v5.0 released in near-ish future, please scare up some resources
- Use PRRTE `critical` and `Target v2.1` labels for issues.
-
Schedule:
- Blockers are still the same.
- PRRTE blocker -
- Right now looking like late summer (blocked on us not having a PRRTE release for packagers to package)
- Call for help - If anyone has resources to help, we can move this release date much sooner.
- Requires investment from us.
- Blockers are listed. Some are in the PRRTE project.
- Any Alternatives?
- The problem for Open MPI is not that PRRTE isn't ready to release. The parts we use work great, but other parts still have issues (namely DVM).
- Because we install PMIx and PRRTE as if they came from their own tarballs.
- This leaves Packagers no good way to distribute Open MPI.
- How do we install PMIx and PRRTE in open-mpi/lib instead and get all of the `rpaths` correct?
- This might be the best bet (aside from fixing PRRTE sources, of course)
-
Several Backported PRs
-
coll_han tuning runs discussion [PR 10347]
- Tommy (nVidia) + UCX on v5.0.x: seems that Adapt and Han are underperforming relative to
- Graph of data posted to [PR 10347]
- Percentage difference latency graphs.
- Anything ABOVE 0 is where Han outperformed (did better than) Tuned.
- He's been seeing some "sorry" messages.
- Perhaps a combination of SLURM and MPIRUN?
- Just tested Alltoall, Allreduce, and Allgather.
- x86 cluster, 32nodes x 40ppn
- By node HAN seems to perform better
- By core Tuned seems to perform better.
- Some dips might be due to UCX dynamic transport at this scale (rather than RC)
- Tommy can do some more testing if others have suggestions.
- Used `mpirun` with either (`--map-by-node` | `--map-by-core`), force ucx and select collective.
- Tommy will also run 1ppn and full ppn
- Would be good to run the Open MPI `v4.1` branch to see, especially since George's paper was against v4.1
- Brian (AWS) was using EFA, and seeing similar things.
- Would also be interesting to see how UCC stands up against these numbers.
- Cornelis (Brendan) ran both `v4.1` and `main`
- not highly tuned clusters, but similar components.
- Trying to isolate the differences between `v4.1` and `main`.
- Just increasing priority SHOULD work to select the correct collective components.
- OFI with PSM2 provider
- Substantial difference between `main` and `v4.1`
- Have seen substantial differences with different mapping flags.
- Maybe we should rerun this with explicit mapping controls.
- Small messages seem better with Han and large messages with Tuned?
- Austen (IBM) also did graphs with v5.0.x
- lower percentages
- OB1 out of the box with Tuned/Han
- Orange is `--map-by-core`, blue is `--map-by-node`
- Bcast getting close to 90%
- Will run with IMB to verify OSU data.
- Using UCX didn't see much difference on Han and Tuned.
- HAN is hierarchical, so scaling ppn shouldn't make as noticeable a difference as scaling nodes.
- Don't really see too much difference between `--map-by-core` and `--map-by-node` (expected in HAN), but this is dissimilar to Brian's and Tommy's data.
- Would be good for George to look and comment on this.
- Joseph is also planning to do runs.
- Will talk to George on posted numbers and post any suggestions.
- Thomas Naughton.
-
`main` and `v5.0.x` should be the same, use either
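
For anyone reproducing the percentage-difference latency graphs discussed above, here is a minimal sketch of one way to compute the plotted value so that anything above 0 means Han did better than Tuned. The formula (difference normalized by the Tuned latency), the function names, and the sample numbers are assumptions for illustration; the notes do not say exactly how the posted graphs were computed.

```python
def percent_difference(tuned_latency_us, han_latency_us):
    """Percentage by which HAN beats Tuned; positive means HAN was faster.

    Assumed definition: (tuned - han) / tuned * 100.
    """
    if tuned_latency_us <= 0:
        raise ValueError("tuned latency must be positive")
    return (tuned_latency_us - han_latency_us) / tuned_latency_us * 100.0


def compare(tuned, han):
    """Map message size -> percent difference for sizes present in both runs."""
    return {size: percent_difference(tuned[size], han[size])
            for size in sorted(tuned.keys() & han.keys())}


if __name__ == "__main__":
    # Made-up latencies (microseconds) keyed by message size in bytes.
    tuned = {8: 10.0, 1024: 20.0, 1048576: 400.0}
    han = {8: 8.0, 1024: 25.0, 1048576: 400.0}
    for size, pct in compare(tuned, han).items():
        print(f"{size:>8} B: {pct:+.1f}%")
```

Feeding it per-message-size latencies from two benchmark runs (for example one with Tuned selected and one with Han selected) gives one signed percentage per message size, which is enough to compare runs across sites on the same scale.
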
-
Please HELP!
- Performance test default selection of Tuned vs HAN
- Brian hasn't had (and might not have for a while) time to send out instructions on how to test.
- Can anyone send out these instructions?
- Call for folks to performance test at 16 nodes, and at whatever "makes sense" for them.
-
Accelerator stuff that William is working on, should be able to get out of draft.
- Edgar has been working on the ROCM component of the framework
- Post v5.0.0? Originally the answer was that it shouldn't go in since the release was close, but if the release slips to end of summer, we'll see ...
-
Edgar finished ROCM component... appears to be working.
- William or Brian can comment on how close it is to merging to `main`.
- William is working on btl sm_cuda and rcache code. Could maybe merge at the end of this week.
- Tommy was going to get some nVidia people to review / test.
- Discussion on `btl sm_cuda`
- used to be a cloned copy of `sm`, but it's the older `sm` component, not `vader`, which was renamed to `sm`.
- Might be time to drop `btl sm_cuda`?
- vader component does not have hooks to the new framework.
- Uses where `btl sm_cuda` might get used today would be:
- TCP path would use this for on-node
- Node without UCX
- even one-sided would not end up using `btl sm_cuda`.
- v5.0.0 would be a good time to remove this.
- Being based on the old `sm` is a big detractor.
- Can we ALSO remove `rcache`? Unclear.
-
What's the status of the accelerator branch on the v5.0.x branch?
- PR is just to `main`.
- We said we could do a backport, but that would be after it gets merged to `main`
- If v5.0.0 is still a month out, is that enough time?
- v5.0.0 is lurking closer.
- This is a BIG chunk of code...
- But if v5.0.0 delays longer... this would be good to get in.
- Answer is largely dependent on pmix and prte.
- Also has implications on OMPI-next?
-
Can anyone who understands packaging review: https://github.com/open-mpi/ompi/pull/10386 ?
-
Automate 3rd-party minimum version checks into a common txt file that both configure and docs could read.
- `conf.py` runs at the beginning of Sphinx and could read in files, etc.
- Still iterating on.
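
As a rough illustration of the common-file idea above, here is a minimal sketch of a reader that Sphinx's `conf.py` could call. The `NAME=VERSION` file format, the function names, and the `_min_version` substitution suffix are all hypothetical; the actual format is still being iterated on.

```python
from pathlib import Path


def read_min_versions(path):
    """Parse NAME=VERSION lines, ignoring blank lines and '#' comments.

    The file format is an assumption made for this sketch.
    """
    versions = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("=")
        if not sep or not name.strip() or not version.strip():
            raise ValueError(f"malformed line: {raw!r}")
        versions[name.strip()] = version.strip()
    return versions


def as_rst_substitutions(versions):
    """Render the versions as reST substitution definitions, one per line."""
    return "\n".join(
        f".. |{name}_min_version| replace:: {version}"
        for name, version in sorted(versions.items())
    )
```

In `conf.py`, the string returned by `as_rst_substitutions` could be assigned to Sphinx's `rst_prolog` setting so the docs can reference the values, while `configure` reads the same file with a few lines of shell. Keeping the file to plain `NAME=VERSION` lines is what makes it easy to consume from both sides.
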
-
https://github.com/open-mpi/ompi/pull/8941 -
- Like to get this in, or close it
- Geoff will send George an email to ask him to review.
- What are companies thinking about travel?
- Wiki for face to face: https://github.com/open-mpi/ompi/wiki/Meeting-2022
- Should think about schedule, location, and topics.
- Some new topics added this week. Please consider adding more topics.
- MPI Forum was virtual
- The next one, EuroMPI, will be hybrid.
- Plan to continue being hybrid with 1-2 meetings / year.