WeeklyTelcon_20220621
Geoffrey Paulsen edited this page Jun 22, 2022
- Dialup Info: (Do not post to public mailing list or public wiki)
- Akshay Venkatesh (NVIDIA)
- Austen Lauria (IBM)
- Brendan Cunningham (Cornelis Networks)
- Christoph Niethammer (HLRS)
- David Bernhold (ORNL)
- Edgar Gabriel (UoH)
- Geoffrey Paulsen (IBM)
- George Bosilca (UTK)
- Harumi Kuno (HPE)
- Hessam Mirsadeghi (UCX/nVidia)
- Howard Pritchard (LANL)
- Joseph Schuchart
- Josh Fisher (Cornelis Networks)
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- Tommy Janjusic (nVidia)
- William Zhang (AWS)
- Artem Polyakov (nVidia)
- Aurelien Bouteiller (UTK)
- Brandon Yates (Intel)
- Brian Barrett (AWS)
- Charles Shereda (LLNL)
- Erik Zeiske
- Geoffroy Vallee (ARM)
- Jeff Squyres (Cisco)
- Josh Hursey (IBM)
- Joshua Ladd (nVidia)
- Marisa Roman (Cornelis Networks)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Matthew Dosanjh (Sandia)
- Michael Heinz (Cornelis Networks)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Raghu Raja (AWS)
- Ralph Castain (Intel)
- Sam Gutierrez (LLNL)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Xin Zhao (nVidia)
- v4.1.5
- Schedule: targeting ~6 months (Nov 1)
- No driver on schedule yet.
- Will add a WIP label to https://github.com/open-mpi/ompi/pull/10448
- Plan to merge along with another commit that's not yet on `main`.
- Schedule:
- PRRTE is targeting late summer.
- Newish issue regarding the partition communication features.
- Since it's a new feature try to get these in as well.
- Only a small number of changes on v5.0.x branch.
- Some docs
- `main` has also been quiet this week.
- New issue opened, 10480: needs to be done prior to release.
- New issue opened, 10481: needs to be done prior to release.
- A few other issues.
- `mpirun -v` on v5.0.x returns the PRRTE version.
- Does anyone still care about the `min-dist` mapper? Considering dropping this in PRRTE.
- Mellanox developed it and will reply.
- Open an issue to track it?
- Discussed Accelerator framework (see below)
- Discussed atomics PRs (see below)
- Tommy reported from discussion that they do have customers who use the sm_cuda component.
- William will try to update the sm_cuda component and convert it into the framework.
- Akshay had some comments.
- Mellanox commits to testing these changes.
- Want to decide whether to enable HAN and Adapt by default, and at what priority.
- Depends on scale and message sizes.
- Not just the message size: how ranks are placed on nodes also affects performance.
- In Tuned, communication goes between ranks based on the tree, ignoring how ranks are placed on nodes.
- HAN rearranges the ranks to allow for an optimal approach at each level.
- Han should be faster and more stable because
- Adapt deals with asynchronous order of arrival at the collective.
- Tommy saw some segv with Adapt, so he just
- Logic is very similar to Tuned with a tree, but much more async.
- really adapt based on which arrives
- 10492 and link to 10487
- C11 atomics make every atomic access sequentially consistent by default.
- But we have many code paths where we don't want this.
- If you don't use threads, or if you do use threads but are doing initialization, we don't want this.
- First thought on this was to relax loads and stores.
- But going through code and figuring out where to
- So the second PR just removes `_Atomic` for C11.
- The measured difference was 20-25% for local messages.
- 10492 moves us back to where we were before C11 atomics.
- Because they're `atomic`, gcc uses exchange (x86).
- And exchange is very expensive even if there's already a lock around it.
- Saw this in GCC 9, but not 10, but then again in 11.
- Compiler doesn't know
- OPAL_THREAD macros. no way to tell it to avoid it.
- Variable is marked with the atomic flag, and doing `+` in thread.
- objdump an ob1 function.
- With `_Atomic` we have no control over memory ordering other than explicit atomic load/store operations...
- This is what the first PR does...
- Is there a risk with the 2nd PR that we might need to add some locks?
- Code we have today has been tested with old flavor, so it should be pretty safe.
- When we write new code, we'll need to
- Given the way OPAL_ATOMIC is structured, we hope no one expected an increment was not atomic.
- Wiki for face to face: https://github.com/open-mpi/ompi/wiki/Meeting-2022
- Should think about schedule, location, and topics.
- Some new topics added this week. Please consider adding more topics.
- Might be better to do a half-day/day-long virtual working session.
- Due to company's travel policies, and convenience.