WeeklyTelcon_20210330
- Dialup Info: (Do not post to public mailing list or public wiki)
- Austen Lauria (IBM)
- Brendan Cunningham (Cornelis Networks)
- Brian Barrett (AWS)
- Edgar Gabriel (UH)
- Geoffrey Paulsen (IBM)
- Harumi Kuno (HPE)
- Hessam Mirsadeghi (UCX/nVidia)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Josh Hursey (IBM)
- Michael Heinz (Cornelis Networks)
- Naughton III, Thomas (ORNL)
- Raghu Raja (AWS)
- Ralph Castain (Intel)
- Todd Kordenbrock (Sandia)
- Tomislav Janjusic
- William Zhang (AWS)
- Marisa Roman (Cornelis)
- Matthew Dosanjh (Sandia)
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (nVidia/Mellanox)
- Aurelien Bouteiller (UTK)
- Brandon Yates (Intel)
- Charles Shereda (LLNL)
- Christoph Niethammer (HLRS)
- David Bernhold (ORNL)
- Erik Zeiske
- Geoffroy Vallee (ARM)
- George Bosilca (UTK)
- Joseph Schuchart
- Joshua Ladd (nVidia/Mellanox)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Xin Zhao (nVidia/Mellanox)
CUDA build problems - https://github.com/open-mpi/ompi/issues/8736
- master /
- AWS can install CUDA on its CI systems to prevent future build breakage.
- Also ask nVidia/Mellanox to add this to their MTT.
- MPICH datatype code that came in from IBM.
The Sessions branch is pretty big and needs to be merged back.
- So the plan was to hold off on the rest of the formatting until Sessions is rebased, and then format master.
- Howard is having a few more issues with Sessions, so he is okay with us reformatting.
Reformatted
Reformatting master and v5.0.x seems reasonable.
But reformatting v4.0.x and v4.1.x seems too risky.
Instructions for running clang-format are in the format file.
He also ran clang-tidy, and we don't have directions for that yet.
Requires clang-format v10 or later (the clang-format version can differ from the clang compiler version).
- Nathan will try to make it compatible with the older v8.
- Geoff will ping Nathan to request a v5.0.x version of the opal PR.
clang-format is separate from the compiler toolchain.
Will we REQUIRE developers to run this?
- It will not be required for a GitHub build.
- There will be a CI test that checks it.
- It will not be in a path where every CI has to have it installed.
Do we want to hold off on MORE reformatting until v5.0.0 ships (or until 6 months after)?
It should be rerun as a non-cherry-pick; otherwise changes might be easy to lose.
- But the two branches are close.
Run it on master, try to PR to v5.0.x, and
Nathan can only test certain sections of the code base on the systems he has.
- Everyone is strongly encouraged to test their own sections.
- PSM2 doesn't even build in our CI, so someone should build/test it.
- Needs a squash; a commit is missing its Signed-off-by line.
- Austen will ping Nathan.
- Want this in v5.0.x also.
- Merged to v5.0.x on 3/29; DONE with opal.
- This is working just fine at the moment, except for ROMIO.
- ROMIO is throwing tons of warnings, but that's okay.
- Would need to fix it upstream.
- PMIx/PRRTE have been updated.
- Perhaps for 3rd-party packages we can now configure with the --silence-obsolencense flag.
- Does someone want to ping Rob about it?
- Jeff will.
- Intercomm Merge tests are timing out.
- MTT master on HLS timeouts.
Require a C11 compiler to support 32-bit platforms.
Debian is the only Linux distro that supports 32-bit.
- Can be done in PMIx or PRRTE if desirable.
- 32-bit atomics stay, because we still support 32-bit datatypes.
- 32-bit-only architectures are removed.
A PRRTE failure on v5.0.x will be resolved in tonight's build.
Using an actual 32-bit gcc: compile fails.
Nathan thinks he might be able to write a compare-and-swap.
v5.0 would be a good time to drop 32-bit.
- Jeff will send a note to packagers and see if they care.
- Debian is okay; they will just use MPICH.
- OSC/RDMA assumed everything was 64-bit, but once we changed
On 32-bit, if we could use C11 atomics with locks, it might still be allowed.
- So perhaps this would be a path.
- Is C11 available on older 32-bit systems?
- With gcc 6.0+ it should work fine.
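The C11-atomics path for 32-bit can be illustrated with the minimal sketch below, which is not Open MPI code. It implements a 64-bit compare-and-swap with C11 `<stdatomic.h>` and refuses to build without a C11 compiler that provides atomics. The helper names `cas64` and `int64_atomics_lock_free` are hypothetical. On targets without native 64-bit atomic instructions, C11 still allows these operations, but the compiler may implement them with locks (with gcc this may require linking `-latomic`). `atomic_is_lock_free` reports which case applies.

```c
/* Assumption: a C11 compiler that provides <stdatomic.h>. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L || defined(__STDC_NO_ATOMICS__)
#error "a C11 compiler with <stdatomic.h> support is required"
#endif

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit compare-and-swap on top of C11 atomics.
 * If *addr == *expected, stores desired into *addr and returns true.
 * Otherwise returns false and copies the current value of *addr into *expected. */
bool cas64(_Atomic int64_t *addr, int64_t *expected, int64_t desired)
{
    return atomic_compare_exchange_strong(addr, expected, desired);
}

/* Hypothetical helper: reports whether 64-bit atomics are lock-free here.
 * When this returns false (possible on some 32-bit targets), the operations
 * are still correct but are implemented with locks and are slower. */
bool int64_atomics_lock_free(void)
{
    _Atomic int64_t probe = 0;
    return atomic_is_lock_free(&probe);
}
```

Example: with `_Atomic int64_t v = 5;` and `int64_t expected = 5;`, calling `cas64(&v, &expected, 7)` returns true and sets `v` to 7. A later call with a stale `expected` of 5 returns false, leaves `v` at 7, and refreshes `expected` to 7.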
Nobody has a strong opinion.
- It's a pride issue, but it's also time and money.
- Right now the only thing breaking it is Nathan's one-sided code.
- Let's ask Nathan what he thinks, and whether he has time to fix it.
- Shoot for the next v4.0.6 RC on March 31st.
- Blocking on UCX issues (see New topics above).
- George will get to it soon.
- Too many open issues (50).
- Geoff and Howard will go over v4.0.x issues, and try to close or address many of them.
- May need to label some as wont_fix, and then close them.
- Closed a number of issues.
- Geoff and Howard will go over v4.0.x issues, and try to close or address many of them.
- Check the status of ROMIO from MPICH vs. the ROMIO in v4.1.x vs. v4.0.x.
- Same boat, waiting for George's datatype fix.
- A new v4.1 RC was built last week.
- Most of the ROMIO fixes have gone into MPICH.
- 8371 might be close.
- Intercomm Merge issue
- May have gone away after the PRRTE update on master.
- Investigating.
- Blocking on UCX issues (see New topics above).
- George will get to it soon.
- PMIx and PRRTE are close to a release candidate.
- Is there a list of PRRTE issues that still need to be added?
- No, just the ones in the issue.
- Ralph thinks they're in PRRTE, but perhaps the OMPI submodule pointer hasn't been updated.
- Raghu will check.
- A regression in OSC/UCX breaks dynamic windows.
- Reported a year ago, but no update.
- Issue 6987
- If UCX is going to be broken for this long, may
- Couldn't get the RDMA backend.
- What do we do with the mpirun man page?
- Didn't want OMPI to require Sphinx, but if PRRTE and PMIx are in the same tar
- Ralph almost has singleton comm_spawn working.
- Single node without the mpirun process.
- Static MCA components default is still on track for v5.0.x.
- ECP Community Days (March 30 to April 1st)
- Need SLIDES by close of business FRIDAY (not Saturday).
- Each day has 90-minute time slots.
- Tuesday March 30th from 1-2:30pm (US Eastern)
- LIVE
- Invited some people to speak. They will be our main community speakers.
- Anyone in the OMPI community can send slides to Jeff and George.
- Due Friday March 26th
- PMIx: Wed 31st 11:00-12:30 (US Eastern)
- Need to ensure no more MPIR, SLURM PMI1/2,
- PR 8329 - convert README, HACKING, and possibly man pages to reStructuredText.
- Uses https://www.sphinx-doc.org/en/master/ (Python tool, can pip install)
- The intent is for this to be in v5.0.
- mpirun / prrterun - we had quite a bit of detail in ORTE, but are updating as much as possible.
- Ralph has asked about this for PMIx/PRRTE, since this is turning out to work
- No update - 3/16
- Could be independent of PMIx and PRRTE.
- PMIx and PRRTE want to follow suit, and not require both pandoc and Sphinx.
- OLD
- What do we want to do about ROMIO in general?
- OMPIO is the default everywhere.
- Gilles is saying the changes we made are integration changes.
- There have been some OMPI-specific changes put into ROMIO, so upstream maintainers refuse to help us with it.
- We may be able to work with upstream to define a clear API between the two.
- Since it is a 3rd-party package, should we move it up to the 3rd-party packaging area, to make clear that we shouldn't make changes there?
- Need to look at this treematch thing too: an upstream package that is now inside Open MPI.
- Might want a CI bot to watch a set of files, and flag PRs that violate principles like this.
How's the state of https://github.com/open-mpi/ompi-tests-public/
- Putting new tests there.
- ULFM has added some tests there.
- Need folks to add them to MTT.
- Should have some new Sessions tests