WeeklyTelcon_20210323
Geoffrey Paulsen edited this page Mar 24, 2021
- Dialup Info: (Do not post to public mailing list or public wiki)
- Austen Lauria (IBM)
- Brendan Cunningham (Cornelis Networks)
- Brian Barrett (AWS)
- Edgar Gabriel (UH)
- Geoffrey Paulsen (IBM)
- Harumi Kuno (HPE)
- Hessam Mirsadeghi (UCX/nVidia)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Josh Hursey (IBM)
- Michael Heinz (Cornelis Networks)
- Naughton III, Thomas (ORNL)
- Raghu Raja (AWS)
- Ralph Castain (Intel)
- Todd Kordenbrock (Sandia)
- Tomislav Janjusic
- William Zhang (AWS)
- Marisa Roman (Cornelis Networks)
- Matthew Dosanjh (Sandia)
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (nVidia/Mellanox)
- Aurelien Bouteiller (UTK)
- Brandon Yates (Intel)
- Charles Shereda (LLNL)
- Christoph Niethammer (HLRS)
- David Bernhold (ORNL)
- Erik Zeiske
- Geoffroy Vallee (ARM)
- George Bosilca (UTK)
- Joseph Schuchart
- Joshua Ladd (nVidia/Mellanox)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Xin Zhao (nVidia/Mellanox)
- If you don't have zlib, this affects launching and memory consumption
- Tools will spit out a warning that you don't have compression
- We need to write up something for Packagers as well.
- Brian will document this (really should build with zlib) in a README-packagers.md
- Hope that packager will package these things externally.
- NEWS bullets for zlib as well.
- Geoff will do this.
- Please update your CI to run MTT on v5.0.x PRs, and on v5.0.x based PRs
- Please Cherry-pick your bugfix/v5.0.x PRs there after your PR is accepted to master
- Doing formatting on master and v5.0.x seems reasonable
- But reformatting v4.0.x and v4.1.x seems too risky.
- clang-format instructions are in the format file.
- He also ran clang-tidy, and we don't have directions for that yet.
- Requires clang-format v10 or later (clang-format is versioned separately from the clang compiler)
- Nathan will try to make it compatible with older v8
- Geoff ping Nathan to request the v5.0.x version of opal PR.
- clang-format is separate from the compiler toolchain
- Will we require developers to use this?
- A GitHub build will not require it.
- Will have a CI test that will check it.
- Not in a path where every CI will have to have it installed.
- Do we want to hold off on MORE before v5.0.0 ships? (or 6 months after?)
- Should be rerun as a non-cherry-pick. Might be easy to lose
- But the two branches are close.
- Run it on master, try to PR to v5.0.x, and
- Nathan can only run certain sections of the code-base with the systems he has.
- Strongly encourage everyone test their sections.
- PSM2 - doesn't even build in our CI, so someone should build/test this.
- Needs a squash, missing signed off commit.
- Austen will ping Nathan.
- want in v5.0.x also
- This is working just fine at the moment, except for ROMIO.
- ROMIO is throwing tons of warnings. But okay.
- Would need to fix it upstream.
- PMIx/PRRTE is updated.
- Perhaps now for 3rdParties, configure with --silence-obsolencense flag.
- Does someone want to ping Rob about it?
- Jeff will
- Intercomm Merge tests are timing out.
- MTT master on HLS timeouts
- Failure in prrte on v5.0.x, will be resolved in tonight's.
- https://github.com/open-mpi/ompi/issues/8566
- Using an actual 32bit gcc - Compile fail
- Nathan thinks he might be able to write a compare-and-swap
- v5.0 - good time to drop 32bit.
- Jeff will send note to packaging, and see if they will care.
- Debian is okay, they will just use MPICH
- OSC/RDMA assumed everything was 64bit, but once we changed
- On 32bit, if we could use C11 atomics with locks, it might be allowed.
- So perhaps this would be a path.
- Is C11 available on older 32bit systems?
- With gcc 6.0+ it should work fine.
- Nobody has a strong opinion.
- Pride issue, but it's also time and money
- Right now the only thing breaking it is Nathan's 1sided code.
- Let's ask Nathan what he thinks, and if he has time to fix it.
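The C11 atomics idea discussed above might look roughly like the sketch below. This is only an illustration, not Open MPI code: `cas64` and `cas64_demo` are hypothetical names. It assumes a C11 compiler with `<stdatomic.h>`; on a target without a native 64-bit compare-and-swap, the implementation may fall back to a lock (with gcc this typically means linking libatomic), which is what `ATOMIC_LLONG_LOCK_FREE` hints at.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an Open MPI API): 64-bit compare-and-swap
 * built on C11 atomics. Returns true if *addr held `expected` and was
 * replaced by `desired`; otherwise *addr is left unchanged. */
bool cas64(_Atomic uint64_t *addr, uint64_t expected, uint64_t desired)
{
    return atomic_compare_exchange_strong(addr, &expected, desired);
}

/* Small demonstration: the first swap succeeds, the second one fails
 * because the stored value is no longer 5. Returns the final value. */
uint64_t cas64_demo(void)
{
    _Atomic uint64_t value = 5;

    bool first = cas64(&value, 5, 9);   /* matches, so 9 is stored */
    bool second = cas64(&value, 5, 11); /* stale expectation, nothing stored */

    assert(first);
    assert(!second);
    return atomic_load(&value);
}

/* Compile-time hint for `long long` atomics:
 * 2 = always lock-free, 1 = sometimes, 0 = never (a lock is used). */
int cas64_lock_free_hint(void)
{
    return ATOMIC_LLONG_LOCK_FREE;
}
```

Whether a lock-based fallback is acceptable for OSC/RDMA on 32bit is exactly the open question in the notes above; the sketch only shows that the call itself is portable C11.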
- Shoot for a next RC of v4.0.6 on March 31st
- blocking on UCX issues (see New topics above)
- George, will get to it soon.
- Too many Open Issues (50)
- Geoff and Howard will go over v4.0.x issues, and try to close or address many of them.
- May need to label some as wont_fix, and then close
- Check status of ROMIO from MPICH vs in v4.1 vs v4.0.x
- Same boat, waiting for George's datatype fix.
- A new v4.1 RC was built last week
- Most of ROMIO fixes have gone into MPICH
- 8371 - might be close
- Intercomm Merge issue
- may have gone away after PRRTE update on master
- Investigating
- blocking on UCX issues (see New topics above)
- George, will get to it soon.
- What do we do with the mpirun Manpage?
- Didn't want OMPI requiring Sphinx, but if PRRTE and PMIx in same tar
- Ralph almost has singleton comm spawn working
- Single node without the mpirun process
- Static MCA components default still on track for v5.0.x
- ECP Community Days (March 30 - April 1)
- Need SLIDES by close of business FRIDAY (not Saturday)
- Each day 90 minute time slots.
- Tuesday March 30th from 1-2:30pm (US Eastern)
- LIVE
- Invited some people to speak. They will be our main community speakers.
- Anyone on OMPI community can send slides to Jeff and George
- Due Friday March 26th
- PMIx Wed 31st 11 - 12:30 (US Eastern)
- Need to ensure no more MPIR, SLURM PMI1/2,
- PR 8329 - convert README, HACKING, and possibly Manpages to reStructuredText.
- Uses https://www.sphinx-doc.org/en/master/ (Python tool, can pip install)
- The intent is for this to go into v5.0
- mpirun / prrterun - we had quite a bit of details in orte, but are updating as much as possible.
- Ralph has asked about this for PMIx/PRRTE since this is turning out to work
- No update - 3/16
- Could be independent of PMIx and PRRTE.
- PMIx and PRRTE want to follow suit, and not require both pandoc and Sphinx.
- OLD
- What do we want to do about ROMIO in general.
- OMPIO is the default everywhere.
- Gilles is saying the changes we made are integration changes.
- There have been some OMPI specific changes put into ROMIO, meaning upstream maintainers refuse to help us with it.
- We may be able to work with upstream to make a clear API between the two.
- As a 3rd party package, should we move it up to the 3rd party packaging area, to be clear that we shouldn't make changes to this area?
- Need to look at this treematch thing. Upstream package that is now inside of Open MPI.
- Might want a CI bot to watch a set of files, and flag PRs that violate principles like this.
How's the state of https://github.com/open-mpi/ompi-tests-public/
- Putting new tests there
- ULFM has some tests added there.
- Need folks to add to MTT
- Should have some new Sessions tests