WeeklyTelcon_20210504
Geoffrey Paulsen edited this page May 5, 2021
·
2 revisions
- Dialup Info: (Do not post to public mailing list or public wiki)
- Aurelien Bouteiller (UTK)
- Austen Lauria (IBM)
- Brian Barrett (AWS)
- David Bernhold (ORNL)
- Edgar Gabriel (UH)
- Geoffrey Paulsen (IBM)
- Harumi Kuno (HPE)
- Hessam Mirsadeghi (UCX/nVidia)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Joseph Schuchart
- Josh Hursey (IBM)
- Matthew Dosanjh (Sandia)
- Raghu Raja
- Sam Gutierrez (LANL)
- Todd Kordenbrock (Sandia)
- Tomislav Janjusic
- William Zhang (AWS)
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (nVidia/Mellanox)
- Brandon Yates (Intel)
- Brendan Cunningham (Cornelis Networks)
- Charles Shereda (LLNL)
- Christoph Niethammer (HLRS)
- Erik Zeiske
- Geoffroy Vallee (ARM)
- George Bosilca (UTK)
- Joshua Ladd (nVidia/Mellanox)
- Marisa Roman (Cornelis Networks)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Michael Heinz (Cornelis Networks)
- Nathan Hjelm (Google)
- Naughton III, Thomas (ORNL)
- Noah Evans (Sandia)
- Ralph Castain (Intel)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Xin Zhao (nVidia/Mellanox)
- Tommy is taking over for Josh Ladd for the short term.
- Please send Mellanox items to him.
- He will also help with v5 RM work.
- Howard was trying to build the most recent OSU benchmarks; they don't build cleanly against master and v5.
- Howard didn't have mpicxx or mpicpp
- If this is an actual issue, assign this to Jeff.
- Also, Joseph set CC but not CXX in the environment, and the C++ wrapper wasn't being built.
- This could be correct behavior even if it's unexpected.
- We're still waiting on Datatype issues now reported in v4.1.1
- Can others replicate the failure in tests/datatype/partial (make check)?
- Jeff and George cannot get it to fail.
- If you can make it fail with the original, then try the debug test with lots of output.
- Two users have reported it with two different environments.
- So we are concerned.
- Also run by CI.
- The test we're talking about is on master (partial.c). This test was not cherry-picked back to release branches.
- Test is in PRs merged into v4.1.x, but we haven't merged PR to v4.0.x yet.
- Jeff will check the test on v4.1.x branch.
- Issue 8918 - Another datatype issue we need to look at.
- Need a review for 8898 (and equivalent v4.1.x)
- In holding pattern waiting for Datatype issue.
- Not taking too many more PRs in case we decide to spin a v4.1.2 with datatype regression fixes
- Went through a bunch of stuff last week.
- At least 3 PRs pending for v5.0
- Got ROMIO 3.4.1 sync in.
- Bringing Tommy in for nVidia RM.
- Examples and tests directory need to get done.
- Code Refactoring needs to get done.
- ompi PR 8816 is still open. Needs rebasing.
- Could be as easy as running clang-format on HEAD, and merging quickly.
- Any volunteer?
- Joseph saw Opal code, some copyright headers got scrambled.
- Fixed in master and v5.0
- Some macros might need "don't reformat" tags around them.
- Includes might need reordering to build properly.
- May need to stop committing other PRs until this gets done.
- Nathan responded to a ping during the call and will try to get it done Thursday.
- Should eventually do oshmem as well.
- Some folks didn't like the results.
- Macros were one problem area, and that can be addressed with tags.
- Do we want to set a date to close master if this doesn't get done?
- Not really, someone should just do it.
- Scope should only be an hour.
- May 14th turn on CI.
- PR 8816
- Would like Nathan to rebase and merge to master.
- Certain blocks we don't want to format (specifically some in datatype)
- clang format trips over
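
The "don't reformat" tags mentioned above for macros and certain datatype blocks can be sketched with clang-format's documented `clang-format off` / `clang-format on` comment markers. This is only an illustration: every `EXAMPLE_*` / `example_*` name below is hypothetical and does not come from the Open MPI tree or PR 8816.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-aligned table macro (names are illustrative only).
 * The markers tell clang-format to leave the enclosed lines untouched. */
/* clang-format off */
#define EXAMPLE_TYPE_TABLE(X)      \
    X(EXAMPLE_INT8,    1)          \
    X(EXAMPLE_INT16,   2)          \
    X(EXAMPLE_INT32,   4)          \
    X(EXAMPLE_FLOAT64, 8)
/* clang-format on */

/* Expand the table once into an enum of type ids... */
#define EXAMPLE_AS_ENUM(name, size) name,
enum example_type { EXAMPLE_TYPE_TABLE(EXAMPLE_AS_ENUM) EXAMPLE_TYPE_COUNT };

/* ...and once into a matching array of sizes in bytes. */
#define EXAMPLE_AS_SIZE(name, size) [name] = (size),
static const size_t example_type_size[EXAMPLE_TYPE_COUNT] = {
    EXAMPLE_TYPE_TABLE(EXAMPLE_AS_SIZE)
};

/* Look up the size recorded in the table for a given type id. */
size_t example_size_of(enum example_type t)
{
    assert((int) t >= 0 && (int) t < EXAMPLE_TYPE_COUNT);
    return example_type_size[t];
}
```

Everything outside the marked region is still reformatted as usual, so the tags can be kept narrow, wrapping only the blocks where the tool's output is unwanted.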
- PMIx is trying to maintain standards and library versions that are in sync with each other.
- There is a PMIx standard version and an open PMIx library version.
- Added some PMIx v4.1 standard items to the PMIx v4.0 branch
- Reset the PMIx v4.0 branch without all of v4.1 functionality.
- Open-MPI v5.0 will ship open PMIx v4.1 submodule
- PRRTE 2.0 will require Open PMIx v4.1
- So if running with Open PMIx v4.0 or older, just can't use PRRTE
- Has anyone checked how far back Open-MPI v5 can work with PMIx?
- At one point it was verified to work with Open PMIx v3.1, but there has been some work on top since then, so this needs to be reverified.
- No update
- Also some changes with libcurl, especially since this breaks the OMPI build.
- PMIx can interface with REST interfaces (used by libcurl)
- JSON
- Build system issue in PMIx when we changed to static DSOs.
- Think this has been resolved
- Ralph was looking at this (private messaged Geoff)
- Jeff and Ralph and Yosif had a good conversation
- Lengthy discussion; the summary is that it's a work in progress.
- Need to look at the public tests repo for merging in both ULFM and Sessions tests.
- Howard and Geoff will look at this during the week.
- ULFM is built in by default.
- Since we don't test it, it degrades quickly.
- At this moment the latest changes to PRRTE have broken ULFM.
- May be easier to integrate into somewhere else.
- Some tests put into OMPI-public - this test ran for 4 minutes on 4 nodes
- Would MTT be sufficient for the ULFM testing?
- It would be a step in the right direction.
- It WOULD be good to get things into CI.
- How do we do it without adding more time to CI?
- If someone has one physical box.
- The limit on Open MPI CI is not machines; someone needs to set this up and maintain it.
- It'd be great if someone in the community could extend the Open-MPI infrastructure and maintain this.
- Our CI tests are currently running on a single node.
- Could be extended, just need volunteers to learn and maintain.
- Converting docs to Readthedocs.io
- https://github.com/open-mpi/ompi/pull/8329
- PRRTE issue https://github.com/openpmix/prrte/issues/931 on how to document personalities