macOS Sonoma (14) support / CI #20339

Closed
29 of 32 tasks
jwnimmer-tri opened this issue Oct 10, 2023 · 22 comments · Fixed by #20678

@jwnimmer-tri
Collaborator

jwnimmer-tri commented Oct 10, 2023

Is your feature request related to a problem? Please describe.

Apple released macOS Sonoma (14) a couple of weeks ago. Drake should officially (https://drake.mit.edu/installation.html) support it, which means testing and CI coverage.

Describe the solution you'd like

Add Sonoma builds to CI, get them passing, elevate to Production, then update our docs & releases.

Since we only support the two most recent versions of macOS, that means (mostly) dropping Monterey. However, we'll leave alone macOS Monterey (12) x86 CI for now.

Tasks:

@BetsyMcPhail
Contributor

Reminder to myself to switch the packaging job's "publish" flag from Monterey to Ventura in drake-ci: https://github.com/RobotLocomotion/drake-ci/blob/92bfabe99de6fcdd269b7aa654891ab0f8d4e214/driver/environment.cmake#L216

@jwnimmer-tri
Collaborator Author

In #20340 we added the ability to test the Packaging and Wheel jobs on Ventura. Both of those passed (see logs for Wheel and Packaging).

@jwnimmer-tri
Collaborator Author

I suppose let's say that Drake v1.22.0 (ETA next Monday) will still keep Monterey on all fronts, so let's not actually ditch any Monterey jobs yet. After the release happens, we can start shutting things off.

@jwnimmer-tri
Collaborator Author

The release has been shipped, so you have a green light for turning off Monterey (on arm).

@jwnimmer-tri
Collaborator Author

The goal should be to finish the first three items (mac arm packaging builds, docs, and drake-external-examples) before the next release (mid-November).

@BetsyMcPhail
Contributor

While working on the first bullet point, I noticed that the Mac "staging-packaging" jobs have never been run and are not mentioned in the release playbook. Should these jobs be removed?

@jwnimmer-tri
Collaborator Author

I can't say for certain without chasing down a few details first, but I'd guess that maybe @mwoehlke-kitware's work to do #18300 will end up needing them after all?

@BetsyMcPhail
Contributor

I'll leave them for now!

@mwoehlke-kitware
Contributor

mwoehlke-kitware commented Nov 9, 2023

Yes, post-#18300, cutting releases will require packages built by staging jobs.

@svenevs
Contributor

svenevs commented Nov 9, 2023

@mwoehlke-kitware I may have made a mistake updating the jenkins yaml and deleted jobs you were using. If so, let us know and we can restore it or use the ventura flavor now possibly? After I made my change, linux-jammy-clang-bazel-continuous-everything-address-sanitizer and linux-jammy-clang-bazel-continuous-leak-sanitizer were showing up as never having been run before (right after I deployed jenkins)... Sorry :/

@jwnimmer-tri after switching, the load balance between monterey and ventura jobs was mostly even, with the cache server health check job being the tie breaker. We went ahead and switched that over and are rebalancing to one monterey and two ventura runners. Posted an FYI in slack just in case we have issues.

@svenevs
Contributor

svenevs commented Nov 10, 2023

DJJ: Figure out which Monterey jobs to switch over to Ventura (either permanently, or only until we have Sonoma at which point they move up again) and switch them over.

This is where we are currently. I reached out to MacStadium to find out whether Sonoma is supported yet; it does not appear to be, and I also get the impression we will need another orka upgrade first. While that may stall this ticket for a while, these are the last jobs on monterey arm being considered (curl -g 'https://drake-jenkins.csail.mit.edu/api/json?tree=jobs[name]' | jq '.jobs[].name' | /bin/grep monterey | /bin/grep -v x86):

  • Continuous:
    • mac-arm-monterey-clang-bazel-continuous-release
  • Nightly:
    • mac-arm-monterey-clang-bazel-nightly-debug
    • mac-arm-monterey-clang-bazel-nightly-everything-address-sanitizer
    • mac-arm-monterey-clang-bazel-nightly-everything-release
    • mac-arm-monterey-clang-cmake-nightly-everything-release
    • mac-arm-monterey-clang-cmake-nightly-release
    • mac-arm-monterey-unprovisioned-clang-bazel-nightly-release
  • Experimental:
    • mac-arm-monterey-clang-bazel-experimental-address-sanitizer
    • mac-arm-monterey-clang-bazel-experimental-debug
    • mac-arm-monterey-clang-bazel-experimental-everything-address-sanitizer
    • mac-arm-monterey-clang-bazel-experimental-everything-debug
    • mac-arm-monterey-clang-bazel-experimental-everything-release
    • mac-arm-monterey-clang-bazel-experimental-release
    • mac-arm-monterey-clang-cmake-experimental-everything-release
    • mac-arm-monterey-clang-cmake-experimental-release
    • mac-arm-monterey-unprovisioned-clang-bazel-experimental-everything-release
    • mac-arm-monterey-unprovisioned-clang-bazel-experimental-release
    • mac-arm-monterey-unprovisioned-clang-cmake-experimental-everything-release
    • mac-arm-monterey-unprovisioned-clang-cmake-experimental-release

So, before we just switch them all to ventura blindly, do we know which ones we want to be sonoma long term?
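
The listing above comes from filtering the Jenkins JSON API output. A minimal Python sketch of the same filter, run on a hand-written sample payload rather than the live server (the helper name `monterey_arm_jobs` and the sample data are illustrative assumptions, not part of drake-ci):

```python
import json


def monterey_arm_jobs(payload: str) -> list[str]:
    """Mimic `grep monterey | grep -v x86` over the job names in the payload."""
    jobs = json.loads(payload)["jobs"]
    names = [job["name"] for job in jobs]
    return sorted(n for n in names if "monterey" in n and "x86" not in n)


# Hand-written sample in the shape of /api/json?tree=jobs[name] (assumed data).
sample = json.dumps({"jobs": [
    {"name": "mac-arm-monterey-clang-bazel-continuous-release"},
    {"name": "mac-x86-monterey-clang-bazel-continuous-release"},
    {"name": "mac-arm-ventura-clang-bazel-continuous-release"},
    {"name": "mac-arm-monterey-clang-bazel-nightly-debug"},
]})

for name in monterey_arm_jobs(sample):
    print(name)
```

Filtering on substrings keeps the behavior identical to the grep pipeline, so any x86 Monterey job is excluded regardless of where "x86" appears in its name.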

@mwoehlke-kitware
Contributor

@svenevs, these are the staging packages we need:

# Wheels.
f"drake-{version[1:]}-cp38-cp38-manylinux_2_31_x86_64.whl",
f"drake-{version[1:]}-cp39-cp39-manylinux_2_31_x86_64.whl",
f"drake-{version[1:]}-cp310-cp310-manylinux_2_31_x86_64.whl",
f"drake-{version[1:]}-cp311-cp311-manylinux_2_31_x86_64.whl",
f"drake-{version[1:]}-cp311-cp311-macosx_12_0_x86_64.whl",
f"drake-{version[1:]}-cp311-cp311-macosx_12_0_arm64.whl",
# Deb packages.
f"drake-dev_{version[1:]}-1_amd64-focal.deb",
f"drake-dev_{version[1:]}-1_amd64-jammy.deb",
# Tarballs.
f"drake-{version[1:]}-focal.tar.gz",
f"drake-{version[1:]}-jammy.tar.gz",
f"drake-{version[1:]}-mac.tar.gz",
f"drake-{version[1:]}-mac-arm64.tar.gz",

AFAICT we still have jobs for all of those? (IIUC, the non-wheel Ubuntu jobs should both generate one each .deb and .tar.gz, while all Ubuntu wheels are generated by one job.)
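
For reference, the list above can be expanded for a concrete tag. This is only a sketch: the function name `staging_artifacts` is hypothetical, and it assumes `version` is a tag string with a leading "v" (which is what `version[1:]` suggests); "v1.22.0" is used purely as an example input.

```python
def staging_artifacts(version: str) -> list[str]:
    """Expand the artifact name templates for a tag such as 'v1.22.0'."""
    v = version[1:]  # drop the leading 'v', as in the list above
    wheels = [
        f"drake-{v}-cp{py}-cp{py}-manylinux_2_31_x86_64.whl"
        for py in ("38", "39", "310", "311")
    ]
    wheels += [
        f"drake-{v}-cp311-cp311-macosx_12_0_{arch}.whl"
        for arch in ("x86_64", "arm64")
    ]
    debs = [f"drake-dev_{v}-1_amd64-{distro}.deb" for distro in ("focal", "jammy")]
    tarballs = [
        f"drake-{v}-{suffix}.tar.gz"
        for suffix in ("focal", "jammy", "mac", "mac-arm64")
    ]
    return wheels + debs + tarballs


for name in staging_artifacts("v1.22.0"):
    print(name)
```

Checking the staged files against a generated list like this would show quickly whether any staging job is missing or produces a differently tagged wheel.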

@svenevs
Contributor

svenevs commented Nov 10, 2023

@mwoehlke-kitware staging now has mac-arm-ventura-unprovisioned-clang-wheel-staging-release instead of monterey. So I think the arm64 wheel is going to give you macosx_13_0 now (not 12_0). I'm about to open a PR; we missed this in the first pass.

AFAICT you never staged any wheels for 0.99; staging artifacts end up here. So (next week, it seems) if you could stage the rest of the jobs, then we'll have the artifacts you need and will know if that's the only thing the unrelated work here broke for you 🤞

@jwnimmer-tri
Collaborator Author

macOS ARM CI policy:

(1) The "health check" can be whatever. I guess just a single macOS version?

(2) The nightly packaging and nightly wheel builds only run on the oldest supported macOS version for the given architecture. (This is already up-to-date with Sonoma as of RobotLocomotion/drake-ci#252.)

(3) The "release build from scratch" jobs (unprovisioned nightly) should be run on every supported macOS:

  • mac-x86-monterey-unprovisioned-clang-bazel-nightly-release
  • mac-arm-monterey-unprovisioned-clang-bazel-nightly-release (<== remove)
  • mac-arm-ventura-unprovisioned-clang-bazel-nightly-release
  • mac-arm-sonoma-unprovisioned-clang-bazel-nightly-release (<== to be added)

(4) The only thing in continuous should be the typical incremental release build, run on every supported macOS:

  • mac-x86-monterey-clang-bazel-continuous-release
  • mac-arm-monterey-clang-bazel-continuous-release (<== remove)
  • mac-arm-ventura-clang-bazel-continuous-release
  • mac-arm-sonoma-clang-bazel-continuous-release (<== to be added)

(5) The supplemental release-build nightlies (cmake, everything) should run on every supported macOS:

  • mac-arm-monterey-clang-bazel-nightly-everything-release (<== remove)
  • mac-arm-monterey-clang-cmake-nightly-everything-release (<== remove)
  • mac-arm-monterey-clang-cmake-nightly-release (<== remove)
  • mac-arm-ventura-clang-bazel-nightly-everything-release
  • mac-arm-ventura-clang-cmake-nightly-everything-release
  • mac-arm-ventura-clang-cmake-nightly-release
  • mac-arm-sonoma-clang-bazel-nightly-everything-release (<== to be added)
  • mac-arm-sonoma-clang-cmake-nightly-everything-release (<== to be added)
  • mac-arm-sonoma-clang-cmake-nightly-release (<== to be added)

(6) The extra-testing nightlies (debug) should run on the same macOS as the packaging jobs, i.e., the oldest one:

  • mac-arm-monterey-clang-bazel-nightly-debug (<== remove)
  • mac-arm-monterey-clang-bazel-nightly-everything-address-sanitizer (<== remove)
  • mac-arm-sonoma-clang-bazel-nightly-debug (<== to be added)
  • mac-arm-sonoma-clang-bazel-nightly-everything-address-sanitizer (<== to be added)

If we end up over capacity, part (5) would be the one we could prune a bit.
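
Items (3) through (5) all follow the same rule: one job per supported macOS release. A small sketch of how the "remove" and "to be added" sets for mac-arm fall out of that rule (the names `PER_RELEASE_SUFFIXES` and `expected_arm_jobs` are hypothetical; items (1), (2), (6) and the x86 jobs are deliberately left out):

```python
# Job name suffixes taken from items (3), (4) and (5) of the policy above.
PER_RELEASE_SUFFIXES = [
    "unprovisioned-clang-bazel-nightly-release",  # (3)
    "clang-bazel-continuous-release",             # (4)
    "clang-bazel-nightly-everything-release",     # (5)
    "clang-cmake-nightly-everything-release",     # (5)
    "clang-cmake-nightly-release",                # (5)
]


def expected_arm_jobs(releases: list[str]) -> set[str]:
    """One mac-arm job per supported release and per suffix."""
    return {f"mac-arm-{r}-{s}" for r in releases for s in PER_RELEASE_SUFFIXES}


before = expected_arm_jobs(["monterey", "ventura"])
after = expected_arm_jobs(["ventura", "sonoma"])

to_remove = sorted(before - after)
to_add = sorted(after - before)

print("remove:", *to_remove, sep="\n  ")
print("add:", *to_add, sep="\n  ")
```

The ventura jobs appear in both sets and so drop out of the difference, which matches the unannotated entries in the lists above.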

@svenevs
Contributor

svenevs commented Nov 14, 2023

@jwnimmer-tri I stuck as closely to that as I could, given how the yaml is set up. It did resurface that there is a note in there saying we would prefer to run an asan job on mac-arm but do not currently. Something worth reconsidering.

I have deleted all the monterey jobs as of those four PRs; neither jenkins nor https://github.com/RobotLocomotion/drake/tree/jenkins-jobs-experimental shows any mac-arm-monterey. For right now I'm calling it a night, and am going to make us a third ventura runner after deploying those images (so I can delete the monterey ones).

We seem to have a path forward with Sonoma, but we may also need to schedule an orka update. Will continue tomorrow.

@svenevs
Contributor

svenevs commented Nov 17, 2023

Sonoma is alive in CI, not all configurations but a handful of jobs. The first two full builds:

There are new views. I'm going to launch the remaining ones, and more results will come back in time (there is only one sonoma machine, so that may take a while).

Note: we should see if disk space is actually going to be a problem. We may need to add some more experimental builds, but AFAICT we no longer need to deal with orka image resize to go from 90GB to 200GB, since the debugging symbols got nuked.

The actual reason this is being tested is that orka image resize was failing, so I decided to just see what the 90GB images could do. If we do have enough space, we can get all the images on all the runners, which:

  1. Removes development time for image building.
  2. Lets us fit 4 x 90GB images on a node, where the previous limit was 2 x 200GB. That means all three CI runners become available for everything, and we can stop worrying about balancing the load and schedules between ventura and sonoma.
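
The image-count arithmetic can be checked with a quick sketch. The node capacity is not stated in this thread; the 400GB figure below is an assumption inferred only from the "2 x 200GB" limit, and `images_per_node` is a hypothetical helper.

```python
# Assumption: usable image storage per node, inferred from "2 x 200GB" above.
ASSUMED_NODE_CAPACITY_GB = 2 * 200


def images_per_node(node_capacity_gb: int, image_size_gb: int) -> int:
    """Number of whole images of the given size that fit on one node."""
    return node_capacity_gb // image_size_gb


print(images_per_node(ASSUMED_NODE_CAPACITY_GB, 200))  # old 200GB images
print(images_per_node(ASSUMED_NODE_CAPACITY_GB, 90))   # 90GB images
```

Under that assumption the 90GB images leave about 40GB of slack per node (400 - 4 x 90), so the count of four holds only if the images stay close to 90GB.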

To be continued...

@BetsyMcPhail
Contributor

BetsyMcPhail commented Dec 13, 2023

I confirmed with Stephen that the orka setup is ready for us to create the remaining Sonoma jobs. Summarizing from above and the current Jenkins state, there are a few Sonoma jobs that still need to be created (in the list below, all jobs with a link exist in Jenkins).

continuous

nightly

experimental

experimental jobs with no continuous/nightly equivalent

@BetsyMcPhail
Contributor

The remaining non-production jobs (see previous comment) have been created in Jenkins. I will monitor these jobs for the next few days and then we can move them into production.

@BetsyMcPhail
Contributor

Most of the nightly Sonoma jobs ran without issue. Several tests are failing on mac-arm-sonoma-clang-bazel-nightly-non-production-everything-address-sanitizer.

@jwnimmer-tri
Collaborator Author

Okay, executive decision -- drop all ASan builds on all macs. Moving forward, ASan in CI will be Ubuntu-only.

@BetsyMcPhail
Contributor

> Okay, executive decision -- drop all ASan builds on all macs. Moving forward, ASan in CI will be Ubuntu-only.

Works for me! I will make the update in Jenkins.

@BetsyMcPhail
Contributor

Sonoma jobs have been moved to production. The only item remaining is the doc update; I'm opening a PR momentarily.
