WIP: [DO NOT MERGE] introduce libcuml wheels #6199
base: branch-25.02
Conversation
…ges (#6217)

Follow-up to #6190. Proposes some miscellaneous packaging cleanup:

* declares `cuml-cu{11,12}` wheels' runtime dependency on `cuda-python`
  * as a result of code like this: https://github.com/rapidsai/cuml/blob/bfd2e220d3adf5d8c6b76dc90e3d1275054f32d5/python/cuml/cuml/svm/linear.pyx#L40-L43
* ~adds `raft_log.txt` to `.gitignore`~
* adds CMake option `CUML_USE_RAFT_STATIC`
  * to provide a default for this: https://github.com/rapidsai/cuml/blob/bfd2e220d3adf5d8c6b76dc90e3d1275054f32d5/cpp/CMakeLists.txt#L600
* defines `BUILD_CAGRA_HNSWLIB OFF` in `get_cuvs.cmake`
  * as is done for RAFT: https://github.com/rapidsai/cuml/blob/bfd2e220d3adf5d8c6b76dc90e3d1275054f32d5/cpp/cmake/thirdparty/get_raft.cmake#L58
  * cuML doesn't need the CAGRA functionality from cuVS, as far as I can tell
  * this is `ON` by default in cuVS, so this change saves a bit of build time and size: https://github.com/rapidsai/cuvs/blob/1e548f8c3a773452ce69556f4db72fc712efae02/cpp/CMakeLists.txt#L58
* explicitly passes the package type to `rapids-download-wheels-from-s3` in CI scripts

## Notes for Reviewers

These changes are useful independently, but will also make the `libcuml` wheels PR (#6199) a bit smaller and easier to review.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #6217
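The two CMake changes described above amount to declaring a default and switching a cuVS feature off before cuVS is configured. A minimal sketch, with the description string assumed rather than taken from the PR:

```cmake
# Sketch only: give CUML_USE_RAFT_STATIC an explicit default so the later
# check in cpp/CMakeLists.txt always sees a defined value, and switch the
# CAGRA/hnswlib support off before get_cuvs.cmake configures cuVS.
option(CUML_USE_RAFT_STATIC "Statically link RAFT into libcuml++" OFF)
set(BUILD_CAGRA_HNSWLIB OFF)
```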
Replaces #2306, contributes to rapidsai/build-planning#33.

Proposes packaging `libraft` as a wheel, which is then re-used by:

* `pylibraft-cu{11,12}` and `raft-cu{11,12}` (this PR)
* `libcugraph-cu{11,12}`, `pylibcugraph-cu{11,12}`, and `cugraph-cu{11,12}` in rapidsai/cugraph#4804
* `libcuml-cu{11,12}` and `cuml-cu{11,12}` in rapidsai/cuml#6199

As part of this, also proposes:

* introducing a new CMake option, `RAFT_COMPILE_DYNAMIC_ONLY`, to allow building/installing only the dynamic shared library (i.e. skipping the static library)
* enforcing `rapids-cmake`'s preferred CMake style (#2531 (comment))
* making wheel-building CI jobs always depend on other wheel-building CI jobs, not tests or `*-publish` jobs (to reduce end-to-end CI time)

## Notes for Reviewers

### Benefits of these changes

* smaller wheels (see "Size changes" below)
* faster compile times (no more re-compiling RAFT in cuGraph and cuML CI)
* other benefits mentioned in rapidsai/build-planning#33

### Wheel contents

`libraft`:

* `libraft.so` (shared library)
* RAFT headers
* vendored dependencies (`fmt`, CCCL, `cuco`, `cute`, `cutlass`)

`pylibraft`:

* `pylibraft` Python / Cython code and compiled Cython extensions

`raft-dask`:

* `raft-dask` Python / Cython code and compiled Cython extension

### Dependency Flows

In short.... `libraft` contains a `libraft.so` dynamic library and the headers to link against it.

* Anything that needs to link against RAFT at build time pulls in `libraft` wheels as a build dependency.
* Anything that needs RAFT's symbols at runtime pulls it in as a runtime dependency, and calls `libraft.load_library()`.

For more details and some flowcharts, see rapidsai/build-planning#33 (comment)

### Size changes (CUDA 12, Python 3.12, x86_64)

| wheel          | num files (before) | num files (these PRs) | size (before) | size (these PRs) |
|:--------------:|-------------------:|----------------------:|--------------:|-----------------:|
| `libraft`      | ---       | 3169      | ---        | 19M        |
| `pylibraft`    | 64        | 63        | 11M        | 1M         |
| `raft-dask`    | 29        | 28        | 188M       | 188M       |
| `libcugraph`   | ---       | 1762      | ---        | 903M       |
| `pylibcugraph` | 190       | 187       | 901M       | 2M         |
| `cugraph`      | 315       | 313       | 899M       | 3.0M       |
| `libcuml`      | ---       | 1766      | ---        | 289M       |
| `cuml`         | 442       | ---       | 517M       | ---        |
| **TOTAL**      | **1,040** | **7,268** | **2,516M** | **1,405M** |

*NOTES: size = compressed, "before" = 2025-01-13 nightlies*

<details><summary>how I calculated those (click me)</summary>

* `cugraph`: nightly commit = rapidsai/cugraph@8507cbf, PR = rapidsai/cugraph#4804
* `cuml`: nightly commit = rapidsai/cuml@7c715c4, PR = rapidsai/cuml#6199
* `raft`: nightly commit = 1b62c41, PR = this PR

```shell
docker run \
  --rm \
  --network host \
  --env RAPIDS_NIGHTLY_DATE=2025-01-13 \
  --env CUGRAPH_NIGHTLY_SHA=8507cbf63db2f349136b266d3e6e787b189f45a0 \
  --env CUGRAPH_PR="pull-request/4804" \
  --env CUGRAPH_PR_SHA="2ef32eaa006a84c0bd16220bb8e8af34198fbee8" \
  --env CUML_NIGHTLY_SHA=7c715c494dff71274d0fdec774bdee12a7e78827 \
  --env CUML_PR="pull-request/6199" \
  --env CUML_PR_SHA="2ef32eaa006a84c0bd16220bb8e8af34198fbee8" \
  --env RAFT_NIGHTLY_SHA=1b62c4117a35b11ce3c830daae248e32ebf75e3f \
  --env RAFT_PR="pull-request/2531" \
  --env RAFT_PR_SHA="0d6597b08919f2aae8ac268f1a68d6a8fe5beb4e" \
  --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
  --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
  --env WHEEL_DIR_AFTER=/tmp/wheels-after \
  -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.12 \
  bash

# --- nightly wheels --- #
mkdir -p ./wheels-before
export RAPIDS_BUILD_TYPE=branch
export RAPIDS_REF_NAME="branch-25.02"

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
  rapids-download-wheels-from-s3 python ./wheels-before

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
  rapids-download-wheels-from-s3 python ./wheels-before

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
  rapids-download-wheels-from-s3 python ./wheels-before

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
  rapids-download-wheels-from-s3 python ./wheels-before

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_SHA=${CUML_NIGHTLY_SHA} \
  rapids-download-wheels-from-s3 python ./wheels-before

# --- wheels from CI --- #
mkdir -p ./wheels-after
export RAPIDS_BUILD_TYPE="pull-request"

# libraft
RAPIDS_PY_WHEEL_NAME="libraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
  rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
  rapids-download-wheels-from-s3 python ./wheels-after

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
  rapids-download-wheels-from-s3 python ./wheels-after

# libcugraph
RAPIDS_PY_WHEEL_NAME="libcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
  rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
  rapids-download-wheels-from-s3 python ./wheels-after

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
  rapids-download-wheels-from-s3 python ./wheels-after

# libcuml
RAPIDS_PY_WHEEL_NAME="libcuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
  rapids-download-wheels-from-s3 cpp ./wheels-after

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
  rapids-download-wheels-from-s3 python ./wheels-after

pip install pydistcheck

pydistcheck \
  --inspect \
  --select 'distro-too-large-compressed' \
  ./wheels-before/*.whl \
| grep -E '^checking|files: | compressed' \
> ./before.txt

# get more exact sizes
du -sh ./wheels-before/*

pydistcheck \
  --inspect \
  --select 'distro-too-large-compressed' \
  ./wheels-after/*.whl \
| grep -E '^checking|files: | compressed' \
> ./after.txt

# get more exact sizes
du -sh ./wheels-after/*
```

</details>

### How I tested this

These other PRs:

* rapidsai/devcontainers#435
* rapidsai/cugraph-gnn#110
* rapidsai/cuml#6199
* rapidsai/cugraph#4804
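The compressed and uncompressed sizes that pydistcheck reports boil down to sums over the wheel archive's zip entries (a `.whl` is a zip file). A minimal Python sketch of that calculation, with the demo filename being a placeholder rather than a real wheel:

```python
import zipfile


def wheel_sizes(path):
    """Return (compressed, uncompressed) byte totals for a wheel.

    A .whl file is a standard zip archive, so the per-file
    compress_size / file_size fields give the two totals that
    tools like pydistcheck report.
    """
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        return (
            sum(i.compress_size for i in infos),
            sum(i.file_size for i in infos),
        )


if __name__ == "__main__":
    # Build a tiny stand-in "wheel" with highly repetitive content,
    # then measure it; deflate should shrink it substantially.
    with zipfile.ZipFile("demo.whl", "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("pkg/__init__.py", "x = 1\n" * 100)
    compressed, uncompressed = wheel_sizes("demo.whl")
    print(compressed < uncompressed)
```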
/ok to test

/ok to test
```cmake
# libcuml (C++) and cuml (Cython).
set(CUML_USE_CUVS_STATIC OFF)
set(CUML_EXCLUDE_CUVS_FROM_ALL ON)
include(${CUML_CPP_SRC}/cmake/thirdparty/get_cuvs.cmake)
```
Some of the Cython code directly uses cuVS:

```cython
cdef extern from "cuvs/cluster/kmeans.hpp" namespace \
```

so cuVS is needed in the build environment for both `libcuml` and `cuml` wheels. That means we end up compiling `libcuvs.so` in every `libcuml` build AND every `cuml` build.

Even with decent cache hit rates, on this PR I've seen that take on the order of 2.5 hours end-to-end for all the `build-libcuml` and `build-cuml` jobs to complete :/

Maybe we need to stop here and try to add a `libcuvs` wheel?
```shell
  )
  fi
elif [[ "${package_dir}" == "python/cuml" ]]; then
  # TODO(jameslamb): why are the CUDA 11 wheels so big???
```
I haven't found the root cause yet, but the CUDA 11 `cuml` wheels being produced on this branch are a lot bigger than I'd expect.

I suspect it's related to static linking against the CUDA math wheels, but I'm surprised that the difference could be so large for these Cython extensions.

For context, the Cython extension sizes don't really seem to vary much by CUDA version on latest `branch-25.02`:

- CUDA 11.8.0: https://github.com/rapidsai/cuml/actions/runs/12836193902/job/35797302998#step:10:1672
- CUDA 12.5.1: https://github.com/rapidsai/cuml/actions/runs/12836193902/job/35797303218#step:10:2754

It's just `libcuml++.so` driving the big difference in total size on `branch-25.02`.
CUDA 11.8.0, arm64, Python 3.11

```text
file size
    * compressed size: 1.1G
    * uncompressed size: 4.0G
    * compression space saving: 72.4%
contents
    * directories: 74
    * files: 441 (85 compiled)
...
largest files
    * (86.8M) cuml/experimental/fil/fil.cpython-311-aarch64-linux-gnu.so
    * (86.7M) cuml/fil/fil.cpython-311-aarch64-linux-gnu.so
    * (85.3M) cuml/ensemble/randomforest_shared.cpython-311-aarch64-linux-gnu.so
    * (85.2M) cuml/explainer/tree_shap.cpython-311-aarch64-linux-gnu.so
    * (85.2M) cuml/explainer/kernel_shap.cpython-311-aarch64-linux-gnu.so
```

CUDA 12.5.1, arm64, Python 3.11

```text
file size
    * compressed size: 8.9M
    * uncompressed size: 30.1M
    * compression space saving: 70.4%
contents
    * directories: 74
    * files: 441 (85 compiled)
...
largest files
    * (2.6M) cuml/experimental/fil/fil.cpython-311-aarch64-linux-gnu.so
    * (2.4M) cuml/fil/fil.cpython-311-aarch64-linux-gnu.so
    * (1.0M) cuml/cluster/hdbscan/hdbscan.cpython-311-aarch64-linux-gnu.so
    * (0.9M) cuml/svm/linear.cpython-311-aarch64-linux-gnu.so
    * (0.9M) cuml/manifold/umap.cpython-311-aarch64-linux-gnu.so
```
```cmake
# --- RAFT --- #
# find RAFT before cuVS, to avoid
# cuVS CMake defining conflicting versions of targets like 'nvidia::cutlass::cutlass'
include(${CUML_CPP_SRC}/cmake/thirdparty/get_raft.cmake)
```
Without first finding RAFT here, builds fail like this:

```text
-- Found Thrust: /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/rapids/cmake/thrust/thrust-config.cmake (found suitable exact version "2.7.0.0")
-- Found CCCL: /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/rapids/cmake/cccl/cccl-config.cmake (found version "2.7.0.0")
-- Found nvtx3: /pyenv/versions/3.12.7/lib/python3.12/site-packages/librmm/lib64/cmake/nvtx3/nvtx3-config.cmake (found version "3.1.0")
-- Found rmm: /pyenv/versions/3.12.7/lib/python3.12/site-packages/librmm/lib64/cmake/rmm/rmm-config.cmake (found version "25.02.0")
CMake Error at /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/cmake/NvidiaCutlass/NvidiaCutlassTargets.cmake:42 (message):
  Some (but not all) targets in this export set were already defined.

  Targets Defined: nvidia::cutlass::cutlass

  Targets not yet defined: nvidia::cutlass::tools::util
Call Stack (most recent call first):
  /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/cmake/NvidiaCutlass/NvidiaCutlassConfig.cmake:9 (include)
  /pyenv/versions/3.12.7/lib/python3.12/site-packages/cmake/data/share/cmake-3.31/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
  /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/cmake/raft/raft-dependencies.cmake:43 (find_dependency)
  /pyenv/versions/3.12.7/lib/python3.12/site-packages/libraft/lib64/cmake/raft/raft-config.cmake:83 (include)
  /pyenv/versions/3.12.7/lib/python3.12/site-packages/libcuml/lib64/cmake/cuml/cuml-dependencies.cmake:40 (find_package)
  /pyenv/versions/3.12.7/lib/python3.12/site-packages/libcuml/lib64/cmake/cuml/cuml-config.cmake:72 (include)
  CMakeLists.txt:150 (find_package)
-- Configuring incomplete, errors occurred!
*** CMake configuration failed
error: subprocess-exited-with-error
```

I suspect that's some interaction between RAFT's exports (which do include cutlass) and cuVS's (code link), but I haven't figured it out yet.
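For context on that failure mode: a CMake `*Targets.cmake` export file refuses to load if only some of its targets already exist in the build, which is exactly the "Some (but not all) targets in this export set were already defined" error above. Ordering the includes so RAFT's config defines the full CUTLASS export set first is one fix; a hedged sketch of an alternative defensive pattern (not the fix used in this PR) is to guard the lookup on the colliding target:

```cmake
# Hedged sketch: skip re-finding CUTLASS when the colliding target already
# exists, so a partially-overlapping export set is never re-imported.
if(NOT TARGET nvidia::cutlass::cutlass)
  find_package(NvidiaCutlass REQUIRED)
endif()
```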
Replaces #6006, contributes to rapidsai/build-planning#33.

Proposes packaging `libcuml` as a wheel, which is then re-used by `cuml-cu{11,12}` wheels.

## Notes for Reviewers

If you see this note, that means this is not ready for review.

### Benefits of these changes

### Wheel contents

`libcuml`:

* `libcuml++.so` (shared library) and its headers
* `libcumlprims_mg.so` (shared library) and its headers
* vendored dependencies (`fmt`)

`cuml`:

* `cuml` Python / Cython code and compiled Cython extensions

### Dependency Flows

In short.... `libcuml` contains `libcuml.so` and `libcumlprims_mg.so` dynamic libraries and the headers to link against them.

* Anything that needs to link against cuML at build time pulls in `libcugraph` wheels as a build dependency.
* Anything that needs cuML's symbols at runtime pulls it in as a runtime dependency, and calls `libcuml.load_library()`.

For more details and some flowcharts, see rapidsai/build-planning#33 (comment)
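The `load_library()` pattern in that dependency flow generally amounts to dlopen-ing the bundled shared libraries with global symbol visibility before any Cython extensions import. A hedged Python sketch of the idea; the function signature, library names, and search logic here are illustrative, not the real cuml API:

```python
import ctypes
import os


def load_library(libnames=("libcumlprims_mg.so", "libcuml++.so"),
                 search_dirs=()):
    """Illustrative sketch of a wheel's load_library() helper.

    Loads each bundled shared library it can find with RTLD_GLOBAL,
    so symbols are published process-wide and later-loaded Cython
    extensions can resolve against them.
    """
    loaded = []
    for directory in search_dirs:
        for name in libnames:
            candidate = os.path.join(directory, name)
            if os.path.exists(candidate):
                # RTLD_GLOBAL makes these symbols visible to
                # subsequent dlopen() calls in the same process.
                loaded.append(
                    ctypes.CDLL(candidate, mode=ctypes.RTLD_GLOBAL)
                )
    return loaded
```

A real implementation would derive `search_dirs` from the installed `libcuml` package's location (e.g. via `importlib.resources` or the module's `__file__`); that detail is omitted here.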
### Size changes (CUDA 12, Python 3.12, x86_64)

`libcuml`

`cuml`

*NOTES: size = compressed, "before" = 2025-01-13 nightlies*

how I calculated those (click me)

### How I tested this

These other PRs: