-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Cross-Platform Refactor: Build System and Binary Distribution #1032
Comments
Hmm... looking at https://github.com/TimDettmers/bitsandbytes/pull/949/checks, the wheels that get built are only up to 22 megabytes? |
These artifacts only have the libraries built for one CUDA toolkit version. Side note: I think the code still expects The current Linux x86-64 wheels on PyPI have binaries for 11.0, 11.1, 11.4, 11.5, 11.7, 11.8, 12.0, 12.1, 12.2, and 12.3, so it adds up quickly. The wheel is 105MB and when unpacked it's 3x that size. If I recall correctly, VS2022 support came with CUDA 11.6, and I don't think we should bother with anything older than that for Windows builds. On Windows we don't really have to worry about a BC break but it's something to consider for Linux. But with that said, my opinion is that the default binary wheels for both should narrow the list down and instruct to build from source for some of the older versions. Alternatively, build separate wheels for those older ones? The
Some of the NVIDIA NGC containers from the past ~2 years:
My opinion here is maybe to ship with at most five versions by default: cu117, cu118, cu121, cu122, cu123. This would mean dropping these five that are in the current wheels: cu110, cu111, cu114, cu115, cu120. The idea: cover the support matrix from the last 4 torch releases. |
IMHO, this should be avoided if at all possible. I would consider 1) in this case, even if it is a breaking change (maybe bump major version?) Does CUDA follow semantic versioning? I.e. will CUDA 12.2.1-compiled binaries also run on CUDA 12.2.2 ? I hope at least this much is true, then we can at least do what @matthewdouglas suggests. Additionally, could we rely on minor version compatibility?
If so, we can reduce the build matrix significantly, basically compiling only for CUDA 11 and CUDA 12. We would have to ve careful when upgrading though, as each new minor version drops all the older ones. Note that this document speaks about ABIs in CUDA in general. Not sure if this applies to CUDA and toolkit, i.e. if they follow standard Linux semantic .so rules. I guess we are bound to Torch for hte CUDA toolkit version to use, otherwise we could also look at compiling statically in this case? |
My understanding is that For what it's worth, I've had success running most of the tests on Windows with I haven't tried <=11.8 yet but will at some point. I have some thoughts here on how things can be simplified and will try to put together some PRs soon. One of the other things I'm thinking is we should be able to do away with the separate |
Windows binaries Hey @matthewdouglas @wkpark @rickardp @akx, Just checking in about making the Windows build pip installable. I currently don't have a Windows machine to test stuff and I saw that some of you seemed to have already played around with the new wheels. I didn't have time to investigate this, but it seems to me Windows packaging is pretty much solved (thanks for this amazing work ❤️ ). Are you aware of anything that's still missing in regard to fully supporting My idea would be to publish to testPypi this week and make sure everything is as we expect it to be before going ahead with the official release afterwards. |
@Titus-von-Koeller I've been able to successfully run most of the unit tests, but experience a system crash when testing the 32bit optimizers. I'm not sure if it's just because I have an underwhelming amount of vRAM (6GB) or if there's more to it. I have a memory dump from this but have not looked into it yet. Apart from that I think we just need to make sure to include the binaries for at least both cu121 and cu118 in the wheels. |
I gave That said, the wheel only contains CPU and CUDA 12 (not CUDA 11 and 12), so that needs to be rectified:
|
Ok, so Tim gave me the following statement:
He thinks that we should simplify the CUDA versions we're supporting and reduce to only the ones PyTorch supports. Tim concurs that the versions from the last 4 releases should be enough going forward:
Thanks a lot @matthewdouglas for your thorough analysis on the topic, this was extremely helpful ❤️ ! Tim also thinks that "CUDA 12.2.1-compiled binaries [should] also run on CUDA 12.2.2". That would potentially really simplify things then, as @rickardp explained, right?
@rickardp Could you please clarify what you meant with the below sentence, especially the second part?
|
Sounds good!
Yes, I just wanted confirmation that I understood this correctly but this is also my understanding
I meant exactly this. If we upgraded to 12.2.2, it cannot (typically) run in 12.2.1 ABI. Or so I would assume, based on the above. |
I definitely would expect compiling with 12.2.1 to run on 12.2.2 also. Patch version compatibility shouldn't be much of an issue. But based on the docs, after 11.1, I'd also expect to see some level of minor version compatibility too - i.e. compile with 12.3, run with any 12.x at runtime assuming a new enough driver is installed. As far as device coverage goes, here's the main things I think to note:
Aside: Edit: CUDA 12.4.0 download page is up now. Edit 2: Attaching a Windows binary built with CUDA 12.4.0 in case anyone wants to try. Note: only built for Pascal and newer. |
Archiving this, because it's out of date and we ended up favoring other modes of interaction to coordinate. |
We're confirming the choice of migrating to CMake for cross-platform builds and are in the process of moving everything to Github workflows.
The goal of this issue is to have a central place to discuss, agree on work still needed and track progress on the following topics:
Strategies for Binary Distribution and Size Constraints: Discuss the options for distributing binaries, considering PyPI's size restrictions and the potential for using separate packages for each platform or an additional index. We will likely ask for an exemption to the PyPi size constraint for the time being, but we need a new way to go forward. I am aware of useful discussions around these topics in PRs and issues in the last week, but since I'll be away on vacation and need to leave now, I couldn't yet integrate everything into this page.
However, the idea is that this is the central hub for such discussions and I'll integrate any feedback into this description here in order to have an up-to-date summary of decisions, challenges, tasks, etc.
Distribution of binaries
Right now we're already hitting PyPI's 100MB size restriction and could therefore not add new functionality yet, as it would increase binary size.
There are two approaches that could solve this:
@albanD had a few valuable comments about that (feel free to engage in this discussion as well :) ):
Security implications
Also from @albanD:
The text was updated successfully, but these errors were encountered: