Pex zip-creation takes a very long time for torch>=2
#2292
This has come up before. Two concrete results are the support for […]. If neither […], @cosmicexplorer explored both and came up wanting. I think #2158 is probably the best entrypoint into that work.
I'll have a peek at those, thanks. We already use layout=packed (+execution_mode=venv) in some situations. In the specific case where I hit this I was running a […].

I still do think there is great value in being performant "by default" though, but maybe my effort is better invested into contributing to the already existing work by @cosmicexplorer -- will see if there's anything I can do there.
I agree there, but the only real solution for that is faster zip support. FWICT that is a problem for native code and not really related to Pex at all. With that implemented, though, Pex - and many other tools - could benefit.

To be honest though, I think trying to make Pex - or any zipapp implementation - faster for behemoths like pytorch is fighting the wrong battle altogether. I imagine a much "simpler" way to do this is to not use a zipapp. For example, one might imagine a scie that contained all the resolved wheels for a zipapp, but not pre-installed wheels like PEXes contain - the actual wheel files downloaded from PyPI. The scie could then use PBS's Python distributions support for […].
Alternatively, instead of the scie containing raw wheel files, a PEX could. Pex would then need to learn how to install wheels at runtime, though; currently it lets Pip do this at build time. In this way the .whl contents of a PEX could be stored as STORED by default.
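For illustration, the core of that idea can be sketched with just the stdlib `zipfile` module (the `.deps/` destination and the helper name here are assumptions for the example, not Pex's actual code): since each `.whl` is itself a zip and is already compressed internally, adding it to the PEX archive as `STORED` costs little extra space and skips the expensive deflate pass.

```python
import os
import zipfile


def add_wheels_stored(pex_zip_path, wheel_paths):
    # Append raw .whl files to an existing zipapp without re-compressing them.
    # ZIP_STORED skips deflate; the wheels' own internal compression is kept.
    with zipfile.ZipFile(pex_zip_path, "a") as zf:
        for whl in wheel_paths:
            arcname = os.path.join(".deps", os.path.basename(whl))
            zf.write(whl, arcname=arcname, compress_type=zipfile.ZIP_STORED)
```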
That is also an option, and it looks like it was explored fairly well. Will see if that can be landed; it'd definitely be good. My approach is Python-native, but probably a lot hackier since it depended a lot on zipfile internals.

I think my stance on […]
Hmm. That doesn't sound half bad, at least for some use-cases. I guess it'd be almost the same size as well, since zip only uses local (per-entry) compression. A wheel install is pretty much guaranteed to be isolated, right? I'm not sure I can fully see the implications for Pants though, or how it'd end up working in every situation (…).
This would be opaque to all Pex users at runtime. The PEX zipapp would use STORED, unadulterated .whl files instead of today's DEFLATED installed wheel chroots, and the packed layout would use […]. I really do think this is the right way to go: don't speed up zipping, avoid unzipping (installing wheels at build time) + zipping (back into a PEX zipapp or packed layout ~wheel zips) altogether. There will still be an unzip on a cold cache for the 1st boot at runtime, but since […].

I experimented enough writing a PEP-427 installer today to see that it works, but you need to handle generating console scripts, since .whls in the wild, for the most part, don't actually carry these in proj-rev.data/scripts/... as you'd hope they would given PEP-427.
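To make the console-script wrinkle concrete, here is a rough sketch of what such a minimal PEP-427 install looks like (assumed helper names; a real installer also handles RECORD rewriting, the `.data/` tree, extras markers, and more): the `.whl` is simply unzipped, and console scripts are synthesized from `entry_points.txt`.

```python
import configparser
import os
import stat
import sys
import zipfile

SCRIPT_TEMPLATE = """\
#!{python}
import sys
from {module} import {attr0}

if __name__ == "__main__":
    sys.exit({func}())
"""


def install_wheel(whl_path, dest_dir, bin_dir):
    # Unzip the wheel "as installed" (a real installer does much more).
    with zipfile.ZipFile(whl_path) as zf:
        zf.extractall(dest_dir)

    dist_info = next(
        (d for d in os.listdir(dest_dir) if d.endswith(".dist-info")), None
    )
    if dist_info is None:
        return
    entry_points = os.path.join(dest_dir, dist_info, "entry_points.txt")
    if not os.path.exists(entry_points):
        return

    # Synthesize console scripts, since wheels rarely ship them as files.
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # preserve script name case
    parser.read(entry_points)
    if not parser.has_section("console_scripts"):
        return
    os.makedirs(bin_dir, exist_ok=True)
    for name, spec in parser.items("console_scripts"):
        # "module:attr [extras]" -> module, attr (extras ignored for brevity).
        module, _, attr = spec.split("[")[0].strip().partition(":")
        script_path = os.path.join(bin_dir, name)
        with open(script_path, "w") as fp:
            fp.write(SCRIPT_TEMPLATE.format(
                python=sys.executable,
                module=module,
                attr0=attr.split(".")[0],
                func=attr,
            ))
        os.chmod(script_path, os.stat(script_path).st_mode | stat.S_IEXEC)
```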
@tgolsson I won't have solid time until the 23rd-28th, but I think I can get this knocked out and released then. I'm not sure exactly how to spell the feature activation, perhaps two new […].
That sounds very good. My concern with Pants is mostly how far away from […]
It also seems like a good feature for Pex, regardless of Pants usage.
This sets the stage for doing runtime installation of wheels without needing to ship a copy of Pip in every PEX file. To prove the robustness, convert build time installation of wheel chroots to this mechanism. Work towards pex-tool#2292
Noting I did not complete this during the current work stretch. It will be picked back up on December 10th when I start my next work stretch.
This sets the stage for doing runtime installation of wheels without needing to ship a copy of Pip in every PEX file. To help prove the robustness, convert the current build time installation of wheel chroots to this mechanism. Work towards #2292
This should completely side-step the need for #2158 since it does better than that approach ever could by avoiding zipping altogether (and unzipping as well!).
Ok, circling back to the OP using #2298: […]

So that's: […]

Of course, this is not a great example since the resulting PEX cannot be run, as the elided warning indicates in both cases; so we can't examine the tradeoff in the 1st boot runtime penalty for installing the wheels just in time.

And, using the OP, but with […]: […]

So that's: […]

And at runtime: […]

So, in summary, that's (assuming resolve times for the build and run cases are equal and so are ignored): […]

This means:

- For local, internal-only use, […].
- For cases where remote deployment cold 1st run start time is important (legacy lambdex use cases come to mind), […].
- For other cases the perf is a wash and more localized analysis is needed to decide which set of options to use.
Working through the perf analysis in pex-tool#2292 brought these to light.
The analysis above is at the extreme end of PEX sizes (~2GB). I'll add the same analysis below for the extreme small end (a cowsay PEX) to button this up, assuming ~linearity between the two extremes.
Ok, for a small case I used cowsay and ansicolors deps with this 93 byte `app/src/main.py`:

```python
import colors
import cowsay

if __name__ == "__main__":
    cowsay.tux(colors.blue("Moo?"))
```

`app/build-cowsay.sh`:

```bash
#!/usr/bin/env bash
set -euo pipefail
PYTHON="${PYTHON:-python3.11}"
PEX_DIR="$(git rev-parse --show-toplevel)"
APP_DIR="${PEX_DIR}/app"
cd "${PEX_DIR}"
DEPS="${DEPS:-cowsay ansicolors}"
venv="$(mktemp -d)"
"${PYTHON}" -mvenv "${venv}"
"${venv}/bin/python" -mpip --disable-pip-version-check -q wheel --wheel-dir "${APP_DIR}/wheels" ${DEPS[*]}
function build_pex() {
echo "${PYTHON} -mpex --no-pypi -f ${APP_DIR}/wheels -D ${APP_DIR}/src -m main ${DEPS[*]} ${@}"
}
hyperfine \
-w2 \
-p 'rm -rf ~/.pex' \
-p 'rm -rf ~/.pex' \
-p 'rm -rf ~/.pex' \
-p 'rm -rf ~/.pex' \
-p 'rm -rf ~/.pex' \
-p 'rm -rf ~/.pex' \
-p '' \
-p '' \
-p '' \
-p '' \
-p '' \
-p '' \
-n 'Build zipapp (cold)' \
-n 'Build .whl zipapp (cold)' \
-n 'Build packed (cold)' \
-n 'Build .whl packed (cold)' \
-n 'Build loose (cold)' \
-n 'Build .whl loose (cold)' \
-n 'Build zipapp (warm)' \
-n 'Build .whl zipapp (warm)' \
-n 'Build packed (warm)' \
-n 'Build .whl packed (warm)' \
-n 'Build loose (warm)' \
-n 'Build .whl loose (warm)' \
"$(build_pex --layout zipapp -o ${APP_DIR}/cowsay.zipapp.pex)" \
"$(build_pex --layout zipapp --no-pre-install-wheels -o ${APP_DIR}/cowsay.zipapp.whls.pex)" \
"$(build_pex --layout packed -o ${APP_DIR}/cowsay.packed.pex)" \
"$(build_pex --layout packed --no-pre-install-wheels -o ${APP_DIR}/cowsay.packed.whls.pex)" \
"$(build_pex --layout loose -o ${APP_DIR}/cowsay.loose.pex)" \
"$(build_pex --layout loose --no-pre-install-wheels -o ${APP_DIR}/cowsay.loose.whls.pex)" \
"$(build_pex --layout zipapp -o ${APP_DIR}/cowsay.zipapp.pex)" \
"$(build_pex --layout zipapp --no-pre-install-wheels -o ${APP_DIR}/cowsay.zipapp.whls.pex)" \
"$(build_pex --layout packed -o ${APP_DIR}/cowsay.packed.pex)" \
"$(build_pex --layout packed --no-pre-install-wheels -o ${APP_DIR}/cowsay.packed.whls.pex)" \
"$(build_pex --layout loose -o ${APP_DIR}/cowsay.loose.pex)" \
"$(build_pex --layout loose --no-pre-install-wheels -o ${APP_DIR}/cowsay.loose.whls.pex)"
du -sbl ${APP_DIR}/cowsay.* | sort -n
```

`app/perf-cowsay.sh`:

```bash
#!/usr/bin/env bash
set -euo pipefail
PEX_DIR="$(git rev-parse --show-toplevel)"
APP_DIR="${PEX_DIR}/app"
cd "${APP_DIR}"
hyperfine \
-w2 \
-p 'rm -rf ~/.pex' \
-n 'Run zipapp cold' \
-n 'Run .whl zipapp cold' \
-n 'Run packed cold' \
-n 'Run .whl packed cold' \
-n 'Run loose cold' \
-n 'Run .whl loose cold' \
-n 'Run zipapp cold (parallel)' \
-n 'Run .whl zipapp cold (parallel)' \
-n 'Run packed cold (parallel)' \
-n 'Run .whl packed cold (parallel)' \
-n 'Run loose cold (parallel)' \
-n 'Run .whl loose cold (parallel)' \
"./cowsay.zipapp.pex" \
"./cowsay.zipapp.whls.pex" \
"cowsay.packed.pex/__main__.py" \
"cowsay.packed.whls.pex/__main__.py" \
"cowsay.loose.pex/__main__.py" \
"cowsay.loose.whls.pex/__main__.py" \
"PEX_MAX_INSTALL_JOBS=0 ./cowsay.zipapp.pex" \
"PEX_MAX_INSTALL_JOBS=0 ./cowsay.zipapp.whls.pex" \
"PEX_MAX_INSTALL_JOBS=0 cowsay.packed.pex/__main__.py" \
"PEX_MAX_INSTALL_JOBS=0 cowsay.packed.whls.pex/__main__.py" \
"PEX_MAX_INSTALL_JOBS=0 cowsay.loose.pex/__main__.py" \
"PEX_MAX_INSTALL_JOBS=0 cowsay.loose.whls.pex/__main__.py"
The summary is:
|
… (#2298)

The `--no-pre-install-wheels` option causes built PEXes to use raw `.whl` files. For `--layout zipapp` this means a single `.whl` file is `STORED` per dep, and for `--layout {packed,loose}` this means the loose `.deps/` dir contains raw `.whl` files.

This speeds up all PEX builds by avoiding pre-installing wheel deps (~unzipping into the `PEX_ROOT`) and then, in the case of the zipapp and packed layouts, re-zipping. For large dependencies the time savings can be dramatic.

Not pre-installing wheels comes with a PEX boot cold-start performance tradeoff, since installation now needs to be done at runtime. This is generally a penalty of O(100ms), but that penalty can be erased for some deployment scenarios with the new `--max-install-jobs` build option / `PEX_MAX_INSTALL_JOBS` runtime env var. By default, runtime installs are performed serially, but this new option can be set to use multiple parallel install processes, which can speed up cold boots for large dependencies.

Fixes #2292
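For intuition about the runtime side of this tradeoff, a toy sketch of the fan-out that `--max-install-jobs` / `PEX_MAX_INSTALL_JOBS` enables (illustrative only; the bare `extractall` below stands in for Pex's real per-wheel install step, and all names are assumptions): each wheel installs independently, so cold-cache installs parallelize cleanly across processes.

```python
import os
import zipfile
from concurrent.futures import ProcessPoolExecutor


def install_one(whl_path, cache_root):
    # Stand-in for a full PEP-427 install of a single wheel into the cache.
    dest = os.path.join(cache_root, os.path.basename(whl_path) + ".installed")
    with zipfile.ZipFile(whl_path) as zf:
        zf.extractall(dest)
    return dest


def install_all(wheel_paths, cache_root, max_jobs=None):
    # max_jobs=None lets the pool default to the CPU count; max_jobs=1
    # mimics serial installs, the default behavior described above.
    with ProcessPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(install_one, wheel_paths, [cache_root] * len(wheel_paths)))
```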
…s of internal pexes (#20670)

This has all internal PEXes be built with settings to improve performance:

- with `--no-pre-install-wheels`, to package `.whl` files directly rather than unpack and install them (NB. this requires Pex 2.3.0 to pick up pex-tool/pex#2392)
- with `PEX_MAX_INSTALL_JOBS`, to use more concurrency for installs, when available

This is designed to be a performance improvement for any processing where Pants synthesises a PEX internally, like `pants run path/to/script.py` or `pants test ...`. pex-tool/pex#2292 has benchmarks for the PEX tool itself.

For benchmarks, I did some more purposeful ones with TensorFlow (PyTorch seems a bit awkward to set up and TensorFlow is still huge), using https://gist.github.com/huonw/0560f5aaa34630b68bfb7e0995e99285 . I did 3 runs each of two goals, with 2.21.0.dev4 and with `PANTS_SOURCE` pointing to this PR, and pulled the numbers out by finding the relevant log lines:

- `pants --no-local-cache --no-pantsd --named-caches-dir=$(mktemp -d) test example_test.py`. This involves building 4 separate PEXes partially in parallel, partially sequentially: `requirements.pex`, `local_dists.pex`, `pytest.pex`, and then `pytest_runner.pex`. The first and last are the interesting ones for this test.
- `pants --no-local-cache --no-pantsd --named-caches-dir=$(mktemp -d) run script.py`. This just builds the requirements into `script.pex`.

(NB. these are potentially unrealistic in that they're running with all caching turned off or cleared, so are truly a worst case. This means they're downloading tensorflow wheels and all the others each time, which takes about 30s on my 100Mbit/s connection. Faster connections will thus see a higher ratio of benefit.)

| goal | period | before (s) | after (s) |
|---------------------|------------------------------|-----------:|----------:|
| `run script.py` | building requirements | 74-82 | 49-52 |
| `test some_test.py` | building requirements | 67-71 | 30-36 |
| | building pytest runner | 8-9 | 17-18 |
| | total to start running tests | 76-80 | 53-58 |

I also did more adhoc ones on a real-world work repo of mine, which doesn't use any of the big ML libraries, just running some basic goals once.

| goal | period | before (s) | after (s) | |
|---------------------------------------------------|-----------------------------------------|-----------:|----------:|----|
| `pants export` on largest resolve | building requirements | 66 | 35 | |
| | total | 82 | 54 | |
| "random" `pants test path/to/file.py` (1 attempt) | building requirements and pytest runner | 1 | 49 | 38 |

Fixes #15062
Hey!
Not sure if actionable, but maybe there's something here that can be done. I was investigating another issue today and ended up seeing a very slow Pants package step (~5 minutes). The issue reproduces with the simple command line `pex -vvv "torch>=2" -o t2.2.pex`. This takes ~280 seconds on my machine, of which ~210-220 is spent purely in the zip step: […]

This turns out to be a 2.5 GB pex, which admittedly is on the fat side. Unzipping this beast takes ~30 seconds, and zipping it with regular `zip` takes ~230 seconds. `zip -1` takes ~100 seconds and adds ~10% to the size. `zip -0` takes 12 seconds but doubles the size.

Seeing as compression seems to add the majority of the runtime, I did a very quick hack (outside of pex) where I moved the compress step to a process pool (since it's CPU-heavy). With that, I get ~30 seconds at level 1, or about ~60 seconds at level 6. So a 3-4x speed increase. It may be possible to push this a bit higher by playing with ordering.

I also played around with the store-only-by-suffix capabilities, but it seems like the .so's make up the bulk of both the compression potential and time: only compressing text-like files gives a ~4.3 GB zip in 20 seconds.
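Roughly, the parallel-compression experiment takes the shape sketched below (a simplified illustration only: it deflates each member's bytes in worker processes, but stitching the pre-compressed streams back into a valid zip, with correct local headers and central directory offsets, is the part that needed zipfile internals).

```python
import zlib
from concurrent.futures import ProcessPoolExecutor


def deflate_member(path, level=6):
    # Raw deflate (wbits=-15), which is the stream format zip entries use.
    with open(path, "rb") as fp:
        data = fp.read()
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    compressed = compressor.compress(data) + compressor.flush()
    # Return everything a zip writer would need for this entry.
    return path, zlib.crc32(data), len(data), compressed


def deflate_all(paths, level=6, jobs=None):
    # CPU-bound work, so a process pool (not threads) gives real parallelism.
    with ProcessPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(deflate_member, paths, [level] * len(paths)))
```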
With all that said, I'm mostly curious if this is something that has been discussed elsewhere (found nothing while searching), and what kind of solution might be palatable relative to the gains that can be made. I'm willing to contribute something based on the work I've done so far, or investigate other suggested approaches.