Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use venv in CI #379

Merged
merged 10 commits into from
Jan 14, 2025
Merged

Use venv in CI #379

merged 10 commits into from
Jan 14, 2025

Conversation

Hardcode84
Copy link
Contributor

No description provided.

Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
@Hardcode84 Hardcode84 changed the title venv Use venv in CI Jan 10, 2025
@Hardcode84 Hardcode84 marked this pull request as ready for review January 10, 2025 17:40
@Hardcode84 Hardcode84 requested a review from ScottTodd January 10, 2025 17:40
Comment on lines 49 to 50
source ${VENV_DIR}/bin/activate
echo PATH=$PATH >> $GITHUB_ENV
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not keep the venv active across job steps from what I can tell.

See that the logs are not using .turbine-venv: https://github.com/iree-org/iree-turbine/actions/runs/12714371230/job/35446471647?pr=379#step:5:20

I usually explicitly activate the venv in each step that needs it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what's going on, /bin/activate doesn't do anything special besides updating PATH and unsetting PYTHONHOME

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, how PATH env should look like

PATH=/home/runner/work/iree-turbine/iree-turbine/.turbine-venv/bin:/opt/hostedtoolcache/Python/3.11.11/x64/bin:/opt/hostedtoolcache/Python/3.11.11/x64:/snap/bin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

And what it's getting instead in next step

PATH=/opt/hostedtoolcache/Python/3.11.11/x64/bin:/opt/hostedtoolcache/Python/3.11.11/x64:/home/runner/work/iree-turbine/iree-turbine/.turbine-venv/bin:/opt/hostedtoolcache/Python/3.11.11/x64/bin:/opt/hostedtoolcache/Python/3.11.11/x64:/snap/bin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

Our path is still here, but system python was prepended to path again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, echo "$VENV_DIR/bin" >> $GITHUB_PATH seems to work and it broke torch as it cannot find GPU anymore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what's going on, /bin/activate doesn't do anything special besides updating PATH and unsetting PYTHONHOME

It also sets the VIRTUAL_ENV environment variable, which some tools look for. It's safer to just source the activate script in each step that needs it IMO.

Haha, echo "$VENV_DIR/bin" >> $GITHUB_PATH seems to work and it broke torch as it cannot find GPU anymore

That is overwriting PATH, not appending to it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is overwriting PATH, not appending to it?

It's appending, according to docs

https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions

echo "$HOME/.local/bin" >> "$GITHUB_PATH"

Copy link
Contributor Author

@Hardcode84 Hardcode84 Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, gpu tests were failing because we were installing cpu torch first and then pip install -r pytorch-rocm-requirements.txt was doing nothing. Now things seems to work (besides usual flaky attention tests), PTAL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the jobs in this file run on ephemeral github-hosted machines, using a virtual environment is less important. Could keep the file as-is, with pip caching.

Not that much of a difference in time though:

Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
# Note: We install in three steps in order to satisfy requirements
# from non default locations first. Installing the PyTorch CPU
# wheels saves multiple minutes and a lot of bandwidth on runner setup.
pip install --no-compile -r pytorch-rocm-requirements.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is pretty slow on the mi250 runner (11m30s to install deps for 5 minutes of running tests): https://github.com/iree-org/iree-turbine/actions/runs/12757197012/job/35556958617?pr=379

We noticed similar setup time issues over at nod-ai/shark-ai#780. One solution there was to use a different runner, but the coverage on multiple accelerator types is useful here. We could also try using a cache and/or https://github.com/astral-sh/uv instead of pip.

I still think it's worth using a venv instead of installing packages on persistent runners at the system level, though I do wish the default install steps for these requirements was faster. What do you think? Is this workflow performance acceptable for developers working in iree-turbine, or should we iterate some more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if setup-pythons cache: 'pip' can make any difference. But I agree that using venv is better than system packages. @harsh-nod @raikonenfnu FYI

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I tried the cache provided by setup-python on nod-ai/shark-ai#640 and found that it wasn't compatible or at least didn't help.

Check the timestamps in the logs to see what is taking time. It looks like the downloads were fast or already cached, but the install itself was slow. Switching from pip to uv can help with install time (sometimes by a factor of 10-100x)

Mon, 13 Jan 2025 23:06:58 GMT Collecting pytorch-triton-rocm==3.1.0 (from torch>=2.3.0->-r pytorch-rocm-requirements.txt (line 2))
Mon, 13 Jan 2025 23:06:58 GMT  Downloading https://download.pytorch.org/whl/pytorch_triton_rocm-3.1.0-cp311-cp311-linux_x86_64.whl (344.9 MB)
Mon, 13 Jan 2025 23:07:01 GMT     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 344.9/344.9 MB 123.0 MB/s eta 0:00:00
Mon, 13 Jan 2025 23:07:01 GMT Collecting sympy==1.13.1 (from torch>=2.3.0->-r pytorch-rocm-requirements.txt (line 2))
...
Mon, 13 Jan 2025 23:07:03 GMT Installing collected packages: mpmath, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, fsspec, filelock, pytorch-triton-rocm, jinja2, torch, torchvision, torchaudio
Mon, 13 Jan 2025 23:14:18 GMT Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.2.0 jinja2-3.1.3 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.3 pillow-10.2.0 pytorch-triton-rocm-3.1.0 sympy-1.13.1 torch-2.5.1+rocm6.2 torchaudio-2.5.1+rocm6.2 torchvision-0.20.1+rocm6.2 typing-extensions-4.9.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, mi250 runner spent most of the install time in

Tue, 14 Jan 2025 00:43:03 GMT   changing mode of /groups/aig_sharks/actions-runner-iree/_work/iree-turbine/iree-turbine/.turbine-venv/bin/proton-viewer to 755
Tue, 14 Jan 2025 00:48:43 GMT   changing mode of /groups/aig_sharks/actions-runner-iree/_work/iree-turbine/iree-turbine/.turbine-venv/bin/convert-caffe2-to-onnx to 755

But mi300 didn't

Tue, 14 Jan 2025 00:42:27 GMT   changing mode of /home/sai/actions-runner-iree-turbine/_work/iree-turbine/iree-turbine/.turbine-venv/bin/proton-viewer to 755
Tue, 14 Jan 2025 00:43:18 GMT   changing mode of /home/sai/actions-runner-iree-turbine/_work/iree-turbine/iree-turbine/.turbine-venv/bin/convert-caffe2-to-onnx to 755

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ScottTodd We decided we can live with that times, let's merge it.

Signed-off-by: Ivan Butygin <[email protected]>
This reverts commit 4ba07eb.

Signed-off-by: Ivan Butygin <[email protected]>
@Hardcode84 Hardcode84 merged commit 11b6126 into iree-org:main Jan 14, 2025
10 checks passed
@Hardcode84 Hardcode84 deleted the venv branch January 14, 2025 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants