Use venv in CI #379

Merged: 10 commits into … from …, Jan 14, 2025
31 changes: 20 additions & 11 deletions .github/workflows/ci-tk.yaml
@@ -31,7 +31,7 @@ jobs:
os: [ubuntu-22.04, nodai-amdgpu-mi300-x86-64, nodai-amdgpu-mi250-x86-64]
runs-on: ${{matrix.os}}
env:
PIP_CACHE_DIR: "${{ github.workspace }}/.pip-cache"
VENV_DIR: ${{ github.workspace }}/.turbine-venv
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

@@ -41,14 +41,15 @@ jobs:
with:
python-version: ${{matrix.version}}

- name: Cache Pip Packages
uses: actions/cache@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
id: cache-pip
with:
path: ${{ env.PIP_CACHE_DIR }}
key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('*requirements*.txt') }}
- name: Create Python venv
run: |
python -m venv ${VENV_DIR}
source ${VENV_DIR}/bin/activate
echo VIRTUAL_ENV=$VIRTUAL_ENV >> "$GITHUB_ENV"
echo "$VENV_DIR/bin" >> "$GITHUB_PATH"

- name: Install pip deps
if: "!contains(matrix.os, 'amdgpu') && !cancelled()"
run: |
python -m pip install --no-compile --upgrade pip
# Note: We install in three steps in order to satisfy requirements
@@ -58,6 +59,17 @@ jobs:
pip install --no-cache-dir -r requirements-iree-pinned.txt --upgrade
pip install -r requirements.txt -e .

- name: Install GPU pip deps
if: "contains(matrix.os, 'amdgpu') && !cancelled()"
run: |
python -m pip install --no-compile --upgrade pip
# Note: We install in three steps in order to satisfy requirements
# from non default locations first. Installing the PyTorch CPU
# wheels saves multiple minutes and a lot of bandwidth on runner setup.
pip install --no-compile -r pytorch-rocm-requirements.txt

Member

Looks like this is pretty slow on the mi250 runner (11m30s to install deps for 5 minutes of running tests): https://github.com/iree-org/iree-turbine/actions/runs/12757197012/job/35556958617?pr=379

We noticed similar setup time issues over at nod-ai/shark-ai#780. One solution there was to use a different runner, but the coverage on multiple accelerator types is useful here. We could also try using a cache and/or https://github.com/astral-sh/uv instead of pip.

I still think it's worth using a venv instead of installing packages at the system level on persistent runners, though I do wish the default install steps for these requirements were faster. What do you think? Is this workflow performance acceptable for developers working in iree-turbine, or should we iterate some more?
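For context on how the venv carries across steps: each `run:` step starts a fresh shell, so `source ${VENV_DIR}/bin/activate` alone would not affect later steps. The "Create Python venv" step in this PR works around that by appending `VIRTUAL_ENV` to `$GITHUB_ENV` and `$VENV_DIR/bin` to `$GITHUB_PATH`, which GitHub Actions applies to every subsequent step in the job. The step below is a hypothetical sanity check (not part of this PR), assumed to run after the venv step, that fails if `python` is not the venv's interpreter:

```yaml
# Hypothetical check step; assumes it runs after "Create Python venv" in the same job.
- name: Check that the venv is active
  run: |
    # VIRTUAL_ENV comes from $GITHUB_ENV; $VENV_DIR/bin is on PATH via $GITHUB_PATH.
    echo "VIRTUAL_ENV=${VIRTUAL_ENV}"
    which python pip
    # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
    python -c "import sys; print(sys.prefix)"
    python -c "import sys; assert sys.prefix != sys.base_prefix, 'not running in a venv'"
```

Because the packages live under `.turbine-venv` in the workspace, nothing is installed into the persistent runner's system Python.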

Contributor Author

I wonder whether setup-python's cache: 'pip' option would make any difference. But I agree that using a venv is better than installing system packages. @harsh-nod @raikonenfnu FYI
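For reference, setup-python's built-in pip cache is enabled through the `cache` input on the existing setup step. This sketch assumes `actions/setup-python@v5` (the workflow pins actions by commit SHA, which is not shown here) and reuses the `setup_python` step id that the removed cache key referenced. It caches pip's download and wheel cache, not the installed environment, so it would not shorten the install phase itself.

```yaml
- name: Setup Python
  id: setup_python
  uses: actions/setup-python@v5  # assumed tag; pin to a commit SHA like the other actions
  with:
    python-version: ${{matrix.version}}
    # Cache pip's download/wheel cache, keyed on the requirements files.
    cache: 'pip'
    cache-dependency-path: '*requirements*.txt'
```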

Member

I think I tried the cache provided by setup-python on nod-ai/shark-ai#640 and found that it wasn't compatible or at least didn't help.

Check the timestamps in the logs to see what is taking time. It looks like the downloads were fast or already cached, but the install itself was slow (about 7 minutes between "Installing collected packages" and "Successfully installed" in the excerpt below). Switching from pip to uv can speed up installs, sometimes by 10-100x.

Mon, 13 Jan 2025 23:06:58 GMT Collecting pytorch-triton-rocm==3.1.0 (from torch>=2.3.0->-r pytorch-rocm-requirements.txt (line 2))
Mon, 13 Jan 2025 23:06:58 GMT  Downloading https://download.pytorch.org/whl/pytorch_triton_rocm-3.1.0-cp311-cp311-linux_x86_64.whl (344.9 MB)
Mon, 13 Jan 2025 23:07:01 GMT     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 344.9/344.9 MB 123.0 MB/s eta 0:00:00
Mon, 13 Jan 2025 23:07:01 GMT Collecting sympy==1.13.1 (from torch>=2.3.0->-r pytorch-rocm-requirements.txt (line 2))
...
Mon, 13 Jan 2025 23:07:03 GMT Installing collected packages: mpmath, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, fsspec, filelock, pytorch-triton-rocm, jinja2, torch, torchvision, torchaudio
Mon, 13 Jan 2025 23:14:18 GMT Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.2.0 jinja2-3.1.3 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.3 pillow-10.2.0 pytorch-triton-rocm-3.1.0 sympy-1.13.1 torch-2.5.1+rocm6.2 torchaudio-2.5.1+rocm6.2 torchvision-0.20.1+rocm6.2 typing-extensions-4.9.0
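A minimal, untested sketch of the uv suggestion for the GPU install step, assuming uv is installed from PyPI into the venv and that the existing requirements files work unchanged with `uv pip`. uv installs into the environment named by `VIRTUAL_ENV`, which the venv step already exports. One known difference: when a requirements file adds extra package indexes, uv by default takes each package from the first index that provides it instead of picking the best match across indexes, so the PyTorch ROCm requirements may need adjusting (for example via uv's `--index-strategy` option).

```yaml
- name: Install GPU pip deps
  if: "contains(matrix.os, 'amdgpu') && !cancelled()"
  run: |
    # Install uv into the venv created earlier (python resolves to $VENV_DIR/bin/python).
    python -m pip install --upgrade uv
    # Same three-step order as the pip version: ROCm PyTorch wheels first,
    # then the pinned IREE packages, then the project itself in editable mode.
    # uv does not compile bytecode by default, so --no-compile is not needed.
    uv pip install -r pytorch-rocm-requirements.txt
    uv pip install --upgrade -r requirements-iree-pinned.txt
    uv pip install -r requirements.txt -e .
```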

Contributor Author

Interesting: the mi250 runner spent most of the install time between these two lines (about 5m40s):

Tue, 14 Jan 2025 00:43:03 GMT   changing mode of /groups/aig_sharks/actions-runner-iree/_work/iree-turbine/iree-turbine/.turbine-venv/bin/proton-viewer to 755
Tue, 14 Jan 2025 00:48:43 GMT   changing mode of /groups/aig_sharks/actions-runner-iree/_work/iree-turbine/iree-turbine/.turbine-venv/bin/convert-caffe2-to-onnx to 755

But the mi300 runner didn't (about 51s between the same two lines):

Tue, 14 Jan 2025 00:42:27 GMT   changing mode of /home/sai/actions-runner-iree-turbine/_work/iree-turbine/iree-turbine/.turbine-venv/bin/proton-viewer to 755
Tue, 14 Jan 2025 00:43:18 GMT   changing mode of /home/sai/actions-runner-iree-turbine/_work/iree-turbine/iree-turbine/.turbine-venv/bin/convert-caffe2-to-onnx to 755

Contributor Author

@ScottTodd We decided we can live with those times, so let's merge it.

pip install --no-cache-dir -r requirements-iree-pinned.txt --upgrade
pip install -r requirements.txt -e .

- name: Run unit tests
if: ${{ !cancelled() }}
run: |
@@ -66,21 +78,18 @@ jobs:
- name: Test TKW runtime related stack on amdgpu
if: "contains(matrix.os, 'amdgpu') && !cancelled()"
run: |
pip install --no-compile -r pytorch-rocm-requirements.txt
export export WAVE_CACHE_DIR=$PWD/.wave
export WAVE_CACHE_DIR=$PWD/.wave
rm -rf ./.wave
WAVE_CACHE_ON=1 pytest --capture=tee-sys -vv --run-e2e ./tests/kernel/wave/runtime

- name: Run e2e tests on AMD GPU MI300
if: "contains(matrix.os, 'mi300') && !cancelled()"
run: |
pip install --no-compile -r pytorch-rocm-requirements.txt
WAVE_CACHE_ON=0 pytest -n 8 --capture=tee-sys -vv --run-e2e --gpu-distribute 8 ./tests/kernel/wave/

- name: Run e2e tests on AMD GPU MI250
if: "contains(matrix.os, 'mi250') && !cancelled()"
run: |
pip install --no-compile -r pytorch-rocm-requirements.txt
WAVE_CACHE_ON=0 pytest -n 2 --capture=tee-sys --run-e2e -vv ./tests/kernel/wave/

- name: Run LIT tests
14 changes: 7 additions & 7 deletions .github/workflows/ci.yaml
@@ -32,7 +32,7 @@ jobs:
os: [ubuntu-22.04]
runs-on: ${{matrix.os}}
env:
PIP_CACHE_DIR: "${{ github.workspace }}/.pip-cache"
VENV_DIR: ${{ github.workspace }}/.turbine-venv
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

@@ -42,12 +42,12 @@ jobs:
with:
python-version: ${{matrix.version}}

- name: Cache Pip Packages
uses: actions/cache@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
id: cache-pip
with:
path: ${{ env.PIP_CACHE_DIR }}
key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('*requirements*.txt') }}
- name: Create Python venv
run: |
python -m venv ${VENV_DIR}
source ${VENV_DIR}/bin/activate
echo VIRTUAL_ENV=$VIRTUAL_ENV >> "$GITHUB_ENV"
echo "$VENV_DIR/bin" >> "$GITHUB_PATH"

- name: Install pip deps
run: |
14 changes: 7 additions & 7 deletions .github/workflows/perf.yaml
@@ -33,7 +33,7 @@ jobs:
os: [nodai-amdgpu-mi300-x86-64]
runs-on: ${{matrix.os}}
env:
PIP_CACHE_DIR: "${{ github.workspace }}/.pip-cache"
VENV_DIR: ${{ github.workspace }}/.turbine-venv
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

@@ -43,12 +43,12 @@ jobs:
with:
python-version: ${{matrix.version}}

- name: Cache Pip Packages
uses: actions/cache@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
id: cache-pip
with:
path: ${{ env.PIP_CACHE_DIR }}
key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('*requirements*.txt') }}
- name: Create Python venv
run: |
python -m venv ${VENV_DIR}
source ${VENV_DIR}/bin/activate
echo VIRTUAL_ENV=$VIRTUAL_ENV >> "$GITHUB_ENV"
echo "$VENV_DIR/bin" >> "$GITHUB_PATH"

- name: Install pip deps
run: |