[reopen2] Release GIL when doing standalone solves #363

Merged · 7 commits · Jan 13, 2025
30 changes: 8 additions & 22 deletions .github/workflows/ci-linux-osx-win-conda.yml
@@ -57,40 +57,29 @@ jobs:
with:
submodules: recursive

- uses: conda-incubator/setup-miniconda@v2
if: matrix.os != 'macos-14'
- uses: conda-incubator/setup-miniconda@v3
with:
miniforge-variant: Mambaforge
miniforge-version: latest
channels: conda-forge
python-version: "3.10"
activate-environment: proxsuite

- uses: conda-incubator/setup-miniconda@v3
if: matrix.os == 'macos-14'
with:
channels: conda-forge
python-version: "3.10"
activate-environment: proxsuite
installer-url: https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-MacOSX-arm64.sh

- name: Install dependencies [Conda]
shell: bash -l {0}
run: |
# Workaround for https://github.com/conda-incubator/setup-miniconda/issues/186
conda config --remove channels defaults
# Compilation related dependencies
mamba install cmake compilers make pkg-config doxygen ninja graphviz typing_extensions llvm-openmp clang
conda install cmake compilers make pkg-config doxygen ninja graphviz typing_extensions llvm-openmp clang
# Main dependencies
mamba install eigen simde
conda install eigen simde
# Test dependencies
mamba install libmatio numpy scipy
conda install libmatio numpy scipy

- name: Install julia [macOS/Linux]
if: contains(matrix.os, 'macos-latest') || contains(matrix.os, 'ubuntu')
- name: Install julia [Linux]
if: contains(matrix.os, 'ubuntu')
shell: bash -l {0}
run: |
mamba install julia
conda install julia

- name: Activate ccache [Conda]
uses: hendrikmuhs/[email protected]
@@ -102,7 +91,7 @@ jobs:
shell: bash -l {0}
run: |
conda info
mamba list
conda list
env

- name: Configure [Conda/Linux&macOS]
@@ -142,7 +131,6 @@ jobs:
shell: bash -l {0}
run: |
echo $(where ccache)
ls C:\\Miniconda3\\envs\\proxsuite\\Library\\lib
git submodule update --init
mkdir build
cd build
@@ -155,7 +143,6 @@ jobs:
shell: bash -l {0}
run: |
echo $(where ccache)
ls C:\\Miniconda3\\envs\\proxsuite\\Library\\lib
git submodule update --init
mkdir build
cd build
@@ -168,7 +155,6 @@ jobs:
shell: bash -l {0}
run: |
echo $(where ccache)
ls C:\\Miniconda3\\envs\\proxsuite\\Library\\lib
git submodule update --init
mkdir build
cd build
10 changes: 4 additions & 6 deletions .github/workflows/gh-pages.yml
@@ -12,11 +12,9 @@ jobs:
with:
submodules: recursive

- uses: conda-incubator/setup-miniconda@v2
- uses: conda-incubator/setup-miniconda@v3
with:
miniforge-variant: Mambaforge
miniforge-version: latest
channels: conda-forge
python-version: "3.10"
activate-environment: doc

@@ -27,16 +25,16 @@
conda config --remove channels defaults

# Compilation related dependencies
mamba install cmake make pkg-config doxygen graphviz
conda install cmake make pkg-config doxygen graphviz

# Main dependencies
mamba install eigen
conda install eigen

- name: Print environment
shell: bash -l {0}
run: |
conda info
mamba list
conda list
env

- name: Configure
19 changes: 3 additions & 16 deletions .github/workflows/release-osx-win.yml
@@ -35,38 +35,25 @@ jobs:
git submodule update

- name: Setup conda
if: contains(matrix.os, 'macos-13') || contains(matrix.os, 'windows')
uses: conda-incubator/setup-miniconda@v2
with:
miniforge-variant: Mambaforge
miniforge-version: latest
channels: conda-forge
python-version: ${{ matrix.python-version }}
activate-environment: proxsuite

- name: Setup conda
if: matrix.os == 'macos-14'
uses: conda-incubator/setup-miniconda@v3
with:
channels: conda-forge
miniforge-version: latest
python-version: ${{ matrix.python-version }}
activate-environment: proxsuite
installer-url: https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-MacOSX-arm64.sh

- name: Install dependencies [Conda]
if: contains(matrix.os, 'macos') || contains(matrix.os, 'windows')
shell: bash -l {0}
run: |
# Workaround for https://github.com/conda-incubator/setup-miniconda/issues/186
conda config --remove channels defaults
mamba install doxygen graphviz eigen simde cmake compilers typing_extensions
conda install doxygen graphviz eigen simde cmake compilers typing_extensions

- name: Print environment [Conda]
if: contains(matrix.os, 'macos') || contains(matrix.os, 'windows')
shell: bash -l {0}
run: |
conda info
mamba list
conda list
env

- name: Build wheel
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -12,9 +12,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

### Added
* Stub files for Python bindings, using [nanobind's native support](https://nanobind.readthedocs.io/en/latest/typing.html#stub-generation) ([#340](https://github.com/Simple-Robotics/proxsuite/pull/340))
* Add `solve_no_gil` for the dense backend (multithreading from Python) ([#363](https://github.com/Simple-Robotics/proxsuite/pull/363))
* Add benchmarks comparing `solve_no_gil` with `solve_in_parallel` (OpenMP) ([#363](https://github.com/Simple-Robotics/proxsuite/pull/363))

### Changed
* Change Python bindings to use nanobind instead of pybind11 ([#340](https://github.com/Simple-Robotics/proxsuite/pull/340))
* Update setup-miniconda from v2 to v3 ([#363](https://github.com/Simple-Robotics/proxsuite/pull/363))


## [0.6.7] - 2024-08-27
139 changes: 107 additions & 32 deletions benchmark/timings-parallel.py
@@ -2,6 +2,17 @@
import numpy as np
import scipy.sparse as spa
from time import perf_counter_ns
from concurrent.futures import ThreadPoolExecutor

"""
There are two interfaces for solving a QP with the dense backend:
a) create a qp object, pass the problem data (matrices, vectors) to the qp.init method
   (which allocates memory and preconditions the problem), then call qp.solve; or
b) call the solve function directly with the problem data, which does everything in one go.

Currently only the qp.solve method (a) is parallelized (via OpenMP), so memory allocation
and preconditioning run serially while building the batch of qps that is then passed to
`solve_in_parallel`. The solve function (b) is not parallelized internally, but it can
easily be parallelized from Python with a ThreadPoolExecutor.

Here we compare timings of the two approaches: we generate a batch of QP problems, solve
them with `solve_in_parallel` (adding the time to build the batch of qps to the parallel
solve time), and compare against solving each problem concurrently with a
ThreadPoolExecutor that calls the solve function.
"""

num_threads = proxsuite.proxqp.omp_get_max_threads()


def generate_mixed_qp(n, n_eq, n_in, seed=1):
@@ -23,45 +34,109 @@ def generate_mixed_qp(n, n_eq, n_in, seed=1):
u = A @ v
l = -1.0e20 * np.ones(m)

return P.toarray(), q, A[:n_eq, :], u[:n_eq], A[n_in:, :], u[n_in:], l[n_in:]
return P.toarray(), q, A[:n_eq, :], u[:n_eq], A[n_in:, :], l[n_in:], u[n_in:]


n = 500
n_eq = 200
n_in = 200
problem_specs = [
# (n, n_eq, n_in),
(50, 20, 20),
(100, 40, 40),
(200, 80, 80),
(500, 200, 200),
(1000, 200, 200),
]

num_qps = 128

# qps = []
timings = {}
qps = proxsuite.proxqp.dense.VectorQP()

tic = perf_counter_ns()
for j in range(num_qps):
qp = proxsuite.proxqp.dense.QP(n, n_eq, n_in)
H, g, A, b, C, u, l = generate_mixed_qp(n, n_eq, n_in, seed=j)
qp.init(H, g, A, b, C, l, u)
qp.settings.eps_abs = 1e-9
qp.settings.verbose = False
qp.settings.initial_guess = proxsuite.proxqp.InitialGuess.NO_INITIAL_GUESS
qps.append(qp)
timings["problem_data"] = (perf_counter_ns() - tic) * 1e-6

tic = perf_counter_ns()
for qp in qps:
qp.solve()
timings["solve_serial"] = (perf_counter_ns() - tic) * 1e-6
for n, n_eq, n_in in problem_specs:

num_threads = proxsuite.proxqp.omp_get_max_threads()
for j in range(1, num_threads):
print(f"\nProblem specs: {n=} {n_eq=} {n_in=}. Generating {num_qps} such problems.")
problems = [generate_mixed_qp(n, n_eq, n_in, seed=j) for j in range(num_qps)]
print(
f"Generated problems. Solving {num_qps} problems with proxsuite.proxqp.omp_get_max_threads()={num_threads} threads."
)

timings = {}

    # Create a vector of QP objects. This is inefficient: memory is allocated once when
    # each qp object is created and again when it is appended to the vector, which copies it.
qps_vector = proxsuite.proxqp.dense.VectorQP()
tic = perf_counter_ns()
proxsuite.proxqp.dense.solve_in_parallel(j, qps)
timings[f"solve_parallel_{j}_threads"] = (perf_counter_ns() - tic) * 1e-6
print("\nSetting up vector of qps")
for H, g, A, b, C, l, u in problems:
qp = proxsuite.proxqp.dense.QP(n, n_eq, n_in)
qp.init(H, g, A, b, C, l, u)
qp.settings.eps_abs = 1e-9
qp.settings.verbose = False
qp.settings.initial_guess = proxsuite.proxqp.InitialGuess.NO_INITIAL_GUESS
qps_vector.append(qp)
timings["setup_vector_of_qps"] = (perf_counter_ns() - tic) * 1e-6

# use BatchQP, which can initialize the qp objects in place and is more efficient
qps_batch = proxsuite.proxqp.dense.BatchQP()
tic = perf_counter_ns()
print("Setting up batch of qps")
for H, g, A, b, C, l, u in problems:
qp = qps_batch.init_qp_in_place(n, n_eq, n_in)
qp.init(H, g, A, b, C, l, u)
qp.settings.eps_abs = 1e-9
qp.settings.verbose = False
qp.settings.initial_guess = proxsuite.proxqp.InitialGuess.NO_INITIAL_GUESS
timings["setup_batch_of_qps"] = (perf_counter_ns() - tic) * 1e-6

tic = perf_counter_ns()
proxsuite.proxqp.dense.solve_in_parallel(qps=qps)
timings[f"solve_parallel_heuristics_threads"] = (perf_counter_ns() - tic) * 1e-6
print("Solving batch of qps using solve_in_parallel with default thread config")
tic = perf_counter_ns()
proxsuite.proxqp.dense.solve_in_parallel(qps=qps_batch)
    timings["solve_in_parallel_heuristics_threads"] = (perf_counter_ns() - tic) * 1e-6

print("Solving vector of qps serially")
tic = perf_counter_ns()
for qp in qps_vector:
qp.solve()
timings["qp_solve_serial"] = (perf_counter_ns() - tic) * 1e-6

print("Solving batch of qps using solve_in_parallel with various thread configs")
for j in range(1, num_threads, 2):
tic = perf_counter_ns()
proxsuite.proxqp.dense.solve_in_parallel(qps=qps_batch, num_threads=j)
timings[f"solve_in_parallel_{j}_threads"] = (perf_counter_ns() - tic) * 1e-6

def solve_problem_with_dense_backend(
problem,
):
H, g, A, b, C, l, u = problem
return proxsuite.proxqp.dense.solve_no_gil(
H,
g,
A,
b,
C,
l,
u,
initial_guess=proxsuite.proxqp.InitialGuess.NO_INITIAL_GUESS,
eps_abs=1e-9,
)

# add final timings for the solve_in_parallel function considering setup time for batch of qps
for k, v in list(timings.items()):
if "solve_in_parallel" in k:
k_init = k + "_and_setup_batch_of_qps"
timings[k_init] = timings["setup_batch_of_qps"] + v

    print("Solving each problem serially with the solve function.")
    # Note: here we just pass the problem data to the solve function; no separate init call is needed.
tic = perf_counter_ns()
for problem in problems:
solve_problem_with_dense_backend(problem)
timings["solve_fun_serial"] = (perf_counter_ns() - tic) * 1e-6

print(
"Solving each problem in parallel (with a ThreadPoolExecutor) with solve function."
)
tic = perf_counter_ns()
with ThreadPoolExecutor(max_workers=num_threads) as executor:
results = list(executor.map(solve_problem_with_dense_backend, problems))
timings["solve_fun_parallel"] = (perf_counter_ns() - tic) * 1e-6

for k, v in timings.items():
print(f"{k}: {v}ms")
print("\nTimings:")
for k, v in timings.items():
print(f"{k}: {v:.3f}ms")
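As an aside, the timing pattern measured above can be reproduced without proxsuite: numpy's LAPACK-backed `np.linalg.solve` also releases the GIL during the factorization, so it serves as a stand-in to illustrate why a thread pool helps once a native call drops the GIL. A sketch with made-up sizes:

```python
# Illustrative stand-in (not from the PR): numpy releases the GIL in LAPACK
# calls, so the same ThreadPoolExecutor pattern shows wall-clock overlap.
import numpy as np
from time import perf_counter_ns
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
mats = [rng.standard_normal((400, 400)) + 400 * np.eye(400) for _ in range(8)]
rhs = [rng.standard_normal(400) for _ in range(8)]


def solve_one(args):
    A, b = args
    return np.linalg.solve(A, b)  # GIL released inside LAPACK


tic = perf_counter_ns()
serial = [solve_one(p) for p in zip(mats, rhs)]
t_serial = (perf_counter_ns() - tic) * 1e-6

tic = perf_counter_ns()
with ThreadPoolExecutor(max_workers=4) as executor:
    parallel = list(executor.map(solve_one, zip(mats, rhs)))
t_parallel = (perf_counter_ns() - tic) * 1e-6

print(f"serial: {t_serial:.1f}ms  threaded: {t_parallel:.1f}ms")
```

Both runs compute the same solutions; only the wall-clock time differs, and the threaded run can only win because the native call does not hold the GIL.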