Fixes to API and CUDA backend
- Fixed a bug in the HTST API
- Fixed incorrect names in the Python log API
- The CUDA backend now uses CUB for reductions, replacing error-prone hand-written boilerplate code
- Improved the documentation of CUDA limitations, which now notes that the GUI cannot be used with the CUDA backend on Windows or OSX