Contribute Pillow-SIMD back to Pillow #8
Compile-time decision
In general, SIMD code will look like this:
#if defined(__SSE4__)
#include <emmintrin.h>
#include <mmintrin.h>
#include <smmintrin.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif
#endif
void
ImagingResampleSomething(UINT32 *lineOut, UINT32 *lineIn, int xmax)
{
int x = 0;
#if defined(__AVX2__)
for (; x < xmax - 7; x += 8) {
// AVX2 code
}
#endif
#if defined(__SSE4__)
for (; x < xmax - 3; x += 4) {
// SSE4 code
}
#endif
for (; x < xmax; x += 1) {
// General x86 code
}
}
If we are compiling the code with We have to set compiler flags in Cons:
|
Different .so modules
Suggested by @socketpair. There are three .so files compiled with different flags:
We detect CPU features via CPUID and load the appropriate module from Python code. Pros:
Cons:
|
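The "different .so modules" idea can be sketched on the Python side roughly as follows. This is only an illustration under stated assumptions: the module names in `BUILDS` are hypothetical, and the feature detection reads `/proc/cpuinfo`, which works on Linux only (the thread talks about CPUID, which the standard library does not expose directly).

```python
import platform
from pathlib import Path

# Hypothetical module names, ordered from most to least capable.
# Each entry pairs a module name with the CPU flags it requires.
BUILDS = [
    ("_imaging_avx2", {"avx2"}),
    ("_imaging_sse4", {"sse4_1", "sse4_2"}),
    ("_imaging_generic", set()),
]


def read_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU feature flags reported by the Linux kernel, or an empty set."""
    path = Path(cpuinfo_path)
    if platform.system() != "Linux" or not path.exists():
        return set()
    for line in path.read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def pick_build(cpu_flags):
    """Pick the first build whose required flags are all present."""
    for name, required in BUILDS:
        if required <= cpu_flags:
            return name
    # Not reached while a build with no requirements is listed.
    raise RuntimeError("no usable build")


if __name__ == "__main__":
    print(pick_build(read_cpu_flags()))
```

The returned name could then be handed to `importlib.import_module`. On platforms where detection fails, the sketch falls back to the generic build rather than guessing.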
Different functions in different .o files
Suggested by @toshic here (rus). We pass compiler flags to each .c file, not to the whole module, so we can compile several .c files into different .o files, or even compile the same file with different flags into different .o files, and link them all.
void
#if defined(__AVX2__)
ImagingResampleSomething_avx2(UINT32 *lineOut, UINT32 *lineIn, int xmax)
#elif defined(__SSE4__)
ImagingResampleSomething_sse4(UINT32 *lineOut, UINT32 *lineIn, int xmax)
#else
ImagingResampleSomething_general(UINT32 *lineOut, UINT32 *lineIn, int xmax)
#endif
{
// Same code as for compile-time decision
}
Pros:
Cons:
|
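To make the "same file, different flags, different .o files" step concrete, here is a small sketch that only generates the compile commands. The variant table, the `cc` driver name, and the `Resample.c` file name are illustrative assumptions, not taken from the Pillow-SIMD build; `-mavx2` and `-msse4.1` are GCC/Clang options, and other compilers would need different flags.

```python
import shlex

# Hypothetical variant table: function-name suffix -> extra compiler flags.
VARIANTS = {
    "avx2": ["-mavx2"],
    "sse4": ["-msse4.1"],
    "general": [],
}


def compile_commands(source, cc="cc", base_flags=("-O2", "-fPIC")):
    """Build one compile command per variant for a single C source file."""
    stem = source.rsplit(".", 1)[0]
    commands = []
    for suffix, extra in VARIANTS.items():
        obj = f"{stem}_{suffix}.o"
        commands.append([cc, *base_flags, *extra, "-c", source, "-o", obj])
    return commands


if __name__ == "__main__":
    for cmd in compile_commands("Resample.c"):
        print(shlex.join(cmd))
```

Each object file then exports a differently suffixed function, and all three can be linked into one module, with the choice between them made at runtime.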
I think that 3x compile time is not a problem at all nowadays. Simplicity is the key, especially when there is no performance impact. Linking I vote for my variant :) Also, it would be nice to have an API to force loading of |
There was a similar conversation on the Tensorflow issue tracker with some suggested approaches: tensorflow/tensorflow#7257 (comment). |
Would it be possible to package just the SIMD-accelerated routines, i.e. not duplicate all of Pillow in the fork? For example:
from PIL.Image import Image
from pillow_simd import resize
i = Image.open(...)
i = resize(i, (500, 500), ...) |
I have the same problems with projects at work, so I am working on this to try to find an easy-to-use solution. |
Hi, I am interested in this issue and have implemented AVX2 / SSE4 runtime switching (grafi-tt@96f18c6). The usage is like:
import PIL
from PIL import Image
import timeit
IMG_FILE = "image-1920x1200.png"
print(PIL.get_available_builds())
print(PIL.get_build())
img = Image.open(IMG_FILE)
result = timeit.timeit('img.resize((960, 600), resample=Image.BICUBIC)', globals=globals(), number=1000)
print(result)
PIL.set_build("SSE4")
img = Image.open(IMG_FILE)
result = timeit.timeit('img.resize((960, 600), resample=Image.BICUBIC)', globals=globals(), number=1000)
print(result)
PIL.set_build("AVX2")
img = Image.open(IMG_FILE)
result = timeit.timeit('img.resize((960, 600), resample=Image.BICUBIC)', globals=globals(), number=1000)
print(result)
PIL.set_build("SSE4")
img = Image.open(IMG_FILE)
result = timeit.timeit('img.resize((960, 600), resample=Image.BICUBIC)', globals=globals(), number=1000)
print(result)
and the following is its (possible) output.
Following the strategy @socketpair and @homm mentioned, I made the core imaging module ( Generic build (AVX2 or SSE4 not used) is not available yet, as making it requires back-porting Pillow's non-SIMD code. I could probably manage this, but it is difficult as I'm new to Pillow and Pillow-SIMD. I think it would be straightforward work for developers familiar with the Pillow-SIMD codebase. If someone is interested, feel free to fork and improve my implementation; or, if preferred, I'm happy to send a PR. |
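The repeated timing stanzas in the usage example above could be folded into a loop. The helper below is a hypothetical sketch: `time_builds` is not part of Pillow or the fork, and `set_build` is passed in as a callable so the code runs without the fork installed (in the fork it would presumably be `PIL.set_build`).

```python
import timeit


def time_builds(builds, set_build, workload, number=1000):
    """Time `workload` once per build, switching builds before each run.

    `set_build` is whatever callable selects the active build.
    Returns a list of (build, seconds) pairs in the order given.
    """
    results = []
    for build in builds:
        set_build(build)
        results.append((build, timeit.timeit(workload, number=number)))
    return results


if __name__ == "__main__":
    # Stand-ins so the sketch runs on its own: the "build switch" just
    # records the requested name, and the workload is a dummy computation.
    active = []
    timings = time_builds(
        ["SSE4", "AVX2", "SSE4"], active.append, lambda: sum(range(100)), number=10
    )
    for build, seconds in timings:
        print(f"{build}: {seconds:.6f}s")
```

The original snippet also reopens the image after each switch; whether that is required is not stated, so a real workload might need to do the same.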
The project I linked below is complete and may be a solution to this problem.
# Top of "PIL/Image.py"
try:
    import compilertools
except ImportError:
    pass
# "setup.py"
# [...]
from setuptools import Extension, setup, find_packages
try:
    import compilertools.build
    compilertools.build.ConfigBuild.suffixes_includes = ['avx2', 'sse4']
except ImportError:
    pass
# [...]
setup(
# [...]
install_requires=['...', 'compilertools'],
# [...]
) |
The best way to solve the compile-time decision problem on recent-ish |
Well, using poor inline assembly code (like what I wrote) is admittedly fragile, and it surely causes tricky issues (e.g. https://stackoverflow.com/questions/12221646/how-to-call-cpuid-instruction-in-a-mac-framework). Using the Regarding manual detection, the problem is that there is no simple, portable, easy-to-use, and stable open-source library for CPU detection. (https://wiki.linaro.org/LEG/Engineering/OPTIM/Assembly#Hardware_identification) Using |
If you want to go down this route, you might find some inspiration in pyfastnoisesimd. At build time, compiler support is queried here and the relevant files are enabled. At runtime, this function is used to determine support and the correct function is dispatched based on the results. |
It remains not straightforward to have |
As stated before, CPU dispatch is tricky to get right in a platform-independent way. OpenCV is a much larger project with more resources to maintain the build apparatus for doing such things properly, in particular doing dynamic dispatch in a way that doesn't severely impact runtime performance. If you were only targeting Both clang and gcc support the |
NumPy now has a pretty comprehensive SIMD build and dispatch system, but I don't know how portable it would be to Pillow, since Pillow doesn't depend on NumPy. |
Removed steps that are currently unnecessary. Hopefully they stay that way.
I'll just drop https://github.com/google/highway in here. |
Problems and solutions of merging SIMD code to the main codebase.