-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve fbp speed #1505
base: master
Are you sure you want to change the base?
Improve fbp speed #1505
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the general idea, but the next power of 2 is not necessarily optimal. In particular pyfftw
has a utility function next_fast_len
that computes the optimal size. The scipy.fftpack
version (though not supported by us) has an analogous function. Perhaps Numpy's FFT has the same characteristics, but since it's slower than pyFFTW anyway, it's probably okay to not optimize it.
So I suggest that you use pyfftw.next_fast_len
for pyfftw
backend, and either do nothing for numpy
or guess and use the same.
Ok, fix for that coming along. Sadly it seems that the function you mentioned is not 100% valid for e.g. rfft so I need to special case that. This function demonstrates the results anyway with that fix:
|
Very interesting, thanks for the analysis @adler-j! As it turns out, SciPy's function was first and was ported to pyFFTW later: pyFFTW/pyFFTW#159. I see quite some benchmarking going on there, too, but it's possible that only C2C was covered. Might be good to make them aware? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some stuff
@@ -1607,6 +1608,28 @@ def unique(seq): | |||
return unique_values | |||
|
|||
|
|||
def nextpow2(n): | |||
""" | |||
Compute the integer which is a power of two. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On first line.
An maybe: "Return the next integer power of 2."
Actually, after the change, it's more like "Return the exponent of the next integer power of 2."
Parameters | ||
---------- | ||
n : int | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No text? Not that there's a lot to say, but doc formatting might break.
>>> odl.util.nextpow2(7) | ||
8 | ||
>>> odl.util.nextpow2(513) | ||
1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doctests ought to fail now
@@ -1607,6 +1608,28 @@ def unique(seq): | |||
return unique_values | |||
|
|||
|
|||
def nextpow2(n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next_int_log2
?
@@ -47,6 +47,7 @@ | |||
'repr_string', | |||
'attribute_repr_string', | |||
'method_repr_string', | |||
'nextpow2', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although tiny, I'd probably put this into numerics.py
, if only to avoid littering this kitchen sink file.
@@ -649,6 +650,19 @@ def reciprocal_space(space, axes=None, halfcomplex=False, shift=True, | |||
return recip_spc | |||
|
|||
|
|||
def next_fast_len(n, impl, rfft=False): | |||
"""The smallest size for which a fast fft implementation is available.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FFT
@@ -649,6 +650,19 @@ def reciprocal_space(space, axes=None, halfcomplex=False, shift=True, | |||
return recip_spc | |||
|
|||
|
|||
def next_fast_len(n, impl, rfft=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe is_rfft
Do you mean that it isn't 100% valid in SciPy? In FFTW, I am pretty sure it is equally valid for both fft and rfft functions. I think the only case where it doesn't apply are a subset of the real-to-real transforms (e.g. DCT-I, which are not currently wrapped in pyFFTW). Here is a figure I generated based on your script but calling npad = pyfftw.interfaces.scipy_fftpack.next_fast_len(int(n)) instead of npad = odl.trafos.util.next_fast_len(int(n), impl=impl, rfft=True) Also, if you enable caching of plans and increase the planner effort and number of threads, the difference can become larger. Here is a case using 'FFTW_MEASURE' and 10 threads on 10-core SkylakeX CPU. In this case you need to request this near the top of the benchmark script pyfftw.config.NUM_THREADS = 10
pyfftw.config.PLANNER_EFFORT = 'FFTW_MEASURE'
pyfftw.interfaces.cache.enable() and make sure to call the transform once before entering the Timer so the cached Plan will be used # call once so the FFTW plan will be cached
fft.irfft(fft.rfft(arr))
# average time over 10 calls
nreps = 10
with Timer() as timer:
for n in range(nreps):
fft.irfft(fft.rfft(arr))
times[i] = timer.duration / nreps |
Related to this, the following work might eventually remove the need to maintain multiple FFT backends in ODL itself: We currently have a student working on support for multiple FFT backends for SciPy as part of Google Summer of Code (GSOC). As of now it is only week 2 of 12 for the project, but the goal is to allow use of various backends (pyFFTW, pocketfft, mkl-fft, etc) via a unified API using a proposed |
Many thanks for the analysis @grlee77! Re: the new |
It should be mentioned that there is already the pyfftw numpy compatibility interfaces: https://hgomersall.github.io/pyFFTW/pyfftw/interfaces/interfaces.html the problem with them was that they do not provide optimal usage in terms of e.g. caching and planning. Is the planned scipy module supposed to solve this somehow? |
To me it seems like the exposed interface capabilities will depend on the backend. The SciPy interface, for instance, has always had But @grlee77 should know better. |
pyFFTW 0.11 added the ability to configure the default number of threads and planner effort without having to explicitly pass the number of threads and effort to each individual function call (this was used in my example above). When the cache is enabled, I think this should provide most of the performance one would get when explicitly preplanning transforms.
There are still open questions as to what the final form of this will be and whether backend-specific arguments will be supported, so any feedback you have as a potential downstream user would be greatly appreciated. The issue I linked above was mainly for the pocketfft work, but some initial backend discussion has been in scipy/scipy#10175 and some earlier posts on the scipy-dev mailing list. There is not yet an open PR for the backend support (Peter started out with the addition of pypocketfft to SciPy and is just now starting to implement the backends). Peter prepared a design document with initial plans that will be revised as needed going forward. At the moment we are exploring whether or not we should use uarray to implement the backend.
It can help some, but has also caused some confusion on the user end because setting it to True is not a guarantee that the array will be overwritten (this is only possible for some transform types). So the user should not try to use it in the same way as the |
It seems that pyfftw real to complex FFT is not optimised for non-powers of two. This fix increases performance for 1000 angles and 500 pixels from 180ms to 3ms on my laptop.