_Original ticket http://projects.scipy.org/numpy/ticket/1266 on 2009-10-19 by trac user koehler, assigned to unknown._
Although the documenttation of numpy.fft.fft states that:
"This is most efficient for n a power of two. This also stores a cache of working memory for different sizes of fft's, so you could theoretically run into memory problems if you call this too many times with too many different n's." ,I think that it may be important to report this oddity in the fft runtime.
Dependent on the array length, the fft runtime varies really extreme:
[ipython shell, from numpy import *]
In [1]: %time fft.fft(zeros(119516))
CPU times: user 22.83 s, sys: 0.39 s, total: 23.23 s
Wall time: 23.53 s
In [3]: %time fft.fft(zeros(119517))
CPU times: user 36.33 s, sys: 0.08 s, total: 36.40 s
Wall time: 36.51 s
In [5]: %time fft.fft(zeros(119518))
CPU times: user 4.88 s, sys: 0.08 s, total: 4.96 s
Wall time: 5.02 s
In [7]: %time fft.fft(zeros(119519))
CPU times: user 0.45 s, sys: 0.00 s, total: 0.45 s
Wall time: 0.45 s
In [9]: %time fft.fft(zeros(119515))
CPU times: user 0.07 s, sys: 0.00 s, total: 0.08 s
Wall time: 0.08 s
In [11]: %time fft.fft(zeros(119514))
CPU times: user 15.84 s, sys: 0.06 s, total: 15.90 s
Wall time: 15.95 s
In [13]: %time fft.fft(zeros(119513))
CPU times: user 272.75 s, sys: 1.03 s, total: 273.78 s
Wall time: 275.63 s
_@endolith wrote on 2009-11-20_
Related: #1177 http://projects.scipy.org/scipy/ticket/949
_@rgommers wrote on 2011-03-01_
_@rgommers wrote on 2011-03-01_
David C. has implemented the Bluestein transform if you need it: https://github.com/cournape/numpy/tree/bluestein
Should hopefully land in Numpy trunk soon.
Milestone changed to Unscheduled
by @mwiebe on 2011-03-25
in this pr a padding to small primes is proposed
https://github.com/scipy/scipy/pull/3144
having the function to get a better padding size might be a useful utility in numpys fftpack
Yes, instead of m = 2 ** nextpow2(2 * n - 1)
it will faster to use something like the next_regular
function
I have also come across this issue using the detect_equilibration function of pymbar that repeatedly calls np.fft and np.ifft through statsmodels autocorrelation function on many increasingly shorter arrays. I found out profiling the code, which has ultimately led me to this thread. The only work around so far is to explicitly call
np.fft.fftpack._fft_cache.clear()
to make sure that the memory requirement does not grow dangerously. This does not seem to be the ideal solution though. It would be nice to have either a kwarg such as "memsafe=True" and/or a function to manually clear the cache without referring to the global variable explicitely.
@juliantaylor Padding isn't applicable to plain FFTs, correct? Just to convolution/correlation?
@rgommers The Bluestein algorithm does speed up the FFT for prime sizes, as done in https://github.com/scipy/scipy/issues/4288 , but requires pre-computation of complex chirps, and is most efficient when you keep the chirp in memory and repeatedly re-use it on chunks of data. So I'm not sure if this is good for numpy and maybe just defer to scipy.fftpack.czt for this application?
Anyway I think this can be closed as a duplicate of https://github.com/numpy/numpy/issues/1177 ? Unless something else like Rader's algorithm is better than Bluestein's?
and @smcantab 's issue is different?
@endolith this was a long time ago :) but yes, now that I look at it again it seems a different issue. The problem I reported might still be relevant though and I had to implement the workaround I suggested in pymbar for it to work.
FYI, I ran in to someone at an audio conference whos showed me this example. It is easy to reproduce and extreme.
%time np.fft.fft( np.random.randn(100000) )
Wall time: 16 ms
Out[16]:
array([ 196.58599022 +0.j , -88.38483360 +89.2507627j ,
-166.72250316+339.27161306j, ..., 12.22959535 -64.01621313j,
-166.72250316-339.27161306j, -88.38483360 -89.2507627j ])
%time np.fft.fft( np.random.randn(100003) )
Wall time: 1min 42s
Out[17]:
array([ 13.36160617 +0.j , -314.86472577-340.44686425j,
-258.36716707-170.43805382j, ..., -21.18014704+441.3618185j ,
-258.36716707+170.43805382j, -314.86472577+340.44686425j])
fft of length 1e6: 16 MILLIseconds
fft of length 1e6 + 3: 1 MINUTE and 42 SECONDS
Feature, not a bug. FFTPACK is only "fast" when the size factors as a product of the numbers 2, 3, 4, 5. There has been a long standing desire to use a faster algorithm for large prime sizes, but it has not been implemented. Note that 100003 is prime.
I wouldn't call it a "feature", but it's normal and not a bug. :)
https://github.com/scipy/scipy/issues/4288 has Bluestein's algorithm, but it requires pre-computation of a complex chirp, so would need to do some testing to see at which prime sizes it becomes worthwhile.
Interesting. The main thing I know is that the default fft implementation for Julia and Matlab don't have this behavior. I'm curious what the Julia implementation does to avoid this behavior.
Julia and Matlab call FFTW, which we cannot do because of its license.
Note that there are Python bindings for FFTW; pyFFTW seems rather current. If FFT speed is a concern, that is probably the way to go. FFTPACK was a good implementation for its day, but code and hardware have moved on.
@charris I definitely appreciate the info and that's unfortunate, but makes sense regarding the license.
@standarddeviant You can also monkey-patch pyfftw over top of numpy/scipy to make it use FFTW
This should be fixed in numpy 1.17.
thanks @mreineck, closing
Most helpful comment
This should be fixed in numpy 1.17.