Numpy: Extremely long runtimes in numpy.fft.fft (Trac #1266)

Created on 19 Oct 2012  ·  19 comments  ·  Source: numpy/numpy

_Original ticket on 2009-10-19 by trac user koehler, assigned to unknown._

Although the documentation of numpy.fft.fft states that:
"This is most efficient for n a power of two. This also stores a cache of working memory for different sizes of fft's, so you could theoretically run into memory problems if you call this too many times with too many different n's.", I think it is important to report this oddity in the fft runtime.
Depending on the array length, the fft runtime varies enormously:

[ipython shell, from numpy import *]
In [1]: %time fft.fft(zeros(119516))
CPU times: user 22.83 s, sys: 0.39 s, total: 23.23 s
Wall time: 23.53 s

In [3]: %time fft.fft(zeros(119517))
CPU times: user 36.33 s, sys: 0.08 s, total: 36.40 s
Wall time: 36.51 s

In [5]: %time fft.fft(zeros(119518))
CPU times: user 4.88 s, sys: 0.08 s, total: 4.96 s
Wall time: 5.02 s

In [7]: %time fft.fft(zeros(119519))
CPU times: user 0.45 s, sys: 0.00 s, total: 0.45 s
Wall time: 0.45 s

In [9]: %time fft.fft(zeros(119515))
CPU times: user 0.07 s, sys: 0.00 s, total: 0.08 s
Wall time: 0.08 s

In [11]: %time fft.fft(zeros(119514))
CPU times: user 15.84 s, sys: 0.06 s, total: 15.90 s
Wall time: 15.95 s

In [13]: %time fft.fft(zeros(119513))
CPU times: user 272.75 s, sys: 1.03 s, total: 273.78 s
Wall time: 275.63 s
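One way to make sense of these timings is to factor each length, since FFTPACK is only fast for sizes built from small factors. A small trial-division sketch (prime_factors is a hypothetical helper, not part of NumPy):

```python
def prime_factors(n):
    """Prime factorization of n >= 2 by trial division, smallest factor first."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1 if p == 2 else 2  # after 2, only odd candidates are needed
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


# Factor every length used in the session above
for n in range(119513, 119520):
    print(n, prime_factors(n))
```

For example, 119515 = 5 × 11 × 41 × 53, the fastest length in the session above, has no prime factor larger than 53.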


All 19 comments

_[email protected] wrote on 2011-03-01_

_[email protected] wrote on 2011-03-01_

David C. has implemented the Bluestein transform if you need it:

Should hopefully land in Numpy trunk soon.

Milestone changed to Unscheduled by @mwiebe on 2011-03-25

In this PR, padding to sizes that factor into small primes is proposed.
Having a function that finds a better padding size might be a useful utility in NumPy's fftpack.

Yes, instead of m = 2 ** nextpow2(2 * n - 1), it would be faster to use something like the next_regular function.
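A minimal sketch of such a helper, assuming "regular" means 5-smooth (of the form 2**a * 3**b * 5**c, the sizes FFTPACK handles well). This next_regular is a hypothetical reimplementation for illustration; scipy.fftpack.next_fast_len returns the same kind of 5-smooth size.

```python
def next_regular(target: int) -> int:
    """Smallest 5-smooth number >= target, i.e. of the form 2**a * 3**b * 5**c."""
    if target < 1:
        raise ValueError("target must be a positive integer")
    best = None
    p5 = 1
    while True:
        p35 = p5
        while True:
            # Smallest power of two p2 such that p2 * p35 >= target
            quotient = -(-target // p35)  # ceiling division
            p2 = 1 << (quotient - 1).bit_length()
            candidate = p2 * p35
            if best is None or candidate < best:
                best = candidate
            if p35 >= target:
                break  # higher powers of 3 can only give larger candidates
            p35 *= 3
        if p5 >= target:
            break  # higher powers of 5 can only give larger candidates
        p5 *= 5
    return best
```

Used as m = next_regular(2 * n - 1), this can pick a much smaller padded length than the next power of two: next_regular(1025) returns 1080, whereas the next power of two is 2048.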

I have also come across this issue using the detect_equilibration function of pymbar, which repeatedly calls np.fft.fft and np.fft.ifft (through the statsmodels autocorrelation function) on many successively shorter arrays. I found this by profiling the code, which ultimately led me to this thread. The only workaround so far is to explicitly call


to make sure that the memory requirement does not grow dangerously. This does not seem to be the ideal solution, though. It would be nice to have a kwarg such as "memsafe=True" and/or a function to clear the cache manually without referring to the global variable explicitly.
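One way to get that "memsafe" behavior is a cache with a size bound, so that calling the transform with many different n evicts old entries instead of growing without limit. A sketch using functools.lru_cache; twiddle_factors and naive_dft are hypothetical stand-ins for per-size working memory, not NumPy's internals, and the maxsize of 16 is an arbitrary choice.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=16)  # keep working memory for at most 16 sizes; least recently used is evicted
def twiddle_factors(n: int) -> np.ndarray:
    """Hypothetical per-size working memory: the n roots of unity exp(-2*pi*i*k/n)."""
    w = np.exp(-2j * np.pi * np.arange(n) / n)
    w.setflags(write=False)  # shared by all callers, so make it read-only
    return w


def naive_dft(x):
    """O(n^2) DFT that looks up its twiddle factors in the bounded cache."""
    x = np.asarray(x, dtype=complex)
    n = x.shape[0]
    w = twiddle_factors(n)
    idx = (np.arange(n)[:, None] * np.arange(n)[None, :]) % n  # exponent k*j mod n
    return w[idx] @ x
```

lru_cache also provides cache_clear() and cache_info(), which give a manual way to release or inspect the cache without touching a module-level global.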

@juliantaylor Padding isn't applicable to plain FFTs, correct? Just to convolution/correlation?
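The distinction in the question can be shown concretely. For a linear convolution, any FFT length of at least len(a) + len(b) - 1 gives the same result once the tail is trimmed, so the length can be padded to a fast size. A plain FFT of zero-padded data, by contrast, returns a different, more finely sampled spectrum, so padding changes the answer there. A sketch (fft_convolve is a hypothetical name; power-of-two padding is just one choice):

```python
import numpy as np


def fft_convolve(a, b):
    """Full linear convolution of two real 1-D arrays via zero-padded FFTs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    out_len = a.size + b.size - 1  # length of the full linear convolution
    nfft = 1 << (out_len - 1).bit_length()  # any nfft >= out_len works; pick a fast one
    spec = np.fft.rfft(a, nfft) * np.fft.rfft(b, nfft)
    return np.fft.irfft(spec, nfft)[:out_len]  # drop the extra zero-padded samples
```

By contrast, np.fft.fft(x, 8) for a length-5 x returns 8 frequency bins instead of 5, which is a different transform, not a faster way to compute the same one.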

@rgommers The Bluestein algorithm does speed up the FFT for prime sizes, as done in , but requires pre-computation of complex chirps, and is most efficient when you keep the chirp in memory and repeatedly re-use it on chunks of data. So I'm not sure if this is good for numpy and maybe just defer to scipy.fftpack.czt for this application?
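Bluestein's algorithm uses the identity nk = (n² + k² − (k − n)²) / 2 to rewrite a length-N DFT as a convolution with the chirp exp(−iπn²/N), which can then be computed with power-of-two FFTs of length m ≥ 2N − 1 (the m = 2 ** nextpow2(2 * n - 1) choice quoted in the thread). A minimal NumPy sketch for 1-D input; bluestein_fft is a hypothetical name, not the implementation referred to in this thread, and it calls np.fft.fft only for the power-of-two convolution.

```python
import numpy as np


def bluestein_fft(x):
    """DFT of a 1-D array of any length via Bluestein's chirp-z convolution."""
    x = np.asarray(x, dtype=complex)
    n = x.shape[0]
    if n <= 1:
        return x.copy()
    k = np.arange(n)
    # Chirp c_k = exp(-i*pi*k^2/n); reducing k^2 mod 2n keeps the phase argument small
    chirp = np.exp(-1j * np.pi * ((k * k) % (2 * n)) / n)
    m = 1 << (2 * n - 2).bit_length()  # power of two >= 2n - 1, so the circular conv is linear
    a = np.zeros(m, dtype=complex)
    a[:n] = x * chirp
    b = np.zeros(m, dtype=complex)
    b[:n] = np.conj(chirp)                     # conj(c_j) for j = 0 .. n-1
    b[m - n + 1:] = np.conj(chirp[1:][::-1])   # negative indices j = -(n-1) .. -1 wrap around
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
    return chirp * conv[:n]
```

The chirp and np.fft.fft(b) depend only on n, so a real implementation would precompute and cache them; that is the memory/speed trade-off mentioned in the comment above.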

Anyway I think this can be closed as a duplicate of ? Unless something else like Rader's algorithm is better than Bluestein's?

And @smcantab's issue is different?

@endolith this was a long time ago :) but yes, now that I look at it again, it seems to be a different issue. The problem I reported might still be relevant, though, and I had to implement the workaround I suggested in pymbar for it to work.

FYI, I ran into someone at an audio conference who showed me this example. It is easy to reproduce and extreme.

%time np.fft.fft( np.random.randn(100000) )
Wall time: 16 ms
array([ 196.58599022  +0.j        ,  -88.38483360 +89.2507627j ,
       -166.72250316+339.27161306j, ...,   12.22959535 -64.01621313j,
       -166.72250316-339.27161306j,  -88.38483360 -89.2507627j ])

%time np.fft.fft( np.random.randn(100003) )
Wall time: 1min 42s
array([  13.36160617  +0.j        , -314.86472577-340.44686425j,
       -258.36716707-170.43805382j, ...,  -21.18014704+441.3618185j ,
       -258.36716707+170.43805382j, -314.86472577+340.44686425j])

fft of length 1e5: 16 MILLIseconds

fft of length 1e5 + 3: 1 MINUTE and 42 SECONDS
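To check this on a given install, a small timing script along these lines can be used (time_fft is a hypothetical helper; no particular numbers are implied). On NumPy versions that still used FFTPACK, the prime length 100003 should be dramatically slower than 100000.

```python
import time

import numpy as np


def time_fft(n: int, repeats: int = 3) -> float:
    """Best-of-`repeats` wall time, in seconds, of np.fft.fft on n random samples."""
    x = np.random.default_rng(0).standard_normal(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.fft.fft(x)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for n in (100_000, 100_003):
        print(f"n = {n}: {time_fft(n) * 1e3:.2f} ms")
```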

Feature, not a bug. FFTPACK is only "fast" when the size factors as a product of the numbers 2, 3, 4, 5. There has been a long standing desire to use a faster algorithm for large prime sizes, but it has not been implemented. Note that 100003 is prime.
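A rough cost model shows why a prime size is so much worse. Assume (for illustration only; this is not FFTPACK's exact operation count) that a mixed-radix FFT makes one pass per prime factor p and that each pass costs on the order of n·p operations. Then the cost is about n times the sum of the prime factors, which degenerates to n² when n is prime. rough_fftpack_cost is a hypothetical helper.

```python
def prime_factors(n):
    """Prime factorization of n >= 2 by trial division."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1 if p == 2 else 2
    if n > 1:
        factors.append(n)
    return factors


def rough_fftpack_cost(n):
    """Very rough operation count: one pass of ~n*p work per prime factor p."""
    return n * sum(prime_factors(n))


ratio = rough_fftpack_cost(100_003) / rough_fftpack_cost(100_000)
print(f"modelled slowdown for n = 100003 vs 100000: {ratio:.0f}x")
```

Here 100000 = 2⁵ · 5⁵ gives a factor sum of 35, while the prime 100003 gives 100003, so the modelled slowdown is in the thousands.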

I wouldn't call it a "feature", but it's normal and not a bug. :) has Bluestein's algorithm, but it requires pre-computation of a complex chirp, so would need to do some testing to see at which prime sizes it becomes worthwhile.

Interesting. The main thing I know is that the default FFT implementations in Julia and Matlab don't have this behavior. I'm curious what the Julia implementation does to avoid it.

Julia and Matlab call FFTW, which we cannot do because of its license.

Note that there are Python bindings for FFTW; pyFFTW seems rather current. If FFT speed is a concern, that is probably the way to go. FFTPACK was a good implementation for its day, but code and hardware have moved on.
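If pyFFTW is installed, its pyfftw.interfaces.numpy_fft module mirrors the numpy.fft functions, so it can be swapped in with a guarded import. A sketch assuming pyFFTW is optional, falling back to NumPy's own FFT; note that FFTW itself is GPL-licensed, which is the licensing concern raised above.

```python
import numpy as np

try:
    # pyFFTW's drop-in replacement for the numpy.fft interface (wraps FFTW)
    import pyfftw.interfaces.numpy_fft as fft_backend
except ImportError:
    # Fall back to NumPy's own FFT when pyFFTW is not available
    fft_backend = np.fft

x = np.random.default_rng(0).standard_normal(100_003)  # the prime length from above
spectrum = fft_backend.fft(x)
```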

@charris I definitely appreciate the info. That's unfortunate, but it makes sense given the license.

This should be fixed in numpy 1.17.

thanks @mreineck, closing
