Numpy: ndarray dump function (and straight cPickle) fails for large arrays (Trac #1803)

Created on 20 Oct 2012  ·  8 Comments  ·  Source: numpy/numpy

_Original ticket http://projects.scipy.org/numpy/ticket/1803 on 2011-04-19 by trac user meawoppl, assigned to unknown._

import cPickle
from numpy import zeros

a = zeros((300000, 1000))   # 300000 x 1000 float64 values, about 2.4 GB
f = open("test.pkl", "wb")  # binary mode; pickle data is binary

cPickle.dump(a, f)

SystemError                               Traceback (most recent call last)
/home/kddcup/code/matt/svd-projection/take5/<ipython console> in <module>()
SystemError: error return without exception set

Or using the .dump function:

a.dump("test.pkl")  # ndarray.dump pickles the array straight to the named file

SystemError                               Traceback (most recent call last)
/home/kddcup/code/matt/svd-projection/take5/<ipython console> in <module>()
SystemError: NULL result without error in PyObject_Call

I am not sure whether this is a numpy or a Pickle/cPickle glitch. In either case, a more instructive error message would certainly help. I think the problem only happens for arrays larger than 2**(32-1) bytes, but I would have to experiment more to be sure.
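A quick arithmetic check is consistent with that guess: the example array holds 300000 × 1000 float64 values at 8 bytes each, which lands just over the 2**31 - 1 byte mark:

>>> 300000 * 1000 * 8
2400000000
>>> 2**31 - 1
2147483647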

00 - Bug Other

All 8 comments

I am also facing this problem.

@zhlsk I didn't try further or investigate whether there is a fix in Python, but I think this is not a numpy problem but a Python one, since cPickle.dump(a.tostring(), f) fails for me in just the same way, and that is just a large string with nothing to do with numpy. Using the numpy save functions should not have this kind of problem, though.

@seberg Thanks for your reply. You're right. It's a Python problem: http://bugs.python.org/issue11564

In principle we could probably work around the bug anyway by passing large
arrays to the pickler as a series of objects, each of which fits within the
limit, and reconstructing the array at unpickle time. It sounds like the
limit is on the size of a single byte array.
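A minimal sketch of that chunked idea, assuming Python 2 with cPickle as used throughout this thread; the helper names dump_array_chunked and load_array_chunked are made up for illustration and are not numpy API:

import cPickle
import numpy as np

CHUNK = 2 ** 30  # stay safely under the 2**31 - 1 byte per-string limit

def dump_array_chunked(a, f):
    # Pickle the metadata first, then the raw bytes in pieces small
    # enough that cPickle never sees a single > 2 GB string.
    data = a.tostring()
    n_chunks = (len(data) + CHUNK - 1) // CHUNK
    cPickle.dump((a.dtype.str, a.shape, n_chunks), f, 2)
    for i in xrange(n_chunks):
        cPickle.dump(data[i * CHUNK:(i + 1) * CHUNK], f, 2)

def load_array_chunked(f):
    # Reverse the process: read metadata, concatenate the chunks,
    # and rebuild the ndarray.
    dtype, shape, n_chunks = cPickle.load(f)
    data = "".join(cPickle.load(f) for _ in xrange(n_chunks))
    return np.fromstring(data, dtype=dtype).reshape(shape)

Each cPickle.dump call then only ever handles a string below the limit reported in http://bugs.python.org/issue11564.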

Any progress on this? It seems that numpy.save and numpy.savetxt are both affected by this issue. Any workarounds?

It's a bug in Python; the fix is to upgrade to a newer version (it's fixed in Python 3.3), and the workaround is to pickle your array yourself in smaller parts, or to use some other file format than Python pickles.

np.savetxt is not affected by this, and np.save is only affected for object arrays.
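For anyone landing here, a minimal usage sketch of that unaffected route (a plain numeric array through the .npy format):

import numpy as np

a = np.zeros((300000, 1000))
np.save("test.npy", a)  # .npy writes the raw buffer; no pickling for plain numeric dtypes
b = np.load("test.npy")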

Closing.

I'm getting this when trying to cPickle an 11,314 x 8,463,980,778 sparse matrix with 352,451,719 stored elements in scipy.sparse.csr_matrix format. Python version: Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42)
