Numpy: ERROR: test_big_arrays (test_io.TestSavezLoad) on OS X + Python 3.3

Created on 3 Oct 2013  ·  29 Comments  ·  Source: numpy/numpy

Reported by Piet van Oostrum on the mailing list against 1.8.0rc1 on OS X with Python 3.3:

======================================================================
ERROR: test_big_arrays (test_io.TestSavezLoad)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/numpy/testing/decorators.py", line 146, in skipper_func
    return f(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/numpy/lib/tests/test_io.py", line 149, in test_big_arrays
    np.savez(tmp, a=a)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/numpy/lib/npyio.py", line 530, in savez
    _savez(file, args, kwds, False)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/numpy/lib/npyio.py", line 589, in _savez
    format.write_array(fid, np.asanyarray(val))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/numpy/lib/format.py", line 417, in write_array
    fp.write(array.tostring('C'))
OSError: [Errno 22] Invalid argument
00 - Bug numpy.lib

All 29 comments

I can reproduce this. Looks like a Python 3.x bug.

import os
import sys
import time


if sys.maxsize > 2**32:
    print('64-bit')
else:
    print('32-bit, exiting')
    sys.exit(0)

fname = 'write_large_bytestring.txt'
tmp = open(fname, 'wb')
try:
    L = (1 << 31) + 100000
    tmp.write(b'abc' * 2**32)
finally:
    tmp.close()
    os.remove(fname)
    print('Elapsed time: %s s' % time.clock())

The above works with Python 2.7 but not with 3.3:

$ python tmp3.py 
64-bit
Elapsed time: 7.896957 s

$ python3.3 tmp3.py 
64-bit
Elapsed time: 50.149956 s
Traceback (most recent call last):
  File "tmp3.py", line 16, in <module>
    tmp.write(b'abc' * 2**32)
OSError: [Errno 22] Invalid argument

$ ulimit
unlimited

Both Pythons were installed from the dmgs on python.org. I can't find an issue for this on bugs.python.org, but IIRC the io module was completely rewritten.

Test introduced in gh-2942.

Or maybe this is another OSX I/O bug? Remember, OSX libc is buggy and has issues in fwrite/fread when dealing with data blocks close to 2**32, which we had to work around in tofile/fromfile...

In any case, we obviously have to work around it by splitting up the write
into smaller chunks, right?

(Even if it's ultimately libc's fault, python should probably work around
it itself - possibly 2.7 had code to do this that got lost in the
transition.)
On 3 Oct 2013 10:03, "Pauli Virtanen" wrote:

Or maybe this is another OSX I/O bug? Remember, OSX libc is buggy and has
issues in fwrite/fread when dealing with data blocks close to 2**32...



Re: gh-574, gh-2806, and gh-3473. The OSX version in question may be relevant; maybe it now fails rather than writing garbage as it did previously?

Yes, we can work around it by chunking. Maybe this issue should also be forwarded to Python devs, so that they could also implement chunking themselves...
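A minimal sketch of the chunking workaround discussed above, assuming Python 3 and a buffered binary file object. The helper name `write_in_chunks` and the 1 GiB default chunk size are assumptions made for illustration; this is not NumPy's implementation, and the thread does not establish a safe upper limit for the chunk size.

```python
import io


def write_in_chunks(fp, data, chunk_size=1 << 30):
    """Write bytes-like `data` to `fp` in pieces of at most `chunk_size` bytes.

    Hypothetical helper: the 1 GiB default is an assumed "safe" size that
    stays well below 2**31; it is not a documented limit.
    """
    # View the data as raw bytes without copying it.
    view = memoryview(data).cast('B')
    for start in range(0, len(view), chunk_size):
        # Each call hands the file object only a small slice, so no single
        # write request approaches the size that triggers the error.
        fp.write(view[start:start + chunk_size])


if __name__ == '__main__':
    # Small in-memory demonstration; a real use would pass an open file.
    buf = io.BytesIO()
    write_in_chunks(buf, b'abc' * 10, chunk_size=4)
    print(buf.getvalue() == b'abc' * 10)
```

With a real file, the call would look like `write_in_chunks(open(fname, 'wb'), payload)`. The sketch relies on `fp.write` writing the whole slice, which holds for buffered files but not necessarily for raw, unbuffered ones.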

Ah, forgot about those issues. Tried to test on my 10.6 machine, but there the same script just hangs. Could be due to the hardware though, it's an ancient machine.

Is this a 1.8.0 blocker? I'm going to put it there just so it isn't forgotten, it can always be removed. Does anyone know if it works for Python 3.2?

Also, IIRC, we've only chunked reads; it may be that a test that writes a large file is broken.

I wouldn't hold up the release for this one; it's not a regression. Mark it knownfail in the 1.8.x branch, though, if it's not fixed before the release.

@rgommers Does it fail only with OSX and 3.3? You say that it works for Python 2.7, is that correct?

It doesn't fail for 2.7, but the test doesn't check that what's written to file is correct. It's likely not to be correct, see other issues that Pauli linked.
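One way to check that what lands on disk matches what was written is a round-trip comparison. The sketch below uses only the standard library and compares SHA-256 digests; the helper name `roundtrip_ok` and the 1 MiB read size are assumptions for illustration, and it is not the check used in NumPy's test suite.

```python
import hashlib
import os
import tempfile


def roundtrip_ok(data, chunk_size=1 << 20):
    """Write `data` to a temporary file, read it back, and compare digests.

    Hypothetical helper: returns True only if the file has the expected
    length and the same SHA-256 digest as the in-memory data.
    """
    expected = hashlib.sha256(data).hexdigest()
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as fp:
            fp.write(data)
        actual = hashlib.sha256()
        with open(path, 'rb') as fp:
            # Read back in modest blocks so the check itself never issues
            # a single huge read.
            for block in iter(lambda: fp.read(chunk_size), b''):
                actual.update(block)
        same_size = os.path.getsize(path) == len(data)
        return same_size and actual.hexdigest() == expected
    finally:
        os.remove(path)
```

For data large enough to trigger the bug, the single `fp.write(data)` call here would be the step under test; a silent corruption would show up as a size or digest mismatch rather than an exception.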

@rgommers So OSX in general. I'll leave it open in 1.9-devel to motivate a fix and open an issue.

This is reported fixed in OS X Mavericks. This is probably a wontfix, as the proper fix is to upgrade the OS.

See also #2931.

Closing, should be fixed by Mavericks. Please reopen if the problem persists.

This is still happening for me on Mavericks with numpy 1.8.1 and python 3.4 (also 3.3) from Anaconda; if I comment out the skipif decorator, https://github.com/numpy/numpy/blob/v1.8.1/numpy/lib/tests/test_io.py#L154 fails.

The test passes, and data appears to be loaded correctly, using python 2.7 from Anaconda.

This isn't necessarily a flaw with numpy, but others appear to be working around this or similar issues, e.g. https://github.com/torch/torch7-distro/commit/40e65934e071e452f194a9d8c0fd740131babefa (which is for reading rather than writing, but I'm also unable to read large files on python 3).
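The read side can be worked around the same way as the write side. Below is a minimal sketch, assuming Python 3 and a binary file object that supports `readinto`; the helper name `read_in_chunks` and the 1 GiB default are assumptions for illustration, not code taken from NumPy or from the linked torch commit.

```python
import io


def read_in_chunks(fp, nbytes, chunk_size=1 << 30):
    """Read exactly `nbytes` from `fp`, at most `chunk_size` bytes per call.

    Hypothetical helper: the default chunk size is an assumed value that
    stays well below 2**31.
    """
    buf = bytearray(nbytes)
    view = memoryview(buf)
    pos = 0
    while pos < nbytes:
        # readinto fills the slice in place and reports how many bytes
        # it actually delivered, which may be fewer than requested.
        n = fp.readinto(view[pos:pos + chunk_size])
        if not n:
            raise EOFError('expected %d bytes, got %d' % (nbytes, pos))
        pos += n
    return buf


if __name__ == '__main__':
    # Small in-memory demonstration; a real use would pass an open file.
    src = io.BytesIO(b'abc' * 10)
    print(bytes(read_in_chunks(src, 30, chunk_size=4)) == b'abc' * 10)
```

Preallocating the `bytearray` and filling slices of it avoids building a list of intermediate `bytes` objects, which matters when the payload is several gigabytes.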

Running the testcase using Python 3.4 with numpy 1.8.1 on Mac OS X 10.9.4 (Mavericks) results in the known OSError: [Errno 22] Invalid argument error. @certik Please reopen issue!

Another data point - using OSX 10.9.5 (Mavericks) and I get the same issue. I just saw this bug in the python tracker: https://bugs.python.org/issue24658

Might as well reopen this. I don't know if it will be fixed when Python solves their part, but we will find out.

Happens here, latest macOS, python (3.6), and numpy.

FYI Still seeing this bug on latest macOS (10.12.6) and Python 3.5.2 (Anaconda 4.2.0).
Has anyone figured out an upper limit on chunk sizes?

Getting this error as well, macOS Sierra, python 3.6, numpy

There looks to be some motion on the Python issue, but perhaps we should just go ahead and chunk the writes.

Looks like this isn't going to get fixed upstream anytime soon. Anyone know the latest status?

Hi all,

Could you test this issue with the latest versions of Python 3.6, 3.7, and 3.8a? I think I have fixed it on OSX with this PR (https://github.com/python/cpython/pull/1705).

I have another PR for 2.7, but that one is not ready yet :/

Thank you for your feedback.

@rgommers Any chance you can test this? Any other feedback on the current status of this would be welcome.

The test_big_arrays test passes, but https://github.com/numpy/numpy/issues/3858#issuecomment-25607105 still fails for me with the latest Python 3.6 shipped by Anaconda. That probably doesn't have the CPython fix though. No time to build Python myself right now, sorry.

@rgommers can you revisit this?

This is indeed fixed as far as I can tell, at least with Python 3.7 from Anaconda. No other reports either, so closing. Thanks everyone, and @matrixise in particular for fixing this.
