Astropy: Possible fits memmap bug: memmap just doesn't work.

Created on 26 Aug 2013  ·  12 Comments  ·  Source: astropy/astropy

I'm trying to load some gigantic FITS record tables using memmap=True, and I'm getting the error [Errno 12] Cannot allocate memory.

An example session:

filename = '/home/sdfits/AGBT12B_221_01/AGBT12B_221_01.raw.acs.fits'
import astropy.io.fits as fits
filefits = fits.open(filename,memmap=True)
data = filefits[2].data[:50]

The error is at this line:

/users/aginsbur/anaconda/lib/python2.7/site-packages/numpy/core/memmap.py(253)__new__()
--> 253             mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)

ipdb> bytes
23718381056L
ipdb> bytes/1024**2
22619L
ipdb> start
413921280
ipdb> acc
3

I don't really know what's going on, but I suspect memmap is improperly deciding on how much data to read. Any tips on how to further debug? Is this actually a FITS issue, or a numpy issue?

Details:

In [15]: numpy.__version__
Out[15]: '1.7.1'

In [16]: astropy.__version__
Out[16]: '0.2.4'

In [18]: sys.maxint
Out[18]: 9223372036854775807

All 12 comments

What OS?

What does ulimit -v return?

OS is some flavor of Linux; I don't know which off the top of my head, or the easiest command to find out.

Also, was using anaconda install of python/astropy/numpy but upgraded
astropy via pip.

$ ulimit -v
unlimited

Adam

What does cat /proc/meminfo show?

$ cat /proc/meminfo
MemTotal:        1903396 kB
MemFree:          203864 kB
Buffers:          215320 kB
Cached:           884708 kB
SwapCached:         2268 kB
Active:           492052 kB
Inactive:         954324 kB
Active(anon):     165684 kB
Inactive(anon):   181096 kB
Active(file):     326368 kB
Inactive(file):   773228 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1048568 kB
SwapFree:        1031460 kB
Dirty:                24 kB
Writeback:             0 kB
AnonPages:        344352 kB
Mapped:            65676 kB
Shmem:               432 kB
Slab:             191348 kB
SReclaimable:     151148 kB
SUnreclaim:        40200 kB
KernelStack:        2312 kB
PageTables:        22940 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2000264 kB
Committed_AS:     847268 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      286128 kB
VmallocChunk:   34359439336 kB
HardwareCorrupted:     0 kB
AnonHugePages:     12288 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8188 kB
DirectMap2M:     2070528 kB

...that seems like a tiny amount of memory; 2 GB? Hrmph.

@keflavich - are you still seeing this issue?

Totally forgot about this. MemTotal is just the total physical memory available. Those 2 GB are not a lot, sure, but that shouldn't be the issue. You have about 32 TB for VmallocTotal, which is what should matter here--in principle mmap should be able to use most of that. So there's something fishy going on here.

Ah! I think I see the issue here. By default PyFITS uses the MAP_PRIVATE flag when opening a file in readonly mode, so that users can still modify the data array in place as they would if the entire file were read into main memory.

The problem is that this means, in principle, the entire file could be overwritten, so mmap needs to be able to reserve enough memory ahead of time should that occur. That's why this is happening here. PyFITS/Astropy should definitely catch that scenario and provide a more helpful error.
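The copy-on-write behavior behind this can be seen with a plain mmap, independent of FITS. A minimal sketch, using a tiny throwaway temp file in place of the 22 GB table:

```python
import mmap
import os
import tempfile

# A small throwaway file standing in for the FITS file.
fd, path = tempfile.mkstemp()
os.write(fd, b"original")

# MAP_PRIVATE: writes land in a private copy-on-write copy, so the
# kernel must be able to back the whole mapping up front -- for a
# 22 GB table that reservation is what can fail with ENOMEM.
m = mmap.mmap(fd, 8, flags=mmap.MAP_PRIVATE,
              prot=mmap.PROT_READ | mmap.PROT_WRITE)
m[:8] = b"modified"          # modifies only the in-memory copy
in_map = bytes(m[:8])

# The file on disk is untouched by the write above.
with open(path, "rb") as f:
    on_disk = f.read()

m.close()
os.close(fd)
os.remove(path)
print(in_map, on_disk)       # b'modified' b'original'
```

The asymmetry is the whole point: the mapping behaves like a writable in-memory copy, which is exactly why the kernel demands space for all of it.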

Currently there are two ways around this. You can open the file with mode='denywrite'. I added that a while ago specifically for this case, but it's rarely used. That opens the mmap with MAP_SHARED | PROT_READ--this means the pages are read-only (any attempt to modify the array will result in an exception). But if all you need is to read the data, this works fine, and I don't think it requires reserving any swap space.

Another possibility is to open with mode='update'. Then any changes to the array can be synced directly back to the file which is fine if you want that, but obviously not so much if you don't.

Looking at the man page, it looks like there's also a flag, at least on Linux, called MAP_NORESERVE, which prevents pre-allocating swap space for copy-on-write. So if you don't _need_ to write changes to the entire array, that could work too. But we'd have to be able to catch the SIGSEGV that results if you do end up running out of swap space.

Managed to reproduce this directly--indeed, both of the workarounds I offered (mode='denywrite' and mode='update') work. I'll still try to see what I can do about MAP_NORESERVE, and otherwise about catching this error and providing a better message.

Annoyance: numpy.memmap doesn't allow tweaking the flags that are passed to the mmap call. Though looking at it, it's not much more than a light subclass of ndarray that handles the work of creating an mmap with the right flags and then calling ndarray.__new__ with the mmap as its buffer. It also adds a flush method.
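Sidestepping numpy.memmap is indeed straightforward: create the mmap by hand with whatever flags are wanted, then wrap it with np.frombuffer. A minimal sketch (the file contents here are invented for the demo):

```python
import mmap
import os
import tempfile

import numpy as np

# A tiny file of ten little-endian int32 values.
fd, path = tempfile.mkstemp()
os.write(fd, np.arange(10, dtype='<i4').tobytes())

# Hand-rolled replacement for numpy.memmap: we control the mapping
# ourselves (here ACCESS_READ, i.e. MAP_SHARED + PROT_READ).
m = mmap.mmap(fd, 10 * 4, access=mmap.ACCESS_READ)
arr = np.frombuffer(m, dtype='<i4')

print(arr[5])                # 5
print(arr.flags.writeable)   # False: the read-only map carries through
```

The only numpy.memmap feature this loses is flush(), which a shared writable mapping could recover via m.flush().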

It should be easy enough to eschew numpy.memmap entirely and handle the mmap ourselves. But that's still more than I want to do on this for now, so instead I'll resolve this issue by catching the error and suggesting one of the existing workarounds.

Note that with #7597 we no longer use np.memmap, so it should be easier to use other flags if that is useful.

This can now be closed, as a workaround has been merged in https://github.com/astropy/astropy/pull/7926. MAP_NORESERVE is not available from Python, so that isn't a solution unfortunately.
