Numpy: Default int type is platform dependent

Created on 26 Jul 2017  ·  12 Comments  ·  Source: numpy/numpy

np.array([1]).dtype is platform-dependent, presumably because it defaults to np.int_

  1. Is this by design?
  2. If not, can we force it to int64?
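For concreteness, a minimal sketch of the premise (dtype names assume 64-bit builds and pre-2.0 NumPy behavior):

    import numpy as np

    # The default integer dtype tracks the platform's C long:
    print(np.array([1]).dtype)  # int64 on 64-bit Linux/macOS, int32 on 64-bit Windows
    print(np.dtype(np.int_))    # same platform dependence: np.int_ is the C long scalar type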


All 12 comments

It is by design – the idea is that numpy's default int type matches the range of python 2's int, which in turn matches the platform C compiler's long.

Whether this is a good design is another question, especially since python 3 has eliminated that tie. There have been intermittent discussions about changing it before that you can probably dig up, especially about the confusing and error-prone way the default is 32 bits on win64.

I suppose one way to move that discussion forward would be to test whether any major packages break if you do make that change.

One thing that may break is code that uses dtype=int and assumes this is somehow related to the C long type...
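As a hedged sketch of that kind of code (it assumes pre-2.0 NumPy, where the default still tracks C long), consider a buffer handed to C code declared as taking long*:

    import ctypes
    import numpy as np

    a = np.zeros(4, dtype=int)

    # Holds wherever the default int dtype tracks C long; this is exactly
    # the assumption that would break if the default changed:
    assert a.dtype.itemsize == ctypes.sizeof(ctypes.c_long)

    # e.g. handing the buffer to a C function declared as taking `long *`:
    ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_long))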

Changing the default int type on Windows 64 to 64-bit would imo be an important enough change to warrant breaking software.
The current behavior just causes too many bugs.

That the default int type on 32-bit systems is a 32-bit int is probably not so bad: it does at least cover the full addressable range, and changing it could have a performance impact.

We should seriously consider changing this.

In my experience, if a Python library of moderate complexity that uses NumPy does not run Windows-specific tests, it is probably broken for this reason.

@shoyer : We ran into this exact problem on Windows with @MichaelMauderer on https://github.com/colour-science/colour/pull/431.

I was assuming incorrectly that np.int_ was platform independent.

Perhaps we should drop this default at the same time as python 2 support, since the sole reason for defaulting to np.int_ was that it matched the size of builtins.int, which is not even true in python 3.

Ideally numpy should behave the same way across platforms. A colleague of mine uses Windows and recently had to spend some time trying to figure out why a program was yielding different results on his machine than on my Mac. IMO performance considerations pale in comparison to getting correct and consistent results.
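A hedged illustration of how results can silently diverge across platforms (the dtypes assume 64-bit builds and pre-2.0 NumPy):

    import numpy as np

    x = np.array([100_000])  # int32 on 64-bit Windows, int64 on 64-bit Linux/macOS
    print(x ** 2)            # [10000000000] on Linux/macOS; wraps silently to [1410065408] on Windows

No error is raised in the Windows case, which is what makes these bugs so hard to track down.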

Is there any runtime workaround a user could execute, before their other code, to force numpy-on-Windows default types to the same widths as elsewhere? (Perhaps, a data-driven, tamperable mapping of Python types to numpy types?)

As a fresh example of some of the resulting craziness, specifically asking for an array of a type compatible with type(2**32) results in an array that can't store 2**32:

    def testTiny(self):
        a = np.empty(1, dtype=type(2**32))
    >   a[0] = 2**32
    E   OverflowError: Python int too large to convert to C long

@gojomo I'm not sure that's the right approach anyway. On python 3 type(2**32) is guaranteed to be int, so that's just a more complicated way of saying dtype=int. If you're using a literal like that anyway, you could of course use explicit dtype=np.int64.

To make it more dynamic, does dtype=np.array(2**32).dtype work? (Odds are there are even more idiomatic ways to do this.)
EDIT: np.empty_like(2**32, shape=...) is probably it, assuming that works.
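A quick sketch of both ideas (the shape override on np.empty_like needs NumPy >= 1.17, and both rely on NumPy's value-based promotion picking a wide enough type):

    import numpy as np

    value = 2**32

    # Let NumPy infer a dtype wide enough for the value, then allocate with it:
    a = np.empty(3, dtype=np.array(value).dtype)  # int64 even on Windows
    a[0] = value

    # The empty_like spelling of the same thing:
    b = np.empty_like(value, shape=(3,))
    b[0] = value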

No, I had a PR to add one, maybe I can open that again now that we decided to start the deprecation on some of the aliases: https://github.com/numpy/numpy/pull/16535

So either use dtype=np.intp, which gives you 32-bit on 32-bit systems and 64-bit on 64-bit systems, or use dtype=np.int64 to begin with. That PR made dtype=np.intp the default, which is the simpler change, because intp is already fairly common in NumPy.
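For reference, a small check of what np.intp gives (it is pointer-sized, so it is 64-bit on any 64-bit build, including Windows):

    import numpy as np

    # Pointer-sized integer: 64 on 64-bit builds, 32 on 32-bit builds.
    print(np.dtype(np.intp).itemsize * 8)
    a = np.arange(5, dtype=np.intp)  # wide enough to index any array on this platform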

I was thinking that if NEP 31 ever happens, it would also make replacing defaults like this easy to opt into.

@adeak My snippet's not a literal example; my actual issue is that I've got a list of many ints which eventually reach 2**32, and a numpy array typed based on the first int breaks on Windows when it reaches 2**32, but works everywhere else.

(I was hoping the snippet highlighted some of the on-the-face absurdity of the Python-to-numpy interaction: shouldn't the reported type for a specific number communicate a corresponding type wide enough to store it? But I suppose Python is an equal contributor to the problem, as 2**65 & 2**129 have the same problem of reporting as simple int. So it's more a brain-teaser than a guide to better behavior.)

I'd answer the "is numpy's choice a good design?" question in @njsmith's 2017 comment as: "Reasonable way back when, but not anymore, with Python 3, the primacy of 64-bit systems, and Microsoft's own phasing-out of 32-bit support in Windows 10."

Traffic on this issue since then has referenced many of the places this has caused problems for people, but not yet any extant examples of code that would break with a changed default. (There's probably some, somewhere.)

If the plunge of changing the default in one swoop is too risky, a call that opts in to some minimum-width default (or a user-chosen default) for all subsequent mappings of Python's int might help. (And then, at some later date and with warning to Windows users, change the default, but give laggards an option to change it back for a while.)
