Numpy: Unexpected output from arange with dtype=int

Created on 5 May 2020  ·  5 Comments  ·  Source: numpy/numpy


In [3]: np.arange(-3, 0, 0.5, dtype=int)
Out[3]: array([-3, -2, -1, 0, 1, 2])

Seeing a "1" and a "2" in the output was unexpected for us, since both numbers are larger than the stop value of 0.

Normally, this is the result without dtype=int:

In [2]: np.arange(-3, 0, 0.5)
Out[2]: array([-3. , -2.5, -2. , -1.5, -1. , -0.5])

and we should get this with dtype=int:

In [4]: np.arange(-3, 0, 0.5).astype(int)
Out[4]: array([-3, -2, -2, -1, -1,  0])

The numpy manual states:
dtype : dtype
The type of the output array. If dtype is not given, infer the data type from the other input arguments.

Thus it should only affect the output array, right?

Reproducing code example:

import numpy as np
print(np.arange(-3, 0, 0.5))
print(np.arange(-3, 0, 0.5, dtype=int))
print(np.arange(-3, 0, 0.5).astype(int))

Error message:

No error message...

Numpy/Python version information:

We tested it under numpy '1.18.4' (pure Python 3.7.6) as well as '1.18.1' (Anaconda 3.7 with the latest update applied). Same result.

1.18.4 3.7.6 (default, Feb 28 2020, 15:25:38)
[Clang 11.0.0 (https://github.com/llvm/llvm-project.git eefbff0082c5228e01611f7

1.18.1 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0]

00 - Bug numpy.core

All 5 comments

Bugs like this are reported over and over again. For reasons lost to time, I'm fairly confident the implementation of arange is something like:

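import math
import numpy as np

def arange(start, stop, step, dtype):
    dtype = np.dtype(dtype)

    # the length comes from the uncast arguments
    n = max(math.ceil((stop - start) / step), 0)

    # dtype.type is a cast
    first = dtype.type(start)
    step = dtype.type(start + step) - first

    # now do what you expect
    return [first + step*i for i in range(n)]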

Perhaps we should add that pseudo-code to the documentation?
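To make the pseudo-code concrete for the reported call, here is a minimal pure-Python sketch. The names `arange_model` and `cast_after_model` are hypothetical, Python's `int()` stands in for the integer cast (both truncate toward zero), and the length formula is an assumption rather than a quote from the NumPy source. It contrasts "cast the first two values, then step" with the "cast the finished float sequence" behaviour the reporter expected.

```python
import math


def arange_model(start, stop, step, cast=int):
    """Hypothetical model: length from the raw arguments, step from cast values."""
    # The length is computed before any cast: ceil((0 - (-3)) / 0.5) == 6.
    n = max(math.ceil((stop - start) / step), 0)
    first = cast(start)                 # int(-3)   -> -3
    delta = cast(start + step) - first  # int(-2.5) -> -2, so delta == 1
    return [first + delta * i for i in range(n)]


def cast_after_model(start, stop, step, cast=int):
    """What the reporter expected: build the float sequence, then cast each value."""
    n = max(math.ceil((stop - start) / step), 0)
    return [cast(start + step * i) for i in range(n)]


print(arange_model(-3, 0, 0.5))      # [-3, -2, -1, 0, 1, 2]
print(cast_after_model(-3, 0, 0.5))  # [-3, -2, -2, -1, -1, 0]
```

In this model, six elements combined with an integer step of 1 are what push the sequence past the stop value.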

Yeah, that code is correct (though I'm not 100% sure about the n calculation). This specific example is pretty extreme and obviously broken; maybe we can actually get rid of it somehow?

arange is repeatedly hated for its arguably broken definition, but I cannot think of a really good proposal to address it (although maybe one came up before).
It's not like we can change arange's behaviour for floats much (maybe precision fixups, but end-point changes are no good IMO). So we would need to create a new function... But in most cases it seems to me that linspace is better than a "correct" float arange, so I am not sure that a corrected float arange actually has many use cases.

In the end, I guess I would like a well-thought-out proposal :/...
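One way to see why a count-based function is often preferable for floats is the sketch below, written in pure Python. `float_arange_length` assumes the length rule ceil((stop - start) / step), and `linspace_like` is a hypothetical stand-in for the idea behind np.linspace, not its actual implementation.

```python
import math


def float_arange_length(start, stop, step):
    """Assumed length rule for a float arange: ceil((stop - start) / step)."""
    return math.ceil((stop - start) / step)


def linspace_like(start, stop, num, endpoint=True):
    """Hypothetical stand-in: the caller fixes the number of points up front."""
    if num == 1:
        return [float(start)]
    div = num - 1 if endpoint else num
    return [start + (stop - start) * i / div for i in range(num)]


# (1.3 - 1) / 0.1 evaluates to slightly more than 3 in binary floating point,
# so the length rounds up to 4 and a value at the stop end sneaks in.
print(float_arange_length(1, 1.3, 0.1))             # 4

# With an explicit count, the length cannot drift with rounding error.
print(linspace_like(-3, 0, 6, endpoint=False))
# [-3.0, -2.5, -2.0, -1.5, -1.0, -0.5]
```

The count-based form trades the step argument for a number of points, which is the part of the result that rounding error would otherwise decide.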

Getting values that are bigger than "stop" is really not nice and quite unexpected. If arange is not meant for floats, you could check for the floaty numpy types and raise an exception.

Also, the manual entry for dtype really leads the user to expect something like an astype(dtype) conversion of only the output.

How about:
1.) Exception for non-integer arguments (i.e. start, stop, step).
2.) Check if stop >= start, otherwise raise an exception
3.) Cast start, stop, step to int64 in the beginning of the function.
4.) astype(dtype) the output

Instead of 1.) you can redirect to linspace inside of arange if a non-integer input is found.
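The four steps above could look roughly like the following pure-Python sketch. `strict_arange` is a hypothetical name, not an existing NumPy function, and a plain list stands in for the output array. Note that step 2, taken literally, also rejects descending ranges with a negative step.

```python
import numbers


def strict_arange(start, stop, step=1, dtype=int):
    """Hypothetical integer-only arange following the four proposed steps."""
    # 1.) Reject non-integer arguments (bool is excluded on purpose).
    for name, value in (("start", start), ("stop", stop), ("step", step)):
        if isinstance(value, bool) or not isinstance(value, numbers.Integral):
            raise TypeError(f"{name} must be an integer, got {value!r}")
    # 2.) Require stop >= start.
    if stop < start:
        raise ValueError("stop must be >= start")
    # 3.) Work on plain integers from here on.
    start, stop, step = int(start), int(stop), int(step)
    # 4.) Convert only the finished values to the requested type.
    return [dtype(v) for v in range(start, stop, step)]


print(strict_arange(-3, 0))                  # [-3, -2, -1]
print(strict_arange(-3, 0, 2, dtype=float))  # [-3.0, -1.0]
```

Under these assumptions, a call such as `strict_arange(-3, 0, 0.5)` raises a TypeError instead of returning values beyond stop.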

Hey, I'm a complete beginner to open source contribution and thought I'd give it a try. How about this snippet? @eric-wieser

x = []
for i in range(start, stop):
    x.append(i)
    x.append(i+step)
print(np.array(x, dtype))
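To see what this loop produces, here it is wrapped in a function (the name `interleave` is hypothetical) and run in pure Python without the final array conversion. It reproduces the float sequence only when start and stop are integers and step is exactly 0.5. Python's `range()` rejects float start or stop values, and any other step just interleaves `i` and `i + step`.

```python
def interleave(start, stop, step):
    """The loop from the comment above, wrapped in a hypothetical function."""
    x = []
    for i in range(start, stop):
        x.append(i)
        x.append(i + step)
    return x


print(interleave(-3, 0, 0.5))   # [-3, -2.5, -2, -1.5, -1, -0.5]
print(interleave(-3, 0, 0.25))  # [-3, -2.75, -2, -1.75, -1, -0.75]
```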
