Numpy: BUG: np.array fails on a list of arrays with partially matching dimensions

Created on 23 Mar 2016  ·  8 Comments  ·  Source: numpy/numpy

The functions numpy.array and numpy.asarray have a well-defined behaviour when applied to lists of arrays: if the listed arrays have the same dimensions and size, the list is turned into one of the dimensions of the resulting array (let's call it "mode 1"). If not, an array of arrays is returned ("mode 2").
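As a minimal sketch of the two modes described above: the variable names are illustrative, and passing dtype=object explicitly in the second case is an assumption made here so the example does not rely on the implicit fallback that the report is about.

```python
import numpy as np

# "Mode 1": arrays of identical shape are stacked along a new leading axis.
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
stacked = np.asarray([x, y])
print(stacked.shape)  # (2, 3)

# "Mode 2": arrays whose shapes differ from the first dimension on end up
# as elements of a 1-D object array. dtype=object is requested explicitly
# here (an assumption of this sketch, not part of the original report).
m = np.array([[1, 0], [0, 1]])
nested = np.array([x, m], dtype=object)
print(nested.shape)  # (2,)
print(nested.dtype)  # object
```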

However, the behaviour of numpy.array and numpy.asarray in "mode 2" seems to depend on the number of items in the arrays. The following code is not very elegant by numpy standards, but it works:

>>> a = np.array([1, 2, 3])
>>> b = np.array([[1, 0], [0, 1]])
>>> np.asarray([a, b])
array([array([1, 2, 3]), array([[1, 0],
       [0, 1]])], dtype=object)

But the following doesn't:

>>> a = np.array([1, 2])
>>> b = np.array([[1, 0], [0, 1]])
>>> np.asarray([a, b])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/numpy/core/numeric.py", line 474, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not broadcast input array from shape (2,2) into shape (2)

Clearly, the problem is that, when numpy.asarray sees the first dimension has the same length in a and b, it tries to go for "mode 1", which is impossible here.

EDIT: I'm using numpy 1.10.4 and python 3.4.3.

00 - Bug numpy.core

All 8 comments

Agreed, this sort of fallback logic is unfortunate. We've discussed not making dtype=object arrays unless the dtype is explicitly provided. IMO np.array([a, b], dtype=object) should be the only way to write either of these -- and it shouldn't need to do any checks on shape.

Providing dtype=object does not help - you still get the same error.
However, adding an empty array lets you avoid it:

arr_of_arr = np.array([np.array([]), a, b])[1:]
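Another way to sidestep the shape inference entirely, offered here only as a sketch, is to preallocate a 1-D object array and fill it element by element. The helper name as_object_array is hypothetical and not part of numpy.

```python
import numpy as np


def as_object_array(items):
    """Hypothetical helper: store each item as one element of a 1-D object array."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        # Integer indexing on a 1-D object array stores the item itself,
        # so no broadcasting between the items is attempted.
        out[i] = item
    return out


a = np.array([1, 2])
b = np.array([[1, 0], [0, 1]])
arr_of_arr = as_object_array([a, b])
print(arr_of_arr.shape)     # (2,)
print(arr_of_arr[1].shape)  # (2, 2)
```

Because the output shape is fixed before any item is inspected, the result does not depend on whether the leading dimensions of the items happen to match.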

Still exists in 1.14.4

@ppwwyyxx - indeed, partially as it is not a trivial change, partially as it is not something one gets hit with all the time, so the urgency is relatively low (and there are not that many people who have time to contribute...).

But what might help here is to make it clearer exactly what the desired behaviour would be. @shoyer mentioned the also long-standing request to explicitly require dtype=object if that is in fact wanted, otherwise raising TypeError for anything that cannot be parsed as a numerical or string array (#5353). I've been pondering recently whether it would be useful to similarly have a dtype='structured', which would strictly enforce a difference between lists as indicating elements of an array and tuples as elements of a structured dtype.

I'm willing to contribute if anyone can send me some pointers on what to do. I just glanced at the related code in ctors.c; the constructor logic seems to be quite complicated, as it needs to deal with many different forms of input.

Same issue: #8330

I think this might be fixed by #11601. Edit: It is not.

So, what's happening here is roughly:

a = np.array([1, 2])
b = np.array([[1, 0], [0, 1]])
out = np.asarray([a, b])
# translates to
out = np.empty((2, 2))  # shape is correctly inferred
out[0,:] = a
out[1,:] = b  # error comes from here
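
The rough translation above can be run in isolation to see which assignment fails. This is only a sketch of the two assignment steps as described in the comment, not the actual code path inside numpy; the try/except and the failed flag are added here for illustration.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([[1, 0], [0, 1]])

out = np.empty((2, 2))
out[0, :] = a  # shape (2,) fits into a row of shape (2,)

try:
    out[1, :] = b  # shape (2, 2) cannot be broadcast into a row of shape (2,)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```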