Numpy: Broadcasting errors with multi-dimensional boolean masks

Created on 3 Apr 2019  ·  3 Comments  ·  Source: numpy/numpy

When indexing a 2D array of shape (N, M) with two 1D boolean masks of lengths N and M, certain combinations of True and False raise a broadcasting error (particularly when one mask is all False). I'm not sure whether this behaviour is expected, but it seems highly surprising and undesirable.

In the example below, x[[False, True, True], [True, True, True]] errors, while x[[False, True, True], True] and x[[False, True, True]] have the expected behavior.

Reproducing code example:

import numpy as np
from itertools import product

x = np.zeros((3,3))
mask_1d = [*product([True, False], repeat=3)]

for row_mask, col_mask in product(mask_1d, mask_1d):
    try:
        x[row_mask, col_mask]
    except IndexError as e:
        print(row_mask, col_mask)
        print(e)

Error message:

(True, True, True) (True, True, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,) 
(True, True, True) (True, False, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,) 
(True, True, True) (False, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,) 
(True, True, True) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (0,) 
(True, True, False) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,) 
(True, True, False) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,) 
(True, False, True) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,) 
(True, False, True) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,) 
(False, True, True) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,) 
(False, True, True) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,) 
(False, False, False) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (3,) 
(False, False, False) (True, True, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (2,) 
(False, False, False) (True, False, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (2,) 
(False, False, False) (False, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (2,)
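
Reading the shapes in the messages above, the failing pairs appear to be exactly those where the two masks select different numbers of elements and neither selects exactly one. The following is only a sketch to check that guess against the same 3x3 array; the helper name `predicted_to_fail` is hypothetical and not part of NumPy.

```python
import numpy as np
from itertools import product

x = np.zeros((3, 3))
masks = [np.array(m) for m in product([True, False], repeat=3)]


def predicted_to_fail(row_mask, col_mask):
    # Hypothetical helper: assume the True counts behave like the lengths
    # of two index arrays, which must be equal or 1 in order to broadcast.
    n_rows, n_cols = int(row_mask.sum()), int(col_mask.sum())
    return n_rows != n_cols and 1 not in (n_rows, n_cols)


failures = []
for row_mask, col_mask in product(masks, masks):
    try:
        x[row_mask, col_mask]
        failed = False
    except IndexError:
        failed = True
    # The guess should agree with what NumPy actually does for every pair.
    assert failed == predicted_to_fail(row_mask, col_mask)
    if failed:
        failures.append((row_mask, col_mask))

print(len(failures))  # number of failing (row_mask, col_mask) pairs
```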

Numpy/Python version information:

1.16.2 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0]

All 3 comments

With boolean arrays, the code assumes you are trying to index either a single dimension or all elements at the same time - with the choice somewhat unfortunately guessed in a way that allows a single True to be removed. I.e., it turns your row_mask, col_mask into a (2,3) boolean array and then finds that it cannot index the (3,3) array.
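
The shapes in the error message are consistent with NumPy's documented rule that a boolean index array behaves like the integer positions returned by its `nonzero()`. A small sketch of that equivalence, assuming a 3x3 array like the one in the report (the variable names are illustrative only):

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
row_mask = np.array([False, True, True])
col_mask = np.array([True, True, True])

# Each 1D mask stands in for the positions of its True entries.
rows = np.nonzero(row_mask)[0]  # two positions
cols = np.nonzero(col_mask)[0]  # three positions

# Index arrays of length 2 and 3 cannot be broadcast together, so both
# spellings are expected to raise an IndexError.
errors = []
for index in [(row_mask, col_mask), (rows, cols)]:
    try:
        x[index]
    except IndexError as e:
        errors.append(str(e))
        print(e)

# When both masks select the same number of elements, the positions are
# paired up elementwise instead of forming a rectangular selection.
same_count = np.array([True, False, True])
paired = x[row_mask, same_count]
```

This pairing is why two masks with equal counts do not raise, even though the result is probably not the sub-block the reporter wanted.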

Part of the problem is that tuples and lists are treated as equivalent, something we're trying to move away from. Eventually, you'd handle the boolean array index by ensuring the mask was a double list.

For now, though, I fear the only solution is to do x[row_mask][:, col_mask].
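
A minimal sketch of that workaround, with made-up mask values. One caveat worth keeping in mind: `x[row_mask]` returns a copy, so the chained form is fine for reading but does not write back into `x`.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
row_mask = np.array([False, True, True])
col_mask = np.array([True, False, False])

# Filter rows first, then filter the columns of the intermediate result.
sub = x[row_mask][:, col_mask]

# An all-False mask now gives an empty selection instead of an error.
all_false = np.array([False, False, False])
empty = x[row_mask][:, all_false]

# Assigning through the chained form only modifies the temporary copy
# produced by x[row_mask], so x itself is left unchanged.
before = x.copy()
x[row_mask][:, col_mask] = -1
unchanged = np.array_equal(x, before)
```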

cc @eric-wieser, who has been working to deprecate the "treat tuple as list" for indexing operations.

P.S. What I find most annoying is this difference:

x = np.arange(9).reshape(3, 3)
# x[[False, True, True], True]
# array([[3, 4, 5],
#        [6, 7, 8]])
x[[False, True, True], False]
# IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,) 

Yes, x[row_mask][:, col_mask] is what I ended up doing. Thanks for the explanation, I'm glad it's something that's being looked into.

I think arr[np.ix_(*index)] is what you want/are expecting here, or in other words an outer indexing logic as described in NEP 21: https://github.com/numpy/numpy/blob/master/doc/neps/nep-0021-advanced-indexing.rst

Maybe that will be picked up some time. The NEP also says that, at least for the current indexing, multiple boolean indices should simply be deprecated (I think whether to allow this specific use case may still have been contested; it is consistent, but may not have much use and would be pretty confusing in any case).
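
As a rough illustration of the np.ix_ suggestion, the sketch below passes each 1D mask as a separate argument, which seems to be the usage intended above. The mask values are made up for the example.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
row_mask = [False, True, True]
col_mask = [True, False, True]

# np.ix_ converts each 1D mask into an integer index array shaped so that
# the arrays broadcast into an outer (rectangular) selection.
sub = x[np.ix_(row_mask, col_mask)]

# Masks with different True counts, including all-False, no longer clash.
empty = x[np.ix_(row_mask, [False, False, False])]

# Because this is a single indexing operation, assignment writes through.
y = x.copy()
y[np.ix_(row_mask, col_mask)] = 0
```

Compared with x[row_mask][:, col_mask], this form can also be used on the left-hand side of an assignment.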
