Numpy: np.random.choice returns out of range index on masked array prob.

Created on 15 Aug 2020  ·  6Comments  ·  Source: numpy/numpy

import numpy as np

a = np.array([0.1, 0.2, 0.4, 0.3])
masked_a = np.ma.masked_array(a, [0, 0, 1, 1])
counter = np.zeros(len(a) + 1)
while True:
    action = np.random.choice(len(a), p=masked_a)
    counter[action] += 1
    print(counter / counter.sum())

# [0.09931198 0.20076697 0.         0.         0.69992105]

It shouldn't returns an index > 3, but when the last element of masked array is true, it may return index = 4

00 - Bug numpy.ma numpy.random

All 6 comments

The issue also occurs in Generator.choice in the development version of numpy:

In [1]: import numpy as np

In [2]: np.__version__ 
Out[2]: '1.20.0.dev0+986e533'

In [3]: rng = np.random.default_rng()

In [4]: pm = np.ma.masked_array([0.1, 0.2, 0.4, 0.3], mask=[0, 0, 1, 1])

In [5]: rng.choice(4, p=pm, size=12)
Out[5]: array([4, 4, 4, 0, 4, 1, 4, 0, 0, 4, 4, 4])

Earlier I added the "bug" label, but that's a subjective judgement. There are other functions in numpy that don't handle masked arrays correctly (or better, "correctly", since "correctness" can also be a judgement call) where we don't necessarily consider it a bug. Perhaps this is just an example of undefined behavior, and the short answer for the issue is "don't do that!". It would be nicer, however, if we could raise an exception instead of returning nonsensical results.

It would be easy enough to add an explicit check for a masked array, and raise an error if any of the values are actually masked, but that feels like a very specific, ad hoc fix.

Does Generator.choice dispatch to __array_function__? If it doesn't, then it probably should.

If it does, then I think we just throw this in with all the other "maskedarray decays because we haven't implemented __array_function__ yet" bugs.

Earlier I added the "bug" label, but that's a subjective judgement. There are other functions in numpy that don't handle masked arrays correctly (or better, "correctly", since "correctness" can also be a judgement call) where we don't necessarily consider it a bug. Perhaps this is just an example of undefined behavior, and the short answer for the issue is "don't do that!". It would be nicer, however, if we could raise an exception instead of returning nonsensical results.

It would be easy enough to add an explicit check for a masked array, and raise an error if any of the values are actually masked, but that feels like a very specific, ad hoc fix.

I think if it doesn't handle it correctly, it shouldn't allow passing masked array (should raise an exception) but works correct mostly (it works well except some cases). it may cause potential issues for production environment.

There is no use of __array_function__ in Generator. While many force inputs to be well behaved C contiguous, it is probably that case that some functions do not always force this religiously enough.

If you add NPY_ARRAY_ENSUREARRAY then you end up with a ndarray that is full and ignores the mask. Hard to say this is "correct" to do. In @WarrenWeckesser's example, this ends up using pm=np.array([.1,.2,.3,.4]) as the probabilities.

Was this page helpful?
0 / 5 - 0 ratings