Numpy: np.copy(masked array) has unexpected behavior

Created on 26 Jun 2013  ·  11Comments  ·  Source: numpy/numpy

....where "unexpected" means _I_ didn't expect it. This may just be a documentation fix, or maybe I will be scolded for not realizing how np.ma works, but I don't think this behavior is appropriate:

In [100]: x = np.ma.ones(5)

In [101]: x.mask = [False,False,True,False,False]

In [102]: np.copy(x)
Out[102]: array([ 1.,  1.,  1.,  1.,  1.])

In [103]: np.ma.copy(x)
Out[103]:
masked_array(data = [1.0 1.0 -- 1.0 1.0],
             mask = [False False  True False False],
       fill_value = 1e+20)

In [104]: np.__version__
Out[104]: '1.8.0.dev-074a40c'

I would at least expect a warning, but would prefer to see, in order of preference:

  • np.copy(x) returns a masked array (since x is a masked array)
  • np.copy(x) warns that it is unmasking the array, then returns the same array as above
  • np.copy(x) raises an exception
  • np.copy(x) returns [1,1,nan,1,1] (this is probably not ideal)
00 - Bug numpy.lib

All 11 comments

Still open in 1.9-devel

Not sure how to fix this without special casing masked arrays, though.

Would changing the logic of np.copy to include subok=True in the call to array fix this issue?
From:

return array(a, order=order, copy=True)

To:

return array(a, order=order, subok=True, copy=True)

If so, should subok should be a parameter to np.copy and what should the default be, True or False? True will return the first preferred result, False will keep the current behavior, but allow the expected behavior but adding an argument.

I'm not in favor of adding a subok parameter to np.copy when we already have two completely viable solutions:

  • use the .copy() method
  • use np.array(a, subok=True) (it copies by default)

I would actually argue the opposite, since I think to most "copy" gives the impression that one would get something back that is, actually, a copy, and thus of the same type. I also see it as a small but useful contribution to make the whole of numpy more consistently able to deal with subclasses (just as I did earlier with np.broadcast_arrays, where no good substitute was available). However, given the need to remain backwards compatible, I do think we're stuck with a default of subok=False.

Anyway, @jjhelmus, why not make a quick PR and see what others think?

@shoyer: np.copy should probably go on your list of functions that need
some kind of duck array support, though -- for a duck array I think np.copy
and np.array should probably do different things. Maybe it should just call
arr.copy?

On Thu, Oct 15, 2015 at 7:00 AM, Marten van Kerkwijk <
[email protected]> wrote:

I would actually argue the opposite, since I think to most "copy" gives
the impression that one would get something back that is, actually, a copy,
and thus of the same type. I also see it as a small but useful contribution
to make the whole of numpy more consistently able to deal with subclasses
(just as I did earlier with np.broadcast_arrays, where no good substitute
was available). However, given the need to remain backwards compatible, I
do think we're stuck with a default of subok=False.

Anyway, @jjhelmus https://github.com/jjhelmus, why not make a quick PR
and see what others think?


Reply to this email directly or view it on GitHub
https://github.com/numpy/numpy/issues/3474#issuecomment-148395201.

Nathaniel J. Smith -- http://vorpus.org

Yes, I would be +1 on making np.copy try calling the copy method on duck arrays.

Still no reason not to support subok as well. And of course one can just call __copy__ if it exists.

Created PR #6509 which adds a subok parameter to the function with a default value of False. I'm guessing this will open up some discussion on the topic. Not sure if this typically takes place on the PR, the issue, or the mailing list.

Api discussion should be on the mailing list.
On Oct 18, 2015 7:37 PM, "Jonathan J. Helmus" [email protected]
wrote:

Created PR #6509 https://github.com/numpy/numpy/pull/6509 which adds a
_subok_ parameter to the function with a default value of _False_. I'm
guessing this will open up some discussion on the topic. Not sure if this
typically takes place on the PR, the issue, or the mailing list.


Reply to this email directly or view it on GitHub
https://github.com/numpy/numpy/issues/3474#issuecomment-149080511.

I'll send an email to the list on the topic tomorrow.

Was this page helpful?
0 / 5 - 0 ratings