Numpy: `np.diff` is broken for boolean ndarrays in 1.13.0

Created on 14 Jun 2017 · 29 Comments · Source: numpy/numpy

We should either modify the docstring accordingly, raise a meaningful error, or (preferably) make it work by hiding some logic under the hood.

Traceback:

In [11]: x = np.array([[True, True], [False, False]])

In [12]: np.diff(x)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-59225436601c> in <module>()
----> 1 np.diff(x)

/usr/lib/python3.6/site-packages/numpy/lib/function_base.py in diff(a, n, axis)
   1924         return diff(a[slice1]-a[slice2], n-1, axis=axis)
   1925     else:
-> 1926         return a[slice1]-a[slice2]
   1927 
   1928 

TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

In [13]: np.__version__
Out[13]: '1.13.0'

xref https://github.com/scikit-image/scikit-image/issues/2681

Labels: 06 - Regression, numpy.lib, good first issue

soupault

Most helpful comment

Well - I can see the purity argument for disallowing, but it seems to me that it is reasonable to expect diff to work on booleans, and not surprising that scikit image was doing that. So I would have thought a branch on bool was the better route.

matthew-brett on 14 Jun 2017

👍3

All 29 comments

This seems like an easy fix. The diff function is pure Python and lives here:

https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L1835

You would probably want to branch on a.dtype == bool, and given the note regarding booleans in the docstring, use ^ (or !=) instead of - to keep backwards compatibility.
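A minimal sketch of that branch, for illustration only. `diff_sketch` is a hypothetical stand-in and not the actual NumPy implementation; it assumes the boolean case should use `!=` (which agrees with `^` for booleans) and every other dtype should keep using subtraction.

```python
import numpy as np


def diff_sketch(a, n=1, axis=-1):
    """Hypothetical stand-in for np.diff that branches on the bool dtype."""
    a = np.asanyarray(a)
    if n == 0:
        return a
    if n < 0:
        raise ValueError("order must be non-negative but got " + repr(n))

    # Build index tuples selecting a[1:] and a[:-1] along the chosen axis.
    slice1 = [slice(None)] * a.ndim
    slice2 = [slice(None)] * a.ndim
    slice1[axis] = slice(1, None)
    slice2[axis] = slice(None, -1)
    slice1, slice2 = tuple(slice1), tuple(slice2)

    # Booleans: use != instead of the deprecated boolean subtraction.
    op = np.not_equal if a.dtype == np.bool_ else np.subtract
    for _ in range(n):
        a = op(a[slice1], a[slice2])
    return a


x = np.array([[True, True], [False, False]])
print(diff_sketch(x))          # differences along the last axis
print(diff_sketch(x, axis=0))  # differences between the two rows
```

With the array from the report, the last-axis result has shape (2, 1) and is all False, while the `axis=0` result is a single row of True values.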

jaimefrio on 14 Jun 2017

It seems to me that if - is deprecated, then diff is perfectly reasonable to disallow too.

eric-wieser on 14 Jun 2017

matthew-brett on 14 Jun 2017

👍3

I'd vote for keeping the functionality. If we all agree, I'll submit a patch later today or tomorrow.

P.S. Could you, please, point me at the discussion where the rationale for deprecating -= was given?

soupault on 14 Jun 2017

If diff is deprecated on booleans, then the docstring needs updating to reflect that. And we'd probably still want to branch on bool to raise a more in context error?

jaimefrio on 14 Jun 2017

👍1

Seems to me that we might want an np.uint1 type that behaves as an integer and not a boolean, on which np.diff and - can be defined.

eric-wieser on 14 Jun 2017

My vote would be to avoid the user having to stop and debug an error on np.diff(some_bools); first - it breaks backward compatibility, and second, the desired behavior is useful and clearly defined.

matthew-brett on 14 Jun 2017

👍1

first - it breaks backward compatibility, and second, the desired behavior is useful and clearly defined.

Both of these arguments apply to np.subtract too, don't they? Perhaps that discussion should be revisited

eric-wieser on 14 Jun 2017

Yes, they do, and if you instinctively think of diff as implementing subtract then you'd probably want an error. My belief is that people think of diff in a slightly different way from subtract, so that they would agree that subtracting bools does not make sense, whereas taking a diff of bools does - meaning 'is this value different from the previous one'.
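To make the two readings concrete, here is a small sketch with made-up data. It spells the 'differs from the previous one' reading explicitly with `!=`, and contrasts it with signed subtraction after casting to an integer type; the variable names are illustrative only.

```python
import numpy as np

x = np.array([True, True, False, False, True])

# "Is this value different from the previous one?"
changed = x[1:] != x[:-1]

# For booleans, != and ^ give the same answer.
assert np.array_equal(changed, x[1:] ^ x[:-1])

# Indices in x at which a new value starts.
change_points = np.flatnonzero(changed) + 1

# The "subtract" reading needs an integer type and keeps the direction
# of each change (-1 for True -> False, +1 for False -> True).
signed = np.diff(x.astype(np.int8))

print(changed, change_points, signed)
```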

matthew-brett on 14 Jun 2017

👍1

"is this value different from the previous one" sounds like an operation worthy of its own name that generalizes to all dtypes, rather than spelling it np.diff(x).astype(bool).

Perhaps ufunc.pairwise(x, axis=...), implemented as ufunc(x[:-1], x[1:]) ignoring the axis argument? That would make the desired function np.not_equal.pairwise.

Although pairwise might imply the behaviour that outer already has...
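A rough sketch of what such a helper could look like as a free function. `pairwise` here is hypothetical; no `ufunc.pairwise` method exists in NumPy, and the axis handling is an assumption added for illustration.

```python
import numpy as np


def pairwise(ufunc, x, axis=-1):
    """Hypothetical helper: apply a binary ufunc to neighbouring elements.

    Along `axis`, this computes ufunc(x[:-1], x[1:]).
    """
    x = np.asanyarray(x)
    first = [slice(None)] * x.ndim
    second = [slice(None)] * x.ndim
    first[axis] = slice(None, -1)
    second[axis] = slice(1, None)
    return ufunc(x[tuple(first)], x[tuple(second)])


# "Did the value change?" for any dtype, not just booleans.
print(pairwise(np.not_equal, np.array([1, 1, 2, 2, 3])))
print(pairwise(np.not_equal, np.array([True, False, False])))
```

Note that with the argument order ufunc(x[:-1], x[1:]), passing np.subtract yields the negative of what np.diff returns, so the order matters for non-symmetric ufuncs.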

eric-wieser on 14 Jun 2017

👍1

Sure, that might be a good idea as well - but I would still argue that it would be kinder (practicality beats purity) to forgive this way of thinking of bools and np.diff.

matthew-brett on 14 Jun 2017

👍1

The subtraction operator for booleans has been deprecated for some time, since 1.9 I believe; scikit image hasn't been paying attention. Paying attention is also a good idea ;) I think a different name would be appropriate, xdiff or some such.

charris on 14 Jun 2017

That said, having diff handle booleans for backwards compatibility is probably a good idea.

charris on 14 Jun 2017

👍1

It's a bit unfair to blame scikit image here - I can't see any deprecation warning for numpy 1.12.1 and:

In [4]: np.diff([True, False])
Out[4]: array([ True], dtype=bool)

matthew-brett on 14 Jun 2017

👍2

can't see any deprecation warning for numpy 1.12.1 and

That's because by default DeprecationWarning is not displayed. Run import warnings; warnings.simplefilter('always') first and you'll see it.

eric-wieser on 14 Jun 2017

😕2

In [1]: import warnings

In [2]: warnings.simplefilter('always')

In [3]: np.diff([True, False])
/home/charris/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:1175: DeprecationWarning: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
  return a[slice1]-a[slice2]
Out[3]: array([ True], dtype=bool)

In [4]: np.__version__
Out[4]: '1.10.4'

charris on 14 Jun 2017

👍1

One other thing we should do is add a test for diff using a boolean array, be it to assert_raises or to validate the result.
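A sketch of what the "validate the result" variant of such a test might look like. It assumes the outcome of the discussion is that diff keeps working on booleans with "differs from the previous element" semantics; the test name is made up, and on NumPy 1.13.0 as reported above it would fail with the TypeError instead of passing.

```python
import numpy as np
from numpy.testing import assert_array_equal


def test_diff_bool():
    # 1-D: the result should mark positions where the value changes.
    x = np.array([True, False, False, True, True])
    assert_array_equal(np.diff(x), x[1:] != x[:-1])

    # 2-D: the array from the original report, along both axes.
    y = np.array([[True, True], [False, False]])
    assert_array_equal(np.diff(y, axis=0), [[True, True]])
    assert_array_equal(np.diff(y, axis=1), [[False], [False]])
```

If the decision went the other way, the same cases could be wrapped in assert_raises(TypeError, np.diff, x) instead.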

Since the solution seems to be controversial (I think I'm with Matthew on this one, although I don't care much either way), I have sent an e-mail to the list so that everyone gets a say.

jaimefrio on 14 Jun 2017

I'm also of the practical beats purity type here, especially since this used to work. Somehow, like @matthew-brett, there is a "diff" is "difference/differs" confusion in my mind, and I certainly have used np.diff as a quick way to find places where values changed.

p.s. @jaimefrio - I didn't see any message on the (numpy discussion) mailing list.

mhvk on 14 Jun 2017

+1 for practicality.

And definitely -1 for a new function. If it's useful, then put it in diff. The namespace is way too large already.

rgommers on 14 Jun 2017

👍1

Yes, I noticed I couldn't find my mail in the archives to link it here. But there's an email in my sent folder to [email protected] at 18:00 CET. Perhaps the server is acting up again? Or were those issues fixed for good already?

jaimefrio on 15 Jun 2017

Wrong address, it has moved ;) [email protected] .

charris on 15 Jun 2017

Ah, that would explain it...

Here's a link to the mail, resent to the correct address, for future reference:

https://mail.python.org/pipermail/numpy-discussion/2017-June/076877.html

jaimefrio on 15 Jun 2017

This doesn't look good at all - https://github.com/scipy/scipy/issues/7493 .

soupault on 15 Jun 2017

Someone want to make a PR adding support for boolean arrays to diff?

charris on 16 Jun 2017

@charris I do. Are we all agreed on this?