Numpy: `np.diff` is broken for boolean ndarrays in 1.13.0

Created on 14 Jun 2017  ·  29 Comments  ·  Source: numpy/numpy

We should either update the docstring accordingly, raise a meaningful error, or (preferably) make it work by hiding some logic under the hood.

Traceback:

In [11]: x = np.array([[True, True], [False, False]])

In [12]: np.diff(x)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-59225436601c> in <module>()
----> 1 np.diff(x)

/usr/lib/python3.6/site-packages/numpy/lib/function_base.py in diff(a, n, axis)
   1924         return diff(a[slice1]-a[slice2], n-1, axis=axis)
   1925     else:
-> 1926         return a[slice1]-a[slice2]
   1927 
   1928 

TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

In [13]: np.__version__
Out[13]: '1.13.0'

xref https://github.com/scikit-image/scikit-image/issues/2681

06 - Regression numpy.lib good first issue

All 29 comments

This seems like an easy fix. The diff function is pure Python and lives here:

https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L1835

You would probably want to branch on a.dtype == bool, and given the note regarding booleans in the docstring, use ^ (or !=) instead of - to keep backwards compatibility.
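For illustration, here is a minimal sketch of what such a branch could look like. `diff_with_bool_branch` is a hypothetical stand-in written for this example, not the actual NumPy source or the eventual patch; it assumes `!=` is the desired boolean behaviour and does not handle 0-d input.

```python
import numpy as np


def diff_with_bool_branch(a, n=1, axis=-1):
    """Hypothetical stand-in for np.diff that branches on boolean input."""
    a = np.asanyarray(a)
    if n < 0:
        raise ValueError("order must be non-negative, got %r" % (n,))

    # Booleans: "differs from the previous element" (!=).
    # Every other dtype: ordinary subtraction.
    op = np.not_equal if a.dtype == np.bool_ else np.subtract

    # Build index tuples selecting a[1:] and a[:-1] along the chosen axis.
    slice1 = [slice(None)] * a.ndim
    slice2 = [slice(None)] * a.ndim
    slice1[axis] = slice(1, None)
    slice2[axis] = slice(None, -1)
    slice1, slice2 = tuple(slice1), tuple(slice2)

    # Apply the first-order difference n times.
    for _ in range(n):
        a = op(a[slice1], a[slice2])
    return a


x = np.array([[True, True], [False, False]])
print(diff_with_bool_branch(x))          # differences along the last axis
print(diff_with_bool_branch(x, axis=0))  # differences along the first axis
```

Because `np.not_equal` on boolean input returns booleans, higher orders (`n > 1`) keep using the same branch.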

It seems to me that if - is deprecated, then diff is perfectly reasonable to disallow too.

Well - I can see the purity argument for disallowing, but it seems to me that it is reasonable to expect diff to work on booleans, and not surprising that scikit image was doing that. So I would have thought a branch on bool was the better route.

I'd vote for keeping the functionality. If we all agree, I'll submit a patch later today or tomorrow.

P.S. Could you, please, point me at the discussion where the rationale for deprecating -= was given?

If diff is deprecated on booleans, then the docstring needs updating to reflect that. And we'd probably still want to branch on bool to raise a more in context error?

Seems to me that we might want an np.uint1 type that behaves as a integer and not a boolean, on which np.diff and - can be defined.

My vote would be to avoid the user having to stop and debug an error on np.diff(some_bools); first - it breaks backward compatibility, and second, the desired behavior is useful and clearly defined.

> first - it breaks backward compatibility, and second, the desired behavior is useful and clearly defined.

Both of these arguments apply to np.subtract too, don't they? Perhaps that discussion should be revisited.

Yes, they do, and if you instinctively think of diff as implementing subtract then you'd probably want an error. My belief is that people think of diff in a slightly different way from subtract, so that they would agree that subtracting bools does not make sense, whereas taking a diff of bools does - meaning 'is this value different from the previous one'.

"is this value different from the previous one" sounds like an operation worthy of its own name that generalizes to all dtypes, rather than spelling it np.diff(x).astype(bool).

Perhaps ufunc.pairwise(x, axis=...), implemented as ufunc(x[:-1], x[1:]) ignoring the axis argument? That would make the desired function np.not_equal.pairwise.
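To make the proposal concrete, here is a rough sketch written as a free function, since `ufunc.pairwise` does not exist in NumPy. The name `pairwise` and its signature are hypothetical; unlike the one-liner above, this version honours the `axis` argument.

```python
import numpy as np


def pairwise(ufunc, x, axis=-1):
    """Hypothetical helper: apply a binary ufunc to adjacent elements along axis."""
    x = np.asanyarray(x)
    head = [slice(None)] * x.ndim
    tail = [slice(None)] * x.ndim
    head[axis] = slice(None, -1)  # x[:-1] along the chosen axis
    tail[axis] = slice(1, None)   # x[1:] along the chosen axis
    # Argument order follows the proposal: ufunc(x[:-1], x[1:]).
    return ufunc(x[tuple(head)], x[tuple(tail)])


flags = np.array([True, True, False, False, True])
changed = pairwise(np.not_equal, flags)
print(changed)
```

Note that with this argument order, `pairwise(np.subtract, x)` gives the negation of `np.diff(x)`; for a symmetric operation such as `np.not_equal` the order does not matter.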

Although pairwise might imply the behaviour that outer already has...

Sure, that might be a good idea as well - but I would still argue that it would be kinder (practicality beats purity) to forgive this way of thinking of bools and np.diff.

The subtraction operator for booleans has been deprecated for some time, since 1.9 I believe; scikit image hasn't been paying attention. Paying attention is also a good idea ;) I think a different name would be appropriate, xdiff or some such.

That said, having diff handle booleans for backwards compatibility is probably a good idea.

It's a bit unfair to blame scikit image here - I can't see any deprecation warning for numpy 1.12.1 and:

In [4]: np.diff([True, False])
Out[4]: array([ True], dtype=bool)

> can't see any deprecation warning for numpy 1.12.1 and

That's because DeprecationWarning is not displayed by default. Add import warnings; warnings.simplefilter('always') and you'll see it:

In [1]: import warnings

In [2]: warnings.simplefilter('always')

In [3]: np.diff([True, False])
/home/charris/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:1175: DeprecationWarning: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
  return a[slice1]-a[slice2]
Out[3]: array([ True], dtype=bool)

In [4]: np.__version__
Out[4]: '1.10.4'

One other thing we should do is add a test for diff using a boolean array, be it to assert_raises or to validate the result.
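As a sketch of the "validate the result" option, a test could look roughly like the following. It assumes np.diff ends up supporting boolean input with "differs from the previous element" semantics, so it is expected to fail on 1.13.0, where the call raises TypeError. The test name is hypothetical.

```python
import numpy as np
from numpy.testing import assert_array_equal


def test_diff_on_bool():
    # Assumption: np.diff on booleans means "differs from the previous element".
    x = np.array([True, False, False, True])
    expected = x[1:] != x[:-1]

    result = np.diff(x)
    assert_array_equal(result, expected)
    assert result.dtype == np.bool_

    # The same should hold along an explicit axis of a 2-D array.
    y = np.array([[True, True], [False, False]])
    assert_array_equal(np.diff(y, axis=0), y[1:] != y[:-1])


test_diff_on_bool()
```

If the decision went the other way, the body would instead wrap the call in `numpy.testing.assert_raises(TypeError, np.diff, x)`.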

Since the solution seems to be controversial (I think I'm with Matthew on this one, although I don't care much either way), I have sent an e-mail to the list so that everyone gets a say.

I'm also of the practical beats purity type here, especially since this used to work. Somehow, like @matthew-brett, there is a "diff" is "difference/differs" confusion in my mind, and I certainly have used np.diff as a quick way to find places where values changed.
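For reference, the "find places where values changed" use case can be spelled without np.diff at all, which sidesteps the boolean question on any version. A small sketch with made-up sample data:

```python
import numpy as np

# Made-up sample data, purely for illustration.
values = np.array([3, 3, 7, 7, 7, 2])

# True wherever an element differs from the one before it.
changed = values[1:] != values[:-1]

# Indices (into `values`) at which a new run starts.
change_positions = np.flatnonzero(changed) + 1
print(change_positions)

# The same spelling works unchanged for boolean input.
mask = np.array([False, False, True, True, False])
print(np.flatnonzero(mask[1:] != mask[:-1]) + 1)
```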

p.s. @jaimefrio - I didn't see any message on the (numpy discussion) mailing list.

+1 for practicality.

And definitely -1 for a new function. If it's useful, then put it in diff. The namespace is way too large already.

Yes, I noticed I couldn't find my mail in the archives to link it here. But there's an email in my sent folder to [email protected] at 18:00 CET. Perhaps the server is acting up again? Or were those issues fixed for good already?

Wrong address, it has moved ;) [email protected] .

Ah, that would explain it...

Here's a link to the mail, resent to the correct address, for future reference:

https://mail.python.org/pipermail/numpy-discussion/2017-June/076877.html

This doesn't look good at all - https://github.com/scipy/scipy/issues/7493 .

Someone want to make a PR adding support for boolean arrays to diff?

@charris I do. Are we all agreed on this?

@soupault I think that is the consensus.

This also applies to np.gradient. Do we fix that too? (could fold into #9411)

Kind ping on @eric-wieser's question. Should we fix np.gradient as well?

Personally, I don't see much point in supporting gradients of bool arrays.
