Should modify the docstring accordingly, raise a meaningful error, or (preferably) make it work by hidding some logic under the hood.
Traceback:
In [11]: x = np.array([[True, True], [False, False]])
In [12]: np.diff(x)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-12-59225436601c> in <module>()
----> 1 np.diff(x)
/usr/lib/python3.6/site-packages/numpy/lib/function_base.py in diff(a, n, axis)
1924 return diff(a[slice1]-a[slice2], n-1, axis=axis)
1925 else:
-> 1926 return a[slice1]-a[slice2]
1927
1928
TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
In [13]: np.__version__
Out[13]: '1.13.0'
xref https://github.com/scikit-image/scikit-image/issues/2681
This seems like an easy fix. The diff
function is pure Python and lives here:
https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L1835
You would probably want to branch on a.dtype == bool
, and given the note regarding booleans in the docstring, use ^
(or !=
) instead of -
to keep backwards compatibility.
It seems to me that if -
is deprecated, then diff
is perfectly reasonable to disallow too.
Well - I can see the purity argument for disallowing, but it seems to me that it is reasonable to expect diff to work on booleans, and not surprising that scikit image was doing that. So I would have thought a branch on bool
was the better route.
I'd vote for keeping the functionality. If we all agree, I'll submit a patch later today or tomorrow.
P.S. Could you, please, point me at the discussion where the rationale for deprecating -=
was given?
If diff
is deprecated on booleans, then the docstring needs updating to reflect that. And we'd probably still want to branch on bool
to raise a more in context error?
Seems to me that we might want an np.uint1
type that behaves as a integer and not a boolean, on which np.diff
and -
can be defined.
My vote would be to avoid the user having to stop and debug an error on np.diff(some_bools)
; first - it breaks backward compatibility, and second, the desired behavior is useful and clearly defined.
first - it breaks backward compatibility, and second, the desired behavior is useful and clearly defined.
Both of these arguments apply to np.subtract
too, don't they? Perhaps that discussion should be revisited
Yes, they do, and if you think of instinctively think of diff
as implementing subtract
then you'd probably want an error. My belief is that people think of diff
in a slightly different way from subtract
, so that they would agree that subtracting bools does not make sense, whereas taking a diff
of bools does - meaning 'is this value different from the previous one'.
"is this value different from the previous one" sounds like an operation worthy of it's own name that generalizes to all dtypes, rather than spelling it np.diff(x).astype(bool)
.
Perhaps ufunc.pairwise(x, axis=...)
, implemented as ufunc(x[:-1], x[1:])
ignoring the axis argument? That would make the desired function np.not_equal.pairwise
.
Although pairwise
might imply the behaviour that outer
already has...
Sure, that might be a good idea as well - but I would still argue that it would be kinder (practicality beats purity) to forgive this way of thinking of bools and np.diff
.
The subtraction operator for booleans has been deprecated for some time, since 1.9 I believe, scikit image hasn't been paying attention. Paying attention is also a good idea ;) I think a different name would be appropriate, xdiff or some such.
That said, having diff
handle booleans for backwards compatibility is probably a good idea.
It's a bit unfair to blame scikit image here - I can't see any deprecation warning for numpy 1.12.1 and:
In [4]: np.diff([True, False])
Out[4]: array([ True], dtype=bool)
can't see any deprecation warning for numpy 1.12.1 and
That's because by default DeprecationWarning
is not displayed. Add a import warnings; warnings.simplefilter('always')
and you'll see it
In [1]: import warnings
In [2]: warnings.simplefilter('always')
In [3]: np.diff([True, False])
/home/charris/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:1175: DeprecationWarning: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
return a[slice1]-a[slice2]
Out[3]: array([ True], dtype=bool)
In [4]: np.__version__
Out[4]: '1.10.4'
One other thing we should do is add a test for diff
using a boolean array, be it to assert_raises
or to validate the result.
Since the solution seems to be controversial (I think I'm with Matthew on this one, although I don't care much either way), I have sent an e-mail to the list so that everyone gets a say.
I'm also of the practical beats purity type here, especially since this used to work. Somehow, like @matthew-brett, there is a "diff" is "difference/differs" confusion in my mind, and I certainly have used np.diff
as a quick way to find places where values changed.
p.s. @jaimefrio - I didn't see any message on the (numpy discussion) mailing list.
+1 for practicality.
And definitely -1 for a new function. If it's useful, then put it in diff
. The namespace is way too large already.
Yes, I noticed I couldn't find my mail in the archives to link it here. But there's an email in my sent folder to [email protected] at 18:00 CET. Perhaps the server is acting up again? Or were those issues fixed for good already?
Wrong address, it has moved ;) [email protected] .
Ah, that would explain it...
Here's a link to the mail, resent to the correct address, for future reference:
https://mail.python.org/pipermail/numpy-discussion/2017-June/076877.html
This doesn't look good at all - https://github.com/scipy/scipy/issues/7493 .
Someone want to make a PR adding support for boolean arrays to diff?
@charris I do. Are we all agreed on this?
@soupault I think that is the consensus.
This also applies to np.gradient
. Do we fix that too? (could fold into #9411)
Kind ping on @eric-wieser 's question. Should we fix np.gradient
as well?
Personally, I don't see much point in supporting gradients of bool arrays.
Most helpful comment
Well - I can see the purity argument for disallowing, but it seems to me that it is reasonable to expect diff to work on booleans, and not surprising that scikit image was doing that. So I would have thought a branch on
bool
was the better route.