numpy.isclose vs math.isclose

Created on 5 Dec 2017 · 78 comments · Source: numpy/numpy

numpy.isclose (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.isclose.html):

abs(a - b) <= (atol + rtol * abs(b))

math.isclose (https://docs.python.org/3/library/math.html#math.isclose):

abs(a - b) <= max(rtol * max(abs(a), abs(b)), atol)

Note that Numpy's equation is not symmetric and couples the atol and rtol parameters; both are bad things (IMO).

Here is a situation where Numpy "incorrectly" flags two numbers as equal:

a = 0.142253
b = 0.142219
rtol = 1e-4
atol = 2e-5

# true because atol interferes with the rtol measurement
abs(a - b) <= (atol + rtol * abs(b))
Out[24]: True

# correct result, rtol fails
abs(a - b) <= max(rtol * max(abs(a), abs(b)), atol)
Out[29]: False

Here is another one, this time a symmetry problem:

a = 0.142253
b = 0.142219
rtol = 1e-4
atol = 1.9776e-05

# relative to b
abs(a - b) <= (atol + rtol * abs(b))
Out[73]: False

# relative to a
abs(a - b) <= (atol + rtol * abs(a))
Out[74]: True

# the math version has no problem with this
abs(a - b) <= max(rtol * max(abs(a), abs(b)), atol)
Out[75]: False
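The two examples above can also be reproduced with the real library functions instead of the hand-written formulas. This is a minimal sketch, assuming NumPy is installed. Note that math.isclose spells its tolerances rel_tol and abs_tol, and that np.isclose(a, b) scales rtol by abs(b), so swapping the arguments is what exposes the asymmetry.

```python
import math

import numpy as np

a, b = 0.142253, 0.142219
rtol = 1e-4

# Example 1: atol inflates NumPy's tolerance enough to accept the pair.
atol = 2e-5
print(np.isclose(a, b, rtol=rtol, atol=atol))          # True
print(math.isclose(a, b, rel_tol=rtol, abs_tol=atol))  # False

# Example 2: NumPy's answer depends on argument order; math's does not.
atol = 1.9776e-05
print(np.isclose(a, b, rtol=rtol, atol=atol))          # False (rtol scaled by abs(b))
print(np.isclose(b, a, rtol=rtol, atol=atol))          # True  (rtol scaled by abs(a))
print(math.isclose(a, b, rel_tol=rtol, abs_tol=atol))  # False
print(math.isclose(b, a, rel_tol=rtol, abs_tol=atol))  # False
```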

The Python math version looks bulletproof; should Numpy start using it? Are there any benefits to using the Numpy version?


Too late to change anything imho.

This is one of the most widely used functions in all of numpy (through assert_allclose). The only path for this would be to choose another name.

I confess that the current implementation looks like a bug. Note that using max(abs(a), abs(b)) instead of abs(b) will not break any existing tests; it merely relaxes the condition in the case that abs(a) > abs(b).
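A scalar sketch of the suggestion above. Both function names are hypothetical helpers for illustration, not NumPy functions. Scaling rtol by max(abs(a), abs(b)) makes the test symmetric, and because max(abs(a), abs(b)) >= abs(b), every pair accepted by the current formula is still accepted.

```python
def isclose_current(a, b, rtol=1e-05, atol=1e-08):
    """Scalar mirror of the np.isclose formula: rtol is scaled by abs(b) only."""
    return abs(a - b) <= atol + rtol * abs(b)


def isclose_max_scaled(a, b, rtol=1e-05, atol=1e-08):
    """Hypothetical variant: scale rtol by the larger magnitude.

    Symmetric in (a, b), and never stricter than isclose_current,
    since max(abs(a), abs(b)) >= abs(b).
    """
    return abs(a - b) <= atol + rtol * max(abs(a), abs(b))


a, b = 0.142253, 0.142219
kw = dict(rtol=1e-4, atol=1.9776e-05)
print(isclose_current(a, b, **kw), isclose_current(b, a, **kw))        # False True
print(isclose_max_scaled(a, b, **kw), isclose_max_scaled(b, a, **kw))  # True True
```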

At the very least, this needs a warning in the docstring that it doesn't match the builtin.

How about adding this?

from numpy.__future__ import isclose

Incidentally, that may be an idea for the random number versioning...

For clarification, the import would not actually import isclose but would enable the new semantics. It would be scoped to a module.

The idea is that we can allow people to use the correct function and at the same time, we don't need to break any existing code.

the import would not actually import isclose but would enable the new semantics

I don't think making this not import is possible. Just pick a longer name, like mathlike_ufuncs, which could cover remainder as well.

Note that from __future__ import print_function actually does produce a print_function global.

Actually, the from __future__ import * style of import might not be appropriate here - in Python, it affects _syntax_ at a per-file level, and doing something similar in numpy would be tricky.

The iterator exceptions were not a syntax change. We just need a small C module to inspect the namespace of the calling module. We can even cache the modules so that only import time is slowed, and only trivially.
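The proposal above is for a small C module. As a rough pure-Python illustration only, a function can read a flag from its caller's module globals through sys._getframe, which is a CPython implementation detail. The marker name __isclose_symmetric__ and this isclose are hypothetical, and the per-module caching mentioned above is not shown.

```python
import sys

# Hypothetical opt-in marker a module would define at top level; in the
# proposal, a future-style import would be what sets it.
_OPT_IN = "__isclose_symmetric__"


def isclose(a, b, rtol=1e-05, atol=1e-08):
    # Frame 1 is whoever called isclose(); f_globals is that module's namespace.
    caller_globals = sys._getframe(1).f_globals
    if caller_globals.get(_OPT_IN, False):
        # "New" semantics: the math.isclose formula.
        return abs(a - b) <= max(rtol * max(abs(a), abs(b)), atol)
    # "Old" semantics: the current np.isclose formula.
    return abs(a - b) <= atol + rtol * abs(b)
```

A module opting in would set __isclose_symmetric__ = True at top level; callers in other modules keep the old behaviour.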

You're right - and true_division is obviously analogous to isclose.

This is actually a nice way to avoid the problems in #9444, as an alternative to using with statements for managing deprecated features.

Let's CC @ChrisBarker-NOAA as the creator of math.isclose.

This is a pretty minor issue IMO. Generally atol and rtol are chosen by fiddling with them until tests pass, and the goal is to catch errors that are an order of magnitude larger than the tolerances. Maybe it would make sense to relax the rtol part like @charris suggested, but I really don't think it's worth pulling out stack introspection tricks for this tiny tweak. And we'd still have the problem that isclose is often called indirectly by things like assert_allclose or various third-party testing helpers.

I asked on StackOverflow about using __future__ style imports: https://stackoverflow.com/questions/29905278/using-future-style-imports-for-module-specific-features-in-python

TLDR: it's possible, but not easy or clean.

The stack introspection is not just for this, but is proposed as a general policy for changing the return values of functions. The question is: in principle, should this be changed? I do think that if the answer is yes, the deprecation period needs to be at least a few years given the widespread use. Python never included a past module but that's one option to try to improve API stability, if there is demand.

The other method would be to just add this as an option isclose(...,symmetric=True) or assert_allclose(symmetric=True) where the default would be False, the current situation.
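A minimal sketch of what such an opt-in flag could look like, written as a standalone function rather than a patch to NumPy and assuming NumPy is installed. The name isclose_flagged and the symmetric parameter are hypothetical (this is the proposal under discussion, not an existing NumPy argument), and the special handling of NaN and infinity that np.isclose performs is omitted. With symmetric=True and atol=0 it applies the math.isclose formula element-wise.

```python
import math

import numpy as np


def isclose_flagged(a, b, rtol=1e-05, atol=1e-08, symmetric=False):
    """Hypothetical np.isclose with an opt-in symmetric (math.isclose-style) test."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = np.abs(a - b)
    if symmetric:
        # math.isclose: max(rtol * max(|a|, |b|), atol)
        tol = np.maximum(rtol * np.maximum(np.abs(a), np.abs(b)), atol)
    else:
        # current np.isclose: atol + rtol * |b|
        tol = atol + rtol * np.abs(b)
    return diff <= tol


x = np.array([0.142253, 1.0, 9.0e-9])
y = np.array([0.142219, 1.0 + 1e-10, 1.0e-9])
got = isclose_flagged(x, y, rtol=1e-4, atol=0.0, symmetric=True)
expected = [math.isclose(p, q, rel_tol=1e-4, abs_tol=0.0) for p, q in zip(x, y)]
print(got.tolist() == expected)  # True
```

Keeping symmetric=False as the default leaves every existing call unchanged.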

I agree that this is a minor problem when you don't attach any meaning to the rtol and atol values and just tune them to pass unit tests, as mentioned by @njsmith.

However, sometimes you would like to say that the error is within, for example, 1% (rtol=0.01).
In this case, the worst-case error in the rtol measurement is 100% (when rtol * abs(b) gets close to atol, atol + rtol * abs(b) ~= 2 * rtol * abs(b)).

This means that some values can pass with ~2% error:

atol = 1e-8 #default
rtol = 0.01 # 1%

b = 1e-6
a = 1.0199e-6 # ~2% larger compared to b
abs(a - b) <= (atol + rtol * abs(b))
True
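The doubling can be read off directly: dividing NumPy's threshold atol + rtol * abs(b) by abs(b) gives an effective relative tolerance of rtol + atol / abs(b). A small sketch, where effective_rtol is a helper introduced here for illustration (not a NumPy function), using NumPy's default atol of 1e-8:

```python
def effective_rtol(b, rtol, atol=1e-08):
    """Relative tolerance actually applied by abs(a - b) <= atol + rtol * abs(b)."""
    return rtol + atol / abs(b)


for b in (1e-2, 1e-4, 1e-6, 1e-8):
    print(f"b = {b:g}: effective rtol = {effective_rtol(b, rtol=0.01):.4g}")
```

With rtol=0.01 this gives about 2% at b = 1e-6, matching the example above, and more than 100% once abs(b) drops to 1e-8.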

Implementing the idea of @charris would make this particular case very slightly worse, as it relaxes the comparison even further, but it is still worth it, as it gets rid of the symmetry problem and is indeed backward compatible.

IMO it would be better if Numpy used the math function, but I understand that the change may be too disruptive and possibly not important for most users. An option to switch between the two isclose cores would be useful.

@njsmith: thanks for bringing me in.

A bit of history: when I proposed isclose for the stdlib, we certainly looked at numpy as prior art. If it was just me, I may have used a compatible approach, for the sake of, well, compatibility :-).

But the rest of the community thought it was more important to do what was "right" for Python, so a long discussion ensued... I tried to capture most of the points in the PEP, if you want to go look.

Here is the reference to numpy:

https://www.python.org/dev/peps/pep-0485/#numpy-isclose

You can see that the same points were made as in this discussion.

In the end, three key points came out:

1) a symmetric approach would result in the least surprise.

2) the default absolute tolerance should probably be zero, so as not to make any assumptions about the order of magnitude of the arguments.

3) the difference between the "weak" and "strong" tests was irrelevant when used with small tolerances, as is the expected use case.

In the end, I think we came up with the "best" solution for math.isclose.

But is it "better" enough to break backward compatibility? I don't think so.

Unfortunately, much of numpy (and python) had many features added because they were useful, but without the level of discussion such things get now, so we have a lot of sub-optimal designs. This is expected (and necessary) for a young library, and we just have to live with it now.

@njsmith is right -- I think very few uses of np.isclose() have the tolerances set by rigorous FP error analysis, but rather by trial and error, with np.isclose() itself.

However, I think the bigger problem is the default atol -- it assumes that your arguments are of order 1 -- which could be a VERY wrong assumption, and since it would often result in tests passing that shouldn't, users might not notice.

In [85]: np.isclose(9.0e-9, 1.0e-9)
Out[85]: True

ouch!

and yes, it's the atol causing this:

In [86]: np.isclose(9.0e-9, 1.0e-9, atol=0.0)
Out[86]: False
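To see where the default atol takes over, one can sweep the magnitude while keeping the ratio fixed at 9:1. With NumPy's defaults (rtol=1e-05, atol=1e-08), np.isclose(9*x, x) holds exactly when 8*x <= 1e-08 + 1e-05*x, i.e. for x below roughly 1.25e-9. A quick check, assuming NumPy is installed:

```python
import numpy as np

x = np.array([1e-7, 1e-8, 1e-9, 1e-10])

# Same 9:1 ratio at every magnitude; only the scale changes.
print(np.isclose(9 * x, x))            # [False False  True  True]
print(np.isclose(9 * x, x, atol=0.0))  # [False False False False]
```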

So it probably is a Good Idea to have a path forward.

I like the keyword argument idea -- seems a lot more straightforward than trying to mess with __future__ and the like.

And we could then decide if we wanted to start issuing deprecation warnings and ultimately change the default many versions downstream...

I'd like to propose @bashtage's suggestion:

"""
The other method would be to just add this as an option isclose(...,symmetric=True) or assert_allclose(symmetric=True) where the default would be False, the current situation.
"""

I think it's a better option than a new function name, and could lead the way to future deprecation and change of the default (or not).

I think in addition to the (strong) symmetric test, atol should default to zero as well.

Given that, maybe we need a better name for the parameter than symmetric, though it's not bad...

Oh -- and it might be good to check that isclose() isn't called elsewhere in numpy in addition to assert_allclose().

The easiest way to do that is probably to put a deprecation warning in and then remove it before merging. We could just go ahead and put the deprecation warning in that says to change existing code to symmetric=False; there's no reason to say that the default needs to be changed anytime soon even if the warning is there.

Do we need to add it to assert_allclose too, or do we just make the change silently? Making people update every one of their tests to silence a warning is not really any different to making them update their tests to fix the tolerance reducing

@xoviat @eric-wieser there cannot be any backwards incompatible changes to assert_allclose nor a deprecation warning from it, way too disruptive. Unless I'm missing where you want to put a deprecation warning?

I'm afraid that this still sounds like a lot of pain for infinitesimal gain, to me.

Having atol be nonzero by default is not an accident. It's needed to get sensible default behavior for any test where the expected result includes exact zeros.
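The reason is that a purely relative test can never accept a nonzero value against an exact zero: with b = 0 the condition abs(a) <= rtol * abs(a) fails for any a != 0 when rtol < 1. The standard library's math.isclose defaults abs_tol to 0.0, so it shows what a zero default atol would do, and what it takes to opt back in:

```python
import math

result = 1e-17  # e.g. a computed value whose exact answer is 0.0

# Purely relative comparison (math.isclose defaults: rel_tol=1e-09, abs_tol=0.0).
print(math.isclose(result, 0.0))                 # False
# Any rel_tol < 1 still fails against an exact zero.
print(math.isclose(result, 0.0, rel_tol=0.5))    # False
# An explicit absolute tolerance is what makes the zero case pass.
print(math.isclose(result, 0.0, abs_tol=1e-12))  # True
```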

No changes to any assert_allclose are required, as it would be updated internally to use the old behavior.

IMO silent changes are nasty! The deprecation warning is only so that people eventually don't need to type the flag in an interactive prompt to get the new behavior, nothing more.

IMO silent changes are nasty!

Agreed. However, there should be zero changes. Deprecation warnings in widely used functions also force users (at least the ones that actually do maintenance) to make changes.

What about fixing the non-symmetry issue like @charris suggested? This would also free up some documentation space, that could be filled with a warning about the bad things that may happen when atol != 0.0.

We can't fix that issue without letting people know about the changed behavior. The only way that I know how to do that is with a warning.

What would really be nice is an automated refactoring tool to inject flags into old code. Then 'maintenance' would just be running a script on your code; 5 minutes of work.

Sorry, this can be fixed, just with a new flag.

What about fixing the non-symmetry issue like @charris suggested?

That suggestion (https://github.com/numpy/numpy/issues/10161#issuecomment-349384830) seems feasible.

We can't fix that issue without letting people know about the changed behavior.

We're not fixing it. This is the thing about cost/benefit of any change that is just being discussed in the semantic versioning issue. This is definitely a case where we will not make any change or issue any warning from a widely used testing function. Note that the current behavior is not a bug (although arguably a suboptimal choice).

To be clear, I was proposing a change to isclose but not assert_allclose. From looking at sources, that change will be a third as disruptive. However, the benefit is still probably too small. I don't think anyone objects to adding a flag though, correct?

I don't think anyone objects to adding a flag though, correct?

Not sure, depends on the details. @charris's suggestion may not require a flag, and any flag that introduces this is a bit questionable for arrays:

In [1]: import math

In [2]: math.isclose(0, 1e-200)
Out[2]: False

I'm opposed to adding flags without concrete motivations and use cases. Useful litmus test: if a novice dev asked you why this flag exists, how would you explain?

how would you explain?

  1. You use the symmetric=True flag if you want to vectorize calls to math.isclose
  2. It had to be a flag so that we didn't break existing code.

I'm more looking for situations where there is some actual problem being solved.

Probably the only actual problem being solved here is making beginners think more about floating-point accuracy. I mean, the example by @rgommers seems like it would be wrong, but what if you're dealing with really small numbers? IMO the math.isclose implementation is better, but I don't even think there is consensus on that. Unfortunately, there isn't really a one-size-fits-all solution to determine whether numbers are "close." But based on the responses from others (which are not wrong!), I don't really see any changes to the API moving forward. I guess the only action to take is probably a documentation update then (famous last words, given that I previously thought a flag would be okay)?

Surely a note comparing to math.isclose and documenting how to get vectorized behavior that is identical by setting rtol and atol, if one wants, would be ok.

Also, if I understand correctly, @charris proposed a solution allowing isclose to be more tolerant than it currently is, which wouldn't break any tests. I still think it would be a good idea to emit a warning (emitted only once) if there is a situation where isclose would consider numbers to be "close" when they were previously not. That is much better than simply changing function behavior silently and not letting anyone know about it when it affects them.
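A sketch of the "warn only when the answer would change" idea, assuming the relaxed max(abs(a), abs(b)) scaling suggested by @charris. The function name is hypothetical. It keeps returning the current answer and uses the standard warnings module, whose default filter shows a FutureWarning once per calling location. Because the relaxed test accepts everything the current one does, the only possible disagreement is relaxed True, current False.

```python
import warnings


def isclose_warn_on_change(a, b, rtol=1e-05, atol=1e-08):
    """Hypothetical transition helper: return the current np.isclose-style
    answer, but warn if the relaxed, symmetric scaling would disagree."""
    current = abs(a - b) <= atol + rtol * abs(b)
    relaxed = abs(a - b) <= atol + rtol * max(abs(a), abs(b))
    if relaxed and not current:
        warnings.warn(
            "isclose would return True here under symmetric scaling",
            FutureWarning,
            stacklevel=2,
        )
    return current
```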

I think some clarification is needed here on what we're discussing. There are two things that make math.isclose differ:

  • A different default for atol
  • A different definition of rtol

I don't think we can do anything about the first problem, other than document it as being different from math.isclose.

The second problem, I think is best fixed by adding a symmetric argument that defaults to False. Now we can write in our docs _"np.isclose(x, y, symmetric=True, atol=0) is a vectorized version of math.isclose(x, y)"_, which to me seems to be the thing we're trying to solve.

From here, we have four options:

  1. Document the extra argument, and do nothing else
  2. Deprecate calling isclose without the argument, forcing users to write isclose(..., symmetric=False) to get the old behaviour without a warning (and similar for allclose). I suspect this won't hit too much code, but the outcome is less readable for no huge gain. assert_allclose would be changed to call isclose(..., symmetric=False) internally, so users of it would not be affected
  3. Like the above, but also require the symmetric argument to assert_allclose. This would be massive churn downstream
  4. Silently change the behaviour

Of these, I think option 1 is unobjectionable, but the rest don't sound like things that are worth the disruption.

Edit: 2 might be acceptable if the warning is only emitted if behavior would change, which would be much less noisy.

@eric-wieser There's a third difference: the way atol and rtol are combined. (math.isclose uses max, numpy.isclose uses +). This means that unless atol or rtol is zero, there isn't any general way to make a math.isclose call match a numpy.isclose call.

I still don't think this is worth adding any user-visible APIs for though.

I was in favor of option two. I still am in favor of that option, with the additional stipulation that numpy would provide an automated refactoring tool (added to entry_points) that you could just run on your existing projects to fix them. Based on what others said, it sounds like this option would not be favored by others.

I am not, and have never been, in favor of either option three or four. In addition, I am not in favor of changing function behavior until a warning has been emitted for at least four major releases.

Assuming that others disagree with option two (which they have), I would be in favor of option one. But others (particularly @njsmith) are not in favor of any of the options that you gave here. At least that's my perception.

@njsmith That's not correct; you can change function behavior with the flag.

I'm folding that third difference into a handwavey _"rtol is different"_

"""
I'm more looking for situations where there is some actual problem being solved.
"""
I'll bet dollars to doughnuts (which I'm willing to do 'cause I have no idea what that means..) that there are tests out there that are passing that shouldn't because atol is making the test far less sensitive than it should be.

It seems there are three "problems" with the current implementation:

1) it's not symmetric

  • I think that's too bad, but really not a big deal, and it makes almost no difference when the values really are close :-) I _think_ it makes literally no difference if rtol < 1e-8 (at least if atol is 0.0)

2) atol affects the result even when not comparing to zero (that's unavoidable) -- but it effectively changes the tolerance by about a factor of two -- or even more if atol is large, which it might be if working with large order of magnitude values.

3) atol is non-zero by default -- I actually think this is the largest problem (particularly with the current algorithm of adding both) as it can very easily lead to tests passing that shouldn't -- and all too often we are a bit lazy -- we write the test with the default tolerance, and if it passes, we think we're done. (When I realized that was how it worked, I went back to my code, and DID find a couple of those -- whoops!)
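Point 1 can be made roughly quantitative. The following is a back-of-the-envelope heuristic, not a rigorous bound, assuming atol = 0 and double precision with machine epsilon ε ≈ 2.2e-16. The first line uses the reverse triangle inequality: swapping which argument scales rtol moves the threshold by at most about rtol²·|b|, so the two argument orders can only disagree when |a - b| falls inside a band that wide. The second line compares that band with the spacing of neighbouring doubles near b, which is roughly ε·|b|.

```latex
\bigl|\,\mathrm{rtol}\,|a| - \mathrm{rtol}\,|b|\,\bigr|
  \;\le\; \mathrm{rtol}\,|a-b|
  \;\approx\; \mathrm{rtol}^{2}\,|b|
  \qquad \text{near the boundary } |a-b| \approx \mathrm{rtol}\,|b|

\mathrm{rtol}^{2}\,|b| < \varepsilon\,|b|
  \iff \mathrm{rtol} < \sqrt{\varepsilon} \approx 1.5 \times 10^{-8}
```

Below about rtol = 1.5e-8 the disagreement band is narrower than the gap between adjacent representable values of a, which is consistent with the rtol < 1e-8 guess above; the estimate ignores rounding in the computation of a - b itself.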

Someone in this thread said something about "there would be something wrong if:

isclose(1e-200, 0.0)

returned False by default. I disagree -- yes, that would be surprising, but it would force the user to think about what's going on, whereas the current implementation results in (for example):

In [8]: np.isclose(1e-20, 1e-10)
Out[8]: True

really? one is TEN ORDERS of MAGNITUDE larger than other and it comes back True????

My point is that having atol non-zero gives perhaps less surprising results in the common case of comparing to zero, but MUCH more dangerous and wrong results when working with small numbers (small really being anything less than order of magnitude 1).

And if you are working with large numbers, say larger than 1e8, then the default atol is also inappropriate.

And coupled with (2), it means the default atol also can mess up the relative tolerance tests in surprising ways.

So: no, it's not a bug, and it's not "broken", but is pretty sub-optimal, so it would be nice to have a way forward to a better implementation.

I did like the flag approach, but I think I'm changing my mind -- the problem is that unless we deprecate it, and have the default flag changed at some point, we'll just have almost everyone using the "old" algorithm pretty much forever. And there have been plenty of good arguments why we probably couldn't deprecate it.

So maybe a new function, with a new name, is in order. We could add to the docs encouraging people to use the new one, and _maybe_ add warnings at some point when folks use the old one, but we'd never break anyone's code.

Anyone have an idea for a good name for a new function????

Maybe np.math.isclose and friends? I don't know.

That solves np.remainder being different too, and also allows you to easily vectorize code by using from numpy import math

So I would be +1 on a np.math module

Introducing new functions with only slightly different behavior from other functions that have worked like that for a decade is in general a very poor idea. In this case there really isn't much of a problem that cannot adequately be solved with documentation. So -1 on a new function. And definitely -1 on a whole new submodule.

really? one is TEN ORDERS of MAGNITUDE larger than other and it comes back True????

Whether your expectation is True or False is totally dependent on context. If that's for numbers that come from a continuous distribution on [0, 1) then yes, you'll probably expect True. A user really should understand absolute/relative tolerances and what a function actually does, and that's what docs are for.

symmetric_isclose() is a bit long-winded, but not offensive to my eyes. :bikeshed:

On Dec 10, 2017, at 8:09 PM, Ralf Gommers notifications@github.com wrote:

Introducing new functions with only slightly different behavior from other functions that have worked like that for a decade is in general a very poor idea.

I don’t like it either -- but breaking people’s code with a change is worse. Do you have an idea other than simply not making numpy better?

In this case there really isn't much of a problem that cannot adequately be solved with documentation.

I disagree -- defaults matter -- a lot.

And I don’t think there is any way to adjust the parameters to give you a symmetric comparison.

And definitely -1 on a whole new submodule.

Me too. If there are other things in the math module without numpy equivalents that would be useful, they could be added to numpy’s namespace like everything else.

really? one is TEN ORDERS of MAGNITUDE larger than other and it comes back True????

Whether your expectation is True or False is totally dependent on context.

Exactly -- that’s why there IS NO “reasonable” default for atol.

If that's for numbers that come from a continuous distribution on [0, 1) then yes, you'll probably expect True.

But isclose is advertised as being a relative comparison -- and here the relativeness is being completely washed out by the absolute comparison, without the user having to have thought about it.

That’s my whole point -- the default is ONLY appropriate if you expect your values to be of order of magnitude 1. A common case, sure, but not universal.

This comes down to which is worse -- a false negative or a false positive. And in the common use case of testing, a false positive is much worse.

(i.e. a test passing that shouldn’t)

A user really should understand absolute/relative tolerances and what a function actually does,

Absolutely - but a function also should have reasonable defaults and a robust algorithm that is easy to understand.

I'm tending towards improve the documentation and leave the function alone. The lack of symmetry can be explained as "isclose(a, b) means that a is close to b, but due to the varying absolute precision of floating point, it is not always the case that b is close to a. In testing, b should be the expected result and a should be the actual result." Note that this makes sense for testing, the symmetric function case is actually a bit more complicated to justify. Relative error, involving division as it does, is not symmetric.
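The last point of the comment above is easy to check: relative error divides by the reference value, so the same pair gives different relative errors depending on which one is treated as expected. relative_error is a helper introduced here for illustration.

```python
def relative_error(actual, expected):
    """Relative error of `actual`, measured against the reference `expected`."""
    return abs(actual - expected) / abs(expected)


print(relative_error(2.0, 1.0))  # 1.0
print(relative_error(1.0, 2.0))  # 0.5
```

This is the actual/expected convention that np.testing.assert_allclose(actual, desired) follows, scaling rtol by abs(desired).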

I don’t like it either -- but breaking people’s code with a change is worse. Do you have an idea other than simply not making numpy better?

You're already passing a value judgment here that adding your new function is better than no new function. It's not imho. Three options:

  1. A breaking change - not acceptable, you can stop discussing it.
  2. Just adding better docs
  3. Adding a function

2 is overall a better choice than 3, hence that is what makes numpy "better". You're also ignoring that this atol/rtol thing is not limited to a single function, so what's next - a new and slightly "better" assert_allclose? This makes the case for doc-only even more clear.

This is quite a serious defect: basically, the code testing your code is buggy... not good. Would you send anything to the moon that is tested with numpy.isclose and the default atol? I would think twice... and that's why we need to make these pitfalls stand out in the documentation.
Adding an alias function will just clutter the codebase unless it is forced on users (not happening).

I do agree that the symmetry stuff is minor, but we should still fix it. Leaving it might distract users from real pitfalls.

@rgommers wrote:
"""
You're already passing a value judgment here that adding your new function is better than no new function. It's not imho.
"""
Well, I spent a LOT of time thinking about, and debating about, math.isclose, and we did start by looking at the numpy implementation among others. So, yes, I DO think that approach is better. And I thought from this discussion that that was pretty much a consensus.

And getting a better algorithm / interface into numpy makes it better, yes.

Perhaps you mean that having both the old and new, better, function in numpy is NOT better than simply leaving the old (perhaps better documented) function there. Sure, that's a totally valid point, but I was trying to have that discussion, and the previous comment that "Introducing new functions with only slightly different behavior from other functions that have worked like that for a decade is in general a very poor idea" seemed to shut down the discussion -- my point is that if the new way is "better enough" then it would be worth it. What we clearly don't have a consensus on is whether this particular option is "better enough", not whether it's "better".

And by the way, I personally have not convinced myself that it's worth a change, but I do want to have the discussion.

Key here are a couple assumptions on my part. I don't know that we can ever know for sure whether they are correct, but:

1) the greatest use of np.isclose() is for testing -- are your answers close enough to what you expect? -- this is via np.testing.assert_allclose, or more directly, in pytest tests, or....

2) Most people, most of the time, do not do anything like rigorous floating point error analysis to determine how good the answers are expected to be. Rather, they try a value, and if it fails they look at the results and decide if it really is an error, or if they need to adjust the tolerance(s) of the test.

  • this means that it doesn't matter all that much whether the atol and rtol get merged, and whether the test is symmetric.

3) many people, much of the time, start out the process in (2) with default tolerances, and only look to adjust the tolerances if a test fails.

  • THIS means that having a default atol is quite dangerous -- tests passing that shouldn't is a really bad thing.

4) People do not read docs (beyond the initial "how do I call this" stage) -- at least not until they find confusing behaviour, and then they might go in and try to understand how something really works to clear up the confusion. But see (3) -- if a test does not fail, they don't know to go look at the docs to understand why.

All this leads me to the conclusion that numpy would be "better" with a more math.isclose-like FP closeness test.

And that is why better docs are a great idea, but not enough.

Maybe I'm totally wrong, and most people carefully read the docs and select both rtol and atol carefully for their problem most of the time -- but I know that neither I nor the half a dozen people on my team did that until I became aware of these issues.

:bikeshed: (darn, that didn't work -- no nifty emoji)

maybe relatively_close, or rel_close?

An additional "fun" wrinkle: assert_allclose actually does use atol=0 by default. There's another whole thread somewhere debating whether we can fix that inconsistency. (On my phone so can't easily find it.)
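The default mismatch is easy to demonstrate with the 9.0e-9 vs 1.0e-9 pair from earlier in the thread. np.testing.assert_allclose defaults to rtol=1e-07 and atol=0, so it rejects a pair that np.isclose, with its default atol=1e-08, accepts. A sketch, assuming NumPy is installed:

```python
import numpy as np

actual, desired = 9.0e-9, 1.0e-9

print(np.isclose(actual, desired))  # True: the default atol=1e-08 swamps the difference

try:
    np.testing.assert_allclose(actual, desired)  # defaults: rtol=1e-07, atol=0
except AssertionError:
    print("assert_allclose: not close")
```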

On Dec 11, 2017 14:58, "Chris Barker" notifications@github.com wrote:

@rgommers https://github.com/rgommers wrote:
"""
You're already passing a value judgment here that adding your new function
is better than no new function. It's not imho.
"""
Well, I spent a LOT of time thinking about, and debating about
math.isclose,and we did start by looking at the numpy implementation
among others. so, yes, I DO think that approach is better. And I thought
from this discussion that that was pretty much a consensus.

And getting a better algorithm / interface into numpy makes it better, yes.

Perhaps you mean that having both the old and new, better, function in
numpy is NOT better, than simply leaving the old (perhaps better
documented) function there. Sure, that's a totally valid point, but I was
trying t have that discussion, and the previous comment that "Introducing
new functions with only slightly different behavior from other functions
that have worked like that for a decade is in general a very poor idea"
seemed to shut down the discussion -- my point is that if the new way is
"better enough" than it would be worth it. What we clearly don't have a
consensus on is whether this particular option is "better enough", not
whether it's "better".

And by the way, I personally have not convinced myself that it's worth a
change, but I do want to to have the discussion.

Key here are a couple assumptions on my part. I don't know that we can
ever know for sure whether they are correct, but:

1.

the greatest use of np.isclose() is for testing -- are your answers
close enough to what you expect? -- this is via np.assert_all_close, or
more directly, in pytest tests, or....
2.

Most people, most of the time, do not do anything like rigorous
floating point error analysis to determine how good the answers are
expected to be. Rather, they try a value, and if it fails they look at the
results and decide if it really is an error, or if tehy need to adjust the
toleranc(es) of the test.

  • this means that it doesn't matter all that much whether the atol and
    rtol get merged, and whether the test is symmetric.

  • many people, much of the time, start out the process in (2) with
    default tolerances, and only look to adjust the tolerance sif a test fails.

  • THIS means that having a default atol is quite dangerous -- tests
    passing that shouldn't is a really bad thing.

  • People do not read docs (beyond the initial "how do I call this"
    stage) -- at least not until they find confusing behaviour, and then they
    might go in and try to understand how something really works to clear up
    the confusion. But see (3) -- if a test does not fail, they don't know to
    go look at the docs to understand why.
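A minimal sketch of that failure mode, assuming NumPy is installed (the numbers are purely illustrative): with the library defaults, `np.isclose` uses `rtol=1e-05, atol=1e-08` while `math.isclose` uses `rel_tol=1e-09, abs_tol=0.0`, so two tiny values that differ by a factor of two pass the first check and fail the second.

```python
import math

import numpy as np

expected = 1e-9
computed = 2e-9  # off by 100% relative to `expected`

# NumPy's default atol=1e-08 swamps these magnitudes, so the check passes.
numpy_says_close = bool(np.isclose(computed, expected))

# math.isclose defaults to abs_tol=0.0, so only the relative test applies.
math_says_close = math.isclose(computed, expected)

print(numpy_says_close, math_says_close)  # True False
```

A test written as `assert np.isclose(computed, expected)` would therefore pass silently, which is exactly the case where nobody goes back to read the docs.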

All this leads me to the conclusion that numpy would be "better" with a
more math.isclose-like FP closeness test.

And that is why better docs are a great idea, but not enough.

Maybe I'm totally wrong, and most people carefully read the docs and
select both rtol and atol carefully for their problem most of the time --
but I know that neither I nor the half a dozen people on my team did that
until I became aware of these issues.

:bikeshed: (darn, that didn't work -- no nifty emoji)

maybe relatively_close, or rel_close?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/numpy/numpy/issues/10161#issuecomment-350886540, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAlOaNquy47fsOkvBlxcSa-Mxyhgdimlks5s_bORgaJpZM4Q2J-P
.

You mean #3183?

assert_allclose actually does use atol=0 by default.

Interesting -- that does make me feel better.

This is bringing back a hazy memory of past discussions of differences between the two -- was that the only one?

Hi,
My thermo, fluids, and ht libraries make extensive use of numpy's assert_allclose, so I'm sharing a few thoughts here as it is related. This thread makes the behavioral difference seem very alarming, and a little like people should expect to find bugs in their code because of the lack of symmetry and/or the difference in combining atol and rtol, as well as the default atol (not present in assert_allclose, I know). So I wanted to see if my libraries had any bugs, hacked up a very crude implementation of assert_allclose, and temporarily changed my tests to use it instead.

I think my use of assert_allclose is representative - I compare individually selected values, and do tests where I fuzz things and call assert_allclose on the result of functions parametrically. I compare ints and floats of all sorts of magnitudes. I have put relatively little thought into choosing an rtol or atol, only rarely adding an atol or changing rtol to a less strict default. The libraries call assert_allclose 13159, 1178, and 4243 times respectively.

So I made the change and ran my tests. I'm very happy to say I did not find any new bugs or test failures; the only test failures I could find were with my implementation of assert_allclose.

I know there are others who would be less fortunate, and would run into an issue if anything in assert_allclose or isclose were changed. I personally would feel more comfortable writing tests if I had an assert_allclose mode which replicated the behavior of math.isclose, whether that be through another function or a symmetric flag. But I know that is a lot of work for someone to do, and there haven't been any PRs yet either. I'm glad to have had the comfort of checking my code for this flaw even once, though!

#4880 is the one I was thinking of.

On Dec 11, 2017, at 5:01 PM, Nathaniel J. Smith notifications@github.com
wrote:

#4880 (https://github.com/numpy/numpy/pull/4880) is the one I was thinking
of.

Thanks. Note that the statsmodels example makes my point well about the
default of atol -- anything other than 0.0 is dangerous.

Does the fact that this comes up every couple years indicate that we should
finally do something about it?

Maybe it does. It’s a small wart, but sometimes those small annoyances add
up over time.

And while I'm gratified that assert_allclose has atol default to zero
(supporting my point that it's an acceptable option, even if you don't
agree it's the best option), a lot of us -- and more all the time -- are not
using unittest, and thus may use np.allclose directly in tests.

And yes Ralf, if we do make a change, we will want to change all three
closely related functions :-(. But that gives us the chance to make them
more consistent, a bonus?

While the py2-3 transition has not gone very well, there is something to be
said for having an “it’s OK to clean up the warts” version at some point :-)

There is another option, to actually move isclose to the (indeed better) math version of atol=0: change the default to None, and try with atol=0; if all True, just return; if some are False, try again with the standard atol and, if the results change, emit a deprecation warning (and return the "old" result).
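A rough sketch of that transition, with the caveat that `isclose_transitional` is a hypothetical name and the warning text is made up for illustration. It returns the old result during the deprecation period and only warns when the answer actually depends on the default `atol`. Because a larger `atol` can only turn False into True, an all-True result with `atol=0` is guaranteed to match the old behavior, so that case can skip the second pass.

```python
import warnings

import numpy as np

_LEGACY_ATOL = 1e-8  # the current default atol of np.isclose


def isclose_transitional(a, b, rtol=1e-05, atol=None, equal_nan=False):
    """Hypothetical deprecation-path wrapper around np.isclose."""
    if atol is not None:
        # Caller chose atol explicitly: behave exactly like np.isclose.
        return np.isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan)

    strict = np.isclose(a, b, rtol=rtol, atol=0.0, equal_nan=equal_nan)
    if np.all(strict):
        # Nothing depends on the default atol, so the old answer is the same.
        return strict

    legacy = np.isclose(a, b, rtol=rtol, atol=_LEGACY_ATOL, equal_nan=equal_nan)
    if np.any(legacy != strict):
        warnings.warn(
            "result depends on the default atol=1e-8; pass atol explicitly",
            DeprecationWarning,
            stacklevel=2,
        )
    # During the deprecation period, keep returning the old result.
    return legacy
```

For example, `isclose_transitional(2e-9, 1e-9)` would still return True but emit a DeprecationWarning, whereas `isclose_transitional(1.0, 1.0 + 1e-7)` already passes with `atol=0` and stays silent.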

Sadly, one cannot do both symmetry and atol at the same time, since they go in different directions (i.e., one would always have to run both cases, which seems a bit much). But I dislike the atol default much more than the lack of symmetry...

I am not in favor of changing any functions unless there is a function that
you can use that has the old behavior. In addition, there should be an
automated refactoring tool to make this change if done. However that is not
even on the table now.

The py3k transition is not something that we ever want to have again and it
is not a role model. Rolling cleanups into one big release is not a good
approach IMO.

While the py2-3 transition has not gone very well, there is something to be said for having an “it’s OK to clean up the warts” version at some point :-)

Just to address this general idea that special version numbers can make breaking changes OK: the general rule is that it is ok to clean up the warts, if there's a clear transition plan that avoids unacceptable levels of pain and the benefits are sufficient to justify the costs. The key point here is that the judgment needs to be in terms of the effect on users, not on whether we "followed the rules". (If you want to be fancy, we're consequentialists, not deontologists.) So the only reason to declare a specific version to be "this is the one that breaks stuff" is if it makes things substantially easier for users to handle. IMO the benefits of rolling up breaking changes together are generally minor if they exist at all – broken code is broken code – and even if they exist it's very rare that they're going to tip the cost/benefit analysis from "no" to "yes".

atol is non-zero by default -- I actually think this is the largest problem (particularly with the current algorithm of adding both) as it can very easily lead to tests passing that shouldn't -- and all too often we are a bit lazy -- we write the test with the default tolerance, and if it passes, we think we're done. (When I realized that was how it worked, I went back to my code, and DID find a couple of those -- whoops!)

Note that the statsmodels example makes my point well about the default of atol -- anything other than 0.0 is dangerous. [...] a lot of us -- and more all the time -- are not using unittest, and thus may use np.allclose directly in tests.

"We should break user code to increase consistency with some other package" doesn't score very well on the cost/benefit scale. "There are inconsistencies inside numpy that are not just confusing, but confusing in a way that directly leads to silent bugs in user code, and we can fix that" is way more compelling. I haven't formed an opinion myself yet, but if you want to get progress here then this is what I'd be pushing on.

A note:

“”” I have put relatively little thought into choosing an rtol or atol,
only rarely adding an atol
-snip-

The libraries call assert_allclose 13159, 1178, and 4243 times respectively.

-snip-

I did not find any new bugs or test failures;

“””

Good news, though I note that assert_allclose has atol default to zero. And
this is why :-)

@njsmith wrote:

"We should break user code to increase consistency with some other package"

I don’t think anyone on this thread is advocating that — I’m sure not. The only consistency anyone is advocating is consistency among the related numpy functions.

"There are inconsistencies inside numpy that are not just confusing, but confusing in a way that directly leads to silent bugs in user code, and we can fix that"

THAT is what I, at least, am advocating. And I think most others are, too.

The problem is that we can’t fix it without either:

Breaking backward compatibility, which I don’t think anyone considers this serious enough to justify -- even with a deprecation cycle.

Or

Making a new flag or function.

I think a new function would be a cleaner way to do it. Old code could remain unchanged as long as it wants, new code could use the new functions, and a search and replace (or a few import as calls) could make it easy to switch one file at a time.

(I suppose you could even monkey-patch numpy in your test code....)

We already have:

  • allclose
  • assert_allclose
  • assert_almost_equal
  • assert_approx_equal
  • assert_array_almost_equal
  • assert_array_almost_equal_nulp
  • assert_array_max_ulp

I don't think adding more options to this list is actually going to make much difference to real users.

Well, I spent a LOT of time thinking about, and debating about, math.isclose, and we did start by looking at the numpy implementation, among others. So, yes, I DO think that approach is better.

So did I, and I've been one of the main maintainers of numpy.testing - this is hardly a new issue. Insisting with capital letters that you're right doesn't make it so.

Perhaps you mean that having both the old and new, better, function in numpy is NOT better than simply leaving the old (perhaps better documented) function there. Sure, that's a totally valid point, but I was trying to have that discussion,

Indeed, it should be clear that that's what I meant.

"Introducing new functions with only slightly different behavior from other functions that have worked like that for a decade is in general a very poor idea" seemed to shut down the discussion

No, it points out a real issue with adding new functions that's too often ignored or not given enough weight. In this case it seems pretty clear to many core devs - it looks like you're getting the judgment that this ship has sailed from @njsmith, @pv, @charris and me.

I apologize in advance for rehashing this, but assert_allclose is the correct behavior (or at least close enough to it). However, isclose may get people into trouble because it assumes an absolute tolerance as others have noted. I'm considering submitting a PR to put a big red box on the np.isclose documentation page to warn people about this behavior. How does that sound?

+1

I think there is consensus that the docs should be better.

Thanks,

-CHB



I support improving the documentation, but I think that the proposed change is not as helpful as it could be.

.. warning:: The default atol is not appropriate for numbers that
are much less than one.

Firstly, this warning is not even true, in general. The default atol _is_ appropriate when you desire to treat numbers smaller than 1e-8 as equal to zero. This is not possible with a relative tolerance, which is why an absolute tolerance exists. Conversely, the default atol _is not_ appropriate when you desire to treat numbers smaller than 1e-8 as meaningful. Let's not assume that one size fits all.
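To illustrate both halves of that point, here is a small sketch assuming NumPy is installed (`noise` is a hypothetical stand-in for a round-off residue where the exact answer is zero). A purely relative test can never accept a nonzero value as close to 0.0, because the allowed difference `rel_tol * max(|a|, |b|)` is then smaller than `|a - b|`; a nonzero absolute tolerance can, and whether NumPy's default is right depends on whether values of that size are meaningful.

```python
import math

import numpy as np

noise = 1e-12  # e.g. round-off left where the exact answer is 0.0

# Relative tolerance alone: allowed difference is 1e-5 * 1e-12, far below 1e-12.
rel_only = math.isclose(noise, 0.0, rel_tol=1e-5)                 # False
# Adding an absolute tolerance lets tiny values count as zero.
with_abs = math.isclose(noise, 0.0, rel_tol=1e-5, abs_tol=1e-8)   # True

# np.isclose's default atol=1e-8 treats anything below about 1e-8 as zero ...
numpy_default = bool(np.isclose(noise, 0.0))                      # True
# ... which is the wrong call if values around 1e-12 matter to you.
numpy_no_atol = bool(np.isclose(noise, 0.0, atol=0.0))            # False
```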

Therefore, @xoviat, with the deepest respect, I strongly object to your subjective statement:

... assert_allclose _is_ the correct behavior (or at least close enough to it).

I think the problem with the current docs is that they do a very good job of describing _what_ atol is, but don't describe _why_ it is there, and therefore users may not understand when to change the default value.

I propose something along the lines of:

"Setting a small nonzero value of atol allows one to treat numbers that are very close to zero as being effectively equal to zero. Therefore, the default value of atol may not be appropriate for your use case. Choose atol such that numbers larger than atol are considered by you to be meaningful (distinct from zero), and numbers smaller than atol are considered by you to be negligible (same as zero)."

Lastly, I would add the same note to the docstrings of isclose and allclose.

This is my suggestion. Take it or leave it.
Cheers.

Please leave review feedback on the PR. In particular, your proposal is too verbose IMO.

Sorry all if I was being too annoying -- I really didn't think there was a consensus yet.

@xoviat: thanks for working on the docs.

Since this will come up again, I'm going to (probably) bring an end to this discussion with a mini version of a NEP designed to be rejected, summarizing the proposals and the reasons for rejection. For the record -- since the previous issues just kind of petered out.

Sorry for bringing up py2/3 issue -- whether/how to clean up warts in numpy is a discussion for another place and time.

The Issue with isclose()

This issue (and others referenced) is the result of observations that numpy.isclose() and friends use a less-than-optimal algorithm, have a contentious default for atol, and that the various related functions have different defaults. Overall, this results in confusion for users and, in the worst case, false-positive tests when users aren't thoughtful about setting the atol value.

In particular, the implementation in the stdlib, math.isclose(), provides a different, and arguably better, algorithm and defaults: it is a symmetric test, it doesn't mix rtol and atol, and the default atol is 0.0.
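For reference, the two documented formulas can be transcribed into plain Python and compared directly. These helpers are illustrative only; they are not either library's actual code, which also handles NaN, infinities and (for NumPy) broadcasting.

```python
def numpy_style_close(a, b, rtol, atol):
    # Formula documented for numpy.isclose: asymmetric (only |b| scales
    # rtol) and the two tolerances are added together.
    return abs(a - b) <= atol + rtol * abs(b)


def math_style_close(a, b, rtol, atol):
    # Formula documented for math.isclose: symmetric in a and b, and the
    # larger of the two tolerances wins instead of their sum.
    return abs(a - b) <= max(rtol * max(abs(a), abs(b)), atol)


# Values from the opening example of this issue.
a, b = 0.142253, 0.142219
print(numpy_style_close(a, b, rtol=1e-4, atol=2e-5))  # True
print(math_style_close(a, b, rtol=1e-4, atol=2e-5))   # False
```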

There is near-consensus that the situation isn't ideal, but no consensus that it is bad enough to do anything about it.

Options considered:

Changing the algorithms and/or defaults:

Universally rejected due to backward compatibility issues, even with a deprecation period and warnings -- these functions are widely used in tests, so it would be a big annoyance at the very least.

Adding an extra parameter with a flag to select a different algorithm

This would break no existing code, but would persist forever with an ugly and confusing API.

Adding a __future__-type directive

TLDR: it's possible, but not easy or clean. No one seemed to want to pursue this.

Creating yet another function

This one seemed to be the only option that gained any traction, but was not supported by the core devs that participated in the discussion.

The cleanest way to "fix" this would be to add a np.rel_close() [or other name] with a new algorithm and defaults, likely the ones used in math.isclose. The new function would be documented as the "recommended" one to use for future code. It would be possible to add deprecation warnings to the old one in the future -- but no one seemed to think the noise would be worth it at this point.
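A minimal sketch of what such a function might look like, assuming it simply vectorizes the math.isclose rules. `rel_close` is only the placeholder name from the discussion, not an existing NumPy function, and the defaults (`rel_tol=1e-9`, `abs_tol=0.0`) are copied from math.isclose as an assumption.

```python
import numpy as np


def rel_close(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Hypothetical vectorized analogue of math.isclose (not a NumPy API).

    Symmetric in a and b, takes the max of the two tolerances instead of
    their sum, and treats infinities as close only to themselves.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    with np.errstate(invalid="ignore"):  # inf - inf would otherwise warn
        diff = np.abs(a - b)
        tol = np.maximum(rel_tol * np.maximum(np.abs(a), np.abs(b)), abs_tol)
        finite = np.isfinite(a) & np.isfinite(b)
        # Exact equality (including matching infinities) is always close;
        # otherwise both values must be finite and within tolerance.
        return (a == b) | (finite & (diff <= tol))
```

Swapping the arguments cannot change the result, because both the difference and the scale used for `rel_tol` are symmetric in `a` and `b`.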

A refactoring tool could be built to make the replacement -- but who's going to do that for this one use?

This would result in probably two very similar functions for the foreseeable future, and "Introducing new functions with only slightly different behavior from other functions that have worked like that for a decade is in general a very poor idea."

Conclusion:

It's not worth it for the small gain, but better docs are in order, and that is done here: #10214

I'm still left with one question:

@njsmith wrote:

"""
We already have:

allclose
assert_allclose
assert_almost_equal
assert_approx_equal
assert_array_almost_equal
assert_array_almost_equal_nulp
assert_array_max_ulp

I don't think adding more options to this list is actually going to make much difference to real users.
"""

Do you mean that adding a new one would be an OK idea?

I'd also note:

most of these are asserts -- and proliferation of asserts is a side effect of the unittest architecture.

As many of us move to other testing architectures (e.g. pytest), the need for asserts goes away.

So what to do with numpy.testing is a separate question from what to do with numpy core.

As many of us move to other testing architectures (e.g. pytest), the need for asserts goes away.

Actually, this is a danger in itself. Tests might be changed from assert_allclose(...) to assert np.allclose(...), and will silently become less strict, which is a bad thing.
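A small sketch of that silent loosening, assuming a recent NumPy: the same pair of arrays passes a bare `assert np.allclose(...)`, which uses `atol=1e-08`, but fails `np.testing.assert_allclose`, whose defaults are `rtol=1e-07, atol=0`.

```python
import numpy as np

expected = np.array([1e-9, 2e-9])
computed = np.array([2e-9, 4e-9])  # every element is off by a factor of two

# A bare boolean check picks up np.allclose's default atol=1e-08 and passes.
assert np.allclose(computed, expected)

# np.testing.assert_allclose defaults to atol=0, so the same data fails.
try:
    np.testing.assert_allclose(computed, expected)
except AssertionError:
    strict_check_failed = True
else:
    strict_check_failed = False

print(strict_check_failed)  # True
```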

I still use assert_allclose in pytest environment because it gives helpful failure message:

    def test_shs():
        a = [0.1, 0.2, 0.3, 0.4]
        b = [0.2, 0.3, 0.3, 0.4]

>       np.testing.assert_allclose(a,b)
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=0
E       
E       (mismatch 50.0%)
E        x: array([ 0.1,  0.2,  0.3,  0.4])
E        y: array([ 0.2,  0.3,  0.3,  0.4])

vs using assert np.allclose()

    def test_shs():
        a = [0.1, 0.2, 0.3, 0.4]
        b = [0.2, 0.3, 0.3, 0.4]
>       assert np.allclose(a, b)
E       assert False
E        +  where False = <function allclose at 0x7f20b13c9840>([0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.3, 0.4])
E        +    where <function allclose at 0x7f20b13c9840> = np.allclose

It might be possible to fix that by implementing pytest_assertrepr_compare, but I'm not sure how to apply that to function calls, and can't work out where pytest even invokes it.

@eric-wieser wrote:

"Actually, this is a danger in itself. Tests might be changed from assert_allclose(...) to assert np.allclose(...), and will silently become less strict, which is a bad thing."

Exactly my point -- it's a "Bad Idea" to assume that everyone is going to use the asserts for testing, and thus not be concerned with whether the defaults of isclose() and allclose() are appropriate for testing -- in an ideal world they certainly should be.

Do you mean that adding a new one would be an OK idea?

No, I meant that given we already have a whole menagerie of slightly-different ways of expressing almost-equal tests, unfortunately most users are not going to notice or understand yet-another addition.

I'd also note:
most of these are asserts -- and proliferation of asserts is a side effect of the unittest architecture.
As many of us move to other testing architectures (e.g. pytest), the need for asserts goes away.

They happen to be written as asserts, but AFAICT every one of those functions actually encodes a different definition of "almost equal". (I think. Some of the distinctions are so obscure that I can't tell if they're real or not.)

Changing the algorithms and/or defaults:
Universally rejected due to backward compatibility issues, even with a deprecation period and warnings -- these functions are widely used in tests, so it would be big annoyance at the very least.

I wouldn't quite put it like that. For me, this is the only approach that would potentially have benefits sufficient to justify the cost. I'm not saying that they would, and I can't speak for the other core devs; I'd want to see some data, and without data, erring on the side of conservatism seems like the right choice. But if someone produced data, I'd at least look at it :-).

Like, if someone showed up and said "I tried my proposed change on 3 big projects, and it led to 12 extra failures out of 10,000 tests, and of those 12, 8 of them were actual silent bugs that had been hidden by the old bad defaults"... that would be pretty convincing. Especially since this is a case where we have the technical ability to do narrowly targeted warnings. I'm guessing the actual numbers would not be quite so favorable, but until someone checks, who knows.

On Mon, Dec 18, 2017 at 3:57 PM, Nathaniel J. Smith <
notifications@github.com> wrote:

No, I meant that given we already have a whole menagerie of
slightly-different ways of expressing almost-equal tests, unfortunately
most users are not going to notice or understand yet-another addition.

Well, there is one big difference -- each of those was added because it did
something different, and with the vagaries of floating point, those
differences can be important.

A new relative-closeness function would do essentially the same thing, but
do it "better". And the goal would be to recommend that the new one be used
instead of the old (and maybe deprecate the old eventually).

Honestly, I'm not sure if that's an argument for or against the idea,
though.

Like, if someone showed up and said "I tried my proposed change on 3 big
projects, and it led to 12 extra failures out of 10,000 tests, and of those
12, 8 of them were actual silent bugs that had been hidden by the old bad
defaults"... that would be pretty convincing. Especially since this is a
case where we have the technical ability to do narrowly targeted warnings.
I'm guessing the actual numbers would not be quite so favorable, but until
someone checks, who knows.

hmm, my code base has about 1500 tests -- not 10,000, but I'll give that a
shot. If nothing else, I may find a bug or poor test or two. Or get more
reassurance!

Hey all,

Wanted to chime in here after a colleague encountered this problem.

_Disclaimer: I didn’t really give the logic much thought before screaming “fire in the hall” and throwing this over the fence. If I’ve made an embarrassing hash of it, be kind, pretty please._

To me, this all sounds a lot like the classic architecture/design “chicken & egg” problem - semantics vs implementation. Not all bugs exist within the code, sometimes they exist within the semantics. This isn’t much different (if at all) from an algorithm being flawlessly implemented only to discover that the algorithm is flawed - the bug isn’t in the code, it’s in the algorithm, but either way it’s still a bug. Basically, this discussion sounds a bit like the classic joke “it’s not a bug it’s a feature”. Of course, a “choice” is just that, but IMHO this normally signals the end of rationale - if the semantics are chosen to be as is, as implemented, then so be it, the implementation is all cool, end of discussion.

So what are the “desired” and/or “expected” semantics of isclose()? Thinking this over, I’m left inexorably converging on the semantics being user defined, i.e. the user needs to be able to define the semantic definition of “close”.

Defaults

Regarding defaults, this ultimately doesn’t break any semantics. However, a poor choice _is_ dangerous. If the non-zero default was _only_ chosen to give reasonable behavior for when abs(a - b) == 0, then this definitely sounds like the incorrect solution. This particular state would be better off being special cased, as it is special cased semantically. There is no “wiggle room” in the definition of “close” when the diff is zero; it’s the self-defining cornerstone, i.e. they’re not _close_, they’re _exact_ - where “close” is a relative deviation from exact. (_Note:_ The special-casing conditional may (or may not) impact performance.)

+ vs max

The mixing or “shadowing” of rel_tol and a_tol due to the implementation using a sum and not max _does_ break the above function semantics. The user’s definition of “close” is bound in a non-trivial manner as it mangles, and therefore breaks, the semantics of “relative” and “absolute”. Considering only the above semantics, this aspect _is_ a bug. I find @gasparka’s opening example an irrefutable argument of this point.
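One way to quantify the shadowing, writing $r$ for rtol and $t$ for atol (both non-negative): a sum of two non-negative terms is never smaller than the larger term and never more than twice it, and $|b| \le \max(|a|,|b|)$, so NumPy's threshold is bracketed like this:

$$
\max\bigl(t,\; r\,|b|\bigr) \;\le\; t + r\,|b| \;\le\; 2\max\bigl(t,\; r\,|b|\bigr) \;\le\; 2\max\bigl(t,\; r\max(|a|,|b|)\bigr)
$$

The right-most term is twice the math.isclose threshold, so with identical tolerances the sum-based test can accept differences up to twice as large as the max-based one. In the opening example the NumPy-style threshold is about 3.42e-5, against 2e-5 for the math.isclose-style test.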

Commutativity (i.e. symmetry)

There are really only two operators here: - in a-b, and <= in |a-b| <= f(a, b, atol, rtol). Whilst subtraction is anticommutative, |a - b| isn't; it is commutative, by definition. However, this alone may not be enough to establish the commutativity of f(a, b, atol, rtol). There is, however, a strong and a weak argument here.

  • The weak argument is the simplest - encapsulation of function separation, or the lack of it. Whilst arithmetically the commutativity of |a - b| may not enforce that of f(a, b, atol, rtol), programmatically it is not really f(a, b, atol, rtol) that is in question but isClose(a, b, atol, rtol). Here the computation |a - b| is conducted internally to the function, i.e. isClose() := |a-b| <= f(a, b, atol, rtol). Since this is internal and a and b are passed through, for |a-b| to be commutative isClose() must also be. If it isn’t, the commutativity of |a-b| loses all meaning.
  • Strong argument: The comparison op is not-strict, i.e. it is <= not <, therefore for the equality portion of the comparison you have to satisfy |a-b| = f(a, b, atol, rtol) which implies f(a, b, atol, rtol) _must_ also be commutative (I think?). Failing to do so means that either the above equality is _never_ true (there are no values of a & b that satisfy this) or that the strict inequality, <, has actually been semantically defined, right?
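The asymmetry is easy to reproduce with the values from the second example at the top of this issue. This is a sketch assuming NumPy's documented formula, `abs(a - b) <= atol + rtol * abs(b)`: swapping the arguments flips `np.isclose`'s answer, while `math.isclose` gives the same answer either way.

```python
import math

import numpy as np

# Values from the second example at the top of this issue.
a, b = 0.142253, 0.142219
rtol, atol = 1e-4, 1.9776e-05

# np.isclose scales rtol by the magnitude of its *second* argument only,
# so swapping the arguments changes the threshold.
forward = bool(np.isclose(a, b, rtol=rtol, atol=atol))   # False
backward = bool(np.isclose(b, a, rtol=rtol, atol=atol))  # True

# math.isclose scales rel_tol by max(|a|, |b|), so order cannot matter.
math_forward = math.isclose(a, b, rel_tol=rtol, abs_tol=atol)   # False
math_backward = math.isclose(b, a, rel_tol=rtol, abs_tol=atol)  # False
```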

What you do (or don’t do) about all or any of this is a completely separate question.
