Numpy: Sometimes, suppress_warnings misses one of its attributes

Created on 23 Dec 2016  ·  60 Comments  ·  Source: numpy/numpy

When trying to compile skimage, I sometimes get the following error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/build/skimage-0.12.3/debian/tmp/usr/lib/python2.7/dist-packages/skimage/transform/tests/test_integral.py", line 46, in test_vectorized_integrate
    assert_equal(expected, integrate(s, r0, c0, r1, c1))  # test deprecated
  File "/build/skimage-0.12.3/debian/tmp/usr/lib/python2.7/dist-packages/skimage/transform/integral.py", line 86, in integrate
    warn("The syntax 'integrate(ii, r0, c0, r1, c1)' is "
  File "/build/skimage-0.12.3/debian/tmp/usr/lib/python2.7/dist-packages/skimage/_shared/_warnings.py", line 16, in warn
    warnings.warn(message, stacklevel=stacklevel)
  File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 2199, in _showwarning
    self._orig_show(message, category, filename, lineno,
AttributeError: 'suppress_warnings' object has no attribute '_orig_show'

I assume this is a numpy problem, but I am not sure.

00 - Bug numpy.testing

All 60 comments

Hmmm, a bit odd. Is there some (strange) threading going on in the skimage tests? That would break warning testing pretty badly, since warning handling is not thread safe in Python. I have a bit of difficulty seeing how else this could happen, but maybe I should look at the exact code the test runs.

There is no assert_warns or similar here, I guess, so you should only have a single suppress_warnings context alive (the outermost scope, created by numpy's test runner, assuming skimage ends up using it). What confuses me is that _orig_show not being defined should only be possible if the context has already been exited, and at that point warnings.showwarning should already have been reset to the old value.

Of course the whole warnings stuff breaks down if you have threads. For example:

thread1: enters warning context -> replaces normal warning printing
thread2: enters warning context -> replaces thread1 warning handler
thread1: exits warning context -> resets to normal warning printing
thread2: exits warning context -> resets to thread1's warning handler -> kaboom.
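
The interleaving above can be replayed deterministically without real threads. The following is a minimal sketch with hypothetical names: `HookRegistry` stands in for the global `warnings.showwarning` hook, and `ToySuppress` for a context manager that saves the hook on entry and deletes its saved copy on exit. It is a toy model of the pattern discussed here, not numpy's actual code.

```python
class HookRegistry:
    """Hypothetical stand-in for the process-wide warnings.showwarning hook."""

    def __init__(self):
        self.shown = []
        self.show = self.default_show

    def default_show(self, message):
        self.shown.append(message)


class ToySuppress:
    """Toy context manager: swaps the hook on entry, restores it on exit."""

    def __init__(self, registry):
        self._registry = registry

    def __enter__(self):
        self._orig_show = self._registry.show
        self._registry.show = self._showwarning
        return self

    def __exit__(self, *exc_info):
        # Restore first, then drop the saved reference.
        self._registry.show = self._orig_show
        del self._orig_show

    def _showwarning(self, message):
        # Forward to whatever hook was installed before this context.
        self._orig_show(message)


def replay_interleaving():
    """Replay the thread1/thread2 ordering by hand and return the error text."""
    registry = HookRegistry()
    ctx1 = ToySuppress(registry)
    ctx2 = ToySuppress(registry)

    ctx1.__enter__()                   # thread1 enters
    ctx2.__enter__()                   # thread2 enters, saving ctx1's handler
    ctx1.__exit__(None, None, None)    # thread1 exits, deleting its saved hook
    ctx2.__exit__(None, None, None)    # thread2 exits, reinstalling ctx1's handler

    try:
        registry.show("late warning")  # goes through the stale ctx1 handler
    except AttributeError as exc:
        return str(exc)
    return None
```

Calling `replay_interleaving()` returns an AttributeError message that names `_orig_show`, which mirrors the traceback in the report: the handler that ends up installed belongs to a context that has already cleaned up after itself.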

Btw., I see you have some "try to clean up after __warningregistry__" stuff in skimage. suppress_warnings is a context manager that tries to solve a similar issue (and adds some other things), which may or may not be interesting.

I just got the problem when I tried to build skimage for Debian, and I have no idea here. However, I opened scikit-image/scikit-image#2412 to have them involved.

Just for completeness: Sometimes, I even get a stacktrace without any involvement of skimage:

ERROR: test suite for <module 'skimage.transform.tests' from '/build/skimage-0.12.3/debian/tmp/usr/lib/python3/dist-packages/skimage/transform/tests/__init__.py'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nose/suite.py", line 229, in run
    self.tearDown()
  File "/usr/lib/python3/dist-packages/nose/suite.py", line 352, in tearDown
    self.teardownContext(ancestor)
  File "/usr/lib/python3/dist-packages/nose/suite.py", line 368, in teardownContext
    try_run(context, names)
  File "/usr/lib/python3/dist-packages/nose/util.py", line 453, in try_run
    inspect.getargspec(func)
  File "/usr/lib/python3.5/inspect.py", line 1040, in getargspec
    stacklevel=2)
  File "/usr/lib/python3/dist-packages/numpy/testing/utils.py", line 2199, in _showwarning
    self._orig_show(message, category, filename, lineno,
AttributeError: 'suppress_warnings' object has no attribute '_orig_show'

What versions of python, numpy, etc., are you compiling with?

Numpy 1.12 beta 1, Python 2.7 and 3.5
(not the RC; sorry, my mistake: I didn't check the RC yet)

Hmm, I am confused... I can't see why there would be threading problems, but I also can't really fathom this error occurring without either a race condition or incorrect nesting of catch_warnings-like contexts (including suppress_warnings).

I would also think it is a race condition, since it does not always happen. Running the build twice in exactly the same environment makes the problem pop up in different places (or even not at all).

@olebole how exactly are you running the test suite?

Copied from our test suite:

#!/bin/sh
set -efu

pys="$(pyversions -rv 2>/dev/null)"
pkgbuild=${pkgbuild:-no}

srcdir=$PWD

for py in $pys; do
    echo "=== python$py ==="
    if [ "$pkgbuild" = "yes" ]; then
        export PYTHONPATH="$srcdir/debian/tmp/usr/lib/python$py/dist-packages"
        cd "$srcdir/build/"
    else
        cd "$ADTTMP"
    fi

    xvfb-run -a python$py /usr/bin/nosetests -s -v --exclude test_tools.py skimage 2>&1
done

The xvfb-run is there since the test needs to be run in an X11 environment.

Hmmmpf, does anyone know what exactly happens under the hood when nose starts to test a module? Nose confuses me, and I can't see why it would end up messing around with warnings as well... Or it is a bug in suppress_warnings after all, but I can't really see it, heh.

In my nosetests man page, I have

       --processes=NUM
              Spread test run among this many processes. Set a number equal to the number of processors or cores in your machine for best results. Pass a negative number to have the number of processes automatically set to the number of cores. Passing 0 means to disable parallel testing. Default is 0 unless NOSE_PROCESSES is set. [NOSE_PROCESSES]

So it seems that there is no parallel testing by default. At least, nose itself should not be introducing the race condition.

Is there any remote chance that nose runs this teardownContext based on garbage collection stuff?!

No, I am probably being silly. The suppress_warnings context resets warnings.showwarning before it deletes the attribute, so I doubt even the gc kicking in could cause this without something else going on. I just have no idea what else :).

Well, I see 33 errors with skimage-0.9.3, most from PIL or indexing, but none for suppress. How do you deal with all the other errors?

@charris The issue is reported for 0.12.3, not 0.9.3. :)

OK, finally got that pulled down from upstream, now 39 errors and a ton of deprecations. Most of the warnings seem due to the fact that tests think they are running in QT instead of Wayland, might be a configuration problem here. I also need to run the tests as

python -c'import skimage; skimage.test()'

as nosetests doesn't work at all.

Deprecations are normal. We still check our old API, even if a deprecation is expected. The errors, however, are not normal. Would you mind reporting them on our (scikit-image) bug tracker please?

@sciunto The docs could use instructions for testing locally.

OK, I see it in the same place. The test is

def test_vectorized_integrate():
    r0 = np.array([12, 0, 0, 10, 0, 10, 30])
    c0 = np.array([10, 0, 10, 0, 0, 10, 31])
    r1 = np.array([23, 19, 19, 19, 0, 10, 49])
    c1 = np.array([19, 19, 19, 19, 0, 10, 49])

    expected = np.array([x[12:24, 10:20].sum(),
                         x[:20, :20].sum(),
                         x[:20, 10:20].sum(),
                         x[10:20, :20].sum(),
                         x[0,0],
                         x[10, 10],
                         x[30:, 31:].sum()])
    start_pts = [(r0[i], c0[i]) for i in range(len(r0))]
    end_pts = [(r1[i], c1[i]) for i in range(len(r0))]
    assert_equal(expected, integrate(s, r0, c0, r1, c1))  # test deprecated
    assert_equal(expected, integrate(s, start_pts, end_pts))

The # test deprecated comment suggests to me that perhaps the test needs some fixing, but still suppress_warnings should fail more gracefully.

A bit more information: doing

$ python skimage/transform/tests/test_integral.py

which uses NumPy's run_module_suite, doesn't seem to fail, but issues the warning

======================================================================
ERROR: skimage.transform.tests.test_integral.test_vectorized_integrate
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/charris/Workspace/scikit-image/skimage/transform/tests/test_integral.py", line 46, in test_vectorized_integrate
    assert_equal(expected, integrate(s, r0, c0, r1, c1))  # test deprecated
  File "/home/charris/Workspace/scikit-image/skimage/transform/integral.py", line 86, in integrate
    warn("The syntax 'integrate(ii, r0, c0, r1, c1)' is "
  File "/home/charris/Workspace/scikit-image/skimage/_shared/_warnings.py", line 16, in warn
    warnings.warn(message, stacklevel=stacklevel)
UserWarning: The syntax 'integrate(ii, r0, c0, r1, c1)' is deprecated, and will be phased out in release 0.14. The new syntax is 'integrate(ii, (r0, c0), (r1, c1))'.

Note that scikit-image has its own context manager for warnings in skimage/_shared/_warnings.py. That may be in conflict with suppress_warnings.

Out of curiosity, I wonder if it makes a difference to explicitly specify DeprecationWarning instead of using the default UserWarning.

So I don't see errors with

$ nosetests-2.7 skimage/transform/tests/test_integral.py |& grep orig_show

Whereas

$ nosetests-2.7 skimage/transform/tests/ |& grep orig_show

Shows it about 25% of the time. That suggests that the source of the error lies elsewhere and the failing test just exposes it.
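
Since the failure is intermittent, a rate like this is easier to pin down by repeating the run and counting hits. The helper below is a rough sketch, and `failure_rate` is a hypothetical name, not something from nose or numpy. It runs a command several times, merges stderr into stdout (as `|&` does in the shell lines above), and reports the fraction of runs whose output contains a pattern.

```python
import subprocess
import sys


def failure_rate(cmd, pattern, runs=20):
    """Run cmd `runs` times; return the fraction of runs whose output has pattern."""
    hits = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # merge stderr, like `|&` in the shell
            universal_newlines=True,    # decode output to str
        )
        if pattern in proc.stdout:
            hits += 1
    return hits / runs
```

Under these assumptions, `failure_rate(["nosetests-2.7", "skimage/transform/tests/"], "orig_show")` would give an estimate comparable to the 25% quoted above, although the number will depend on the machine and on what else is running.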

In particular, skimage/transform/tests/test_geometric.py contains the warnings context manager expected_warnings and that makes me suspicious.

OK, now I suspect the test_parallel decorator (https://github.com/scikit-image/scikit-image/blob/master/skimage/_shared/testing.py), which defaults to two threads.
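
For readers unfamiliar with this kind of decorator, here is a minimal sketch of how a "run the test body in several threads at once" decorator can be written. `run_in_threads` and `record_call` are hypothetical names for illustration only. This is not the skimage implementation, which lives at the URL above.

```python
import functools
import threading


def run_in_threads(num_threads=2):
    """Hypothetical decorator: call the wrapped function from num_threads threads."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Start num_threads - 1 extra threads; the caller's thread is the last one.
            extra = [
                threading.Thread(target=func, args=args, kwargs=kwargs)
                for _ in range(num_threads - 1)
            ]
            for thread in extra:
                thread.start()
            result = func(*args, **kwargs)
            for thread in extra:
                thread.join()
            # Note: exceptions raised in the extra threads are not re-raised here.
            return result
        return wrapper
    return decorator


calls = []
calls_lock = threading.Lock()


@run_in_threads()
def record_call():
    # Each concurrent invocation records the thread it ran on.
    with calls_lock:
        calls.append(threading.current_thread().name)
```

Any code inside the decorated function that temporarily swaps process-wide state, such as the warnings machinery, then runs concurrently with itself, which is the situation a non-thread-safe context manager cannot handle.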

If I remove the three files importing test_parallel, there is no longer a problem.

EDIT: And now I can't reproduce the problem at all. Hmm... Maybe also depends on what else is running on the machine.

Hmmm, test_parallel could be it, I guess. Looking through the code, none of the functions obviously uses a warning context, though it's not impossible.

Something in test_hough_transform.py seems to be the initiator. With that file removed there is no error.

EDIT: Perhaps because it precedes the test_integral.py module?

The problem seems to be this test in test_hough_transform.py

@test_parallel()
def test_hough_circle():
    # Prepare picture
    img = np.zeros((120, 100), dtype=int)
    radius = 20
    x_0, y_0 = (99, 50)
    y, x = circle_perimeter(y_0, x_0, radius)
    img[x, y] = 1

    out1 = tf.hough_circle(img, radius)
    out2 = tf.hough_circle(img, [radius])
    assert_equal(out1, out2)
    out = tf.hough_circle(img, np.array([radius], dtype=np.intp))
    assert_equal(out, out1)
    x, y = np.where(out[0] == out[0].max())
    assert_equal(x[0], x_0)
    assert_equal(y[0], y_0)

In particular, either of the two lines

    assert_equal(out1, out2)

# or

    assert_equal(out, out1)

will trigger the error.

Likewise, removing the @test_parallel decorator fixes the error. So the upshot is that threads in combination with assert_equal called with vectors lead to the problem.

Makes sense: assert_equal uses suppress_warnings to filter ".*NAT ==" warnings. We could possibly try to remove it from that function, since it's not really obvious that assert_equal does warning suppression (and thus does not fully support threading).

Nice tracking down Chuck!

Trying to remove it would be a good thing.

I wonder if there are other numpy functions that are not thread safe?

Not sure, I think we only use warning stuff in the test suite. And np.errstate is probably thread safe?

many thanks for this nice investigation @charris and @seberg

Hmmm, I find it a bit annoying. Removing the suppressions from where they are might change behaviour for downstream projects. That is likely fine (maybe even cleaner), but I am not sure it is a good idea for a bugfix release. We could also lock a mutex around the comparison-type tests, which may not always be fine in principle, but I can't really think of how it could break.

I think it would be enough to just fix assert_equal; it will need fixing in the future for the NaT comparison in any case. Note that NaT can be converted to int64 and has the value min_int64.
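
The int64 observation suggests a way to detect NaT without a comparison that warns. The following is a sketch only, assuming numpy is installed: `isnat_sketch` and `NAT_AS_INT64` are hypothetical names, and this is not the helper that was eventually added to numpy. It reinterprets datetime64 or timedelta64 data as int64 and compares against the minimum int64 value.

```python
import numpy as np

# Assumption taken from the comment above: NaT is stored as the minimum int64.
NAT_AS_INT64 = np.iinfo(np.int64).min


def isnat_sketch(values):
    """Return a boolean array marking NaT entries (hypothetical helper)."""
    arr = np.asarray(values)
    if arr.dtype.kind not in "mM":  # 'm' is timedelta64, 'M' is datetime64
        raise TypeError("expected a datetime64 or timedelta64 array")
    # Reinterpret the 8-byte time values as int64 instead of comparing NaT directly.
    return arr.view(np.int64) == NAT_AS_INT64


dates = np.array(["NaT", "2016-12-23"], dtype="datetime64[D]")
deltas = np.array([0, 5], dtype="timedelta64[s]")
deltas[0] = np.timedelta64("NaT")
```

Because the check never evaluates `NaT == NaT`, it needs no warning filter, so an assertion built on it would not have to enter a warning-suppressing context at all.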

Oh, true, we can actually just support NaT explicitly. It is a bit annoying currently (since there is no function to check for NaT specifically), but not hard. There is another of these things around other array comparisons, but I guess it is not used much (and I am not sure for what).

Adding NaT logic to the asserts seems to work pretty well for master, though I am not sure it will work for backporting. I think while the suppress_warnings may be new, the race condition itself was there already in the last release.

We still need an isnat function or maybe support for datetime/timedelta in isnan.

Yes, agree about the isnat function. There are also two types where it occurs: timedelta64 and datetime64. Note that the NaT value isn't exposed anywhere either, so we could probably use an np.nat as well. The actual definition is in ndarraytypes.h.

@charris, Chuck, what is your take on it, rather create np.isnat or rather allow datetime and timedelta in isnan?

Hmm, putting it all in isnan is an interesting thought but I suspect it might cause trouble at this time, maybe later? @njsmith @juliantaylor Thoughts?

@shoyer may have an opinion as well.

I just implemented isnat, but I am not sure about all the timing stuff with respect to the release. If we have to try to fix this in a minimal version, the better option is probably to plug a Python isnat version into the test suite, though fixing it may turn out to be a bit tricky in any case (at the very least others will get more warnings than before, though maybe few projects both 1. use our test suite stuff and 2. actually test warnings carefully).

I don't consider the failure as terribly severe but it would be nice to have it fixed. A (private?) python version of isnat would be fine with me for 1.12.

The suppress_warnings docstring should also mention that it is not thread safe.

It does :)

thanks a lot guys for jumping on the issue during holidays and for the thorough debug

is there anything i can do to implement/test a solution for this? debian is eager to get a fix for this :)

@sandrotosi, unfortunately it may be a bit tricky. The simplest fix may be to simply not use the parallel testing stuff in skimage, but that somewhat defeats the purpose. We can pretty simply do something like in my isnat stuff (using a private Python version of isnat). However, I am not quite certain that it will not create test regressions elsewhere :/.

Adding a mutex for those two cases may actually be a plausible hack which should remove the trouble and seems unlikely to go bad (since you would not call assert_equal or similar from inside of assert_equal). I would not actually use that in numpy master, but as a minimal bugfix for 1.12 it may be a real option. And it would not interfere with the skimage tests themselves.
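
To make the mutex idea concrete, here is a minimal sketch using only the standard library. `locked_quiet_equal` and `run_concurrently` are hypothetical names, and `warnings.catch_warnings` stands in for whichever suppression context the real assertion uses. This is not the code from any numpy pull request.

```python
import threading
import warnings

# One process-wide lock guarding every entry into the warning context.
_compare_lock = threading.Lock()


def locked_quiet_equal(a, b):
    """Compare a and b with warnings silenced, one thread at a time."""
    # A plain Lock is not reentrant: this relies on the comparison never
    # calling locked_quiet_equal again from inside the locked region.
    with _compare_lock:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            return a == b


def run_concurrently(num_threads=4, repeats=200):
    """Hammer the locked comparison from several threads at once."""
    def worker():
        for _ in range(repeats):
            locked_quiet_equal(1, 1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

The lock serializes entry and exit of the context, so the enter/exit pairs can no longer interleave. It only protects callers that go through the locked function, though. Any other code that swaps the warning hooks from another thread is still unprotected, which fits the view above that this is a stopgap rather than a fix for master.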

@olebole would disabling parallel testing in skimage be an acceptable temporary solution?

@seberg yeah i think the target might be to have a minimal fix for 1.12 and eventually address it in a more complete/comprehensive way in master - that at least would let the next debian release have numpy without this issue

@sandrotosi I tried the lock approach in gh-8427. I am not sure whether or not the idea is nuts, but if someone wants to try it....

Is there some way we could simply modify suppress_warnings to have undefined behavior but not crash when used in a multi-threaded fashion?

(I have some ideas, but I'm still thinking through how exactly it could work.)

Sure, we can just not delete the attribute, but it might create bugs in the tests later on... Though I guess most test suites are not as nitpicky as numpy about testing warnings, so....

I've uploaded 1.12.0rc2 (which contains https://github.com/numpy/numpy/pull/8427) to debian and rebuilt skimage 3 times (a slightly old version of the debian package, without the test suite completely disabled), and it built successfully every time.

thanks a lot guys for working on this during the holidays!

Now, any plans for the final 1.12.0 release? :)

I plan on making the final release Jan 15.

I can confirm that it works with rc2. Thank you very much for your efforts!

I'm leaving this open until it is fixed in master.

Hmm, not fixed in master. @seberg Is it correct that #8421 will close this?

Fixed by #8421.
