Junit4: Timeout ThreadGroups should not be destroyed

Created on 11 Feb 2020  ·  27Comments  ·  Source: junit-team/junit4

Environment:

Eclipse: Version: 2019-09 R (4.13.0), Build id: 20190917-1200
EclEmma 3.1.2.201903112331 org.eclipse.eclemma.feature.feature.group Eclipse EclEmma
JUnit 4.13
Windows 10 x64
Java 1.8u241

Issue:

I've run into an odd error while running coverage on an unit test having some methods use Google Guava's CacheBuilder in Eclipse. The coverage fails with the error code 5013 and error report indicating java.net.SocketException: Connection reset. If I remove these methods, the coverage succeeds.

If I revert back to JUnit 4.12, both the test file and the individual test succeeds in producing the coverage data.

Example method:
https://github.com/ReactiveX/RxJava/blob/5f6aafcbaa97f330ad2007f4ad02c5e33db8914b/src/test/java/io/reactivex/rxjava3/internal/operators/flowable/FlowableGroupByTest.java#L1917

This is where the cache is created:
https://github.com/ReactiveX/RxJava/blob/5f6aafcbaa97f330ad2007f4ad02c5e33db8914b/src/test/java/io/reactivex/rxjava3/internal/operators/flowable/FlowableGroupByTest.java#L2194

Additional details:

  • If I run "gradlew test jacocoTestReport" on the entire project, it succeeds in creating the report HTML apparently.
  • If I run "Coverage As > JUnit test" on the entire project from Eclipse, it fails with the same 5013 error.
  • If I run any other test class/package/method in the project, the coverage is created successfully.

Please advise.

Most helpful comment

Hi guys 👋

@akarnokd thank you for forwarding here my responses from JaCoCo Mailing List.

@marcphilipp I think that the following happens in the case of the above example (https://github.com/junit-team/junit4/issues/1652#issuecomment-593559646).

After execution of static block there is one shutdown hook in java.lang.ApplicationShutdownHooks.hooks

{Thread@1552} "Thread[Thread-0,5,main]" -> {Thread@1552} "Thread[Thread-0,5,main]"

And after execution of CacheBuilder.newBuilder() there are two

{LogManager$Cleaner@1551} "Thread[Logging-Cleaner,5,FailOnTimeoutGroup]" -> {LogManager$Cleaner@1551} "Thread[Logging-Cleaner,5,FailOnTimeoutGroup]"
{Thread@1552} "Thread[Thread-0,5,main]" -> {Thread@1552} "Thread[Thread-0,5,main]"

The LogManager$Cleaner@1551 is created by java.util.logging.LogManager (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/util/logging/LogManager.java#L253) because com.google.common.cache.CacheBuilder uses java.util.logging (https://github.com/google/guava/blob/v28.2/guava/src/com/google/common/cache/CacheBuilder.java#L226).

Note that it also belongs to FailOnTimeoutGroup created by org.junit.rules.Timeout (https://github.com/junit-team/junit4/blob/r4.13/src/main/java/org/junit/internal/runners/statements/FailOnTimeout.java#L123).

So start of this hook after the test throws IllegalThreadStateException (at https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/Thread.java#L709), because the group is already destroyed (https://github.com/junit-team/junit4/blob/r4.13/src/main/java/org/junit/internal/runners/statements/FailOnTimeout.java#L140).

This exception stops the execution of subsequent hooks (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/ApplicationShutdownHooks.java#L101-L103) and silently ignored (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/Shutdown.java#L115-L129).

The sequence in which hooks start depends on keySet (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/ApplicationShutdownHooks.java#L97) of IdentityHashMap (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/ApplicationShutdownHooks.java#L39) and that's why varies from run to run.

This doesn't happen in case of JUnit 4.12, because destroy of FailOnTimeoutGroup was added in JUnit 4.13 (https://github.com/junit-team/junit4/commit/63fd27710ee2bab6fdf00965f41ef12b79120c70). And this change also explains the difference between 4.12 and 4.13 observed during investigations (https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/jacoco/SHw3R6cvZSs/yaFgd7XUAQAJ) for the following example:

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.Timeout;

import java.util.concurrent.TimeUnit;

public class ExampleTest {
  @Rule
  public Timeout globalTimeout = new Timeout(5, TimeUnit.MINUTES);

  @Test
  public final void test() {
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
      @Override
      public void run() {
        System.err.println("hook executed");
      }
    }));
  }
}

The same can be demonstrated without JUnit as following

import java.util.logging.LogManager;

public class Example {
  public static void main(String[] args) throws Exception {
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
      @Override
      public void run() {
        System.err.println("example hook");
      }
    }));

    ThreadGroup group = new ThreadGroup("example group");
    Thread thread = new Thread(group, new Runnable() {
      @Override
      public void run() {
        LogManager.getLogManager();
      }
    }, "example thread");
    thread.start();
    thread.join();
    group.destroy();
  }
}

So I was also thinking about creation of a ticket in OpenJDK, and while thinking about what can be suggested in such ticket, found already existing https://bugs.openjdk.java.net/browse/JDK-6704979

Nevertheless wondering if there is a chance to do something about this in JUnit.

All 27 comments

Could you please provide a stacktrace and/or Gradle build scan?

There is no JVM crash file or failure stacktrace. Only an Eclipse error about losing connection (java.net.SocketException: Connection). I don't have any issues when I run the test(s) in Gradle from command line so not sure how that would help.

In that case, I suspect an Eclipse or EclEmma issue. Could you please bring it up with them?

Okay. One last question. Could it be that the JUnit 4.13 integration has some trouble with Eclipse?

Well, there‘s always a chance we broke something. But we didn‘t change any APIs use by integrations as far as I know.

When I searched for the error message on StackOverflow I found https://stackoverflow.com/a/42904311/95725 which indicates that it might be a memory problem. Maybe with the upgrade to 4.13 you cross some kind of threshold memory-wise.

There is also always the possibility that you have multiple versions of JUnit on your classpath.

Yes I tried that before posting to no avail. Also it is quite peculiar to fail only at those cache-methods and not for the other 12000+ tests. I've yet to hear back from the EclEmma people.

Hi. I got some feedback from the EclEmma people: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/jacoco/SHw3R6cvZSs/yaFgd7XUAQAJ

Looks like shutdown hooks are not executed the same way in 4.13 as it used to happen in 4.12 and thus the coverage failure.

Here is a JUnit demonstrating the issue (similar to how EclEmma would do it) that sometimes prints "hook executed", sometimes doesn't with 4.13:

import java.util.concurrent.TimeUnit;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.Timeout;

import com.google.common.cache.CacheBuilder;

public class ExampleTest {

  static {
    System.err.println("registering hook");
    Thread thread = new Thread(new Runnable() {
      @Override
      public void run() {
        System.err.println("hook executed");
      }
    });
    Runtime.getRuntime().addShutdownHook(thread);
    System.err.println(thread.getThreadGroup());
  }

  @Rule
  public Timeout timeout = new Timeout(5, TimeUnit.MINUTES);

  @Test
  public void test() {
    CacheBuilder.newBuilder() //
      .<Integer, Object>build();
  }
}

Hi guys 👋

@akarnokd thank you for forwarding here my responses from JaCoCo Mailing List.

@marcphilipp I think that the following happens in the case of the above example (https://github.com/junit-team/junit4/issues/1652#issuecomment-593559646).

After execution of static block there is one shutdown hook in java.lang.ApplicationShutdownHooks.hooks

{Thread@1552} "Thread[Thread-0,5,main]" -> {Thread@1552} "Thread[Thread-0,5,main]"

And after execution of CacheBuilder.newBuilder() there are two

{LogManager$Cleaner@1551} "Thread[Logging-Cleaner,5,FailOnTimeoutGroup]" -> {LogManager$Cleaner@1551} "Thread[Logging-Cleaner,5,FailOnTimeoutGroup]"
{Thread@1552} "Thread[Thread-0,5,main]" -> {Thread@1552} "Thread[Thread-0,5,main]"

The LogManager$Cleaner@1551 is created by java.util.logging.LogManager (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/util/logging/LogManager.java#L253) because com.google.common.cache.CacheBuilder uses java.util.logging (https://github.com/google/guava/blob/v28.2/guava/src/com/google/common/cache/CacheBuilder.java#L226).

Note that it also belongs to FailOnTimeoutGroup created by org.junit.rules.Timeout (https://github.com/junit-team/junit4/blob/r4.13/src/main/java/org/junit/internal/runners/statements/FailOnTimeout.java#L123).

So start of this hook after the test throws IllegalThreadStateException (at https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/Thread.java#L709), because the group is already destroyed (https://github.com/junit-team/junit4/blob/r4.13/src/main/java/org/junit/internal/runners/statements/FailOnTimeout.java#L140).

This exception stops the execution of subsequent hooks (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/ApplicationShutdownHooks.java#L101-L103) and silently ignored (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/Shutdown.java#L115-L129).

The sequence in which hooks start depends on keySet (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/ApplicationShutdownHooks.java#L97) of IdentityHashMap (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/ApplicationShutdownHooks.java#L39) and that's why varies from run to run.

This doesn't happen in case of JUnit 4.12, because destroy of FailOnTimeoutGroup was added in JUnit 4.13 (https://github.com/junit-team/junit4/commit/63fd27710ee2bab6fdf00965f41ef12b79120c70). And this change also explains the difference between 4.12 and 4.13 observed during investigations (https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/jacoco/SHw3R6cvZSs/yaFgd7XUAQAJ) for the following example:

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.Timeout;

import java.util.concurrent.TimeUnit;

public class ExampleTest {
  @Rule
  public Timeout globalTimeout = new Timeout(5, TimeUnit.MINUTES);

  @Test
  public final void test() {
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
      @Override
      public void run() {
        System.err.println("hook executed");
      }
    }));
  }
}

The same can be demonstrated without JUnit as following

import java.util.logging.LogManager;

public class Example {
  public static void main(String[] args) throws Exception {
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
      @Override
      public void run() {
        System.err.println("example hook");
      }
    }));

    ThreadGroup group = new ThreadGroup("example group");
    Thread thread = new Thread(group, new Runnable() {
      @Override
      public void run() {
        LogManager.getLogManager();
      }
    }, "example thread");
    thread.start();
    thread.join();
    group.destroy();
  }
}

So I was also thinking about creation of a ticket in OpenJDK, and while thinking about what can be suggested in such ticket, found already existing https://bugs.openjdk.java.net/browse/JDK-6704979

Nevertheless wondering if there is a chance to do something about this in JUnit.

@Godin Thanks for the detailed analysis! 👍

@kcooney Do you know/remember what the intention of using a ThreadGroup here was?

@marcphilipp my guess is this breakage was caused by https://github.com/junit-team/junit4/commit/63fd27710 which was proposed by @elrodro83

The ThreadGroup was added in https://github.com/junit-team/junit4/commit/1768d08d0

@akarnokd As a workaround, have you tried initializing JUL early, e.g. in a static initializer of a base test class?

No. I rarely run the full project coverage locally and so far, Travis CI managed to produce coverage data. I worked around by commenting out those CacheBuilder tests while I needed to work with the class being tested.

caused by 63fd277

I.e., #1517, for the backlink.

Hey, everyone. @jglick suggested in #1517 that we roll back #1517 or make it opt-in.

My experience with the ThreadGroup.destroy()of PR #1517 isn't positive as well.

In Eclipse Californium it causes unit tests to fail randomly .
The randomness is caused by a "race" of terminating threads and the call of ThreadGroup.destroy().
And there is also a not to obvious order condition between the Timeout rule and static ThreadGroups. If the group is instantiated outside the Timeout rule, it's preserved, otherwise destroyed.

In my opinion, if such a feature is required, please

  1. add a "opt-in" (or "opt-out").
  2. One simple idea maybe to not "destroy", if any of the sub-groups are "none-daemon-groups" (daemon groups are destroy with the last terminating thread, for these groups the destroy should work. For none-daemon-groups, it may cause trouble, if such thread-groups are destroyed, they maybe considered to be static.)
  3. prevent/reduce the race-condition with terminating threads. One idea maybe to use a "ThreadGroup.interrupt()" before joining the "Thread". Maybe it's required to first join all Threads of the groups and sub-groups.

As temporary workaround I try to ensure, that all "static" thread-groups are initialized before any Timeout rule.
But that's not too easy to see.

I think an opt-out flag as a system property might suffice given the relatively low number of reports of this problem.

@kcooney What do you suggest we do?

The Timeout rule has already a Builder. FMPOV it's easier to use, if a parameter is added there.

It's possible there are low reports of this problem because of low usage of 4.13. I don't think we had many reports of the ThreadGroup leakages either, and JUnit as been using ThreadGroup since 4.12 (see https://github.com/junit-team/junit4/commit/1768d08).

If we hadn't released 4.13, I would suggest making the new behavior opt-in. If we make it opt-in now, then that seems like a visible enough change that we should make the new release 4.14.

If we make it opt in or opt out, I agree that we should use the Builder. JUnit 4.x rarely uses system properties to control behavior, and David Saff has objected to adding them in the past, so I'm not sure if that's a good option.

I wonder if instead of destroying the ThreadGroup we should just mark it as a daemon group. I think that would resolve the original issue (child ThreadGroups inherit the parent's daemon status) and we could do that change in a bug fix release.

I think making it a daemon thread group is a good idea. Does someone here have time to work on a PR?

Can we somehow vote for 4.13.2 to be released sooner because of this? This is somewhat blocking me because of the security issues < 4.13.1. Thanks much for the work here.

FYI my minimum reproducible example can be found on this stackoverflow post and this github repo.

@j256 Thanks for the feedback. We haven't discussed timelines yet. I've started associating things with the 4.13.2 milestone.

The following workaround works for me:

/** Workaround for the already fixed JUnit 4 issue #1652 (<https://github.com/junit-team/junit4/issues/1652>):
 * Can be used in JUnit 4.13 and 4.13.1 in tests with a timeout to prevent the thread group from being destroyed
 * and thus JaCoCo (used e.g. by EclEmma) loosing its collected data. */
public static void workaround_for_junit4_issue_1652() {
    String version = junit.runner.Version.id();
    if (!"4.13.1".equals(version) && !"4.13".equals(version))
        fail("Workaround for JUnit 4 issue #1652 required for JUnit 4.13 and 4.13.1 only; actual version: "
           + version);
    Thread thread = Thread.currentThread();
    if (!"Time-limited test".equals(thread.getName())) fail("Workaround only required for a test with a timeout.");
    new Thread(thread.getThreadGroup(), new Runnable() {
        @Override
        public void run() {
            try {
                while (!thread.isInterrupted()) Thread.sleep(100);
            } catch (InterruptedException e) {}
        }
    }).start();
}

@Test(timeout = 42)
public void test() {
    workaround_for_junit4_issue_1652();
    // ...
}

I'm still waiting for 4.13.2 with this fix. Any update @kcooney? Again, because of the security problems before 4.13.1, I'm stuck with now with a lot of projects that aren't publishing CI unit test information because of this bug. Thanks.

@j256 4.13.2 is now released.

Thanks much @marcphilipp !

Was this page helpful?
0 / 5 - 0 ratings