Libseccomp: BUG: compatibility with openmp

Created on 2 Sep 2017  ·  19Comments  ·  Source: seccomp/libseccomp

The non-deterministic nature of multi-threading makes this one hard to debug, but I think, I have arrived at a rather reproducible test case seccomp_omp.c.gz. Compile the program with -O0 -ggdb -fopenmp.

If we execute it with just one thread and let it do the forbidden syscall, the program dies as expected.

$ ./seccomp_omp -t 1 -k 0      
86195973
[1]    10103 invalid system call (core dumped)  ./seccomp_omp -t 1 -k 0

However, given more threads it just ceases to function.

$ ./seccomp_omp -t 2 -k 0
86198868
86195973
86200317
^C

The program doesn't get killed, it just stops doing anything, at all. I am at a loss what happened to the SIGSYS signal, and why it doesn't get handled?

This might not be a libseccomp issue at all, but I am not proficient enough with seccomp, signals and the threading system in general to debug the root cause. Hope you can make some sense of it.

bug prioritmedium

All 19 comments

NOTE: I haven't looked at your test case yet, I'm just guessing based on your well-written description

Considering the multi-threaded nature of this, have you tried setting the SCMP_FLTATR_CTL_TSYNC filter attribute to true before loading the filter into the kernel?

I did not set SCMP_FLTATR_CTL_TSYNC because I add the seccomp filter, before splitting into threads. Thus, the kernel should apply the filter to all threads, anyway (and apparently it does do that). Adding SCMP_FLTATR_CTL_TSYNC to the testcase makes no difference (which is less than 100 lines with pedantic error handling, btw).

Okay, I just wanted to mention it; it sounds like you are doing things correctly (setting the filter before spawning new threads). I'll have to take a closer look, but I may not get a chance to do that very soon.

A confirmation that I am not doing things blatantly wrong is good enough for me. Take your time!

In the single-threaded example, ./seccomp_omp -t 1 -k 0, openmp recognizes that only a single thread is going to run, so openmp bypasses much of the synchronization that it would normally do in a multi-threaded for loop. I verified this by observing the syscalls that were processed in seccomp_run_filters() in the kernel. As one would expect, I saw a call to __NR_write() and a call to __NR_madvise() which prompted seccomp to instruct the kernel to kill the thread.

In the multi-threaded example, ./seccomp_omp -t 2 -k 0, openmp tries to parallelize the for loop. Thus much more of the openmp library is utilized. This is apparent when again watching syscalls through seccomp_run_filters(). __NR_mmap(), __NR_mprotect(), __NR_clone(), __NR_futex(), and more syscalls are called prior to the call to madvise(). Ultimately one of the two threads finally calls madvise() and seccomp properly kills that thread. But openmp has synchronization between the threads and before the second thread can call madvise() (and get killed), the second thread calls futex() as it is waiting on something from the dead thread. And this is why the program hangs. One thread has been killed and the second thread is waiting for data (from the dead thread) that will never arrive.

I have not worked much with openmp, but there seems to be some discussion [1, 2, 3, ...] on signals in parallel blocks. Ultimately it looks like a difficult problem that you would be best to avoid.

It seems like there are a couple potential easy solutions:

  1. Don't use SCMP_ACT_KILL as your default handler. Rather, switch to using an error code instead, e.g. SCMP_ACT_ERROR(5). I made this change in your example program, and verified that it terminated properly.

  2. If you don't need the power of openmp, another option may be to switch to using a more common solution like pthread_create() or fork()

@drakenclimber so it sounds like the problem is really just openmp making additional syscalls that may not normally be part of the filter? Or is that missing a larger point?

@drakenclimber Thank you very much for investing your time. Waiting for a dead thread explains the symptoms.

To give more context: I write scientific code, thus I like to keep things simple with OpenMP. However, I also want to protect my users from themselves. Thus if my program does anything out of the ordinary, just nuke it. I am guessing, what I ultimately want to achieve is to just kill the process (#96) with a somewhat reasonable error message.

What value should I use for errno in SCMP_ACT_ERROR to closely mimic SECCOMP_RET_KILL_PROCESS? Is there something that works fine across all syscalls?

@pcmoore - kinda. But I think the bigger issue is that openmp does not handle signals gracefully inside of its parallel construct. I would guess this is by design.

@kloetzl - No worries - you can totally stick with OpenMP. It looks like others in the OpenMP community are doing something like the following:

ctx = seccomp_init(SCMP_ACT_ERRNO(ENOTBLK));
...
bool kill_process = false;
int rc;
#pragma omp parallel for num_threads(THREADS)
  for (int i = 0; i < THREADS * 2; i++) {
    fprintf(stderr, "%u\n", func(i));
    if (i == K) {
      rc = madvise(NULL, 0, 0); 
      if (rc < 0)
        kill_process = true;
      }   
    // sleep(10);
  }

  if (kill_process)
    goto error;

  seccomp_release(ctx);
  return 0;
error:
  seccomp_release(ctx);
  return -1; 
}

As for what errno should SCMP_ACT_ERRNO() return, it's really up to you. SCMP_ACT_ERRNO() will work across all syscalls. And your program will be the one handling it and returning an error back to the user, so you can choose whatever errno works best for you. In fact, picking an obscure one - say ENOTBLK - could be a way that you know the error came from seccomp blocking the call.

tl;dr - Detect the problem in the parallel loop, but wait to handle it until after the loop completes. You may need to add extra defensive coding within the loop due to a failing syscall.

@kloetzl - I'll look into issue #96 and see how well that plays with OpenMP. That could be another solution as well. Thanks!

And your program will be the one handling it and returning an error back to the user, so you can choose whatever errno works best for you. In fact, picking an obscure one - say ENOTBLK - could be a way that you know the error came from seccomp blocking the call.

The thing is, madvise is called deep in the bowels of malloc. Thus, I am not the one handling the error, glibc is. So the question is, does glibc (and all of the other libraries doing syscalls) know that a syscall can return other errors than given in its man page and handle it appropriately?

The thing is, madvise is called deep in the bowels of malloc. Thus, I am not the one handling the error, glibc is. So the question is, does glibc (and all of the other libraries doing syscalls) know that a syscall can return other errors than given in its man page and handle it appropriately?

Ahhh... gotcha. I would have to look through the glibc code to be sure, but I would be willing to hazard the following guesses:

  • If madvise returns an error, glibc will _definitely_ also return an error
  • It's definitely possible glibc could return a different error than seccomp returned, thus you should use caution if you are expecting a "custom" return code like ENOTBLK

Long story short, glibc should not suppress _any_ error code, but it could return a different error code instead

I'm trying to keep up with you guys, but I fear my lack of background with OpenMP has me a bit behind ... based on the comments above it looks like KILL_PROCESS could be a workable solution here? If so, we can bump up the priority of that issue, although it is on my list of things to address before the v2.4 release.

@pcmoore - Yes, I believe KILL_PROCESS is a viable solution to this bug. I tested @kloetzl 's test program above using the KILL_PROCESS action and the hang no longer occurs.

I have it implemented but am currently hitting a couple snags in the python autotests. I should have a patch out next week.

@drakenclimber ooh, patches? I love patches :)

Thanks guys.

I can confirm that KILL_PROCESS solves the issue: I too hacked a local version of libseccomp on Arch and the test program fails as intended. However, providing solid tests and running them on different kernels might be the bigger challenge.

Thanks for the confirmation @kloetzl.

Yes, writing proper tests can be tedious, but it's important. Just as important as the code it test IMHO.

I'm looking forward to the PR from @drakenclimber, we can discuss things a bit more once that code is published.

I'm doing some COVID-19spring cleaning and I think we've resolved your issue, is that correct @kloetzl? I'm going to close this issue, but if I'm wrong and you are still seeing problems please let us know and we will reopen it!

Thanks for reminding me, I tested a bit on my end.

This issue is mostly fixed, with a small snag. The program no longer hangs in limbo when an invalid syscall is encountered, yay! With SCMP_ACT_TRAP one can even produce the name of the bad syscall. However, OpenMP overwrites my signal handler, so it falls back to the default error message once multiple threads are spawned. From a perspective of user experience I don't like this behavior, but think it is as good as it gets.

Ah, yes, I'm not sure there is much we can about the signal handler getting overwritten in libseccomp, sorry about that.

Thanks for letting us know the rest of it worked!

Was this page helpful?
0 / 5 - 0 ratings