Libseccomp: Q: cannot have different filters with SCMP_ACT_NOTIFY in different threads

Created on 6 Nov 2020 · 9 Comments · Source: seccomp/libseccomp

libseccomp stores the notification fd in a global variable: state.notify_fd. This makes it impossible to use libseccomp for a multi-threaded application with different filters in different threads (i.e. without using TSYNC).

libseccomp has seccomp_reset(NULL, ...) to reset the global variable state.notify_fd. But seccomp_reset() has the unfortunate side effect of also resetting state.nr_seccomp to -1.

This issue was noticed while trying to fix unit tests in libseccomp-golang; see https://github.com/seccomp/libseccomp-golang/pull/59#issuecomment-723045033

cc @yvesf @rata

priority/low question

alban

Most helpful comment

It sounds like we are all in agreement here so I'm going to close this issue. @alban if you think we are missing something important, please let us know and/or re-open this issue so we can sort it all out.

Thanks everyone.

pcmoore on 12 Nov 2020

All 9 comments

@drakenclimber hold the upcoming v2.5.1 releases until we get some clarification on this. I'm not sure I completely understand the original report yet, but since v2.5.1 is the first release to implement the seccomp_reset(NULL, ...) concept, let's wait just a minute until we can verify things ...

pcmoore on 6 Nov 2020

Adding this to the v2.5.1 milestone simply as a blocker for right now; this may change as we investigate things.

pcmoore on 6 Nov 2020

> @drakenclimber hold the upcoming v2.5.1 releases until we get some clarification on this. I'm not sure I completely understand the original report yet, but since v2.5.1 is the first release to implement the seccomp_reset(NULL, ...) concept, let's wait just a minute until we can verify things ...

Agreed. Will do.

drakenclimber on 6 Nov 2020

> libseccomp stores the notification fd in a global variable: state.notify_fd. This makes it impossible to use libseccomp for a multi-threaded application with different filters in different threads (i.e. without using TSYNC).

Can you elaborate a bit more on that last sentence? I'm not sure I understand what you are trying to convey.

FWIW, the seccomp notification FD is a process-global object; you can only request it from the kernel once. You can read https://github.com/seccomp/libseccomp/issues/273 to get some more background on the issue.

> libseccomp has seccomp_reset(NULL, ...) to reset the global variable state.notify_fd. But seccomp_reset() has the unfortunate side effect of also resetting state.nr_seccomp to -1.

Why is resetting state.nr_seccomp = -1 a significant problem? If the nr_seccomp field is reset to -1, then the next time an operation is requested that could make use of seccomp(2), the library checks whether seccomp(2) is supported and uses it if it is available. Yes, it could result in extra calls to seccomp(2), but that shouldn't be a major issue. Is it a concern for your use case?

pcmoore on 6 Nov 2020
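
The probe-and-cache behavior described in the comment above can be sketched with a small model. This is not libseccomp's code: the function names and the probe counter are hypothetical, and the only detail taken from the thread is that -1 marks the cached value as unknown. The value 317 is the x86_64 number of seccomp(2), used here as a stand-in for whatever a real probe would return.

```c
#include <assert.h>

/* Hypothetical stand-in for the library's cached state; -1 means
 * "not probed yet", as in the discussion above. */
static int nr_seccomp = -1;
static int probe_calls = 0; /* counts how often a probe was needed */

/* Assumption: a fake probe that always "finds" 317, the x86_64
 * syscall number of seccomp(2). A real probe would ask the kernel. */
static int probe_seccomp_nr(void)
{
    probe_calls++;
    return 317;
}

/* Return the cached number, probing only when the cache is empty. */
static int get_seccomp_nr(void)
{
    if (nr_seccomp < 0)
        nr_seccomp = probe_seccomp_nr();
    assert(nr_seccomp >= 0); /* the model's probe never fails */
    return nr_seccomp;
}

/* Model of a global reset: forget the cached value. */
static void reset_state(void)
{
    nr_seccomp = -1;
}
```

In this model, calling get_seccomp_nr() repeatedly costs one probe in total, and each reset_state() costs exactly one additional probe on the next call, which matches the "extra calls" mentioned above.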

@alban - Similar question from me. I spent a bit of time trying to invent a sensible multi-threaded, multi-seccomp-filter use case (in C) and really couldn't come up with anything.

I read through the Go documentation of os.LockOSThread(), and it makes sense to me. But I'm struggling to convert this knowledge into a multi-threaded, multi-seccomp-filter solution.

Could you share some pseudo-code or some high level design of what you're thinking? I would gladly prototype it up in C then.

drakenclimber on 6 Nov 2020

> Can you elaborate a bit more on that last sentence? I'm not sure I understand what you are trying to convey.

While working on the unit tests in libseccomp-golang, I realised that the following scenario was being tested:

1. In the first execution of the unit test, a seccomp policy was applied without SECCOMP_FILTER_FLAG_TSYNC (meaning it is applied at the thread level and not at the process level) but with SECCOMP_FILTER_FLAG_NEW_LISTENER (so libseccomp will store the fd in state.notify_fd).
2. The same unit test is executed again in the same process but in a different thread (using runtime.LockOSThread in Go to make sure of that). But libseccomp reuses the fd from the previous seccomp filter (from state.notify_fd) instead of getting a new fd for the new filter. The test then fails because it ends up waiting for events on the wrong seccomp fd.

> FWIW, the seccomp notification FD is a process-global object; you can only request it from the kernel once. You can read #273 to get some more background on the issue.

From what I understood, the kernel allows only one seccomp notification FD per filter tree. But in the scenario above, the two threads of the same process use different filter trees, so that should be fine from the kernel's perspective.

This scenario of using different filter trees with SECCOMP_FILTER_FLAG_NEW_LISTENER was not set up on purpose; it is just a consequence of the libseccomp-golang unit tests running in the same process. But since the root cause is libseccomp using a global variable, state.notify_fd, that is shared between threads, I thought I should open this issue here to start the discussion. However, I would be fine if this gets closed as "WONTFIX" (I don't know whether other libseccomp users need support for this kind of scenario). In that case, we can just write the libseccomp-golang unit tests differently (i.e. using a separate process for each test iteration); we would have to do that anyway to avoid mixing thread-level and process-level filters, which the kernel would reject.

alban on 8 Nov 2020

> While working on the unit tests in libseccomp-golang, I realised that the following scenario was being tested:
>
> 1. In the first execution of the unit test, a seccomp policy was applied without SECCOMP_FILTER_FLAG_TSYNC (meaning it is applied at the thread level and not at the process level) but with SECCOMP_FILTER_FLAG_NEW_LISTENER (so libseccomp will store the fd in state.notify_fd).
> 2. The same unit test is executed again in the same process but in a different thread (using runtime.LockOSThread in Go to make sure of that). But libseccomp reuses the fd from the previous seccomp filter (from state.notify_fd) instead of getting a new fd for the new filter. The test then fails because it ends up waiting for events on the wrong seccomp fd.

Ah ha, that makes more sense now. I've often wondered if we should make TSYNC the default for the libseccomp-golang bindings; given the thread ambiguity in Go it seems like a much safer choice.

> From what I understood, the kernel allows only one seccomp notification FD per filter tree. But in the scenario above, the two threads of the same process use different filter trees, so that should be fine from the kernel's perspective.
>
> This scenario of using different filter trees with SECCOMP_FILTER_FLAG_NEW_LISTENER was not set up on purpose; it is just a consequence of the libseccomp-golang unit tests running in the same process. But since the root cause is libseccomp using a global variable, state.notify_fd, that is shared between threads, I thought I should open this issue here to start the discussion. However, I would be fine if this gets closed as "WONTFIX" (I don't know whether other libseccomp users need support for this kind of scenario). In that case, we can just write the libseccomp-golang unit tests differently (i.e. using a separate process for each test iteration); we would have to do that anyway to avoid mixing thread-level and process-level filters, which the kernel would reject.

I'll be interested to get @drakenclimber's opinion on this, but yes my thinking right now is that this is a "WONTFIX" due to it being a rather bizarre corner case. If an application did want to do something like this, they could save the notification fd from filter tree "A", reset libseccomp, then retrieve and save the notification fd from filter tree "B".

In order to "fix" this in libseccomp, we would need to make libseccomp thread aware, which comes with a number of challenges and pitfalls, such that I'm currently of the opinion that this would be a Bad Idea. However, for the sake of argument, if we did make libseccomp thread aware we could also make the internal global state a thread specific state and/or possibly a filter tree specific state; in either of these cases I think the current API (including the seccomp_reset(NULL, ...)) would still be reasonable so I'm tempted to leave things as they are with the API. If anyone has any concerns over that, please let us know soon.

@drakenclimber? Barring any objections to the above, I think we're back on for the v2.5.1 release.

pcmoore on 11 Nov 2020
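
The reuse problem from the original report, and the "save the fd from tree A, reset, retrieve the fd from tree B" sequence suggested in the comment above, can be illustrated with a toy model. All names here are hypothetical and no real libseccomp or kernel calls are made; the fd numbers are simply handed out by a counter. The model also does not capture what the real library does with the previously cached descriptor on reset, which would need checking before relying on a saved fd.

```c
#include <assert.h>

/* Hypothetical model of a library that caches one notification fd
 * process-wide; -1 means "no fd cached yet". */
static int cached_notify_fd = -1;
static int next_kernel_fd = 3; /* stand-in for fds a kernel would hand out */

/* Model of loading a filter with a notify action: a new listener fd
 * is requested only when nothing is cached, so a second caller in
 * another thread would get the first caller's fd back. */
static int model_load_and_get_fd(void)
{
    if (cached_notify_fd < 0)
        cached_notify_fd = next_kernel_fd++;
    return cached_notify_fd;
}

/* Model of a global reset: drop the cached fd from the library state. */
static void model_global_reset(void)
{
    cached_notify_fd = -1;
}

/* The caller keeps its own copy of each tree's fd. */
struct tree_fds {
    int a;
    int b;
};

static struct tree_fds load_two_trees(void)
{
    struct tree_fds fds;

    fds.a = model_load_and_get_fd(); /* filter tree "A" */
    model_global_reset();            /* library forgets A's fd */
    fds.b = model_load_and_get_fd(); /* filter tree "B" gets a fresh fd */
    assert(fds.a != fds.b);
    return fds;
}
```

Without the reset between the two loads, both calls to model_load_and_get_fd() return the same value, which is the behavior the libseccomp-golang tests ran into.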

> I'll be interested to get @drakenclimber's opinion on this, but yes my thinking right now is that this is a "WONTFIX" due to it being a rather bizarre corner case. If an application did want to do something like this, they could save the notification fd from filter tree "A", reset libseccomp, then retrieve and save the notification fd from filter tree "B".

I agree. I spent some time digging through @tych0's original patchset and comments as well as the kernel code itself. His envisioned use case is a monitoring process running the notification handler while containerized processes come and go. When these containerized processes invoke the syscall that has a notification on it, the monitoring process can do additional logic to allow/deny the request.

With that said, I tried to come up with a multi-threaded, single-process use case that has multiple notification invokers and handlers. Honestly, I couldn't coherently concoct such a scenario. Since this is a rather contrived case and I can't figure out a realistic use for it, we should mark this WONTFIX.

One aside - I was able to get the kernel to return multiple notify fds to a process. By creating multiple pthreads and having them all immediately load a seccomp filter with a notification action, occasionally I was able to get two or three different notification fds returned to the user space process. Since this is so unrealistic, I would lean toward marking this kernel issue as a WONTFIX as well.