Libseccomp: RFE: Support "maximum kernel version"

Created on 21 Aug 2015  ·  14 Comments  ·  Source: seccomp/libseccomp

As system calls are added to the kernel, I feel there is not enough discussion of the fact that a wide variety of applications will, by default, suddenly gain access to a new attack surface.

The canonical example here is perf_event_open(), the source of numerous CVEs. While perf is awesome, my web server (for example) should not be able to use it by default.

It's possible to use seccomp today to blacklist specific syscalls, but a blacklist does not cover syscalls added in later kernels, and whitelists can get very difficult to manage.

One thing that might be useful is a filter for any system calls newer than a particular kernel version, say 3.10. That way, each new system call would have to be verified for use in e.g. containers before it's added. Upgrading the kernel wouldn't suddenly expose containers to new attack surface.

In a discussion with @pcmoore, he indicated this could be another annotation in the struct in, e.g., arch-x86-syscalls.c.
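A minimal sketch of the "maximum kernel version" idea, assuming a hypothetical syscall table where each entry carries a "first introduced" annotation. The struct, the KVER/KVER_UNDEF macros, the table contents and syscalls_newer_than() are illustrative only and are not libseccomp's API. KVER packs a version the same way Linux's KERNEL_VERSION() macro does, and unknown versions are treated as "always existed".

```c
#include <assert.h>
#include <stddef.h>

/* Pack a kernel version the same way Linux's KERNEL_VERSION() macro does. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))
/* Hypothetical sentinel: version unknown, treated as "always existed". */
#define KVER_UNDEF 0

/* Hypothetical table entry: a syscall name plus a "first introduced" annotation. */
struct syscall_def {
	const char *name;
	int kver_min;
};

/* Illustrative subset only; not libseccomp's real table. */
static const struct syscall_def table[] = {
	{ "read",            KVER_UNDEF },
	{ "accept",          KVER_UNDEF },
	{ "perf_event_open", KVER(2, 6, 31) },
	{ "memfd_create",    KVER(3, 17, 0) },
	{ "bpf",             KVER(3, 18, 0) },
	{ "openat2",         KVER(5, 6, 0) },
};
#define TABLE_LEN (sizeof(table) / sizeof(table[0]))

/* Store up to out_cap names of syscalls introduced after kver_max in out and
 * return the total number found. A "maximum kernel version" filter would
 * deny exactly these and leave everything older to the normal policy. */
static size_t syscalls_newer_than(int kver_max, const char **out, size_t out_cap)
{
	size_t n = 0;

	assert(out != NULL || out_cap == 0);
	for (size_t i = 0; i < TABLE_LEN; i++) {
		if (table[i].kver_min > kver_max) {
			if (n < out_cap)
				out[n] = table[i].name;
			n++;
		}
	}
	return n;
}
```

With the 3.10 cutoff suggested above, this returns memfd_create, bpf and openat2. Because perf_event_open dates from 2.6.31, a version cutoff would complement explicit denial of known-risky syscalls rather than replace it.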

Labels: enhancement, pending/info, priority/medium

All 14 comments

+1
That would help make blacklists usable for mitigation of security issues.

@nmav to be clear, this RFE is about adding information to the internal syscall tables that records when each syscall was first introduced to the Linux kernel. It is not about adding logic to determine whether the currently running kernel supports a given syscall. However, if you are trying to block a syscall, you can do so with libseccomp regardless of whether it is supported on a particular arch/ABI and kernel version; libseccomp will do the right thing for you.

This RFE is almost five years old, and outside of a single discussion with @cgwalters I haven't seen or heard of much other interest in such a feature. With plenty of other open issues, most with higher priority, it is not clear when we would work on this, or even if such a thing would be a useful addition.

@cgwalters and @drakenclimber what do you think of this issue in 2020? I'm tempted to close this as WONTFIX, but I would like to get some comments and feedback before we take that step.

Honestly I think this is a really cool idea. Several of my in-house customers are using allowlists because of this exact reason. If they were to use a denylist and a new syscall is added to the kernel, then that syscall would be another avenue of attack.

Let's leave it open for a bit longer. I'll ask around within Oracle and see if any customers are interested enough in this feature for me to pick it up. But @cgwalters (or anyone else for that matter) is totally welcome to own it if they have the time and interest :).

Okay, as long as there is interest, I've got no problem in keeping this one open.

I do still think it'd be useful!

It looks like issue #286 is the concrete issue to help drive this work forward ... even if it has been almost five years ;)

I think the first step towards this is to add a new field to the syscalls.csv file that indicates when the syscall was first introduced. That is going to be a good chunk of work, as we currently have ~469 syscalls defined (!). However, we could amortize this work for the existing syscalls with an "undefined" value that we would treat simply as the syscall being created at the dawn of time. Of course, all new additions to the syscalls.csv table would need to include the kernel version.

Some more quick thoughts:

  • syscalls.csv format

      #syscall (v5.8.0-rc5 2020-07-14),kver_min,x86,x86_64,...
      accept,<version>,PNR,43,...

    ... where <version> could be something like "5_8", "UNDEF", or similar.

  • version tokens

      enum kernel_version {
          KV_UNDEF = 0,
          KV_1_0,
          KV_1_1,
          KV_1_3,
          KV_2_0,
          ...
          KV_5_8,
          _KV_MAX,
      };
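A minimal sketch of reading the proposed kver_min column, assuming the "5_8" / "UNDEF" token format floated above. Instead of the KV_* enum, this sketch packs the major and minor numbers into an integer. That preserves the same ordering property without needing a new enum entry for every release. csv_kver_field(), parse_kver_token() and kver_allowed() are hypothetical helpers, not libseccomp code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pack major/minor so later kernels compare greater; 0 is reserved for UNDEF. */
#define KV(major, minor) (((major) << 8) | (minor))
#define KV_UNDEF 0

/* Copy the second comma-separated field (kver_min) of a syscalls.csv row
 * into buf. Returns 0 on success, -1 if the field is missing or too long. */
static int csv_kver_field(const char *row, char *buf, size_t len)
{
	const char *start, *end;
	size_t n;

	assert(row != NULL && buf != NULL);
	start = strchr(row, ',');
	if (start == NULL)
		return -1;
	start++;
	end = strchr(start, ',');
	n = end != NULL ? (size_t)(end - start) : strlen(start);
	if (n >= len)
		return -1;
	memcpy(buf, start, n);
	buf[n] = '\0';
	return 0;
}

/* Parse a kver_min token such as "5_8" or "UNDEF". Returns the packed
 * version, KV_UNDEF for "UNDEF", or -1 if the token is malformed. */
static int parse_kver_token(const char *tok)
{
	int major, minor, consumed = 0;

	assert(tok != NULL);
	if (strcmp(tok, "UNDEF") == 0)
		return KV_UNDEF;
	if (sscanf(tok, "%d_%d%n", &major, &minor, &consumed) != 2 ||
	    tok[consumed] != '\0' ||
	    major < 1 || major > 255 || minor < 0 || minor > 255)
		return -1;
	return KV(major, minor);
}

/* UNDEF (0) sorts before every real version, so syscalls of unknown age are
 * treated as existing since the dawn of time and pass any valid cutoff. */
static int kver_allowed(int kver_min, int kver_max)
{
	return kver_min <= kver_max;
}
```

The enum in the proposal has the advantage that a typo in a token fails at build time. The packed integer, by contrast, only needs the parser to accept a new release.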

Is kernel version the right thing to track? Is it guaranteed that newer syscalls are not backported to e.g. stable kernel branches with a lower version number?

Red Hat will backport all kinds of things to their kernels, so no.

If Red Hat backports a syscall to an older kernel version, they can also patch their version of libseccomp to match. To be fair, this might matter more for certain use cases, but as an approach to fixing #286 I think it's fairly workable. The other problem is that I'm not sure there's any better approach: syscalls are not necessarily added in numerical order (for instance, openat2 was merged before close_range even though close_range has the lower syscall number; this example is kind of my fault), so the syscall number cannot stand in for the kernel version.
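The ordering problem can be checked mechanically with a small sketch. It uses a hypothetical helper that walks a table sorted by syscall number and reports whether the introduction versions ever go backwards. The two entries are the openat2/close_range example above, with their x86_64 numbers and the mainline releases that first shipped them. Everything else in the snippet is illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define KV(major, minor) (((major) << 8) | (minor))

struct syscall_info {
	const char *name;
	int nr;       /* x86_64 syscall number */
	int kver_min; /* first mainline release containing the syscall */
};

/* Sorted by syscall number: close_range has the lower number, yet it was
 * merged in 5.9, three releases after openat2 (5.6). */
static const struct syscall_info recent[] = {
	{ "close_range", 436, KV(5, 9) },
	{ "openat2",     437, KV(5, 6) },
};

/* Given a table sorted by syscall number, return 1 if the introduction
 * versions never decrease (so the number could stand in for the version),
 * or 0 as soon as a higher number turns out to be an older syscall. */
static int numbers_track_versions(const struct syscall_info *t, size_t len)
{
	assert(t != NULL || len == 0);
	for (size_t i = 0; i + 1 < len; i++) {
		if (t[i].kver_min > t[i + 1].kver_min)
			return 0;
	}
	return 1;
}
```

Running it on the two-entry table returns 0, which is why an explicit per-syscall version field is needed.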

> If Red Hat backports a syscall to an older kernel version, they can also patch their version of libseccomp to match.

Yes, exactly. The upstream libseccomp project has no control over the various enterprise Linux distributions and if those distributions decide to deviate from the upstream projects (either the Linux Kernel or libseccomp) they are on their own for support. While we will do our best to help, we can't sacrifice the upstream project in favor of these enterprise distributions with their own support and engineering staff.

As a point of reference, the syscalls(2) manpage has some historical information about when various syscalls were introduced into the kernel.

I can do the syscall spelunking to figure out a version number for each syscall -- the only question is whether we should have the version number be per-architecture since I'm pretty sure certain syscalls were added to different architectures in different releases.

> ... the only question is whether we should have the version number be per-architecture since I'm pretty sure certain syscalls were added to different architectures in different releases.

They most definitely were, and still are, as far as I can see. While it is going to be slightly annoying, and will definitely explode the CSV, tracking the syscall's first appearance for each arch/ABI is probably the right thing to do.
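A minimal sketch of what per-arch/ABI tracking could look like. It assumes a hypothetical table with one "first seen" version per architecture column, plus a PNR-style sentinel for syscalls that an ABI does not provide. Only x86_64 and aarch64 columns are shown. The enum, struct, macros and syscall_kver() are illustrative, not libseccomp code. The data points are: open(2) does not exist on arm64, openat(2) arrived in Linux 2.6.16, and arm64 itself was merged in 3.7.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#define KVER_UNDEF 0    /* version unknown: treat as "always existed" */
#define KVER_PNR   (-1) /* syscall not provided on this arch/ABI */

/* Hypothetical arch/ABI index; a real table would have many more columns. */
enum arch { ARCH_X86_64, ARCH_AARCH64, ARCH_COUNT };

/* One row of a hypothetical per-arch table: a first-seen version per ABI. */
struct syscall_row {
	const char *name;
	int kver_min[ARCH_COUNT];
};

static const struct syscall_row rows[] = {
	/* arm64 never had open(2); it only provides openat(2) */
	{ "open",   { KVER_UNDEF,     KVER_PNR } },
	/* openat(2) arrived in 2.6.16; arm64 syscalls date from its 3.7 merge */
	{ "openat", { KVER(2, 6, 16), KVER(3, 7, 0) } },
};

/* Look up the first kernel version providing name on arch.
 * Returns KVER_PNR if the syscall is unknown or absent on that arch. */
static int syscall_kver(const char *name, enum arch arch)
{
	assert(name != NULL && arch < ARCH_COUNT);
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		if (strcmp(rows[i].name, name) == 0)
			return rows[i].kver_min[arch];
	return KVER_PNR;
}
```

A filter built for one arch/ABI would consult only its own column. This is why the same syscall can fall on different sides of a version cutoff on different architectures.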

Any help you can provide on this @cyphar would be greatly appreciated.
