Libseccomp: BUG: SCMP_CMP_GT/GE/LT/LE not working as expected for negative syscall arguments

Created on 18 Jan 2017  ·  20Comments  ·  Source: seccomp/libseccomp

Hi!

I'm not sure if the current behavior of SCMP_CMP_GT/GE/LT/LE is working as intended or if there is a bug in its implementation. The man page for seccomp_rule_add has only this to say about SCMP_CMP_GT:

SCMP_CMP_GT:
        Matches when the argument value is greater than the datum value,
        example:

        SCMP_CMP( arg , SCMP_CMP_GT , datum )

The man page does not specify the type for datum and has examples for various (implied) types (and one cast to scmp_datum_t).

Based on the man page, I expected something like this to work for any value given to setpriority's 3rd argument (assume default policy of SCMP_ACT_ALLOW for this):

rc = seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM),
        SCMP_SYS(setpriority),
        3,
        SCMP_A0(SCMP_CMP_EQ, PRIO_PROCESS),
        SCMP_A1(SCMP_CMP_EQ, 0),
        SCMP_A2(SCMP_CMP_GT, 0));

Instead, setpriority(PRIO_PROCESS, 0, -1) results in the syscall being blocked when '-1' is obviously less than '0'. setpriority(PRIO_PROCESS, 0, 0) and setpriority(PRIO_PROCESS, 0, 1) work as expected. What is happening is that '-1' is being converted to scmp_datum_t (uint64_t from secomp.h.in) which of course makes it positive, but SCMP_CMP_GT and friends aren't handling this conversion. SCMP_CMP_EQ works just fine with a negative datum (guessing datum is still positive (I didn't verify), but the comparison is between converted scmp_datum_t).

This behavior was confirmed with 2.1.0+dfsg-1 (Ubuntu 14.04 LTS, 3.13 kernel), 2.2.3-3ubuntu3 (Ubuntu 16.04 LTS, 4.9 kernel), 2.3.1-2ubuntu2 (Ubuntu 17.04 dev release, 4.9 kernel) and master from a few moments ago (on Ubuntu 17.04 dev release, 4.9 kernel), all on amd64.

AFAICT, there are no tests for SCMP_CMP_GT and SCMP_CMP_LE. The few tests for SCMP_CMP_LT don't seem to account for negative values and neither does the one for SCMP_CMP_GE (please correct me if I'm wrong).

The question is then: is this behavior intentional? If so, while I admit that it could be argued that the man page is accurate since these are working perfectly correctly when understanding scmp_datum_t is the data type, this situation is not immediately clear and the man page should probably say that applications need to account for this. Otherwise, this appears to be a bug in the implementation for SCMP_CMP_GT/GE/LT/LE.

Here is a small program that demonstrates this issue with SCMP_CMP_GT, though GE, LT and LE can all be observed to have the same behavior:

/*
 * gcc -o test-nice test-nice.c -lseccomp
 * sudo ./test-nice 0 1  # should be denied
 * sudo ./test-nice 0 0  # should be allowed
 * sudo ./test-nice 0 -1 # should be allowed?
 */
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <ctype.h>
#include <string.h>
#include <fcntl.h>
#include <stdarg.h>
#include <seccomp.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "test-nice N N\n");
        return 1;
    }

    int rc = 0;
    scmp_filter_ctx ctx = NULL;
    int filter_n = atoi(argv[1]);
    int n = atoi(argv[2]);

    // Allow everything by default for this test
    ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (ctx == NULL)
        return ENOMEM;

    printf("set EPERM for nice(>%d)\n", filter_n);
    rc = seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM),
            SCMP_SYS(setpriority),
            3,
            SCMP_A0(SCMP_CMP_EQ, PRIO_PROCESS),
            SCMP_A1(SCMP_CMP_EQ, 0),
            SCMP_A2(SCMP_CMP_GT, filter_n));

    if (rc != 0) {
        perror("seccomp_rule_add failed");
        goto out;
    }

    rc = seccomp_load(ctx);
    if (rc != 0) {
        perror("seccomp_load failed");
        goto out;
    }

    // try to use the filtered syscall
    errno = 0;
    printf("Attempting nice(%d)\n", n);
    nice(n);
    if (errno != 0) {
        perror("could not nice");
        if (filter_n > n)
            fprintf(stderr, "nice(%d) unsuccessful. bug?\n", n);
        rc = 1;
        goto out;
    } else
        printf("nice(%d) successful\n", n);

out:
    seccomp_release(ctx);

    return rc;
}
bug prioritmedium

All 20 comments

Thanks for the problem report; that's a good one.

By any chance, have you tried writing the reproducer using the headers/macros in the kernel's samples/seccomp directory?

I was under the impression that the BPF code in the kernel treated the immediate values as signed; that may not be the case, or I may have screwed something up in the libseccomp code.

FWIW, BPF itself uses u32 for its arguments. Does libseccomp do sign extension on compat arguments? (It probably shouldn't, but then rules to match "-1" have to be different between 32-bit and 64-bit...)

The issue that is worrying me right now is the BPF GT/GE comparisons in the jump operator, especially since I suspect most everyone has been treating the BPF immediate as a signed value for these comparisons.

@kees what is the recommended approach to doing signed comparisons of syscall arguments with the kernel's seccomp-bpf machine? I'm hoping it isn't something along the lines of "check the high-bit first and then do the necessary two's compliment conversion before comparing negative numbers". While annoying, we can always change libseccomp to generate the necessary BPF (although the generated filters will now be much larger in some cases), but I worry about the applications which create their own BPF filters; the odds of them handling this correctly are probably not very good.

Unfortunately, since syscall arguments are "unsigned long" (see syscall_get_arguments() and struct seccomp_data), there isn't any common case for how a syscall handles sign conversions. Some syscalls when crossing the compat barrier will do sign extension, others (prctl) don't. Are there a lot of negative-but-not-minus-one syscall arguments?

Getting back to this today, and having played with things a bit more this morning, I think this is going to end up a documentation/"be careful!" issue as there isn't a good solution, especially when we are talking about existing users. Let me try to provide some libseccomp background/explanation to go along with @kees' helpful comments from the kernel side.

FWIW, BPF itself uses u32 for its arguments. Does libseccomp do sign extension on compat arguments? (It probably shouldn't, but then rules to match "-1" have to be different between 32[-bit and 64-bit...)

The libseccomp API rule functions interpret all take immediate values as _uint64_t_ so if you are careless with your types/casting you may run into problems. Example:

$ cat 00-test.c
    /* ... */
    seccomp_rule_add_exact(ctx, SCMP_ACT_KILL, 1000, 1,
                           SCMP_A0(SCMP_CMP_GT, -1));
    seccomp_rule_add_exact(ctx, SCMP_ACT_KILL, 1001, 1,
                           SCMP_A0(SCMP_CMP_GT, (uint32_t)-1));
    seccomp_rule_add_exact(ctx, SCMP_ACT_KILL, 1002, 1,
                           SCMP_A0(SCMP_CMP_GT, 0xffffffff));
    /* ... */
$ make 00-test
  CC       00-test.o
  CCLD     00-test
$ ./00-test -p
  #
  # pseudo filter code start
  #
  # filter for arch x86_64 (3221225534)
  if ($arch == 3221225534)
    # filter for syscall "UNKNOWN" (1002) [priority: 65533]
    if ($syscall == 1002)
      if ($a0.hi32 >= 0)
        if ($a0.lo32 > 4294967295)
          action KILL;
    # filter for syscall "UNKNOWN" (1001) [priority: 65533]
    if ($syscall == 1001)
      if ($a0.hi32 >= 0)
        if ($a0.lo32 > 4294967295)
          action KILL;
    # filter for syscall "UNKNOWN" (1000) [priority: 65533]
    if ($syscall == 1000)
      if ($a0.hi32 >= 4294967295)
        if ($a0.lo32 > 4294967295)
          action KILL;
    # default action
    action ALLOW;
  # invalid architecture action
  action KILL;
  #
  # pseudo filter code end
  # 
$ ./00-test -b | ../tools/scmp_bpf_disasm 
   line  OP   JT   JF   K
  =================================
   0000: 0x20 0x00 0x00 0x00000004   ld  $data[4]
   0001: 0x15 0x00 0x0c 0xc000003e   jeq 3221225534 true:0002 false:0014
   0002: 0x20 0x00 0x00 0x00000000   ld  $data[0]
   0003: 0x35 0x0a 0x00 0x40000000   jge 1073741824 true:0014 false:0004
   0004: 0x15 0x00 0x02 0x000003e8   jeq 1000 true:0005 false:0007
   0005: 0x20 0x00 0x00 0x00000014   ld  $data[20]
   0006: 0x35 0x04 0x06 0xffffffff   jge 4294967295 true:0011 false:0013
   0007: 0x15 0x01 0x00 0x000003e9   jeq 1001 true:0009 false:0008
   0008: 0x15 0x00 0x04 0x000003ea   jeq 1002 true:0009 false:0013
   0009: 0x20 0x00 0x00 0x00000014   ld  $data[20]
   0010: 0x35 0x00 0x02 0x00000000   jge 0    true:0011 false:0013
   0011: 0x20 0x00 0x00 0x00000010   ld  $data[16]
   0012: 0x25 0x01 0x00 0xffffffff   jgt 4294967295 true:0014 false:0013
   0013: 0x06 0x00 0x00 0x7fff0000   ret ALLOW
   0014: 0x06 0x00 0x00 0x00000000   ret KILL

... as we can see, if you use appropriate casting then the value is not sign-extended. However, I expect this is not what most people are doing. The good news is that I imagine the number of syscalls that take negative arguments to be relatively small, so the impact should be somewhat limited.

Moving forward we definitely need to put something in the docs about this and see if we can do something to make life easier for developers, perhaps implement 32-bit variants of the _SCMP_A*_ macros.

@pcmoore - thanks for the detailed response and sorry for not getting back sooner. No, I did not try writing a reproducer based on https://github.com/torvalds/linux/tree/master/samples/seccomp yet, but based on your feedback it sounds like I don't need to. Let me know if you need anything else. For now, I'll take the 'be careful' approach and report back if I have any issues, and look forward to how you might make this easier to get right in the future.

@jdstrand I think we are all set for the time being. Thanks again for the report, I'm sorry I didn't have a better answer for you, but hopefully we'll have something in the future.

In the meantime, if you run into any problems once with a properly type cast value please feel free to update this issue.

The good news is that I imagine the number of syscalls that take negative arguments to be relatively small, so the impact should be somewhat limited.

I just ran into this problem when checking (among other things) whether openat()'s fd parameter is equal to the special value AT_FDCWD which is -100. This lead to:

  # filter for syscall "openat" (257) [priority: 131067]
  if ($syscall == 257)
    if ($a0.hi32 == 4294967295)
      if ($a0.lo32 == 4294967196)
        if ($a2.hi32 & 0x00000000 == 0)
          if ($a2.lo32 & 0x00000003 == 0)
            action ERRNO(2);

Where it should be:

  # filter for syscall "openat" (257) [priority: 131067]
  if ($syscall == 257)
    if ($a0.hi32 == 0)
      if ($a0.lo32 == 4294967196)
        if ($a2.hi32 & 0x00000000 == 0)
          if ($a2.lo32 & 0x00000003 == 0)
            action ERRNO(2);

Since glibc 2.26+ seems to exclusively use the openat syscall with AT_FDCWD to implement open() this might trip up a lot of people. Applying a cast to uint32_t as suggested above fixed the problem for me:

        // selector, action, syscall, no of args, args
        { SEL, SCMP_ACT_ERRNO(ENOENT), "openat", 2,
-               { SCMP_A0(SCMP_CMP_EQ, AT_FDCWD), /* glibc 2.26+ */
+               { SCMP_A0(SCMP_CMP_EQ, (uint32_t)AT_FDCWD), /* glibc 2.26+ */
                  SCMP_A2(SCMP_CMP_MASKED_EQ, O_ACCMODE, O_RDONLY) }},

An explicit SCMP_A0_U32 sure would be nice to have.

@drakenclimber @jdstrand @michaelweiser what do you guys think of https://github.com/pcmoore/misc-libseccomp/commit/b9ce39d776ed5a984c7e9e6db3b87463edce82a7 as a fix for this?

@pcmoore: Thank you for continuing to look into this! I just gave it a whirl and it looks really nice in code:

static struct {
        const uint64_t promises;
        const uint32_t action;
        const char *syscall;
        const int arg_cnt;
        const struct scmp_arg_cmp args[3];
} scsb_calls[] = {
[...]
        { PLEDGE_WPATH, SCMP_ACT_ALLOW, "openat", 2, /* glibc 2.26+ */
                { SCMP_A0_32(SCMP_CMP_EQ, AT_FDCWD),
                  SCMP_A2_64(SCMP_CMP_MASKED_EQ, O_ACCMODE, O_WRONLY) }},

Unfortunatey it seems the helper function isn't suitable as a struct initializer:

In file included from pledge.c:42:
/include/seccomp.h:230:26: error: initializer element is not constant
 #define SCMP_CMP32(...)  (__scmp_arg_32(SCMP_CMP64(__VA_ARGS__)))
                          ^
/include/seccomp.h:241:26: note: in expansion of macro ‘SCMP_CMP32’
 #define SCMP_A0_32(...)  SCMP_CMP32(0, __VA_ARGS__)
                          ^~~~~~~~~~
pledge.c:188:5: note: in expansion of macro ‘SCMP_A0_32’
   { SCMP_A0_32(SCMP_CMP_EQ, AT_FDCWD),
     ^~~~~~~~~~
/include/seccomp.h:230:26: note: (near initialization for ‘scsb_calls[21].args[0]’)
 #define SCMP_CMP32(...)  (__scmp_arg_32(SCMP_CMP64(__VA_ARGS__)))
                          ^
/include/seccomp.h:241:26: note: in expansion of macro ‘SCMP_CMP32’
 #define SCMP_A0_32(...)  SCMP_CMP32(0, __VA_ARGS__)
                          ^~~~~~~~~~
pledge.c:188:5: note: in expansion of macro ‘SCMP_A0_32’
   { SCMP_A0_32(SCMP_CMP_EQ, AT_FDCWD),
     ^~~~~~~~~~

Thanks for the review @michaelweiser, unfortunately I didn't think people were using this macros as initializers, but that is a valid use and surely there are a few people using it this way.

I'll need to think about this for a little bit ... did you have any ideas about how to solve this in an elegant manner?

No idea, sorry, I already had my eyes propped open with matches. :)

Looking at it now I think I see the problem: Due to the variable argument lists you can't inject the necessary casts, right?

Could scmp_arg_cmp perhaps contain a union giving different views on the data with the correct width, alignment (and perhaps even byte order) (which IMO kinda conflicts with "elegant"):? If it's purely internal to libseccomp and doesn't need to be compatible to a kernel interface, could it carry a data type indicator as a separate field and have the user functions sort it out? And could that even be initialised using varargs?

Otherwise, instead of marking the operation as a whole 32/64 bit, could the operands be annotated, wrapping a cast and giving stern recommendation to the user (as you do) to always use those annotations on penalty of running into hard to debug problems?

{ SCMP_A0(SCMP_CMP_EQ, SCMP_OP_32(AT_FDCWD)),
  SCMP_A2(SCMP_CMP_MASKED_EQ, SCMP_OP_64(O_ACCMODE), SCMP_OP_64(O_WRONLY)) }},

or

{ SCMP_A0(SCMP_CMP_EQ, SCMP_OP1_32(AT_FDCWD)),
  SCMP_A2(SCMP_CMP_MASKED_EQ, SCMP_OP2_64(O_ACCMODE, O_WRONLY)) }},

I'm not enough of a preprocessor crack to come up with much more, sorry.

@pcmoore, the changes look good to me. I'm no preprocessor expert, but I'll take a look at the issue that @michaelweiser mentioned above

Looking at it now I think I see the problem: Due to the variable argument lists you can't inject the necessary casts, right?

Yep, that's pretty much it. Maybe there is a non-terrible way around that, but I haven't found one yet.

Could scmp_arg_cmp perhaps contain a union giving different views on the data with the correct width, alignment (and perhaps even byte order) (which IMO kinda conflicts with "elegant"):? If it's purely internal to libseccomp and doesn't need to be compatible to a kernel interface, could it carry a data type indicator as a separate field and have the user functions sort it out? And could that even be initialised using varargs?

We have a problem in that the scmp_arg_cmp struct is part of the libseccomp API so unless we want to bump the libseccomp major version we can't really change size of the struct or the offset of any of the member fields; doing so would break the existing binary interface with existing applications. Converting the 64-bit datum fields into a union containing either a 64-bit or 32-bit value should be okay in itself, but you would also need to add some additional information to the scmp_arg_cmp struct to indicate which of the union member to use; it is this extra flag which could be problematic.

It might be possible to steal some bits from the either the "arg" or "op" fields, both are 32-bit values and are using only a fraction of that space. However, I consider that a rather extreme option and I'd like to avoid that if possible.

Otherwise, instead of marking the operation as a whole 32/64 bit, could the operands be annotated, wrapping a cast and giving stern recommendation to the user (as you do) to always use those annotations on penalty of running into hard to debug problems?

I don't quite understand what we would gain by wrapping the operands with a macro, can you elaborate a bit more? We could provide a macro to wrap the datum values, but that isn't really any different than just asking callers to provide the proper casting.

@pcmoore, the changes look good to me. I'm no preprocessor expert, but I'll take a look at the issue that @michaelweiser mentioned above

Great, thanks. Hopefully between the three of us we can come up with something useful here.

@pcmoore: Looking at http://efesx.com/2010/07/17/variadic-macro-to-count-number-of-arguments/ and http://efesx.com/2010/08/31/overloading-macros/ I come up with the following:

#define VA_NUM_ARGS(...) VA_NUM_ARGS_IMPL(__VA_ARGS__, 5,4,3,2,1)
#define VA_NUM_ARGS_IMPL(_1,_2,_3,_4,_5,N,...) N
#define macro_dispatcher(func, ...) \
            macro_dispatcher_(func, VA_NUM_ARGS(__VA_ARGS__))
#define macro_dispatcher_(func, nargs) \
            macro_dispatcher__(func, nargs)
#define macro_dispatcher__(func, nargs) \
            func ## nargs

#define SCMP_CMP64(...)         ((struct scmp_arg_cmp){__VA_ARGS__})

#define SCMP_CMP32_1(x)                 SCMP_CMP64(x)
#define SCMP_CMP32_2(x, y)              SCMP_CMP64(x, y)
#define SCMP_CMP32_3(x, y, z)           SCMP_CMP64(x, y, (uint32_t)(z))
#define SCMP_CMP32_4(x, y, z, q)        SCMP_CMP64(x, y, (uint32_t)(z), (uint32_t)(q))
#define SCMP_CMP32(...) macro_dispatcher(SCMP_CMP32_, __VA_ARGS__)(__VA_ARGS__)

#define SCMP_A0_64(...)         SCMP_CMP64(0, __VA_ARGS__)
#define SCMP_A0_32(...)         SCMP_CMP32(0, __VA_ARGS__)

For this test case:

        struct scmp_arg_cmp f[] = {
                SCMP_A0_64(SCMP_CMP_EQ, 1, 20),
                SCMP_A0_32(SCMP_CMP_EQ, 2, 3),
                SCMP_A0_32(SCMP_CMP_LT, 2),
        };

it comes out of gcc-7.4.0 -E and clang-7 -E as:

 struct scmp_arg_cmp f[] = {
  ((struct scmp_arg_cmp){0, SCMP_CMP_EQ, 1, 20}),
  ((struct scmp_arg_cmp){0, SCMP_CMP_EQ, (uint32_t)(2), (uint32_t)(3)}),
  ((struct scmp_arg_cmp){0, SCMP_CMP_LT, (uint32_t)(2)}),
 };

Under the assumption that SCMP_A[0-5]_43 needs at least op to work and SCMP_CMP32 requires arg, two lines can be saved by making those parameters positional:

#define SCMP_CMP32_1(x, y, z)           SCMP_CMP64(x, y, (uint32_t)(z))
#define SCMP_CMP32_2(x, y, z, q)        SCMP_CMP64(x, y, (uint32_t)(z), (uint32_t)(q))
#define SCMP_CMP32(x, y,...)            macro_dispatcher(SCMP_CMP32_, __VA_ARGS__)(x, y, __VA_ARGS__)

#define SCMP_A0_32(x,...)       SCMP_CMP32(0, x, __VA_ARGS__)

Well done @michaelweiser! Did you want to put together a PR so we can review/comment on the changes a bit easier? If not, that's perfectly fine, I'll throw one together and make sure you get plenty of credit :)

I'll make the PR tonight's project. On top of https://github.com/pcmoore/misc-libseccomp/commit/b9ce39d776ed5a984c7e9e6db3b87463edce82a7 or from scratch?
How do we credit the Blogger Roman for his overloading solution? Found what seems to be the current home of his blog at https://kecher.net/overloading-macros/. His post looks like quite the original source. Comment with link to the post above the macro_dispatcher logic?

I'll make the PR tonight's project. On top of pcmoore@b9ce39d or from scratch?

Great, thank you! Go ahead and base it off of the master branch, I never merged the stuff in my misc-libseccomp tree, and don't plan to at this point since your approach is much better.

How do we credit the Blogger Roman for his overloading solution? Found what seems to be the current home of his blog at https://kecher.net/overloading-macros/. His post looks like quite the original source. Comment with link to the post above the macro_dispatcher logic?

We generally don't credit people directly in the source, unless there is some license requirement; I would recommend adding a comment in the in the patch description crediting Roman with the basic idea and providing a link to his blog post. I don't see any license or restrictions placed on his examples, so I don't believe there are any problems in that regard, and based on a sampling of his blog I believe his intent is to share these ideas with others (like us) to help them solve their problems. If you have an email address for Roman you could always try sending him some email; if we can't reach him for any reason I think it's okay to proceed.

Resolved via 80a987d6f8d0152def07fa90ace6417d56eea741.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

drakenclimber picture drakenclimber  ·  18Comments

erdumbledore picture erdumbledore  ·  3Comments

grubeli picture grubeli  ·  3Comments

mvo5 picture mvo5  ·  18Comments

drakenclimber picture drakenclimber  ·  10Comments