Libseccomp: BUG: A2 Handling Broken by src/db.c Rework

Created on 26 Feb 2018 · 18 Comments · Source: seccomp/libseccomp

To test out my proposed binary tree performance improvements, I wrote an unrealistic set of rules for read() and its buffer size argument (A2). But it appears that the src/db.c rework commit (ce3dda9a1) broke the A2 processing - at least for this test case.

Prior to the db rework commit, a read like the following - read(devzero_fd, buf, 8000) - returned -10. After this commit, it now returns -5.

Here's the C code I used to generate my silly read() rules:

        /* read */
        for (i = 5; i <= 12; i++) {
                rc = seccomp_rule_add(ctx, SCMP_ACT_ERRNO(i), SCMP_SYS(read), 1,
                        SCMP_A2(SCMP_CMP_GT, 4 << i));
                if (rc < 0) {
                        fprintf(stdout, "%s:%d Failed to add read rule %d : rc = %d\n",
                                __FUNCTION__, __LINE__, i, rc);
                        goto error;
                }   
        }   
        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 1,
                SCMP_A2(SCMP_CMP_LE, 64));
        if (rc < 0) {
                fprintf(stdout, "%s:%d Failed to add read allow rule : rc = %d\n",
                        __FUNCTION__, __LINE__, rc);
                goto error;
        }
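For reference, the loop above pairs each errno value i with the threshold `4 << i`, so i = 5..12 gives 128 through 16384. Here is a minimal sketch that tabulates the mapping; the helper names `read_rule_threshold` and `print_read_rule_table` are made up for illustration and are not part of libseccomp:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: the A2 threshold the loop pairs with ERRNO(i). */
static uint32_t read_rule_threshold(int i)
{
        return (uint32_t)4 << i;
}

/* Print the errno -> threshold table for i = 5..12. */
static void print_read_rule_table(void)
{
        for (int i = 5; i <= 12; i++)
                printf("ERRNO(%d): read size > %u\n", i,
                       (unsigned int)read_rule_threshold(i));
}
```

A read of 8000 bytes exceeds 4096 but not 8192, which is why ERRNO(10) is the result when the largest matching threshold wins.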

And here's the PFC it generated:

  # filter for syscall "read" (0) [priority: 65525]
  if ($syscall == 0)
    if ($a2.hi32 >= 0)
      if ($a2.lo32 > 64)
      else
        action ALLOW;
      if ($a2.lo32 > 16384)
        action ERRNO(12);
      if ($a2.lo32 > 8192)
        action ERRNO(11);
      if ($a2.lo32 > 4096)
        action ERRNO(10);
      if ($a2.lo32 > 2048)
        action ERRNO(9);
      if ($a2.lo32 > 1024)
        action ERRNO(8);
      if ($a2.lo32 > 512)
        action ERRNO(7);
      if ($a2.lo32 > 256)
        action ERRNO(6);
      if ($a2.lo32 > 128)
        action ERRNO(5);
    else
      action ALLOW;
  # default action
  action ERRNO(34);

Labels: bug, priority/high


drakenclimber

All 18 comments

By the way, I'll do what I can to help root cause it

drakenclimber on 26 Feb 2018

Using scmp_bpf_disasm, I can see that the latest libseccomp is putting the jumps in the incorrect order:

HEAD

 0014: 0x25 0x11 0x00 0x00000080   jgt 128  true:0032 false:0015
 0015: 0x25 0x0f 0x00 0x00000100   jgt 256  true:0031 false:0016
 0016: 0x25 0x0d 0x00 0x00000200   jgt 512  true:0030 false:0017
 0017: 0x25 0x0b 0x00 0x00000400   jgt 1024 true:0029 false:0018
 0018: 0x25 0x09 0x00 0x00000800   jgt 2048 true:0028 false:0019
 0019: 0x25 0x07 0x00 0x00001000   jgt 4096 true:0027 false:0020
 0020: 0x25 0x05 0x00 0x00002000   jgt 8192 true:0026 false:0021
 0021: 0x25 0x03 0x00 0x00004000   jgt 16384 true:0025 false:0022
 0022: 0x25 0x01 0x00 0x00000040   jgt 64   true:0024 false:0023
 0023: 0x06 0x00 0x00 0x7fff0000   ret ALLOW

pre-rework

 0014: 0x25 0x01 0x00 0x00000040   jgt 64   true:0016 false:0015
 0015: 0x06 0x00 0x00 0x7fff0000   ret ALLOW
 0016: 0x25 0x0f 0x00 0x00004000   jgt 16384 true:0032 false:0017
 0017: 0x25 0x0d 0x00 0x00002000   jgt 8192 true:0031 false:0018
 0018: 0x25 0x0b 0x00 0x00001000   jgt 4096 true:0030 false:0019
 0019: 0x25 0x09 0x00 0x00000800   jgt 2048 true:0029 false:0020
 0020: 0x25 0x07 0x00 0x00000400   jgt 1024 true:0028 false:0021
 0021: 0x25 0x05 0x00 0x00000200   jgt 512  true:0027 false:0022
 0022: 0x25 0x03 0x00 0x00000100   jgt 256  true:0026 false:0023
 0023: 0x25 0x01 0x00 0x00000080   jgt 128  true:0025 false:0024
 0024: 0x06 0x00 0x00 0x00050022   ret ERRNO(34)
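To read the two listings: the four hex columns are the classic BPF instruction fields (opcode, jump-if-true offset, jump-if-false offset, immediate), and the offsets are relative to the next instruction. A small sketch of that arithmetic follows; the struct and function names are my own, and the layout is assumed to mirror `struct sock_filter` from `<linux/filter.h>`:

```c
#include <stdint.h>

/* One classic BPF instruction, in the column order the disassembler prints. */
struct bpf_insn_fields {
        uint16_t code; /* opcode */
        uint8_t jt;    /* offset taken when the comparison is true */
        uint8_t jf;    /* offset taken when the comparison is false */
        uint32_t k;    /* immediate operand, here the A2 threshold */
};

/* 0x25 == BPF_JMP (0x05) | BPF_JGT (0x20) | BPF_K (0x00) */
#define OP_JGT_K 0x25

/* Jump offsets are counted from the instruction after the current one. */
static unsigned int jump_target(unsigned int pc, uint8_t offset)
{
        return pc + 1 + offset;
}
```

For example, instruction 0014 in the HEAD listing has jt = 0x11 and jf = 0x00, so it jumps to 14 + 1 + 17 = 32 when A2 > 128 and falls through to 0015 otherwise, matching the `true:`/`false:` annotations.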

drakenclimber on 27 Feb 2018

Interesting. So the PFC appears to be "correct", but the generated BPF is ... backwards. Odd. Especially given that the commit didn't change the BPF generation code.

I wonder if the priority values are getting messed up somehow?

pcmoore on 27 Feb 2018

Sorry for the ambiguity. The PFC generated after the rework is also in the incorrect order; the PFC I posted above shows the order it was in prior to the db.c rework.

Here's the PFC currently generated by HEAD:

  # filter for syscall "read" (0) [priority: 65525]
  if ($syscall == 0)
    if ($a2.hi32 >= 0)
      if ($a2.lo32 > 128)
        action ERRNO(5);
      if ($a2.lo32 > 256)
        action ERRNO(6);
      if ($a2.lo32 > 512)
        action ERRNO(7);
      if ($a2.lo32 > 1024)
        action ERRNO(8);
      if ($a2.lo32 > 2048)
        action ERRNO(9);
      if ($a2.lo32 > 4096)
        action ERRNO(10);
      if ($a2.lo32 > 8192)
        action ERRNO(11);
      if ($a2.lo32 > 16384)
        action ERRNO(12);
      if ($a2.lo32 > 64) 
      else
        action ALLOW;
    else
      action ALLOW;
  # default action
  action ERRNO(34);
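The difference between the two PFC listings can be reproduced with a simple first-match walk over the greater-than rules. This is only a sketch of the evaluation order, not libseccomp code; the type and function names are assumptions, and the `> 64` / ALLOW branch is left out because it does not affect the 8000-byte case:

```c
#include <stddef.h>
#include <stdint.h>

struct gt_rule {
        uint32_t threshold; /* rule matches when a2 > threshold */
        int errno_val;      /* value passed to ERRNO() */
};

/* Walk the chain top to bottom and return the first matching errno,
 * or 0 when no greater-than rule matches. */
static int first_match(const struct gt_rule *rules, size_t n, uint32_t a2)
{
        for (size_t i = 0; i < n; i++)
                if (a2 > rules[i].threshold)
                        return rules[i].errno_val;
        return 0;
}

/* Order emitted before the db.c rework: largest threshold first. */
static const struct gt_rule pre_rework[] = {
        {16384, 12}, {8192, 11}, {4096, 10}, {2048, 9},
        {1024, 8}, {512, 7}, {256, 6}, {128, 5},
};

/* Order emitted by HEAD: smallest threshold first. */
static const struct gt_rule head[] = {
        {128, 5}, {256, 6}, {512, 7}, {1024, 8},
        {2048, 9}, {4096, 10}, {8192, 11}, {16384, 12},
};
```

With a2 = 8000, `first_match(pre_rework, 8, 8000)` stops at the 4096 rule and yields 10, while `first_match(head, 8, 8000)` stops at the very first rule and yields 5, which lines up with the -10 versus -5 results reported at the top of the issue.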

drakenclimber on 27 Feb 2018

Okay, that makes a bit more sense. The problem definitely lives somewhere in the db layer.

It is a bit funny how it is exactly backwards.

pcmoore on 27 Feb 2018

I found the issue. In the chain argument management, the behavior of lvl_nxt and lvl_prv swapped after the massive db.c rework. A couple small changes to _db_tree_add() made it match previous libseccomp behavior.

Here's a branch with the fix
https://github.com/drakenclimber/libseccomp/tree/issues/112

I'll clean up the changes, add a test or two, and ensure the code coverage is up to snuff.
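To illustrate why swapping the meaning of the sibling links mirrors a whole level, here is a rough sketch of an ordered insert into a doubly linked level. The struct is a hypothetical, stripped-down stand-in that only borrows the lvl_prv/lvl_nxt names; it is not the real db node layout or the real _db_tree_add() logic:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: only the sibling links and the compared value. */
struct lvl_node {
        uint32_t datum;
        struct lvl_node *lvl_prv; /* sibling evaluated before this one */
        struct lvl_node *lvl_nxt; /* sibling evaluated after this one */
};

/* Insert n into the level so that larger datums come first.
 * Returns the (possibly new) head of the level. */
static struct lvl_node *lvl_insert_desc(struct lvl_node *head,
                                        struct lvl_node *n)
{
        struct lvl_node *cur = head, *prev = NULL;

        /* Skip every sibling that must be evaluated before n. */
        while (cur != NULL && cur->datum > n->datum) {
                prev = cur;
                cur = cur->lvl_nxt;
        }

        n->lvl_prv = prev;
        n->lvl_nxt = cur;
        if (cur != NULL)
                cur->lvl_prv = n;
        if (prev != NULL)
                prev->lvl_nxt = n;

        return prev == NULL ? n : head;
}
```

Flipping the comparison in the while loop, or walking lvl_prv where lvl_nxt was intended, produces the same level in exactly the opposite order, which is the kind of mirror image seen in the PFC output.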

drakenclimber on 28 Feb 2018

I found some time this morning, likely just before you posted the above, and decided to look into this a bit. It looks like we arrived at pretty much the same conclusion, although the fixes are slightly different. Here is my current fix, although like yours it needs some additional work/cleanup:

https://gist.github.com/pcmoore/f644341a85c6ad7131a26f68f99e3fc6

I'm not sure which approach I like more right now; I need to think on this a bit. Thoughts?

pcmoore on 28 Feb 2018

Hmmm... I won't lie; I'm not enamored with either fix at this point.

Mine is simple-ish, but it completely ignored _db_tree_prune() which - like you said in your gist - probably has similar issues.

I like your idea to rework the gt() macro to utilize the lt() and eq() macros, but they're getting unwieldy - especially lt(). Is there any reason not to convert lt() to an inline function?
EDIT - I just noticed you made a similar comment in the gist.

I ran gdb against the old libseccomp and HEAD, and the behaviors of lvl_prv and lvl_nxt did change, but perhaps that isn't a big deal since it's an internal variable that no one should see but us.

I guess after all this rambling... I don't know. I agree, I need to have a think on it ;)

drakenclimber on 28 Feb 2018

> Hmmm... I won't lie; I'm not enamored with either fix at this point.

I'm worried that there are some subtle bugs with reordering a tree level like this, although it does seem like perhaps the level was reordered by that previous commit and this is one of the subtle bugs.

Either way, I want to understand what the desired ordering should be for a level: "biggest" first, or "biggest" last? Once we understand that then we can move forward with testing/fixing. I think the answer, if for no other reason than compatibility with previous 2.x releases, is "biggest" first, but I can't say that for certain at this point.

> Mine is simple-ish, but it completely ignored _db_tree_prune() which - like you said in your gist - probably has similar issues.

They both basically do the same thing in principle; mine goes a bit further by adding some additional conditions and cleaning up the db_chain_lt(x,y) macro.

> I like your idea to rework the gt() macro to utilize the lt() and eq() macros, but they're getting unwieldy - especially lt(). Is there any reason not to convert lt() to an inline function?

Mostly historical reasons. They started life as much simpler macros, but they have grown quite a bit to the point where I think they probably should be functions. I think it would also be good to evaluate if they really need to be in the header file, I believe they are only used by src/db.c.
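As a rough idea of what the macro-to-function conversion could look like, here is a sketch with gt() built from lt() and eq(). The struct is a simplified, hypothetical stand-in, and the ordering (argument, then operator, then datum) is an assumption for illustration; the real db_chain_lt() conditions may differ:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a chain node. */
struct chain_node {
        unsigned int arg; /* syscall argument index */
        unsigned int op;  /* comparison operator */
        uint64_t datum;   /* value compared against */
};

static inline bool chain_eq(const struct chain_node *x,
                            const struct chain_node *y)
{
        return x->arg == y->arg && x->op == y->op && x->datum == y->datum;
}

/* Lexicographic order: argument, then operator, then datum. */
static inline bool chain_lt(const struct chain_node *x,
                            const struct chain_node *y)
{
        if (x->arg != y->arg)
                return x->arg < y->arg;
        if (x->op != y->op)
                return x->op < y->op;
        return x->datum < y->datum;
}

/* gt() expressed through lt() and eq(). */
static inline bool chain_gt(const struct chain_node *x,
                            const struct chain_node *y)
{
        return !chain_lt(x, y) && !chain_eq(x, y);
}
```

Compared with a macro, each argument is evaluated once and type-checked, and the early returns keep the multi-field comparison readable.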

> I ran gdb against the old libseccomp and HEAD, and the behaviors of lvl_prv and lvl_nxt did change, but perhaps that isn't a big deal since it's an internal variable that no one should see but us.

Yeah, it's an internal state/tree, I'm not too worried about that. The important thing is correctness of the generated filter.

> I guess after all this rambling... I don't know. I agree, I need to have a think on it ;)

Heh. Let's give this a day or two and regroup :) Right now this doesn't affect any released versions, it's only in the master branch, so we've got some time to get things right.

pcmoore on 28 Feb 2018

> Right now this doesn't affect any released versions, it's only in the master branch, so we've got some time to get things right.

Sounds good. I'll make a few tests while we contemplate a plan

drakenclimber on 28 Feb 2018


I wrote a program to evaluate current seccomp A2 handling. The whole program is available here:

https://gist.github.com/drakenclimber/3c6b45ecd973ee495281ef225fa5e54a

In a nutshell, greater-than rules are emitted in "last created, first processed" order.

For a filter where > rules are created in ascending order, e.g.
seccomp_rule_add(ctx, action1, syscall, 1, SCMP_A2(SCMP_CMP_GT, 10));
seccomp_rule_add(ctx, action2, syscall, 1, SCMP_A2(SCMP_CMP_GT, 20));
seccomp_rule_add(ctx, action3, syscall, 1, SCMP_A2(SCMP_CMP_GT, 30));
then the filter will behave in a coherent fashion, e.g.

if (A2 > 30)
    do action3
if (A2 > 20)
    do action2
if (A2 > 10)
    do action1

For a filter where > rules are created in descending order, e.g.
seccomp_rule_add(ctx, action3, syscall, 1, SCMP_A2(SCMP_CMP_GT, 30));
seccomp_rule_add(ctx, action2, syscall, 1, SCMP_A2(SCMP_CMP_GT, 20));
seccomp_rule_add(ctx, action1, syscall, 1, SCMP_A2(SCMP_CMP_GT, 10));
then the filter will be created, but it will behave oddly. Dead code is produced: the last two if statements can never match, because any value greater than 20 or 30 has already matched the > 10 test.

if (A2 > 10)
    do action1
if (A2 > 20)
    do action2
if (A2 > 30)
    do action3
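The two creation orders above can be modeled with a chain where every new rule is placed at the front and evaluation stops at the first match. This is only a toy model of the observed behavior, with made-up names; it is not how src/db.c stores rules:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RULES 8

struct gt_chain {
        uint32_t threshold[MAX_RULES];
        int action[MAX_RULES];
        size_t len;
};

/* Model "last created, first processed": each new rule goes to the front.
 * Returns 0 on success, -1 when the chain is full. */
static int chain_add_front(struct gt_chain *c, uint32_t threshold, int action)
{
        if (c->len == MAX_RULES)
                return -1;
        for (size_t i = c->len; i > 0; i--) {
                c->threshold[i] = c->threshold[i - 1];
                c->action[i] = c->action[i - 1];
        }
        c->threshold[0] = threshold;
        c->action[0] = action;
        c->len++;
        return 0;
}

/* Return the action of the first rule with a2 > threshold, or 0 if none. */
static int chain_eval(const struct gt_chain *c, uint32_t a2)
{
        for (size_t i = 0; i < c->len; i++)
                if (a2 > c->threshold[i])
                        return c->action[i];
        return 0;
}
```

Adding (10, 1), (20, 2), (30, 3) in that order gives a chain where 35 maps to action 3 and 25 maps to action 2. Adding them in the reverse order leaves the > 10 rule in front, so every value above 10 maps to action 1 and the other two rules are dead.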

Filters with multiple < (or <=) A2 operations are not currently allowed by libseccomp; adding the second rule fails with rc = -17. This seems strange because I was unable to figure out a way to make the <= equivalent of the > filter above.

tom@OracleDesktop $ ./a2test 3
Failed to add rule
        action = 0x5000e op = 0x3 datum = 18000 rc = -17
Mode 3 (LE descending) test failed.  rc = -17

tom@OracleDesktop $ ./a2test 4
Failed to add rule
        action = 0x50006 op = 0x3 datum = 250 rc = -17
Mode 4 (LE ascending) test failed.  rc = -17
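The fields in the failure output above can be decoded by hand. libseccomp returns negative errno values, so rc = -17 is -EEXIST on Linux, and an action such as 0x5000e is SCMP_ACT_ERRNO(14), since that macro ORs the errno into 0x00050000. As far as I can tell from the header, op = 0x3 corresponds to SCMP_CMP_LE, which agrees with the "LE" mode names. A small sketch with hypothetical helper names:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* SCMP_ACT_ERRNO(x) is encoded as 0x00050000 | (x & 0xffff). */
#define ACT_ERRNO_BASE 0x00050000U

/* Return the errno carried by an ERRNO action, or -1 for other actions. */
static int action_errno(uint32_t action)
{
        if ((action & 0xffff0000U) != ACT_ERRNO_BASE)
                return -1;
        return (int)(action & 0x0000ffffU);
}

/* True when a libseccomp return code means "already exists". */
static int rc_is_eexist(int rc)
{
        return rc == -EEXIST;
}

/* Describe a negative return code using the system errno text. */
static const char *rc_text(int rc)
{
        return strerror(-rc);
}
```

So the first failure is an ERRNO(14) rule with datum 18000 and the second is an ERRNO(6) rule with datum 250, both rejected as already existing.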

I am guessing the else-if logic buried deep in src/db.c is causing the < failures. I'm not sure if it's worth changing/fixing.

I will try and convert some of this code into automated tests so we can capture current behavior.

drakenclimber on 1 Mar 2018

As written, the gist here failed the automated tests I added last week. I'll dig in and try to figure out why.

 batch name: 43-sim-a2_order
 test mode:  c
 test type:  bpf-sim
Test 43-sim-a2_order%%001-00001 result:   SUCCESS
Test 43-sim-a2_order%%002-00001 result:   SUCCESS
Test 43-sim-a2_order%%003-00001 result:   SUCCESS
Test 43-sim-a2_order%%004-00001 result:   SUCCESS
Test 43-sim-a2_order%%005-00001 result:   SUCCESS
Test 43-sim-a2_order%%006-00001 result:   SUCCESS
Test 43-sim-a2_order%%007-00001 result:   SUCCESS
Test 43-sim-a2_order%%008-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%009-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%010-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%011-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%012-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%013-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%014-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%015-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%016-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%017-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%018-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%019-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)
Test 43-sim-a2_order%%020-00001 result:   FAILURE bpf_sim resulted in ERRNO(5)

drakenclimber on 12 Mar 2018

My bad - I misapplied the gist. The tests are passing. Phew :)

drakenclimber on 12 Mar 2018

Ha! :)

I thought I ran the tests against it, but I was playing with a lot of things at that time, so I figured I was just remembering wrong. Thanks for continuing to look at this. I'm still bogged down a bit with SELinux and audit, but since the kernel is at -rc5 right now I expect it to calm down soon as I put the brakes on new code prior to the merge window ...

pcmoore on 12 Mar 2018

No worries. That's definitely higher priority.

I have been running your gist through various unrealistic tests. I haven't gotten it to break, but I am also only exercising bits and pieces of _db_tree_prune() so far. I'm starting to feel more comfortable with the changes, but I want to get a little more time on it.

drakenclimber on 13 Mar 2018


I poked and prodded at the _db_tree_prune() code and I couldn't break it. Test 08-sim-subtree_checks really does a good job of testing most of the code paths within prune().

I think the changes from your gist are good to go.

drakenclimber on 15 Mar 2018

I submitted pull request #115. I think this is ready to roll

drakenclimber on 23 Mar 2018

Closing as this should now be resolved (see history above).

pcmoore on 15 Jan 2019
