Libseccomp: BUG: seccomp_arch_add() returns -EEXIST on endian mismatch

Created on 20 Jun 2017  ·  18 comments  ·  Source: seccomp/libseccomp

(Note that my issue description starts with Go, but it's actually a C issue; see below.)

I was writing some unit tests today that exercise ScmpFilter.AddArch(seccomp.ArchPPC) on my amd64 system. The call did not return an error; however, the architecture was not added to the filter (as visible via ExportPFC).

Here is a trivial reproducer (needs to run on amd64 or i386):

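```go
package main

import (
	"os"

	seccomp "github.com/seccomp/libseccomp-golang"
)

func main() {
	secFilter, err := seccomp.NewFilter(seccomp.ActKill)
	if err != nil {
		panic(err)
	}
	defer secFilter.Release()

	// On amd64/i386 the filter's native arch is little-endian, while
	// ArchPPC is big-endian, so this add hits the endian mismatch.
	err = secFilter.AddArch(seccomp.ArchPPC)
	if err != nil {
		panic(err)
	}

	// The exported PFC shows whether ppc actually made it into the filter.
	if err := secFilter.ExportPFC(os.Stdout); err != nil {
		panic(err)
	}
}
```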

After a bit of debugging, it turned out that seccomp_arch_add() returns EEXIST when there is an endian mismatch. In db.c:db_col_db_add() there is:

```c
if (col->endian != 0 && col->endian != db->arch->endian)
        return -EFAULT;
```

The Go bindings (rightfully) ignore EEXIST, which leads to the behaviour I observed.
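To make the ambiguity concrete, here is a small self-contained model in C. The struct, enum, and arch tokens are hypothetical stand-ins, not libseccomp's internal db.c types; the sketch only assumes what the report describes, namely that a duplicate arch and an endian mismatch both come back as -EEXIST, so a caller that ignores -EEXIST cannot tell them apart.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT libseccomp's internal types or values. */
enum endian { ENDIAN_UNSET = 0, ENDIAN_LITTLE, ENDIAN_BIG };

#define MAX_ARCHES 8

struct filter_col {
	enum endian endian;          /* endianness shared by all arches so far */
	uint32_t arches[MAX_ARCHES]; /* hypothetical arch tokens in the filter */
	size_t arch_cnt;
};

/* Models the reported behaviour: a duplicate arch and an endian
 * mismatch both return -EEXIST. */
static int col_add_arch(struct filter_col *col, uint32_t arch, enum endian e)
{
	size_t i;

	for (i = 0; i < col->arch_cnt; i++)
		if (col->arches[i] == arch)
			return -EEXIST;  /* genuinely already present */
	if (col->endian != ENDIAN_UNSET && col->endian != e)
		return -EEXIST;          /* mismatch, but the same error code */
	if (col->arch_cnt == MAX_ARCHES)
		return -ENOMEM;
	col->arches[col->arch_cnt++] = arch;
	col->endian = e;
	return 0;
}

/* A wrapper that, like the Go bindings, treats -EEXIST as "nothing to do". */
static int add_arch_idempotent(struct filter_col *col, uint32_t arch,
			       enum endian e)
{
	int rc = col_add_arch(col, arch, e);

	return rc == -EEXIST ? 0 : rc;
}
```

With one little-endian arch already in the collection, adding a big-endian arch through add_arch_idempotent() returns 0 while arch_cnt stays at 1, which mirrors AddArch(seccomp.ArchPPC) succeeding without changing the exported filter.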

I wonder if it would make sense for seccomp_arch_add() to return a different error code, maybe EINVAL or something similar? If that is too dangerous (as it might break existing applications), maybe the behaviour could at least be documented better. I am happy to provide a PR.

Labels: bug, priority/medium

All 18 comments

Thanks for reporting this, I'll have to take a closer look.

I'm very sorry it has taken me so long to get back to this @mvo5, I appreciate your patience.

To clarify things a bit, the relevant current db.c:db_col_db_add() looks like this:

```c
        if (col->endian != 0 && col->endian != db->arch->endian)
                return -EEXIST;
```

... I think the version in the original problem report above was a locally patched copy to work around the problem. That said, changing the returned error here does sound reasonable. We've been using EDOM in at least some of the other arch/endian code; would that work for you?

What do you think @mheon?

It looks like we should probably also update db.c:db_col_merge() while we are at it.
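A sketch of the change being discussed: route both the add path and the merge path through a single endian check that returns -EDOM, so the two report the same error and -EEXIST is left to mean only "already present". The types are hypothetical simplified stand-ins for the db.c structures, duplicate-arch detection is omitted, and this is not the code from the eventual fix commit.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model types, not libseccomp's real db.c structures. */
enum endian { ENDIAN_UNSET = 0, ENDIAN_LITTLE, ENDIAN_BIG };

struct filter_col {
	enum endian endian; /* endianness shared by every arch in the collection */
	size_t arch_cnt;    /* number of arches; the arch list itself is omitted */
};

/* One check shared by add and merge, so both report the same error. */
static int endian_check(enum endian have, enum endian want)
{
	if (have != ENDIAN_UNSET && want != ENDIAN_UNSET && have != want)
		return -EDOM;
	return 0;
}

/* Models db_col_db_add() with the proposed change: mismatch -> -EDOM. */
static int col_add_arch(struct filter_col *col, enum endian e)
{
	int rc = endian_check(col->endian, e);

	if (rc < 0)
		return rc;
	col->endian = e;
	col->arch_cnt++;
	return 0;
}

/* Models db_col_merge(): refuse to merge collections of different endianness. */
static int col_merge(struct filter_col *dst, const struct filter_col *src)
{
	int rc = endian_check(dst->endian, src->endian);

	if (rc < 0)
		return rc;
	if (dst->endian == ENDIAN_UNSET)
		dst->endian = src->endian;
	dst->arch_cnt += src->arch_cnt;
	return 0;
}
```

Keeping the comparison in one helper is a design choice for the sketch: it makes it hard for the add and merge paths to drift apart again on which errno they return.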

From an API standpoint, I definitely think it makes good sense to make the change - overloading error codes is always problematic for debugging.

On the libseccomp-golang side, this shouldn't require any code changes, so long as the negative errno convention is maintained. We may want a few lines of comments on AddArch explaining what the new error means, so that the API documentation is complete.
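On the caller's side, the benefit is that -EEXIST regains a single meaning. The helpers below are a hypothetical C sketch (the function names are invented for illustration and are not part of libseccomp or libseccomp-golang) of how a binding might classify the return value of seccomp_arch_add(), assuming the mismatch is reported as -EDOM as proposed earlier in the thread and relying only on the 0 / -errno convention.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: nonzero if rc should be surfaced as a real failure.
 * -EEXIST keeps its narrow meaning: the arch is already in the filter. */
static int arch_add_rc_is_fatal(int rc)
{
	return rc != 0 && rc != -EEXIST;
}

/* Hypothetical helper: human-readable text for an error message or docs. */
static const char *arch_add_rc_describe(int rc)
{
	switch (rc) {
	case 0:
		return "architecture added";
	case -EEXIST:
		return "architecture already present";
	case -EDOM:
		return "architecture endianness does not match the filter";
	default:
		return strerror(-rc); /* any other negative errno */
	}
}
```

Because the classification only depends on the sign and value of the errno, existing callers that already ignore -EEXIST keep working unchanged, and the -EDOM case simply starts reaching them as an error.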

Travis is still running a kernel that is too old, so test #47 failed again. As @pcmoore mentioned a while back, I'll try to add some smarts to the test to avoid this.

Otherwise, everything else from this change looks good in my book.

@drakenclimber I think you mean test 46, not 47, right?

A few months ago I added the API level check for the live tests, but I don't think we need that for the bpf-sim tests, do we? The bpf-sim tests were running clean ... ?

Travis definitely puked on test #47 (KILL_PROCESS) for the above run. I also saw the same failure on the HEAD of master this morning when I pushed it to a separate branch as a sanity check.

Test 47-live-kill_process%%001-00001 result: FAILURE 47-live-kill_process 3 KILL_PROCESS rc=12

We talked about this quite a while ago, so my memory may be foggy, but I thought that Travis had issues with the KILL_PROCESS test because Travis' kernel is older than 4.14, the release that introduced that feature.

Or am I crazy... and misremembering entirely?

Hmm, are we looking at the same log? I'm looking at the build and log below:

... which show the following results (only copied the "c" tests here, the "python" results were the same):

 batch name: 46-sim-kill_process
 test mode:  c
 test type:  bpf-sim
Test 46-sim-kill_process%%001-00001 result:   ERROR 46-sim-kill_process rc=12
Test 46-sim-kill_process%%002-00001 result:   ERROR 46-sim-kill_process rc=12
Test 46-sim-kill_process%%003-00001 result:   ERROR 46-sim-kill_process rc=12
Test 46-sim-kill_process%%004-00001 result:   ERROR 46-sim-kill_process rc=12
Test 46-sim-kill_process%%005-00001 result:   ERROR 46-sim-kill_process rc=12
Test 46-sim-kill_process%%006-00001 result:   ERROR 46-sim-kill_process rc=12
 batch name: 47-live-kill_process
 test mode:  c
 test type:  live
Test 47-live-kill_process%%001-00001 result:   SKIPPED (must specify live tests)

... and yes, older kernels do have a problem with some of the live tests, but that should have been fixed in 9d4f7f69714d5af80309aa1b8a6d2c8300bb6730.

FWIW, the last Travis build on the master branch ran clean:

I just triggered a new build using the master branch just to verify that everything is "OK" with Travis:

I admit that I am now _really_ confused. Your link definitely shows test #46 as the problem, but when I click the "Raw Log" link in the top right corner, it tells me #47 failed. For the link you listed above, here's where the raw log redirected me:

So looking closer, I think we are both right.

Test #46 is returning ERROR, as you pointed out, and #47 is returning FAILURE (which is what I had originally searched for).

And this is visible in the summaries as well:

Regression Test Summary
 tests run: 14090
 tests skipped: 114
 tests passed: 14090
 tests failed: 0
 tests errored: 12
Regression Test Summary
 tests run: 16
 tests skipped: 0
 tests passed: 14
 tests failed: 2
 tests errored: 0

I am able to reproduce Travis CI's problems on one of my systems if I boot into a 3.x kernel. Test 46 runs into trouble in sys_chk_seccomp_action(), which ultimately causes seccomp_init() to return a NULL context to the test.

I would imagine Test 47's problems are similar, but I didn't investigate its failure path. (Although the change @pcmoore added to verify the API should have prevented this. Hmmm...)

I would be curious what has changed on Travis to cause this. Did they fall back to a really old kernel? Something on our end?

It looks like we will need to add some smarts to the bpf-sim tests to handle this. Do we want to mimic the API column added to the *.tests file or do something else entirely?

Yeah, I'm really confused as to why it was working before and now it isn't. Ubuntu 14.xx is really old these days; I'm going to see if a more recent version is available on Travis.

It looks like Ubuntu 16.04 (Xenial) is available, let's try that ...

Commit 06f63ba691cb9df119c6759e8f0a150a2a9cbe69 bumps us up to Ubuntu 16.04. I'm doing this in the master branch, not as a PR, because I want to force this switch; if the build breaks we'll fix it.

That build seems to have fixed the problem with the tests, but clang just found a memory leak in an error-handling code path. I'll fix that in just a minute.

Awesome! Thanks for the help, @pcmoore

Okay, with commit f8854f990004e71ccb9955c33d88d82cdb97ea42 we should have a clean building master branch. It worked fine on my personal branch, waiting on the main build now.

@drakenclimber I see you've got a patch ready for this in the issue log above, but I don't see a PR yet - are you still chasing down some issues with the patch or is it ready for a PR?

This should be resolved with commit 4a35b6ea6f7c836734536420c50a2745a9e24c69, closing this out now. If anyone finds a problem with this please reopen this issue or create a new one.
