Moby: docker build should support privileged operations

Created on 18 Sep 2013  ·  286 comments  ·  Source: moby/moby

Currently there seems to be no way to run privileged operations outside of docker run -privileged.

That means that I cannot do the same things in a Dockerfile. My recent issue: I'd like to run fuse (for encfs) inside of a container. Installing fuse is already a mess with hacks and ugly workarounds (see [1] and [2]), because mknod fails/isn't supported without a privileged build step.

The only workaround right now is to do the installation manually, using run -privileged, and creating a new 'fuse base image'. Which means that I cannot describe the whole container, from an official base image to finish, in a single Dockerfile.

I'd therefore suggest adding either

  • a docker build -privileged
    this should do the same thing as run -privileged, i.e. removing all caps limitations

or

  • a RUNP command in the Dockerfile
    this should .. well .. RUN, but with _P_rivileges

I tried looking at the source, but I'm useless with go and couldn't find a decent entrypoint to attach a proof of concept, unfortunately. :(

1: https://github.com/rogaha/docker-desktop/blob/master/Dockerfile#L40
2: https://github.com/dotcloud/docker/issues/514#issuecomment-22101217

area/builder kind/feature

Most helpful comment

I really don't understand why there is so much pushback from devs regarding --privileged for docker build.
If the users want to shoot themselves in the foot, why not let them? Just put a warning message and that's it. There are already workarounds for achieving the same thing, so why not make it easier for the users that really need it??
It's been 4-5 years and there has been no progress on this.
Just amazing...

All 286 comments

If we go for this, I'm more in favor of the RUNP option, instead of having
all containers running in -privileged mode.


Victor VIEUX
http://vvieux.com

Actually, we might have to do both — i.e., RUNP + require a "-privileged"
flag.

If we rely only on RUNP (without requiring "-privileged"), then we would
have to wonder when we do a build "is this build safe?".
If we rely only on "-privileged", we miss the information (in the
Dockerfile) that "this action requires extended privileges".

I think a combination of both is the safest way.


@jpetazzo https://twitter.com/jpetazzo
Latest blog post: http://blog.docker.io/2013/09/docker-joyent-openvpn-bliss/

Sounds reasonable. For me this feature (being able to create device nodes) makes or breaks the ability to create the deployment in Docker. I'd gladly invest some time if I can help (mostly testing; I tried looking at the source but got stuck so far: the available commands in a buildfile seem to be found via reflection, and I added a runp command that sets config.Privileged to true, but so far I'm unable to build and test it).

I'd suggest RUNP + build -privileged.

_lights up some smoke signals to catch attention of @shykes, @crosbymichael_

... And then we'll have to find someone to implement it, of course ☺
Would that be something you'd want to try (with appropriate guidance and feedback from the core team, of course)?

If the last part was targeted at me: Sure, why not. I'm already messing with the go code (not a language I'm familiar with, but see above: I'm trying to figure out what's going on anyway).

With a couple of pointers / someone to ping for some questions I'd certainly give it a try.

I'm not sold on RUNP or build -privileged.

Generally, I don't like anything that introduces different possible builds of the same input. That's why you can't pass arguments or environment variables to a build.

Specifically I don't like introducing dependencies on "privileged" all over the place, because it designates a set of capabilities that is a) very large and b) not clearly spec-ed or defined. That's ok as a coarse mechanism for sysadmins to bypass security in an all-or-nothing way - an "escape hatch" when the standard docker execution environment is not enough. It's similar in that way to bind-mounts and custom lxc-conf.


@solomonstre
@getdocker


Well, do you agree that it should be possible to build a docker image that - for example - runs fuse?
For that we'd need to mknod.

The way I see it, there's no way these builds could be different depending on parameters: the build will either work (capabilities less restricted than now) or fail (the status quo). There's little to no risk of different 'versions' of the same build file, right?

I'm running into this issue now. To build the image I need, I have to perform a series of run -privileged steps + a commit step, rather than building a Dockerfile. Ideally, it would be nice to express the image build steps in a Dockerfile.

Is it also related to mknod operations?
If you could describe exactly the actions that require privileged mode in
your case, it would be very helpful!
Thanks,

Hey @jpetazzo, from the mailing list, here is the issue I'm facing: https://groups.google.com/forum/#!topic/docker-user/1pFhqlfbqQI

I'm trying to mount a filesystem I created (created to work around aufs and something about journaling) inside the container. The specific command I'm running is mount -o loop=/dev/loop0 /db/disk-image /home/db2inst1, where /db/disk-image was created with dd if=/dev/zero of=disk-image count=409600 and /home/db2inst1 is where I'm trying to start db2 from.
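Spelled out as a script, that sequence looks roughly like this (paths are from the comment; the filesystem-creation step is my assumption, since the comment doesn't name one, and it's the mount that fails without extra privileges):

```shell
# Illustrative sketch only; the mkfs step is assumed, not stated in the comment.
dd if=/dev/zero of=/db/disk-image count=409600            # ~200 MB backing file
mkfs.ext3 -F /db/disk-image                               # -F: operate on a plain file
mount -o loop=/dev/loop0 /db/disk-image /home/db2inst1    # needs CAP_SYS_ADMIN, hence -privileged
```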

If I understand correctly, during the installation process, you need a non-AUFS directory — or rather, something that supports O_DIRECT. If that's the case, Docker 0.7 should solve the problem, since it will use ext4 (and block-level snapshots) instead of AUFS.

+1 for this as well.

Installing packages that require change to memory settings and kernel configuration (e.g. Vertica DB, WebSphere MQ) can only be done in privileged mode.

Let's try to separate concerns when it comes to running / building with "privileged": it can be required just during the build, just during execution via docker run or both.

It should be possible to allow a build to do something requiring a bit more permissions for a step (or more) if that's necessary. I needed this for a project and had to convert half of a Dockerfile to a shell script which invoked the build and continued to build things in privileged mode, so having a "privileged" build would be useful.

However, we shouldn't go all the way down to privileged mode by default just so we can use sysctl to change some settings. This should be done via image configuration or via command line args to be passed to docker run.

Right. @orikremer, do you have details on the parameters that Vertica DB and WebSphere MQ were trying to change?

If it's stuff in /sys or /proc, the best solution might be to put some mock up there instead, rather than switching to privileged mode (since the changes won't be persisted anyway).

In the long run, a mock filesystem might capture the change and convert them to Dockerfile directives, instructing the runtime that "hey, this container needs such or such tweak".

@jpetazzo It's been a couple of days since I created the image. AFAIR Vertica was complaining that it didn't have enough memory, and both installers were trying to change max open files.
I will try to recreate the image using a Dockerfile and report back.

Noting issue #2080 as it is related.

@jpetazzo started recreating the image without -privileged. Two issues so far:

  • nice in limits.conf: Vertica adds "dbadmin - nice 0" to /etc/security/limits.conf. When trying to switch to that user when running in a non-privileged container I get a "could not open session" error. In a privileged container switch user works with no errors.
  • max open files: since the max needed in the container was higher than the one set on the host, I had to change /etc/init/docker.conf on the host and set "limit nofile", and then ulimit -n in the container. Is that the correct approach?

When trying to switch to that user,

How does the switch happen? I don't understand how -privileged would help with user-switching; I'm probably missing something here :-)

max open files

If I understand correctly, the Vertica installer tries to set the max number of open files to a very high number, and that only works if Docker was started with such a high number _or_ with the -privileged flag; right?

switching user - su dbadmin fails with that error.
I was able to reproduce by:

  • pull a new image (centos-6.4-x86_64) and run non privileged
  • useradd testuser
  • edit /etc/security/limits.conf, add "testuser - nice 0"
  • try su testuser --> should fail with "could not open session"
    In a -privileged container su testuser works fine.

max open files - correct. The installer tries to set it to a number higher than the host has. Only by increasing the host's setting or starting -privileged does this work.

I just tried with the following Dockerfile:

FROM ubuntu
RUN useradd testuser
RUN echo testuser - nice 0 > /etc/security/limits.conf
CMD su testuser

And it works fine. What's the exact name of the image you're using?
(I tried centos-6.4-x86_64 but looks like I can't pull it!)

@lukewpatterson Can you share how you got the loop filesystem working inside your container?

@jpetazzo Running this docker file

FROM backjlack/centos-6.4-x86_64
RUN useradd testuser
RUN echo 'testuser - nice 0' >> /etc/security/limits.conf
RUN su testuser
RUN echo 'test' > ~/test.txt

failed with:

ori@ubuntu:~/su_test$ sudo docker build .
Uploading context 10240 bytes
Step 1 : FROM backjlack/centos-6.4-x86_64
 ---> b1343935b9e5
Step 2 : RUN useradd testuser
 ---> Running in b41d9aa2be1b
 ---> 2ff05b54e806
Step 3 : RUN echo 'testuser - nice 0' >> /etc/security/limits.conf
 ---> Running in e83291fafc66
 ---> 03b85baf140a
Step 4 : RUN su testuser
 ---> Running in c289f6e5f3f4
could not open session
Error build: The command [/bin/sh -c su testuser] returned a non-zero code: 1
The command [/bin/sh -c su testuser] returned a non-zero code: 1
ori@ubuntu:~/su_test$

I turned on debugging for the PAM module (by adding debug to the pam_limits.so line in /etc/pam.d/system-auth), installed syslog, tried to su again, and here's what I found in /var/log/secure:

Oct 7 14:12:23 8be1e7bc5590 su: pam_limits(su:session): reading settings from '/etc/security/limits.conf'
Oct 7 14:12:23 8be1e7bc5590 su: pam_limits(su:session): process_limit: processing - nice 0 for USER
Oct 7 14:12:23 8be1e7bc5590 su: pam_limits(su:session): reading settings from '/etc/security/limits.d/90-nproc.conf'
Oct 7 14:12:23 8be1e7bc5590 su: pam_limits(su:session): process_limit: processing soft nproc 1024 for DEFAULT
Oct 7 14:12:23 8be1e7bc5590 su: pam_limits(su:session): Could not set limit for 'nice': Operation not permitted

Then I straced the su process, and found out that the following system call was failing:

setrlimit(RLIMIT_NICE, {rlim_cur=20, rlim_max=20}) = -1 EPERM (Operation not permitted)

This, in turn, causes the pam_limits module to report a failure; and this prevents su from continuing.
Interestingly, on Ubuntu, pam_limits is not enabled for su by default; and even if you enable it, the setrlimit call fails, but su continues and works anyway.
It might be related to the audit code, I'm not sure.

Now, why is setrlimit failing? Because the container is missing the sys_resource capability, which is required to raise any kind of limit.

I think I would just comment out that limits.conf directive.
(By the way, it's bad practice to add stuff directly to limits.conf; it should go to a separate file in limits.d, I think.)

Note: since you already increased the limit for number of open files for Docker, you could also raise the limit for the max priority; that should work as well!

I hope this helps.
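On an upstart-based host of that era, raising the daemon's limits as suggested above would look roughly like this (a sketch with illustrative values, not a recommendation; upstart's limit stanza takes a soft and a hard value, and nice limits are expressed in RLIMIT_NICE units):

```shell
# /etc/init/docker.conf (upstart job for the docker daemon)
limit nofile 1048576 1048576   # ceiling for open files inside containers
limit nice 40 40               # allows raising priority all the way to nice -20
```

After restarting the daemon, ulimit -n inside a container can then be raised up to that ceiling.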

In that Dockerfile, you've got the following line by itself:

RUN su testuser

There's no command to go with it (and it won't apply the resulting shell to subsequent RUN commands), so I wouldn't be surprised if it's failing because it tries to open an interactive shell somewhere that doing so makes no sense (docker build is not an interactive process). I don't have time right now to confirm, but it's probably worth a try with an actual command being passed to su.

@jpetazzo Thanks for the detailed description. I will try raising the max priority for Docker and see if that helps.

(By the way, it's bad practice to add stuff directly to limits.conf; it should go to a separate file in limits.d, I think.)

Agreed, but as it's the Vertica installer code that's putting it there, I am trying to work around that.

@tianon The same happens if I run this in an interactive shell (/bin/bash).

My apologies, I think it was still worth a try.

The point about that line not making much sense in the Dockerfile still does apply though. You probably wanted something more like this (after you figure out the limits issues):

RUN su testuser -c 'echo test > ~/test.txt'

@tianon you're right, it doesn't make much sense. That was merely to demonstrate that the su itself fails.

To get back to the original discussion: I believe it is fine from a security standpoint (and useful!) to allow setfcap and mknod capabilities in the build process (and probably in regular container execution as well). Does anyone see any problem that could stem from that?

@jpetazzo Quite the opposite! It would solve many problems I'm encountering. I think this is necessary for people who want to run Docker containers that act/look more like a real machine.

OK! If that's fine with you, I'm closing this issue in favor of #2191, "Enable mknod and setfcap capabilities in all containers" then :-)

Do we know about such scenarios?

On Sun, Oct 13, 2013 at 12:22 PM, unclejack wrote:

#2191 (https://github.com/dotcloud/docker/issues/2191) doesn't solve this problem for all scenarios which would require docker build -privileged.

@jpetazzo https://twitter.com/jpetazzo
Latest blog post:
http://jpetazzo.github.io/2013/10/06/policy-rc-d-do-not-start-services-automatically/

@jpetazzo This is needed when you want to use a Dockerfile to build an operating system image.

I've deleted my comment by mistake when editing it.

Please take a look at what this would look like without doing everything in a Dockerfile:

from ubuntu:12.04
run apt-get update
[... a few more run commands]
add build.sh /root/build.sh

build.sh

docker build -t mybuild .
docker run -i -t -privileged -cidfile mybuild.cid mybuild /root/build.sh
buildcid=`cat mybuild.cid`
rm mybuild.cid
docker commit $buildcid mybuild-final

This is just forcing me to work around the lack of runp in the Dockerfile or docker build -privileged, thus rendering Dockerfiles useless and forcing me to write a tool to duplicate Dockerfile-like functionality.

Obviously, the Dockerfile would be much more useful with runp or docker build -privileged:
runp example:

from ubuntu:12.04
run apt-get update
[... a few more run commands]
add build.sh /root/build.sh
runp /root/build.sh

docker build -privileged example:

from ubuntu:12.04
run apt-get update
[... a few more run commands]
add build.sh /root/build.sh
run /root/build.sh

@unclejack : sorry, my question was not accurate enough!
What I meant was "which permission do you need exactly (on top of mknod and setfcap)?"

@jpetazzo It's difficult to say, I'd have to audit this somehow to figure out what's needed. Mounting file systems, using loopback mounted block devices and a few other things.

There are at least three separate needs when it comes to building images: permissions required during a docker build, permissions required when running a container with docker, and runtime needs for processes during build, run, or both (like sysctls and others).

I think having docker build -privileged (or runp to use -privileged mode only for the commands which really need it) would be useful.

Ah, mounts are definitely a big one. That's a very valid use-case, and we _probably_ don't want to allow them in the general case. I'm re-opening the issue.

@jpetazzo RE: PAM module (I am installing Vertica as well) would you suggest recompiling docker after taking the sys_resource out of the lxc.cap.drop?

Maybe some of these limits can be set via the docker.conf file?

It should be considered that Docker itself might be running in a limited set of capabilities whereby these privileges may not be available for Docker to delegate to its containers. That would be especially true in a nested Docker scenario should issue #2080 land -- this might allow non-privileged nested Docker.

Not that this changes anything, except that solutions such as 'runp' or '-privileged' might not be guaranteed to succeed in all Docker environments. That should be considered when adding such commands and when documenting them.

@ramarnat @jpetazzo just to close the loop on the Vertica install and nice level issue,
I did try to set the nice limit in docker.conf but that didn't work for me and I was forced to run bash privileged and manually install it.

@orikremer @jpetazzo I was able to run the install by removing sys_resource from the lxc_template.go, and recompiling docker. I can put a pull request out there, but I'll wait for others to opine about the security implications of removing that from the lxc config.

@ramarnat: depending on the scenario, some people will think that removing sys_resource is fine; for some others, it will be a problem.

A possibility could be to increase the base limits to something higher (file descriptors are also a problem for Elastic Search). This would be like asserting "a minimal Docker runtime should be able to handle 1,000,000 file descriptors". If Docker cannot raise the limit when it starts, it will issue a warning and continue (like it does for the memory/swap controller group).

This doesn't fix the mount/loop scenario, though (I'm still sleeping on this one).

@jpetazzo maybe supply a way to override the hard-coded values in lxc_template.go. There already is something for this scenario with the command-line -lxc_conf, but it doesn't work for the .drop nature of these lxc config directives - I tried!

Well, that's a possibility, but that's also a good way to break future compatibility across different containerization systems :-) We'll see if we can't find anything better.

Could we whitelist /dev/loop* in non-privileged mode? I suppose the problem is it might give me access to other container's loop mounted files, or even the host's...

@solomonstre
@docker


@jpetazzo That's true, but I would think docker will need a standard way of overriding the underlying containerization system's configuration if it is allowed to - back to the build privileged consideration I guess!

@solomonstre The point is there has to be a way to allow docker build to build in privileged mode. Allowing access to /dev/loop* won't help me with my particular use case.

@solomonstre: whitelisting /dev/loop is, IMHO, a big no-no, because with the DM branch, it would give read/write access to everything (since the default behavior of the DM branch is to use loop devices to store the pools).

I understand that some builds will require loop devices, mounts, and other things. Let's review our options:

  1. docker build -privileged
    Convenient, but draws the line between "normal builds" and "privileged builds". If you happen to have a very useful image that requires a privileged builder, it will be difficult to build it on public builders. E.g. if someone starts a service to automate builds, they will probably not offer privileged builds (or they will have to use extra safeguards).
  2. Relax permissions a little bit in the builder
    This means (at least) enabling cap_sysadmin (this makes the paranoid me shiver a bit), and maybe giving one or two loop devices to each builder. This would limit the total number of builders running in parallel; but that's not a big deal, since builds are supposed to be fast and, more importantly, active processes. I.e., if you have 50 builds running in parallel, unless you have a machine with a kickass I/O subsystem, those builds won't progress much.
  3. Wrap the build within another layer of virtualization/isolation.
    Instead of running the build straight within Docker, run something like QEMU-in-Docker or UML-in-Docker. This is a cool solution from a Docker developer point of view, since it means no additional work; it is a poor solution from a Docker user point of view, since it means dealing with another layer of complexity.

I wonder if the right solution might be to allow docker build -privileged, and at the same time think about hooks which would allow transparent implementation of option 3. Suppose I'm a "docker build provider": if someone requests a privileged build, all I have to do is insert something somewhere to run their build process within a sandboxed environment (QEMU and UML are obvious candidates, but others might work too; they're just convenient examples).

What do you guys think?

@backjlack, may I ask how you use that container once it's built? What
happens when you "docker run" it exactly, what is the application? Just
trying to get a sense of the use cases for this.


+1 - I would like to see mknod capabilities to install fuse (for mounting S3 buckets), or the ability to execute privileged runs in Dockerfiles. Unsure what the best solution is, yet.

+1. any updates on this issue?

+1

I have also hit the Fuse issue in trying to install Java and I am interested in a solution. I tried the workaround suggested here https://gist.github.com/henrik-muehe/6155333 but it doesn't work for me on docker on the Raspberry Pi.

@jpetazzo: I like the overall strategy of implementing the -privileged flag while concurrently exploring a longer-term solution. To that end, I've submitted a pull request to implement this feature. Note that as of right now, it does not implement the "RUNP" command as discussed earlier in this thread.

(Let me "cross post" this, since people might end up here looking for a workaround)

If you're not actually using the device file (but it's just part of a post-inst script as in the case with the fuse package), you can do:

fakeroot apt-get ...

or:

dpkg-divert --local --rename --add /sbin/mknod && ln -s /bin/true /sbin/mknod
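For completeness, here is the divert trick as a full cycle (a sketch; the example package name and the clean-up steps are my addition, not part of the comment above):

```shell
# Make mknod a no-op so package post-install scripts don't fail...
dpkg-divert --local --rename --add /sbin/mknod
ln -s /bin/true /sbin/mknod
apt-get install -y fuse          # example package whose postinst calls mknod
# ...then restore the real mknod afterwards.
rm /sbin/mknod
dpkg-divert --local --rename --remove /sbin/mknod
```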

I'm sure that's well-meant, but the very first comment (my report) already included two workarounds.
To be fair, you added new ones to the list, but the problem isn't 'cannot install fuse', and reading the first post should help those people that just need the package to install no matter what.

The real problem is 'I need to call mknod' (or more generic: I need privileged operations that so far fail).

@jpetazzo This could fix that, right? https://lwn.net/Articles/564977/ - Until then, I'd go for 3) because isolating device access _is_ another layer of complexity and it has to be managed somewhere.

I'm also not sold that loop mounting or fuse is a necessary feature. It feels crazy to give a container userspace-root permissions to mount a filesystem whose implementation (fuse) or backing image file (loop) lives in userspace.

If you need to mount a filesystem image or a fuse fs, you can mount it outside the container and use it as a volume/bind mount. Although it might be a nice feature to support and manage remote filesystems in docker itself. Maybe a Dockerfile MOUNT /mount-point.

@discordianfish 3) is pretty much a non-solution.

would #2979 help with this issue?

I'm waiting for a resolution for this, too, but not because of mknod. We're running centos containers with rpms that set up limits for users using /etc/security/limits.d/ and I'm currently using a sledgehammer workaround consisting of:

RUN /bin/sed --in-place -e "s/^\s\?session.*pam_limits.so.*/\#\0/g" /etc/pam.d/*

at the top of my Dockerfile. (We're just prototyping, don't worry :) )
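To see what that sed actually does, here is the same substitution run against a scratch copy of a PAM config file (using & for the whole match, which in GNU sed is equivalent to the \0 used above):

```shell
# Demonstrate the pam_limits comment-out on a throwaway file.
mkdir -p /tmp/pam-demo
printf 'session    required     pam_limits.so\n' > /tmp/pam-demo/su
sed -i 's/^\s\?session.*pam_limits\.so.*/#&/' /tmp/pam-demo/su
cat /tmp/pam-demo/su   # the line is now commented out, so PAM skips it
```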

Hi @jpetazzo, I tried both options that you suggested. I am trying to build an image "oracle_xe" using

sudo docker build -privileged -t oracle_xe ., because in my Dockerfile I want to run these two commands

RUN mount -o remount,size=3G /dev/shm
RUN mount -a

But that does not work; I don't know whether the syntax I used is incorrect. The error that I get is
flag provided but not defined: -privileged

I also tried the second choice, RUNP, but that also did not work; when I build the image it skips that step, saying

Skipping Unknown instruction RUNP. I can send you the Dockerfile that I am trying to build. Please help me solve this issue.

Thanks.

I think neither RUNP nor "build --privileged" was implemented.
If possible, don't hesitate to share the Dockerfile; it could be useful so
we can give you the "least worse" workaround :-)


@jpetazzo https://twitter.com/jpetazzo
Latest blog post:
http://jpetazzo.github.io/2014/03/23/lxc-attach-nsinit-nsenter-docker-0-9/

Hi @jpetazzo, I'd like to do a "RUN sudo umount /etc/hosts" in my Dockerfile - is there a "least worse" workaround for this ? ;)

@jpetazzo

The Dockerfile I used to build oracle_xe image

From *

MAINTAINER *******

ADD oracle-xe-11.2.0-1.0.x86_64.rpm.zip /appl/oracle/xe/oracle-xe-11.2.0-1.0.x86_64.rpm.zip
RUN mount -o remount,size=3G /dev/shm
RUN mount -a
RUN cd /appl/oracle/xe && unzip oracle-xe-11.2.0-1.0.x86_64.rpm.zip
RUN cd /appl/oracle/xe/Disk1 && rpm -Uvh oracle-xe-11.2.0-1.0.x86_64.rpm
RUN cd /appl/oracle/xe && rm oracle-xe-11.2.0-1.0.x86_64.rpm.zip
ENV ORACLE_HOME /u01/app/oracle/product/11.2.0/xe
ENV ORACLE_SID XE

The first thing I tried was

sudo docker build -privileged -t oracle_xe .

This did not work, and then I tried to use RUNP:
RUNP mount -o remount,size=3G /dev/shm
RUNP mount -a
This also did not work; these two steps were skipped.

@gatoravi: unfortunately, unmounting /etc/hosts won't work easily. Why do you need to do that? (I understand that you can have very valid reasons, but I'd love to hear them to give you the best workaround...)

@Bhagat7: right! Question: do you need the bigger /dev/shm at run time _and_ install time, or only run time? If it's at build time, which step is failing and how?

@jpetazzo I'd like to add a new host and IP address to /etc/hosts as part of my tool's build process.
Something like echo $IP $HOST >> /etc/hosts.

I can do this fine if I use docker run --privileged and then do a sudo umount /etc/hosts, but it looks like I am unable to commit that using docker commit, hence I have to repeat the umount step manually each time I run a container.

I'd like some way to make /etc/hosts writable and persistent; I can't seem to find a way to do it either with docker commit or with a Dockerfile.
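One possible workaround (a sketch only; the helper name, addresses, and hostnames are invented for illustration) is to defer the /etc/hosts edit to a script that runs at container start, rather than trying to bake it in at build time:

```shell
#!/bin/sh
# Sketch of a start-time hosts editor. In a real container this would run
# as (or from) the ENTRYPOINT and target /etc/hosts; the demo below uses a
# scratch file since editing /etc/hosts needs a running container.
add_host_entry() {
    # $1 = IP, $2 = hostname, $3 = hosts file.
    # Only append if the entry is not already present, so restarts stay idempotent.
    grep -q "^$1 $2\$" "$3" || echo "$1 $2" >> "$3"
}

HOSTS_FILE=$(mktemp)
echo "127.0.0.1 localhost" > "$HOSTS_FILE"
add_host_entry 10.1.2.3 build.example "$HOSTS_FILE"
add_host_entry 10.1.2.3 build.example "$HOSTS_FILE"   # second call is a no-op
cat "$HOSTS_FILE"   # localhost line plus the one new entry
```

In a Dockerfile this would be wired up as the ENTRYPOINT, ending with `exec "$@"` to hand off to the real command, so the entry is re-created on every start instead of depending on docker commit persisting it.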

@jpetazzo

I had this problem

bash-4.1#/etc/init.d/oracle-xe configure
Specify the HTTP port that will be used for Oracle Application Express [8080]:

Specify a port that will be used for the database listener [1521]: 1521

Specify a password to be used for database accounts. Note that the same
password will be used for SYS and SYSTEM. Oracle recommends the use of
different passwords for each database account. This can be done after
initial configuration:
Confirm the password:

Do you want Oracle Database 11g Express Edition to be started on boot (y/n) [y]:y

Starting Oracle Net Listener...Done
Configuring database...Done
Starting Oracle Database 11g Express Edition instance...Done
Installation completed successfully.
bash-4.1#cd /u01/app/oracle/product/11.2.0/xe/bin
bash-4.1#sqlplus
Enter Username: system
Enter Password: ***
But I get this error
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Process ID: 0
Session ID: 0 Serial number: 0
df -h inside the container returned
Filesystem Size Used Avail Use% Mounted on
tmpfs 64M 0 64M 0% /dev/shm

So when I increased the size of tmpfs to 3G I did not get this error. I resolved it by running the container as
sudo docker run -privileged -i -t oracle_xe /bin/bash and running the two mount commands inside the container. But I don't want to do it that way; instead I want to put them in my Dockerfile and build it.

@gatoravi: OK, understood. Two more questions then: do you need that extra hosts in /etc/hosts during the build, or only during run? And why do you need it?

@Bhagat7: Sorry, I don't have an elegant solution for this yet :-( I would suggest to have two Dockerfiles:

  • a first one that does all the steps (except the one that requires the bigger /dev/shm), and defines a CMD that will check if the container is running in privileged mode, mount the bigger /dev/shm, and run the special command;
  • a second Dockerfile to perform further steps (unless you also need /dev/shm on runtime, then for now you need a privileged thing).
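That two-Dockerfile idea could look roughly like this sketch (illustrative only, borrowing the Oracle XE steps from the Dockerfile above; the remount in CMD only succeeds if the container is actually started with -privileged):

```dockerfile
# First Dockerfile: all the steps that do NOT need the enlarged /dev/shm.
FROM centos:centos6
ADD oracle-xe-11.2.0-1.0.x86_64.rpm.zip /appl/oracle/xe/oracle-xe-11.2.0-1.0.x86_64.rpm.zip
# ... unzip and rpm installation steps as in the Dockerfile above ...
# At run time, try the remount; it only works in a privileged container,
# then run the step that needed the bigger /dev/shm.
CMD mount -o remount,size=3G /dev/shm && mount -a && /etc/init.d/oracle-xe configure
```

The second Dockerfile (or a docker commit of the resulting container) would then layer any remaining steps on top.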

@jpetazzo We'd like to provide our users with an image (/container) with an editable /etc/hosts so that they can build our code, which modifies that file :) As to why we need to add the host, I'm not really sure, to be honest; we do it as part of our install to point certain hostnames to IP addresses.

@Bhagat7 I have been able to get oracle XE running in a docker 0.9 container using a combination of:

  1. https://github.com/wnameless/docker-oracle-xe-11g
    and
  2. on the host ...
sysctl -w kernel.msgmni=4096
sysctl -w kernel.msgmax=65536
sysctl -w kernel.msgmnb=65536
sysctl -w fs.file-max=6815744
echo "fs.file-max = 7000000" > /etc/sysctl.d/30-docker.conf
service procps start

@mikewaters Thank you so much for replying. I think you built Oracle XE on top of Ubuntu. But I am trying to build it on Centos.

@jpetazzo Thank you so much for your suggestion

Hi guys,

I'm using google-chrome, which needs to write to /dev/shm; that is usually 777 but is 755 here. I tried to add a specific configuration to my /etc/fstab, but I cannot run mount -a to apply the modifications during a build. Of course I tried the basic chmod or chown, but I can't do that on build either.

If I run my commands in a --privileged container, everything is OK. But I need, as other people explained, to do this on build.

Any suggestion?

Thank you.

@tomav the "/dev/shm" permissions issue is actually #5126, which was fixed in #5131 and is already merged into master (and will be in the next release)

Thank you @tianon.

Today I had this idea: I want a container managing my data volumes. Easy, but since I am on a VPS I want those volumes encrypted, yet provided to the other containers in the clear as usual. The point is just to have no clear data on the virtual disk, and a quick way to destroy it by deleting the key.

I followed some of the steps beautifully documented in this article about creating a cryptfs to put containers in: https://launchbylunch.com/posts/2014/Jan/13/encrypting-docker-on-digitalocean/

Note that I am _not_ trying to do that, but rather to have a container with a mounted cryptfs:
so an encrypted filesystem should be created, mounted, and formatted during the build, via Docker.

That fails:

  • when I try to find a loop device:
+ losetup -f
losetup: Could not find any loop device. Maybe this kernel does not know
       about the loop device? (If so, recompile or `modprobe loop'.)

  • Weirdly, the exact same Dockerfile _sometimes_ succeeds in finding a loop device; then:
+ losetup -f
+ LOOP_DEVICE=/dev/loop1
+ losetup /dev/loop1 /cryptfs/disk
+ cryptsetup luksFormat --batch-mode --key-file=/etc/cryptfs/random /dev/loop1
setpriority -18 failed: Permission denied
/dev/mapper/control: mknod failed: Operation not permitted
Failure to communicate with kernel device-mapper driver.
Cannot initialize device-mapper. Is dm_mod kernel module loaded?

Is there a way around this yet (other than moving the disk mount/format steps into docker run)?

+1 It would be especially useful for "docker in docker" environments

+1 on this, iptables doesn't work in unprivileged mode, which causes anything trying to set up firewall rules to fail.

@PerilousApricot: note, however, that even if you could set an iptables rule in RUN, it would be lost immediately, since an image holds only the state of the filesystem. It doesn't know about running processes, network routes, iptables rules, etc.

That's fine with me, since the container would only have specific ports
forwarded, I'm not concerned with the firewall, I mostly just want the
installer to be able to succeed at all

Andrew Melo

@PerilousApricot I see! In that case, what about symlinking iptables to /bin/true? That should make the installer happy as well. (Or some similar trick to fool the installer?)
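The symlink trick Jérôme suggests could be sketched like this (done in a scratch directory purely for illustration; a real Dockerfile would overwrite /sbin/iptables itself):

```shell
#!/bin/sh
# Sketch: put a no-op "iptables" first on PATH so an installer's firewall
# calls succeed silently. Assumes /bin/true exists (true on typical Linux).
FAKE_BIN=$(mktemp -d)
ln -s /bin/true "$FAKE_BIN/iptables"
PATH="$FAKE_BIN:$PATH"

# Any iptables invocation now exits 0 and does nothing.
iptables -A INPUT -p tcp --dport 80 -j ACCEPT && echo "installer appeased"
```

In a Dockerfile the equivalent would be a single `RUN ln -sf /bin/true /sbin/iptables` before invoking the installer.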

I tried that, but the installer also needs to parse the output from
iptables, so it's not quite that easy :)

OK, I know this is getting hackish, but -- what about putting a fake iptables instead? Which would generate some dummy output?

I totally understand that it's not great; but seriously, that kind of installer should be fixed in the first place :)
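A dummy iptables along those lines might look like the following sketch (all output text is invented for illustration; a real stub would have to mimic whatever the installer actually parses):

```shell
#!/bin/sh
# Sketch: a fake iptables that emits plausible `iptables -L` output for
# installers that parse it, instead of just exiting silently like /bin/true.
FAKE_BIN=$(mktemp -d)
cat > "$FAKE_BIN/iptables" <<'EOF'
#!/bin/sh
case "$1" in
  -L|--list)
    # Print what an empty INPUT chain looks like.
    printf 'Chain INPUT (policy ACCEPT)\n'
    printf 'target     prot opt source               destination\n'
    ;;
esac
exit 0
EOF
chmod +x "$FAKE_BIN/iptables"
PATH="$FAKE_BIN:$PATH"

iptables -L   # prints the fake empty INPUT chain
```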

The docker in docker use case is what brought me here. Well, docker in lxc, to be specific, as our development environment uses lxc, and I'd like devs to be able to build the images in the lxc.

I'd like this for docker in docker as well. There is an image that needs to be pulled before the application can be run, which is rather large, and I'd rather have it pulled and cached as part of docker build instead of needing to frequently pull and/or commit containers that have it pulled.

This feature is a must-have IMHO; a combination of RUNP along with build --privileged would be great.

A real-life/production scenario I've run into is Docker images built with Puppet provisioning in an intermediate container. For certain services that require elevated capabilities, the build fails, requiring the container to be run under -privileged with an ENTRYPOINT or CMD that re-applies the Puppet script.

This delays the startup of the real service within the container, as the Puppet configuration needs to be built and then applied to ensure proper state (and this is time-consuming). Moreover, the running container _might_ not need actual -privileged mode at all, only while evaluating certain states in an intermediate container during the build.

I hope the above makes sense.

@jpetazzo I'm trying to build a webserver on top of CentOS 6. I'm stuck at configuring iptables rules via the Dockerfile. It's similar to @PerilousApricot's issue.

btw: I'm NOT for implementing the hacks such as a fake iptables.

@pmoust: do you have details about which build operations require elevated privileges? I will probably recommend to dodge the issue, and I totally realize that this might not be satisfactory for you; but nonetheless, I would be happy to understand what kind of installer/builder might require those privileges...

@passion4aix: note that if you set iptables rules in the Dockerfile, they will NOT be saved. Docker only saves filesystem state, not routing/filtering/running processes... There are ways to setup iptables rules with "sidekick" containers. Is that something that could be interesting for you?

@jpetazzo The Bitrock installer is one example. It requires /tmp to be mounted as tmpfs. You might want to have a look at http://answers.bitrock.com/questions/3092/running-installer-inside-docker

@jpetazzo or basically any openstack installer

I also just ran into a similar issue when trying to run TokuMX in a Docker container since TokuMX requires the 'transparent_hugepage' kernel option to be disabled.

Is there any progress on this issue? It is over a year old already and looking at the comments, most people have use for running privileged actions from a Dockerfile.

Personally I would not opt for the build with '--privileged' solution. The RUNP solution is better since then you can only run some actions as privileged user instead of running the whole installation as privileged. This way at least you have to think about when to use RUNP and only use it when needed.

Seems to me the question is no longer IF this option should be added, but only WHEN. So, when can we expect this functionality?

@diversit They would have to be coupled. So --privileged on the command line would enable the ability to use RUNP, otherwise this would be a security nightmare for folks doing builds of untrusted code (including DockerHub).

But also keep in mind, you can do this manually outside of the Dockerfile syntax. The build process is reading the Dockerfile, creating a container from it, and committing it back to an image.

@deas: I think this can be solved by doing VOLUME /tmp.

@PerilousApricot: can you elaborate a bit? I don't understand why any kind of installer would require special privileges. (Yeah, I'm an old stubborn Unix guy, that's one of my flaws :D)

@diversit: for that specific case, I think the admin of the machine should disable transparent hugepages before building. Because if the builder is allowed to do that, it will do it globally (right?) and it might break other containers which might require the feature. Do you see what I mean? It would be bad if building container X breaks running container Y...

Everybody: I totally understand that it's super frustrating when a Dockerfile doesn't work, and all you need is this --privileged/RUNP flag. But if we start to have privileged builds, it will break a ton of stuff (e.g. automated builds on the Docker Hub!), so that's why we feel very bad about it. And for what it's worth, I'm willing to investigate all scenarios requiring privileged builds and help to fix them :-) (Since it's my personal conviction that those installers are broken!)

@jpetazzo Many/most OpenStack deployment tools (e.g. https://openstack.redhat.com/Main_Page) set iptables rules. I'd like to be able to roll/deploy containerized versions of the application, so being able to build a Dockerfile and do it in one go is important to me. I know that iptables rules aren't preserved through the containerization process natively, but they are persisted through iptables-save, so a simple iptables-restore in the CMD process will cause the rules to be reloaded. It's much more complicated to just "stub out" iptables, because a lot of the deployment tools use configuration-management tools like Puppet or Chef to perform the actual deployment, so you would need to somehow make a compatible stub that emulates all of the inputs/outputs of the "real" iptables command.

Further, I think it's a fair tradeoff to say, "If you have a Dockerfile that required privileged builds, you lose features X, Y, Z"
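The save/restore pattern described above would look roughly like this sketch (hypothetical: it assumes the requested privileged build existed, so that the iptables call in RUN could succeed; the paths and script names are invented):

```dockerfile
# Sketch only. Rules generated during the (hypothetically privileged) build
# are captured to a file -- filesystem state, which the image DOES keep --
# and reloaded when the container starts.
FROM centos:7
RUN /opt/deploy/install.sh && iptables-save > /etc/sysconfig/iptables.rules
CMD iptables-restore < /etc/sysconfig/iptables.rules && exec /opt/app/run.sh
```

The running container would still need CAP_NET_ADMIN (or --privileged) for the iptables-restore in CMD to succeed.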

Oracle XE will not run without sufficient shared memory. All accounts are that remounting tmpfs with enough space makes Oracle XE happy to start up and complete its configuration. (For those with insight: it is the '/etc/init.d/oracle-xe configure' step that complains about target memory limitations, rumored to be overcome by increasing the mount size.)

During a build
RUN umount tmpfs
fails with
umount: /proc/kcore: must be superuser to umount

Give me RUNP or give me death... or... show me what I could do differently :)

My example is invalid; @jpetazzo still stands :) My Oracle settings were causing the trouble, and there appears to be no need to adjust the tmpfs size... at least for the initial install.

I'm running into an iptables issue in CentOS 7.0 that is only resolved when running with --privileged https://github.com/docker/docker/issues/3416

Without support for privileged building, I'm not sure how else to workaround the issue

Step 24 : RUN iptables -I INPUT -p tcp --dport 80 -j ACCEPT
 ---> Running in 74ebc19b6935
iptables v1.4.21: can't initialize iptables table `filter': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.

@buley I do believe with the added security-opts in #8299 it will be possible to do this without --privileged

@jpetazzo I don't see how using VOLUME /tmp solves the problem of the Bitrock Installers. That may work for the build, but it will also make /tmp bypass AUFS layering for all containers based on that image, right? At the end of the day, it seems the root cause should be fixed in AUFS.

Here's my use case: I want to build a chroot environment inside a docker container. The problem is that debootstrap cannot run, because it cannot mount proc in the chroot:
W: Failure trying to run: chroot /var/chroot mount -t proc proc /proc
mount: permission denied
If I run --privileged the container, it (of course) works...
I'd really really really like to debootstrap the chroot in the Dockerfile (much much cleaner). Is there a way I can get it to work, without waiting for RUNP or build --privileged?

Thanks a lot!

+1 for --privileged or mounting. Need for automate building glusterfs

This is impacting my efforts to build an image from the Bitnami Tomcat installer. 99% of the installation process runs with no problem but, when it attempts to start tomcat for the first time, that fails with the following output in catalina-daemon.out:

set_caps: failed to set capabilities
check that your kernel supports capabilities
set_caps(CAPS) failed for user 'tomcat'
Service exit with a return value of 4

I can successfully run the Tomcat installer manually in a container that I create with "--cap-add ALL". It seems strange that I can't use 'docker build' to create an image that I can create manually using 'docker run' then 'docker commit'. The containers that are used during the build process should have all the functionality as the containers you can create using 'docker run'.

@gilbertpilz They very explicitly can't do that in order to ensure build portability and security.

@cpuguy83 - That doesn't make any sense. I can build the image I want, by hand, if I do:

docker run --cap-add ALL .... /bin/bash
bitnami-tomcatstack-7.0.56-0-linux-x64-installer.run ...
exit
docker commit ....

All I'm asking for is the ability to use "docker build" to do the same thing that I'm manually doing here. I fail to see how "portability and security" are being "ensured" if I can create the image one way but not the other.

Ok, let me give you a Dockerfile which mounts the host's /etc/passwd into the builder and just happens to send that up to me.

This stuff can be dangerous.
Also note that --privileged (and --cap-add ALL) give the user in the container full control of the host.

Allowing these things would compromise the entirety of DockerHub

@cpuguy83 - You don't have to put the control in the Dockerfile. You could add the "--cap-add" option (or something similar) to the "docker build" command. If we followed your logic, shell scripts should not allow the use of the "sudo" command because someone could write a script that did bad things.

But you wouldn't give sudo to someone you don't implicitly trust with the keys to the kingdom.
Build has to be able to, as safely as possible, run untrusted code.

Introducing CLI flags to enable extra features on build breaks the portability of the build, and that is why it is not yet added.

That said, the installer is almost certainly in the wrong here requesting things it should itself not have access to.

Exactly. You shouldn't give people that you don't trust the ability to run docker builds that require privileges, but docker shouldn't prevent people that you _do_ trust from doing so. This is a policy decision and I really dislike it when tools presume to make policy decisions for me. A good tool allows me to implement the policy decisions I have made in a clear and non-surprising way.

You aren't seeing the bigger picture.

There is more to this ecosystem than your server and my server. DockerHub, for instance, is doing builds of untrusted code all the time.

Then DockerHub should definitely _not_ enable adding capabilities for its builds. If this means that I can't push my docker build up to DockerHub, I'm okay with that.

@cpuguy83 @tianon @jpetazzo -- When the FUD starts, I am compelled to speak up:

Allowing these things would compromise the entirety of DockerHub

srsly?
Implementing this feature == TEOTWAWKI ?

Of course DockerHub would never ever run docker build with the requested --privileged flag.
Without too much thought, there's at least two obvious ways to implement it:

  • flag only works if you also launch docker -d with some new flag such as: --i-want-a-broken-security-model
  • Create a compile-time flag which enables the code path.

Overall the ratio of teeth-gnashing to engineering-based-reasons against implementation seems really high here.

@tamsky And then we have a situation where builds work in one place but not another.
I am explaining why things are the way they are, not arguing one case or the other.

But also... most things don't need any sort of privileged access, and those that do need privileged access generally don't _really_ need it for installation to work. If the install of something is failing because of this, that installer is broken, such as is the case with the cited tomcat issue.
Enabling such a feature would encourage people to run with privileged mode instead of actually resolving the real issue at hand.

@cpuguy83

And then we have a situation where builds work in one place but not another.

Please imagine for a moment we've been magically transported to a world where the _policy_ is different, and some builds work in one place but not another...

Why is this a big deal?
Exactly who cares?

Has Docker Inc considered that their lowest-common-denominator-mantra/requirement of "all builds must work everywhere" might actually be ignoring an actual customer need?

The current policy externalizes the engineering cost onto customers who need to "get X to build in Docker":

Instead of providing this feature in docker, you're forcing every 3rd party project in the world that doesn't "need any sort of privileged access" (but actually does), to first be updated or monkeypatched to handle the docker build case.

Eventually, if Docker is going to run on multiple platforms, 'docker build' will NOT work the same on all systems. That is, a build of a Windows container, Solaris container, or even ARM Linux container will not be the same as it is on x86-64 Linux. The security context for these will be different as well, as appropriate for their platforms.

This is to say, @cpuguy83, we can not always presume that Dockerfiles will stay universal. However, I agree that we need to minimize how much variance there is between them. It might be worth including the consideration for users that want this feature, as dangerous as it is, in the conversations that ultimately need to happen around multi-arch / multi-platform support.

Builds already don't work everywhere, because of, for example, loaded AppArmor profiles.
Also, how would you handle the case of pre-cached Docker containers baked into an image?

It's not "the installer" that is "broken" in this situation, it is Tomcat 7. I'm using Bitnami's Tomcat stack which integrates Tomcat with Apache and MySQL. Docker is sitting on the end of a supply chain of source, configuration, integration, testing, and packaging services. Requiring me to "fix" Tomcat prevents me from taking advantage of this supply chain. It is way easier to build the image I want by hand (start a container with "--privileged", run the installer, snapshot the container, etc.) than it is to "fix" Tomcat.

+1
I can't port my chef roles to docker because they all involve using ufw to open ports up.
Adding --privileged to build would fix this.

+1. Can't have debootstrap as a step in my Dockerfiles.

It seemed natural to build my chroot through a Dockerfile / build but ran into the same issues as @fbrusch mentioned.

FROM ubuntu:utopic
ENV HOME /root
RUN sudo apt-get update
RUN sudo apt-get install -y eatmydata
RUN for i in /usr/bin/apt*; do sudo ln -s /usr/bin/eatmydata $(basename $i); done
RUN sudo apt-get install -y debootstrap qemu-user-static binfmt-support
RUN sudo debootstrap --foreign --arch arm64 trusty ubuntu-arm64-chroot
RUN ls ubuntu-arm64-chroot
RUN sudo cp /usr/bin/qemu-aarch64-static ubuntu-arm64-chroot/usr/bin
RUN sudo cp /etc/resolv.conf ubuntu-arm64-chroot/etc
RUN sudo DEBIAN_FRONTEND=noninteractive DEBCONF_NONINTERACTIVE_SEEN=true LC_ALL=C LANGUAGE=C LANG=C chroot ubuntu-arm64-chroot /debootstrap/debootstrap --second-stage; sudo cat ubuntu-arm64-chroot/debootstrap/debootstrap.log

fails with:

Step 11 : RUN sudo DEBIAN_FRONTEND=noninteractive DEBCONF_NONINTERACTIVE_SEEN=true LC_ALL=C LANGUAGE=C LANG=C chroot ubuntu-arm64-chroot /debootstrap/debootstrap --second-stage; sudo cat ubuntu-arm64-chroot/debootstrap/debootstrap.log
 ---> Running in 2654257e860a
I: Keyring file not available at /usr/share/keyrings/ubuntu-archive-keyring.gpg; switching to https mirror https://mirrors.kernel.org/debian
W: Failure trying to run:  mount -t proc proc /proc
W: See //debootstrap/debootstrap.log for details
gpgv: Signature made Thu May  8 14:20:33 2014 UTC using DSA key ID 437D05B5
gpgv: Good signature from "Ubuntu Archive Automatic Signing Key <[email protected]>"
gpgv: Signature made Thu May  8 14:20:33 2014 UTC using RSA key ID C0B21F32
gpgv: Good signature from "Ubuntu Archive Automatic Signing Key (2012) <[email protected]>"
mount: block device proc is write-protected, mounting read-only
mount: cannot mount block device proc read-only
 ---> de534a4e5458
Removing intermediate container 2654257e860a
Successfully built de534a4e5458

What about, instead of RUNP, a build flag --insecure?
For all RUN commands, the subsequent container would be run with --cap-add=all. This instead of --privileged, since --privileged does some other things as well...
But really, this could change to implement the full privileged settings if needed at some point, without having to modify the flag.

@cpuguy83
I don't mind having to use a flag passed to docker build that enables RUN commands to be privileged, or that enables RUNP commands. There is value in being able to look at a Dockerfile and tell, from the commands inside it, that it requires privileged access; then, instead of getting to step 10 and erroring, the build could fail fast at the start because the Dockerfile contains commands which require privileges.


The use case that brought me to this thread is bind mounts, which I want to be able to do intracontainer. Right now you can only do them if you run the container in privileged mode. It forces you to chain commands together at start, or have an init script that runs to complete setting up the system before the process you wanted to run.

It would be nice to be able to just have in the Dockerfile:

RUN mount --bind /dir1 /dir2

I'll describe my use case in more detail so this isn't just a broad "give me privileged commands" request. In my particular case, I want to bind mount a directory in the application area onto a data volume that was attached.

e.g.

/usr/local/application/data -> /mnt/data 
/mnt/data -> HOST:/var/datasets/dataset1

This could be solved by also doing the volume mount directly into the application area, but I am looking for a way to have the volumes provided in a common location and let the application container perform its specific mapping. This could also be solved with symlinks, but some applications do not play well with a symlink as their target/data folder. And if the application supports configuring its data directory location, that could also be pointed at the volume-mount area. In my use case the application does not support configuring the data directory location, and the reality is there will always be applications for which you have to perform some bind mount or symlinking to properly separate their data and application space.

Being able to do this A -> B -> C mapping keeps the data containers generic and provides flexibility in the different combinations of application and data containers you can achieve with --volumes-from.

You could achieve this also by having a chain of containers with --volumes-from:

GenericDataContainer -> ApplicationDataContainer -> ApplicationContainer

Which may be the right answer, but you could eliminate having to make yet another container for the application data if the application container could execute a bind mount.
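The three-container chain above could be expressed, in Compose-file terms, as something like the following sketch (all names are invented for illustration; plain docker run --volumes-from commands would work just as well):

```yaml
# Illustrative Compose v1 sketch of the --volumes-from chain:
# genericdata -> appdata -> app
genericdata:
  image: busybox
  command: "true"
  volumes:
    - /mnt/data
appdata:
  image: busybox
  command: "true"
  volumes_from:
    - genericdata
app:
  image: myapp
  volumes_from:
    - appdata
```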

I can achieve this today by running the container in privileged mode and then executing the bind mount but, as you will see below, there is no way to make that bind mount persist; it must be set up again every time you start the container. Symlinks, by contrast, are persisted on commits.

The answer for my particular use case may be to use the three-container chain approach or an init script, but having the ability to do bind mounts intracontainer (or other privileged commands) from the Dockerfile would be nice. There are probably other use cases for bind mounts which do not involve any host-to-container mapping and which cannot be solved by chaining data containers.

Not sure if this is related or more a bind-mount-specific issue, but having the results of privileged commands persist when you do a docker commit would allow you to separate building the Docker image from running it. You could control the area where you perform the docker build, and end users would only get the committed container, which they can run in unprivileged mode. This is currently not the case when you execute a bind mount and commit. This might be more related to how /proc/mounts works, though.

Here is a simple example

[root@ip-10-0-3-202 ~]# docker run --privileged -i -t --name test_priv centos:centos6 /bin/bash
[root@d1d037cb170c /]# cat /proc/mounts 
rootfs / rootfs rw 0 0
/dev/mapper/docker-202:1-25352538-d1d037cb170c12dab94ebd01c56807210cf2aec50bef52c944f89225c8346827 / ext4 rw,seclabel,relatime,discard,stripe=16,data=ordered 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,seclabel,nosuid,mode=755 0 0
shm /dev/shm tmpfs rw,seclabel,nosuid,nodev,noexec,relatime,size=65536k 0 0
devpts /dev/pts devpts rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
sysfs /sys sysfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
/dev/xvda1 /etc/resolv.conf xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/xvda1 /etc/hostname xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/xvda1 /etc/hosts xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
tmpfs /run/secrets tmpfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
devpts /dev/console devpts rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0

Create a bind mount example, also create a symlink example

[root@d1d037cb170c /]# mkdir /var/data1
[root@d1d037cb170c /]# mkdir /var/data2
[root@d1d037cb170c /]# mount --bind /var/data1 /var/data2
[root@d1d037cb170c /]# ln -s /var/data1 /var/data3

Show file is seen from all 3 directories

[root@d1d037cb170c /]# touch /var/data1/test
[root@d1d037cb170c /]# ls /var/data1
test
[root@d1d037cb170c /]# ls /var/data2
test
[root@d1d037cb170c /]# ls /var/data3
test

Show /proc/mounts updated

[root@d1d037cb170c /]# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/mapper/docker-202:1-25352538-d1d037cb170c12dab94ebd01c56807210cf2aec50bef52c944f89225c8346827 / ext4 rw,seclabel,relatime,discard,stripe=16,data=ordered 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,seclabel,nosuid,mode=755 0 0
shm /dev/shm tmpfs rw,seclabel,nosuid,nodev,noexec,relatime,size=65536k 0 0
devpts /dev/pts devpts rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
sysfs /sys sysfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
/dev/xvda1 /etc/resolv.conf xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/xvda1 /etc/hostname xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/xvda1 /etc/hosts xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
tmpfs /run/secrets tmpfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
devpts /dev/console devpts rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/mapper/docker-202:1-25352538-d1d037cb170c12dab94ebd01c56807210cf2aec50bef52c944f89225c8346827 /var/data2 ext4 rw,seclabel,relatime,discard,stripe=16,data=ordered 0 0

Exit the container which stops it, then start again

[root@d1d037cb170c /]# exit
[root@ip-10-0-3-202 ~]# docker start -a -i test_priv
test_priv

/proc/mounts is missing the bind mount

[root@d1d037cb170c /]# cat /proc/mounts 
rootfs / rootfs rw 0 0
/dev/mapper/docker-202:1-25352538-d1d037cb170c12dab94ebd01c56807210cf2aec50bef52c944f89225c8346827 / ext4 rw,seclabel,relatime,discard,stripe=16,data=ordered 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,seclabel,nosuid,mode=755 0 0
shm /dev/shm tmpfs rw,seclabel,nosuid,nodev,noexec,relatime,size=65536k 0 0
devpts /dev/pts devpts rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
sysfs /sys sysfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
/dev/xvda1 /etc/resolv.conf xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/xvda1 /etc/hostname xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/xvda1 /etc/hosts xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
tmpfs /run/secrets tmpfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
devpts /dev/console devpts rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0

Symlink survived, but not bind mount

[root@d1d037cb170c /]# ls /var/data1
test
[root@d1d037cb170c /]# ls /var/data2
[root@d1d037cb170c /]# ls /var/data3
test
[root@d1d037cb170c /]#

Resetup bind mount

[root@d1d037cb170c /]# mount --bind /var/data1 /var/data2

Instead of exiting the container, detach with ctrl+p ctrl+q and then commit the container

Commit the container as new image, start new container from image in nonpriv mode

[root@ip-10-0-3-202 ~]# docker commit test_priv test_priv
74305f12076a8a6a78f492fd5f5110b251a1d361e63dda2b167848f59e3799e2
[root@ip-10-0-3-202 ~]# docker run -i -t --name test_nonpriv test_priv /bin/bash

Check the /proc/mounts
bind mount is missing, not sure what triggered the extra /proc/[sys,sysrq-trigger,irq,bus,kcore] mounts

[root@ba1ba4083763 /]# cat /proc/mounts 
rootfs / rootfs rw 0 0
/dev/mapper/docker-202:1-25352538-ba1ba40837632c3900e4986b78d234aefbe678a5ad7e675dbab7d91a9a68469e / ext4 rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c327,c505",relatime,discard,stripe=16,data=ordered 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c327,c505",nosuid,mode=755 0 0
shm /dev/shm tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c327,c505",nosuid,nodev,noexec,relatime,size=65536k 0 0
devpts /dev/pts devpts rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c327,c505",nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
sysfs /sys sysfs ro,seclabel,nosuid,nodev,noexec,relatime 0 0
/dev/xvda1 /etc/resolv.conf xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/xvda1 /etc/hostname xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/xvda1 /etc/hosts xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
tmpfs /run/secrets tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c327,c505",nosuid,nodev,noexec,relatime 0 0
devpts /dev/console devpts rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sysrq-trigger proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/irq proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/bus proc ro,nosuid,nodev,noexec,relatime 0 0
tmpfs /proc/kcore tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c327,c505",nosuid,mode=755 0 0

Symlink survived

[root@ba1ba4083763 /]# ls /var/data1
test
[root@ba1ba4083763 /]# ls /var/data2
[root@ba1ba4083763 /]# ls /var/data3
test
[root@ba1ba4083763 /]# exit

I am currently trying to run docker images in my build step, using dind. So, currently, there is no way to run docker images during a build?

Everyone, if you desire this, try '/usr/bin/unshare -f -m -u -i -n -p -U -r -- /path/to/binary'. This will create a container inside of your build with a user namespace. You may tweak the options to unshare as necessary. I actually use this to run '/sbin/capsh', to granularly set the capabilities for my processes.

I cannot say this will solve all user-cases for privileged builds, but it should help some of you.

I agree this should become part of Docker itself, and integration of user namespaces appears to be in progress.
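The unshare approach described above could be sketched in a Dockerfile roughly as follows. This is only an illustration of the comment's suggestion: the base image, package names (util-linux for unshare, libcap2-bin for capsh) and the build-step script path are assumptions, not from the original comment, and whether unshare is permitted depends on the daemon's seccomp/apparmor configuration.

```dockerfile
FROM ubuntu:16.04
# Assumed Debian/Ubuntu package names: util-linux ships unshare,
# libcap2-bin ships capsh.
RUN apt-get update && apt-get install -y util-linux libcap2-bin
# Run a build step inside fresh namespaces; -U -r creates a user
# namespace mapping the current uid to root inside it, so the step sees
# itself as root without any real privileges on the host.
RUN unshare -f -m -u -i -n -p -U -r -- /bin/sh -c 'id -u; /path/to/build-step.sh'
```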

@saulshanabrook you cannot run docker images in a build, not exactly. I hope that one day soon, this will be possible. I've done some investigation into this and have discovered that you may do a 'docker pull' from within a build, as long as you use VFS storage. Conveniently, 'docker save' also works.

It isn't a real solution for someone looking to run docker images, but I'll note that 'unshare' and 'capsh' do work, so it's possible to do a container-like runtime in unprivileged containers (such as during a build). Arguably, it's possible to side-step 'docker run' and do this step manually, and recommit images back into docker. I have most of this working today, even if I don't have it all wrapped together in a bow. Eventually, of course, such functionality needs to go into Docker itself.

+1 for docker pull via RUNP

The inability to run docker build privileged promotes building manually with a shell and docker commit, which renders Dockerfiles pointless. I don't think that adding a privileged flag to docker build would draw a line between builds and privileged builds; that line was already drawn when the flag was added to run and it needs to be supported.

+1 This makes the docker container reproducible at any given point in time (you only need to carry the Dockerfile).

+1 having to completely tear apart my ansible baseline roles just to work around stuff like this. I was really hoping that docker adoption would let me use a lot of my existing ansible code but not only have I already created custom roles for size but now I am having to work around issues like this.

@lsjommer what kind of things do you have to work around? --privileged is a completely insecure way to run a container and gives the user in the container full root access to the host.

I'm not saying let's not implement this, but let's get real about what we are talking about.

Also this would be relatively easy to implement if someone wants to go for it...

@cpuguy83 This is from our standard "bare metal" baseline role that I am trying to adopt into docker containers to get all libs installed, and it deals with shared memory but maybe I don't even need to bother with it on the container build and need only run it on the container host?
http://pastebin.com/P3QQxjNQ

I admit that I don't fully understand how Docker handles resource sharing.

@ljsommer So setting shm is a different beast all-together and would not persist between RUN commands (or for when you actually docker run) anyway.

@cpuguy83 Yeah I think this was mostly my fault in the regard that I hit what I thought to be an issue that I can just shift onto the baseline for the container host itself.
Thanks for taking the time to respond and apologies for not appropriately educating myself before complaining.
;)

Any idea on RUNP / -privileged during build process?
It would be great for setting iptables rules to limit docker access to specified IP addresses.

I also want RUNP and/or "docker build --privileged".

FROM ubuntu:latest
MAINTAINER xyz

RUN apt-get -qq update
RUN apt-get -yq install iptables

RUN iptables -t nat -I OUTPUT -p tcp --dport 443 -j DNAT --to-destination 127.0.0.1:8080 && iptables-save > /etc/iptables.rules

This Dockerfile doesn't work due to the following error, but works when done via "docker run --privileged"...

getsockopt failed strangely: Operation not permitted

@malcm, @sakurai-youhei: even if you had something like RUNP, it would not work in this scenario, because iptables rules are not persisted in the filesystem.

Let me explain: when you do RUN x, Docker executes x then takes a snapshot of the filesystem. Things outside of the filesystem (running processes, routing tables, iptables rules, sysctl settings...) are not stored in Docker images.

If you want custom iptables rules, one method is:

  • start your container e.g. with --name myapp
  • start another container, privileged, one-shot, to set up the iptables rule, e.g. docker run --net container:myapp --privileged iptablesimage iptables -t nat ...

Does that make sense?
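The filesystem-snapshot point above can be illustrated with a minimal (hypothetical) pair of RUN lines; the sysctl is just an example value:

```dockerfile
# Kernel state set in one RUN step is gone once that step's container
# exits (and writing it would typically fail in an unprivileged build
# anyway, since /proc/sys is read-only there):
RUN sysctl -w net.ipv4.ip_forward=1
# A file written in a RUN step is part of the layer snapshot and
# persists into the image:
RUN echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
```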

@jpetazzo: Thank you for your response. In my case, I added a second command to persist the iptables rules to the filesystem, as below. This should enable me to load the iptables rules after starting the container with the --privileged option.

RUN do-something-with-iptables && iptables-save > /etc/iptables.rules

Without RUNP or "build --privileged", I'm forced to write:

ADD iptables.rules /etc/

Yes, this might be sufficient, however, I need to add iptables.rules beside Dockerfile in my repo.

That's why I want (or would like to have gently) RUNP. :)
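The ADD-based workaround mentioned above can be sketched end to end. This is a hedged illustration, not from the thread: the rules file is written in iptables-save format (using the example rule from the earlier comment), shipped beside the Dockerfile, and loaded at container start rather than at build time.

```shell
# Generate the rules file that will sit beside the Dockerfile.
cat > iptables.rules <<'EOF'
*nat
-A OUTPUT -p tcp --dport 443 -j DNAT --to-destination 127.0.0.1:8080
COMMIT
EOF
# In the Dockerfile:          ADD iptables.rules /etc/
# At container start (a run with --cap-add NET_ADMIN or --privileged):
#   iptables-restore < /etc/iptables.rules
grep -c DNAT iptables.rules   # sanity check: the rule made it into the file
```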

@jpetazzo @strib
Beyond iptables issues, mounts and other privileged operations, I think there's one build scenario we should be addressing.

We ship appliances for deployment in VMs and bare-metal installations. However, for testing we use container environments. In turn, inside of those appliances we run containers, so test containers have to be based on docker-in-docker. Now imagine that we have a service image that needs to be pre-loaded in the test image (so that service images are not downloaded from the registry at test time). Right now we can't do that, since we can't run a d-in-d container in privileged mode during the build of the Dockerfile that uses d-in-d as a base: the docker daemon won't start, and "docker pull" or "docker load" won't work.

I had an issue where, when running on a RHEL7 host, su would fail if the current user was root. Weirdly, if the current user was something else, su worked fine. Regardless, the workaround was to pass the run command the flag --cap-add=SYS_RESOURCE; due to this issue, however, it wasn't possible to do this during the build step.

+1 Scripting around Dockerfiles with docker run and docker commit is ridiculous. Please include this functionality.

+1 on the necessity of this feature. I think a global "security level" could be configured in a config file which limits the capabilities that can be given to a container. There should be a safe default (like today), and a sysadmin could change it to allow containers to run with more privileges. Dockerfiles with such a RUNP instruction could then fail with a message such as "this Dockerfile requires the following capabilities ... to build" on a system with such global limits.

I think this allows a balance between security and usability.

We also have this issue while trying to build an image with an evil proprietary db, which shall remain nameless, inside it.
The db wants to allocate massive amounts of memory, which it's not allowed to do by Docker.

Our current workaround is a 2 phase build with a run --privileged step, and separate commit step.

Maybe we could configure docker to allow the memory allocation some other way. Its a bit hard to find out what the db actually wants to do because its proprietary.

+1
for this feature.
for historic and use case example see this dupe
https://github.com/docker/docker/issues/12138#issuecomment-90536998
thanks @cpuguy83 for pointing out the dupe

I too have this issue: mounting a CIFS share during docker build is not permitted unless the privileged flag is supplied. Any way around this?

There's now a pull request that implements this; you can check progress there; https://github.com/docker/docker/issues/12261

If something does require privileged mode then it is possibly modifying the host in some way, which means the image may not be portable since these modifications would need to be run on other hosts trying to consume the image.

Once #13171 is merged I think we should close this as it will make rolling your own builder trivial, and as such allowing --privileged.
I don't think the built-in docker build should allow this.

So @cpuguy83, if I understand right, the way to support this issue would be to completely reimplement docker build but with an extra parameter?

I guess to say, once the other patch is pushed through, I'd need to put together my own version of docker build (maybe docker pbuild?) to fill in the additional functionality?

Is there any progress on this issue? I checked the PRs mentioned above and all of them failed.
Is it possible to make a build --privileged/--granted option more granular, so that the granted access is limited to a specific group of host resources and to the image builder/owner only?

+1 for any solution that allows me to do RUN docker pull in a Dockerfile.

Use case: I need a bunch of tools for image conversion and documentation build, however, all these tools can't be installed into a single image because of conflicting libraries. That's why I separate some of these tools into a separate image and I would like to distribute all tools in a single image, i.e. image in an image. That's why I want to do RUN docker pull in my Dockerfile.

@cpuguy83 it doesn't look like this issue was resolved to anyone's satisfaction. I absolutely, 100%, need to be able to do something as dull as writing to /proc/sys/kernel/core_pattern during a build.

In the current world, I can do this privileged operation via a run workaround and just push that image to the hub anyway. Additionally, no Dockerfile I have _ever_ produced is strictly reproducible, because they pull from random constantly changing public repos. I had no idea that

  1. Public consumption of my images was a priority.
  2. They had any need, ever, to be reproducible.

People are _going_ to do shitty workarounds to get privileged in build. I think you should definitely go where your users are and allow this in the core build tool. Add a terrifying message if necessary.

cc @thaJeztah, who seems sympathetic to this position
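Worth noting for the core_pattern case above: /proc/sys/kernel/core_pattern is kernel state, not filesystem content, so even a privileged build would not bake it into the image. A common workaround is to set it from an entrypoint at container start. In this sketch the target path defaults to a local file so it runs unprivileged; a real entrypoint would write /proc/sys/kernel/core_pattern and the container would need --privileged (or a writable /proc/sys). The pattern value is an example.

```shell
# PATTERN_FILE stands in for /proc/sys/kernel/core_pattern so the sketch
# is runnable without privileges; override it in a real container.
PATTERN_FILE="${PATTERN_FILE:-./core_pattern.example}"
echo '/cores/core.%e.%p' > "$PATTERN_FILE"
cat "$PATTERN_FILE"
```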

Look, I created a PR to enable this and it was rejected.
I don't see this happening with the builder in any form.

It seemed like you made the final call. I'll agitate in the PR thread itself, then.

This is needed for installing the older JDK 1.6 packages under CentOS, as its RPMs attempt to register with binfmt_misc, which fails without running under --privileged.

docker build is a non-starter for building containers with it.

To replicate

FROM centos:5.11
RUN yum install -y jre-1.6.0_29-fcs

We need to have a privileged command as an optional part of build. I am trying to port one of our applications to docker as a POC, and it fails for one of our components, as it has iptables settings that are not being applied and the build fails. I can do the necessary changes manually and commit them, but then what's the fun of docker build? It's just a part of the build and should be easy to port, as it is already part of the main release.

Docker should be easily able to run intermediate containers with privileged option set.

@shubhamrajvanshi we're in the process of moving the "builder" out of the daemon (and to the client), this should open the door for more customizations to the build process (including being able to implement custom builders). Possibly allowing "privileged" builds can be considered, but that's a decision that can be made after the refactoring. (See https://github.com/docker/docker/blob/master/ROADMAP.md#122-builder)

@shubhamrajvanshi you can't make changes to iptables in build for a good reason: the settings would never stick.

There are very few things one can do with --privileged that even makes sense in build.

@thaJeztah Thanks that would be helpful.
@cpuguy83 Would that be the case even if I am using the iptables-persistent package in the image?

This would save the rules to disk, then they'd still have to be reloaded, unfortunately.

_USER POLL_

_The best way to get notified when there are changes in this discussion is by clicking the Subscribe button in the top right._

The people listed below have appreciated your meaningful discussion with a random +1:

@karelstriegel

I would really like this too, to allow using nVidia's CUDA drivers from within Docker on CoreOS. The installer they provide builds a kernel module against the kernel source, and then installs it using modprobe. I can't see how to make it work without some kind of --privileged option to build.

Why not by default always support privileged mode on build?

+1
I want to use mysql command in Dockerfile for centos7.
Of course we can use entrypoint.sh but it's more useful if we can use -privileged for both build and run.

There is no need for --privileged to run the MySQL command.

This issue should be closed as it doesn't seem like this is going to happen (or even should happen).
Allowing privileged in build means allowing the builder to change things on the host which in turn makes it so that image only works on that host (or hosts that have had similar modifications).

Allowing privileged in build means allowing the builder to change things on the host which in turn makes it so that image only works on that host (or hosts that have had similar modifications).

Does this apply to the chroot use case?

I'm trying to work out how to do dpkg-depcheck -d ./configure without something like this.

During build (or run without --privileged) I get the error below - I've got no idea how to work out what permission it needs or how to enable it.

dpkg-depcheck -d ./configure
strace: test_ptrace_setoptions_followfork: PTRACE_TRACEME doesn't work: Permission denied
strace: test_ptrace_setoptions_followfork: unexpected exit status 1
Running strace failed (command line:
strace -e trace=open,execve -f -q -o /tmp/depchJNii2o ./configure
devel@98013910108c:~/src/cairo-1.14.2$ 

After about 3 years and 162 comments, I think it's got enough interest to do. The comments about a privileged mode not being necessary for most of the cited cases are true, even my own; but they should not be used to proscribe what might be useful for local, temporary, exploratory, and/or expedient builds. Publish warnings until the cows fart out a harmony, print out command line warnings, revile and denounce its use, but give people flexibility. Portability is not always everyone's primary interest.

_USER POLL_

_The best way to get notified of updates is to use the _Subscribe_ button on this page._

Please don't use "+1" or "I have this too" comments on issues. We automatically
collect those comments to keep the thread short.

The people listed below have upvoted this issue by leaving a +1 comment:

@robeferre

+1

I really need to mount an NFS volume inside a docker container; so far I couldn't mount the NFS share without the "--privileged=true" flag. The best option in my opinion is to build the image using a privileged command. How is this possible?

+1

Step 19 : RUN lxc-create -t ubuntu.sf -n percise -- -r precise -a i386 -b root
 ---> Running in 4c51b7cf0058
lxc_container: lxccontainer.c: create_run_template: 893 error unsharing mounts
lxc_container: lxccontainer.c: create_run_template: 1084 container creation template for percise failed
lxc_container: lxc_create.c: main: 274 Error creating container percise
The command '/bin/sh -c lxc-create -t ubuntu.sf -n percise -- -r precise -a i386 -b root' returned a non-zero code: 1

I'm trying to install gobject-introspection on a Gentoo system in docker during build, but it fails with this error:

  • ISE:_do_ptrace: ptrace(PTRACE_TRACEME, ..., 0x0000000000000000, 0x0000000000000000): Operation not permitted

The same result when I try to install it manually in a container, but when I try it from a container launched in privileged mode (docker run --privileged), it works well.

The same problem when I try to install glibc.

So I also need a way how to run privileged command during build.

I'm using docker version 1.10.1, and the problem with glibc doesn't appear in 1.9.

In version 1.10 something broke and we cannot build 32-bit containers, because networking is unavailable.
--privileged or --security-opt seccomp:unconfined for build is really necessary.
Or corresponding directives in the Dockerfile.

big +1 from me

It's a real problem for me that I cannot use the 'mount' command during the build.
I was trying to overcome the limitation that a host directory cannot be mounted into a container during the build, so I set up an NFS server on the host and tried mounting an NFS share, just to find out that it's not possible because of unprivileged mode.

In my use case I need to install some things in the image without copying them into the build context and ADDing them before installing.

Feels like I'm left without options.

thaJeztah referenced this issue on 10 Mar
Regression in LTTng behavior after upgrading to 1.10.2 #20818 Closed

No, it is not closed. We are using 1.11.0-0~wily, and in 32-bit containers networking has not worked since 1.10.0, though 1.9.x worked well.
Only --privileged lets us start containers. But we cannot build new ones.

I am amazed that something so obviously needed by so many people has not been implemented even though people have been begging for it for 2.5 years in this thread alone and given that people have submitted PRs to implement this functionality.

Agreed @ctindel, this issue is one of the reasons why I'm migrating from docker to rkt.

@ctindel It's something we're not ready to implement or support. The implementation itself is rather simple (I even implemented it myself so we could discuss), that's not the issue.

--privileged is a tank, and allowing it on build is dangerous, and greatly affects image portability.

Brian,

If you have a workaround can you please share it with me as well? I would appreciate that.

Thanks,
Shubham Rajvanshi


I don't understand the effect on portability. How do privileged operations at build time affect portability? I mean, it's not hard to create non-portable images in various ways with or without privilege, but is there some way that images built with privileged operations are necessarily non-portable?

I don't believe every container must be portable. Some containers are created to share with the community and some might be created for deploying internal applications.

The portability problem with an app that requires running in privileged mode lies in the app itself.

Privileged mode is the means of last resort for getting the app to work without code changes.

I believe that an image builder who requires privileged-mode building understands that such a container might also require running in privileged mode.

It should be clearly documented that building in privileged mode is discouraged since it might create portability problems.


@tlbtlbtlb Because privileged mode gives you full access to the host. Maybe I set something simple like shmmax or something much worse.
I guarantee these things will happen on day 1 that this is available.

@davidl-zend "portable" doesn't mean sharing with the community. It means moving from one machine to another.

@cpuguy83 As others have pointed out, there are many, many other ways to break image portability as well. All you're doing by not having a process for privileged building is forcing people into a two-step process: either partially building from a Dockerfile, then manually changing the container and doing a commit; or partially building from a Dockerfile and finishing the privileged installation the first time the container is launched, which sucks because, if you're doing something time-consuming, it could take minutes for the first boot to work.

Given the comments I've seen in various forums I'm willing to bet that A LOT of people are doing exactly this already to work around the docker limitation that exists today.

Once you have an architecture where image portability is broken in dozens of other ways, what point is there in tilting against this specific windmill?

Clearly you would no longer be able to have docker hub or travis-ci build your image, but people who need it built with privilege mode would understand that anyway.

@ctindel I would love to see some examples of image portability being broken

Doing this kind of stuff at first container start is _exactly_ the right way to go.
It's a runtime configuration it shouldn't be in the image.
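A minimal first-start entrypoint sketch of that pattern; the marker path and setup steps are placeholder assumptions (a real image would likely use something like /var/lib/app/.initialized, and the setup commands would need whatever privileges the container is granted at run time):

```shell
#!/bin/sh
# Run one-time setup on the container's first start instead of at build
# time. MARKER defaults to a local path here so the sketch is runnable.
MARKER="${MARKER:-./.app-initialized}"
if [ ! -f "$MARKER" ]; then
    echo "running one-time setup"
    # e.g. mount a tmpfs, load iptables rules, write sysctls ...
    touch "$MARKER"
fi
# Hand off to the container's real command.
exec "$@"
```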

@cpuguy83 Are you debating the fact that someone could do a partial build from a Dockerfile, spin up a container in privileged mode, make more changes, then do a docker commit? How is that any different than doing a build in privileged mode? Because that's what other people are doing today.

I am not trying to be testy, I'm just saying this is a severe limitation on the platform that people are working around in an awkward way.

For example, there are 3rd party debian packages (and probably RPMs) that can't be installed correctly unless you have certain capabilities in the system. Installing a debian package isn't a "runtime configuration" it's a freaking part of the installation process.

@ctindel I'm not debating this at all. The difference is supporting the behavior. If there was no difference, we wouldn't be having this discussion.

For me, I'm a consenting adult and want to be able to roll a packstack image across a bunch of nodes. Building with a Dockerfile bombs currently, so I have to cheat to get around the docker restrictions.

@cpuguy83 It is still very unclear to me (and I think others in this thread) what exactly is gained by not providing this option that people are working around in other ways. The sort of architectural purity (I don't have a better word for it) you seem to be arguing for was gone the minute the commit option was added anyway, so I really do not understand what difference there is whether it is done via a Dockerfile with privileged build or via a docker commit in a privileged container.

Except that one way is a freaking PITA for people and one way slots into the current mechanism of building with Dockerfile very nicely.

Also, you didn't answer my question. Why do you consider the simple installation of a 3rd party debian package (or packstack) to be "runtime configuration"?

@ctindel Portability. Just because something can be done doesn't mean it is supported, and including it in build, making it convenient for everyone to do, it means it is supported.
We _will_ be flooded with issues of images not working between hosts... which defeats basically the whole reason for Docker.

If a package requires --privileged to even be installed, then it should be addressed with the package. Installation should not require --privileged... and if it really does require it, then this is a sign that the installation itself is not portable (requires changing things on the host)... I'd like to even see docker be runnable in a container without --privileged (note that's runnable, you can install it all you want with no issue without --privileged).

But allowing it to be done via docker commit means it is also supported!

I don't understand, you're making tons of people work around limitations in this product because you're worried someone is going to complain to you guys personally about some sort of unsupported image?

You still didn't answer my question about why the act of installing a package (I'm not even talking about configuring it here) is an act of "runtime configuration". Just saying "portability" doesn't mean anything.

Is there something x86_64 specific about docker? Won't there eventually be docker images that are built for a specific CPU architecture? Doesn't that also make them non-portable? I mean this whole idea that somehow you'll always be able to take any docker image and run it on any docker host in the world regardless of tons of other variables seems impossible anyway, so I don't understand the super strong need to push back on this particular feature that people are asking for.

BTW, let me thank you here for your replies and continued engagement. Plenty of other github projects where issue threads are ignored!

I concur with the point about people working around this by using docker run --privileged with docker commit. I've made a solution like that for two companies so far. People WILL make images like that and there is no point in acting as if doing it is some horrible, horrible thing.

Hell, if you are so afraid of supporting containers that were built with --privileged then just state so clearly in the documentation so people are perfectly aware they are doing it at their own risk. Though I haven't seen any negative effects so far. But then again we haven't been attempting to run containers on different distros.

@PerilousApricot what is actually causing the problem with packstack? We are happy to fix specific problems, or help upstream fix them, but do not think that just adding --privileged which gives complete unfettered root access to your host server is the right way to do this. All cases that I am aware of where people have raised specific build issues generally can be fixed, as most things do not actually need root access on the host machine in fact in order to do a build.

@justincormack What's the solution for a 3rd party package (i.e. can't be changed) that starts it's own service where the init script needs to mount a tmpfs filesystem? I mean even ignoring --privileged for now there's also no way to do a --cap-add or a --security-opt apparmor:unconfined during a build (I don't think?)

@ctindel It shouldn't be trying to mount a tmpfs on installation. If it needs tmpfs at runtime then great, but at install time it is most certainly not correct.

@cpuguy83 You are imposing architectural and implementation philosophy on something that is unchangeable because it's coming from a commercial 3rd party. Not everything is going to be fixed "upstream", even if they fix the newer version of the .deb or .rpm it doesn't mean they'll go back and repost older version which is what you might need in the docker image.

That's the point of this whole discussion, you are imposing arbitrary restrictions that make it much more difficult to use docker because of some philosophical concern about requests for support from people who "do it wrong".

That's like saying Operating Systems should not allow people to change process scheduling classes because if you do it wrong it can lead to priority inversion. Or, that nobody should make a hammer because it's possible to hit your thumb if you use it wrong.

As has been said many times, docker ALREADY SUPPORTS THIS via the commit command, it's just more painful for your users. The people who don't want this feature won't use it, and the consenting adults who want to use it by understanding the limitations can do so with eyes wide open.

@ctindel More like no, you can't handle this nuclear bomb because you could kill everyone in a 50 km radius.

What is it about this package that it needs to load a tmpfs during installation? Installation is literally extracting files from some archive format to the rootfs.

Anything can be changed.
It is a far simpler and safer change to be made upstream to not mount a tmpfs at installation than it is to enable privileged on build.

Docker is about workload portability. Enabling privileged on build (or extra privileges, or tweaking security profiles, etc) fundamentally breaks this and is not something we can accept today.

commit and build are two very different things, and just because it's possible to do something one way doesn't mean we should allow doing it in every other way too.

FROM python

ENV PACKSTACK_VERSION 7.0.1
RUN cd /opt && git clone https://github.com/openstack/packstack.git \
  && cd packstack \
  && git checkout $PACKSTACK_VERSION \
  && rm -rf .git \
  && python setup.py install

No privileges required.

The church of portability.
One day "forced" portability will kill this project - it's already doing it.
So many features are being denied because of elusive portability, so much progress is not being made because of it .....
One day someone will fork a project and make portability optional. Dreams ... dreams .... Amen.

If we break it down to two cases:

  1. Installers that use privileged operations frivolously, like mounting tmpfs for performance. Such installers could easily be fixed (but might not be in the near future).

In this case, it's a valid philosophy for Docker to push back on poorly behaved installers. Most installers have some kind of workaround which just makes the Dockerfile a bit longer.

  2. Installers that depend fundamentally on privileged operations, like installing kernel modules for GPU drivers. These are also fundamentally non-portable. They won't work on docker-machine for Mac, for example.

In this case, the Docker experience is broken anyway. I can't use docker-machine on Mac, for example, I can only build the image on a compatible target host machine. My use case was installing the nVidia GPU drivers on a host OS (CoreOS) which discouraged installing directly in the host OS.

So, I guess I've come around to see the virtue of not supporting --privileged in either case. I think what made me change my mind is the convenience of building images on my laptop using docker-machine, rather than first pushing my code to a Ubuntu box and building there.

@tlbtlbtlb I don't understand what "virtue" you're referring to. Consider something that is not frivolous: there are tons of docker images that will run in one environment but not another. For example, you can mount a host volume into a mongodb container going from linux->linux and the mmapv1 storage driver will work fine, but you can't pass a mac osx directory through virtualbox to a mongodb container on your laptop because the mmap stuff won't work properly in that case.

I realize this isn't a problem with respect to building, but the idea that docker images are "portable" and can "run anywhere" is total nonsense at this point. If it's the case that they can't run anywhere, what is the virtue in saying they should be able to "build anywhere"?

The point is that the mongodb image works everywhere. Providing invalid runtime configuration is a different beast.

Docker has a very specific and intentional separation of portable configuration and non-portable configuration.

How about this?
I need my real IPs inside the container so that my nginx configuration check passes.

this is my Dockerfile:

FROM ubuntu:14.04.4

RUN apt-get update
RUN apt-get install -y software-properties-common
RUN add-apt-repository ppa:nginx/stable
RUN apt-get update
RUN apt-get install -y nginx-full vim
RUN ifconfig lo:0 192.168.168.70 netmask 255.255.255.0 up
RUN ifconfig lo:1 192.168.168.57 netmask 255.255.255.0 up
RUN ifconfig lo:2 192.168.168.58 netmask 255.255.255.0 up

ADD . /etc/nginx

➜  nginx git:(ha-node-01) ✗ docker build -t nginx4test .
Sending build context to Docker daemon 976.4 kB
Step 1 : FROM ubuntu:14.04.4
 ---> 90d5884b1ee0
Step 2 : RUN apt-get update
 ---> Using cache
 ---> eea42cb6135d
Step 3 : RUN apt-get install -y software-properties-common
 ---> Using cache
 ---> 9db86ab17850
Step 4 : RUN add-apt-repository ppa:nginx/stable
 ---> Using cache
 ---> 5ed2266a93a9
Step 5 : RUN apt-get update
 ---> Using cache
 ---> 09fcfdc1fed3
Step 6 : RUN apt-get install -y nginx-full vim
 ---> Using cache
 ---> cc0c1662e009
Step 7 : RUN ifconfig lo:0 192.168.168.70 netmask 255.255.255.0 up
 ---> Running in 5d962ec4e35d
SIOCSIFADDR: Operation not permitted
SIOCSIFFLAGS: Operation not permitted
SIOCSIFNETMASK: Operation not permitted
SIOCSIFFLAGS: Operation not permitted
The command '/bin/sh -c ifconfig lo:0 192.168.168.70 netmask 255.255.255.0 up' returned a non-zero code: 255

Of course, if I run the container with the privileged option I can set up the IPs on the loopback interface. But that is one more script to add.

@cpuguy83 I have about 20 lines or so of iptables entries I'd like to RUN in my Dockerfile but can't, since I need --cap-add=NET_ADMIN. These commands should happen regardless of who is running the container and regardless of what machine they run it on (the container runs an internal app). Where/how would you suggest I do that based on what you discuss above?

@MatthewHerbst Unfortunately iptables rules won't/can't persist with the image.

@cpuguy83 I'm using a centos:6 image and can run /sbin/service iptables save to persist the rules to the filesystem. I believe there is a similar capability on Ubuntu and others via the iptables-persistent package.

You can just generate the iptables rules for that file, there is no need to actually apply them. The network situation where the container is run may be very different so you should just apply rules at run time (if that, you may be better off with the host generating them).
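The suggestion above can be followed with plain shell during the build: write the rules in iptables-restore format to the file the boot-time loader reads, without ever invoking iptables, so no NET_ADMIN is needed at build time. A minimal sketch; the rule, port, and output path below are illustrative examples, not taken from the thread:

```shell
#!/bin/sh
# Generate iptables rules in iptables-restore format without applying them.
# Real targets would be /etc/sysconfig/iptables (CentOS 6) or
# /etc/iptables/rules.v4 (Debian/Ubuntu with iptables-persistent);
# ./rules.v4 is used here so the sketch runs anywhere.
RULES_FILE="${RULES_FILE:-./rules.v4}"

cat > "$RULES_FILE" <<'EOF'
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p tcp --dport 8080 -j ACCEPT
COMMIT
EOF

echo "generated $RULES_FILE"
```

At run time (or at boot, via the distro's iptables service) the file is applied with `iptables-restore < rules.v4`, which does need NET_ADMIN, but in the running container rather than during docker build.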

@justincormack don't know why I didn't think of that! Thanks!

How are you supposed to execute commands that require privilege when using docker service? I need to set the hostname on a few of my machines, but unfortunately this requires privilege.

@nostrebor that is very much unrelated to this open issue.
We are evaluating what options need to exist for services rather than copying them 1-to-1. Privileged mode probably will not be in 1.12 for services.

I'm trying docker builds which compile something for installation, but it must compile against libraries which exist on a CVMFS network file system. Of course, I can't mount CVMFS without running --privileged, so I can't do it at all using a Dockerfile.

@cpuguy83 @tlbtlbtlb This is a case of an 'installer' depending fundamentally on privileged action. But it's not what I would call a 'frivolous use', I just need access to shared libraries on a network file system. Still, installation in this case is NOT simply extracting files from some archive.

I don't understand how mounting a network file system is a portability problem. (All of our target environments will have access to this file system. They are required to because they build other code which ALSO must link to binary code on the network file system.)

Should I try a different approach? Should I mount CVMFS on the host and share that directory with the container or something? I don't want to have to set up an external build system just to make this image - the image upon which it's based already has an entire build system that will do the job. I just need to be able to mount CVMFS.

I was excited about doing this with a Dockerfile, but it seems I'll have to do it the hard way using some scripts with docker run --privileged, or use something else instead of docker. Is there a way of mounting a filesystem in a container _without_ having privileged access?

I did a workaround by placing the privileged commands inside a script and using the CMD instruction to run the script at container entry point. After building an image like that, I am able to just run the container in privileged mode and everything works.

@drstapletron, according to the CERN CVMFS documentation, you have two options for now: either mount CVMFS from the host into the container, or install CVMFS inside a privileged container.

For the second case, I have just written a Dockerfile for the CMSSW guys, here:
https://github.com/iahmad-khan/system-admin/blob/master/cvmfs-inside-docker.Dockerfile

Using this file, you can build an image (or maybe grab it from the CMSSW Docker Hub), run it in privileged mode, and everything will already be there inside the container (ls /cvmfs/*).
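The defer-to-runtime workaround described in the two comments above can be sketched like this (the file names and the cvmfs mount line are illustrative, not from the thread): the privileged commands go into a script that CMD runs, so the build itself needs no extra capabilities.

```shell
# setup-priv.sh -- everything that needs extra capabilities, run at start-up:
#   #!/bin/sh
#   set -e
#   mount -t cvmfs cms.cern.ch /cvmfs/cms.cern.ch   # needs elevated privileges
#   exec "$@"

# Dockerfile -- nothing privileged happens during build:
#   FROM centos:7
#   COPY setup-priv.sh /usr/local/bin/
#   CMD ["/usr/local/bin/setup-priv.sh", "bash"]

docker build -t cvmfs-image .
docker run -it --privileged cvmfs-image   # privileged only at run time
```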

Not sure if this was covered above as it is a rather long list of feedback on this issue. I too would like to have --privileged build commands. My current use case is to work around an issue I ran into with building the go ebuild on a gentoo stage3. If I don't use a container, the current instructions in the gentoo handbook cause systemd to behave erratically once I 'umount -l /mnt/gentoo/dev{/shm,pts,} && umount -l /mnt/gentoo/{proc,sys}' from a system booted with systemd... When I move my stage3 build into a docker container everything works fine up until a build requires ptrace, or some other restricted feature... which the go-1.7.1.ebuild seems to require.

For now I am simply running the build within a docker run command, committing and then continuing from there, but it would be preferable to enable ptrace within the docker build command itself to avoid the manual steps.

I'd like this feature as well. I want to create a build environment, but it requires a kernel module and modprobe won't cooperate with me when I'm building. Are there workarounds for this?

Specifically:

modprobe vcan && \
ip link add type vcan && \
ifconfig vcan0 up

Seems like a totally reasonable use-case.

@seltzy I suggest not holding your breath waiting for anyone from docker to acknowledge the reasonableness of your use case.

When it comes to architecture and feature-inclusion, they are very heavy-handed, pragmatic, aloof, and tend to ignore any and all use cases that don't fit into _their_ roadmap.

Us ordinary folks need to understand, the docker team makes architecture decisions that further their own (likely business customer and self-serving) needs, and those decisions rarely overlap with our (the public end-user) very laboriously crafted issues that we file here.

They are, of course, free to allocate their engineering resources in this way.
But it does provide a track record that would beg the imagination if the company was ever described as "taking user's needs seriously."

@tamsky you nailed it !

@tamsky I can understand why you would think that, since the project has not accepted a feature that you clearly desire.

I can assure you this has nothing to do with any sort of business decision. The fact is, --privileged on build will yield to non-portable images.
Things like modprobe in the build environment are not helpful and can even cause two builds to yield entirely different results.

I myself have implemented --privileged on build. It is not an engineering problem, and really it's quite trivial to implement. It's supporting it that is the problem.
For advanced users, a custom builder can be implemented, even reusing the existing code, using the existing API's that can include privileged support.

That said, this issue is still open for a reason. It is because people are listening.

Thanks for your consideration.

@cpuguy83 Thank you for the explanation. I didn't realize it was a portability problem. I suppose my desire for this is fueled by misunderstanding.

What's the general philosophical approach to take when faced with the temptation to have privileged build?

@seltzy don't be so sure that your use case isn't a reasonable example of the need for this feature

@cpuguy83 I'm still waiting for a reply about my use case. Our institution's build system is distributed over a network file system which I must mount in my container. This requires the container to run in privileged mode. My institution's decision to use a network file system for software distribution is not unusual for particle physics. You claim that --privileged on build creates non-portable images, but this is completely unrelated to my use case. Our development model has already given up any portability we _might_ lose due to our use of a network file system. We just need the development machine to be able to mount it.

@cpuguy83 PS you mentioned a custom builder. Where can I find information about this? Thanks!

This whole discussion about container portability is a giant red herring anyway. You can already accomplish the same thing by creating a stage 1 image, launching a container in privileged mode, doing whatever you have to do, then using docker commit to create the final image from that container.

Once they added the docker commit option any notion of image portability went out the door anyway, so forcing people to do this in 3 steps instead of 1 is not gaining anything and only serves to annoy people who could really use a privileged build option.
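For reference, the three-step build/run/commit workaround mentioned throughout this thread looks roughly like this (image names and the install script are placeholders):

```shell
# 1. Build everything that does not need extra privileges
docker build -t myimage:stage1 .

# 2. Run the privileged steps in a throwaway container
docker run --privileged --name stage2 myimage:stage1 /opt/privileged-install.sh

# 3. Commit the container's filesystem as the final image
docker commit --change 'CMD ["myapp"]' stage2 myimage:latest
docker rm stage2
```

Note that `docker commit` does not carry over Dockerfile metadata unless you re-specify it with `--change`, which is part of why this flow is more painful than a single `docker build`.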

@drstapletron Mounting a filesystem is not necessarily something that can break portability (unless someone was expecting that to be mounted in the image).
The issue here is having the capability to mount a filesystem also means being able to do a bunch of other nasty things.

@ctindel Yes, you can do anything you want to in the containers you create. The fact that docker build is "the" supported method of building images, though, means we need to ensure that images built with it _are_ portable.

Portable images are not a red herring. Workload portability is the primary design goal of Docker. Our prime directive, as it were.

@seltzy Most things which require extra privileges belong at runtime, because most of the time the elevated privileges are used to modify the host in some way.
That said I can certainly understand that some things are required at build time (such as the nfs mount above).... but even with the case of NFS and manually building the image (not with docker build), I would not give the container --privileged or any extra capabilities at all, instead I would mount the nfs export as a volume.

@drstapletron mount does not require --privileged only a more restricted capability set and is much more likely to happen sooner than a full privileged mode, as that gives full root access to the host which most people do not want. (It still has security issues but they are more manageable).

So I have a completely portable, non-host-modifying use case. It is even open source and you can see it here.

Basically, I want to run Mock in a portable Docker container to build a customized CentOS ISO image. Mock, for those that don't know, is a containerized RPM builder. The problem is, since it uses containers I need --privileged or --cap-add. Ideally, I think docker build would work like a function, taking a few arguments and returning its final result. Without these flags, though, I cannot do that.

Same here ! Using mock inside docker is nightmare because of that :(

Sending build context to Docker daemon 9.728 kB
Step 1 : FROM centos
 ---> 980e0e4c79ec
Step 2 : MAINTAINER Gregory Boddin
 ---> Using cache
 ---> 93e709c87f25
Step 3 : RUN yum install -y spectool mock
 ---> Using cache
 ---> 7006ef8d0276
Step 4 : RUN useradd mock -g mock
 ---> Using cache
 ---> bfb931c56d89
Step 5 : ADD *.cfg /etc/mock/
 ---> Using cache
 ---> 15521d2822b1
Step 6 : RUN su mock -c"/usr/bin/mock -r edge-5-x86_64 --init"
 ---> Running in 542a742b6017
INFO: mock.py version 1.2.17 starting (python version = 2.7.5)...
Start: init plugins
INFO: selinux disabled
Finish: init plugins
Start: run
ERROR: Namespace unshare failed.

@cpuguy83 wrote:

The fact is, --privileged on build will yield to non-portable images.

Why not allow --privileged for those not requiring far-reaching portability?
A simple note in the official documentation would be a reasonable compromise (e.g. _Warning: passing --privileged to the build command may result in a less portable image!_). This would solve nearly everyone's requirements; some users don't need portability, some do, and a warning would be enough to meet everyone's needs.

I'm certain the lack of build --privileged significantly complicates my current use case.

It could be called --non-portable. I haven't used the deployment parts of docker yet, but the isolation + overlay filesystem stuff has been really useful without that.

I need to use some proprietary software which requires a privileged container to install. There's nothing I can do about this. I'm stuck with needing to do the 3-stage build, run, commit process.

Container portability doesn't mean anything to me or my business, in fact I'll bet it doesn't mean anything to the vast majority of businesses. What is important is that I would like to maintain less software, so I think selecting portability over usability in this issue is detrimental to Docker.

+1 for this. In our build process we use setfacl; this fails during build and services fail to start in the container. I think that as end users we should not be restricted: use the --privileged option only if you need it, and keep it disabled by default.

+1 for this. In our build process it's required to mount the /proc and /dev. Ideally we should be able to have the mount step as part of the dockerfile.

@samsun387 Why does your build process require this?

@skshandilya setfacl is not portable and I would be surprised if acl's could be persisted to an image.

@robhaswell "requires a privileged container" doesn't help much. What is actually being used on install?

+1. mock init needs this.
Almost read the whole issue; don't get why people keep asking "why do you need this" for 3 years straight.

@Betriebsrat Because "X needs this" is not really all that helpful.
What is "X" doing? Why does "X" need this during the build phase?

For instance, the above case with capabilities to mount /proc and /dev really do not seem like the right place for the build phase, and even seem like the image would be tied to the host in such a case.

"privileged" is also a tank. It opens up absolutely everything, disables all security features, gives write access to typically read-only places... where someone probably just needs to be able to do a very specific thing.

These questions are asked so we can get the real use case and how we might satisfy such a case.

By the way, when I say "security features" I mean two things:

  1. Things to prevent hacking
  2. Isolation of application concerns from host concerns (ie, image build should not tie the image to the host it's built on).

Looks like mine was solved by 21051, I'm out, for now :)

@shykes said on Nov 28, 2013 @ https://github.com/docker/docker/pull/2839#issuecomment-29481246 ::

Sorry, the current design is to enforce '1 source, 1 build' which is why we don't allow any arguments to docker build other than the source directory.

I understand the fact that some builds currently can't take place because they need privileged operations. To handle that properly we might need to 1) allow a Dockerfile to express the need to be built in a privileged way, and 2) implement an authorization/trus system which allows docker to pause the build, properly warn the user of the risk, expose information on the origin and trustworthiness of the Dockerfile, and then collect the user's decision to allow or deny the build.

@cpuguy83, has the design changed at all from enforcing "1 source, 1 build"?
Is the Docker Project willing to alter that design and allow this community-requested feature?

Shykes's comment above appears to spell out what "we might need to" do to get this handled. At the same time, the language used ("might") appears to provide the docker project lots of room to come up with additional reasons to reject this design change.

Adding a NEEDS_PRIVILEGED declaration makes sense, but all this stuff about pausing the build? Just fail with an error and let the operator pass the --privileged option if they do indeed want to allow a privileged build.

@cpuguy83 the thing is, people who ever need privileged mode in build are usually power users that know perfectly well what the risks of using it are. And most of them accept that and work around this by simply using docker commit as part of their build for the step that needs it.

You are not in any way preventing people from using privileged mode in build, you are just making it annoying to do.

If your goal is making it annoying to do then just say so outright and close this issue instead of letting it go on for years and years.

Say "we won't fix this because we want it to be annoying to do" and close this issue.

/thread

@cpuguy83, from my understanding, Mock uses unshare(2) with the CLONE_NEWNS flag-- and possibly others-- when it creates its chroot/container environment. This requires at least CAP_SYS_ADMIN.

What is "X" doing? Why does "X" need this during the build phase?

In our use-case, we don't know. It's some proprietary piece of crap that we can't change. The thing is, our business doesn't care about "security" (in this context) or portability, or any of the concerns that have been listed. We just want to put this damn piece of crap in a container and move on to doing something valuable.

As @PonderingGrower says, we're going to do it anyway, it's just a matter of how much time we waste while doing it.

people who ever need privileged mode in build are usually power users that know perfectly well what the risks of using it are

I strongly disagree with that assumption. Overall, people that use --privileged are the same category of users that blindly run chmod -R 777 "because someone wrote that it fixed the issue"

In our use-case, we don't know. It's some proprietary piece of crap that we can't change. The thing is, our business doesn't care about "security" (in this context) or portability, or any of the concerns that have been listed.

"in this context" here, meaning: giving "some proprietary piece of crap" root access on your host.

@thaJeztah I have no problem with that. It's software we have purchased and pay support for. If we weren't using containers, it would still need root to install.

We need this feature for using dind while building to preconfigure some containers inside the one we are building.

What have you been discussing here for 3 years?

docker run has the options --cap-add, --cap-drop and others. The RUN command in a Dockerfile should have the same options, so that the Dockerfile can send requests to the parent machine asking it to add/drop some privileges.

The parent machine can do whatever it wants with these requests. You can make the shell interactive, you can show a GUI confirm dialog, etc. Why are you discussing the resolution of these requests in this issue?

A significant number of docker users want the ability to --cap-add or --privileged in the build command, to mimic what is there in the run command.

That's why this ticket has been open for 3 years with people constantly chiming in even though the maintainers aren't interested in giving the users what they want in this specific instance.

@ctindel This is precisely the problem with this issue. There is a gap between docker build --cap-add and RUN --cap-add.

Some people want to resolve privilege requests from the child machine with just docker build --cap-add=caps_array. What is that? It is just: caps_array.include? requested_cap.

Some people want pre_requested_caps.include? requested_cap. Some people want stdout << requested_cap, stdin.gets == 'y'. Some people want gui_confirm requested_cap. Some people will definitely want UAC_fullscreen_dialog requested_cap.

The resolution method for requested_cap depends on user taste, and I think that question will never be settled.

But RUN --cap-add has nothing to do with people's tastes. What are we waiting for?

@andrew-aladev I don't really understand what your post is saying. The point is that people have 3rd party software (RPMs, DEBs, whatever) that is not under their control, which they want to install into an image at "docker build" time, and which requires extra capabilities in order to install correctly. Since they are third party RPMs there is no way to resolve the requirement for increased privileges during the install phase.

They are working around the problem by running a container with those increased capabilities, installing their software, and then creating an image from that container. It's a pain and clearly shows that there is no functional reason to prohibit cap-add at build time since the same end can be achieved circuitously.

@ctindel My english is not very good, sorry.

I know. I've tried to emerge glibc and received "ptrace not permitted".

docker can run a container with increased/decreased capabilities by itself. The RUN command in a Dockerfile should support --cap-add, --cap-drop, etc.

Let's imagine that our Dockerfile has RUN --cap-add=SYS_PTRACE -- emerge -v1 glibc. How would it work?

  1. The child machine sends a request to the parent and asks for SYS_PTRACE.
  2. The parent allows the extended capabilities.
  3. The parent creates a new container with SYS_PTRACE allowed.

I see that nobody in this issue is actually arguing against that. People are just arguing about the method of allowing these capabilities.

@thaJeztah said

Overall, people that use --privileged are the same category of users that blindly run chmod -r 777

This man wants a more flexible method of validating required capabilities than just log :info, requested_cap; return privileged?.

@ctindel you said

Adding a a NEEDS_PRIVILEGED declaration makes sense but all this stuff about pausing the build? Just fail with an error and let the operator pass the --privileged option if they do indeed want to allow the privileged build.

You want to make the shell interactive. You want stdout << requested_cap, stdin.gets == 'y'. This is another method of validating required capabilities.

@cpuguy83 said

someone probably just needs to be able to do a very specific thing... things to prevent hacking.

This man wants docker build --cap-add=caps_array, i.e. caps_array.include? requested_cap. This is another method of validating required capabilities.

So I am asking: why does RUN in a Dockerfile still have no support for --cap-add, --cap-drop, etc.? Nobody argues against that. 3 years have passed!

@andrew-aladev I'm assuming no one has argued for that syntax because it's been made clear that dockerfile syntax is frozen until the builder is rewritten/refactored/decoupled from the main engine. https://github.com/docker/docker/issues/29719#issuecomment-269342554

More specifically, the title of the issue and OP are requesting --privileged build

This gets a Fonzie thumbs up.

Being able to run strace in the build step helps a lot. Currently, I work around this by moving all the things I need to debug to the run step - not ideal.

Does anyone know why it would work in the run step and not the build step, i.e. the historical reasons? Is there an alternative to strace that works without much permissioning or config?

There is a proposed solution/workaround to this in
https://github.com/docker/docker/issues/6800#issuecomment-50494871 :

if you have issues in docker build, you can use a "builder container":
docker run --cap-add [...] mybuilder | docker build -t myimage -

Could someone (possibly @tiborvass) elaborate on this? What kind of thing is mybuilder here? The image name with some ENTRYPOINT? Or is the image part of [...] and mybuilder refers to a shell script? And how do I convince docker run to pass the context.tar.gz into docker build -, if that is really what's happening here. Thanks in advance, Steffen

@sneumann mybuilder would be an image name and have some CMD or ENTRYPOINT indeed. The contract for that workaround to work is that mybuilder will have to tar the context from within the container and let it go to stdout. That is passed on to docker build's stdin, thanks to the shell pipe | and is considered to be the context because the context path for docker build -t myimage - is -.
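Spelled out as a sketch (image name and paths hypothetical): the builder image's entrypoint performs the privileged work, then tars up a directory containing a Dockerfile plus the produced artifacts to stdout; docker build reads that tar stream as its context because the context argument is -.

```shell
# Inside mybuilder, the entrypoint might be:
#   #!/bin/sh
#   set -e
#   /opt/privileged-steps.sh >&2   # logs to stderr; stdout is reserved for the tar
#   tar -C /out -cf - .            # /out holds a Dockerfile + build artifacts

docker run --rm --cap-add SYS_ADMIN mybuilder | docker build -t myimage -
```

Keeping stdout clean is the key constraint of this pattern: anything the privileged steps print to stdout would corrupt the tar stream that docker build consumes.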

A bit strange: upon looking at the code, it seems that this option is available in the build command.

Anyone more clued up have any idea why it's not being applied?

@mbana the --security-opt is for native Windows containers, which support "credentialspec" https://github.com/docker/docker/pull/23389

is it possible to modify this and make it persist so that future build will enable ptrace?

for anyone interested here are some good links:

I've seen a lot of claims from various folks that this feature isn't needed because builds can be changed to not require certain privileged operations, but no suggestions on what to do about the "docker in docker" case. If a build needs to run docker commands, for example to pull down some images that we want to ship inside this one, or to build a sub-image, how are we supposed to do this without some sort of privileged build option?

For now I'm going to work around this using docker run and docker commit but it would be great if docker build could support this use case.

@scjody Sounds like you want #31257

@cpuguy83 I'm not sure that covers what's going on in this case but I'll give it a shot once that's merged. Thanks!

Hi, i'd like to throw my name into the please-implement-this hat. Or maybe there's a different solution to my problem (docker noob here) that someone could point me to?

I'm trying to build an image based on the official centos/systemd image and provision it with Saltstack. This requires starting (and maybe restarting) the salt-minion daemon with systemd which can't be done (AFAIK) without privileged mode.

@onlyanegg I think in that situation, Saltstack largely replaces the functionality of the builder; keep in mind that each RUN statement is executed in a new container, at which point the previous build container is stopped, and committed to an image / layer.

Have you considered performing the build by running a container, and committing the results (docker commit)?

Thanks for responding, @thaJeztah. I didn't realize that's what the RUN directive did. I did read most of this issue, so I am aware of the docker build -> docker run -> docker commit workaround, which is what I'll likely end up doing. I'm just more in favor of having a single file describe my image - seems neater. Maybe I can put all those steps in packer post-processors and then I'll have that.

Why is this one ignored so much? In times of containers, kubernetes and minikube, and usage of docker in CI and development environment unification, this functionality is really crucial.

@onlyanegg you should be able to restart services _without_ privileged mode. If you have a Dockerfile illustrating that (i.e., "the RUN command at line 8 of this Dockerfile doesn't work because it requires privileged mode") I would be more than happy to take a look!

@derberg precisely! In times of containers, CI, CD, it is important that build tools can be contained (in the security sense). If you allow privileged mode, you have to change dramatically how you use CI tools like Jenkins, Travis, Codeship, etc. Same question: if you have a Dockerfile that requires privileged mode, I would be happy to take a look to suggest alternatives.

Thank you!

@jpetazzo try to get a docker image with docker inside it:

FROM ubuntu:16.04

# Get dependencies for fetching the docker install script
RUN apt-get update && apt-get install -y \
    curl \
    sudo \
    bash \
    && rm -rf /var/lib/apt/lists/*

RUN curl -sSL https://get.docker.com/ | sh

Now build it and start it. After starting, run service docker start to start the docker daemon. Then check the status of the service with service docker status:

  • with privileged flag status is ok and you can start container without issues
  • without the flag, it never starts

@jpetazzo ah, just noticed you are the creator of https://github.com/jpetazzo/dind :) so you are aware of the docker-in-docker concept :)

Anyway, you are aware that the privileged flag is needed for run. So now imagine a group of people who want a unified development environment with some things already preconfigured inside, for example minikube with preinstalled components or anything else.

So, is there a way to mount an NFS or SMB share in docker build yet ?

@derberg those steps won't work even if the build container were running --privileged; the docker packages (and install script) do things like installing kernel packages on Ubuntu 16.04.
That's exactly the reason --privileged is a bad idea for docker build, because it would be able to make changes on the _host_.

Even though docker will need privileged when _running_, the installation itself does not need this; for example, here's the steps you'd run to install docker in your image;

docker build -t foo -<<'EOF'
FROM ubuntu:16.04

RUN apt-get update && apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common \
    && rm -rf /var/lib/apt/lists/*

RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
RUN add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable"
RUN apt-get update && apt-get install -y docker-ce \
    && rm -rf /var/lib/apt/lists/*
EOF

And you can run it just fine (I'm using --privileged here, but perhaps more fine-grained permissions are possible):

docker run -it --rm --privileged -v /var/lib/docker foo dockerd --debug

Here is how I circumvent this issue, for people who really need to build a docker image in privileged mode. It may not solve all cases but can help in some.

This method needs a Dockerfile, a docker-compose file, and a shell script.

The Dockerfile

It is the same as you would normally write to build the image. The only difference: stop where you need to do privileged operations and don't include them; they will be run via docker-compose as a script. For example:

FROM ubuntu:16.04
RUN apt-get update && apt-get install <your packages>
# And more commands
......

## Below are the operations you intended to run in privileged mode when building the image, which does not work.
# More commands....
## But they now are moved to a separated shell script and it will be included in the image
COPY further-commands-to-run-in-privileged-mode.sh /

The commands that need to run in privileged mode are now in further-commands-to-run-in-privileged-mode.sh. It is included in the image and will later be run by docker-compose to finish the build process.

The docker compose file

The compose file is the key. It first builds the image from the above Dockerfile, then starts a container from that image in privileged mode, where you can do your privileged operations. For example:

version: '3'

services:
  your_service:
    container_name: your_container
    # First build the image from the Dockerfile
    build:
      # Change this to where you keep above Dockerfile
      context: ../docker-build
    image: "your_image_name:your_image_tag"

    # Then start a container from the just built image in privileged mode to finish what's left
    entrypoint: /further-commands-to-run-in-privileged-mode.sh
    privileged: true

Build the image

The steps below can also be saved in a shell script.

# First build the image and container(in privileged mode)
docker-compose -f docker-compose.yml up

# Then commit the temporary build container to a new image, change the ENTRYPOINT to what you want
docker commit \
    -c 'ENTRYPOINT ["/bin/bash"]' \
    <build container name> \
    <final image name>:<final image tag>

# Remove the temporary build container
docker rm <build container name>

@thaJeztah I don't have a problem with installation; I have a problem with starting the docker service during the build and pulling some images so they are available in the image out of the box.

Add the following script to your Dockerfile and you will see that the docker service never comes up:

#!/bin/bash

service docker start

sleep 20

service docker status

docker pull busybox

@derberg OK, I see! _Personally_, if I wanted to include images in a dind container, I would download them (for instance with reg) and load them the first time the container starts. Why? Because if you pull the images during the build, the image will work _only if dind is started with the same storage driver_. In other words, your image might or might not work on other machines.
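A minimal sketch of that approach, as an entrypoint wrapper that loads pre-downloaded tarballs on first start (the paths, the marker file, and the crude sleep-based wait are all illustrative assumptions, not a tested recipe):

```shell
#!/bin/sh
# entrypoint.sh (sketch): load pre-downloaded image tarballs on first start.
# Assumes /images/*.tar were baked into the image at build time
# (e.g. fetched with a tool like reg) -- paths are made up for illustration.

if [ ! -f /var/lib/docker/.images-loaded ]; then
    dockerd &                 # start a temporary daemon in the background
    pid=$!
    sleep 5                   # crude wait; a real script should poll the socket
    for tarball in /images/*.tar; do
        docker load -i "$tarball"   # imported with whatever storage driver is in use
    done
    touch /var/lib/docker/.images-loaded
    kill "$pid" && wait "$pid"
fi

exec dockerd "$@"             # hand over to the real daemon
```

Because the load happens at run time, the images land in whichever storage driver the host configuration selects, avoiding the driver mismatch described above.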

Also–if your images are big (i.e. anything else than, say, busybox or alpine) you will end up with a really big DinD image...

I'd love to know more about your final use-case – because I'm sure we can help you to find a more efficient way than baking a huge DinD image :-)

(Otherwise, the solution proposed by @kraml is rather elegant!)

@jpetazzo we already use such a build-run-commit workaround, but yeah, it is still a workaround from my point of view. The use case is pretty specific, related to a kubernetes/minikube environment, and there is nothing we can do for now. So far we were able to start minikube in docker only with docker as the daemon; using virtualbox or other VM drivers didn't work, so we depend on the dind approach.

Ran into this issue trying to build an image containing a legacy application (a pretty normal use-case), where the installer tried to run sysctl commands and failed.

Coming back to this thread and reviewing the entire 4 years (!!!) of back and forth on how to add some kind of privileged capability to the docker build command, it appears that the available options are either a nasty bunch of sed commands to strip the sysctl calls out of the installer, or a multi-stage build -> run -> commit pipeline. I agree with @derberg that 'build -> run -> commit' feels like a workaround (imo a gross/hacky workaround), and I don't think my use-case is that unique. Checking other threads, I've seen plenty of people reporting failed docker build commands for various application and database installations due to lack of privileges.

At this point, the docker run command supports extensive 'privileged' options, along with "fine grain control over the capabilities using --cap-add and --cap-drop". And so, I think objections on a security or technical basis are moot. If the privileged run option is added along with '--cap-add' and '--cap-drop', a security-conscious engineer could choose to limit a privileged build to only include the specific capabilities required for their build.
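At run time that fine-grained control already exists today; a security-conscious pipeline could mirror it in a privileged build, e.g. (the image and script names here are placeholders, and SYS_PTRACE is just an example capability):

```shell
# Drop all capabilities, then add back only what the installer needs
docker run --rm \
    --cap-drop=ALL \
    --cap-add=SYS_PTRACE \
    myimage:partial /run-installer.sh
```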

Hi,

I have already reported this before, same problem.

What about those who just want to run one container per VM, with the same user id on the VM and in the container, using Docker just as a packaging tool?

Is there still a security concern relating to this?

Ran into this problem. Could really use capabilities for build.

Ran into this issue too.
It's very useful when using Docker containers as CI/CD slaves, where installing the build/test toolchains can require privileged permissions during docker build of the slave image.
I'm waiting for this feature for years and really hope it could be supported in future.

I really don't understand why there is so much pushback from devs regarding --privileged for docker image.
If the users want to shoot themselves in the foot, why not let them? Just put a warning message and that's it. There are already workarounds for achieving same thing, why not make it easier for the users that really need it??
It's been 4-5 years and has been no progress on this.
Just amazing...

As of today, not even this feature has been implemented yet:
RUN --cap-add=SYS_PTRACE
which would fit the needs of many users..

Could you please suggest how I can build this Dockerfile on Gentoo Linux host:

FROM gentoo/stage3-amd64
# Download and extract latest portage
RUN wget http://distfiles.gentoo.org/snapshots/portage-latest.tar.bz2 && \
    wget http://distfiles.gentoo.org/snapshots/portage-latest.tar.bz2.md5sum && \
    md5sum -c portage-latest.tar.bz2.md5sum 
RUN tar -xjvf portage-latest.tar.bz2 -C /usr
RUN emerge dev-lang/go

because I am getting this error when emerge-ing dev-lang/go:

##### Building Go bootstrap tool.
cmd/dist
 * /var/tmp/portage/sys-apps/sandbox-2.12/work/sandbox-2.12/libsandbox/trace.c:_do_ptrace():75: failure (Operation not permitted):
 * ISE:_do_ptrace: ptrace(PTRACE_TRACEME, ..., 0x0000000000000000, 0x0000000000000000): Operation not permitted
/usr/lib64/libsandbox.so(+0xb692)[0x7fd10e265692]
/usr/lib64/libsandbox.so(+0xb778)[0x7fd10e265778]
/usr/lib64/libsandbox.so(+0x6259)[0x7fd10e260259]
/usr/lib64/libsandbox.so(+0x6478)[0x7fd10e260478]
/usr/lib64/libsandbox.so(+0x7611)[0x7fd10e261611]
/usr/lib64/libsandbox.so(execve+0x3f)[0x7fd10e2634ff]
bash[0x41d8ff]
bash[0x41f387]
bash[0x420138]
bash[0x4219ce]
/proc/330/cmdline: bash ./make.bash 

 * ERROR: dev-lang/go-1.9.2::gentoo failed (compile phase):
 *   build failed
 * 
 * Call stack:
 *     ebuild.sh, line 124:  Called src_compile
 *   environment, line 1034:  Called die
 * The specific snippet of code:
 *       ./make.bash || die "build failed"
 * 
 * If you need support, post the output of `emerge --info '=dev-lang/go-1.9.2::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=dev-lang/go-1.9.2::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/dev-lang/go-1.9.2/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/dev-lang/go-1.9.2/temp/environment'.
 * Working directory: '/var/tmp/portage/dev-lang/go-1.9.2/work/go/src'
 * S: '/var/tmp/portage/dev-lang/go-1.9.2/work/go'

How can I run it without --cap-add=SYS_ADMIN --device /dev/fuse or --privileged?

RUN apt-get -y install unionfs-fuse
RUN unionfs-fuse -o cow dir1=RW:dir2=RO dir3/

I can do it with a separate bash file in entrypoint, but I need single Dockerfile

@amd-nick what's your expectation of the RUN unionfs-fuse ... line during build? Even if that worked, it would only have the filesystem mounted during that single RUN, and be gone in the next step.
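A common pattern is to defer the mount to container start instead; a sketch of such an entrypoint (reusing the directory names from the comment above, and assuming the fuse binary is installed during build):

```shell
#!/bin/sh
# entrypoint.sh (sketch): perform the fuse mount at run time,
# when the container can actually be granted the capability,
# then hand off to whatever command was requested.
unionfs-fuse -o cow dir1=RW:dir2=RO dir3/
exec "$@"
```

The container would then be started with something like docker run --cap-add=SYS_ADMIN --device /dev/fuse …, keeping the Dockerfile itself unprivileged.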

@thaJeztah it's hard to explain for me. I am trying to modify this repo. Can I just skip this line on the building?

Hi

Randomly, docker build chooses a hostname starting with zero ('0'), which breaks our application. I tried to run "hostname" inside my Dockerfile in that case but faced the same issue.

I also would like to have an option to run the docker build with RUNP or get an option to choose the hostname during build.

Has anybody tried building these kinds of images with Kaniko? I just did it with @maneamarius 's Dockerfile on Docker for Mac and it seems to build successfully once you call Kaniko's docker run "build" command with --cap-add=SYS_PTRACE. Though, I'm having a bit of trouble loading the resulting tarball locally, the RAM usage is a bit high since it can't use overlayfs, and layer caching is still WIP. Things might Just Work if I push to a registry but I haven't tried that yet.

docker run --cap-add=SYS_PTRACE --rm -v $(pwd):/workspace gcr.io/kaniko-project/executor:latest --dockerfile=Dockerfile --context=/workspace --tarPath=/workspace/test.tar --destination=test  --single-snapshot

Having this feature would greatly help efforts to build Docker images via Puppet on Redhat/CentOS base images.

Since I last posted, I've followed back up with the changes in Kaniko. They are no longer tarballing in memory but onto disk, which means support for Dockerfiles describing big images. Layer caching is still a WIP, but for the moment they have an option to cache the base images (that means no fast save-and-rerun RUN iteration yet, but we can cache alpine, ubuntu, and whatever popular base images are out there).

It's at a state where I've been successful in building @maneamarius's Dockerfile that emerges Golang in a Gentoo image in this project/demo without modifying @maneamarius 's Dockerfile or chopping it up in any way (EDIT: I've since had to modify the Dockerfile to pin the gentoo base image to the version that was latest at the time of this post. Otherwise, it's still unmodified.) :

https://github.com/nelsonjchen/kaniko-privileged-maneamarius-moby-1916

I've also configured Azure Pipelines to build the Dockerfile into an image with Kaniko with --cap-add=SYS_PTRACE, load Kaniko's output tarball, and run go version in the generated image. I figured some interactive "proof of life" would be interesting. Some of the earlier comments in here also were concerned about CI systems so I figured I'll configure a public CI system to work as well. BTW, Travis CI was considered but the build output was too long and it got terminated and Azure is perfectly happy with 166k lines of output. If the Dockerfile built with about 70k less lines of output, it probably would have succeeded on Travis CI. A link to the Azure Pipeline build outputs is at the top of the README.

Use buildah Luke

I'm closing this issue, because the feature is now available as docker buildx build --allow security.insecure

https://github.com/docker/buildx/blob/master/README.md#--allowentitlement
https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md#run---securityinsecuresandbox
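A minimal Dockerfile exercising the new entitlement (adapted from the BuildKit docs linked above; note the syntax directive, which is required for the experimental frontend):

```dockerfile
# syntax = docker/dockerfile:experimental
FROM ubuntu
# This step runs without the default security sandbox
RUN --security=insecure cat /proc/self/status | grep CapEff
```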

@AkihiroSuda I have updated my docker to version 19.03 to try buildx. When I tried the command you mentioned, it gave me an error:

$ docker buildx build --allow security.insecure -t sample-petclinic -f Dockerfile .
[+] Building 0.0s (0/0)                                                                                                                                                         
failed to solve: rpc error: code = Unknown desc = entitlement security.insecure is not allowed

Docker version:

Client: Docker Engine - Enterprise
 Version:           19.03.2
 API version:       1.40
 Go version:        go1.12.8
 Git commit:        c92ab06
 Built:             Tue Sep  3 15:57:09 2019
 OS/Arch:           linux/amd64
 Experimental:      true

Server: Docker Engine - Enterprise
 Engine:
  Version:          19.03.2
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.8
  Git commit:       c92ab06
  Built:            Tue Sep  3 15:55:37 2019
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

buildx docs: For entitlements to be enabled, the buildkitd daemon also needs to allow them with --allow-insecure-entitlement
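In practice that means creating a builder whose buildkitd allows the entitlement before running the build, roughly like this:

```shell
# Create (and switch to) a buildx builder that permits the insecure entitlement
docker buildx create --use \
    --buildkitd-flags '--allow-insecure-entitlement security.insecure'

# Now the flag on the build command is accepted
docker buildx build --allow security.insecure -t sample-petclinic .
```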

Thanks @AkihiroSuda . It worked now.

just to add another use case.
I am trying to fix a dockerfile build of an ibmdb2 container with a test database
IBM removed the v10 image from the hub. But the v11 DB image only starts with --privileged.
So all the code setting up the database in the Dockerfile is non-functional now, because DB2 doesn't start without privileged. :(
There seems to be a complicated workaround using docker run and docker commit.
In a productive build pipeline this creates a lot of extra complexity.

I have to ask like https://github.com/maneamarius in https://github.com/moby/moby/issues/1916#issuecomment-361173550

Why is it such a big deal to support this? The build does execute a run under the hood.

In this specific use case a privileged build option would support a kind of "backward compatibility" and I know I am not the only one who had this issue after my web research.

@uvwild I'm not sure if this helps your use case, but you can give kaniko a try. Your image will be built without a Docker daemon, and you can extract the image once it's done. Running kaniko is just like running a container, so you can use --privileged or --cap-add <capability which is needed>, which might solve your problem.

I accept it's not the complete solution you were expecting, but it's an easier workaround which may fit in your build pipeline.

EDIT: As @alexey-vostrikov said buildah could be a more feasible solution for use cases which need --privileged to build image
