Moby: Add ability to mount volume as user other than root

Created on 17 Oct 2013  ·  157 Comments  ·  Source: moby/moby

Use case: mount a volume from the host into a container for use by Apache as the www user.
The problem is that currently all mounts are mounted as root inside the container.
For example, this command
docker run -v /tmp:/var/www ubuntu stat -c "%U %G" /var/www
will print "root root"

I need to mount it as user www inside the container.

area/api area/kernel area/volumes exp/expert kind/enhancement

Most helpful comment

Can I say no - forcing users to add a helper script that does

#!/bin/sh
chown -R redis:redis /var/lib/redis
exec sudo -u redis /usr/bin/redis-server

(thanks @bfirsh for your eg)

is pretty terrible.

It means that the container has to be started as root, rather than running as the intended redis user. (as @aldanor alluded to )

and it means a user can't do something like:

docker run -v /home/user/.app_cfg/ -u user application_container application :(

All 157 comments

If you chown the volume (on the host side) before bind-mounting it, it will work.
In that case, you could do:

mkdir /tmp/www
chown 101:101 /tmp/www
docker run -v /tmp/www:/var/www ubuntu stat -c "%U %G" /var/www

(Assuming that 101:101 is the UID:GID of the www-data user in your container.)

Another possibility is to do the bind-mount, then chown inside the container.
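For example, reusing the paths from above (a sketch; note that chowning a bind mount changes the host directory too):

docker run -v /tmp/www:/var/www ubuntu sh -c 'chown -R www-data:www-data /var/www && stat -c "%U %G" /var/www'

This prints "www-data www-data", and /tmp/www on the host is now owned by uid/gid 33 (ubuntu's www-data).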

@mingfang Will chown not work for you ?

It would be useful to have a shortcut for this. I often find myself writing run scripts that just set the permissions on a volume:

https://github.com/orchardup/docker-redis/blob/07b65befbd69d9118e6c089e8616d48fe76232fd/run

What if you don't have the rights to chown it?

Would a helper script that chowns the volume solve this problem? This script can be the ENTRYPOINT of your Dockerfile.

Can I say no - forcing users to add a helper script that does

#!/bin/sh
chown -R redis:redis /var/lib/redis
exec sudo -u redis /usr/bin/redis-server

(thanks @bfirsh for your eg)

is pretty terrible.

It means that the container has to be started as root, rather than running as the intended redis user. (as @aldanor alluded to )

and it means a user can't do something like:

docker run -v /home/user/.app_cfg/ -u user application_container application :(

There is _one_ way to make it work, but you need to prepare ahead of time inside your Dockerfile.

RUN mkdir -p /var/lib/redis ; chown -R redis:redis /var/lib/redis
VOLUME ["/var/lib/redis"]
ENTRYPOINT ["usr/bin/redis-server"]
USER redis

(I didn't test this example, I'm working on a chromium container that then displays on a _separate_ X11 container that .... )

And of course that method only works for direct new volumes, not bind
mounted or volumes-from volumes. ;)

Additionally, multiple containers using volumes-from will have different uid/gid for the same user, which complicates stuff as well.

@SvenDowideit @tianon that method doesn't work either. Full example:

FROM ubuntu
RUN groupadd -r redis    -g 433 && \
useradd -u 431 -r -g redis -d /app -s /sbin/nologin -c "Docker image user" redis 
RUN mkdir -p /var/lib/redis
RUN echo "thing" > /var/lib/redis/thing.txt
RUN chown -R redis:redis /var/lib/redis
VOLUME ["/var/lib/redis"]
USER redis
CMD /bin/ls -lah /var/lib/redis

Two runs, with and without a -v volume:

bash-3.2$ docker run -v `pwd`:/var/lib/redis voltest 
total 8.0K
drwxr-xr-x  1 root root  102 Aug  7 21:30 .
drwxr-xr-x 28 root root 4.0K Aug  7 21:26 ..
-rw-r--r--  1 root root  312 Aug  7 21:30 Dockerfile
bash-3.2$ docker run  voltest 
total 12K
drwxr-xr-x  2 redis redis 4.0K Aug  7 21:30 .
drwxr-xr-x 28 root  root  4.0K Aug  7 21:26 ..
-rw-r--r--  1 redis redis    6 Aug  7 21:26 thing.txt
bash-3.2$ 

We're hitting an issue that would be solved by this (I think). We have an NFS share for our developer's home directories. Developers want to mount /home/dev/git/project in to Docker but cannot because we have Root Squash enabled.

This forbids root from accessing /home/dev/git/project so when I try and run docker mounting /home/dev/git/project I get an lstat permission denied error.

@frankamp This is because docker's current preference is to not modify host things which are not within Docker's own control.

Your "VOLUME" definition is being overwritten by your -vpwd`:/var/lib/reds`.
But in your 2nd run, it is using a docker controlled volume, which is created in /var/lib/docker. When the container starts, docker is copying the data from the image into the volume, then chowning the volume with the uid:gid of the dir the volume was specified for.

I'm not sure there is much that can be done here, and unfortunately bind mounts do not support (as far as I can tell) mounting as a different uid/gid.

My solution to this was to do what SvenDowideit did above (create new user and chown up front in dockerfile), but then instead of mounting the host volume, use a data-only container, and copy the host volume I wanted to mount into the container with tar cf - . | docker run -i --volumes-from app_data app tar xvf - -C /data. This will become a tad easier once https://github.com/docker/docker/pull/13171 is merged (and docker cp works both ways), but perhaps it could become an alternative to -v host_dir:container_dir, ie. maybe -vc host_dir:container_dir, (vc for volume-copy), wherein the host_dir's contents would get copied into the data container. Though I can't say I understand why/how the copied files inherit the container user's permissions, from what I can tell they do, and this is the only reasonable solution I've managed to come up with that doesn't destroy portability.

What about acl?

Is there any fix or workaround? I ran into the same issue with OpenShift: the mounted folder is owned by root:root and pre-created images won't work.

I'm looking for a workaround too. If all mounted volumes are owned by root, it makes it impossible to run your Docker containers with any user other than root.

Well, you can try s6-overlay. It includes features which are specifically targeted at helping to work around these kinds of problems.

@dreamcat4: Thanks for the pointer. Fixing ownership & permissions seems like an interesting workaround, but wouldn't I have to run my Docker container as root for that to work?

@brikis98 Yes that is true. However s6-overlay also has yet another feature, which allows you to drop the permissions back again when launching your servers / daemons.

@dreamcat4 Ah, gotcha, thanks.

I have the same uid/gid inside and outside of a container and this is what I get:

nonroot$ ls -l .dotfiles/
ls: cannot access .dotfiles/byobu: Permission denied
ls: cannot access .dotfiles/config: Permission denied
ls: cannot access .dotfiles/docker: Permission denied
ls: cannot access .dotfiles/vim: Permission denied
ls: cannot access .dotfiles/bashrc: Permission denied
ls: cannot access .dotfiles/muse.yml: Permission denied
ls: cannot access .dotfiles/my.cnf: Permission denied
ls: cannot access .dotfiles/profile: Permission denied
total 0
-????????? ? ? ? ?            ? bashrc
d????????? ? ? ? ?            ? byobu
d????????? ? ? ? ?            ? config
d????????? ? ? ? ?            ? docker
-????????? ? ? ? ?            ? muse.yml
-????????? ? ? ? ?            ? my.cnf
-????????? ? ? ? ?            ? profile
d????????? ? ? ? ?            ? vim
nonroot$ ls -l .ssh
ls: cannot access .ssh/authorized_keys: Permission denied
total 0
-????????? ? ? ? ?            ? authorized_keys
nonroot$

@darkermatter could you please open a separate issue?

not a problem, but is this not relevant here?

@darkermatter this is a feature request, not a bug report; mixing your case with other cases makes it difficult to follow the discussion, and your issue may not be directly related

@thaJeztah well, as @frankamp and others have done, I was simply demonstrating what happens after running chmod, etc. inside the Dockerfile. I will file it as a bug report, but it is relevant to this discussion.

similar to what @ebuchman proposed, without copying a host volume, you could create a data-only container first that does a
chown 1000:1000 /volume-mount as root when it starts.
E.g. in docker compose v2 syntax

version: '2'
services:
  my-beautiful-service:
    ...
    depends_on:
      - data-container
    volumes_from:
      - data-container

  data-container:
    image: same_base_OS_as_my-beautiful-service
    volumes:
      - /volume-mount
    command: "chown 1000:1000 /volume-mount"

This way your container can run as a non-root user; the data-only container only runs once.
This assumes you know the uid and gid that my-beautiful-service uses beforehand. It is usually 1000:1000.

Being that you can (in 1.11) specify mount options for a volume to use in your docker volume create, I'd say this seems pretty close to being ready to close.

You can't just specify uid/gid directly because this is not supported with bind mounts, but many filesystems that you can use with the new mount opts can work with uid/gid opts.
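For illustration, a sketch with the local driver's mount options (tmpfs accepts uid/gid mount options, though being memory-backed it does not persist data, so this mainly demonstrates the mechanism):

docker volume create --name www --opt type=tmpfs --opt device=tmpfs --opt o=uid=101,gid=101
docker run -v www:/var/www ubuntu stat -c "%u %g" /var/www

This prints "101 101".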

I think the issue still stands in cases where you want to mount a CIFS drive inside your container; however, maybe that should be another ticket?

@michaeljs1990 You can do this, just not per-container (unless you create separate volumes for each uid/gid combo you want).

@cpuguy83, could you please clarify how one must use docker volume create to avoid this issue?

I just ran into this issue today with docker 1.11 and had to do some painful rejiggering to convince the docker image to let me write to files on a mounted drive. It would be really nice if I never need to do that again let alone try to explain it to someone else.

Not sure if this is what you are asking but...

FROM busybox
RUN mkdir /hello && echo hello > /hello/world && chown -R 1000:1000 /hello

Build above image named as "test"

$ docker volume create --name hello
$ docker run -v hello:/hello test ls -lh /hello

Both /hello and /hello/world in the above example would be owned by 1000:1000

I see. So, I did something similar but a little different, which may make it worth sharing. Basically, I added a user to the Dockerfile that shares my UID, GID, username, and group with the user outside the container. All <...> placeholders are replaced by the relevant values.

FROM <some_image>
RUN groupadd -g <my_gid> <my_group> && \
    useradd -u <my_uid> -g <my_gid> <my_user>

After this, one can either switch using USER or using su at some later point (e.g. in an entrypoint script or when using a shell). This let me write to the mounted volume, as I was the same user that created it. One could additionally use chown inside the container to make sure one has permissions on the relevant things. Also, installing sudo is generally a smart move when doing this too.

While it solves the problem, I don't know that I love it as this would need to be done for any user. Also, I hard-coded stuff (yuck!), but maybe templates could be used to make this a bit smoother. I wonder if this shim could be absorbed into docker run somehow. If there is a better way to do this already, I'd be very interested to know what it is.

There is an option to map host users uids/gids with container users uids/gids with --userns-remap. Personally I haven't tried it. See a good discussion on this topic http://stackoverflow.com/questions/35291520/docker-and-userns-remap-how-to-manage-volume-permissions-to-share-data-betwee .
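For reference, a sketch of enabling it on the daemon ("default" makes the daemon create a dockremap user; the subordinate id ranges come from /etc/subuid and /etc/subgid):

# /etc/docker/daemon.json
{
  "userns-remap": "default"
}

# then restart the daemon, e.g.:
sudo systemctl restart docker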

@cpuguy83:

You can't just specify uid/gid directly because this is not supported with bind mounts, but many filesystems that you can use with the new mount opts can work with uid/gid opts.

What filesystems are you thinking of that can accept uid/gid arguments? I know FAT can, but that feels just as hacky as anything else being proposed in this thread.

IMO, Docker has two options:

  1. Official support for mounting volumes as a specified user/group (using the user/group name defined inside the container, not requiring the host to have this knowledge of the container's internals).
  2. Or... get rid of the USER directive (and associated runtime flags).

Being able to run as a non-root user while only being able to mount volumes owned by root is a misfeature. The sharing of uid/gid between host and container is another misfeature.

@mehaase volumes take the ownership of whatever is already at the path in the container. If the location in the container is owned by root, then the volume will get root. If the location in the container is owned by something else, the volume will get that.

Some sort of workaround for this would be great. Unless the container specifically expects it, it makes it _very_ hard to add volumes to standard containers like elasticsearch, redis, couchDB, and many others without writing a custom Dockerfile that sets the permissions. This mostly makes the docker run -v command or volume: directive in docker-compose useless.

@chrisfosterelli why useless? I do not think it is out of the ordinary to set ownerships of files/dirs you expect to use.

@cpuguy83 Because it does not appear to be possible to set the ownership without using a custom Dockerfile that sets permissions and volumes, which is why I think they are not useful for defining volumes. I'm not binding containers to my host filesystem, if that's relevant.

@chrisfosterelli But all these standard Dockerfiles should have the permissions already set.

I think what @chrisfosterelli is trying to say, @cpuguy83 (and please correct me if I am wrong, @chrisfosterelli), is that it has become clear that these variables (UID, GID, etc.) are dynamic and need to be set at run-time (particularly w.r.t. files owned internally and from mounted volumes), but we lack a way to do that currently. The response thus far seems to be that they shouldn't be run-time determined, but that ignores the fundamental usability problem presented by such a suggestion. Again, if I am misunderstanding any of this, please feel free to correct me.

@jakirkham I must not be understanding what the usability problem is.
The files are in the image, they should have the ownership and permissions required for the application to run. It has nothing to do with the volume itself. The volume just takes on what was set in the image.

@cpuguy83 I did a bit more digging and isolated it to this: Say I have an elasticsearch container that will create a directory /data when starting up (if no data is present), then use docker run -v /data elasticsearch. The directory /data becomes owned by root:root and the daemon that runs as elasticsearch inside the container will now fail to start because it cannot write to /data.

It'd be ideal if I could set this volume to be owned by elasticsearch without needing a custom Dockerfile... although I guess you could argue this sort of issue should be resolved in the upstream image.

@chrisfosterelli there is some talk on the kernel mailing lists of having an overlay-like driver that can change ownership, but there is not much we can do without something like that. I am curious: can you just make all the files in your volume world read-and-write, and set umasks appropriately so new files are too? (I haven't tried yet.)

@justincormack I believe so, but I think that doesn't work when I'm expecting the container to create the data in the volume (rather than the host). I understand this is kind of a weird issue, so I am currently addressing it by fixing it in the upstream Dockerfile itself to mkdir -p && chmod the directory.

@chrisfosterelli that's why I said set the umask: if your umask is 000 (in the container), all new files will be created with 666 or 777 permissions, and if the mount point is 777 to start with, that should be ok? If the permissions are always world read and write, uid and gid should not matter?
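A quick way to see that behaviour (a sketch using a named volume and the busybox image):

docker run --rm -v demo:/data busybox sh -c 'umask 000 && touch /data/file && ls -l /data/file'
# prints something like: -rw-rw-rw-    1 root     root            0 ... /data/file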

@justincormack Yes that sounds correct... how can I do that while creating a docker container with a non-host-mounted volume?

@chrisfosterelli hmm, that's a good question. It looks to me like the permissions on a new volume are what the default umask would give, so you could try running the docker daemon with a 000 umask and see if the volume is then world-writeable. Maybe we should have some permissions options on docker volume create.

(You could fix it up with a root container that did chmod and exited too, but that's ugly.)

On create is no good. The issue is if the container doesn't have the path, the path gets created with root. This could arguably be done as whatever the passed in user is.

@cpuguy83 I think it would make more sense to create it as the user passed in with -u since that would probably be the user trying to write the volume anyway from inside the container right?

I was able to mount as the user of my choice using the steps below (a rough sketch follows the list):

  • Create the user inside Docker with the same UID/GID as the user owning the files on the host.
  • Create the mount points in advance and chown them to the target user in the container
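Roughly, in Dockerfile terms (a minimal sketch; it assumes the host files are owned by uid/gid 1000:1000, which you can check with id on the host):

FROM ubuntu
# match the host user's uid/gid (assumption: 1000:1000)
RUN groupadd -g 1000 appgroup && useradd -u 1000 -g appgroup appuser
# pre-create the mount point with the right owner
RUN mkdir -p /data && chown appuser:appgroup /data
USER appuser

Then run with something like docker run -v /home/me/data:/data myimage (paths hypothetical).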

Quote @chrisfosterelli:

I did a bit more digging and isolated it to this: Say I have an elasticsearch container that will create a directory /data when starting up (if no data is present), then use docker run -v /data elasticsearch. The directory /data becomes owned by root:root and the daemon that runs as elasticsearch inside the container will now fail to start because it cannot write to /data.

This is a great example! I have a similar example with the Solr image. Solr needs one or more "cores", each of which is a collection of related configuration files and index fragments. Each core is placed inside a directory with a user-specified name. For example if I want to create a core named products, the path would be /opt/solr/server/solr/products. The name of the core is chosen by me, so the Solr image maintainer cannot pre-create this directory in the image.

I want my index data to be saved so that I can upgrade my image to a newer Solr without needing to re-index all of my documents, but if I mount a volume to /opt/solr/server/solr/products, then it is owned by root and Solr (running as solr) cannot actually write anything to it. The parent directory /opt/solr/server/solr contains other files, so I cannot mount a volume there either. (In the latest Docker parlance, I believe my volumes are called "named volumes", i.e. volumes that are not mounted to a specified path on the host but are totally managed by Docker for me.)

I've talked about this with the Solr image maintainer and there are some workarounds (and he has made some changes to the image to help) but it's all pretty hacky and requires case-by-case changes to the upstream image. Having the feature discussed in this thread would make _all images_ more extensible without needing to create a new Dockerfile.

@ctindel maybe... if the directory doesn't already exist.

@cpuguy83 That's true, I agree. That was definitely my use case. It doesn't seem to make sense to create the directory as root if it doesn't exist when a user id has been explicitly specified for running the container.

@cpuguy83 it just works for named volumes.

@kamechen What just works?

@cpuguy83 When you use a named volume the files are mounted under the user that you need

@eciuca Well.... it depends. If the named volume was empty, or the data in the named volume was created by the same user you happened to need.

Was there ever a solution to the issue raised by @andrewmichaelsmith?

We're hitting an issue that would be solved by this (I think). We have an NFS share for our developer's home directories. Developers want to mount /home/dev/git/project in to Docker but cannot because we have Root Squash enabled.

This forbids root from accessing /home/dev/git/project so when I try and run docker mounting /home/dev/git/project I get an lstat permission denied error.

I think it is possible to work around this using bindfs.
Using docker's -v ... to mount the volume in a temporary location and then bindfs to mount it where needed as another user.

@piccaso, the way I understood @andrewmichaelsmith is that the issue is that the bind-mount on the host side fails because of rootsquash. But bindfs could still be used as a workaround actually, but this time on the host side. First, on the host, you bind-mount the nfs share into a temporary location as a non-root user using FUSE, and then you mount that temporary folder in docker with -v ....

Mind you that bindfs (at least with FUSE) has quite a bit of CPU overhead.
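For illustration, a host-side sketch of that workaround (paths are hypothetical; -o allow_other is needed so the docker daemon, which runs as root, can see the FUSE mount, and it requires user_allow_other in /etc/fuse.conf):

# run as the non-root user that the root-squashed NFS export trusts
mkdir -p /tmp/project-bind
bindfs -o allow_other --force-user="$(id -u)" --force-group="$(id -g)" /mnt/nfs/project /tmp/project-bind
docker run -v /tmp/project-bind:/project myimage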

Yes, bindfs is very much undesirable. It's slower than even the CoW filesystems.
There is some work being done in the kernel to allow uid/gid shifting on mount that may help.

There is some work being done in the kernel to allow uid/gid shifting on mount that may help.

This will probably also only help to address the use-case where I want to re-map uid/gid inside the container. The mount itself, as executed by the docker daemon, would still be executed as root on the host. My understanding is that with the kernel, bind mounts can only be created as root. I am not sure if there is work to change that to allow mounts to be performed by non-root users (I know too little about the way Linux handles mounts to judge if that even makes sense).

@NikolausDemmel I doubt that will change. The mount syscall requires CAP_SYS_ADMIN, which is not something given to a non-root user, or even something we give to the root user of a container.

@cpuguy83 Thanks for the clarification. That means mounting docker volumes to host folders that are NFS mounts with root-squash will not work in the foreseeable future (due to limitations of the mount syscall, as you say), save for using workarounds like FUSE with bindfs.

Sorry, this was slightly OT, since the OP was asking about changing UID/GID inside the container. But it is kind of the other side of the coin, and it had come up in the discussion above. Just wanted to clarify that distinction.

I am running Docker for Mac, and have mounted a volume but I can't seem to get the permissions set for the web service to access the files. How would I fix this? I tried changing perms and setting the group to staff, but it seems Alpine doesn't have a staff group.

Sorry if this isn't the best place to put it; I've been struggling for days and couldn't think of a better place.

@NikolausDemmel: We are trying to use Docker for some bioinformatics work. We have multiple huge file systems mounted root-squashed via NFS. We read in huge sequence data (fastq), and then write out a somewhat smaller BAM file with the genomic reads aligned to the datastore. Currently, we can use docker by building a custom image that creates a user inside the container and uses USER at the end to make it work, but this is problematic for a couple of reasons:

  1. If we want to use someone else's Docker image, we have to rebuild with a custom Dockerfile
  2. We either need to create a custom docker image for each local user, or we use a single "service" account and we won't be able to distinguish between the user's activities in the FS.

Can bindfs or userns let us get around this?

I think I'm running into the same issue, my use case is:
Docker images hold our portable build tools for a given project, we use volume mounting with relative paths either with something like docker run -v ./:/src/ image or the equivalent in a docker-compose file. A build is automatically kicked off, and new files are generated in a subfolder in the linked volume.
Sometimes we like to use those built files from the host, but the fact that they are still owned by the user in docker, rather than the host user that ran docker tends to make things tricky.

Am I doing something particularly wrong here?

@rlabrecque see my post earlier about matching the docker user's id with that of the host. I used this approach and it works really nicely for us. Basically, run HOST_UID=$(id -u) and HOST_GID=$(id -g) and generate a Dockerfile that expands $HOST_GID and $HOST_UID in the below two commands:

RUN groupadd -g $HOST_GID mygroup
RUN useradd -l -u $HOST_UID -g mygroup myuser

Use the generated Dockerfile with the ID's filled in, to build your image.
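A variant of the same idea using build args instead of generating the Dockerfile (a sketch; the ARG names are arbitrary):

ARG HOST_UID=1000
ARG HOST_GID=1000
RUN groupadd -g $HOST_GID mygroup && useradd -l -u $HOST_UID -g mygroup myuser
USER myuser

built with:

docker build --build-arg HOST_UID=$(id -u) --build-arg HOST_GID=$(id -g) -t myimage .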

@haridsv I've done something similar and it works great on Linux. But that doesn't seem to work for me on Windows: the files inside the mount are still owned by root.

I solved this by using inotifywait. You will need to install inotify-tools to run it inside your docker image. It's possible to run it on your host system instead, but I wanted a portable solution.

# install at build time; run the watcher at container start (a RUN would block the build)
RUN export DEBIAN_FRONTEND=noninteractive \
  && apt-get -y update \
  && apt-get -y install inotify-tools
CMD inotifywait -m -r /mount -e create --format '%w%f' \
    | while read f; do chown $(stat -c '%u' /mount):$(stat -c '%g' /mount) "$f"; done

This works by instructing inotifywait to watch for any new files or directories created in the directory /mount. When it notices one, it changes the ownership to the same user and group as the /mount folder. I used the integer representation of both, in case the host user/group does not exist in the container. Inside the container it doesn't matter who owns it, because everything runs as root. Outside the container, the host filesystem shows the same ownership as whatever directory was mounted at /mount.

I deliberately designed it to only set the ownership of newly created files and directories, in order to preserve the ownership of pre-existing files and directories. It's safer than blowing all that away with a chown -R statement every time the filesystem gets mounted. If uniform permissions work for your project and you want a simpler solution that runs more efficiently, look at inotify-hookable.

Warning: Since one inotify watch will be established per subdirectory, it is possible that the maximum amount of inotify watches per user will be reached. The default maximum is 8192; it can be increased by writing to /proc/sys/fs/inotify/max_user_watches.

We've used a host-side script to chown the volume being mounted, which avoids the need to rebuild the image:

#!/bin/bash
set -e

DOCKER_IMAGE=<docker_image>
COMMAND=<internal_command>

DOCKER_USER=docker-user
DOCKER_GROUP=docker-group

HOME_DIR=/work
WORK_DIR="$HOME_DIR/$(basename $PWD)"

PARAMS="$PARAMS -it --rm"
PARAMS="$PARAMS -v $PWD:$WORK_DIR"
PARAMS="$PARAMS -w $WORK_DIR"

USER_ID=$(id -u)
GROUP_ID=$(id -g)

run_docker()
{
  echo \
    groupadd -f -g $GROUP_ID $DOCKER_GROUP '&&' \
    useradd -u $USER_ID -g $DOCKER_GROUP $DOCKER_USER '&&' \
    chown $DOCKER_USER:$DOCKER_GROUP $WORK_DIR '&&' \
    sudo -u $DOCKER_USER HOME=$HOME_DIR $COMMAND
}

if [ -z "$DOCKER_HOST" ]; then
    docker run $PARAMS $DOCKER_IMAGE "$(run_docker) $*"
else
    docker run $PARAMS $DOCKER_IMAGE $COMMAND "$*"
fi

What about using filesystem ACLs on the host directory? That way you can tell the filesystem to apply specific permissions to newly created files inside the directory. If you set the ACL at the host level, it will also apply when you modify the data from the container.
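For example, with default ACLs (a sketch; it assumes the container process runs as uid 1000 and the host filesystem is mounted with ACL support):

# grant uid 1000 access to existing files, and (via the default ACL) to files created later
setfacl -R -m u:1000:rwX /host/shared
setfacl -R -d -m u:1000:rwX /host/shared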

@thaJeztah @justincormack @cpuguy83

@kamechen seems to be right that the named volumes "just work". In the case of the named volumes, the existing permissions "backfire" and change the volume permissions, and I, personally, consider this to be a bug (#28041).

@thegecko, why not take this approach further and create the users within an entrypoint?

Here is my example, it detects an owner of the mounted directory, creates a user with the same UID and runs the command as under this user:

Dockerfile

FROM ubuntu

RUN mkdir /project
VOLUME /project

ENV GOSU_VERSION 1.9
RUN set -x \
    && apt-get update && apt-get install -y --no-install-recommends ca-certificates wget && rm -rf /var/lib/apt/lists/* \
    && dpkgArch="$(dpkg --print-architecture | awk -F- '{ print $NF }')" \
    && wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$dpkgArch" \
    && wget -O /usr/local/bin/gosu.asc "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$dpkgArch.asc" \
    && export GNUPGHOME="$(mktemp -d)" \
    && gpg --keyserver ha.pool.sks-keyservers.net --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4 \
    && gpg --batch --verify /usr/local/bin/gosu.asc /usr/local/bin/gosu \
    && rm -r "$GNUPGHOME" /usr/local/bin/gosu.asc \
    && chmod +x /usr/local/bin/gosu \
    && gosu nobody true \
    && apt-get purge -y --auto-remove ca-certificates wget

ADD entrypoint.sh /

ENTRYPOINT ["/entrypoint.sh"]

CMD /project/run.sh

entrypoint.sh

#!/bin/sh

USER=dockeruser
VOLUME=/project
UID="$(stat -c '%u' $VOLUME)" && \
useradd --uid "$UID" "$USER" && \
ls -l "$VOLUME" && \
exec gosu "$USER" "$@"

run.sh

#!/bin/sh

echo "Running as \"$(id -nu)\""

When I run sudo docker build -t test . && sudo docker run --rm -v /tmp/docker-test/:/project test:latest it outputs:

total 12
-rw-r--r-- 1 dockeruser dockeruser 990 Dec 12 10:55 Dockerfile
-rwxr-xr-x 1 dockeruser dockeruser 156 Dec 12 11:03 entrypoint.sh
-rwxr-xr-x 1 dockeruser dockeruser  31 Dec 12 11:01 run.sh
Running as "dockeruser"

Has anyone considered this issue? Making sure your volumes have the same gid and uid as your containers makes it harder to manage containers that don't use root. Docker's best practices recommend not running services as root if possible, but doesn't having to set up the gid and uid on the host as well as in the containers kind of defeat the ease of use that made Docker popular?

@AndreasBackx Using volumes just works, assuming the image has data at the path you're mounting to.
Using binds uses the host path's UID/GID.

There is currently no way (as in no kernel support) to map or change the UID/GID of a file/dir to something else without changing the original unless using something like FUSE, which is horribly slow.

But let's take a step back for a moment.
Docker is not really making things more difficult here.
UID/GID in the container is the same as UID/GID on the host.. even if user/group names don't match, UID/GID is what matters here.
Just as without docker you need to come up with a uid/gid you want to use for your service(s) and use it out of convention.
Remember, a uid/gid does not need to exist in /etc/passwd or /etc/group for file ownership to be set to it.
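A quick illustration of that last point (uid/gid 4242 is arbitrary and deliberately unallocated):

touch /tmp/demo
sudo chown 4242:4242 /tmp/demo
ls -ln /tmp/demo   # shows numeric owner/group 4242 even though no such user or group exists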

@cpuguy83 thank you for the explanation.

I ran into this issue today while creating a node pipeline: my host user UID is 1000, and the node image creates a custom user with that specific UID; there is even an issue about that.

I would use the node user and move on, but it feels a little bit dirty. I really share what you wrote, @cpuguy83, about "let's step back for a moment", but sometimes it can become a hard-to-resolve issue.

Just found the -o option on usermod to allow duplicate IDs; it seems a legit option.

RUN usermod -o -u 1000 <user>

Unsure why this hasn't been fixed in any reasonable way.

docker run -it -u 1000:4211 -v /home/web/production/nginx_socks:/app/socks -e SOCKS_PATH=/app/socks --name time_master  time_master

login to see:

drwxr-xr-x    8 geodocr_ geodocr       4096 Jun  4 18:51 .
drwxr-xr-x   57 root     root          4096 Jun  6 21:17 ..
-rwxrwx---    1 geodocr_ geodocr        140 Jun  4 18:49 .env
-rwxrwx--x    1 geodocr_ geodocr         78 Jun  4 18:49 entrypoint.sh
drwxrwxr-x    2 geodocr_ geodocr       4096 Jun  4 18:51 handlers
-rwxrwx---    1 geodocr_ geodocr        242 Jun  4 18:49 requirements.txt
-rwxrwx---    1 geodocr_ geodocr       1270 Jun  4 18:49 server.py
drwxr-xr-x    2 root     root          4096 Jun  6 21:00 socks
drwxr-xr-x   10 geodocr_ geodocr       4096 Jun  4 18:51 utils

The Dockerfile specifically does

RUN adduser  -D -u 1000 $USER 
#
RUN addgroup $GROUP -g 4211 
#
RUN addgroup $USER $GROUP 
RUN mkdir /app/socks
USER $USER  
#

It doesn't make any sense that this volume is mounted as root when neither the user in the container nor the user running the command is selected. I could understand it if the run command mounted as the user running the command, or the user who owns the directory, or even the user specified in the Dockerfile.

None of these are root, so mounting as root seems to be a bug.

Also, just checked: creating a volume and then mounting it works. So this is still a bug.

@disarticulate If you want the host path to be something other than root, then you should change the host path.

I don't think this has been mentioned before, but this bug is particularly annoying when you rely on Docker to create the host volume for you. Docker seems to always create the host volume with root, even when the directory you're mounting over has a different owner.

It would seem that the correct thing to do here would be to create the volume with ownership permissions belonging to the image's USER.

@jalaziz what should it do when the container's user doesn't exist on the host? One of the main benefits of containers is that you don't have to expose their dependencies (including users) to the host.

@taybin I would expect that Docker just creates the folder with the uid:gid of the container's user OR if the folder exists inside the container, it would create the host folder with the same uid:gid and mask.

NOTE: I don't expect Docker change permissions if the folder already exists on the host.

@taybin Exactly as @frol described. It should just use the uid:gid from the container.

However, this reveals my general problem with the current approach. I have to know the uid the container uses to allow writes and set permissions on host directories based on that uid:gid. If an upstream author ever changes the uid, permissions will break. It all seems very fragile.

In my case, I didn't necessarily have explicit control over which docker image that was being used (I couldn't just edit a Dockerfile to my liking).

So, I tried this:

docker run -it -u $(id -u $USER):$(id -g $USER) -v $(pwd):/src -w /src node:latest npm run build

Which creates a folder called ./built-app. However, the owner was still root with strict permissions.

My workaround was this:

docker run -it -v $(pwd):/src -w /src node:latest /bin/bash -c "npm run build; chmod -R 777 ./built-app"

Which still has root owner, but relaxed permissions. Then my host OS (Ubuntu) was able to access ./built-app without sudo privileges.

@rms1000watt have you tried the following command?

docker run -it -v $(pwd):/src -w /src node:latest /bin/bash -c "npm run build; chown -R ${UID}:${GID} ./built-app"

That should work since it would use your host's UID and GID directly onto the files themselves. Using chmod -R 777 is generally bad practice.

@saada Whoops! Good call. I'll give it a shot.

I accomplished my approach by reading this and understanding how _UID and GID work in Docker containers_:
https://medium.com/@mccode/understanding-how-uid-and-gid-work-in-docker-containers-c37a01d01cf

Basically, there's a single kernel and a single, shared pool of uids and gids. That means that root on your local machine is the same as root in your container; they both share the same UID.


I have an Apache server and I want to share my webapp files with the Apache container, so I can modify them on the host (development: change them using a text editor) and see the results from a process running in the container. Sometimes that process writes new files, and if I don't change the default privilege behaviour, those files are generated by the root user and my local user can not modify them anymore.

What I did was, generate a custom image adding this in my dockerfile:

RUN adduser -D -u 1002 dianjuar -G www-data
USER dianjuar

Maybe, to make my docker-compose.yml portable so anyone can use it, I could put some parameters in the build process.

Here’s a container pattern for assigning the userid / groupid at runtime in a way that’s easily portable. https://github.com/Graham42/mapped-uid-docker

The way I followed:

1. Create the directory on the host server.
2. Change its permissions to the user with userid and groupid = 1000.
3. Run docker-compose up.

I checked the container and everything is fine.

Note: I was using the root user on the host server, and I assume that if I were using a non-root user with uid = 1000 I would be able to mount the volume without worrying about the permissions, but I have not tested it yet. Has anyone followed a similar approach?

Typical Problem:

  • docker swarm, so CAP_ADD is not available, bind-mount is not a solution
  • two containers of two different images share the same volume, so different user/group databases on both
  • e.g. one must have access rights as www-data (i.e. let's encrypt certificate downloader)
  • the other also uses www-data (i.e. nginx)
  • but a third one requires access from user openldap (i.e. openldap server)
  • so a simple chmod is also not a solution

That means, I have a webserver that gets SSL certificates for its domains from Let's Encrypt and an OpenLDAP server for the same domain, where I want to reuse the certificates.

There are other combinations that run into exactly the same problem.

Any Idea, how to solve this?

How would you solve this without Docker? This is not a Docker-specific problem.

Even without swarm, I could solve it in docker: bind-mount.

It is a docker-swarm specific problem, because there is no CAP_ADD.

@mwaeckerlin bind mounts can't map to different user ID's.
But even with swarm you can bind mount.... why is CAP_ADD needed?

Without CAP_ADD, mount inside docker fails.

But by writing my comment, I just got a possible solution, but unfortunately it requires changing the Dockerfile in both images, so it won't work for library or other 3rd-party images:

  • create a group with an explicitly given common group id in all Dockerfiles
  • give rights to the group

@mwaeckerlin Why do you need to mount inside the container?

Because I cannot specify a user/group with docker option -v.

The idea specified above was: bind-mount inside the container, then chown on the target, which should not change the source.

@mwaeckerlin If you change it, it's changed everywhere. This is the crux of the problem in this issue.
Chowning/Chmoding a bind-mounted file/dir changes both places.

Also there is no need to be able to mount inside the container you can --mount type=bind,source=/foo,target=/bar

Yes, I just tested it outside of docker, so the idea above is wrong.

The main problem that I have often seen in docker is that users and groups are not identical in different images; even when the same username or groupname exists in both, they often have different ids.

Here something like this would at least help in some cases: --mount type=bind,source=/foo,target=/bar,user=me,group=mine

Any recommendations or best practices regarding this topic: shared volume user by different users in different images in docker swarm?

  1. Don't share volumes
  2. Sync up your uid/gids
  3. Ensure permissions are permissive enough for all that are sharing
  4. Use fuse mounts on the host to bind to different uid/gid's for each container

Can you elaborate on point 4? A practical example, maybe, of how to do so?


Using something like https://bindfs.org/ -- there's even at least one Docker volume plugin which already implements it (https://github.com/lebokus/docker-volume-bindfs is the first result I found via Google).

I can't change permissions after mounting the volume. Has anyone else run into this?

A workaround: add this to the Dockerfile:

RUN echo "if [ -e container_volume_path ]; then sudo chown user_name container_volume_path; fi" >> /home/user_name/.bashrc

The ownership of container_volume_path is changed after the volume has been mounted.

Being able to map uid and gid seems like a mysteriously missing element of docker volume handling. The path of least surprise would be to include it; the suggested fixes are clunky and harder to discover, while providing no best-practices benefit:

Re:

1) Don't share volumes

  • Good, but immaterial to the discussion on mapping uid/gid

2) Sync up your uid/gids

  • That's what the functionality is intended to do, but without forcing a chown in a Dockerfile

3) Ensure permissions are permissive enough for all that are sharing

  • This again relies on behavior defined in a Dockerfile, when it could be a simple mapping

4) Use fuse mounts on the host to bind different uid/gid's for each container

  • Good advice, that also seems like yak-shaving.

@colbygk

when it could be a simple mapping

That's the problem, it is not possible to do a "simple mapping" as it is not supported at the vfs layer.
Some filesystems provide the ability to map ownership (e.g. bindfs or nfs), but implementing this in the generic case is not currently possible.

I need shared volumes e.g. in the following situation:

Shared Certificates

  • container 1 is a reverse-proxy that handles let's encrypt for all hosted domains
  • container 2 is an ldap server that also needs to provide the certificate of it's domain

Solution: the image of container 2 inherits from the same base image as container 1; the common base image creates a common group, so both containers have the same group access.

Dockerfile of common base:

RUN groupadd -g 500 ssl-cert

letsencrypt-config.sh in the let's encrypt image:

chgrp -R ssl-cert /etc/letsencrypt

Dockerfile of mwaeckerlin/reverse-proxy:

RUN usermod -a -G ssl-cert www-data

Dockerfile of mwaeckerlin/openldap:

RUN usermod -a -G ssl-cert openldap

That's it.

All of this illustrates how to change user permissions in the entrypoint or during the build process so that the entire container runs as a different user.

But maybe I'm missing a big point after searching the web for the past 3 days.
None of the above or otherwise linked recommendations and workarounds work in any way.

All volumes mounted into a container are always owned by root:root inside the container, regardless of whether I change the owner upfront on the host with a matching UID/GID or not.

I can't shake the feeling that I'm being stupid trying to do something very, very basic, from my point of view.

  • Windows 10 Pro 1803 (17134.112)
  • Docker for Windows 18.03.1-ce-win65 (17513)
  • Windows WSL with Hyper-V and Ubuntu

Trying to start a plain apache2 container where the document root is mounted to the host, so I am able to develop PHP source code while immediately testing it in the docker container.

root@win10:# docker run --rm -v /c/Users/<MyUser>/Development/www-data:/var/www/html -it httpd:2.4 /bin/bash

Inside the docker container, the directory _/var/www/html_ is always owned by _root:root_, so my PHP app won't ever be able to fopen or write any data inside that folder.
Nothing has worked yet... :(

For those searching for a reasonably elegant solution, check out what @elquimista suggested here. I've tested this and it is working nicely.

We've been using https://github.com/boxboat/fixuid#specify-paths-and-behavior-across-devices with luck. In addition it sets up a user inside the container to match a user on the host.

Here's an example configuration from the image:

$ cat /etc/fixuid/config.yml
user: lion
group: lion
paths:
  - /home/lion
  - /home/lion/.composer/cache
  - /tmp

and to run:

$ docker run --rm -it --init \
    -u 1000:1000 \
    -v `pwd`:/app \
    -v "$HOME/.composer/cache:/home/lion/.composer/cache" \
    --entrypoint='fixuid' \
    php:7.2-cli \
        /bin/bash

Note that this is also an annoyance when using storage systems that don't support unix permissions and ownership. In that case the mounting of the storage must be done so that it gets the correct uid for use inside the container because any attempt to chown the files simply fails. If there were a way to tell docker to present the files as owned by a particular uid, regardless of ownership outside the container that would simplify things.

@tlhonmey

If there were a way to tell docker to present the files as owned by a particular uid

There isn't, not without a custom filesystem (e.g. like bindfs).

@tlhonmey Yeah, I was able to get around the problem of "storage systems that don't support unix permissions" with some symlinks.

Basically, mounting from NTFS drives, I'd put things in -v ./HostNtfsStuff:/data/ntfsMount and then make a symlink and chown that: ln -s -T /data/ntfsMount /var/lib/myApp && chown -Rh myApp:myApp /var/lib/myApp/

You can test too: su myApp -c 'echo foo > /var/lib/myApp/bar' && cat /data/ntfsMount/bar

My use was for windows devs to be able to run MySQL containers and persist on mounted volumes, but it applies to plenty of apps.

So the solution is to manually manage a bunch of uid:gid pairs and hope they don't clash on the host, or a helper script, or:

There is _one_ way to make it work, but you need to prepare ahead of time inside your Dockerfile.

RUN mkdir -p /var/lib/redis ; chown -R redis:redis /var/lib/redis
VOLUME ["/var/lib/redis"]
ENTRYPOINT ["usr/bin/redis-server"]
USER redis

(I didn't test this example, I'm working on a chromium container that then displays on a _separate_ X11 container that .... )

I was using the last technique until today when I tried to bind mount the container volume and it broke. Apparently you can't do that. The volume gets created as root and then the app inside can't write to it as the user. Auto-population described in VOLUME documentation doesn't seem to work with bind mounting either.

I saw this reading Dockerfile Best Practices and the helper script is what they recommend:

#!/usr/bin/env bash
set -e

if [ "$1" = 'postgres' ]; then
    chown -R postgres "$PGDATA"

    if [ -z "$(ls -A "$PGDATA")" ]; then
        gosu postgres initdb
    fi

    exec gosu postgres "$@"
fi

exec "$@"

So a recursive chown to make sure you have ownership on every start, then run your app as the user. exec also forces PID 1 so signals work. And if I want to populate the volume with something like a helper script for use outside the container on the resulting data, it probably has to go in the helper script as well. Wondering, however, whether there is a performance hit to container start if your app stores tons of files on a volume, especially if the storage is not local.

It does seem like there could be a better solution. Maybe something like mapping container uid and gid to that of a specified username and group on the host. Can docker peek at the container's /etc and figure this out maybe?

You cannot map uids/gids at the filesystem level, at least not without fuse.

You cannot map uids/gids at the filesystem level, at least not without fuse.

Kinda what I was fearing. Any idea what the performance penalty might be if docker used fuse like this?

@mdegans

So a recursive chown to make sure you have ownership on every start,

You don't need to do a chown on every start. Instead, check the owner of the data directory and only do the recursive chown if it is not correct. Like this:

 [ $(stat -c %U "$PGDATA") == "postgres" ] || chown -R postgres "$PGDATA"

So ideally this will happen on the first start only.

And be very careful when running a container with such an entrypoint script; if you mount (e.g.) your home directory into the container, all your files will be chowned to postgres.

In a good docker image design, the run time user is not root and therefore cannot chown files …!

In a good docker image design, the run time user is not root and therefore cannot chown files …!

Correct, but there shouldn't be anything stopping a switch to and from root which is often required... just like you shouldn't normally be running anything as root until you need it, but when you do, then you can do one or more of the following:

  • sudo
  • su
  • USER root

As per: https://f1.holisticinfosecforwebdevelopers.com/chap03.html#vps-countermeasures-docker-the-default-user-is-root

In my humble opinion, it is up to the user of the Docker image to make sure he/she sets the permissions on the mounted volume correctly.

It is very similar to what we did traditionally before containers were a thing, e.g. when I wanted to run nginx and I needed to make sure the static HTML directory was owned by the right user. In order to know that, I would need to open my nginx.conf file, check the workers' user and set permissions accordingly. Actually, this was all described in the nginx documentation.

This is just a Unix permission problem, nothing new with Docker here. So perhaps the solution to this issue is better documentation for each Docker image of what the ownership of mounted volumes should be. I do not recall the nginx start-up daemon making sure the directory had the right ownership; it would simply fail if it was not correctly set up.

However, it is true that because we now have users potentially defined inside the container and not outside, things look different (they are not). The UID inside and outside are equivalent, so user foobar with UID 2000 might exist inside a container and not outside, but UID 2000 can still be set on files and directories on the outside. We have to think in terms of UID/GID rather than the human-friendly names we are used to dealing with.
It also makes things potentially more difficult if you need to share a volume between 2 containers written by 2 different authors. It is possible that setting permissions using the traditional Unix system (of user, group and others) is not enough to solve the issue (no common UID or GID). I admit that since I use Docker, I make a lot more use of POSIX ACLs. I can therefore assign 3 different users permissions to the same file, e.g. a container-writer with rw permission, a container-reader with r permission and a host user with r permission.
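For example (a sketch; uids 2000 and 2001 for the writer and reader containers are hypothetical):

setfacl -m u:2000:rw,u:2001:r,u:$(id -u):r shared.conf
getfacl shared.conf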

One more option: a common GID can be enforced using the setgid flag on shared directories. The file mask can be enforced using ACLs.

Before you do anything in the Docker container, run:

```
umask 0000
```

https://en.wikipedia.org/wiki/Umask
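Combining the setgid and umask/ACL suggestions, a shared directory could be prepared like this (a sketch; gid 500 is an arbitrary shared group):

mkdir /shared
chgrp 500 /shared
chmod 2770 /shared             # setgid bit: new files inherit the group
setfacl -d -m g::rwX /shared   # default ACL: new files stay group-writable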

Dropping by late to this thread to reaffirm how helpful this feature would be.

To be honest, I've been deploying containers for about a year now, and I see this becoming a real problem all over the place. Providing a solution at this level, here, seems like the only sensible choice.

As it stands today, a considerable number of Docker images choose to keep running their entrypoints as root so they can bootstrap directory and file permissions, only to drop privileges before running the application process.

A real issue appears when you realize that not everyone can resort to this practice. Some popular platforms, like Kubernetes or OpenShift, may be configured to not allow privileged containers... because... security. Off the top of my head, I can't possibly see how a large financial institution would even consider adopting a container platform processing sensitive information without this type of restriction.

The security concerns raised by the _entrypoint-as-root_ practice have driven a large number of Kubernetes helm charts to provide initContainers that can chown and chmod volumes before the application container starts. This might seem like a nice way around, but trust me when I say this: it is not.

Helm charts, in particular, are being littered with hardcoded uids and gids because those need to be secretly ripped from the application runtime. That information is hidden inside the container and not promptly available during deployment.

While there are a number of ways to work around the issue, it continues to plague deployment configurations everywhere as a _hack to make things work_. The number of deployments affected by this is rapidly increasing, and the techniques people are resorting to are contrary to all the other benefits that containers bring to the table.

I hope there's a way to implement this as part of the OCI spec so that other solutions that depend on Docker can use it to elegantly provide a fully automated deployment.

So then the question becomes: where else on the internet do they develop the common OCI spec, where this discussion should be taken? Assuming this issue is not the best way to get the feature into docker (eventually, through a requirement for compliance with a future, commonly agreed-upon standard).

Since the problem definitely isn't ever going away all by itself, and the solution requires some very fundamental kinds of changes.

initContainers that can chown and chmod volumes before the application container starts. This might seem like a nice way around, but trust me when I say this: it is not.

FWIW; this feature would only be needed for situations where files are shared between multiple namespaces (either files (pre-) existing on "the host", or a common file location shared between multiple containers running as different users). In situations where files are pre-created on the host, this can be mitigated by making sure those files have the correct ownership and permissions before sharing it with the container. Effectively that's not any different from (e.g.) running nginx on the host, and making sure that files in the webroot have the correct permissions.

When sharing between containers that are running as a different user, either run both containers with the same uid (or gid, and set the correct group permissions, similar to how this would work when running two non-containerised processes that need to have access to the same resources).

some of these environments may be configured to not allow privileged containers... because... security. Out of the top of my head, I can't possibly see how a large financial institution would even consider adopting a container platform processing sensitive information without this type of restriction.

Just to prevent confusion; a container running as root is not the same as a "privileged" container (--privileged, or options such as --cap-add, set). Privileged (--privileged) containers are highly insecure, whereas a container running as root is fully contained and won't be able to break out; passing it bind-mounted files/directories punches holes in that, so it _will_ give it access to the files/directories you pass as a bind-mount.

Helm charts, in particular, are being littered with hardcoded uids and gids because those need to be secretly ripped from the application runtime. That information is hidden inside the container and not promptly available during deployment.

Wondering: if those uids/gids are not known, what would the UX look like? (As I'd have to provide a uid/gid mapping to map a host uid/gid to an (unknown) container uid/gid?)

SO then the question becomes: where else on the internet do they develop the common OCI spec, where this discussion should be taken?

I don't think (at a glance) that a change to the OCI spec is needed; this can be solved outside of the OCI spec; the main problem is that mechanisms to map uids/gids are currently missing in the kernel (or exist (such as shiftfs), but not commonly available)

This is a classic pentangle of passing responsibilities / somebody else can or should solve this problem. Either it's the:

  • User
  • The specific implementation of the Docker / containerization platform
  • OCI spec
  • Kernel
  • Filesystem

The problem was already effectively stated: that having the user do this is both clunky and less secure. However the knock-on effect of having users make per-image hacks is also important:

Which is that you cannot so easily inter-operate and share / mix images to work together from different users. So it either:

  • Breaks community sharing (quite a lot), because different users pick the uids and gids for their individually developed images from the same global namespace pool
  • Forces users to develop their own ad-hoc standard and hope that others will follow the convention they themselves have chosen
  • Forces users to run everything as root, which is certainly less secure: it strips away an extra layer of privilege-escalation protection you would otherwise have, and makes container-breakout vulnerabilities that much easier to exploit, since the user is already root inside the container to begin with. Not to mention being able to run other services within the same container, which is another way to go sideways before going up.

So it's a trade-off, and those above are the current trade-offs. Passing the responsibility elsewhere, to one or more of the other entities listed above, would simply produce a different set of trade-offs.

BTW, regarding taking a closer look at a filesystem-based solution, I found this potentially useful comment full of relevant links:

https://github.com/docker/compose/issues/3270#issuecomment-365644540

It contains several references to this same general feature in other projects/places, including a distributed filesystem (Lustre) and another issue regarding ZFS. As it happens, I'm using ZFS over here myself.

I then also found another copy of the same bug on Ubuntu/Launchpad, referencing the same ZOL #4177 issue:

https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1567558

It says the bug in question was fixed in ZFS version 0.6.5.7+. So, does this mean we could potentially use ZFS and ACLs as some kind of backing store for remapping uids and gids? That's not something I had heard of before.
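If that reading is right, a speculative sketch would be something like the below (the dataset name and uid 431 are assumptions on my part, not something I've verified):

$ sudo zfs set acltype=posixacl tank/docker-volumes    # enable POSIX ACLs on the dataset
$ sudo zfs set xattr=sa tank/docker-volumes            # commonly recommended alongside posixacl
$ setfacl -R -m u:431:rwX /tank/docker-volumes/redis   # grant the container's redis uid access

Note this only layers ACLs on top of the existing ownership; it doesn't actually remap uids.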

Or maybe this solution only works for LXC containers, because the LXC project lead also says in his comments there that "we use setuid helpers (newuidmap and newgidmap)" which can then "setup a uid and gid map". So presumably some mechanism in LXC itself is also necessary, otherwise the ZFS ACLs part cannot be utilized? Or perhaps I'm mistaken; I'm not completely sure I follow all of it.

Another interesting link, this time about shiftfs and a discussion about the possibility of absorbing its features into overlayfs, which of course is an underlying filesystem that Docker already uses.

However, what happens if the remapping feature gets implemented in overlayfs, but I want to use the zfs storage driver for my underlying filesystem instead? Am I then left without the ability to remap uids/gids, if it's implemented on a per-filesystem basis? Or can both be implemented separately? I'm also unclear whether the Docker daemon would need to be aware of such remappings and provide a common API and flags (to pass down to the fs-driver layer), or whether we would instead perform the remapping manually ourselves on the host side (in the filesystem, outside of Docker). That aspect remains a little unclear to me.

[EDIT] oops, forgot to include the link! Here it is

https://lists.linuxfoundation.org/pipermail/containers/2018-June/039172.html

This issue is about volumes / bind-mounts, so it's separate from the container's filesystem.

We would use overlay with a uid/gid shift for the bind-mount if overlay incorporated shiftfs features, but we'd have to fall back to something else (or nothing) on unsupported systems.

Podman is a rootless, drop-in Docker replacement: https://www.youtube.com/watch?v=N0hSn5EwW8w https://podman.io/ . With podman, root is not used, so user permissions are handled correctly. Our team switched to Podman because of this problem and it has worked very well.

This makes no sense.
The same problems apply.
Note that docker also has a rootless mode.

You can test Podman with the following commands. Unlike Docker, Podman does not have a separate daemon; everything runs under the user executing the podman commands, so files created inside podman are owned by the user who ran the podman run ... command.

kkimdev@ubuntu:~$ mkdir podman_test
kkimdev@ubuntu:~$ ls -agh podman_test
total 8.0K
drwxrwxr-x 2 kkimdev 4.0K Jun 27 04:23 .
drwxr-xr-x 8 kkimdev 4.0K Jun 27 04:23 ..

kkimdev@ubuntu:~$ podman run --rm -it -v ~/podman_test:/podman_test alpine
/ # cd /podman_test/
/podman_test # touch test_file
/podman_test # ls -agh
total 8K
drwxrwxr-x    2 root        4.0K Jun 27 02:24 .
drwxr-xr-x   20 root        4.0K Jun 27 02:24 ..
-rw-r--r--    1 root           0 Jun 27 02:24 test_file

/podman_test #

kkimdev@ubuntu:~$ ls -agh podman_test/
total 8.0K
drwxrwxr-x 2 kkimdev 4.0K Jun 27 04:24 .
drwxr-xr-x 8 kkimdev 4.0K Jun 27 04:23 ..
-rw-r--r-- 1 kkimdev    0 Jun 27 04:24 test_file

This is not the appropriate place to advertise for podman -- if there are specific technical details about how it works that can help solve this issue, those would be relevant to discuss, especially as potential solutions to the issue you're currently commenting on. So far, that is not what this has been, so please take this discussion elsewhere.

The fact that podman has a very different architecture to Docker, which makes this issue less severe/painful, does not magically allow Docker to completely change the way it works just to solve this one problem. I can assure you there are a lot of reasons for why Docker is structured the way it is, and it's frankly bad faith to ignore all that history.

@tianon Yes, absolutely, there are pros and cons to both approaches. I mentioned podman just because running a container with podman as the target user specifically solves this technical issue, which is "mounting a volume as a user other than root".

Please take a look at the permissions of "test_file" created in my comment above. It first mounts the "~/podman_test" directory, then writes the "test_file" file inside the podman container. Once you're back outside the container, you can see that the file is owned by "kkimdev", not root.

The problem is that your suggested fix for an issue with Docker amounts to "don't use Docker", which isn't terribly constructive on Docker's issue tracker.

Yes, podman is designed differently, which makes this issue moot for that tool -- that's good and fine, but totally off topic here. Rootless has different tradeoffs, some of which are fine for some people, some of which are not. It's getting better with time (mostly thanks to kernel improvements), but isn't a generic solution for everyone here.

This requires either kernel modifications or a shim for a generic solution as discussed in detail above (and as @cpuguy83 and others have been working on trying to help solve this issue in a generic way).

Docker has had this particular issue open since 2013, and nearly six years later there is no easy improvement in sight. Podman was designed for compatibility with Docker while also solving its design flaws (including the ability to run as an unprivileged user, with no superuser Docker daemon required).

If users are able to give others advice on a GitHub issue, that's totally fine. This is a community. Feel free to recommend whatever could be helpful.

I can assure you there are a lot of reasons for why Docker is structured the way it is

So is grep. But if someone needs to search faster I would still recommend ripgrep. Even on the grep issue tracker. It shouldn't matter whose issue tracker it is, as long as it solves the problem of the users and makes them happy.

If Podman doesn't work for you: fine! But if it helps others because they just have to replace docker with podman in their infrastructure, just let them do so.

Podman's main selling point is that it does not run a daemon, and that is my main argument against it. How do I get my containers back up after a reboot? I won't do it by hand, and everything else is just bad design. Also, I don't want my containers owned by a user; I want them owned by the system, and that means root.
Podman makes sense if you are the only person using it.

And to fix your problem: Build a container with COPY --chown ...:...!
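For example (a rough sketch; the user name and paths are placeholders):

FROM ubuntu
RUN groupadd -r app && useradd -r -g app app
# Bake the files into the image at build time, already owned by the app user
COPY --chown=app:app ./site /var/www
USER app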

Also, Docker doesn't have such problems, and you can remotely control Docker servers, which is important for me too.

There are also tools to generate pods from running containers, which I won't recommend because you should build them from the ground up in a clean way.

I guess we should go back on topic now: IMHO the first advice was okay but everything else just blows this issue up and won't solve anything.


@SuperSandro2000, you can find responses to your statements in these posts, though: https://podman.io/blogs/2018/09/13/systemd.html, https://osric.com/chris/accidental-developer/2018/12/docker-versus-podman-and-iptables/, and https://osric.com/chris/accidental-developer/2018/12/using-docker-to-get-root-access/

How do I get my containers back up after a reboot? I won't do it by hand, and everything else is just bad design.

Well, Podman has native integration with systemd (like _nearly_ every other thing on nearly all modern GNU Linux distributions). So you don't have to maintain 'two' boot systems (i.e. first having systemd start the Docker daemon, which then makes another round of starting containers with a different configuration). With Podman you can control everything with systemd (that is, the system you most probably have installed and running anyway).
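For instance, podman can emit unit files for you (a sketch; the container name and paths are assumptions):

$ podman create --name web -v "$PWD/site":/var/www httpd
$ podman generate systemd --name web --files --new    # writes container-web.service
$ mkdir -p ~/.config/systemd/user && mv container-web.service ~/.config/systemd/user/
$ systemctl --user enable --now container-web.service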

Also, I don't want my containers owned by a user; I want them owned by the system, and that means root.

It is totally fine if you don't want that. You are still able to run Podman as superuser, but you don't _have to_ anymore. In general it is considered a bad idea and increases the attack surface, because if someone is able to exploit your Docker daemon, they have control over _everything_ on the system.

Podman makes sense if you are the only person using it.

This statement does not make any sense. Podman lets many unprivileged users run containers side by side on a single system, a feature that especially makes sense when many people work on the same machine.

And to fix your problem: Build a container with COPY --chown ...:...!

IMHO the issue here is _mounting_ a volume for a container at _runtime_, which has little to do with building an image.

Also, Docker doesn't have such problems, and you can remotely control Docker servers, which is important for me too.

Funny that you mention exactly the blog that has this post in it. I'm not very experienced with the network details of either implementation, but as I understand it, podman starts with the least possible network rules, and unprivileged users cannot set up veth pairs.

To be clear, you should be able to get the same effect with rootless docker as you get with podman. This is because dockerd runs as your user, and root in the container is mapped to your UID.

This does have drawbacks and of course does not work when sharing a daemon with multiple users.
https://get.docker.com/rootless
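A quick sketch of trying that out (the installer also prints PATH/DOCKER_HOST settings to persist; the socket path below assumes a typical systemd user session):

$ curl -fsSL https://get.docker.com/rootless | sh
$ export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock
$ docker run --rm -v "$PWD/data":/data alpine touch /data/file
$ ls -l data/file    # owned by your own user on the host, not root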


@alexanderadam

IMHO the issue here is mounting a volume for a container at runtime, which has little to do with building an image.

My solution was to not mount the directory, but to bake the files into the container image when possible.

I mean, podman sounds good, but I won't switch because for now I don't see any advantage for me. Thanks for the explanation anyway.

podman suffers from the same problem if Apache inside the container runs as the www user: https://github.com/containers/libpod/issues/3990

A solution could be to map the www user in the container to a UID on the host if there is no root user inside the container. I don't know whether that is possible.
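Rootless podman does appear to support something like this; a hedged sketch (uid/gid 33 for www-data is an assumption about the image, and the keep-id:uid= syntax requires a fairly recent podman, 4.3+ if I recall correctly):

$ podman run --rm -v "$PWD/site":/var/www --userns=keep-id:uid=33,gid=33 httpd

Files the www user writes into /var/www then show up on the host as the invoking user.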

If you want to run with --read-only (to achieve the same as the readOnlyRootFilesystem Kubernetes policy), the approach below works. It builds on the workaround @jpetazzo suggested:

  • My docker image creates and uses a user with uid=1001 and gid=1001
  • Separately, create a docker volume
  • Chown its contents to 1001:1001
  • Mount that volume when running the application.

Dockerfile:

FROM ubuntu

RUN groupadd -g 1001 appgroup && \
    useradd -u 1001 -g appgroup appuser

USER appuser

Then:

$ docker build . -t test
$ docker volume create somedir
$ docker run -v somedir:/some_dir alpine chown -R 1001:1001 /some_dir

Now, when running the docker image and mounting the volume, /some_dir belongs to the user I want.

$ docker run -it --read-only -v somedir:/some_dir test ls -lrt

...
dr-xr-xr-x  13 root    root        0 Nov  4 15:22 sys
drwxr-xr-x   2 appuser appgroup 4096 Nov  5 09:45 some_dir
drwxr-xr-x   1 root    root     4096 Nov  5 09:45 etc
...

$ docker run -it --read-only -v somedir:/some_dir test touch /some_dir/hello
$ docker run -it --read-only -v somedir:/some_dir test ls -lrt /some_dir

-rw-r--r-- 1 appuser appgroup 0 Nov  5 09:52 hello

I'll point out again, because it is easily lost in the thread, that a chowned symlink will probably work for most scenarios. The downside is that you need some way of setting it up, which often means replacing the entrypoint with a script that then runs the original command.

https://github.com/moby/moby/issues/2259#issuecomment-466094263
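A speculative sketch of such an entrypoint script, assuming the application only checks ownership of the path it is handed ("appuser" and the paths are placeholders):

#!/bin/sh
# Point the app's expected data path at the root-owned mount via a symlink
ln -sfn /mnt/volume /app/data
# -h changes ownership of the link itself, not of the mount it points to
chown -h appuser:appuser /app/data
# Hand off to the original command
exec "$@"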

+1

I think this is by far the most annoying issue I have with docker, and seeing how long it has already been open suggests this is not the case for many others?

It's not an issue if you know the workaround. My cases:

  • Host is Linux

    • uid in container == desired uid on host - no workaround is needed
    • uid in container != desired uid on host - just run a couple of setfacl commands to give rw access to both the host user and the container user (see the sketch below)
  • Host is MacOS - everything works out of the box with the official Docker app.
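For reference, the setfacl workaround looks roughly like this (uids 1000 and 431 are illustrative assumptions for the host and container users):

$ setfacl -R -m u:1000:rwX -m u:431:rwX ./data       # grant both users access now
$ setfacl -R -d -m u:1000:rwX -m u:431:rwX ./data    # default ACLs for files created later
$ docker run -v "$PWD/data":/var/lib/app some-image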

just run a couple of setfacl commands to give rw access to both the host user and the container user

This is the problem: I don't want to run a couple of setfacl commands for every docker image and detect the OS.

This actually is a huge security issue as well.

Example scenario:

  • host1 has docker installed
  • host1 has multiple services running in docker containers - all of which mount local paths under /docker/my-service-01|02|03|etc
  • each container was built by a different vendor and each follows its own uid and gid policy, thus requiring you to chown -R uid.gid /docker/my-service-01... accordingly.

Result:

  • At some point, normal or service users created on the host will have full access to /docker/my-service-01|02|03|etc, which is neither intended nor desired.
  • If you want to mount a volume as "read-only" on two containers from different vendors, it will fail: the uid.gid won't match the required ones, and you can't chown because each container has its own uid.gid policy and they differ :)

Yes, we discussed this issue at length previously, and the key fact communicated back (at that time) was that the Linux kernel did not have an underlying mechanism to support remappable uids and gids. One would need to be added to the kernel for this project (moby / docker) to implement this highly desirable functionality; otherwise we would have had this feature some time ago, back when it was first looked at.

So the most productive way to continue this discussion (today) would be to see whether any of that situation has changed since: look for technical commentary from Linux kernel mainline devs over on vger.org; look for past patch sets / merge requests against the kernel for this missing underlying feature; and so on.

The hope is to better understand what has been happening at that lower level. What was the stumbling block? Was it a performance issue? Was it an objection about weakening the security model? Is it still on the table, or on a future roadmap that only makes sense after other features B and C are implemented? All of this kernel development goes on elsewhere, in other channels.

@DXist The fact this works magically on OSX and not on Linux is surprising and a problem in itself.

As per @dreamcat4's last comment, has anyone made a new attempt to see what the status of this is? Do we have support in the kernel for remappable uids and gids now? What's the overall status here?

I've used Linux user namespaces to solve this issue perfectly. It works exactly the same (AFAICT) as on the other platforms: the container sees the bind-mounted volume as root, while the host sees it as the user running docker.

Guide is here: https://www.jujens.eu/posts/en/2017/Jul/02/docker-userns-remap/
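The gist of the guide, as I read it (the user name and id range are placeholders, and this assumes an otherwise-empty daemon.json; see the post for details):

$ echo "alice:1000:65536" | sudo tee -a /etc/subuid /etc/subgid
$ echo '{ "userns-remap": "alice" }' | sudo tee /etc/docker/daemon.json
$ sudo systemctl restart docker
$ docker run --rm -v "$PWD":/data alpine touch /data/f
$ ls -ln f    # owned by uid 1000 (alice) on the host, even though the container wrote as root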

@patrobinson +1
