Moby: Docker Network bypasses Firewall, no option to disable

Created on 14 Apr 2016  ·  114 Comments  ·  Source: moby/moby

Output of docker version:

Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 14
 Running: 5
 Paused: 0
 Stopped: 9
Images: 152
Server Version: 1.10.3
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 204
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 3.13.0-58-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.793 GiB
Name: brm-pheonix-dev
ID: Y6Z4:6D53:RFOL:Z3CM:P7ZK:H6HL:RLV5:JT73:LZMC:DTBD:7ILK:2RS5
Username: benjamenmeyer
Registry: https://index.docker.io/v1/

Additional environment details (AWS, VirtualBox, physical, etc.):
Rackspace Cloud Server, Ubuntu 14.04, but that shouldn't really matter

Steps to reproduce the issue:

  1. Setup the system with a locked down firewall
  2. Create a set of docker containers with exposed ports
  3. Check the firewall; docker will use "anywhere" as the source, so all containers are exposed to the public.

Describe the results you received:
root@brm-pheonix-dev:~/rse# iptables --list DOCKER
Chain DOCKER (1 references)
target prot opt source destination
ACCEPT tcp -- anywhere 172.17.0.2 tcp dpt:6379

Describe the results you expected:
root@brm-pheonix-dev:~/rse# iptables --list DOCKER
Chain DOCKER (1 references)
target prot opt source destination
ACCEPT tcp -- 127.0.0.0/24 172.17.0.2 tcp dpt:6379
ACCEPT tcp -- 172.16.0.0/16 172.17.0.2 tcp dpt:6379

Additional information you deem important (e.g. issue happens only occasionally):

By default docker is munging the firewall in a way that breaks security - it allows all traffic from all network devices to access the exposed ports on containers. Consider a site that has 2 containers: Container A exposes 443 running Nginx, and Container B runs an API on port 8000. It's desirable to open Container A to the public for use, but hide Container B entirely so that it can only talk to localhost (for testing by the user) and the docker network (for talking to Container A). It might also be desirable for testing purposes to have Container C be a database used by Container B with the same kind of restrictions.

I found this because of monitoring logs on a service I had _thought_ was not open to the public. After finding log entries from sources trying to break in, I checked the firewall rules and found there was no limit on the source addresses or interfaces. I use UFW and only allow SSH onto this particular box, and would prefer to keep it that way. This can dramatically impact using Docker containers to deploy services and lead to potential security problems if people are not careful.

The best security practice would be to limit the networking by default so that it works like the desired example above, and then let the user add the appropriate firewall rules, etc., to override that behavior, or provide an option to revert to the current behavior. I know that for legacy reasons this is not likely, since it would break a lot of things on update; so at least having an option that can be turned on now to enable the above would be a good first step, and perhaps later, after much warning, it could become the default behavior. Assuming the default behavior is secure, having functionality to manage this (firewall -> enable public port, ip) in the docker-compose yml would be a great way to visibly make it known what is going on.

I did find the --iptables=false option, however, I don't want to have to be setting all the rules myself. The only thing I am objecting to is the source setting for the rules.

While I have not verified it, I suspect all the firewalls supported by docker will have the same issue.

area/networking version/1.10

Most helpful comment

The issue that Ben is bringing to light is real and surprising (a bad combination). Many admins, like myself, are using the tried-and-true ufw firewall. Docker is doing an end-run around ufw and altering the iptables rules in such a way that it 1) causes ufw to misreport the current status of the packet filtering rules, and 2) exposes seemingly private services to the public network. In order for docker to remain in the good graces of the sysadmin community, another approach must be devised. Right now there are many admins out there who, like Ben and myself, inadvertently opened ports to the wider Internet. Unlike Ben and myself though, they have not figured it out yet.

All 114 comments

Note: I noticed in https://github.com/docker/docker/blob/master/vendor/src/github.com/docker/libnetwork/iptables/iptables.go that there is not even an existing option to set the source, so it's just using the iptables defaults for source ip/device.

It is not quite #14041 as this issue is talking about exposed ports. Exposing ports is intended to make them publicly accessible, as this is how you expose services to the outside world. If you are working in a development environment, you can either disable access to ports from outside your computer with a host firewall, or simply not expose the ports and access the services directly, or from other containers on the same network.

I would recommend you use the newer docker networking features to set up private networks for services that you do not want exposed at all, see https://docs.docker.com/engine/userguide/networking/

That's what I thought of first; but was a bit confused, because _exposing_ a port (EXPOSE) doesn't actually do anything, but _publishing_ a port (-p / -P) actually exposes it on the host.

If you're actually talking about _publishing_, then this is as designed;

In your example, container B and C should not publish their ports, and container A can communicate with them through the Docker Network, e.g.

docker network create mynet

docker run -d --net=mynet --name=api api-image
docker run -d --net=mynet --name=db database-image
docker run -d --net=mynet --name=web -p 443:443 nginx

This only publishes the "web" container to the host. The web container can access the "API" and "database" containers through their names (i.e. http://api:80/ and db:3306, assuming MySQL).

@justincormack so I don't think using a private network solves the issue. In my case I'm using a private network between the containers, and they're still publicly exposed because the host firewall isn't configured to limit the exposure to the private network.

@thaJeztah the issue still comes down to the firewall support - there's no firewall support in docker to limit it to a specific network. You can probably still access those containers from another system as the firewall will not prevent other systems from accessing the port on the host.

Now I'm running this via docker-compose; however, it's not entirely a docker-compose issue, since the libnetwork functionality has _no_ capability of limiting the network in the firewall rules - the iptables rules have no source specification, so regardless of how one configures the network, as long as one relies on docker to create the firewall rules (which one should, because it's more likely to get them right), this becomes an issue. Consider the following in a docker-compose.yml file:

nginx:
    build: ./docker/nginx/.
    ports:
        - "127.0.0.1:8080:80"
        - "127.0.0.1:443:443"
    environment:
        DESTINATION_HOST: repose
    links:
        - repose
repose:
    build: ./docker/repose/.
    ports:
        - "127.0.0.1:80:8080"
    environment:
        DESTINATION_HOST: phoenix
        DESTINATION_PORT: 8888
    links:
        - phoenix
curryproxy:
    build: ./docker/curryproxy/.
    ports:
        - "127.0.0.1:8081:8081"
    external_links:
        - rse_rse_1
        - rse_rse_2
        - rse_rse_3
phoenix:
    build: .
    ports:
        - '127.0.0.1:88:8888'
    links:
        - curryproxy:curry
    external_links:
        - rse_rse_1:rse
        - rse_rse_2
        - rse_rse_3
        - rse_cache_1:cache
    volumes:
        - .:/home/phoenix

The above is an excerpt from one of my projects. While I want to be able to test all of them locally from my host, I don't want anyone else to be able to access anything but the nginx instance.

I'm not sure how this translates to your nomenclature...it may be that this is part of the "publishing" aspect, and the publishing capability needs to be expanded to do what I'm saying.

If this is by design, then it's a poor security model, as you now expose all developers to extreme risks when on unfamiliar networks (e.g. traveling).

As I said, I don't expect the default to change immediately but having the option would be a good first step.

I am a bit confused then, can you give some examples of what you can connect to externally? The backend services will be (by default) on the 172.17.0.0/16 network, which I wouldn't think you would be able to access externally, first because you will not have a route to it defined from an external host.

There is a potential issue if your external IP is also a private IP: traffic routed for the internal networks will not be dropped (whereas it should be, going from public to private) - is that the issue?

@justincormack so I'm primarily setting up proper proxying so that some services can only be hit via the proxy (nginx - ssl termination), which then filters through an authentication proxy (repose), and finally off to another service (phoenix). I couldn't care less if all of them are bound to the 0.0.0.0 interface; but I only want nginx to be externally accessible (or at the very least the repose portion if I didn't have nginx in place here). An easy solution, for example, would be to not have to set "127.0.0.1" in the configuration, but to have a firewall section where it's easy to specify what to allow through the firewall, with a base configuration where only the docker network and local host (loopback) interfaces are enabled to talk - something like:

firewall:
    external:
        ports:
            - 80
            - 443

Now the situation can be mitigated somewhat by limiting the network mapping on the _host_ to 127.0.0.1 instead of the default 0.0.0.0 map. Note that this is what really mitigates it, because otherwise the bridging will forward the host port into the docker network.

And yes, I did verify that the limiting works; however, it still leaves potential vulnerabilities in place, and the firewall rules do not match what is actually being done.
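For illustration, a minimal sketch of that mitigation with plain docker run (redis is just a placeholder image); binding the publish mapping to 127.0.0.1 keeps the port off external interfaces:

# published on all interfaces (the default) - reachable from other hosts
docker run -d -p 6379:6379 redis

# published only on the loopback interface - reachable from the host, not from outside
docker run -d -p 127.0.0.1:6379:6379 redis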

As another example, there was a Linux Kernel vulnerability a little while back (having trouble finding it at the moment) that was related to ports that were marked in iptables as being opened for use by applications, but then not actually being connected to an application - for instance, being on a local host port but not a public IP port. This potentially sets that up, and it would be better practice to limit the iptables rules to the expected networks instead of leaving them open to connections from anywhere. As I said, at the very least have the option to specify. They've likely fixed that particular issue, but why leave the possibility open?

IOW, it's all about security.

@BenjamenMeyer if you don't want the other services to be accessible, why do you publish their ports at all? i.e. "127.0.0.1:8081:8081" is not needed if it's only accessed through the docker network (other services connect directly through the docker network)

One issue I have that's related to this one is that I would like to publish ports, but only allow certain IP addresses to access them.

For example, I run a Jenkins environment in a couple of containers. The master node is "published", but I have to make some pretty convoluted iptables rules to lock it down so that only the 2 offices we have can access it.

Is there a way around this currently built into Docker? Or at least a recommended practice? I've seen in the documentation how you might restrict access to 1 IP address; but not several. The other issue with this is that if you have a server that already has an iptables configuration, you might be resetting all of the rules before applying your rules (hence the convoluted rules I have had to set up).

I have an issue similar to the one stated by @SeerUK. There is a jarring violation of expectation when preexisting firewall rules don't apply to the published container ports. The desired behavior is as follows (for me at least):

  1. User connection attempt is filtered based on INPUT configurations, etc
  2. Traffic forwarding then happens as usual based on the docker-added FORWARD rules

Is there a succinct way to achieve this in iptables, or does it not easily permit such a construct? I'm particularly limited in my knowledge of iptables, so bear with me. I've just recently picked up knowledge about it while trying to understand docker's interactions with it.

What I've actually resorted to for the time being, since I am actually running these containers on a pretty powerful dedicated server, is setting up a KVM VM running Docker, and then using some more standard iptables rules to restrict access from the host. The VM has its own network interface that's only accessible from the server, so I have to add rules to explicitly allow access to ports in iptables on the host. I have lost a little bit of performance, but not much.

@thaJeztah I want to be able to access it from the local system, and test against it easily. For example, setting up a RESTful HTTP API that has a Health end-point and being able to reliably run curl against it by using localhost (I have to document this for others and having IP addresses that change is not reliable). In most cases for my dev environment I only want the containers to talk to each other, but I also want to be able to access it from the host.

For @SeerUK's case, being able to set an IP block (5.5.0.0/16 - a valid parameter for a source address in iptables rules) would be a very good thing. iptables already has the capability to do the limiting, but docker is not taking advantage of it.

@thaJeztah I set "127.0.0.1:8081:8081" explicitly to keep it off the external network; I had found logs in my docker containers from people trying to crack into the containers via the exposed ports.

My work around right now is to turn off docker containers before I leave for the day because I can't ensure the environment I want to be external actually _is_ external, or that the environment is properly limited for security purposes.

@BenjamenMeyer one way to do this is running those tests in a container, e.g.

docker run -it --rm --net=mynetwork healthchecker

The issue that Ben is bringing to light is real and surprising (a bad combination). Many admins, like myself, are using the tried-and-true ufw firewall. Docker is doing an end-run around ufw and altering the iptables rules in such a way that it 1) causes ufw to misreport the current status of the packet filtering rules, and 2) exposes seemingly private services to the public network. In order for docker to remain in the good graces of the sysadmin community, another approach must be devised. Right now there are many admins out there who, like Ben and myself, inadvertently opened ports to the wider Internet. Unlike Ben and myself though, they have not figured it out yet.

@thaJeztah that assumes that I am doing it via the command-line and not using another tool where I only have to set an IP address.

For example, I'm working on an API. I have a tool that I can use to work with that API in production to support it; for development of the tool and the API I want to just point the tool at the dockerized API. The tool knows nothing about docker, nor should it. And I don't necessarily want to put the tool into docker just to use it - pointing it at a port exposed only to the local host should be sufficient.

@jcheroske I agree, but I don't know that there's a good solution to _that_ aspect. For that, ufw probably needs to be made smarter, to be able to look up and report on rules that it wasn't involved in creating. There's a lot of software out there that can adjust the iptables rules in ways that ufw (or AFAIK firewalld, etc) won't know about. There's not really a simple solution to fixing that either.

That said, it would be nice if Docker could integrate with those tools to dump out the appropriate config files so they can be enabled/disabled, or hook into them and dump out the information appropriately; however, given there are better solutions, I don't think _that_ aspect will really be solved. Here, it's more about just limiting the scope of the iptables rules that are being generated, to at least minimize the potential impact, by allowing the specification of the source (lo, eth0, 127.0.0.0/24, etc).

If you are willing to do so, using iptables does make this totally possible.

This is a trimmed-down example of how you can use it: https://gist.github.com/SeerUK/b583cc6f048270e0ddc0105e4b36e480

You can see that right at the bottom, 1.2.3.4 is explicitly given access to port 8000 (which is exposed by Docker), then anything else to that port is dropped. The PRE_DOCKER chain is inserted to be before the DOCKER chain so that it is hit first, meaning the DROP stops the blocked requests from ever reaching the DOCKER chain.

It's a bit annoying that Docker doesn't have this functionality built-in, but it is possible to work around it right now.
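For reference, a rough sketch of that approach (the chain name, interface, address, and port below are illustrative rather than copied from the gist):

# create a chain that is evaluated before Docker's own chain
iptables -N PRE_DOCKER
iptables -I FORWARD 1 -j PRE_DOCKER

# note: packets in FORWARD have already been DNAT'ed, so --dport matches the
# container-side port (assumed here to equal the published port, i.e. an 8000:8000 mapping)
iptables -A PRE_DOCKER -i eth0 -p tcp -s 1.2.3.4 --dport 8000 -j ACCEPT
iptables -A PRE_DOCKER -i eth0 -p tcp --dport 8000 -j DROP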

Another alternative would be using an external firewall. Some places like AWS and Scaleway offer things like security groups where you can manage access to your boxes from outside, from there every port behaves the same way.

I never actually managed to figure out how to make this work with UFW. Though for now, I'm happy with using iptables as a solution. It seems to be working out very well for me so far.

Obviously, this isn't much of a great solution if you have already built a reasonably complicated set of firewall rules around UFW. Though, it does make using things like iptables-persistent quite easy. You can also use alternative ways of allowing access that seem more "normal" in iptables.

@BenjamenMeyer have you thought of using a user-defined docker network with a subnet & ip-range option and assigning a static ip-address for containers & using them for local development, so that you don't have to depend on a virtual static ip such as 127.0.0.1? That will avoid the need to have port-mapping altogether for those containers that are private to a host.

docker network create --subnet=30.1.0.0/16 --ip-range=30.1.0.0/24 mynetwork
docker run --net=mynetwork --ip=30.1.1.1 --name=myservice1 xxxx
docker run --net=mynetwork --ip=30.1.1.2 --name=myservice2 yyyy

With this setup, myservice2 can reach myservice1 by name myservice1 and there is no need to even depend on the static ip. Also the host can reach the static-ip freely without the need to have port-mapping.

Also, with compose 1.7, you can specify a static ip address for containers and specify network subnets and ranges.
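As a sketch, the same setup in a version 2 compose file might look like this (names and addresses mirror the example above and are purely illustrative):

version: '2'
services:
  myservice1:
    image: xxxx
    networks:
      mynetwork:
        ipv4_address: 30.1.1.1
  myservice2:
    image: yyyy
    networks:
      mynetwork:
        ipv4_address: 30.1.1.2
networks:
  mynetwork:
    driver: bridge
    ipam:
      config:
        - subnet: 30.1.0.0/16
          ip_range: 30.1.0.0/24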

I did figure out a simple workaround.

1) Edit /etc/default/docker: DOCKER_OPTS="--iptables=false"

2) Add ufw rule: ufw allow to <private_ip> port <port>
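For example (the address and port below are placeholders for whatever your container actually uses):

# /etc/default/docker
DOCKER_OPTS="--iptables=false"

# allow traffic only to the container's private address and port
ufw allow to 172.17.0.2 port 6379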

So simple that it really makes me wonder why the --iptables=false option is not the default. Why create such a situation when all docker has to do is say, "Hey, if you're running a firewall you're going to have to punch a hole through it!" What am I missing?

https://fralef.me/docker-and-iptables.html
http://blog.viktorpetersson.com/post/101707677489/the-dangers-of-ufw-docker

I can't get docker to stop modifying iptables to save my life. Tried updating /etc/default/docker to no avail on Ubuntu 16.04

@enzeart Try /lib/systemd/system/docker.service.

@SeerUK Bless your soul

@enzeart to configure a daemon running on a host that uses systemd, it's best to not edit the docker.unit file itself, but to use a "drop in" file. That way, you won't run into issues when upgrading docker (in case there's a newer docker.unit file). See https://docs.docker.com/engine/admin/systemd/#custom-docker-daemon-options for more info.

You can also use a daemon.json configuration file, see https://docs.docker.com/engine/reference/commandline/daemon/#daemon-configuration-file
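A minimal sketch of the drop-in approach (the file name is arbitrary, and the flag is only an example of a daemon option you might set this way):

# /etc/systemd/system/docker.service.d/options.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --iptables=false

# then: systemctl daemon-reload && systemctl restart docker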

@mavenugo There's already a docker network in place.

@jcheroske that works, but as I noted it would mean that the _end-user_ (me) would then have to make sure that all iptables rules were correct, which is not optimal and not nearly as likely to happen as having docker do it automatically, thus this issue.

Hi, bumping this. I think it's an issue too. The containers chain in iptables needs to come after the main rules and should not be exposed to the world by default.

I would really like to see Docker (and docker-compose) having the ability to whitelist or blacklist IPs that can access that port.

Take for example:

nginx:
    ports:
      - "8000:8000"
    whitelist:
      - 10.6.20.2

Would imply that only a source IP of 10.6.20.2 could access port 8000 on this host.

@StefanPanait I really like that idea. It could also work with a similar syntax to volumes and access/deny lists, something like:

nginx:
  access:
  - "10.0.1.6:allow"
  - "deny"

It would of course still have to allow things like inter-container communication.

@SeerUK inter-container communication should be a default, but why should you not be able to prohibit one container from talking to it? That could be extremely useful for debugging...

Though I guess the proper way to do that would be to separate the docker networks...still, I think being able to do something like:

nginx:
  access:
    - "10.0.1.6:allow"
    - "webapi:allow"
    - "database:deny"
    - "deny"
Could be useful...question is, is it useful enough to justify implementation to that degree? I don't know.

At present, I'd like to see the original issue resolved, then features like this can be added if they don't make sense to design into the resolution to start with (they might).

why should you not be able to prohibit one container from talking to it? That could be extremely useful for debugging

That's what docker network disconnect is for? You can disconnect a container from the network for debugging, and reattach it with docker network connect
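For example (network and container names are placeholders):

docker network disconnect mynet containerB   # containerB can no longer reach the others
docker network connect mynet containerB      # reattach it when done debugging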

For those that just discovered that a ton of ports were open on their internet exposed servers, after utilizing UFW, I dug and dug and discovered the following:

Ubuntu 16.04 with UFW and Docker presents new challenges. I did all the steps as shown here: https://svenv.nl/unixandlinux/dockerufw BUT I could NOT get docker plus UFW to work on 16.04. In other words no matter what I did all docker ports became globally exposed to the internet. Until I found this: http://blog.samcater.com/how-to-set-docker-1-12-to-not-interfere-with-iptables-firewalld/
I had to create the file: /etc/docker/daemon.json and put the following in:

{
"iptables": false
}

I then issued sudo service docker stop, then sudo service docker start. FINALLY, docker is simply following the appropriate rules in UFW.

Additional data: https://chjdev.com/2016/06/08/docker-ufw/

@thaJeztah

why should you not be able to prohibit one container from talking to it? That could be extremely useful for debugging
That's what docker network disconnect is for? You can disconnect a container from the network for debugging, and reattach it with docker network connect

And what if it was desired to still have network connectivity? Example: testing Health Check failure from the server in Container B for Container A, while still having services provided by Containers C, D, and E. It's easier to just disallow Container B from talking to Container A than to close out the whole network - Container C might also depend on access to Container B for its Health Check to pass.

Still, that doesn't take away from "let's fix the original issue".

@gts24 interesting find.

IMHO, the whole problem is that Docker, like literally all other programs, shouldn't touch the firewall (iptables or otherwise) at all. When I install (e.g.) apache and tell it to listen on 0.0.0.0:80 it's my decision to open up port 80 on the firewall or not, where I can specify any rules for it that I wish to.

Instead of reinventing firewall rules on docker (and/or compose) configuration files, the whole PUBLISH feature should be deprecated, and a new LISTEN feature created to work like all other programs do. At best, docker could create disabled by default firewalld services for each port/container, on systems who use it.

@gcscaglia set --iptables=false on the daemon and you should get that

@thaJeztah That feature is useless because Docker offers no alternative way to get the necessary firewall rules (or, more specifically, to trigger script hooks with the relevant parameters) from the daemon. It is only barely usable if you have static containers and never change anything about them (e.g. published ports), but for all other cases you can forget about it.

@thaJeztah exactly what taladar said.

--iptables=false should be renamed to --networking=false since not even internal container-to-container networking works with it disabled. An option to have a containers listen on some port/interface combination without punching holes on the inbound firewall rules (i.e LISTEN) would solve all this, be backwards-compatible and allow --iptables=true to be used.

With such a listen mode, those who want it to "just work" can keep using PUBLISH, and those who want control can use LISTEN.

@gcscaglia ok, so if you want docker to set up the basic rules and handle container-container networking, but not "publishing", you can keep --iptables enabled, but _don't_ use -p / --publish. You should be able to manually set the iptables rules to forward ports to containers. The container already listens on its own private IP address, and on the ports that the service in your container listens on.

@thaJeztah No, you can't. Because you have no idea that there even is a container that just started and needs firewall rules. You have no way to tell whoever used the API to launch it on which host port it is listening either.

Simple example. We use Docker container to run Jenkins jobs. They expose their SSH Port so Docker can contact them. They are destroyed as soon as the job is done. Docker does not offer any way to get this to work with --iptables=false because you have no way to tell Jenkins (using the Docker API to launch the container) the host port and you have no way to even trigger a script to setup the necessary firewall rules.

Your idea only works for the ridiculously simple use case of having permanent, never changing containers launched manually. Even a simple --restart=always in the container will break in this setup unless the container has a static IP.

@taladar I'm replying to @gcscaglia's use case, who requested a feature where docker does _not_ manage IPTables to open up ports, and where they are in control over IPTables; i.e., no container is exposed, unless done so manually. Also, in my reply I explained to _not_ use --iptables=false, but to keep it enabled, but just don't use the -p / --publish feature (which tells docker to make those ports accessible through IPTables).

@thaJeztah While you've got my use case right, AFAIK the proposed solution won't work for me for the same reasons it won't work for taladar's case: a single restart of anything and suddenly my manual rules need to be updated. There are no hooks or triggers I can use to be notified of a restart so I can automate such an update (not to mention I'd be reinventing the wheel).

Currently the only solution for my case is to have another firewall (which docker can't touch) sitting between the docker host and all external networks. But if Docker would simply do everything it already does except opening ports to the world, I could do away with the second firewall entirely.

I guess you can't have everything, but it surely is frustrating. Do you think it's worth opening a feature request about my LISTEN idea, or would the Docker team have no interest in such a feature?

What would the LISTEN do? There's EXPOSE, which allows you to annotate what ports the container listens on. For triggers, you could listen to the docker events.

Not saying there's no room for improvement here (I know this is being looked into), just wondering what you'd expect.

Currently, as you said, all services running in a container bind to the container's private IP (if you don't use --net=host or similar). This is good and desirable, since such isolation between the host and the containers is exactly Docker's selling point.

But, currently, if I want an application running outside any container (be it on the host or elsewhere on the network) to have access to a service running inside a container, I need a means to make that service listen on one of the host's network interfaces. To solve this problem without exposing any of the host's interfaces to the container, Docker created the -p / --publish feature, which:

  1. Creates iptables rules to forward a chosen port on a chosen host's interface to a chosen port on the container private IP (Which we all expect since that's what we asked for)
  2. Tells iptables to allow anyone from anywhere in the world access to that port on the chosen host's interface (Which is unnecessary for forwarding to work and hence catches many of us by surprise)

What I'm proposing is a feature (named LISTEN or otherwise) that does only "1" and leaves "2" to the user's discretion, like all other services / programs usually do.
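To make the distinction concrete, here is a rough approximation (not Docker's exact rules) of the two pieces for a hypothetical -p 8080:80 mapping to a container at 172.17.0.2:

# 1. the forwarding piece: DNAT the host port to the container (nat table)
iptables -t nat -A PREROUTING -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:80

# 2. the "allow the world in" piece that -p also adds (filter table), and that a
#    LISTEN-style feature would leave to the admin
iptables -A DOCKER -d 172.17.0.2 -p tcp --dport 80 -j ACCEPT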

As for EXPOSE, AFAIK that's only metadata in Docker images so the daemon knows what to do when the user specifies -P (publish everything). Perhaps I'm wrong about it, and "exposed" ports can be forwarded to in a restart-resistant fashion (w/o world-wide access)?

--off-topic--

IMHO the OP of this issue is asking exactly why -p does "2" and how to stop Docker from doing it. While a daemon-level setting (other than disabling networking) could solve it, in order to keep things backward-compatible, the best would be a new feature (i.e. LISTEN or some other name).

While many users like this "works out of the box" effect, no sysadmin expects programs other than their iptables / firewalld to go around opening ports on the firewall, even less so in a way their firewall management software doesn't report on.

I see there being three major problems:

  1. Docker's way of publishing/exposing ports bypasses common firewall rules like UFW.
  2. The above fact doesn't appear to be well-documented (it is completely unexpected since no other Linux services I know of bypass firewall rules themselves).
  3. There is no simple way to adapt iptables to Docker's way of publishing/exposing ports or the ways people know of are not easy to automate and none are documented.

Perhaps it is technically not feasible to implement port exposing at the FILTER table and so fixing 1 is not possible. At the very least this should get a big warning somewhere (fix 2) but ideally 3 could be solved with new options like people here have suggested such as allow or deny which would add additional firewall rules automatically to allow or deny specific IPs for the exposed/published ports.

E.g. allow: "tcp:{your_trusted_ip}" for a container named "elasticsearch" publishing port 9200 might do something like:

iptables -t mangle -N DOCKER-elasticsearch
iptables -t mangle -A DOCKER-elasticsearch -s {your_trusted_ip} -j RETURN
iptables -t mangle -A DOCKER-elasticsearch -j DROP
iptables -t mangle -I PREROUTING -p tcp --dport 9200 -j DOCKER-elasticsearch

I found this pretty useful from the Docker docs: https://docs.docker.com/engine/userguide/networking/default_network/container-communication/#communicating-to-the-outside-world

Docker’s forward rules permit all external source IPs by default. To allow only a specific IP or network to access the containers, insert a negated rule at the top of the DOCKER filter chain. For example, to restrict external access such that only source IP 8.8.8.8 can access the containers, the following rule could be added:

$ iptables -I DOCKER -i ext_if ! -s 8.8.8.8 -j DROP

@jmimico yes I have come across that before. How would you restrict access from 2 or more IPs?

What is so difficult about Docker just adding an option to run a shell script anywhere it now creates iptables rules, with all the Docker-internal information passed to the script as parameters? That would allow everyone to create exactly the rules they need. Add some way to trigger a re-execution of the scripts for active containers, which people can call after they have done an iptables-restore or flushed the chains for some other reason, and you are done.

Docker does not need to recreate all kinds of pre-built firewall scenarios as some people here are suggesting; those could be built on top of a system like that by distributing sets of hook scripts. At most, something like exposing only on localhost and exposing globally (as Docker does now) might make sense. iptables has too much flexibility for Docker to ever hope to model all scenarios directly in settings.

This ticket has been around seemingly forever, the current behaviour makes Docker borderline unusable (this seems to be the standard way to implement features in this project, e.g. the lack of proper built-in GC, or barely one usable, kernel-bug-free, performant storage backend, ...), and there is an easy solution that would allow people to implement their own solutions to fit their environments.

@StefanPanait Good question. My bet is that you would need to leverage the use of object-groups. Populate the object-groups with white-listed IP's and then use that object-group in the first line of the DOCKER chain.

Example:
iptables -N docker-allow
iptables -A docker-allow -s 1.1.1.1 -j ACCEPT
iptables -A docker-allow -s 2.2.2.2 -j ACCEPT
iptables -A docker-allow -s 3.3.3.3 -j ACCEPT
iptables -A docker-allow -j DROP

iptables -I DOCKER -i ext_if -j docker-allow

Aren't rules added to the DOCKER chain clobbered by things like a daemon restart? I think the problem with manual solutions is that they are hard to do right, so there is a strong case for supporting the common cases (block/allow individual IPs) more directly.

Perhaps a plugin would be the appropriate place to support this? E.g. maybe a "ufw" plugin could add rules in a ufw-compatible way so that the user could manage the firewall effectively with their regular firewall toolchain and docker services would behave more like normal host services.

Aren't rules added to the DOCKER chain clobbered by things like a daemon restart? I think the problem with manual solutions is that they are hard to do right, so there is a strong case for supporting the common cases (block/allow individual IPs) more directly.

Adding the dropping rule to the DOCKER-USER chain seems to work better for making it persistent across docker restarts.

With Docker v.17.06 there is a new iptables chain called DOCKER-USER. This one is for your custom rules, see my answer on serverfault: https://serverfault.com/questions/704643/steps-for-limiting-outside-connections-to-docker-container-with-iptables/886257#886257
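For the simplest case, a sketch of the idea (the interface name and trusted range are assumptions; -I is used so the rules land before the default RETURN rule in that chain):

# drop everything arriving on the external interface...
iptables -I DOCKER-USER -i eth0 -j DROP
# ...except the trusted range, inserted above the DROP
iptables -I DOCKER-USER -i eth0 -s 192.168.0.0/16 -j RETURN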

As I commented on SF, I don't see why this DOCKER-USER chain is any different from any other user-added chain. It has no filter pre-applied to it, and it filters all traffic, not just docker-container-destined traffic, so you still have to specify the interface names yourself, and it is still prone to a non-iptables expert making grave mistakes.

On the other hand it still goes with the "Docker is the only iptables user" mentality that sucks for people who do want to use iptables for more than just Docker. So it is bad for the whole range of potential users besides maybe the people dedicating entire hosts to nothing but Docker.

OK, so using DOCKER-USER solves the issue of order of inserts by ensuring that it always comes in the chain before the other Docker-related rules. However, it doesn't make whitelisting traffic to a container by port number much easier since at this point the --dport is the port of the service inside the docker container, not the exposed port. Example:

Publish port 9900 to expose a docker service listening internally on 9000.

$ sudo iptables -A DOCKER-USER -m limit --limit 20/min -j LOG --log-prefix "IPTables: "
$ docker run --rm -it -p '192.168.56.101:9900:9000' alpine nc -l 9000

From another machine on the network:

$ telnet 192.168.56.101 9900

What gets logged:

IPTables: IN=enp0s8 OUT=docker0 MAC=08:00:27:b6:8d:d6:0a:00:27:00:00:04:08:00 SRC=192.168.56.1 DST=172.17.0.2 LEN=52 TOS=0x00 PREC=0x00 TTL=127 ID=14127 DF PROTO=TCP SPT=51208 DPT=9000 WINDOW=64240 RES=0x00 SYN URGP=0
IPTables: IN=docker0 OUT=enp0s8 PHYSIN=veth05ba007 MAC=02:42:0f:f9:76:4c:02:42:ac:11:00:02:08:00 SRC=172.17.0.2 DST=192.168.56.1 LEN=40 TOS=0x00 PREC=0x00 TTL=63 ID=23041 DF PROTO=TCP SPT=9000 DPT=51208 WINDOW=0 RES=0x00 ACK RST URGP=0

So as you see there is no opportunity at this point to filter traffic to port 9900. Sure, I could filter traffic to 9000 but this is a problem because the internal port could unintentionally overlap between multiple containers or even services running on the host. This is one of the great selling-points of Docker that you can run multiple services on one host and not worry about port conflicts. So, many containers are designed to just listen on a port and the user can use the --publish option to change which port is exposed on which interface:

$ docker run -d -p 7777:6379 --name data1 redis
$ docker run -d -p 8888:6379 --name data2 redis

However, I cannot use DOCKER-USER (please correct me if I'm wrong) to affect traffic to data1 without also affecting traffic to data2, unless I use some sort of introspection to discover the destination container IP, which is transient, which brings you back to square one on finding a simple, reliable way to firewall published services without scripting and introspection.

To be clear, this DOES NOT work:

$ sudo iptables -A DOCKER-USER -p tcp -m tcp -s 192.168.56.0/24 --dport 7777 -j RETURN
$ sudo iptables -A DOCKER-USER -p tcp -m tcp -s 10.0.24.0/24 --dport 8888 -j RETURN
$ sudo iptables -A DOCKER-USER -p tcp -m tcp --dport 7777 -j DROP
$ sudo iptables -A DOCKER-USER -p tcp -m tcp --dport 8888 -j DROP

This does work, but the result is that both services are exposed to both whitelisted CIDRs:

$ sudo iptables -A DOCKER-USER -p tcp -m tcp -s 192.168.56.0/24 --dport 6379 -j RETURN
$ sudo iptables -A DOCKER-USER -p tcp -m tcp -s 10.0.24.0/24 --dport 6379 -j RETURN
$ sudo iptables -A DOCKER-USER -p tcp -m tcp --dport 6379 -j DROP

So it looks like DOCKER-USER is only useful for exposing all ports to specific IPs but not exposing specific ports to specific IPs unless you don't mind writing your iptables rules to reference internal port numbers and don't have multiple containers using the same internal port numbers. Everyone seems to be missing these points and running with DOCKER-USER as a solution and I think this deserves a new, better solution.

@SeerUK
The solution posted in https://gist.github.com/SeerUK/b583cc6f048270e0ddc0105e4b36e480 did not work for me. Could you please help?

Docker's error is to use iptables (the system firewall) to do application routing. It will create problems and chaos forever. Changing iptables is very dangerous. Embedding this function in docker itself would not be so expensive. In addition, it could also end up in a concurrent, inconsistent state if you run docker concurrently.

It seems iptables behaves strangely if you change iptables while docker is running. If you stop docker, set iptables, and then restart docker, everything works as expected. My worry is... what if I have to change iptables in production?

It seems iptables behaves strangely if you change iptables while docker is running.

iptables is iptables, it doesn't care if docker is running or not..

It is better if you run a lot of tests first... saying "iptables is iptables" is just a tautology. That is my suggestion :)

I think his point was that iptables does not behave differently when Docker is running like your comment suggested. This is clearly a Docker issue with the way they use iptables as if nobody else needed it on the same system.

I do not see why DOCKER-USER does not resolve this.
You use iptables to filter however you like: source port, target port, source addr, non-local traffic, etc.

The whole point of DOCKER-USER is that it runs whatever rules the user wants to run before any docker rules get run, this allows you to do whatever you want to with the traffic before it hits docker.

@cpuguy83 the core issue is that docker is opening ports to the public network - outside the system - without notice, because it doesn't tie the exposed ports to a network interface (such as eth0 or lo) or a specific IP (e.g. 127.0.0.1, 172.16.1.1); nor do the rules introduced by Docker show up in iptables management tools like UFW - so users may be unaware that the various docker containers are accessible from any system on the network, not just their local host.

DOCKER-USER cannot resolve that because it also doesn't tie the docker networks to a specific network interface or a specific IP (e.g. 127.0.0.1).

Per my original request:

  1. Docker ought to tie itself to a specific IP - based on the Docker Network configuration - and then allow the user to add rules to expose the container off-system (using the tooling of their choice); by default the container should not be exposed off-system.
  2. That cannot be done without a transition period due to at least some folks relying on the historic behavior.

IMHO, the whole problem is that Docker, like literally all other programs, shouldn't touch the firewall (iptables or otherwise) at all. When I install (e.g.) apache and tell it to listen on 0.0.0.0:80 it's my decision to open up port 80 on the firewall or not, where I can specify any rules for it that I wish to.
Instead of reinventing firewall rules on docker (and/or compose) configuration files, the whole PUBLISH feature should be deprecated, and a new LISTEN feature created to work like all other programs do. At best, docker could create disabled by default firewalld services for each port/container, on systems who use it.

Well said @gcscaglia ! Even more baffling, there's little or no mention of all this in https://docs.docker.com/engine/reference/run/#expose-incoming-ports , which I think is where most of those who do care to look for a little bit of documentation to supplement learning-by-example end up at. There should be a bright red box somewhere explaining the risks of Docker overriding pre-existing iptables rules.

If you don't want docker to manage iptables, then set --iptables=false.
If you only want docker to open ports on a specific interface, then you can also set that in the daemon config.

@BenjamenMeyer
You could block all traffic in DOCKER-USER and only let through what you want.

You could block all traffic in DOCKER-USER and only let through what you want.

@cpuguy83 Please tell me if anything I say here is incorrect:

$ docker run -d -p 7777:6379 --name data1 redis
$ docker run -d -p 8888:6379 --name data2 redis

However, I cannot use DOCKER-USER (please correct me if I'm wrong) to affect traffic to data1 without also affecting traffic to data2, unless I use some sort of introspection to discover the destination container IP, which is transient, which brings you back to square one on finding a simple, reliable way to firewall published services without scripting and introspection.

To be clear, this DOES NOT work:

$ sudo iptables -A DOCKER-USER -p tcp -m tcp -s 192.168.56.0/24 --dport 7777 -j RETURN
$ sudo iptables -A DOCKER-USER -p tcp -m tcp -s 10.0.24.0/24 --dport 8888 -j RETURN
$ sudo iptables -A DOCKER-USER -p tcp -m tcp --dport 7777 -j DROP
$ sudo iptables -A DOCKER-USER -p tcp -m tcp --dport 8888 -j DROP

This does work, but the result is that both services are exposed to both whitelisted CIDRs:

$ sudo iptables -A DOCKER-USER -p tcp -m tcp -s 192.168.56.0/24 --dport 6379 -j RETURN
$ sudo iptables -A DOCKER-USER -p tcp -m tcp -s 10.0.24.0/24 --dport 6379 -j RETURN
$ sudo iptables -A DOCKER-USER -p tcp -m tcp --dport 6379 -j DROP

So there is no way to control traffic to/from data1 container independently of data2 container using DOCKER-USER. To me this makes DOCKER-USER not much of a solution.

@colinmollenhour

If you have a generic drop rule in DOCKER-USER, how is this different than if docker appended vs prepended its main jump rule?

@cpuguy83 my point is not against the override per-se. What I am saying is that having Docker override pre-existing iptables rules should most definitely be an opt-in feature, not an opt-out feature.

And even as an opt-out - which it should not be - it should be extremely well documented as it is completely unexpected and fairly counter-intuitive to debug. Especially considering how many use ufw. I know of no other software doing something like this without very explicit warnings and I would personally stay away from such software.

Furthermore, this is exacerbated by the fact that most Docker users - at least in my experience - start using it in environments where additional network infrastructure masks the issue, reinforcing the assumption that the machine is configured as they believe it is.

I hope Docker will move to a LISTEN-only approach, as exemplified by @gcscaglia .

If you have a generic drop rule in DOCKER-USER, how is this different than if docker appended vs prepended its main jump rule?

I don't think I understand your question... Please ignore my Apr 22, 2017 comment as that is just in regards to the persistence which DOCKER-USER does indeed solve. The problem I'm pointing out is not about the persistence.

@taladar What are you thumbs downing, exactly?

@jacoscaz

First, I think all of the maintainers agree that the existing behavior is not ideal. Unfortunately the behavior has existed since forever. Changing a default for something that is used by millions of people is not really something we can do here. This is one of the reasons why we added DOCKER-USER so at least people can inject whatever rules they need to.

However, we cannot get around needing to use iptables (or ebpf, or some other NATing solution) AND provide the functionality that -p provides... flip side, if you want to prevent people from poking holes in the firewall, disallow them from using -p.

To recap, you can:

  1. Tell docker to (by default) bind to a specific address (the default address is of course 0.0.0.0, or all interfaces)... however this won't prevent someone from manually specifying an address in the -p spec (e.g. -p 1.2.3.4:80:80)
  2. Inject custom rules into DOCKER-USER, including a deny all
  3. Disable iptables management with --iptables=false
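For item 1 above, a minimal sketch of the daemon.json setting that changes the default bind address for published ports (the address shown is just an example):

{
  "ip": "127.0.0.1"
}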

Is there something else you can suggest that doesn't include breaking existing users?

@cpuguy83 I understand that. I am not advocating for such a change to happen drastically from one version to another. I do think it would be a good change to plan for over the course of a number of versions. I also do not think Docker should stop using iptables entirely. Forwarding is different from allowing, and as far as I can understand, Docker should be able to forward by default without allowing access to anyone anywhere by default.

As far as things that can be done now, additional documentation about this should be the first and foremost. Which would be the most appropriate channel to raise this with the maintainers of docker.com?

In the absence of interest on their part, keeping this conversation alive is probably the best way to maximize the chances of this behavior becoming more widely known.

I agree with the call to add additional documentation for this functionality. As far as I can tell it is only mentioned on https://docs.docker.com/network/iptables/. I only found out about this issue after trying to figure out why a service that I had restricted to be available to a specific IP through the firewall was available to the public. This behaviour was completely unexpected to me as it runs contrary to every other networked service that I have ever dealt with.

I understand that there are workarounds, but I believe this should be looked at for a long-term change to the functioning of Docker. I like the idea of setting it to LISTEN on a port, and allowing user-defined rules to build on top of that.

As suggested I added a DROP rule to DOCKER-USER to prevent containers being exposed to the outside accidentally (which alarmingly, did happen).

However, I now have one service I wish to expose. But as @colinmollenhour has explained, because NAT happens before the filtering, I can only filter on the docker ip (which isn't fixed) and internal port number (which could be the same for several containers).

So how can I expose this one service?

@SystemParadox that's one of the many reasons that DOCKER-USER isn't a real fix to the problem.

@cpuguy83
I do like the proposed LISTEN solution, and have never advocated a breaking change from one release to another; rather, I advocated making the change over a series of versions with appropriate notices going out, because I do realize that lots of people use Docker, and a breaking change from one version to another would be detrimental to all.

I also agree that updating the Docker documentation regarding -p and EXPOSE, etc should be a priority that is done immediately to at least draw awareness to the issue. In my experience most folks using Docker are not firewall experts so they're trusting Docker to do what they expect, which it isn't in the current design.

Further, the recap's solutions in https://github.com/moby/moby/issues/22054#issuecomment-425580301 don't really work either. Why? I don't run Docker directly, I run through Docker Compose - based on the YAML; IP addresses are dynamic (controlled by Docker), and often I deploy multiple services within the same Docker Network that need to interact with each other. Thus both the -p usage and address binding usage (option 1 in the recap) are non-solutions. DOCKER-USER doesn't really solve anything, as others have pointed out (option 2 in the recap), and disabling iptables entirely (option 3 in the recap) doesn't help either, because now everything is broken (IPs are dynamic, so it's hard to script out a solution; inter-container networking is broken because Docker relies on iptables to move traffic between the containers; etc).

Again, there is no call in this thread for a breaking change between two versions; but a call for a planned phased approached that enables folks to migrate appropriately.

As a workaround, you can access the original destination port using -m conntrack --ctorigdstport.

So in my case I have the following:

-A DOCKER-USER -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Allow docker out
-A DOCKER-USER -s 172.17.0.0/16 -j ACCEPT
# Allow access to docker service mapped to host 8702 (the service is actually listening on port 8088 in the container)
-A DOCKER-USER -p tcp -m conntrack --ctorigdstport 8702 -j ACCEPT
# Prevent access to docker from outside
-A DOCKER-USER -j DROP

@SystemParadox

Hooray for the correct iptables solution! 🍻

I had never heard of --ctorigdstport but I guess that is because I've not read about and tried every possible iptables extension and you're the first to mention it in this context to my knowledge.

I tested and this does indeed work:

$ docker run -d -p 7777:6379 --name data1 redis
$ docker run -d -p 8888:6379 --name data2 redis
$ sudo iptables -N DOCKER-USER-redis1
$ sudo iptables -A DOCKER-USER-redis1 -s 192.168.56.0/24 -p tcp -m tcp -j RETURN
$ sudo iptables -A DOCKER-USER-redis1 -j REJECT --reject-with icmp-port-unreachable
$ sudo iptables -N DOCKER-USER-redis2
$ sudo iptables -A DOCKER-USER-redis2 -s 10.0.24.0/24 -p tcp -m tcp -j RETURN
$ sudo iptables -A DOCKER-USER-redis2 -j REJECT --reject-with icmp-port-unreachable
$ sudo iptables -A DOCKER-USER -i eth0 -p tcp -m conntrack --ctorigdstport 7777 -j DOCKER-USER-redis1
$ sudo iptables -A DOCKER-USER -i eth0 -p tcp -m conntrack --ctorigdstport 8888 -j DOCKER-USER-redis2

I think an example like this belongs in the docs as it probably covers what 99% of users are looking for: the ability to expose ports using -p but still be able to control traffic to them using common filters like -s.

I created a request to update the Docker documentation regarding iptables.

https://github.com/docker/docker.github.io/issues/8087

The solution listed at https://unrouted.io/2017/08/15/docker-firewall/ seems to be something similar, creating an additional iptables chain called FILTERS to which the INPUT and DOCKER-USER chains jump.

@SystemParadox @colinmollenhour After testing --ctorigdstport I can confirm it works, but with a small caveat.

In my case, I have a dockerized PHP application on Apache listening on port 80. My rules allowing only 1.2.3.4 are as follows:

-A DOCKER-USER -s 1.2.3.4/32 -i eth0 -p tcp -m conntrack --ctorigdstport 80 -j ACCEPT
-A DOCKER-USER -i eth0 -p tcp -m conntrack --ctorigdstport 80 -j DROP

So my dropping rule is a bit more specific than yours, dropping only packets hitting my webserver — or so I thought. In fact, it was dropping packets directed to my webserver as well as packets returning with responses to requests made by the PHP application to third-party servers.

This is due to the fact that --ctorigdstport matches not the destination port of the packet being filtered, but the destination port of the packet that initiated the connection. So, responses to requests going out from Docker to other servers will have SPT=80 and will also match --ctorigdstport 80.

If anyone wants to have tighter control in DROP rules, --ctdir also has to be added:

-A DOCKER-USER -i eth0 -p tcp -m conntrack --ctorigdstport 80 --ctdir ORIGINAL -j DROP

In fact, all rules allowing the connection should also have --ctdir added to express their meaning exactly:

-A DOCKER-USER -s 1.2.3.4/32 -i eth0 -p tcp -m conntrack --ctorigdstport 80 --ctdir ORIGINAL -j ACCEPT

@jest wow that's really important to know! I hadn't realised it could match packets in the other direction. It makes sense when you think about it, as it's matching against the state of the whole connection, but it's easy to miss when reading the docs.

@SystemParadox yeah, I hadn't had a chance to inform myself through the docs and was caught by surprise by requests from Docker that were hanging, waiting for responses. :)

I keep going round in circles with the reasons for needing --ctdir ORIGINAL. On one hand, the explanation by @jest makes perfect sense, and on the other hand I never normally have to deal with reply packets so why should it be any different here?

I think the difference is that I have -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT as the first rule, so the rest of my rules never see any reply packets. In this case, I think --ctdir ORIGINAL isn't strictly required, although it would probably be safer to include it anyway.

@jest, would you agree with this? Presumably you don't have an early ESTABLISHED,RELATED -j ACCEPT rule, which is why this makes a difference for you?

@jest Your post was a great help, thanks.

Is your approach required only for stuff managed by docker, or for everything? For example, my ssh port (22) has nothing to do with docker. Do I need to use -m tcp --dport 22 as usual, or must I use -m conntrack --ctorigdstport 22 --ctdir ORIGINAL?

I assume your approach is only needed for the docker-managed traffic, as those packets undergo mangling/natting before they get to me in the filters table. BUT, I'm new to iptables so I want to be sure from someone who knows more than me!

@lonix1 The rules are added by docker to the DOCKER chain only if you expose a port, and probably only when the container is running. If none of your containers expose port 22, your firewall rules should keep working unmodified.

@SystemParadox It's been some time and I don't have access to that server to check, but if I remember correctly, there were ESTABLISHED,RELATED rules (managed by UFW, in the ufw-before-input chain). However, in my case, they would not match on the first packet (SYN) of connections made from docker to Internet hosts on port 80, and those would be dropped by the appropriate rule in DOCKER-USER when there was no --ctdir.

@lonix1, packets for services on the host only go through INPUT, whereas packets for docker services only go through FORWARD and DOCKER-USER.

For example, given an external IP of 10.0.0.1 and two containers with -p 4000:80 and -p 4001:80, you would see packets with the following properties:

INPUT:
dst 10.0.0.1 dport 80 ctorigdst 10.0.0.1 ctorigdstport 80
FORWARD/DOCKER-USER:
dst 172.17.0.5 dport 80 ctorigdst 10.0.0.1 ctorigdstport 4000
dst 172.17.0.6 dport 80 ctorigdst 10.0.0.1 ctorigdstport 4001

So you could safely use --dport 80 for the INPUT rules, as they're in totally separate chains. As you can see, --ctorigdstport 80 would still match, but unless you're also mangling input for some reason I probably wouldn't do that.

You might also notice that you could actually use --dport 80 with --dst 172.17.0.5 to filter packets for a specific docker container, but that IP isn't predictable, which is why we're using --ctorigdstport.

Ultimately you need to be aware of what packets any given rule might match, depending on what chain you are in, what the destination is, and whether there is any mangling going on.
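
To make the distinction concrete, a minimal sketch (eth0 as the external interface; the addresses and ports above are illustrative):

# host service listening on port 80: match it in INPUT with --dport
-A INPUT -i eth0 -p tcp --dport 80 -j ACCEPT
# container published with -p 4000:80: match it in DOCKER-USER with --ctorigdstport
-A DOCKER-USER -i eth0 -p tcp -m conntrack --ctorigdstport 4000 --ctdir ORIGINAL -j ACCEPT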

@jest thanks, that would seem to confirm my thinking.

So, I'm needing a little direction... to switch from UFW to iptables...
- do I just turn off UFW via "ufw disable"?
- do I create my own .sh file, or is there an existing one (I'm on Ubuntu on DigitalOcean)?
- do I just need to tell Docker "--iptables=false"? (is that it for Docker?)
- is DOCKER-USER already created and chained by Docker?

@fredjohnston you can continue using UFW if you want. Profiles are stored in /etc/ufw. The issue here is that Docker won't show up because there's no app profile listed in /etc/ufw/applications.d (the same goes for any other firewall tool and its configuration).

Disabling iptables in Docker means you won't have much of any networking in Docker: containers will just have IP addresses and won't be able to talk to each other. DOCKER-USER is a hack to give you some control, but it really doesn't solve the issue - which is really about not making Docker containers public on the network by default, but keeping them locked to the IP address of the container.

For the moment, I do recommend you continue using whatever Firewall tool you're most comfortable with (ufw, etc) but be aware that Docker Containers will be public on your network.

Since I'm in here anyway - I'm actually running into a related issue now where this is a problem on platforms other than Linux too. Consider the following:

  • Project A has two containers, one for a database and one for an application.
  • Project B has two containers, one for a database and one for an application.
  • Both projects are isolated from each other (separate source repositories, configurations, etc)
  • Both projects are managed under Docker Compose
  • Both projects expose their Database Ports for local development purposes
  • Both projects use the same database server (postgres, mysql, etc)

Now suppose you want to run both projects locally - for instance to work on a shared library that both projects use so that you can easily pull it into their code for testing.

Under the current firewall interaction design - which is part of what leads to the issues above about exposing containers to the public network without the user's knowledge - the database containers of both projects cannot run at the same time, since they contend for the same exposed database port. The same applies if you want to expose both application ports and both projects use the same port for the application server - a common situation now that HTTP-based APIs and applications are extremely common, especially in cloud-oriented applications.

Sure, you can hack your way into setting both up under one DB container; but then you're not isolating them per your project design, and you have to be even more careful about configuration, etc.

A proper solution here is three-fold:

  1. Containers would be bound only to their own IPs, and their exposed ports would be bound to those IPs alone within their respective Docker networks, not to the system's match-all addresses (0.0.0.0, ::).
  2. Docker also wouldn't publicly expose the route to the Docker networks off the system by default. Docker networking could still be used to establish inter-network (docker network to docker network) connections as currently designed, and would also allow localhost connections by default.
  3. Users would then be on the hook for adding the appropriate firewall rules to expose a container to the outside world when and if desired - for example, by forwarding port 443 to port 443 of their container of choice.

Again, this could be done gradually:

  • Release 1: Implement Steps 1 and 2; but add non-localhost routing too (temporarily) with a warning. Current behavior of first-come-first-serve for getting a port is maintained; and a warning about this behavior going away is issued.
  • Release 1+N: Drop the warning, and drop the non-localhost routing. Require Step 3 for users wanting Docker to expose the ports off system and make sure this is well documented.

DOCKER-USER is a hack to give you some control, but it really doesn't solve the issue - which is really about not making Docker containers public on the network by default, but keeping them locked to the IP address of the container.

For the moment, I do recommend you continue using whatever Firewall tool you're most comfortable with (ufw, etc) but be aware that Docker Containers will be public on your network.

There's a document update request here https://github.com/docker/docker.github.io/pull/8357 to create a bolt-on iptables static configuration but I'm not sure of the status of it.

Edit: I noticed that you have misunderstood the meaning of DOCKER-USER. It's not used to "make containers locked to the IP address of the container" but to filter access to the container with iptables rules.

@aki-k per DOCKER-USER: I'm aware DOCKER-USER doesn't lock the container to its IP address and never claimed that it did. DOCKER-USER simply turns the security issue over to the user to manage - meaning the user has to know how to manipulate their firewall to ensure they actually have a secure environment. Making it a user problem is equally unacceptable, because most users do not know how to manage their firewall - firewall rules are hard, and even those of us who know a good bit about writing firewall rules can still often get things wrong.

My request - and whole point - in the issue is that Docker needs to be secure by default and not expose containers to the outside world without the user's knowledge or explicit intervention to do so. And per my last comment (https://github.com/moby/moby/issues/22054#issuecomment-552951146) doing so will also have some other technical benefits too.

My request - and whole point - in the issue is that Docker needs to be secure by default and not expose containers to the outside world without the user's knowledge or explicit intervention to do so.

The earliest issue report I've found about this is from Mar 18, 2014:

https://github.com/moby/moby/issues/4737

It's probably a design decision they don't want to change and instead try to fix with the DOCKER-USER iptables chain. You can use the -p option of docker run to publish the port only to the docker host (-p 127.0.0.1:port:port), but that doesn't solve the problem either.
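
For example, a localhost-only publish looks like this (image and ports are illustrative):

docker run -d --name web -p 127.0.0.1:8080:80 nginx
# reachable from the host at 127.0.0.1:8080, but not from other machines;
# Docker still inserts its own NAT rules for this mapping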

Let's be clear, Docker Is secure by default. You have to tell Docker to do port forwarding.

As for services that need to communicate with each other, you should be using networks (docker network) to limit access, not port forwarding.

Let's be clear, Docker _Is_ secure by default. You have to tell Docker to do port forwarding.

Please don't dismiss a factual problem with docker running over your established iptables protection. I'm sure you've seen the countless issue reports discussing this.

Let's be clear, Docker _Is_ secure by default. You have to tell Docker to do port forwarding.

As for services that need to communicate with each other, you should be using networks (docker network) to limit access, not port forwarding.

Let's be clear - Docker is secure until you want to access a container from outside the docker network. Once you want to access it even from localhost (127.0.0.1), Docker is _not_ secure by default: it binds against 0.0.0.0 and exposes the containers off the system, bypassing anything the user does to secure their system, without showing up in any tooling other than direct use of iptables. DOCKER-USER does not and will not ever be a proper solution because it requires the user to know too much about the underlying firewall system (iptables on Linux, whatever macOS uses, and the Windows Firewall on Windows). Docker still needs to be secure by default. There's a very easy way to do it, as I outlined earlier in https://github.com/moby/moby/issues/22054#issuecomment-552951146 and have called out multiple times throughout this issue (though perhaps not as clearly as in that comment).

Do note too that the documentation for exposing a port does not mention this security issue either; it makes no reference to how Docker interacts with the system firewall when exposing a port, thus reducing security on systems that had been designed to be secure until Docker was used.

Let's be clear, Docker _Is_ secure by default. You have to tell Docker to do port forwarding.
As for services that need to communicate with each other, you should be using networks (docker network) to limit access, not port forwarding.

Let's be clear - Docker is secure until you want to access it outside of the docker network. Once you want to access it even from local host (127.0.0.1) then Docker is _not_ secure by default as it binds against 0.0.0.0 [...]

To be precise, Docker doesn't bind to 0.0.0.0. That is the (reasonable) expectation of the user: that specifying --publish on the CLI, hence operating in user space, starts some kind of proxy daemon that listens on the specified port, accepts incoming connections and shuttles all packets back and forth between Docker and the outside world.

But instead, Docker injects magic DNAT/masquerading rules into the firewall to rewrite the addresses on the packets, thereby randomly breaking any pre-existing ruleset.

In my opinion, mixing levels of abstraction here is the biggest problem and is what confuses the user. I don't know what scenarios the Docker team considered when designing the --publish machinery we see here; I see none that justifies the decision (maybe besides performance).

Let's be clear, Docker Is secure by default. You have to tell Docker to do port forwarding.

... which happens to be one of Docker's most used features. Effectively, you have just stated that an undocumented override of pre-existing firewall rules within one of Docker's most used features makes Docker _secure by default_.

Well, whatever works for you I guess. As it doesn't work for me, I'll start looking at alternative container platforms. Security issues can be fixed but blatant disregard of reasonable security expectations is a different beast.

Let's assess the situation as it is today:

By default, no ports are exposed. You have to tell Docker to expose a port.
In the past Docker would set up iptables such that anything that knew how to route to the bridge network could access the container IPs (by setting the forward policy to "accept"), but this is no longer true.

I have seen some people saying they are using -p to expose services to each other, which should not be required.
On the default network you can use --link to wire up services together; the container is then available over DNS.
On non-default (user-defined) networks, containers can reach each other by DNS as well, including via aliases set up with --link or when connecting to a network with a specified alias.
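
As a rough sketch of that pattern (my-api-image is a placeholder for your application image):

docker network create app-net
docker run -d --name cache --network app-net redis:6
docker run -d --name api --network app-net my-api-image
# inside "api", the cache is reachable by DNS as cache:6379;
# nothing is published on any of the host's interfaces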

It seems like in many cases you really just want to connect to the service from a client, in which case it is recommended to use another container with access to the service you want to connect to rather than exposing a port.

-p is specifically designed for ingress, as in let external things access this service.
The default for -p is indeed to allow traffic from anywhere. You can change this by manually specifying the address to allow from either per -p or as a daemon wide setting.
Since -p does use iptables, the DOCKER-USER chain was created so users can add their own filter rules before it hits the container.

Since -p is designed for ingress, I think it is reasonable for this to expose traffic as it does. I do feel like it is unfortunate that the Docker rules are inserted at the top of the filter table, however changing this would very much be a breaking change to a large group of users who DO want this behavior.

There are a couple of other alternatives to -p:

  1. Don't use -p; connect to the container IP directly. This requires a little extra work since you have to look up the IP (see the sketch after this list), but this data is available from the API. You'll also need to make sure the forward policy on the firewall allows this (assuming you are connecting from a different host; the same host should be fine).
  2. Use macvlan or ipvlan networking for services you want to be accessible from the host's network (i.e. ingress). These networking options give the container an IP directly from the host's network interface (you choose which interface it is bound to).
  3. Use --net=host; this runs the service in the host network namespace, giving the service access to network infrastructure that already exists on the host.
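
For item 1, one way to look up a container's IP from the CLI (the container name my-container is illustrative):

docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' my-container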

You say "make this secure by default", but exposing a port is by definition a potentially insecure action. There also seems to be some idea that exposing to localhost only is secure, but it is not as anything running on the host can access localhost (including javascript in a browser if it is a desktop).

What use case are you trying to solve by using -p?
Do you have some thoughts on the actual change you'd like to see?

I'm fine with changing something to make this better for your workflow, but there are a lot of different use cases here and one size never fits all (see the complaints about -p).

@cpuguy83 that's all fine and dandy, but changes nothing in this request.

People use docker and want to connect to applications running under docker from their local system - this is an extremely common use case for developers, especially when diagnosing something or writing a service that is not under docker but needs to connect to services hosted in docker (to fill out a dev environment). Developing under Docker isn't always a good, fun, or useful experience either, so saying:

it is recommended to use another container with access to the service you want to connect to rather than exposing a port.

Is simply a no-go. Not all tools are Docker friendly, nor should they have to be. The user should not be required to be fully under Docker to utilize services running under Docker. Even then, if Docker is used to operate servers, the administrator cannot easily control them via the firewall configuration, and the expose-port functionality, however it is orchestrated (command line, Dockerfile, Docker Compose config), is utterly broken.

Further, folks use Docker Compose to manage much of the environment, and specify via docker-compose.yml or a Dockerfile that a port needs to be exposed so they can access it locally. Therefore saying "use the -p parameter" is incorrect, as they never interface with the docker command directly in a way where that would work.

Exposing a port does not mean that security must be broken. I've outlined how you can expose a port to the local system without breaking security (https://github.com/moby/moby/issues/22054#issuecomment-552951146) and it would put the management of the external exposure (off system) in the hands of the user in a way they can easily control with their existing tooling.

Solution:

  • Use Docker Network
  • Expose the Ports on the Docker Network alone and local host
  • Stop binding the port on 0.0.0.0 - or effectively doing so
  • Require users to use their own firewall tooling to expose the port off system (ufw, firewalld, etc)
  • Provide integrations for common firewalls to make this easy

In other words, instead of accessing a service in a docker container via 127.0.0.1:<port>, require <docker container ip>:<service port> even from the local host. If people want to expose the service off system they can add a firewall rule via their tooling (ufw, etc) to port forward from a given port to <docker container ip>:<service port>.
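
As a rough sketch of what such a user-added forwarding rule could look like with plain iptables (interface, port and container IP are illustrative, and the container IP is not stable across restarts):

iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination 172.17.0.2:443
iptables -A FORWARD -i eth0 -p tcp -d 172.17.0.2 --dport 443 -j ACCEPT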

Alternatively, follow the Kubernetes approach with their Proxy design, effectively doing as @jest suggested in https://github.com/moby/moby/issues/22054#issuecomment-554665865. Again, this would still need to be local system only until the user purposefully exposed that to the outside network.

The biggest issue here is that Docker is compromising the integrity of the firewall in order to provide its services, doing so without telling the user at all, and without integrating into the user's tooling so the user neither knows about it nor can control it.

I realize there are a ton of firewall interfaces on the market - even with iptables there's firewalld, ufw, and a dozen others. I don't really expect Docker to integrate with them (though that would be nice), but I do expect Docker to work in a way that doesn't bypass them or break any security the user did setup.

As a basic test case:

  • Setup a Debian-based server (Debian, Ubuntu)
  • Install ufw, OpenSSH Server
  • run ufw allow OpenSSH
  • run ufw enable

At this point you have a pretty secure server; the only traffic allowed in is either (a) related to outgoing traffic or (b) SSH Server traffic.

Now, start a docker container with an exposed port.

An acceptable solution would meet the following:

  • The port should not be accessible from another computer; the firewall should continue to block it from outside systems.
  • The port should be accessible from the local computer (localhost).
  • Multiple docker containers should be able to expose the same port and have everything work; all containers with an exposed port should be accessible from the local system only.
  • The user should not have to know about their firewall configuration details to make this happen.

Try the same for a CentOS/Fedora/RHEL based system with firewalld.

@cpuguy83 IMO people expect -p to work on an application level. That is, if I -p 80:80, I expect the behavior as if the application bound to port 80 in the container were running on the host.

VirtualBox and SSH model port forwarding in that way, so people assume the same holds in Docker's case.

To use a broader expectation parallel: from the user's point of view, it should work like host-bound volumes. You just point to a host directory and, as if by magic, it is visible inside the container; any modification made to the filesystem from either side works the same, including permissions, quota, etc.

Narrowing the -p problems down to the firewall case: in the normal world, the user expects an application bound to 0.0.0.0:80 to be visible to the outside world. If the user wants to limit access, numerous guides instruct them to use a firewall and set up a rule in the INPUT chain:

-P INPUT DROP
-A INPUT -s 1.2.3.4/32 -p tcp --dport 80 -j ACCEPT

or use tools like UFW:

ufw enable
ufw allow http

But with docker, the user has to be able to construct such rules out of the blue:

-A DOCKER-USER -s 1.2.3.4/32 -i eth0 -p tcp -m conntrack --ctorigdstport 80 --ctdir ORIGINAL -j ACCEPT
-A DOCKER-USER -i eth0 -p tcp -m conntrack --ctorigdstport 80 --ctdir ORIGINAL -j DROP

Please, show me one comprehensive guide for an average user on how to solve such common scenarios with Docker.

And for, e.g., the average developer, to whom you've been selling the whole "application containerization" concept, the consequences of exposing a port are simply unpredictable.

These days I simply don't expose ports due to my lack of knowledge, and use services like Traefik instead. Maybe --expose should also not be presented as a general-purpose option in the CLI docs, because IMO it simply does not do any good for the average user.

Or, provide anyone without in-depth knowledge of these tools with a Linux laptop, such as a Dell or Lenovo, and witness how their bona-fide efforts at properly setting their firewalls make absolutely no difference when someone manages to access their local database while having a coffee at Starbucks.

This exact issue caused a system vulnerability for us that I discovered over the weekend. Per @jacoscaz, I was just able to tap into another developer's local database because it has a published port. It would be great if y'all could include this in intro documentation so others don't do the same. We need a non-containerized service to connect to a container without everyone else on the network getting access, so Docker networks are out of the question. Sounds like the best option for now is to connect to the local container IP, unless anyone has a better idea.

@dentonmwood That sounds exactly like what you should do. As I mentioned above, you can even setup the service to run using macvlan or ipvlan which will give it an IP directly on your normal network.

@BenjamenMeyer @jest @jacoscaz

Thanks for the extra feedback.

I agree the knowledge we are asking users to possess in this regard is not great. We currently put the onus on the user (be it a sysadmin or a developer) to understand what -p does and to take appropriate precautions to ensure the service is only exposed to who they think it is exposed to. We take that a step further by expecting developers, who often have no reason to know iptables, to step in and fix it for their own environment (via DOCKER-USER).
We are basically in this situation due to fears of breaking compatibility.

I have some ideas that I haven't had time to fully think through yet, but it would basically be changing the behavior of -p based on API version of the client and handling ingress separately. Still a breaking change which worries me but at least old behavior is preserved in older API versions.

I did have a thought that we could do a local proxy (ala kubectl proxy) for the use-case of local access, however this again puts the onus on the developer to know and understand more than they really should need to.

Thoughts?

I think that introducing a proxy is a step in the right direction.

I acknowledge that "port forwarding" is a shared responsibility between a container and an orchestrator, similar in spirit to Swarm mode. But some minimal host integration is always required, and when giving an option like -p to a (non-expert) user, the proposed solution should be comprehensible to such a person.

As the -p consequences for the environment are not specified (at least at the CLI level, I think), there is room to introduce a user-friendly configuration of the way it works.

For example, at the moment there is an iptables key in /etc/docker/daemon.json which determines whether to manipulate the DOCKER chain or not. The config could be extended with another entry, such that a combination like iptables==true && expose_with_network_proxy==true would switch on the proxy-like behavior.
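
A sketch of what such a daemon.json could look like (expose_with_network_proxy is purely hypothetical here, not an existing daemon option):

{
  "iptables": true,
  "expose_with_network_proxy": true
}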

This should generally be safe for new installations, much as preferring overlay2 over aufs was some time ago (if I remember correctly). And it is easily deployed as such, since upgrades don't touch existing configuration files, while new installs are free to use the new default.

We are basically in this situation due to fears of breaking compatibility.

I'd really like to stress the fact that the gravity of this issue, at least in my eyes, has more to do with the lack of documentation than with the current technical underpinnings of -p. Yes, I do not believe that the current mechanism can be said to be _secure_, but security flaws can be fixed and I am in no position to criticize the reasons that led to the current scenario. Hindsight is always 20/20, after all, and I admire the commitment to backward compatibility and the willingness to face the technical challenges that such commitment brings in.

What I don't get is why there are still no bright, big, red boxes across Docker's documentation explicitly warning against the side effects of using -p. Had I come across such a warning I don't think I would even be here in the first place.

Thoughts?

The proxy option sounds reasonable. Plus, I like the idea of being able to expose and un-expose ports whenever needed rather than being bound to do so while launching docker run.

@BenjamenMeyer @jest @jacoscaz

Thanks for the extra feedback.

Thanks for finally looking at this more closely.

I agree the knowledge we are asking users to possess in this regard is not great. We currently put the onus on the user (be it a sysadmin or a developer) to understand what -p does and to take appropriate precautions to ensure the service is only exposed to who they think it is exposed to. We take that a step further by expecting developers, who often have no reason to know iptables, to step in and fix it for their own environment (via DOCKER-USER).
We are basically in this situation due to fears of breaking compatibility.

I have some ideas that I haven't had time to fully think through yet, but it would basically be changing the behavior of -p based on API version of the client and handling ingress separately. Still a breaking change which worries me but at least old behavior is preserved in older API versions.

Thoughts?

  1. Start with updating the documentation so users are aware of the effect of EXPOSE from all methods (-p, Dockerfile, docker-compose.yml). That much can be done quickly. It may also help draw attention to the issue, get more folks offering advice/expertise in finding a good solution, and help answer whether there is a community desire for backwards compatibility over it too.
  2. Plan out a method to get from where Docker is now to where Docker needs to be. This can include a breaking change at some point.

I think the gravity of the situation certainly allows for a breaking change to fix it; just don't do it without informing people. Set a timeline (6 months? 12 months?) where the change is heavily discussed and the community is prepared; then make the change in a "major" version. Based on your versioning scheme it looks like the first component is the year (19); given we're at the end of 2019, use the remainder of 2019 and then 2020 to work out the solution and advertise it; introduce it as an optional feature sometime in 2020 and then promote it to the default in 2021.

The Dockerfile version and the Docker Compose Schema Version may be good tools too as far as setting a default behavior, but I wouldn't block older versions from being able to take advantage of it, and would put a strong warning up about needing to update in such a case too.

However it is rolled out, I think you'll find the overall community a lot more supportive of such a change if they understand why and what's going on and don't feel like they're blindsided by it.

I did have a thought that we could do a local proxy (ala kubectl proxy) for the use-case of local access, however this again puts the onus on the developer to know and understand more than they really should need to.

I do like the proxy idea, and @jest 's suggestion of how to enable it may very well be a great method of doing the migration too.

Honestly, most folks that will be looking to expose it off system will or should be familiar with firewalls to some degree, even if it's just how to configure UFW or firewalld to do so. So if the solution only makes it available to local host, then I think it's acceptable for folks to learn how to use their firewall tooling to do a port forward through the firewall.

I do think it's important that such functionality be done through the firewall tooling that the user has decided to use as it will make everything visible to them. Such functionality should not bypass their tooling. I also realize that there are so many firewall tools out there that it is unreasonable for Docker to integrate with them all. So I would suggest taking a documentation route and highlighting how to do it with some of the popular ones to start, and let the community add more and update them.

I do like the proxy idea, and @jest 's suggestion of how to enable it may very well be a great method of doing the migration too.

Honestly, most folks that will be looking to expose it off system will or should be familiar with firewalls to some degree, even if it's just how to configure UFW or firewalld to do so. So if the solution only makes it available to local host, then I think it's acceptable for folks to learn how to use their firewall tooling to do a port forward through the firewall.

I echo this. With the proxy solution, I think it would be feasible to come up with something that would allow users to bind the proxy to whatever combination of interface(s) and port(s) they wanted. However, if forced to compromise in the name of backward-compatibility (which I support!) or any other reason, priority should be given to match reasonable security expectations, even at the cost of delegating off-system exposure to users.

Further to my comment, the proxy could actually operate in different ways depending on the firewall tooling available. If a supported firewall configuration tool were detected, the proxy could use that to set the appropriate forwarding rules. The presence of such a tool could be added as a requirement for production environments. Should such a tool not be available, the proxy would default to spawning up an application-level proxy server.

Docker bypasses the macOS firewall the same as it does on Linux.

On my development machine (Docker Desktop for Mac) I added "ip": "127.0.0.1" to the Docker daemon configuration. This way any development databases etc. will by default be accessible only from localhost. For those rare cases that I need to make an app visible to others, I can publish it explicitly with -p 0.0.0.0:8080:8080.

Edit: It seems that Docker Compose still binds to 0.0.0.0 by default, regardless of the Docker daemon configuration. So this trick works only when running Docker from the command line. And anyway, since Compose files are often shared with teammates who might have a different system configuration, it's better to add 127.0.0.1 explicitly in the Compose file.
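
For reference, minimal sketches of both pieces (service name and ports are illustrative):

/etc/docker/daemon.json (or the Docker Desktop daemon settings):

{
  "ip": "127.0.0.1"
}

docker-compose.yml fragment:

services:
  cache:
    image: redis:6
    ports:
      - "127.0.0.1:6379:6379"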

@luontola What an elegant solution.

Edit: What I was also thinking was that maybe there could be an allow/block list of addresses, defined in, for example, Docker Compose, that would then automatically be added to the DOCKER-USER iptables chain.

A small oopsie daisy that highlights the importance of this issue: https://github.com/docker/for-linux/issues/810 (summary: DOCKER-USER was removed, so even if you had configured iptables to block Docker's external access, that extra configuration was ignored and all containers were exposed)

@danhallin comments like yours can really save the day in some situations. Thank you for sharing.

Hmm interesting reading!

I'm running on Mac OS X and also have a local firewall called LittleSnitch. It just popped up a dialog asking if it was OK for 185.156.177.252 to connect to the com.docker.backend process. I clicked deny.

I was wondering how it could open a port forward in my router, but I have just realized that this is of course done via UPnP! All routers support this.

What I'm wondering now is the why. Why is something from 185.156.177.252 trying to connect from the outside ? If my local Docker process needs something it should call home from the inside, not open a port on the outside.


I came across this issue while trying to figure out whether there is a way to configure docker not to bypass my firewall INPUT rules every time a port is published. @BenjamenMeyer's comment above does a good job of explaining the whole thing, and also describes the approach I'd prefer.

Yesterday, I stumbled on this issue by accident. I've spent weeks meticulously customizing my iptables firewall rules for my VPS. I've tried to be cautious and restrictive. Filter Input & Output are set to drop by default. I explicitly allow certain traffic, and block everything else. Traffic is logged, and I've been testing and monitoring.

I knew Docker made its own changes to the FORWARD chain and the NAT table, so I've been _extremely_ careful to respect those changes throughout my process. Also, all my containers are attached to user-defined networks. The default bridge network is never used; the official documentation tells us not to.

My first surprise was that my containerized Nginx Proxy was accessible to the entire Internet. I have an option in my firewall rules to allow inbound web traffic from the world. But I had not turned that feature on yet. Nowhere in my iptables is it obvious that HTTP 80/443 are allowed.

Some of my web applications rely on databases like MySQL. Initially I did _not_ use the -p option in Docker Compose. I knew it wasn't necessary, since my web application and the db server shared the same user-defined Docker network. But as a former DBA, I'm always thinking about backups. So I activated the -p option to allow my host's cron jobs and backup tools to take timely backups. I'd written yet another firewall _option_ to allow external SQL access, but again had not yet enabled it.

After my surprise at Nginx, I wisely decided to double-check that MySQL wasn't exposed too. I attempted to connect (from my laptop) to the MySQL database on my remote VPS. And was again shocked when I successfully connected immediately.

Nothing I'm saying hasn't been discussed at length in previous posts. Many thanks to @BenjamenMeyer @jest @jacoscaz for their helpful investigation and suggestions. But for those asking for modern use-cases and experiences? Here you have it. Over 4 years after this thread began, people like myself are still encountering this behavior. And still coming away feeling shocked.

Yes, there are plenty of workarounds. I'll be pursuing a few immediately. But to implement those, you have to actually know this issue exists in the first place. That is what disappoints me most. Not that Docker developers have made certain design decisions, whether good or bad.

But that this "firewall bypass behavior" isn't clearly called out, loudly warned about, and more widely documented. When it comes to network security, surprises are never a good thing.

Wanted to share my workaround for this issue in case others find it useful, using iptables & ipset. Tested on Ubuntu, CentOS 7 & RHEL 7. ipset is used because "trusted" IPs are not always in the same IP range (working around that limitation of iptables).
It works with plain Docker and Docker Swarm (aka SwarmKit).
It ensures you are secure by default, only allowing the IPs you specify to connect to the OS ports & Docker ports you say should be open (using iptables and ipset). There is also an option to make an OS port or Docker port "public" (open to all IPs).

Ansible is not required, but makes it easier. Ansible role: https://github.com/ryandaniels/ansible-role-iptables-docker
Manual steps for CentOS/RHEL:
https://github.com/ryandaniels/ansible-role-iptables-docker#manual-commands-centosrhel
And Ubuntu 18.04/20.04:
https://github.com/ryandaniels/ansible-role-iptables-docker#manual-commands-ubuntu-2004

You'll need to install/configure ipset in addition to iptables (see above links if having trouble).
Example iptables config (with ssh port 22 open to everyone):

*filter
:DOCKER-USER - [0:0]
:FILTERS - [0:0]
#Can't flush INPUT. wipes out docker swarm encrypted overlay rules
#-F INPUT
#Use ansible or run manually once instead to add -I INPUT -j FILTERS
#-I INPUT -j FILTERS
-A DOCKER-USER -m state --state RELATED,ESTABLISHED -j RETURN
-A DOCKER-USER -i docker_gwbridge -j RETURN
-A DOCKER-USER -s 172.18.0.0/16 -j RETURN
-A DOCKER-USER -i docker0 -j RETURN
-A DOCKER-USER -s 172.17.0.0/16 -j RETURN
#Below Docker ports open to everyone if uncommented
#-A DOCKER-USER -p tcp -m tcp -m multiport --dports 8000,8001 -j RETURN
#-A DOCKER-USER -p udp -m udp -m multiport --dports 9000,9001 -j RETURN
-A DOCKER-USER -m set ! --match-set ip_allow src -j DROP
-A DOCKER-USER -j RETURN
-F FILTERS
#Because Docker Swarm encrypted overlay network just appends rules to INPUT
-A FILTERS -p udp -m policy --dir in --pol ipsec -m udp --dport 4789 -m set --match-set ip_allow src -j RETURN
-A FILTERS -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FILTERS -p icmp -j ACCEPT
-A FILTERS -i lo -j ACCEPT
#Below OS ports open to everyone if uncommented
-A FILTERS -p tcp -m state --state NEW -m tcp -m multiport --dports 22 -j ACCEPT
#-A FILTERS -p udp -m udp -m multiport --dports 53,123 -j ACCEPT
-A FILTERS -m set ! --match-set ip_allow src -j DROP
-A FILTERS -j RETURN
COMMIT
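
For completeness, a minimal sketch of creating the ip_allow set that the rules above reference (the addresses are just examples):

ipset create ip_allow hash:net
ipset add ip_allow 203.0.113.10
ipset add ip_allow 192.168.0.0/24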