Moby: Unable to retrieve user's IP address in docker swarm mode

Created on 9 Aug 2016  ·  324 Comments  ·  Source: moby/moby

Output of docker version:

Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 155
 Running: 65
 Paused: 0
 Stopped: 90
Images: 57
Server Version: 1.12.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 868
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host overlay null bridge
Swarm: active
 NodeID: 0ddz27v59pwh2g5rr1k32d9bv
 Is Manager: true
 ClusterID: 32c5sn0lgxoq9gsl1er0aucsr
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot interval: 10000
  Heartbeat tick: 1
  Election tick: 3
 Dispatcher:
  Heartbeat period: 5 seconds
 CA configuration:
  Expiry duration: 3 months
 Node Address: 172.31.24.209
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 3.13.0-92-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.42 GiB
Name: ip-172-31-24-209
ID: 4LDN:RTAI:5KG5:KHR2:RD4D:MV5P:DEXQ:G5RE:AZBQ:OPQJ:N4DK:WCQQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: panj
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

Steps to reproduce the issue:

  1. Run the following service, which publishes port 80:

docker service create \
  --name debugging-simple-server \
  --publish 80:3000 \
  panj/debugging-simple-server

  2. Try connecting with http://<public-ip>/.

Describe the results you received:
Neither ip nor header.x-forwarded-for is the correct user's IP address.

Describe the results you expected:
ip or header.x-forwarded-for should be the user's IP address. The expected result can be achieved using a standalone docker container: docker run -d -p 80:3000 panj/debugging-simple-server. You can see both results via the following links:
http://swarm.issue-25526.docker.takemetour.com:81/
http://container.issue-25526.docker.takemetour.com:82/

Additional information you deem important (e.g. issue happens only occasionally):
This happens on both global mode and replicated mode.

I am not sure if I missed anything that should solve this issue easily.

In the meantime, I think I have to do a workaround which is running a proxy container outside of swarm mode and let it forward to published port in swarm mode (SSL termination should be done on this container too), which breaks the purpose of swarm mode for self-healing and orchestration.

area/networking area/swarm kind/enhancement status/needs-attention version/1.12

Most helpful comment

I've also run into the issue when trying to run logstash in swarm mode (for collecting syslog messages from various hosts). The logstash "host" field always appears as 10.255.0.x, instead of the actual IP of the connecting host. This makes it totally unusable, as you can't tell which host the log messages are coming from. Is there some way we can avoid translating the source IP?

All 324 comments

/cc @aluzzardi @mrjana ptal

@PanJ can you please share some details on how debugging-simple-server determines the IP? Also, what is the expectation if a service is scaled to more than 1 replica across multiple hosts (or global mode)?

@mavenugo It's Koa's request object, which uses Node's remoteAddress from the net module. The result should be the same for any other library that can retrieve the remote address.

The expectation is that the ip field should always be the remote address, regardless of any configuration.

@PanJ are you still using your workaround, or have you found a better solution?

@PanJ When I run your app as a standalone container..

docker run -it --rm -p 80:3000 --name test panj/debugging-simple-server

and access the published port from another host I get this

vagrant@net-1:~$ curl 192.168.33.12
{"method":"GET","url":"/","header":{"user-agent":"curl/7.38.0","host":"192.168.33.12","accept":"*/*"},"ip":"::ffff:192.168.33.11","ips":[]}
vagrant@net-1:~$

192.168.33.11 is the IP of the host in which I am running curl. Is this the expected behavior ?

@sanimej Yes, it is the expected behavior that should be on swarm mode as well.

@marech I am still using the standalone container as a workaround, which works fine.

In my case, there are two nginx instances: a standalone one and a swarm one. SSL termination and reverse proxying are done on the standalone nginx. The swarm instance routes to other services based on the request host.

@PanJ The way the published port of a container is accessed is different in swarm mode. In the swarm mode a service can be reached from any node in the cluster. To facilitate this we route through an ingress network. 10.255.0.x is the address of the ingress network interface on the host in the cluster from which you try to reach the published port.
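For reference, you can see that range on any node with something like this (the exact subnet varies per setup):

docker network inspect ingress --format '{{json .IPAM.Config}}'
# typically prints something like [{"Subnet":"10.255.0.0/16","Gateway":"10.255.0.1"}]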

@sanimej I kinda saw how it works when I dug into the issue. But the use case (ability to retrieve user's IP) is quite common.

I have limited knowledge on how the fix should be implemented. Maybe a special type of network that does not alter source IP address?

Rancher is similar to Docker swarm mode and it seems to have the expected behavior. Maybe it is a good place to start from.

@sanimej A good idea could be to add all IPs to the X-Forwarded-For header if possible; then we could see the whole chain.

@PanJ Hmm, how does your standalone nginx container communicate with the swarm instance, via service name or IP? Maybe you can share the nginx config part where you pass traffic to the swarm instance.

@marech The standalone container listens on port 80 and then proxies to localhost:8181

server {
  listen 80 default_server;
  location / {
    proxy_set_header        Host $host;
    proxy_set_header        X-Real-IP $remote_addr;
    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header        X-Forwarded-Proto $scheme;
    proxy_pass          http://localhost:8181;
    proxy_read_timeout  90;
  }
}

If you have to do SSL termination, add another server block that listens on port 443, does the SSL termination, and proxies to localhost:8181 as well.

The swarm-mode nginx publishes 8181:80 and routes to other services based on the request host.

server {
  listen 80;
  server_name your.domain.com;
  location / {
    proxy_pass          http://your-service:80;
    proxy_set_header Host $host;
    proxy_read_timeout  90;
  }
}

server {
  listen 80;
  server_name another.domain.com;
  location / {
    proxy_pass          http://another-service:80;
    proxy_set_header Host $host;
    proxy_read_timeout  90;
  }
}

In our case, our API rate limiting and other functions depend on the user's IP address. Is there any way to work around the problem in swarm mode?

I've also run into the issue when trying to run logstash in swarm mode (for collecting syslog messages from various hosts). The logstash "host" field always appears as 10.255.0.x, instead of the actual IP of the connecting host. This makes it totally unusable, as you can't tell which host the log messages are coming from. Is there some way we can avoid translating the source IP?

+1 for a solution for this issue.

The inability to retrieve the user's IP prevents us from using monitoring solutions like Prometheus.

Perhaps the linux kernel IPVS capabilities would be of some use here. I'm guessing that the IP change is taking place because the connections are being proxied in user space. IPVS, on the other hand, can redirect and load balance requests in kernel space without changing the source IP address. IPVS could also be good down the road for building in more advanced functionality, such as different load balancing algorithms, floating IP addresses, and direct routing.

For me, it would be enough if I could somehow find out the relation between the virtual IP and the IP of the server the endpoint belongs to. That way, when Prometheus sends an alert related to some virtual IP, I could find out which server is affected. It would not be a good solution, but it would be better than nothing.

@vfarcic I don't think that's possible with the way it works now. All client connections come from the same IP, so you can't translate it back. The only way that would work is if whatever is doing the proxy/nat of the connections saved a connection log with timestamp, source ip, and source port. Even then, it wouldn't be much help in most use cases where the source IP is needed.

I probably did not explain well the use case.

I use Prometheus, configured to scrape exporters that are running as Swarm global services. It uses tasks.<service-name> DNS entries to get the IPs of all replicas. So it's not using the service VIP but the replica endpoints (no load balancing). What I need is to somehow figure out the IP of the node each of those replica IPs comes from.

I just realized that "docker network inspect <network>" provides information about containers and IPv4 addresses for a single node only. Can this be extended so that there is cluster-wide information about a network, together with the nodes?

Something like:

       "Containers": {
            "57bc4f3d826d4955deb32c3b71550473e55139a86bef7d5e584786a3a5fa6f37": {
                "Name": "cadvisor.0.8d1s6qb63xdir22xyhrcjhgsa",
                "EndpointID": "084a032fcd404ae1b51f33f07ffb2df9c1f9ec18276d2f414c2b453fc8e85576",
                "MacAddress": "02:42:0a:00:00:1e",
                "IPv4Address": "10.0.0.30/24",
                "IPv6Address": "",
                "Node": "swarm-4"
            },
...

Note the addition of the "Node".

If such information were available for the whole cluster (not only a single node), with the addition of a --filter argument, I'd have everything I need to figure out the relation between a container's IPv4 address and the node. It would not be a great solution but still better than nothing. Right now, when Prometheus detects a problem, I need to execute "docker network inspect" on each node until I find the location of the address.
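Right now that manual check is roughly the sketch below (illustrative only: it assumes SSH access to every node and an overlay named my_overlay, and greps around the address I'm hunting for):

for node in $(docker node ls --format '{{.Hostname}}'); do
  echo "== $node =="
  ssh "$node" docker network inspect my_overlay | grep -B4 '10.0.0.30'
done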

I agree with @dack , given the ingress network is using IPVS, we should solve this issue using IPVS so that the source IP is preserved and presented to the service correctly and transparently.

The solution needs to work at the IP level so that any service that is not based on HTTP can still work properly (we can't rely on HTTP headers...).

And I can't stress enough how important this is: without it, there are many services that simply can't operate at all in swarm mode.

@kobolog might be able to shed some light on this matter given his talk on IPVS at DockerCon.

Just adding myself to the list. I'm using logstash to accept syslog messages, and they're all getting pushed into elasticsearch with the host IP set to 10.255.0.4, which makes it unusable, and I'm going to have to revert to my non-containerized logstash deployment if there's no fix for this.

@mrjana can you please add the suggestion you had to work around this problem?

IPVS is not a userspace reverse proxy that can fix up things at the HTTP layer. That is the difference between a userspace proxy like HAProxy and this. If you want to use HAProxy you could do that by putting an HAProxy in the cluster and having all your service instances and HAProxy participate in the same network. That way HAProxy can fix up HTTP header.x-forwarded-for. Or, if the L7 load balancer is external to the cluster, you can use the upcoming (in 1.13) feature for a new PublishMode called Host PublishMode, which will expose each individual instance of the service on its own individual port, and you can point your external load balancer to that.

@mrjana The whole idea of using IPVS (instead of whatever docker currently does in swarm mode) would be to avoid translating the source IP to begin with. Adding an X-Forwarded-For might help for some HTTP applications, but it's of no use whatsoever for all the other applications that are broken by the current behaviour.

@dack my understanding is that the Docker ingress network already uses IPVS.

If you want to use HAProxy you could do that by putting a HAProxy in the cluster and have all your service instances and HAProxy to participate in the same network. That way HAProxy can fix up HTTP header.x-forwarded-for

That would not work either, @mrjana. The only way for HAProxy to get the client IP is to run outside the ingress network using docker run or directly on the host, but then you can't use any of your services since they are on a different network and you can't access them.

Simply put, there is absolutely no way as far as I know to deal with this as soon as you use docker services and swarm mode.

It would be interesting if the author(s) of the docker ingress network could join the discussion, as they would probably have some insight as to how IPVS is configured / operated under the hood (there are many modes for IPVS) and how we can fix the issue.

@tlvenn Do you know where this is in the source code? I could be wrong, but I don't believe it is using IPVS based on some things I've observed:

  • The source port is translated (the whole reason for this issue). IPVS doesn't do this. Even in NAT mode, it only translates the destination address. You need to use the default route or policy routing to send return packets back to the IPVS host.
  • When a port is published in swarm mode, all the dockerd instances in the swarm listen on the published port. If IPVS was used, then it would happen in kernel space and dockerd would not be listening on the port.

Hi @dack,

From their blog:

Internally, we make this work using Linux IPVS, an in-kernel Layer 4 multi-protocol load balancer that’s been in the Linux kernel for more than 15 years. With IPVS routing packets inside the kernel, swarm’s routing mesh delivers high performance container-aware load-balancing.

The source code should live in the swarmkit project, if I am not wrong.

I wonder if @stevvooe can help us understand what the underlying issue is here.

OK, I've had a brief look through the code and I think I have a slightly better understanding of it now. It does indeed appear to be using IPVS as stated in the blog. SNAT is done via an iptables rule which is set up in service_linux.go. If I understand correctly, the logic behind it would be something like this (assuming node A receives a client packet for the service running on node B):

  • Swarm node A receives the client packet. IPVS/iptables translates (src ip)->(node A ip) and (dst ip)->(node B ip)
  • The packet is forwarded to node B
  • Node B sends its reply to node A (as that's what it sees as the src ip)
  • Node A translates the src and dst back to the original values and forwards the reply to the client

I think the reasoning behind the SNAT is that the reply must go through the same node that the original request came through (as that's where the NAT/IPVS state is stored). As requests may come through any node, the SNAT is used so that the service node knows which node to route the request back through. In an IPVS setup with a single load balancing node, that wouldn't be an issue.

So, the question is then how to avoid the SNAT while still allowing all nodes handle incoming client requests. I'm not totally sure what the best approach is. Maybe there's a way to have a state table on the service node so that it can use policy routing to direct replies instead of relying on SNAT. Or maybe some kind of encapsulation could help (VXLAN?). Or, the direct routing method of IPVS could be used. This would allow the service node to reply directly to the client (rather than via the node that received the original request) and would allow adding new floating IPs for services. However, it would also mean that the service can only be contacted via the floating IP and not the individual node IPs (not sure if that's a problem for any use cases).
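For reference, a classic single-LB LVS-DR setup looks roughly like this with ipvsadm (addresses are made up, and this is not what swarm configures today):

# virtual service on a floating VIP, round-robin scheduling
ipvsadm -A -t 203.0.113.10:80 -s rr
# real servers in gatewaying (direct routing) mode
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.11:80 -g
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.12:80 -g
# on each real server: hold the VIP without answering ARP for it
ip addr add 203.0.113.10/32 dev lo
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2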

Pretty interesting discovery @dack !

Hopefully a solution will be found to skip that SNAT all together.

In the meantime, there is maybe a workaround that was committed not long ago, which introduces host-level port publishing with PublishMode, effectively bypassing the ingress network.

https://github.com/docker/swarmkit/pull/1645

Hey, thanks for the large amount of feedback - we'll take a deep look at this issue after the weekend.

Some info in the meantime:

@tlvenn: @mrjana is the main author behind the ingress network feature. Source lives in docker/libnetwork mostly, some in SwarmKit

@dack: it is indeed backed by IPVS

@tlvenn as far as I know, Docker Swarm uses masquerading, since it's the most straightforward way and guaranteed to work in most configurations. Plus this is the only mode that actually allows masquerading ports too [re: @dack], which is handy. In theory, this issue could be solved by using IPIP encapsulation mode – the packet flow will then be like this:

  • A packet arrives at the gateway server – in our case any node of the swarm – and IPVS on that node determines that it is in fact a packet for a virtual service, based on its destination IP address and port.
  • Packet is encapsulated into another IP packet and sent over to the real server, which was chosen based on the load balancing algorithm.
  • The real server receives the enclosing packet, decapsulates it and sees real client IP as source and virtual service IP as destination. All real servers are supposed to have a non-ARPable interface alias with the virtual service IP so that they would assume that this packet is actually destined for them.
  • The real server processes the packet and sends the response back to the client directly. The source IP in this case will be the virtual service IP, so no martian replies involved, which is good.

There are, of course, many caveats and things-which-can-go-wrong, but generally this is possible and IPIP mode is widely used in production.
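In ipvsadm terms the only difference from direct routing is the forwarding flag, plus a tunnel interface on the real servers; again just an illustrative sketch:

# on the director: -i selects IPIP encapsulation instead of -g
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.12:80 -i
# on each real server: bring up the tunnel interface and hold the VIP on it
ip link set tunl0 up
ip addr add 203.0.113.10/32 dev tunl0
sysctl -w net.ipv4.conf.tunl0.rp_filter=0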

Hoping a solution can be found soon for this, as IP-fixation and other security checks need to be able to receive the correct external IP.

Watching. Our product leverages source IP information for security and analytics.

@aluzzardi any update for us ?

bump, we need this to be working for a very large project we are starting early next year.

Examining the flow, it seems to currently work like this (in this example, node A receives the incoming traffic and node B is running the service container):

  • node A performs DNAT to direct the packet into the ingress_sbox network namespace (/var/run/docker/netns/ingress_sbox)
  • ingress_sbox on node A runs IPVS in NAT mode, which performs DNAT to direct the packet to the container on node B (via the ingress overlay network) and also SNAT to change the source IP to the node A ingress overlay network IP
  • the packet is routed through the overlay to the real server
  • the return packets follow the same path in reverse, rewriting the source/dest addresses back to the original values
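(You can poke at this yourself on a node that has a published service; the namespace path is the one above and the output will vary:)

# list the IPVS virtual services programmed inside the ingress namespace
nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -Ln
# and the SNAT/DNAT rules
nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -nL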

I think the SNAT could be avoided with something like this:

  • node A passes the packet into ingress_sbox without any NAT (iptables/policy routing ?)
  • node A ingress_sbox runs IPVS in direct routing mode, which sends packet to node B via ingress overlay network
  • the container on node B receives the unaltered packet (the container must accept packets for all public IPs, but not send ARP for them; there are several ways to do this, see the IPVS docs)
  • the return packets are sent directly from node B to the client (they do not need to go back through the overlay network or node A)

As an added bonus, no NAT state needs to be stored and overlay network traffic is reduced.

@aluzzardi @mrjana Any update on this please ? A little bit of feedback from Docker would be very much appreciated.

Watching. Without source IP information, most of our services can't work as expected.

How did that happen ?
[screenshot: unassign_bug]

@tlvenn seems like a bug in Github ?

@PanJ @tlvenn @vfarcic @dack and others, PTAL #27917. We introduced the ability to enable service publish mode = host, which provides a way for the service to bypass IPVS and bring back docker run -p-like behavior, and that will retain the source IP for cases that need it.

Pls try 1.13.0-rc2 and provide feedback.

ya pretty weird @mavenugo ..

Regarding the publish mode, I had already linked to this from swarmkit above. This could be a workaround, but I truly hope a proper solution comes with Docker 1.13 to address this issue for good.

This issue could very much be categorized as a bug, because preserving the source IP is the behaviour we as users expect, and it's a very serious limitation of docker services right now.

I believe both @kobolog and @dack have come up with some potential leads on how to solve this and it's been almost 2 weeks with no follow up on those from Docker side.

Could we please have some visibility on who is looking into this issue at Docker and a status update ? Thanks in advance.

Other than #27917, there is no other solution for 1.13. The Direct-return functionality needs to be analyzed for various use-cases and should not be taken lightly to be considered as a bug-fix. We can look into this for 1.14. But, this also falls under the category of configurable LB behavior, that includes the algorithm (rr vs 10 other methods), Data-path (LVS-DR, LVS-NAT & LVS-TUN). If someone is willing to contribute to this, pls push a PR and we can get that moving.

Fair enough I guess @mavenugo given we have an alternative now.

At the very least, can we amend the docs for 1.13 so they clearly state that when using docker services with the default ingress publishing mode, the source IP is not preserved, and hint at using host mode if this is a requirement for the service?

I think it will help people who are migrating to services not be burnt by this unexpected behaviour.

Sure, and yes, a doc update to indicate this behavior and the workaround of using publish mode=host will be useful for use-cases that fail in LVS-NAT mode.

Just checking back in to see if there have been any new developments in getting this real-IP thing figured out? It certainly is a huge limitation for us as well.

Is a solution on the roadmap for docker 1.14? We have delayed deploying our solutions using docker due in part to this issue.

Would love to see a custom header added to the http/https request which preserves the client-ip. This should be possible, shouldn't it? I don't mind when X_Forwarded_for is overwritten, I just want to have a custom field which is only set the very first time the request enters the swarm.

Would love to see a custom header added to the http/https request which preserves the client-ip. This should be possible, shouldn't it? I don't mind when X_Forwarded_for is overwritten, I just want to have a custom field which is only set the very first time the request enters the swarm.

Load balancing is done at L3/4. Adding an http header is not possible.

A fix will involve removing the rewrite of the source address.

@mavenugo I updated to docker 1.13 today and used mode=host on my proxy service. Currently it works, Client IP is preserved, but I hope for a better solution :) Thanks for your work!

Sorry for double post...
How can I use a stack file (yml v3) to get the same behaviour as when I would use --publish mode=host,target=80,published=80 via docker service create?

I tried

...
services:
  proxy:
    image: vfarcic/docker-flow-proxy:1.166
    ports:
      - "80:80/host"
      - "443:443/host" 
...

but that's not working (used same pattern as in https://docs.docker.com/docker-cloud/apps/stack-yaml-reference/#/ports)

How can I use a stack file (yml v3) to get the same behaviour as when I would use --publish mode=host,target=80,published=80 via docker service create?

@hamburml - keep an eye on https://github.com/docker/docker/issues/30447 its an open issue/feature.

Unfortunately I cannot use mode=host as a workaround because I need my service to communicate to the swarm network and also be listening on all nodes, not just the host interface...

@tkeeler33 I think you should be able to deploy the service as a global service (which deploys an instance on each node in the swarm), and connect it to a swarm network to communicate with other services in the swarm

@thaJeztah - Yes, but I can't connect a container to both an overlay/swarm network and publish with mode=host at the same time. That is my biggest limitation at the moment.

@tkeeler33 seems to work for me;

$ docker network create -d overlay swarm-net

$ docker service create \
  --name web \
  --publish mode=host,published=80,target=80 \
  --network swarm-net \
  --mode=global \
  nginx:alpine

$ docker service create --name something --network swarm-net nginx:alpine

Test if web service is able to connect with something service on the same network;

docker exec -it web.xczrerg6yca1f8ruext0br2ow.kv8iqp0wdzj3bw7325j9lw8qe sh -c 'ping -c3 -w1 something'
PING something (10.0.0.4): 56 data bytes
64 bytes from 10.0.0.4: seq=0 ttl=64 time=0.251 ms

--- something ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.251/0.251/0.251 ms

@thaJeztah - Thank you! After digging in deeper I realized my problem was that I created my docker network using the --opt encrypted option, which caused the container to fail connections from the host. Once I tried your steps, I was able to quickly narrow down the root cause. This option may be a good interim workaround, I just need to wrap my head around any security implications.

Greatly appreciate the info!

@tkeeler33 --opt encrypted should not affect host-port mapping. The only purpose of encrypted option is to encrypt the vxlan tunnel traffic between the nodes. From docs : "If you are planning on creating an overlay network with encryption (--opt encrypted), you will also need to ensure protocol 50 (ESP) traffic is allowed." Can you pls check your configurations to make sure ESP is allowed ?
Also, the --opt encrypted option is purely data-plane encryption. All the control-plane traffic (routing exchanges, service discovery distribution, etc.) is encrypted by default, even without the option.

@mavenugo You're right. When I created a new network with --opt encrypted it worked fine. When I compared the newly created network with my existing one I noticed "Internal": true was set. That was likely the problem and was a mistake during the initial network creation... Thanks for your help & clarification, It's been a long day...

@dack @kobolog In typical deployments of LVS-Tunnel and LVS-DR mode, the destination IP in the incoming packet is the service VIP, which is also programmed as a non-ARP IP on the real servers. The routing mesh works in a fundamentally different way: the incoming request could arrive at any of the hosts. For the real server to accept the packet (in any LVS mode), the destination IP has to be changed to a local IP. There is then no way for the reply packet from the backend container to go back with the right source address. Instead of direct return, we could try to get the reply packet back to the ingress host, but there is no clean way to do that except by changing the source IP, which brings us back to square one.

@thaJeztah I think we should clarify this in the documentation, suggest using host mode if the client IP has to be preserved, and close this issue.

@sanimej I still don't see why it's impossible to do this without NAT. Couldn't we just have the option to use, for example, the regular LVS-DR flow? Docker adds the non-arp vip to the appropriate nodes, LVS directs the incoming packets to the nodes, and outgoing packets return directly. Why does it matter that the incoming packet could hit any host? That's no different than standard LVS with multiple frontend and multiple backend servers.

@thaJeztah thanks for the workaround :)
If you are deploying your proxy with compose file version 3, the new publish syntax is not supported, so we can patch the deployed service using this command (replace nginx_proxy with your service name):

docker service update nginx_proxy \
    --publish-rm 80 \
    --publish-add "mode=host,published=80,target=80" \
    --publish-rm 443 \
    --publish-add "mode=host,published=443,target=443"

@dack In the regular LVS-DR flow the destination IP will be the service VIP. So the LB can send the packet to the backend without any dest IP change. This is not the case with routing mesh because the incoming packet's dest IP will be one of the host's IP.

@sanimej any feedback on the proposal above to use IPIP encapsulation mode to solve this issue ?

@tlvenn LVS-IP tunnel works very similar to LVS-DR, except that the backend gets the packet through an IP in IP tunnel rather than a mac-rewrite. So it has the same problem for the routing mesh use case.

From the proposal you referred to..
The real server receives the enclosing packet, decapsulates it and sees real client IP as source and virtual service IP as destination.

The destination IP of the packet would be the IP of the host to which the client sent the packet, not the VIP. If it's not rewritten, the real server would drop it after removing the outer IP header. If the destination IP is rewritten, the real server's reply to the client will have an incorrect source IP, resulting in connection failure.

Thanks for the clarification @sanimej. Could you perhaps implement the PROXY protocol? It would not provide a seamless solution, but at least it would offer the service a way to resolve the user's IP.

There is a kludgy way to achieve source IP preservation by splitting the source port range into blocks and assigning a block to each host in the cluster. Then it's possible to do a hybrid NAT+DR approach, where the ingress host does the usual SNAT and sends the packet to a real server. On the host where the real server is running, based on the source IP, do an SNAT to change the source port to a port in the range assigned to the ingress host. Then, on the return packet from the container, match against the source port range (and the target port) and change the source IP to that of the ingress host.

Technically this would work, but it would be impractical and fragile in real deployments where cluster members are added and removed quickly. This also reduces the port space significantly.

The NAT+DR approach I mentioned wouldn't work because the source IP can't be changed on the ingress host. Changing only the source port to one in the range for that particular host, and using a routing policy on the backend host to get the packet back to the ingress host, might be an option. This still has the other issues I mentioned earlier.

@thaJeztah
Is there any workaround at the moment to forward the real IP address from the Nginx container to the web container?
I have the Nginx container running in global mode and published to the host, so the Nginx container gets the correct IP address. Both containers see each other fine; however, the web container gets the Nginx container's IP address, not the client's.
Nginx is a reverse proxy for the web service, and the web service runs uwsgi on port 8000:

server {
    resolver 127.0.0.11;
    set $web_upstream http://web:8000;

    listen 80;
    server_name domain.com;
    location / {
        proxy_pass $web_upstream;
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
        proxy_redirect off;
        proxy_buffering off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

@lpakula Please check my answer above + this working nginx configuration

@pi0 Thanks for reply

I'm using the nginx configuration from the link, but the IP address is still wrong; I must be missing something in my configuration.

I have a docker (17.03.0-ce) swarm cluster with an overlay network and two services:

    docker service create --name nginx --network overlay_network --mode=global \
        --publish mode=host,published=80,target=80 \
        --publish mode=host,published=443,target=443 \
        nginx:1.11.10

    docker service create --name web --network overlay_network \
        --replicas 1 \
        web:newest

The Nginx container uses the latest official image: https://hub.docker.com/_/nginx/
The web container runs a uwsgi server on port 8000.

I'm using the global nginx.conf from the link, and conf.d/default.conf looks as follows:

   server {
       resolver 127.0.0.11;
       set $web_upstream http://web:8000;

       listen 80;
       server_name domain.com;
       location / {
        proxy_pass $web_upstream;
      }
  }

And then nginx container logs:

  194.168.X.X - - [17/Mar/2017:12:25:08 +0000] "GET / HTTP/1.1" 200

Web container logs:

  10.0.0.47 - - [17/Mar/2017 12:25:08] "GET / HTTP/1.1" 200 -

What is missing there?

The IP address will still be wrong, but nginx will add HTTP headers that contain the real IP address. You must configure the web server of your choice to trust the proxy (i.e. use the header instead of the source IP).

@lpakula Ah, there is another thing: your web:newest image should honor the X-Real-IP header too. nginx won't automatically change the sender's IP; it just sends a hint header.
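For example, if the web container serves a plain WSGI app through uwsgi, a tiny middleware like this sketch can copy the header into REMOTE_ADDR so the app and its access logs see the client IP (only safe when the app is reachable exclusively through the trusted nginx proxy):

# real_ip.py - hypothetical helper, adapt to your own app
def real_ip_middleware(app):
    def wrapper(environ, start_response):
        real_ip = environ.get("HTTP_X_REAL_IP")
        if real_ip:
            # trust the proxy's header and overwrite what the socket reported
            environ["REMOTE_ADDR"] = real_ip
        return app(environ, start_response)
    return wrapper

# usage: application = real_ip_middleware(application)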

@pi0 @PanJ
It does make sense, thanks guys!

Bind the port using host mode.

nginx supports IP Transparency using the TPROXY kernel module.

@stevvooe Can Docker do something like that too?

nginx supports IP Transparency using the TPROXY kernel module.
@stevvooe Can Docker do something like that too?

Unlikely, as the entry needs to be tracked across nodes. I'll let @sanimej or @mavenugo answer that.

Can swarm provide the REST API to get the client IP address?

@tonysongtl that's not related to this issue

Something else to consider is how your traffic is delivered to your nodes in a highly available setup. A node should be able to go down without creating errors for clients. The current recommendation is to use an external load balancer (ELB, F5, etc) and load balance at Layer 4 to each Swarm node, with a simple Layer 4 health check. I believe F5 uses SNAT, so the best case in this configuration is to capture the single IP of your F5, and not the real client IP.

References:
https://docs.docker.com/engine/swarm/ingress/#configure-an-external-load-balancer
https://success.docker.com/Architecture/Docker_Reference_Architecture%3A_Docker_EE_Best_Practices_and_Design_Considerations
https://success.docker.com/Architecture/Docker_Reference_Architecture%3A_Universal_Control_Plane_2.0_Service_Discovery_and_Load_Balancing

Mirroring the comment above: can the proxy protocol not be used? All cloud load balancers and haproxy use this for source IP preservation.

Calico also has ipip mode - https://docs.projectcalico.org/v2.2/usage/configuration/ip-in-ip - which is one of the reasons why github uses it. https://githubengineering.com/kubernetes-at-github/

Hi.

For the sake of understanding and completeness, let me summarize and please correct me if I'm wrong:

The main issue is that containers aren't receiving the original src-IP but the swarm VIP. I have replicated this issue with the following scenario:

create docker swarm
docker service create --name web --publish 80:80 nginx
access.log source IP is 10.255.0.7 instead of client's browser IP

It seems:

When services within the swarm are using the (default) mesh, swarm does NAT to ensure traffic from the same origin is always sent to the same host running the service?
Hence, it's losing the original src-IP and replacing it with the swarm's service VIP.

It seems @kobolog's https://github.com/moby/moby/issues/25526#issuecomment-258660348 and @dack's https://github.com/moby/moby/issues/25526#issuecomment-260813865 proposals were refuted by @sanimej in https://github.com/moby/moby/issues/25526#issuecomment-280722179 and https://github.com/moby/moby/issues/25526#issuecomment-281289906 but, TBH, his arguments aren't fully clear to me yet, nor do I understand why this thread hasn't been closed if this is definitively impossible. @stevvooe?

@sanimej wouldn't this work?:

  1. Swarm receives a message with src-IP=A and destination="my-service-virtual-address".
  2. The packet is sent to a swarm node running that service, encapsulating the original message.
  3. The node forwards it to the task, changing the destination to the IP of the container running that service.
    Swarm and nodes could maintain tables to ensure traffic from the same origin is forwarded to the same node whenever possible.

Wouldn't an option to enable "reverse proxy instead of NAT" for specific services solve all these issues, satisfying everybody?

On the other hand, IIUC, the only option left is to use https://docs.docker.com/engine/swarm/services/#publish-a-services-ports-directly-on-the-swarm-node, which - again IIUC - seems to be like not using the mesh at all, so I don't see the benefits of using swarm mode (vs compose). In fact, it looks like pre-1.12 swarm, needing Consul and so on.

Thanks for your help and patience.
Regards

@sanimej
Even more: why is Docker not just doing port-forwarding NAT (changing only the destination IP/port)?

  1. Swarm receives a message "from A to myservice"
  2. Swarm forwards the message to the host running that service, setting dest=node1
  3. Node1 receives the message "from A to node1" and forwards it, setting dest=container1
  4. Container1 receives the message "from A to container1"
  5. To reply, the container uses its default gateway route

I'd just like to chime in; while I do understand that there is no easy way to do this, not having the originating IP address preserved in some manner severely hampers a number of application use cases. Here are a few I can think of off the top of my head:

  • Being able to have metrics detailing where your users originate from is vital for network/service engineering.

  • In many security applications you need to have access to the originating IP address in order to allow for dynamic blacklisting based upon service abuse.

  • Location awareness services often need to be able to access the IP address in order to locate the user's general location when other methods fail.

From my reading of this issue thread, it does not seem that the given work-around(s) work very well when you want to have scalable services within a Docker Swarm. Limiting yourself to one instance per worker node greatly reduces the flexibility of the offering. Also, maintaining a hybrid approach of having an LB/Proxy on the edge running as a non-Swarm orchestrated container before feeding into Swarm orchestrated containers seems like going back in time. Why should the user need to maintain 2 different paradigms for service orchestration? What about being able to dynamically scale the LB/Proxy at the edge? That would have to be done manually, right?

Could the Docker team perhaps consider these comments and see if there is some way to introduce this functionality, while still maintaining the quality and flexibility present in the Docker ecosystem?

As a further aside, I'm currently getting hit by this now. I have a web application which forwards authorized/authenticated requests to a downstream web server. Our service technicians need to be able to verify whether people have reached the downstream server, which they like to use web access logs for. In the current scenario, there is no way for me to provide that functionality as my proxy server never sees the originating IP address. I want my application to be easily scalable, and it doesn't seem like I can do this with the work-arounds presented, at least not without throwing new VMs around for each scaled instance.

@Jitsusama could Kubernetes solve your issue?

@thaJeztah is there a way of doing the workaround using docker-compose?

I tried

services:
  math:
    build: ./math
    restart: always
    ports:
      - target: 12555
        published: 12555
        mode: host

But it seems to take 172.x.x.1 as the source IP

@trajano, I have no clue. Does Kubernetes somehow manage to get around this issue?

@Jitsusama
Yes, they have documentation referring to how they preserve the source IP. It is functional, but not so pretty if you don't use a Load Balancer since the packet gets dropped on nodes without those endpoints. If you plan on using Rancher as your self-hosted Load Balancer, unfortunately it currently does not yet support it.

@trajano

But it seems to take 172.x.x.1 as the source IP

If you're accessing your application locally, that IP should be correct (if you use swarm) since the docker_gwbridge is the interface that interacts with your proxy container. You can try accessing the app from another machine within your IP network to see if it catches the right address.

As for the compose workaround, it is possible. Here, I use the image jwilder/nginx-proxy as my frontend reverse proxy (to simplify concepts), along with an official nginx image as the backend service. I deploy the stack in Docker Swarm mode:

version: '3.3'

services:

  nginx-proxy:
    image: 'jwilder/nginx-proxy:alpine'
    deploy:
      mode: global
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock:ro

  nginx:
    image: 'nginx:1.13.5-alpine'
    deploy:
      replicas: 3
    ports:
      - 80
      - 443
    environment:
      - VIRTUAL_HOST=website.local
$ echo '127.0.0.1 website.local' | sudo tee -a /etc/hosts
$ docker stack deploy --compose-file docker-compose.yml website

This will create a website_default network for the stack. My endpoint is defined in the environment variable VIRTUAL_HOST and accessing http://website.local gives me:

website_nginx-proxy.0.ny152x5l9sh7@Sherry    | nginx.1    | website.local 172.18.0.1 - - [08/Sep/2017:21:33:36 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36"
website_nginx.1.vskh5941kgkb@Sherry    | 10.0.1.3 - - [08/Sep/2017:21:33:36 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36" "172.18.0.1"

Note that the end of the log line for website_nginx.1.vskh5941kgkb shows a hint of the original IP (172.18.0.1). X-Forwarded-For & X-Real-IP are set in the nginx.tmpl of jwilder/nginx-proxy by default.

For port 443, I was not able to add both ports in the docker-compose file, so I just use:

docker service update website_nginx-proxy \
    --publish-rm 80 \
    --publish-add "mode=host,published=80,target=80" \
    --publish-rm 443 \
    --publish-add "mode=host,published=443,target=443" \
    --network-add "<network>"

while also adding networks that I want to reverse-proxy with apps containing the environment variable VIRTUAL_HOST. More granular options are possible in the documentation for jwilder/nginx-proxy, or you can create your own setup.

Ingress controllers on Kubernetes essentially do the same thing, as ingress charts (usually) have support for X-Forwarded-For and X-Real-IP, with a bit more flexibility in the choice and type of ingress and also their deployment replicas.

So the kubernetes documentation is not complete. Another way which is pretty commonly used is actually ingress + proxy protocol.

https://www.haproxy.com/blog/haproxy/proxy-protocol/

Proxy protocol is a widely accepted protocol that preserves source
information. Haproxy comes with built-in support for proxy protocol. Nginx
can read but not inject proxy protocol.

Once the proxy protocol is set up, you can access that information from any downstream service, for example:
https://github.com/nginxinc/kubernetes-ingress/blob/master/examples/proxy-protocol/README.md

Even openshift leverages this for source IP information
https://docs.openshift.org/latest/install_config/router/proxy_protocol.html

This is the latest haproxy ingress for k8s that injects proxy protocol.

IMHO the way to do this in swarm is to make the ingress able to read proxy
protocol (in case it's receiving traffic from an upstream LB that has
already injected proxy protocol) as well as inject proxy protocol
information (in case all the traffic actually hits the ingress first).

I am not in favour of doing it any other way especially when there is a
generally accepted standard to do this.
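For what it's worth, consuming it on the backend side is already easy; a hedged nginx sketch (standard realip-module directives; the trusted range and upstream name are assumptions):

server {
    listen 80 proxy_protocol;           # accept the PROXY header from the upstream LB/ingress
    set_real_ip_from 10.255.0.0/16;     # assumption: trust the ingress/overlay range
    real_ip_header proxy_protocol;      # take the client address from the PROXY header
    location / {
        proxy_set_header X-Real-IP $remote_addr;   # now the real client IP
        proxy_pass http://backend:8000;
    }
}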

Traefik did add proxy_protocol support a few weeks ago and is available from v1.4.0-rc1 onwards.

This needs to be done at the docker swarm ingress level. If the ingress does not inject proxy protocol data, none of the downstream services (including traefik, nginx, etc.) will be able to read it.


I'm also confused about the relationship of this bug to infrakit, e.g. https://github.com/docker/infrakit/pull/601. Can someone comment on the direction that docker swarm is going to take?

Will swarm roll up into infrakit? I'm especially keen on the ingress side of it.

We are running into this issue as well. We want to know the client IP and the requested IP for inbound connections. For example, if a user makes a raw TCP connection to our server, we want to know what their IP is and which IP on our machine they connected to.

@blazedd As commented previously and in other threads, this is actually possible using publish mode=host, i.e. when services are not handled by the mesh network.

IIUC, there is some ongoing progress towards improving how ingress handles this, but that's currently the only solution.

We ended up deploying our nginx service using publish mode=host and mode: global, to avoid external LB configuration.

@mostolog Thanks for your reply. Just a few notes:

  1. publishMode does not resolve the issue whatsoever. The inbound socket connection still resolves to the local network that swarm sets up, at least when you use mode: host in the ports list.
  2. nginx isn't really a good solution. Our application is TCP based, but isn't a web server. There aren't any headers we'll be able to use without coding it manually.
  3. If I use docker run --net=host ... everything works fine.
  4. The only solution I've seen that works so far is to use: https://github.com/moby/moby/issues/25873#issuecomment-319109840

@blazedd In our stack we have:

    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host

and so we get real IPs in our logs.

@mostolog It does not work on Windows at least. I am still getting the 172.0.0.x address as the source.

@mostolog mode: host doesn't expose your container to the host network. It removes the container from the ingress network, which is how Docker normally operates when running a container. It replicates the --publish 8080:8080 used in a docker run command. If nginx is getting real IPs, it's not a result of the socket being connected to those IPs directly. To test this you should seriously consider using a raw TCP implementation or an HTTP server without a framework, and check the reported address.
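Something like this bare TCP server (a quick sketch; any language works) makes it obvious which source address the container actually sees:

# peer_echo.py - report whatever getpeername() returns for each connection
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 3000))
srv.listen(5)
while True:
    conn, peer = srv.accept()                      # peer is (source_ip, source_port)
    conn.sendall(("peer: %s:%d\n" % peer).encode())
    conn.close()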

Why not use IPVS to route the network to the container directly? Bind all swarm nodes' overlay interface IPs as virtual IPs, use ip rule from xxx table xxx to make multiple gateways, then swarm nodes can route the client to the container directly (DNAT), without any userspace network proxy daemon (dockerd).

@blazedd Have you tried it? I'm getting external ip addresses when following @mostolog's example.

I'm running up against this issue again.

My setup is as follows:

  • ipvs load balancer in DR mode (external to the docker swarm)
  • 3 docker nodes, with destination IP added to all nodes and arp configured appropriately for IPVS DR routing

I would like to deploy a stack to the swarm and have it listen on port 80 on the virtual IP without mangling the addresses.

I can almost get there by doing this:
ports:
  - target: 80
    published: 80
    protocol: tcp
    mode: host

The problem here is that it doesn't allow you to specify which IP address to bind to - it just binds to all. This creates problems if you want to run more than a single service using that port. It needs to bind only to the one IP. Using different ports isn't an option with DR load balancing. It seems that the devs made the assumption that the same IP will never exist on multiple nodes, which is not the case when using a DR load balancer.

In addition, if you use the short syntax, it will ignore the bind IP and still bind to all addresses. The only way I've found to bind to a single IP is to run a non-clustered container (not a service or stack).

So now I'm back to having to use standalone containers and having to manage them myself instead of relying on service/stack features to do that.
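Roughly, that fallback looks like this (the address is just an example), since plain -p lets you pick the bind IP:

docker run -d --restart=always --name web -p 203.0.113.10:80:80 nginx:alpine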

We have the same issue.
I'd vote for a transparent solution within docker ingress that'd allow all applications (some using raw UDP/TCP, not especially HTTP) to work as expected.

I could use the "mode=host port publishing" workaround, as my service is deployed globally.
However, it seems that this is incompatible with the use of the macvlan network driver, which I need for some other reasons.
We get logs like "macvlan driver does not support port mappings".
I tried using multiple networks, but it does not help.

I created a specific ticket here : https://github.com/docker/libnetwork/issues/2050
That leaves me no solution for now :'(

Hi guys,
Is there a workaround for now, without having it as a host-published port?


It is really a pity that it is not possible to get the client's IP. This makes most of docker swarm's nice features unusable.

On my setup the only way to get the client's IP is to use network_mode:host and not use swarm at all.

using mode=host port publishing or a traditional docker run -p "80:80" ... did not work

Some solutions were suggested in https://github.com/moby/moby/issues/15086 but the only solution that worked for me was "host" networking...

Another issue with not having the right IP is that nginx rate limiting doesn't work correctly and therefore cannot be used with the docker swarm load balancer, because requests are rate limited and denied as nginx counts them all as coming from a single user/IP. So the only workaround is to use mode=host, but that way I lose load balancing capabilities and have to point DNS to specific instances.

Perhaps docker is not the ideal tool for this job. I was looking into vagrant to set up the front-facing HTTP servers and put the client IP into the HTTP request headers.

Until Docker is capable of passing client information over overlay networks, one could use a proxy like Docker Flow Proxy or Traefik, publish the desired ports in host mode in that service and connect application services to it. Not a complete solution but works pretty well and allows for load balancing of application services / retrieval of client IP.

@deeky666 Traefik and similar work only if not dockerized

I don't see UDP support on traefik


Finally we gave up on docker container. It is not production ready!


The problem seems partially solved in 17.12.0-ce by using mode=host.

docker service create --publish mode=host,target=80,published=80 --name=nginx nginx

It has some limitations (no routing mesh) but works!

@goetas mode=host worked for a while as a workaround, so I wouldn't say the problem is solved. Using mode=host has lots of limitations: the port is exposed directly on the host, you can't use swarm load balancing, etc.

@darklow I know the limitations, but for my use case it is fine (if not even better!). In 17.09.1-ce it was not working at all, so for me this is already an improvement!

A large drawback of that workaround is that it is not possible to avoid downtime during updates.
Currently, we have to choose between giving up stability or giving up the source IP address.

I agree. Swarm needs a highly available way to preserve the source IP.

Probably using the proxy protocol. I don't think it's a huge effort to add proxy protocol support to docker swarm.

Is anyone looking into this?


@sandys I agree. The proxy protocol would be a great idea.
@thaJeztah @aluzzardi @mrjana could this issue get some attention please? There hasn't been any response from the team for a while. Thank you.

The Proxy protocol sounds like the best solution to me. Hopefully the team will consider it.

@goetas it worked at one point (at least I saw it working), but it seems to have reverted back to the 172.x.x.x behaviour again in docker 1.12.6.

This is VERY bad; it defeats any rate limiting, fraud prevention, logging, secure logins, session monitoring, etc.!
Listening with mode: host works, but it is no real solution, as you lose mesh load balancing and the software load balancer on the host that has the public IP has to handle all the traffic alone.

This is a very critical and important bug for us, and it is blocking our go-live with Swarm. We also believe the proxy protocol is the right solution for this: the Docker ingress must pass the source IP via the proxy protocol.

On twitter one of the solutions that has been proposed is to use Traefik as ingress managed outside of Swarm. This is highly suboptimal for us - and not an overhead that we would like to manage.

If the Swarm devs want to check out how to implement proxy protocol in Swarm-ingress, they should check out all the bugs being discussed in Traefik (e.g. https://github.com/containous/traefik/issues/2619)

I got this working consistently using "compose" rather than swarm mode. Maybe something to think about.

A few concerns with proxy protocol:

Is it decoded by docker itself, or by the application? If we are relying on the application to implement the proxy protocol, then this is not a general solution for all applications; it only works for web servers or other applications that implement the proxy protocol. If docker unwraps the proxy protocol and translates the address, then it will also have to track the connection state and perform the inverse translation on outgoing packets. I'm not in favor of a web-specific solution (relying on the proxy protocol in the application), as docker is useful for many non-web applications as well. This issue should be addressed for the general case of any TCP/UDP application - nothing else in docker is web-specific.

As with any other encapsulation method, there is also the concern of packet size/MTU issues. However, I think this is probably going to be a concern with just about any solution to this issue. The answer to that will likely be to make sure your swarm network supports a large enough MTU to allow for the overhead. I would think most swarms are run on local networks, so that's probably not a major issue.

@trajano - We know it works with host networking (which is likely what your compose solution is doing). However, that throws out all of the cluster networking advantages of swarm (such as load balancing).

@dack Backends must understand the proxy protocol.
I think it solves most cases, and at the very least you can put a thin passthrough-like proxy that processes the protocol header in front of your backends inside containers.
Because the lack of this information is a deadly issue, I believe it is necessary to solve it as fast as possible, ahead of other, neater solutions.

Proxy protocol has wide acceptance; check out the number of supported tools: https://www.haproxy.com/blog/haproxy/proxy-protocol/
That list doesn't even cover the cloud load balancers (ELB, Google LB) or newer tools like Traefik.

Also - this is pretty much the standard in kubernetes : https://github.com/kubernetes/ingress-nginx#proxy-protocol

At this point, proxy protocol is pretty much the most widely accepted standard for solving this problem. I don't see massive value in reinventing this and breaking compatibility with the nginxes of the world.

These are L7 protocols. Swarm ingress is L4. There is nothing being reinvented here, it's all IPVS using DNAT.

@cpuguy83 I couldn't understand what you just meant.

Proxy protocol is layer 4.
http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt

The PROXY protocol's goal is to fill the server's internal structures with the
information collected by the proxy that the server would have been able to get
by itself if the client was connecting directly to the server instead of via a
proxy. The information carried by the protocol are the ones the server would
get using getsockname() and getpeername() :

  • address family (AF_INET for IPv4, AF_INET6 for IPv6, AF_UNIX)
  • socket protocol (SOCK_STREAM for TCP, SOCK_DGRAM for UDP)
  • layer 3 source and destination addresses
  • layer 4 source and destination ports if any

http://cbonte.github.io/haproxy-dconv/1.9/configuration.html#5.1-accept-proxy

accept-proxy

Enforces the use of the PROXY protocol over any connection accepted by any of
the sockets declared on the same line. Versions 1 and 2 of the PROXY protocol
are supported and correctly detected. The PROXY protocol dictates the layer
3/4 addresses of the incoming connection to be used everywhere an address is
used, with the only exception of "tcp-request connection" rules which will
only see the real connection address. Logs will reflect the addresses
indicated in the protocol, unless it is violated, in which case the real
address will still be used. This keyword combined with support from external
components can be used as an efficient and reliable alternative to the
X-Forwarded-For mechanism which is not always reliable and not even always
usable. See also "tcp-request connection expect-proxy" for a finer-grained
setting of which client is allowed to use the protocol.
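For reference, this is roughly how that keyword is used in an haproxy frontend (illustrative names; it only helps if whatever sits in front of haproxy actually sends the PROXY header):

frontend fe_https
    # accept PROXY protocol v1/v2 on incoming connections
    bind :443 accept-proxy
    mode tcp
    default_backend be_app

backend be_app
    mode tcp
    server app1 10.0.0.10:443 check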

Did you mean there is a better way than the proxy protocol? That's entirely possible, and I would love to know more in the context of source IP preservation in docker swarm. However, the proxy protocol is more widely supported by other tools (like nginx, etc.) which will be downstream of swarm-ingress, as well as by tools like AWS ELB which will be upstream of swarm-ingress. That was my only $0.02

@sandys The proxy protocol looks like encapsulation (at least at connection initiation), which requires knowledge of the encapsulation from the receiver all the way down the stack. There are a lot of trade-offs to this approach.

I wouldn't want to support this in core, but perhaps making ingress pluggable would be a worthwhile approach.


That is true. That's pretty much why it's a standard with an RFC. There's momentum behind this though: pretty much every component of importance supports it. IMHO it's not a bad decision to support it.


This is a larger discussion; however, I might add that the single biggest advantage of Docker Swarm over others is that it has all the batteries built in.

I would still request you to consider the proxy protocol as a great solution to this problem, one which has industry support.

Is it not possible to simulate an L3 router on Linux and LXCs (not specifically docker)?

@trajano Simulation is not needed; encapsulation is needed to solve this issue.
For example, an option (e.g. --use-proxy-protocol) could be provided for services that need the client IP address and know how to treat encapsulated packets, such as nginx.
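As an illustration of what "knows how to treat encapsulated packets" would mean for nginx, a minimal sketch using standard nginx directives (the 10.255.0.0/16 trust range and the upstream name are assumptions, and this only works once something in front actually sends the PROXY header):

server {
    # accept the PROXY protocol header on incoming connections
    listen 80 proxy_protocol;

    # trust the PROXY header only from the assumed ingress range
    set_real_ip_from 10.255.0.0/16;
    real_ip_header proxy_protocol;

    location / {
        # hand the recovered client address to the application
        proxy_set_header X-Real-IP $proxy_protocol_addr;
        proxy_pass http://app:3000;
    }
}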

As it currently works, the docker node that receives the packet performs SNAT and forwards the packet to the node with the application container. If some form of tunneling/encapsulation were used instead of SNAT, then it should be possible to pass the original un-altered packet to the application.

This is a solved problem in other projects. For example, with OpenStack you can use tunnels like GRE and VXLAN.

Is anyone in the recent part of this thread here to represent the docker team and at least say that 'we hear you' ? Seems quite something that a feature you would expect to be 'out of the box' and of such interest to the community is still not resolved after being first reported August 9th 2016, some 18 months ago.

Is anyone in the recent part of this thread here to represent the docker team and at least say that 'we hear you' ?

/cc @GordonTheTurtle @thaJeztah @riyazdf @aluzzardi

@bluejaguar @ruudboon I am part of Docker. This is a well known issue. Right now the network team is focused on long standing bugs with overlay networking stability. This is why there haven't really been new networking features in the last few releases.

My suggestion would be to come up with a concrete proposal that you are willing to work on to resolve the issue or at least a good enough proposal that anyone could take it and run with it.

@cpuguy83 i have been following some of the incoming proxy protocol features in k8s. E.g. https://github.com/kubernetes/kubernetes/issues/42616 (P.S. interestingly the proxy protocol here is flowing in from the Google Kubernetes Engine, which supports proxy protocol natively in HTTPS mode).

In addition, ELB has added support for Proxy Protocol v2 in Nov 2017 (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/doc-history.html)

Openstack Octavia LB-as-a-service (similar to our ingress) merged proxy protocol last April - http://git.openstack.org/cgit/openstack/octavia/commit/?id=bf7693dfd884329f7d1169eec33eb03d2ae81ace

Here's some of the documentation around proxy protocol in OpenShift - https://docs.openshift.com/container-platform/3.5/install_config/router/proxy_protocol.html
Some of the nuances are around proxy protocol for https (both in cases when you are terminating certificates at ingress or not).

Any updates or workarounds regarding this issue? We really need to know the client IP in docker swarm mode.
Any help would be much appreciated.

My Version:

Client:
Version: 18.02.0-ce
API version: 1.36
Go version: go1.9.3
Git commit: fc4de44
Built: Wed Feb 7 21:16:33 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm

Server:
Engine:
Version: 18.02.0-ce
API version: 1.36 (minimum version 1.12)
Go version: go1.9.3
Git commit: fc4de44
Built: Wed Feb 7 21:15:05 2018
OS/Arch: linux/amd64
Experimental: false

@adijes, and other users who are facing this issue: you can bind the containers to the bridge network (as mentioned by someone in this thread).

version: "3.4"

services:
  frontend:
    image: nginx
    deploy:
      placement:
        constraints:
          - node.hostname == "prod1"
    networks:
      - default
      - bridge
  # backend services...
  # ...

networks:
  bridge:
    external:
      name: bridge

Our frontend is bound to bridge and always stays on one specific host, whose IP is bound to our public domain. This enables it to receive the real user IP. And because it's also bound to the default network, it is able to connect to the backend services.

You can also scale the frontend, as long as you keep it on that single host. This makes the host a Single Point of Failure, but (I think) it's OK for a small site.

Edited to add more information:

My nginx containers are behind https://github.com/jwilder/nginx-proxy; I also use https://github.com/JrCs/docker-letsencrypt-nginx-proxy-companion to enable SSL. The nginx-proxy is run via the docker run command, not as a docker swarm service. Perhaps that's why I get the real IP from clients. The bridge network is required to allow my nginx containers to communicate with nginx-proxy.

FWIW, I'm using:

Client:
 Version:      17.09.1-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec  7 22:23:40 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.1-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec  7 22:25:03 2017
 OS/Arch:      linux/amd64
 Experimental: false

Above setup also works on another setup, which is running:

Client:
 Version:      17.09.1-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec  7 22:23:40 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.1-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec  7 22:25:03 2017
 OS/Arch:      linux/amd64
 Experimental: false

@letientai299 that does not work for me I get

network "bridge" is declared as external, but it is not in the right scope: "local" instead of "swarm"

I have a master and three worker nodes.

@trajano, see my update.

@letientai299 actually I was wondering how you got bridge to work in swarm mode. i.e. you didn't get the error I have.

@dack when you say host networking I presume you mean having

ports:
- target: 12555
  published: 12555
  protocol: tcp
  mode: host

Unfortunately, when run in docker stack deploy mode it does not work and still loses the source IP whereas docker-compose up works correctly.

I had also tried the following based on @goetas

docker service create --constraint node.hostname==exposedhost \
  --publish published=12555,target=12555,mode=host \
  trajano.net/myimage

Still no luck getting the source IP; this is on Server Version: 17.12.0-ce

Seems like something everyone would want at some point, and since using overlay networks together with bridge/host networking is not really possible, this is a blocker in cases when you really need the client IP for various reasons.

Client:
Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:03:51 2017
OS/Arch: darwin/amd64

Server:
Engine:
Version: 17.12.1-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:17:54 2018
OS/Arch: linux/amd64
Experimental: true

It's 2018. Is there anything new on this issue?
In swarm mode, I can't use nginx request limiting; $remote_addr always shows 10.255.0.2.
This is a really serious problem with docker swarm.
Perhaps I should try Kubernetes starting today.

@Maslow I posted where we are just a few comments above.

Can we relax the check for

networks:
  bridge:
    external:
      name: bridge

or extend it like

networks:
  bridge:
    external:
      name: bridge
      scope: local

and scope: local networks are only allowed if network mode is host

network "bridge" is declared as external, but it is not in the right scope: "local" instead of "swarm"

or allow

networks:
  bridge:
    driver: bridge

To not fail with

failed to create service trajano_serv: Error response from daemon: The network trajano_bridge cannot be used with services. Only networks scoped to the swarm can be used, such as those created with the overlay driver.

when having mode: host on published ports.

ports:
- target: 32555
  published: 32555
  protocol: tcp
  mode: host

@trajano You can use non-swarm scoped networks with swarm already... e.g. this works:

version: '3.4'

services:
  test:
    image: alpine
    command: top
    ports:
      - target: 32555
        published: 32555
        protocol: tcp
        mode: host
    networks:
      - bridge

networks:
  bridge:
    external:
      name: bridge

Did you test this on a swarm with more than one worker with docker stack deploy? I know it works with compose.


Yes, I'm doing this through swarm...


+1

Having this issue with the following docker swarm load balancing with 3 nodes:

overlay network <-> nginx proxy jwilder docker <-> nginx web head docker

I followed the suggestions and the logs keep returning the docker network IP 10.255.0.3 instead of the real client IP.

+1

@cpuguy83 this has started becoming a blocker for our larger swarm setups. As we start leveraging more of the cloud (where the proxy protocol is used de facto by load balancers), we are losing this info, which is very important to us.

Do you have any idea of an ETA? It would help us a lot.

@sandys An ETA for what exactly?

@cpuguy83 hi, thanks for your reply. I'm aware there is no broad agreement on how you want to solve it. I'm commenting on how the team has been occupied with stability issues and is not freed up for this one.
When do you think this issue would be taken up (if at all)?

Note you can solve this problem by running a global service and publishing ports using PublishMode=host. If you know which node people will be connecting on, you don't even need that, just use a constraint to fix it to that node.
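A minimal sketch of that suggestion (service names and images are placeholders):

# publish directly on each node's NIC, one task per node, bypassing the ingress mesh
docker service create \
  --name web \
  --mode global \
  --publish mode=host,target=80,published=80 \
  nginx

# or pin the service to a single known node instead of running it globally
docker service create \
  --name web \
  --constraint node.hostname==edge-node-1 \
  --publish mode=host,target=80,published=80 \
  nginx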

@kleptog Only partially: it can't avoid downtime while updating the service.

Test scenario - closer look into lvs/ipvs.

  • nsenter to the hidden ingress container and delete the snat rule
  • nsenter to the service with published ports, delete default gw and add default route to ingress containers ip.

Now the source ip will be preserved.
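For anyone who wants to poke at this themselves, a hedged sketch of how to look at that hidden ingress namespace and its SNAT rule (the ingress_sbox namespace name is what recent engines use; the exact rule text will differ per setup):

# list the network namespaces docker created (requires root)
sudo ls /var/run/docker/netns/

# dump the NAT rules inside the hidden ingress namespace;
# the SNAT/MASQUERADE entry here is what rewrites the client source IP
sudo nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -nL POSTROUTING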

I am still trying to understand the implications and overhead of maintaining policy-based routing within each service container instead of having only the SNAT rule in the ingress container.
It would be really relieving, though, to have this working.

Sorry for my naive fiddling, but could anyone ( @dack ? ) point me to the docker code, where this is done?

Ah, now I understand. In a multi-node swarm the IP has to be the LVS director's, so the reply finds its way back to the right node the request came in on...

It would be interesting to see the code anyways, though. It could save me some time if anyone already knew. Thank you

Any updates on this? We have three clusters in different countries, and even Azure Traffic Manager needs real user IPs; without them it will not redirect users to the right cluster. Will anybody look at this, soon or ever? Thanks

Also need an update on this - this is a huge gotcha. The only way I found around it is to add another proxy in front and send x-forwarded-for to the stack, which kind of means that Swarm is a non-option for public-facing traffic in many scenarios.

@cpuguy83 @trajano
I can confirm that the following does not work

version: '3.4'
services:
  nginx:
    ports:
      - mode: host
        protocol: tcp
        published: 80
        target: 80
      - mode: host
        protocol: tcp
        published: 443
        target: 81
networks:
  bridge:
    external:
      name: bridge

It fails with network "bridge" is declared as external, but it is not in the right scope: "local" instead of "swarm".

docker version

Client:
 Version:       18.03.0-ce-rc4
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    fbedb97
 Built: Thu Mar 15 07:33:59 2018
 OS/Arch:       windows/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.03.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:        Wed Mar 21 23:08:31 2018
  OS/Arch:      linux/amd64
  Experimental: false

@Mobe91
Try recreating the swarm. I also had that error; after re-initializing the swarm, everything worked for me.
My docker-compose.yml file:

version: "3.6"

services:
    nginx:
        image: nginx:latest
        depends_on:
            - my-app
            - my-admin
        ports: 
            - target: 80
              published: 80
              protocol: tcp
              mode: host
            - target: 443
              published: 443
              protocol: tcp
              mode: host
            - target: 9080
              published: 9080
              protocol: tcp
              mode: host
        volumes:
            - /etc/letsencrypt:/etc/letsencrypt:ro
            - /home/project/data/nginx/nginx.conf:/etc/nginx/nginx.conf:ro
            - /home/project/data/nginx/conf.d:/etc/nginx/conf.d
            - /home/project/public:/var/public
        networks:
            - my-network
            - bridge
        deploy:
            placement:
                constraints: [node.role == manager]

    my-app:
        image: my-app
        ports:
            - 8080:8080
        volumes:
            - /usr/src/app/node_modules
            - /home/project/public:/usr/src/app/public
        networks:
            - my-network

    my-admin:
        image: my-admin
        ports:
            - 9000:9000
        networks:
            - my-network

networks:
    my-network:
    bridge:
        external: true
        name: bridge

my docker version:

Client:
 Version:   18.03.0-ce
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    0520e24
 Built: Wed Mar 21 23:10:01 2018
 OS/Arch:   linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:  18.03.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:    Wed Mar 21 23:08:31 2018
  OS/Arch:  linux/amd64
  Experimental: false

Sorry for my English.

@Mobe91 this is what I used but I deploy from "portainer" or on the Linux machine. I can't get it to deploy properly from Windows.

version: '3.4'
services:
  hath:
    image: trajano.net/hath
    deploy:
      placement:
        constraints:
        - node.hostname==docker-engine
    networks:
    - host
    ports:
    - target: 12555
      published: 12555
      protocol: tcp
      mode: host
    secrets:
    - hath_client_login
    volumes:
    - hath:/var/lib/hath
volumes:
  hath:
    name: 'noriko/s/hath'
    driver: cifs
networks:
  host:
    external:
      name: host
secrets:
  hath_client_login:
    external:
      name: hath_client_login

The key difference is that I use host rather than bridge. In my case I am also running my hosts as VirtualBox VMs, and I use a router which does NAT routing; that preserves the incoming IP all the way to the container.

Of course there's no load balancing capability. I think if you want load balancing you need something in front, like an L3 router, that does the load balancing.

@trajano is right, the Windows client was the problem, deployment with the Linux client worked.

But I don't understand why you even need the host or bridge network?
The following works just fine for me, i.e. I get real client IP addresses in nginx:

version: '3.4'
services:
  nginx:
    ports:
      - mode: host
        protocol: tcp
        published: 80
        target: 80

@Mobe91 thanks, I was meaning to open an issue for that. Basically it ties in with https://github.com/moby/moby/issues/32957 since it still occurred with the 18.03-ce client for Windows.

has anyone used Cilium ? http://cilium.readthedocs.io/en/latest/gettingstarted/docker/ .

It seems as if it might be able to fix this without tying services to host.

@sandys good find - I'm about to start testing it. Did it work for you? I'm just about to pull nginx out of my swarm if I can't fix this...

We hit this in redesigning our deployment to avoid pinning proxies to individual hosts (which, in production, bind to an interface for other reasons and therefore "pick up" the client IP as a byproduct).

In our test environment we can improve only by deploying to managers by constraint and setting mode = global to ensure each manager gets a running instance. It's still extra overhead to have to be aware of, particularly if we lose a manager node and something is directing our traffic to it. However, it's better than being pinned to single host.

@sandys did you try Cilium? Looks similar to Weave, which appears to suffer the same problem at least with k8s: https://github.com/kubernetes/kubernetes/issues/51014

I haven't been able to use Cilium, but I have reached out to the Cilium
devs to help out in swarm config. But I'm quite excited about Cilium
because ingress is a stated problem it wants to solve (unlike weave)


+1

hi guys,
if you want Docker Swarm support in Cilium (especially for ingress and
around this particular problem), please comment/like on this bug -
https://github.com/cilium/cilium/issues/4159


For me, with the current version, it works like this:
I can then access the other nodes in the swarm, as the service is also on the 'default' network.

  web-server:
    image: blabla:7000/something/nginx:latest
    #ports:
    #  - "80:80"
    #  - "443:443"
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host        
    deploy:
      mode: global
      restart_policy:
        condition: any
      update_config:
        parallelism: 1
        delay: 30s

I can confirm that the key is to use ports.mode: host. From the documentation (https://docs.docker.com/compose/compose-file/#long-syntax-1):

mode: host for publishing a host port on each node, or ingress for a swarm mode port to be load balanced.

Then, using mode: host, the service stops being load balanced by ingress and the real IP appears. As an example, here are my nginx logs:

  • with mode: host
    metrics-agents_nginx.1.pip12ztq3y1h@xxxxxxxx | 62.4.X.X - - [12/Jun/2018:08:46:04 +0000] "GET /metrics HTTP/1.1" 200 173979 "-" "Prometheus/2.2.1" "-" [CUSTOM] "request_time: 0.227" remote_addr: 62.4.X.X proxy_add_x_forwarded_for: 62.4.X.X
  • without mode: host
    metrics-agents_nginx.1.q1eosiklkgac@xxxxxxxx | 10.255.0.2 - - [12/Jun/2018:08:50:04 +0000] "GET /metrics HTTP/1.1" 403 162 "-" "Prometheus/2.2.1" "-" [CUSTOM] "request_time: 0.000" remote_addr: 10.255.0.2 proxy_add_x_forwarded_for: 10.255.0.2

And if you wonder why the last log is a 403 Forbidden response, it is because of a whitelist on nginx (allow 62.4.X.X and deny all).

Context:
Description: Debian GNU/Linux 9.4 (stretch)
Docker version 18.03.0-ce, build 0520e24

I confirm what @nperron said.
Using host mode allows getting the client IP.

Docker version 18.03.1-ce, build 9ee9f40
Ubuntu 16.04.4 LTS

I can confirm it is working.

Docker version 18.03.1-ce, build 9ee9f40
Ubuntu 16.04.4 LTS

CAVEAT: THIS WILL NOT WORK IF YOU HAVE SET IPTABLES=FALSE!
You may have done this (or at least I did) if you are using UFW to secure ports and found docker swarm to be overriding those UFW settings.

There are some tutorials around that suggest setting iptables=false, either via a command-line flag or in /etc/docker/daemon.json

Hopefully this saves someone the frustration I just went through!
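For clarity, this is the daemon.json setting the caveat refers to; if your /etc/docker/daemon.json contains the following, the mode: host approach described above will not behave as expected:

{
  "iptables": false
}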

People really should stop saying "mode: host" = working, because that's not using ingress. It makes it impossible to have just one container with a service running on the swarm and still be able to access it via any host. You either have to make the service "global" or you can only access it on the host it is running on, which kind of defeats the purpose of Swarm.

TLDR: "Mode: Host" is a workaround, not a solution

@r3pek While I agree with you that you lose ingress if you use host mode to solve this predicament, I'd say that it hardly defeats the whole purpose of Swarm, which does so much more than a public-facing ingress network. In our usage scenario we have in the same overlay swarm:
management replicated containers that should only be accessed over the intranet -> they don't need the caller's IP, therefore they are configured "normally" and take advantage of the ingress.
non-exposed containers -> nothing to say about these (I believe you are underestimating the power of being able to access them via their service name though).
public facing service -> this is an nginx proxy that does https and url based routing. It was defined global even before the need to x-forward-for the client's real IP, so no real issue there.

Having nginx global and not having ingress means that you can reach it via any IP of the cluster, but it's not load balanced or fault tolerant, so we added a very cheap and easy to set up L4 Azure Load Balancer in front of the nginx service.

As you say, host is a workaround, but saying that enabling it completely defeats the purpose of Docker Swarm is a little exaggerated imo.

Hi Roberto,
I don't think it is exaggerated, because host mode exposes single points of failure. Moreover, it expects additional layers of management for load balancing outside the swarm ecosystem.

By saying that you used an Azure LB yourself, you have kind of validated that argument.

It is tantamount to saying "to run swarm with client IP propagation, make sure you are using an external load balancer that you set up, or use one of the cloud services".

We are not saying that it is not a temporary workaround, but it would be ignoring the promise of Swarm if we all did not categorically recognize the shortcoming.


It is clear that a poor load balancer (IPVS) was picked for the Docker Swarm ingress. If it supported at least the L4 proxy protocol, then this would not be an issue, although it would still be an L4 (TCP) load balancer without all the extra features an L7 LB can give.

In Kubernetes there are L4(TCP)-L7(HTTP) load balancers like nginx ingress, haproxy ingress which both allow usage of the L4 proxy protocol or the L7 HTTP headers to ensure X-Forwarded-For is leveraged for passing the user's real IP to the backend.

I am wondering what the Docker Swarm ingress developers would say. Probably someone has to move this case to https://github.com/docker/swarmkit/issues?

AFAICS, those K8s LB services (nginx ingress, haproxy ingress) are not embedded into K8s but are services which need to be explicitly deployed. You can do the same with Docker swarm as well. I do not see a difference here. (Apart from the fact that the nginx ingress controller seems to be "official".)

As far as I know, the difference is that even if you deploy such a load-balancing service, it will be 'called' through the swarmkit load balancer and so you lose the user's IP. So you cannot disable the swarmkit load balancer unless you use host mode.

To be fair, in k8s it is possible to have a custom ingress; in swarm it is not.

Swarm takes the stand that everything is "built-in". Same is the case with networks: in k8s you need to set up Weave, etc.; in swarm it's built in.

So the point that Andrey is making (and I kind of agree with) is that swarm should make this feature part of the ingress, since the user has no control over it.


I thought we were finished ironing out our swarm wrinkles, but then we got to staging and noticed that all external access to the web server container appears as the ingress network IP.

I'm running my stack on a single-node swarm and will be doing so for at least the next few months. Can you recommend the least bad workaround for our current (single-node swarm) use case? I can't do without the client IP--too much relies on it.

Our temporary approach has been to run a simple proxy container in “global” mode (which IIRC can get the actual NIC’s IP) and then have it forward all connections to the internal service running on the swarm overlay network with added proxy headers.

If getting an x-forwarded-for header is enough for you, that setup should work AFAICT.

Thanks, @maximelb. What did you end up going with (e.g., nginx, haproxy)?

@jamiejackson that’s where things will be a bit different. In our case we are running a server that hosts long-running SSL connections and a custom binary protocol underneath so HTTP proxies were not possible. So we created a simple TCP forwarder and used a “msgpack” header that we could unpack manually on the internal server.

I’m not super familiar with HTTP proxies but I suspect most of them would do the trick for you. :-/

Hi Maxime,
this is very interesting for us. Can you share your docker-compose by any chance?

I'm trying to understand how this works. Today we have nginx as a reverse proxy (as a service) and multiple docker services behind it.

In your case, does nginx become the "global mode" proxy? Or is it a special TCP forwarder? So as you scale the number of nodes, the proxy forwarder goes on each node. I somehow thought that in this situation the x-forwarded-for header gets lost, because the ingress network kills the external IP (since there is no proxy protocol).

We would be super grateful if you can help us out with some more details.


@sandys sure, here is an excerpt from our docker-compose with the relevant containers.

This is the reverse proxy docker-compose entry:

reverseproxy:
    image: yourorg/repo-proxy:latest
    networks:
      - network_with_backend_service
    deploy:
      mode: global
    ports:
      - target: 443
        published: 443
        protocol: tcp
        mode: host

This is the backend service entry:

backendservice:
    image: yourorg/repo-backend:latest
    networks:
      - network_with_backend_service
    deploy:
      replicas: 2

The target of the reverseproxy (the backend side) would be tasks.backendservice (which has A records for every replica). You can skip the networks part if the backend service is on the default swarm overlay network.

The global bit says "deploy this container exactly once on every Docker swarm node". The ports mode: host is the one saying "bind to the native NIC of the node".

Hope it helps.
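A quick way to see what the reverse proxy resolves against, run from any container attached to the same overlay network (assuming nslookup is available in the image; tasks.<service> returns one A record per running replica instead of the service VIP):

nslookup tasks.backendservice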

You are using host mode. So pretty much you have an external load balancer
in front of the whole thing.
You can't depend on Swarm anymore because you are in host mode.

That's actually the problem we have been talking about for a while :(


Not 100% sure what you mean, but externally we use a DNS with an A record per cluster node. This provides cheap "balancing" without an external moving part. When a client makes a request, they choose a random A record and connect to 443 on one of the cluster nodes.

There, the reverse proxy that is running on that specific node and listening on 443 gets a native connection, including the actual client IP. That reverse proxy container then adds a header and forwards the connection to another internal container using the swarm overlay network (tasks.backend). Since it uses the tasks.backend target, it will also get a random A record for an internal service.

So in the strict sense, it is bypassing the magic of the overlay network that redirects the connection. Instead it kind of replicates this behavior with the reverse proxy and adds a header. The final effect is the same (in a loose sense) as the magic of the overlay network. It also does it in parallel to running the swarm, meaning I can run all my other services that do not require the client IP on the same cluster without doing anything else for those.

By no means a perfect solution but until a fix is made (if ever) it gets you by without external components or major docker configuration.

@jamiejackson the "least bad" workaround we've found is using Traefik as a global service in host mode. They have a good generic example in their docs. We've seen some bugs that may or may not be related to this setup, but Traefik is a great project and it seems pretty stable on Swarm. There's a whole thread on their issues page on it (that loops back here :) ), with similar workarounds:
https://github.com/containous/traefik/issues/1880

Hope this helps. We also can't use a solution that doesn't allow us to check actual requester IPs so we're stuck with this kludge fix until something changes. It seems like a pretty common need, for security reasons at least.

Understood (and a loose version of this is what we use).

However, this particular bug's agenda was to request that the devs build that into the magical overlay network (perhaps by using proxy protocol or other mechanisms).


TBH I'm not sure why the ingress network is not being patched to add IP data via proxy protocol.

It's incremental, it won't break existing stacks, it is a well defined standard, it's widely supported by even the big cloud vendors, and it's widely supported by application frameworks.

Is it a significant dev effort?


Well, Docker does not currently touch the ingress traffic itself, so this would definitely not be an insignificant thing to add.
Keep in mind also this is an open source project, if you really want something then it's generally going to be up to you to implement it.

+1, this really is a showstopper.
I would believe the majority of applications need the real client's IP. Just think of a mail server stack: you can't afford to accept mail from arbitrary hosts.

We switched to a global nginx stream instance in host mode speaking proxy_protocol, which forwards to the replicated application proxy nginx. This works well enough for the moment.

service global nginx_stream

stream {
    resolver_timeout 5s;
    # 127.0.0.11 is docker swarms dns server
    resolver 127.0.0.11 valid=30s;
    # set does not work in stream module, using map here
    map '' $upstream_endpoint {
        default proxy_nginx:443;
    }

    server {
        listen 443;
        proxy_pass $upstream_endpoint;
        proxy_protocol on;
    }
}

service replicated nginx_proxy

server {
    listen 443 ssl http2 proxy_protocol;
    include /ssl.conf.include;

    ssl_certificate /etc/nginx/certs/main.crt;
    ssl_certificate_key /etc/nginx/certs/main.key;

    server_name example.org;

    auth_basic           "closed site";
    auth_basic_user_file /run/secrets/default.htpasswd;

    # resolver info in nginx.conf
    set $upstream_endpoint app;
    location / {
        # relevant proxy_set_header in nginx.conf
        proxy_pass http://$upstream_endpoint;
    }
}
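The config above references the "relevant proxy_set_header in nginx.conf"; a hedged guess at what those lines typically look like when the listener has proxy_protocol enabled ($proxy_protocol_addr is the client address recovered from the PROXY header):

# inside the http {} block of nginx.conf (illustrative)
proxy_set_header Host              $host;
proxy_set_header X-Real-IP         $proxy_protocol_addr;
proxy_set_header X-Forwarded-For   $proxy_protocol_addr;
proxy_set_header X-Forwarded-Proto $scheme;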

Would it be possible to paste the whole nginx config for nginx_stream and nginx_proxy with their Swarm configs?

This would be awesome if it works!


@sandys I've got a haproxy based solution for the proxy protocol part which is configured via environment variables.


@sandys Something like this:
https://gist.github.com/rubot/10c79ee0086a8a246eb43ab631f3581f

Running into the same issue. Is this going to be addressed? It seems like basic functionality that should be slated for a release.

deploy:
  mode: global
ports:
  - target: 443
    published: 443
    protocol: tcp
    mode: host

Following this advice fixes the issue, as the docker swarm balancer is now out of the equation.
For me it is a valid solution since it is still HA and I already had haproxy (inside the docker flow proxy container).
The only issue is that the haproxy stats are distributed among all the replicas, so I need to somehow aggregate that info when monitoring traffic for the whole cluster. In the past I had just one haproxy instance behind the docker swarm balancer.
Cheers,
Jacq

When reading the OP's request ( @PanJ ), it seems current features now solve this problem, as have been suggested for months. The OP didn't ask for ingress routing + client IP AFAIK, they asked for a way to have a swarm service in replica/global obtain client IP's, which is now doable. Two main areas of improvement allows this to happen:

  1. We can now create a Swarm service that "publishes" a port to the host IP, skipping the ingress routing layer
  2. That same service can attach to other networks like overlay at the same time, so it can access other services with overlay benefits

For me with 18.09 engine, I get the best of both worlds in testing. A single service can connect to backend overlay networks and also publish ports on the host NIC and see real client IP's incoming on the host IP. I'm using that with traefik reverse proxy to log client IP traffic in traefik that is destined for backend services. I feel like this could solve most requests I've seen for "logging the real IP".

@PanJ does this solve it for you?

The key is to publish ports in mode: host rather than mode: ingress (the default).

The pro of this mode is you get real client IPs and native host NIC performance (since it's outside IPVS encapsulation AFAIK). The con is that it will only listen on the node(s) running the replicas.

To me, the request of "I want to use ingress IPVS routing and also see client IP" is a different feature request of libnetwork.

What has changed here? We have been using host mode to do this for a long time now; in fact that is the workaround suggested in this thread as well.

The problem is that, of course, you have to lock this service to a particular host so Swarm can't schedule it elsewhere, which is exactly what this issue is about: proxy protocol/IPVS, etc. would solve this problem.


@BretFisher the mode: host is only a workaround, not the solution. As @sandys said, the workaround has a few caveats, so we should not consider this issue fixed.

I'm not sure if there's been any improvement since the workaround was discovered. I moved to Kubernetes quite a long time ago and am still surprised that the issue has been open for over two years.

I'm still kind of surprised why people think this is a bug. From my perspective, even the statement "moving to Kubernetes" is not an adequate answer. As I see it, Kubernetes has exactly the same problem/behavior: you either have an external LB, or you use something like the nginx ingress proxy, which must run as a DaemonSet. Please correct me if I am wrong, but we have the exact same situation here, just no prepared auto-solution. Somebody could check and package my proposed TCP stream solution described above to get something like the nginx ingress proxy behavior. Just accept that swarm needs to be customized by yourself.


You could even extend the dockerflow project and add an nginx variant to act as a kubernetes-style ingress proxy for swarm. Granted, all of this packed with swarm would add more system containers, and as you know there are a bunch of them in kubernetes. Isn't the strength of swarm for slim resource projects that it stays lean?


Those are complex solutions - proxy protocol just adds additional header information and is a very well known standard - haproxy, nginx, AWS ELB, etc. all follow it. https://www.haproxy.com/blog/haproxy/proxy-protocol/

The surface area of the change would be limited to the Swarm built-in ingress (where this support would be added), and all services will have it available.
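For context, PROXY protocol v1 is just a single human-readable line prepended to the TCP stream before any application data; the addresses below are purely illustrative:

PROXY TCP4 203.0.113.7 10.0.0.5 56324 443\r\n

The receiving application (haproxy, nginx, Traefik, Apache with mod_remoteip, etc.) parses and strips this line and treats 203.0.113.7 as the client address.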


As I said, check the TCP stream solution above, which already utilizes proxy protocol.
Adding proxy protocol upstream in swarm would also require configuration inside the container. I see no value in it, besides a cleaner and maybe better documented goal in your request.
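To illustrate the in-container configuration being referred to: an nginx-based service would need something along these lines to accept and decode the PROXY protocol header (a sketch; the 10.255.0.0/16 trusted range and the app:3000 upstream are assumptions):

server {
    # accept a PROXY protocol header on incoming connections
    listen 80 proxy_protocol;
    # only trust the header when it arrives from the ingress range (assumed subnet)
    set_real_ip_from 10.255.0.0/16;
    # use the address from the PROXY header as the client address ($remote_addr)
    real_ip_header proxy_protocol;

    location / {
        proxy_pass http://app:3000;
        proxy_set_header X-Real-IP $remote_addr;
    }
}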


The solution above requires host mode binding. That is the big issue. It eliminates the possibility of using the docker scheduler to allocate containers to different hosts - I am no longer part of the mesh network.


As I said, the kubernetes nginx ingress needs host mode binding too (as a daemonset). An external LB connects to nodeports, which also requires host mode on the service, or manually configuring proxy protocol in the service. Kubernetes still deals with the same problems.
One possible feature request from my point of view would be to make swarm's networking provider pluggable. This would make it possible to use techniques other than lvs/iptables.


And just to clarify, the above solution has a TCP stream in front of the service proxy. So your request is definitely not a bug, but a feature request. And this feature could only be implemented in swarm if the networking mode changed, as the main problem remains losing the IP at the NAT/host level.


  1. After such a long thread, I was trying to document the current feature set with a complete example.
  2. I don't see your specific need in the OP's request. @PanJ asked to see client IPs out of the box, which as of mid 2018 you can do. I don't see them requiring that it also use the ingress routing mesh.

Whether you call it a bug or a feature request, ingress mesh without source NAT is (in my opinion) essential. There are many applications that break when they can't see the true source IP. Sure, in the case of web servers you can reverse proxy using a host-mode node and add client IP headers. However, this adds overhead and is probably not an option for non-web-based applications. With an application that actually needs the real source IP on the packet to be correct, the only option is to not use the ingress mesh. That throws out a large part of the benefit of using swarm in the first place.

Please let us know whether this issue is going to be fixed or not?!
Should we use Kubernetes instead?

I ran into the same issue... I've not found a fix at the moment.

When someone finds a solution for this behaviour, please report it here.

Thanks!

I have the same issue. I have an apache httpd server and I want to log all access in order to extract statistics later about what countries we are receiving requests from.

I stumbled upon this issue myself while trying to figure out why php:apache wasn't logging the host header field correctly. I am shocked and disappointed this isn't working yet after all these years. How are we supposed to use Swarm Mode for web hosting when the host field keeps logging the userland proxy IP? I haven't been able to find a way around this with Swarm Mode. I suppose I could use Classic Swarm (container based) and something like Consul but I feel that's going backwards.

I found an acceptable solution for my scenario:

services:
  server:
    image: httpd:2
    deploy:
      mode: global
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    networks:
      - my_second_service
      - another_great_software

This will cause apache to listen on the host computer instead of behind the overlay network (reading the proper remote IP address), while still proxying requests to other services via the networks options and achieving "high availability" by having it running everywhere

@rafaelsierra - here's the issue I have with this (and correct me if I'm wrong), but this configuration only allows for one Apache/PHP container to be running and binding to port 80 on the host node. I need to run lots and lots of Apache containers with an Nginx container bound to port 80/443, and then vhost them.

@SysEngDan yes, it is true that you can only have a single container binding to the 80/443 ports, but in my case that is not a problem because the container that binds to this port is only responsible for proxying all requests to other containers that are running behind the overlay network.

You can probably use the same solution by having a single nginx/apache container receiving all requests and proxying to the proper container based on the vhost; those containers don't have to bind to the host.

@rafaelsierra - Respectfully, I'm not sure if you understand the issue documented in this ticket. If I configure services as you mentioned in your last paragraph, the issue is that the client IP isn't passed to the containers only listening on the overlay network. If I bind directly to the host, it's not an issue. If we rely on docker network proxying from external (host) to internal (overlay), the destination Apache container won't receive the original client IP address but instead the IP of the proxy (from docker networking).

@SysEngDan I do understand the issue, and since there is no solution for the past 2 years (and I am honestly not sure if this is "fixable"), I had to come up with an alternative solution that fits my need (restrict access based on remote IP address).

Having a single container listening on port 80/443 on the host and then proxying to other containers (with appropriate HTTP headers that I didn't mention because they are outside the scope of this issue) solved my problem, and I wanted to share this solution for the people who face a similar problem due to overlay networks not being able to pass the remote IP address.

Oh I see what you did there..... sorry, I missed that. You cut out the overlay network and instead attached your external facing container directly to the service network (that which is created automatically when you start a new service without specifying a network). Ok, I think that does work. The added overhead is the task of adding the service network to the docker-compose file. I wonder what happens when the host-container starts and one of those services isn't available?

In that case you will get a 502.

I don't have a single docker-compose.yml, I have multiple stacks with multiple services that talks to each other via overlayed network, and then I have the public facing service that is binding to the host server but still have access to the all other overlayed networks so it can proxy all requests.

The host mode workaround has been discussed multiple times on this issue already. While it may be OK for some limited scenarios (such as certain reverse proxy web traffic setups), it is not a general solution to this problem. Please read the previous posts rather than re-hashing the same "solutions" over again.

@darrellenns there are over 200 comments here. I think it would be better to lock and clean this issue, providing the basic "just use host bind if it applies to you" workaround while no official solution is provided; otherwise more people like me will miss that and just keep commenting the same stuff over and over.

So, I believe that this bug affects traefiks ability to whitelist ips. Is that correct?

Anyway, for anybody looking to run swarm mode, this is an example with using host mode to publish ports.

docker service create \
--name traefik \
--constraint=node.role==manager \
--publish mode=host,target=80,published=80 \
--publish mode=host,target=443,published=443 \
--mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
--mount type=bind,source=/home/$USER/dev-ops/logs,target=/dev-ops/logs \
--mount type=bind,source=/opt/data/traefik/traefik.toml,target=/traefik.toml \
--mount type=bind,source=/opt/data/traefik/acme.json,target=/acme.json \
--network traefik \
--label traefik.frontend.rule=Host:traefik.example.com \
--label traefik.port=8080 \
traefik \
--docker \
--docker.swarmMode \
--docker.watch \
--docker.exposedByDefault

@coltenkrauter I do not know exactly what it affects, but in host mode I can only run one replica of the traefik service, and I do not think it is just me. This way I have to fully trust traefik's stability without relying on swarm mode features for services.

Also, as first reported, this has not much to do with traefik's special needs; it was tested with a generic http service that does not receive the original IP. That means docker swarm mode is broken (this important feature is missing), and it looks like nobody cares about it.

And I want to keep commenting on this, because I hope the noise is disturbing someone who would prefer to fix it :) (sorry, the same applies to me from my users)

in host mode I can only run one replica of traefik service, and I do not think it is just me. This way I have to fully trust traefik stability without relaying on swarm mode feature for services.

You can run one instance per host

Yeah, but traefik is forced to run on a manager node, because it needs that to work properly. So: one manager node, one host, one instance.

Traefik can work off manager nodes in multiple ways, including using a docker socket proxy, a remote socket, or Traefik Enterprise. Here's an example stack file for how to do that:
https://github.com/BretFisher/dogvscat/blob/master/stack-proxy-global.yml
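A condensed sketch of the socket-proxy variant from that stack file, so Traefik itself can run on worker nodes while only the proxy container touches a manager's Docker socket (the image name, environment flags, and Traefik v1 options are assumptions here; see the linked file for the full working example):

version: "3.7"
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    environment:
      # expose only the read-only API sections Traefik needs
      SERVICES: 1
      TASKS: 1
      NETWORKS: 1
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints: [node.role == manager]
  traefik:
    image: traefik:v1.7
    command: --docker --docker.swarmMode --docker.endpoint=tcp://socket-proxy:2375
    deploy:
      mode: global   # can now run on worker nodes as well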


It is interesting to know, but look: this feature is available in kubernetes but not in docker swarm mode, and you are insisting there are options to run multiple instances of traefik, but only across multiple nodes. If I want to run multiple instances on a single node, it is not possible, because this is not supported.
Also, any other service that does not just proxy requests is not allowed to map any port, because it needs a special kind of configuration that maps every host to it, and anyway it needs multiple nodes, at least one per instance.

And so on. You can scroll this discussion up and find other concerns about it. I do not think it can be reduced to a demonstration of how good you are at producing workarounds, because those remain workarounds that are hard to maintain and hard to follow. All the time spent maintaining special-case workarounds would be better spent fixing the problem.

On the other hand, if this kind of feature is a security problem for the docker swarm model, just mark it as wontfix and I will plan to switch to kubernetes. If that is the case, I do not think there is a conflict between projects; it is just saying explicitly that it will never happen, so everybody can take action, if possible before choosing docker swarm mode for any kind of swarm setup.

There are lots of features in kubernetes that are not in swarm, and vice versa. We all make decisions on which orchestrator to use for a specific solution based on many factors, including features. No one tool solves all problems/needs.

I'm just a community member trying to help. If you don't like the current solutions for this problem, then it sounds like you should look at other ways to solve it, possibly with something like kubernetes. That's a reasonable reason to choose one orchestrator over another if you think the kubernetes way of solving it is more to your liking.

Historically, the moby and swarm maintainers don't close issues like this as wontfix, because tomorrow someone from the community could drop a PR with a solution to this problem. Also, I think discussing the ways to work around it until then is a valid use of this issue thread. :)

While not a swarm maintainer, I can say that historically the team doesn't disclose future feature plans beyond what PR's you can currently see getting commits in the repos.

I forgot to say that of course your comment is welcome (or I said it in an obscure way, sorry). But I would like to reinforce the original @PanJ report:

In the meantime, I think I have to do a workaround which is running a proxy container outside of swarm mode and let it forward to published port in swarm mode (SSL termination should be done on this container too), which breaks the purpose of swarm mode for self-healing and orchestration.

I mean that this "breaks the purpose of swarm mode", of course only on this specific topic, is enough to deserve more attention.

I'm trying to get my team to build a PR which adds proxy protocol to the ingress network. We are not Golang programmers, so we find it a bit tricky.

But I'm fervently hoping that the Docker team agrees that the best and most compatible (across the ecosystem) solution is to layer proxy protocol support onto the ingress network.

The complexity comes from the fact that the ingress network not only has to inject its own headers, but also has to support the case where upstream proxy protocol headers have already been inserted (for example by a Google LB or AWS ELB).


https://stackoverflow.com/questions/50585616/kubernetes-metallb-traefik-how-to-get-real-client-ip
The same question asked for k8s, where this is layered, complete, and configurable.

For anyone running nginx on digitalocean with docker swarm and trying to get the real $remote_addr instead of just 10.255.0.2 within your nginx logs; you can use the solution from @coltenkrauter. The catch is that you can only run one nginx container on the host with this solution, which should be ok for most people.

Just change your docker-compose.yml file:

INCORRECT

services:
  nginx:
    ports:
      - "80:80"
      - "443:443"

CORRECT

services:
  nginx:
    ports:
      - target: 80
        published: 80
        mode: host
      - target: 443
        published: 443
        mode: host

_edit: now we're all guaranteed to get the right answer_

Not using ingress (mode: host) is not a workaround, when the issue states that the problem happens with ingress networking.
Nobody would use just a single host as a reverse-proxy. You want multiple hosts with a floating ip, and the swarm-mesh is mandatory to achieve this setup.

Maybe it's not possible, but I thought modifying the iptables rules to do MASQUERADE at some stage in the INGRESS chains could be a workaround that preserves the real source IP. Aren't there some iptables/netfilter experts around?

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  anywhere             anywhere            
DOCKER-INGRESS  all  --  anywhere             anywhere            
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
DROP       all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (2 references)
target     prot opt source               destination         

Chain DOCKER-INGRESS (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere            
DROP       all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere            

As an alternative, can't swarm just take the original source IP and create an X-Forwarded-For header?

Nobody would use just a single host as a reverse-proxy. You want multiple hosts with a floating ip, and the swarm-mesh is mandatory to achieve this setup.

Each node in the swarm can run an instance of the reverse-proxy, and route traffic to the underlying services over an overlay network (but only the proxy would know about the original IP-address).

Make sure to read the whole thread (I see GitHub hides quite some useful comments, so you'll have to expand those :disappointed:);

As an alternative, can't swarm just take the original source ip and create a X-Forwarded-For Header?

See https://github.com/moby/moby/issues/25526#issuecomment-367642600; X-Forwarded-For is L7 protocol; Swarm ingress is L4, using IPVS with DNAT

@port22 generally we agree that a workaround is not a solution; a solution is to make it layerable, see @sandys' proposal in the #25526 comment above.

The right solution here is proxy protocol injected at L4. There are some relevant pro and con discussions in Envoy for the same use case:
https://github.com/envoyproxy/envoy/issues/4128 and
https://github.com/envoyproxy/envoy/issues/1031


Each node in the swarm can run an instance of the reverse-proxy

This eliminates the swarm load balancer feature, which is what this problem is actually all about.
And my problem specifically is that traefik is not cluster-agile. It must be run standalone unless you use consul as a configuration backend, which then limits the maximum number of certificates to ~100, which is not applicable for me. Sure, you can state that this is not a swarm problem but traefik's problem. Fun fact: traefik states this is a consul problem, and consul states traefik does it wrong.

@port22 generally we agree that a workaround is not a solution

My point is that NOT using ingress is not a workaround when you NEED ingress. A workaround would be something that makes it possible to still use the swarm load balancer while preserving the source IP, even if it requires some hacking.

using IPVS with DNAT

thus I was thinking it could be done with MASQUERADE within the DNAT rule/chain?

@port22 I got your point, but docker manages its networks by itself. I tried to make it work with shorewall, but the only way is to create exceptions for the docker rules/chains, and I had no success with docker swarm mode (but it is ok for docker in swarm mode, as long as I disable all services except the ones running in the swarm).
Maybe there should be options like those for the bridge network https://docs.docker.com/network/overlay/#customize-the-docker_gwbridge-interface to make this simple to set up, but the main problem is still the missing support in the overlay network. So the options are not there, because they would be ignored, and dockerd will rewrite the rules if they are modified from outside.

I have filed a feature request for proxy protocol support to solve the issue in this bug, just in case anyone wants to add their comments:

https://github.com/moby/moby/issues/39465


After 3 years, no fix?

I'm also having the same problem but with haproxy. Though it's ok to have proxy servers in host mode and HA using keepalived, the only missing part would be load balancing that I think is not much of an issue for a simple web proxy. Unless complicated scripts are included or proxy and backend are not on the same physical machine and network traffic is too high for one NIC and...

So is it really not possible to see the source IP address of a request from outside a Docker Swarm rather than the internal overlay network private address? Still?

@thaJeztah Can someone on the Docker Inc team update us on the status of this issue. Is it still being considered and/or worked on ? Any ETA ? Or is this completely ignored since Docker integration with Kubernetes ? It has been reported almost 3 years ago :/


It would really be good to get this statement ("won't fix") so I can fully justify a migration to kubernetes. Such a shame.

Thanks.


there's a proposed enhancement request which should fix this - https://github.com/moby/moby/issues/39465

please do add your thoughts and comments there

I've already been commenting on that issue :-)

This one has been a blocker for me for some time. I need to pass through the IP addresses and, after much searching (wow, almost 3 years of searching along with the others in this thread...), have not yet found any solution that is workable with swarm.

I have been unable to use swarm in production due to this issue and am awaiting an official answer on whether this can be added or not. If it is not being added, alternative proposed solutions are welcome.

We are running into the same issue using traefik behind haproxy. I was surprised to see this has 254 comments since 2016.

@Betriebsrat Why not let traefik handle requests right away? Is haproxy really necessary, or just a habit? If you expose traefik in host mode, you will see the client IP addresses, and then everything is fine :)

I believe this "solution" was mentioned several times, but people keep missing it.

I also know it is not an option sometimes, but I believe most of the time this should be possible.

@ajardan I have tried that solution and it is not viable for me, as I need more than a single host to respond on the frontend. Ideally I want the entire swarm to be able to route the requests. I agree that for small-scale operations, simply flipping one service to host mode and using it as an ingest server can work fine.

Placing something like traefik in host mode negates the benefits we are trying to take advantage of from using swarm though in most cases :(

@pattonwebz Host mode can be enabled for a service running multiple containers on multiple hosts, you can even do that with mode=global. Then traefik will run on all your swarm nodes and accept connections to specified ports, then route the requests internally to services that need to see these connections.

I used this setup with a service in global mode but limited to manager nodes, and it was working perfectly fine for tens of thousands of requests/s

I would be happy to elaborate if more details are required.

@pattonwebz @ajardan I am using a configurable haproxy service for all these cases. haproxy uses just 2 MB of RAM in my case. I think that is negligible.

@pattonwebz In addition to @ajardan's solution above, you can run https://hub.docker.com/r/decentralize/swarm-tcp-proxy in global mode with host networking to add PROXY protocol support to the inbound traffic, and then forward it to Traefik configured to decode the proxy protocol headers.
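For the Traefik side of that setup, decoding the PROXY protocol header is enabled per entrypoint. A sketch in Traefik v1 TOML syntax (matching the v1 examples earlier in the thread; the trusted IP range is an assumption and should cover the addresses the TCP proxy connects from):

[entryPoints]
  [entryPoints.http]
    address = ":80"
    [entryPoints.http.proxyProtocol]
      # only accept PROXY headers from these (assumed) source ranges
      trustedIPs = ["10.0.0.0/8"]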

It should just be a flag as part of Docker Swarm proper, not all these
convoluted solutions IMHO.

We just use haproxy to manage certs and offload SSL.
People keep missing that the solution "run it in host mode" is not a solution.
They want it working with the ingress network to take advantage of the docker load balancing.
The whole thread is basically a "use host mode" -> "not possible because of 'reasons'" circle that has been going on for 3 years now.

I will look at swarm-tcp-proxy as a viable alternative here again; however, looking at similar things in the past, something always wound up being a deal breaker for me with such ideas.

In a perfect world my existing (and working well with exception of no ability to retrieve the real client IP) swarm would just be working and passing through the IP data without need for any additional service layers or more proxies over proxies.

People keep missing that the solution "running is host mode" is not a solution.

It's not a solution by itself, but can be used (and is being used) very successfully as a workaround. You can still use Docker's native load balancer - all you're doing is adding a layer to the host network stack before you hit Docker's service mesh.

@Betriebsrat traefik can do certificates and SSL very well, so I am still not sure why it is required.

Also as mentioned by @matthanley before, the docker load balancing doesn't go away. If you don't like how traefik balances your requests in the backend, you can instruct it to use swarm's LB, and it will just send requests to the service VIP, where swarm will take care of it later on.

This is configurable per service even, so you are quite flexible.

You can try to set up another Nginx server outside the docker swarm cluster and forward requests to the swarm service. In that Nginx conf just add the forwarding headers, e.g.:

location / {
    proxy_pass http://phpestate;

    #Proxy Settings
    proxy_redirect     off;
    proxy_set_header   Host             $host;
    proxy_set_header   X-Real-IP        $remote_addr;
    proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
    proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
}

It seems there is no solution to get the real client IP in docker swarm mode.

We saw the same issue and worked around it by implementing:
https://github.com/moby/moby/issues/25526#issuecomment-475083415

It's a non-ideal solution since we can't run multiple ingress containers on a single node (guess they're global now)

The difficulty is that Docker deals in TCP/UDP, while this is an HTTP protocol issue. At minimum I wish docker would "fake" the source IP as the remote host instead of giving its own internal IP from the swarm mesh... but that would likely break things, since return traffic would be going to the wrong place.

The easiest way would be to add the header for the original IP for every http request.

Correct. Just to be specific: as a proxy protocol header, which works at L4 and L7 and is accepted by most known application software (as well as the big cloud providers).

I have filed a separate issue for that, which is linked a few comments above. Do add to that issue if you're interested.


It's 2019 and this is still an issue?? It makes IP whitelisting on traefik a pain. I shouldn't need host ports on every node.

@kaysond our position has been to give up on Swarm. We have been moving to AWS and ECS. I'm just sorry I cannot post something more constructive but ultimately we need something that works; this is not the only major Swarm bug (or lack of feature) affecting us and other receiving no apparent fix/feedback over recent years. Most disappointing, but there.

@jmkgreen we are in the same position and have spent the last 6+ months moving away from docker swarm to other things because of this still-ongoing problem. I have already put dozens of hours myself, and hundreds of hours of team member time, into this without ever finding an acceptable workaround. Binding to all the host ports totally defeats the purpose of our floating LBs :(

What is your problem with the workaround? You declare your service in host mode + global, set up your LB to hit all the nodes, and it works. Because the proxy is lightweight (I use nginx because I do https offloading and other stuff), the fact that it's deployed on every server is not a problem; it uses less than 1% of a server's resources. I can help you if you encounter any error during the process ([email protected]).

What is your problem with the workaround ? You declare your service on host mode + global and setup your LB for hitting all the nodes, it works.

@RemiBou When the proxy itself needs to be updated/restarted, the external load balancer doesn't immediately detect the outage and keeps sending requests to node(s) where the proxy is still restarting. So there's a ~30-second outage depending on external LB configuration.

There's also no way in Swarm to put a hook into the service update process to call the external load balancer and take a node out of service during the update. You also can't trigger a script to run inside the container before it's updated (for example, to remove an "i_am_healthy" flag and let the external LB discover it's going out of service through polling).

What is your problem with the workaround ?

My problem is that with that workaround it's impossible for me to run several of the same service (or several services that want the same ports) on the host. That's a need for projects I work on.

Indeed, but can't you deploy a proxy service that does only this, and then, once the IP is inside the swarm, forward it as an HTTP header to your other services?

Yes... and as long as that thin proxy service never needs to be reconfigured or updated, it's possible to update the components behind it using the Swarm LB to avoid downtime.

Someone pointed at https://hub.docker.com/r/decentralize/swarm-tcp-proxy which uses haproxy to get it done.

Kind of a pain though. And if you have to update the proxy you still have downtime.

@ms1111 The Nginx docker image starts up in a few seconds, and if this service only manages this part, you won't need to update it often. IMHO the downside is not that important, but it might be different in your case.

What is your problem with the workaround ?

In our case, it's the combination of this workaround with the inability to bind a host-exposed port to a specific IP address. Instead, all internal services that need the real visitor's IP and support PROXY protocol have their port exposed on 0.0.0.0 on the host, which is less than optimal.

Another one is the non-negligible performance hit when you have hundreds of new connections per second - all the exposed ports are actually DNAT rules in iptables that require conntrack and have other problems (this hits k8s too, but Swarm has this additional level of NAT that makes it worse).
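To illustrate that last point, a published port roughly translates into a nat-table rule of this shape on every node (output shortened, addresses illustrative, assuming the ingress sandbox sits at 172.18.0.2 on the docker_gwbridge):

# iptables -t nat -L DOCKER-INGRESS -n   (illustrative)
Chain DOCKER-INGRESS (2 references)
target  prot  source     destination
DNAT    tcp   0.0.0.0/0  0.0.0.0/0    tcp dpt:80 to:172.18.0.2:80
RETURN  all   0.0.0.0/0  0.0.0.0/0

Each such connection is tracked by conntrack before IPVS inside the ingress namespace picks a backend, which is the extra NAT hop mentioned above.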

To Docker,

Wake up! There is an obvious problem given how many people are involved in this issue (there are others with the same cause). All we're getting are people who repeat over and over again that there is a workaround, even though it's been explained quite a few times why that workaround is not a solution. The very word "workaround" indicates that it is a temporary thing that will be resolved later. It's been over 3 years since the issue was created and for all that time the response is "there is a workaround".

To all Swarm users,

Let's be realistic. The sad truth is that no one, including Docker, truly cares about Swarm. Everyone moved to k8s and there are no "real" investments in Swarm. The project is on life-support waiting to die so do not expect this issue to be fixed. Be smart and move to k8s.

This issue seems to have been ignored for far too long. It doesn't look like it is ever going to be implemented. Just cut to the chase and use k8s.

@leojonathanoh can you please elaborate how exactly does k8s solve this particular issue :)?

Simple: proxy protocol

@ajatkj As said. Or, if that is not possible, then an external load balancer and externalTrafficPolicy: Local on the Service resource. That's all I will say here. And I'm unsubscribing from the thread.
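For completeness, the Kubernetes setting being referred to is a single field on the Service resource; a minimal sketch (names and ports are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  # keep the original client source IP by only routing to pods on the node
  # that received the traffic (may cause uneven load distribution)
  externalTrafficPolicy: Local
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080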

Why do people expect that other people will do the work for them?

I'd love to be the hero and take care of this, but the reality is I'm working on many other things and this has no effect on my day to day. Does this affect your day to day? We'd love some help getting this resolved!

I've also looked at this multiple times and it really doesn't seem like there is a way to make this work with IPVS NAT, which is what the magical swarm routing is using.

I agree that k8s is much more flexible here. If it suits your needs better then use it.
Complaining that it's not fixed and then threatening to switch to k8s really has no place in our issue tracker and is just generally unhelpful.

People help with the knowledge they have. Not everyone has the skill set to change the code themselves, so they create issues like this one to help reach a consensus about the necessary change.

No one here is arguing that you specifically have to make the changes, but not even on the issue opened by @sandys about the proxy protocol has the core team agreed on the changes. So how can someone work on this if they don't know whether the change will be accepted?

The best way is to come up with a proposal, ie. what do you expect the architecture to look like after the work is done. What does it bring? What do we lose?

The best way is to come up with a proposal, ie. what do you expect the architecture to look like after the work is done. What does it bring? What do we lose?

Already done here: #39465

try host-mode-networking

Please read the whole thread before commenting

"Use the proxy protocol", while indeed interesting does not lay out what
changes need to be made to the code base.

Maybe this is a naive question, but why is it necessary to rewrite the source ip to begin with? Wouldn't the traffic be returned via the interface's default gateway anyways? Even if it came via the swarm load balancer, the gateway could just return it via the load balancer which already knows where the traffic came from...

Maybe this is a naive question, but why is it necessary to rewrite the source ip to begin with? Wouldn't the traffic be returned via the interface's default gateway anyways? Even if it came via the swarm load balancer, the gateway could just return it via the load balancer which already knows where the traffic came from...

It is necessary to know from which IP is coming the request. Maybe a specific user want to limit the ip, and you can not do it outer of the service running, i.e. traefik do not know the content of the request that may specify which user is making it, so it can not exclude some user and accepts other based only on ip (because the policy in this example is ip + request-content => allow/disallow).

Or, more often, just for logging connections. I need to bill customers for my service usage, and I need to provide a report in tabular form: time of request, amount of resource, source IP of request. Almost every billed service provides this kind of report.

I think you misunderstood my question. I understand why services would want to see the true source IP. I want to know why Docker changes it before it gets to a container.

@kaysond Not a good place to ask.

You are essentially asking two questions:

  1. How IPVS works technically, and
  2. Why libnetwork chose IPVS to begin with

Both of them are hard to answer, in different ways.

I wonder where the best place to ask these questions is, because I am now very intrigued to read the history of those choices and how it all works, so I can get some more context here.

any update?

I've been following this thread for a while now because I stumbled upon the same issue, but after spinning up a few whoami containers in swarm behind traefik I saw it was working. The thing was that we were behind Cloudflare and had to get the CF forwarded headers. (Yes, we use IPVS and our services are replicated in swarm.)

Just tried this again with:

Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea838
 Built:             Wed Nov 13 07:29:52 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea838
  Built:            Wed Nov 13 07:28:22 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

and the following docker compose:

version: "3.3"

services:

  traefik:
    image: "traefik:v2.0.0-rc3"
    container_name: "traefik"
    command:
      #- "--log.level=DEBUG"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.swarmMode=true"
      - "--providers.docker.endpoint=unix:///var/run/docker.sock"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"

  whoami:
    image: "containous/whoami"
    container_name: "simple-service"
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.whoami.rule=HostRegexp(`{any:.*}`)"
        - "traefik.http.routers.whoami.entrypoints=web"
        - "traefik.http.services.whoami.loadbalancer.server.port=80"

whoami output was:

Hostname: 085c373eb06d
IP: 127.0.0.1
IP: 10.0.1.10
IP: 172.19.0.4
RemoteAddr: 10.0.1.11:51888
GET / HTTP/1.1
Host: testserver.nub.local
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.5
Dnt: 1
Upgrade-Insecure-Requests: 1
X-Forwarded-For: 10.0.0.2
X-Forwarded-Host: testserver.nub.local
X-Forwarded-Port: 80
X-Forwarded-Proto: http
X-Forwarded-Server: ad14e372f6e9
X-Real-Ip: 10.0.0.2

So no, it still doesn't work.

Out of curiosity.... can some dev point me to the code that manages swarm networking?

In reply to the traefik test above: you can use traefik in host mode to get the real IP:

ports:
      - target: 80
        published: 80
        mode: host
      - target: 443
        published: 443
        mode: host
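
For completeness, a sketch of how that ports section typically sits in a stack file (image tag is illustrative): because host-mode publishing bypasses the routing mesh, the proxy is usually deployed globally so that every node can answer on those ports.

version: "3.3"
services:
  traefik:
    image: "traefik:v2.0"            # illustrative tag
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    deploy:
      mode: global                   # one task per node; host-mode ports don't go through IPVS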

Still open?
2020-05-08

Yup, still open. There are architectural issues noted in the thread that highlight why this cannot be resolved as easily as it seems like it should be on the surface. At this point it's likely those issues can't be overcome.

If you need to get the real user IP, then some alternatives posted in this thread may be suitable. Host mode for services seems the simplest approach, but it is not suitable for those who need scalability on individual nodes.

We have had success using the PROXY protocol with DigitalOcean LB -> Traefik -> Apache container. The Apache container was able to log the real IPs of the users hitting the service. Theoretically it should work as long as all the proxy layers support the PROXY protocol.

https://docs.traefik.io/v1.7/configuration/entrypoints/#proxyprotocol

The Traefik service is on a Docker network named 'ingress'; the Apache service has its own stack network but is also part of the 'ingress' network as an external network.

https://autoize.com/logging-client-ip-addresses-behind-a-proxy-with-docker/
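
As a sketch of the Traefik side (flag names as in Traefik v2; the linked v1.7 docs use the equivalent [entryPoints] TOML instead, and the trusted CIDR here is illustrative), the entrypoint has to be told to accept PROXY protocol from the load balancer:

    command:
      - "--entrypoints.websecure.address=:443"
      # Accept PROXY protocol headers only from the upstream load balancer's range:
      - "--entrypoints.websecure.proxyprotocol.trustedips=10.0.0.0/8"

Whatever sits behind Traefik (Apache in this case) then just reads the forwarded headers as usual.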

2020 and still not fixed, what a drag. Seems like a very important feature.

This is very much needed. Using host mode is just a patch; sometimes it is necessary to run NGINX behind the overlay network (depending on the use case and the setup). Please fix this.

I think a workaround for this, to have a Docker swarm run without host mode, is to get the IP on the client side, e.g. using JS for web and mobile clients and only accepting it from trusted sources: JS gets the IP, and the backend only accepts IPs that come with a user token, etc. The IP can be set in a header and encrypted through HTTPS. However, I don't know about the performance.

@Damidara16 that's exactly what we don't want to do. It's really insecure to do that. You can bypass it however you want.

Too bad this is still an open issue, sadly... it doesn't look like it's going to be fixed soon.

I think it will be closed by the bot soon. Since GitHub launched this feature, many bugs can be ignored.

This is the best feature for enterprises' bloated teams to gain control of the community.

There is very little chance this is ever going to be fixed. AFAIK everyone considers that k8s won the "race" and that swarm is not needed, but I would say both can co-exist and be used properly depending on the needs and skills of the team using them. RIP swarm :)

I use a managed HAIP, but you could use something else in front of the swarm, e.g. a standalone NGINX load balancer that points to the IPs of your swarm nodes:
https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/

In your swarm, the reverse proxy needs this:

server {
        listen 443 ssl proxy_protocol;
        location / {
                proxy_set_header X-Real-IP $proxy_protocol_addr;  # this is the real IP address
                # ... proxy_pass to your upstream here ...
        }
}

If you are running a swarm, you will need a load balancer to round-robin the requests to your swarm (or sticky, etc).

So far, this architectural decision may seem like a "missing piece"; however, it adds flexibility by providing options and removing the need to disable built-in functionality in order to replace it with something more suitable to the application's needs.

I believe I may have found a workaround for this issue, with the _current_ limitation that service container replicas must all be deployed to a single node, for example with --constraint-add='node.hostname==mynode', or with a set of swarms each consisting of a single node.

The problem

The underlying problem is caused by the SNAT rule in the iptables nat table in the ingress_sbox namespace, which causes all incoming requests to be seen by containers to have the node's IP address in the ingress network (e.g. 10.0.0.2, 10.0.0.3, ..., in the default ingress network configuration), e.g.:

iptables -t nat -A POSTROUTING -d 10.0.0.0/24 -m ipvs --ipvs -j SNAT --to-source 10.0.0.2

However, removing this SNAT rule means that while containers still receive incoming packets - now originating from the original source IP - outgoing packets sent back to the original source IP are sent via the container's default gateway, which is not on the same ingress network but on the docker_gwbridge network (e.g. 172.31.0.1), and those packets are then lost.

The workaround

So the workaround comprises: 1. removing (in fact, inhibiting) this SNAT rule in the ingress_sbox namespace; 2. creating a policy routing rule for swarm service containers, which forces those outgoing packets back through the node's ingress network IP address they would otherwise have gone back to (e.g. 10.0.0.2); and 3. automating the addition of the policy routing rules, so that every new service container has them promptly installed upon creation.

  1. To inhibit the SNAT rule, we create a rule earlier in the table that prevents the usual SNAT being reached:
nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -I POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT

(We do it this way, rather than just deleting the existing SNAT rule, as docker seems to recreate the SNAT rule several times during the course of creating a service. This approach just supersedes that rule, which makes it more resilient).

  2. To create the container policy routing rule:
NID=$(docker inspect -f '{{.State.Pid}}' <container-id>)
nsenter -n -t $NID bash -c "ip route add table 1 default via 10.0.0.2 && ip rule add from 10.0.0.0/24 lookup 1 priority 32761"
  3. Finally, putting the above together with docker events, we automate the process of modifying the SNAT rules, watching for newly started containers, and adding the policy routing rules, via this ingress-routing-daemon script:
#!/bin/bash

# Ingress Routing Daemon
# Copyright © 2020 Struan Bartlett
# --------------------------------------------------------------------
# Permission is hereby granted, free of charge, to any person 
# obtaining a copy of this software and associated documentation files 
# (the "Software"), to deal in the Software without restriction, 
# including without limitation the rights to use, copy, modify, merge, 
# publish, distribute, sublicense, and/or sell copies of the Software, 
# and to permit persons to whom the Software is furnished to do so, 
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be 
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS 
# BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN 
# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 
# SOFTWARE.
# --------------------------------------------------------------------
# Workaround for https://github.com/moby/moby/issues/25526

echo "Ingress Routing Daemon starting ..."

read INGRESS_SUBNET INGRESS_DEFAULT_GATEWAY \
  < <(docker inspect ingress --format '{{(index .IPAM.Config 0).Subnet}} {{index (split (index .Containers "ingress-sbox").IPv4Address "/") 0}}')

echo INGRESS_SUBNET=$INGRESS_SUBNET
echo INGRESS_DEFAULT_GATEWAY=$INGRESS_DEFAULT_GATEWAY

# Add a rule ahead of the ingress network SNAT rule, that will cause the SNAT rule to be skipped.
echo "Adding ingress_sbox iptables nat rule: iptables -t nat -I POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT"
while nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -D POSTROUTING -d 10.0.0.0/24 -m ipvs --ipvs -j ACCEPT; do true; done 2>/dev/null
nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -I POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT

# Watch for container start events, and configure policy routing rules on each container
# to ensure return path traffic from incoming connections is routed back via the correct interface.
docker events \
  --format '{{.ID}} {{index .Actor.Attributes "com.docker.swarm.service.name"}}' \
  --filter 'event=start' \
  --filter 'type=container' | \
  while read ID SERVICE
  do
    if [ -n "$SERVICE" ]; then

      NID=$(docker inspect -f '{{.State.Pid}}' $ID)
      echo "Container ID=$ID, NID=$NID, SERVICE=$SERVICE started: applying policy route."
      nsenter -n -t $NID bash -c "ip route add table 1 default via $INGRESS_DEFAULT_GATEWAY && ip rule add from $INGRESS_SUBNET lookup 1 priority 32761"
    fi
  done

Now, when requests arrive at the published ports for the single node, its containers will see the original IP address of the machine making the request.

Usage

Run the above ingress-routing-daemon as root on _each and every one_ of your swarm nodes _before_ creating your service. (If your service is already created, then ensure you scale it to 0 before scaling it back to a positive number of replicas.) The daemon will initialise iptables, detect when docker creates new containers, and apply new routing rules to each new container.
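
A quick way to sanity-check that the daemon has done its work (a sketch; <container-id> is one of your service's containers):

# 1. Confirm the ACCEPT rule now sits ahead of the SNAT rule in the ingress_sbox namespace:
nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -L POSTROUTING -n --line-numbers

# 2. Confirm the policy route was applied inside the container:
NID=$(docker inspect -f '{{.State.Pid}}' <container-id>)
nsenter -n -t $NID ip rule list
nsenter -n -t $NID ip route show table 1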

Testing, use-cases and limitations

The above has been tested using multiple replicas constrained to a single node on a service running on a multi-node swarm.

It has also been tested using multiple nodes, each with a separate per-node service constrained to that node, but this comes with the limitation that different published ports must be used for each per-node service. Still that might work for some use-cases.

The method should also work using multiple nodes, if each were configured as a single node in its own swarm. This carries the limitation that the docker swarms can no longer be used to distribute containers across nodes, however there could still be other administration benefits of using docker services, such as container replica and lifecycle management.

Improving the workaround to address further use-cases

With further development, this method should be capable of scaling to multiple nodes without the need for separate per-node services or splitting the swarm. I can think of two possible approaches: 1. Arranging for Docker, or a bespoke daemon, to remove all non-local IPs from each node's ipvsadm table. 2. Extending the policy routing rules to accommodate routing outgoing packets back to the correct node.

For 1, we could poll ipvsadm -S -n to look for new IPs added to any service, check whether each is local, and remove any that aren't. This would allow each node to function as a load balancer for its own containers within the overall service, but without requests reaching one node being able to be forwarded to another. This would certainly satisfy my own use-case, where we have our own IPVS load balancer sitting in front of a set of servers, each running a web application, which we would like to replace with several load-balanced containerised instances of the same application, to allow us to roll out updates without losing a whole server.

For 2, we could use iptables to assign a per-node TOS in each node's ingress_sbox iptable (for example to the final byte of the node ingress network IP); then in the container, arrange to map the TOS value to a connection mark, and then from a connection mark to a firewall mark for outgoing packets, and for each firewall mark select a different routing table that routes the packets back to the originating node. The rules for this will be a bit clunky, but I imagine should scale fine to 2-16 nodes.

I hope the above comes in useful. I will also have a go at (2), and if I make progress will post a further update.

Below is an improved version of the ingress routing daemon, ingress-routing-daemon-v2, which extends the policy routing rule model to allow each container to route its output packets back to the correct node, without the need for SNAT.

The improved model

In addition to inhibiting the SNAT rule as per the previous model, the new model requires an iptables rule in the ingress_sbox namespace on each node you intend to use as an IPVS load-balancer endpoint (so normally your manager nodes, or a subset of those manager nodes), that assigns a per-node TOS value to all packets destined for any node in the ingress network. (We use the final byte of the node's ingress network IP.)

As the TOS value is stored within the packet, it can be read by the destination node to which the incoming request has been directed and the packet sent.

Then in the container on the destination node, we arrange to map the TOS value on any incoming packets to a connection mark, using the same value.

Now, since outgoing packets on the same connection will have the same connection mark, we map the connection mark on any outgoing packets to a firewall mark, again using the same value.

Finally, a set of policy routing rules selects a different routing table, designed to route the outgoing packets back to the required load-balancer endpoint node, according to the firewall mark value.

Now, when client requests arrive at the published ports for any node in the swarm, the container (whether on the same and/or other nodes) to which the request is directed will see the original IP address of the client making the request, and be able to route the response back to the originating load-balancer node; which will, in turn, be able to route the response back to the client.

Usage

Setting up

Generate a value for INGRESS_NODE_GATEWAY_IPS specific to your swarm, by running ingress-routing-daemon-v2 as root on every one of your swarm's nodes _that you'd like to use as a load-balancer endpoint_ (normally only your manager nodes, or a subset of your manager nodes), noting the values shown for INGRESS_DEFAULT_GATEWAY. You only have to do this once, or whenever you add or remove nodes. Your INGRESS_NODE_GATEWAY_IPS should look like 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5 (according to the subnet defined for the ingress network, and the number of nodes).
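
If you prefer not to run the daemon just to read that value, the same figure can be obtained directly using the same docker inspect expression the script itself uses; run this on each node and join the results with spaces:

docker inspect ingress --format '{{index (split (index .Containers "ingress-sbox").IPv4Address "/") 0}}'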

Running the daemon

Run INGRESS_NODE_GATEWAY_IPS="<Node Ingress IP List>" ingress-routing-daemon-v2 --install as root on _each and every one_ of your swarm's nodes (managers and workers) _before_ creating your service. (If your service is already created, then ensure you scale it to 0 before scaling it back to a positive number of replicas.) The daemon will initialise iptables, detect when docker creates new containers, and apply new routing rules to each new container.

If you need to restrict the daemon’s activities to a particular service, then modify [ -n "$SERVICE" ] to [ "$SERVICE" = "myservice" ].
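
In other words, inside the docker events loop the test becomes (service name illustrative):

    if [ "$SERVICE" = "myservice" ]; then   # instead of: if [ -n "$SERVICE" ]; then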

Uninstalling iptables rules

Run ingress-routing-daemon-v2 --uninstall on each node.

Testing

The ingress-routing-daemon-v2 script has been tested with 8 replicas of a web service deployed to a four-node swarm.

Curl requests for the service, directed to any of the specified load-balanced endpoint node IPs, returned successful responses, and examination of the container logs showed the application saw the incoming requests as originating from the Curl client’s IP.

Limitations

As the TOS value can store an 8-bit number, this model can in principle support up to 256 load-balancer endpoint nodes.

However as the model requires every container be installed with one iptables mangle rule + one policy routing rule + one policy routing table per manager endpoint node, there might possibly be some performance degradation as the number of such endpoint nodes increases (although experience suggests this is unlikely to be noticeable with <= 16 load-balancer endpoint nodes on modern hardware).

If you add load-balancer endpoints nodes to your swarm - or want to start using existing manager nodes as load-balancer endpoints - you will need to tread carefully as existing containers will not be able to route traffic back to the new endpoint nodes. Try restarting INGRESS_NODE_GATEWAY_IPS="<Node Ingress IP List>" ingress-routing-daemon-v2 with the updated value for INGRESS_NODE_GATEWAY_IPS, then perform a rolling update of all containers, before using the new load-balancer endpoint.
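
A sketch of that re-configuration sequence (the IP list and service name are illustrative):

# On each node, reinstall with the updated endpoint list:
INGRESS_NODE_GATEWAY_IPS="10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5" ./ingress-routing-daemon-v2 --install
# Then force a rolling restart so new containers pick up routes to the new endpoints:
docker service update --force myservice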

Scope for native Docker integration

I’m not familiar with the Docker codebase, but I can’t see anything that ingress-routing-daemon-v2 does that couldn’t, in principle, be implemented by Docker natively, but I'll leave that for the Docker team to consider, or as an exercise for someone familiar with the Docker code.

The ingress routing daemon v2 script

Here is the new ingress-routing-daemon-v2 script.

#!/bin/bash

# Ingress Routing Daemon v2
# Copyright © 2020 Struan Bartlett
# ----------------------------------------------------------------------
# Permission is hereby granted, free of charge, to any person 
# obtaining a copy of this software and associated documentation files 
# (the "Software"), to deal in the Software without restriction, 
# including without limitation the rights to use, copy, modify, merge, 
# publish, distribute, sublicense, and/or sell copies of the Software, 
# and to permit persons to whom the Software is furnished to do so, 
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be 
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS 
# BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN 
# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 
# SOFTWARE.
# ----------------------------------------------------------------------
# Workaround for https://github.com/moby/moby/issues/25526

if [ "$1" = "--install" ]; then
  INSTALL=1
elif [ "$1" = "--uninstall" ]; then
  INSTALL=0
else
  echo "Usage: $0 [--install|--uninstall]"
fi

echo
echo "  Dumping key variables..."

if [ "$INSTALL" = "1" ] && [ -z "$INGRESS_NODE_GATEWAY_IPS" ]; then
  echo "!!! ----------------------------------------------------------------------"
  echo "!!! WARNING: Using default INGRESS_NODE_GATEWAY_IPS"
  echo "!!! Please generate a list by noting the values shown"
  echo "!!! for INGRESS_DEFAULT_GATEWAY on each of your swarm nodes."
  echo "!!!"
  echo "!!! You only have to do this once, or whenever you add or remove nodes."
  echo "!!!"
  echo "!!! Then relaunch using:"
  echo "!!! INGRESS_NODE_GATEWAY_IPS=\"<Node Ingress IP List>\" $0 -x"
  echo "!!! ----------------------------------------------------------------------"
fi

read INGRESS_SUBNET INGRESS_DEFAULT_GATEWAY \
  < <(docker inspect ingress --format '{{(index .IPAM.Config 0).Subnet}} {{index (split (index .Containers "ingress-sbox").IPv4Address "/") 0}}')

echo "  - INGRESS_SUBNET=$INGRESS_SUBNET"
echo "  - INGRESS_DEFAULT_GATEWAY=$INGRESS_DEFAULT_GATEWAY"

# We need the final bytes of the IP addresses on the ingress network of every node
# i.e. We need the final byte of $INGRESS_DEFAULT_GATEWAY for every node in the swarm
# This shouldn't change except when nodes are added or removed from the swarm, so should be reasonably stable.
# You should configure this yourself, but for now let's assume we have 8 nodes with IPs in the INGRESS_SUBNET numbered x.x.x.2 ... x.x.x.9
if [ -z "$INGRESS_NODE_GATEWAY_IPS" ]; then
  INGRESS_NET=$(echo $INGRESS_DEFAULT_GATEWAY | cut -d'.' -f1,2,3)
  INGRESS_NODE_GATEWAY_IPS="$INGRESS_NET.2 $INGRESS_NET.3 $INGRESS_NET.4 $INGRESS_NET.5 $INGRESS_NET.6 $INGRESS_NET.7 $INGRESS_NET.8 $INGRESS_NET.9"
fi

echo "  - INGRESS_NODE_GATEWAY_IPS=\"$INGRESS_NODE_GATEWAY_IPS\""

# Create node ID from INGRESS_DEFAULT_GATEWAY final byte
NODE_ID=$(echo $INGRESS_DEFAULT_GATEWAY | cut -d'.' -f4)
echo "  - NODE_ID=$NODE_ID"

if [ -z "$INSTALL" ]; then
  echo
  echo "Ingress Routing Daemon v2 exiting."
  exit 0
fi

# Add a rule ahead of the ingress network SNAT rule, that will cause the SNAT rule to be skipped.
[ "$INSTALL" = "1" ] && echo "Adding ingress_sbox iptables nat rule: iptables -t nat -I POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT"
while nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -D POSTROUTING -d 10.0.0.0/24 -m ipvs --ipvs -j ACCEPT; do true; done 2>/dev/null
[ "$INSTALL" = "1" ] && nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t nat -I POSTROUTING -d $INGRESS_SUBNET -m ipvs --ipvs -j ACCEPT

# 1. Set TOS to NODE_ID in all outgoing packets to INGRESS_SUBNET
[ "$INSTALL" = "1" ] && echo "Adding ingress_sbox iptables mangle rule: iptables -t mangle -A POSTROUTING -d $INGRESS_SUBNET -j TOS --set-tos $NODE_ID/0xff"
while nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t mangle -D POSTROUTING -d $INGRESS_SUBNET -j TOS --set-tos $NODE_ID/0xff; do true; done 2>/dev/null
[ "$INSTALL" = "1" ] && nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t mangle -A POSTROUTING -d $INGRESS_SUBNET -j TOS --set-tos $NODE_ID/0xff

if [ "$INSTALL" = "0" ]; then
  echo
  echo "Ingress Routing Daemon v2 iptables rules uninstalled, exiting."
  exit 0
fi

echo "Ingress Routing Daemon v2 starting ..."

# Watch for container start events, and configure policy routing rules on each container
# to ensure return path traffic for incoming connections is routed back via the correct interface
# and to the correct node from which the incoming connection was received.
docker events \
  --format '{{.ID}} {{index .Actor.Attributes "com.docker.swarm.service.name"}}' \
  --filter 'event=start' \
  --filter 'type=container' | \
  while read ID SERVICE
  do
    if [ -n "$SERVICE" ]; then

      NID=$(docker inspect -f '{{.State.Pid}}' $ID)
      echo "Container ID=$ID, NID=$NID, SERVICE=$SERVICE started: applying policy routes."

      # 3. Map any connection mark on outgoing traffic to a firewall mark on the individual packets.
      nsenter -n -t $NID iptables -t mangle -A OUTPUT -p tcp -j CONNMARK --restore-mark

      for NODE_IP in $INGRESS_NODE_GATEWAY_IPS
      do
        NODE_ID=$(echo $NODE_IP | cut -d'.' -f4)

    # 2. Map the TOS value on any incoming packets to a connection mark, using the same value.
        nsenter -n -t $NID iptables -t mangle -A PREROUTING -m tos --tos $NODE_ID/0xff -j CONNMARK --set-xmark $NODE_ID/0xffffffff

    # 4. Select the correct routing table to use, according to the firewall mark on the outgoing packet.
        nsenter -n -t $NID ip rule add from $INGRESS_SUBNET fwmark $NODE_ID lookup $NODE_ID prio 32700

    # 5. Route outgoing traffic to the correct node's ingress network IP, according to its firewall mark
    #    (which in turn came from its connection mark, its TOS value, and ultimately its IP).
        nsenter -n -t $NID ip route add table $NODE_ID default via $NODE_IP dev eth0

      done

    fi
  done

Hello @struanb, I don't understand how the uninstall section works in your v2 script, is there something missing?

Hello @jrbecart. I hope not. Before iptables rules are installed, you'll see there are two while loops that delete any pre-existing rules, using iptables -D. This is a safety measure, in case the script is run with --install multiple times successively, without any intervening call with --uninstall.

As such, when the script is called with --uninstall, by the time the script exits those rules will have been removed, and new rules not yet added.

Hope this answers your question.

Hi everyone, I want to tell you that I discovered a fix for this issue, without installing or configuring anything other than defining the NGINX config properly. I know that all of us have tried different approaches. This one was discovered by mistake. To be honest, I gave up on this a long time ago. Well, until today. While I was implementing a monitoring system, I was able to get the source IP, the real source IP, using the NGINX log, so I began to debug how that was possible.

Here is an example of that kind of log

10.0.0.2 - - [19/Nov/2020:04:56:31 +0000] "GET / HTTP/1.1" 200 58 "-" req_t=0.003 upstream_t=0.004 "<browser-info>" "<source-ip-1,source-ip2,....>"

Note: There are multiple source IPs if you're using proxies (e.g. Cloudflare and others).

The info was there, my real IP was there. Then I reviewed the NGINX logging format to see how the magic was possible, and I found this:

log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      'req_t=$request_time upstream_t=$upstream_response_time '
                      '"$http_user_agent" "$http_x_forwarded_for"';

That means the magic is here -> $http_x_forwarded_for

After this, I changed the proxy headers like proxy_set_header X-Real-IP $http_x_forwarded_for;.

And finally, the last test: using that information in a NodeJS project, inside a production-like system using Docker Swarm with an overlay network, across about 4 VMs, and guess what, it worked! I could finally get the real IP address.
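
For anyone wanting to reproduce this, a minimal sketch of the relevant NGINX location block (the upstream name and port are illustrative):

location / {
    proxy_set_header X-Real-IP       $http_x_forwarded_for;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass       http://app:3000;   # illustrative upstream service
}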

I'm so happy because this issue has been opened for a long long time but I think this is the answer. The versions I used are:

Docker version: 19.03.8
NGINX version: nginx/1.14.2

I will wait for your feedback. I hope you can have the same results as me.

Cheers!
Sebastián.

P.S.: Try this using another network interface, i.e. outside localhost, because otherwise you will find a "-" in the log instead of your real IP address. Try to test it over the internet, completely outside your home network.

Bonus: I could also map the IP addresses to geolocations using a lookup table, count them, and put them on a map, so the answer is yes, this is what we were looking for, guys :)

@sebastianfelipe that's a big claim after all these years. You sure you're not using host mode or other workarounds in this thread?

I'm sure. I'm not using host networking on any of those connected services. I just deployed a stack, with an overlay network, in a production-like environment, including a Digital Ocean load balancer, and it worked. I mean, I can't test it better than this. It's 100% real.

@sebastianfelipe I'm guessing the Digital Ocean load balancer is appending the user's IP address to the X-Forwarded-For header. This is a known workaround which doesn't solve the issue of retrieving the user's IP in standalone Docker Swarm mode.

@beornf I was trying to sleep and then I read your notification, so I had to wake up and try an approach without a Digital Ocean load balancer, and it failed. You're right, Digital Ocean adds some magic there when a load balancer is added. This shows up in the $http_x_forwarded_for variable. The Digital Ocean load balancer adds info to an NGINX variable, info that is not added by Docker Swarm directly. Probably this could lead to a "dummy-like" approach to have a real solution for every case. At least Digital Ocean customers can be happy to know how to deal with this for the moment.

@beornf @sebastianfelipe Adding to the context, CloudFlare also adds X-Forwarded-For and is largely free.

I think this could work for a lot of us who need a way to get the real IP. Cloudflare can be set as a proxy or just DNS-only. It fits perfectly for non-Digital-Ocean customers. It is the cleanest workaround so far. But I agree with @beornf, we need a real solution, without depending on Digital Ocean or Cloudflare to get this done.

Thanks!
