Moby: Can't stop docker container

Created on 4 Jan 2018  ·  146 Comments  ·  Source: moby/moby


Can't stop container.

I'm starting and removing containers concurrently using docker-compose.
Sometimes it fails to remove the containers.

I verified that docker stop does not work on the container: the command hangs. After switching the docker daemon to debug mode, this is the only line logged when I run the command:
dockerd[101922]: time="2018-01-04T15:54:07.406980654Z" level=debug msg="Calling POST /v1.35/containers/4c2b5e7f466c/stop"

Steps to reproduce the issue:

  1. Run tests in Jenkins
  2. Eventually it fails to remove containers.

Describe the results you received:

Can't stop container.

Describe the results you expected:

The container should have been stopped and then removed.

Additional information you deem important (e.g. issue happens only occasionally):

Issue happens only occasionally

Output of docker version:

 Version:   17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:10:14 2017
 OS/Arch:   linux/amd64

  Version:  17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:    Wed Dec 27 20:12:46 2017
  OS/Arch:  linux/amd64
  Experimental: false

Output of docker info:

Containers: 6
 Running: 1
 Paused: 0
 Stopped: 5
Images: 75
Server Version: 17.12.0-ce
Storage Driver: devicemapper
 Pool Name: docker-253:0-33643212-pool
 Pool Blocksize: 65.54kB
 Base Device Size: 10.74GB
 Backing Filesystem: xfs
 Udev Sync Supported: true
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 31.43GB
 Data Space Total: 107.4GB
 Data Space Available: 75.95GB
 Metadata Space Used: 35.81MB
 Metadata Space Total: 2.147GB
 Metadata Space Available: 2.112GB
 Thin Pool Minimum Free Space: 10.74GB
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 1
 Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
  Profile: default
Kernel Version: 3.10.0-693.11.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 36
Total Memory: 117.9GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 37
 Goroutines: 51
 System Time: 2018-01-04T16:02:36.54459153Z
 EventsListeners: 0
Experimental: false
Insecure Registries:
Live Restore Enabled: false

WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
         Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.

area/runtime kind/bug status/more-info-needed status/needs-attention version/17.12

This really needs more information, and reproduction steps

dockerd[101922]: time="2018-01-04T15:54:07.406980654Z" level=debug msg="Calling POST /v1.35/containers/4c2b5e7f466c/stop"
  • The message above is only showing that the call was made to stop the container; are there any messages after that?
  • How is docker set up? Are you running docker-in-docker?
  • Have you verified the container is still running? What does docker inspect of the container show? Is there a PID in the output? And is that process still running? (ps auxf on the host)
  • Can you reproduce the issue without Jenkins? Can you provide exact steps to reproduce?
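
The checks asked for in the list above can be scripted. This is a minimal sketch, assuming a Linux host with the docker CLI on the PATH; `container_pid` and `pid_alive` are hypothetical helper names, not part of Docker.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Docker) for the checks asked about above.

# Print the host PID that dockerd has recorded for a container.
# Docker reports 0 when it believes the container is not running.
container_pid() {
    docker inspect --format '{{.State.Pid}}' "$1"
}

# Succeed if a process with the given PID currently exists on this host.
# kill -0 only tests for existence; the /proc check covers processes owned
# by another user on Linux, where kill -0 fails with a permission error.
pid_alive() {
    [ -n "$1" ] || return 1
    [ "$1" -gt 0 ] 2>/dev/null || return 1
    kill -0 "$1" 2>/dev/null || [ -d "/proc/$1" ]
}
```

For example, `pid=$(container_pid 4c2b5e7f466c); pid_alive "$pid" && echo "process $pid is still running"`. Note that docker inspect may itself hang while the daemon is in this state, so the first step is not guaranteed to return.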

No more messages are logged.

Meanwhile, I ran some more tests. Once the container enters this state:

  • I can't stop the container
  • I can't docker exec to bash.
  • I can start and stop other containers

To exit this state I have to:

  • service docker stop
  • kill the container processes (otherwise docker doesn't start)
  • service docker start
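
The recovery steps above could be wrapped in a small script. This is only a sketch under stated assumptions: `recover_docker` is a hypothetical helper, the PIDs of the leftover container processes must be supplied by the caller, and the use of plain `kill` (SIGTERM) is an assumption, since the steps above do not say which signal was sent. By default it only prints what it would do.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker) for the recovery steps above.
# By default it only prints the commands; set RUN=1 to execute them.
# Arguments: the PIDs of the leftover container processes to kill.
recover_docker() {
    # Run the given command when RUN=1, otherwise just print it.
    _do() {
        if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
    }
    _do service docker stop
    for pid in "$@"; do
        # Assumption: default signal (SIGTERM); the thread does not specify one.
        _do kill "$pid"
    done
    _do service docker start
}
```

Calling `recover_docker 111 222` without `RUN=1` prints the four commands it would run, which makes it easy to review the sequence before executing it as root.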

I think I have reproduced this outside Jenkins one time but thought it was another problem.

Since this is easier to reproduce with Jenkins, I'll wait for the next occurrence to run docker inspect on the container.

The setup (running in a CentOS VM):

  • Starting containers with certain images using docker-compose up
  • Performing some tests using the applications started in containers
  • Stopping containers using docker-compose down

And these steps are done for each test, and I'm running tests concurrently.

I'm sorry for not giving more information, but this is what I could collect so far.

I got a similar problem _now_ with a _different docker version_. I can't stop any container that is created.

And this is logged for all containers.

Jan 08 16:53:10 dockerd[7012]: time="2018-01-08T16:53:10.983935134Z" level=debug msg="Calling POST /v1.34/containers/9cdc36c44340/stop"
Jan 08 16:53:10 dockerd[7012]: time="2018-01-08T16:53:10.984024605Z" level=debug msg="Sending kill signal 15 to container 9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94"
Jan 08 16:53:12 dockerd[7012]: time="2018-01-08T16:53:12.985034572Z" level=info msg="Container failed to stop after sending signal 15 to the process, force killing"
Jan 08 16:53:12 dockerd[7012]: time="2018-01-08T16:53:12.985087603Z" level=debug msg="Sending kill signal 9 to container 9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94"
Jan 08 16:53:12 dockerd[7012]: time="2018-01-08T16:53:12.986759908Z" level=debug msg="FIXME: Got an API for which error does not match any expected type!!!: not found\\n\t/go/src/\\n\t/go/src/\\n\t/go/src/\\n\t/go/src/\\n\t/go/src/\\n\t/go/src/\nmain.init\n\t/go/src/\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:173\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2197" error_type="errors.fundamental" module=api
Jan 08 16:53:12 dockerd[7012]: time="2018-01-08T16:53:12.986856140Z" level=error msg="Handler for POST /v1.34/containers/9cdc36c44340/stop returned error: cannot stop container: 9cdc36c44340: Cannot kill container 9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94: process 9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94 not found: not found"
Jan 08 16:53:12 dockerd[7012]: time="2018-01-08T16:53:12.987051906Z" level=debug msg="FIXME: Got an API for which error does not match any expected type!!!: not found\\n\t/go/src/\\n\t/go/src/\\n\t/go/src/\\n\t/go/src/\\n\t/go/src/\\n\t/go/src/\nmain.init\n\t/go/src/\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:173\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2197" error_type="errors.fundamental" module=api

**docker info:**
Containers: 6
 Running: 6
 Paused: 0
 Stopped: 0
Images: 61
Server Version: 17.11.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 992280e8e265f491f7a624ab82f3e238be086e49
runc version: 0351df1c5a66838d0c392b4ac4cf9450de844e2d
init version: 949e6fa
Security Options:
  Profile: default
Kernel Version: 4.10.0-42-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31GiB
Name: Laptop-749
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 26
 Goroutines: 48
 System Time: 2018-01-08T16:58:47.457072503Z
 EventsListeners: 0
Experimental: false
Insecure Registries:
Live Restore Enabled: false

WARNING: No swap limit support

docker version:

Version: 17.11.0-ce
API version: 1.34
Go version: go1.8.3
Git commit: 1caf76c
Built: Mon Nov 20 18:37:39 2017
OS/Arch: linux/amd64

Version: 17.11.0-ce
API version: 1.34 (minimum version 1.12)
Go version: go1.8.3
Git commit: 1caf76c
Built: Mon Nov 20 18:36:09 2017
OS/Arch: linux/amd64
Experimental: false

**docker inspect:**
        "Id": "9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94",
        "Created": "2018-01-08T16:32:30.716158282Z",
        "Path": "/opt/",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 477,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2018-01-08T16:32:31.370353796Z",
            "FinishedAt": "0001-01-01T00:00:00Z",
            "Health": {
                "Status": "healthy",
                "FailingStreak": 0,
                "Log": [
                        "Start": "2018-01-08T16:40:52.760255527Z",
                        "End": "2018-01-08T16:40:52.814916997Z",
                        "ExitCode": 0,
                        "Output": ""
                        "Start": "2018-01-08T16:41:12.821209911Z",
                        "End": "2018-01-08T16:41:12.872327217Z",
                        "ExitCode": 0,
                        "Output": ""
                        "Start": "2018-01-08T16:41:32.879017542Z",
                        "End": "2018-01-08T16:41:32.932394782Z",
                        "ExitCode": 0,
                        "Output": ""
                        "Start": "2018-01-08T16:41:52.938598813Z",
                        "End": "2018-01-08T16:41:52.993106466Z",
                        "ExitCode": 0,
                        "Output": ""
                        "Start": "2018-01-08T16:42:12.998820005Z",
                        "End": "2018-01-08T16:42:13.056301771Z",
                        "ExitCode": 0,
                        "Output": ""
        "Image": "sha256:71843cc0ac81d2a365553dd5b69f6643dab212fd8b45d498c6a92614352ed75f",
        "ResolvConfPath": "/var/lib/docker/containers/9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94/hostname",
        "HostsPath": "/var/lib/docker/containers/9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94/hosts",
        "LogPath": "/var/lib/docker/containers/9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94/9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94-json.log",
        "Name": "/kegfngsmzx_component_1",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            "NetworkMode": "kegfngsmzx_default",
            "PortBindings": {
                "1099/tcp": [
                        "HostIp": "",
                        "HostPort": ""
                "7000/tcp": [
                        "HostIp": "",
                        "HostPort": ""
                "8080/tcp": [
                        "HostIp": "",
                        "HostPort": ""
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": [],
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "shareable",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/5fbfde1f36fe0da7cee8ed92b728f7b92ddd4a1b8d9aeafe44b7e8cde581aeb3-init/diff:/var/lib/docker/overlay2/a9c4a86986bf84eff4d3156580e986daed91c7a37d937c5e4f608cd90b78f50a/diff:/var/lib/docker/overlay2/566bb33f0a3140bdb3726e3581bc703557f729010d2fb5b76ba21ac04157e5eb/diff:/var/lib/docker/overlay2/92302187d5633c0e6f3577edf93e2f1fbc133ccfcd11c6ce4a2b0fd06eb33db4/diff:/var/lib/docker/overlay2/3ac16dcca78ec2202d9af5e2e1ca50053612b75247d685c66418516aa7a1f91e/diff:/var/lib/docker/overlay2/3c2bef86bfac98dace20fb5ad4461601d444797454a5561bb543e4478d3aed25/diff:/var/lib/docker/overlay2/82de5471b51e7a55f8d9ff61983b36e9302b2fc7f4ba3fcc6ce5bde9f426ac9b/diff:/var/lib/docker/overlay2/7103da23a70519f91ae53950b6da99797d75104815ff43a1662efc92a933dc45/diff:/var/lib/docker/overlay2/70d522784351b087ee139f429dd041e1966308365e222f9022ab33f1f6da5089/diff:/var/lib/docker/overlay2/05d68822eebc4564c7e4597ee7c3d2bece406703e2e042bdf2ec35061a178f3a/diff:/var/lib/docker/overlay2/cc4fbcefd6fc474463d00d55d708988fc68f6eca5534675992e157743cb04af7/diff:/var/lib/docker/overlay2/50a363caa96c54de6cf17bfa477e384694f0fdf15a81c27cb92b830c0a8782b1/diff:/var/lib/docker/overlay2/ee1dadb2c4a98b37896eeb4e97f0715d97485bd10ef2b70d3b279d7fb93a4b18/diff:/var/lib/docker/overlay2/a66b6a45869ab5484cc04259ee7e11d32526a1fa1c91748f71754b57a87b69d9/diff:/var/lib/docker/overlay2/58472f6337dd2f95a5bda690e630fc6ddf4f661b6e965cfa798c666cde72457a/diff:/var/lib/docker/overlay2/22657f15e2d1411269f3201e63705babaaa7a04275f6c91ca5df4dc167abd93f/diff:/var/lib/docker/overlay2/5483cd1fad2a005e68e2656c5fcee54b8844576743288c06e49f40f6a4381a63/diff:/var/lib/docker/overlay2/ba02a2666cd21a254805404d1757f8ed90e28089e4a924e15a524c1e09265d0a/diff:/var/lib/docker/overlay2/07359ba2f66ba314629b1a6df441a7b96470e5d55ec22b88a48cc7c93b34f515/diff:/var/lib/docker/overlay2/99ecef114a5db24e123e4f5d9a8a01c3a79fa6aaed1af1095669f374a689294d/diff:/var/lib/docker/overlay2/7cfa73084c807c05112368f9c60627622b807b5ad
932ace14541994f95209329/diff:/var/lib/docker/overlay2/b8e4cd0ea2811b61210129cc97ef4d10489bcb61b3b1dbe64d5a7af65bc284e2/diff:/var/lib/docker/overlay2/5cb7c00c701b24ca232c773eff803b0ca26a4bb137a5960920f5f3e9c96cfe7b/diff:/var/lib/docker/overlay2/6e722e736fb0acf96c2bbd2b29cd10e79955fe4b5fd8bf862a17ffa241b68a1b/diff:/var/lib/docker/overlay2/160835aace0cb1e2f4b9360934188b99ca9a65c74ee8d100f613275024e9d811/diff:/var/lib/docker/overlay2/5c7ba1cf63c83cda117ef0eca2bfd65d9bd44669e0e80933e351620bce546354/diff:/var/lib/docker/overlay2/c58b587a8318b57dc1f39c2aa2df68fa86295280fc007650a16008d05685b356/diff",
                "MergedDir": "/var/lib/docker/overlay2/5fbfde1f36fe0da7cee8ed92b728f7b92ddd4a1b8d9aeafe44b7e8cde581aeb3/merged",
                "UpperDir": "/var/lib/docker/overlay2/5fbfde1f36fe0da7cee8ed92b728f7b92ddd4a1b8d9aeafe44b7e8cde581aeb3/diff",
                "WorkDir": "/var/lib/docker/overlay2/5fbfde1f36fe0da7cee8ed92b728f7b92ddd4a1b8d9aeafe44b7e8cde581aeb3/work"
            "Name": "overlay2"
        "Mounts": [
                "Type": "bind",
                "Source": "/home/joao.suzana/gitprojects/superComponent/docker/configurations/default/component/common",
                "Destination": "/tmp/conf/1",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
                "Type": "bind",
                "Source": "/home/joao.suzana/gitprojects/superComponent/docker/configurations/custom/component",
                "Destination": "/tmp/conf/2",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
                "Type": "bind",
                "Source": "/home/joao.suzana/gitprojects/superComponent/docker/configurations/system-tests/component",
                "Destination": "/tmp/conf/3",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
                "Type": "bind",
                "Source": "/dev/null",
                "Destination": "/tmp/conf/4",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
                "Type": "bind",
                "Source": "/home/joao.suzana/gitprojects/superComponent/docker/configurations/default/component/basic",
                "Destination": "/tmp/conf/0",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
        "Config": {
            "Hostname": "9cdc36c44340",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "1099/tcp": {},
                "7000/tcp": {},
                "8080/tcp": {}
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
            "Cmd": [
            "Healthcheck": {
                "Test": [
                    "grep -q \"App Service is ready.\""
                "Interval": 20000000000,
                "Retries": 30
            "ArgsEscaped": true,
            "Image": "",
            "Volumes": {
                "/tmp/conf/0": {},
                "/tmp/conf/1": {},
                "/tmp/conf/2": {},
                "/tmp/conf/3": {},
                "/tmp/conf/4": {}
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {
                "build-date": "20171128",
                "com.docker.compose.config-hash": "51a3c3781142fce6292e53a5a42dd804a41e5c6e81b02b2dab14647d5f3fe774",
                "com.docker.compose.container-number": "1",
                "com.docker.compose.oneoff": "False",
                "com.docker.compose.project": "kegfngsmzx",
                "com.docker.compose.service": "private-component",
                "com.docker.compose.version": "1.17.1",
                "com.super.component": "Super",
                "license": "GPLv2",
                "name": "CentOS Base Image",
                "vendor": "CentOS"
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "4a6a1b4492dce570a42cb735915c76fab4c0e92dd712bf81ae323df8eec1d0a3",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "1099/tcp": [
                        "HostIp": "",
                        "HostPort": "32863"
                "7000/tcp": [
                        "HostIp": "",
                        "HostPort": "32862"
                "8080/tcp": [
                        "HostIp": "",
                        "HostPort": "32861"
            "SandboxKey": "/var/run/docker/netns/4a6a1b4492dc",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "kegfngsmzx_default": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                    "NetworkID": "19e6624e9254883228576ad289770611fd066ed7fc1c847eb0dd25899b240d07",
                    "EndpointID": "850780c0914d118382913f0ff287433e88c01a56d3e42fa95ce890c737027b76",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:12:00:07",
                    "DriverOpts": null

@Timunas can you try updating to 17.12?

The original issue was with 17.12

Regarding the original issue, I reproduced it once again, and this time I cannot run docker inspect either: it just hangs, as do all other commands.

I get the same issue, though without using docker-compose. I'm using docker swarm. Same thing though, I occasionally get containers that neither docker swarm nor I with the docker CLI can stop. This causes docker swarm to end up collecting more replicas than desired that it can't scale down. Sometimes these replicas can still service requests and receive traffic. The only way to remove the containers is to restart docker on the affected node.

I have the same issue with docker swarm. I remove one of multiple docker stacks, but only some of the containers in the stack are removed, while others hang around. Running docker inspect or docker rm on the hung containers just hangs on the command line until I press Ctrl-C. I need to reboot to get the containers removed. I did not have the issue in 17.09, only after upgrading to 17.12.0-ce (I also had the problem on 17.12.0-ce-rc4).

I have the issue on an Azure VM. docker info:

Containers: 95
Running: 83
Paused: 0
Stopped: 12
Images: 579
Server Version: 17.12.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: hy0kx44q5m9jg0lc1n5ylxkw6
Is Manager: true
ClusterID: ordhsz694y98k3r4604ksc937
Managers: 1
Nodes: 1
Task History Retention Limit: 2
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address:
Manager Addresses:
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
Profile: default
Kernel Version: 4.4.0-104-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 27.47GiB
Name: build-agent-vm001
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Insecure Registries:
Live Restore Enabled: false

WARNING: No swap limit support

I also have the same issue on Docker for Mac (Edge: 17.12): `docker info`

Containers: 110
 Running: 65
 Paused: 0
 Stopped: 45
Images: 607
Server Version: 17.12.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: qfzh0tqkchl2m42uhju7k3ml4
 Is Manager: true
 ClusterID: q14zy6epqkpx0w112wusdtd3u
 Managers: 1
 Nodes: 1
  Task History Retention Limit: 2
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address:
 Manager Addresses:
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
  Profile: default
Kernel Version: 4.9.60-linuxkit-aufs
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 6
Total Memory: 5.817GiB
Name: linuxkit-025000000001
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 260
 Goroutines: 491
 System Time: 2018-01-09T00:13:09.053688513Z
 EventsListeners: 28
HTTP Proxy: docker.for.mac.http.internal:3128
HTTPS Proxy: docker.for.mac.http.internal:3128
Experimental: true
Insecure Registries:
Live Restore Enabled: false

We are also experiencing a docker daemon that does not respond to some commands.

Currently I cannot run:

docker rmi
docker system prune -f
docker exec
docker logs

This happens on multiple engines, all running 17.12.

seems related to

I experience the same bug. It is not consistent though. I don't see a pattern yet but it does happen.

I am running Docker for Mac Version 17.12.0-ce-mac46 (21698). I am not running Docker in Docker.

Container is created by docker-compose up.

Yes, I can see that the container is still running, but stop or kill just hangs and does nothing.

10:13:13 Alexei-Workstation /Users/alexei.chekulaev/Projects/SBD-MASTER
$ docker ps
CONTAINER ID        IMAGE                     COMMAND                  CREATED             STATUS                    PORTS                                                    NAMES
f0e36d3589d3        docksal/cli:1.3-php7      "/opt/ sup…"   44 hours ago        Up 28 minutes (healthy)   22/tcp, 9000/tcp                                         sbdmaster_cli_1
b93c84c9a3a3        docksal/ssh-agent:1.0     "/ ssh-agent"      44 hours ago        Up 29 minutes                                                                      docksal-ssh-agent
91ce00eb35fa        docksal/dns:1.0           "/opt/ …"   44 hours ago        Up 29 minutes   >53/udp                                docksal-dns
ae867cca0f21        docksal/vhost-proxy:1.1   "docker-entrypoint.s…"   44 hours ago        Up 29 minutes   >80/tcp,>443/tcp   docksal-vhost-proxy
10:13:17 Alexei-Workstation /Users/alexei.chekulaev/Projects/SBD-MASTER
$ docker stop f0e36d3589d3
10:16:03 Alexei-Workstation /Users/alexei.chekulaev/Projects/SBD-MASTER
$ docker kill f0e36d3589d3
10:30:51 Alexei-Workstation /Users/alexei.chekulaev/Projects/SBD-MASTER

(You can see that minutes passed before I pressed Ctrl-C)

In another terminal I tried to start another docker-compose project. This is what I saw in the output the first time:

$ docker-compose up
rm: can't remove '/.ssh/': Stale file handle
rm: can't remove '/.ssh/authorized_keys': Stale file handle
rm: can't remove '/.ssh/': Stale file handle
rm: can't remove '/.ssh/known_hosts': Stale file handle
rm: can't remove '/.ssh/id_test': Stale file handle
rm: can't remove '/.ssh/': Stale file handle
rm: can't remove '/.ssh/id_rsa2': Stale file handle
rm: can't remove '/.ssh/id_dsa': Stale file handle
rm: can't remove '/.ssh/id_boot2docker': Stale file handle
rm: can't remove '/.ssh/': Stale file handle
rm: can't remove '/.ssh/id_sbd': Stale file handle
rm: can't remove '/.ssh/id_rsa': Stale file handle
rm: can't remove '/.ssh/': Stale file handle
rm: can't remove '/.ssh': Directory not empty
Starting services...
Creating network "demonodb_default" with the default driver
Creating demonodb_cli_1 ... done
Creating demonodb_cli_1 ... 
Creating demonodb_web_1 ... done

The other project started fine, apart from the stale file handle errors above. Subsequent stops and starts of that project did not throw any errors and worked fine.

These files are on a named volume. The volume is mounted as ro in docker-compose, so I'm not sure why there are "can't remove" messages.

Restarting the Docker daemon solves the issue... temporarily. I forgot to run docker inspect and have already restarted the daemon, but I think inspect would just hang like stop and kill do.

UPDATE: I wanted to note that the container with issues has a healthcheck on it. It looks like this might be the culprit.

I get the same issue and can reproduce it every time in different environments:
Docker for Mac Version 17.12.0-ce-mac46 (started hanging after the update),
or docker running natively on Arch Linux (kernel 4.14.14-1-ARCH). There I also cannot restart the docker service with systemctl restart docker.service; it hangs too. docker version:

 Version:       18.01.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    03596f51b1
 Built: Sun Jan 14 23:10:39 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm
  Version:      18.01.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   03596f51b1
  Built:        Sun Jan 14 23:11:14 2018
  OS/Arch:      linux/amd64
  Experimental: false

journalctl shows

dockerd[26382]: time="2018-01-25T12:39:22.289082720+03:00" level=error msg="stream copy error: reading from a closed fifo"

Also seeing this on 18.01. Hang on container inspect.

 Version:   18.01.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    03596f51b1
 Built: Sun Jan 14 23:10:39 2018
 OS/Arch:   linux/amd64
 Experimental:  false
 Orchestrator:  swarm

  Version:  18.01.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   03596f51b1
  Built:    Sun Jan 14 23:11:14 2018
  OS/Arch:  linux/amd64
  Experimental: false
compose.cli.command.get_client: Docker version: Platform={'Name': ''}, Components=[{'Name': 'Engine', 'Version': '18.01.0-ce', 'Details': {'ApiVersion': '1.35', 'Arch': 'amd64', 'BuildTime': '2018-01-14T23:11:14.000000000+00:00', 'Experimental': 'false', 'GitCommit': '03596f51b1', 'GoVersion': 'go1.9.2', 'KernelVersion': '4.14.15-1-ARCH', 'MinAPIVersion': '1.12', 'Os': 'linux'}}], Version=18.01.0-ce, ApiVersion=1.35, MinAPIVersion=1.12, GitCommit=03596f51b1, GoVersion=go1.9.2, Os=linux, Arch=amd64, KernelVersion=4.14.15-1-ARCH, BuildTime=2018-01-14T23:11:14.000000000+00:00
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=False, filters={'label': ['com.docker.compose.project=discord']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/containers/json?limit=-1&all=0&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Ddiscord%22%5D%7D HTTP/1.1" 200 1762
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 1 items)
compose.cli.verbose_proxy.proxy_callable: docker inspect_container <- ('59760b63049318f7b0bef2605e63d0fd8b13f4e134a7aea435db9eb1bdf2b389')

We have stopped using 17.12 completely and rolled back to 17.09 because of this problem on 17.12 (macOS and apparently Linux as well).

This is a critical, persistent problem.

And unfortunately I have not found a way to recreate it except by using docker a lot.

I'm experiencing the same issue in multiple servers using 17.12. As @rfay said, it didn't happen on 17.09.

Checking the changelog, a major difference between 17.12 and 17.09 is that, since 17.11, Docker is based on containerd. Since the evidence seems to indicate this is an issue in the runtime, it may be worth investigating along this path.

Yup, same here. I stick with 17.09 and recommend everyone using docker-compose or swarm to stick with it until the issue is resolved.

If you can grab a stacktrace from the running daemon it would be very helpful.
You can get this by hitting GET /debug/pprof/goroutine?debug=2

I suspect, though, that this is the recent bug that was found in runc that is a race in handling the container I/O... which has been around since forever, apparently.
If so, we suspect this is exposed by changes in the kernel, and everyone is upgrading their kernel recently for spectre/meltdown patches.

The relevant runc patch is here, which you can try if you don't want to wait for a patched docker release:


You can get this by hitting GET /debug/pprof/goroutine?debug=2

Please provide commands. I don't understand how to "hit" a relative url, and what is it relative to. I use Docker for Mac. What should I hit?

Assuming you have docker listening on a unix socket at /var/run/docker.sock (the default):

curl --unix-socket /var/run/docker.sock http:/./debug/pprof/goroutine?debug=2

or a TCP socket

curl http://<ip>:<port>/debug/pprof/goroutine?debug=2

The following file is the output of that command run on an AWS Ubuntu 16.04 instance using Docker version 17.12.0-ce, build c97c6d6

moby 35933.txt

@ay0o Thanks!
Is there something blocked on the system right now?
I don't see any in progress stop/kills, just looks like a bunch of running containers, unfortunately.

I took the logs on a MacBook Pro running macOS High Sierra 10.13.3, running docker 18.01.0-ce-mac48, channel: edge ee2282129d.


@AlterEgo7 Thanks! This looks like docker is blocked in a syscall to write to disk, and even read from disk at least in one place. Seems like something is very wrong with the disk that is allocated for that docker VM in docker4mac.

A number of i/o bound syscalls blocked for ~1 minute, actually.

@cpuguy83 that would be due to docker-compose timeout being 60 seconds. Are there any specific settings for the VM that I can experiment with? However, as mentioned above by other users, this behaviour started with the 17.12 update.

I experience this problem also very frequently. On 17.12 It seems to appear mostly on containers with bind-mounted volumes. In our case these volumes are nfs-shares on the host.

@cpuguy83 @mborejdo If it makes a difference, ~10 containers running on my machine use docker-sync unison volumes.

It does help. Maybe the docker-sync tool is broken after 17.12?
This would definitely explain blocked writes as nfs is not very friendly to disconnected backends.

Same when running the same set of containers without any volumes or any docker-sync containers running. pprof log is attached.


@cpuguy83, I'm also seeing this problem with docker 17.12 using swarm - I have a 3 node cluster, with 6 docker stacks and about 30 services. docker stack rm for each stack works and cleans up about 90% of the containers, but even after 5 minutes, docker ps -a still shows about 10 containers in exited state - I could get by if that were the only issue because I added a step at the end to invoke docker system prune -f on each node. Unfortunately, about 25% of the time, I'm left with containers still running after the stack rm command (I've waited 10+ minutes in some cases), so a docker system prune won't work - and when I see this, like others have described, most docker commands on that node hang. I have to execute a kill -15 on the offending PID to gain back control.

As you suggested, I did try, but still no luck. Before we say that didn't work, can you verify I patched it correctly? I manually built docker-ce/engine (branch 17.12) bits updating runc to commit: 9f9c96235cc97674e935002fc3d78361b696a69e and then overwrote /usr/bin/runc with the newly built binary. I did this on each node in the cluster, restarted docker and ran docker info | grep runc and see:

Default Runtime: runc
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e (expected: b2567b37d7b75eb4cf325b77297b140ea686ce8f)

Should I try with replacing all the docker binaries?

I've also attached my stacktrace (curl --unix-socket /var/run/docker.sock http://localhost/debug/pprof/goroutine?debug=2).

@foleymic The issue you are seeing does seem to resemble the runc issue.
Replacing dockerd is not what's needed but rather replacing the docker-runc binary.

Note that for testing purposes you can install the patched runc to a custom location and tell docker to use that as the default runtime (or on a per-container basis).
To do this, basically just build runc with the above commit, put it somewhere like /usr/local/bin/patched-runc and then start docker with --add-runtime myrunc=/usr/local/bin/patched-runc --default-runtime=myrunc (these can also be put in /etc/docker/daemon.json).
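
As a rough illustration of the daemon.json form of those two flags, here is a minimal sketch. The function name write_runtime_config and the output path are made up for this example; the "runtimes" and "default-runtime" keys are the daemon.json counterparts of --add-runtime and --default-runtime. It writes to a file you choose rather than to /etc/docker/daemon.json, so any existing settings still have to be merged by hand.

```sh
# Sketch: generate a daemon.json fragment that registers a patched runc
# and makes it the default runtime. write_runtime_config is a hypothetical
# helper name, not part of docker.
write_runtime_config() {
    name=$1   # runtime name, e.g. myrunc
    path=$2   # path to the patched binary
    out=$3    # file to write (review it, then merge into /etc/docker/daemon.json)
    cat > "$out" <<EOF
{
    "runtimes": {
        "$name": {
            "path": "$path"
        }
    },
    "default-runtime": "$name"
}
EOF
}

# Example (writes into the current directory, not /etc/docker):
# write_runtime_config myrunc /usr/local/bin/patched-runc ./daemon.json.new
```

The daemon has to be restarted after the real daemon.json is changed for the new default runtime to take effect.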

@cpuguy83 - thanks for the quick reply. I did basically just replace runc, but by building the complete docker-ce engine from the 17.12 branch and changing the RUNC_COMMIT to b2567b37d7b75eb4cf325b77297b140ea686ce8f in binaries-commits and vendor.conf. Anyway, I think what you describe sounds much better, so let me try just building runc and replacing it and repeating my test. Thanks again!

@foleymic Awesome. Perhaps the output of docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines would be prudent as well.

@AlterEgo7 Your stack trace is quite errrr...interesting...
There are a bunch of goroutines (green threads) which are in the "runnable" state... which means they are waiting for a real OS thread to actually run them.

Can you get a process list from the host? docker run --pid=host busybox ps aux ought to do.
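
For anyone reading their own dump, a quick way to see how many goroutines sit in each state (for example "runnable" or "syscall") is to count the header lines. This is only a sketch: the function name summarize_goroutines is invented here, and it assumes the debug=2 output format where each goroutine starts with a line like "goroutine 12 [syscall, 1 minutes]:".

```sh
# Sketch: count goroutines per state in a debug=2 goroutine dump.
# Reads the dump from the files given as arguments, or from stdin.
summarize_goroutines() {
    awk '
        /^goroutine [0-9]+ \[.*\]:$/ {
            s = $0
            sub(/^goroutine [0-9]+ \[/, "", s)   # drop the "goroutine N [" prefix
            sub(/\]:$/, "", s)                    # drop the trailing "]:"
            sub(/,.*$/, "", s)                    # keep the state, drop the wait time
            count[s]++
        }
        END { for (s in count) printf "%d %s\n", count[s], s }
    ' "$@" | sort -k1,1nr -k2
}

# Usage, with the dump command from earlier in this thread:
# curl --unix-socket /var/run/docker.sock 'http://localhost/debug/pprof/goroutine?debug=2' | summarize_goroutines
```

A large count in one state is only a hint about where to look; the full stacks are still what the maintainers need.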

@cpuguy83 - I rebuilt standalone runc (checked out commit 9f9c96235cc97674e935002fc3d78361b696a69e) and deployed it as you suggested (snippet of daemon.json below) and reproduced the issue. As requested, I've attached the output of docker-containerd-debug.sock goroutines

I appreciate your help, let me know if there's anything else you need me to do or try.


cat /etc/docker/daemon.json
    "runtimes": {
        "patchedrunc": {
             "path": "/usr/local/bin/patched-runc"
        }
    }

docker info

docker info | grep runc
Runtimes: patchedrunc runc
Default Runtime: patchedrunc
WARNING: bridge-nf-call-ip6tables is disabled
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e (expected: b2567b37d7b75eb4cf325b77297b140ea686ce8f)
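
To double-check which runc commit a node is actually running, the "runc version" line can be pulled out of docker info and split into the running and expected commits. A minimal sketch follows; check_runc_version is a made-up name, and it assumes the "(expected: ...)" suffix is only printed when the two commits differ, as the outputs in this thread suggest. A mismatch is what you would expect to see while testing a patched runc against a stock dockerd.

```sh
# Sketch: report the running and expected runc commits from `docker info` text.
# Reads the docker info output on stdin.
check_runc_version() {
    sed -n \
        -e 's/^ *runc version: \([0-9a-f]*\) (expected: \([0-9a-f]*\)).*$/mismatch: running \1, expected \2/p' \
        -e 's/^ *runc version: \([0-9a-f]*\) *$/no mismatch reported: running \1/p'
}

# Usage:
# docker info 2>/dev/null | check_runc_version
```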

Multiple people on my dev team are experiencing docker commands hanging with MacOS 10.13 and Docker For Mac 17.12. It has happened to me when trying to stop/kill containers, but also on other commands, like docker inspect

Attached is the output of curl --unix-socket /var/run/docker.sock http:/./debug/pprof/goroutine?debug=2

I took it while waiting on a docker-compose ps that had been hung for > 30 seconds. Interestingly docker ps worked fine during this hang, but a docker inspect on any of the containers returned by docker ps hangs just like the docker-compose ps did.

We don't use docker-sync, but we do use volume mounts via the OSXFS file system from Docker for Mac.
Let me know if there's anything else I can do to help investigate.

@cpuguy83 sorry, I understood you just wanted the log independently of whether or not it was failing at the moment.

The compose I'm using at the moment has 36 containers. I tried to reproduce the issue by simply running docker-compose up and docker-compose down. First time was great but the second time, 3 containers remained "up", and all the others remained in "exited". Here's the output of the log:


This is the error reported by docker-compose down:

ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information. If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

One thing I noticed is that it seems to be just one container blocking the others. Particularly, in this case, the 3 containers that weren't stopped were postgres, etcd and a helper to configure the etcd. However, it looks it's the postgres blocking the others. For instance, I can run docker inspect etcd and it works, but docker inspect postgres fails with timeout.

Notice this is just an example of this specific case. I'm not saying it's postgres always the one to blame. Maybe next time it happens, it will be redis or rabbitmq.
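
Since docker ps keeps working while docker inspect hangs on the affected container, one way to find the blocking container is to inspect each ID under a time limit. The following is a sketch under those assumptions: find_hung_containers, INSPECT_CMD and PROBE_TIMEOUT are names chosen for this example, and it relies on the coreutils timeout command, which exits with status 124 when the time limit is hit.

```sh
# Sketch: print the IDs of containers whose inspect call does not return in time.
# INSPECT_CMD and PROBE_TIMEOUT are hypothetical knobs for this example.
INSPECT_CMD=${INSPECT_CMD:-docker inspect}
PROBE_TIMEOUT=${PROBE_TIMEOUT:-10}

find_hung_containers() {
    # Container IDs are read from stdin, one per line.
    while read -r id; do
        rc=0
        # $INSPECT_CMD is intentionally unquoted so "docker inspect" splits into two words.
        timeout "$PROBE_TIMEOUT" $INSPECT_CMD "$id" >/dev/null 2>&1 </dev/null || rc=$?
        # timeout(1) exits with 124 when the command was killed for taking too long.
        if [ "$rc" -eq 124 ]; then
            echo "$id"
        fi
    done
}

# Usage:
# docker ps -q | find_hung_containers
```

Each hung container costs one full timeout, so on a host with many containers a short PROBE_TIMEOUT keeps the scan quick.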

Also, it happens using swarm as well.

I also have the same issue.
In the company I work for we have a Docker Host with many containers running java environments for development and, from time to time, I cannot remove images or containers, I cannot inspect images or containers, among other operations.

I get many messages in journalctl like:
Jan 31 11:31:51 DOCKER-DEV dockerd[1882]: time="2018-01-31T11:31:51.598987401Z" level=warning msg="unknown container" container=540456eed4104723a0d4e9d4628d436ec381f978738f83dda16f22430cc60094 module=libcontainerd namespace=plugins.moby

I'm running 17.12 CE in Ubuntu 16.04.3 LTS.

Thank you very much for your time and help.

@luisnabais The issue about Unknown container is discussed here:

@sorenhansendk thank you, I already knew about that, I'm following both threads, I have both issues with 17.12...

Update - I've set up a new 3-node cluster (same VM template) and manually installed RC 1 of docker-18.02.0-ce, and have not been able to reproduce the problem. In addition, thanks to #35891, I no longer see the Unknown container message in my logs and all my undefined volumes are also getting removed. I'm going to do some more testing to try and isolate which binary(ies) has the fix.

For me at least, the problem is intermittent.

On which version @richardjq?
It exists on all versions of Docker, because the issue is in runc


I am also having the same problem with hanging after updating to the latest Docker (Mac OS). Here is my debug output if it helps:

Thanks all, very helpful.

@carlisia Is this edge or stable? (or maybe just the version in the about screen would be better, just because version madness).

@cpuguy83 stable. I just ran the update option from the UI.


I'm not sure if this helps or not, but I noticed something strange that I thought I would add to this conversation. The issue just happened again (with stock docker 17.12.0-ce) and so I looked into the daemon logs on the node where the stack/service was removed, but the container remained in healthy status. docker ps shows this container id as 8c5381ca6248, I grep'd for that id in the daemon logs and I see that it was started with pid=94570, but after I invoked the stack rm command, I see an error in the log that repeats 3 times - Ignoring Exit Event, no such exec command found for that container id, but each of these three log messages have different PIDs and none of them equal to pid=94570. I also confirmed that pid 94570 is the correct PID (ran ps ax | grep 8c5381ca6248).

Any idea why swarm would have the wrong PID?

Here is a snippet of the daemon logs:

$ journalctl | grep 8c538
Feb 01 12:03:12 xxx dockerd[38879]: time="2018-02-01T12:03:12-05:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0/shim.sock" debug=false module="containerd/tasks" pid=94570
Feb 01 12:03:53 xxx dockerd[38879]: time="2018-02-01T12:03:53.823039243-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:03:53 xxx dockerd[38879]: time="2018-02-01T12:03:53.879628386-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:04:04 xxx dockerd[38879]: time="2018-02-01T12:04:04.143264783-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:04:04 xxx dockerd[38879]: time="2018-02-01T12:04:04.285288560-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:04:14 xxx dockerd[38879]: time="2018-02-01T12:04:14.685324737-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:04:24 xxx dockerd[38879]: time="2018-02-01T12:04:24.310513628-05:00" level=warning msg="Health check for container 8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 error: context cancelled"
Feb 01 12:04:34 xxx dockerd[38879]: time="2018-02-01T12:04:34.520599061-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:04:44 xxx dockerd[38879]: time="2018-02-01T12:04:44.332650759-05:00" level=warning msg="Health check for container 8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 error: context cancelled"
Feb 01 12:04:54 xxx dockerd[38879]: time="2018-02-01T12:04:54.666961688-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:05:04 xxx dockerd[38879]: time="2018-02-01T12:05:04.348300333-05:00" level=warning msg="Health check for container 8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 error: context cancelled"
Feb 01 12:05:14 xxx dockerd[38879]: time="2018-02-01T12:05:14.633386506-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:05:20 xxx dockerd[38879]: time="2018-02-01T12:05:20.813814668-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:05:20 xxx dockerd[38879]: time="2018-02-01T12:05:20.813896784-05:00" level=warning msg="Ignoring Exit Event, no such exec command found" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 exec-id=9669e0bea74dbd90d56f61c3b8070fc043f6254d5b5b089d1562fdbfb154aa2b exec-pid=106625
Feb 01 12:05:20 xxx dockerd[38879]: time="2018-02-01T12:05:20.845301695-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:05:20 xxx dockerd[38879]: time="2018-02-01T12:05:20.845372559-05:00" level=warning msg="Ignoring Exit Event, no such exec command found" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 exec-id=b637507387bcce7dcf3f270d48eaf75d4d5deccd00506d0ed656c78833ad8cda exec-pid=104564
Feb 01 12:05:20 xxx dockerd[38879]: time="2018-02-01T12:05:20.877888876-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:05:20 xxx dockerd[38879]: time="2018-02-01T12:05:20.952730879-05:00" level=warning msg="unknown container" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 module=libcontainerd namespace=plugins.moby
Feb 01 12:05:20 xxx dockerd[38879]: time="2018-02-01T12:05:20.952805518-05:00" level=warning msg="Ignoring Exit Event, no such exec command found" container=8c5381ca6248dcbe199c56f87842c2cfd089e3ee5547895d7f037ed79e20dfc0 exec-id=2e5df9c74b29ca12fa9df0e14f095cf8524629564245107c4d8f0fb9d82adf9e exec-pid=103807
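
To compare the PIDs in a snippet like the one above without reading every line, the shim PID and the exec PIDs can be extracted with sed. This is a sketch only: list_pids is a made-up name, and it matches the exact message texts shown in this log excerpt, which may differ in other docker versions.

```sh
# Sketch: list the shim pid and the exec pids mentioned in dockerd journal lines.
# Reads journal text on stdin; prints "shim <pid>" or "exec <pid>" per matching line.
list_pids() {
    sed -n \
        -e 's/.*msg="shim docker-containerd-shim started".* pid=\([0-9][0-9]*\).*/shim \1/p' \
        -e 's/.*msg="Ignoring Exit Event, no such exec command found".* exec-pid=\([0-9][0-9]*\).*/exec \1/p'
}

# Usage:
# journalctl | grep 8c538 | list_pids
```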

Sorry for continuing to grow this thread, but I've been experimenting. Whatever the issue is, it looks like it was fixed in the docker daemon (dockerd) sometime between 18.01.0-ce and 18.02.0-ce-rc1. After installing the latest version from the test channel and not being able to reproduce the problem, I started to experiment. Going back to the stable channel, 17.12.0, I only replaced dockerd from the binaries distributed in the edge and test channels, 18.01, 18.02-rc1 and 18.02-rc2, running multiple tests with each version.

  • dockerd 17.12.0-ce FAILED
  • dockerd 18.01.0-ce FAILED
  • dockerd 18.02.0-ce-rc1 SUCCESS

Just to be clear, when I say FAILED, I am talking about the issue where containers, created by a docker stack deploy, remain in Up status after executing a docker stack rm. There is still an issue, even in 18.02.0-ce-rc2 where docker stack rm will stop the containers, but not remove them (containers remain in Exited status) - not ideal, but I can live with this by running prune after the docker stack rm command completes on each stack (note that I wait 30 seconds before issuing the prune to give swarm some time to finish cleaning up before I take over).
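
The workaround described above (remove the stack, wait, then prune) could be scripted roughly as follows. This is a sketch, not a recommendation: remove_stack_and_prune, DOCKER and SETTLE_SECONDS are names invented for the example, it only acts on the node it runs on, and docker system prune -f removes all stopped containers and other unused objects on that node, not just those from the stack.

```sh
# Sketch: remove a stack, give swarm time to clean up, then prune leftovers.
# DOCKER and SETTLE_SECONDS are hypothetical knobs for this example.
DOCKER=${DOCKER:-docker}
SETTLE_SECONDS=${SETTLE_SECONDS:-30}

remove_stack_and_prune() {
    stack=$1
    # Stop here if the stack removal itself fails.
    $DOCKER stack rm "$stack" || return 1
    # Wait before pruning so swarm can finish its own cleanup first.
    sleep "$SETTLE_SECONDS"
    # Removes exited containers left behind on this node (and other unused objects).
    $DOCKER system prune -f
}

# Usage:
# remove_stack_and_prune mystack
```

This only helps with containers left in Exited status; containers that are still Up after docker stack rm are not touched by prune.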

So the good news, it looks like it's fixed in the next release, but I don't really want to roll out a RC to my teams. I'll scan through the commits between 18.01 and 18.02-rc1 to see if I can find the fix. @cpuguy83 and @thaJeztah, do any commits come to mind? Is there anything you want me to try or logs to provide? If you can help me find a possible commit, I can test it in my fork.

One word of caution, running through these tests takes time and the issue is not consistently reproducible, so even though I have not seen it occur with 18.02 (RC 1 or 2), I can't be 100% positive that it won't show up at some point - I'll keep testing with RC1 and update this thread if I see any occurrence of this problem, but with 17.12 - 18.01, I have seen this problem occur about 50% of the time, so I'm fairly confident it's fixed in 18.02.

I agree with you @foleymic! I have upgraded my test environments to the cutting edge version of Docker (the release candidate) and I also see very good results with the bug about stopping/killing containers. I think the updated version of runc has solved the issue 👍

Thanks for the report. I think this is totally fixed with a combo of containerd 1.0.1 and the patched runc.

@sorenhansendk - glad you're seeing positive results too. @cpuguy83 also thought it was the runc fix done related to spectre/meltdown, but that was definitely not the fix for me. I'm currently running with everything, including runc on 17.12.0 and only have dockerd on 18.02.0-rc1.
Conversely, if I upgraded only runc I would see the problem (I tried manually building runc with just the as well as taking the version from the testing and edge distros).

@cpuguy83 - I'm only running with patched dockerd. Containerd and runc are still on 17.12.0

docker info:

Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729 (expected: 9b55aab90508bd389d7654c4baf173a981477d55)
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f (expected: 9f9c96235cc97674e935002fc3d78361b696a69e)

Yes, 18.02 comes with the newer containerd.
We are working on a new patch release for 17.12.

So dockerd includes containerd? I thought it was a separate binary - docker-containerd. Anyway, glad to hear you're working on a patch for 17.12 - that is great news. Thanks again for your help on this.


Oh, so you upgraded only the dockerd binary literally. Ah, ok. I'll have to double check which patches could be involved here and make sure they make it to the 17.12 release.

:) - I'm currently running franken-docker! Everything is 17.12.0, except dockerd, which is 18.02-rc1

FYI, I think the fix lies in PR #35748

I'm still able to reproduce the issue with Version 18.02.0-ce-rc2-mac51 (22446)

linuxkit-025000000001:/# /usr/bin/containerd --version
containerd v1.0.1 9b55aab90508bd389d7654c4baf173a981477d55
docker info

containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e

@ximarx Thanks for the report! Can you get stack dumps from docker and containerd?
With docker you can do

curl --unix-socket /var/run/docker.sock http:/./debug/pprof/goroutine?debug=2

For containerd on mac it's a little trickier...

 docker run -it --rm -v /:/host alpine /host/usr/local/bin/docker-containerd-ctr pprof --debug-socket /host/run/docker/containerd/docker-containerd-debug.sock goroutines

@ximarx - when you say the problem is still there, are you talking about the one where containers remain, but in exited state, or do you still see them in running state? I only noticed that with 18.02, I still see containers remain, but none are in running state so I can get by with running prune after docker stack rm. Not ideal, but at least I can get to a clean state after bringing everything down - prior to taking 18.02, I had to manually kill processes to remove those containers which were still running, but no longer associated to the stack.

BTW, I merged PR #35748 onto my 17.12 in my fork and built - I have not seen the problem since applying that build.

Can you get stack dumps from docker and containerd?

@cpuguy83: Unfortunately I've restarted docker. I'll get a stack dump as soon as I reproduce the issue once again.

when you say the problem is still there, are you talking about the one where containers remain, but in exited state, or do you still see them in running state?

@foleymic Container remains in running state, but docker inspect command hangs forever (as well as docker-compose ps and docker stop)

@ximarx - thanks for confirming that, I was seeing the exact same behavior in multiple swarm clusters, but since upgrading them to the patched build we have not seen the issue. Only difference I see is that we're running on RHEL versus you on Mac.

is this issue fixed already?
If so in which version will it be released?

I believe a fix landed with 18.02, but I'm waiting for their confirmation

The "stable" channel 17.12.0 version still has this bug; if it's fixed, could that PR be back-ported to a patch release 17.12.1? The stable channel is pretty unstable, if people are having to revert all the way to 17.09 or resort to an edge release.

+1 for a patch release 17.12.1

It's being worked on. Thanks!

Sorry to warm up this thread - it looks like the fix is coming ;) - but I have a quick question:
We're seeing the exact same issue on docker-ce-17.12 since we added HEALTHCHECKs to our Dockerfiles. The containers without HEALTHCHECK specified in their Dockerfiles stop just fine.
Could this be related to the HEALTHCHECKs or is this just a coincidence?


@PhilPhonic yes, can be triggered by healthchecks

Does (added to yesterday's release) fix this issue?

I come from issue #34213, not entirely sure the problem is the same, but I was directed here.

For us the docker container still gets stuck after these fixes and never finishes executing, but on this occasion I was able to stop it without rebooting the docker daemon.

docker version

 Version:   17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built: Tue Feb 27 22:17:56 2018
 OS/Arch:   linux/amd64

  Version:  17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:    Tue Feb 27 22:16:28 2018
  OS/Arch:  linux/amd64
  Experimental: false

docker info

Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: tmpfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
Kernel Version: 4.4.0-64-generic
Operating System: Ubuntu 14.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 6
Total Memory: 19.8GiB
Name: vm64-3
Docker Root Dir: /tmp/ramdisk/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 27
 Goroutines: 47
 System Time: 2018-03-01T07:57:09.565797691Z
 EventsListeners: 0
Experimental: false
Insecure Registries:
Live Restore Enabled: false

docker inspect prerelease (the container that is stuck)

        "Id": "34d42f7a8246de8c6eb4b3d9f8fe3a62c1b0ac8ce7a800f918538b33c35d282a",
        "Created": "2018-02-28T19:22:54.823764324Z",
        "Path": "bash",
        "Args": [
            "./release_scripts/ /release_scripts/.rosinstall unused-version-argument git /release_scripts/.rosinstall erbium xenial kinetic index.yaml coverage true "
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 6495,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2018-02-28T19:22:55.675976607Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        "Image": "sha256:fa0591be5fc0613715fb2605264ff61967811262bdbc93d262faca23df1bf2fe",
        "ResolvConfPath": "/tmp/ramdisk/docker/containers/34d42f7a8246de8c6eb4b3d9f8fe3a62c1b0ac8ce7a800f918538b33c35d282a/resolv.conf",
        "HostnamePath": "/tmp/ramdisk/docker/containers/34d42f7a8246de8c6eb4b3d9f8fe3a62c1b0ac8ce7a800f918538b33c35d282a/hostname",
        "HostsPath": "/tmp/ramdisk/docker/containers/34d42f7a8246de8c6eb4b3d9f8fe3a62c1b0ac8ce7a800f918538b33c35d282a/hosts",
        "LogPath": "/tmp/ramdisk/docker/containers/34d42f7a8246de8c6eb4b3d9f8fe3a62c1b0ac8ce7a800f918538b33c35d282a/34d42f7a8246de8c6eb4b3d9f8fe3a62c1b0ac8ce7a800f918538b33c35d282a-json.log",
        "Name": "/prerelease",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "unconfined",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            "NetworkMode": "host",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "shareable",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": true,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": [],
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [
                    "PathOnHost": "/dev/snd",
                    "PathInContainer": "/dev/snd",
                    "CgroupPermissions": "rwm"
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": [
                    "Name": "nofile",
                    "Hard": 10240,
                    "Soft": 10240
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        "GraphDriver": {
            "Data": {
                "LowerDir": "/tmp/ramdisk/docker/overlay2/03efcd70b4e0faac571708859c2cfb0d43f7da1331702508980c6c49945ba1e4-init/diff:/tmp/ramdisk/docker/overlay2/b4f54a4bcf9693f50de7c5789597a12479edfd42d9b8fa2f449f9c6a3f09e580/diff:/tmp/ramdisk/docker/overlay2/ec49f1ad95aaa5088113b71e7894db1ce0229123457e8e63f76cde8d5501cc73/diff:/tmp/ramdisk/docker/overlay2/3bb76c8cef3b6466f1f57167c326a8b065e5f0133424f16b1c0be2510ec60574/diff:/tmp/ramdisk/docker/overlay2/77e87eeac5f5325031e0ababfe19e094c1a443593e4f849f5d79f3f15ef4a573/diff:/tmp/ramdisk/docker/overlay2/1ad76d631a387704468b365fddcc772596361c64e3085cd9efc50f514211024c/diff:/tmp/ramdisk/docker/overlay2/14ee3b713fb1cb48a8baa1c18c39bb85ebbd4acf23df18f146d1b2ddf2106772/diff:/tmp/ramdisk/docker/overlay2/0636682f487ae65c4123e0d281991fde9eeb207745556ead1aa605588df66053/diff:/tmp/ramdisk/docker/overlay2/635fe79b7fa395eb1171b4cb8a5b053643c152699e2e36711d219de92f32d44f/diff:/tmp/ramdisk/docker/overlay2/f5f559bda6b48218a145b81d505dc6bb7dd1dbd144c00eb8200884c20c541a16/diff:/tmp/ramdisk/docker/overlay2/fd6c90c91221f4395ee5d1fa6e6ce165d98cdb5e38edd00d4ef5aaf7bda9bdec/diff:/tmp/ramdisk/docker/overlay2/4198097e8c9bcdcc596434d269f39a2014049b9f62bbd9e259c6d0653371db6e/diff:/tmp/ramdisk/docker/overlay2/d9665613172c8ceb3cd6bfa5709bb05a38f5886322e9ba6f3b59604755670125/diff:/tmp/ramdisk/docker/overlay2/c3eec0a6317354680ed46e1883089fb345a91aada6d1bebc1585869e7497cfa2/diff:/tmp/ramdisk/docker/overlay2/f82163d218a43b50885139dfad1c5dedcd4d381fd76af4a38e0480d48e2ab32b/diff:/tmp/ramdisk/docker/overlay2/c72be8cddf53887e1c8c55207153d2b59e26e3327b60764d1dbe632d6db7b4ff/diff",
                "MergedDir": "/tmp/ramdisk/docker/overlay2/03efcd70b4e0faac571708859c2cfb0d43f7da1331702508980c6c49945ba1e4/merged",
                "UpperDir": "/tmp/ramdisk/docker/overlay2/03efcd70b4e0faac571708859c2cfb0d43f7da1331702508980c6c49945ba1e4/diff",
                "WorkDir": "/tmp/ramdisk/docker/overlay2/03efcd70b4e0faac571708859c2cfb0d43f7da1331702508980c6c49945ba1e4/work"
            "Name": "overlay2"
        "Mounts": [
                "Type": "bind",
                "Source": "/home/hudson/.hudson/workspace/Mega-Integration/release_scripts",
                "Destination": "/release_scripts",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
                "Type": "bind",
                "Source": "/tmp/prerelease_out",
                "Destination": "/tmp/prerelease_out",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
                "Type": "bind",
                "Source": "/home/hudson/pbuilder_ccache",
                "Destination": "/tmp/pbuilder_ccache",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
                "Type": "bind",
                "Source": "/tmp/.X11-unix",
                "Destination": "/tmp/.X11-unix",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
                "Type": "bind",
                "Source": "/home/hudson/exchange",
                "Destination": "/home/user/exchange",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
        "Config": {
            "Hostname": "vm64-3",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": true,
            "AttachStderr": true,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
            "Cmd": [
                "./release_scripts/ /release_scripts/.rosinstall unused-version-argument git /release_scripts/.rosinstall erbium xenial kinetic index.yaml coverage true "
            "Image": "gitlab:4567/dockers/pal_docker_images/core-erbium-internal-staging",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {
                "com.nvidia.volumes.needed": "nvidia_driver",
                "maintainer": "[email protected]"
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "57d9e5bf0c9e05f5351787befe065ef390dd39f39d2aa7681b28abfe4381a8d3",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/default",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "host": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "39e3e5618a9dcb24f03e831dbe1929bacdca472fefa3542122bc217429d6914d",
                    "EndpointID": "eabbf4a17b6f92b564ff0303cbc8d3ec67744f9d0b90fe21d49f5b7aebca1d5c",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "",
                    "DriverOpts": null

Also experienced these symptoms on 17.12.0-ce on Ubuntu 16.04.3 LTS with a 3-node Rancher (v1.6.14) cluster. Curiously, only one of the nodes exhibits the issue, intermittently (every 1-7 days), although they're all running the same docker/ubuntu/rancher versions. That said, the node it happens on is running the most containers and thus has the highest load.

Planning on upgrading this cluster to 17.12.1-ce this weekend to see if this helps solve the issue (fingers crossed).

Anyone else had any luck on 17.12.1-ce yet?

@mauriceteunissen 17.12.1-ce solved the problem for me!

Sadly 17.12.1-ce did not solve the problem for me.
I'm still unable to stop some containers with HEALTHCHECK in their Dockerfile.


It took longer than expected to reproduce it. The zip contains the two dumps you requested.

$ docker info
Containers: 21
 Running: 21
 Paused: 0
 Stopped: 0
Images: 42
Server Version: 18.03.0-ce-rc1
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c
runc version: 6c55f98695e902427906eed2c799e566e3d3dfb5
init version: 949e6fa
Security Options:
  Profile: default
Kernel Version: 4.9.75-linuxkit-aufs
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.363GiB
Name: linuxkit-025000000001
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 185
 Goroutines: 197
 System Time: 2018-03-07T15:05:02.215912835Z
 EventsListeners: 2
HTTP Proxy: docker.for.mac.http.internal:3128
HTTPS Proxy: docker.for.mac.http.internal:3129
Experimental: true
Insecure Registries:
Live Restore Enabled: false

This was happening to me frequently with 17.12.0-ce. I switched to 18.03.0-ce-rc1-mac54 and it happens far less often, although it does still happen occasionally. I do have HEALTHCHECKs on all the containers that it happens to.

Hi, sorry if this is a solved problem that's just waiting for the fix to make its way into the stable build, but I'm not sure whether I have a slightly different variation. I'm running Docker For Windows in Swarm Mode, to match my production environment, and have no problem with my deployed stacks but containers that I run separately sometimes get stuck in unkillable states. This has been going on for weeks at least (possibly months, I'm not sure).

I typically have a single stack deployed that's the app I'm working on. In addition to this, I run development tasks through docker run --rm commands with my project bind-mounted into the container. For example, a Webpack build command would be `docker run --rm -t --env "NODE_ENV=development" -v $(pwd):/srv --workdir /srv node:alpine node_modules/.bin/webpack --config webpack.config.js --progress`.

Sometimes a Webpack build freezes (at around 10-12%) and I have to ctrl + c out of it. I've tried leaving it for 30 minutes or so, and it never unfreezes (my typical build time is < 25 seconds). In these situations, when I press ctrl + c, sometimes the container dies nicely like it should, but other times it hangs around like nothing happened. When the container hangs around, docker ps shows it still running, but docker kill and docker rm just hang until I cancel them (again, I've tried leaving it for a good 20 minutes or so without the kill and rm commands completing).

I suspected a memory issue, so I tried increasing the memory allocated to Docker as much as I could without making Windows unstable, but that made no difference to the frequency of this issue. I then tried watching docker stats while my builds were running, and usage seems to peak at about 150 MB (out of my available 2.5 GB). Even with all the other containers I have running for my dev environment, the combined total doesn't exceed 20%, so it seems OOM is not the issue.

Another point is that whenever a container gets into this state, if I try re-running the failed command in a new container (e.g. re-run the same docker run command that I posted for a Webpack build) the new container just hangs with no log output and becomes unkillable too. This is now making me suspect it's file system related.

A simple restart of Docker for Windows gets everything back into a working state. Usually, repeating my Webpack build command after a restart works fine.

I suspect I'm stuck with this issue too.

While debugging with lsof/strace, I found containerd stuck writing something to a stdout/stderr pipe.

% sudo ls -l /var/run/docker/containerd/2456b59776d918e0e07ae9259a54229a8f0985ae7b6ad2be7d25fccf8fdd5b49
total 0
prwx------ 1 root root 0 Mar  8 21:46 0048bf0b619804896d3b70ccab859221bf8d318415b26cedfc096b80f4f9ed08-stderr
prwx------ 1 root root 0 Mar  8 21:46 0048bf0b619804896d3b70ccab859221bf8d318415b26cedfc096b80f4f9ed08-stdout
prwx------ 1 root root 0 Mar  8 20:10 init-stderr
prwx------ 1 root root 0 Mar  9 11:17 init-stdout

I found that 2456b59776d918e0e07ae9259a54229a8f0985ae7b6ad2be7d25fccf8fdd5b49 is the container ID, but I don't know what 0048bf0b619804896d3b70ccab859221bf8d318415b26cedfc096b80f4f9ed08 refers to.
By the way, I found that containerd exits after I read the two unknown pipes, in stdout -> stderr order.

% sudo cat /var/run/docker/containerd/2456b59776d918e0e07ae9259a54229a8f0985ae7b6ad2be7d25fccf8fdd5b49/0048bf0b619804896d3b70ccab859221bf8d318415b26cedfc096b80f4f9ed08-stdout
% sudo cat /var/run/docker/containerd/2456b59776d918e0e07ae9259a54229a8f0985ae7b6ad2be7d25fccf8fdd5b49/0048bf0b619804896d3b70ccab859221bf8d318415b26cedfc096b80f4f9ed08-stderr

I suspect dockerd contains some race condition, but I couldn't find it.
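To check for the same symptom on another host, a minimal sketch along these lines could list the FIFOs in a container's containerd state directory and drain one with a time limit. The function names list_fifos and drain_fifo are made up for illustration, the 5-second default is arbitrary, and find -maxdepth and timeout assume GNU findutils/coreutils. Draining a pipe discards whatever was buffered in it, so treat this as a diagnostic aid rather than a fix.

```shell
# Hypothetical helpers for illustration; not part of Docker or containerd.

# List the named pipes (FIFOs) directly inside directory $1, one per line.
list_fifos() {
    find "$1" -maxdepth 1 -type p | sort
}

# Read and discard data from FIFO $1 for at most $2 seconds (default 5),
# so that a pipe which never reaches EOF cannot block the terminal forever.
drain_fifo() {
    timeout "${2:-5}" cat "$1" > /dev/null
}
```

Both functions need to run in a root shell to see the directory shown above, for example list_fifos /var/run/docker/containerd/<container id>.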

Docker was running smoothly on my host. But since I upgraded from Docker 17.12.0 to 17.12.1, I sometimes encounter the problem that I cannot stop or restart a container, especially if it has a healthcheck set up. So when I found this issue I was puzzled, since 17.12.1 should have solved this problem, not triggered it.

The weird thing is that rebooting the host helps a lot with "cleaning up" things in dockerd, so that I can again stop or restart certain containers. But after several days/weeks of running, it starts failing again. For the containers that are failing, both docker-compose and the docker CLI fail (so docker-compose down and docker stop <name> fail equally, and the same goes for restart).

$ docker info
Containers: 13
 Running: 7
 Paused: 0
 Stopped: 6
Images: 93
Server Version: 17.12.1-ce
Storage Driver: btrfs
 Build Version: Btrfs v4.4
 Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
  Profile: default
Kernel Version: 4.13.0-36-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.544GiB
Name: malmur
Docker Root Dir: /var/lib/docker/235536.235536
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Insecure Registries:
Live Restore Enabled: false

I have a similar problem with version 18.02. I think I can reproduce it quite easily by running a test in Jenkins. It occurs every time I run my test.

Here is my information.

docker stop 5a28870c74e1

Error response from daemon: cannot stop container: 5a28870c74e1: Cannot kill container 5a28870c74e166152b4fa57a95691969eb0fb85185504dee43e495168e179bdd: connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused": unknown

Output of docker info:

Containers: 6
 Running: 3
 Paused: 0
 Stopped: 3
Images: 8
Server Version: 18.02.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: N/A (expected: 9b55aab90508bd389d7654c4baf173a981477d55)
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
  Profile: default
Kernel Version: 3.10.0-693.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 992.3MiB
Name: docker
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Insecure Registries:
Live Restore Enabled: false

docker version

 Version:   18.02.0-ce
 API version:   1.36
 Go version:    go1.9.3
 Git commit:    fc4de44
 Built: Wed Feb  7 21:14:12 2018
 OS/Arch:   linux/amd64
 Experimental:  false
 Orchestrator:  swarm

  Version:  18.02.0-ce
  API version:  1.36 (minimum version 1.12)
  Go version:   go1.9.3
  Git commit:   fc4de44
  Built:    Wed Feb  7 21:17:42 2018
  OS/Arch:  linux/amd64
  Experimental: false

docker inspect

        "Id": "5a28870c74e166152b4fa57a95691969eb0fb85185504dee43e495168e179bdd",
        "Created": "2018-03-13T09:38:27.546391905Z",
        "Path": "cat",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 22458,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2018-03-13T09:38:27.977365052Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        "Image": "sha256:0d90898532210246893d448740ee6ad83ad351f71dc188d4a5705d193cb3a580",
        "ResolvConfPath": "/var/lib/docker/containers/5a28870c74e166152b4fa57a95691969eb0fb85185504dee43e495168e179bdd/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/5a28870c74e166152b4fa57a95691969eb0fb85185504dee43e495168e179bdd/hostname",
        "HostsPath": "/var/lib/docker/containers/5a28870c74e166152b4fa57a95691969eb0fb85185504dee43e495168e179bdd/hosts",
        "LogPath": "/var/lib/docker/containers/5a28870c74e166152b4fa57a95691969eb0fb85185504dee43e495168e179bdd/5a28870c74e166152b4fa57a95691969eb0fb85185504dee43e495168e179bdd-json.log",
        "Name": "/reverent_lovelace",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": [
        "HostConfig": {
            "Binds": null,
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            "NetworkMode": "default",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": [
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "shareable",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": [],
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/78b8c2d7843b9374c55d939ccb028e27bedb1bbba34032c38609eb57f32e74a1-init/diff:/var/lib/docker/overlay2/7816f6aae89ff8e103e90e971e70cdc3a5bda0722d89fa361fce259e169c8aa8/diff:/var/lib/docker/overlay2/72f3b10b691778a8838270b9548a52dfabb11f28b2f520f1a24981dd416d3a25/diff:/var/lib/docker/overlay2/40cbb59d56e7583545b375636d8b99666071d3511c7acc1edc87f8e175ff5df0/diff:/var/lib/docker/overlay2/e0f0d655a2a8e3d518e966d0bd90d8e1de6ea06cd99f27ca7a5716c5be2faa19/diff:/var/lib/docker/overlay2/367dc44c14f8ad44319a439436966e9e1b00779b51c819307eb3ab47422f0b36/diff:/var/lib/docker/overlay2/e04892a8d5d028c713f10f0cdbb5d0be6854b4c90b3fa9c2bee3f0fc7217bad9/diff",
                "MergedDir": "/var/lib/docker/overlay2/78b8c2d7843b9374c55d939ccb028e27bedb1bbba34032c38609eb57f32e74a1/merged",
                "UpperDir": "/var/lib/docker/overlay2/78b8c2d7843b9374c55d939ccb028e27bedb1bbba34032c38609eb57f32e74a1/diff",
                "WorkDir": "/var/lib/docker/overlay2/78b8c2d7843b9374c55d939ccb028e27bedb1bbba34032c38609eb57f32e74a1/work"
            "Name": "overlay2"
        "Mounts": [
                "Type": "bind",
                "Source": "/home/docker/jenkins/jenkins_home",
                "Destination": "/var/jenkins_home",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
                "Type": "bind",
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
        "Config": {
            "Hostname": "5a28870c74e1",
            "Domainname": "",
            "User": "1000:1000",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": true,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "JOB_NAME=pipeline test/master",
                "BUILD_TAG=jenkins-pipeline test-master-12",
            "Cmd": [
            "Image": "node:6.3",
            "Volumes": null,
            "WorkingDir": "/var/jenkins_home/workspace/pipeline_test_master-U6GGLXKIF4VESPPHZ475TWN76LB4NUH5VZJ6SPHWOEU6N7DL5ICA",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {}
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "16d1c596372cdd66415d9e738f493207f7fc8ee4444e936b08b40d273b05de1b",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/16d1c596372c",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "baec7581c0a8f19708e82d87ff98016b89309ef9b2c7717e92c28742f218fa4c",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:04",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "669f5d230c893729c633095f59ad01885c689028f4379aba3621f239f734e46b",
                    "EndpointID": "baec7581c0a8f19708e82d87ff98016b89309ef9b2c7717e92c28742f218fa4c",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:04",
                    "DriverOpts": null

        "Id": "5049a01b4358233471daa74fb11996a6b8e3acfd9c00b33ea7a52a3c24264eaa",
        "Created": "2018-03-12T09:37:38.9422449Z",
        "Path": "/sbin/tini",
        "Args": [
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 22088,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2018-03-13T09:33:50.029083326Z",
            "FinishedAt": "2018-03-13T09:27:43.347300762Z"
        "Image": "sha256:6844ee63019e341fa7c06a90ce1455200bad2f919738d12e44eaff03198c91d0",
        "ResolvConfPath": "/var/lib/docker/containers/5049a01b4358233471daa74fb11996a6b8e3acfd9c00b33ea7a52a3c24264eaa/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/5049a01b4358233471daa74fb11996a6b8e3acfd9c00b33ea7a52a3c24264eaa/hostname",
        "HostsPath": "/var/lib/docker/containers/5049a01b4358233471daa74fb11996a6b8e3acfd9c00b33ea7a52a3c24264eaa/hosts",
        "LogPath": "/var/lib/docker/containers/5049a01b4358233471daa74fb11996a6b8e3acfd9c00b33ea7a52a3c24264eaa/5049a01b4358233471daa74fb11996a6b8e3acfd9c00b33ea7a52a3c24264eaa-json.log",
        "Name": "/wonderful_carson",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            "NetworkMode": "default",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "shareable",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": [],
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/b3474fead85ba8317d2f19aecb9d13128c3660d9fa4b4a0d785480477a8e627f-init/diff:/var/lib/docker/overlay2/c1db031b078fccca2c4ecb9c08f02a12a166a19fbbb07d709fc7f909e543224b/diff:/var/lib/docker/overlay2/e87a5e9585cf46649068fd2804e9b0077586798e8817b635e9cdb27d604c9de3/diff:/var/lib/docker/overlay2/3aaf41fa890140b773a497f75e1e230249c9a50ef9e637807046bd2645d47383/diff:/var/lib/docker/overlay2/b3042667faad116fa7c1fda0ade770be4b7f3d980ec0c1fdaa3b7141590bce1a/diff:/var/lib/docker/overlay2/061f3bf09589109b981a0b1e4d1e033eda2e69ef0b93ee5d7b7219288392c75e/diff:/var/lib/docker/overlay2/ff2306ca7c4b29c85b102e4274968da89c04335ecbb436c348a2129aadaf5554/diff:/var/lib/docker/overlay2/f82ff9ece6eac85cacb3b39129ea89afe9f59b4729c85a06af7277f357a1800e/diff:/var/lib/docker/overlay2/486e2412d67fc87781f0bd6890402915bf1270ab6b9553c603427e671a2bb01d/diff:/var/lib/docker/overlay2/5a80be543bbabb904d8bc75cdee2788f58988b86bbe6e0b7cafa4f23a1afd08f/diff:/var/lib/docker/overlay2/a80d6b1c1895a11ac66cdb30aabf17cfe912f05b87260d435a53b2199b90bae5/diff:/var/lib/docker/overlay2/ee85443de96d6090613ca21303bb168e3d11f717424fbb2c198e8d9ad315ca7a/diff:/var/lib/docker/overlay2/b4bb1975a154aa01a6d293e6346bece8fca12ecf2390153a1c9b236aec5df008/diff:/var/lib/docker/overlay2/b57c6c33340dae437b79a51f24524821796e8dee73e1cb5499834e8cc08f3e81/diff:/var/lib/docker/overlay2/0e0e4140d22655500ea9580a3f5a3bf69b4a6e3f1880bf0896406aeea17cdb8a/diff:/var/lib/docker/overlay2/d49d5d848055333df33351aeeb4c7a9ed967515ef433ee875d5dbfc6edb14c17/diff:/var/lib/docker/overlay2/a87e0ee4bf32e82943b4e38375131867e0fa6fb5a524ad2a4835223588785a9d/diff:/var/lib/docker/overlay2/a6f8827918230ce42c9988c190c229030a62426de6ebc6de1e905573ef60ae9e/diff:/var/lib/docker/overlay2/5c6b6e85d0f568307e7514e51b736c50a5dbf12f3d8ba971067f4acfa02609bb/diff:/var/lib/docker/overlay2/a79d301bc255744c54ccdaa9c440494dfadec50106b0e3d6d0f7461c7a490742/diff:/var/lib/docker/overlay2/f6f53d9320fee3231a646cbc71bdffc571162368ee123ff3270ba39b2d70691a/diff:/var/lib/docker/overlay2/5930cbcb8f8133c83755f1e2c509a8cd51e7b5ba9a76dc0c581973129d3aa35d/diff",
                "MergedDir": "/var/lib/docker/overlay2/b3474fead85ba8317d2f19aecb9d13128c3660d9fa4b4a0d785480477a8e627f/merged",
                "UpperDir": "/var/lib/docker/overlay2/b3474fead85ba8317d2f19aecb9d13128c3660d9fa4b4a0d785480477a8e627f/diff",
                "WorkDir": "/var/lib/docker/overlay2/b3474fead85ba8317d2f19aecb9d13128c3660d9fa4b4a0d785480477a8e627f/work"
            "Name": "overlay2"
        "Mounts": [
                "Type": "bind",
                "Source": "/home/docker/jenkins/jenkins_home",
                "Destination": "/var/jenkins_home",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
                "Type": "bind",
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
        "Config": {
            "Hostname": "5049a01b4358",
            "Domainname": "",
            "User": "jenkins",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "50000/tcp": {},
                "8080/tcp": {}
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "[email protected]",
            "Cmd": null,
            "ArgsEscaped": true,
            "Image": "jenkins/jenkins:latest",
            "Volumes": {
                "/var/jenkins_home": {}
            "WorkingDir": "",
            "Entrypoint": [
            "OnBuild": null,
            "Labels": {}
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "1dcd94febb0a76606f80777174e45b3fd7bfc4d90676b9b2a3bb2c3226aa2904",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "50000/tcp": null,
                "8080/tcp": null
            "SandboxKey": "/var/run/docker/netns/1dcd94febb0a",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "1bd793b1aad29922934758049725101de03b9284bb48774d3fd6012d7cbbbb2b",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:03",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "669f5d230c893729c633095f59ad01885c689028f4379aba3621f239f734e46b",
                    "EndpointID": "1bd793b1aad29922934758049725101de03b9284bb48774d3fd6012d7cbbbb2b",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:03",
                    "DriverOpts": null

@miwa911 based on;

containerd version: N/A (expected: 9b55aab90508bd389d7654c4baf173a981477d55)

It looks like containerd may have quit / restarted. I see that machine does not have a lot of memory;

Total Memory: 992.3MiB

Could it be it ran out of memory during the test, and the kernel OOM-killed some processes? I'd recommend checking the daemon and system-logs to see if they contain more information.
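As a rough sketch of that check, the two filters below work on text piped into them, so they can be pointed at docker info output and at kernel logs. The function names are hypothetical, and the OOM patterns are an assumption based on typical Linux kernel messages ("Out of memory", "invoked oom-killer", "Killed process"); the exact wording varies between kernel versions.

```shell
# Hypothetical helpers for illustration; not part of Docker.

# Reads `docker info` output on stdin.
# Succeeds (exit 0) if the daemon reports the containerd version as N/A.
containerd_unavailable() {
    grep -q '^ *containerd version: N/A'
}

# Reads kernel log text on stdin and prints lines that look like
# OOM-killer activity (pattern list is an assumption, see above).
oom_kills() {
    grep -iE 'out of memory|oom-killer|killed process'
}
```

Possible usage: docker info | containerd_unavailable && echo "containerd is not reachable", and dmesg | oom_kills. On hosts using systemd, the daemon log is usually available through journalctl -u docker.service.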

@thaJeztah Thanks thaJeztah,
I am a newbie, so I don't know where to check. I've attached my stack trace (curl --unix-socket /var/run/docker.sock http://localhost/debug/pprof/goroutine?debug=2).


I think I have a related problem

I upgraded our dev environment to the latest 17.12.1-ce, build 7390fc6, last week, and this is the first time I've seen this error.

A developer tried to update an application, and swarm is unable to delete an old container of the previous version on a specific node in the cluster. I found out because developers started complaining about an intermittent white-page syndrome.

When I do a docker service ps on the service, here's what I see :
The old container is running but in shutdown state.

On the node, I see the container as if it were running and healthy:

And from the "docker service ls" , I have more containers than expected

I tried running docker kill and docker inspect on the container from the node, but neither works.
I do not see any specific messages in dmesg.

That's all I can tell for now. I'll remove the stack and launch it again so developers are able to continue their work.

Hope it helps


  • Stack rm did not fix the issue, the zombie container was still on the node
  • Setting the node availability to drain did NOT fix the issue, node was left with only the zombie container on it
  • service docker restart did not respond
  • Finally rebooted the node, and all the containers were gone.

I saw some errors like this on the node during the process:

Mar 13 10:04:10 server-name dockerd: time="2018-03-13T10:04:10.406196465-04:00" level=error msg="Failed to load container f5d6bb74d6b37871b72b5f27d46f8705a6b66cba7afb50706bbf68b764facb24: open /var/lib/docker/containers/f5d6bb74d6b37871b72b5f27d46f8705a6b66cba7afb50706bbf68b764facb24/config.v2.json: no such file or directory"
Mar 13 10:04:10 server-name dockerd: time="2018-03-13T10:04:10.408039262-04:00" level=error msg="Failed to load container fd5ac869991b263a28c36bddf9b2847a8a26e2b7d59fa033f85e9616b0b7cb7a: open /var/lib/docker/containers/fd5ac869991b263a28c36bddf9b2847a8a26e2b7d59fa033f85e9616b0b7cb7a/config.v2.json: no such file or directory"
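To see which containers the daemon failed to load, the IDs can be pulled out of log lines like the two above. This is only a sketch: failed_container_ids is a hypothetical name, and it assumes the message format shown here (a 64-character hex ID directly after "Failed to load container").

```shell
# Hypothetical helper for illustration; not part of Docker.

# Reads dockerd log lines on stdin and prints each container ID that
# appears in a "Failed to load container <id>:" message, once per ID.
failed_container_ids() {
    sed -n 's/.*msg="Failed to load container \([0-9a-f]\{64\}\):.*/\1/p' | sort -u
}
```

On a systemd host this could be fed with journalctl -u docker.service | failed_container_ids, and each ID then compared against the directories under /var/lib/docker/containers.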

Found somebody else with the same issue :

/cc @stevvooe @dmcgowan in case you are aware of things to look for from the containerd side

@mauriceteunissen we have the issue with 17.12.1-ce

I wonder whether using Tini would help with this issue. From its README: "All Tini does is spawn a single child (Tini is meant to be run in a container), and wait for it to exit all the while reaping zombies and performing signal forwarding."

@loretoparisi Doubt it.

But you can always set docker run --init to see... --init uses tini.

@cpuguy83 I'm saying that because, in all my issues and findings (related to Java), the main problem could be traced to zombie processes in the JVM (such as multi-threaded Java applications with uncaught exceptions that are not trapped) and/or locked I/O resources (mostly the same Java processes keeping some socket/file open), or to the same issue in a C++ executable launched via a child_process fork. But this is just a hypothesis, since I was not able to reproduce it reliably enough to be sure of it...

We also have the same issue on 17.12.1-ce

Over time, containers enter a state where docker ps and docker inspect hang.
Forcing the swarm to redeploy the service makes the container enter a zombie state (Desired state: Shutdown, Current status: Running).
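A small filter can pick out tasks in that contradictory state from docker service ps output. The sketch below assumes the output was produced with --format '{{.ID}}|{{.DesiredState}}|{{.CurrentState}}' (the "|" separator and the zombie_tasks name are choices made for this example).

```shell
# Hypothetical helper for illustration; not part of Docker.

# Reads lines of "<task id>|<desired state>|<current state>" on stdin and
# prints the IDs of tasks that should be shut down but still report Running.
zombie_tasks() {
    awk -F'[|]' '$2 == "Shutdown" && $3 ~ /^Running/ { print $1 }'
}
```

For example: docker service ps --format '{{.ID}}|{{.DesiredState}}|{{.CurrentState}}' <service> | zombie_tasks. If the daemon itself is hung, this command may block like the others described in this thread.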

docker kill does not work. One way to kill the container is ps aux | grep [container_id] and then kill [process_id]

Is there any information needed that I can provide?

@mhaamann If docker commands are stuck, can you please grab the stack trace from dockerd?

You can do this by hitting the /debug/pprof/goroutine?debug=2 endpoint:

curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2

Sure @cpuguy83.
I will describe it so that I am sure that I am doing it correctly. Two terminals open.
In the first one I executed docker inspect [container_id]. I waited a few seconds to make sure that it was stuck. (Usually inspect comes back with a result within 1 second.)
Then in the second terminal I executed:

curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2

Trace can be found here:

@mhaamann Can you run the following command to get the containerd stack?

docker run -it --rm -v /run/docker/containerd:/run/docker/containerd docker:17.12.1 docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines

Alternatively if you are on the host you can just run:

docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines

(btw) The relevant stuck goroutine looks to be a call to containerd:

Here is the stack from the host generated using: docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines

@mhaamann Thanks! Digging deeper...

This looks like it is stuck getting the state of the container from the shim process.
Are you able to trigger a stack trace on the shim? kill -SIGUSR1 ${PID_OF_SHIM}
This should generate a stack trace and propagate up to the dockerd logs.
You should be able to figure out what the pid is as it is the parent process of the container process.
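As a rough illustration of the "parent process of the container process" hint, here is a minimal Python sketch that reads the parent PID from /proc on a Linux host. The function names `parse_ppid` and `parent_pid` are made up for this example, and it assumes the PID you pass in is the container's main process, so that its parent is the shim.

```python
from pathlib import Path


def parse_ppid(status_text: str) -> int:
    """Extract the parent PID from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        # The field looks like "PPid:\t4300".
        if line.startswith("PPid:"):
            return int(line.split()[1])
    raise ValueError("no PPid field found")


def parent_pid(pid: int) -> int:
    """Read /proc/<pid>/status (Linux only) and return the parent PID."""
    return parse_ppid(Path(f"/proc/{pid}/status").read_text())
```

With the container process PID in hand, `parent_pid(pid)` would give the candidate PID to pass to kill -SIGUSR1. It is worth double-checking with ps that the result really is a docker-containerd-shim process before signalling it.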


We used ps faux | grep docker and found the parent of the shim. Here is the relevant part:

root      2624  2.3  2.5 5621140 829884 ?      Ssl  Feb28 782:08 /usr/bin/dockerd
root      2634  0.3  0.1 1694516 47684 ?       Ssl  Feb28 105:32  \_ docker-containerd --config /var/run/docker/containerd/containerd.toml

Then we executed kill -SIGUSR1 2634. The logs have been attached here:

@mhaamann Thanks, but I think we need the dump from docker-containerd-shim rather than docker-containerd

I'm stuck with that problem again. This time it happened when trying to upgrade from 17.12.1 to 18.03.0. The upgrade process is stuck; most containers are still running (the applications are still up and running, but docker ps is stuck).

I've done a dump of the docker-containerd socket, here is the gist:

I do not know how to do a dump of docker-containerd-shim.

@jcberthon Thanks, this seems like the same issue as above on first look.
To get a stack dump from docker-containerd-shim do kill -s SIGUSR1 <docker-containerd-shim-pid>. This should generate a stack trace in the logs for dockerd.

Hi @cpuguy83 I had to reboot the host (before I saw your message), because restarting the docker.service did not work, and killing the processes did not help restarting the containers afterwards. So I went through a complete reboot cycle rather than fiddling until I get back to a clean state.

So I need to wait for next lock-up before I can report the stack dump for docker-containerd-shim. I'm now on 18.03.0 though...

Anyway thanks for getting back quickly to me :-)

Been on 17.12.1 for 21 days now and this issue has happened to me twice since then (including today). I rarely have time to troubleshoot as it's running on a mission critical box and an immediate reboot is often the fix. As @jcberthon pointed out, restarting the docker service doesn't work for me either in this scenario, complete reboot is the only "fix."

If I can get the exact sequence of commands that need to be run and logs that need to be dumped, I can capture those the next time that it happens, but I also took advantage of the need to reboot and upgraded to 18.03.0 as of today hoping that that might fix this issue for me.

@cpuguy83 kill -SIGUSR1 on the docker-containerd-shim does not generate a stack trace. Nothing happens.

It should be in the logs for dockerd?

Has anyone managed to run 17.12.1 properly in production so far?

In the last (almost) 5 days (that is when I upgraded to Docker CE 18.03.0), I did not encounter the issue.

That does not mean it is solved in 18.03.0; it is too early to tell. But at least it is occurring less often. Before, I had the problem at least every 2 or 3 days. 🤞

@jcberthon Really curious about the result, because I'm seeing people who have problems with 18.03.0 as well. @JnMik We decided to downgrade to 17.09.1 until this issue is resolved, since it was happening way too often on 17.12 and 18.02.

We are also sticking to 17.09.1 because newer versions are not working for us.

It is interesting because for my original issue updating to 18.02 was the solution. Well at least so far so good.

Just today I created a fresh new cluster with docker 17.12.1 and I encountered this problem.
After 8 hours, while doing a rolling update of a service, I noticed a container stuck in "running", and I cannot docker inspect it.

Here is the dump from containerd-shim, obtained as @cpuguy83 explained.

Edit: updated with the (hopefully) correct stack trace from containerd-shim, from a different container that was also stuck.

@mion00 That looks like a containerd dump rather than a containerd-shim :(

@cpuguy83 I updated my previous comment, with a new stack trace generated from a different container.

Sadly it happened again, but only after more than a week of working flawlessly, so it's a win compared to 17.12.1. Cc: @jordijansen

Anyway, I was trying to generate a dump for docker-containerd-shim. I listed all such processes and picked the one that corresponds to a stuck container (I cannot run docker logs, docker exec, etc. on this container). So I ran sudo kill -s SIGUSR1 <pid>; however, when I check the logs of dockerd (using sudo journalctl -u docker.service), the last log messages date from more than 12 hours ago. So I suspect dockerd is stuck and does not log anything anymore... So no dump... unless there is an unknown file somewhere...

I've updated the gist with the latest information. At the end of it, I've added the commands I ran to find out which docker-containerd-shim I should send the SIGUSR1 signal to. My stuck container is running a MongoDB database, so I looked for its PID, then, using the proc FS, I looked up the cgroups it belongs to. Using the cgroup IDs, I identified the docker-containerd-shim PID.
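The cgroup lookup described above can be sketched in a few lines of Python. This is only an illustration, not the commands from the gist: `container_id_from_cgroup` is a made-up helper, and it assumes the cgroup path embeds the full 64-character container ID (as in /docker/<id> with the cgroupfs driver, or docker-<id>.scope with systemd).

```python
import re
from typing import Optional

# Assumption: the container ID appears as 64 lowercase hex characters in the path.
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the first container ID found in the text of /proc/<pid>/cgroup."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controllers:path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(0)
    return None
```

Feeding it the contents of /proc/<mongod-pid>/cgroup would yield the container ID, which can then be compared against the ps output to pick out the matching docker-containerd-shim process.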

@jcberthon I also had the same issue, with dockerd not logging the stack trace. I resolved it by adding "debug": true to the JSON config in /etc/docker/daemon.json and doing a service docker reload to pick up the new config. This enables debug logs in dockerd, and then you can find the stack trace after a kill -s SIGUSR1.
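For anyone who wants to script that config change, here is a small hedged Python sketch. `enable_debug` is a hypothetical helper; it only transforms the JSON text and leaves reading and writing /etc/docker/daemon.json and the reload to you.

```python
import json


def enable_debug(config_text: str) -> str:
    """Return daemon.json content with "debug": true set, keeping other keys."""
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(config_text) if config_text.strip() else {}
    config["debug"] = True
    return json.dumps(config, indent=2) + "\n"
```

For example, passing in `{"storage-driver": "devicemapper"}` returns the same object with `"debug": true` added. After writing the result back, a service docker reload (as mentioned above) should pick it up without restarting containers.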

Thank you @mion00. Indeed, after adding this option and reloading the daemon I can now see some logs when I run sudo journalctl -u docker.service. However, when sending the USR1 signal to the docker-containerd-shim process, I see no new logs, so I do not have any dumps :-(

@mion00 @jcberthon Do you want to give the patch in a try and let me know if that helps?

18.03.0 still affected

Still having this issue on 18.03.0 for mac... less frequently... but still there.

$ docker run -it --rm -v /run/docker/containerd:/run/docker/containerd docker:18.03.0 docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines
goroutine 764393 [running]:
runtime/pprof.writeGoroutineStacks(0x145e720, 0xc420741ce0, 0x0, 0xc420419ad0)
    /usr/local/go/src/runtime/pprof/pprof.go:608 +0xa9
runtime/pprof.writeGoroutine(0x145e720, 0xc420741ce0, 0x2, 0x30, 0x1026340)
    /usr/local/go/src/runtime/pprof/pprof.go:597 +0x46
runtime/pprof.(*Profile).WriteTo(0x1448700, 0x145e720, 0xc420741ce0, 0x2, 0xc420741ce0, 0xc4201b8750)
    /usr/local/go/src/runtime/pprof/pprof.go:310 +0x3ad
net/http/pprof.handler.ServeHTTP(0xc42052f4b1, 0x9, 0x1469220, 0xc420741ce0, 0xc4207e0000)
    /usr/local/go/src/net/http/pprof/pprof.go:237 +0x1ba
net/http/pprof.Index(0x1469220, 0xc420741ce0, 0xc4207e0000)
    /usr/local/go/src/net/http/pprof/pprof.go:248 +0x1dd
net/http.HandlerFunc.ServeHTTP(0x10b0ee8, 0x1469220, 0xc420741ce0, 0xc4207e0000)
    /usr/local/go/src/net/http/server.go:1918 +0x46
net/http.(*ServeMux).ServeHTTP(0xc4201b8750, 0x1469220, 0xc420741ce0, 0xc4207e0000)
    /usr/local/go/src/net/http/server.go:2254 +0x132
net/http.serverHandler.ServeHTTP(0xc4201c0410, 0x1469220, 0xc420741ce0, 0xc4207e0000)
    /usr/local/go/src/net/http/server.go:2619 +0xb6
net/http.(*conn).serve(0xc4206f6280, 0x1469fe0, 0xc4206e26c0)
    /usr/local/go/src/net/http/server.go:1801 +0x71f
created by net/http.(*Server).Serve
    /usr/local/go/src/net/http/server.go:2720 +0x28a

goroutine 1 [chan receive, 5006 minutes]:
main.main.func1(0xc4200aedc0, 0xc4200aedc0, 0xc4201adb4f)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x871, 0x10b0b60, 0xc4200aedc0, 0xc420058ba0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xd4*App).Run(0xc420182a80, 0xc420010090, 0x3, 0x3, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x655
    /tmp/tmp.AD0Uaz9KtF/src/ +0x53d

goroutine 13 [select, 12 minutes]:
main.handleSignals.func1(0xc420058c60, 0xc420058c00, 0x146a0a0, 0xc4201739b0, 0xc420070420)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xf7
created by main.handleSignals
    /tmp/tmp.AD0Uaz9KtF/src/ +0x8b

goroutine 12 [syscall, 12 minutes]:
    /usr/local/go/src/runtime/sigqueue.go:131 +0xa8
    /usr/local/go/src/os/signal/signal_unix.go:22 +0x24
created by os/signal.init.0
    /usr/local/go/src/os/signal/signal_unix.go:28 +0x43

goroutine 14 [select, 5006 minutes, locked to thread]:
runtime.gopark(0x10b1368, 0x0, 0xba26f3, 0x6, 0x18, 0x1)
    /usr/local/go/src/runtime/proc.go:287 +0x132
runtime.selectgo(0xc420200f50, 0xc4200704e0)
    /usr/local/go/src/runtime/select.go:395 +0x114f
    /usr/local/go/src/runtime/signal_unix.go:511 +0x226
    /usr/local/go/src/runtime/asm_amd64.s:2337 +0x1

goroutine 34 [select, 1 minutes]:*Broadcaster).run(0xc4201a23c0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x414
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1b1

goroutine 35 [select, 1 minutes]:*gcScheduler).run(0xc4201ec3c0, 0x146a0a0, 0xc42024eb10)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x21d
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x4bf

goroutine 36 [syscall, 1 minutes]:
syscall.Syscall6(0xe8, 0x5, 0xc4206439b8, 0x80, 0xffffffffffffffff, 0x0, 0x0, 0x1, 0x80, 0x0)
    /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5, 0xc4206439b8, 0x80, 0x80, 0xffffffffffffffff, 0x1, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x79*oomCollector).start(0xc42020d3c0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x7d
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x125

goroutine 50 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7f9762b56f70, 0x72, 0xffffffffffffffff)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4202ee118, 0x72, 0xc420203b00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4202ee118, 0xffffffffffffff00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Accept(0xc4202ee100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:335 +0x1e4
net.(*netFD).accept(0xc4202ee100, 0xc4206f6300, 0xfb2160, 0xc420203d78)
    /usr/local/go/src/net/fd_unix.go:238 +0x44
net.(*UnixListener).accept(0xc4202c7920, 0x7f8e9a, 0x45ad10, 0xc420203dc0)
    /usr/local/go/src/net/unixsock_posix.go:162 +0x34
net.(*UnixListener).Accept(0xc4202c7920, 0x10b0d00, 0xc4206f6280, 0x146a0a0, 0xc4201b8840)
    /usr/local/go/src/net/unixsock.go:241 +0x4b
net/http.(*Server).Serve(0xc4201c0410, 0x1468fe0, 0xc4202c7920, 0x0, 0x0)
    /usr/local/go/src/net/http/server.go:2695 +0x1b4
net/http.Serve(0x1468fe0, 0xc4202c7920, 0x145e360, 0xc4201b8750, 0x10b0f00, 0xc420268720)
    /usr/local/go/src/net/http/server.go:2323 +0x75*Server).ServeDebug(0xc4201aef40, 0x1468fe0, 0xc4202c7920, 0xc420268738, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1c8*Server).ServeDebug-fm(0x1468fe0, 0xc4202c7920, 0xc4202c7920, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x40
main.serve.func1(0x1468fe0, 0xc4202c7920, 0xc4202bb620, 0x146a0a0, 0xc4202c79e0, 0xc4202c0380, 0x37)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x77
created by main.serve
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1c8

goroutine 51 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7f9762b56eb0, 0x72, 0xffffffffffffffff)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4202ee298, 0x72, 0xc420033b00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4202ee298, 0xffffffffffffff00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Accept(0xc4202ee280, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:335 +0x1e4
net.(*netFD).accept(0xc4202ee280, 0xc4202b8008, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:238 +0x44
net.(*UnixListener).accept(0xc4202c7a70, 0x89339b, 0x45ad10, 0xc420033da0)
    /usr/local/go/src/net/unixsock_posix.go:162 +0x34
net.(*UnixListener).Accept(0xc4202c7a70, 0x10b07e8, 0xc4201f8140, 0x146d6c0, 0xc4202b8008)
    /usr/local/go/src/net/unixsock.go:241 +0x4b*Server).Serve(0xc4201f8140, 0x1468fe0, 0xc4202c7a70, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x198*Server).ServeGRPC(0xc4201aef40, 0x1468fe0, 0xc4202c7a70, 0xc420268f38, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x55*Server).ServeGRPC-fm(0x1468fe0, 0xc4202c7a70, 0xc4202c7a70, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x40
main.serve.func1(0x1468fe0, 0xc4202c7a70, 0xc4202bb730, 0x146a0a0, 0xc4202c7b30, 0xc4202c0480, 0x31)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x77
created by main.serve
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1c8

goroutine 52 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7f9762b56df0, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4202ee618, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4202ee618, 0xc4203f2000, 0x8000, 0x8000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4202ee600, 0xc4203f2000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc4202ee600, 0xc4203f2000, 0x8000, 0x8000, 0x11, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc4202b8028, 0xc4203f2000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
bufio.(*Reader).Read(0xc4202da660, 0xc4203021f8, 0x9, 0x9, 0x9, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x30d
io.ReadAtLeast(0x145c760, 0xc4202da660, 0xc4203021f8, 0x9, 0x9, 0x9, 0x4acb760201fcbb8, 0x5ac36c5e, 0xc4201fcbc0)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x145c760, 0xc4202da660, 0xc4203021f8, 0x9, 0x9, 0x111349c75d3a6, 0x14a7a60, 0xbea8f8f784a7d948)
    /usr/local/go/src/io/io.go:327 +0x5a, 0x9, 0x9, 0x145c760, 0xc4202da660, 0x0, 0x7070e0900000000, 0xc4202e6798, 0xc4201fcce8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x7d*Framer).ReadFrame(0xc4203021c0, 0xc4201c8ce0, 0xc4201c8ce0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa6*http2Server).HandleStreams(0xc420404000, 0xc4203ef9e0, 0x10b0820)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x317*Server).serveStreams(0xc4201f8140, 0x146d1e0, 0xc420404000)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x142*Server).serveHTTP2Transport(0xc4201f8140, 0x146d6c0, 0xc4202b8028, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x473*Server).handleRawConn(0xc4201f8140, 0x146d6c0, 0xc4202b8028)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x499
created by*Server).Serve
    /tmp/tmp.AD0Uaz9KtF/src/ +0x5bb

goroutine 53 [select, 1 minutes]:, 0xc4203ed840, 0xc4203ef980, 0xc42041cfb8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2e6
    /tmp/tmp.AD0Uaz9KtF/src/ +0x60
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x8fb

goroutine 54 [select, 88 minutes]:*http2Server).keepalive(0xc420404000)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x266
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x920

goroutine 56 [select, 5006 minutes]:*service).Subscribe(0xc4201b0088, 0xc4203e93c0, 0x146d000, 0xc4203f0550, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x213, 0xc4201b0088, 0x146c580, 0xc4203e93a0, 0xc4202e6d20, 0xc4202a2000)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x110, 0xc4201b0088, 0x146c640, 0xc4202ee800, 0xc4203e9380, 0x10af408, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x13b*Server).processStreamingRPC(0xc4201f8140, 0x146d1e0, 0xc4200aef20, 0xc4202f0400, 0xc4202c7470, 0x1447fc0, 0x0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2ea*Server).handleStream(0xc4201f8140, 0x146d1e0, 0xc4200aef20, 0xc4202f0400, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x14c3*Server).serveStreams.func1.1(0xc4200157f0, 0xc4201f8140, 0x146d1e0, 0xc4200aef20, 0xc4202f0400)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa1
created by*Server).serveStreams.func1
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa3

goroutine 15 [IO wait, 74 minutes]:
internal/poll.runtime_pollWait(0x7f9762b56d30, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc420012498, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc420012498, 0xc420428000, 0x8000, 0x8000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc420012480, 0xc420428000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc420012480, 0xc420428000, 0x8000, 0x8000, 0x11, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc42000e138, 0xc420428000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
bufio.(*Reader).Read(0xc420058cc0, 0xc4200ac3b8, 0x9, 0x9, 0x9, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x30d
io.ReadAtLeast(0x145c760, 0xc420058cc0, 0xc4200ac3b8, 0x9, 0x9, 0x9, 0xc420038bb8, 0x400f10, 0xc420038c67)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x145c760, 0xc420058cc0, 0xc4200ac3b8, 0x9, 0x9, 0x83ce2d, 0xc42048414c, 0xc42043a000)
    /usr/local/go/src/io/io.go:327 +0x5a, 0x9, 0x9, 0x145c760, 0xc420058cc0, 0x0, 0x0, 0xc420484140, 0xc420038ce8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x7d*Framer).ReadFrame(0xc4200ac380, 0xc420484140, 0xc420484140, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa6*http2Server).HandleStreams(0xc4200aef20, 0xc420173bc0, 0x10b0820)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x317*Server).serveStreams(0xc4201f8140, 0x146d1e0, 0xc4200aef20)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x142*Server).serveHTTP2Transport(0xc4201f8140, 0x146d6c0, 0xc42000e138, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x473*Server).handleRawConn(0xc4201f8140, 0x146d6c0, 0xc42000e138)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x499
created by*Server).Serve
    /tmp/tmp.AD0Uaz9KtF/src/ +0x5bb

goroutine 16 [select, 74 minutes]:, 0xc420054cc0, 0xc420173b60, 0xc42041dfb8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2e6
    /tmp/tmp.AD0Uaz9KtF/src/ +0x60
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x8fb

goroutine 66 [select, 74 minutes]:*http2Server).keepalive(0xc4200aef20)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x266
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x920

goroutine 57 [semacquire, 5006 minutes]:
sync.runtime_notifyListWait(0xc4203edbd0, 0xc400000000)
    /usr/local/go/src/runtime/sema.go:507 +0x114
    /usr/local/go/src/sync/cond.go:56 +0x82*Queue).next(0xc4204382d0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x87*Queue).run(0xc4204382d0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x34
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x14d

goroutine 58 [select, 5006 minutes]:*Exchange).Subscribe.func3(0xc420438330, 0xc4203e93e0, 0xc42040a120, 0x1469fe0, 0xc4203edb80, 0xc4202da7e0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x15f
created by*Exchange).Subscribe
    /tmp/tmp.AD0Uaz9KtF/src/ +0x291

goroutine 67 [select, 1 minutes]:*service).Subscribe(0xc4201b0088, 0xc4203e9700, 0x146d000, 0xc4203f06f0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x213, 0xc4201b0088, 0x146c580, 0xc4203e96e0, 0xc4202e7090, 0xc420029400)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x110, 0xc4201b0088, 0x146c640, 0xc4202eeb80, 0xc4203e96c0, 0x10af408, 0x0, 0xc420268ec8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x13b*Server).processStreamingRPC(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42019a200, 0xc4202c7470, 0x1447fc0, 0x0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2ea*Server).handleStream(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42019a200, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x14c3*Server).serveStreams.func1.1(0xc4202c8860, 0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42019a200)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa1
created by*Server).serveStreams.func1
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa3

goroutine 60 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7f9762b56c70, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4202ee918, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4202ee918, 0xc420458000, 0x8000, 0x8000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4202ee900, 0xc420458000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc4202ee900, 0xc420458000, 0x8000, 0x8000, 0x11, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc4202b8040, 0xc420458000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
bufio.(*Reader).Read(0xc4202daae0, 0xc4203023b8, 0x9, 0x9, 0x9, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x30d
io.ReadAtLeast(0x145c760, 0xc4202daae0, 0xc4203023b8, 0x9, 0x9, 0x9, 0x7add264201febb8, 0x5ac36c5e, 0xc4201febc0)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x145c760, 0xc4202daae0, 0xc4203023b8, 0x9, 0x9, 0x111349f76e9fa, 0x14a7a60, 0xbea8f8f787ab9e54)
    /usr/local/go/src/io/io.go:327 +0x5a, 0x9, 0x9, 0x145c760, 0xc4202daae0, 0x0, 0x7070e0900000000, 0xc4202e6fb8, 0xc4201fece8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x7d*Framer).ReadFrame(0xc420302380, 0xc4202bf260, 0xc4202bf260, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa6*http2Server).HandleStreams(0xc420404840, 0xc420438900, 0x10b0820)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x317*Server).serveStreams(0xc4201f8140, 0x146d1e0, 0xc420404840)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x142*Server).serveHTTP2Transport(0xc4201f8140, 0x146d6c0, 0xc4202b8040, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x473*Server).handleRawConn(0xc4201f8140, 0x146d6c0, 0xc4202b8040)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x499
created by*Server).Serve
    /tmp/tmp.AD0Uaz9KtF/src/ +0x5bb

goroutine 61 [select, 1 minutes]:, 0xc4203eddc0, 0xc4204388a0, 0xc42046cfb8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2e6
    /tmp/tmp.AD0Uaz9KtF/src/ +0x60
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x8fb

goroutine 62 [select, 74 minutes]:*http2Server).keepalive(0xc420404840)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x266
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x920

goroutine 63 [semacquire, 1 minutes]:
sync.runtime_notifyListWait(0xc4203edf50, 0xc400009f35)
    /usr/local/go/src/runtime/sema.go:507 +0x114
    /usr/local/go/src/sync/cond.go:56 +0x82*Queue).next(0xc4204389c0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x87*Queue).run(0xc4204389c0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x34
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x14d

goroutine 64 [select, 1 minutes]:*Exchange).Subscribe.func3(0xc420438a20, 0xc4203e9720, 0xc42040a3c0, 0x1469fe0, 0xc4203edf00, 0xc4202daf00)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x15f
created by*Exchange).Subscribe
    /tmp/tmp.AD0Uaz9KtF/src/ +0x291

goroutine 592290 [select, 42 minutes]:*Client).run(0xc4201a6360)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x342
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2ab

goroutine 764394 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7f9762b56970, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4202ee698, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4202ee698, 0xc4205a7600, 0x1, 0x1)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4202ee680, 0xc4205a7661, 0x1, 0x1, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc4202ee680, 0xc4205a7661, 0x1, 0x1, 0x0, 0xc420545a00, 0x42b91b)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc42000e398, 0xc4205a7661, 0x1, 0x1, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
    /usr/local/go/src/net/http/server.go:660 +0x64
created by net/http.(*connReader).startBackgroundRead
    /usr/local/go/src/net/http/server.go:656 +0xda

goroutine 671746 [select, 42 minutes]:*Client).dispatch(0xc4201a6360, 0x146a0a0, 0xc4204eee70, 0xc420345a80, 0xc4201daa60, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x282*Client).Call(0xc4201a6360, 0x146a0a0, 0xc4204eee70, 0xbbc2cb, 0x25, 0xba0ec1, 0x5, 0x1046a60, 0xc4201a1ee0, 0x1046b40, ...)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x15d*shimClient).State(0xc42000e068, 0x146a0a0, 0xc4204eee70, 0xc4201a1ee0, 0x0, 0xc42016b298, 0xc42059f578)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xbf*Process).State(0xc4201daa40, 0x146a0a0, 0xc4204eee70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xc7, 0xc4204eee70, 0x146d0c0, 0xc4201daa40, 0x40, 0x146d0c0, 0xc4201daa40)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xbe*service).Get(0xc4202c6e70, 0x7f9762311168, 0xc4204eee70, 0xc4201da800, 0xc4202c6e70, 0xbbd46b, 0x3)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xef, 0xc4204eee70, 0x10444c0, 0xc4201da800, 0xc420079ae0, 0x14cdcb0, 0xf87860, 0xc4201a1e30)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x88, 0xc4204eee70, 0x10444c0, 0xc4201da800, 0xc4201da820, 0xc4201da840, 0x50, 0x48, 0xc4201da7e0, 0xc42059f968)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xd4, 0xc4204eedb0, 0x10444c0, 0xc4201da800, 0xc4201da820, 0xc4201da840, 0x4354a6, 0xc42059f9e0, 0x41228a, 0x50)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1d2, 0xc4202c6e70, 0x7f9762311168, 0xc42044fbc0, 0xc420079900, 0x10af920, 0x0, 0x0, 0xc42001d900, 0x46)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x16f*Server).processUnaryRPC(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42021d700, 0xc4202c77d0, 0x1452dc0, 0x0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xab6*Server).handleStream(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42021d700, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x152a*Server).serveStreams.func1.1(0xc4202c8860, 0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42021d700)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa1
created by*Server).serveStreams.func1
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa3

goroutine 764368 [syscall, 1 minutes]:
syscall.Syscall6(0xf7, 0x1, 0x36fe, 0xc42049d5b8, 0x1000004, 0x0, 0x0, 0x146d1e0, 0xc420404840, 0xc4204f9c00)
    /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
os.(*Process).blockUntilWaitable(0xc4207ebe90, 0x0, 0xc42049d6b0, 0x5268aa)
    /usr/local/go/src/os/wait_waitid.go:31 +0xa7
os.(*Process).wait(0xc4207ebe90, 0xc42049d770, 0x8943d5, 0x146a0a0)
    /usr/local/go/src/os/exec_unix.go:22 +0x44
os.(*Process).Wait(0xc4207ebe90, 0xc420523400, 0x42b91b, 0xc400000008)
    /usr/local/go/src/os/exec.go:115 +0x2d
os/exec.(*Cmd).Wait(0xc4203138c0, 0x45e611, 0xc4202c8860)
    /usr/local/go/src/os/exec/exec.go:446 +0x64, 0xc420568320)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2d
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2d5

goroutine 592291 [IO wait, 42 minutes]:
internal/poll.runtime_pollWait(0x7f9762b56af0, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4201b4398, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4201b4398, 0xc4202ea000, 0x1000, 0x1000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4201b4380, 0xc4202ea000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc4201b4380, 0xc4202ea000, 0x1000, 0x1000, 0xc420201e78, 0x43ebf4, 0xc42040a720)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc42000e060, 0xc4202ea000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
bufio.(*Reader).Read(0xc4201a6300, 0xc4202fa060, 0xa, 0xa, 0xc420201fac, 0x0, 0xc420201fa8)
    /usr/local/go/src/bufio/bufio.go:213 +0x30d
io.ReadAtLeast(0x145c760, 0xc4201a6300, 0xc4202fa060, 0xa, 0xa, 0xa, 0xc42021e300, 0x0, 0xc420201f48)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x145c760, 0xc4201a6300, 0xc4202fa060, 0xa, 0xa, 0x2, 0x2, 0x0)
    /usr/local/go/src/io/io.go:327 +0x5a, 0xa, 0xa, 0x145c760, 0xc4201a6300, 0xc420201f48, 0x2, 0x2, 0xc420201fa8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x62*channel).recv(0xc4202fa040, 0x146a020, 0xc420014048, 0x0, 0x2, 0xc4201b6070, 0x6, 0x6, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x6f*Client).run.func1(0xc42040a780, 0xc4201a6360, 0xc42003ba20, 0xc42040a720)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x9a
created by*Client).run
    /tmp/tmp.AD0Uaz9KtF/src/ +0x164

goroutine 764369 [select, 1 minutes]:*Client).run(0xc42065c540)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x342
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2ab

goroutine 671647 [select, 42 minutes]:*Client).dispatch(0xc4201a6360, 0x146a0a0, 0xc42024e5a0, 0xc4203440c0, 0xc420306260, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x282*Client).Call(0xc4201a6360, 0x146a0a0, 0xc42024e5a0, 0xbbc2cb, 0x25, 0xba0ebc, 0x5, 0x10468a0, 0xc4201ae640, 0x1046980, ...)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x15d*shimClient).Start(0xc42000e068, 0x146a0a0, 0xc42024e5a0, 0xc4201ae640, 0x1001ae0, 0x7f9762311101, 0xc420571660)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xbf*Process).Start(0xc4203061e0, 0x146a0a0, 0xc42024e5a0, 0x146a0a0, 0xc42024e5a0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x91*service).Start(0xc4202c6e70, 0x7f9762311168, 0xc42024e5a0, 0xc42000c560, 0xc4202c6e70, 0xbbea93, 0x5)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x14e, 0xc42024e5a0, 0x1044f40, 0xc42000c560, 0xc4202f64b0, 0x14cdcb0, 0xf87860, 0xc4201ae630)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x8b, 0xc42024e5a0, 0x1044f40, 0xc42000c560, 0xc42000c580, 0xc42000c5a0, 0x50, 0x48, 0xc42000c540, 0xc420571968)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xd4, 0xc42024e4b0, 0x1044f40, 0xc42000c560, 0xc42000c580, 0xc42000c5a0, 0x8000000000000000, 0xc4205719e0, 0x41228a, 0x50)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1d2, 0xc4202c6e70, 0x7f9762311168, 0xc420471a70, 0xc4202f62d0, 0x10af920, 0x0, 0x0, 0x0, 0x145e9e0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x16f*Server).processUnaryRPC(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc4202f0000, 0xc4202c77d0, 0x1452d78, 0x0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xab6*Server).handleStream(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc4202f0000, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x152a*Server).serveStreams.func1.1(0xc4202c8860, 0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc4202f0000)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa1
created by*Server).serveStreams.func1
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa3

goroutine 592225 [syscall, 68 minutes]:
syscall.Syscall6(0xf7, 0x1, 0xb0e, 0xc42026a5b8, 0x1000004, 0x0, 0x0, 0x146d1e0, 0xc420404840, 0xc4202f5000)
    /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
os.(*Process).blockUntilWaitable(0xc420016d50, 0x0, 0xc42026a6b0, 0x5268aa)
    /usr/local/go/src/os/wait_waitid.go:31 +0xa7
os.(*Process).wait(0xc420016d50, 0xc42026a770, 0x8943d5, 0x146a0a0)
    /usr/local/go/src/os/exec_unix.go:22 +0x44
os.(*Process).Wait(0xc420016d50, 0xc42034b300, 0x42b91b, 0xc400000008)
    /usr/local/go/src/os/exec.go:115 +0x2d
os/exec.(*Cmd).Wait(0xc420262160, 0x45e611, 0xc4202c8860)
    /usr/local/go/src/os/exec/exec.go:446 +0x64, 0xc420440280)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2d
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2d5

goroutine 764402 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7f9762b56bb0, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4201b4898, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4201b4898, 0xc42073d000, 0x1000, 0x1000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4201b4880, 0xc42073d000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc4201b4880, 0xc42073d000, 0x1000, 0x1000, 0xc42041ee78, 0x43ebf4, 0xc4201be7e0)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc4201b0370, 0xc42073d000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
bufio.(*Reader).Read(0xc42065c4e0, 0xc420774420, 0xa, 0xa, 0xc42041efac, 0x0, 0xc42041efa8)
    /usr/local/go/src/bufio/bufio.go:213 +0x30d
io.ReadAtLeast(0x145c760, 0xc42065c4e0, 0xc420774420, 0xa, 0xa, 0xa, 0xc4201f6100, 0x0, 0xc42041ef48)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x145c760, 0xc42065c4e0, 0xc420774420, 0xa, 0xa, 0x2, 0x2, 0x0)
    /usr/local/go/src/io/io.go:327 +0x5a, 0xa, 0xa, 0x145c760, 0xc42065c4e0, 0xc42041ef48, 0x2, 0x2, 0xc42041efa8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x62*channel).recv(0xc420774400, 0x146a020, 0xc420014048, 0x0, 0x2, 0xc4204a7060, 0x6, 0xb, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x6f*Client).run.func1(0xc4201be840, 0xc42065c540, 0xc420265220, 0xc4201be7e0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x9a
created by*Client).run
    /tmp/tmp.AD0Uaz9KtF/src/ +0x164

And again (to make it easier to tell transient goroutines from stuck ones):

$ docker run -it --rm -v /run/docker/containerd:/run/docker/containerd docker:18.03.0 docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines
goroutine 764428 [running]:
runtime/pprof.writeGoroutineStacks(0x145e720, 0xc4200ac7e0, 0x0, 0xc4205ccad0)
    /usr/local/go/src/runtime/pprof/pprof.go:608 +0xa9
runtime/pprof.writeGoroutine(0x145e720, 0xc4200ac7e0, 0x2, 0x30, 0x1026340)
    /usr/local/go/src/runtime/pprof/pprof.go:597 +0x46
runtime/pprof.(*Profile).WriteTo(0x1448700, 0x145e720, 0xc4200ac7e0, 0x2, 0xc4200ac7e0, 0xc4201b8750)
    /usr/local/go/src/runtime/pprof/pprof.go:310 +0x3ad
net/http/pprof.handler.ServeHTTP(0xc4205620d1, 0x9, 0x1469220, 0xc4200ac7e0, 0xc42057b200)
    /usr/local/go/src/net/http/pprof/pprof.go:237 +0x1ba
net/http/pprof.Index(0x1469220, 0xc4200ac7e0, 0xc42057b200)
    /usr/local/go/src/net/http/pprof/pprof.go:248 +0x1dd
net/http.HandlerFunc.ServeHTTP(0x10b0ee8, 0x1469220, 0xc4200ac7e0, 0xc42057b200)
    /usr/local/go/src/net/http/server.go:1918 +0x46
net/http.(*ServeMux).ServeHTTP(0xc4201b8750, 0x1469220, 0xc4200ac7e0, 0xc42057b200)
    /usr/local/go/src/net/http/server.go:2254 +0x132
net/http.serverHandler.ServeHTTP(0xc4201c0410, 0x1469220, 0xc4200ac7e0, 0xc42057b200)
    /usr/local/go/src/net/http/server.go:2619 +0xb6
net/http.(*conn).serve(0xc4206328c0, 0x1469fe0, 0xc4202fa300)
    /usr/local/go/src/net/http/server.go:1801 +0x71f
created by net/http.(*Server).Serve
    /usr/local/go/src/net/http/server.go:2720 +0x28a

goroutine 1 [chan receive, 5006 minutes]:
main.main.func1(0xc4200aedc0, 0xc4200aedc0, 0xc4201adb4f)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x871, 0x10b0b60, 0xc4200aedc0, 0xc420058ba0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xd4*App).Run(0xc420182a80, 0xc420010090, 0x3, 0x3, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x655
    /tmp/tmp.AD0Uaz9KtF/src/ +0x53d

goroutine 13 [select, 12 minutes]:
main.handleSignals.func1(0xc420058c60, 0xc420058c00, 0x146a0a0, 0xc4201739b0, 0xc420070420)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xf7
created by main.handleSignals
    /tmp/tmp.AD0Uaz9KtF/src/ +0x8b

goroutine 12 [syscall, 12 minutes]:
    /usr/local/go/src/runtime/sigqueue.go:131 +0xa8
    /usr/local/go/src/os/signal/signal_unix.go:22 +0x24
created by os/signal.init.0
    /usr/local/go/src/os/signal/signal_unix.go:28 +0x43

goroutine 14 [select, 5006 minutes, locked to thread]:
runtime.gopark(0x10b1368, 0x0, 0xba26f3, 0x6, 0x18, 0x1)
    /usr/local/go/src/runtime/proc.go:287 +0x132
runtime.selectgo(0xc420200f50, 0xc4200704e0)
    /usr/local/go/src/runtime/select.go:395 +0x114f
    /usr/local/go/src/runtime/signal_unix.go:511 +0x226
    /usr/local/go/src/runtime/asm_amd64.s:2337 +0x1

goroutine 34 [select]:*Broadcaster).run(0xc4201a23c0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x414
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1b1

goroutine 35 [select]:*gcScheduler).run(0xc4201ec3c0, 0x146a0a0, 0xc42024eb10)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x21d
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x4bf

goroutine 36 [syscall]:
syscall.Syscall6(0xe8, 0x5, 0xc4206439b8, 0x80, 0xffffffffffffffff, 0x0, 0x0, 0x1, 0x80, 0x0)
    /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5, 0xc4206439b8, 0x80, 0x80, 0xffffffffffffffff, 0x1, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x79*oomCollector).start(0xc42020d3c0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x7d
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x125

goroutine 50 [IO wait]:
internal/poll.runtime_pollWait(0x7f9762b56f70, 0x72, 0xffffffffffffffff)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4202ee118, 0x72, 0xc420203b00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4202ee118, 0xffffffffffffff00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Accept(0xc4202ee100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:335 +0x1e4
net.(*netFD).accept(0xc4202ee100, 0xc420632940, 0xfb2160, 0xc420203d78)
    /usr/local/go/src/net/fd_unix.go:238 +0x44
net.(*UnixListener).accept(0xc4202c7920, 0x7f8e9a, 0x45ad10, 0xc420203dc0)
    /usr/local/go/src/net/unixsock_posix.go:162 +0x34
net.(*UnixListener).Accept(0xc4202c7920, 0x10b0d00, 0xc4206328c0, 0x146a0a0, 0xc4201b8840)
    /usr/local/go/src/net/unixsock.go:241 +0x4b
net/http.(*Server).Serve(0xc4201c0410, 0x1468fe0, 0xc4202c7920, 0x0, 0x0)
    /usr/local/go/src/net/http/server.go:2695 +0x1b4
net/http.Serve(0x1468fe0, 0xc4202c7920, 0x145e360, 0xc4201b8750, 0x10b0f00, 0xc420268720)
    /usr/local/go/src/net/http/server.go:2323 +0x75*Server).ServeDebug(0xc4201aef40, 0x1468fe0, 0xc4202c7920, 0xc420268738, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1c8*Server).ServeDebug-fm(0x1468fe0, 0xc4202c7920, 0xc4202c7920, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x40
main.serve.func1(0x1468fe0, 0xc4202c7920, 0xc4202bb620, 0x146a0a0, 0xc4202c79e0, 0xc4202c0380, 0x37)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x77
created by main.serve
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1c8

goroutine 51 [IO wait]:
internal/poll.runtime_pollWait(0x7f9762b56eb0, 0x72, 0xffffffffffffffff)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4202ee298, 0x72, 0xc420033b00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4202ee298, 0xffffffffffffff00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Accept(0xc4202ee280, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:335 +0x1e4
net.(*netFD).accept(0xc4202ee280, 0xc4202b8000, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:238 +0x44
net.(*UnixListener).accept(0xc4202c7a70, 0x89339b, 0x45ad10, 0xc420033da0)
    /usr/local/go/src/net/unixsock_posix.go:162 +0x34
net.(*UnixListener).Accept(0xc4202c7a70, 0x10b07e8, 0xc4201f8140, 0x146d6c0, 0xc4202b8000)
    /usr/local/go/src/net/unixsock.go:241 +0x4b*Server).Serve(0xc4201f8140, 0x1468fe0, 0xc4202c7a70, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x198*Server).ServeGRPC(0xc4201aef40, 0x1468fe0, 0xc4202c7a70, 0xc420268f38, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x55*Server).ServeGRPC-fm(0x1468fe0, 0xc4202c7a70, 0xc4202c7a70, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x40
main.serve.func1(0x1468fe0, 0xc4202c7a70, 0xc4202bb730, 0x146a0a0, 0xc4202c7b30, 0xc4202c0480, 0x31)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x77
created by main.serve
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1c8

goroutine 52 [IO wait]:
internal/poll.runtime_pollWait(0x7f9762b56df0, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4202ee618, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4202ee618, 0xc4203f2000, 0x8000, 0x8000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4202ee600, 0xc4203f2000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc4202ee600, 0xc4203f2000, 0x8000, 0x8000, 0x11, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc4202b8028, 0xc4203f2000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
bufio.(*Reader).Read(0xc4202da660, 0xc4203021f8, 0x9, 0x9, 0x9, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x30d
io.ReadAtLeast(0x145c760, 0xc4202da660, 0xc4203021f8, 0x9, 0x9, 0x9, 0x227a0234201fcbb8, 0x5ac36c69, 0xc4201fcbc0)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x145c760, 0xc4202da660, 0xc4203021f8, 0x9, 0x9, 0x1113749e9ca22, 0x14a7a60, 0xbea8f8fa62751fd0)
    /usr/local/go/src/io/io.go:327 +0x5a, 0x9, 0x9, 0x145c760, 0xc4202da660, 0x0, 0x7070e0900000000, 0xc4202e6798, 0xc4201fcce8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x7d*Framer).ReadFrame(0xc4203021c0, 0xc4202bff20, 0xc4202bff20, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa6*http2Server).HandleStreams(0xc420404000, 0xc4203ef9e0, 0x10b0820)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x317*Server).serveStreams(0xc4201f8140, 0x146d1e0, 0xc420404000)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x142*Server).serveHTTP2Transport(0xc4201f8140, 0x146d6c0, 0xc4202b8028, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x473*Server).handleRawConn(0xc4201f8140, 0x146d6c0, 0xc4202b8028)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x499
created by*Server).Serve
    /tmp/tmp.AD0Uaz9KtF/src/ +0x5bb

goroutine 53 [select]:, 0xc4203ed840, 0xc4203ef980, 0xc42041cfb8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2e6
    /tmp/tmp.AD0Uaz9KtF/src/ +0x60
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x8fb

goroutine 54 [select, 88 minutes]:*http2Server).keepalive(0xc420404000)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x266
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x920

goroutine 56 [select, 5006 minutes]:*service).Subscribe(0xc4201b0088, 0xc4203e93c0, 0x146d000, 0xc4203f0550, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x213, 0xc4201b0088, 0x146c580, 0xc4203e93a0, 0xc4202e6d20, 0xc4202a2000)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x110, 0xc4201b0088, 0x146c640, 0xc4202ee800, 0xc4203e9380, 0x10af408, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x13b*Server).processStreamingRPC(0xc4201f8140, 0x146d1e0, 0xc4200aef20, 0xc4202f0400, 0xc4202c7470, 0x1447fc0, 0x0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2ea*Server).handleStream(0xc4201f8140, 0x146d1e0, 0xc4200aef20, 0xc4202f0400, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x14c3*Server).serveStreams.func1.1(0xc4200157f0, 0xc4201f8140, 0x146d1e0, 0xc4200aef20, 0xc4202f0400)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa1
created by*Server).serveStreams.func1
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa3

goroutine 15 [IO wait, 74 minutes]:
internal/poll.runtime_pollWait(0x7f9762b56d30, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc420012498, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc420012498, 0xc420428000, 0x8000, 0x8000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc420012480, 0xc420428000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc420012480, 0xc420428000, 0x8000, 0x8000, 0x11, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc42000e138, 0xc420428000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
bufio.(*Reader).Read(0xc420058cc0, 0xc4200ac3b8, 0x9, 0x9, 0x9, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x30d
io.ReadAtLeast(0x145c760, 0xc420058cc0, 0xc4200ac3b8, 0x9, 0x9, 0x9, 0xc420038bb8, 0x400f10, 0xc420038c67)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x145c760, 0xc420058cc0, 0xc4200ac3b8, 0x9, 0x9, 0x83ce2d, 0xc42048414c, 0xc42043a000)
    /usr/local/go/src/io/io.go:327 +0x5a, 0x9, 0x9, 0x145c760, 0xc420058cc0, 0x0, 0x0, 0xc420484140, 0xc420038ce8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x7d*Framer).ReadFrame(0xc4200ac380, 0xc420484140, 0xc420484140, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa6*http2Server).HandleStreams(0xc4200aef20, 0xc420173bc0, 0x10b0820)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x317*Server).serveStreams(0xc4201f8140, 0x146d1e0, 0xc4200aef20)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x142*Server).serveHTTP2Transport(0xc4201f8140, 0x146d6c0, 0xc42000e138, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x473*Server).handleRawConn(0xc4201f8140, 0x146d6c0, 0xc42000e138)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x499
created by*Server).Serve
    /tmp/tmp.AD0Uaz9KtF/src/ +0x5bb

goroutine 16 [select, 74 minutes]:, 0xc420054cc0, 0xc420173b60, 0xc42041dfb8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2e6
    /tmp/tmp.AD0Uaz9KtF/src/ +0x60
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x8fb

goroutine 66 [select, 74 minutes]:*http2Server).keepalive(0xc4200aef20)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x266
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x920

goroutine 57 [semacquire, 5006 minutes]:
sync.runtime_notifyListWait(0xc4203edbd0, 0xc400000000)
    /usr/local/go/src/runtime/sema.go:507 +0x114
    /usr/local/go/src/sync/cond.go:56 +0x82*Queue).next(0xc4204382d0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x87*Queue).run(0xc4204382d0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x34
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x14d

goroutine 58 [select, 5006 minutes]:*Exchange).Subscribe.func3(0xc420438330, 0xc4203e93e0, 0xc42040a120, 0x1469fe0, 0xc4203edb80, 0xc4202da7e0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x15f
created by*Exchange).Subscribe
    /tmp/tmp.AD0Uaz9KtF/src/ +0x291

goroutine 67 [select]:*service).Subscribe(0xc4201b0088, 0xc4203e9700, 0x146d000, 0xc4203f06f0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x213, 0xc4201b0088, 0x146c580, 0xc4203e96e0, 0xc4202e7090, 0xc420029400)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x110, 0xc4201b0088, 0x146c640, 0xc4202eeb80, 0xc4203e96c0, 0x10af408, 0x0, 0xc420268ec8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x13b*Server).processStreamingRPC(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42019a200, 0xc4202c7470, 0x1447fc0, 0x0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2ea*Server).handleStream(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42019a200, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x14c3*Server).serveStreams.func1.1(0xc4202c8860, 0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42019a200)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa1
created by*Server).serveStreams.func1
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa3

goroutine 60 [IO wait]:
internal/poll.runtime_pollWait(0x7f9762b56c70, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4202ee918, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4202ee918, 0xc420458000, 0x8000, 0x8000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4202ee900, 0xc420458000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc4202ee900, 0xc420458000, 0x8000, 0x8000, 0x11, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc4202b8040, 0xc420458000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
bufio.(*Reader).Read(0xc4202daae0, 0xc4203023b8, 0x9, 0x9, 0x9, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x30d
io.ReadAtLeast(0x145c760, 0xc4202daae0, 0xc4203023b8, 0x9, 0x9, 0x9, 0x22856124201febb8, 0x5ac36c69, 0xc4201febc0)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x145c760, 0xc4202daae0, 0xc4203023b8, 0x9, 0x9, 0x1113749f52782, 0x14a7a60, 0xbea8f8fa62803e4c)
    /usr/local/go/src/io/io.go:327 +0x5a, 0x9, 0x9, 0x145c760, 0xc4202daae0, 0x0, 0x7070e0900000000, 0xc4202e6fb8, 0xc4201fece8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x7d*Framer).ReadFrame(0xc420302380, 0xc420204d20, 0xc420204d20, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa6*http2Server).HandleStreams(0xc420404840, 0xc420438900, 0x10b0820)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x317*Server).serveStreams(0xc4201f8140, 0x146d1e0, 0xc420404840)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x142*Server).serveHTTP2Transport(0xc4201f8140, 0x146d6c0, 0xc4202b8040, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x473*Server).handleRawConn(0xc4201f8140, 0x146d6c0, 0xc4202b8040)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x499
created by*Server).Serve
    /tmp/tmp.AD0Uaz9KtF/src/ +0x5bb

goroutine 61 [select]:, 0xc4203eddc0, 0xc4204388a0, 0xc42046cfb8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2e6
    /tmp/tmp.AD0Uaz9KtF/src/ +0x60
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x8fb

goroutine 62 [select, 74 minutes]:*http2Server).keepalive(0xc420404840)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x266
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x920

goroutine 63 [semacquire]:
sync.runtime_notifyListWait(0xc4203edf50, 0xc400009f39)
    /usr/local/go/src/runtime/sema.go:507 +0x114
    /usr/local/go/src/sync/cond.go:56 +0x82*Queue).next(0xc4204389c0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x87*Queue).run(0xc4204389c0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x34
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x14d

goroutine 64 [select]:*Exchange).Subscribe.func3(0xc420438a20, 0xc4203e9720, 0xc42040a3c0, 0x1469fe0, 0xc4203edf00, 0xc4202daf00)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x15f
created by*Exchange).Subscribe
    /tmp/tmp.AD0Uaz9KtF/src/ +0x291

goroutine 592290 [select, 43 minutes]:*Client).run(0xc4201a6360)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x342
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2ab

goroutine 671746 [select, 43 minutes]:*Client).dispatch(0xc4201a6360, 0x146a0a0, 0xc4204eee70, 0xc420345a80, 0xc4201daa60, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x282*Client).Call(0xc4201a6360, 0x146a0a0, 0xc4204eee70, 0xbbc2cb, 0x25, 0xba0ec1, 0x5, 0x1046a60, 0xc4201a1ee0, 0x1046b40, ...)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x15d*shimClient).State(0xc42000e068, 0x146a0a0, 0xc4204eee70, 0xc4201a1ee0, 0x0, 0xc42016b298, 0xc42059f578)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xbf*Process).State(0xc4201daa40, 0x146a0a0, 0xc4204eee70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xc7, 0xc4204eee70, 0x146d0c0, 0xc4201daa40, 0x40, 0x146d0c0, 0xc4201daa40)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xbe*service).Get(0xc4202c6e70, 0x7f9762311168, 0xc4204eee70, 0xc4201da800, 0xc4202c6e70, 0xbbd46b, 0x3)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xef, 0xc4204eee70, 0x10444c0, 0xc4201da800, 0xc420079ae0, 0x14cdcb0, 0xf87860, 0xc4201a1e30)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x88, 0xc4204eee70, 0x10444c0, 0xc4201da800, 0xc4201da820, 0xc4201da840, 0x50, 0x48, 0xc4201da7e0, 0xc42059f968)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xd4, 0xc4204eedb0, 0x10444c0, 0xc4201da800, 0xc4201da820, 0xc4201da840, 0x4354a6, 0xc42059f9e0, 0x41228a, 0x50)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1d2, 0xc4202c6e70, 0x7f9762311168, 0xc42044fbc0, 0xc420079900, 0x10af920, 0x0, 0x0, 0xc42001d900, 0x46)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x16f*Server).processUnaryRPC(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42021d700, 0xc4202c77d0, 0x1452dc0, 0x0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xab6*Server).handleStream(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42021d700, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x152a*Server).serveStreams.func1.1(0xc4202c8860, 0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc42021d700)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa1
created by*Server).serveStreams.func1
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa3

goroutine 764452 [select]:*Client).run(0xc420169500)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x342
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2ab

goroutine 592291 [IO wait, 43 minutes]:
internal/poll.runtime_pollWait(0x7f9762b56af0, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4201b4398, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4201b4398, 0xc4202ea000, 0x1000, 0x1000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4201b4380, 0xc4202ea000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc4201b4380, 0xc4202ea000, 0x1000, 0x1000, 0xc420201e78, 0x43ebf4, 0xc42040a720)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc42000e060, 0xc4202ea000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
bufio.(*Reader).Read(0xc4201a6300, 0xc4202fa060, 0xa, 0xa, 0xc420201fac, 0x0, 0xc420201fa8)
    /usr/local/go/src/bufio/bufio.go:213 +0x30d
io.ReadAtLeast(0x145c760, 0xc4201a6300, 0xc4202fa060, 0xa, 0xa, 0xa, 0xc42021e300, 0x0, 0xc420201f48)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x145c760, 0xc4201a6300, 0xc4202fa060, 0xa, 0xa, 0x2, 0x2, 0x0)
    /usr/local/go/src/io/io.go:327 +0x5a, 0xa, 0xa, 0x145c760, 0xc4201a6300, 0xc420201f48, 0x2, 0x2, 0xc420201fa8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x62*channel).recv(0xc4202fa040, 0x146a020, 0xc420014048, 0x0, 0x2, 0xc4201b6070, 0x6, 0x6, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x6f*Client).run.func1(0xc42040a780, 0xc4201a6360, 0xc42003ba20, 0xc42040a720)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x9a
created by*Client).run
    /tmp/tmp.AD0Uaz9KtF/src/ +0x164

goroutine 764451 [syscall]:
syscall.Syscall6(0xf7, 0x1, 0x3778, 0xc42026d5b8, 0x1000004, 0x0, 0x0, 0x146d1e0, 0xc420404840, 0xc42019af00)
    /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
os.(*Process).blockUntilWaitable(0xc42052ed80, 0x0, 0xc42026d6b0, 0x5268aa)
    /usr/local/go/src/os/wait_waitid.go:31 +0xa7
os.(*Process).wait(0xc42052ed80, 0xc42026d770, 0x8943d5, 0x146a0a0)
    /usr/local/go/src/os/exec_unix.go:22 +0x44
os.(*Process).Wait(0xc42052ed80, 0xc420128d00, 0x42b91b, 0xc400000008)
    /usr/local/go/src/os/exec.go:115 +0x2d
os/exec.(*Cmd).Wait(0xc4200ae580, 0x45e611, 0xc4202c8860)
    /usr/local/go/src/os/exec/exec.go:446 +0x64, 0xc4202f6410)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2d
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2d5

goroutine 764429 [IO wait]:
internal/poll.runtime_pollWait(0x7f9762b56bb0, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc420230818, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc420230818, 0xc4204def00, 0x1, 0x1)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc420230800, 0xc4204defa1, 0x1, 0x1, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc420230800, 0xc4204defa1, 0x1, 0x1, 0x0, 0xc4204deb00, 0x42b91b)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc4201b0230, 0xc4204defa1, 0x1, 0x1, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
    /usr/local/go/src/net/http/server.go:660 +0x64
created by net/http.(*connReader).startBackgroundRead
    /usr/local/go/src/net/http/server.go:656 +0xda

goroutine 671647 [select, 43 minutes]:*Client).dispatch(0xc4201a6360, 0x146a0a0, 0xc42024e5a0, 0xc4203440c0, 0xc420306260, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x282*Client).Call(0xc4201a6360, 0x146a0a0, 0xc42024e5a0, 0xbbc2cb, 0x25, 0xba0ebc, 0x5, 0x10468a0, 0xc4201ae640, 0x1046980, ...)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x15d*shimClient).Start(0xc42000e068, 0x146a0a0, 0xc42024e5a0, 0xc4201ae640, 0x1001ae0, 0x7f9762311101, 0xc420571660)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xbf*Process).Start(0xc4203061e0, 0x146a0a0, 0xc42024e5a0, 0x146a0a0, 0xc42024e5a0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x91*service).Start(0xc4202c6e70, 0x7f9762311168, 0xc42024e5a0, 0xc42000c560, 0xc4202c6e70, 0xbbea93, 0x5)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x14e, 0xc42024e5a0, 0x1044f40, 0xc42000c560, 0xc4202f64b0, 0x14cdcb0, 0xf87860, 0xc4201ae630)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x8b, 0xc42024e5a0, 0x1044f40, 0xc42000c560, 0xc42000c580, 0xc42000c5a0, 0x50, 0x48, 0xc42000c540, 0xc420571968)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xd4, 0xc42024e4b0, 0x1044f40, 0xc42000c560, 0xc42000c580, 0xc42000c5a0, 0x8000000000000000, 0xc4205719e0, 0x41228a, 0x50)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x1d2, 0xc4202c6e70, 0x7f9762311168, 0xc420471a70, 0xc4202f62d0, 0x10af920, 0x0, 0x0, 0x0, 0x145e9e0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x16f*Server).processUnaryRPC(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc4202f0000, 0xc4202c77d0, 0x1452d78, 0x0, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xab6*Server).handleStream(0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc4202f0000, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x152a*Server).serveStreams.func1.1(0xc4202c8860, 0xc4201f8140, 0x146d1e0, 0xc420404840, 0xc4202f0000)
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa1
created by*Server).serveStreams.func1
    /tmp/tmp.AD0Uaz9KtF/src/ +0xa3

goroutine 592225 [syscall, 68 minutes]:
syscall.Syscall6(0xf7, 0x1, 0xb0e, 0xc42026a5b8, 0x1000004, 0x0, 0x0, 0x146d1e0, 0xc420404840, 0xc4202f5000)
    /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
os.(*Process).blockUntilWaitable(0xc420016d50, 0x0, 0xc42026a6b0, 0x5268aa)
    /usr/local/go/src/os/wait_waitid.go:31 +0xa7
os.(*Process).wait(0xc420016d50, 0xc42026a770, 0x8943d5, 0x146a0a0)
    /usr/local/go/src/os/exec_unix.go:22 +0x44
os.(*Process).Wait(0xc420016d50, 0xc42034b300, 0x42b91b, 0xc400000008)
    /usr/local/go/src/os/exec.go:115 +0x2d
os/exec.(*Cmd).Wait(0xc420262160, 0x45e611, 0xc4202c8860)
    /usr/local/go/src/os/exec/exec.go:446 +0x64, 0xc420440280)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2d
created by
    /tmp/tmp.AD0Uaz9KtF/src/ +0x2d5

goroutine 764453 [IO wait]:
internal/poll.runtime_pollWait(0x7f9762b567f0, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4202ee498, 0x72, 0xffffffffffffff00, 0x1460960, 0x145adf0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb0
internal/poll.(*pollDesc).waitRead(0xc4202ee498, 0xc420777000, 0x1000, 0x1000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4202ee480, 0xc420777000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18c
net.(*netFD).Read(0xc4202ee480, 0xc420777000, 0x1000, 0x1000, 0xc42041fe78, 0x43ebf4, 0xc4201be300)
    /usr/local/go/src/net/fd_unix.go:202 +0x54
net.(*conn).Read(0xc42000e1f0, 0xc420777000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6f
bufio.(*Reader).Read(0xc420169440, 0xc4203453e0, 0xa, 0xa, 0xc42041ffac, 0x0, 0xc42041ffa8)
    /usr/local/go/src/bufio/bufio.go:213 +0x30d
io.ReadAtLeast(0x145c760, 0xc420169440, 0xc4203453e0, 0xa, 0xa, 0xa, 0xc4200b4180, 0x0, 0xc42041ff48)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x145c760, 0xc420169440, 0xc4203453e0, 0xa, 0xa, 0x2, 0x2, 0x0)
    /usr/local/go/src/io/io.go:327 +0x5a, 0xa, 0xa, 0x145c760, 0xc420169440, 0xc42041ff48, 0x2, 0x2, 0xc42041ffa8)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x62*channel).recv(0xc4203453c0, 0x146a020, 0xc420014048, 0x0, 0x2, 0xc42073a000, 0x6, 0x1b9, 0x0, 0x0)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x6f*Client).run.func1(0xc4201be360, 0xc420169500, 0xc420664ec0, 0xc4201be300)
    /tmp/tmp.AD0Uaz9KtF/src/ +0x9a
created by*Client).run
    /tmp/tmp.AD0Uaz9KtF/src/ +0x164
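
Comparing the two dumps above by eye is tedious. Below is a minimal sketch, not part of the original report, that parses only the `goroutine N [state, M minutes]:` header lines and lists the goroutine IDs that appear in both dumps in the same state and have been waiting at least a given number of minutes. The function names and the 30-minute threshold are assumptions chosen for illustration. A long wait is not proof of a hang (goroutine 1 legitimately waits for the lifetime of the process), so the result is only a shortlist to inspect.

```python
import re

# Matches header lines such as "goroutine 592225 [syscall, 68 minutes]:".
# Anything after the colon is ignored, since some headers in the paste
# have the first stack frame fused onto the same line.
HEADER = re.compile(r"^goroutine (\d+) \[([^\]]+)\]:")
MINUTES = re.compile(r"^(\d+) minutes$")


def parse_goroutines(dump):
    """Return {goroutine_id: (state, minutes_waiting)} for each header line."""
    result = {}
    for line in dump.splitlines():
        header = HEADER.match(line)
        if not header:
            continue
        parts = [p.strip() for p in header.group(2).split(",")]
        minutes = 0  # headers without a duration are treated as 0 minutes
        for part in parts[1:]:
            found = MINUTES.match(part)
            if found:
                minutes = int(found.group(1))
        result[int(header.group(1))] = (parts[0], minutes)
    return result


def stuck_candidates(first_dump, second_dump, min_minutes=30):
    """IDs present in both dumps, in the same state, waiting >= min_minutes."""
    first = parse_goroutines(first_dump)
    second = parse_goroutines(second_dump)
    return sorted(
        gid
        for gid, (state, minutes) in second.items()
        if gid in first and first[gid][0] == state and minutes >= min_minutes
    )
```

Saving each dump to a file and passing the two file contents to `stuck_candidates` gives the IDs worth reading in full.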
$ docker info
Containers: 5
 Running: 1
 Paused: 0
 Stopped: 4
Images: 323
Server Version: 18.03.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
  Profile: default
Kernel Version: 4.9.87-linuxkit-aufs
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.786GiB
Name: linuxkit-025000000001
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 42
 Goroutines: 70
 System Time: 2018-04-03T11:59:08.2610107Z
 EventsListeners: 2
HTTP Proxy: docker.for.mac.http.internal:3128
HTTPS Proxy: docker.for.mac.http.internal:3129
Experimental: true
Insecure Registries:
Live Restore Enabled: false
$ docker version
 Version:   18.03.0-ce
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    0520e24
 Built: Wed Mar 21 23:06:22 2018
 OS/Arch:   darwin/amd64
 Experimental:  false
 Orchestrator:  swarm

  Version:  18.03.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:    Wed Mar 21 23:14:32 2018
  OS/Arch:  linux/amd64
  Experimental: true

The image I get this on always is (or any of the elasticsearch images... I'm just using 6.0.1 consistently)

I have 23 other images that I start and stop multiple times every day, but the elasticsearch one dies consistently... and it won't be the same elasticsearch container instance (because I'm running more than one, but all off the same image)...

The hang can occur anywhere from 1-2 minutes to 2-3 hours after startup...

I am also seeing what seems to be this issue across multiple swarms. Has there been any progress in determining the root cause?

Are people rolling back to previous versions? (17.09 has been mentioned.)

The issue is flagged as needing more information. What additional information is required?

@timdau We are still on 17.09 in production because this is the most stable version for us due to these "unstoppable containers"

The same happens to me in docker-CE 17.12.0 (in 3 clusters); I am rolling back to 17.09.
It's incredible that Docker now has this kind of critical bug in two LTS versions and doesn't fix it...
I understand that it may be difficult to reproduce, but this happens to a lot of people...

Is it because there is now an EE version, and the effort is going into that version, EE 2.2.x (Docker 17.06.x)?

There are multiple issues at play here and there have been multiple fixes that take care of different areas.

The same happens to me in docker-CE 17.12.0

17.12.1 has been out for some time now. It doesn't fix all issues but it does fix some.
Please update.
There are other fixes available in 18.03.0, but it may be worth waiting for 18.03.1 which should be out soon.

This issue is still open because we understand it's not fixed and it is being worked on.
If you want to help there are a number of ways to contribute outside of narrowing down cases... e.g. specific/consistent repro steps, stack traces from an updated docker instance (and containerd and a containerd-shim also helpful), etc.

Coming on here and making false claims and silly posturing is not helpful at all.

I have this problem in another cluster:
docker service ls

2uy2rdh3cu7e arxx_rxx replicated 5/4 xx/arxx_rxx:latest-SNAP *:80->80/tcp,*:443->443/tcp
It shows 5 containers out of 4 (it's always set to 4 replicas).

docker service ps arxx_rxx
shows only 4 running

docker ps
also shows the extra container that can't be stopped
69364e4293d1 xx/arxx_rxx:latest-SNAP "java -jar app-all.j…" 13 days ago Up 13 days (healthy) 80/tcp, 443/tcp arxx_rxx.1.bayrllx65489r7e1vh5te3plp

All commands related to this container hang, for example:
docker inspect 69364e4293d1

The container also breaks the services.

Can you show me the exact commands I need to run to give you the info you need to find the bug?

Ubuntu 16.04
Docker 17.12.0-ce

If you have Skype / Hangouts I can let you do it yourself.

We hit the same issue. One of the containers hung, so other commands like docker rmi/logs don't work.

cat /etc/redhat-release

CentOS Linux release 7.4.1708 (Core)

docker version

Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:10:14 2017
OS/Arch: linux/amd64

Version: 17.12.0-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:12:46 2017
OS/Arch: linux/amd64
Experimental: false


(curl --unix-socket /var/run/docker.sock http:/./debug/pprof/goroutine?debug=2)

I see some big IO wait in the log..
At the moment we switched back to docker version: 17.09.1-ce

Any ideas would be very appreciated.
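For anyone who wants to script the goroutine dump mentioned above instead of using curl, here is a minimal sketch using only the Python standard library. It assumes the daemon listens on `/var/run/docker.sock` and that the caller may read that socket; the class and function names are made up for illustration.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10.0):
        # "localhost" is only used for the Host header; the socket path
        # decides where the bytes actually go.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.settimeout(self.timeout)
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def fetch_goroutine_dump(socket_path="/var/run/docker.sock", timeout=10.0):
    """Return (status, body) for GET /debug/pprof/goroutine?debug=2."""
    conn = UnixHTTPConnection(socket_path, timeout=timeout)
    try:
        conn.request("GET", "/debug/pprof/goroutine?debug=2")
        response = conn.getresponse()
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
    finally:
        conn.close()


if __name__ == "__main__":
    status, dump = fetch_goroutine_dump()
    print(status)
    print(dump)
```

The timeout matters here: if the daemon itself is wedged, the request may never return, and a socket timeout at least turns that into an error.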

@victorvarza see the earlier comments: if you're on 17.12, at least upgrade to 17.12.1; but given that 17.12 has reached EOL, consider 18.03 (you may want to wait for 18.03.1, which will have some fixes).

I've made some changes to my infrastructure to afford myself the luxury of being able to spend some time collecting logs/information the next time that this happens on my production systems.

I'm currently on Ubuntu 16.04.4 LTS running docker-ce 18.03.1 and Linux Kernel 4.13.0-39-generic x86_64.

Can someone confirm that this is all of the information that would need to be collected in order to provide enough information to help troubleshoot this issue?

  1. docker inspect {container-id} > docker-inspect-container.log
  2. ps -aux | grep {container-id} to get docker-containerd-shim pid
  3. To get a stack dump from docker-containerd-shim do kill -s SIGUSR1 {docker-containerd-shim-pid}. This should generate a stack trace in the logs for dockerd.
  4. sudo journalctl -u docker.service --since today > docker-service-log.txt
  5. docker info
  6. docker version
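The checklist above could be scripted roughly as follows. This is only a sketch under a few assumptions: it runs with enough privileges for `journalctl` and `kill` (the original step uses `sudo`), the shim pid has already been looked up by hand as in step 2, and the output file names `docker-info.txt` and `docker-version.txt` are invented here. Each command gets a timeout because `docker inspect` may itself hang on a stuck container.

```python
import shlex
import subprocess


def build_diagnostic_commands(container_id, shim_pid=None):
    """Return (output_file, argv) pairs mirroring the checklist."""
    commands = [
        ("docker-inspect-container.log", ["docker", "inspect", container_id]),
        ("docker-service-log.txt",
         ["journalctl", "-u", "docker.service", "--since", "today"]),
        ("docker-info.txt", ["docker", "info"]),        # hypothetical file name
        ("docker-version.txt", ["docker", "version"]),  # hypothetical file name
    ]
    if shim_pid is not None:
        # Signal the shim before reading the journal, so that the stack
        # trace it triggers can show up in the collected dockerd log.
        commands.insert(1, (None, ["kill", "-s", "SIGUSR1", str(shim_pid)]))
    return commands


def run_diagnostics(container_id, shim_pid=None, timeout=30, dry_run=True):
    """Run (or, with dry_run, only list) the diagnostic commands."""
    executed = []
    for output_file, argv in build_diagnostic_commands(container_id, shim_pid):
        executed.append(shlex.join(argv))
        if dry_run:
            continue
        try:
            proc = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout)
            output = proc.stdout + proc.stderr
        except subprocess.TimeoutExpired:
            output = f"timed out after {timeout}s\n"
        if output_file:
            with open(output_file, "w") as fh:
                fh.write(output)
    return executed


if __name__ == "__main__":
    # Dry run: print the commands without touching docker.
    for line in run_diagnostics("{container-id}", shim_pid=None):
        print(line)
```

With `dry_run=True` (the default) nothing is executed, which makes it easy to review the commands before running them on a production node.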

Same problem here. A container is stuck. Cannot stop, kill, rm, etc.
Provided all the related information in the attached files.

 Version:   18.03.0-ce
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    0520e24
 Built: Wed Mar 21 23:04:48 2018
 OS/Arch:   linux/arm64
 Experimental:  false
 Orchestrator:  swarm

  Version:  18.03.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:    Wed Mar 21 23:10:22 2018
  OS/Arch:  linux/arm64
  Experimental: false


I had the same problem with 18.03.0-ce, but it looks like it was related to host-mounted NFS volumes. I have already read about some similar setups in this thread. Could it be that NFS is the actual problem here?

We don't have NFS involved in our setup and face the same issue.

18.03.1 is out with some mitigations for this. Please let us know if it's still a problem on that release.

@cpuguy83 Docker for AWS was still on 18.03.0 last time I checked, and the last version listed in the release notes is 18.03.0. I am eager to check. Any idea when Docker for AWS stable will upgrade to 18.03.1?

Be careful when upgrading your swarm cluster: because of bug #36961, your cluster can become dead.

@cpuguy83 18.03.1 is not there yet at the release page: or am I blind?

18.03.1 is not there yet at the release page: or am I blind?

Those docs are out of date; you can see here:
released 11 days ago.

@marcomsousa thanks for noticing that; release-notes are now also added on the docs website;

@cpuguy83 Is there a list somewhere of all of the issues related to this problem? That way we can know for sure when this issue is resolved and it's safe to upgrade.

This commit containerd/containerd@d235ae9 was released in containerd 1.0.3.
Docker-ce 18.03.1 includes this version of containerd.

So we need to test whether this error is fixed in the 18.03.1 version.

Seems like 18.03.1 has fixed the issue for me. I have been using it for a week locally and have not experienced the issue, which was otherwise easily reproducible within a day.

The 18.03.1 version seems to have fixed this issue (or mitigated it, as @cpuguy83 said).

I tested in 4 clusters.

Thank you all for confirming; I'll go ahead and close this issue.

If you still run into this on Docker 18.03.1 or above, please open a new issue with details.
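To check whether a node is already on 18.03.1 or above, version strings like the ones pasted in this thread can be compared numerically. This is only a sketch: the helper names are made up, and it assumes the `YY.MM.patch[-suffix]` form seen above (a missing patch number is treated as 0).

```python
import re


def parse_docker_version(version):
    """Turn "18.03.1-ce" into (18, 3, 1); a missing patch counts as 0."""
    match = re.match(r"^v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def has_mitigations(version, minimum=(18, 3, 1)):
    """Return True if version is at least the given minimum release."""
    return parse_docker_version(version) >= minimum


if __name__ == "__main__":
    for v in ["17.12.0-ce", "18.03.0-ce", "18.03.1-ce"]:
        print(v, has_mitigations(v))
```

The version string itself can be taken from the `Version:` line of `docker version`.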

@mavogel I had the same problem with freezing docker containers. What solved it for me was moving logging from /dev/stderr to an internal file inside the docker container; after that the problem was gone. There is probably some disk issue when a container logs to /dev/stderr, and that is probably the cause of most of these problems.

My (temporary) solution in both versions 18.06.1-ce and 18.09 was similar to @casperWWW's. In my case I lowered the log level of the applications running inside the containers, and they stopped hanging.

So it seems that the container cannot release its allocated I/O resources.

I get the same issue, though without using docker-compose. I'm using docker swarm. Same thing though, I occasionally get containers that neither docker swarm nor I with the docker CLI can stop. This causes docker swarm to end up collecting more replicas than desired that it can't scale down. Sometimes these replicas can still service requests and receive traffic. The only way to remove the containers is to restart docker on the affected node.

I have the same problem. Other than restarting docker on the affected node, is there any other way to solve it?

I get the same issue, though without using docker-compose. I'm using docker swarm. Same thing though, I occasionally get containers that neither docker swarm nor I with the docker CLI can stop. This causes docker swarm to end up collecting more replicas than desired that it can't scale down. Sometimes these replicas can still service requests and receive traffic. The only way to remove the containers is to restart docker on the affected node.

I have the same problem. Other than restarting docker on the affected node, is there any other way to solve it?

See the comment I posted earlier here -
Hopefully that will help you as well.

I get the same issue, though without using docker-compose. I'm using docker swarm. Same thing though, I occasionally get containers that neither docker swarm nor I with the docker CLI can stop. This causes docker swarm to end up collecting more replicas than desired that it can't scale down. Sometimes these replicas can still service requests and receive traffic. The only way to remove the containers is to restart docker on the affected node.

My docker version is v17.12.1.
I get the same issue. It causes my service to load balance across different image versions, and the container count exceeds the configured replicas. I think this is a big bug in docker. It seriously affects my service in production. Please help resolve it. @thaJeztah

Docker 17.12 reached EOL over a year ago; are you able to reproduce on a current version?
