Kubeadm: "kubeadm reset" hangs forever intermittently

I see that kubeadm 1.7 sometimes hangs forever at:

[reset] Removing kubernetes-managed containers

I can reproduce this issue 4 out of 5 times, and I have to press CTRL+C to exit.

$ sudo kubeadm reset
sudo: unable to resolve host vhosakot-aci-1-w9c2681796d
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
^C  <--- Pressed CTRL+C to exit

Restarting Docker with "sudo systemctl restart docker.service" resolves this issue, after which "sudo kubeadm reset" completes without problems.
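The workaround above can be scripted so that a hung reset is retried automatically. This is a minimal sketch, not part of kubeadm: it assumes GNU coreutils "timeout" (which exits with status 124 when the limit is hit), and "run_or_recover" is a hypothetical helper name. It runs a command under a time limit and, only if that attempt times out, runs a recovery command once and retries.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubeadm): run a command under a time limit;
# if it times out, run a recovery command once and retry, also time-limited.
# Assumes GNU coreutils `timeout`, which exits with status 124 on timeout.
run_or_recover() {
  limit=$1      # seconds allowed per attempt
  recover=$2    # shell command run once after a timeout
  shift 2       # remaining arguments: the command to run
  timeout "$limit" "$@"
  status=$?
  if [ "$status" -ne 124 ]; then
    return "$status"   # finished (successfully or not) within the limit
  fi
  echo "timed out after ${limit}s; running recovery: $recover" >&2
  sh -c "$recover" || return "$?"
  timeout "$limit" "$@"   # single retry; its status is returned
}
```

As root, after sourcing the function, "run_or_recover 120 'systemctl restart docker.service' kubeadm reset" would mirror the manual steps: timing out stops the stuck reset much as CTRL+C does, Docker is restarted, and the reset is run again. The 120-second limit is an arbitrary choice.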

It would be nice if kubeadm reset checked the health of Docker when removing Kubernetes-managed containers and timed out if Docker is unhealthy, instead of hanging forever at:

[reset] Removing kubernetes-managed containers
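The proposed check can be approximated from the outside today. This sketch is only an illustration of the idea, not how kubeadm implements or would implement it: it assumes GNU coreutils "timeout" and treats Docker as unhealthy when "docker info" does not succeed within a time limit. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the proposed health check.
# Assumes GNU coreutils `timeout`.

# Succeeds only if the command exits 0 within the given number of seconds.
responds_within() {
  limit=$1
  shift
  timeout "$limit" "$@" >/dev/null 2>&1
}

# Treat Docker as unhealthy if `docker info` does not succeed in time
# (default limit: 10 seconds, an arbitrary choice).
docker_healthy() {
  responds_within "${1:-10}" docker info
}
```

Before running "sudo kubeadm reset", a caller could run "docker_healthy 10 || echo 'docker is not responding' >&2" and restart the daemon instead of letting the reset block.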

kubeadm version:

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.11", GitCommit:"b13f2fd682d56eab7a6a2b5a1cab1a3d2c8bdd55", GitTreeState:"clean", BuildDate:"2017-11-25T17:51:39Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}


  • Kubernetes version:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.11", GitCommit:"b13f2fd682d56eab7a6a2b5a1cab1a3d2c8bdd55", GitTreeState:"clean", BuildDate:"2017-11-25T18:34:52Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.12", GitCommit:"3bda299a6414b4866f179921610d6738206a18fe", GitTreeState:"clean", BuildDate:"2017-12-29T08:39:49Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • OS:
    Ubuntu xenial VM:
$ sudo cat /etc/os-release
sudo: unable to resolve host vhosakot-aci-1-m3710102e28
VERSION="16.04.3 LTS (Xenial Xerus)"
PRETTY_NAME="Ubuntu 16.04.3 LTS"
  • Kernel:
$ uname -a
Linux vhosakot-aci-1-m3710102e28 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Others:
$ sudo docker info
sudo: unable to resolve host vhosakot-aci-1-w9c2681796d
Containers: 13
 Running: 9
 Paused: 0
 Stopped: 4
Images: 11
Server Version: 1.13.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 67
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version:  (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: N/A (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
  Profile: default
Kernel Version: 4.4.0-104-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 3
Total Memory: 15.67 GiB
Name: vhosakot-aci-1-w9c2681796d
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
Live Restore Enabled: false

$ sudo docker version
sudo: unable to resolve host vhosakot-aci-1-w9c2681796d
Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.6.2
 Git commit:   092cba3
 Built:        Thu Nov  2 20:40:23 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.6.2
 Git commit:   092cba3
 Built:        Thu Nov  2 20:40:23 2017
 OS/Arch:      linux/amd64
 Experimental: false

What happened?

kubeadm reset hangs forever.

What you expected to happen?

kubeadm reset times out if Docker is unhealthy.

How to reproduce it (as minimally and precisely as possible)?

See steps above.

I see this too:

sudo kubeadm reset
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers

What seems to be happening is that docker kill/rm gets stuck because of a wedged NFS mount.
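One rough way to look for a wedged mount is to ask each NFS filesystem for its status under a time limit. This is a sketch under assumptions, not a definitive diagnostic: it assumes Linux "/proc/mounts", GNU coreutils "timeout" and "stat", that only the "nfs" and "nfs4" filesystem types matter, and that a hard-mounted NFS export whose server stops responding typically makes "stat -f" block. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print NFS mount points whose filesystem does not
# answer `stat -f` within the time limit (default 5 seconds).
# Assumes Linux /proc/mounts and GNU coreutils `timeout`/`stat`.
# Mount points with spaces are octal-escaped (e.g. \040) in /proc/mounts
# and are not decoded here.
find_unresponsive_nfs_mounts() {
  limit=${1:-5}
  while read -r _dev mp fstype _rest; do
    case $fstype in
      nfs|nfs4)
        # -k 2: send SIGKILL if `stat` is still running 2s after SIGTERM.
        if ! timeout -k 2 "$limit" stat -f -- "$mp" >/dev/null 2>&1 </dev/null; then
          echo "$mp"
        fi
        ;;
    esac
  done < /proc/mounts
}
```

Empty output suggests no NFS mount is stuck; any printed path is a candidate for the kind of wedged mount described above.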

@vhosakot could it be something similar for you, i.e. a stuck mount point?

@cwedgwood I don't have any mount point and still see this issue.

Restarting Docker with "sudo systemctl restart docker.service" resolves this issue, after which "sudo kubeadm reset" completes without problems.

/assign @detiber - this is a docs update.

/assign @chuckha

I can take this.

I'm adding the info to https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/
unless there is a better location.

^ PR sent to /website. We don't have enough debug information about this, but I've added a note to the docs with the fix that @vhosakot provided.

