Kubeadm: Running pre-flight checks hang

Created on 1 Apr 2019  ·  22 Comments  ·  Source: kubernetes/kubeadm

What keywords did you search in kubeadm issues before filing this one?

preflight
hang
kubeadm join

BUG REPORT

Versions

kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:51:21Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"

    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"

  • Kernel (e.g. uname -a):
    Linux vm02.andrefagundes.org 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  • Others:

What happened?

There is a problem when joining a control-plane node. The process hangs at the message "Running pre-flight checks". See below:

[root@vm02 ~]# kubeadm join vm10.andrefagundes.org:6443 --token 07nh7g.v8p5fcs61fn3o2h4 --discovery-token-ca-cert-hash sha256:039a5f9229dafe39d4a51af6899c20adff1de5dda23f780ac9b896e95f95623a --experimental-control-plane --certificate-key 8afd066a7b8baa2abf86ba1b2d5e7f29625875d8f78a3e136f7fd35605b4775
[preflight] Running pre-flight checks

What you expected to happen?

I was expecting the node to be joined or a message indicating an error.

How to reproduce it (as minimally and precisely as possible)?

I am following the official documentation below.

https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd-nodes

Anything else we need to know?

No.

kind/support priority/awaiting-more-evidence sig/network

All 22 comments

With the -v10 parameter:

[root@vm03 etcd]# kubeadm join vm10.andrefagundes.org:6443 --token 07nh7g.v8p5fcs61fn3o2h4 --discovery-token-ca-cert-hash sha256:039a5f9229dafe39d4a51af6899c20adff1de5dda23f780ac9b896e95f95623a --experimental-control-plane --certificate-key cf3c8ca4f74751bfe7fc9d3e00e03a37619d36a6d6fb79fb5ba3645d74dd7bf4 -v10
I0401 00:34:08.531961 16893 join.go:367] [preflight] found NodeName empty; using OS hostname as NodeName
I0401 00:34:08.532014 16893 join.go:371] [preflight] found advertiseAddress empty; using default interface's IP address as advertiseAddress
I0401 00:34:08.532048 16893 initconfiguration.go:105] detected and using CRI socket: /var/run/dockershim.sock
I0401 00:34:08.532179 16893 interface.go:384] Looking for default routes with IPv4 addresses
I0401 00:34:08.532187 16893 interface.go:389] Default route transits interface "eth0"
I0401 00:34:08.532324 16893 interface.go:196] Interface eth0 is up
I0401 00:34:08.532380 16893 interface.go:244] Interface "eth0" has 4 addresses :[192.168.122.103/24 fe80::a3c0:2a34:91f2:e0eb/64 fe80::8439:c3eb:5848:c1f2/64 fe80::4381:b4a5:5836:a0e1/64].
I0401 00:34:08.532399 16893 interface.go:211] Checking addr 192.168.122.103/24.
I0401 00:34:08.532407 16893 interface.go:218] IP found 192.168.122.103
I0401 00:34:08.532415 16893 interface.go:250] Found valid IPv4 address 192.168.122.103 for interface "eth0".
I0401 00:34:08.532421 16893 interface.go:395] Found active IP 192.168.122.103
[preflight] Running pre-flight checks
I0401 00:34:08.532495 16893 preflight.go:90] [preflight] Running general checks
I0401 00:34:08.532539 16893 checks.go:254] validating the existence and emptiness of directory /etc/kubernetes/manifests
I0401 00:34:08.532570 16893 checks.go:292] validating the existence of file /etc/kubernetes/kubelet.conf
I0401 00:34:08.532579 16893 checks.go:292] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0401 00:34:08.532586 16893 checks.go:105] validating the container runtime
I0401 00:34:08.580885 16893 checks.go:131] validating if the service is enabled and active
I0401 00:34:08.638659 16893 checks.go:341] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0401 00:34:08.638724 16893 checks.go:341] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0401 00:34:08.638755 16893 checks.go:653] validating whether swap is enabled or not
I0401 00:34:08.638788 16893 checks.go:382] validating the presence of executable ip
I0401 00:34:08.638809 16893 checks.go:382] validating the presence of executable iptables
I0401 00:34:08.638824 16893 checks.go:382] validating the presence of executable mount
I0401 00:34:08.638837 16893 checks.go:382] validating the presence of executable nsenter
I0401 00:34:08.638849 16893 checks.go:382] validating the presence of executable ebtables
I0401 00:34:08.638860 16893 checks.go:382] validating the presence of executable ethtool
I0401 00:34:08.638871 16893 checks.go:382] validating the presence of executable socat
I0401 00:34:08.638883 16893 checks.go:382] validating the presence of executable tc
I0401 00:34:08.638894 16893 checks.go:382] validating the presence of executable touch
I0401 00:34:08.638914 16893 checks.go:524] running all checks
I0401 00:34:08.664826 16893 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I0401 00:34:08.665583 16893 checks.go:622] validating kubelet version
I0401 00:34:08.709573 16893 checks.go:131] validating if the service is enabled and active
I0401 00:34:08.716270 16893 checks.go:209] validating availability of port 10250
I0401 00:34:08.716418 16893 checks.go:439] validating if the connectivity type is via proxy or direct
I0401 00:34:08.716444 16893 join.go:427] [preflight] Discovering cluster-info
I0401 00:34:08.716498 16893 token.go:200] [discovery] Trying to connect to API Server "vm10.andrefagundes.org:6443"
I0401 00:34:08.716961 16893 token.go:75] [discovery] Created cluster-info discovery client, requesting info from "https://vm10.andrefagundes.org:6443"
I0401 00:34:08.717031 16893 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.14.0 (linux/amd64) kubernetes/641856d" 'https://vm10.andrefagundes.org:6443/api/v1/namespaces/kube-public/configmaps/cluster-info'
I0401 00:34:08.722405 16893 round_trippers.go:438] GET https://vm10.andrefagundes.org:6443/api/v1/namespaces/kube-public/configmaps/cluster-info 403 Forbidden in 5 milliseconds
I0401 00:34:08.722423 16893 round_trippers.go:444] Response Headers:
I0401 00:34:08.722432 16893 round_trippers.go:447] Content-Type: application/json
I0401 00:34:08.722441 16893 round_trippers.go:447] X-Content-Type-Options: nosniff
I0401 00:34:08.722450 16893 round_trippers.go:447] Content-Length: 321
I0401 00:34:08.722458 16893 round_trippers.go:447] Date: Mon, 01 Apr 2019 03:34:08 GMT
I0401 00:34:08.722497 16893 request.go:942] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"configmaps \"cluster-info\" is forbidden: User \"system:anonymous\" cannot get resource \"configmaps\" in API group \"\" in the namespace \"kube-public\"","reason":"Forbidden","details":{"name":"cluster-info","kind":"configmaps"},"code":403}
I0401 00:34:08.722937 16893 token.go:83] [discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"]

One more detail: vm10.andrefagundes.org is an HAProxy instance in front of my control plane.

seems like a networking issue to me.
are you sure this joining node has connectivity to port 6443 on the LB and can resolve vm10.andrefagundes.org?
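The two questions above (name resolution and reachability of port 6443) can be checked from the joining node before retrying the join. The following is only a sketch: check_tcp and check_endpoint are hypothetical helper names, not kubeadm commands, and the sketch assumes bash, coreutils timeout, and getent are available on the node.

```shell
# Hypothetical helper (not part of kubeadm): succeeds if a TCP connection
# to host:port can be opened within 3 seconds.
check_tcp() {
  local host="$1" port="$2"
  # /dev/tcp is a bash feature, so bash is invoked explicitly.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hypothetical helper: resolve the name first, then probe the port.
# The port defaults to 6443, the API server port used in this thread.
check_endpoint() {
  local host="$1" port="${2:-6443}"
  if ! getent hosts "$host" >/dev/null 2>&1; then
    echo "cannot resolve $host"
    return 1
  fi
  if ! check_tcp "$host" "$port"; then
    echo "cannot connect to $host:$port"
    return 1
  fi
  echo "$host:$port is reachable"
}
```

For example, check_endpoint vm10.andrefagundes.org 6443 run on the joining node would report which of the two steps fails. A successful TCP connection only shows that something is listening behind the load balancer; it does not prove the API server is healthy.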

Yes. I also changed vm10 to point directly to the control plane, and I saw the incoming traffic on the control plane while monitoring with tcpdump.

are you seeing any outstanding errors in the kubelet logs?

There are several errors in the logs. I also tried to reinstall the cluster a few times, and each time I got different errors. I am giving up. We can close the case. Thanks!!

does creating a single control plane node + some worker nodes work for you or does the problem only happen when joining additional control plane nodes?

User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"","reason":"Forbidden","details":{"name":"cluster-info","kind":"configmaps"},"code":403

Seems like kubeadm init didn't create/configure cluster-info properly.
Could you share the kubeadm init logs?
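The 403 in the log can be reproduced without kubeadm by sending the same anonymous request with curl. This is a sketch under assumptions: cluster_info_url and cluster_info_status are hypothetical helper names, and the endpoint argument is whatever host:port was passed to kubeadm join.

```shell
# Hypothetical helper: build the URL that kubeadm requests during
# token-based discovery (taken from the curl line in the log above).
cluster_info_url() {
  local endpoint="$1"   # host:port of the API server or load balancer
  echo "https://${endpoint}/api/v1/namespaces/kube-public/configmaps/cluster-info"
}

# Hypothetical helper: print only the HTTP status code of an anonymous
# request. -k skips TLS verification, as in the logged curl command.
cluster_info_status() {
  curl -k -s -o /dev/null -w '%{http_code}' "$(cluster_info_url "$1")"
}
```

Running cluster_info_status vm10.andrefagundes.org:6443 should print 200 when anonymous access to cluster-info is set up and 403 in the situation shown in the log. On a control-plane node, kubectl -n kube-public get configmap cluster-info -o yaml shows whether the ConfigMap exists at all. If it is missing, re-running kubeadm init phase bootstrap-token may recreate it together with its RBAC rules, but that is an assumption about this particular cluster, not something confirmed in this thread.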

I have the same error after I executed the command 'kubeadm join ...': it gets stuck at "Running pre-flight checks". I have no idea how to handle it.

I had the same issue. I needed to reboot the master and after that executing the 'kubeadm join ...' command again on the nodes worked for me.

I had the same issue with kubeadm v1.15; rebooting the master doesn't work for me.

Falling back to kubelet & kubeadm v1.13.1 fixed this issue.

make sure you call kubeadm init/join with e.g. --v=2 to have more details on what's going on.

Bumped into the same issue, but the problem was traced down to network connectivity on my side: my keepalived and haproxy daemons were configured wrongly, which prevented the hanging master node from joining the cluster via the API service VIP.

Worth pointing out that running the kubeadm init/join with --v=2 was how I got to resolve it

kubeadm v1.15

kubeadm join .. --v=2

I0802 11:47:31.027812 359 token.go:202] [discovery] Failed to connect to API Server "": token id "r5uyqk" is invalid for this cluster or it has expired. Use "kubeadm token create" on the control-plane node to create a new valid token

kubeadm init phase upload-certs --upload-certs
kubeadm token create

Then kubeadm join succeeded.
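Before regenerating anything, it can help to rule out a mistyped token. The sketch below only checks the shape of a bootstrap token (six characters, a dot, sixteen characters, all lowercase letters or digits); is_token_format is a hypothetical helper name, and the check says nothing about whether the token exists in the cluster or has expired.

```shell
# Hypothetical helper: succeeds if the argument has the shape of a kubeadm
# bootstrap token, [a-z0-9]{6}.[a-z0-9]{16}. It does not check expiry.
is_token_format() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}
```

Expiry has to be checked on a control-plane node: kubeadm token list shows the remaining TTL of each token, and kubeadm token create --print-join-command creates a new token and prints a matching join command for worker nodes.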

In my case, I was able to successfully join the node by stopping the firewall on the Master node.

systemctl stop firewall

This one worked like a charm.
[root@localhost ~]# kubeadm join 192.168.8.128:6443 --token 38lhr8.kxi5uy8aoy71dj17 --discovery-token-ca-cert-hash sha256:a12c805b8d98f42a256486d27e87463e22aaba190ab8f5bdce89bbb843fca983
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.1. Latest validated version: 18.09
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:

  • Certificate signing request was sent to apiserver and a response was received.
  • The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

looking at the log in the OP again, this is not a "hang" in the preflight; rather, the cluster-info config map cannot be accessed. the only way this could happen is if the "bootstrap-token" phase of "init" is skipped.

looking at later reports, i see networking and expired token problems which fall under "support" items and not bugs.

/triage support
for questions, try stackoverflow, reddit or #kubeadm on k8s slack.

if you find a real bug, please open a new issue.

In my case, I was able to successfully join the node by stopping the firewall on the Master node.

systemctl stop firewall

systemctl stop firewalld
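Stopping firewalld entirely is a quick test, but a narrower option is to open only the ports the join needs. The sketch below is a dry run under assumptions: print_open_port_cmds is a hypothetical helper name, it only prints firewall-cmd invocations instead of running them, and the ports to pass depend on the cluster (6443 is the API server port used throughout this thread).

```shell
# Hypothetical helper: print the firewall-cmd invocations that would open
# the given TCP ports permanently. It only prints; nothing is executed.
print_open_port_cmds() {
  local port
  for port in "$@"; do
    echo "firewall-cmd --permanent --add-port=${port}/tcp"
  done
  # Permanent rules take effect only after a reload.
  echo "firewall-cmd --reload"
}
```

Calling print_open_port_cmds 6443 on the master prints the commands to review; they can then be run as root once the port list looks right.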

I found that traffic was not allowed to reach the master node.

adding rules in sg solved my problem

I have the same error after I executed the command 'kubeadm join ...': it gets stuck at "Running pre-flight checks". I have no idea how to handle it.

Did you find any solution?

I found that traffic was not allowed to reach the master node.

adding rules in sg solved my problem

Which inbound port did you allow?
