kubeadm version: 1.10.2
A couple of months ago I created a Kubernetes 1.9.3 HA cluster using kubeadm 1.9.3, following the 'official' documentation at https://kubernetes.io/docs/setup/independent/high-availability/ and hosting the HA etcd cluster on the master nodes as static pods.

I wanted to upgrade my cluster to k8s 1.10.2 using the latest kubeadm. After updating kubeadm, running `kubeadm upgrade plan` produced the following error:
```
[[email protected] tmp]# kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade/plan] computing upgrade possibilities
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.9.3
[upgrade/versions] kubeadm version: v1.10.2
[upgrade/versions] Latest stable version: v1.10.2
[upgrade/versions] FATAL: context deadline exceeded
```
I investigated the issue and found two root causes:

**1. etcd cluster has TLS enabled**
The guide instructs you to use the following command in the etcd static pod:

```
- etcd --name <name> \
- --data-dir /var/lib/etcd \
- --listen-client-urls http://localhost:2379 \
- --advertise-client-urls http://localhost:2379 \
- --listen-peer-urls http://localhost:2380 \
- --initial-advertise-peer-urls http://localhost:2380 \
- --cert-file=/certs/server.pem \
- --key-file=/certs/server-key.pem \
- --client-cert-auth \
- --trusted-ca-file=/certs/ca.pem \
- --peer-cert-file=/certs/peer.pem \
- --peer-key-file=/certs/peer-key.pem \
- --peer-client-cert-auth \
- --peer-trusted-ca-file=/certs/ca.pem \
- --initial-cluster etcd0=https://<etcd0-ip-address>:2380,etcd1=https://<etcd1-ip-address>:2380,etcd2=https://<etcd2-ip-address>:2380 \
- --initial-cluster-token my-etcd-token \
- --initial-cluster-state new
```
kubeadm >= 1.10 checks (here: https://github.com/kubernetes/kubernetes/blob/release-1.10/cmd/kubeadm/app/util/etcd/etcd.go#L56) whether etcd has TLS enabled by looking for the following flags in the static pod command:

```
"--cert-file=", "--key-file=", "--trusted-ca-file=", "--client-cert-auth=",
"--peer-cert-file=", "--peer-key-file=", "--peer-trusted-ca-file=", "--peer-client-cert-auth=",
```
But since the boolean flags `--client-cert-auth` and `--peer-client-cert-auth` are used in the instructions without any parameter, kubeadm did not recognise the etcd cluster as having TLS enabled.
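To illustrate why the detection fails, here is a minimal shell sketch (not kubeadm's actual code, which is Go) of a prefix check over a command line that, like the guide's, writes the boolean flags without `=`:

```shell
# Sketch of the kubeadm-style check: each expected flag is searched for
# *including* the trailing "=". A boolean flag written bare, as in the
# guide, therefore never matches.
cmd='--cert-file=/certs/server.pem --client-cert-auth --peer-client-cert-auth'

for flag in '--cert-file=' '--client-cert-auth=' '--peer-client-cert-auth='; do
  case " $cmd" in
    *" $flag"*) echo "$flag found" ;;
    *)          echo "$flag MISSING" ;;
  esac
done
# Prints:
#   --cert-file= found
#   --client-cert-auth= MISSING
#   --peer-client-cert-auth= MISSING
```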
My fix: I updated my etcd static pod command to use `--client-cert-auth=true` and `--peer-client-cert-auth=true`.

Proposed solution: update the instructions to use `--client-cert-auth=true` and `--peer-client-cert-auth=true`, and/or relax the kubeadm checks to match flags without the equals sign (e.g. `"--peer-key-file"`).
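For reference, a sketch of the corrected fragment of the static pod command, only the two boolean flags change and everything else stays as in the guide:

```
- --client-cert-auth=true \
- --peer-client-cert-auth=true \
```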
**2. kubeadm didn't use the correct certificates**
After fixing point 1 the problem still persisted, as kubeadm was not using the right certificates.
Following the kubeadm HA guide, in fact, the certificates are created as .pem files (e.g. client-key.pem), but the latest kubeadm ignores the kubeadm-config MasterConfiguration keys such as etcd.keyFile.
My fix: I copied the .pem certificates to their .key equivalents and updated the etcd static pod configuration to use them.
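A sketch of that workaround, demonstrated on dummy files in a temporary directory since the real certificate paths vary per cluster (all filenames here are assumptions, not taken from the guide):

```shell
# Illustrative only: create .crt/.key copies of the guide's .pem certificates
# so that tooling expecting the other extension can find them.
PKI_DIR=$(mktemp -d)

# Stand-ins for the certificates the HA guide generates.
for f in ca.pem client.pem client-key.pem; do
  printf 'dummy cert material\n' > "$PKI_DIR/$f"
done

cp "$PKI_DIR/ca.pem"         "$PKI_DIR/ca.crt"
cp "$PKI_DIR/client.pem"     "$PKI_DIR/client.crt"
cp "$PKI_DIR/client-key.pem" "$PKI_DIR/client.key"
```

In a real cluster the etcd static pod manifest (and, once supported, the kubeadm-config values) would then be pointed at the new file names.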
Proposed solution: use the kubeadm-config data.keyFile values, infer the right certificates from the etcd static pod definition (pod path + volumes hostPath), and/or create a new temporary client certificate to use during the upgrade.
The upgrade plan should have executed correctly.
Create a k8s HA cluster using kubeadm 1.9.3 following https://kubernetes.io/docs/setup/independent/high-availability/ and try to upgrade it to k8s >= 1.10 using the latest kubeadm.
Note: this issue seems to be fixed in kubeadm 1.10.3, even though it will not automatically upgrade the static etcd pods, as it recognises them as 'external'.
I am using kubeadm 1.10.3 and have the same issues. My cluster is 1.10.2 with an external secure etcd.
@brokenmass Do the values for your personal fixes to the second cause look like this:

```
caFile: /etc/kubernetes/pki/etcd/ca.crt
certFile: /etc/kubernetes/pki/etcd/healthcheck-client.crt
keyFile: /etc/kubernetes/pki/etcd/healthcheck-client.key
```
@detiber, can you help please?
In my case the values look like:

```
caFile: /etc/kubernetes/pki/etcd/ca.pem
certFile: /etc/kubernetes/pki/etcd/client.pem
keyFile: /etc/kubernetes/pki/etcd/client-key.pem
```

and 1.10.3 is working correctly.
@brokenmass So with kubeadm 1.10.3 everything works with no need for your personal fixes. In that case I am a little confused: I have kubeadm 1.10.3 but get the same error message you mention in this bug report. I will double-check my config; maybe I made some mistakes elsewhere.
Add here (or join the Kubernetes Slack and send me a direct message) your kubeadm-config, your etcd static pod yml, and the full output of `kubeadm upgrade plan`.
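That information can typically be gathered with commands along these lines (the manifest path is the kubeadm default and may differ on your nodes):

```
kubectl -n kube-system get cm kubeadm-config -oyaml   # kubeadm-config
cat /etc/kubernetes/manifests/etcd.yaml               # etcd static pod
kubeadm upgrade plan                                  # full output
```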
My apologies, I'm just now seeing this. @chuckha did the original work for the static-pod HA etcd docs; I'll work with him over the next couple of days to see if we can help straighten out the HA upgrades.
@detiber thank you. The upgrade plan finally works, but I face some race-condition issues when trying to upgrade the cluster: sometimes it works, sometimes I get the same error as kubernetes/kubeadm/issues/850, where kubeadm runs into a race condition when trying to restart a pod on one node.
I ran into some snags getting a test env set up for this today, and I'm running out of time before my weekend starts. I'll pick back up on this early next week.
/assign @chuckha @detiber
@chuckha @detiber @stealthybox any update on this?
So the 1.9 -> 1.10 HA upgrade was not a supported or vetted path. We are currently in the process of updating our docs for 1.11 -> 1.12, which we do plan to maintain going forward.