kubeadm version: 1.10.2
Environment:
A couple of months ago I created a Kubernetes 1.9.3 HA cluster using kubeadm 1.9.3, following the 'official' documentation (https://kubernetes.io/docs/setup/independent/high-availability/), with the etcd HA cluster hosted on the master nodes as static pods.
I wanted to upgrade my cluster to Kubernetes 1.10.2 using the latest kubeadm; after updating kubeadm, running kubeadm upgrade plan gave the following error:
[root@shared-cob-01 tmp]# kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade/plan] computing upgrade possibilities
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.9.3
[upgrade/versions] kubeadm version: v1.10.2
[upgrade/versions] Latest stable version: v1.10.2
[upgrade/versions] FATAL: context deadline exceeded
I investigated the issue and found two root causes:
1. kubeadm doesn't identify the etcd cluster as TLS enabled
The guide instructs you to use the following command in the etcd static pod:
- etcd --name <name> \
- --data-dir /var/lib/etcd \
- --listen-client-urls http://localhost:2379 \
- --advertise-client-urls http://localhost:2379 \
- --listen-peer-urls http://localhost:2380 \
- --initial-advertise-peer-urls http://localhost:2380 \
- --cert-file=/certs/server.pem \
- --key-file=/certs/server-key.pem \
- --client-cert-auth \
- --trusted-ca-file=/certs/ca.pem \
- --peer-cert-file=/certs/peer.pem \
- --peer-key-file=/certs/peer-key.pem \
- --peer-client-cert-auth \
- --peer-trusted-ca-file=/certs/ca.pem \
- --initial-cluster etcd0=https://<etcd0-ip-address>:2380,etcd1=https://<etcd1-ip-address>:2380,etcd2=https://<etcd2-ip-address>:2380 \
- --initial-cluster-token my-etcd-token \
- --initial-cluster-state new
kubeadm >= 1.10 checks (here: https://github.com/kubernetes/kubernetes/blob/release-1.10/cmd/kubeadm/app/util/etcd/etcd.go#L56) whether etcd has TLS enabled by checking for the presence of the following flags in the static pod command:
"--cert-file=",
"--key-file=",
"--trusted-ca-file=",
"--client-cert-auth=",
"--peer-cert-file=",
"--peer-key-file=",
"--peer-trusted-ca-file=",
"--peer-client-cert-auth=",
However, because the flags --client-cert-auth and --peer-client-cert-auth are used in the instructions without any value (they are booleans), kubeadm didn't recognise the etcd cluster as having TLS enabled.
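To illustrate why the check fails, here is a minimal Python sketch of a prefix-based check like the one described above. It is not kubeadm's actual Go code: the helper name looks_tls_enabled and the exact matching logic are assumptions for illustration, built only from the flag list and the etcd command quoted in this report.

```python
# Flag prefixes quoted in this issue. kubeadm's real check is written in Go
# and may differ in detail; this only mirrors the behaviour described above.
REQUIRED_PREFIXES = [
    "--cert-file=",
    "--key-file=",
    "--trusted-ca-file=",
    "--client-cert-auth=",
    "--peer-cert-file=",
    "--peer-key-file=",
    "--peer-trusted-ca-file=",
    "--peer-client-cert-auth=",
]


def looks_tls_enabled(command):
    """Hypothetical helper: True only if every required prefix starts some argument."""
    return all(
        any(arg.startswith(prefix) for arg in command)
        for prefix in REQUIRED_PREFIXES
    )


# TLS-related arguments as written in the HA guide: the two boolean flags
# carry no "=value" part.
guide_command = [
    "etcd",
    "--cert-file=/certs/server.pem",
    "--key-file=/certs/server-key.pem",
    "--client-cert-auth",
    "--trusted-ca-file=/certs/ca.pem",
    "--peer-cert-file=/certs/peer.pem",
    "--peer-key-file=/certs/peer-key.pem",
    "--peer-client-cert-auth",
    "--peer-trusted-ca-file=/certs/ca.pem",
]

# The same arguments with an explicit "=true" on the two boolean flags.
explicit_command = [
    arg + "=true" if arg in ("--client-cert-auth", "--peer-client-cert-auth") else arg
    for arg in guide_command
]

print(looks_tls_enabled(guide_command))     # False
print(looks_tls_enabled(explicit_command))  # True
```

The bare boolean flags never match a prefix that ends in "=", so the whole check reports that TLS is off even though every certificate flag is present.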
PERSONAL FIX:
I updated my etcd static pod command to use - --client-cert-auth=true and - --peer-client-cert-auth=true.
GENERAL FIX:
Update the instructions to use --client-cert-auth=true and --peer-client-cert-auth=true, and relax the kubeadm checks to use "--peer-cert-file" and "--peer-key-file" (without the equals sign).
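One way a relaxed check could look is sketched below in Python: it compares flag names rather than "name=" prefixes, so a bare boolean flag and its "=true" form both count as present. The helper names flag_name and has_flags are hypothetical, and this is only an illustration of the idea, not what kubeadm does.

```python
def flag_name(arg):
    """Return the flag name without any '=value' suffix."""
    return arg.split("=", 1)[0]


def has_flags(command, required_names):
    """True if every required flag name appears in command, with or without a value."""
    present = {flag_name(arg) for arg in command}
    return all(name in present for name in required_names)


required = ["--client-cert-auth", "--peer-client-cert-auth"]

bare = ["etcd", "--client-cert-auth", "--peer-client-cert-auth"]
explicit = ["etcd", "--client-cert-auth=true", "--peer-client-cert-auth=true"]
missing = ["etcd", "--client-cert-auth"]

print(has_flags(bare, required))      # True
print(has_flags(explicit, required))  # True
print(has_flags(missing, required))   # False
```

Note that a name-only match would also accept --client-cert-auth=false, so a real check would still need to look at the value of boolean flags when one is given.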
2. kubeadm didn't use the correct certificates
After fixing point 1, the problem still persisted because kubeadm was not using the right certificates.
By following the kubeadm HA guide, in fact, the created certificates are ca.pem, ca-key.pem, peer.pem, peer-key.pem, client.pem and client-key.pem, but the latest kubeadm expects ca.crt, ca.key, peer.crt, peer.key, healthcheck-client.crt and healthcheck-client.key instead.
The kubeadm-config MasterConfiguration keys etcd.caFile, etcd.certFile and etcd.keyFile are ignored.
PERSONAL FIX:
I renamed the .pem certificates to their .crt and .key equivalents and updated the etcd static pod configuration to use them.
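The renaming step can be sketched in Python as follows. The old-to-new mapping is an assumption inferred from the order of the two file lists above (in particular, that client.pem corresponds to healthcheck-client.crt), and the helper name copy_with_kubeadm_names is hypothetical. The sketch copies rather than renames, so the original .pem files stay in place until the static pod configuration has been updated.

```python
import shutil
from pathlib import Path

# Assumed mapping, inferred from the order of the two file lists in this issue.
PEM_TO_KUBEADM = {
    "ca.pem": "ca.crt",
    "ca-key.pem": "ca.key",
    "peer.pem": "peer.crt",
    "peer-key.pem": "peer.key",
    "client.pem": "healthcheck-client.crt",
    "client-key.pem": "healthcheck-client.key",
}


def copy_with_kubeadm_names(cert_dir):
    """Copy each known .pem file in cert_dir to its assumed kubeadm name.

    Returns the list of file names that were created. Files missing from
    cert_dir are skipped silently.
    """
    cert_dir = Path(cert_dir)
    created = []
    for old_name, new_name in PEM_TO_KUBEADM.items():
        source = cert_dir / old_name
        if source.is_file():
            # copy2 also copies permission bits, which matters for key files.
            shutil.copy2(source, cert_dir / new_name)
            created.append(new_name)
    return created
```

It would be called with the directory that holds the etcd certificates on each master, for example copy_with_kubeadm_names("/etc/kubernetes/pki/etcd") if that is where they live on your nodes.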
GENERAL FIX:
Use the kubeadm-config data.caFile, data.certFile and data.keyFile values, infer the right certificates from the etcd static pod definition (pod path + volumes hostPath), and/or create a new temporary client certificate to use during the upgrade.
The upgrade plan should have been executed correctly.
Create a Kubernetes HA cluster using kubeadm 1.9.3, following https://kubernetes.io/docs/setup/independent/high-availability/, and try to upgrade it to Kubernetes >= 1.10 using the latest kubeadm.
This issue seems to be fixed in kubeadm 1.10.3, even though it will not automatically update the static etcd pod, as it recognises it as 'external'.
I am using kubeadm 1.10.3 and have the same issues. My cluster is 1.10.2 with an external secure etcd.
@brokenmass Do the values for your personal fix to the second cause look like this:
caFile: /etc/kubernetes/pki/etcd/ca.crt
certFile: /etc/kubernetes/pki/etcd/healthcheck-client.crt
keyFile: /etc/kubernetes/pki/etcd/healthcheck-client.key
@detiber can you help, please?
@FloMedja in my case the values look like this:
caFile: /etc/kubernetes/pki/etcd/ca.pem
certFile: /etc/kubernetes/pki/etcd/client.pem
keyFile: /etc/kubernetes/pki/etcd/client-key.pem
and 1.10.3 is working correctly.
@brokenmass So with kubeadm 1.10.3 everything works without any need for your personal fixes. In that case I am a little confused: I have kubeadm 1.10.3 but get the same error message that you mention in this bug report. I will double-check my config; maybe I made a mistake elsewhere.
Add here (or join the Kubernetes Slack and send me a direct message) your kubeadm-config, your etcd static pod YAML and the full output of kubeadm upgrade plan.
My apologies, I'm just now seeing this. @chuckha did the original work for the static-pod HA etcd docs, I'll work with him over the next couple of days to see if we can help straighten out the HA upgrades.
@detiber thank you. The upgrade plan finally works, but I face some race condition issues when I try to upgrade the cluster. Sometimes it works, sometimes I get the same error as kubernetes/kubeadm/issues/850: kubeadm runs into a race condition when it tries to restart a pod on one node.
I ran into some snags getting a test env setup for this today and I'm running out of time before my weekend starts. I'll pick back up on this early next week.
/assign @chuckha @detiber
@chuckha @detiber @stealthybox any update on this?
So the 1.9->1.10 HA upgrade was not a supported or vetted path.
We are currently updating our docs for 1.11->1.12, which we do plan to maintain going forward.