kubeadm reset succeeds but this node's IP is still in the kubeadm-config ConfigMap

Created on 5 Dec 2018  ·  32 Comments  ·  Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):

[root@k8s-211 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:02:01Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
[root@k8s-211 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T20:56:12Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
Linux k8s-lixin-211 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

What happened?

I used kubeadm reset -f to reset this control-plane node, and the command ran successfully. But when I look at the kubeadm-config ConfigMap, this node's IP is still present in ClusterStatus.

I still have a question: why doesn't kubeadm reset delete this node from the cluster directly, instead of requiring kubectl delete node <node name> to be run manually?

What you expected to happen?

The node's IP should be removed from the kubeadm-config ConfigMap.

How to reproduce it (as minimally and precisely as possible)?

  • kubeadm init --config=kubeadm.yml on the first node.
  • kubeadm join --experimental-control-plane --config=kubeadm.yml on the second node.
  • kubeadm reset -f on the second node.
  • kubectl -n kube-system get cm kubeadm-config -oyaml shows the second node's IP still in ClusterStatus (see the sketch below).
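
For reference, a quick way to dump just the ClusterStatus document from that ConfigMap (a hedged sketch; the jsonpath key matches the ConfigMap shown below):

```shell
# Print only the ClusterStatus document embedded in the kubeadm-config ConfigMap
kubectl -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterStatus}'
```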

Anything else we need to know?


kubeadm-config ConfigMap YAML:

apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta1
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: 192.168.46.117:6443
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
        extraArgs:
          cipher-suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
        serverCertSANs:
        - 192.168.46.117
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.13.0
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.244.0.0/16
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      k8s-211:
        advertiseAddress: 192.168.46.211
        bindPort: 6443
      k8s-212:
        advertiseAddress: 192.168.46.212
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2018-12-04T14:17:38Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "103402"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: 5a9320c1-f7cf-11e8-868d-0050568863b3

help wanted  kind/bug  priority/important-soon

All 32 comments

cc @fabriziopandini

Ideally there would be a way to "refresh" the ClusterStatus. We run clusters with chaos testing, so it is entirely possible for a control-plane node to be terminated without warning and without the opportunity to run kubeadm reset. Ideally there would be a clean way to explicitly update the ClusterStatus to remove control-plane nodes we know are no longer in the cluster. This is something that would be done before running kubeadm join --control-plane ..., or possibly it would be built in?

Some comments here:

The node's IP should be removed from the kubeadm-config ConfigMap.

@pytimer I know that having a node API address left in ClusterStatus is not ideal, but I'm interested in understanding whether this "lack of cleanup" generates problems or not. Have you tried to join the same control plane again? Have you tried to join another control plane? I don't expect problems, but a confirmation on this point would be really appreciated.

I still have a question: why doesn't kubeadm reset delete this node from the cluster directly, instead of requiring kubectl delete node to be run manually?

@luxas might be able to add a little bit of historical context here.
My guess is that nodes didn't have the privilege to delete themselves (but this applies to worker nodes, not to control-plane nodes...)

Ideally there would be a way to "refresh" the ClusterStatus / there would be a clean way to update the ClusterStatus explicitly

@danbeaulieu that's a good point. Having an explicit command for syncing the cluster status, and/or enforcing an automatic sync whenever kubeadm is executed, is a good idea.
However, since kubeadm has no continuously running control loop, I think there will always be the possibility of ClusterStatus being out of sync.
This should not be a problem; more specifically, having a node IP listed for a node that no longer exists (lack of cleanup) should not be a problem.
Conversely, if a node exists and its node IP is missing from ClusterStatus (wrong initialization), this could create problems, e.g. for updates.

Could you kindly report whether the above assumptions are confirmed by your chaos testing? Any feedback will be really appreciated.

@fabriziopandini I tried joining the same control-plane node again, and it failed.

My join steps:

The second control-plane node's IP is 192.168.46.212.

  • remove the 192.168.46.212 etcd member from the etcd cluster.
  • kubectl delete node k8s-212
  • kubeadm reset -f on this control-plane node.
  • run kubeadm join --experimental-control-plane --config kubeadm.yaml -v 5 again.

kubeadm join logs:

...
[etcd] Checking Etcd cluster health
I1207 17:57:18.109993    8541 local.go:66] creating etcd client that connects to etcd pods
I1207 17:57:18.110000    8541 etcd.go:134] checking etcd manifest
I1207 17:57:18.119797    8541 etcd.go:181] etcd endpoints read from pods: https://192.168.46.211:2379,https://192.168.46.212:2379
I1207 17:57:18.131111    8541 etcd.go:221] etcd endpoints read from etcd: https://192.168.46.211:2379
etcd cluster is not healthy: context deadline exceeded

I looked at the kubeadm code, and I think this problem may be caused by 192.168.46.212 being left behind in the kubeadm-config ConfigMap.

When joining a control-plane node, kubeadm reads the API endpoints from the kubeadm-config ConfigMap, and the etcd endpoints are derived from those API endpoints. But the 192.168.46.212 control-plane node has been removed and has not been joined yet, so the etcd cluster health check fails.

When I remove the 192.168.46.212 API endpoint from the kubeadm-config ConfigMap and join this control-plane node again, the join succeeds.
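
For anyone hitting the same state, a minimal sketch of that manual workaround (assuming the stale entry is k8s-212, as in this cluster; adjust the name for yours):

```shell
# Dump the ConfigMap, remove the stale apiEndpoints entry, and re-apply it
kubectl -n kube-system get cm kubeadm-config -o yaml > /tmp/kubeadm-config.yml
# edit /tmp/kubeadm-config.yml: delete the k8s-212 block (advertiseAddress,
# bindPort) under ClusterStatus.apiEndpoints
kubectl -n kube-system apply -f /tmp/kubeadm-config.yml
```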

@pytimer thanks!
This should be investigated. There is already logic that tries to sync the assumed list of etcd endpoints with the real endpoint list from etcd, but something seems to not be working properly.

Yes, this does seem like a bug. We have a 3-node control-plane ASG. If we terminate an instance, a new one is created per the ASG rules. During this time the terminated node is listed as unhealthy in the etcd member list. When the new instance comes up, before running kubeadm join..., it removes the unhealthy member from etcd. By the time we run kubeadm join..., the etcd cluster is healthy with 2 nodes according to etcd. However, kubeadm uses the ClusterStatus as its source of truth, which still has the old instance listed.

The workaround for us, right after doing the etcd membership management, is to update the kubeadm-config ConfigMap with the truth of the cluster and then run kubeadm join....

Ideally kubeadm join... would use etcd as the source of truth and update the kubeadm-config ConfigMap accordingly.
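
A hedged sketch of that pre-join comparison, run from a healthy control-plane node (the 127.0.0.1 endpoint and cert paths assume kubeadm's default stacked-etcd layout):

```shell
# What kubeadm believes (ClusterStatus) ...
kubectl -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterStatus}'
# ... versus what etcd believes (the actual membership)
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  member list
```

Any apiEndpoints entry with no matching etcd member is the stale one to clean up before kubeadm join.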

@fabianofranz I may have found the cause of this problem.

When the etcd endpoints are synced with the real etcd endpoint list, the sync itself succeeds. But the real endpoints are assigned to the Endpoints field of an etcd client variable that is not a pointer, so when other code later uses that client, it still sees the old endpoints rather than the synced ones.

I fixed this problem in my fork; you can check this commit: https://github.com/pytimer/kubernetes/commit/0cdf6cad87a317be5326f868bafe4faecc80f033. I tested the "join the same control-plane node" use case, and the join succeeds.

@pytimer Looks great! Well spotted!
Could you kindly send a PR? IMO this will be eligible for cherry-picking.

@neolit123 @timothysc ^^^

@fabianofranz The first PR was wrong; I forgot to confirm the CLA.

You can check this PR instead: https://github.com/kubernetes/kubernetes/pull/71945. If anything is wrong, I hope you will point it out.

@fabriziopandini I tried joining the same control-plane node again, and it failed. […]

Got the same error in kubeadm version 1.13.2. I tried to remove the node manually and update kubeadm-config, but it doesn't work; the remaining etcd nodes still try to connect to the removed node.

When I remove the 192.168.46.212 API endpoint from the kubeadm-config ConfigMap and join this control-plane node again, the join succeeds.

@pytimer Can you please elaborate on how you manually removed the old api-server?

I am running 1.13.3; removing the old server manually via:

1. kubectl -n kube-system get cm kubeadm-config -o yaml > /tmp/conf.yml
2. manually edit /tmp/conf.yml to remove the old server
3. kubectl -n kube-system apply -f /tmp/conf.yml 

I'm still not able to join the cluster due to the error:

[etcd] Checking etcd cluster health
etcd cluster is not healthy: context deadline exceeded

I then killed the API server pods and the etcd pods (2 of each).

They get recreated, but I still have the same error when trying to join the additional node.

Had the same issue in 1.13.3 (HA cluster setup: 3 master nodes + 3 workers). I successfully replaced the master node only after the following steps:

Delete node from cluster

kubectl delete node master03

Download etcdctl (for example, on master01)

mkdir /opt/tools && cd /opt/tools
wget https://github.com/etcd-io/etcd/releases/download/v3.3.12/etcd-v3.3.12-linux-arm64.tar.gz
tar xfz etcd-v3.3.12-linux-arm64.tar.gz

Remove master node from etcd

cd /opt/tools/etcd-v3.3.12-linux-arm64
./etcdctl --endpoints https://192.168.0.11:2379 --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/server.crt --key-file /etc/kubernetes/pki/etcd/server.key member list
./etcdctl --endpoints https://192.168.0.11:2379 --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/server.crt --key-file /etc/kubernetes/pki/etcd/server.key member remove 28a9dabfcfbca673

Remove from kubeadm-config

kubectl -n kube-system get cm kubeadm-config -o yaml > /tmp/conf.yml
manually edit /tmp/conf.yml to remove the old server
kubectl -n kube-system apply -f /tmp/conf.yml
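
A quick verification after these steps (a hedged addition, reusing the same etcdctl flags as above): the member list should no longer contain master03, and ClusterStatus should no longer list its apiEndpoints entry.

```shell
./etcdctl --endpoints https://192.168.0.11:2379 --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/server.crt --key-file /etc/kubernetes/pki/etcd/server.key member list
kubectl -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterStatus}'
```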

@zhangyelong kubeadm reset currently can't remove the etcd member, which is why the remaining etcd nodes still try to connect to the removed one. For now you have to remove the etcd member manually using etcdctl.

I sent a PR that implements removing the etcd member during reset; see https://github.com/kubernetes/kubernetes/pull/74112.

@lvangool You can follow @Halytskyi's steps. The PR https://github.com/kubernetes/kubernetes/pull/71945 fixes syncing the etcd endpoints when joining a control-plane node; it does not remove the etcd member.

For removing the etcd member from the etcd cluster during reset, see kubernetes/kubernetes#74112.

This seems to still be a bug in 1.13.4.

We still need to manually update the kubeadm-config ConfigMap as in https://github.com/kubernetes/kubeadm/issues/1300#issuecomment-463374200

Is it not the case that the fix in kubernetes/kubernetes#71945 would use the etcd cluster membership as the source of truth for cluster members? If not, what exactly did that PR fix?

Interestingly, it sporadically works because in Go, ranging over a map (like ClusterStatus's apiEndpoints) is non-deterministic. So if the first endpoint it tries is an old endpoint that no longer exists, things fail; if it finds a healthy endpoint, it updates the ClusterStatus from the etcd sync...

I believe the root cause of this is a bug in the etcd clientv3 that causes the client not to retry the other endpoints when the first one fails: https://github.com/etcd-io/etcd/issues/9949.

Please use the following issue for tracking reset improvements

@fabriziopandini There is at least one other issue in here that is unrelated to kubeadm reset.

If a node fails without the chance to perform kubeadm reset (instance termination, hardware failure, etc.), the cluster is left in a state where ClusterStatus.apiEndpoints still lists a node that is no longer in the cluster. This requires the workaround of reading, editing, and updating the ConfigMap before performing kubeadm join. Kubeadm probably has 2 options:

1) Implement the endpoint retry itself when the dial fails
2) Wait for the go-grpc bug to be fixed, and then for the fix to make it into the etcd client

This issue may be a good issue to use to track either of those options.

If a node fails without the chance to perform kubeadm reset (instance termination, hardware failure, etc.), the cluster is left in a state where ClusterStatus.apiEndpoints still lists a node that is no longer in the cluster. This requires the workaround of reading, editing, and updating the ConfigMap before performing kubeadm join.

That is true; without calling reset you will have to manually update the ClusterStatus.
We don't have a command that does that. If you feel this is a feature that kubeadm should support, please file a separate ticket.

Just experienced this today on 1.14.1

The instance running one of my master nodes failed, which kept it from being gracefully removed. When a new node tried to come in, it failed to join due to the error described in this ticket.

I had to manually remove the etcd member via etcdctl; then I could join the new node. I also manually removed the node from the kubeadm-config ConfigMap, but I am not sure whether that was required.

@Halytskyi Thank you, the etcdctl section helped me.

Experienced this today in 1.15.5

In my case, I had joined the cluster with version 1.16, then deleted the node with kubectl delete node, downgraded to 1.15.5, and tried to rejoin (same IP, same hostname, different version), and got the etcd unhealthy error.

Solved by (based on @Halytskyi's answer, but with an updated etcdctl):

  • Delete the node from the kubeadm-config ConfigMap:

```shell
>: kubectl edit configmap kubeadm-config -n kube-system
configmap/kubeadm-config edited
```

  • kubeadm reset -f on the problematic node, plus the usual iptables cleanup (iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X, and so on).

  • Delete the etcd member (this is the key):

```shell
root@k8s-nebula-m-115-2:~# wget https://github.com/etcd-io/etcd/releases/download/v3.4.3/etcd-v3.4.3-linux-amd64.tar.gz
root@k8s-nebula-m-115-2:~# tar xfz etcd-v3.4.3-linux-amd64.tar.gz
```

```shell
root@k8s-nebula-m-115-2:~/etcdctl/etcd-v3.4.3-linux-amd64# ./etcdctl --endpoints https://127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
289ed62da3c6e9e5, started, k8s-nebula-m-115-1, https://10.205.30.2:2380, https://10.205.30.2:2379, false
917e16b9e790c427, started, k8s-nebula-m-115-0, https://10.205.30.1:2380, https://10.205.30.1:2379, false
ad6b76d968b18085, started, k8s-nebula-m-115-2, https://10.205.30.0:2380, https://10.205.30.0:2379, false
```

```shell
root@k8s-nebula-m-115-2:~/etcdctl/etcd-v3.4.3-linux-amd64# ./etcdctl --endpoints https://127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 289ed62da3c6e9e5
Member 289ed62da3c6e9e5 removed from cluster d4913a539ea2384e
```

And then rejoin works.

This can happen if kubeadm reset is interrupted and couldn't delete the node's entry from the kubeadm-config ConfigMap.
In such a case you need to delete it from the ConfigMap manually.

So if I delete the node with kubectl delete node foobar, it does not delete the etcd member? But if I run kubeadm reset on the node I want to delete, then it does? 🙄


"kubeadm reset" should delete it from the kubeadm CM, but calling "kubectl delete node" is also needed which deletes the Node API object.

In my case, deleting the node from the ConfigMap did not remove it from the etcd cluster; I needed to remove the member manually with etcdctl.


kubeadm reset should also remove the etcd member from the etcd cluster.
Try executing it with e.g. --v=5 and see what it does.

However, keep in mind that kubeadm reset is a best-effort command, so if a step fails for some reason it might only print a warning (see the sketch below).
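
A small sketch of that, assuming you can still run reset on the node (the log file path is just an example):

```shell
# Re-run reset with verbose logging and check whether the etcd member
# removal ran or only produced a best-effort warning
kubeadm reset -f --v=5 2>&1 | tee /tmp/kubeadm-reset.log
grep -i etcd /tmp/kubeadm-reset.log
```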

So kubectl delete node does not delete the member from etcd; instead, running kubeadm reset on the node does it.
That sounds broken to me; I think kubectl delete node should also delete it from etcd. Or am I missing an obvious use case?
Maybe it should ask whether the member should also be deleted from there?
Anyway, thanks for the clarification @neolit123. I deleted the node from the control plane first and then did the reset, so I guess it was too late for the node to remove itself from etcd.

So there are different responsibilities.
kubectl delete node deletes the Node API object; you should do this when you are really sure that you no longer want the node around.
Before that, you should call kubeadm reset on that node. What it does is clean up the node's on-disk state and also remove the etcd member (if this is a control-plane node and you are using the default option where an etcd instance runs on each control-plane node).

kubeadm reset resets the node, but it does not delete the Node object, for a couple of reasons:

  • reset just resets the node so that you can rejoin it; the Node name remains reserved.
  • the node itself does not have enough privileges to delete its own Node object; this is the responsibility of the owner of admin.conf (e.g. the administrator).

kubeadm reset is a best effort command

Regarding this: when kubeadm reset fails to complete for whatever reason (including a hard failure of the underlying server such that kubeadm reset is never executed in the first place), are there any options to manually reconcile the state besides editing the kubeadm-config ConfigMap and removing the node?

If the node has hard-failed and you cannot call kubeadm reset on it, manual steps are required (sketched below). You'd have to:
1) remove the control-plane IP from the ClusterStatus in the kubeadm-config ConfigMap
2) remove the etcd member using etcdctl
3) delete the Node object using kubectl (if you don't want the Node around anymore)

Steps 1 and 2 apply only to control-plane nodes.
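
Putting the three steps together, a hedged consolidation run from a surviving control-plane node (<failed-node> and <member-id> are placeholders; endpoint and cert paths assume kubeadm's default stacked-etcd layout):

```shell
# 1) drop the failed node's entry from ClusterStatus.apiEndpoints
kubectl -n kube-system edit cm kubeadm-config

# 2) find and remove the dead etcd member
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  member list
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  member remove <member-id>

# 3) delete the Node object if you no longer want it around
kubectl delete node <failed-node>
```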

Is there any way to automate this fail-over if kubeadm reset cannot be run?

Same problem on 1.9. Thanks for the solutions.
