错误报告
kubeadm版本(使用kubeadm version
):
kubeadm版本:&version.Info {主要:“ 1”,次要:“ 10”,GitVersion:“ v1.10.0”,GitCommit:“ fc32d2f3698e36b93322a3465f63a14e9f0eaead”,GitTreeState:“ clean”,BuildDate:“ 2018-03-26T16:44: 10Z“,GoVersion:” go1.9.3“,编译器:” gc“,平台:” linux / amd64“}
环境:
kubectl version
):客户端版本:version.Info {主要:“ 1”,次要:“ 9”,GitVersion:“ v1.9.6”,GitCommit:“ 9f8ebd171479bec0ada837d7ee641dec2f8c6dd1”,GitTreeState:“ clean”,BuildDate:“ 2018-03-21T15:21: 50Z“,GoVersion:” go1.9.3“,编译器:” gc“,平台:” linux / amd64“}
服务器版本:version.Info {主要:“ 1”,次要:“ 9”,GitVersion:“ v1.9.6”,GitCommit:“ 9f8ebd171479bec0ada837d7ee641dec2f8c6dd1”,GitTreeState:“ clean”,BuildDate:“ 2018-03-21T15:13: 31Z“,GoVersion:” go1.9.3“,编译器:” gc“,平台:” linux / amd64“}
Scaleway裸机C2S
Ubuntu Xenial(16.04 LTS)(GNU / Linux 4.4.122-mainline-rev1 x86_64)
uname -a
):Linux amd64-master-1 4.4.122-mainline-rev1#1 SMP Sun Mar 18 10:44:19 UTC 2018 x86_64 x86_64 x86_64 GNU / Linux
尝试从1.9.6升级到1.10.0时出现此错误:
kubeadm upgrade apply v1.10.0
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade/version] You have chosen to change the cluster version to "v1.10.0"
[upgrade/versions] Cluster version: v1.9.6
[upgrade/versions] kubeadm version: v1.10.0
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler]
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.10.0"...
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests411909119/etcd.yaml"
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [localhost] and IPs [127.0.0.1]
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [arm-master-1] and IPs [10.1.244.57]
[certificates] Generated etcd/healthcheck-client certificate and key.
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests180476754/etcd.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/apply] FATAL: fatal error when trying to upgrade the etcd cluster: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition], rolled the state back to pre-upgrade state
成功升级
安装1.9.6软件包并初始化1.9.6集群:
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | tee /etc/apt/sources.list.d/kubernetes.list
apt-get update -qq
apt-get install -qy kubectl=1.9.6-00
apt-get install -qy kubelet=1.9.6-00
apt-get install -qy kubeadm=1.9.6-00
编辑kubeadm-config并将FeatureGates从字符串更改为map,如https://github.com/kubernetes/kubernetes/issues/61764中所报告。
kubectl -n kube-system edit cm kubeadm-config
....
featureGates: {}
....
下载kubeadm 1.10.0并运行kubeadm upgrade plan
和kubeadm upgrade apply v1.10.0
。
致力于在本地重现此错误。
重试10次后,终于可以了
这是我的etcd清单差异
root @ vagrant :〜#diff /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/tmp/kubeadm-backup-manifests858209931/etcd.yaml
16,17c16,17
<---listen-client-urls = https://127.0.0.1:2379
- --listen-client-urls=http://127.0.0.1:2379 - --advertise-client-urls=http://127.0.0.1:2379
19,27c19
<---key-file = / etc / kubernetes / pki / etcd / server.key
<---trusted-ca-file = / etc / kubernetes / pki / etcd / ca.crt
<---peer-cert-file = / etc / kubernetes / pki / etcd / peer.crt
<---peer-key-file = / etc / kubernetes / pki / etcd / peer.key
<---client-cert-auth = true
<---peer-client-cert-auth = true
<---cert-file = / etc / kubernetes / pki / etcd / server.crt
<---peer-trusted-ca-file = / etc / kubernetes / pki / etcd / ca.crt<图片:gcr.io/google_containers/etcd-amd64:3.1.12
image: gcr.io/google_containers/etcd-amd64:3.1.11
29,35天20
<执行:
<命令:
<-/ bin / sh
<--ec
<-ETCDCTL_API = 3 etcdctl --endpoints = 127.0.0.1:2379 --cacert = / etc / kubernetes / pki / etcd / ca.crt
<--cert = / etc / kubernetes / pki / etcd / healthcheck-client.crt --key = / etc / kubernetes / pki / etcd / healthcheck-client.key
<获取foo
36a22,26
httpGet:
主持人:127.0.0.1
路径:/健康
端口:2379
方案:HTTP
43,45c33
<名称:etcd-data
<-mountPath:/ etc / kubernetes / pki / etcd<名称:etcd-certs
name: etcd
51,55c39
<名称:etcd-data
<-hostPath:
<路径:/ etc / kubernetes / pki / etcd
<类型:DirectoryOrCreate<名称:etcd-certs
name: etcd
root @ vagrant :〜#ls / etc / kubernetes / pki / etcd
ca.crt ca.key healthcheck-client.crt healthcheck-client.key peer.crt peer.key server.crt server.key``
Ubuntu 17.10 Vagrant上的1.9.6集群:
root<strong i="6">@vagrant</strong>:/vagrant# 1.10_kubernetes/server/bin/kubeadm upgrade apply v1.10.0
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade/version] You have chosen to change the cluster version to "v1.10.0"
[upgrade/versions] Cluster version: v1.9.6
[upgrade/versions] kubeadm version: v1.10.0
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler]
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.10.0"...
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests262738652/etcd.yaml"
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [localhost] and IPs [127.0.0.1]
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [vagrant] and IPs [10.0.2.15]
[certificates] Generated etcd/healthcheck-client certificate and key.
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests858209931/etcd.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[apiclient] Error getting Pods with label selector "component=etcd" [the server was unable to return a response in the time allotted, but may still be processing the request (get pods)]
[apiclient] Error getting Pods with label selector "component=etcd" [Get https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods?labelSelector=component%3Detcd: http2: server sent GOAWAY and closed the connection; LastStreamID=27, ErrCode=NO_ERROR, debug=""]
[apiclient] Error getting Pods with label selector "component=etcd" [Get https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods?labelSelector=component%3Detcd: net/http: TLS handshake timeout]
[apiclient] Error getting Pods with label selector "component=etcd" [the server was unable to return a response in the time allotted, but may still be processing the request (get pods)]
[apiclient] Error getting Pods with label selector "component=etcd" [Get https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods?labelSelector=component%3Detcd: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""]
[upgrade/apply] FATAL: fatal error when trying to upgrade the etcd cluster: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition], rolled the state back to pre-upgrade state
这是我的复制环境: https :
将这些行更改为1.9.6-00
进行引导: https :
然后将1.10服务器二进制文件下载到存储库中,它们将以/vagrant
提供给来宾使用
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#server -binaries
kubelet etcd相关的日志:
root<strong i="6">@vagrant</strong>:~# journalctl -xefu kubelet | grep -i etcd
Mar 28 16:32:07 vagrant kubelet[14676]: W0328 16:32:07.808776 14676 status_manager.go:459] Failed to get status for pod "etcd-vagrant_kube-system(7278f85057e8bf5cb81c9f96d3b25320)": Get https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods/etcd-vagrant: dial tcp 10.0.2.15:6443: getsockopt: connection refused
Mar 28 16:32:07 vagrant kubelet[14676]: I0328 16:32:07.880412 14676 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "etcd" (UniqueName: "kubernetes.io/host-path/7278f85057e8bf5cb81c9f96d3b25320-etcd") pod "etcd-vagrant" (UID: "7278f85057e8bf5cb81c9f96d3b25320")
Mar 28 16:34:27 vagrant kubelet[14676]: W0328 16:34:27.472534 14676 status_manager.go:459] Failed to get status for pod "etcd-vagrant_kube-system(7278f85057e8bf5cb81c9f96d3b25320)": Get https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods/etcd-vagrant: dial tcp 10.0.2.15:6443: getsockopt: connection refused
Mar 28 16:57:33 vagrant kubelet[14676]: W0328 16:57:33.683648 14676 kubelet.go:1597] Deleting mirror pod "etcd-vagrant_kube-system(122348c3-32a6-11e8-8dc5-080027d6be16)" because it is outdated
Mar 28 16:57:33 vagrant kubelet[14676]: I0328 16:57:33.725564 14676 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "etcd-certs" (UniqueName: "kubernetes.io/host-path/37936d2107e31b457cada6c2433469f1-etcd-certs") pod "etcd-vagrant" (UID: "37936d2107e31b457cada6c2433469f1")
Mar 28 16:57:33 vagrant kubelet[14676]: I0328 16:57:33.725637 14676 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "etcd-data" (UniqueName: "kubernetes.io/host-path/37936d2107e31b457cada6c2433469f1-etcd-data") pod "etcd-vagrant" (UID: "37936d2107e31b457cada6c2433469f1")
Mar 28 16:57:35 vagrant kubelet[14676]: E0328 16:57:35.484901 14676 kuberuntime_container.go:66] Can't make a ref to pod "etcd-vagrant_kube-system(7278f85057e8bf5cb81c9f96d3b25320)", container etcd: selfLink was empty, can't make reference
Mar 28 16:57:35 vagrant kubelet[14676]: I0328 16:57:35.889458 14676 reconciler.go:191] operationExecutor.UnmountVolume started for volume "etcd" (UniqueName: "kubernetes.io/host-path/7278f85057e8bf5cb81c9f96d3b25320-etcd") pod "7278f85057e8bf5cb81c9f96d3b25320" (UID: "7278f85057e8bf5cb81c9f96d3b25320")
Mar 28 16:57:35 vagrant kubelet[14676]: I0328 16:57:35.889595 14676 operation_generator.go:643] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/7278f85057e8bf5cb81c9f96d3b25320-etcd" (OuterVolumeSpecName: "etcd") pod "7278f85057e8bf5cb81c9f96d3b25320" (UID: "7278f85057e8bf5cb81c9f96d3b25320"). InnerVolumeSpecName "etcd". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 28 16:57:35 vagrant kubelet[14676]: I0328 16:57:35.989892 14676 reconciler.go:297] Volume detached for volume "etcd" (UniqueName: "kubernetes.io/host-path/7278f85057e8bf5cb81c9f96d3b25320-etcd") on node "vagrant" DevicePath ""
Mar 28 16:58:03 vagrant kubelet[14676]: E0328 16:58:03.688878 14676 mirror_client.go:88] Failed deleting a mirror pod "etcd-vagrant_kube-system": Timeout: request did not complete within allowed duration
Mar 28 16:58:03 vagrant kubelet[14676]: E0328 16:58:03.841447 14676 event.go:200] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"etcd-vagrant.152023ff626cfbc5", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"etcd-vagrant", UID:"37936d2107e31b457cada6c2433469f1", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"SuccessfulMountVolume", Message:"MountVolume.SetUp succeeded for volume \"etcd-certs\" ", Source:v1.EventSource{Component:"kubelet", Host:"vagrant"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbea7103f713e59c5, ext:1534226953099, loc:(*time.Location)(0x5859e60)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbea7103f713e59c5, ext:1534226953099, loc:(*time.Location)(0x5859e60)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Timeout: request did not complete within allowed duration' (will not retry!)
Mar 28 16:58:33 vagrant kubelet[14676]: E0328 16:58:33.844276 14676 event.go:200] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"etcd-vagrant.152023ff626cfb82", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"etcd-vagrant", UID:"37936d2107e31b457cada6c2433469f1", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"SuccessfulMountVolume", Message:"MountVolume.SetUp succeeded for volume \"etcd-data\" ", Source:v1.EventSource{Component:"kubelet", Host:"vagrant"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbea7103f713e5982, ext:1534226953033, loc:(*time.Location)(0x5859e60)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbea7103f713e5982, ext:1534226953033, loc:(*time.Location)(0x5859e60)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Timeout: request did not complete within allowed duration' (will not retry!)
Mar 28 16:59:03 vagrant kubelet[14676]: E0328 16:59:03.692450 14676 kubelet.go:1612] Failed creating a mirror pod for "etcd-vagrant_kube-system(37936d2107e31b457cada6c2433469f1)": the server was unable to return a response in the time allotted, but may still be processing the request (post pods)
Mar 28 16:59:03 vagrant kubelet[14676]: E0328 16:59:03.848007 14676 event.go:200] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"etcd-vagrant.152023ff641f915f", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"etcd-vagrant", UID:"7278f85057e8bf5cb81c9f96d3b25320", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{etcd}"}, Reason:"Killing", Message:"Killing container with id docker://etcd:Need to kill Pod", Source:v1.EventSource{Component:"kubelet", Host:"vagrant"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbea7103f72f0ef5f, ext:1534255433999, loc:(*time.Location)(0x5859e60)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbea7103f72f0ef5f, ext:1534255433999, loc:(*time.Location)(0x5859e60)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Timeout: request did not complete within allowed duration' (will not retry!)
Mar 28 16:59:14 vagrant kubelet[14676]: W0328 16:59:14.472661 14676 kubelet.go:1597] Deleting mirror pod "etcd-vagrant_kube-system(122348c3-32a6-11e8-8dc5-080027d6be16)" because it is outdated
Mar 28 16:59:14 vagrant kubelet[14676]: W0328 16:59:14.473138 14676 status_manager.go:459] Failed to get status for pod "etcd-vagrant_kube-system(37936d2107e31b457cada6c2433469f1)": Get https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods/etcd-vagrant: dial tcp 10.0.2.15:6443: getsockopt: connection refused
Mar 28 16:59:14 vagrant kubelet[14676]: E0328 16:59:14.473190 14676 mirror_client.go:88] Failed deleting a mirror pod "etcd-vagrant_kube-system": Delete https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods/etcd-vagrant: dial tcp 10.0.2.15:6443: getsockopt: connection refused
Mar 28 16:59:14 vagrant kubelet[14676]: E0328 16:59:14.473658 14676 kubelet.go:1612] Failed creating a mirror pod for "etcd-vagrant_kube-system(37936d2107e31b457cada6c2433469f1)": Post https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods: dial tcp 10.0.2.15:6443: getsockopt: connection refused
Mar 28 16:59:15 vagrant kubelet[14676]: W0328 16:59:15.481336 14676 kubelet.go:1597] Deleting mirror pod "etcd-vagrant_kube-system(122348c3-32a6-11e8-8dc5-080027d6be16)" because it is outdated
Mar 28 16:59:15 vagrant kubelet[14676]: E0328 16:59:15.483705 14676 mirror_client.go:88] Failed deleting a mirror pod "etcd-vagrant_kube-system": Delete https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods/etcd-vagrant: dial tcp 10.0.2.15:6443: getsockopt: connection refused
Mar 28 16:59:15 vagrant kubelet[14676]: E0328 16:59:15.497391 14676 kubelet.go:1612] Failed creating a mirror pod for "etcd-vagrant_kube-system(37936d2107e31b457cada6c2433469f1)": Post https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods: dial tcp 10.0.2.15:6443: getsockopt: connection refused
Mar 28 17:00:34 vagrant kubelet[14676]: W0328 17:00:34.475851 14676 kubelet.go:1597] Deleting mirror pod "etcd-vagrant_kube-system(122348c3-32a6-11e8-8dc5-080027d6be16)" because it is outdated
Mar 28 17:01:07 vagrant kubelet[14676]: W0328 17:01:07.720076 14676 status_manager.go:459] Failed to get status for pod "etcd-vagrant_kube-system(37936d2107e31b457cada6c2433469f1)": Get https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods/etcd-vagrant: http2: server sent GOAWAY and closed the connection; LastStreamID=47, ErrCode=NO_ERROR, debug=""
Mar 28 17:01:07 vagrant kubelet[14676]: E0328 17:01:07.720107 14676 mirror_client.go:88] Failed deleting a mirror pod "etcd-vagrant_kube-system": Delete https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods/etcd-vagrant: http2: server sent GOAWAY and closed the connection; LastStreamID=47, ErrCode=NO_ERROR, debug=""; some request body already written
Mar 28 17:01:07 vagrant kubelet[14676]: E0328 17:01:07.725335 14676 kubelet.go:1612] Failed creating a mirror pod for "etcd-vagrant_kube-system(37936d2107e31b457cada6c2433469f1)": Post https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods: dial tcp 10.0.2.15:6443: getsockopt: connection refused
Mar 28 17:01:07 vagrant kubelet[14676]: I0328 17:01:07.728709 14676 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "etcd" (UniqueName: "kubernetes.io/host-path/7278f85057e8bf5cb81c9f96d3b25320-etcd") pod "etcd-vagrant" (UID: "7278f85057e8bf5cb81c9f96d3b25320")
Mar 28 17:01:07 vagrant kubelet[14676]: W0328 17:01:07.734475 14676 status_manager.go:459] Failed to get status for pod "etcd-vagrant_kube-system(37936d2107e31b457cada6c2433469f1)": Get https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods/etcd-vagrant: dial tcp 10.0.2.15:6443: getsockopt: connection refused
Mar 28 17:01:07 vagrant kubelet[14676]: W0328 17:01:07.740642 14676 status_manager.go:459] Failed to get status for pod "etcd-vagrant_kube-system(7278f85057e8bf5cb81c9f96d3b25320)": Get https://10.0.2.15:6443/api/v1/namespaces/kube-system/pods/etcd-vagrant: dial tcp 10.0.2.15:6443: getsockopt: connection refused
Mar 28 17:01:09 vagrant kubelet[14676]: E0328 17:01:09.484412 14676 kuberuntime_container.go:66] Can't make a ref to pod "etcd-vagrant_kube-system(37936d2107e31b457cada6c2433469f1)", container etcd: selfLink was empty, can't make reference
Mar 28 17:01:09 vagrant kubelet[14676]: I0328 17:01:09.848794 14676 reconciler.go:191] operationExecutor.UnmountVolume started for volume "etcd-certs" (UniqueName: "kubernetes.io/host-path/37936d2107e31b457cada6c2433469f1-etcd-certs") pod "37936d2107e31b457cada6c2433469f1" (UID: "37936d2107e31b457cada6c2433469f1")
Mar 28 17:01:09 vagrant kubelet[14676]: I0328 17:01:09.849282 14676 reconciler.go:191] operationExecutor.UnmountVolume started for volume "etcd-data" (UniqueName: "kubernetes.io/host-path/37936d2107e31b457cada6c2433469f1-etcd-data") pod "37936d2107e31b457cada6c2433469f1" (UID: "37936d2107e31b457cada6c2433469f1")
Mar 28 17:01:09 vagrant kubelet[14676]: I0328 17:01:09.849571 14676 operation_generator.go:643] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/37936d2107e31b457cada6c2433469f1-etcd-data" (OuterVolumeSpecName: "etcd-data") pod "37936d2107e31b457cada6c2433469f1" (UID: "37936d2107e31b457cada6c2433469f1"). InnerVolumeSpecName "etcd-data". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 28 17:01:09 vagrant kubelet[14676]: I0328 17:01:09.849503 14676 operation_generator.go:643] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/37936d2107e31b457cada6c2433469f1-etcd-certs" (OuterVolumeSpecName: "etcd-certs") pod "37936d2107e31b457cada6c2433469f1" (UID: "37936d2107e31b457cada6c2433469f1"). InnerVolumeSpecName "etcd-certs". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 28 17:01:09 vagrant kubelet[14676]: I0328 17:01:09.949925 14676 reconciler.go:297] Volume detached for volume "etcd-certs" (UniqueName: "kubernetes.io/host-path/37936d2107e31b457cada6c2433469f1-etcd-certs") on node "vagrant" DevicePath ""
Mar 28 17:01:09 vagrant kubelet[14676]: I0328 17:01:09.949975 14676 reconciler.go:297] Volume detached for volume "etcd-data" (UniqueName: "kubernetes.io/host-path/37936d2107e31b457cada6c2433469f1-etcd-data") on node "vagrant" DevicePath ""
当前的解决方法是继续重试升级,并且在某个时候它将成功。
@stealthybox您是否碰巧从etcd容器的docker中获取日志? 同样, grep -i etcd
可能掩盖了一些kubelet输出,例如一些错误消息,其中没有容器名称,但仍然有用。
我只是碰到了另一个与此错误有关的怪异案例。 在拉出新的etcd映像和部署新的静态Pod之前,kubeadm升级将etcd升级标记为已完成。 这会导致升级在以后的步骤中超时,并且升级回滚失败。 这也使群集处于损坏状态。 需要恢复原始的etcd静态pod清单以恢复集群。
哦,是的,我也被困在那里。 我的集群完全崩溃了。 有人可以分享一些有关如何从此状态中解救的说明吗?
就像@detiber所描述的那样,我第二次尝试升级时就在
在/ etc / kubernetes / tmp中找到了一些备份的东西,感觉到etcd可能是罪魁祸首,我将其旧清单复制到manifests文件夹中的新清单上。 那时我没有什么可失去的,因为我完全失去了对集群的控制。 然后,我记不清了,但是我想我重新启动了整个计算机,后来又将所有内容降级到v1.9.6。 最终,我获得了对该群集的控制权,然后失去了再次使用v1.10.0的动机。 一点都不好玩...
如果从/etc/kubernetes/tmp
回滚etcd静态Pod清单,由于1.10中新的TLS配置,还必须将apiserver清单回滚到1.9版本,这一点很重要。
^但是您可能不需要这样做,因为我相信etcd升级会阻止其余的控制平面升级。
似乎只有etcd清单不会在升级失败时回滚,其他一切都很好。 将备份清单移至上方并重新启动kubelet之后,一切恢复正常。
我遇到了同样的超时问题,并且kubeadm将kube-apiserv清单回滚到1.9.6,但保留了etcd清单(读取:启用了TLS),显然导致apiserv失败,有效地破坏了我的主节点。 我认为是单独发布问题报告的好人选。
@dvdmuckle @codepainters ,不幸的是,回滚是否成功取决于哪个组件符合竞争条件(etcd或api服务器)。 我找到了解决比赛条件的方法,但是它完全破坏了kubeadm的升级。 我正在与@stealthybox合作,尝试找到正确解决升级问题的正确方法。
@codepainters我认为这是相同的问题。
有一些潜在的问题导致此问题:
结果,仅当碰巧有etcd pod的pod状态更新导致哈希值在kubelet拾取etcd的新静态清单之前更改时,升级才当前成功。 此外,当升级工具在更新apiserver清单之前查询api时,api服务器需要在apiserver升级的第一部分中保持可用状态。
@detiber和我通电话讨论了我们需要对升级过程进行的更改。
我们计划在1.10.x修补程序版本中针对此错误实施3个修复程序:
从升级中删除etcd TLS。
当前的升级循环以串行方式对每个组件进行批量修改。
升级组件不了解相关的组件配置。
验证升级要求APIServer可用于检查容器状态。
Etcd TLS需要更改etcd + apiserver的配置,这会破坏此合同。
这是解决此问题的最小可行更改,并使升级后的群集具有不安全的etcd。
修正Pod状态更改上的mirror-pod哈希竞赛条件。
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/upgrade/staticpods.go#L189。
假设etcd和apiserver标志之间具有兼容性,那么升级现在将是正确的。
专门在单独的阶段中升级TLS。
Etcd和APIServer需要一起升级。
kubeadm alpha phase ensure-etcd-tls
?。
此阶段应可独立于集群升级运行。
在群集升级期间,应在更新所有组件之前运行此阶段。
对于1.11,我们要:
另外一种方法:使用CRI获取广告连播信息(使用crictl
演示可行)。
警告:dockershim上的CRI和可能的其他容器运行时当前不支持CRI重大更改的向后兼容性。
@detiber ,您介意解释我们在谈论什么比赛条件吗? 我对kubeadm内部构件并不熟悉,但这听起来很有趣。
@codepainters参见https://github.com/kubernetes/kubeadm/issues/740#issuecomment -377263347
仅供参考-从1.9.3升级相同的问题/问题
尝试了多次重试的解决方法。 最终达到了带有API服务器的竞争条件,并且升级无法回滚。
@stealthybox thx,我在初读时没听懂。
我遇到了同样的问题。.[ERROR APIServerHealth]:API服务器不健康; / healthz没有返回“确定”
[错误MasterNodesReady]:无法列出群集中的主服务器:升级时获取https.......。 请帮我解决一下这个。 我正在从1.9.3升级到1.10.0。 最初,它能够到达“ [等待升级/静态脚架]等待kubelet重新启动组件”的某个点。
临时解决方法是通过绕过检查来确保证书并升级etcd和apiserver吊舱。
确保检查您的配置并为您的用例添加任何标志:
kubectl -n kube-system edit cm kubeadm-config # change featureFlags
...
featureGates: {}
...
kubeadm alpha phase certs all
kubeadm alpha phase etcd local
kubeadm alpha phase controlplane all
kubeadm alpha phase upload-config
谢谢@stealthybox
对我来说, upgrade apply
进程停在了[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.10.1"...
但是集群已成功升级。
@stealthybox我不确定,但是在这些步骤之后似乎出现了问题,因为在此之后
kubeadm upgrade plan
会挂起:
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.10.1
[upgrade/versions] kubeadm version: v1.10.1
[upgrade/versions] Latest stable version: v1.10.1
应用更新时,我也挂了[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.10.1"...
@kvaps @stealthybox这很可能是etcd
问题( kubeadm
对启用TLS的etcd
说的是简单的HTTP/2
etcd
),我也遇到了。 参见其他问题: https :
老实说,我不明白为什么TLS和非TLS etcd
侦听器都使用相同的TCP端口,只会引起类似这样的麻烦。 简单地说,旧的_拒绝连接会立即给出提示,在这里,我不得不求助于tcpdump
来了解发生了什么。
哦!
拍对了,那只适用于我的本地TLS补丁进行Etcd状态检查。
执行以下操作完成升级:
kubeadm alpha phase controlplane all
kubeadm alpha phase upload-config
编辑上述解决方法是正确的
@stealthybox第二个kubeadm命令不起作用:
# kubeadm alpha phase upload-config
The --config flag is mandatory
@renich只是给它配置的文件路径
如果您不使用任何自定义设置,则可以向其传递一个空文件。
这是在bash中执行此操作的一种简单方法:
1.10_kubernetes/server/bin/kubeadm alpha phase upload-config --config <(echo)
现在应该通过合并https://github.com/kubernetes/kubernetes/pull/62655解决此问题,并将成为v1.10.2版本的一部分。
我可以确认使用kubeadm 1.10.2进行的1.10.0-> 1.10.2升级是顺利的,没有超时
我仍然在1.10.0-> 1.10.2上有超时,但是另外一个:
[upgrade/staticpods] Waiting for the kubelet to restart the component
Static pod: kube-apiserver-master hash: a273591d3207fcd9e6fd0c308cc68d64
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition]
我不确定该怎么办...
@ denis111使用docker ps
进行升级时检查API服务器日志。 我觉得您可能遇到了我也遇到的问题。
@dvdmuckle好吧,我在该日志中没有看到任何错误,只有以I和几个W开头的条目。
而且我认为kube-apiserver的哈希值在升级过程中不会改变。
我在1.9.3上有一个ARM64群集,已成功更新到1.9.7,但从1.9.7升级到1.10.2时也遇到了相同的超时问题。
我什至尝试编辑和重新编译kubeadm以增加超时(如这些最后的提交https://github.com/anguslees/kubernetes/commits/kubeadm-gusfork),结果相同。
$ sudo kubeadm upgrade apply v1.10.2 --force
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade/version] You have chosen to change the cluster version to "v1.10.2"
[upgrade/versions] Cluster version: v1.9.7
[upgrade/versions] kubeadm version: v1.10.2-dirty
[upgrade/version] Found 1 potential version compatibility errors but skipping since the --force flag is set:
- Specified version to upgrade to "v1.10.2" is higher than the kubeadm version "v1.10.2-dirty". Upgrade kubeadm first using the tool you used to install kubeadm
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler]
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.10.2"...
Static pod: kube-apiserver-kubemaster1 hash: ed7578d5bf9314188dca798386bcfb0e
Static pod: kube-controller-manager-kubemaster1 hash: e0c3f578f1c547dcf9996e1d3390c10c
Static pod: kube-scheduler-kubemaster1 hash: 52e767858f52ac4aba448b1a113884ee
[upgrade/etcd] Upgrading to TLS for etcd
Static pod: etcd-kubemaster1 hash: 413224efa82e36533ce93e30bd18e3a8
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests346927148/etcd.yaml"
[certificates] Using the existing etcd/ca certificate and key.
[certificates] Using the existing etcd/server certificate and key.
[certificates] Using the existing etcd/peer certificate and key.
[certificates] Using the existing etcd/healthcheck-client certificate and key.
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests190581659/etcd.yaml"
[upgrade/staticpods] Not waiting for pod-hash change for component "etcd"
[upgrade/etcd] Waiting for etcd to become available
[util/etcd] Waiting 30s for initial delay
[util/etcd] Attempting to get etcd status 1/10
[util/etcd] Attempt failed with error: dial tcp 127.0.0.1:2379: getsockopt: connection refused
[util/etcd] Waiting 15s until next retry
[util/etcd] Attempting to get etcd status 2/10
[util/etcd] Attempt failed with error: dial tcp 127.0.0.1:2379: getsockopt: connection refused
[util/etcd] Waiting 15s until next retry
[util/etcd] Attempting to get etcd status 3/10
[util/etcd] Attempt failed with error: dial tcp 127.0.0.1:2379: getsockopt: connection refused
[util/etcd] Waiting 15s until next retry
[util/etcd] Attempting to get etcd status 4/10
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests346927148"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests346927148/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests346927148/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests346927148/kube-scheduler.yaml"
[upgrade/staticpods] The etcd manifest will be restored if component "kube-apiserver" fails to upgrade
[certificates] Using the existing etcd/ca certificate and key.
[certificates] Using the existing apiserver-etcd-client certificate and key.
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests190581659/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition]
升级v1.10.2-> v1.10.2(这可能是废话。只需测试...)
Ubuntu 16.04。
并且,它失败并显示错误。
kubeadm upgrade apply v1.10.2
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition]
我想知道是否在某些问题上仍在跟踪……找不到。
我还看到升级仍然失败,并出现timed out waiting for the condition
错误。
编辑:将讨论移至新票证https://github.com/kubernetes/kubeadm/issues/850 ,请在此处进行讨论。
如果其他任何人对1.9.x都有此问题,请执行以下操作:
如果您使用的是具有自定义主机名的AWS,则需要编辑kubeadm-config configmap并在nodeName上设置AWS内部名称:ip-xx-xx-xx-xx。$ REGION.compute.internal)
kubectl -n kube-system edit cm kubeadm-config -oyaml
这除了将etc客户端设置为http。 我还没来信来看看他们是否解决了这个问题。
这是因为kubeadm尝试在api中读取此路径:/ api / v1 / namespaces / kube-system / pods / kube-apiserver- $ NodeName
由于超时已在1.10.6上增加,所以几周前我已将1.9.7部署成功更新为1.10.6。
计划在.deb软件包准备就绪后立即升级到1.11.2,因为此版本中的更改相同。
我的集群在ARM64板上本地运行。
最有用的评论
临时解决方法是通过绕过检查来确保证书并升级etcd和apiserver吊舱。
确保检查您的配置并为您的用例添加任何标志: