What happened: We upgraded 15 Kubernetes clusters from 1.17.5 to 1.18.2 / 1.18.3 and started seeing DaemonSets stop working correctly.
The problem is that DaemonSet pods are not being scheduled. The following error message keeps appearing in the events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 9s (x5 over 71s) default-scheduler 0/13 nodes are available: 12 node(s) didn't match node selector.
However, all nodes are available, and there is no node selector. The nodes have no taints either.
DaemonSet: https://gist.github.com/zetaab/4a605cb3e15e349934cb7db29ec72bd8
% kubectl get nodes
NAME STATUS ROLES AGE VERSION
e2etest-1-kaasprod-k8s-local Ready node 46h v1.18.3
e2etest-2-kaasprod-k8s-local Ready node 46h v1.18.3
e2etest-3-kaasprod-k8s-local Ready node 44h v1.18.3
e2etest-4-kaasprod-k8s-local Ready node 44h v1.18.3
master-zone-1-1-1-kaasprod-k8s-local Ready master 47h v1.18.3
master-zone-2-1-1-kaasprod-k8s-local Ready master 47h v1.18.3
master-zone-3-1-1-kaasprod-k8s-local Ready master 47h v1.18.3
nodes-z1-1-kaasprod-k8s-local Ready node 47h v1.18.3
nodes-z1-2-kaasprod-k8s-local Ready node 47h v1.18.3
nodes-z2-1-kaasprod-k8s-local Ready node 46h v1.18.3
nodes-z2-2-kaasprod-k8s-local Ready node 46h v1.18.3
nodes-z3-1-kaasprod-k8s-local Ready node 47h v1.18.3
nodes-z3-2-kaasprod-k8s-local Ready node 46h v1.18.3
% kubectl get pods -n weave -l weave-scope-component=agent -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
weave-scope-agent-2drzw 1/1 Running 0 26h 10.1.32.23 e2etest-1-kaasprod-k8s-local <none> <none>
weave-scope-agent-4kpxc 1/1 Running 3 26h 10.1.32.12 nodes-z1-2-kaasprod-k8s-local <none> <none>
weave-scope-agent-78n7r 1/1 Running 0 26h 10.1.32.7 e2etest-4-kaasprod-k8s-local <none> <none>
weave-scope-agent-9m4n8 1/1 Running 0 26h 10.1.96.4 master-zone-1-1-1-kaasprod-k8s-local <none> <none>
weave-scope-agent-b2gnk 1/1 Running 1 26h 10.1.96.12 master-zone-3-1-1-kaasprod-k8s-local <none> <none>
weave-scope-agent-blwtx 1/1 Running 2 26h 10.1.32.20 nodes-z1-1-kaasprod-k8s-local <none> <none>
weave-scope-agent-cbhjg 1/1 Running 0 26h 10.1.64.15 e2etest-2-kaasprod-k8s-local <none> <none>
weave-scope-agent-csp49 1/1 Running 0 26h 10.1.96.14 e2etest-3-kaasprod-k8s-local <none> <none>
weave-scope-agent-g4k2x 1/1 Running 1 26h 10.1.64.10 nodes-z2-2-kaasprod-k8s-local <none> <none>
weave-scope-agent-kx85h 1/1 Running 2 26h 10.1.96.6 nodes-z3-1-kaasprod-k8s-local <none> <none>
weave-scope-agent-lllqc 0/1 Pending 0 5m56s <none> <none> <none> <none>
weave-scope-agent-nls2h 1/1 Running 0 26h 10.1.96.17 master-zone-2-1-1-kaasprod-k8s-local <none> <none>
weave-scope-agent-p8njs 1/1 Running 2 26h 10.1.96.19 nodes-z3-2-kaasprod-k8s-local <none> <none>
I tried restarting the apiservers / schedulers / controller-managers, but it did not help. I also tried rebooting the single stuck node (nodes-z2-1-kaasprod-k8s-local), which did not help either. Only deleting the node and recreating it helps.
% kubectl describe node nodes-z2-1-kaasprod-k8s-local
Name: nodes-z2-1-kaasprod-k8s-local
Roles: node
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=59cf4871-de1b-4294-9e9f-2ea7ca4b771f
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=regionOne
failure-domain.beta.kubernetes.io/zone=zone-2
kops.k8s.io/instancegroup=nodes-z2
kubernetes.io/arch=amd64
kubernetes.io/hostname=nodes-z2-1-kaasprod-k8s-local
kubernetes.io/os=linux
kubernetes.io/role=node
node-role.kubernetes.io/node=
node.kubernetes.io/instance-type=59cf4871-de1b-4294-9e9f-2ea7ca4b771f
topology.cinder.csi.openstack.org/zone=zone-2
topology.kubernetes.io/region=regionOne
topology.kubernetes.io/zone=zone-2
Annotations: csi.volume.kubernetes.io/nodeid: {"cinder.csi.openstack.org":"faf14d22-010f-494a-9b34-888bdad1d2df"}
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 10.1.64.32/19
projectcalico.org/IPv4IPIPTunnelAddr: 100.98.136.0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Thu, 28 May 2020 13:28:24 +0300
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: nodes-z2-1-kaasprod-k8s-local
AcquireTime: <unset>
RenewTime: Sat, 30 May 2020 12:02:13 +0300
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Fri, 29 May 2020 09:40:51 +0300 Fri, 29 May 2020 09:40:51 +0300 CalicoIsUp Calico is running on this node
MemoryPressure False Sat, 30 May 2020 11:59:53 +0300 Fri, 29 May 2020 09:40:45 +0300 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sat, 30 May 2020 11:59:53 +0300 Fri, 29 May 2020 09:40:45 +0300 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sat, 30 May 2020 11:59:53 +0300 Fri, 29 May 2020 09:40:45 +0300 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sat, 30 May 2020 11:59:53 +0300 Fri, 29 May 2020 09:40:45 +0300 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.1.64.32
Hostname: nodes-z2-1-kaasprod-k8s-local
Capacity:
cpu: 4
ephemeral-storage: 10287360Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8172420Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 9480830961
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8070020Ki
pods: 110
System Info:
Machine ID: c94284656ff04cf090852c1ddee7bcc2
System UUID: faf14d22-010f-494a-9b34-888bdad1d2df
Boot ID: 295dc3d9-0a90-49ee-92f3-9be45f2f8e3d
Kernel Version: 4.19.0-8-cloud-amd64
OS Image: Debian GNU/Linux 10 (buster)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.8
Kubelet Version: v1.18.3
Kube-Proxy Version: v1.18.3
PodCIDR: 100.96.12.0/24
PodCIDRs: 100.96.12.0/24
ProviderID: openstack:///faf14d22-010f-494a-9b34-888bdad1d2df
Non-terminated Pods: (3 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-node-77pqs 100m (2%) 200m (5%) 100Mi (1%) 100Mi (1%) 46h
kube-system kube-proxy-nodes-z2-1-kaasprod-k8s-local 100m (2%) 200m (5%) 100Mi (1%) 100Mi (1%) 46h
volume csi-cinder-nodeplugin-5jbvl 100m (2%) 400m (10%) 200Mi (2%) 200Mi (2%) 46h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 300m (7%) 800m (20%)
memory 400Mi (5%) 400Mi (5%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 7m27s kubelet, nodes-z2-1-kaasprod-k8s-local Starting kubelet.
Normal NodeHasSufficientMemory 7m26s kubelet, nodes-z2-1-kaasprod-k8s-local Node nodes-z2-1-kaasprod-k8s-local status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 7m26s kubelet, nodes-z2-1-kaasprod-k8s-local Node nodes-z2-1-kaasprod-k8s-local status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 7m26s kubelet, nodes-z2-1-kaasprod-k8s-local Node nodes-z2-1-kaasprod-k8s-local status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 7m26s kubelet, nodes-z2-1-kaasprod-k8s-local Updated Node Allocatable limit across pods
We are seeing this randomly across all of our clusters.
What you expected to happen: I expect the DaemonSet to provision pods to all nodes.
How to reproduce it (as minimally and precisely as possible): No real idea: install 1.18.x Kubernetes, deploy a DaemonSet, and wait a few days(?).
Anything else we need to know?: When this happens, we cannot schedule any other DaemonSet pods to that node either. As you can see, the fluent-bit logging pod is also missing. I cannot see any errors in that node's kubelet log, and as said, rebooting does not help.
% kubectl get ds --all-namespaces
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
falco falco-daemonset 13 13 12 13 12 <none> 337d
kube-system audit-webhook-deployment 3 3 3 3 3 node-role.kubernetes.io/master= 174d
kube-system calico-node 13 13 13 13 13 kubernetes.io/os=linux 36d
kube-system kops-controller 3 3 3 3 3 node-role.kubernetes.io/master= 193d
kube-system metricbeat 6 6 5 6 5 <none> 35d
kube-system openstack-cloud-provider 3 3 3 3 3 node-role.kubernetes.io/master= 337d
logging fluent-bit 13 13 12 13 12 <none> 337d
monitoring node-exporter 13 13 12 13 12 kubernetes.io/os=linux 58d
volume csi-cinder-nodeplugin 6 6 6 6 6 <none> 239d
weave weave-scope-agent 13 13 12 13 12 <none> 193d
weave weavescope-iowait-plugin 6 6 5 6 5 <none> 193d
As you can see, most of the DaemonSets are missing one pod.
Environment:
- Kubernetes version (kubectl version): 1.18.3
- OS (cat /etc/os-release): Debian buster
- Kernel (uname -a): Linux nodes-z2-1-kaasprod-k8s-local 4.19.0-8-cloud-amd64 #1 SMP Debian 4.19.98-1+deb10u1 (2020-04-27) x86_64 GNU/Linux

/sig scheduling
Can you provide the complete yaml of the node, the DaemonSet, a sample pod, and the containing namespace, as retrieved from the server?
Node:
https://gist.github.com/zetaab/2a7e8d3fe6cb42a617e17abc0fa375f7
DaemonSet:
https://gist.github.com/zetaab/31bb406c8bd622b3017bf4f468d0154f
Sample pod (working):
https://gist.github.com/zetaab/814871bec6f2879e371f5bbdc6f2e978
Sample pod (not scheduling):
https://gist.github.com/zetaab/f3488d65486c745af78dbe2e6173fd42
Namespace:
https://gist.github.com/zetaab/4625b759f4e21b50757c79e5072cd7d9
The DaemonSet pods are scheduled with a nodeAffinity selector that matches only a single node, so the "12 of 13 didn't match" part of the message is expected.
I can't see a reason why the scheduler would be unhappy with this pod/node combination... there are no ports in the podspec that could conflict, the node is not unschedulable or tainted, and it has sufficient resources.
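For background on what the "didn't match node selector" events are actually checking: conceptually, required node affinity ORs the nodeSelectorTerms together and ANDs the expressions inside each term against the node's labels. A simplified, illustrative sketch (toy code, not kube-scheduler's real implementation):

```python
# Toy model of required node-affinity matching: a pod fits a node when ANY
# nodeSelectorTerm matches, and a term matches when ALL of its
# matchExpressions do. Illustrative only -- not kube-scheduler's real code.
def expr_matches(expr, labels):
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        return labels.get(key) not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")

def node_matches(terms, labels):
    # Terms are OR-ed; expressions inside a term are AND-ed.
    return any(
        all(expr_matches(e, labels) for e in term.get("matchExpressions", []))
        for term in terms
    )

# A hostname-based affinity like the examples later in this thread:
terms = [{"matchExpressions": [{
    "key": "kubernetes.io/hostname",
    "operator": "In",
    "values": ["master-zone-3-1-1-test-cluster-k8s-local"],
}]}]
print(node_matches(terms, {"kubernetes.io/hostname": "master-zone-3-1-1-test-cluster-k8s-local"}))  # True
print(node_matches(terms, {"kubernetes.io/hostname": "nodes-z2-1-kaasprod-k8s-local"}))  # False
```

Under this model there is no obvious way for a single matching node to be skipped, which is what makes the reported behavior look like a scheduler-side bug rather than a selector mistake.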
Ok, I restarted all 3 schedulers (and changed the log level to 4 in case we can see something interesting there). However, that fixed the problem:
% kubectl get ds --all-namespaces
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
falco falco-daemonset 13 13 13 13 13 <none> 338d
kube-system audit-webhook-deployment 3 3 3 3 3 node-role.kubernetes.io/master= 175d
kube-system calico-node 13 13 13 13 13 kubernetes.io/os=linux 36d
kube-system kops-controller 3 3 3 3 3 node-role.kubernetes.io/master= 194d
kube-system metricbeat 6 6 6 6 6 <none> 36d
kube-system openstack-cloud-provider 3 3 3 3 3 node-role.kubernetes.io/master= 338d
logging fluent-bit 13 13 13 13 13 <none> 338d
monitoring node-exporter 13 13 13 13 13 kubernetes.io/os=linux 59d
volume csi-cinder-nodeplugin 6 6 6 6 6 <none> 239d
weave weave-scope-agent 13 13 13 13 13 <none> 194d
weave weavescope-iowait-plugin 6 6 6 6 6 <none> 194d
Now all the DaemonSets are provisioned correctly. Strange; anyway, something seems to be wrong with the scheduler.
cc @kubernetes/sig-scheduling-bugs @ahg-g
We are seeing a similar issue on v1.18.3: one node cannot get a DaemonSet pod scheduled.
Restarting the scheduler helps.
[root@tesla-cb0434-csfp1-csfp1-control-03 ~]# kubectl get pod -A|grep Pending
kube-system coredns-vc5ws 0/1 Pending 0 2d16h
kube-system local-volume-provisioner-mwk88 0/1 Pending 0 2d16h
kube-system svcwatcher-ltqb6 0/1 Pending 0 2d16h
ncms bcmt-api-hfzl6 0/1 Pending 0 2d16h
ncms bcmt-yum-repo-589d8bb756-5zbvh 0/1 Pending 0 2d16h
[root@tesla-cb0434-csfp1-csfp1-control-03 ~]# kubectl get ds -A
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system coredns 3 3 2 3 2 is_control=true 2d16h
kube-system danmep-cleaner 0 0 0 0 0 cbcs.nokia.com/danm_node=true 2d16h
kube-system kube-proxy 8 8 8 8 8 <none> 2d16h
kube-system local-volume-provisioner 8 8 7 8 7 <none> 2d16h
kube-system netwatcher 0 0 0 0 0 cbcs.nokia.com/danm_node=true 2d16h
kube-system sriov-device-plugin 0 0 0 0 0 sriov=enabled 2d16h
kube-system svcwatcher 3 3 2 3 2 is_control=true 2d16h
ncms bcmt-api 3 3 0 3 0 is_control=true 2d16h
[root@tesla-cb0434-csfp1-csfp1-control-03 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
tesla-cb0434-csfp1-csfp1-control-01 Ready <none> 2d16h v1.18.3
tesla-cb0434-csfp1-csfp1-control-02 Ready <none> 2d16h v1.18.3
tesla-cb0434-csfp1-csfp1-control-03 Ready <none> 2d16h v1.18.3
tesla-cb0434-csfp1-csfp1-edge-01 Ready <none> 2d16h v1.18.3
tesla-cb0434-csfp1-csfp1-edge-02 Ready <none> 2d16h v1.18.3
tesla-cb0434-csfp1-csfp1-worker-01 Ready <none> 2d16h v1.18.3
tesla-cb0434-csfp1-csfp1-worker-02 Ready <none> 2d16h v1.18.3
tesla-cb0434-csfp1-csfp1-worker-03 Ready <none> 2d16h v1.18.3
This is hard to debug without knowing how to reproduce it. Do you by any chance have scheduler logs from the failed scheduling?
"Ok, I restarted all 3 schedulers"
I assume only one of them is named default-scheduler, correct?
"changed the log level to 4 in case we can see something interesting there"
Could you share what you noticed?
I set the log level to 9, but nothing more interesting seems to show up; the log lines below just keep looping.
I0601 01:45:05.039373 1 generic_scheduler.go:290] Preemption will not help schedule pod kube-system/coredns-vc5ws on any node.
I0601 01:45:05.039437 1 factory.go:462] Unable to schedule kube-system/coredns-vc5ws: no fit: 0/8 nodes are available: 7 node(s) didn't match node selector.; waiting
I0601 01:45:05.039494 1 scheduler.go:776] Updating pod condition for kube-system/coredns-vc5ws to (PodScheduled==False, Reason=Unschedulable)
Yes, I only see that same line:
no fit: 0/8 nodes are available: 7 node(s) didn't match node selector.; waiting
Strange that the log message only shows results for 7 of the nodes, like the issue reported in https://github.com/kubernetes/kubernetes/issues/91340
/cc @damemi
@ahg-g this does look like the same issue I reported there; it seems we have a filter plugin that may not always report its error, or, if I had to guess, something else failing silently
Note that in my issue restarting the scheduler also fixed it (as also mentioned in this thread, https://github.com/kubernetes/kubernetes/issues/91601#issuecomment-636360092)
Mine was also about DaemonSets, so I think this is a duplicate. If that's the case, we can close this one and continue the discussion in https://github.com/kubernetes/kubernetes/issues/91340
In any case, the scheduler needs more verbose logging options; without logs about what it is doing, it is impossible to debug these issues.
@zetaab +1, the scheduler could use major improvements to its current logging capabilities. It's an upgrade I've been meaning to tackle for a while, and I finally opened an issue for it here: https:
/assign
I am looking into this. A couple of questions that can help me narrow it down; I have not been able to reproduce it yet.
The nodes are created before the DaemonSets.
I assume we are using the default profile; which profile do you mean, and how can I check?
No extenders.
command:
- /usr/local/bin/kube-scheduler
- --address=127.0.0.1
- --kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig
- --profiling=false
- --v=1
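For context, running with only those flags (no --config file) means the scheduler builds a single default profile. A roughly equivalent component config for 1.18 (v1alpha2 API; the kubeconfig path is taken from the flags above, and verbosity would still be the -v flag) would look something like this sketch:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1alpha2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/kube-scheduler.kubeconfig
profiles:
  - schedulerName: default-scheduler
```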
Another thing that might have an effect: disk performance is not great for etcd, and etcd complains about slow operations.
Yes, those flags make the scheduler run with the default profile. I'll keep looking; I still cannot reproduce it.
Still nothing... any other features in use that you think could affect this? Taints, ports, other resources?
Did some experimenting with this. When the problem is present, pods can still be scheduled to the node (with nothing special defined, or using the "nodeName" selector).
If affinity/anti-affinity is used, the pod is not scheduled to the node.
Works while the problem is present:
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
run: nginx
name: nginx
spec:
nodeName: master-zone-3-1-1-test-cluster-k8s-local
containers:
- image: nginx
name: nginx
resources: {}
dnsPolicy: ClusterFirst
restartPolicy: Always
Does not work at the same time:
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
run: nginx
name: nginx
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- master-zone-3-1-1-test-cluster-k8s-local
containers:
- image: nginx
name: nginx
resources: {}
dnsPolicy: ClusterFirst
restartPolicy: Always
Also, even more interesting when describing the latter:
Warning FailedScheduling 4m37s (x17 over 26m) default-scheduler 0/9 nodes are available: 8 node(s) didn't match node selector.
Warning FailedScheduling 97s (x6 over 3m39s) default-scheduler 0/8 nodes are available: 8 node(s) didn't match node selector.
Warning FailedScheduling 53s default-scheduler 0/8 nodes are available: 8 node(s) didn't match node selector.
Warning FailedScheduling 7s (x5 over 32s) default-scheduler 0/9 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 7 node(s) didn't match node selector.
"nodeName" is not a selector. Using nodeName bypasses scheduling.
The fourth event appeared once the node came back up. The problem node is a master, so the pod does not go there (but it shows that the node was not accounted for in the 3 earlier events). The interesting thing about the fourth event is that information for one node is still missing: the event says 0/9 nodes are available, but only gives reasons for 8 of them.
Are you saying that the reason the pod should not be scheduled to the missing node is that it is a master?
We see "8 node(s) didn't match node selector" go down to 7. I assume no nodes were removed at that point, correct?
"nodeName is not a selector. Using nodeName will bypass scheduling."
The "nodeName" experiment was to highlight that the node is available and that a pod can land there if needed. So it is not that the node is unable to start pods.
"The fourth event appeared once the node came back up. The problem node is a master, so the pod does not go there (but it shows that the node was not accounted for in the 3 earlier events). The interesting thing about the fourth event is that information for one node is still missing: the event says 0/9 nodes are available, but only gives reasons for 8 of them."
"Are you saying that the reason the pod should not be scheduled to the missing node is that it is a master?"
"We see 8 node(s) didn't match node selector go down to 7. I assume no nodes were removed at that point, correct?"
The test cluster has 9 nodes: 3 masters and 6 workers. Before the non-working node started up successfully, the events accounted for all available nodes: 0/8 nodes are available: 8 node(s) didn't match node selector.
But once the node matching the node selector came up, the event said: 0/9 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 7 node(s) didn't match node selector.
That explanation accounts for 8 nodes (1 tainted + 7 not matching) but says nothing about the 9th, which was accounted for in the previous event.
So the event counts do not add up. In the end the test pod did not start on the matching node because of the taint, but that is a separate matter (and it should have shown up already in the first event).
"The nodeName experiment was to highlight that the node is available and that a pod can land there if needed. So it is not that the node is unable to start pods."
Note that nothing except the scheduler prevents over-committing a node, so that does not show much.
"In the end the test pod did not start on the matching node because of the taint, but that is a separate matter (and it should have shown up already in the first event)."
My question is: was the 9th node tainted from the start? I am trying to find either (1) reproducible steps to reach this state, or (2) where the bug is.
"My question is: was the 9th node tainted from the start? I am trying to find either (1) reproducible steps to reach this state, or (2) where the bug is."
Yes, in this case the taint was there the whole time, since the non-receiving node is a master. But we have seen the same issue on both masters and workers.
Still no idea where the problem comes from, but at least recreating the node, and restarting it, seem to resolve it. Those are rather "hard" ways to fix the problem though.
Long shot, but if you run into it again... could you check whether the pending pod has a nominated node that is not being shown?
Considering the possible scenarios, I am posting these questions:
* Do you have other master nodes in your cluster?
All clusters have 3 masters (so restarting them is easy).
* Do you have extenders?
No.
Noticed something interesting today: in one cluster, a master was not receiving its pod from a DaemonSet. We run ChaosMonkey, and it terminated one worker node. Interestingly, that allowed the pod to go to the master that had not been receiving it earlier. So removing some node other than the problem node also seems to resolve the issue.
Because of that "fix", I now have to wait for the issue to appear again before I can answer the question about nominated pods.
Now I am confused... does your DaemonSet tolerate the master taint? In other words: is the bug for you just the scheduling events, or the fact that the pod should have been scheduled?
The problem is that the scheduler does not find the node even though at least one matching affinity (or anti-affinity) is set.
That is why I said the taint error is expected, and should have been present already in the first event (taints are not part of the affinity criteria).
Got it. I was trying to confirm your setup to make sure I was not missing anything.
I do not think the scheduler "cannot see" the node. Given that we see 0/9 nodes are available, we can conclude that the node is indeed in the scheduler's cache. It is more that the unschedulable reason is getting lost somewhere, so we do not include it in the event.
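The symptom in this thread (the event reports 0/9 nodes total but gives reasons for only 8) is consistent with the message being assembled from a per-reason tally that one node never got added to. A purely hypothetical toy illustration (made-up function names, not the scheduler's actual code):

```python
# Hypothetical sketch of how a "0/N nodes are available: ..." message could
# be assembled from per-node failure reasons. If a filter fails silently for
# one node, that node never lands in the reason map, and the listed reasons
# stop adding up to the node total -- mirroring "0/9 nodes are available:
# 8 node(s) didn't match node selector."
from collections import Counter

def build_event_message(total_nodes, failure_reasons):
    # failure_reasons: {node_name: reason_string}
    tally = Counter(failure_reasons.values())
    details = ", ".join(
        f"{count} node(s) {reason}" for reason, count in sorted(tally.items())
    )
    return f"0/{total_nodes} nodes are available: {details}."

# 9 nodes, but one node's failure reason was lost somewhere:
reasons = {f"node-{i}": "didn't match node selector" for i in range(8)}
print(build_event_message(9, reasons))
# -> 0/9 nodes are available: 8 node(s) didn't match node selector.
```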
Yes, the total count always matches the actual number of nodes. The more descriptive event text is not given for every node, but as you mentioned, that may be a separate issue.
Can you check your kube-scheduler logs? Anything relevant in there?
I think @zetaab tried that without success. I can try when the issue happens again (along with the nominated-pods question asked earlier).
If possible, run 1.18.5, in case we fixed the issue inadvertently.
I can reproduce this reliably on a test cluster if you need more logs.
@dilyevsky please share the repro steps. Were you able to somehow identify which filter fails?
It seems to be just the node's metadata.name for the ds pod... very odd. Here is the pod yaml:
apiVersion: v1
kind: Pod
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: "2020-07-09T23:17:53Z"
generateName: cilium-
labels:
controller-revision-hash: 6c94db8bb8
k8s-app: cilium
pod-template-generation: "1"
managedFields:
# managed fields crap
name: cilium-d5n4f
namespace: kube-system
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: DaemonSet
name: cilium
uid: 0f00e8af-eb19-4985-a940-a02fa84fcbc5
resourceVersion: "2840"
selfLink: /api/v1/namespaces/kube-system/pods/cilium-d5n4f
uid: e3f7d566-ee5b-4557-8d1b-f0964cde2f22
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchFields:
- key: metadata.name
operator: In
values:
- us-central1-dilyevsky-master-qmwnl
containers:
- args:
- --config-dir=/tmp/cilium/config-map
command:
- cilium-agent
env:
- name: K8S_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CILIUM_K8S_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: CILIUM_FLANNEL_MASTER_DEVICE
valueFrom:
configMapKeyRef:
key: flannel-master-device
name: cilium-config
optional: true
- name: CILIUM_FLANNEL_UNINSTALL_ON_EXIT
valueFrom:
configMapKeyRef:
key: flannel-uninstall-on-exit
name: cilium-config
optional: true
- name: CILIUM_CLUSTERMESH_CONFIG
value: /var/lib/cilium/clustermesh/
- name: CILIUM_CNI_CHAINING_MODE
valueFrom:
configMapKeyRef:
key: cni-chaining-mode
name: cilium-config
optional: true
- name: CILIUM_CUSTOM_CNI_CONF
valueFrom:
configMapKeyRef:
key: custom-cni-conf
name: cilium-config
optional: true
image: docker.io/cilium/cilium:v1.7.6
imagePullPolicy: IfNotPresent
lifecycle:
postStart:
exec:
command:
- /cni-install.sh
- --enable-debug=false
preStop:
exec:
command:
- /cni-uninstall.sh
livenessProbe:
exec:
command:
- cilium
- status
- --brief
failureThreshold: 10
initialDelaySeconds: 120
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
name: cilium-agent
readinessProbe:
exec:
command:
- cilium
- status
- --brief
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
- SYS_MODULE
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/cilium
name: cilium-run
- mountPath: /host/opt/cni/bin
name: cni-path
- mountPath: /host/etc/cni/net.d
name: etc-cni-netd
- mountPath: /var/lib/cilium/clustermesh
name: clustermesh-secrets
readOnly: true
- mountPath: /tmp/cilium/config-map
name: cilium-config-path
readOnly: true
- mountPath: /lib/modules
name: lib-modules
readOnly: true
- mountPath: /run/xtables.lock
name: xtables-lock
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: cilium-token-j74lr
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostNetwork: true
initContainers:
- command:
- /init-container.sh
env:
- name: CILIUM_ALL_STATE
valueFrom:
configMapKeyRef:
key: clean-cilium-state
name: cilium-config
optional: true
- name: CILIUM_BPF_STATE
valueFrom:
configMapKeyRef:
key: clean-cilium-bpf-state
name: cilium-config
optional: true
- name: CILIUM_WAIT_BPF_MOUNT
valueFrom:
configMapKeyRef:
key: wait-bpf-mount
name: cilium-config
optional: true
image: docker.io/cilium/cilium:v1.7.6
imagePullPolicy: IfNotPresent
name: clean-cilium-state
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/cilium
name: cilium-run
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: cilium-token-j74lr
readOnly: true
priority: 2000001000
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: cilium
serviceAccountName: cilium
terminationGracePeriodSeconds: 1
tolerations:
- operator: Exists
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/disk-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/memory-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/pid-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/unschedulable
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/network-unavailable
operator: Exists
volumes:
- hostPath:
path: /var/run/cilium
type: DirectoryOrCreate
name: cilium-run
- hostPath:
path: /opt/cni/bin
type: DirectoryOrCreate
name: cni-path
- hostPath:
path: /etc/cni/net.d
type: DirectoryOrCreate
name: etc-cni-netd
- hostPath:
path: /lib/modules
type: ""
name: lib-modules
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: xtables-lock
- name: clustermesh-secrets
secret:
defaultMode: 420
optional: true
secretName: cilium-clustermesh
- configMap:
defaultMode: 420
name: cilium-config
name: cilium-config-path
- name: cilium-token-j74lr
secret:
defaultMode: 420
secretName: cilium-token-j74lr
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2020-07-09T23:17:53Z"
message: '0/6 nodes are available: 5 node(s) didn''t match node selector.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: BestEffort
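Note that the generated affinity above uses matchFields (on metadata.name, which is the only field key Kubernetes accepts for nodes) rather than matchExpressions. The OR/AND matching is conceptually the same, just evaluated against a one-entry field map. A simplified, hypothetical sketch:

```python
# Toy model of matchFields matching for nodes. For node objects,
# metadata.name is the only supported matchFields key, and this sketch
# limits operators to In/NotIn. Illustrative only -- not kube-scheduler's
# real implementation.
def field_expr_matches(expr, fields):
    op = expr["operator"]
    if op == "In":
        return fields.get(expr["key"]) in expr["values"]
    if op == "NotIn":
        return fields.get(expr["key"]) not in expr["values"]
    raise ValueError(f"unsupported matchFields operator: {op}")

def term_matches_fields(term, node_name):
    fields = {"metadata.name": node_name}
    return all(field_expr_matches(e, fields) for e in term.get("matchFields", []))

# The affinity generated for the cilium DaemonSet pod above:
term = {"matchFields": [{
    "key": "metadata.name",
    "operator": "In",
    "values": ["us-central1-dilyevsky-master-qmwnl"],
}]}
print(term_matches_fields(term, "us-central1-dilyevsky-master-qmwnl"))  # True
print(term_matches_fields(term, "some-other-node"))  # False
```

Exactly one node should pass this filter, so the "5 node(s) didn't match node selector" message out of 6 nodes again leaves one node unaccounted for.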
The way I reproduce this is by spinning up a new cluster with 3 masters and 3 worker nodes (using Cluster API) and applying Cilium 1.7.6:
Cilium yaml:
---
# Source: cilium/charts/agent/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: cilium
namespace: kube-system
---
# Source: cilium/charts/operator/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: cilium-operator
namespace: kube-system
---
# Source: cilium/charts/config/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: cilium-config
namespace: kube-system
data:
# Identity allocation mode selects how identities are shared between cilium
# nodes by setting how they are stored. The options are "crd" or "kvstore".
# - "crd" stores identities in kubernetes as CRDs (custom resource definition).
# These can be queried with:
# kubectl get ciliumid
# - "kvstore" stores identities in a kvstore, etcd or consul, that is
# configured below. Cilium versions before 1.6 supported only the kvstore
# backend. Upgrades from these older cilium versions should continue using
# the kvstore by commenting out the identity-allocation-mode below, or
# setting it to "kvstore".
identity-allocation-mode: crd
# If you want to run cilium in debug mode change this value to true
debug: "false"
# Enable IPv4 addressing. If enabled, all endpoints are allocated an IPv4
# address.
enable-ipv4: "true"
# Enable IPv6 addressing. If enabled, all endpoints are allocated an IPv6
# address.
enable-ipv6: "false"
# If you want cilium monitor to aggregate tracing for packets, set this level
# to "low", "medium", or "maximum". The higher the level, the less packets
# that will be seen in monitor output.
monitor-aggregation: medium
# The monitor aggregation interval governs the typical time between monitor
# notification events for each allowed connection.
#
# Only effective when monitor aggregation is set to "medium" or higher.
monitor-aggregation-interval: 5s
# The monitor aggregation flags determine which TCP flags which, upon the
# first observation, cause monitor notifications to be generated.
#
# Only effective when monitor aggregation is set to "medium" or higher.
monitor-aggregation-flags: all
# ct-global-max-entries-* specifies the maximum number of connections
# supported across all endpoints, split by protocol: tcp or other. One pair
# of maps uses these values for IPv4 connections, and another pair of maps
# use these values for IPv6 connections.
#
# If these values are modified, then during the next Cilium startup the
# tracking of ongoing connections may be disrupted. This may lead to brief
# policy drops or a change in loadbalancing decisions for a connection.
#
# For users upgrading from Cilium 1.2 or earlier, to minimize disruption
# during the upgrade process, comment out these options.
bpf-ct-global-tcp-max: "524288"
bpf-ct-global-any-max: "262144"
# bpf-policy-map-max specified the maximum number of entries in endpoint
# policy map (per endpoint)
bpf-policy-map-max: "16384"
# Pre-allocation of map entries allows per-packet latency to be reduced, at
# the expense of up-front memory allocation for the entries in the maps. The
# default value below will minimize memory usage in the default installation;
# users who are sensitive to latency may consider setting this to "true".
#
# This option was introduced in Cilium 1.4. Cilium 1.3 and earlier ignore
# this option and behave as though it is set to "true".
#
# If this value is modified, then during the next Cilium startup the restore
# of existing endpoints and tracking of ongoing connections may be disrupted.
# This may lead to policy drops or a change in loadbalancing decisions for a
# connection for some time. Endpoints may need to be recreated to restore
# connectivity.
#
# If this option is set to "false" during an upgrade from 1.3 or earlier to
# 1.4 or later, then it may cause one-time disruptions during the upgrade.
preallocate-bpf-maps: "false"
# Regular expression matching compatible Istio sidecar istio-proxy
# container image names
sidecar-istio-proxy-image: "cilium/istio_proxy"
# Encapsulation mode for communication between nodes
# Possible values:
# - disabled
# - vxlan (default)
# - geneve
tunnel: vxlan
# Name of the cluster. Only relevant when building a mesh of clusters.
cluster-name: default
# DNS Polling periodically issues a DNS lookup for each `matchName` from
# cilium-agent. The result is used to regenerate endpoint policy.
# DNS lookups are repeated with an interval of 5 seconds, and are made for
# A(IPv4) and AAAA(IPv6) addresses. Should a lookup fail, the most recent IP
# data is used instead. An IP change will trigger a regeneration of the Cilium
# policy for each endpoint and increment the per cilium-agent policy
# repository revision.
#
# This option is disabled by default starting from version 1.4.x in favor
# of a more powerful DNS proxy-based implementation, see [0] for details.
# Enable this option if you want to use FQDN policies but do not want to use
# the DNS proxy.
#
# To ease upgrade, users may opt to set this option to "true".
# Otherwise please refer to the Upgrade Guide [1] which explains how to
# prepare policy rules for upgrade.
#
# [0] http://docs.cilium.io/en/stable/policy/language/#dns-based
# [1] http://docs.cilium.io/en/stable/install/upgrade/#changes-that-may-require-action
tofqdns-enable-poller: "false"
# wait-bpf-mount makes init container wait until bpf filesystem is mounted
wait-bpf-mount: "false"
masquerade: "true"
enable-xt-socket-fallback: "true"
install-iptables-rules: "true"
auto-direct-node-routes: "false"
kube-proxy-replacement: "probe"
enable-host-reachable-services: "false"
enable-external-ips: "false"
enable-node-port: "false"
node-port-bind-protection: "true"
enable-auto-protect-node-port-range: "true"
enable-endpoint-health-checking: "true"
enable-well-known-identities: "false"
enable-remote-node-identity: "true"
---
# Source: cilium/charts/agent/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cilium
rules:
- apiGroups:
- networking.k8s.io
resources:
- networkpolicies
verbs:
- get
- list
- watch
- apiGroups:
- discovery.k8s.io
resources:
- endpointslices
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- namespaces
- services
- nodes
- endpoints
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- pods
- nodes
verbs:
- get
- list
- watch
- update
- apiGroups:
- ""
resources:
- nodes
- nodes/status
verbs:
- patch
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- create
- get
- list
- watch
- update
- apiGroups:
- cilium.io
resources:
- ciliumnetworkpolicies
- ciliumnetworkpolicies/status
- ciliumclusterwidenetworkpolicies
- ciliumclusterwidenetworkpolicies/status
- ciliumendpoints
- ciliumendpoints/status
- ciliumnodes
- ciliumnodes/status
- ciliumidentities
- ciliumidentities/status
verbs:
- '*'
---
# Source: cilium/charts/operator/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cilium-operator
rules:
- apiGroups:
- ""
resources:
# to automatically delete [core|kube]dns pods so that are starting to being
# managed by Cilium
- pods
verbs:
- get
- list
- watch
- delete
- apiGroups:
- discovery.k8s.io
resources:
- endpointslices
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
# to automatically read from k8s and import the node's pod CIDR to cilium's
# etcd so all nodes know how to reach another pod running in in a different
# node.
- nodes
# to perform the translation of a CNP that contains `ToGroup` to its endpoints
- services
- endpoints
# to check apiserver connectivity
- namespaces
verbs:
- get
- list
- watch
- apiGroups:
- cilium.io
resources:
- ciliumnetworkpolicies
- ciliumnetworkpolicies/status
- ciliumclusterwidenetworkpolicies
- ciliumclusterwidenetworkpolicies/status
- ciliumendpoints
- ciliumendpoints/status
- ciliumnodes
- ciliumnodes/status
- ciliumidentities
- ciliumidentities/status
verbs:
- '*'
---
# Source: cilium/charts/agent/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: cilium
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cilium
subjects:
- kind: ServiceAccount
name: cilium
namespace: kube-system
---
# Source: cilium/charts/operator/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: cilium-operator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cilium-operator
subjects:
- kind: ServiceAccount
name: cilium-operator
namespace: kube-system
---
# Source: cilium/charts/agent/templates/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
k8s-app: cilium
name: cilium
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: cilium
template:
metadata:
annotations:
# This annotation plus the CriticalAddonsOnly toleration makes
# cilium to be a critical pod in the cluster, which ensures cilium
# gets priority scheduling.
# https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
scheduler.alpha.kubernetes.io/critical-pod: ""
labels:
k8s-app: cilium
spec:
containers:
- args:
- --config-dir=/tmp/cilium/config-map
command:
- cilium-agent
livenessProbe:
exec:
command:
- cilium
- status
- --brief
failureThreshold: 10
# The initial delay for the liveness probe is intentionally large to
# avoid an endless kill & restart cycle if in the event that the initial
# bootstrapping takes longer than expected.
initialDelaySeconds: 120
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
readinessProbe:
exec:
command:
- cilium
- status
- --brief
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
env:
- name: K8S_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CILIUM_K8S_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: CILIUM_FLANNEL_MASTER_DEVICE
valueFrom:
configMapKeyRef:
key: flannel-master-device
name: cilium-config
optional: true
- name: CILIUM_FLANNEL_UNINSTALL_ON_EXIT
valueFrom:
configMapKeyRef:
key: flannel-uninstall-on-exit
name: cilium-config
optional: true
- name: CILIUM_CLUSTERMESH_CONFIG
value: /var/lib/cilium/clustermesh/
- name: CILIUM_CNI_CHAINING_MODE
valueFrom:
configMapKeyRef:
key: cni-chaining-mode
name: cilium-config
optional: true
- name: CILIUM_CUSTOM_CNI_CONF
valueFrom:
configMapKeyRef:
key: custom-cni-conf
name: cilium-config
optional: true
image: "docker.io/cilium/cilium:v1.7.6"
imagePullPolicy: IfNotPresent
lifecycle:
postStart:
exec:
command:
- "/cni-install.sh"
- "--enable-debug=false"
preStop:
exec:
command:
- /cni-uninstall.sh
name: cilium-agent
securityContext:
capabilities:
add:
- NET_ADMIN
- SYS_MODULE
privileged: true
volumeMounts:
- mountPath: /var/run/cilium
name: cilium-run
- mountPath: /host/opt/cni/bin
name: cni-path
- mountPath: /host/etc/cni/net.d
name: etc-cni-netd
- mountPath: /var/lib/cilium/clustermesh
name: clustermesh-secrets
readOnly: true
- mountPath: /tmp/cilium/config-map
name: cilium-config-path
readOnly: true
# Needed to be able to load kernel modules
- mountPath: /lib/modules
name: lib-modules
readOnly: true
- mountPath: /run/xtables.lock
name: xtables-lock
hostNetwork: true
initContainers:
- command:
- /init-container.sh
env:
- name: CILIUM_ALL_STATE
valueFrom:
configMapKeyRef:
key: clean-cilium-state
name: cilium-config
optional: true
- name: CILIUM_BPF_STATE
valueFrom:
configMapKeyRef:
key: clean-cilium-bpf-state
name: cilium-config
optional: true
- name: CILIUM_WAIT_BPF_MOUNT
valueFrom:
configMapKeyRef:
key: wait-bpf-mount
name: cilium-config
optional: true
image: "docker.io/cilium/cilium:v1.7.6"
imagePullPolicy: IfNotPresent
name: clean-cilium-state
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true
volumeMounts:
- mountPath: /var/run/cilium
name: cilium-run
restartPolicy: Always
priorityClassName: system-node-critical
serviceAccount: cilium
serviceAccountName: cilium
terminationGracePeriodSeconds: 1
tolerations:
- operator: Exists
volumes:
# To keep state between restarts / upgrades
- hostPath:
path: /var/run/cilium
type: DirectoryOrCreate
name: cilium-run
# To install cilium cni plugin in the host
- hostPath:
path: /opt/cni/bin
type: DirectoryOrCreate
name: cni-path
# To install cilium cni configuration in the host
- hostPath:
path: /etc/cni/net.d
type: DirectoryOrCreate
name: etc-cni-netd
# To be able to load kernel modules
- hostPath:
path: /lib/modules
name: lib-modules
# To access iptables concurrently with other processes (e.g. kube-proxy)
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: xtables-lock
# To read the clustermesh configuration
- name: clustermesh-secrets
secret:
defaultMode: 420
optional: true
secretName: cilium-clustermesh
# To read the configuration from the config map
- configMap:
name: cilium-config
name: cilium-config-path
updateStrategy:
rollingUpdate:
maxUnavailable: 2
type: RollingUpdate
---
# Source: cilium/charts/operator/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
io.cilium/app: operator
name: cilium-operator
name: cilium-operator
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
io.cilium/app: operator
name: cilium-operator
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
annotations:
labels:
io.cilium/app: operator
name: cilium-operator
spec:
containers:
- args:
- --debug=$(CILIUM_DEBUG)
- --identity-allocation-mode=$(CILIUM_IDENTITY_ALLOCATION_MODE)
- --synchronize-k8s-nodes=true
command:
- cilium-operator
env:
- name: CILIUM_K8S_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: K8S_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CILIUM_DEBUG
valueFrom:
configMapKeyRef:
key: debug
name: cilium-config
optional: true
- name: CILIUM_CLUSTER_NAME
valueFrom:
configMapKeyRef:
key: cluster-name
name: cilium-config
optional: true
- name: CILIUM_CLUSTER_ID
valueFrom:
configMapKeyRef:
key: cluster-id
name: cilium-config
optional: true
- name: CILIUM_IPAM
valueFrom:
configMapKeyRef:
key: ipam
name: cilium-config
optional: true
- name: CILIUM_DISABLE_ENDPOINT_CRD
valueFrom:
configMapKeyRef:
key: disable-endpoint-crd
name: cilium-config
optional: true
- name: CILIUM_KVSTORE
valueFrom:
configMapKeyRef:
key: kvstore
name: cilium-config
optional: true
- name: CILIUM_KVSTORE_OPT
valueFrom:
configMapKeyRef:
key: kvstore-opt
name: cilium-config
optional: true
- name: AWS_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
key: AWS_ACCESS_KEY_ID
name: cilium-aws
optional: true
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
key: AWS_SECRET_ACCESS_KEY
name: cilium-aws
optional: true
- name: AWS_DEFAULT_REGION
valueFrom:
secretKeyRef:
key: AWS_DEFAULT_REGION
name: cilium-aws
optional: true
- name: CILIUM_IDENTITY_ALLOCATION_MODE
valueFrom:
configMapKeyRef:
key: identity-allocation-mode
name: cilium-config
optional: true
image: "docker.io/cilium/operator:v1.7.6"
imagePullPolicy: IfNotPresent
name: cilium-operator
livenessProbe:
httpGet:
host: '127.0.0.1'
path: /healthz
port: 9234
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 3
hostNetwork: true
restartPolicy: Always
serviceAccount: cilium-operator
serviceAccountName: cilium-operator
Here is the scheduler log:
I0709 23:08:22.055830 1 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0709 23:08:22.056081 1 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0709 23:08:23.137451 1 serving.go:313] Generated self-signed cert in-memory
W0709 23:08:33.843509 1 authentication.go:297] Error looking up in-cluster authentication configuration: etcdserver: request timed out
W0709 23:08:33.843671 1 authentication.go:298] Continuing without authentication configuration. This may treat all requests as anonymous.
W0709 23:08:33.843710 1 authentication.go:299] To require authentication configuration lookup to succeed, set --authentication-tolerate-lookup-failure=false
I0709 23:08:33.911805 1 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0709 23:08:33.911989 1 registry.go:150] Registering EvenPodsSpread predicate and priority function
W0709 23:08:33.917999 1 authorization.go:47] Authorization is disabled
W0709 23:08:33.918162 1 authentication.go:40] Authentication is disabled
I0709 23:08:33.918238 1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I0709 23:08:33.925860 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0709 23:08:33.926013 1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0709 23:08:33.930685 1 secure_serving.go:178] Serving securely on 127.0.0.1:10259
I0709 23:08:33.936198 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0709 23:08:34.026382 1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0709 23:08:34.036998 1 leaderelection.go:242] attempting to acquire leader lease kube-system/kube-scheduler...
I0709 23:08:50.597201 1 leaderelection.go:252] successfully acquired lease kube-system/kube-scheduler
E0709 23:08:50.658551 1 factory.go:503] pod: kube-system/coredns-66bff467f8-9rjvd is already present in the active queue
E0709 23:12:27.673854 1 factory.go:503] pod kube-system/cilium-vv466 is already present in the backoff queue
E0709 23:12:58.099432 1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed
After restarting the scheduler container, the pending pods get scheduled immediately.
What pod events do you get? Do you know whether the nodes it is not
being scheduled to have any taints? Does it fail only for master nodes, or for
other nodes too? Is there enough space on the nodes?
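For the taint check in particular, here is a self-contained sketch: the heredoc stands in for `kubectl get node <name> -o yaml` output (the node name and taint are examples, not from this cluster), and the awk line lists each taint's key and effect.

```shell
# Stand-in for `kubectl get node <name> -o yaml`; with a real cluster you
# would pipe kubectl output instead of this example heredoc.
cat > /tmp/node.yaml <<'EOF'
spec:
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
EOF
# Print "key effect" for every taint entry.
awk '/- effect:/{e=$3} /key:/{print $2, e}' /tmp/node.yaml
# -> node-role.kubernetes.io/master NoSchedule
```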
On Thu, Jul 9, 2020 at 7:49 PM dilyevsky, notifications@github.com
wrote:
It seems to be just the metadata.name of the node for the ds pods...
Weird. Here is the pod yaml:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: "2020-07-09T23:17:53Z"
  generateName: cilium-
  labels:
    controller-revision-hash: 6c94db8bb8
    k8s-app: cilium
    pod-template-generation: "1"
  managedFields:
  # managed fields crap
  name: cilium-d5n4f
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: cilium
    uid: 0f00e8af-eb19-4985-a940-a02fa84fcbc5
  resourceVersion: "2840"
  selfLink: /api/v1/namespaces/kube-system/pods/cilium-d5n4f
  uid: e3f7d566-ee5b-4557-8d1b-f0964cde2f22
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - us-central1-dilyevsky-master-qmwnl
  containers:
  - args:
    - --config-dir=/tmp/cilium/config-map
    command:
    - cilium-agent
    env:
    - name: K8S_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: CILIUM_K8S_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: CILIUM_FLANNEL_MASTER_DEVICE
      valueFrom:
        configMapKeyRef:
          key: flannel-master-device
          name: cilium-config
          optional: true
    - name: CILIUM_FLANNEL_UNINSTALL_ON_EXIT
      valueFrom:
        configMapKeyRef:
          key: flannel-uninstall-on-exit
          name: cilium-config
          optional: true
    - name: CILIUM_CLUSTERMESH_CONFIG
      value: /var/lib/cilium/clustermesh/
    - name: CILIUM_CNI_CHAINING_MODE
      valueFrom:
        configMapKeyRef:
          key: cni-chaining-mode
          name: cilium-config
          optional: true
    - name: CILIUM_CUSTOM_CNI_CONF
      valueFrom:
        configMapKeyRef:
          key: custom-cni-conf
          name: cilium-config
          optional: true
    image: docker.io/cilium/cilium:v1.7.6
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command:
          - /cni-install.sh
          - --enable-debug=false
      preStop:
        exec:
          command:
          - /cni-uninstall.sh
    livenessProbe:
      exec:
        command:
        - cilium
        - status
        - --brief
      failureThreshold: 10
      initialDelaySeconds: 120
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 5
    name: cilium-agent
    readinessProbe:
      exec:
        command:
        - cilium
        - status
        - --brief
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 5
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - SYS_MODULE
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/cilium
      name: cilium-run
    - mountPath: /host/opt/cni/bin
      name: cni-path
    - mountPath: /host/etc/cni/net.d
      name: etc-cni-netd
    - mountPath: /var/lib/cilium/clustermesh
      name: clustermesh-secrets
      readOnly: true
    - mountPath: /tmp/cilium/config-map
      name: cilium-config-path
      readOnly: true
    - mountPath: /lib/modules
      name: lib-modules
      readOnly: true
    - mountPath: /run/xtables.lock
      name: xtables-lock
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: cilium-token-j74lr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  initContainers:
  - command:
    - /init-container.sh
    env:
    - name: CILIUM_ALL_STATE
      valueFrom:
        configMapKeyRef:
          key: clean-cilium-state
          name: cilium-config
          optional: true
    - name: CILIUM_BPF_STATE
      valueFrom:
        configMapKeyRef:
          key: clean-cilium-bpf-state
          name: cilium-config
          optional: true
    - name: CILIUM_WAIT_BPF_MOUNT
      valueFrom:
        configMapKeyRef:
          key: wait-bpf-mount
          name: cilium-config
          optional: true
    image: docker.io/cilium/cilium:v1.7.6
    imagePullPolicy: IfNotPresent
    name: clean-cilium-state
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/cilium
      name: cilium-run
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: cilium-token-j74lr
      readOnly: true
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: cilium
  serviceAccountName: cilium
  terminationGracePeriodSeconds: 1
  tolerations:
  - operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - hostPath:
      path: /var/run/cilium
      type: DirectoryOrCreate
    name: cilium-run
  - hostPath:
      path: /opt/cni/bin
      type: DirectoryOrCreate
    name: cni-path
  - hostPath:
      path: /etc/cni/net.d
      type: DirectoryOrCreate
    name: etc-cni-netd
  - hostPath:
      path: /lib/modules
      type: ""
    name: lib-modules
  - hostPath:
      path: /run/xtables.lock
      type: FileOrCreate
    name: xtables-lock
  - name: clustermesh-secrets
    secret:
      defaultMode: 420
      optional: true
      secretName: cilium-clustermesh
  - configMap:
      defaultMode: 420
      name: cilium-config
    name: cilium-config-path
  - name: cilium-token-j74lr
    secret:
      defaultMode: 420
      secretName: cilium-token-j74lr
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-07-09T23:17:53Z"
    message: "0/6 nodes are available: 5 node(s) didn't match node selector."
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
The way I reproduce this is by spinning up a new cluster with 2 master
nodes, 3 worker nodes (using cluster API) and applying Cilium 1.7.6:
---
# Source: cilium/charts/agent/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cilium
  namespace: kube-system
---
# Source: cilium/charts/operator/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cilium-operator
  namespace: kube-system
---
# Source: cilium/charts/config/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Identity allocation mode selects how identities are shared between cilium
  # nodes by setting how they are stored. The options are "crd" or "kvstore".
  # - "crd" stores identities in kubernetes as CRDs (custom resource definition).
  #   These can be queried with:
  #     kubectl get ciliumid
  # - "kvstore" stores identities in a kvstore, etcd or consul, that is
  #   configured below. Cilium versions before 1.6 supported only the kvstore
  #   backend. Upgrades from these older cilium versions should continue using
  #   the kvstore by commenting out the identity-allocation-mode below, or
  #   setting it to "kvstore".
  identity-allocation-mode: crd

  # If you want to run cilium in debug mode change this value to true
  debug: "false"

  # Enable IPv4 addressing. If enabled, all endpoints are allocated an IPv4
  # address.
  enable-ipv4: "true"

  # Enable IPv6 addressing. If enabled, all endpoints are allocated an IPv6
  # address.
  enable-ipv6: "false"

  # If you want cilium monitor to aggregate tracing for packets, set this
  # level to "low", "medium", or "maximum". The higher the level, the fewer
  # packets will be seen in monitor output.
  monitor-aggregation: medium

  # The monitor aggregation interval governs the typical time between monitor
  # notification events for each allowed connection.
  #
  # Only effective when monitor aggregation is set to "medium" or higher.
  monitor-aggregation-interval: 5s

  # The monitor aggregation flags determine which TCP flags, upon the first
  # observation, cause monitor notifications to be generated.
  #
  # Only effective when monitor aggregation is set to "medium" or higher.
  monitor-aggregation-flags: all

  # ct-global-max-entries-* specifies the maximum number of connections
  # supported across all endpoints, split by protocol: tcp or other. One pair
  # of maps uses these values for IPv4 connections, and another pair of maps
  # uses these values for IPv6 connections.
  #
  # If these values are modified, then during the next Cilium startup the
  # tracking of ongoing connections may be disrupted. This may lead to brief
  # policy drops or a change in loadbalancing decisions for a connection.
  #
  # For users upgrading from Cilium 1.2 or earlier, to minimize disruption
  # during the upgrade process, comment out these options.
  bpf-ct-global-tcp-max: "524288"
  bpf-ct-global-any-max: "262144"

  # bpf-policy-map-max specifies the maximum number of entries in the endpoint
  # policy map (per endpoint)
  bpf-policy-map-max: "16384"

  # Pre-allocation of map entries allows per-packet latency to be reduced, at
  # the expense of up-front memory allocation for the entries in the maps. The
  # default value below will minimize memory usage in the default installation;
  # users who are sensitive to latency may consider setting this to "true".
  #
  # This option was introduced in Cilium 1.4. Cilium 1.3 and earlier ignore
  # this option and behave as though it is set to "true".
  #
  # If this value is modified, then during the next Cilium startup the restore
  # of existing endpoints and tracking of ongoing connections may be disrupted.
  # This may lead to policy drops or a change in loadbalancing decisions for a
  # connection for some time. Endpoints may need to be recreated to restore
  # connectivity.
  #
  # If this option is set to "false" during an upgrade from 1.3 or earlier to
  # 1.4 or later, then it may cause one-time disruptions during the upgrade.
  preallocate-bpf-maps: "false"

  # Regular expression matching compatible Istio sidecar istio-proxy
  # container image names
  sidecar-istio-proxy-image: "cilium/istio_proxy"

  # Encapsulation mode for communication between nodes
  # Possible values:
  #   - disabled
  #   - vxlan (default)
  #   - geneve
  tunnel: vxlan

  # Name of the cluster. Only relevant when building a mesh of clusters.
  cluster-name: default

  # DNS polling periodically issues a DNS lookup for each matchName from
  # cilium-agent. The result is used to regenerate endpoint policy.
  # DNS lookups are repeated with an interval of 5 seconds, and are made for
  # A (IPv4) and AAAA (IPv6) addresses. Should a lookup fail, the most recent
  # IP data is used instead. An IP change will trigger a regeneration of the
  # Cilium policy for each endpoint and increment the per cilium-agent policy
  # repository revision.
  #
  # This option is disabled by default starting from version 1.4.x in favor
  # of a more powerful DNS proxy-based implementation; see [0] for details.
  # Enable this option if you want to use FQDN policies but do not want to use
  # the DNS proxy.
  #
  # To ease upgrade, users may opt to set this option to "true".
  # Otherwise please refer to the Upgrade Guide [1] which explains how to
  # prepare policy rules for the upgrade.
  #
  # [0] http://docs.cilium.io/en/stable/policy/language/#dns-based
  # [1] http://docs.cilium.io/en/stable/install/upgrade/#changes-that-may-require-action
  tofqdns-enable-poller: "false"

  # wait-bpf-mount makes the init container wait until the bpf filesystem is mounted
  wait-bpf-mount: "false"

  masquerade: "true"
  enable-xt-socket-fallback: "true"
  install-iptables-rules: "true"
  auto-direct-node-routes: "false"
  kube-proxy-replacement: "probe"
  enable-host-reachable-services: "false"
  enable-external-ips: "false"
  enable-node-port: "false"
  node-port-bind-protection: "true"
  enable-auto-protect-node-port-range: "true"
  enable-endpoint-health-checking: "true"
  enable-well-known-identities: "false"
  enable-remote-node-identity: "true"
---
# Source: cilium/charts/agent/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium
rules:
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  - services
  - nodes
  - endpoints
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/status
  verbs:
  - patch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
  - get
  - list
  - watch
  - update
- apiGroups:
  - cilium.io
  resources:
  - ciliumnetworkpolicies
  - ciliumnetworkpolicies/status
  - ciliumclusterwidenetworkpolicies
  - ciliumclusterwidenetworkpolicies/status
  - ciliumendpoints
  - ciliumendpoints/status
  - ciliumnodes
  - ciliumnodes/status
  - ciliumidentities
  - ciliumidentities/status
  verbs:
  - '*'
---
# Source: cilium/charts/operator/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium-operator
rules:
- apiGroups:
  - ""
  resources:
  # to automatically delete [core|kube]dns pods so that they start being
  # managed by Cilium
  - pods
  verbs:
  - get
  - list
  - watch
  - delete
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  # to automatically read from k8s and import the node's pod CIDR to cilium's
  # etcd so all nodes know how to reach another pod running on a different
  # node.
  - nodes
  # to perform the translation of a CNP that contains ToGroup to its endpoints
  - services
  - endpoints
  # to check apiserver connectivity
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - cilium.io
  resources:
  - ciliumnetworkpolicies
  - ciliumnetworkpolicies/status
  - ciliumclusterwidenetworkpolicies
  - ciliumclusterwidenetworkpolicies/status
  - ciliumendpoints
  - ciliumendpoints/status
  - ciliumnodes
  - ciliumnodes/status
  - ciliumidentities
  - ciliumidentities/status
  verbs:
  - '*'
---
# Source: cilium/charts/agent/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cilium
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cilium
subjects:
- kind: ServiceAccount
  name: cilium
  namespace: kube-system
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
https://github.com/kubernetes/kubernetes/issues/91601#issuecomment-656404841,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAJ5E6BMTNCADT5K7D4PMF3R2ZJRVANCNFSM4NOTPEDA.
Can you try increasing the log level and filtering for the node
or the pod using grep?
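As an illustration of the grep suggestion, a self-contained sketch; the log file is a stand-in built from lines quoted earlier in this thread, and with a live cluster you would pipe the scheduler logs instead:

```shell
# Stand-in scheduler log, mirroring lines quoted above in this thread.
cat > /tmp/sched.log <<'EOF'
E0709 23:08:50.658551 1 factory.go:503] pod: kube-system/coredns-66bff467f8-9rjvd is already present in the active queue
E0709 23:12:27.673854 1 factory.go:503] pod kube-system/cilium-vv466 is already present in the backoff queue
EOF
# Filter for a single pod; on a cluster this would be something like
# `kubectl -n kube-system logs <kube-scheduler-pod> | grep <pod-name>`.
grep 'cilium-vv466' /tmp/sched.log
```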
These are the events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling
Warning FailedScheduling
The node only has two taints but the pod tolerates all existing taints and yeah it seems to only happen on masters:
Taints: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
There is enough space and pod is best effort with no reservation anyway:
```
Resource                   Requests    Limits
--------                   --------    ------
cpu                        650m (32%)  0 (0%)
memory                     70Mi (0%)   170Mi (2%)
ephemeral-storage          0 (0%)      0 (0%)
hugepages-1Gi              0 (0%)      0 (0%)
hugepages-2Mi              0 (0%)      0 (0%)
attachable-volumes-gce-pd  0           0
```
I'm now trying to increase the scheduler log level...
Your pod yaml doesn't actually have a node-role.kubernetes.io/master
toleration, so it shouldn't be scheduled on the masters.
Hi! We are hitting the same problem. But we are also hitting it with deployments, where we use anti-affinity to ensure pods are scheduled on every node, or a node selector targeting a specific node.
Just creating a pod with a node selector set to match the hostname of the failing node is enough to cause the scheduling failure. It says 5 nodes didn't match the selector, but nothing about the 6th. Restarting the scheduler resolves the problem. It looks like something about that node is cached and prevents scheduling on it.
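The telling detail in that message is the arithmetic gap. A small sketch of the check, with the message text taken from the status quoted in this thread:

```shell
# FailedScheduling message as reported by the scheduler.
msg="0/6 nodes are available: 5 node(s) didn't match node selector."
# Total nodes considered (the N in "0/N nodes are available").
total=$(echo "$msg" | sed -E 's|0/([0-9]+) nodes.*|\1|')
# Nodes the scheduler actually gave a reason for.
reported=$(echo "$msg" | sed -E 's|.*: ([0-9]+) node.*|\1|')
# The difference is the node(s) the scheduler silently skipped.
echo "unaccounted: $((total - reported))"
# -> unaccounted: 1
```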
As others said before, there is nothing in the logs about the failure.
We reduced the failing deployment to a minimum (we had removed the taint on the failing master node):
apiVersion: apps/v1
kind: Deployment
metadata:
name: test-deployment
labels:
app: nginx
spec:
replicas: 1
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.14.2
restartPolicy: Always
schedulerName: default-scheduler
nodeSelector:
kubernetes.io/hostname: master-2
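Per the earlier observation, the Deployment wrapper is not even needed; a bare Pod with only the nodeSelector should reproduce it just as well. This is a sketch, not a manifest from the issue (the pod name is hypothetical; the image and hostname are the ones from the Deployment above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selector-test   # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
  nodeSelector:
    kubernetes.io/hostname: master-2
```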
We also hit the same problem when the master had a taint and the deployment had a matching toleration. So it doesn't seem specifically related to daemonsets, tolerations, or affinity/anti-affinity. Once the failure starts happening, nothing targeting the specific node can be scheduled. We saw the problem from 1.18.2 through 1.18.5 (we didn't try 1.18.0 or .1).
Just creating a pod with a node selector set to match the hostname of the failing node is enough to cause the scheduling failure
Can you clarify whether it starts failing after or before creating such a pod? I assume this node doesn't have any taints that the pod doesn't tolerate.
@nodo it would help to reproduce this. Could you take a look at the nodeSelector code? You might need to add extra log lines while testing. You can also print the cache:
$ pidof kube-scheduler
$ sudo kill -SIGUSR2 <pid>
Note that this does not kill the scheduler process.
/priority critical-urgent
/unassign
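Once the dump lands in the scheduler log, the scheduling-queue section can be summarized per pod. A self-contained sketch: the heredoc stands in for the scheduler's dump output, and the field layout follows the dumper lines quoted in this thread.

```shell
# Stand-in for the "Dump of scheduling queue:" section that the SIGUSR2
# dump writes to the scheduler log.
cat > /tmp/dump.log <<'EOF'
I0716 14:47:59.472623 1 dumper.go:60] Dump of scheduling queue:
name: coredns-cd64c8d7c-29zjq, namespace: kube-system, uid: 938e8827-5d17-4db9-ac04-d229baf4534a, phase: Pending, nominated node:
name: test-deployment-558f47bbbb-4rt5t, namespace: default, uid: fa19fda9-c8d6-4ffe-b248-8ddd24ed5310, phase: Pending, nominated node:
EOF
# List the queued pods as namespace/name.
sed -nE 's|^name: ([^,]+), namespace: ([^,]+),.*|\2/\1|p' /tmp/dump.log
```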
We had already seen some daemonsets and a deployment stuck in Pending before trying to deploy this test deployment, so it was already failing; and the taint had been removed from the node.
We have now lost the environment where this was happening because we had to restart the nodes, so the problem is no longer visible. We'll try to come back with more information once we have reproduced it.
Please do. I tried reproducing this without success. I'm more interested in the first instance of the failure. It may still be related to the taint.
We have reproduced the issue. I ran the command you asked for; here is the information:
I0716 14:47:52.768362 1 factory.go:462] Unable to schedule default/test-deployment-558f47bbbb-4rt5t: no fit: 0/6 nodes are available: 5 node(s) didn't match node selector.; waiting
I0716 14:47:52.768683 1 scheduler.go:776] Updating pod condition for default/test-deployment-558f47bbbb-4rt5t to (PodScheduled==False, Reason=Unschedulable)
I0716 14:47:53.018781 1 httplog.go:90] verb="GET" URI="/healthz" latency=299.172µs resp=200 UserAgent="kube-probe/1.18" srcIP="127.0.0.1:57258":
I0716 14:47:59.469828 1 comparer.go:42] cache comparer started
I0716 14:47:59.470936 1 comparer.go:67] cache comparer finished
I0716 14:47:59.471038 1 dumper.go:47] Dump of cached NodeInfo
I0716 14:47:59.471484 1 dumper.go:49]
Node name: master-0-bug
Requested Resources: {MilliCPU:1100 Memory:52428800 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Allocatable Resources:{MilliCPU:2000 Memory:3033427968 EphemeralStorage:19290208634 AllowedPodNumber:110 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Scheduled Pods(number: 9):
...
I0716 14:47:59.472623 1 dumper.go:60] Dump of scheduling queue:
name: coredns-cd64c8d7c-29zjq, namespace: kube-system, uid: 938e8827-5d17-4db9-ac04-d229baf4534a, phase: Pending, nominated node:
name: test-deployment-558f47bbbb-4rt5t, namespace: default, uid: fa19fda9-c8d6-4ffe-b248-8ddd24ed5310, phase: Pending, nominated node:
Unfortunately that doesn't seem to help
Dumping the cache is for debugging purposes; it doesn't change anything. Could you include the dump?
Also, assuming this was the first error, could you include the pod yaml and the node?
That is pretty much everything in the dump; I only removed the other nodes. This is not the first error, but you can see the coredns pod in the dump, which was the first error. I'm not sure what else you are asking for in the dump.
I'll go get the yamls.
Thanks, I hadn't realized you had already trimmed it to the relevant node and pods.
Could you also include the scheduled pods for that node? Just in case there is a bug in the resource usage calculation.
Requested Resources: {MilliCPU:1100 Memory:52428800 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
AllowedPodNumber: 0
looks odd.
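For what it's worth, the cached numbers themselves leave plenty of headroom, so the failure cannot be a resource-fit issue. A quick check with the MilliCPU figures from the NodeInfo dump for master-0-bug:

```shell
# Values copied from the cached NodeInfo dump (master-0-bug):
# Requested {MilliCPU:1100}, Allocatable {MilliCPU:2000}.
requested_milli=1100
allocatable_milli=2000
echo "free mCPU: $((allocatable_milli - requested_milli))"
# -> free mCPU: 900
```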
These are the other pods on that node:
name: kube-controller-manager-master-0-bug, namespace: kube-system, uid: 095eebb0-4752-419b-aac7-245e5bc436b8, phase: Running, nominated node:
name: kube-proxy-xwf6h, namespace: kube-system, uid: 16552eaf-9eb8-4584-ba3c-7dff6ce92592, phase: Running, nominated node:
name: kube-apiserver-master-0-bug, namespace: kube-system, uid: 1d338e26-b0bc-4cef-9bad-86b7dd2b2385, phase: Running, nominated node:
name: kube-multus-ds-amd64-tpkm8, namespace: kube-system, uid: d50c0c7f-599c-41d5-a029-b43352a4f5b8, phase: Running, nominated node:
name: openstack-cloud-controller-manager-wrb8n, namespace: kube-system, uid: 17aeb589-84a1-4416-a701-db6d8ef60591, phase: Running, nominated node:
name: kube-scheduler-master-0-bug, namespace: kube-system, uid: 52469084-3122-4e99-92f6-453e512b640f, phase: Running, nominated node:
name: subport-controller-28j9v, namespace: kube-system, uid: a5a07ac8-763a-4ff2-bdae-91c6e9e95698, phase: Running, nominated node:
name: csi-cinder-controllerplugin-0, namespace: kube-system, uid: 8b16d6c8-a871-454e-98a3-0aa545f9c9d0, phase: Running, nominated node:
name: calico-node-d899t, namespace: kube-system, uid: e3672030-53b1-4356-a5df-0f4afd6b9237, phase: Running, nominated node:
All the nodes have allowedPodNumber set to 0 in the requested resources in the dump, but the other nodes are schedulable.
The node yaml:
apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2020-07-16T09:59:48Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/instance-type: 54019dbc-10d7-409c-8338-5556f61a9371
beta.kubernetes.io/os: linux
failure-domain.beta.kubernetes.io/region: regionOne
failure-domain.beta.kubernetes.io/zone: nova
kubernetes.io/arch: amd64
kubernetes.io/hostname: master-0-bug
kubernetes.io/os: linux
node-role.kubernetes.io/master: ""
node.kubernetes.io/instance-type: 54019dbc-10d7-409c-8338-5556f61a9371
node.uuid: 00324054-405e-4fae-a3bf-d8509d511ded
node.uuid_source: cloud-init
topology.kubernetes.io/region: regionOne
topology.kubernetes.io/zone: nova
name: master-0-bug
resourceVersion: "85697"
selfLink: /api/v1/nodes/master-0-bug
uid: 629b6ef3-3c76-455b-8b6b-196c4754fb0e
spec:
podCIDR: 192.168.0.0/24
podCIDRs:
- 192.168.0.0/24
providerID: openstack:///00324054-405e-4fae-a3bf-d8509d511ded
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/master
status:
addresses:
- address: 10.0.10.14
type: InternalIP
- address: master-0-bug
type: Hostname
allocatable:
cpu: "2"
ephemeral-storage: "19290208634"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 2962332Ki
pods: "110"
capacity:
cpu: "2"
ephemeral-storage: 20931216Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 3064732Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2020-07-16T10:02:20Z"
lastTransitionTime: "2020-07-16T10:02:20Z"
message: Calico is running on this node
reason: CalicoIsUp
status: "False"
type: NetworkUnavailable
- lastHeartbeatTime: "2020-07-16T15:46:11Z"
lastTransitionTime: "2020-07-16T09:59:43Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2020-07-16T15:46:11Z"
lastTransitionTime: "2020-07-16T09:59:43Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2020-07-16T15:46:11Z"
lastTransitionTime: "2020-07-16T09:59:43Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2020-07-16T15:46:11Z"
lastTransitionTime: "2020-07-16T10:19:44Z"
message: kubelet is posting ready status. AppArmor enabled
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
nodeInfo:
architecture: amd64
bootID: fe410ed3-2825-4f94-a9f9-08dc5e6a955e
containerRuntimeVersion: docker://19.3.11
kernelVersion: 4.12.14-197.45-default
kubeProxyVersion: v1.18.5
kubeletVersion: v1.18.5
machineID: 00324054405e4faea3bfd8509d511ded
operatingSystem: linux
systemUUID: 00324054-405e-4fae-a3bf-d8509d511ded
And the pod:
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2020-07-16T10:13:35Z"
generateName: pm-node-exporter-
labels:
controller-revision-hash: 6466d9c7b
pod-template-generation: "1"
name: pm-node-exporter-mn9vj
namespace: monitoring
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: DaemonSet
name: pm-node-exporter
uid: 5855a26f-a57e-4b0e-93f2-461c19c477e1
resourceVersion: "5239"
selfLink: /api/v1/namespaces/monitoring/pods/pm-node-exporter-mn9vj
uid: 0db09c9c-1618-4454-94fa-138e55e5ebd7
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchFields:
- key: metadata.name
operator: In
values:
- master-0-bug
containers:
- args:
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
image: ***
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /
port: 9100
scheme: HTTP
initialDelaySeconds: 5
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
name: pm-node-exporter
ports:
- containerPort: 9100
hostPort: 9100
name: metrics
protocol: TCP
resources:
limits:
cpu: 200m
memory: 150Mi
requests:
cpu: 100m
memory: 100Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /host/proc
name: proc
readOnly: true
- mountPath: /host/sys
name: sys
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: pm-node-exporter-token-csllf
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostNetwork: true
hostPID: true
nodeSelector:
node-role.kubernetes.io/master: ""
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: pm-node-exporter
serviceAccountName: pm-node-exporter
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/disk-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/memory-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/pid-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/unschedulable
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/network-unavailable
operator: Exists
volumes:
- hostPath:
path: /proc
type: ""
name: proc
- hostPath:
path: /sys
type: ""
name: sys
- name: pm-node-exporter-token-csllf
secret:
defaultMode: 420
secretName: pm-node-exporter-token-csllf
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2020-07-16T10:13:35Z"
message: '0/6 nodes are available: 2 node(s) didn''t have free ports for the requested
pod ports, 3 node(s) didn''t match node selector.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Burstable
Thanks a lot for all the information. @nodo can you take this?
/help
@maelk feel free to take it and submit a PR if you find the bug. The log lines you added might help. Otherwise I'll open it up to contributors.
@alculquicondor:
This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help
command.
In response to this:
/help
@maelk feel free to take it and submit a PR if you find the bug. The log lines you added might help. Otherwise I'll open it up to contributors.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign
@maelk was there anything special about the timing when this issue first appeared? For example, did it happen right after the node started?
No, plenty of pods had been scheduled there and were running fine. But once the problem appeared, nothing could be scheduled on it anymore.
Lowering the priority until we have a reproducible case.
We were able to reproduce the bug with a scheduler that has extra log entries. What we see is that one of the master nodes disappears entirely from the list of nodes being iterated over. We can see that the process starts with 6 nodes (from the snapshot):
I0720 13:58:28.246507 1 generic_scheduler.go:441] Looking for a node for kube-system/coredns-cd64c8d7c-tcxbq, going through []*nodeinfo.NodeInfo{(*nodeinfo.NodeInfo)(0xc000326a90), (*nodeinfo.NodeInfo)(0xc000952000), (*nodeinfo.NodeInfo)(0xc0007d08f0), (*nodeinfo.NodeInfo)(0xc0004f35f0), (*nodeinfo.NodeInfo)(0xc000607040), (*nodeinfo.NodeInfo)(0xc000952000)}
But after that, we can see that it iterates over only 5 nodes, and then we get:
I0720 13:58:28.247420 1 generic_scheduler.go:505] pod kube-system/coredns-cd64c8d7c-tcxbq : processed 5 nodes, 0 fit
So one of the nodes was dropped from the list of potential nodes. Unfortunately we did not have enough logging at the start of the process, but we will try to get more.
Referencing the code from the log lines:
Do you see any %v/%v on node %v, too many nodes fit lines?
Otherwise, @pancernik can you check workqueue.ParallelizeUntil(ctx, 16, len(allNodes), checkNode)?
No, that log does not show up. I also think it could be that we have an issue with the parallelization, or that the node gets filtered out earlier. If it failed with an error, then: https:
I just realized that one node goes through the filtering twice!
The logs are:
I0720 13:58:28.246507 1 generic_scheduler.go:441] Looking for a node for kube-system/coredns-cd64c8d7c-tcxbq, going through []*nodeinfo.NodeInfo{(*nodeinfo.NodeInfo)(0xc000326a90), (*nodeinfo.NodeInfo)(0xc000952000), (*nodeinfo.NodeInfo)(0xc0007d08f0), (*nodeinfo.NodeInfo)(0xc0004f35f0), (*nodeinfo.NodeInfo)(0xc000607040), (*nodeinfo.NodeInfo)(0xc000952000)}
I0720 13:58:28.246793 1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-60846k0y-scheduler, fits: false, status: &v1alpha1.Status{code:3, reasons:[]string{"node(s) didn't match node selector"}}
I0720 13:58:28.246970 1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-60846k0y-scheduler : status is not success
I0720 13:58:28.246819 1 taint_toleration.go:71] Checking taints for pod kube-system/coredns-cd64c8d7c-tcxbq for node master-0-scheduler : taints : []v1.Taint{v1.Taint{Key:"node-role.kubernetes.io/master", Value:"", Effect:"NoSchedule", TimeAdded:(*v1.Time)(nil)}} and tolerations: []v1.Toleration{v1.Toleration{Key:"node-role.kubernetes.io/master", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"CriticalAddonsOnly", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node-role.kubernetes.io/master", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node-role.kubernetes.io/not-ready", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node.kubernetes.io/not-ready", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(0xc000d40d90)}, v1.Toleration{Key:"node.kubernetes.io/unreachable", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(0xc000d40db0)}}
I0720 13:58:28.247019 1 taint_toleration.go:71] Checking taints for pod kube-system/coredns-cd64c8d7c-tcxbq for node master-2-scheduler : taints : []v1.Taint{v1.Taint{Key:"node-role.kubernetes.io/master", Value:"", Effect:"NoSchedule", TimeAdded:(*v1.Time)(nil)}} and tolerations: []v1.Toleration{v1.Toleration{Key:"node-role.kubernetes.io/master", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"CriticalAddonsOnly", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node-role.kubernetes.io/master", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node-role.kubernetes.io/not-ready", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node.kubernetes.io/not-ready", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(0xc000d40d90)}, v1.Toleration{Key:"node.kubernetes.io/unreachable", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(0xc000d40db0)}}
I0720 13:58:28.247144 1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node master-2-scheduler, fits: false, status: &v1alpha1.Status{code:2, reasons:[]string{"node(s) didn't match pod affinity/anti-affinity", "node(s) didn't satisfy existing pods anti-affinity rules"}}
I0720 13:58:28.247172 1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node master-2-scheduler : status is not success
I0720 13:58:28.247210 1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-7dt1xd4k-scheduler, fits: false, status: &v1alpha1.Status{code:3, reasons:[]string{"node(s) didn't match node selector"}}
I0720 13:58:28.247231 1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-7dt1xd4k-scheduler : status is not success
I0720 13:58:28.247206 1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-60846k0y-scheduler, fits: false, status: &v1alpha1.Status{code:3, reasons:[]string{"node(s) didn't match node selector"}}
I0720 13:58:28.247297 1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-60846k0y-scheduler : status is not success
I0720 13:58:28.247246 1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-hyk0hg7r-scheduler, fits: false, status: &v1alpha1.Status{code:3, reasons:[]string{"node(s) didn't match node selector"}}
I0720 13:58:28.247340 1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-hyk0hg7r-scheduler : status is not success
I0720 13:58:28.247147 1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node master-0-scheduler, fits: false, status: &v1alpha1.Status{code:2, reasons:[]string{"node(s) didn't match pod affinity/anti-affinity", "node(s) didn't satisfy existing pods anti-affinity rules"}}
I0720 13:58:28.247375 1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node master-0-scheduler : status is not success
I0720 13:58:28.247420 1 generic_scheduler.go:505] pod kube-system/coredns-cd64c8d7c-tcxbq : processed 5 nodes, 0 fit
I0720 13:58:28.247461 1 generic_scheduler.go:430] pod kube-system/coredns-cd64c8d7c-tcxbq After scheduling, filtered: []*v1.Node{}, filtered nodes: v1alpha1.NodeToStatusMap{"master-0-scheduler":(*v1alpha1.Status)(0xc000d824a0), "master-2-scheduler":(*v1alpha1.Status)(0xc000b736c0), "worker-pool1-60846k0y-scheduler":(*v1alpha1.Status)(0xc000d825a0), "worker-pool1-7dt1xd4k-scheduler":(*v1alpha1.Status)(0xc000b737e0), "worker-pool1-hyk0hg7r-scheduler":(*v1alpha1.Status)(0xc000b738c0)}
I0720 13:58:28.247527 1 generic_scheduler.go:185] Pod kube-system/coredns-cd64c8d7c-tcxbq failed scheduling:
nodes snapshot: &cache.Snapshot{nodeInfoMap:map[string]*nodeinfo.NodeInfo{"master-0-scheduler":(*nodeinfo.NodeInfo)(0xc000607040), "master-1-scheduler":(*nodeinfo.NodeInfo)(0xc0001071e0), "master-2-scheduler":(*nodeinfo.NodeInfo)(0xc000326a90), "worker-pool1-60846k0y-scheduler":(*nodeinfo.NodeInfo)(0xc000952000), "worker-pool1-7dt1xd4k-scheduler":(*nodeinfo.NodeInfo)(0xc0007d08f0), "worker-pool1-hyk0hg7r-scheduler":(*nodeinfo.NodeInfo)(0xc0004f35f0)}, nodeInfoList:[]*nodeinfo.NodeInfo{(*nodeinfo.NodeInfo)(0xc000326a90), (*nodeinfo.NodeInfo)(0xc000952000), (*nodeinfo.NodeInfo)(0xc0007d08f0), (*nodeinfo.NodeInfo)(0xc0004f35f0), (*nodeinfo.NodeInfo)(0xc000607040), (*nodeinfo.NodeInfo)(0xc000952000)}, havePodsWithAffinityNodeInfoList:[]*nodeinfo.NodeInfo{(*nodeinfo.NodeInfo)(0xc000326a90), (*nodeinfo.NodeInfo)(0xc000607040)}, generation:857}
statuses: v1alpha1.NodeToStatusMap{"master-0-scheduler":(*v1alpha1.Status)(0xc000d824a0), "master-2-scheduler":(*v1alpha1.Status)(0xc000b736c0), "worker-pool1-60846k0y-scheduler":(*v1alpha1.Status)(0xc000d825a0), "worker-pool1-7dt1xd4k-scheduler":(*v1alpha1.Status)(0xc000b737e0), "worker-pool1-hyk0hg7r-scheduler":(*v1alpha1.Status)(0xc000b738c0)}
As you can see, the node worker-pool1-60846k0y-scheduler goes through the filtering twice
No, that log does not show up. I also think it could be that we have an issue with the parallelization, or that the node gets filtered out earlier. If it failed with an error, Nordix@5c00cdf#diff-c237cdd9e4cb201118ca380732d7f361R464 would be visible in the logs afaik, so I will try to add more debugging around that function and the parallelization.
Yes, an error there would show up as a "scheduling error" in the pod events.
I just realized that one node goes through the filtering twice!
To be honest, I don't think there is a bug in the parallelization (still worth checking), but it could be a sign that we fail to build the snapshot correctly from the cache (the cache itself is correct, as seen from the cache dump), adding one node twice. Since the statuses are a map, it makes sense that we only "see" 5 nodes in the last log line.
cc @ahg-g
I will try to add extensive logging in the cache part of the scheduler, specifically around node additions and updates, and around the snapshot. However, from the last line of the logs you can see that the snapshot is actually correct and contains all the nodes, so whatever happens seems to happen later, when that snapshot is processed
Cache != snapshot
The cache is a living thing that gets updated from events. The snapshot is updated (from the cache) before each scheduling cycle to "lock in" the state. We added optimizations to make this last step as fast as possible. The bug is most likely there.
Thanks @maelk! This is very useful. Your logs show that (*nodeinfo.NodeInfo)(0xc000952000)
is already duplicated in the list at https://github.com/Nordix/kubernetes/commit/5c00cdf195fa61316f963f59e73c6cafc2ad9bdc#diff-c237cdd9e4cb201118ca380732d7f361R441, before any parallelized code runs. That does mean it gets duplicated before the snapshot update.
Actually, that comes from the snapshot, and it happens before this log message: https://github.com/Nordix/kubernetes/commit/5c00cdf195fa61316f963f59e73c6cafc2ad9bdc#diff-c237cdd9e4cb201118ca380732d7f361R436
That's right. What I meant is that it was already duplicated before the snapshot update finished.
That's right. What I meant is that it was already duplicated before the snapshot update finished.
No, the snapshot is updated at the beginning of the scheduling cycle. The bug is during or before the snapshot update. But according to the dump in https://github.com/kubernetes/kubernetes/issues/91601#issuecomment-659465008, the cache is correct.
Edit: I misread it, I didn't see the word "finished" :)
The PR that optimized the snapshot update went into 1.18: https://github.com/kubernetes/kubernetes/pull/86919
I wonder whether the node tree has duplicate entries as well
I wonder whether the node tree has duplicate entries as well
@maelk can you show a dump of the full node list from the cache?
We don't add/remove items to the NodeInfoList; we either create the full list from the tree or we don't, so I think that if there are duplicates, they most likely come from the tree.
Just to clarify:
1) the cluster has 6 nodes (including the masters)
2) the node that should host the pod was not checked at all (no log line mentions it), which probably means it was not in the NodeInfoList at all
3) the NodeInfoList has 6 nodes, but one of them is a duplicate
I wonder whether the node tree has duplicate entries as well
@maelk can you show a dump of the full node list from the cache?
A dump of each of the node tree, the list, and the map would be great.
I'll keep working on it. In the meantime, a small update. We can see in the logs:
I0720 13:37:30.530980 1 node_tree.go:100] Removed node "worker-pool1-60846k0y-scheduler" in group "" from NodeTree
I0720 13:37:30.531136 1 node_tree.go:86] Added node "worker-pool1-60846k0y-scheduler" in group "regionOne:\x00:nova" to NodeTree
That is the exact time when the missing node disappeared. Its last occurrence in the logs is at 13:37:24. In the next scheduling attempt, the missing node was gone. So the bug seems to be in, or to follow, the node_tree update. All the nodes went through that update; this worker-608 node was just the last one to go through it.
When dumping the cache (with SIGUSR2), all six nodes are listed there, with pods running on them, and no duplicated or missing node.
We will make a new attempt with debugging added around the snapshot function: https:
Removed node "worker-pool1-60846k0y-scheduler" in group "" from NodeTree
Interesting, I think the remove/add is triggered by an updateNode call. The zone key is missing on the removal but present on the addition, so the update was basically adding the region and zone labels?
Do you have any other scheduler logs related to this node?
We are trying to reproduce the bug with the added logging. I'll come back when I have more information.
I'll keep working on it. In the meantime, a small update. We can see in the logs:
I0720 13:37:30.530980 1 node_tree.go:100] Removed node "worker-pool1-60846k0y-scheduler" in group "" from NodeTree I0720 13:37:30.531136 1 node_tree.go:86] Added node "worker-pool1-60846k0y-scheduler" in group "regionOne:\x00:nova" to NodeTree
I was going to point out that this node is exactly the one that is duplicated. @maelk, did you see similar messages for the other nodes? As @ahg-g said, this is to be expected when a node first receives its topology labels.
Yes, it happened for all the nodes, and it is expected. It is a coincidence that this node was the last one to be updated, and that it was exactly at that time that the other node went missing.
Did you get the update log for the missing node?
Did you get the update log for the missing node?
lol, I was just typing that same question.
A possible bug would be that the whole zone gets removed from the tree before all of its nodes are removed.
To clarify, I'm not looking at the code myself; I'm just trying to make sure we have all the information. And I think that with what we have now, we should be able to find the bug. Feel free to submit a PR; even better if you can provide a failing unit test.
Did you get the update log for the missing node?
Yes, it shows that the zone was updated for that missing node. There is a log entry for all the nodes.
To be honest, I still have no idea what causes the bug, but if we can find out soon, I will submit a PR or a unit test.
Yes, it shows that the zone was updated for that missing node. There is a log entry for all the nodes.
If that is the case, then I would question the assumption that this is "the exact time when the missing node disappeared". It is probably unrelated. Let's wait for the new logs. If you could share all the scheduler logs you get in a file, that would be great.
I will, once we reproduce it with the new logging. From the existing update we can actually see that the pod scheduling that happened right after the update was the first one to fail. But that does not give enough information to know what happened in between, so stay tuned...
@maelk do you see any messages in the scheduler logs starting with snapshot state is not consistent?
Could you provide the full scheduler logs?
No, that message does not show up. I can provide a stripped log file (to avoid repetition), but let's first wait until we have output with more logging around the snapshot.
I found the bug. The problem is in the nodeTree next() function, which in some cases does not return the list of all the nodes. https://github.com/kubernetes/kubernetes/blob/release-1.18/pkg/scheduler/internal/cache/node_tree.go#L147
It becomes visible if you add the following here: https:
{
	name:           "add nodes to a new and to an exhausted zone",
	nodesToAdd:     append(allNodes[5:9], allNodes[3]),
	nodesToRemove:  nil,
	operations:     []string{"add", "add", "next", "next", "add", "add", "add", "next", "next", "next", "next"},
	expectedOutput: []string{"node-6", "node-7", "node-3", "node-8", "node-6", "node-7"},
},
The main problem is that, when nodes are added, some zones end up with a non-zero index. For this to happen you need at least two zones, one shorter than the other, with the index of the longer one not set to 0 when next() is first called.
What I do to fix it is to reset the indexes before the first call to next(). I opened a PR to show my fix. It is against the 1.18 release, since that is what I have been working on, but it is mainly to discuss how to fix it (or whether to fix the next() function itself). I can open a proper PR against master and then backport it if needed.
I noticed the same problem with the iteration. But I could not link it to the duplicate in the snapshot. @maelk, did you manage to create a scenario where that can happen?
Yes, you can run it in the unit tests by adding the small snippet I posted.
I am now adding test cases for the snapshot, to make sure this is properly tested.
@igraecao many thanks for your help in reproducing the problem and running the tests in your setup.
Thanks everybody for debugging this notorious issue. Resetting the index before creating the list is safe, so I think we should go with that for the 1.18 and 1.19 patches, and have a proper fix in the master branch.
The purpose of the next function changed with the introduction of NodeInfoList, so we can simplify it, and perhaps even change it into a toList function that creates a list from the tree, simply starting from the beginning each time.
Now I understand the problem: the computation of whether a zone is exhausted is wrong, because it does not take into account where in each zone we started the "UpdateSnapshot" pass. And yes, it is only visible with uneven zones.
Great find @maelk!
I guess we had the same problem in older versions. But it was hidden because we walked the tree every time, whereas in 1.18 we snapshot the result until something changes in the tree.
Given that the round-robin policy is implemented in generic_scheduler.go, simply resetting all the counters before UpdateSnapshot, as your PR does, should be fine.
Just double checking @ahg-g: that is fine even in the case where new nodes are constantly being added to/removed from the cluster, right?
Thanks @maelk for finding the root cause!
The purpose of the next function changed with the introduction of NodeInfoList, so we can simplify it, and perhaps even change it into a toList function that creates a list from the tree, simply starting from the beginning each time.
Given that cache.nodeTree.next() is only called when building the snapshot nodeInfoList, I think it is also safe to remove the indexes (zoneIndex and nodeIndex) from the nodeTree struct. Instead, come up with a simple nodeIterator() function that goes through its zones/nodes in a round-robin manner.
BTW: there is a typo in https://github.com/kubernetes/kubernetes/issues/91601#issuecomment-662663090; the case should be:
{
	name:           "add nodes to a new and to an exhausted zone",
	nodesToAdd:     append(allNodes[6:9], allNodes[3]),
	nodesToRemove:  nil,
	operations:     []string{"add", "add", "next", "next", "add", "add", "next", "next", "next", "next"},
	expectedOutput: []string{"node-6", "node-7", "node-3", "node-8", "node-6", "node-7"},
	// with the codebase on master and 1.18, its output is [node-6 node-7 node-3 node-8 node-6 node-3]
},
Just double checking @ahg-g: that is fine even in the case where new nodes are constantly being added to/removed from the cluster, right?
I assume you are talking about the logic in generic_scheduler.go. If so, it does not matter whether nodes are added or removed; the main thing we need to avoid is iterating over the nodes in the same order every time we schedule a pod; a reasonable spread of pods over the nodes is all we need.
Given that cache.nodeTree.next() is only called when building the snapshot nodeInfoList, I think it is also safe to remove the indexes (zoneIndex and nodeIndex) from the nodeTree struct. Instead, come up with a simple nodeIterator() function that goes through its zones/nodes in a round-robin manner.
Yes, we just need to go over all the zones/nodes in the same order every time.
I have updated the PR with a unit test for the function that updates the snapshot list, specifically covering this bug. I can also take care of refactoring the next() function to go through the zones and nodes without the round-robin, which removes the problem.
Thanks, that sounds good, but we should still alternate between zones by design, as we do now.
I don't really get what you mean. Is it that the order of the nodes matters, so we still have to round-robin between the zones, or can we list all the nodes of one zone after those of another? Say you have two zones of two nodes each, in which order would you expect them, or does it not matter at all?
The order matters; we need to alternate between zones when creating the list. If you have two zones of two nodes each, z1: {n11, n12} and z2: {n21, n22}, then the list should be {n11, n21, n12, n22}
Ok, thanks, I will take that into account. Can we proceed with the quick fix in the meantime? Btw, some tests are failing, but I'm not sure how that relates to my PR.
Those are flaky. Please also send the patch for 1.18.
Ok, I will do that. Thanks
{
	name:           "add nodes to a new and to an exhausted zone",
	nodesToAdd:     append(allNodes[5:9], allNodes[3]),
	nodesToRemove:  nil,
	operations:     []string{"add", "add", "next", "next", "add", "add", "add", "next", "next", "next", "next"},
	expectedOutput: []string{"node-6", "node-7", "node-3", "node-8", "node-6", "node-7"},
},
@maelk, did you mean for this test to skip "node-5"?
With the append fixed as in https://github.com/kubernetes/kubernetes/pull/93516, I found that the test can iterate over all the nodes:
{
	name:           "add nodes to a new and to an exhausted zone",
	nodesToAdd:     append(append(make([]*v1.Node, 0), allNodes[5:9]...), allNodes[3]),
	nodesToRemove:  nil,
	operations:     []string{"add", "add", "next", "next", "add", "add", "add", "next", "next", "next", "next"},
	expectedOutput: []string{"node-5", "node-6", "node-3", "node-7", "node-8", "node-5"},
},
Nodes 5, 6, 7, 8 and 3 can all be iterated over.
Forgive me if I have misunderstood something here.
Yes, it was intentional, based on what was already there, but I can see how it is cryptic, so better to make it so that the behavior of the append is clearer. Thanks for the patch.
How long do you think this bug has existed? 1.17? 1.16? I just saw exactly the same problem on 1.17 on AWS, and restarting the node that nothing was being scheduled to fixed it.
@judgeaxl could you provide more details? Log lines, cache dumps, etc., so that we can determine whether it is the same problem.
As I noted in https://github.com/kubernetes/kubernetes/issues/91601#issuecomment-662746695, I believe this bug was present in older releases, but my take is that it was transient there.
@maelk can you take a look?
Please also share the distribution of the nodes across zones.
Unfortunately @alculquicondor I can't at this point. Sorry.
@alculquicondor sorry, I have already rebuilt the cluster for other reasons, but it may have been a network configuration problem related to the multi-az deployment, and to which subnet the faulty node was launched in, so I wouldn't worry about it in the context of this issue for now. If I notice it again, I will report back with better details. Thanks!
/retitle Some nodes are not considered in scheduling when zone imbalance happens