Kubernetes: Some nodes are not considered in scheduling when there is zone imbalance

Created on 2020-05-30 · 129 comments · Source: kubernetes/kubernetes

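What happened: We upgraded 15 Kubernetes clusters from 1.17.5 to 1.18.2/1.18.3 and started to see that daemonsets no longer work properly.

The problem is that not all daemonset pods get provisioned. The following error message is returned in the events: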

Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  9s (x5 over 71s)  default-scheduler  0/13 nodes are available: 12 node(s) didn't match node selector.

However, all nodes are available and the daemonset does not have a node selector. The nodes do not have taints either.

daemonset https://gist.github.com/zetaab/4a605cb3e15e349934cb7db29ec72bd8

% kubectl get nodes
NAME                                   STATUS   ROLES    AGE   VERSION
e2etest-1-kaasprod-k8s-local           Ready    node     46h   v1.18.3
e2etest-2-kaasprod-k8s-local           Ready    node     46h   v1.18.3
e2etest-3-kaasprod-k8s-local           Ready    node     44h   v1.18.3
e2etest-4-kaasprod-k8s-local           Ready    node     44h   v1.18.3
master-zone-1-1-1-kaasprod-k8s-local   Ready    master   47h   v1.18.3
master-zone-2-1-1-kaasprod-k8s-local   Ready    master   47h   v1.18.3
master-zone-3-1-1-kaasprod-k8s-local   Ready    master   47h   v1.18.3
nodes-z1-1-kaasprod-k8s-local          Ready    node     47h   v1.18.3
nodes-z1-2-kaasprod-k8s-local          Ready    node     47h   v1.18.3
nodes-z2-1-kaasprod-k8s-local          Ready    node     46h   v1.18.3
nodes-z2-2-kaasprod-k8s-local          Ready    node     46h   v1.18.3
nodes-z3-1-kaasprod-k8s-local          Ready    node     47h   v1.18.3
nodes-z3-2-kaasprod-k8s-local          Ready    node     46h   v1.18.3

% kubectl get pods -n weave -l weave-scope-component=agent -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP           NODE                                   NOMINATED NODE   READINESS GATES
weave-scope-agent-2drzw   1/1     Running   0          26h     10.1.32.23   e2etest-1-kaasprod-k8s-local           <none>           <none>
weave-scope-agent-4kpxc   1/1     Running   3          26h     10.1.32.12   nodes-z1-2-kaasprod-k8s-local          <none>           <none>
weave-scope-agent-78n7r   1/1     Running   0          26h     10.1.32.7    e2etest-4-kaasprod-k8s-local           <none>           <none>
weave-scope-agent-9m4n8   1/1     Running   0          26h     10.1.96.4    master-zone-1-1-1-kaasprod-k8s-local   <none>           <none>
weave-scope-agent-b2gnk   1/1     Running   1          26h     10.1.96.12   master-zone-3-1-1-kaasprod-k8s-local   <none>           <none>
weave-scope-agent-blwtx   1/1     Running   2          26h     10.1.32.20   nodes-z1-1-kaasprod-k8s-local          <none>           <none>
weave-scope-agent-cbhjg   1/1     Running   0          26h     10.1.64.15   e2etest-2-kaasprod-k8s-local           <none>           <none>
weave-scope-agent-csp49   1/1     Running   0          26h     10.1.96.14   e2etest-3-kaasprod-k8s-local           <none>           <none>
weave-scope-agent-g4k2x   1/1     Running   1          26h     10.1.64.10   nodes-z2-2-kaasprod-k8s-local          <none>           <none>
weave-scope-agent-kx85h   1/1     Running   2          26h     10.1.96.6    nodes-z3-1-kaasprod-k8s-local          <none>           <none>
weave-scope-agent-lllqc   0/1     Pending   0          5m56s   <none>       <none>                                 <none>           <none>
weave-scope-agent-nls2h   1/1     Running   0          26h     10.1.96.17   master-zone-2-1-1-kaasprod-k8s-local   <none>           <none>
weave-scope-agent-p8njs   1/1     Running   2          26h     10.1.96.19   nodes-z3-2-kaasprod-k8s-local          <none>           <none>

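I have tried restarting the apiserver/schedulers/controller-managers, but it does not help. I have also tried restarting the single node that is stuck (nodes-z2-1-kaasprod-k8s-local), but that does not help either. Only deleting that node and recreating it helps.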

% kubectl describe node nodes-z2-1-kaasprod-k8s-local
Name:               nodes-z2-1-kaasprod-k8s-local
Roles:              node
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=59cf4871-de1b-4294-9e9f-2ea7ca4b771f
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=regionOne
                    failure-domain.beta.kubernetes.io/zone=zone-2
                    kops.k8s.io/instancegroup=nodes-z2
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=nodes-z2-1-kaasprod-k8s-local
                    kubernetes.io/os=linux
                    kubernetes.io/role=node
                    node-role.kubernetes.io/node=
                    node.kubernetes.io/instance-type=59cf4871-de1b-4294-9e9f-2ea7ca4b771f
                    topology.cinder.csi.openstack.org/zone=zone-2
                    topology.kubernetes.io/region=regionOne
                    topology.kubernetes.io/zone=zone-2
Annotations:        csi.volume.kubernetes.io/nodeid: {"cinder.csi.openstack.org":"faf14d22-010f-494a-9b34-888bdad1d2df"}
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.1.64.32/19
                    projectcalico.org/IPv4IPIPTunnelAddr: 100.98.136.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 28 May 2020 13:28:24 +0300
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  nodes-z2-1-kaasprod-k8s-local
  AcquireTime:     <unset>
  RenewTime:       Sat, 30 May 2020 12:02:13 +0300
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Fri, 29 May 2020 09:40:51 +0300   Fri, 29 May 2020 09:40:51 +0300   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Sat, 30 May 2020 11:59:53 +0300   Fri, 29 May 2020 09:40:45 +0300   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sat, 30 May 2020 11:59:53 +0300   Fri, 29 May 2020 09:40:45 +0300   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sat, 30 May 2020 11:59:53 +0300   Fri, 29 May 2020 09:40:45 +0300   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sat, 30 May 2020 11:59:53 +0300   Fri, 29 May 2020 09:40:45 +0300   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.1.64.32
  Hostname:    nodes-z2-1-kaasprod-k8s-local
Capacity:
  cpu:                4
  ephemeral-storage:  10287360Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8172420Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  9480830961
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8070020Ki
  pods:               110
System Info:
  Machine ID:                 c94284656ff04cf090852c1ddee7bcc2
  System UUID:                faf14d22-010f-494a-9b34-888bdad1d2df
  Boot ID:                    295dc3d9-0a90-49ee-92f3-9be45f2f8e3d
  Kernel Version:             4.19.0-8-cloud-amd64
  OS Image:                   Debian GNU/Linux 10 (buster)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.8
  Kubelet Version:            v1.18.3
  Kube-Proxy Version:         v1.18.3
PodCIDR:                      100.96.12.0/24
PodCIDRs:                     100.96.12.0/24
ProviderID:                   openstack:///faf14d22-010f-494a-9b34-888bdad1d2df
Non-terminated Pods:          (3 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  kube-system                 calico-node-77pqs                           100m (2%)     200m (5%)   100Mi (1%)       100Mi (1%)     46h
  kube-system                 kube-proxy-nodes-z2-1-kaasprod-k8s-local    100m (2%)     200m (5%)   100Mi (1%)       100Mi (1%)     46h
  volume                      csi-cinder-nodeplugin-5jbvl                 100m (2%)     400m (10%)  200Mi (2%)       200Mi (2%)     46h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                300m (7%)   800m (20%)
  memory             400Mi (5%)  400Mi (5%)
  ephemeral-storage  0 (0%)      0 (0%)
Events:
  Type    Reason                   Age    From                                    Message
  ----    ------                   ----   ----                                    -------
  Normal  Starting                 7m27s  kubelet, nodes-z2-1-kaasprod-k8s-local  Starting kubelet.
  Normal  NodeHasSufficientMemory  7m26s  kubelet, nodes-z2-1-kaasprod-k8s-local  Node nodes-z2-1-kaasprod-k8s-local status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    7m26s  kubelet, nodes-z2-1-kaasprod-k8s-local  Node nodes-z2-1-kaasprod-k8s-local status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     7m26s  kubelet, nodes-z2-1-kaasprod-k8s-local  Node nodes-z2-1-kaasprod-k8s-local status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  7m26s  kubelet, nodes-z2-1-kaasprod-k8s-local  Updated Node Allocatable limit across pods

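We are seeing this randomly in all of our clusters.

What you expected to happen: I expect the daemonset to be provisioned to all nodes.

How to reproduce it (as minimally and precisely as possible): No idea really. Install Kubernetes 1.18.x, deploy a daemonset, and then wait for days(?).

Anything else we need to know?: When this happens, we cannot provision any other daemonsets to that node either. As you can see, the logging fluent-bit pod is also missing. I cannot see any errors in that node's kubelet logs and, as said, restarting does not help.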

% kubectl get ds --all-namespaces
NAMESPACE     NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
falco         falco-daemonset            13        13        12      13           12          <none>                            337d
kube-system   audit-webhook-deployment   3         3         3       3            3           node-role.kubernetes.io/master=   174d
kube-system   calico-node                13        13        13      13           13          kubernetes.io/os=linux            36d
kube-system   kops-controller            3         3         3       3            3           node-role.kubernetes.io/master=   193d
kube-system   metricbeat                 6         6         5       6            5           <none>                            35d
kube-system   openstack-cloud-provider   3         3         3       3            3           node-role.kubernetes.io/master=   337d
logging       fluent-bit                 13        13        12      13           12          <none>                            337d
monitoring    node-exporter              13        13        12      13           12          kubernetes.io/os=linux            58d
volume        csi-cinder-nodeplugin      6         6         6       6            6           <none>                            239d
weave         weave-scope-agent          13        13        12      13           12          <none>                            193d
weave         weavescope-iowait-plugin   6         6         5       6            5           <none>                            193d

As you can see, most of the daemonsets are missing one pod.
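
The shortfall in the table above can be tallied mechanically. The following is a minimal Python sketch, assuming the plain-text output of kubectl get ds --all-namespaces is available as a string; the function name short_daemonsets and the three sample rows (copied from the table) are chosen only for illustration.

```python
def short_daemonsets(table: str):
    """Return (namespace, name, missing) for daemonsets with READY < DESIRED.

    Only the first five whitespace-separated columns are read, so the
    NODE SELECTOR and AGE columns do not need to be parsed.
    """
    short = []
    for row in table.strip().splitlines()[1:]:  # skip the header row
        namespace, name, desired, _current, ready = row.split()[:5]
        missing = int(desired) - int(ready)
        if missing > 0:
            short.append((namespace, name, missing))
    return short


# Three rows copied from the output above.
TABLE = """\
NAMESPACE     NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
falco         falco-daemonset            13        13        12      13           12          <none>                            337d
kube-system   calico-node                13        13        13      13           13          kubernetes.io/os=linux            36d
kube-system   metricbeat                 6         6         5       6            5           <none>                            35d
"""

for namespace, name, missing in short_daemonsets(TABLE):
    print(f"{namespace}/{name} is missing {missing} ready pod(s)")
```

Note that calico-node is fully ready here, presumably because its pod was already running on the affected node before the problem started.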

Environment:

Kubernetes version (use kubectl version): 1.18.3
Cloud provider or hardware configuration: openstack
OS (e.g. cat /etc/os-release): debian buster
Kernel (e.g. uname -a): Linux nodes-z2-1-kaasprod-k8s-local 4.19.0-8-cloud-amd64 #1 SMP Debian 4.19.98-1+deb10u1 (2020-04-27) x86_64 GNU/Linux
Install tools: kops
Network plugin and version (if this is a network-related bug): calico
Others:

help wanted, kind/bug, priority/important-soon, sig/scheduling

Source

zetaab

Most helpful comment

I am now adding a test case for the snapshot, to make sure this is properly tested.

maelk on 2020-07-22

🎉1 👍1

All 129 comments

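/sig scheduling

zetaab on 2020-05-30

Can you provide the full yaml of the node, the daemonset, an example pod, and the containing namespace, as retrieved from the server?

liggitt on 2020-05-30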

nodes:
https://gist.github.com/zetaab/2a7e8d3fe6cb42a617e17abc0fa375f7

daemonset:
https://gist.github.com/zetaab/31bb406c8bd622b3017bf4f468d0154f

example pod (working):
https://gist.github.com/zetaab/814871bec6f2879e371f5bbdc6f2e978

example pod (not scheduling):
https://gist.github.com/zetaab/f3488d65486c745af78dbe2e6173fd42

namespace:
https://gist.github.com/zetaab/4625b759f4e21b50757c79e5072cd7d9

zetaab on 2020-05-30

DaemonSet pods are scheduled with a nodeAffinity selector that matches only a single node, so the "12 out of 13 didn't match" message is expected.

liggitt on 2020-05-30
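
To illustrate the comment above: each DaemonSet pod carries a required node affinity that pins it to one node by name, so on a healthy 13-node cluster one node should match and 12 should be rejected. The following is a rough Python sketch of that evaluation, not the scheduler's actual code; it assumes the affinity uses a matchFields requirement on metadata.name with the In operator, and the helper names are hypothetical.

```python
def matches_term(term: dict, node_name: str) -> bool:
    """Evaluate one nodeSelectorTerm; requirements inside a term are ANDed.

    This sketch supports only matchFields on metadata.name with operator In.
    """
    for req in term.get("matchFields", []):
        if req["key"] != "metadata.name" or req["operator"] != "In":
            raise NotImplementedError("sketch handles only metadata.name / In")
        if node_name not in req["values"]:
            return False
    return True


def eligible_nodes(pod_spec: dict, node_names: list) -> list:
    """Return the nodes that satisfy the pod's required node affinity (terms are ORed)."""
    required = pod_spec["affinity"]["nodeAffinity"][
        "requiredDuringSchedulingIgnoredDuringExecution"]
    terms = required["nodeSelectorTerms"]
    return [n for n in node_names if any(matches_term(t, n) for t in terms)]


# Node names taken from the kubectl get nodes output in the issue description.
NODES = (
    [f"e2etest-{i}-kaasprod-k8s-local" for i in range(1, 5)]
    + [f"master-zone-{i}-1-1-kaasprod-k8s-local" for i in range(1, 4)]
    + [f"nodes-z{z}-{i}-kaasprod-k8s-local" for z in range(1, 4) for i in range(1, 3)]
)

# Affinity of the pending pod, which targets the node that received no pod.
PENDING_POD_SPEC = {
    "affinity": {"nodeAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [{"matchFields": [{
            "key": "metadata.name",
            "operator": "In",
            "values": ["nodes-z2-1-kaasprod-k8s-local"],
        }]}]
    }}}
}

eligible = eligible_nodes(PENDING_POD_SPEC, NODES)
print(f"{len(eligible)} eligible, {len(NODES) - len(eligible)} did not match")
```

With all 13 nodes visible, exactly one node is eligible. The reported event is therefore only surprising in that the one matching node was neither chosen nor listed with a rejection reason.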

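I don't see a reason why the scheduler would be unhappy with the pod/node combination... there are no ports in the podspec that could conflict, the node is not unschedulable or tainted, and it has sufficient resources.

liggitt on 2020-05-30

Okay, I restarted all 3 schedulers (and changed the log level to 4, in case we can see something interesting there). However, the restart fixed the issue: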

% kubectl get ds --all-namespaces
NAMESPACE     NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
falco         falco-daemonset            13        13        13      13           13          <none>                            338d
kube-system   audit-webhook-deployment   3         3         3       3            3           node-role.kubernetes.io/master=   175d
kube-system   calico-node                13        13        13      13           13          kubernetes.io/os=linux            36d
kube-system   kops-controller            3         3         3       3            3           node-role.kubernetes.io/master=   194d
kube-system   metricbeat                 6         6         6       6            6           <none>                            36d
kube-system   openstack-cloud-provider   3         3         3       3            3           node-role.kubernetes.io/master=   338d
logging       fluent-bit                 13        13        13      13           13          <none>                            338d
monitoring    node-exporter              13        13        13      13           13          kubernetes.io/os=linux            59d
volume        csi-cinder-nodeplugin      6         6         6       6            6           <none>                            239d
weave         weave-scope-agent          13        13        13      13           13          <none>                            194d
weave         weavescope-iowait-plugin   6         6         6       6            6           <none>                            194d

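Now all daemonsets are provisioned correctly. Weird; in any case, something seems to be wrong with the scheduler.

zetaab on 2020-05-30

cc @kubernetes/sig-scheduling-bugs @ahg-g

liggitt on 2020-05-30

We see a similar issue on v1.18.3: one node cannot be scheduled for daemonset pods.
Restarting the scheduler helps.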

[root@tesla-cb0434-csfp1-csfp1-control-03 ~]# kubectl get pod -A|grep Pending
kube-system   coredns-vc5ws                                                 0/1     Pending   0          2d16h
kube-system   local-volume-provisioner-mwk88                                0/1     Pending   0          2d16h
kube-system   svcwatcher-ltqb6                                              0/1     Pending   0          2d16h
ncms          bcmt-api-hfzl6                                                0/1     Pending   0          2d16h
ncms          bcmt-yum-repo-589d8bb756-5zbvh                                0/1     Pending   0          2d16h
[root@tesla-cb0434-csfp1-csfp1-control-03 ~]# kubectl get ds -A
NAMESPACE     NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
kube-system   coredns                    3         3         2       3            2           is_control=true                 2d16h
kube-system   danmep-cleaner             0         0         0       0            0           cbcs.nokia.com/danm_node=true   2d16h
kube-system   kube-proxy                 8         8         8       8            8           <none>                          2d16h
kube-system   local-volume-provisioner   8         8         7       8            7           <none>                          2d16h
kube-system   netwatcher                 0         0         0       0            0           cbcs.nokia.com/danm_node=true   2d16h
kube-system   sriov-device-plugin        0         0         0       0            0           sriov=enabled                   2d16h
kube-system   svcwatcher                 3         3         2       3            2           is_control=true                 2d16h
ncms          bcmt-api                   3         3         0       3            0           is_control=true                 2d16h
[root@tesla-cb0434-csfp1-csfp1-control-03 ~]# kubectl get node
NAME                                  STATUS   ROLES    AGE     VERSION
tesla-cb0434-csfp1-csfp1-control-01   Ready    <none>   2d16h   v1.18.3
tesla-cb0434-csfp1-csfp1-control-02   Ready    <none>   2d16h   v1.18.3
tesla-cb0434-csfp1-csfp1-control-03   Ready    <none>   2d16h   v1.18.3
tesla-cb0434-csfp1-csfp1-edge-01      Ready    <none>   2d16h   v1.18.3
tesla-cb0434-csfp1-csfp1-edge-02      Ready    <none>   2d16h   v1.18.3
tesla-cb0434-csfp1-csfp1-worker-01    Ready    <none>   2d16h   v1.18.3
tesla-cb0434-csfp1-csfp1-worker-02    Ready    <none>   2d16h   v1.18.3
tesla-cb0434-csfp1-csfp1-worker-03    Ready    <none>   2d16h   v1.18.3

jejer on 2020-06-01

Hard to debug without knowing how to reproduce it. Do you by any chance have the scheduler logs for the pod that failed to schedule?

ahg-g on 2020-06-01

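> Okay, I restarted all 3 schedulers

I assume only one of them is named default-scheduler, correct?

> changed the log level to 4, in case we can see something interesting there

Can you share what you noticed?

ahg-g on 2020-06-01

Set the log level to 9, but there seems to be nothing more interesting; the logs below keep looping.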

I0601 01:45:05.039373       1 generic_scheduler.go:290] Preemption will not help schedule pod kube-system/coredns-vc5ws on any node.
I0601 01:45:05.039437       1 factory.go:462] Unable to schedule kube-system/coredns-vc5ws: no fit: 0/8 nodes are available: 7 node(s) didn't match node selector.; waiting
I0601 01:45:05.039494       1 scheduler.go:776] Updating pod condition for kube-system/coredns-vc5ws to (PodScheduled==False, Reason=Unschedulable)

jejer on 2020-06-01

Yeah, I could not see anything more than the same line:

no fit: 0/8 nodes are available: 7 node(s) didn't match node selector.; waiting

zetaab on 2020-06-01

It is strange that the log message shows the result for only 7 nodes, like the issue reported in https://github.com/kubernetes/kubernetes/issues/91340

ahg-g on 2020-06-01
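
The mismatch pointed out above (a total of 8 nodes, but reasons for only 7) can be checked for any FailedScheduling message. The following is a small Python sketch that assumes the message keeps the "N/M nodes are available: K node(s) ..." shape seen in this thread; the function name unaccounted_nodes is hypothetical.

```python
import re


def unaccounted_nodes(message: str) -> int:
    """Return how many nodes are counted in the total but have no reported reason."""
    total = re.match(r"\s*\d+/(\d+) nodes are available", message)
    if total is None:
        raise ValueError("not a FailedScheduling summary: " + message)
    # Each failure reason is prefixed with "<count> node(s)".
    explained = sum(int(n) for n in re.findall(r"(\d+) node\(s\)", message))
    return int(total.group(1)) - explained


# Messages copied from the events and scheduler logs quoted earlier in the thread.
for message in (
    "0/13 nodes are available: 12 node(s) didn't match node selector.",
    "0/8 nodes are available: 7 node(s) didn't match node selector.",
):
    print(unaccounted_nodes(message), "node(s) without a reported reason")
```

A non-zero result means at least one node was counted in the total but produced no failure reason, which is the symptom discussed in both issues.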

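/cc @damemi

ahg-g on 2020-06-01

@ahg-g this does look like the same issue I reported there. It seems that we either have a filter plugin that doesn't always report its error or, if I had to guess, some other condition that is failing silently.

damemi on 2020-06-01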

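Note that in my issue, restarting the scheduler also fixed it (as also mentioned in this thread: https://github.com/kubernetes/kubernetes/issues/91601#issuecomment-636360092)

Mine was also about a daemonset, so I think this is a duplicate. If that's the case, we can close this and continue the discussion in https://github.com/kubernetes/kubernetes/issues/91340

damemi on 2020-06-01

In any case, the scheduler needs more verbose logging options; it is impossible to debug these issues if there are no logs about what it does.

zetaab on 2020-06-01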

👍2

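@zetaab +1, the scheduler could use significant improvements to its current logging abilities. That's an upgrade I've been meaning to tackle for a while, and I've finally opened an issue for it here: https :

damemi on 2020-06-01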

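/assign

I'm looking into this. A few questions to help me narrow down the case. I haven't been able to reproduce it yet.

What was created first: the daemonset or the node?
Are you using the default profile?

alculquicondor on 2020-06-05

Do you have extenders?

alculquicondor on 2020-06-05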

The nodes were created before the daemonset.
I suppose we use the default profile; which profile do you mean, and how can I check?
No extenders.

    command:
    - /usr/local/bin/kube-scheduler
    - --address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig
    - --profiling=false
    - --v=1

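Another thing that may have an effect is that disk performance is not very good for etcd; etcd complains about slow operations.

jejer on 2020-06-09

Yes, these flags make the scheduler run with the default profile. I'll keep looking. I still couldn't reproduce it.

alculquicondor on 2020-06-09

Still nothing... Is there anything else you are using that you think could have an impact? Taints, ports, other resources?

alculquicondor on 2020-06-11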

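alculquicondor on 2020-06-11

Did some experiments related to this. While the issue is present, pods can still be scheduled to the node (with no node constraint, or with the "nodeName" selector).

If I try to use affinity/anti-affinity, pods do not get scheduled to the node.

Working while the issue is present: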

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
spec:
  nodeName: master-zone-3-1-1-test-cluster-k8s-local
  containers:
    - image: nginx
      name: nginx
      resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always

Not working at the same time:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - master-zone-3-1-1-test-cluster-k8s-local
  containers:
    - image: nginx
      name: nginx
      resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always

Also, when checking the latter, even the events were quite interesting:

Warning  FailedScheduling  4m37s (x17 over 26m)  default-scheduler  0/9 nodes are available: 8 node(s) didn't match node selector.
Warning  FailedScheduling  97s (x6 over 3m39s)   default-scheduler  0/8 nodes are available: 8 node(s) didn't match node selector.
Warning  FailedScheduling  53s                   default-scheduler  0/8 nodes are available: 8 node(s) didn't match node selector.
Warning  FailedScheduling  7s (x5 over 32s)      default-scheduler  0/9 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 7 node(s) didn't match node selector.

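The first event is from when the manifest had just been applied (nothing had been done to the non-schedulable node).
The second and third are from when the node was removed with kubectl and then restarted.
The fourth came when the node was back up. The node with the issue is a master, so the pod did not go there (but it shows that the node was not found in the 3 earlier events). The interesting thing about the fourth event is that information about one node is still missing. The event says 0/9 nodes are available, but a description is given for only 8.

Hi-Fi on 2020-06-29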
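
The two probe pods from the comment above differ in only one respect: a pod with spec.nodeName set is bound directly and never passes through the scheduler, while a pod with node affinity must be placed by the scheduler. A hedged Python sketch that generates both variants for a given node follows; the function name probe_pod and the pod names are hypothetical, and the output is JSON, which kubectl apply -f accepts as well as YAML.

```python
import json


def probe_pod(node_name: str, via_scheduler: bool) -> dict:
    """Build an nginx test pod pinned to node_name.

    via_scheduler=False sets spec.nodeName (no scheduling decision is made);
    via_scheduler=True uses required node affinity on the hostname label,
    so the scheduler has to find the node itself.
    """
    spec = {"containers": [{"name": "nginx", "image": "nginx"}]}
    if via_scheduler:
        spec["affinity"] = {"nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{"matchExpressions": [{
                    "key": "kubernetes.io/hostname",
                    "operator": "In",
                    "values": [node_name],
                }]}]
            }
        }}
    else:
        spec["nodeName"] = node_name
    # Pod names are illustrative only.
    name = "probe-affinity" if via_scheduler else "probe-direct"
    return {"apiVersion": "v1", "kind": "Pod",
            "metadata": {"name": name}, "spec": spec}


node = "master-zone-3-1-1-test-cluster-k8s-local"  # node name from the manifests above
for via_scheduler in (False, True):
    print(json.dumps(probe_pod(node, via_scheduler), indent=2))
```

If the direct variant runs while the affinity variant stays Pending, that suggests the node itself is healthy and the scheduler is failing to place the pod. As the test in the thread shows, a master taint can still keep the affinity variant Pending for an unrelated reason.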

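"nodeName" is not a selector. Using nodeName bypasses scheduling.

> The fourth came when the node was back up. The node with the issue is a master, so the pod did not go there (but it shows that the node was not found in the 3 earlier events). The interesting thing about the fourth event is that information about one node is still missing. The event says 0/9 nodes are available, but a description is given for only 8.

Are you saying that the reason the pod shouldn't have been scheduled on the missing node is that it is a master?

We are seeing "8 node(s) didn't match node selector" going to 7. I assume no nodes were removed at this point, correct?

alculquicondor on 2020-06-29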

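> "nodeName" is not a selector. Using nodeName bypasses scheduling.

The "nodeName" attempt was meant to highlight that the node is usable and that a pod gets there if wanted. So the problem is not that the node is unable to start pods.

> The fourth came when the node was back up. The node with the issue is a master, so the pod did not go there (but it shows that the node was not found in the 3 earlier events). The interesting thing about the fourth event is that information about one node is still missing. The event says 0/9 nodes are available, but a description is given for only 8.
> Are you saying that the reason the pod shouldn't have been scheduled on the missing node is that it is a master?
> We are seeing "8 node(s) didn't match node selector" going to 7. I assume no nodes were removed at this point, correct?

The test cluster has 9 nodes: 3 masters and 6 workers. Before the non-working node was successfully started, the events gave information about all available nodes: 0/8 nodes are available: 8 node(s) didn't match node selector. But when the node that would match the node selector came up, the event said 0/9 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 7 node(s) didn't match node selector. The explanation says that 8 nodes do not match, but says nothing about the ninth (which was acknowledged in the previous event).

So the event states are:

1st event: 9 nodes available, error noticed with the daemonset
2nd and 3rd events: 8 nodes available. The one that was not receiving the pod was restarting
4th event: 9 nodes available (so the one that was restarted is up again).

In the end, the test pod was not started on the matching node because of the taint, but that is another story (and should have been the case already at the 1st event).

Hi-Fi on 2020-06-29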

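> The "nodeName" attempt was meant to highlight that the node is usable and that a pod gets there if wanted. So the problem is not that the node is unable to start pods.

Note that nothing guards against over-committing a node except the scheduler. So this doesn't really show much.

> In the end, the test pod was not started on the matching node because of the taint, but that is another story (and should have been the case already at the 1st event).

My question is: was the 9th node tainted from the beginning? I'm trying to find either (1) reproducible steps to reach this state or (2) where the bug could be.

alculquicondor on 2020-06-29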

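> My question is: was the 9th node tainted from the beginning? I'm trying to find either (1) reproducible steps to reach this state or (2) where the bug could be.

Yes, the taint was there all the time in this case, since the non-receiving node is a master. But we have seen the same issue on both masters and workers.

Still no idea where the issue comes from; only that recreating the node and restarting the node at least seem to fix it. But those are rather "hard" ways to fix things.

Hi-Fi on 2020-06-29

Long shot, but if you come across it again... can you check whether there are any nominated pods for the node that don't show up?

alculquicondor on 2020-06-29

I'm posting questions as I think of possible scenarios:

Do you have other master nodes in your cluster?
Do you have extenders?

alculquicondor on 2020-06-29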

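> * Do you have other master nodes in your cluster?

All clusters have 3 masters (so restarting them is easy).

> * Do you have extenders?

No.

One interesting thing noticed today: I had a cluster where one master was not receiving a pod from a DaemonSet. We use ChaosMonkey, which terminated one of the worker nodes. Interestingly, this made the pod go to the master that had not been receiving it earlier. So somehow, removing a node other than the problematic one seemed to fix the issue at that point.

Because of that "fix", I have to wait for the issue to reoccur before I can answer the question about nominated pods.

Hi-Fi on 2020-06-30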

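I'm confused now... Does your daemonset tolerate the taint for master nodes? In other words... is the bug for you just the scheduling event, or the fact that the pods should have been scheduled?

alculquicondor on 2020-06-30

The issue is that the scheduler does not find the node, even though there is at least one matching affinity (or anti-affinity) setting.

That's why I said that the taint error is expected, and that it should have been there already at the first event (as the taint is not part of the affinity criteria).

Hi-Fi on 2020-06-30

Understood. I was trying to confirm your setup to make sure I'm not missing something.

I don't think the node is "unseen" by the scheduler. Given that we see 0/9 nodes are available, we can conclude that the node is indeed in the cache. It's more likely that the unschedulable reason is lost somewhere, so we don't include it in the event.

alculquicondor on 2020-06-30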

👍1

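True, the total count always matches the actual node count. The more descriptive event text is just not given for all nodes, but that can be a separate issue, as you mentioned.

Hi-Fi on 2020-06-30

Are you able to look at your kube-scheduler logs? Is there anything that seems relevant?

alculquicondor on 2020-06-30

I think @zetaab tried to look for that without success. I can try when the issue occurs again (along with the nominated pod check asked about earlier).

Hi-Fi on 2020-06-30

If possible, also run 1.18.5, in case we inadvertently fixed the problem.

alculquicondor on 2020-06-30

I am able to reliably reproduce this on my test cluster, if you need more logs.

dilyevsky on 2020-07-10

@dilyevsky Please share the repro steps. Can you somehow identify which filter is failing?

alculquicondor on 2020-07-10

It appears to be just the metadata.name of the node for the ds pod... weird. Here is the pod yaml:

Pod yaml: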

apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: "2020-07-09T23:17:53Z"
  generateName: cilium-
  labels:
    controller-revision-hash: 6c94db8bb8
    k8s-app: cilium
    pod-template-generation: "1"
  managedFields:
    # managed fields crap
  name: cilium-d5n4f
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: cilium
    uid: 0f00e8af-eb19-4985-a940-a02fa84fcbc5
  resourceVersion: "2840"
  selfLink: /api/v1/namespaces/kube-system/pods/cilium-d5n4f
  uid: e3f7d566-ee5b-4557-8d1b-f0964cde2f22
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - us-central1-dilyevsky-master-qmwnl
  containers:
  - args:
    - --config-dir=/tmp/cilium/config-map
    command:
    - cilium-agent
    env:
    - name: K8S_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: CILIUM_K8S_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: CILIUM_FLANNEL_MASTER_DEVICE
      valueFrom:
        configMapKeyRef:
          key: flannel-master-device
          name: cilium-config
          optional: true
    - name: CILIUM_FLANNEL_UNINSTALL_ON_EXIT
      valueFrom:
        configMapKeyRef:
          key: flannel-uninstall-on-exit
          name: cilium-config
          optional: true
    - name: CILIUM_CLUSTERMESH_CONFIG
      value: /var/lib/cilium/clustermesh/
    - name: CILIUM_CNI_CHAINING_MODE
      valueFrom:
        configMapKeyRef:
          key: cni-chaining-mode
          name: cilium-config
          optional: true
    - name: CILIUM_CUSTOM_CNI_CONF
      valueFrom:
        configMapKeyRef:
          key: custom-cni-conf
          name: cilium-config
          optional: true
    image: docker.io/cilium/cilium:v1.7.6
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command:
          - /cni-install.sh
          - --enable-debug=false
      preStop:
        exec:
          command:
          - /cni-uninstall.sh
    livenessProbe:
      exec:
        command:
        - cilium
        - status
        - --brief
      failureThreshold: 10
      initialDelaySeconds: 120
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 5
    name: cilium-agent
    readinessProbe:
      exec:
        command:
        - cilium
        - status
        - --brief
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 5
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - SYS_MODULE
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/cilium
      name: cilium-run
    - mountPath: /host/opt/cni/bin
      name: cni-path
    - mountPath: /host/etc/cni/net.d
      name: etc-cni-netd
    - mountPath: /var/lib/cilium/clustermesh
      name: clustermesh-secrets
      readOnly: true
    - mountPath: /tmp/cilium/config-map
      name: cilium-config-path
      readOnly: true
    - mountPath: /lib/modules
      name: lib-modules
      readOnly: true
    - mountPath: /run/xtables.lock
      name: xtables-lock
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: cilium-token-j74lr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  initContainers:
  - command:
    - /init-container.sh
    env:
    - name: CILIUM_ALL_STATE
      valueFrom:
        configMapKeyRef:
          key: clean-cilium-state
          name: cilium-config
          optional: true
    - name: CILIUM_BPF_STATE
      valueFrom:
        configMapKeyRef:
          key: clean-cilium-bpf-state
          name: cilium-config
          optional: true
    - name: CILIUM_WAIT_BPF_MOUNT
      valueFrom:
        configMapKeyRef:
          key: wait-bpf-mount
          name: cilium-config
          optional: true
    image: docker.io/cilium/cilium:v1.7.6
    imagePullPolicy: IfNotPresent
    name: clean-cilium-state
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/cilium
      name: cilium-run
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: cilium-token-j74lr
      readOnly: true
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: cilium
  serviceAccountName: cilium
  terminationGracePeriodSeconds: 1
  tolerations:
  - operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - hostPath:
      path: /var/run/cilium
      type: DirectoryOrCreate
    name: cilium-run
  - hostPath:
      path: /opt/cni/bin
      type: DirectoryOrCreate
    name: cni-path
  - hostPath:
      path: /etc/cni/net.d
      type: DirectoryOrCreate
    name: etc-cni-netd
  - hostPath:
      path: /lib/modules
      type: ""
    name: lib-modules
  - hostPath:
      path: /run/xtables.lock
      type: FileOrCreate
    name: xtables-lock
  - name: clustermesh-secrets
    secret:
      defaultMode: 420
      optional: true
      secretName: cilium-clustermesh
  - configMap:
      defaultMode: 420
      name: cilium-config
    name: cilium-config-path
  - name: cilium-token-j74lr
    secret:
      defaultMode: 420
      secretName: cilium-token-j74lr
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-07-09T23:17:53Z"
    message: '0/6 nodes are available: 5 node(s) didn''t match node selector.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort

The way I reproduce this is by spinning up a new cluster with 3 masters and 3 worker nodes (using Cluster API) and applying Cilium 1.7.6:

Cilium yaml:

---
# Source: cilium/charts/agent/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cilium
  namespace: kube-system
---
# Source: cilium/charts/operator/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cilium-operator
  namespace: kube-system
---
# Source: cilium/charts/config/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:

  # Identity allocation mode selects how identities are shared between cilium
  # nodes by setting how they are stored. The options are "crd" or "kvstore".
  # - "crd" stores identities in kubernetes as CRDs (custom resource definition).
  #   These can be queried with:
  #     kubectl get ciliumid
  # - "kvstore" stores identities in a kvstore, etcd or consul, that is
  #   configured below. Cilium versions before 1.6 supported only the kvstore
  #   backend. Upgrades from these older cilium versions should continue using
  #   the kvstore by commenting out the identity-allocation-mode below, or
  #   setting it to "kvstore".
  identity-allocation-mode: crd

  # If you want to run cilium in debug mode change this value to true
  debug: "false"

  # Enable IPv4 addressing. If enabled, all endpoints are allocated an IPv4
  # address.
  enable-ipv4: "true"

  # Enable IPv6 addressing. If enabled, all endpoints are allocated an IPv6
  # address.
  enable-ipv6: "false"

  # If you want cilium monitor to aggregate tracing for packets, set this level
  # to "low", "medium", or "maximum". The higher the level, the less packets
  # that will be seen in monitor output.
  monitor-aggregation: medium

  # The monitor aggregation interval governs the typical time between monitor
  # notification events for each allowed connection.
  #
  # Only effective when monitor aggregation is set to "medium" or higher.
  monitor-aggregation-interval: 5s

  # The monitor aggregation flags determine which TCP flags which, upon the
  # first observation, cause monitor notifications to be generated.
  #
  # Only effective when monitor aggregation is set to "medium" or higher.
  monitor-aggregation-flags: all

  # ct-global-max-entries-* specifies the maximum number of connections
  # supported across all endpoints, split by protocol: tcp or other. One pair
  # of maps uses these values for IPv4 connections, and another pair of maps
  # use these values for IPv6 connections.
  #
  # If these values are modified, then during the next Cilium startup the
  # tracking of ongoing connections may be disrupted. This may lead to brief
  # policy drops or a change in loadbalancing decisions for a connection.
  #
  # For users upgrading from Cilium 1.2 or earlier, to minimize disruption
  # during the upgrade process, comment out these options.
  bpf-ct-global-tcp-max: "524288"
  bpf-ct-global-any-max: "262144"

  # bpf-policy-map-max specified the maximum number of entries in endpoint
  # policy map (per endpoint)
  bpf-policy-map-max: "16384"

  # Pre-allocation of map entries allows per-packet latency to be reduced, at
  # the expense of up-front memory allocation for the entries in the maps. The
  # default value below will minimize memory usage in the default installation;
  # users who are sensitive to latency may consider setting this to "true".
  #
  # This option was introduced in Cilium 1.4. Cilium 1.3 and earlier ignore
  # this option and behave as though it is set to "true".
  #
  # If this value is modified, then during the next Cilium startup the restore
  # of existing endpoints and tracking of ongoing connections may be disrupted.
  # This may lead to policy drops or a change in loadbalancing decisions for a
  # connection for some time. Endpoints may need to be recreated to restore
  # connectivity.
  #
  # If this option is set to "false" during an upgrade from 1.3 or earlier to
  # 1.4 or later, then it may cause one-time disruptions during the upgrade.
  preallocate-bpf-maps: "false"

  # Regular expression matching compatible Istio sidecar istio-proxy
  # container image names
  sidecar-istio-proxy-image: "cilium/istio_proxy"

  # Encapsulation mode for communication between nodes
  # Possible values:
  #   - disabled
  #   - vxlan (default)
  #   - geneve
  tunnel: vxlan

  # Name of the cluster. Only relevant when building a mesh of clusters.
  cluster-name: default

  # DNS Polling periodically issues a DNS lookup for each `matchName` from
  # cilium-agent. The result is used to regenerate endpoint policy.
  # DNS lookups are repeated with an interval of 5 seconds, and are made for
  # A(IPv4) and AAAA(IPv6) addresses. Should a lookup fail, the most recent IP
  # data is used instead. An IP change will trigger a regeneration of the Cilium
  # policy for each endpoint and increment the per cilium-agent policy
  # repository revision.
  #
  # This option is disabled by default starting from version 1.4.x in favor
  # of a more powerful DNS proxy-based implementation, see [0] for details.
  # Enable this option if you want to use FQDN policies but do not want to use
  # the DNS proxy.
  #
  # To ease upgrade, users may opt to set this option to "true".
  # Otherwise please refer to the Upgrade Guide [1] which explains how to
  # prepare policy rules for upgrade.
  #
  # [0] http://docs.cilium.io/en/stable/policy/language/#dns-based
  # [1] http://docs.cilium.io/en/stable/install/upgrade/#changes-that-may-require-action
  tofqdns-enable-poller: "false"

  # wait-bpf-mount makes init container wait until bpf filesystem is mounted
  wait-bpf-mount: "false"

  masquerade: "true"
  enable-xt-socket-fallback: "true"
  install-iptables-rules: "true"
  auto-direct-node-routes: "false"
  kube-proxy-replacement:  "probe"
  enable-host-reachable-services: "false"
  enable-external-ips: "false"
  enable-node-port: "false"
  node-port-bind-protection: "true"
  enable-auto-protect-node-port-range: "true"
  enable-endpoint-health-checking: "true"
  enable-well-known-identities: "false"
  enable-remote-node-identity: "true"
---
# Source: cilium/charts/agent/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium
rules:
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  - services
  - nodes
  - endpoints
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/status
  verbs:
  - patch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
  - get
  - list
  - watch
  - update
- apiGroups:
  - cilium.io
  resources:
  - ciliumnetworkpolicies
  - ciliumnetworkpolicies/status
  - ciliumclusterwidenetworkpolicies
  - ciliumclusterwidenetworkpolicies/status
  - ciliumendpoints
  - ciliumendpoints/status
  - ciliumnodes
  - ciliumnodes/status
  - ciliumidentities
  - ciliumidentities/status
  verbs:
  - '*'
---
# Source: cilium/charts/operator/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium-operator
rules:
- apiGroups:
  - ""
  resources:
  # to automatically delete [core|kube]dns pods so that are starting to being
  # managed by Cilium
  - pods
  verbs:
  - get
  - list
  - watch
  - delete
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  # to automatically read from k8s and import the node's pod CIDR to cilium's
  # etcd so all nodes know how to reach another pod running in in a different
  # node.
  - nodes
  # to perform the translation of a CNP that contains `ToGroup` to its endpoints
  - services
  - endpoints
  # to check apiserver connectivity
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - cilium.io
  resources:
  - ciliumnetworkpolicies
  - ciliumnetworkpolicies/status
  - ciliumclusterwidenetworkpolicies
  - ciliumclusterwidenetworkpolicies/status
  - ciliumendpoints
  - ciliumendpoints/status
  - ciliumnodes
  - ciliumnodes/status
  - ciliumidentities
  - ciliumidentities/status
  verbs:
  - '*'
---
# Source: cilium/charts/agent/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cilium
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cilium
subjects:
- kind: ServiceAccount
  name: cilium
  namespace: kube-system
---
# Source: cilium/charts/operator/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cilium-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cilium-operator
subjects:
- kind: ServiceAccount
  name: cilium-operator
  namespace: kube-system
---
# Source: cilium/charts/agent/templates/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: cilium
  name: cilium
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: cilium
  template:
    metadata:
      annotations:
        # This annotation plus the CriticalAddonsOnly toleration makes
        # cilium to be a critical pod in the cluster, which ensures cilium
        # gets priority scheduling.
        # https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        k8s-app: cilium
    spec:
      containers:
      - args:
        - --config-dir=/tmp/cilium/config-map
        command:
        - cilium-agent
        livenessProbe:
          exec:
            command:
            - cilium
            - status
            - --brief
          failureThreshold: 10
          # The initial delay for the liveness probe is intentionally large to
          # avoid an endless kill & restart cycle if in the event that the initial
          # bootstrapping takes longer than expected.
          initialDelaySeconds: 120
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - cilium
            - status
            - --brief
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        env:
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CILIUM_K8S_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: CILIUM_FLANNEL_MASTER_DEVICE
          valueFrom:
            configMapKeyRef:
              key: flannel-master-device
              name: cilium-config
              optional: true
        - name: CILIUM_FLANNEL_UNINSTALL_ON_EXIT
          valueFrom:
            configMapKeyRef:
              key: flannel-uninstall-on-exit
              name: cilium-config
              optional: true
        - name: CILIUM_CLUSTERMESH_CONFIG
          value: /var/lib/cilium/clustermesh/
        - name: CILIUM_CNI_CHAINING_MODE
          valueFrom:
            configMapKeyRef:
              key: cni-chaining-mode
              name: cilium-config
              optional: true
        - name: CILIUM_CUSTOM_CNI_CONF
          valueFrom:
            configMapKeyRef:
              key: custom-cni-conf
              name: cilium-config
              optional: true
        image: "docker.io/cilium/cilium:v1.7.6"
        imagePullPolicy: IfNotPresent
        lifecycle:
          postStart:
            exec:
              command:
              - "/cni-install.sh"
              - "--enable-debug=false"
          preStop:
            exec:
              command:
              - /cni-uninstall.sh
        name: cilium-agent
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
            - SYS_MODULE
          privileged: true
        volumeMounts:
        - mountPath: /var/run/cilium
          name: cilium-run
        - mountPath: /host/opt/cni/bin
          name: cni-path
        - mountPath: /host/etc/cni/net.d
          name: etc-cni-netd
        - mountPath: /var/lib/cilium/clustermesh
          name: clustermesh-secrets
          readOnly: true
        - mountPath: /tmp/cilium/config-map
          name: cilium-config-path
          readOnly: true
          # Needed to be able to load kernel modules
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
        - mountPath: /run/xtables.lock
          name: xtables-lock
      hostNetwork: true
      initContainers:
      - command:
        - /init-container.sh
        env:
        - name: CILIUM_ALL_STATE
          valueFrom:
            configMapKeyRef:
              key: clean-cilium-state
              name: cilium-config
              optional: true
        - name: CILIUM_BPF_STATE
          valueFrom:
            configMapKeyRef:
              key: clean-cilium-bpf-state
              name: cilium-config
              optional: true
        - name: CILIUM_WAIT_BPF_MOUNT
          valueFrom:
            configMapKeyRef:
              key: wait-bpf-mount
              name: cilium-config
              optional: true
        image: "docker.io/cilium/cilium:v1.7.6"
        imagePullPolicy: IfNotPresent
        name: clean-cilium-state
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
          privileged: true
        volumeMounts:
        - mountPath: /var/run/cilium
          name: cilium-run
      restartPolicy: Always
      priorityClassName: system-node-critical
      serviceAccount: cilium
      serviceAccountName: cilium
      terminationGracePeriodSeconds: 1
      tolerations:
      - operator: Exists
      volumes:
        # To keep state between restarts / upgrades
      - hostPath:
          path: /var/run/cilium
          type: DirectoryOrCreate
        name: cilium-run
      # To install cilium cni plugin in the host
      - hostPath:
          path:  /opt/cni/bin
          type: DirectoryOrCreate
        name: cni-path
        # To install cilium cni configuration in the host
      - hostPath:
          path: /etc/cni/net.d
          type: DirectoryOrCreate
        name: etc-cni-netd
        # To be able to load kernel modules
      - hostPath:
          path: /lib/modules
        name: lib-modules
        # To access iptables concurrently with other processes (e.g. kube-proxy)
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: xtables-lock
        # To read the clustermesh configuration
      - name: clustermesh-secrets
        secret:
          defaultMode: 420
          optional: true
          secretName: cilium-clustermesh
        # To read the configuration from the config map
      - configMap:
          name: cilium-config
        name: cilium-config-path
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 2
    type: RollingUpdate
---
# Source: cilium/charts/operator/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    io.cilium/app: operator
    name: cilium-operator
  name: cilium-operator
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      io.cilium/app: operator
      name: cilium-operator
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
      labels:
        io.cilium/app: operator
        name: cilium-operator
    spec:
      containers:
      - args:
        - --debug=$(CILIUM_DEBUG)
        - --identity-allocation-mode=$(CILIUM_IDENTITY_ALLOCATION_MODE)
        - --synchronize-k8s-nodes=true
        command:
        - cilium-operator
        env:
        - name: CILIUM_K8S_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CILIUM_DEBUG
          valueFrom:
            configMapKeyRef:
              key: debug
              name: cilium-config
              optional: true
        - name: CILIUM_CLUSTER_NAME
          valueFrom:
            configMapKeyRef:
              key: cluster-name
              name: cilium-config
              optional: true
        - name: CILIUM_CLUSTER_ID
          valueFrom:
            configMapKeyRef:
              key: cluster-id
              name: cilium-config
              optional: true
        - name: CILIUM_IPAM
          valueFrom:
            configMapKeyRef:
              key: ipam
              name: cilium-config
              optional: true
        - name: CILIUM_DISABLE_ENDPOINT_CRD
          valueFrom:
            configMapKeyRef:
              key: disable-endpoint-crd
              name: cilium-config
              optional: true
        - name: CILIUM_KVSTORE
          valueFrom:
            configMapKeyRef:
              key: kvstore
              name: cilium-config
              optional: true
        - name: CILIUM_KVSTORE_OPT
          valueFrom:
            configMapKeyRef:
              key: kvstore-opt
              name: cilium-config
              optional: true
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              key: AWS_ACCESS_KEY_ID
              name: cilium-aws
              optional: true
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: AWS_SECRET_ACCESS_KEY
              name: cilium-aws
              optional: true
        - name: AWS_DEFAULT_REGION
          valueFrom:
            secretKeyRef:
              key: AWS_DEFAULT_REGION
              name: cilium-aws
              optional: true
        - name: CILIUM_IDENTITY_ALLOCATION_MODE
          valueFrom:
            configMapKeyRef:
              key: identity-allocation-mode
              name: cilium-config
              optional: true
        image: "docker.io/cilium/operator:v1.7.6"
        imagePullPolicy: IfNotPresent
        name: cilium-operator
        livenessProbe:
          httpGet:
            host: '127.0.0.1'
            path: /healthz
            port: 9234
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 3
      hostNetwork: true
      restartPolicy: Always
      serviceAccount: cilium-operator
      serviceAccountName: cilium-operator

dilyevsky on 2020-07-10

Here is the scheduler log:
I0709 23:08:22.055830       1 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0709 23:08:22.056081       1 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0709 23:08:23.137451       1 serving.go:313] Generated self-signed cert in-memory
W0709 23:08:33.843509       1 authentication.go:297] Error looking up in-cluster authentication configuration: etcdserver: request timed out
W0709 23:08:33.843671       1 authentication.go:298] Continuing without authentication configuration. This may treat all requests as anonymous.
W0709 23:08:33.843710       1 authentication.go:299] To require authentication configuration lookup to succeed, set --authentication-tolerate-lookup-failure=false
I0709 23:08:33.911805       1 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0709 23:08:33.911989       1 registry.go:150] Registering EvenPodsSpread predicate and priority function
W0709 23:08:33.917999       1 authorization.go:47] Authorization is disabled
W0709 23:08:33.918162       1 authentication.go:40] Authentication is disabled
I0709 23:08:33.918238       1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I0709 23:08:33.925860       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0709 23:08:33.926013       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0709 23:08:33.930685       1 secure_serving.go:178] Serving securely on 127.0.0.1:10259
I0709 23:08:33.936198       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0709 23:08:34.026382       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0709 23:08:34.036998       1 leaderelection.go:242] attempting to acquire leader lease kube-system/kube-scheduler...
I0709 23:08:50.597201       1 leaderelection.go:252] successfully acquired lease kube-system/kube-scheduler
E0709 23:08:50.658551       1 factory.go:503] pod: kube-system/coredns-66bff467f8-9rjvd is already present in the active queue
E0709 23:12:27.673854       1 factory.go:503] pod kube-system/cilium-vv466 is already present in the backoff queue
E0709 23:12:58.099432       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed

After restarting the scheduler container, the pending pods are scheduled immediately.


                    
                        
                            
                                
dilyevsky on 2020-07-10



                                                
                    
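What pod events do you get? Do you know whether there are taints on the node where the pod is not scheduled? Does it fail only for master nodes, or for any node? Is there enough space on the node?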
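
Each of these questions maps onto a standard kubectl query. A minimal sketch, assuming kubectl is already configured for the affected cluster; the function names are placeholders chosen for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around standard kubectl commands.

# Events recorded for one pod (this is where FailedScheduling warnings show up).
pod_events() {
  local namespace="$1" pod="$2"
  kubectl get events -n "$namespace" --field-selector "involvedObject.name=$pod"
}

# Taint keys per node; "<none>" in the TAINTS column means no taints are set.
node_taints() {
  kubectl get nodes -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Capacity, allocatable resources, and current resource requests for one node.
node_capacity() {
  kubectl describe node "$1"
}
```

For example, `pod_events kube-system <pod-name>` followed by `node_taints` and `node_capacity <node-name>` for the node the DaemonSet pod is pinned to.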
On Thu, Jul 9, 2020 at 7:49 PM dilyevsky, notifications@github.com wrote:
It seems to be just the metadata.name of the node for the ds pod... weird. Here is the pod yaml:
apiVersion：v1kind：Podmetadata：
 注释：
 scheduler.alpha.kubernetes.io/critical-pod：“”
 creationTimestamp：“ 2020-07-09T23：17：53Z”
 generateName：纤毛-
 标签：
 控制器修订哈希：6c94db8bb8
 k8s-app：纤毛
 pod模板生成：“ 1”
 managedFields：
 ＃托管字段废话
名称：cilium-d5n4f
 命名空间：kube-system
 owner参考：
apiVersion：apps / v1
 blockOwnerDeletion：true
 控制器：真
种类：DaemonSet
 名称：纤毛
 uid：0f00e8af-eb19-4985-a940-a02fa84fcbc5
 resourceVersion：“ 2840”
 selfLink：/ api / v1 /名称空间/ kube-system / pods / cilium-d5n4f
 uid：e3f7d566-ee5b-4557-8d1b-f0964cde2f22spec：
 亲和力：
 nodeAffinity：
 requiredDuringSchedulingIgnoredDuringExecution：
 nodeSelectorTerms：
 -matchFields：
 -键：metadata.name
 运算符：在
值：
 -us-central1-dilyevsky-master-qmwnl
 容器：
args：

--config-dir = / tmp / cilium / config-map

 命令：

纤毛剂

环境：

名称：K8S_NODE_NAME

 valueFrom：

 fieldRef：

 apiVersion：v1

 fieldPath：spec.nodeName

名称：CILIUM_K8S_NAMESPACE

 valueFrom：

 fieldRef：

 apiVersion：v1

 fieldPath：meta.namespace

名称：CILIUM_FLANNEL_MASTER_DEVICE

 valueFrom：

 configMapKeyRef：

 密钥：法兰绒主设备

名称：cilium-config

 可选：true

名称：CILIUM_FLANNEL_UNINSTALL_ON_EXIT

 valueFrom：

 configMapKeyRef：

 密钥：退出时法兰绒卸载

名称：cilium-config

 可选：true

名称：CILIUM_CLUSTERMESH_CONFIG

 值：/ var / lib / cilium / clustermesh /

名称：CILIUM_CNI_CHAINING_MODE

 valueFrom：

 configMapKeyRef：

 密钥：cni链接模式

名称：cilium-config

 可选：true

名称：CILIUM_CUSTOM_CNI_CONF

 valueFrom：

 configMapKeyRef：

 密钥：custom-cni-conf

 名称：cilium-config

 可选：true

 图像：docker.io/cilium/ cilium：v1.7.6

 imagePullPolicy：IfNotPresent

 生命周期：

 postStart：

 执行：

 命令：



/cni-install.sh


--enable-debug = false


 preStop：


 执行：


 命令：


/cni-uninstall.sh


 livenessProbe：


 执行：


 命令： 





纤毛



状态



- 简要



 failureThreshold：10



 initialDelaySeconds：120



 periodSecond：30



 successThreshold：1



 超时秒：5



 名称：纤毛剂



准备情况：



 执行：



 命令：



纤毛



状态



- 简要



 failureThreshold：3



 initialDelaySeconds：5



 periodSecond：30



 successThreshold：1



 超时秒：5



 资源：{}



 securityContext：



 功能：



 加：



NET_ADMIN



SYS_MODULE



 特权：真



 TerminationMessagePath：/ dev / termination-log



 TerminationMessagePolicy：文件



 volumeMounts： 






mountPath：/ var / run / cilium

 名称：cilium-run

mountPath：/ host / opt / cni / bin

 名称：cni-path

mountPath：/host/etc/cni/net.d

 名称：etc-cni-netd

mountPath：/ var / lib / cilium / clustermesh

 名称：clustermesh-secrets

 readOnly：正确

mountPath：/ tmp / cilium / config-map

 名称：cilium-config-path

 readOnly：正确

mountPath：/ lib / modules

 名称：lib-modules

 readOnly：正确

mountPath：/run/xtables.lock

 名称：xtables-lock

mountPath：/var/run/secrets/kubernetes.io/serviceaccount

 名称：cilium-token-j74lr

 readOnly：正确

 dnsPolicy：ClusterFirst

 enableServiceLinks：是

 hostNetwork：true

 initContainers：

命令：

/init-container.sh

 环境：

名称：CILIUM_ALL_STATE

 valueFrom：

 configMapKeyRef：

 关键：清洁纤毛状态

名称：cilium-config

 可选：true

名称：CILIUM_BPF_STATE

 valueFrom：

 configMapKeyRef：

 关键：clean-cilium-bpf-state

 名称：cilium-config

 可选：true

名称：CILIUM_WAIT_BPF_MOUNT

 valueFrom：

 configMapKeyRef：

 键：wait-bpf-mount

 名称：cilium-config

 可选：true

 图像：docker.io/cilium/ cilium：v1.7.6

 imagePullPolicy：IfNotPresent

 名称：清洁纤毛状态

资源：{}

 securityContext：

 功能：

 加：



NET_ADMIN


 特权：真


 TerminationMessagePath：/ dev / termination-log


 TerminationMessagePolicy：文件


 volumeMounts：



mountPath：/ var / run / cilium

 名称：cilium-run

mountPath：/var/run/secrets/kubernetes.io/serviceaccount

 名称：cilium-token-j74lr

 readOnly：正确

优先级：2000001000

 priorityClassName：关键系统节点

 restartPolicy：始终

 schedulerName：默认调度程序

 securityContext：{}

 服务帐户：cilium

 serviceAccountName：纤毛

 TerminationGracePeriodSeconds：1

 公差：

运算符：存在
效果：NoExecute
 密钥：node.kubernetes.io/未就绪
运算符：存在
效果：NoExecute
 密钥：node.kubernetes.io/无法访问
运算符：存在
效果：NoSchedule
 密钥：node.kubernetes.io/磁盘压力
运算符：存在
效果：NoSchedule
 密钥：node.kubernetes.io/内存压力
运算符：存在
效果：NoSchedule
 密钥：node.kubernetes.io/pid-pressure
 运算符：存在
效果：NoSchedule
 密钥：node.kubernetes.io/不可调度
运算符：存在
效果：NoSchedule
 密钥：node.kubernetes.io/网络不可用
运算符：存在
数量：
hostPath：
 路径：/ var / run / cilium
 类型：DirectoryOrCreate
 名称：cilium-run
hostPath：
 路径：/ opt / cni / bin
 类型：DirectoryOrCreate
 名称：cni-path
hostPath：
 路径：/etc/cni/net.d
 类型：DirectoryOrCreate
 名称：etc-cni-netd
hostPath：
 路径：/ lib / modules
 类型：“”
 名称：lib-modules
hostPath：
 路径：/run/xtables.lock
 类型：FileOrCreate
 名称：xtables-lock
名称：clustermesh-secrets
 秘密：
 defaultMode：420
 可选：true
 secretName：纤毛clustermesh
configMap：
 defaultMode：420
 名称：cilium-config
 名称：cilium-config-path
名称：cilium-token-j74lr
 秘密：
 defaultMode：420
 secretName：cilium-token-j74lrstatus：
 条件：
lastProbeTime：null
 lastTransitionTime：“ 2020-07-09T23：17：53Z”
 消息：“ 0/6个节点可用：5个节点与节点选择器不匹配。”
 原因：计划外
状态：“假”
 类型：PodScheduled
 阶段：待定
 qosClass：尽力而为
我重现此方法的方法是将新集群与2个主节点合并，
 3个工作节点（使用群集API）并应用Cilium 1.7.6：
---＃来源：cilium / charts / agent / templates / serviceaccount.yamlapi版本：v1kind：ServiceAccount元数据：
 名称：纤毛
命名空间：kube-system
 ---＃来源：cilium / charts / operator / templates / serviceaccount.yamlapi版本：v1kind：ServiceAccount元数据：
 名称：cilium-operator
 命名空间：kube-system
 ---＃来源：cilium / charts / config / templates / configmap.yamlapi版本：v1kind：ConfigMapmetadata：
 名称：cilium-config
 命名空间：kube-systemdata：
＃身份分配模式选择如何在cilium之间共享身份
通过设置节点的存储方式来设置节点数。 选项为“ crd”或“ kvstore”。
 ＃-“ crd”将身份存储在kubernetes中作为CRD（自定义资源定义）。
 ＃可通过以下方式查询：
 ＃kubectl获得ciliumid
 ＃-“ kvstore”将身份存储在kvstore，etcd或consul中，即
 ＃在下面配置。 1.6版之前的Cilium版本仅支持kvstore
 ＃后端。 从这些较旧的cilium版本升级应继续使用
通过注释掉下面的identity-allocation-mode来＃kvstore，或者
 ＃将其设置为“ kvstore”。
 身份分配模式：crd
＃如果要在调试模式下运行cilium，请将此值更改为true
 调试：“假”
＃启用IPv4寻址。 如果启用，将为所有端点分配一个IPv4
 ＃ 地址。
 enable-ipv4：“ true”
＃启用IPv6寻址。 如果启用，将为所有端点分配一个IPv6
 ＃ 地址。
 enable-ipv6：“假”
＃如果您希望cilium Monitor聚合数据包跟踪，请设置此级别
 ＃设置为“低”，“中”或“最大”。 级别越高，数据包越少
 ＃将在监视器输出中看到。
 监控汇总：中等
＃监视器聚合间隔决定监视器之间的典型时间
 ＃每个允许的连接的通知事件。
 ＃
 ＃仅在监视器聚合设置为“中”或更高时有效。
 monitor-aggregation-interval：5秒
＃监控器聚合标志确定哪些TCP标志在
 ＃第一次观察，导致生成监视器通知。
 ＃
 ＃仅在监视器聚合设置为“中”或更高时有效。
 monitor-aggregation-flags：全部
＃ct-global-max-entries- *指定最大连接数
 ＃支持所有端点，按协议划分：tcp或其他。 一对
数量的地图将这些值用于IPv4连接，另一对地图
 ＃将这些值用于IPv6连接。
 ＃
 ＃如果修改了这些值，则在下次Cilium启动期间，
 ＃跟踪正在进行的连接可能会中断。 这可能导致简短
 ＃策略丢失或更改连接的负载平衡决策。
 ＃
 ＃对于从Cilium 1.2或更早版本升级的用户，以最大程度地减少中断
 ＃在升级过程中，注释掉这些选项。
 bpf-ct-global-tcp-max：“ 524288”
 bpf-ct-global-any-max：“ 262144”
＃bpf-policy-map-max指定端点中的最大条目数
 ＃策略映射（每个端点）
 bpf-policy-map-max：“ 16384”
＃预先分配地图项可以减少每个数据包的延迟
 ＃为映射中的条目分配前期内存的费用。 的
 ＃下面的默认值将最小化默认安装中的内存使用；
 ＃对延迟敏感的用户可以考虑将其设置为“ true”。
 ＃
 ＃此选项在Cilium 1.4中引入。 Cilium 1.3和更早版本忽略
 ＃此选项，其行为就像设置为“ true”一样。
 ＃
 ＃如果修改此值，则在下次Cilium启动期间还原
现有端点的数量和对正在进行的连接的跟踪可能会中断。
 ＃这可能会导致策略丢弃或负载均衡决策的更改
 ＃连接一段时间。 可能需要重新创建端点才能还原
 ＃连接。
 ＃
 ＃如果在从1.3或更早版本升级到此版本的过程中将此选项设置为“ false”
 ＃1.4或更高版本，则可能在升级过程中造成一次性中断。
 preallocate-bpf-maps：“假”
＃正则表达式匹配兼容的Istio sidecar istio-proxy
 ＃容器映像名称
 sidecar-istio-proxy-image：“ cilium / istio_proxy”
＃节点间通信的封装方式
 ＃可能的值：
 ＃-禁用
 ＃-vxlan（默认）
 ＃-日内瓦
隧道：vxlan
＃集群名称。 仅在构建群集网格时才有意义。
 群集名称：默认
＃DNS轮询会定期为每个matchName发出DNS查询
 ＃纤毛剂。 结果用于重新生成端点策略。
 ＃DNS查询以5秒钟的间隔重复进行，
 ＃A（IPv4）和AAAA（IPv6）地址。 如果查找失败，则使用最新的IP
 使用＃数据代替。 IP更改将触发Cilium的再生
 ＃每个端点的策略，并增加每个cilium-agent策略
 ＃存储库修订版。
 ＃
 ＃默认情况下，从1.4.x版本开始禁用此选项
基于DNS代理的功能更强大的实现的详细信息，请参见[0]。
 ＃如果要使用FQDN策略但不想使用，请启用此选项
 ＃DNS代理。
 ＃
 ＃为了简化升级，用户可以选择将此选项设置为“ true”。
 ＃否则，请参考升级指南[1]，其中介绍了如何
 ＃准备升级策略规则。
 ＃
 ＃[0] http://docs.cilium.io/en/stable/policy/language/#dns -based
 ＃[1] http://docs.cilium.io/en/stable/install/upgrade/#changes -that-may-require-action
 tofqdns-enable-poller：“假”
＃wait-bpf-mount使初始化容器等待直到挂载bpf文件系统
 wait-bpf-mount：“假”
假面舞会：“真实”
 enable-xt-socket-fallback：“ true”
 install-iptables-rules：“ true”
 自动直接节点路由：“假”
 kube-proxy-replacement：“探针”
 enable-host-reachable-services：“假”
 enable-external-ips：“假”
 enable-node-port：“假”
 节点端口绑定保护：“ true”
 启用自动保护节点端口范围：“ true”
 enable-endpoint-health-checking：“真”
 enable-众所周知的身份：“假”
 enable-remote-node-identity：“ true”
 ---＃来源：cilium / charts / agent / templates / clusterrole.yamlapi版本：rbac.authorization.k8s.io/v1kind：ClusterRolemetadata：
 名称：纤毛规则：
apiGroups：

联网.k8s.io

 资源：

网络政策

动词：

得到

清单

看

apiGroups：

Discovery.k8s.io

 资源：

端点切片

动词：

得到

清单

看

apiGroups：

”

 资源：

命名空间

服务

节点

终点

动词：

得到

清单

看

apiGroups：

”

 资源：

豆荚

节点

动词：

得到

清单

看

更新

apiGroups：

”

 资源：

节点

节点/状态

动词：

补丁

apiGroups：

apiextensions.k8s.io

 资源：

customresourcedefinitions

 动词：

创造

得到

清单

看

更新

apiGroups：

纤毛虫

资源：

cilium网络政策

ciliumnetworkpolicies / status

cilium群集全网政策

ciliumclusterwidenetworkpolicy / status

纤毛端点

纤毛端点/状态

纤毛节

纤毛节点/状态

纤毛身份

机构身份/状态

动词：

'*'

 ---＃来源：cilium / charts / operator / templates / clusterrole.yamlapi版本：rbac.authorization.k8s.io/v1kind：ClusterRolemetadata：

 名称：cilium-operatorrules：

apiGroups：

”

 资源：

 ＃自动删除[core | kube] dns连播，以便开始使用

 ＃由Cilium管理

豆荚

动词：

得到

清单

看

删除

apiGroups：

Discovery.k8s.io

 资源：

端点切片

动词：

得到

清单

看

apiGroups：

”

 资源：

 ＃自动从k8s读取并将节点的pod CIDR导入cilium的

 ＃etcd，以便所有节点都知道如何到达另一个以不同方式运行的Pod

 ＃个节点。

节点

 ＃执行将包含ToGroup的CNP转换为其端点的操作

服务

终点

 ＃检查apiserver连接

命名空间

动词：

得到

清单

看

apiGroups：

纤毛虫

资源：

cilium网络政策

ciliumnetworkpolicies / status

cilium群集全网政策

ciliumclusterwidenetworkpolicy / status

纤毛端点

纤毛端点/状态

纤毛节

纤毛节点/状态

纤毛身份

机构身份/状态

动词：

'*'

 ---＃来源：cilium / charts / agent / templates / clusterrolebinding.yamlapi版本：rbac.authorization.k8s.io/v1kind：ClusterRoleBinding元数据：

 名称：ciliumroleRef：

 apiGroup：rbac.authorization.k8s.io

 种类：ClusterRole

 名称：纤毛科目：

种类：ServiceAccount
 名称：纤毛
命名空间：kube-system
 ---＃来源：cilium / charts / operator / templates / clusterrolebinding.yamlapi版本：rbac.authorization.k8s.io/v1kind：ClusterRoleBinding元数据：
 名称：cilium-operatorroleRef：
 apiGroup：rbac.authorization.k8s.io
 种类：ClusterRole
 名称：cilium-operator主题：
种类：ServiceAccount
 名称：cilium-operator
 命名空间：kube-system
 ---＃来源：cilium / charts / agent / templates / daemonset.yamlapi版本：apps / v1kind：DaemonSetmetadata：
 标签：
 k8s-app：纤毛
名称：纤毛
命名空间：kube-systemspec：
 选择器：
 matchLabels：
 k8s-app：纤毛
模板：
 元数据：
 注释：
 ＃此注释加上CriticalAddonsOnly容忍度
 ＃cilium成为群集中的关键Pod，可确保cilium
 ＃获取优先级调度。
 ＃https ://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        k8s-app: cilium
    spec:
      containers:
      - args:
        - --config-dir=/tmp/cilium/config-map
        command:
        - cilium-agent
        livenessProbe:
          exec:
            command:
            - cilium
            - status
            - --brief
          failureThreshold: 10
          # The initial delay for the liveness probe is intentionally large to
          # avoid an endless kill and restart cycle in the event that the initial
          # bootstrapping takes longer than expected.
          initialDelaySeconds: 120
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - cilium
            - status
            - --brief
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        env:
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CILIUM_K8S_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: CILIUM_FLANNEL_MASTER_DEVICE
          valueFrom:
            configMapKeyRef:
              key: flannel-master-device
              name: cilium-config
              optional: true
        - name: CILIUM_FLANNEL_UNINSTALL_ON_EXIT
          valueFrom:
            configMapKeyRef:
              key: flannel-uninstall-on-exit
              name: cilium-config
              optional: true
        - name: CILIUM_CLUSTERMESH_CONFIG
          value: /var/lib/cilium/clustermesh/
        - name: CILIUM_CNI_CHAINING_MODE
          valueFrom:
            configMapKeyRef:
              key: cni-chaining-mode
              name: cilium-config
              optional: true
        - name: CILIUM_CUSTOM_CNI_CONF
          valueFrom:
            configMapKeyRef:
              key: custom-cni-conf
              name: cilium-config
              optional: true
        image: "docker.io/cilium/cilium:v1.7.6"
        imagePullPolicy: IfNotPresent
        lifecycle:
          postStart:
            exec:
              command:
              - "/cni-install.sh"
              - "--enable-debug=false"
          preStop:
            exec:
              command:
              - /cni-uninstall.sh
        name: cilium-agent
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
            - SYS_MODULE
          privileged: true
        volumeMounts:
        - mountPath: /var/run/cilium
          name: cilium-run
        - mountPath: /host/opt/cni/bin
          name: cni-path
        - mountPath: /host/etc/cni/net.d
          name: etc-cni-netd
        - mountPath: /var/lib/cilium/clustermesh
          name: clustermesh-secrets
          readOnly: true
        - mountPath: /tmp/cilium/config-map
          name: cilium-config-path
          readOnly: true
          # Needed to be able to load kernel modules
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
        - mountPath: /run/xtables.lock
          name: xtables-lock
      hostNetwork: true
      initContainers:
      - command:
        - /init-container.sh
        env:
        - name: CILIUM_ALL_STATE
          valueFrom:
            configMapKeyRef:
              key: clean-cilium-state
              name: cilium-config
              optional: true
        - name: CILIUM_BPF_STATE
          valueFrom:
            configMapKeyRef:
              key: clean-cilium-bpf-state
              name: cilium-config
              optional: true
        - name: CILIUM_WAIT_BPF_MOUNT
          valueFrom:
            configMapKeyRef:
              key: wait-bpf-mount
              name: cilium-config
              optional: true
        image: "docker.io/cilium/cilium:v1.7.6"
        imagePullPolicy: IfNotPresent
        name: clean-cilium-state
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
          privileged: true
        volumeMounts:
        - mountPath: /var/run/cilium
          name: cilium-run
      restartPolicy: Always
      priorityClassName: system-node-critical
      serviceAccount: cilium
      serviceAccountName: cilium
      terminationGracePeriodSeconds: 1
      tolerations:
      - operator: Exists
      volumes:
        # To keep state between restarts / upgrades
      - hostPath:
          path: /var/run/cilium
          type: DirectoryOrCreate
        name: cilium-run
        # To install the cilium cni plugin on the host
      - hostPath:
          path: /opt/cni/bin
          type: DirectoryOrCreate
        name: cni-path
        # To install the cilium cni configuration on the host
      - hostPath:
          path: /etc/cni/net.d
          type: DirectoryOrCreate
        name: etc-cni-netd
        # To be able to load kernel modules
      - hostPath:
          path: /lib/modules
        name: lib-modules
        # To access iptables concurrently with other processes (e.g. kube-proxy)
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: xtables-lock
        # To read the clustermesh configuration
      - name: clustermesh-secrets
        secret:
          defaultMode: 420
          optional: true
          secretName: cilium-clustermesh
        # To read the configuration from the config map
      - configMap:
          name: cilium-config
        name: cilium-config-path
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 2
    type: RollingUpdate

---
# Source: cilium/charts/operator/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    io.cilium/app: operator
    name: cilium-operator
  name: cilium-operator
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      io.cilium/app: operator
      name: cilium-operator
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
      labels:
        io.cilium/app: operator
        name: cilium-operator
    spec:
      containers:
      - args:
        - --debug=$(CILIUM_DEBUG)
        - --identity-allocation-mode=$(CILIUM_IDENTITY_ALLOCATION_MODE)
        - --synchronize-k8s-nodes=true
        command:
        - cilium-operator
        env:
        - name: CILIUM_K8S_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CILIUM_DEBUG
          valueFrom:
            configMapKeyRef:
              key: debug
              name: cilium-config
              optional: true
        - name: CILIUM_CLUSTER_NAME
          valueFrom:
            configMapKeyRef:
              key: cluster-name
              name: cilium-config
              optional: true
        - name: CILIUM_CLUSTER_ID
          valueFrom:
            configMapKeyRef:
              key: cluster-id
              name: cilium-config
              optional: true
        - name: CILIUM_IPAM
          valueFrom:
            configMapKeyRef:
              key: ipam
              name: cilium-config
              optional: true
        - name: CILIUM_DISABLE_ENDPOINT_CRD
          valueFrom:
            configMapKeyRef:
              key: disable-endpoint-crd
              name: cilium-config
              optional: true
        - name: CILIUM_KVSTORE
          valueFrom:
            configMapKeyRef:
              key: kvstore
              name: cilium-config
              optional: true
        - name: CILIUM_KVSTORE_OPT
          valueFrom:
            configMapKeyRef:
              key: kvstore-opt
              name: cilium-config
              optional: true
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              key: AWS_ACCESS_KEY_ID
              name: cilium
              optional: true
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: AWS_SECRET_ACCESS_KEY
              name: cilium
              optional: true
        - name: AWS_DEFAULT_REGION
          valueFrom:
            secretKeyRef:
              key: AWS_DEFAULT_REGION
              name: cilium
              optional: true
        - name: CILIUM_IDENTITY_ALLOCATION_MODE
          valueFrom:
            configMapKeyRef:
              key: identity-allocation-mode
              name: cilium-config
              optional: true
        image: "docker.io/cilium/operator:v1.7.6"
        imagePullPolicy: IfNotPresent
        name: cilium-operator
        livenessProbe:
          httpGet:
            host: "127.0.0.1"
            path: /healthz
            port: 9234
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 3
      hostNetwork: true
      restartPolicy: Always
      serviceAccount: cilium-operator
      serviceAccountName: cilium-operator



                    
                    
                        
                            
                                
alculquicondor
on 2020-07-10
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Could you try increasing the log level and using grep to filter for the node or the pod?

On Thu, Jul 9, 2020, dilyevsky notifications@github.com wrote:

Here is the scheduler log:
I0709 23:08:22.056081       1 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0709 23:08:23.137451       1 serving.go:313] Generated self-signed cert in-memory
W0709 23:08:33.843509       1 authentication.go:297] Error looking up in-cluster authentication configuration: etcdserver: request timed out
W0709 23:08:33.843671       1 authentication.go:298] Continuing without authentication configuration. This may treat all requests as anonymous.
W0709 23:08:33.843710       1 authentication.go:299] To require authentication configuration lookup to succeed, set --authentication-tolerate-lookup-failure=false
I0709 23:08:33.911805       1 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0709 23:08:33.911989       1 registry.go:150] Registering EvenPodsSpread predicate and priority function
W0709 23:08:33.917999       1 authorization.go:47] Authorization is disabled
W0709 23:08:33.918162       1 authentication.go:40] Authentication is disabled
I0709 23:08:33.918238       1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I0709 23:08:33.925860       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0709 23:08:33.926013       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0709 23:08:33.930685       1 secure_serving.go:178] Serving securely on 127.0.0.1:10259
I0709 23:08:33.936198       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0709 23:08:34.026382       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0709 23:08:34.036998       1 leaderelection.go:242] attempting to acquire leader lease kube-system/kube-scheduler...
I0709 23:08:50.597201       1 leaderelection.go:252] successfully acquired lease kube-system/kube-scheduler
E0709 23:08:50.658551       1 factory.go:503] pod: kube-system/coredns-66bff467f8-9rjvd is already present in the active queue
E0709 23:12:27.673854       1 factory.go:503] pod kube-system/cilium-vv466 is already present in the backoff queue
E0709 23:12:58.099432       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed

After restarting the scheduler containers, the pending pods are scheduled immediately.

alculquicondor
on 2020-07-10
                            
                            
                                                                                            
                        
                    
                

                                                
                    
These are the events:
Events:
  Type     Reason            Age  From               Message
  ----     ------            ---- ----               -------
  Warning  FailedScheduling       default-scheduler  0/6 nodes are available: 5 node(s) didn't match node selector.
  Warning  FailedScheduling       default-scheduler  0/6 nodes are available: 5 node(s) didn't match node selector.

The node only has two taints, but the pod tolerates all existing taints. And yes, it seems to happen only on masters:
Taints: node-role.kubernetes.io/master:NoSchedule
        node.kubernetes.io/network-unavailable:NoSchedule

There is enough space, and the pod is best effort with no reservation anyway:
```
  Resource                   Requests    Limits
  --------                   --------    ------
  cpu                        650m (32%)  0 (0%)
  memory                     70Mi (0%)   170Mi (2%)
  ephemeral-storage          0 (0%)      0 (0%)
  hugepages-1Gi              0 (0%)      0 (0%)
  hugepages-2Mi              0 (0%)      0 (0%)
  attachable-volumes-gce-pd  0           0
```
I am going to try increasing the scheduler log level now...

dilyevsky
on 2020-07-10
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Your pod yaml doesn't actually have a node-role.kubernetes.io/master toleration, so it shouldn't have been scheduled on the master.

alculquicondor
on 2020-07-10
                            
                            
                                                                                            
                        
                    
                

                                                
                    
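Hi! We are hitting the same issue. However, we also see it with deployments, where we use anti-affinity to make sure a pod gets scheduled on each node, or a node selector that targets a specific node.
Simply creating a pod with a node selector set to match the hostname of the failing node was enough to make scheduling fail. The message said that 5 nodes did not match the selector, but said nothing about the sixth one. Restarting the scheduler solved the issue. It looks like something about that node gets cached and prevents scheduling on it.
As others said before, there is nothing in the logs about the failure.
We stripped the failing deployment down to the bare minimum (we had already removed the taint on the failing master):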
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
      restartPolicy: Always
      schedulerName: default-scheduler
      nodeSelector:
        kubernetes.io/hostname: master-2
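We had the same issue when the master had a taint and the deployment had a toleration for that taint. So it does not seem to be specifically related to daemonsets, tolerations or affinity/anti-affinity. Once the failure starts happening, nothing that targets the affected node can be scheduled. We have seen the issue in 1.18.2 up to 1.18.5 (we did not try 1.18.0 or .1).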
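
The symptom described above (six nodes in total, but failure reasons for only five) can be checked mechanically from the event text. The following is a minimal Python sketch, assuming the message keeps the "N/M nodes are available: K node(s) reason" shape seen in this thread; the function name `unaccounted_nodes` is made up for illustration and is not part of any Kubernetes tooling.

```python
import re


def unaccounted_nodes(message: str) -> int:
    """Return how many nodes the FailedScheduling message does not explain."""
    header = re.search(r"(\d+)/(\d+) nodes are available", message)
    if header is None:
        raise ValueError("not a FailedScheduling message")
    available, total = int(header.group(1)), int(header.group(2))
    # Each failure reason is reported as "<count> node(s) <reason>".
    explained = sum(int(n) for n in re.findall(r"(\d+) node\(s\)", message))
    return total - available - explained


msg = "0/6 nodes are available: 5 node(s) didn't match node selector."
print(unaccounted_nodes(msg))  # 1
```

A result greater than zero means at least one node was neither available nor rejected with a reason, which is the situation reported here.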
                    
                    
                        
                            
                                
maelk
on 2020-07-15
                            
                            
                                                                                            
                        
                    
                

                                                
                    
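"Simply creating a pod with a node selector set to match the hostname of the failing node was enough to make scheduling fail"
Could you clarify whether it started failing after such a pod was created, or before? I assume the node did not have a taint that the pod did not tolerate.
@nodo is going to help with reproducing. Could you look at the code for NodeSelector? You might need to add extra log lines while testing. You can also print the cache.
Get the PID of kube-scheduler: $ pidof kube-scheduler
Trigger a queue dump: $ sudo kill -SIGUSR2 <pid>. Note that this does not kill the scheduler process.
Then, in the scheduler log, search for the strings "Dump of cached NodeInfo", "Dump of scheduling queue" and "cache comparer started".
/priority critical-urgent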
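
The three steps above can be strung together in a small script. This is only a sketch in Python: it assumes `pidof` is installed, that the scheduler process is visible from where the script runs (for example on the host rather than inside another container), and that the caller is allowed to signal it. The helper names `parse_pids`, `trigger_dump` and `dump_lines` are hypothetical.

```python
import os
import signal
import subprocess
from typing import Iterable, List

# Strings to look for in the scheduler log after the dump was triggered.
MARKERS = (
    "Dump of cached NodeInfo",
    "Dump of scheduling queue",
    "cache comparer started",
)


def parse_pids(pidof_output: str) -> List[int]:
    """Parse the space-separated PID list printed by `pidof`."""
    return [int(token) for token in pidof_output.split()]


def trigger_dump(process_name: str = "kube-scheduler") -> List[int]:
    """Send SIGUSR2 to every matching process and return the PIDs signalled."""
    result = subprocess.run(
        ["pidof", process_name], capture_output=True, text=True, check=True
    )
    pids = parse_pids(result.stdout)
    for pid in pids:
        # Per the instructions above, SIGUSR2 requests a dump and does not
        # terminate the scheduler.
        os.kill(pid, signal.SIGUSR2)
    return pids


def dump_lines(log_lines: Iterable[str]) -> List[str]:
    """Keep only the log lines that contain one of the dump markers."""
    return [line for line in log_lines if any(m in line for m in MARKERS)]


# Sample lines in the format kube-scheduler uses for its log output.
sample = [
    "I0716 14:47:59.469828       1 comparer.go:42] cache comparer started",
    "I0716 14:47:59.470936       1 comparer.go:67] cache comparer finished",
    "I0716 14:47:59.471038       1 dumper.go:47] Dump of cached NodeInfo",
    "I0716 14:47:59.472623       1 dumper.go:60] Dump of scheduling queue:",
]
print(len(dump_lines(sample)))  # 3
```

`trigger_dump()` is deliberately not called at import time, since it signals a live process; the filtering part works on any saved log file.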
                    
                    
                        
                            
                                
alculquicondor
on 2020-07-15
                            
                            
                                                                                            
                        
                    
                

                                                
                    
/unassign

alculquicondor
on 2020-07-15
                            
                            
                                                                                            
                        
                    
                

                                                
                    
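We had already seen some daemonsets and deployments stuck in "Pending" before we tried to deploy this test deployment, so it was already failing. The taints had also been removed from the node.
Right now we have lost the environment where this was happening because we had to reboot the nodes, so the issue is no longer visible. As soon as we reproduce it, we will try to come back with more information.

maelk
on 2020-07-15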
                            
                            
                                                                                            
                        
                    
                

                                                
                    
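Please do so. I have tried to reproduce this in the past without success. I am more interested in the first instance of failure. It might still be related to taints.

alculquicondor
on 2020-07-15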
                            
                            
                                                                                            
                        
                    
                

                                                
                    
We have reproduced the issue. I ran the commands you asked for; here is the output:
I0716 14:47:52.768362       1 factory.go:462] Unable to schedule default/test-deployment-558f47bbbb-4rt5t: no fit: 0/6 nodes are available: 5 node(s) didn't match node selector.; waiting
I0716 14:47:52.768683       1 scheduler.go:776] Updating pod condition for default/test-deployment-558f47bbbb-4rt5t to (PodScheduled==False, Reason=Unschedulable)
I0716 14:47:53.018781       1 httplog.go:90] verb="GET" URI="/healthz" latency=299.172µs resp=200 UserAgent="kube-probe/1.18" srcIP="127.0.0.1:57258": 
I0716 14:47:59.469828       1 comparer.go:42] cache comparer started
I0716 14:47:59.470936       1 comparer.go:67] cache comparer finished
I0716 14:47:59.471038       1 dumper.go:47] Dump of cached NodeInfo
I0716 14:47:59.471484       1 dumper.go:49] 
Node name: master-0-bug
Requested Resources: {MilliCPU:1100 Memory:52428800 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Allocatable Resources:{MilliCPU:2000 Memory:3033427968 EphemeralStorage:19290208634 AllowedPodNumber:110 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Scheduled Pods(number: 9):
...
I0716 14:47:59.472623       1 dumper.go:60] Dump of scheduling queue:
name: coredns-cd64c8d7c-29zjq, namespace: kube-system, uid: 938e8827-5d17-4db9-ac04-d229baf4534a, phase: Pending, nominated node: 
name: test-deployment-558f47bbbb-4rt5t, namespace: default, uid: fa19fda9-c8d6-4ffe-b248-8ddd24ed5310, phase: Pending, nominated node: 
Unfortunately, this does not seem to help.

maelk
on 2020-07-16
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Dumping the cache is for debugging; it will not change anything. Could you include the dump?

alculquicondor
on 2020-07-16
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Also, assuming this was the first error, could you include the pod yaml and the node yaml?

alculquicondor
on 2020-07-16
                            
                            
                                                                                            
                        
                    
                

                                                
                    
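That is pretty much everything that was dumped; I just removed the other nodes. This was not the first error, but you can see the coredns pod in the dump, and that was the first one. I am not sure what else you are asking for in the dump.
I will fetch the yamls.

maelk
on 2020-07-16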
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Thanks, I did not realize you had trimmed it down to the relevant node and pod.

alculquicondor
on 2020-07-16
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Could you include the scheduled pods for that node, though? Just in case there is a bug in the resource usage calculation.

alculquicondor
on 2020-07-16
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        Requested Resources: {MilliCPU:1100 Memory:52428800 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
That AllowedPodNumber: 0 looks odd.

alculquicondor
on 2020-07-16
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Here are the other pods on that node:
name: kube-controller-manager-master-0-bug, namespace: kube-system, uid: 095eebb0-4752-419b-aac7-245e5bc436b8, phase: Running, nominated node: 
name: kube-proxy-xwf6h, namespace: kube-system, uid: 16552eaf-9eb8-4584-ba3c-7dff6ce92592, phase: Running, nominated node: 
name: kube-apiserver-master-0-bug, namespace: kube-system, uid: 1d338e26-b0bc-4cef-9bad-86b7dd2b2385, phase: Running, nominated node: 
name: kube-multus-ds-amd64-tpkm8, namespace: kube-system, uid: d50c0c7f-599c-41d5-a029-b43352a4f5b8, phase: Running, nominated node: 
name: openstack-cloud-controller-manager-wrb8n, namespace: kube-system, uid: 17aeb589-84a1-4416-a701-db6d8ef60591, phase: Running, nominated node: 
name: kube-scheduler-master-0-bug, namespace: kube-system, uid: 52469084-3122-4e99-92f6-453e512b640f, phase: Running, nominated node: 
name: subport-controller-28j9v, namespace: kube-system, uid: a5a07ac8-763a-4ff2-bdae-91c6e9e95698, phase: Running, nominated node: 
name: csi-cinder-controllerplugin-0, namespace: kube-system, uid: 8b16d6c8-a871-454e-98a3-0aa545f9c9d0, phase: Running, nominated node: 
name: calico-node-d899t, namespace: kube-system, uid: e3672030-53b1-4356-a5df-0f4afd6b9237, phase: Running, nominated node:

                    
                    
                        
                            
                                
maelk
on 2020-07-16
                            
                            
                                                                                            
                        
                    
                

                                                
                    
All the nodes have AllowedPodNumber set to 0 in the requested resources in the dump, but the other nodes are schedulable.
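
To rule out a plain capacity problem, the figures from the master-0-bug dump can be subtracted directly. The Python sketch below copies the numbers from the dump posted earlier in this thread; it assumes that pod capacity is judged from the number of scheduled pods (9) rather than from the AllowedPodNumber field of the requested resources, which is an assumption for illustration and not something confirmed here. The function name `headroom` is made up.

```python
# Values copied from the "Dump of cached NodeInfo" for master-0-bug.
requested = {"MilliCPU": 1100, "Memory": 52428800, "EphemeralStorage": 0}
allocatable = {
    "MilliCPU": 2000,
    "Memory": 3033427968,
    "EphemeralStorage": 19290208634,
    "AllowedPodNumber": 110,
}
scheduled_pods = 9


def headroom(allocatable, requested, scheduled_pods):
    """Return what is left on the node after subtracting current requests."""
    free = {
        key: allocatable[key] - requested.get(key, 0)
        for key in ("MilliCPU", "Memory", "EphemeralStorage")
    }
    # Assumption: pod slots are counted from the scheduled pods list.
    free["Pods"] = allocatable["AllowedPodNumber"] - scheduled_pods
    return free


print(headroom(allocatable, requested, scheduled_pods))
```

With these inputs there are 900 millicores, roughly 2.98 GB of memory and 101 pod slots left, so the dump gives no sign that the node is full.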
                    
                    
                        
                            
                                
maelk
on 2020-07-16
                            
                            
                                                                👍1
                            
                        
                    
                

                                                
                    
Node yaml:
apiVersion: v1
kind: Node
metadata:
  annotations:
    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2020-07-16T09:59:48Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: 54019dbc-10d7-409c-8338-5556f61a9371
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: regionOne
    failure-domain.beta.kubernetes.io/zone: nova
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: master-0-bug
    kubernetes.io/os: linux
    node-role.kubernetes.io/master: ""
    node.kubernetes.io/instance-type: 54019dbc-10d7-409c-8338-5556f61a9371
    node.uuid: 00324054-405e-4fae-a3bf-d8509d511ded
    node.uuid_source: cloud-init
    topology.kubernetes.io/region: regionOne
    topology.kubernetes.io/zone: nova
  name: master-0-bug
  resourceVersion: "85697"
  selfLink: /api/v1/nodes/master-0-bug
  uid: 629b6ef3-3c76-455b-8b6b-196c4754fb0e
spec:
  podCIDR: 192.168.0.0/24
  podCIDRs:
  - 192.168.0.0/24
  providerID: openstack:///00324054-405e-4fae-a3bf-d8509d511ded
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
status:
  addresses:
  - address: 10.0.10.14
    type: InternalIP
  - address: master-0-bug
    type: Hostname
  allocatable:
    cpu: "2"
    ephemeral-storage: "19290208634"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 2962332Ki
    pods: "110"
  capacity:
    cpu: "2"
    ephemeral-storage: 20931216Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 3064732Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2020-07-16T10:02:20Z"
    lastTransitionTime: "2020-07-16T10:02:20Z"
    message: Calico is running on this node
    reason: CalicoIsUp
    status: "False"
    type: NetworkUnavailable
  - lastHeartbeatTime: "2020-07-16T15:46:11Z"
    lastTransitionTime: "2020-07-16T09:59:43Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2020-07-16T15:46:11Z"
    lastTransitionTime: "2020-07-16T09:59:43Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2020-07-16T15:46:11Z"
    lastTransitionTime: "2020-07-16T09:59:43Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2020-07-16T15:46:11Z"
    lastTransitionTime: "2020-07-16T10:19:44Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  nodeInfo:
    architecture: amd64
    bootID: fe410ed3-2825-4f94-a9f9-08dc5e6a955e
    containerRuntimeVersion: docker://19.3.11
    kernelVersion: 4.12.14-197.45-default
    kubeProxyVersion: v1.18.5
    kubeletVersion: v1.18.5
    machineID: 00324054405e4faea3bfd8509d511ded
    operatingSystem: linux
    systemUUID: 00324054-405e-4fae-a3bf-d8509d511ded
And the pod:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2020-07-16T10:13:35Z"
  generateName: pm-node-exporter-
  labels:
    controller-revision-hash: 6466d9c7b
    pod-template-generation: "1"
  name: pm-node-exporter-mn9vj
  namespace: monitoring
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: pm-node-exporter
    uid: 5855a26f-a57e-4b0e-93f2-461c19c477e1
  resourceVersion: "5239"
  selfLink: /api/v1/namespaces/monitoring/pods/pm-node-exporter-mn9vj
  uid: 0db09c9c-1618-4454-94fa-138e55e5ebd7
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - master-0-bug
  containers:
  - args:
    - --path.procfs=/host/proc
    - --path.sysfs=/host/sys
    image: ***
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: 9100
        scheme: HTTP
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
    name: pm-node-exporter
    ports:
    - containerPort: 9100
      hostPort: 9100
      name: metrics
      protocol: TCP
    resources:
      limits:
        cpu: 200m
        memory: 150Mi
      requests:
        cpu: 100m
        memory: 100Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host/proc
      name: proc
      readOnly: true
    - mountPath: /host/sys
      name: sys
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: pm-node-exporter-token-csllf
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  hostPID: true
  nodeSelector:
    node-role.kubernetes.io/master: ""
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: pm-node-exporter
  serviceAccountName: pm-node-exporter
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - hostPath:
      path: /proc
      type: ""
    name: proc
  - hostPath:
      path: /sys
      type: ""
    name: sys
  - name: pm-node-exporter-token-csllf
    secret:
      defaultMode: 420
      secretName: pm-node-exporter-token-csllf
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-07-16T10:13:35Z"
    message: '0/6 nodes are available: 2 node(s) didn''t have free ports for the requested
      pod ports, 3 node(s) didn''t match node selector.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable

                    
                    
                        
                            
                                
maelk
on 2020-07-16
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Thanks a lot for all the information. @nodo, can you take it from here?

alculquicondor
on 2020-07-16
                            
                            
                                                                                            
                        
                    
                

                                                
                    
We are now also trying with https://github.com/Nordix/kubernetes/commit/5c00cdf195fa61316f963f59e73c6cafc2ad9bdc to get more information.

maelk
on 2020-07-16
                            
                            
                                                                👍1
                            
                        
                    
                

                                                
                    
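/help
@maelk feel free to take this and submit a PR if you find the bug. The log lines you added are likely to be helpful. Otherwise, I am opening it up to contributors.

alculquicondor
on 2020-07-16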
                            
                            
                                                                                            
                        
                    
                

                                                
                    
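@alculquicondor:
This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
In response to this:
/help
@maelk feel free to take this and submit a PR if you find the bug. The log lines you added are likely to be helpful. Otherwise, I am opening it up to contributors.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot
on 2020-07-16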
                            
                            
                                                                                            
                        
                    
                

                                                
                    
/assign

pancernik
on 2020-07-17
                            
                            
                                                                                            
                        
                    
                

                                                
                    
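@maelk Was there anything particular about the timing when this issue first occurred? For example, did it happen right after the node started?

pancernik
on 2020-07-17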
                            
                            
                                                                                            
                        
                    
                

                                                
                    
No, quite a few pods were scheduled there and ran fine. But once the problem occurred, nothing could be scheduled there anymore.

maelk
on 2020-07-17
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        Lowering the priority until we have a reproducible case.
                    
                    
                        
                            
                                
                                liggitt
                                on 2020-07-19
                            
                            
                                                                                            
                        
                    
                

                                                
                    
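                        We were able to reproduce the bug with a scheduler that had additional log entries. What we see is that one of the master nodes completely disappears from the list of nodes being iterated over. We can see that the process starts with 6 nodes (from the snapshot):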
I0720 13:58:28.246507       1 generic_scheduler.go:441] Looking for a node for kube-system/coredns-cd64c8d7c-tcxbq, going through []*nodeinfo.NodeInfo{(*nodeinfo.NodeInfo)(0xc000326a90), (*nodeinfo.NodeInfo)(0xc000952000), (*nodeinfo.NodeInfo)(0xc0007d08f0), (*nodeinfo.NodeInfo)(0xc0004f35f0), (*nodeinfo.NodeInfo)(0xc000607040), (*nodeinfo.NodeInfo)(0xc000952000)}
But afterwards we can see that it iterates over only 5 nodes, and then we get:
I0720 13:58:28.247420       1 generic_scheduler.go:505] pod kube-system/coredns-cd64c8d7c-tcxbq : processed 5 nodes, 0 fit
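So one of the nodes is removed from the list of potential nodes. Unfortunately we did not have enough logging at the start of the process, but we will try to get more.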
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        Code references, by log line:
https://github.com/Nordix/kubernetes/commit/5c00cdf195fa61316f963f59e73c6cafc2ad9bdc#diff-c237cdd9e4cb201118ca380732d7f361R441
https://github.com/Nordix/kubernetes/commit/5c00cdf195fa61316f963f59e73c6cafc2ad9bdc#diff-c237cdd9e4cb201118ca380732d7f361R505
@maelk, did you see any lines matching %v/%v on node %v, too many nodes fit?
Otherwise, @pancernik, could you check workqueue.ParallelizeUntil(ctx, 16, len(allNodes), checkNode)?
                    
                    
                        
                            
                                
                                alculquicondor
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
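                        No, that log did not appear. I also think that either we have an issue with the parallelization or the node is filtered out earlier. If it were failing with an error here: Nordix@5c00cdf#diff-c237cdd9e4cb201118ca380732d7f361R464 it would be visible in the logs, afaik, so I will try to add more debug entries around that function and the parallelization.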
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        I just realized that one node goes through the filtering twice!
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        The logs are:
I0720 13:58:28.246507       1 generic_scheduler.go:441] Looking for a node for kube-system/coredns-cd64c8d7c-tcxbq, going through []*nodeinfo.NodeInfo{(*nodeinfo.NodeInfo)(0xc000326a90), (*nodeinfo.NodeInfo)(0xc000952000), (*nodeinfo.NodeInfo)(0xc0007d08f0), (*nodeinfo.NodeInfo)(0xc0004f35f0), (*nodeinfo.NodeInfo)(0xc000607040), (*nodeinfo.NodeInfo)(0xc000952000)}
I0720 13:58:28.246793       1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-60846k0y-scheduler, fits: false, status: &v1alpha1.Status{code:3, reasons:[]string{"node(s) didn't match node selector"}}
I0720 13:58:28.246970       1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-60846k0y-scheduler : status is not success
I0720 13:58:28.246819       1 taint_toleration.go:71] Checking taints for pod kube-system/coredns-cd64c8d7c-tcxbq for node master-0-scheduler : taints : []v1.Taint{v1.Taint{Key:"node-role.kubernetes.io/master", Value:"", Effect:"NoSchedule", TimeAdded:(*v1.Time)(nil)}} and tolerations: []v1.Toleration{v1.Toleration{Key:"node-role.kubernetes.io/master", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"CriticalAddonsOnly", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node-role.kubernetes.io/master", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node-role.kubernetes.io/not-ready", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node.kubernetes.io/not-ready", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(0xc000d40d90)}, v1.Toleration{Key:"node.kubernetes.io/unreachable", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(0xc000d40db0)}}
I0720 13:58:28.247019       1 taint_toleration.go:71] Checking taints for pod kube-system/coredns-cd64c8d7c-tcxbq for node master-2-scheduler : taints : []v1.Taint{v1.Taint{Key:"node-role.kubernetes.io/master", Value:"", Effect:"NoSchedule", TimeAdded:(*v1.Time)(nil)}} and tolerations: []v1.Toleration{v1.Toleration{Key:"node-role.kubernetes.io/master", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"CriticalAddonsOnly", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node-role.kubernetes.io/master", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node-role.kubernetes.io/not-ready", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"node.kubernetes.io/not-ready", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(0xc000d40d90)}, v1.Toleration{Key:"node.kubernetes.io/unreachable", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(0xc000d40db0)}}
I0720 13:58:28.247144       1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node master-2-scheduler, fits: false, status: &v1alpha1.Status{code:2, reasons:[]string{"node(s) didn't match pod affinity/anti-affinity", "node(s) didn't satisfy existing pods anti-affinity rules"}}
I0720 13:58:28.247172       1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node master-2-scheduler : status is not success
I0720 13:58:28.247210       1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-7dt1xd4k-scheduler, fits: false, status: &v1alpha1.Status{code:3, reasons:[]string{"node(s) didn't match node selector"}}
I0720 13:58:28.247231       1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-7dt1xd4k-scheduler : status is not success
I0720 13:58:28.247206       1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-60846k0y-scheduler, fits: false, status: &v1alpha1.Status{code:3, reasons:[]string{"node(s) didn't match node selector"}}
I0720 13:58:28.247297       1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-60846k0y-scheduler : status is not success
I0720 13:58:28.247246       1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-hyk0hg7r-scheduler, fits: false, status: &v1alpha1.Status{code:3, reasons:[]string{"node(s) didn't match node selector"}}
I0720 13:58:28.247340       1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node worker-pool1-hyk0hg7r-scheduler : status is not success
I0720 13:58:28.247147       1 generic_scheduler.go:469] pod kube-system/coredns-cd64c8d7c-tcxbq on node master-0-scheduler, fits: false, status: &v1alpha1.Status{code:2, reasons:[]string{"node(s) didn't match pod affinity/anti-affinity", "node(s) didn't satisfy existing pods anti-affinity rules"}}
I0720 13:58:28.247375       1 generic_scheduler.go:483] pod kube-system/coredns-cd64c8d7c-tcxbq on node master-0-scheduler : status is not success
I0720 13:58:28.247420       1 generic_scheduler.go:505] pod kube-system/coredns-cd64c8d7c-tcxbq : processed 5 nodes, 0 fit
I0720 13:58:28.247461       1 generic_scheduler.go:430] pod kube-system/coredns-cd64c8d7c-tcxbq After scheduling, filtered: []*v1.Node{}, filtered nodes: v1alpha1.NodeToStatusMap{"master-0-scheduler":(*v1alpha1.Status)(0xc000d824a0), "master-2-scheduler":(*v1alpha1.Status)(0xc000b736c0), "worker-pool1-60846k0y-scheduler":(*v1alpha1.Status)(0xc000d825a0), "worker-pool1-7dt1xd4k-scheduler":(*v1alpha1.Status)(0xc000b737e0), "worker-pool1-hyk0hg7r-scheduler":(*v1alpha1.Status)(0xc000b738c0)}
I0720 13:58:28.247527       1 generic_scheduler.go:185] Pod kube-system/coredns-cd64c8d7c-tcxbq failed scheduling:
  nodes snapshot: &cache.Snapshot{nodeInfoMap:map[string]*nodeinfo.NodeInfo{"master-0-scheduler":(*nodeinfo.NodeInfo)(0xc000607040), "master-1-scheduler":(*nodeinfo.NodeInfo)(0xc0001071e0), "master-2-scheduler":(*nodeinfo.NodeInfo)(0xc000326a90), "worker-pool1-60846k0y-scheduler":(*nodeinfo.NodeInfo)(0xc000952000), "worker-pool1-7dt1xd4k-scheduler":(*nodeinfo.NodeInfo)(0xc0007d08f0), "worker-pool1-hyk0hg7r-scheduler":(*nodeinfo.NodeInfo)(0xc0004f35f0)}, nodeInfoList:[]*nodeinfo.NodeInfo{(*nodeinfo.NodeInfo)(0xc000326a90), (*nodeinfo.NodeInfo)(0xc000952000), (*nodeinfo.NodeInfo)(0xc0007d08f0), (*nodeinfo.NodeInfo)(0xc0004f35f0), (*nodeinfo.NodeInfo)(0xc000607040), (*nodeinfo.NodeInfo)(0xc000952000)}, havePodsWithAffinityNodeInfoList:[]*nodeinfo.NodeInfo{(*nodeinfo.NodeInfo)(0xc000326a90), (*nodeinfo.NodeInfo)(0xc000607040)}, generation:857} 
  statuses: v1alpha1.NodeToStatusMap{"master-0-scheduler":(*v1alpha1.Status)(0xc000d824a0), "master-2-scheduler":(*v1alpha1.Status)(0xc000b736c0), "worker-pool1-60846k0y-scheduler":(*v1alpha1.Status)(0xc000d825a0), "worker-pool1-7dt1xd4k-scheduler":(*v1alpha1.Status)(0xc000b737e0), "worker-pool1-hyk0hg7r-scheduler":(*v1alpha1.Status)(0xc000b738c0)} 
As you can see, the node worker-pool1-60846k0y-scheduler goes through the filtering twice.
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
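                        > No, that log did not appear. I also think that either we have an issue with the parallelization or the node is filtered out earlier. If it were failing with an error here: Nordix@5c00cdf#diff-c237cdd9e4cb201118ca380732d7f361R464 it would be visible in the logs, afaik, so I will try to add more debug entries around that function and the parallelization.
Yes, an error there would show up as a "Scheduling Error" in the pod events.
> I just realized that one node goes through the filtering twice!
Honestly, I would not expect the parallelization to have bugs (still worth checking), but this could be a sign that we failed to build the snapshot from the cache (as we saw from the cache dump, the cache is correct) by adding a node twice. Since statuses is a map, it makes sense that we only "see" 5 nodes on the last log line.
This is the code (tip of 1.18): https://github.com/kubernetes/kubernetes/blob/ec73e191f47b7992c2f40fadf1389446d6661d6d/pkg/scheduler/internal/cache/cache.go#L203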
                    
                    
                        
                            
                                
                                alculquicondor
                                on 2020-07-20
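To illustrate the point about statuses being a map, here is a minimal sketch, not the scheduler's code. It assumes a stdlib worker pool (parallelize, a hypothetical stand-in for workqueue.ParallelizeUntil) and a hypothetical filterAll helper. Every list entry is checked once, but a duplicated name can only occupy one map key, so 6 checks leave 5 statuses.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelize is a stdlib stand-in for a fixed-size worker pool: it calls
// doWork(i) exactly once for every i in [0, pieces).
func parallelize(workers, pieces int, doWork func(i int)) {
	indices := make(chan int, pieces)
	for i := 0; i < pieces; i++ {
		indices <- i
	}
	close(indices)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range indices {
				doWork(i)
			}
		}()
	}
	wg.Wait()
}

// filterAll runs one check per list entry and records one status per node
// name. The status text is a placeholder, not a real filter result.
func filterAll(nodeList []string) (checks int, statuses map[string]string) {
	statuses = make(map[string]string)
	var mu sync.Mutex
	parallelize(16, len(nodeList), func(i int) {
		mu.Lock()
		defer mu.Unlock()
		checks++
		statuses[nodeList[i]] = "does not fit"
	})
	return checks, statuses
}

func main() {
	// Node order taken from the nodeInfoList pointers in the logs above.
	nodeList := []string{
		"master-2-scheduler",
		"worker-pool1-60846k0y-scheduler",
		"worker-pool1-7dt1xd4k-scheduler",
		"worker-pool1-hyk0hg7r-scheduler",
		"master-0-scheduler",
		"worker-pool1-60846k0y-scheduler", // duplicated entry
	}
	checks, statuses := filterAll(nodeList)
	fmt.Printf("checks run: %d, statuses recorded: %d\n", checks, len(statuses))
	// checks run: 6, statuses recorded: 5
}
```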
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        cc @ahg-g
                    
                    
                        
                            
                                
                                alculquicondor
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
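                        I will try to add a lot of logging to the cache part of the scheduler, specifically around node addition and update, and around the snapshot. However, from the last line of the logs you can see that the snapshot is actually correct and contains all the nodes, so whatever goes wrong seems to happen later, when that snapshot is processed.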
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
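                        cache != snapshot
The cache is the living thing that gets updated from events. The snapshot is updated (from the cache) before every scheduling cycle to "lock" the state. We added optimizations to make this last step as fast as possible. The bug is likely there.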
                    
                    
                        
                            
                                
                                alculquicondor
                                on 2020-07-20
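The cache/snapshot distinction described above can be sketched as follows. This is a simplified model under stated assumptions, not the scheduler's implementation: the type names, the upsert and update helpers, and the per-node generation counter are hypothetical, and node removal is deliberately left out.

```go
package main

import "fmt"

// nodeInfo is a hypothetical, stripped-down record for one node.
type nodeInfo struct {
	name       string
	pods       int
	generation int64
}

// cache is the live state, mutated as events arrive.
type cache struct {
	nodes      map[string]*nodeInfo
	generation int64
}

// snapshot is a frozen copy that one scheduling cycle works against.
type snapshot struct {
	nodes      map[string]nodeInfo
	generation int64
}

// upsert records an event for a node and stamps it with a new generation.
func (c *cache) upsert(name string, pods int) {
	c.generation++
	c.nodes[name] = &nodeInfo{name: name, pods: pods, generation: c.generation}
}

// update copies only the nodes that changed since the last refresh and
// returns how many entries were copied. Removals are not handled here.
func (s *snapshot) update(c *cache) int {
	copied := 0
	for name, n := range c.nodes {
		if n.generation > s.generation {
			s.nodes[name] = *n
			copied++
		}
	}
	s.generation = c.generation
	return copied
}

func main() {
	c := &cache{nodes: map[string]*nodeInfo{}}
	s := &snapshot{nodes: map[string]nodeInfo{}}

	c.upsert("node-a", 1)
	c.upsert("node-b", 0)
	fmt.Println("copied:", s.update(c)) // copied: 2

	c.upsert("node-a", 2)
	// The snapshot stays "locked" until the next update.
	fmt.Println("snapshot pods:", s.nodes["node-a"].pods) // snapshot pods: 1
	fmt.Println("copied:", s.update(c))                   // copied: 1
	fmt.Println("snapshot pods:", s.nodes["node-a"].pods) // snapshot pods: 2
}
```

Copying only what changed is the kind of optimization that makes the refresh cheap, and it is also where derived structures, such as an ordered node list, can drift out of step with the map.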
                            
                            
                                                                                            
                        
                    
                

                                                
                    
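                        Thanks @maelk! This is very useful. Your logs show that (*nodeinfo.NodeInfo)(0xc000952000) is already duplicated in the list at https://github.com/Nordix/kubernetes/commit/5c00cdf195fa61316f963f59e73c6cafc2ad9bdc#diff-c237cdd9e4cb201118ca380732d7f361R441, before any parallel code is executed. That would indeed mean it is duplicated before the snapshot update.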
                    
                    
                        
                            
                                
                                pancernik
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        Actually, that comes from the snapshot, which is taken before this log message: https://github.com/Nordix/kubernetes/commit/5c00cdf195fa61316f963f59e73c6cafc2ad9bdc#diff-c237cdd9e4cb201118ca380732d7f361R436
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        That's right. I meant that it is already duplicated before the snapshot update finishes.
                    
                    
                        
                            
                                
                                pancernik
                                on 2020-07-20
                            
                            
                                                                👍1
                            
                        
                    
                

                                                
                    
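                        > That's right. I meant that it is already duplicated before the snapshot update finishes.
No, the snapshot is updated at the beginning of the scheduling cycle. The bug is either during the snapshot update or before it. But according to the dump in https://github.com/kubernetes/kubernetes/issues/91601#issuecomment-659465008, the cache is correct.
EDIT: I read it wrong, I missed the word "finishes" :)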
                    
                    
                        
                            
                                
                                alculquicondor
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        The PR that optimized the snapshot update landed in 1.18: https://github.com/kubernetes/kubernetes/pull/86919
                    
                    
                        
                            
                                
                                ahg-g
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        I wonder whether the node tree also has duplicate records.
                    
                    
                        
                            
                                
                                ahg-g
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        > I wonder whether the node tree also has duplicate records.
@maelk, could you show a dump of the full list of nodes in the cache?
                    
                    
                        
                            
                                
                                alculquicondor
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
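                        We do not add or remove individual items in NodeInfoList; we either rebuild the full list from the tree or we do not, so if there are duplicates, I think they most likely come from the tree.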
                    
                    
                        
                            
                                
                                ahg-g
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
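                        Just to clarify:
 1) The cluster has 6 nodes (including the masters).
 2) The node that is supposed to host the pod was not examined at all (no log line mentions it), which may mean it is not in NodeInfoList at all.
 3) NodeInfoList has 6 entries, but one of them is a duplicate.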
                    
                    
                        
                            
                                
                                ahg-g
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
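                        > I wonder whether the node tree also has duplicate records.
> @maelk, could you show a dump of the full list of nodes in the cache?
A dump of each of the node tree, the list, and the map would be great.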
                    
                    
                        
                            
                                
                                ahg-g
                                on 2020-07-20
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        I will work on getting that. In the meantime, a small update. We can see in the logs:
I0720 13:37:30.530980       1 node_tree.go:100] Removed node "worker-pool1-60846k0y-scheduler" in group "" from NodeTree
I0720 13:37:30.531136       1 node_tree.go:86] Added node "worker-pool1-60846k0y-scheduler" in group "regionOne:\x00:nova" to NodeTree
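This is the exact point at which the missing node disappears. Its last occurrence in the logs is at 13:37:24; in the next scheduling attempt, the missing node is gone. So the bug appears to be in, or to follow, the update of node_tree. All nodes go through that update; this worker 608 just happens to be the last one to go through it.
When dumping the cache (with SIGUSR2), all six nodes are listed there, with the pods running on them, and with no duplicated or missing nodes.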
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        We will make a new attempt with added debug output around the snapshot functionality: https : 
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
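                        > Removed node "worker-pool1-60846k0y-scheduler" in group "" from NodeTree
Interesting. I think the remove/add is triggered by an updateNode call. The zone key is missing on the remove but present on the add, so the update was basically adding the zone and region labels?
Do you have other scheduler logs related to this node?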
                    
                    
                        
                            
                                
                                ahg-g
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        We are trying to reproduce the bug with the added logging. I will come back when I have more information.
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
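                        > I will work on getting that. In the meantime, a small update. We can see in the logs:
> I0720 13:37:30.530980       1 node_tree.go:100] Removed node "worker-pool1-60846k0y-scheduler" in group "" from NodeTree
> I0720 13:37:30.531136       1 node_tree.go:86] Added node "worker-pool1-60846k0y-scheduler" in group "regionOne:\x00:nova" to NodeTree
I was going to point out that this is the node that is duplicated. @maelk, did you see similar messages for the other nodes? As @ahg-g said, this is expected the first time a node receives its topology labels.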
                    
                    
                        
                            
                                
                                alculquicondor
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
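                        Yes, it happened for all nodes, and it is expected. The coincidence is that this particular node is the last one updated, and it is at that exact time that the other node goes missing.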
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        Did you get update logs for the missing node?
                    
                    
                        
                            
                                
                                alculquicondor
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        > Did you get update logs for the missing node?
lol, I was just typing this question.
                    
                    
                        
                            
                                
                                ahg-g
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        A possible bug is that the whole zone is deleted from the tree before all of its nodes have been removed.
                    
                    
                        
                            
                                
                                ahg-g
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
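                        Just to clarify, I am not personally looking at the code; I am trying to make sure we have all the information. And I think that, with what we have now, we should be able to spot the bug. Feel free to submit PRs, and even better if you can provide a failing unit test.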
                    
                    
                        
                            
                                
                                alculquicondor
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
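                        > Did you get update logs for the missing node?
Yes, they show that the zone is updated for that missing node. There is a log entry for every node.
To be honest, I still have no clue about the cause of the bug, but if we find it soon, I will submit a PR or unit tests.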
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
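                        > Yes, they show that the zone is updated for that missing node. There is a log entry for every node.
If so, then I think the assumption that this "is the exact point at which the missing node disappears" may not be related. Let's wait for the new logs. It would be great if you could share all the scheduler logs you get in a file.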
                    
                    
                        
                            
                                
                                ahg-g
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
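                        I will do that once we reproduce with the new logging. From the existing logs, we can actually see that the pod scheduling right after that update is the first one to fail. But that does not give enough information to know what happened in between, so stay tuned...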
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        @maelk, did you see a message starting with snapshot state is not consistent in the scheduler logs?
Could you provide the full scheduler logs?
                    
                    
                        
                            
                                
                                pancernik
                                on 2020-07-21
                            
                            
                                                                                            
                        
                    
                

                                                
                    
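                        No, that message is not present. I could provide a stripped-down log file (to avoid the repetition), but let's first wait until we have output with more logging around the snapshot.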
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-21
                            
                            
                                                                👍1
                            
                        
                    
                

                                                
                    
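                        I have found the bug. The issue is in the nodeTree next() function, which in some cases does not return a list of all the nodes. https://github.com/kubernetes/kubernetes/blob/release-1.18/pkg/scheduler/internal/cache/node_tree.go#L147
It is visible if you add the following here: https : 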
{
    name:           "add nodes to a new and to an exhausted zone",
    nodesToAdd:     append(allNodes[5:9], allNodes[3]),
    nodesToRemove:  nil,
    operations:     []string{"add", "add", "next", "next", "add", "add", "add", "next", "next", "next", "next"},
    expectedOutput: []string{"node-6", "node-7", "node-3", "node-8", "node-6", "node-7"},
},
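The main problem is that when you add a node, the indexes are not at 0 for some of the zones. For this to happen, you need at least two zones, one shorter than the other, with the longer one having an index that is not 0 when next() is called for the first time.
The fix I went with is to reset the indexes before the first call to next(). I opened a PR to show my fix. Of course it targets the 1.18 release, since that is what I have been working on, but it is mostly meant for discussing how to fix it (or perhaps how to fix the next() function itself). I can open a proper PR against master and do the backports afterwards if needed.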
                    
                    
                        
                            
                                
                                maelk
                                on 2020-07-22
                            
                            
                                                                                            
                        
                    
                

                                                
                    
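                        I noticed the same issue with the iteration. But I failed to link it to the duplicate in the snapshot. @maelk, did you manage to create a scenario in which that happens?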
                    
                    
                        
                            
                                
                                pancernik
                                on 2020-07-22
                            
                            
                                                                                            
                        
                    
                

                                                
                    
                        Yes, you can run it in the unit tests by adding the small piece of code I posted.
                    
                    
                        
                            
                                
                                maelk
                                于 2020-07-22
                            
                            
                                                                                            
                        
                    
                

                                                
                    
I am now adding a test case for the snapshot, to make sure this case is properly tested.

maelk
on 2020-07-22
                            
                            
                                                                🎉1
👍1
                            
                        
                    
                

                                                
                    
Big thanks to @igraecao for the help in reproducing the issue and running the tests in their setup.

maelk
on 2020-07-22
                            
                            
                                                                👍1
                            
                        
                    
                

                                                
                    
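Thanks, everyone, for debugging this notorious issue. Resetting the index before creating the list is safe, so I think we should go with that for the 1.18 and 1.19 patches, and do a proper fix in the master branch.
The purpose of the next function changed with the introduction of NodeInfoList, so we can certainly simplify it, and perhaps change it to toList, a function that creates a list from the tree and simply starts from the beginning every time.

ahg-g
on 2020-07-22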
                            
                            
                                                                                            
                        
                    
                

                                                
                    
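I understand the issue now: the calculation of whether a zone is exhausted is wrong because it does not consider where in each zone we started this "UpdateSnapshot" process. And yes, it would only be visible with uneven zones.
Great job spotting this, @maelk!
I would think we have the same issue in older versions. However, it is hidden there by the fact that we do a tree pass every time, whereas in 1.18 we snapshot the result until there are changes in the tree.
Now that the round-robin strategy is implemented in generic_scheduler.go, we might be fine with simply resetting all counters before UpdateSnapshot, as your PR does.
https://github.com/kubernetes/kubernetes/blob/02cf58102a61b6d1e021e256381ff750573ce55d/pkg/scheduler/core/generic_scheduler.go#L357
Just to double check, @ahg-g: this should be fine even in a cluster where nodes are added and removed all the time, right?

alculquicondor
on 2020-07-23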
                            
                            
                                                                                            
                        
                    
                

                                                
                    
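Thanks @maelk for finding the root cause!
> The purpose of the next function changed with the introduction of NodeInfoList, so we can certainly simplify it, and perhaps change it to toList, a function that creates a list from the tree and simply starts from the beginning every time.
Given that cache.nodeTree.next() is only called when building the snapshot nodeInfoList, I think it is also safe to remove the indexes (both zoneIndex and nodeIndex) from the nodeTree struct. Instead, come up with a simple nodeIterator() function that iterates through its zones/nodes in a round-robin manner.

Huang-Wei
on 2020-07-23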
                            
                            
                                                                                            
                        
                    
                

                                                
                    
BTW: there is a typo in https://github.com/kubernetes/kubernetes/issues/91601#issuecomment-662663090; the case should be:
{
    name:           "add nodes to a new and to an exhausted zone",
    nodesToAdd:     append(allNodes[6:9], allNodes[3]),
    nodesToRemove:  nil,
    operations:     []string{"add", "add", "next", "next", "add", "add", "next", "next", "next", "next"},
    expectedOutput: []string{"node-6", "node-7", "node-3", "node-8", "node-6", "node-7"},
    // with the codebase on master and 1.18, its output is [node-6 node-7 node-3 node-8 node-6 node-3]
},

Huang-Wei
on 2020-07-23
                            
                            
                                                                                            
                        
                    
                

                                                
                    
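> Just to double check, @ahg-g: this should be fine even in a cluster where nodes are added and removed all the time, right?
I am assuming you are talking about the logic in generic_scheduler.go. If so, yes, it does not matter much whether nodes were added or removed. The main thing we need to avoid is iterating over the nodes in the same order every time we schedule a pod; we just need a good approximation of iterating over the nodes across pods.

ahg-g
on 2020-07-23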
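The goal stated in the comment above, not visiting the nodes in the same order for every pod, can be illustrated with a start offset that advances between scheduling cycles. This is a hedged sketch only: rotatingPicker and visit are hypothetical names, and the code is loosely inspired by the idea rather than copied from generic_scheduler.go.

```go
package main

import "fmt"

// rotatingPicker is a hypothetical stand-in for per-scheduler state that
// remembers where the previous scheduling cycle stopped.
type rotatingPicker struct {
	nextStart int
}

// visit returns up to limit nodes starting at the saved offset, wrapping
// around the list, then advances the offset by the number of nodes visited.
func (p *rotatingPicker) visit(nodes []string, limit int) []string {
	if len(nodes) == 0 || limit <= 0 {
		return nil
	}
	if limit > len(nodes) {
		limit = len(nodes)
	}
	visited := make([]string, 0, limit)
	for i := 0; i < limit; i++ {
		visited = append(visited, nodes[(p.nextStart+i)%len(nodes)])
	}
	p.nextStart = (p.nextStart + limit) % len(nodes)
	return visited
}

func main() {
	nodes := []string{"n11", "n21", "n12", "n22"}
	p := &rotatingPicker{}
	fmt.Println(p.visit(nodes, 3)) // [n11 n21 n12]
	fmt.Println(p.visit(nodes, 3)) // [n22 n11 n21]
}
```

Because the modulo is applied on every access, the offset stays valid even if the node list shrinks or grows between cycles, which fits the point that adding or removing nodes does not matter much here.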
                            
                            
                                                                                            
                        
                    
                

                                                
                    
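> Given that cache.nodeTree.next() is only called when building the snapshot nodeInfoList, I think it is also safe to remove the indexes (both zoneIndex and nodeIndex) from the nodeTree struct. Instead, come up with a simple nodeIterator() function that iterates through its zones/nodes in a round-robin manner.
Yes, we just need to iterate over all zones/nodes in the same order every time.

ahg-g
on 2020-07-23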
                            
                            
                                                                                            
                        
                    
                

                                                
                    
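I have updated the PR with a unit test for the function that updates the snapshot list, specifically for this bug. I can also take care of refactoring the next() function to iterate over the zones and nodes without round-robin, which would remove the issue.

maelk
on 2020-07-23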
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Thanks, sounds good, but we should still iterate between zones the same way we do now; that is by design.

ahg-g
on 2020-07-23
                            
                            
                                                                                            
                        
                    
                

                                                
                    
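I don't really get what you mean here. Is it that the order of the nodes matters and we must still go round-robin between zones, or can we list all nodes of one zone, one zone after the other? Say you have two zones of two nodes each: in which order do you expect them, or does it even matter at all?

maelk
on 2020-07-23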
                            
                            
                                                                                            
                        
                    
                

                                                
                    
The order matters: we need to alternate between zones while creating the list. If you have two zones of two nodes each, z1: {n11, n12} and z2: {n21, n22}, then the list should be {n11, n21, n12, n22}.

ahg-g
on 2020-07-23
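The ordering requested in the comment above, combined with the earlier idea of a list builder that always starts from the beginning, could look roughly like the following. This is a minimal sketch under stated assumptions: interleave is a hypothetical helper, zones are represented as a plain map of node names, and it is not the refactoring that was eventually merged.

```go
package main

import "fmt"

// interleave builds a node list by taking one node from each zone in turn,
// skipping zones that have run out. It keeps no state between calls, so every
// call starts from the first node of the first zone.
func interleave(zones []string, nodesByZone map[string][]string) []string {
	var list []string
	for i := 0; ; i++ {
		added := false
		for _, zone := range zones {
			if nodes := nodesByZone[zone]; i < len(nodes) {
				list = append(list, nodes[i])
				added = true
			}
		}
		if !added {
			return list // no zone had a node at position i
		}
	}
}

func main() {
	zones := []string{"z1", "z2"}
	nodesByZone := map[string][]string{
		"z1": {"n11", "n12"},
		"z2": {"n21", "n22"},
	}
	fmt.Println(interleave(zones, nodesByZone)) // [n11 n21 n12 n22]
}
```

Since no index survives between calls, uneven zones cannot leave the builder in a stale position, which is the situation that triggered the bug.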
                            
                            
                                                                                            
                        
                    
                

                                                
                    
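OK, thanks, I'll give it some thought. Can we proceed with the quick fix in the meantime? By the way, some tests are failing on it, but I am not sure how that relates to my PR.

maelk
on 2020-07-23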
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Those are flakes. Please send a patch to 1.18 as well.

ahg-g
on 2020-07-23
                            
                            
                                                                                            
                        
                    
                

                                                
                    
OK, will do. Thanks.

maelk
on 2020-07-23
                            
                            
                                                                                            
                        
                    
                

                                                
                    
{
    name:           "add nodes to a new and to an exhausted zone",
    nodesToAdd:     append(allNodes[5:9], allNodes[3]),
    nodesToRemove:  nil,
    operations:     []string{"add", "add", "next", "next", "add", "add", "add", "next", "next", "next", "next"},
    expectedOutput: []string{"node-6", "node-7", "node-3", "node-8", "node-6", "node-7"},
},
@maelk, do you mean that this test ignores "node-5"?
After fixing the append in https://github.com/kubernetes/kubernetes/pull/93516, I found that the test result shows all nodes can be iterated:
{
    name:           "add nodes to a new and to an exhausted zone",
    nodesToAdd:     append(append(make([]*v1.Node, 0), allNodes[5:9]...), allNodes[3]),
    nodesToRemove:  nil,
    operations:     []string{"add", "add", "next", "next", "add", "add", "add", "next", "next", "next", "next"},
    expectedOutput: []string{"node-5", "node-6", "node-3", "node-7", "node-8", "node-5"},
},
node-5, 6, 7, 8, and 3 can all be iterated.
Forgive me if I have misunderstood something here.

soulxu
on 2020-07-29
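The rewritten nodesToAdd in the comment above copies the sub-slice before appending to it. The thread does not spell out the reason, so the following is only an illustration of the general Go behavior that makes such a copy worthwhile: appending to a sub-slice with spare capacity writes into the parent's backing array. The helpers appendShared and appendCopied are hypothetical and are not taken from the Kubernetes test file.

```go
package main

import "fmt"

// appendShared appends directly to a sub-slice. If the sub-slice has spare
// capacity, the new element lands in the parent's backing array.
func appendShared(parent []string, lo, hi int, extra string) []string {
	return append(parent[lo:hi], extra)
}

// appendCopied copies the sub-slice into a fresh slice first, so the parent
// slice is left untouched.
func appendCopied(parent []string, lo, hi int, extra string) []string {
	fresh := make([]string, 0, hi-lo+1)
	return append(append(fresh, parent[lo:hi]...), extra)
}

func main() {
	all := []string{"node-0", "node-1", "node-2", "node-3"}
	got := appendShared(all, 0, 2, "node-9")
	fmt.Println(got, all) // [node-0 node-1 node-9] [node-0 node-1 node-9 node-3]

	all = []string{"node-0", "node-1", "node-2", "node-3"}
	got = appendCopied(all, 0, 2, "node-9")
	fmt.Println(got, all) // [node-0 node-1 node-9] [node-0 node-1 node-2 node-3]
}
```

In a table-driven test where several cases slice the same shared fixture, the first form can silently change what later cases see, which makes expected outputs hard to read.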
                            
                            
                                                                                            
                        
                    
                

                                                
                    
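Yes, it was on purpose, based on what was there, but I can see how it comes across as cryptic, so it is better to make the append behave in a clearer way. Thanks for the patch.

maelk
on 2020-07-29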
                            
                            
                                                                                            
                        
                    
                

                                                
                    
How far back do you believe this bug has been present? 1.17? 1.16? I have just seen the exact same problem in 1.17 on AWS, and restarting the unscheduled node fixed it.

judgeaxl
on 2020-09-14
                            
                            
                                                                                            
                        
                    
                

                                                
                    
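@judgeaxl could you provide more details? Log lines, cache dumps, and so on, so that we can determine whether the issue is the same.
As I noted in https://github.com/kubernetes/kubernetes/issues/91601#issuecomment-662746695, I believe this bug was present in older versions, but my thinking is that it is transient there.
@maelk would you be able to investigate?

alculquicondor
on 2020-09-14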
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Please also share the distribution of nodes across the zones.

alculquicondor
on 2020-09-14
                            
                            
                                                                                            
                        
                    
                

                                                
                    
@alculquicondor unfortunately, I can't at this point. Sorry.

maelk
on 2020-09-14
                            
                            
                                                                                            
                        
                    
                

                                                
                    
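@alculquicondor sorry, I have already rebuilt the cluster for other reasons. It may have been a network configuration problem related to multi-AZ deployments and the subnet in which the faulty node was launched, so I would not worry about it for now in the context of this issue. If I notice it again, I will report back with more details. Thanks!

judgeaxl
on 2020-09-15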
                            
                            
                                                                                            
                        
                    
                

                                                
                    
/retitle Some nodes are not considered in scheduling when there is zone imbalance

alculquicondor
on 2020-09-15



            
                
                    