Kubernetes: Ignored pod-eviction-timeout settings

Created on 27 Feb 2019  ·  15 Comments  ·  Source: kubernetes/kubernetes

What happened: I modified the pod-eviction-timeout setting of kube-controller-manager on the master node (to reduce the time before k8s re-creates a pod when a node fails). The default value is 5 minutes; I configured 30 seconds. Using the sudo docker ps --no-trunc | grep "kube-controller-manager" command, I checked that the change was applied successfully:

kubeadmin@nodetest21:~$ sudo docker ps --no-trunc | grep "kube-controller-manager"
387261c61ee9cebce50de2540e90b89e2bc710b4126a0c066ef41f0a1fb7cf38   sha256:0482f640093306a4de7073fde478cf3ca877b6fcc2c4957624dddb2d304daef5                         "kube-controller-manager --address=127.0.0.1 --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf --client-ca-file=/etc/kubernetes/pki/ca.crt --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --leader-elect=true --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --root-ca-file=/etc/kubernetes/pki/ca.crt --service-account-private-key-file=/etc/kubernetes/pki/sa.key --use-service-account-credentials=true --pod-eviction-timeout=30s" 

I applied a basic deployment with two replicas:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - image: busybox
        command:
        - sleep
        - "3600"
        imagePullPolicy: IfNotPresent
        name: busybox
      restartPolicy: Always

The first pod was created on the first worker node, the second pod on the second worker node:

NAME         STATUS   ROLES    AGE   VERSION
nodetest21   Ready    master   34m   v1.13.3
nodetest22   Ready    <none>   31m   v1.13.3
nodetest23   Ready    <none>   30m   v1.13.3

NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE   IP          NODE         NOMINATED NODE   READINESS GATES
default       busybox-74b487c57b-5s6g7             1/1     Running   0          13s   10.44.0.2   nodetest22   <none>           <none>
default       busybox-74b487c57b-6zdvv             1/1     Running   0          13s   10.36.0.1   nodetest23   <none>           <none>
kube-system   coredns-86c58d9df4-gmcjd             1/1     Running   0          34m   10.32.0.2   nodetest21   <none>           <none>
kube-system   coredns-86c58d9df4-wpffr             1/1     Running   0          34m   10.32.0.3   nodetest21   <none>           <none>
kube-system   etcd-nodetest21                      1/1     Running   0          33m   10.0.1.4    nodetest21   <none>           <none>
kube-system   kube-apiserver-nodetest21            1/1     Running   0          33m   10.0.1.4    nodetest21   <none>           <none>
kube-system   kube-controller-manager-nodetest21   1/1     Running   0          20m   10.0.1.4    nodetest21   <none>           <none>
kube-system   kube-proxy-6mcn8                     1/1     Running   1          31m   10.0.1.5    nodetest22   <none>           <none>
kube-system   kube-proxy-dhdqj                     1/1     Running   0          30m   10.0.1.6    nodetest23   <none>           <none>
kube-system   kube-proxy-vqjg8                     1/1     Running   0          34m   10.0.1.4    nodetest21   <none>           <none>
kube-system   kube-scheduler-nodetest21            1/1     Running   1          33m   10.0.1.4    nodetest21   <none>           <none>
kube-system   weave-net-9qls7                      2/2     Running   3          31m   10.0.1.5    nodetest22   <none>           <none>
kube-system   weave-net-h2cb6                      2/2     Running   0          33m   10.0.1.4    nodetest21   <none>           <none>
kube-system   weave-net-vkb62                      2/2     Running   0          30m   10.0.1.6    nodetest23   <none>           <none>

I shut down the first worker node to test correct pod eviction. After ~1 minute the status of the first worker node changed to "NotReady", and then I had to wait another 5 minutes (the default pod eviction timeout) before the pod from the powered-off node was re-created on another node.

What you expected to happen:
After the node reports "NotReady", the pod should be re-created on another node after 30 seconds instead of the default 5 minutes!

How to reproduce it (as minimally and precisely as possible):
Create three nodes. Initialize Kubernetes on the first node (sudo kubeadm init), apply the network plugin (kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"), then join the other two nodes (e.g. kubeadm join 10.0.1.4:6443 --token xdx9y1.z7jc0j7c8g8lpjog --discovery-token-ca-cert-hash sha256:04ae8388f607755c14eed702a23fd47802d5512e092b08add57040a2ae0736ac).
Add the pod-eviction-timeout parameter to the Kube Controller Manager on the master node (sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml):

apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --address=127.0.0.1
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --use-service-account-credentials=true
    - --pod-eviction-timeout=30s

(The YAML is truncated; only the relevant first part is shown here.)

Check that the setting has been applied:
sudo docker ps --no-trunc | grep "kube-controller-manager"
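Instead of scanning the long docker ps --no-trunc line by eye, the flag value can be extracted and parsed. A minimal Go sketch, assuming the container command line is available as one string like the one shown earlier; flagValue is a hypothetical helper, not part of any Kubernetes tool.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// flagValue returns the value of the last --name=value flag found in a
// whitespace-separated command line, and whether the flag was present.
func flagValue(cmdline, name string) (string, bool) {
	prefix := "--" + name + "="
	value, found := "", false
	for _, field := range strings.Fields(cmdline) {
		field = strings.Trim(field, `"`) // docker ps wraps the command in quotes
		if strings.HasPrefix(field, prefix) {
			value, found = strings.TrimPrefix(field, prefix), true
		}
	}
	return value, found
}

func main() {
	// Shortened copy of the command printed by docker ps --no-trunc.
	cmdline := `"kube-controller-manager --address=127.0.0.1 --leader-elect=true --pod-eviction-timeout=30s"`

	raw, ok := flagValue(cmdline, "pod-eviction-timeout")
	if !ok {
		fmt.Println("flag not set, the default applies")
		return
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		fmt.Println("invalid duration:", err)
		return
	}
	fmt.Println("pod-eviction-timeout =", d) // prints "pod-eviction-timeout = 30s"
}
```

Note that this only proves the flag reached the process; as the thread below shows, it does not prove the flag is actually used.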

Apply a deployment with two replicas and check that one pod is created on the first worker node and the other on the second worker node.
Shut down one of the nodes and measure the time elapsed between the node reporting "NotReady" and the pod being re-created.

Anything else we need to know?:
The same issue occurs in a multi-master environment as well.

ํ™˜๊ฒฝ :

  • Kubernetes ๋ฒ„์ „ ( kubectl version ) : v1.13.3
    ํด๋ผ์ด์–ธํŠธ ๋ฒ„์ „ : version.Info {Major : "1", Minor : "13", GitVersion : "v1.13.3", GitCommit : "721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState : "clean", BuildDate : "2019-02-01T20 : 08 : 12Z ", GoVersion :"go1.11.5 ", ์ปดํŒŒ์ผ๋Ÿฌ :"gc ", ํ”Œ๋žซํผ :"linux / amd64 "}
    ์„œ๋ฒ„ ๋ฒ„์ „ : version.Info {Major : "1", Minor : "13", GitVersion : "v1.13.3", GitCommit : "721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState : "clean", BuildDate : "2019-02-01T20 : 00 : 57Z ", GoVersion :"go1.11.5 ", ์ปดํŒŒ์ผ๋Ÿฌ :"gc ", ํ”Œ๋žซํผ :"linux / amd64 "}
  • ํด๋ผ์šฐ๋“œ ๊ณต๊ธ‰์ž ๋˜๋Š” ํ•˜๋“œ์›จ์–ด ๊ตฌ์„ฑ : Azure VM
  • OS (์˜ˆ : cat /etc/os-release ) : NAME = "Ubuntu"VERSION = "16.04.5 LTS (Xenial Xerus)"
  • ์ปค๋„ (์˜ˆ : uname -a ) : Linux nodetest21 4.15.0-1037-azure # 39 ~ 16.04.1-Ubuntu SMP 1 ์›” 15 ์ผ ํ™”์š”์ผ 17:20:47 UTC 2019 x86_64 x86_64 x86_64 GNU / Linux
  • ๋„๊ตฌ ์„ค์น˜ :
  • ๊ธฐํƒ€ : Docker v18.06.1-ce
kinbug siapps sinode

Most helpful comment

Thank you for your feedback, ChiefAlexander!
It is exactly as you wrote. I checked the pods, and sure enough, default toleration values are assigned to them:

kubectl describe pod busybox-74b487c57b-95b6n | grep -i toleration -A 2
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
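The same check can be scripted. A small Go sketch that parses the Tolerations lines of kubectl describe pod; it assumes the exact "key:Effect for Ns" layout printed here, which other kubectl versions may format differently, and parseTolerationSeconds is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// tolerationLine matches entries such as
// "node.kubernetes.io/not-ready:NoExecute for 300s".
var tolerationLine = regexp.MustCompile(`([^\s:]+):(\w+) for (\d+)s`)

// parseTolerationSeconds maps "key:Effect" to the toleration in seconds.
func parseTolerationSeconds(describeOutput string) map[string]int {
	result := make(map[string]int)
	for _, line := range strings.Split(describeOutput, "\n") {
		m := tolerationLine.FindStringSubmatch(line)
		if m == nil {
			continue // not a toleration line with a time limit
		}
		seconds, err := strconv.Atoi(m[3])
		if err != nil {
			continue
		}
		result[m[1]+":"+m[2]] = seconds
	}
	return result
}

func main() {
	out := `Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s`
	// Map iteration order is random, so the two lines may print in either order.
	for key, s := range parseTolerationSeconds(out) {
		fmt.Printf("%s -> %ds\n", key, s)
	}
}
```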

So I simply added my own values to the deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 2
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 2
      containers:
      - image: busybox
        command:
        - sleep
        - "3600"
        imagePullPolicy: IfNotPresent
        name: busybox
      restartPolicy: Always

After applying this deployment, when a node fails its status changes to "NotReady" and the pods are re-created 2 seconds later.

So we no longer have to deal with pod-eviction-timeout; the timeout can be set per Pod! Cool!

Thanks again for your help!

All 15 comments

@kubernetes/sig-node-bugs
@kubernetes/sig-apps-bugs

@danielloczi: Reiterating the mentions to trigger a notification:
@kubernetes/sig-node-bugs, @kubernetes/sig-apps-bugs

In response to this:

@kubernetes/sig-node-bugs
@kubernetes/sig-apps-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

I also ran into this issue while setting the eviction timeout lower. After poking around for a while, I figured out that the cause is the new TaintBasedEvictions feature.

In version 1.13 the TaintBasedEvictions feature was promoted to beta and is enabled by default. As a result, taints are added automatically by the NodeController (or kubelet), and the normal logic for evicting pods from nodes based on the Ready NodeCondition is disabled.
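The mapping from the Ready condition to taints can be sketched as follows: Ready=False leads to the node.kubernetes.io/not-ready:NoExecute taint, and Ready=Unknown (the kubelet stopped reporting, e.g. because the node was shut down) leads to node.kubernetes.io/unreachable:NoExecute. This is a simplified Go illustration with hypothetical names, not the node lifecycle controller's actual code.

```go
package main

import "fmt"

// ConditionStatus mirrors the three possible values of a NodeCondition status.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// noExecuteTaintFor returns the NoExecute taint key that taint-based evictions
// place on a node for the given Ready status, or "" for a healthy node.
func noExecuteTaintFor(ready ConditionStatus) string {
	switch ready {
	case ConditionFalse:
		return "node.kubernetes.io/not-ready"
	case ConditionUnknown:
		return "node.kubernetes.io/unreachable"
	default:
		return ""
	}
}

func main() {
	for _, s := range []ConditionStatus{ConditionTrue, ConditionFalse, ConditionUnknown} {
		fmt.Printf("Ready=%s -> %q\n", s, noExecuteTaintFor(s))
	}
}
```

Once the taint is on the node, each pod's own tolerations decide when it is evicted.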

Setting this feature flag to false makes pods get evicted as expected. I haven't taken the time to search through the taint-based eviction code, but I would guess it does not use this eviction timeout flag.

Looking into this further: with TaintBasedEvictions set to true, you can set your pods' eviction time in their spec under tolerations:
https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#taint-based-evictions
Their default values are set by an admission controller: https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/admission/defaulttolerationseconds/admission.go#L34
Those two flags can be set on kube-apiserver and should achieve the same effect.
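The defaulting done by that admission controller can be sketched like this. A simplified Go model, assuming the plugin only adds a NoExecute toleration for node.kubernetes.io/not-ready or node.kubernetes.io/unreachable when the pod does not already tolerate that taint, using the values of --default-not-ready-toleration-seconds and --default-unreachable-toleration-seconds (both 300 by default). Types and function names are hypothetical, and the flags are passed as plain parameters.

```go
package main

import "fmt"

// Toleration is a hypothetical, trimmed-down stand-in for core/v1 Toleration.
type Toleration struct {
	Key               string
	Operator          string
	Effect            string
	TolerationSeconds *int64
}

const (
	notReadyKey    = "node.kubernetes.io/not-ready"
	unreachableKey = "node.kubernetes.io/unreachable"
)

// hasNoExecuteToleration reports whether the pod already tolerates key:NoExecute.
func hasNoExecuteToleration(tols []Toleration, key string) bool {
	for _, t := range tols {
		keyOK := t.Key == key || (t.Key == "" && t.Operator == "Exists")
		if keyOK && (t.Effect == "" || t.Effect == "NoExecute") {
			return true
		}
	}
	return false
}

// addDefaultTolerations appends the default tolerations a pod is missing.
// notReadySeconds and unreachableSeconds stand for the kube-apiserver flags
// --default-not-ready-toleration-seconds and --default-unreachable-toleration-seconds.
func addDefaultTolerations(tols []Toleration, notReadySeconds, unreachableSeconds int64) []Toleration {
	defaults := []struct {
		key     string
		seconds int64
	}{{notReadyKey, notReadySeconds}, {unreachableKey, unreachableSeconds}}
	for _, d := range defaults {
		if hasNoExecuteToleration(tols, d.key) {
			continue // a toleration set in the pod spec wins
		}
		seconds := d.seconds // fresh variable so each pointer is distinct
		tols = append(tols, Toleration{Key: d.key, Operator: "Exists", Effect: "NoExecute", TolerationSeconds: &seconds})
	}
	return tols
}

func main() {
	// A pod without tolerations, like the original busybox Deployment.
	for _, t := range addDefaultTolerations(nil, 300, 300) {
		fmt.Printf("%s:%s for %ds\n", t.Key, t.Effect, *t.TolerationSeconds)
	}
}
```

For a pod with no tolerations this yields the same two 300s entries that kubectl describe showed for the busybox pods; lowering both flags would change the default for every pod that does not set its own value.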

// Controller will not proactively sync node health, but will monitor node
// health signal updated from kubelet. There are 2 kinds of node healthiness
// signals: NodeStatus and NodeLease. NodeLease signal is generated only when
// NodeLease feature is enabled. If it doesn't receive update for this amount
// of time, it will start posting "NodeReady==ConditionUnknown". The amount of
// time before which Controller start evicting pods is controlled via flag
// 'pod-eviction-timeout'.
// Note: be cautious when changing the constant, it must work with
// nodeStatusUpdateFrequency in kubelet and renewInterval in NodeLease
// controller. The node health signal update frequency is the minimal of the
// two.
// There are several constraints:
// 1. nodeMonitorGracePeriod must be N times more than  the node health signal
//    update frequency, where N means number of retries allowed for kubelet to
//    post node status/lease. It is pointless to make nodeMonitorGracePeriod
//    be less than the node health signal update frequency, since there will
//    only be fresh values from Kubelet at an interval of node health signal
//    update frequency. The constant must be less than podEvictionTimeout.
// 2. nodeMonitorGracePeriod can't be too large for user experience - larger
//    value takes longer for user to see up-to-date node health.

@danielloczi Hi danielloczi, how did you fix this issue? I am running into it too.

@323929 I think @danielloczi no longer relies on the pod-eviction-timeout parameter of kube-controller-manager and solves it with Taint based Evictions instead. I tested Taint based Evictions and it worked for me.

That's right: I simply started using Taint based Eviction.

Is it possible to make this global? I don't want to enable it in every pod config, especially since I use a lot of ready-made charts from helm.

+1 for being able to configure it for the whole cluster. Tuning it per pod or per deployment is rarely useful: in most cases a sane global value is much more convenient, and the current default of 5m is too long for many cases.

Please reopen this issue.

๋‚˜๋Š” ์ด์™€ ๋™์ผํ•œ ๋ฌธ์ œ์— ์ง๋ฉดํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. Taint ๊ธฐ๋ฐ˜ Evictions๋ฅผ ๋น„ํ™œ์„ฑํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์žˆ์œผ๋ฉฐ pod-eviction-timeout์ด ์ „์—ญ ๋ชจ๋“œ์—์„œ ์ž‘๋™ํ•ฉ๋‹ˆ๊นŒ?

๋‚˜๋Š” ์ด์™€ ๋™์ผํ•œ ๋ฌธ์ œ์— ์ง๋ฉดํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. Taint ๊ธฐ๋ฐ˜ Evictions๋ฅผ ๋น„ํ™œ์„ฑํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์žˆ์œผ๋ฉฐ pod-eviction-timeout์ด ์ „์—ญ ๋ชจ๋“œ์—์„œ ์ž‘๋™ํ•ฉ๋‹ˆ๊นŒ?

I think you can configure pod eviction globally via the apiserver: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
I haven't tried it, but as far as I can see there are these options: --default-not-ready-toleration-seconds and --default-unreachable-toleration-seconds.

Why has this bug been marked as closed? The original problem looks worked around rather than solved.
It is still not clear why the pod-eviction-timeout flag does not work.

Same issue.
