Kubeadm: Fresh deploy with CoreDNS not resolving any dns lookup

Created on 14 Aug 2018  ·  22 comments  ·  Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):

{
  "clientVersion": {
    "major": "1",
    "minor": "11",
    "gitVersion": "v1.11.2",
    "gitCommit": "bb9ffb1654d4a729bb4cec18ff088eacc153c239",
    "gitTreeState": "clean",
    "buildDate": "2018-08-07T23:14:39Z",
    "goVersion": "go1.10.3",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}

Environment:

  • Kubernetes version (use kubectl version):
{
  "clientVersion": {
    "major": "1",
    "minor": "11",
    "gitVersion": "v1.11.2",
    "gitCommit": "bb9ffb1654d4a729bb4cec18ff088eacc153c239",
    "gitTreeState": "clean",
    "buildDate": "2018-08-07T23:17:28Z",
    "goVersion": "go1.10.3",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "serverVersion": {
    "major": "1",
    "minor": "11",
    "gitVersion": "v1.11.2",
    "gitCommit": "bb9ffb1654d4a729bb4cec18ff088eacc153c239",
    "gitTreeState": "clean",
    "buildDate": "2018-08-07T23:08:19Z",
    "goVersion": "go1.10.3",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}
  • Cloud provider or hardware configuration:
    CentOS 7 VM
  • OS (e.g. from /etc/os-release):
    CentOS Linux release 7.5.1804 (Core)
  • Kernel (e.g. uname -a):
    Linux K8S-master 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Others:
    Networking with flannel
$ kubectl get all --all-namespaces 
NAMESPACE     NAME                                     READY     STATUS    RESTARTS   AGE
kube-system   pod/coredns-78fcdf6894-bvtcg             1/1       Running   2          3h
kube-system   pod/coredns-78fcdf6894-lq7st             1/1       Running   2          3h
kube-system   pod/etcd-k8s-master                      1/1       Running   1          3h
kube-system   pod/kube-apiserver-k8s-master            1/1       Running   1          3h
kube-system   pod/kube-controller-manager-k8s-master   1/1       Running   1          3h
kube-system   pod/kube-flannel-ds-6tgqf                1/1       Running   2          3h
kube-system   pod/kube-flannel-ds-cn4ql                1/1       Running   1          3h
kube-system   pod/kube-proxy-cjlvz                     1/1       Running   1          3h
kube-system   pod/kube-proxy-w7ts7                     1/1       Running   1          3h
kube-system   pod/kube-scheduler-k8s-master            1/1       Running   1          3h

NAMESPACE   NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
default     service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   3h

NAMESPACE     NAME                             DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
kube-system   daemonset.apps/kube-flannel-ds   2         2         2         2            2           beta.kubernetes.io/arch=amd64   3h
kube-system   daemonset.apps/kube-proxy        2         2         2         2            2           beta.kubernetes.io/arch=amd64   3h

NAMESPACE     NAME                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   2         2         2            2           3h

NAMESPACE     NAME                                 DESIRED   CURRENT   READY     AGE
kube-system   replicaset.apps/coredns-78fcdf6894   2         2         2         3h

What happened?

I've created a service so one pod can curl another pod, but the service name is never resolved.
Exec-ing into the pod:

# cat /etc/resolv.conf 
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

In an older installation where kube-dns was the default, I remember a service named "kube-dns" with IP 10.96.0.10. This installation doesn't have such a service.

curl my-service 
curl: (6) Could not resolve host: my-service

curl my-service.default.svc.cluster.local 
curl: (6) Could not resolve host: my-service.default.svc.cluster.local

curl www.google.com
curl: (6) Could not resolve host: www.google.com

What you expected to happen?

The DNS lookup should resolve.

How to reproduce it (as minimally and precisely as possible)?

Fresh install with kubeadm and flannel on CentOS 7, with one worker node and the master also acting as a node.
Create a pod and a service, then try to curl the service from inside a pod.

Anything else we need to know?

The IP address I see inside /etc/resolv.conf (10.96.0.10) is the same one I had with kube-dns, but this time there is nothing at 10.96.0.10.

$ kubectl logs -f --namespace=kube-system coredns-78fcdf6894-bvtcg 
.:53
CoreDNS-1.1.3
linux/amd64, go1.10.1, b0fd575c
2018/08/14 15:34:06 [INFO] CoreDNS-1.1.3
2018/08/14 15:34:06 [INFO] linux/amd64, go1.10.1, b0fd575c
2018/08/14 15:34:06 [INFO] plugin/reload: Running configuration MD5 = 2a066f12ec80aeb2b92740dd74c17138
^C
$ kubectl logs -f --namespace=kube-system coredns-78fcdf6894-lq7st 
.:53
2018/08/14 15:34:06 [INFO] CoreDNS-1.1.3
2018/08/14 15:34:06 [INFO] linux/amd64, go1.10.1, b0fd575c
2018/08/14 15:34:06 [INFO] plugin/reload: Running configuration MD5 = 2a066f12ec80aeb2b92740dd74c17138
CoreDNS-1.1.3
linux/amd64, go1.10.1, b0fd575c
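
A quick way to narrow down a failure like this (a sketch; busybox:1.28 is used only because its nslookup is known to work reliably) is to check whether the kube-dns Service and its endpoints exist at all, and to query the cluster DNS IP directly:

kubectl -n kube-system get svc kube-dns
kubectl -n kube-system get endpoints kube-dns
kubectl run -it --rm --restart=Never --image=busybox:1.28 dnstest -- \
  nslookup kubernetes.default.svc.cluster.local 10.96.0.10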
Labels: help wanted, priority/awaiting-more-evidence

Most helpful comment

Please check the clusterDNS variable in /var/lib/kubelet/config.yaml. For our configuration this was (incorrectly) set to 10.96.0.10, whereas it should have been 10.244.240.10 (that's what we bootstrapped our cluster with). Changing this and restarting kubelet fixed the issue for us. Your mileage may vary, though.
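
A minimal sketch of that check and fix (the addresses here are examples only; the correct value is the DNS Service IP inside your own --service-cidr):

grep -A1 clusterDNS /var/lib/kubelet/config.yaml
sudo sed -i 's/10.96.0.10/10.244.240.10/' /var/lib/kubelet/config.yaml   # example values only
sudo systemctl restart kubelet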

All 22 comments

For whatever reason, there is no kube-dns service on your cluster.
You'll first need to re-create that by hand to fix things. Then we can try to figure out how it disappeared.

You can use this yaml to create the service with kubectl apply -f...

apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.96.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP

Note: It's counter-intuitive that the CoreDNS service is still named "kube-dns", but it does select the coredns pods (which carry the label k8s-app=kube-dns).
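
For example, assuming the yaml above is saved as kube-dns-svc.yaml, you can apply it and confirm it picks up the coredns pods as endpoints:

kubectl apply -f kube-dns-svc.yaml
kubectl -n kube-system get svc kube-dns
kubectl -n kube-system get endpoints kube-dns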

I'm having the same issue as the OP, and the description and use case are just about the same: kubeadm on CentOS 7.5 with one master that is also operating as the worker node. In my case, however, the service DOES exist:

λ k get all --all-namespaces
NAMESPACE        NAME                                                READY     STATUS             RESTARTS   AGE
default          pod/busybox                                         0/1       Error              0          28m
default          pod/gitlab-gitlab-fd8b9fb85-26mkz                   0/1       CrashLoopBackOff   6          50m
default          pod/gitlab-minio-7fb7886d94-2zsff                   1/1       Running            0          50m
default          pod/gitlab-postgresql-8684bb6656-ltxjm              1/1       Running            0          50m
default          pod/gitlab-redis-785447c586-84x4c                   1/1       Running            0          50m
default          pod/ldap-79bb8c66b9-68v9f                           1/1       Running            0          2d
default          pod/local-volume-provisioner-dkxm9                  1/1       Running            0          2d
kube-system      pod/coredns-78fcdf6894-2t8tv                        1/1       Running            0          2d
kube-system      pod/coredns-78fcdf6894-wvq26                        1/1       Running            0          2d
kube-system      pod/etcd-server1.stitches.tech                      1/1       Running            0          2d
kube-system      pod/kube-apiserver-server1.domain            1/1       Running            0          2d
kube-system      pod/kube-controller-manager-server1.domain   1/1       Running            0          2d
kube-system      pod/kube-flannel-ds-m9cz5                           1/1       Running            0          2d
kube-system      pod/kube-proxy-qhr8p                                1/1       Running            0          2d
kube-system      pod/kube-scheduler-server1.domain            1/1       Running            0          2d
kube-system      pod/kubernetes-dashboard-6948bdb78-qnp4b            1/1       Running            0          2d
kube-system      pod/tiller-deploy-56c4cf647b-64w8v                  1/1       Running            0          2d
metallb-system   pod/controller-9c57dbd4-fqhzb                       1/1       Running            0          2d
metallb-system   pod/speaker-tngv7                                   1/1       Running            0          2d

NAMESPACE     NAME                           TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                                   AGE
default       service/gitlab-gitlab          LoadBalancer   10.102.204.34   192.168.1.201   22:32208/TCP,80:32194/TCP,443:31370/TCP   50m
default       service/gitlab-minio           ClusterIP      None            <none>          9000/TCP                                  50m
default       service/gitlab-postgresql      ClusterIP      10.108.66.88    <none>          5432/TCP                                  50m
default       service/gitlab-redis           ClusterIP      10.97.59.57     <none>          6379/TCP                                  50m
default       service/kubernetes             ClusterIP      10.96.0.1       <none>          443/TCP                                   2d
default       service/ldap-service           LoadBalancer   10.101.250.10   192.168.1.200   389:32231/TCP                             2d
kube-system   service/kube-dns               ClusterIP      10.96.0.10      <none>          53/UDP,53/TCP                             2d
kube-system   service/kubernetes-dashboard   NodePort       10.104.132.52   <none>          443:30924/TCP                             2d
kube-system   service/tiller-deploy          ClusterIP      10.96.67.163    <none>          44134/TCP                                 2d

NAMESPACE        NAME                                      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
default          daemonset.apps/local-volume-provisioner   1         1         1         1            1           <none>                          2d
kube-system      daemonset.apps/kube-flannel-ds            1         1         1         1            1           beta.kubernetes.io/arch=amd64   2d
kube-system      daemonset.apps/kube-proxy                 1         1         1         1            1           beta.kubernetes.io/arch=amd64   2d
metallb-system   daemonset.apps/speaker                    1         1         1         1            1           <none>                          2d

NAMESPACE        NAME                                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default          deployment.apps/gitlab-gitlab          1         1         1            0           50m
default          deployment.apps/gitlab-minio           1         1         1            1           50m
default          deployment.apps/gitlab-postgresql      1         1         1            1           50m
default          deployment.apps/gitlab-redis           1         1         1            1           50m
default          deployment.apps/ldap                   1         1         1            1           2d
kube-system      deployment.apps/coredns                2         2         2            2           2d
kube-system      deployment.apps/kubernetes-dashboard   1         1         1            1           2d
kube-system      deployment.apps/tiller-deploy          1         1         1            1           2d
metallb-system   deployment.apps/controller             1         1         1            1           2d

NAMESPACE        NAME                                             DESIRED   CURRENT   READY     AGE
default          replicaset.apps/gitlab-gitlab-fd8b9fb85          1         1         0         50m
default          replicaset.apps/gitlab-minio-7fb7886d94          1         1         1         50m
default          replicaset.apps/gitlab-postgresql-8684bb6656     1         1         1         50m
default          replicaset.apps/gitlab-redis-785447c586          1         1         1         50m
default          replicaset.apps/ldap-79bb8c66b9                  1         1         1         2d
kube-system      replicaset.apps/coredns-78fcdf6894               2         2         2         2d
kube-system      replicaset.apps/kubernetes-dashboard-6948bdb78   1         1         1         2d
kube-system      replicaset.apps/tiller-deploy-56c4cf647b         1         1         1         2d
kube-system      replicaset.apps/tiller-deploy-64c9d747bd         0         0         0         2d
metallb-system   replicaset.apps/controller-9c57dbd4              1         1         1         2d

From the CoreDNS pods, I can't seem to do lookups out to the outside world, which seems strange:

root on server1 at 11:45:48 AM in /internal/gitlab
λ k exec -it coredns-78fcdf6894-2t8tv /bin/sh -n kube-system
/ # cat /etc/resolv.conf
nameserver 192.168.1.254
nameserver 2600:1700:c540:64c0::1
search attlocal.net domain
/ # host gitlab
;; connection timed out; no servers could be reached
/ # host google.com
;; connection timed out; no servers could be reached

To me, this means the CoreDNS pod can't see its upstream nameserver, which is 192.168.1.254, the IP on the host network. Am I on the right track?

But what's even stranger is that a pod running on that master node CAN reach that IP address just fine:

λ kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools
If you don't see a command prompt, try pressing enter.
dnstools# ping 192.168.1.254
PING 192.168.1.254 (192.168.1.254): 56 data bytes
64 bytes from 192.168.1.254: seq=0 ttl=63 time=1.102 ms

Can you try with dig?

dig google.com @192.168.1.254

Also, systems with a valid IPv6 config will typically try the IPv6 resolver first, and if that fails they treat the whole lookup as a failure. Try the dig command first; if that works, I would check whether the system is configured as dual-stack IPv4/IPv6.
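
For example (assuming dig is available in the pod image), forcing each address family separately shows whether only one of them is broken:

dig -4 google.com @192.168.1.254
dig -6 google.com @2600:1700:c540:64c0::1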

Thanks again to @mauilion for spending so much time helping me diagnose this issue today!

My solution (albeit quite terrible for now) was just to disable the firewalld service on my host OS:

sudo systemctl stop firewalld
sudo systemctl disable firewalld

Keep in mind what that command is actually doing. Do so at your own risk.
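
A less drastic sketch, if you'd rather keep firewalld running, is to open the ports kubeadm and flannel typically need and enable masquerading (the port numbers below are the common defaults; adjust for your setup):

sudo firewall-cmd --permanent --add-port=6443/tcp    # kube-apiserver
sudo firewall-cmd --permanent --add-port=10250/tcp   # kubelet
sudo firewall-cmd --permanent --add-port=8472/udp    # flannel VXLAN
sudo firewall-cmd --permanent --add-masquerade
sudo firewall-cmd --reload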

I ran into the same issue with Kubernetes 1.11.2 and flannel 0.10.0 deployed to a CentOS 7 VM via kubeadm, with kube-proxy configured to use iptables. What I noticed is that I had no pod-to-pod or pod-to-service communication after the initial deployment. Looking at the FORWARD chain in iptables, kube-proxy set up a KUBE-FORWARD chain as the first rule, which should, upon inspection, handle all of the traffic described above. Flannel appended two rules after the DROP and REJECT rules that are default in the CentOS 7 FORWARD chain. When I removed the REJECT rule, the rules added by flannel would process the traffic, and my pods could communicate with other pods and with service IPs.

Since kube-proxy monitors the KUBE-FORWARD chain and keeps it from changing, I added two rules after the KUBE-FORWARD rule that accept traffic with ctstate NEW. Once I added these rules, internal traffic was processed as I expected.

[screenshot: iptables FORWARD rules]
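
Roughly, the rules described above look like this (a sketch assuming the default flannel pod CIDR 10.244.0.0/16; the insert positions depend on your existing FORWARD chain):

iptables -I FORWARD 2 -s 10.244.0.0/16 -m conntrack --ctstate NEW -j ACCEPT
iptables -I FORWARD 3 -d 10.244.0.0/16 -m conntrack --ctstate NEW -j ACCEPT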

Please check the clusterDNS variable in /var/lib/kubelet/config.yaml. For our configuration this was (incorrectly) set to 10.96.0.10, whereas it should have been 10.244.240.10 (that's what we bootstrapped our cluster with). Changing this and restarting kubelet fixed the issue for us. Your mileage may vary, though.

@pkeuter, 10.244.0.0/16 is the default _pod_ CIDR for flannel. If that's the case for you, then 10.244.240.10 would be a pod IP, which you shouldn't use as your cluster-dns IP setting (it could change, and there's no load balancing).

It is not:
[screenshot]

We've bootstrapped the cluster with --pod-network-cidr=10.244.0.0/16 --service-cidr=10.244.240.0/20, but as I now see there is some overlap, which I should change anyway :-) So thanks for that, @chrisohaver!
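
For reference, a non-overlapping pair of ranges at init time might look like this (illustrative values only):

kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12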

Please check the clusterDNS variable in /var/lib/kubelet/config.yaml. For our configuration this was (incorrectly) set to 10.96.0.10, whereas it should have been 10.244.240.10 (that's what we bootstrapped our cluster with). Changing this and restarting kubelet fixed the issue for us. Your mileage may vary, though.

Thank you for this - it helped me track down why my internal DNS requests were not resolving.

For reference, I had to set my clusterDNS value to 192.168.0.10, as I initialized kubeadm with --service-cidr=192.168.0.0/16 and my kube-dns service has that as its cluster IP.

Also of note, simply restarting kubelet was not enough - I had to restart my pods so /etc/resolv.conf was updated. Once that was done, requests resolved as expected.
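
One way to do that (a sketch; the app=my-app label is just an example stand-in for your own workloads) is to delete the pods and let their controller recreate them with the updated DNS settings:

kubectl delete pod -l app=my-app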

There were a number of conflated CoreDNS issues that have since been resolved. Given the overloaded set of issues, I'm going to close this one.

If there are specific repros on 1.12+, feel free to open a new issue and we'll address it ASAP.

Please check the clusterDNS variable in /var/lib/kubelet/config.yaml. For our configuration this was (incorrectly) set to 10.96.0.10, whereas it should have been 10.244.240.10 (that's what we bootstrapped our cluster with). Changing this and restarting kubelet fixed the issue for us. Your mileage may vary, though.

Great. I use Calico; which clusterDNS address should I set?

I did the same but am facing the same error: my coredns pods are not starting and are stuck in an error state.

I changed my clusterDNS but still no effect, @justlooks.

+1 Facing the same issue in CentOS 7 and kubeadm 1.11

@timothysc

Adding iptables -P FORWARD ACCEPT fixed the issue.

+1 Facing the same issue in CentOS 7 and kubeadm 1.12

Found the resolution for the issue:
I removed the resource limits on the CoreDNS deployment, as it was hitting its CPU limit, which was making it restart.
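
A sketch of one way to do that (removing limits trades protection for stability, so adjust rather than delete if you can):

kubectl -n kube-system edit deployment coredns     # remove or raise resources.limits.cpu
kubectl -n kube-system patch deployment coredns --type=json \
  -p='[{"op":"remove","path":"/spec/template/spec/containers/0/resources/limits"}]'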

Maybe it's a flannel problem. In my case, the Vagrant VM has multiple network interfaces, so you must specify the interface when deploying flannel (--iface=eth1), otherwise the same DNS problem will happen...

https://github.com/kubernetes/kubernetes/issues/39701

Download https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml and modify it as follows:

......
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.11.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=eth1
......
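
Then re-apply the manifest and let flannel (and afterwards CoreDNS) restart; a sketch, assuming the manifest's default app=flannel and k8s-app=kube-dns labels:

kubectl apply -f kube-flannel.yml
kubectl -n kube-system delete pod -l app=flannel
kubectl -n kube-system delete pod -l k8s-app=kube-dns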

Thanks @pkeuter, that fixed the issue. I also had to delete the coredns pods and let them be recreated.
