Kubeadm: Document how to and provide scripts for running kubeadm in a container

Created on 22 Nov 2016  ·  55 Comments  ·  Source: kubernetes/kubeadm

_From @andersla on October 27, 2016 18:08_

When trying to install kubeadm inside an Ubuntu 16.04 Docker container, it fails.

BUG REPORT

Kubernetes version (use kubectl version):
latest

Environment:
Ubuntu 16.04 Docker container

What happened:
When trying to install kubeadm inside an Ubuntu 16.04 Docker container, it fails.
My idea was to use one Docker container as the master "node" and a second container as a worker "node" (Kubernetes in Docker).
Is this a systemd issue? (Something I came across when googling for answers.)

Inside the Ubuntu 16.04 Docker image I install with: apt-get install -y kubeadm

setup log:

...
...
...
all: Setting up socat (1.7.3.1-1) ...
    all: Setting up kubelet (1.4.3-00) ...
    all: /var/lib/dpkg/info/kubelet.postinst: 38: /var/lib/dpkg/info/kubelet.postinst: [[: not found
    all: Setting up kubectl (1.4.3-00) ...
    all: Setting up kubeadm (1.5.0-alpha.0-1534-gcf7301f-00) ...
    all: Failed to connect to bus: No such file or directory
    **all: dpkg: error processing package kubeadm (--configure):**
    all: subprocess installed post-installation script returned error exit status 1
    all: Setting up netcat-traditional (1.10-41) ...
    all: update-alternatives: using /bin/nc.traditional to provide /bin/nc (nc) in auto mode
    all: Setting up netcat (1.10-41) ...
    all: Setting up patch (2.7.5-1) ...
    all: Setting up rename (0.20-4) ...
    all: update-alternatives: using /usr/bin/file-rename to provide /usr/bin/rename (rename) in auto mode
    all: Setting up tcpd (7.6.q-25) ...
    all: Setting up ubuntu-fan (0.9.1) ...
    all: invoke-rc.d: could not determine current runlevel
    all: invoke-rc.d: policy-rc.d denied execution of start.
    all: Setting up xz-utils (5.1.1alpha+20120614-2ubuntu2) ...
    all: update-alternatives: using /usr/bin/xz to provide /usr/bin/lzma (lzma) in auto mode
    all: Setting up python3 (3.5.1-3) ...
    all: running python rtupdate hooks for python3.5...
    all: running python post-rtupdate hooks for python3.5...
    all: Setting up apparmor (2.10.95-0ubuntu2.2) ...
    all: update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
    all: Setting up dh-python (2.20151103ubuntu1.1) ...
    all: Processing triggers for libc-bin (2.23-0ubuntu4) ...
    all: Processing triggers for systemd (229-4ubuntu11) ...
    all: Processing triggers for initramfs-tools (0.122ubuntu8.5) ...
    all: Processing triggers for dbus (1.10.6-1ubuntu3) ...
    all: Errors were encountered while processing:
    all: kubeadm
    all: E: Sub-process /usr/bin/dpkg returned an error code (1)
==> all: Killing the container: 93babb5045461c343a803109ba683a2acf68f1f453447a336b09171a1b190f38
Build 'all' errored: Script exited with non-zero exit status: 100

==> Some builds didn't complete successfully and had errors:
--> all: Script exited with non-zero exit status: 100

_Copied from original issue: kubernetes/kubernetes#35712_

Labels: area/testing, documentation/content-gap, kind/support, priority/backlog


All 55 comments

_From @luxas on October 27, 2016 18:14_

cc @errordeveloper and @marun since they have been running systemd inside a container

@andersla Be aware that running systemd this way inside a container is not supported out of the box, but feel free to try it out/hack on it, as it would be great for testing kubeadm that way

_From @zreigz on October 28, 2016 7:36_

If you don't mind, I would like to take a closer look and try to fix it.

_From @andersla on October 28, 2016 8:48_

@zreigz Please do!
This is how I try to install it:

docker run -it --privileged ubuntu /bin/bash

And then:

echo "Updating Ubuntu..."
apt-get update -y
apt-get upgrade -y

echo "Install os requirements"
apt-get install -y \
  curl \
  apt-transport-https \
  dialog \
  python \
  daemon

echo "Add Kubernetes repo..."
sh -c 'curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -'
sh -c 'echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list'
apt-get update -y

echo "Installing Kubernetes requirements..."
apt-get install -y \
  docker.io \
  kubelet \
  kubernetes-cni \
  kubectl \
  kubeadm

And this is the error I get when kubeadm is being installed:

root@82f5321d45cb:/# apt-get install kubeadm
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  kubeadm
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 7981 kB of archives.
After this operation, 59.2 MB of additional disk space will be used.
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubeadm amd64 1.5.0-alpha.0-1534-gcf7301f-00 [7981 kB]
Fetched 7981 kB in 0s (8532 kB/s)
Selecting previously unselected package kubeadm.
(Reading database ... 14222 files and directories currently installed.)
Preparing to unpack .../kubeadm_1.5.0-alpha.0-1534-gcf7301f-00_amd64.deb ...
Unpacking kubeadm (1.5.0-alpha.0-1534-gcf7301f-00) ...
Setting up kubeadm (1.5.0-alpha.0-1534-gcf7301f-00) ...
Failed to connect to bus: No such file or directory
dpkg: error processing package kubeadm (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 kubeadm
E: Sub-process /usr/bin/dpkg returned an error code (1)

_From @zreigz on October 28, 2016 9:10_

I reproduced it and I have been working on this

_From @zreigz on October 31, 2016 7:24_

There are two problems.

The first one: all: /var/lib/dpkg/info/kubelet.postinst: 38: /var/lib/dpkg/info/kubelet.postinst: [[: not found
On Ubuntu systems, /bin/sh is dash, not bash, and dash does not support the double-bracket keyword. The good news is that the issue is already fixed on the master branch and should be available soon: https://github.com/kubernetes/release/blob/master/debian/xenial/kubelet/debian/postinst#L40
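You can see the dash/bash difference directly on any Ubuntu box (an illustration, not from the thread):

dash -c '[[ 1 == 1 ]]'   # dash: 1: [[: not found
bash -c '[[ 1 == 1 ]]'   # fine: [[ is a bash keyword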

The second one is not so trivial. Running systemctl in a container fails with Failed to get D-Bus connection. It seems systemd doesn't work properly in a container. I am working on this now.

_From @andersla on October 31, 2016 7:42_

Great!
I just don't see why the installation of kubeadm needs systemd/systemctl at all?

_From @zreigz on October 31, 2016 7:47_

Because of those two lines: https://github.com/kubernetes/release/blob/master/debian/xenial/kubeadm/debian/postinst#L25

systemctl daemon-reload
systemctl restart kubelet

It fails on the first line.

_From @zreigz on October 31, 2016 7:48_

this is the explanation:

# because kubeadm package adds kubelet drop-ins, we must daemon-reload
# and restart kubelet now. restarting kubelet is ok because kubelet
# postinst configure step auto-starts it.
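A generic workaround for this class of failure (a sketch, not what the official package does) is to shadow systemctl with a no-op for the duration of the install, so the postinst's calls succeed in an environment without systemd:

# Hypothetical shim; assumes /usr/local/bin precedes /bin in PATH (the Ubuntu default)
printf '#!/bin/sh\nexit 0\n' > /usr/local/bin/systemctl
chmod +x /usr/local/bin/systemctl
apt-get install -y kubeadm
rm -f /usr/local/bin/systemctl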

_From @zreigz on October 31, 2016 7:52_

There are some configuration steps to make it work but I have to try it first. If I find something I'll let you know.

_From @zreigz on November 2, 2016 7:19_

Good news. I've managed to solve all the issues. It needs some final tests, and then I will post the solution for how to run kubeadm in a Docker container.

_From @andersla on November 2, 2016 7:23_

Super! I will help test it as soon as it is ready! - although I am on holiday the rest of this week :)

_From @zreigz on November 2, 2016 10:13_

There are two main issues with installing kubeadm in a Docker container. The first is running systemd in the container; the second is installing docker inside the container. Both problems have now been fixed. Here is the Dockerfile which must be used to prepare the Ubuntu image:

FROM ubuntu
ENV container docker
RUN apt-get -y update

RUN apt-get update -qq && apt-get install -qqy \
    apt-transport-https \
    ca-certificates \
    curl \
    lxc \
    vim \
    iptables

RUN curl -sSL https://get.docker.com/ | sh

RUN (cd /lib/systemd/system/sysinit.target.wants/; for i in *; do [ "$i" = "systemd-tmpfiles-setup.service" ] || rm -f "$i"; done); \
rm -f /lib/systemd/system/multi-user.target.wants/*;\
rm -f /etc/systemd/system/*.wants/*;\
rm -f /lib/systemd/system/local-fs.target.wants/*; \
rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
rm -f /lib/systemd/system/basic.target.wants/*;\
rm -f /lib/systemd/system/anaconda.target.wants/*;

VOLUME /sys/fs/cgroup
VOLUME /var/run/docker.sock
CMD /sbin/init

I use this command to build the image in the directory containing the Dockerfile

docker build -t kubeadm_docker .

Now you can run the prepared image and finish the kubeadm installation.
Use the following command to run the kubeadm_docker image:

docker run -it -e "container=docker" --privileged=true -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /var/run/docker.sock:/var/run/docker.sock  kubeadm_docker /sbin/init

Find the running container ID:

$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
7dd73057620d        kubeadm_docker      "/sbin/init"        About an hour ago   Up About an hour                        furious_fermi

Now you can open the container console:

docker exec -it 7dd73057620d /bin/bash

This is your script (with small modifications) to install kubeadm:

echo "Updating Ubuntu..."
apt-get update -y
apt-get upgrade -y

systemctl start docker

echo "Install os requirements"
apt-get install -y \
  curl \
  apt-transport-https \
  dialog \
  python \
  daemon

echo "Add Kubernetes repo..."
sh -c 'curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -'
sh -c 'echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list'
apt-get update -y

echo "Installing Kubernetes requirements..."
apt-get install -y \
  kubelet

# This is a temporary fix until a new version is released: the kubelet
# postinst uses the bash-only [[ ]] syntax, which dash cannot parse
sed -i 38,40d /var/lib/dpkg/info/kubelet.postinst

apt-get install -y \
  kubernetes-cni \
  kubectl \
  kubeadm

And finally you can execute

# kubeadm init
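Once init completes, a quick sanity check from inside the container (the same kind of check used later in this thread):

# kubectl get nodes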

Everything works the same as on a local machine.
Good luck :)

_From @SuperStevenZ on November 17, 2016 7:21_

@zreigz That solved the same problem of mine, thanks!

_From @zreigz on November 17, 2016 7:30_

No problem :)

We should set up a CI with docker-in-docker stuff.

@errordeveloper @zreigz Can you take this on?
At least we should document somewhere how to run kubeadm inside a container...

Sounds good to me. We definitely need to put all this stuff in a Docker image, plus some config/start scripts to distinguish between master and node. A good start would be to create a project for it, like kubernetes/kubeadm-docker. It would also be the right place for the Dockerfile, scripts, and documentation.

Create that as a private project first under zreigz/ and eventually we'll probably merge that code into this repo.

But first, prototype in your own space and we'll see how it goes.

Real assignee is @zreigz

Yes, good point. I will do it. Next week (Monday, Tuesday) I am at a conference, so I will start on Wednesday.

@luxas I was wondering how I should provide the kubeadm and kubernetes-cni packages. Should I build them from the current sources (to be able to test the newest implementation), or just download the newest version from the repository? For CI purposes I think we need the current state of the code to test against, or do we only need to test the release version?

Hi, thanks for the fix, but I'm still getting an issue: after kubeadm init I get 0/3 on DNS; DNS doesn't seem to be running at all.

Every 2.0s: kubectl get pods --all-namespaces Fri Dec 16 17:00:50 2016

NAMESPACE     NAME                                   READY     STATUS              RESTARTS   AGE
kube-system   dummy-2088944543-17sey                 1/1       Running             0          11m
kube-system   etcd-8dd8c92c6c38                      1/1       Running             2          12m
kube-system   kube-apiserver-8dd8c92c6c38            1/1       Running             4          12m
kube-system   kube-controller-manager-8dd8c92c6c38   1/1       Running             2          11m
kube-system   kube-discovery-1150918428-m506w        1/1       Running             0          11m
kube-system   kube-dns-654381707-vuijm               0/3       ContainerCreating   0          11m
kube-system   kube-proxy-tuw6u                       0/1       CrashLoopBackOff    6          11m
kube-system   kube-scheduler-8dd8c92c6c38            1/1       Running             2          10m

I tried installing a network policy:
root@8dd8c92c6c38:/# kubectl apply -f calico.yaml
the path "calico.yaml" does not exist
root@8dd8c92c6c38:/# kubectl create -f calico.yaml
the path "calico.yaml" does not exist
root@8dd8c92c6c38:/# kubectl apply -f kube-flannel.yml
the path "kube-flannel.yml" does not exist

root@8dd8c92c6c38:/# kubectl apply -f https://git.io/weave-kube
daemonset "weave-net" created
root@8dd8c92c6c38:/# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY     STATUS              RESTARTS   AGE
kube-system   dummy-2088944543-17sey                 1/1       Running             0          46m
kube-system   etcd-8dd8c92c6c38                      1/1       Running             2          46m
kube-system   kube-apiserver-8dd8c92c6c38            1/1       Running             4          46m
kube-system   kube-controller-manager-8dd8c92c6c38   1/1       Running             2          45m
kube-system   kube-discovery-1150918428-9m6rr        0/1       Pending             0          3m
kube-system   kube-dns-654381707-vuijm               0/3       ContainerCreating   0          45m
kube-system   kube-proxy-tuw6u                       0/1       CrashLoopBackOff    13         45m
kube-system   kube-scheduler-8dd8c92c6c38            1/1       Running             2          44m
kube-system   weave-net-iv0bc                        0/2       ContainerCreating   0          49s
info: 1 completed object(s) was(were) not shown in pods list. Pass --show-all to see all objects.

Hi again @zreigz
Now I have finally had time to go further with this and test it. I can almost make it work, but there is an error: docker picks the vfs storage driver (probably because it can't use aufs on top of aufs?). But as you describe in the workaround above, I am mounting the outer docker .sock into the inner docker, so it should be possible to write with aufs? If I run docker info on my host machine, it says it is running the aufs storage driver, whereas if I run docker info inside the docker container with kubernetes, it says it is using the vfs storage driver.
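A quick way to compare the two drivers side by side (a sketch; <container> is a placeholder for the container name or ID):

docker info | grep 'Storage Driver'                               # on the host
docker exec -it <container> docker info | grep 'Storage Driver'   # inside the container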
Any ideas why I get the following problem when running
kubeadm init

root@f50f087baa83:/# kubeadm init
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] The system verification failed. Printing the output from the verification:
OS: Linux
KERNEL_VERSION: 4.4.0-43-generic
CONFIG_NAMESPACES: enabled
CONFIG_NET_NS: enabled
CONFIG_PID_NS: enabled
CONFIG_IPC_NS: enabled
CONFIG_UTS_NS: enabled
CONFIG_CGROUPS: enabled
CONFIG_CGROUP_CPUACCT: enabled
CONFIG_CGROUP_DEVICE: enabled
CONFIG_CGROUP_FREEZER: enabled
CONFIG_CGROUP_SCHED: enabled
CONFIG_CPUSETS: enabled
CONFIG_MEMCG: enabled
CONFIG_INET: enabled
CONFIG_EXT4_FS: enabled
CONFIG_PROC_FS: enabled
CONFIG_NETFILTER_XT_TARGET_REDIRECT: enabled (as module)
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled (as module)
CONFIG_OVERLAY_FS: enabled (as module)
CONFIG_AUFS_FS: enabled (as module)
CONFIG_BLK_DEV_DM: enabled
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
DOCKER_VERSION: 1.12.1
DOCKER_GRAPH_DRIVER: vfs
[preflight] Some fatal errors occurred:
    unsupported graph driver: vfs
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`
root@f50f087baa83:/# 

Some more info after trying a bit more.
I changed the docker storage driver to "overlay" on the host. Then docker inside docker picked aufs as the driver and I passed the pre-flight checks, but now I am stuck at:
[apiclient] Created API client, waiting for the control plane to become ready
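For reference, one way to switch the host's storage driver (a sketch; assumes Docker 1.12+, which reads /etc/docker/daemon.json):

echo '{ "storage-driver": "overlay" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
docker info | grep 'Storage Driver'   # should now report overlay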

In some other testing I realized that docker was not picking the same storage driver when it was started as a service via /sbin/init.
If I ran the docker image this way, it didn't pick the same driver as the host (as mentioned above):
sudo docker run -it --privileged=true -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /var/run/docker.sock:/var/run/docker.sock kubeadm_docker /sbin/init

If I started it without /sbin/init and not as a daemon, like this:
sudo docker run -it --privileged=true --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /var/run/docker.sock:/var/run/docker.sock kubeadm_docker /bin/bash
then docker picked the same storage driver as the host (but then systemctl was not working).

Some more updates:

I can now build a working kubeadm-in-docker-container with this Dockerfile:

FROM ubuntu:xenial-20161213

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update -qq

RUN apt-get install -y \
    apt-transport-https \
    apt-utils \
    ca-certificates \
    curl \
    dialog \
    python \
    daemon \
    vim \
    jq \
    linux-image-$(uname -r)

# remove unwanted systemd services
RUN for i in /lib/systemd/system/sysinit.target.wants/*; do [ "${i##*/}" = "systemd-tmpfiles-setup.service" ] || rm -f "$i"; done; \
  rm -f /lib/systemd/system/multi-user.target.wants/*;\
  rm -f /etc/systemd/system/*.wants/*;\
  rm -f /lib/systemd/system/local-fs.target.wants/*; \
  rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
  rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
  rm -f /lib/systemd/system/basic.target.wants/*;\
  rm -f /lib/systemd/system/anaconda.target.wants/*;

# install docker (after removing unwanted systemd)
RUN apt-get install -y \
    docker.io

RUN echo "Add Kubernetes repo..."
RUN sh -c 'curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -'
RUN sh -c 'echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list'

RUN echo "Installing Kubernetes requirements..."
RUN apt-get update -y && apt-get install -y \
  kubelet \
  kubernetes-cni \
  kubectl

RUN echo "Installing Kubeadm - this will fail at post-install but that doesn't matter"
RUN apt-get install -y \
  kubeadm; exit 0

# Create volume for docker
VOLUME /var/lib/docker

I build with: docker build -t kubeadm_docker .

And then run:

docker run -it --privileged=true --name=master -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro kubeadm_docker /sbin/init

Wait a few (10-15) seconds until systemd and docker are up and running.

Then I start kubeadm inside the running container:
docker exec -it master kubeadm init --token=acbec6.2852dff7cb569aa0

When it has initialized, I start a second "worker" node:

docker run -it --privileged=true --name=node -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro kubeadm_docker /sbin/init
And after a few seconds join the master:

docker exec -it node kubeadm join --token=acbec6.2852dff7cb569aa0 172.17.0.2

Currently there is some problem with the docker network, because kube-proxy fails and enters a CrashLoopBackOff.

If I instead set --net=host when running docker above, then kube-proxy and all pods come up OK - but that is not an option, since I need the containers running on the docker network with their own IPs.
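For completeness, the host-network variant of the run command above looks like this (a sketch; it avoids the kube-proxy problem but gives up per-container IPs):

docker run -it --privileged=true --net=host --name=master -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro kubeadm_docker /sbin/init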

I also previously tried to run docker against the same daemon as on the host by mounting -v /var/run/docker.sock:/var/run/docker.sock, but I never got it working, because when docker inside the container is started with systemd it doesn't pick up the sock (or something like that).

Thanks @andersla!
Can you paste what kube-proxy fails with?

Thanks @luxas for your interest!

Unfortunately there are no details in journalctl -xeu kubelet.

This is all I find about kube-proxy (repeated many times); I also attach the full log.

Jan 09 14:40:02 1355b98bf8c7 kubelet[244]: I0109 14:40:02.690862     244 docker_manager.go:2524] checking backoff for container "kube-proxy" in pod "kube-proxy-7886l"
Jan 09 14:40:03 1355b98bf8c7 kubelet[244]: I0109 14:40:03.984818     244 docker_manager.go:2538] Back-off 20s restarting failed container=kube-proxy pod=kube-proxy-7886l_kube-system(71a1e950-d679-11e6-a9f7-02429d4c0f01)
Jan 09 14:40:03 1355b98bf8c7 kubelet[244]: E0109 14:40:03.984833     244 pod_workers.go:184] Error syncing pod 71a1e950-d679-11e6-a9f7-02429d4c0f01, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 20s restarting failed container=kube-proxy pod=kube-proxy-7886l_kube-system(71a1e950-d679-11e6-a9f7-02429d4c0f01)"

The full log also complains about kube-dns - but that is because I haven't yet started weave.

Here is the log from kubectl describe pod -n kube-system kube-proxy-w0ng5:

Name:       kube-proxy-w0ng5
Namespace:  kube-system
Node:       3551807cba77/172.17.0.2
Start Time: Tue, 10 Jan 2017 18:03:06 +0000
Labels:     component=kube-proxy
        k8s-app=kube-proxy
        kubernetes.io/cluster-service=true
        name=kube-proxy
        tier=node
Status:     Running
IP:     172.17.0.2
Controllers:    DaemonSet/kube-proxy
Containers:
  kube-proxy:
    Container ID:   docker://dcc2bc0b50a2477b72d451b776f35e327f1faf09e3cddb25d5609569c6f2a242
    Image:      gcr.io/google_containers/kube-proxy-amd64:v1.5.1
    Image ID:       docker-pullable://gcr.io/google_containers/kube-proxy-amd64@sha256:3b82b2e0862b3c0ece915de29a5a53634c9b0a73140340f232533c645decbd4b
    Port:       
    Command:
      kube-proxy
      --kubeconfig=/run/kubeconfig
    State:      Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 10 Jan 2017 18:08:48 +0000
      Finished:     Tue, 10 Jan 2017 18:08:48 +0000
    Ready:      False
    Restart Count:  6
    Volume Mounts:
      /run/kubeconfig from kubeconfig (rw)
      /var/run/dbus from dbus (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-g0ft5 (ro)
    Environment Variables:  <none>
Conditions:
  Type      Status
  Initialized   True 
  Ready     False 
  PodScheduled  True 
Volumes:
  kubeconfig:
    Type:   HostPath (bare host directory volume)
    Path:   /etc/kubernetes/kubelet.conf
  dbus:
    Type:   HostPath (bare host directory volume)
    Path:   /var/run/dbus
  default-token-g0ft5:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-g0ft5
QoS Class:  BestEffort
Tolerations:    dedicated=master:NoSchedule
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath           Type        Reason  Message
  --------- --------    -----   ----            -------------           --------    ------  -------
  9m        9m      1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal      Pulling pulling image "gcr.io/google_containers/kube-proxy-amd64:v1.5.1"
  9m        9m      1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal      Created Created container with docker id ecf446de342a; Security:[seccomp=unconfined]
  9m        9m      1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal      Started Started container with docker id ecf446de342a
  9m        9m      1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal      Pulled  Successfully pulled image "gcr.io/google_containers/kube-proxy-amd64:v1.5.1"
  9m        9m      1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal      Created Created container with docker id f562fb667a64; Security:[seccomp=unconfined]
  9m        9m      1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal      Started Started container with docker id f562fb667a64
  9m        9m      2   {kubelet 3551807cba77}                  Warning     FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 10s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"

  9m    9m  1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal  Started     Started container with docker id 1a7d7d4f682b
  9m    9m  1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal  Created     Created container with docker id 1a7d7d4f682b; Security:[seccomp=unconfined]
  9m    9m  2   {kubelet 3551807cba77}                  Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 20s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"

  8m    8m  1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal  Started     Started container with docker id 89bdf4ba7e0b
  8m    8m  1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal  Created     Created container with docker id 89bdf4ba7e0b; Security:[seccomp=unconfined]
  8m    8m  3   {kubelet 3551807cba77}                  Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 40s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"

  8m    8m  1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal  Created     Created container with docker id f2b7a2b5078d; Security:[seccomp=unconfined]
  8m    8m  1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal  Started     Started container with docker id f2b7a2b5078d
  8m    7m  6   {kubelet 3551807cba77}                  Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"

  6m    6m  1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal  Created     Created container with docker id 28deaf41d920; Security:[seccomp=unconfined]
  6m    6m  1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal  Started     Started container with docker id 28deaf41d920
  6m    4m  12  {kubelet 3551807cba77}                  Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"

  9m    4m  6   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal  Pulled      Container image "gcr.io/google_containers/kube-proxy-amd64:v1.5.1" already present on machine
  4m    4m  1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal  Created     Created container with docker id dcc2bc0b50a2; Security:[seccomp=unconfined]
  4m    4m  1   {kubelet 3551807cba77}  spec.containers{kube-proxy} Normal  Started     Started container with docker id dcc2bc0b50a2
  9m    10s 43  {kubelet 3551807cba77}  spec.containers{kube-proxy} Warning BackOff     Back-off restarting failed docker container
  4m    10s 18  {kubelet 3551807cba77}                  Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"

Yes, I see _that_ it's crashlooping, but could you give e.g. kubectl -n kube-system logs kube-proxy-w0ng5?
So we actually see the reason _why_ :smile:

Hey, that's brilliant :)
root@3551807cba77:/# kubectl -n kube-system logs kube-proxy-w0ng5

I0110 18:29:01.705993       1 server.go:215] Using iptables Proxier.
W0110 18:29:01.706933       1 proxier.go:254] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0110 18:29:01.706947       1 server.go:227] Tearing down userspace rules.
I0110 18:29:01.712693       1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_max' to 262144
I0110 18:29:01.712927       1 conntrack.go:66] Setting conntrack hashsize to 65536
write /sys/module/nf_conntrack/parameters/hashsize: operation not supported

I can fix it with a workaround: setting --conntrack-max-per-core=0 and then restarting the proxy. A value of 0 skips reconfiguring nf_conntrack_max and leaves it as is (65536). I inject the start parameter like this:

First enter docker container:
docker exec -it master bash

then apply the fix:

kubectl -n kube-system get ds -l 'component=kube-proxy' -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--conntrack-max-per-core=0"]' | kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy'
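To confirm the patched DaemonSet rolled out, you can watch the replacement kube-proxy pod come back up (a quick check):

kubectl -n kube-system get pods -l 'component=kube-proxy' -w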

Now I get the CrashLoop on Weave instead, when I later do kubectl apply -f weave.yaml. The following is the log output from the weave pod:
/proc/sys/net/bridge/bridge-nf-call-iptables not found
I also tried starting with the kube-proxy parameter --proxy-mode=userspace, but got the same result.
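A quick probe for that missing sysctl (a sketch; on 4.x kernels the file only appears once the br_netfilter module is loaded):

ls /proc/sys/net/bridge/bridge-nf-call-iptables || modprobe br_netfilter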

I think this will solve the weave issue: https://github.com/weaveworks/weave/pull/2659

@andersla Yes, that seems to fix the problem. Can you try a build from HEAD?
For example, you could use the luxas/weave-(kube|npc):v1.9.0-alpha.5 images that are from HEAD~ish.
Let me know if it works, and please comment here exactly what you're doing now (shell commands, Dockerfile, other scripts, etc.) so others can take advantage of it.

I used the latest image from weaveworks/weave-kube

I also used the latest yaml-template https://github.com/weaveworks/weave/blob/master/prog/weave-kube/weave-daemonset.yaml

Unfortunately kube-dns didn't work (it is stuck in ContainerCreating). The error message from kubelet after starting weave is:

Jan 15 16:14:30 7c12205804da kubelet[540]: I0115 16:14:30.443327     540 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/c23fb73d-db39-11e6-b84d-0242b1ac1840-default-token-142vd" (spec.Name: "default-token-142vd") pod "c23fb73d-db39-11e6-b84d-0242b1ac1840" (UID: "c23fb73d-db39-11e6-b84d-0242b1ac1840").
Jan 15 16:14:31 7c12205804da kubelet[540]: E0115 16:14:31.381741     540 docker_manager.go:373] NetworkPlugin cni failed on the status hook for pod 'kube-dns-2924299975-9gjcg' - Unexpected command output Device "eth0" does not exist.
Jan 15 16:14:31 7c12205804da kubelet[540]:  with error: exit status 1

If I only started the master node and did not join another node, then kube-dns came up OK when I applied weave.yaml.

I also tested the weave.yaml with the latest weave-kube on a Vagrant installation (not in my docker experiment), and there it all worked.

This is the weave.yaml I used for kubectl apply -f weave.yaml:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        name: weave-net
      annotations:
        scheduler.alpha.kubernetes.io/tolerations: |
          [
            {
              "key": "dedicated",
              "operator": "Equal",
              "value": "master",
              "effect": "NoSchedule"
            }
          ]
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: weave
          image: weaveworks/weave-kube:latest
          imagePullPolicy: Always
          command:
            - /home/weave/launch.sh
          livenessProbe:
            initialDelaySeconds: 30
            httpGet:
              host: 127.0.0.1
              path: /status
              port: 6784
          securityContext:
            privileged: true
          volumeMounts:
            - name: weavedb
              mountPath: /weavedb
            - name: cni-bin
              mountPath: /host/opt
            - name: cni-bin2
              mountPath: /host/home
            - name: cni-conf
              mountPath: /host/etc
            - name: dbus
              mountPath: /host/var/lib/dbus
          resources:
            requests:
              cpu: 10m
        - name: weave-npc
          image: weaveworks/weave-npc:latest
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 10m
          securityContext:
            privileged: true
      restartPolicy: Always
      volumes:
        - name: weavedb
          emptyDir: {}
        - name: cni-bin
          hostPath:
            path: /opt
        - name: cni-bin2
          hostPath:
            path: /home
        - name: cni-conf
          hostPath:
            path: /etc
        - name: dbus
          hostPath:
            path: /var/lib/dbus

Hey guys, I ran into this thread and it freaking rocks! Great stuff.

I really want to use this approach for CI against our repo (which is fairly complex, honestly). We have a Helm/Tiller requirement for launching quite a few charts for CI. Have any of you run into this, or have suggestions for getting this going? Tiller seems to barf all over itself in this situation:

root@JINKITNIX05:~/openstack-helm# kubectl logs tiller-deploy-3299276078-6kdzw -n kube-system
Error from server (BadRequest): the server rejected our request for an unknown reason (get pods tiller-deploy-3299276078-6kdzw)
root@JINKITNIX05:~/openstack-helm# 

I may try other SDNs. We've been using Calico so far because L3 is a little more straightforward to troubleshoot in hacky situations, but if Weave is better (since it's L2)... I'll try whatever gets us past the Tiller issue. I think Tiller is unhappy because at the end of the day it appears to associate with 127.0.0.1... and I've seen that cause problems in the past when testing other things. Any input would be amazing. Again, really awesome props to the folks who are hacking things up! Thank you!!

Hi! Great that more people want this to work. I don't have experience with Calico. On the cloud we run Weave, so that's what I wanted to get working on this project. But I am stuck and haven't had time to dig into why kube-dns doesn't come up when I apply Weave as described above.

Now the latest stable weave is working better than before....

kubectl apply -f https://git.io/weave-kube

..but unfortunately still the same issue with kube-dns not coming up, stuck in ContainerCreating:

root@18a7d1ec5124:/# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY     STATUS              RESTARTS   AGE
kube-system   dummy-2088944543-pvvdx                 1/1       Running             0          5m
kube-system   etcd-18a7d1ec5124                      1/1       Running             0          4m
kube-system   kube-apiserver-18a7d1ec5124            1/1       Running             2          5m
kube-system   kube-controller-manager-18a7d1ec5124   1/1       Running             0          4m
kube-system   kube-discovery-1769846148-6tv4l        1/1       Running             0          5m
kube-system   kube-dns-2924299975-4608d              0/4       ContainerCreating   0          5m
kube-system   kube-proxy-k0stq                       1/1       Running             0          4m
kube-system   kube-proxy-tnm8h                       1/1       Running             0          4m
kube-system   kube-scheduler-18a7d1ec5124            1/1       Running             0          4m
kube-system   weave-net-mff6t                        2/2       Running             0          3m
kube-system   weave-net-t7zcl                        2/2       Running             0          3m

and after applying weave, this error message stops:
Feb 04 18:06:57 18a7d1ec5124 kubelet[252]: E0204 18:06:57.125434 252 pod_workers.go:184] Error syncing pod 7dc68091-eb04-11e6-a321-02425e578ba1, skipping: failed to "SetupNetwork" for "kube-dns-2924299975-4608d_kube-system" with SetupNetworkError: "Failed to setup network for pod \"kube-dns-2924299975-4608d_kube-system(7dc68091-eb04-11e6-a321-02425e578ba1)\" using network plugins \"cni\": cni config unintialized; Skipping pod"

and instead I now see:

Feb 04 18:06:59 18a7d1ec5124 kubelet[252]: E0204 18:06:59.615375 252 docker_manager.go:373] NetworkPlugin cni failed on the status hook for pod 'kube-dns-2924299975-4608d' - Unexpected command output Device "eth0" does not exist. Feb 04 18:06:59 18a7d1ec5124 kubelet[252]: with error: exit status 1

If I use Flannel as the network plugin instead, it works.

docker exec -it master bash

curl -sSL "https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml?raw=true" | kubectl create -f -

So if you use Flannel, then everything works. Here is the complete setup:

Dockerfile:

FROM ubuntu:xenial-20161213

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update -qq

RUN apt-get install -y \
    apt-transport-https \
    apt-utils \
    ca-certificates \
    curl \
    dialog \
    python \
    daemon \
    vim \
    jq

# remove unwanted systemd services
RUN for i in /lib/systemd/system/sysinit.target.wants/*; do [ "${i##*/}" = "systemd-tmpfiles-setup.service" ] || rm -f "$i"; done; \
  rm -f /lib/systemd/system/multi-user.target.wants/*;\
  rm -f /etc/systemd/system/*.wants/*;\
  rm -f /lib/systemd/system/local-fs.target.wants/*; \
  rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
  rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
  rm -f /lib/systemd/system/basic.target.wants/*;\
  rm -f /lib/systemd/system/anaconda.target.wants/*;

# install docker (after removing unwanted systemd)
RUN apt-get install -y \
    docker.io

RUN echo "Add Kubernetes repo..."
RUN sh -c 'curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -'
RUN sh -c 'echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list'

RUN echo "Installing Kubernetes requirements..."
RUN apt-get update -y && apt-get install -y \
  kubelet \
  kubernetes-cni \
  kubectl

RUN echo "Installing Kubeadm - this will fail at post-install but that doesn't matter"
RUN apt-get install -y \
  kubeadm; exit 0

# Create volume for docker
VOLUME /var/lib/docker

Build it with:
docker build -t kubeadm_docker .

And then run:
docker run -it --privileged=true --name=master -h master -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro kubeadm_docker /sbin/init

Wait a few (10-15) seconds until systemd and docker are up and running.
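Instead of a fixed sleep, you can poll for readiness (a sketch using systemctl's own status probe; "degraded" just means some unit failed, which is fine here):

until docker exec master systemctl is-system-running 2>/dev/null | grep -qE 'running|degraded'; do sleep 1; done
docker exec master systemctl is-active docker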

Then I start kubeadm inside the running container:
docker exec -it master kubeadm init --skip-preflight-checks --token=acbec6.2852dff7cb569aa0

When it has initialized, I start a second "worker" node:
docker run -it --privileged=true --name=node -h node -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro kubeadm_docker /sbin/init

And after a few seconds (until systemd and docker are up), join the master:
docker exec -it node kubeadm join --skip-preflight-checks --token=acbec6.2852dff7cb569aa0 172.17.0.2

When they have joined, enter the master and apply the workaround for the crashing kube-proxy:
docker exec -it master bash

kubectl -n kube-system get ds -l 'component=kube-proxy' -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--conntrack-max-per-core=0"]' | kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy'

Finally, apply the flannel overlay network:
curl -sSL "https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml?raw=true" | kubectl create -f -

I had no problem installing Helm, Traefik, or GlusterFS in Kubernetes in this setup :)

kubeadm-dind-cluster basically does what the last comment outlined, providing automation so you don't have to type the commands manually (although as of now it uses the CNI bridge plugin with some hacks instead of flannel, but I'll fix this quite soon).
It also makes it easy to build both the k8s components and kubeadm from local source and use the binaries in the cluster you start. Besides, there were some non-obvious problems I encountered while working on it, e.g. agetty eating 100% CPU and causing docker crashes unless you take care to disable it.

Some of the changes coming quite soon in kubeadm-dind-cluster:

  • fix it for k8s master; kube-proxy broke there
  • support for prebuilt images (I'm also going to publish several such images) so just a single script is enough to start the cluster. This may be useful for CI in various projects that use k8s
  • caching of Docker data dirs for faster cluster restarts
  • support for CNI implementations besides bridge

kubeadm-dind-cluster also provides automation for e2e tests. Another interesting trait is that you can use the same remote docker engine for both building k8s and running kubeadm-dind-cluster, without copying back the binaries (it pulls them directly from the build data container), which may be important if you're working with remote docker over a slow connection.

... forgot to mention that it configures local kubectl for you, so you don't need to do docker exec on your master container to access your cluster.

As I already mentioned, while DIND may seem easy on the surface, you can hit some unexpected problems with it. Some of the problems are already fixed in kubeadm-dind-cluster and the base image it uses. E.g. you need to do some mounts, you also need to use STOPSIGNAL SIGRTMIN+3 and resist the temptation to use /sbin/init as ENTRYPOINT, and the vfs driver can be quite slow at times. So... here be dragons ;)
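For example, the signal handling mentioned above boils down to a base-image fragment like this (a sketch; SIGRTMIN+3 is the signal systemd expects for a clean halt):

# Let docker stop deliver the signal systemd understands, and use CMD rather than ENTRYPOINT
STOPSIGNAL SIGRTMIN+3
CMD ["/sbin/init"]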

@ivan4th Thanks for all the work you've been doing with kubeadm and dind :)
Can you open a new issue referencing this issue where we can discuss the MVP needed for merging kubeadm-dind-cluster into this repo?

After looking quickly, I found some points that we might want to do before a possible MVP:

  • It should ideally be written in Go -- I generally think we are trying to move away from Bash, so Go is the way to Go for a new project I think :)
  • The debian base should be based on gcr.io/google-containers/debian-base-$(ARCH):0.1

    • The base image for dind should ideally be published to gcr.io

  • It should work on multiple arches like kubeadm
  • You should be able to provide your own binaries, but most often it should download from the CI that publishes binaries for all arches every hour
  • It should use CNI -- with network providers swappable
  • It should expose its configuration options via a config file like kubeadm can take a config file as input for options
  • It should only support kubeadm v1.6+

What do you think? Thanks for the awesome start, I can't wait to actually integrate this into something kubeadm official :+1:

cc @jbeda @lukemarsden @errordeveloper @mikedanese @timothysc @sttts

Thanks for the awesome start, I can't wait to actually integrate this into something kubeadm official

if we can devel-build kubeadm-local-up-cluster, that would be fantastic.

@ivan4th @luxas What's the status of this?

I don't know really... @ivan4th

@jamiehannaford

  • as of now, I got delayed with the Go rewrite because I also need to work on other projects
  • k-d-c has support for different CNI implementations (Weave, Calico, Flannel, and plain CNI bridge, which is the default)
  • supporting multiple architectures is not here yet, but quite doable
  • the binaries used in the images are by default taken from the k8s release, but you can build your own or, with some small effort, make an image based on your own separately built binaries
  • it does support a config file, but as of now it's actually a set of env vars
  • the base image is still ubuntu, but we're going to switch to debian
  • we support 1.6, and I'll add support for 1.7 early next week

Overall k-d-c is quite usable in its current form IMO. It also has its own public CI based on Travis (BTW, I also succeeded in running DIND on CircleCI, if that's of interest).

@luxas Maybe we can use @andersla's solution instead of a full DIND cluster? If so, would we need to host the Docker image anywhere, or just document what the Dockerfile looks like?

It'd be great if we could get a fix out for this issue for 1.9.

I don't have cycles to work on this. If anyone else can, please do!

@jamiehannaford the problem is, much of the "full" DIND cluster is dedicated to handling the numerous problems that arise from "simple" DIND usage. These may be quite obscure at times; see e.g. https://github.com/Mirantis/kubeadm-dind-cluster/commit/405c8bead4fb443582328fd3c7b8f01452872438 (I think I'll need to submit a fix to k8s for this). As for kubeadm-dind-cluster, it's still quite usable and I try to keep it up to date (@danehans and @pmichali are using it for k8s IPv6 e2e testing, and Virtlet uses it to run its e2e tests on CircleCI), although I spend a lot of time on other projects, so I haven't managed to rewrite it in Go yet.

We talked about this in the SIG meeting yesterday, and we're going to close the issue.
Developing and maintaining a full-blown DIND solution is not in scope for the core kubeadm team for the foreseeable future, if ever. We're super happy that the community provides these solutions, though, like @ivan4th's hard work on the Mirantis project. If we find a good place to document the possibility of using that project, I'm personally fine with referencing it. Thanks!
