_From @andersla on October 27, 2016 18:8_
BUG REPORT
Kubernetes version (use `kubectl version`): latest
Environment: Ubuntu 16.04 Docker container
What happened:
When trying to install kubeadm inside an Ubuntu 16.04 Docker container, it fails.
My idea was to use one Docker container as the master "node" and a second container as a worker "node" (Kubernetes in Docker).
Is this a systemd issue? (Something I came across when googling for answers.)
Inside the Ubuntu 16.04 Docker image I install with: `apt-get install -y kubeadm`
setup log:

```
...
all: Setting up socat (1.7.3.1-1) ...
all: Setting up kubelet (1.4.3-00) ...
all: /var/lib/dpkg/info/kubelet.postinst: 38: /var/lib/dpkg/info/kubelet.postinst: [[: not found
all: Setting up kubectl (1.4.3-00) ...
all: Setting up kubeadm (1.5.0-alpha.0-1534-gcf7301f-00) ...
all: Failed to connect to bus: No such file or directory
all: dpkg: error processing package kubeadm (--configure):
all:  subprocess installed post-installation script returned error exit status 1
all: Setting up netcat-traditional (1.10-41) ...
all: update-alternatives: using /bin/nc.traditional to provide /bin/nc (nc) in auto mode
all: Setting up netcat (1.10-41) ...
all: Setting up patch (2.7.5-1) ...
all: Setting up rename (0.20-4) ...
all: update-alternatives: using /usr/bin/file-rename to provide /usr/bin/rename (rename) in auto mode
all: Setting up tcpd (7.6.q-25) ...
all: Setting up ubuntu-fan (0.9.1) ...
all: invoke-rc.d: could not determine current runlevel
all: invoke-rc.d: policy-rc.d denied execution of start.
all: Setting up xz-utils (5.1.1alpha+20120614-2ubuntu2) ...
all: update-alternatives: using /usr/bin/xz to provide /usr/bin/lzma (lzma) in auto mode
all: Setting up python3 (3.5.1-3) ...
all: running python rtupdate hooks for python3.5...
all: running python post-rtupdate hooks for python3.5...
all: Setting up apparmor (2.10.95-0ubuntu2.2) ...
all: update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
all: Setting up dh-python (2.20151103ubuntu1.1) ...
all: Processing triggers for libc-bin (2.23-0ubuntu4) ...
all: Processing triggers for systemd (229-4ubuntu11) ...
all: Processing triggers for initramfs-tools (0.122ubuntu8.5) ...
all: Processing triggers for dbus (1.10.6-1ubuntu3) ...
all: Errors were encountered while processing:
all:  kubeadm
all: E: Sub-process /usr/bin/dpkg returned an error code (1)
==> all: Killing the container: 93babb5045461c343a803109ba683a2acf68f1f453447a336b09171a1b190f38
Build 'all' errored: Script exited with non-zero exit status: 100
==> Some builds didn't complete successfully and had errors:
--> all: Script exited with non-zero exit status: 100
```
_Copied from original issue: kubernetes/kubernetes#35712_
_From @luxas on October 27, 2016 18:14_
cc @errordeveloper and @marun since they have been running systemd inside a container
@andersla Be aware that running systemd this way inside a container is not supported out of the box, but feel free to try it out and hack on it, as it would be great to be able to test kubeadm that way.
_From @zreigz on October 28, 2016 7:36_
If you don't mind, I would like to take a closer look and try to fix it.
_From @andersla on October 28, 2016 8:48_
@zreigz Please do!
This is how I try to install it:

```shell
docker run -it --privileged ubuntu /bin/bash
```

And then:

```shell
echo "Updating Ubuntu..."
apt-get update -y
apt-get upgrade -y

echo "Install os requirements"
apt-get install -y \
    curl \
    apt-transport-https \
    dialog \
    python \
    daemon

echo "Add Kubernetes repo..."
sh -c 'curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -'
sh -c 'echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list'
apt-get update -y

echo "Installing Kubernetes requirements..."
apt-get install -y \
    docker.io \
    kubelet \
    kubernetes-cni \
    kubectl \
    kubeadm
```
And this is the error I get when kubeadm is being installed:

```
root@82f5321d45cb:/# apt-get install kubeadm
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  kubeadm
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 7981 kB of archives.
After this operation, 59.2 MB of additional disk space will be used.
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubeadm amd64 1.5.0-alpha.0-1534-gcf7301f-00 [7981 kB]
Fetched 7981 kB in 0s (8532 kB/s)
Selecting previously unselected package kubeadm.
(Reading database ... 14222 files and directories currently installed.)
Preparing to unpack .../kubeadm_1.5.0-alpha.0-1534-gcf7301f-00_amd64.deb ...
Unpacking kubeadm (1.5.0-alpha.0-1534-gcf7301f-00) ...
Setting up kubeadm (1.5.0-alpha.0-1534-gcf7301f-00) ...
Failed to connect to bus: No such file or directory
dpkg: error processing package kubeadm (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 kubeadm
E: Sub-process /usr/bin/dpkg returned an error code (1)
```
_From @zreigz on October 28, 2016 9:10_
I have reproduced it and am working on it.
_From @zreigz on October 31, 2016 7:24_
There are two problems.

The first one: `all: /var/lib/dpkg/info/kubelet.postinst: 38: /var/lib/dpkg/info/kubelet.postinst: [[: not found`

On Ubuntu systems /bin/sh is dash, not bash, and dash does not support the double-bracket `[[` keyword. The good news is that this is fixed on the master branch and should be available soon: https://github.com/kubernetes/release/blob/master/debian/xenial/kubelet/debian/postinst#L40
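As a hedged illustration of that failure mode (the version string and pattern below are made up, not taken from the actual postinst script): dash rejects the bash-only `[[` keyword, while the POSIX `case` construct works everywhere.

```shell
#!/bin/sh
# Under dash, `[[ ... ]]` aborts with "[[: not found"; portable scripts
# must use the `[ ]` builtin or a `case` pattern match instead.
version="1.4.3"   # hypothetical value, for illustration only

# bash-only, breaks under dash:
#   if [[ "$version" == 1.4.* ]]; then echo "matches 1.4 series"; fi

# portable equivalent:
case "$version" in
  1.4.*) echo "matches 1.4 series" ;;
  *)     echo "other version" ;;
esac
# prints: matches 1.4 series
```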
The second one is not so trivial. Running `systemctl` in a container fails with `Failed to get D-Bus connection`. It seems systemd doesn't work properly in a container. I am working on this now.
_From @andersla on October 31, 2016 7:42_
Great!
I just don't see why the installation of kubeadm needs systemd/systemctl at all?
_From @zreigz on October 31, 2016 7:47_
Because of these two lines: https://github.com/kubernetes/release/blob/master/debian/xenial/kubeadm/debian/postinst#L25

```shell
systemctl daemon-reload
systemctl restart kubelet
```

It fails on the first line.
_From @zreigz on October 31, 2016 7:48_
This is the explanation:

```shell
# because kubeadm package adds kubelet drop-ins, we must daemon-reload
# and restart kubelet now. restarting kubelet is ok because kubelet
# postinst configure step auto-starts it.
```
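As a rough workaround sketch (not the approach ultimately taken in this thread, which runs systemd as PID 1): one can shadow `systemctl` with a no-op early on PATH so a postinst that calls `systemctl daemon-reload` doesn't abort the install. The `/tmp/fakebin` path is an arbitrary choice for this sketch.

```shell
# Put a do-nothing systemctl first on PATH so package postinst scripts
# that call it succeed in a container without a running systemd.
mkdir -p /tmp/fakebin
printf '#!/bin/sh\nexit 0\n' > /tmp/fakebin/systemctl
chmod +x /tmp/fakebin/systemctl
PATH=/tmp/fakebin:$PATH systemctl daemon-reload && echo "stubbed systemctl succeeded"
# prints: stubbed systemctl succeeded
```

The obvious trade-off is that the kubelet is not actually restarted; the stub only lets dpkg finish configuring the package.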
_From @zreigz on October 31, 2016 7:52_
There are some configuration steps needed to make it work, but I have to try them first. If I find something I'll let you know.
_From @zreigz on November 2, 2016 7:19_
Good news. I've managed to solve all the issues. It just needs some final tests, and then I will post the solution for how to run kubeadm in a Docker container.
_From @andersla on November 2, 2016 7:23_
Super! I will help test it as soon as it is ready, although I am on holiday for the rest of this week :)
_From @zreigz on November 2, 2016 10:13_
There are two main issues with installing kubeadm in a Docker container. The first is running systemd in the container; the second is installing Docker inside the container. Both problems have been fixed. Here is the Dockerfile that must be used to prepare the Ubuntu image:
```dockerfile
FROM ubuntu
ENV container docker
RUN apt-get -y update
RUN apt-get update -qq && apt-get install -qqy \
    apt-transport-https \
    ca-certificates \
    curl \
    lxc \
    vim \
    iptables
RUN curl -sSL https://get.docker.com/ | sh
# remove unwanted systemd services (note: `=`, not the bash-only `==`,
# since RUN uses /bin/sh)
RUN (cd /lib/systemd/system/sysinit.target.wants/; for i in *; do [ "$i" = systemd-tmpfiles-setup.service ] || rm -f "$i"; done); \
    rm -f /lib/systemd/system/multi-user.target.wants/*; \
    rm -f /etc/systemd/system/*.wants/*; \
    rm -f /lib/systemd/system/local-fs.target.wants/*; \
    rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
    rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
    rm -f /lib/systemd/system/basic.target.wants/*; \
    rm -f /lib/systemd/system/anaconda.target.wants/*
VOLUME /sys/fs/cgroup
VOLUME /var/run/docker.sock
CMD /sbin/init
```
I use this command to build the image in the directory containing the Dockerfile:

```shell
docker build -t kubeadm_docker .
```

Now you can run the prepared image and finish the kubeadm installation. Use the following command to run the kubeadm_docker image:

```shell
docker run -it -e "container=docker" --privileged=true -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /var/run/docker.sock:/var/run/docker.sock kubeadm_docker /sbin/init
```
Find the running container ID:

```
$ docker ps
CONTAINER ID   IMAGE            COMMAND        CREATED             STATUS             PORTS   NAMES
7dd73057620d   kubeadm_docker   "/sbin/init"   About an hour ago   Up About an hour           furious_fermi
```
Now you can open the container console:

```shell
docker exec -it 7dd73057620d /bin/bash
```
This is your script (with small modifications) to install kubeadm:

```shell
echo "Updating Ubuntu..."
apt-get update -y
apt-get upgrade -y

systemctl start docker

echo "Install os requirements"
apt-get install -y \
    curl \
    apt-transport-https \
    dialog \
    python \
    daemon

echo "Add Kubernetes repo..."
sh -c 'curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -'
sh -c 'echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list'
apt-get update -y

echo "Installing Kubernetes requirements..."
apt-get install -y \
    kubelet

# This is a temporary fix until a new version is released
sed -i '38,40d' /var/lib/dpkg/info/kubelet.postinst

apt-get install -y \
    kubernetes-cni \
    kubectl \
    kubeadm
```
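The temporary `sed` fix above deletes a line range in place (here, the broken `[[` lines in kubelet.postinst). As a hedged illustration of the same idiom on a throwaway file (the filename and contents are placeholders):

```shell
# Delete lines 2 through 4 of a five-line file, in place (GNU sed -i).
printf 'l1\nl2\nl3\nl4\nl5\n' > /tmp/sed-demo.txt
sed -i '2,4d' /tmp/sed-demo.txt
cat /tmp/sed-demo.txt
# prints:
# l1
# l5
```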
And finally you can execute:

```shell
kubeadm init
```

Everything works the same as on a local machine.
Good luck :)
_From @SuperStevenZ on November 17, 2016 7:21_
@zreigz That solved the same problem of mine, thanks!
_From @zreigz on November 17, 2016 7:30_
No problem :)
We should set up a CI with docker-in-docker stuff.
@errordeveloper @zreigz Can you take this on?
At least we should document somewhere how to run kubeadm inside a container...
Sounds good to me. We certainly need to put all this into a Docker image, plus some config/start scripts to distinguish between master and node. A good start would be to create a project for it, like kubernetes/kubeadm-docker. That would also be the right place for the Dockerfile, scripts, and documentation.
Create that as a private project first under zreigz/ and eventually we'll probably merge that code into this repo.
But first, prototype in your own space and we'll see how it goes.
Real assignee is @zreigz
Yes, good point. I will do it. Next week (Monday, Tuesday) I am at a conference, so I will start on Wednesday.
@luxas I was wondering how I should provide the kubeadm and kubernetes-cni packages: should I build them from the current sources (to be able to test the newest implementation), or just download the newest version from the repository? For CI purposes I think we need the current state of the code in order to test it, or do we only need to test the release version?
Hi, thanks for the fix, but I'm still getting an issue: after `kubeadm init` I get 0/3 on DNS; DNS doesn't seem to be running at all.
```
Every 2.0s: kubectl get pods --all-namespaces                 Fri Dec 16 17:00:50 2016

NAMESPACE     NAME                                   READY     STATUS              RESTARTS   AGE
kube-system   dummy-2088944543-17sey                 1/1       Running             0          11m
kube-system   etcd-8dd8c92c6c38                      1/1       Running             2          12m
kube-system   kube-apiserver-8dd8c92c6c38            1/1       Running             4          12m
kube-system   kube-controller-manager-8dd8c92c6c38   1/1       Running             2          11m
kube-system   kube-discovery-1150918428-m506w        1/1       Running             0          11m
kube-system   kube-dns-654381707-vuijm               0/3       ContainerCreating   0          11m
kube-system   kube-proxy-tuw6u                       0/1       CrashLoopBackOff    6          11m
kube-system   kube-scheduler-8dd8c92c6c38            1/1       Running             2          10m
```
I tried installing a network policy:

```
root@8dd8c92c6c38:/# kubectl apply -f calico.yaml
the path "calico.yaml" does not exist
root@8dd8c92c6c38:/# kubectl create -f calico.yaml
the path "calico.yaml" does not exist
root@8dd8c92c6c38:/# kubectl apply -f kube-flannel.yml
the path "kube-flannel.yml" does not exist
root@8dd8c92c6c38:/# kubectl apply -f https://git.io/weave-kube
daemonset "weave-net" created
root@8dd8c92c6c38:/# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY     STATUS              RESTARTS   AGE
kube-system   dummy-2088944543-17sey                 1/1       Running             0          46m
kube-system   etcd-8dd8c92c6c38                      1/1       Running             2          46m
kube-system   kube-apiserver-8dd8c92c6c38            1/1       Running             4          46m
kube-system   kube-controller-manager-8dd8c92c6c38   1/1       Running             2          45m
kube-system   kube-discovery-1150918428-9m6rr        0/1       Pending             0          3m
kube-system   kube-dns-654381707-vuijm               0/3       ContainerCreating   0          45m
kube-system   kube-proxy-tuw6u                       0/1       CrashLoopBackOff    13         45m
kube-system   kube-scheduler-8dd8c92c6c38            1/1       Running             2          44m
kube-system   weave-net-iv0bc                        0/2       ContainerCreating   0          49s
info: 1 completed object(s) was(were) not shown in pods list. Pass --show-all to see all objects.
```
Hi again @zreigz,
Now I have finally had time to go further with this and test it. I can almost make it work, but there is a problem: Docker picks the vfs storage driver (probably because it can't use aufs on top of aufs?). With the workaround you describe above, I am mounting the outer Docker .sock into the inner Docker, so it should be possible to write with aufs? If I run `docker info` on my host machine, it says it is running the aufs storage driver, whereas if I run `docker info` inside the Docker container with Kubernetes, it says it is using the vfs storage driver.
Any ideas why I get the following problem when running `kubeadm init`?
```
root@f50f087baa83:/# kubeadm init
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] The system verification failed. Printing the output from the verification:
OS: Linux
KERNEL_VERSION: 4.4.0-43-generic
CONFIG_NAMESPACES: enabled
CONFIG_NET_NS: enabled
CONFIG_PID_NS: enabled
CONFIG_IPC_NS: enabled
CONFIG_UTS_NS: enabled
CONFIG_CGROUPS: enabled
CONFIG_CGROUP_CPUACCT: enabled
CONFIG_CGROUP_DEVICE: enabled
CONFIG_CGROUP_FREEZER: enabled
CONFIG_CGROUP_SCHED: enabled
CONFIG_CPUSETS: enabled
CONFIG_MEMCG: enabled
CONFIG_INET: enabled
CONFIG_EXT4_FS: enabled
CONFIG_PROC_FS: enabled
CONFIG_NETFILTER_XT_TARGET_REDIRECT: enabled (as module)
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled (as module)
CONFIG_OVERLAY_FS: enabled (as module)
CONFIG_AUFS_FS: enabled (as module)
CONFIG_BLK_DEV_DM: enabled
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
DOCKER_VERSION: 1.12.1
DOCKER_GRAPH_DRIVER: vfs
[preflight] Some fatal errors occurred:
	unsupported graph driver: vfs
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`
root@f50f087baa83:/#
```
Some more info after trying a bit more.
I changed the Docker storage driver to "overlay" on the host. Then Docker inside Docker picked aufs as its driver and I passed the pre-flight checks, but now I am stuck at:

```
[apiclient] Created API client, waiting for the control plane to become ready
```

In some other testing I realized that Docker was not picking the same storage driver as the host when it was started as a service via /sbin/init. If I ran the Docker image this way, it didn't pick the same driver as the host (as mentioned above):

```shell
sudo docker run -it --privileged=true -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /var/run/docker.sock:/var/run/docker.sock kubeadm_docker /sbin/init
```

If I started it without /sbin/init and not as a daemon, like this:

```shell
sudo docker run -it --privileged=true --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /var/run/docker.sock:/var/run/docker.sock kubeadm_docker /bin/bash
```

then Docker picked the same storage driver as the host (but now `systemctl` was not working).
Some more updates:
I can now build a working kubeadm-in-docker-container with this Dockerfile:
```dockerfile
FROM ubuntu:xenial-20161213
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update -qq
RUN apt-get install -y \
    apt-transport-https \
    apt-utils \
    ca-certificates \
    curl \
    dialog \
    python \
    daemon \
    vim \
    jq \
    linux-image-$(uname -r)
# remove unwanted systemd services
RUN for i in /lib/systemd/system/sysinit.target.wants/*; do [ "${i##*/}" = "systemd-tmpfiles-setup.service" ] || rm -f "$i"; done; \
    rm -f /lib/systemd/system/multi-user.target.wants/*; \
    rm -f /etc/systemd/system/*.wants/*; \
    rm -f /lib/systemd/system/local-fs.target.wants/*; \
    rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
    rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
    rm -f /lib/systemd/system/basic.target.wants/*; \
    rm -f /lib/systemd/system/anaconda.target.wants/*
# install docker (after removing unwanted systemd)
RUN apt-get install -y \
    docker.io
RUN echo "Add Kubernetes repo..."
RUN sh -c 'curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -'
RUN sh -c 'echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list'
RUN echo "Installing Kubernetes requirements..."
RUN apt-get update -y && apt-get install -y \
    kubelet \
    kubernetes-cni \
    kubectl
RUN echo "Installing kubeadm - this will fail at post-install but that doesn't matter"
RUN apt-get install -y \
    kubeadm; exit 0
# Create volume for docker
VOLUME /var/lib/docker
```
I build it with:

```shell
docker build -t kubeadm_docker .
```

And then run:

```shell
docker run -it --privileged=true --name=master -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro kubeadm_docker /sbin/init
```
Wait a few (10-15) seconds until systemd and Docker are up and running. Then I start kubeadm inside the running container:

```shell
docker exec -it master kubeadm init --token=acbec6.2852dff7cb569aa0
```
When it has initialized, I start a second "worker" node:

```shell
docker run -it --privileged=true --name=node -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro kubeadm_docker /sbin/init
```

And after a few seconds join it to the master (the container was named `node` above, so that is the name to use with `docker exec`):

```shell
docker exec -it node kubeadm join --token=acbec6.2852dff7cb569aa0 172.17.0.2
```
Currently there is some problem with the Docker network, because kube-proxy fails and enters a CrashLoopBackOff.
If I instead set `--net=host` when running Docker above, then kube-proxy and all pods come up OK, but that is not an option, since I need the containers running on the Docker network with their own IPs.
I also previously tried to run Docker with the same daemon as the host (`-v /var/run/docker.sock:/var/run/docker.sock`), but I never got it working, because when the Docker inside the container is started with systemd it doesn't pick up the sock (or something like that).
Thanks @andersla!
Can you paste what kube-proxy fails with?
Thanks @luxas for your interest!
Unfortunately there are no details in `journalctl -xeu kubelet`.
This is all I find about kube-proxy (repeated many times); I also attach the full log.

```
Jan 09 14:40:02 1355b98bf8c7 kubelet[244]: I0109 14:40:02.690862     244 docker_manager.go:2524] checking backoff for container "kube-proxy" in pod "kube-proxy-7886l"
Jan 09 14:40:03 1355b98bf8c7 kubelet[244]: I0109 14:40:03.984818     244 docker_manager.go:2538] Back-off 20s restarting failed container=kube-proxy pod=kube-proxy-7886l_kube-system(71a1e950-d679-11e6-a9f7-02429d4c0f01)
Jan 09 14:40:03 1355b98bf8c7 kubelet[244]: E0109 14:40:03.984833     244 pod_workers.go:184] Error syncing pod 71a1e950-d679-11e6-a9f7-02429d4c0f01, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 20s restarting failed container=kube-proxy pod=kube-proxy-7886l_kube-system(71a1e950-d679-11e6-a9f7-02429d4c0f01)"
```
The full log also complains about kube-dns, but that is only because I haven't started weave yet.
Here is the log from `kubectl describe pod -n kube-system kube-proxy-w0ng5`:

```
Name:           kube-proxy-w0ng5
Namespace:      kube-system
Node:           3551807cba77/172.17.0.2
Start Time:     Tue, 10 Jan 2017 18:03:06 +0000
Labels:         component=kube-proxy
                k8s-app=kube-proxy
                kubernetes.io/cluster-service=true
                name=kube-proxy
                tier=node
Status:         Running
IP:             172.17.0.2
Controllers:    DaemonSet/kube-proxy
Containers:
  kube-proxy:
    Container ID:  docker://dcc2bc0b50a2477b72d451b776f35e327f1faf09e3cddb25d5609569c6f2a242
    Image:         gcr.io/google_containers/kube-proxy-amd64:v1.5.1
    Image ID:      docker-pullable://gcr.io/google_containers/kube-proxy-amd64@sha256:3b82b2e0862b3c0ece915de29a5a53634c9b0a73140340f232533c645decbd4b
    Port:
    Command:
      kube-proxy
      --kubeconfig=/run/kubeconfig
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 10 Jan 2017 18:08:48 +0000
      Finished:     Tue, 10 Jan 2017 18:08:48 +0000
    Ready:          False
    Restart Count:  6
    Volume Mounts:
      /run/kubeconfig from kubeconfig (rw)
      /var/run/dbus from dbus (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-g0ft5 (ro)
    Environment Variables:  <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  kubeconfig:
    Type:       HostPath (bare host directory volume)
    Path:       /etc/kubernetes/kubelet.conf
  dbus:
    Type:       HostPath (bare host directory volume)
    Path:       /var/run/dbus
  default-token-g0ft5:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-g0ft5
QoS Class:      BestEffort
Tolerations:    dedicated=master:NoSchedule
Events:
  FirstSeen  LastSeen  Count  From                    SubObjectPath                Type     Reason      Message
  ---------  --------  -----  ----                    -------------                -------- ------      -------
  9m         9m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Pulling     pulling image "gcr.io/google_containers/kube-proxy-amd64:v1.5.1"
  9m         9m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Created     Created container with docker id ecf446de342a; Security:[seccomp=unconfined]
  9m         9m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Started     Started container with docker id ecf446de342a
  9m         9m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Pulled      Successfully pulled image "gcr.io/google_containers/kube-proxy-amd64:v1.5.1"
  9m         9m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Created     Created container with docker id f562fb667a64; Security:[seccomp=unconfined]
  9m         9m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Started     Started container with docker id f562fb667a64
  9m         9m        2      {kubelet 3551807cba77}                               Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 10s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"
  9m         9m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Started     Started container with docker id 1a7d7d4f682b
  9m         9m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Created     Created container with docker id 1a7d7d4f682b; Security:[seccomp=unconfined]
  9m         9m        2      {kubelet 3551807cba77}                               Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 20s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"
  8m         8m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Started     Started container with docker id 89bdf4ba7e0b
  8m         8m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Created     Created container with docker id 89bdf4ba7e0b; Security:[seccomp=unconfined]
  8m         8m        3      {kubelet 3551807cba77}                               Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 40s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"
  8m         8m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Created     Created container with docker id f2b7a2b5078d; Security:[seccomp=unconfined]
  8m         8m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Started     Started container with docker id f2b7a2b5078d
  8m         7m        6      {kubelet 3551807cba77}                               Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"
  6m         6m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Created     Created container with docker id 28deaf41d920; Security:[seccomp=unconfined]
  6m         6m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Started     Started container with docker id 28deaf41d920
  6m         4m        12     {kubelet 3551807cba77}                               Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"
  9m         4m        6      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Pulled      Container image "gcr.io/google_containers/kube-proxy-amd64:v1.5.1" already present on machine
  4m         4m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Created     Created container with docker id dcc2bc0b50a2; Security:[seccomp=unconfined]
  4m         4m        1      {kubelet 3551807cba77}  spec.containers{kube-proxy}  Normal   Started     Started container with docker id dcc2bc0b50a2
  9m         10s       43     {kubelet 3551807cba77}  spec.containers{kube-proxy}  Warning  BackOff     Back-off restarting failed docker container
  4m         10s       18     {kubelet 3551807cba77}                               Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-w0ng5_kube-system(09c4f65d-d75f-11e6-814c-0242255c9a68)"
```
Yes, I see _that_ it's crashlooping, but could you give e.g. `kubectl -n kube-system logs kube-proxy-w0ng5`? So we actually see the reason _why_ :smile:
Hey, that's brilliant :)

```
root@3551807cba77:/# kubectl -n kube-system logs kube-proxy-w0ng5
I0110 18:29:01.705993       1 server.go:215] Using iptables Proxier.
W0110 18:29:01.706933       1 proxier.go:254] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0110 18:29:01.706947       1 server.go:227] Tearing down userspace rules.
I0110 18:29:01.712693       1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_max' to 262144
I0110 18:29:01.712927       1 conntrack.go:66] Setting conntrack hashsize to 65536
write /sys/module/nf_conntrack/parameters/hashsize: operation not supported
```
I can fix it with a workaround: setting `--conntrack-max-per-core=0` and then restarting the proxy. A value of 0 skips reconfiguring nf_conntrack_max and leaves it as is (65536). I inject the start parameter like this:

First enter the Docker container:

```shell
docker exec -it master bash
```

then apply the fix:

```shell
kubectl -n kube-system get ds -l 'component=kube-proxy' -o json \
  | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--conntrack-max-per-core=0"]' \
  | kubectl apply -f - \
  && kubectl -n kube-system delete pods -l 'component=kube-proxy'
```
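To show what that jq filter does in isolation, here it is applied to a stub JSON document shaped like the DaemonSet list output (the stub is made up; only the path down to `.command` mirrors the real object):

```shell
# Append a flag to the first container's command array, as the live
# patch against the kube-proxy DaemonSet does.
echo '{"items":[{"spec":{"template":{"spec":{"containers":[{"command":["kube-proxy"]}]}}}}]}' \
  | jq -c '.items[0].spec.template.spec.containers[0].command |= . + ["--conntrack-max-per-core=0"]'
# the command array becomes ["kube-proxy","--conntrack-max-per-core=0"]
```

`|=` is jq's update-assignment: it reads the value at the path, applies the right-hand expression to it, and writes the result back, leaving the rest of the document untouched.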
Now I get the CrashLoop on Weave instead, when I later do a `kubectl apply -f weave.yaml`. The following is the log output from the weave pod:

```
/proc/sys/net/bridge/bridge-nf-call-iptables not found
```

I also tried starting kube-proxy with the parameter `--proxy-mode=userspace`, but with the same result.
I think this will solve the weave issue: https://github.com/weaveworks/weave/pull/2659
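A quick related check, as a sketch: the bridge sysctl the weave pod complained about only exists when the br_netfilter module is loaded, and inside a container it only appears if the *host* has loaded it.

```shell
# Weave's iptables rules only see bridged traffic when br_netfilter is
# loaded and bridge-nf-call-iptables is enabled on the host.
f=/proc/sys/net/bridge/bridge-nf-call-iptables
if [ -e "$f" ]; then
  echo "br_netfilter loaded (bridge-nf-call-iptables=$(cat "$f"))"
else
  echo "not loaded; on the host run: sudo modprobe br_netfilter"
fi
```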
@andersla Yes, that seems to fix the problem. Can you try a build from HEAD?
For example, you could use the `luxas/weave-(kube|npc):v1.9.0-alpha.5` images, which are from HEAD~ish.
Let me know if it works, and please comment here exactly what you're doing now (shell commands, Dockerfile, other scripts, etc.) so others can take advantage of it.
I used the latest image from weaveworks/weave-kube.
I also used the latest yaml template: https://github.com/weaveworks/weave/blob/master/prog/weave-kube/weave-daemonset.yaml
Unfortunately kube-dns didn't work (it is stuck in ContainerCreating). The error message from kubelet after starting weave is:

```
Jan 15 16:14:30 7c12205804da kubelet[540]: I0115 16:14:30.443327     540 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/c23fb73d-db39-11e6-b84d-0242b1ac1840-default-token-142vd" (spec.Name: "default-token-142vd") pod "c23fb73d-db39-11e6-b84d-0242b1ac1840" (UID: "c23fb73d-db39-11e6-b84d-0242b1ac1840").
Jan 15 16:14:31 7c12205804da kubelet[540]: E0115 16:14:31.381741     540 docker_manager.go:373] NetworkPlugin cni failed on the status hook for pod 'kube-dns-2924299975-9gjcg' - Unexpected command output Device "eth0" does not exist.
Jan 15 16:14:31 7c12205804da kubelet[540]: with error: exit status 1
```

If I only started the master node and did not join another node, then kube-dns came up OK when I applied weave.yaml.
I also tested the weave.yaml with the latest weave-kube on a Vagrant installation (not in my Docker experiment), and there it all worked.
This is the weave.yaml I used for `kubectl apply -f weave.yaml`:
```yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        name: weave-net
      annotations:
        scheduler.alpha.kubernetes.io/tolerations: |
          [
            {
              "key": "dedicated",
              "operator": "Equal",
              "value": "master",
              "effect": "NoSchedule"
            }
          ]
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: weave
          image: weaveworks/weave-kube:latest
          imagePullPolicy: Always
          command:
            - /home/weave/launch.sh
          livenessProbe:
            initialDelaySeconds: 30
            httpGet:
              host: 127.0.0.1
              path: /status
              port: 6784
          securityContext:
            privileged: true
          volumeMounts:
            - name: weavedb
              mountPath: /weavedb
            - name: cni-bin
              mountPath: /host/opt
            - name: cni-bin2
              mountPath: /host/home
            - name: cni-conf
              mountPath: /host/etc
            - name: dbus
              mountPath: /host/var/lib/dbus
          resources:
            requests:
              cpu: 10m
        - name: weave-npc
          image: weaveworks/weave-npc:latest
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 10m
          securityContext:
            privileged: true
      restartPolicy: Always
      volumes:
        - name: weavedb
          emptyDir: {}
        - name: cni-bin
          hostPath:
            path: /opt
        - name: cni-bin2
          hostPath:
            path: /home
        - name: cni-conf
          hostPath:
            path: /etc
        - name: dbus
          hostPath:
            path: /var/lib/dbus
```
Hey guys, I ran into this thread and it freaking rocks! Great stuff.
I really want to use this approach for CI against our repo (which is fairly complex, honestly). We have a Helm/Tiller requirement for launching quite a few charts for CI. Have any of you run into this, or do you have suggestions for getting this going? Tiller seems to barf all over itself in this situation:

```
root@JINKITNIX05:~/openstack-helm# kubectl logs tiller-deploy-3299276078-6kdzw -n kube-system
Error from server (BadRequest): the server rejected our request for an unknown reason (get pods tiller-deploy-3299276078-6kdzw)
root@JINKITNIX05:~/openstack-helm#
```

I may try with other SDNs. We've been using Calico so far because L3 is a little more straightforward to troubleshoot in hacky situations, but if Weave is better (since it's L2)... I'll try whatever gets us past the Tiller issue. I think Tiller is unhappy because at the end of the day it associates with 127.0.0.1, and I've seen that cause problems in the past when testing other things. Any input would be amazing. Again, really awesome props to the folks who are hacking this up! Thank you!!
Hi! Great that more people want this to work. I don't have experience with Calico. In the cloud we are running Weave, so that's what I wanted to get working in this project. But I am stuck and haven't had time to dig further into why kube-dns doesn't come up when I apply Weave as described above.
Now the latest stable weave works better than before:

```shell
kubectl apply -f https://git.io/weave-kube
```

...but unfortunately there is still the same issue with kube-dns not coming up, stuck in ContainerCreating:
```
root@18a7d1ec5124:/# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY     STATUS              RESTARTS   AGE
kube-system   dummy-2088944543-pvvdx                 1/1       Running             0          5m
kube-system   etcd-18a7d1ec5124                      1/1       Running             0          4m
kube-system   kube-apiserver-18a7d1ec5124            1/1       Running             2          5m
kube-system   kube-controller-manager-18a7d1ec5124   1/1       Running             0          4m
kube-system   kube-discovery-1769846148-6tv4l        1/1       Running             0          5m
kube-system   kube-dns-2924299975-4608d              0/4       ContainerCreating   0          5m
kube-system   kube-proxy-k0stq                       1/1       Running             0          4m
kube-system   kube-proxy-tnm8h                       1/1       Running             0          4m
kube-system   kube-scheduler-18a7d1ec5124            1/1       Running             0          4m
kube-system   weave-net-mff6t                        2/2       Running             0          3m
kube-system   weave-net-t7zcl                        2/2       Running             0          3m
```
and after applying weave, this error message stops:
Feb 04 18:06:57 18a7d1ec5124 kubelet[252]: E0204 18:06:57.125434 252 pod_workers.go:184] Error syncing pod 7dc68091-eb04-11e6-a321-02425e578ba1, skipping: failed to "SetupNetwork" for "kube-dns-2924299975-4608d_kube-system" with SetupNetworkError: "Failed to setup network for pod \"kube-dns-2924299975-4608d_kube-system(7dc68091-eb04-11e6-a321-02425e578ba1)\" using network plugins \"cni\": cni config unintialized; Skipping pod"
and instead I now see:
Feb 04 18:06:59 18a7d1ec5124 kubelet[252]: E0204 18:06:59.615375 252 docker_manager.go:373] NetworkPlugin cni failed on the status hook for pod 'kube-dns-2924299975-4608d' - Unexpected command output Device "eth0" does not exist.
Feb 04 18:06:59 18a7d1ec5124 kubelet[252]: with error: exit status 1
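The two kubelet errors above correspond to two different states of the CNI configuration directory. A quick way to tell them apart is to check whether the network plugin actually dropped a config file in the kubelet's default CNI conf dir; a minimal sketch (the path /etc/cni/net.d is the kubelet default, adjust if you override --cni-conf-dir):

```shell
# Check whether a CNI plugin has written its config; an empty or missing
# /etc/cni/net.d is what produces the "cni config uninitialized" error.
CNI_DIR=/etc/cni/net.d
if [ -d "$CNI_DIR" ] && [ -n "$(ls -A "$CNI_DIR" 2>/dev/null)" ]; then
  echo "CNI config present in $CNI_DIR:"
  ls "$CNI_DIR"
else
  echo "CNI config missing or empty in $CNI_DIR"
fi
```

If the config is present but pods still fail with errors like the eth0 one above, the problem has moved from "no CNI config" to the plugin itself failing at pod setup.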
If I use Flannel as the network plugin instead, it works:
docker exec -it master bash
curl -sSL "https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml?raw=true" | kubectl create -f -
So if you use Flannel, everything works. Here is the complete setup:
Dockerfile:
FROM ubuntu:xenial-20161213
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update -qq
RUN apt-get install -y \
apt-transport-https \
apt-utils \
ca-certificates \
curl \
dialog \
python \
daemon \
vim \
jq
# remove unwanted systemd services
RUN for i in /lib/systemd/system/sysinit.target.wants/*; do [ "${i##*/}" = "systemd-tmpfiles-setup.service" ] || rm -f "$i"; done; \
rm -f /lib/systemd/system/multi-user.target.wants/*;\
rm -f /etc/systemd/system/*.wants/*;\
rm -f /lib/systemd/system/local-fs.target.wants/*; \
rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
rm -f /lib/systemd/system/basic.target.wants/*;\
rm -f /lib/systemd/system/anaconda.target.wants/*;
# install docker (after removing unwanted systemd)
RUN apt-get install -y \
docker.io
RUN echo "Add Kubernetes repo..."
RUN sh -c 'curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -'
RUN sh -c 'echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list'
RUN echo "Installing Kubernetes requirements..."
RUN apt-get update -y && apt-get install -y \
kubelet \
kubernetes-cni \
kubectl
RUN echo "Installing Kubeadm - this will fail at post-install but that doesn't matter"
RUN apt-get install -y \
kubeadm; exit 0
# Create volume for docker
VOLUME /var/lib/docker
Build it with:
docker build -t kubeadm_docker .
And then run:
docker run -it --privileged=true --name=master -h master -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro kubeadm_docker /sbin/init
Wait a few (10-15) seconds until systemd and docker are up and running.
Then I start kubeadm inside the running container:
docker exec -it master kubeadm init --skip-preflight-checks --token=acbec6.2852dff7cb569aa0
Once it has initialized, I start a second "worker" node:
docker run -it --privileged=true --name=node -h node -d --security-opt seccomp:unconfined --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro kubeadm_docker /sbin/init
And after a few seconds (once systemd and docker are up), join the master:
docker exec -it node kubeadm join --skip-preflight-checks --token=acbec6.2852dff7cb569aa0 172.17.0.2
When they have joined, enter the master and apply the workaround for the crashing kube-proxy:
docker exec -it master bash
kubectl -n kube-system get ds -l 'component=kube-proxy' -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--conntrack-max-per-core=0"]' | kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy'
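The jq filter in that one-liner appends --conntrack-max-per-core=0 to the kube-proxy container's command, which tells kube-proxy not to try to raise the nf_conntrack sysctls it cannot modify from inside a container. The transformation can be seen in isolation against a mock DaemonSet fragment (not a live cluster object):

```shell
# Demonstrate the jq update-assignment from the kube-proxy workaround
# on a mock manifest fragment: |= updates the command array in place,
# appending the extra flag.
echo '{"items":[{"spec":{"template":{"spec":{"containers":[{"command":["kube-proxy"]}]}}}}]}' \
  | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--conntrack-max-per-core=0"]'
```

The resulting JSON, with the flag appended, is what gets piped back into kubectl apply; deleting the existing kube-proxy pods then lets the DaemonSet recreate them with the new command.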
Finally apply flannel overlay network:
curl -sSL "https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml?raw=true" | kubectl create -f -
I had no problem installing Helm, Traefik or GlusterFS on Kubernetes in this setup :)
kubeadm-dind-cluster basically does what the last comment outlined, providing automation so you don't have to type the commands manually (although as of now it uses CNI bridge plugin with some hacks instead of flannel, but this I'll fix quite soon).
It also makes it easy to build both the k8s components and kubeadm from local source and use those binaries in the cluster you start. Besides, there were some non-obvious problems I encountered while working on it, e.g. agetty eating 100% CPU and causing docker crashes unless you take care to disable it.
Some of the changes coming quite soon in kubeadm-dind-cluster:
kubeadm-dind-cluster also provides automation for e2e tests. Another interesting trait is that you can use the same remote docker engine for both building k8s and running kubeadm-dind-cluster without copying the binaries back (it pulls them directly from the build data container), which may matter if you're working with a remote docker over a slow connection.
... forgot to mention: it also configures local kubectl for you, so you don't need to do docker exec on your master container to access your cluster.
As I already mentioned, while DIND may seem easy on the surface, you can hit some unexpected problems with it. Some of them are already fixed in kubeadm-dind-cluster and the base image it uses. E.g. you need to do some mounts, you need to use STOPSIGNAL SIGRTMIN+3 and resist the temptation to use /sbin/init as ENTRYPOINT, and the vfs storage driver can be quite slow at times. So... here be dragons ;)
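For reference, a minimal sketch of what those two systemd-related points might look like in a DIND base image (illustrative only, not the actual kubeadm-dind-cluster image):

```dockerfile
# Illustrative sketch, not the real kubeadm-dind-cluster base image.
FROM ubuntu:xenial

# systemd performs an orderly shutdown on SIGRTMIN+3; docker's default
# SIGTERM would just kill PID 1 and leave units in an undefined state.
STOPSIGNAL SIGRTMIN+3

# Start systemd via CMD rather than ENTRYPOINT, so the image can still
# be entered with `docker run ... bash` for debugging without fighting
# an init-process entrypoint.
CMD ["/lib/systemd/systemd"]
```

At run time such an image still needs the cgroup mount shown earlier in the thread (-v /sys/fs/cgroup:/sys/fs/cgroup:ro) plus --privileged for the nested docker daemon.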
@ivan4th Thanks for all the work you've been doing with kubeadm and dind :)
Can you open a new issue referencing this issue where we can discuss the MVP needed for merging kubeadm-dind-cluster into this repo?
After a quick look, I found some points that we might want to address before a possible MVP:
What do you think? Thanks for the awesome start, I can't wait to actually integrate this into something kubeadm official :+1:
cc @jbeda @lukemarsden @errordeveloper @mikedanese @timothysc @sttts
Thanks for the awesome start, I can't wait to actually integrate this into something kubeadm official
if we can devel-build, kubeadm-local-up-cluster that would be fantastic.
@ivan4th @luxas What's the status of this?
I don't really know... @ivan4th?
@jamiehannaford Overall, k-d-c is quite usable in its current form IMO. It also has its own public CI based on Travis (BTW, I also succeeded in running DIND on CircleCI, if that's of interest).
@luxas Maybe we can use @andersla's solution instead of a full DIND cluster? If so, would we need to host the Docker image anywhere, or just document what the Dockerfile looks like?
It'd be great if we can get a fix out for this issue for 1.9
I don't have cycles to work on this. If anyone else, can please do!
@jamiehannaford The problem is, much of a "full" DIND cluster is dedicated to handling the numerous problems that arise from "simple" DIND usage. These can be quite obscure at times, see e.g. https://github.com/Mirantis/kubeadm-dind-cluster/commit/405c8bead4fb443582328fd3c7b8f01452872438 (I think I'll need to submit a fix to k8s for this). As for kubeadm-dind-cluster, it's still quite usable and I try to keep it up to date (@danehans and @pmichali are using it for k8s IPv6 e2e testing, and Virtlet uses it to run its e2e tests on CircleCI), although I spend a lot of time on other projects so I haven't managed to rewrite it in Go yet.
We talked about this in the SIG meeting yesterday, and we're gonna close the issue.
Developing and maintaining a full-blown DIND solution is not in scope for the core kubeadm team for the foreseeable future, if ever. We're super happy that the community provides these solutions, though, like @ivan4th's hard work on the Mirantis project. If we find a good place to document the possibility of using that project, I'm personally fine with referencing it. Thanks!