Machine: Machine create fails with latest Docker

Created on 29 Jun 2017  ·  46 Comments  ·  Source: docker/machine

Hello

docker-machine version 0.12.0, build 45c69ad

docker-machine create fails now:

docker-machine -D create \
    --driver google \
    --google-project project \
    --google-zone us-east1-d \
    --google-machine-type n1-standard-1 \
    --google-disk-size 20 \
    --google-preemptible \
    build-vm2

The machine is created and Docker is installed, but it won't start. The problem appears to be related to a new version of Docker being installed by a new version of the install script at https://get.docker.com. My installs went from 17.05.0-ce to 17.06.0-ce, and with that change, Docker installs but does not start.

Jun 29 00:50:08 build-vm2 docker[5705]: `docker daemon` is not supported on Linux. Please run `dockerd` directly

or

Jun 29 00:56:12 build-vm2 dockerd[6407]: Error starting daemon: error initializing graphdriver: driver not supported

Unless I change:

/usr/bin/docker daemon -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver aufs --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=google

to

/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=google

in /etc/systemd/system/docker.service.d/10-machine.conf.

area/provision kind/bug

All 46 comments

Same problem here

docker-machine create \
    --driver=digitalocean \
    --digitalocean-access-token=XXX \
    --digitalocean-size=2gb \
    machinename

Yesterday the same command worked fine with docker version 17.05.0-ce
Today my new machine's docker won't start (17.06.0-ce)
I've tried multiple times.

I can confirm this too:

dm create -d digitalocean \
--digitalocean-access-token XXX \
--digitalocean-size 4gb machine

I'm using this as a workaround:

docker-machine create \
--driver amazonec2 \
--engine-install-url=https://web.archive.org/web/20170623081500/https://get.docker.com

or
--engine-install-url=https://releases.rancher.com/install-docker/17.05.sh

I have the same issue.

docker version : Docker version 17.06.0-ce
docker-machine version : 0.12.0, build 45c69ad

docker-machine create --driver amazonec2 --amazonec2-region eu-west-1 --amazonec2-instance-type t2.small --amazonec2-access-key XXX --amazonec2-secret-key XXX test-create-machine

Jun 29 12:26:56 ip-172-31-10-149 systemd[1]: Starting Docker Application Container Engine...
Jun 29 12:26:56 ip-172-31-10-149 docker[5234]: docker daemon is not supported on Linux. Please run dockerd directly

docker daemon is not supported on Linux. Please run dockerd directly

I was able to get it working with this PR
https://github.com/docker/machine/pull/4128

Just compile docker-machine with this fix and everything works again

@gnomus super, that's interesting! I wonder why it was working for 17.05.0-ce, though.

@therealppa haahaha awesome! I was wondering how I might get the old version of that script, or whether the live script takes params to install an older version. web.archive.org definitely did not occur to me.

@dminkovsky I don't think it will work forever, if you look into the script it doesn't actually specify the version anywhere... Still, right now it works.

@therealppa @dminkovsky A longer-term fix is to change line 457 of the script from

$sh_c 'apt-get install -y -q docker-ce'

to

$sh_c "apt-get install -y -q docker-ce=17.05.0~ce-0~$lsb_dist-$dist_version"

Hopefully the fixed version of docker-machine is released soon.
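
The pinning idea above can be sketched as a small shell helper. This is only an illustration: the function name pinned_docker_pkg and its arguments are hypothetical, and the version string format is copied from the apt-get line quoted above, so it may not match what a given repository actually publishes.

```shell
#!/bin/sh
# Hypothetical helper (not part of get.docker.com): build the apt package
# spec in the same format as the edited install line quoted above.
#   $1 = Docker version, e.g. 17.05.0
#   $2 = distribution id, e.g. ubuntu   (the script's $lsb_dist)
#   $3 = release name, e.g. xenial      (the script's $dist_version)
pinned_docker_pkg() {
  printf 'docker-ce=%s~ce-0~%s-%s\n' "$1" "$2" "$3"
}

# Example (not executed here): install the pinned package on the host.
#   sudo apt-get install -y -q "$(pinned_docker_pkg 17.05.0 ubuntu xenial)"
```

Running apt-cache madison docker-ce on the host first should show which version strings are actually available to pin.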

same for me
We got it working by using "dockerd" instead of "docker daemon" in the file /etc/systemd/system/docker.service.d/10-machine.conf

@fabio-barile what about the --storage-driver aufs arg? Mine wouldn't start unless I got rid of that, too.

@dminkovsky I had the same problem on an autoscaling CI with GitLab: I got the aufs problem plus the dockerd problem, and had to solve it by specifying overlay as the storage driver.

Beyond the storage driver issue I'm also seeing verification errors for certificates created by gitlab-runner (9.3.0). @JustEra have you been running into the same issue or am I the only one?

http: TLS handshake error from ...:
 tls:
  failed to verify client's certificate: x509:
   certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "unknown")
ERROR: Error creating machine:
 Error checking the host:
  Error checking and/or regenerating the certs:
   There was an error validating certificates for host "...":
    remote error: tls: bad certificate  driver=amazonec2 name=...

This fixed the storage-driver issue for me (I just removed that parameter; for systemd ONLY). Apply on top of https://github.com/docker/machine/pull/4128 and re-build:

diff --git a/libmachine/provision/systemd.go b/libmachine/provision/systemd.go
index 90d02603..05d63bb5 100644
--- a/libmachine/provision/systemd.go
+++ b/libmachine/provision/systemd.go
@@ -53,7 +53,7 @@ func (p *SystemdProvisioner) GenerateDockerOptions(dockerPort int) (*DockerOptio

        engineConfigTmpl := `[Service]
 ExecStart=
-ExecStart=/usr/bin/` + arg + ` -H tcp://0.0.0.0:{{.DockerPort}} -H unix:///var/run/docker.sock --storage-driver {{.EngineOptions.StorageDriver}} --tlsverify --tlscacert {{.AuthOptions.CaCertRemotePath}} --tlscert {{.AuthOptions.ServerCertRemotePath}} --tlskey {{.AuthOptions.ServerKeyRemotePath}} {{ range .EngineOptions.Labels }}--label {{.}} {{ end }}{{ range .EngineOptions.InsecureRegistry }}--insecure-registry {{.}} {{ end }}{{ range .EngineOptions.RegistryMirror }}--registry-mirror {{.}} {{ end }}{{ range .EngineOptions.ArbitraryFlags }}--{{.}} {{ end }}
+ExecStart=/usr/bin/` + arg + ` -H tcp://0.0.0.0:{{.DockerPort}} -H unix:///var/run/docker.sock --tlsverify --tlscacert {{.AuthOptions.CaCertRemotePath}} --tlscert {{.AuthOptions.ServerCertRemotePath}} --tlskey {{.AuthOptions.ServerKeyRemotePath}} {{ range .EngineOptions.Labels }}--label {{.}} {{ end }}{{ range .EngineOptions.InsecureRegistry }}--insecure-registry {{.}} {{ end }}{{ range .EngineOptions.RegistryMirror }}--registry-mirror {{.}} {{ end }}{{ range .EngineOptions.ArbitraryFlags }}--{{.}} {{ end }}

For anyone who wants a specific older version, we (Rancher) maintain slightly modified get.docker.com scripts to install each one:

http://rancher.com/docs/rancher/v1.6/en/hosts/#supported-docker-versions

@fabio-barile above is entirely correct. How 'testing' lets such things be emitted, can't imagine.

More information here: https://github.com/docker/for-linux/issues/11#issuecomment-312143765

@vincent99 ...always like the sound of you guys, and thanks.

+1
I check back every day for a new docker-machine release... This bug is killing me :-)

For now, I add /etc/systemd/system/docker.service.d/20-machine.conf, which overrides 10-machine.conf with the correct command line. That way, further docker-machine commands that would normally break it don't. Of course, the longer it takes for this to be fixed in a release, the more work I have putting everything back!
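
A minimal sketch of such an override file, assuming the working command line from the original report (port 2376, certificates under /etc/docker, label provider=google). The function name render_override is hypothetical; adjust the flags to whatever your own 10-machine.conf contains.

```shell
#!/bin/sh
# Print a systemd drop-in that replaces the ExecStart written by
# docker-machine. The empty "ExecStart=" line clears the earlier value;
# the second line sets the new one. Flags are copied from the working
# command in the original report and are an assumption for other setups.
render_override() {
  cat <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=google
EOF
}

# Example (not executed here): install the override and restart Docker.
#   render_override | sudo tee /etc/systemd/system/docker.service.d/20-machine.conf
#   sudo systemctl daemon-reload
#   sudo systemctl restart docker
```

systemd reads drop-in files in name order, so 20-machine.conf is applied after 10-machine.conf and its ExecStart wins.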

Thanks for the great breakdown of details on the issue - We're looking into it to try and figure out what went wrong.

related to https://github.com/docker/for-linux/issues/11#issuecomment-312143765

So this is not related to the install script at get.docker.com, but rather to the version comparison not working correctly. 17.06.0-ce is the first release to officially deprecate docker daemon, which is why we are seeing failures now.
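
To illustrate what a numeric version comparison involves, here is a rough shell sketch. It is not docker-machine's code (that is written in Go) and it does not reproduce the actual bug; the function names are hypothetical, and the 1.12.0 threshold for choosing dockerd is an assumption made for the example.

```shell
#!/bin/sh
# Illustrative only: not docker-machine's implementation.

# version_field VERSION N: print the Nth dot-separated field of VERSION as a
# plain decimal number. A suffix such as "-ce" is dropped, leading zeros are
# stripped, and a missing field counts as 0.
version_field() {
  field=$(printf '%s.\n' "${1%%-*}" | cut -d. -f"$2" | sed 's/^0*//')
  printf '%s\n' "${field:-0}"
}

# version_gte A B: succeed when A >= B, comparing the first three fields
# as numbers rather than as strings.
version_gte() {
  for n in 1 2 3; do
    left=$(version_field "$1" "$n")
    right=$(version_field "$2" "$n")
    if [ "$left" -gt "$right" ]; then return 0; fi
    if [ "$left" -lt "$right" ]; then return 1; fi
  done
  return 0
}

# daemon_command VERSION: print the command used to start the daemon.
# Assumption for this sketch: dockerd is available from 1.12.0 onwards.
daemon_command() {
  if version_gte "$1" 1.12.0; then
    echo "dockerd"
  else
    echo "docker daemon"
  fi
}
```

Comparing field by field matters because a plain string comparison would rank 1.9.1 above 1.12.0. The sketch assumes every field is numeric once the suffix is removed.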

This PR (docker/machine#4128) seems to remedy this issue and I'll have a PR up by late afternoon that adds tests for the other comparison functions so that we don't run into something like this again.

@seemethere Sounds good, thanks. Like to hear about the test.

The diff on one of the PRs appeared a little odd to me, but think you guys will have taken care of that.

The 0.12.1 release fixes this bug. Thanks everyone for your patience and your help.

@shin- thanks for the quick fix! Looking forward to using it.

@shin- This patch fixes the docker daemon -> dockerd part, but Docker still doesn't start on the machine due to

dockerd[6407]: Error starting daemon: error initializing graphdriver: driver not supported

@shin- I was able to get around the storage driver issue by adding --engine-storage-driver=overlay (https://github.com/docker/machine/issues/3895#issuecomment-270934728). So here's my whole docker-machine invocation.

docker-machine -D create \
    --driver google \
    --google-project $project \
    --google-zone $zone \
    --google-machine-type $type \
    --google-disk-size $size \
    --google-preemptible \
    --engine-storage-driver=overlay \
    $name

Without --engine-storage-driver=overlay it still fails with

dockerd[6407]: Error starting daemon: error initializing graphdriver: driver not supported

as before and as in #3895
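
One way to check which driver a machine could support is to look at the filesystems its kernel lists. The sketch below assumes that "driver not supported" means the kernel on the image has no aufs support, which the thread does not confirm. The function names are hypothetical, and /proc/filesystems only shows filesystems that are built in or whose module is already loaded.

```shell
#!/bin/sh
# fs_supported NAME [FILE]: succeed when NAME is listed in FILE
# (default /proc/filesystems). The name is the last field on each line.
fs_supported() {
  awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' "${2:-/proc/filesystems}"
}

# pick_storage_driver [FILE]: print a candidate for --engine-storage-driver.
# Tries aufs first (the driver named in the failing unit file), then overlay.
# Prints nothing and returns 1 when neither is listed.
pick_storage_driver() {
  for driver in aufs overlay; do
    if fs_supported "$driver" "$1"; then
      echo "$driver"
      return 0
    fi
  done
  return 1
}

# Example (not executed here): inspect a machine that failed to start Docker.
#   docker-machine ssh build-vm2 cat /proc/filesystems > fs.txt
#   pick_storage_driver fs.txt
```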

Did you see the log that explained why it crashed?

On Fri, Jul 7, 2017 at 9:39 AM, Seweryn Zeman notifications@github.com
wrote:

@shin- unfortunately 0.12.1 didn't fix this
for me.

$ docker -v
Docker version 17.06.0-ce, build 02c1d87
$ docker-machine -v
docker-machine version 0.12.1, build c8b17e8

I'm creating an amazonec2 machine with --amazonec2-region=eu-central-1
which creates an ami-fe408091 for me.

The output from docker-machine create is:

Running pre-create checks...
Creating machine...
(test-d-m) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Error creating machine: Error running provisioning: ssh command error:
command : sudo systemctl -f start docker
err : exit status 1
output : Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.

Output from launched machine is:

$ systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─10-machine.conf
Active: inactive (dead) (Result: exit-code) since Fri 2017-07-07 13:34:47 UTC; 36s ago
Docs: https://docs.docker.com
Process: 5522 ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver aufs --tlsverify --tlscacer
Main PID: 5522 (code=exited, status=1/FAILURE)

Jul 07 13:34:46 test-d-m systemd[1]: docker.service: Unit entered failed state.
Jul 07 13:34:46 test-d-m systemd[1]: docker.service: Failed with result 'exit-code'.
Jul 07 13:34:47 test-d-m systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Jul 07 13:34:47 test-d-m systemd[1]: Stopped Docker Application Container Engine.
Jul 07 13:34:47 test-d-m systemd[1]: docker.service: Start request repeated too quickly.
Jul 07 13:34:47 test-d-m systemd[1]: Failed to start Docker Application Container Engine.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/docker/machine/issues/4156#issuecomment-313683311,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AANWZXHODzL3Lumb5NqlmXwnSi3VZBBkks5sLjUlgaJpZM4OIt7R
.

@dminkovsky Thanks for the workaround. I decided to use overlay2 instead, since it's the newer version of the driver.

Do you know if there is a workaround for docker-machine rm {instance-name} as well? I'm getting an error related to EOF, and it leaves remnants of key pairs on the AWS cloud, preventing me from recreating the instance.

Sorry, I removed my message after I debugged hard and noticed it's actually due to what @dminkovsky wrote:

Without --engine-storage-driver=overlay it still fails with
dockerd[6407]: Error starting daemon: error initializing graphdriver: driver not supported
as before and as in #3895

Do we have any issue for this one particular case of using AUFS engine storage?

@cadavre

Do we have any issue for this one particular case of using AUFS engine storage?

I've seen https://github.com/docker/machine/issues/3895, which is open and which you also referenced.

Interestingly enough, I am no longer seeing this bug. I get --storage-driver overlay

@drujensen

I decided to use overlay2 instead, since it's the newer version of the driver.

Oh cool, thanks, I didn't know that.

Do you know if there is a workaround for docker-machine rm {instance-name} as well?

Not sure, I haven't had that bug. I use docker-machine rm -f when the machine has been terminated and won't respond. With -f, docker-machine rm removes the VM and associated disks even if it can't reach the box.

@dminkovsky Can you create a new issue for this? It's unrelated to the dockerd/docker daemon issue, so we should treat it separately as well. And please indicate what OS you're provisioning as well :)

@shin- i'm all good. docker-machine is working 100% right now for me. are you referring to the overlay2 thing?

My other issue regarding removing machines was addressed in pr #4187. Thx.

@dminkovsky Sorry - yes, the one you mention here

@shin- After experiencing the issue in https://github.com/docker/machine/issues/4168, I attempted to re-create my staging server and found a slew of issues with docker-machine create that have been reported in multiple recent tickets:

Are these all related? Start tracking these here? I can confirm that this issue is still happening today.

@shin- docker-machine v0.12.1 exhibits the same issue still

I'm still getting the same issue with version 0.12.1.

Please update to the latest release found on github:
https://github.com/docker/machine/releases/tag/v0.12.2

@eamontaaffe @ajwah @costa

Thank you @dminkovsky I was getting this error on 0.12.2 today as well!!! Seems like the 10-machine.conf file does not get overridden during update

You're welcome!

I specify "overlay" in the command line option for the storage driver, and since
then my machines boot.

On Wed, 2 Aug 2017 at 12:05, Denis notifications@github.com wrote:

Thank you @dminkovsky I was getting this
error on 0.12.2 today as well!!! Seems like 10-machine.conf file does not
get overridden during update


If using systems with kernel >4.4 I suggest using overlay2.

I wasn't able to get the machine to use overlay2, and the use case for this
fortunately was just building/CD

On Wed, 2 Aug 2017 at 12:36, Seweryn Zeman notifications@github.com wrote:

If using systems with kernel >4.4 I suggest using overlay2.


Also getting this error on 0.12.2 :-(

This is still open!

I still see this issue with docker-machine 0.12.2. I moved forward by uninstalling Docker on the provisioned machine (sudo apt purge docker-ce && sudo apt autoremove) and using the correct Rancher install script for my version, as listed above.

For some reason, this still fails to start docker, but rebooting the machine then solves it.

Can confirm, still the same error

@jhartma I guess it is necessary to upgrade to the latest release (Linux image), and then it works

@kassanmoor seems my AMI didn't support it on AWS, I got it to work with the default one
