Compose: Is there a way to delay container startup to support dependent services with a longer startup time

Created on 5 Aug 2014  ·  314 Comments  ·  Source: docker/compose

I have a MySQL container that takes a little time to start up as it needs to import data.

I have an Alfresco container that depends upon the MySQL container.

At the moment, when I use fig, the Alfresco service inside the Alfresco container fails when it attempts to connect to the MySQL container... ostensibly because the MySQL service is not yet listening.

Is there a way to handle this kind of issue in Fig?

Most helpful comment

Yeah, I'd be interested in something like this - meant to post about it earlier.

The smallest impact pattern I can think of that would fix this use case for us would be the following:

Add "wait" as a new key in fig.yml, with similar value semantics as link. Docker would treat this as a pre-requisite and wait until this container has exited prior to carrying on.

So, my docker file would look something like:

db:
  image: tutum/mysql:5.6

initdb:
  build: /path/to/db
  link:
    - db:db
  command: /usr/local/bin/init_db

app:
  link:
    - db:db
  wait:
    - initdb

On running app, it will start up all the link containers, then run the wait container and only progress to the actual app container once the wait container (initdb) has exited. initdb would run a script that waits for the database to be available, then runs any initialisations/migrations/whatever, then exits.

That's my thoughts, anyway.

All 314 comments

At work we wrap our dependent services in a script that check if the link is up yet. I know one of my colleagues would be interested in this too! Personally I feel it's a container-level concern to wait for services to be available, but I may be wrong :)

We do the same thing with wrapping. You can see an example here: https://github.com/dominionenterprises/tol-api-php/blob/master/tests/provisioning/set-env.sh

It'd be handy to have an entrypoint script that loops over all of the links and waits until they're working before starting the command passed to it.
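
A rough sketch of what such an entrypoint could look like, assuming the legacy link environment variables (e.g. DB_PORT_5432_TCP_ADDR / DB_PORT_5432_TCP_PORT) that Docker injects for links, and that nc is available in the image:

#!/bin/sh
# wait for every *_TCP_ADDR / *_TCP_PORT pair injected by the links
for addr_var in $(env | grep -o '^[A-Z0-9_]*_TCP_ADDR'); do
    port_var="${addr_var%_ADDR}_PORT"
    addr=$(eval echo "\$$addr_var")
    port=$(eval echo "\$$port_var")
    echo "waiting for $addr:$port ..."
    until nc -z "$addr" "$port"; do sleep 1; done
done
exec "$@"   # all linked services answered; start the real command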

This should be built in to Docker itself, but the solution is a way off. A container shouldn't be considered started until the link it exposes has opened.

@bfirsh that's more than I was imagining, but would be excellent.

A container shouldn't be considered started until the link it exposes has opened.

I think that's exactly what people need.

For now, I'll be using a variation on https://github.com/aanand/docker-wait

Yeah, I'd be interested in something like this - meant to post about it earlier.

The smallest impact pattern I can think of that would fix this use case for us would be the following:

Add "wait" as a new key in fig.yml, with similar value semantics as link. Docker would treat this as a pre-requisite and wait until this container has exited prior to carrying on.

So, my docker file would look something like:

db:
  image: tutum/mysql:5.6

initdb:
  build: /path/to/db
  link:
    - db:db
  command: /usr/local/bin/init_db

app:
  link:
    - db:db
  wait:
    - initdb

On running app, it will start up all the link containers, then run the wait container and only progress to the actual app container once the wait container (initdb) has exited. initdb would run a script that waits for the database to be available, then runs any initialisations/migrations/whatever, then exits.

That's my thoughts, anyway.

(revised, see below)

+1 here too. It's not very appealing to have to do this in the commands themselves.

+1 as well. Just ran into this issue. Great tool btw, makes my life so much easier!

+1 would be great to have this.

+1 also. Recently run into the same set of problems

+1 also. Any statement from the Docker guys?

I am writing wrapper scripts as entrypoints to synchronise at the moment. I'm not sure having a mechanism in fig is wise if you have other targets for your containers that perform orchestration a different way. It seems very application-specific to me, and as such the responsibility of the containers doing the work.

After some thought and experimentation I do kind of agree with this.

As such, an application I'm building basically has a synchronous waitfor(host, port) function that lets me wait for services the application depends on (either detected via the environment or explicitly configured via CLI options).

cheers
James


Yes, some basic "depends on" is needed here...
so if you have 20 containers, you just want to run fig up and everything starts in the correct order...
However, it should also have a timeout option or other failure-catching mechanisms.

Another +1 here. I have Postgres taking longer than Django to start so the DB isn't there for the migration command without hackery.

@ahknight interesting, why is migration running during run?

Don't you want to actually run migrate during the build phase? That way you can startup fresh images much faster.

There's a larger startup script for the application in question, alas. For now, we're doing non-DB work first, using nc -w 1 in a loop to wait for the DB, then doing DB actions. It works, but it makes me feel dirty(er).

I've had a lot of success doing this work during the fig build phase. I have one example of this with a django project (still a work in progress though): https://github.com/dnephin/readthedocs.org/blob/fig-demo/dockerfiles/database/Dockerfile#L21

No need to poll for startup. Although I've done something similar with mysql, where I did have to poll for startup because the mysqld init script wasn't doing it already. This postgres init script seems to be much better.

Here is what I was thinking:

Using the idea of docker/docker#7445 we could implement this "wait_for_health_check" attribute in fig?
So it would be a fig issue, not a Docker one?

Is there any way of making fig check the TCP status on the linked container? If so, then I think this is the way to go. =)

@dnephin can you explain a bit more what you're doing in Dockerfiles to help this?
Isn't the build phase unable to influence the runtime?

@docteurklein I can. I fixed the link from above (https://github.com/dnephin/readthedocs.org/blob/fig-demo/dockerfiles/database/Dockerfile#L21)

The idea is that you do all the slower "setup" operations during the build, so you don't have to wait for anything during container startup. In the case of a database or search index, you would:

  1. start the service
  2. create the users, databases, tables, and fixture data
  3. shutdown the service

all as a single build step. Later when you fig up the database container it's ready to go basically immediately, and you also get to take advantage of the docker build cache for these slower operations.
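
For example, a Dockerfile for a database image could bake the setup in at build time roughly like this (a sketch based on a Debian-packaged postgres, not the linked file; the names and paths are illustrative):

FROM ubuntu:14.04
RUN apt-get update && apt-get install -y postgresql
COPY schema.sql /schema.sql
# start the server, create the database and load the schema, then stop it --
# all in a single cached build step
RUN service postgresql start \
    && su postgres -c "createdb myapp" \
    && su postgres -c "psql myapp -f /schema.sql" \
    && service postgresql stop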

nice! thanks :)

@dnephin nice, hadn't thought of that .

+1 This is definitely needed.
An ugly time delay hack would be enough in most cases, but a _real_ solution would be welcome.

Could you give an example of why/when it's needed?

In the use case I have, I have an Elasticsearch server and then an application server that's connecting to Elasticsearch. Elasticsearch takes a few seconds to spin up, so I can't simply do a fig up -d because the application server will fail immediately when connecting to the Elasticsearch server.

Say one container starts MySQL and the other starts an app that needs MySQL and it turns out the other app starts faster. We have transient fig up failures because of that.

crane has a way around this by letting you create groups that can be started individually. So you can start the MySQL group, wait 5 secs and then start the other stuff that depends on it.
Works on a small scale, but it's not a real solution.

@oskarhane not sure if this "wait 5 secs" helps; in some cases it might need to wait more (or you just can't be sure it won't go over the 5 secs)... it isn't very safe to rely on a fixed waiting time.
Also you would have to manually do this waiting and loading the other group, and that's kind of lame, fig should do that for you =/

@oskarhane, @dacort, @ddossot: Keep in mind that, in the real world, things crash and restart, network connections come and go, etc. Whether or not Fig introduces a convenience for waiting on a TCP socket, your containers should be resilient to connection failures. That way they'll work properly everywhere.

You are right, but until we fix all pre-existing apps to do things like gracefully recovering from the absence of their critical resources (like the DB) on start (which is a Great Thing™ but unfortunately seldom supported by frameworks), we should use fig start to start individual containers in a certain order, with delays, instead of fig up.

I can see a shell script coming to control fig to control docker :wink:

I am ok with this not being built in to fig but some advice on best practice for waiting on readiness would be good

I saw in some code linked from an earlier comment this was done:

while ! exec 6<>/dev/tcp/${MONGO_1_PORT_27017_TCP_ADDR}/${MONGO_1_PORT_27017_TCP_PORT}; do
    echo "$(date) - still trying to connect to mongo at ${TESTING_MONGO_URL}"
    sleep 1
done

In my case there is no /dev/tcp path though, maybe it's a different linux distro(?) - I'm on Ubuntu

I found instead this method which seems to work ok:

until nc -z postgres 5432; do
    echo "$(date) - waiting for postgres..."
    sleep 1
done

This seems to work but I don't know enough about such things to know if it's robust... does anyone know if there's a possible race condition between the port showing up to nc and the postgres server _really_ being able to accept commands?

I'd be happier if it was possible to invert the check - instead of polling from the dependent containers, is it possible instead to send a signal from the target (ie postgres server) container to all the dependents?

Maybe it's a silly idea, anyone have any thoughts?

@anentropic Docker links are one-way, so polling from the downstream container is currently the only way to do it.

does anyone know if there's a possible race condition between the port showing up to nc and the postgres server really being able to accept commands?

There's no way to know in the general case - it might be true for postgres, it might be false for other services - which is another argument for not doing it in Fig.
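
For postgres specifically, one way to narrow that gap is to poll pg_isready (shipped with the postgres client tools) rather than doing a raw port check — a sketch, assuming the linked container resolves as db:

until pg_isready -h db -p 5432 -U postgres; do
    echo "$(date) - waiting for postgres to accept connections..."
    sleep 1
done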

@aanand I tried using your docker/wait image approach but I am not sure what is happening. Basically I have this "Orientdb" container which a lot of other NodeJS app containers link to. The orientdb container takes some time to start listening on the TCP port, and this causes the other containers to get a "Connection Refused" error.

I hoped that by linking the wait container to orientdb I would not see this error, but unfortunately I am still getting it randomly. Here is my setup (Docker version 1.4.1, fig 1.0.1 on an Ubuntu 14.04 box):

orientdb:
    build: ./Docker/orientdb
    ports:
        -   "2424:2424"
        -   "2480:2480"
wait:
    build: ./Docker/wait
    links:
        - orientdb:orientdb
....
core:
    build:  ./Docker/core
    ports:
        -   "3000:3000"
    links:
        -   orientdb:orientdb
        -   nsqd:nsqd

Any help is appreciated. Thanks.

@mindnuts the wait image is more of a demonstration; it's not suitable for use in a fig.yml. You should use the same technique (repeated polling) in your core container to wait for the orientdb container to start before kicking off the main process.

+1 just started running into this as I am pulling custom built images vs building them in the fig.yml. Node app failing because mongodb is not ready yet...

I just spent hours debugging why MySQL was reachable when starting WordPress manually with Docker, and why it was offline when starting with Fig. Only now did I realize that Fig always restarts the MySQL container whenever I start the application, so the WordPress entrypoint.sh dies because it is not yet able to connect to MySQL.

I added my own overridden entrypoint.sh that waits for 5 seconds before executing the real entrypoint.sh. But clearly this is a use case that needs a general solution, if it's supposed to be easy to launch a MySQL+WordPress container combination with Docker/Fig.

so the WordPress entrypoint.sh dies because it is not yet able to connect to MySQL.

I think this is an issue with the WordPress container.

While I was initially a fan of this idea, after reading https://github.com/docker/docker/issues/7445#issuecomment-56391294, I think such a feature would be the wrong approach, and actually encourages bad practices.

There seem to be two cases which this issue aims to address:

A dependency service needs to be available to perform some initialization.

Any container initialization should really be done during build. That way it is cached, and the work doesn't need to be repeated by every user of the image.

A dependency service needs to be available so that a connection can be opened

The application should really be resilient to connection failures and retry the connection.

I suppose the root of the problem is that there are no ground rules as to whose responsibility it is to wait for services to become ready. But even if there were, I think it's a bit unrealistic to expect that developers would add database connection retrying to every single initialization script. Such scripts are often needed to prepare empty data volumes that have just been mounted (e.g. create the database).

The problem would actually be much less obtrusive if Fig didn't always restart linked containers (i.e. the database server) when restarting the application container. I don't really know why it does that.

The problem would actually be much less obtrusive if Fig didn't always restart linked containers (i.e. the database server) when restarting the application container. I don't really know why it does that.

Actually it doesn't just _restart_ containers, it _destroys and recreates_ them, because it's the simplest way to make sure changes to fig.yml are picked up. We should eventually implement a smarter solution that can compare "current config" with "desired config" and only recreate what has changed.

Getting back to the original issue, I really don't think it's unrealistic to expect containers to have connection retry logic - it's fundamental to designing a distributed system that works. If different scripts need to share it, it should be factored out into an executable (or language-specific module if you're not using shell), so each script can just invoke waitfor db at the top.
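
A waitfor executable along those lines could be as small as this (a sketch; it assumes nc is available in the image):

#!/bin/sh
# usage: waitfor HOST PORT [TIMEOUT_SECONDS]
host=$1; port=$2; timeout=${3:-30}
until nc -z "$host" "$port" 2>/dev/null; do
    timeout=$((timeout - 1))
    if [ "$timeout" -le 0 ]; then
        echo "waitfor: $host:$port still unreachable, giving up" >&2
        exit 1
    fi
    sleep 1
done

An init script can then just call waitfor db 5432 at the top and fail loudly if the database never appears.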

@kennu what about --no-recreate ? /cc @aanand

@aanand I meant the unrealism comment from the point of view that the Docker Hub is already full of published images that probably don't handle connection retrying in their initialization scripts, and that it would be quite an undertaking to get everybody to add it. But I guess it could be done if Docker Inc published some kind of official guidelines / requirements.

Personally I'd rather keep containers/images simple though and let the underlying system worry about resolving dependencies. In fact, Docker's restart policy might already solve everything (if the application container fails to connect to the database, it will restart and try again until the database is available).

But relying on the restart policy means that it should be enabled by default, or otherwise people spend hours debugging the problem (like I just did). E.g. Kubernetes defaults to RestartPolicyAlways for pods.
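
For reference, the policy can already be set per container with plain docker run (the image names here are just illustrative), and Compose has a restart: key for the same thing if your version supports it:

docker run -d --restart=on-failure:5 --link some-mysql:mysql wordpress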

any progress on this? I would like to echo that expecting all docker images to change and the entire community to implement connection retry practices is not reasonable. Fig is a Docker orchestration tool and the problem lies in the order it does things, so the change needs to be made in Fig, not Docker or the community.

expecting all docker images to change and the entire community to implement connection retry practices is not reasonable

It's not that an application should need to retry because of docker or fig. Applications should be resilient to dropped connections because the network is not reliable. Any application should already be built this way.

I personally haven't had to implement retries in any of my containers, and I also haven't needed any delay or waiting on startup. I believe most cases of this problem fall into these two categories (my use of "retry" is probably not great here, I meant more that it would re-establish a connection if the connection was closed, not necessarily poll for some period attempting multiple times).

If you make sure that all initialization happens during the "build" phase, and that connections are re-established on the next request you won't need to retry (or wait on other containers to start). If connections are opened lazily (when the first request is made), instead of eagerly (during startup), I suspect you won't need to retry at all.

the problem lies in the order [fig] does things

I don't see any mention of that in this discussion so far. Fig orders startup based on the links specified in the config, so it should always start containers in the right order. Can you provide a test case where the order is incorrect?

I have to agree with @dnephin here. Sure, it would be convenient if compose/fig was able to do some magic and check availability of services, however, what would the expected behavior be if a service doesn't respond? That _really_ depends on the requirements of your application/stack. In some cases, the entire stack should be destroyed and replaced with a new one, in other cases a failover stack should be used. Many other scenarios can be thought of.

Compose/Fig cannot make these decisions, and monitoring services should be the responsibility of the applications running inside the container.

I would like to suggest that @dnephin has merely been lucky. If you fork two processes in parallel, one of which will connect to a port that the other will listen to, you are essentially introducing a race condition; a lottery to see which process happens to initialize faster.

I would also like to repeat the WordPress initialization example: It runs a startup shell script that creates a new database if the MySQL container doesn't yet have it (this can't be done when building the Docker image, since it's dependent on the externally mounted data volume). Such a script becomes significantly more complex if it has to distinguish generic database errors from "database is not yet ready" errors and implement some sane retry logic within the shell script. I consider it highly likely that the author of the image will never actually test the startup script against the said race condition.

Still, Docker's built-in restart policy provides a workaround for this, if you're ready to accept that containers sporadically fail to start and regularly print errors in logs. (And if you remember to turn it on.)

Personally, I would make Things Just Work, by making Fig autodetect which container ports are exposed to a linked container, ping them before starting the linked container (with a sane timeout), and ultimately provide a configuration setting to override/disable this functionality.

this can't be done when building the Docker image, since it's dependent on the externally mounted data volume

True. An approach here is to start just the database container once (if needed, with a different entrypoint/command), to initialise the database, or use a data-only container for the database, created from the same image as the database container itself.

Such a script becomes significantly more complex if it has to distinguish generic database errors from "database is not yet ready" errors

Compose/Fig will run into the same issue there; How to check if MySQL is up, and _accepting_ connections? (and PostgreSQL, and (_insert your service here_)). Also, _where_ should the "ping" be executed from? Inside the container you're starting, from the host?

As far as I can tell, the official WordPress image includes a check to see if MySQL is accepting connections in the docker-entrypoint.sh

@thaJeztah "Add some simple retry logic in PHP for MySQL connection errors" authored by tianon 2 days ago - Nice. :-) Who knows, maybe this will become a standard approach after all, but I still have my doubts, especially about this kind of retry implementations actually having being tested by all image authors.

About the port pinging - I can't say offhand what the optimal implementation would be. I guess maybe simple connection checking from a temporary linked container and retrying while getting ECONNREFUSED. Whatever solves 80% (or possibly 99%) of the problems, so users don't have to solve them by themselves again and again every time.

@kennu Ah! Thanks, wasn't aware it was just added recently, just checked the script now because of this discussion.

To be clear, I understand the problems you're having, but I'm not sure Compose/Fig would be able to solve them in a clean way that works for everyone (and reliably). I understand many images on the registry don't have "safeguards" in place to handle these issues, but I doubt it's Compose/Fig's responsibility to fix that.

Having said the above; I _do_ think it would be a good thing to document this in the Dockerfile best practices section.

People should be made aware of this and some examples should be added to illustrate how to handle service "outage". Including a link to the Wikipedia article that @dnephin mentioned (and possibly other sources) for reference.

I ran into the same problem and like this idea from @kennu

Personally, I would make Things Just Work, by making Fig autodetect which container ports are exposed to a linked container, ping them before starting the linked container (with a sane timeout), and ultimately provide a configuration setting to override/disable this functionality.

I think this would solve a lot of typical use cases, like for me when depending on the official mongodb container.

I agree with @soupdiver. I am also having trouble in conjunction with a mongo container, and although I have it working with a start.sh script, the script is not very dynamic and adds another file I need to keep in my repo (I would like to just have a Dockerfile and docker-compose.yml in my node repo). It would be nice if there were some way to just Make It Work, but I think something simple like a wait timer won't cut it in most cases.

IMO pinging is not enough, because the basic network connection may be available, but the service itself is still not ready.
This is the case with the MySQL image, for example; using curl or telnet for the connection check on the exposed ports would be safer, although I don't know if it would be enough. But most containers don't have these tools installed by default.

Could docker or fig handle these checks?

Could docker or fig handle these checks?

In short: _no_. For various reasons;

  • Performing a "ping" from within a container would mean running a second process. Fig/Compose cannot automatically start such process, and I don't think you'd want Fig/Compose to modify your container by _installing_ software (such as curl or telnet) in it.
  • (As I mentioned in a previous comment), each service requires different way to check if it is accepting connections / ready for use. Some services may need credentials or certificates to _establish_ a connection. Fig/Compose cannot automatically invent how to do that.

and I don't think you'd want Fig/Compose to modify your container by installing software (such as curl or telnet) in it.

No, for sure not.

Fig/Compose cannot automatically invent how to do that.

Not invent. I was thinking more about an instruction for fig or docker, how to check it, eg.

web:
    image: nginx
    link: db
db:
   is_available: "curl DB_TCP_ADDR:DB_TCP_PORT"

The telnet command would be executed on the docker-host, not in the container.
But I am just thinking out loud, I know that this is not the perfect solution. But the current way of using custom check-scripts for the containers could be improved.

The telnet command would be executed on the docker-host, not in the container.

Then curl or <name a tool that's needed> would have to be installed on the host. This could even have huge security issues (e.g. someone wants to be funny and uses is_available: "rm -rf /"). Apart from that, being able to access the database from the _host_ is no guarantee that it's also accessible from inside the container.

But I am just thinking out loud, ...

I know, and I appreciate it. Just think there's no reliable way to automate this, or would serve most use-cases. In many cases you'd end up with something complex (take, for example, the curl example; how long should it try to connect? Retry?). Such complexity is better to move inside the container, which would also be useful if the container was started with Docker, not Fig/Compose.

@thaJeztah I totally agree with you. And it's very likely that there will be no 100% solution.

I’m going to repeat a suggestion I made earlier: It would be sufficient for me if I could state in the fig.yml “wait for this container to exit before running this other container”.

This would allow me to craft a container that knows how to wait for all its dependencies - check ports, initialise databases, whatever - and would require fig to know as little as possible.

I would see it configured as something like:

“”"
app:
links:
- db:db
prereqs:
- runthisfirst

runthisfirst:
links:
- db:db
“””

runthisfirst has a link that means the database starts up so it can check access. app will only run once runthisfirst has exited (bonus points if runthisfirst has to exit successfully).

Is this feasible as an answer?

KJL


I've just tried migrating my shell script launchers and ran into this issue. It would be nice even just to add a simple sleep/wait key that just sleeps for that number of seconds before launching the next container.

db:
  image: tutum/mysql:5.6
  sleep: 10
app:
  link:
    - db:db

I really don't like this for a number of reasons.

a) I think it's the wrong place for this
b) How long do you sleep for?
c) What if the timeout is not long enough?

Aside from the obvious issues, I really don't think infrastructure should care about what the application is and vice versa. IMHO the app should be written to be more tolerant and/or smarter about its own requirements.

That being said, existing applications and legacy applications will need something -- but it should probably be more along the lines of:

a docker-compose.yml:

db:
  image: tutum/mysql:5.6
app:
  wait: db
  link:
    - db:db

Where wait waits for "exposed" services on db to become available.

The problem is how do you determine that?

In the simplest cases you wait until you can successfully open
a tcp or udp connection to the exposed services.

This might be overkill for this problem but what would be a nice solution is if docker provided an event triggering system where you could initiate a trigger from one container that resulted in some sort of callback in another container. In the case of waiting on importing data into a MySQL database before starting another service, just monitoring whether the port was available isn't enough.

Having an entrypoint script set an alert to Docker from inside the container (set a pre-defined environment variable for example) that triggered an event in another container (perhaps setting the same synchronized environment variable) would enable scripts on both sides to know when certain tasks are complete.

Of course we could set up our own socket server or other means but that's tedious to solve a container orchestration issue.

@aanand I _almost_ have something working using your wait approach as the starting point. However, there is something else happening between docker-compose run and docker run where the former appears to hang whilst the latter works a charm.

example docker-compose.yml:

db:
  image: postgres
  ports:
    - "5432"
es:
  image: dockerfile/elasticsearch
  ports:
    - "9200"
wait:
  image: n3llyb0y/wait
  environment:
    PORTS: "5432 9200"
  links:
    - es
    - db

then using...

docker-compose run wait

However, this is not to be. The linked services start and it looks like we are about to wait, only for it to choke (at least within my virtualbox env; I get to the nc loop and we get a single dot, then... nothing).

However, with the linked services running I can use this method (which is essentially what I have been doing for our CI builds)

docker run -e PORTS="5432 9200" --link service_db_1:wait1 --link service_es_1:wait2 n3llyb0y/wait

It feels like docker-compose run should work in the same way. The difference is that when using docker-compose run with the detach flag -d you get no wait benefit, as the wait container backgrounds, and I think (at this moment in time) that not using the flag causes the wait to choke on the other non-backgrounded services. I am going to take a closer look.

After a bit of trial and error it seems the above approach does work! It's just that the busybox base doesn't have a netcat util that works very well. My modified version of @aanand's wait utility does work against docker-compose 1.1.0 when using docker-compose run <util label> instead of docker-compose up. Example of usage in the link.

Not sure if it can handle chaining situations as per the original question though. Probably not.

Let me know what you think.

This is a very interesting issue. I think it would be really interesting to have a way for one container to wait until another one is ready. But as everybody says, what does ready mean? In my case I have a container for MySQL, another one that manages its backups and is also in charge of importing an initial database, and then the containers for each app that needs the database. It's obvious that waiting for the ports to be exposed is not enough. First the mysql container must be started and then the rest should wait until the mysql service is ready to use, not before. To get that, I needed to implement a simple script to be executed on reboot that uses the docker exec functionality. Basically, the pseudo-code would be like:

run mysql
waitUntil "docker exec -t mysql mysql -u root -prootpass database -e \"show tables\""
run mysql-backup
waitUntil "docker exec -t mysql mysql -u root -prootpass database -e \"describe my_table\""
run web1
waitUntil "dexec web1 curl localhost:9000 | grep '<h1>Home</h1>'"
run web2
waitUntil "dexec web2 curl localhost:9000 | grep '<h1>Home</h1>'"
run nginx

Where the waitUntil function has a loop with a timeout that evals the docker exec … command and checks if the exit code is 0.
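
A minimal sketch of such a waitUntil helper in shell (the 60-second timeout is an arbitrary choice, not the author's actual script):

waitUntil() {
    timeout=60
    until eval "$1" > /dev/null 2>&1; do
        timeout=$((timeout - 2))
        if [ "$timeout" -le 0 ]; then
            echo "waitUntil: timed out waiting for: $1" >&2
            return 1
        fi
        sleep 2
    done
}

waitUntil "docker exec -t mysql mysql -u root -prootpass database -e 'show tables'"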

With that I ensure that every container waits until its dependencies are ready to use.

So I think it could be an option to integrate within the compose utility. Maybe something like this, where wait_until declares a list of other dependencies (containers) and waits for each one until they respond OK to the corresponding command (or maybe with an optional pattern or regex to check if the result matches something you expect, even though using the grep command could be enough).

mysql:
  image: mysql
  ...
mysql-backup:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "show tables"
  ...
web1:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "describe my_table"
  ...
web2:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "describe my_table"
  ...
nginx:
  links:
   - web1
   - web2
  wait_until:
   - web1: curl localhost:9000 | grep '<h1>Home</h1>'
   - web2: curl localhost:9000 | grep '<h1>Home</h1>'
  ...

Why not a simple wait for the port, like this?
http://docs.azk.io/en/azkfilejs/wait.html#

@robsonpeixoto: Waiting for the port isn't sufficient for a lot of use cases. For example, let's say you are seeding a database with data on creation and don't want the web server to start and connect to it until the data operation has completed. The port will be open the whole time so that wouldn't block the web server from starting.

Something like AWS CloudFormation's WaitCondition would be nice. http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-waitcondition.html

+1 I'm having the same issue when using Docker for testing my Rails apps which depend on MySQL

+1 I have this issue too. I like @adrianhurt idea, where you actually supply the condition to be evaluated to determine if the wait is complete. That way you still have a nice declarative yml, and you don't have to have an arbitrary definition of "ready".

+1

I've had this tab open for a while: http://crosbymichael.com/docker-events.html ...seems relevant

+1

+1 for simple timeout

+1 for a ready condition

+1

I have been solving this very reliably at the application level for a while now, as was recommended in this thread.

Just to give you an idea how this can be implemented for MySQL + PHP here's my code.

From igorw/retry :)

Since the network is reliable, things should always work. Am I right? For those cases when they don't, there is retry.

+1

@schmunk42 Nice stuff - I like that it's a good example of both establishing the connection and performing an idempotent database setup operation.

Might be good to create a (some) basic example(s) for inclusion in the docs, for different cases, e.g. NodeJS, Ruby, PHP.

+1, at least should provide some options to add some delay before the container starts successfully.

+1

How do you solve the problem when you try to connect to services that aren't your code?
For example, if I have a service Service and the database InfluxDB. Service requires InfluxDB, and InfluxDB has a slow startup.

How can docker-compose wait for InfluxDB to be ready?

If the code is mine, I can solve it by adding a retry. But for a third-party app I can't change the code.

@robsonpeixoto there are some examples in this ticket with netcat or similar approaches. You can take a look at my MySQL example in another ticket: https://github.com/docker/docker/issues/7445#issuecomment-101523662

That's the reason I think each container should have the optional ability to indicate its own readiness. For a DB, for example, I want to wait until the service is completely ready, not just when the process is created. I solve this with customized checks using docker exec, checking whether it can run a simple query, for example.

Some optional flag for docker run to indicate an internal check command would be great to later link it from another container using a special flag for the link.

Something like:

$ sudo docker run -d --name db training/postgres --readiness-check /bin/sh -c "is_ready.sh"
$ sudo docker run -d -P --name web --link db:db --wait-for-readiness db training/webapp python app.py

Where is_ready.sh is a simple boolean test which is in charge of the decision of when the container is considered as ready.

+1

@schmunk42 nice quote!

+1

+1

+1

+1

+1

+1

+1

+1

+1

Actually I changed my mind about this, so -1

It makes more sense for your container to check whether the 3rd party service is available, and this is easily done with a little bash wrapper script that uses nc for example.

Relying on a delay is tempting, but it's a poor solution because:

  • Your container will _always_ wait X seconds before being ready.
  • X seconds might still not be enough in some cases (e.g heavy I/O or CPU on the host), so your container is still not failsafe.
  • No fail strategy.

Relying on writing a wrapper bash script is better because:

  • Your container is ready as soon as it possibly can.
  • You can implement any fail strategy, e.g try 10 times then fail, try forever, etc. You can even implement the delay yourself by sleeping before trying!
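
A minimal sketch of such a wrapper with a "try 10 times then fail" strategy (the db:5432 address and the use of nc are just assumptions about the image):

#!/bin/sh
tries=0
until nc -z db 5432; do
    tries=$((tries + 1))
    if [ "$tries" -ge 10 ]; then
        echo "db:5432 still unavailable after $tries attempts, giving up" >&2
        exit 1
    fi
    sleep 2
done
exec "$@"   # dependency is up; hand off to the real command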

Reading through this threadnought I see that no one mentions secrets. I'm trying to use a data-only container which requests secrets once it is run. The problem I have: if my secrets take too long to transmit/decrypt then my dependent container fails because the data it's expecting isn't there. I can't really use the method of "well put everything in the container before you run it" because they are secrets.

I know there is some ambiguity around data-only containers in compose due to the context of the return code, but is there a better way to do this?

Similarly changing my mind on this, -1. @dnephin's approach is entirely correct. If your _application_ depends on a _service_, the application itself should be able to handle unavailability of that service gracefully (e.g. re-establishing a connection). It shouldn't be some bash wrapper script or some logic in Compose or Docker, it's the responsibility of the application itself. Anything not at the application level will also only work on initialization; if that service goes down, a wrapper script or something will not be executed.

Now if we could get application/library/framework developers to realize and support this responsibility, that would be fantastic.

Tough to do, considering the approaches you would take involve sideloading other daemons, which isn't recommended. For my example, where I have a rails app attempting to connect to a MySQL database while another rails app is currently migrating and seeding the DB on initial startup: for me to make the rails app know not to attempt to use the DB, I would either have to mod the ActiveRecord library (not going to happen) or run a script that keeps checking to see if the DB has been migrated and seeded. But how do I know for sure without knowing what data is supposed to be in there and/or having some script that runs on the system that's seeding the DB to inform the rest to connect to it.

Maybe I'm missing the obvious solution, but your answer of "developers should be able to deal with this in their own code" breaks down when you're using off-the-shelf libraries and when you're not "supposed" to sideload daemons into a container.

@mattwallington I'm not even sure how Compose would provide a solution for that situation...
That's also awfully specific which would make it even tougher for Compose to invent. I would read some of @dnephin's tips above on initialization/migration/seeding as that might help your case.

I think you missed the point of my last line, many off the shelf libraries don't work _precisely_ because they haven't been built resiliently. There isn't a magic solution that can solve all of this that Compose can just implement.

Understood. My suggestion earlier which would work for a lot of these use cases is to have a shared environment variable between containers. One side could lock on polling the variable and the other could perform the action and then set the variable.

+1 @mattwallington idea of shared environment variable between containers

Thank you. It's simple (at the product level. I have no idea what it would take from the development side as I haven't looked at the code) but it would solve a lot of these problems and probably a lot of others as it's not specific to this issue.

But how do I know for sure without knowing what data is supposed to be in there and/or having some script that runs on the system that's seeding the DB to inform the rest to connect to it.

@mattwallington: check the migration number in the schema table. If the number is correct you know the migration ran.
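
For a Rails app that could be a one-line query against the schema_migrations table, for instance (a sketch; the host, credentials, database name, and target version are illustrative):

until [ "$(mysql -h db -u root -psecret myapp -N -e \
      "SELECT COUNT(*) FROM schema_migrations WHERE version = '20150731000000'")" = "1" ]; do
    echo "waiting for migration 20150731000000 to be applied..."
    sleep 2
done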

It shouldn't be some bash wrapper script or some logic in Compose or Docker, it's the responsibility of the application itself. Anything not at the application level will also only work on initialization; if that service goes down, a wrapper script or something will not be executed.

@agilgur5: yes I agree it'd be handled by the application, but a bash script is a simple solution to handle those applications that are not coded that way, e.g by restarting the app when the service is not available.

Arguments can be made all day about what should or could be done at the app level, but rather than expecting every app on the market to deal with this and become awesome at self-recovery (unlikely), why are we so against adding some features that can solve this problem for apps run in docker, regardless of how the third-party apps are written or what they SHOULD do but won't? This is what we have control over. Let's solve the problem rather than decide who should be the one to solve it, since we have no control over that.

I agree with @mattwallington. You can require this extra effort for app-level self-recovery from every single developer of every single container image, but a large percentage of them are bound to be too ignorant or just too busy to implement and test it carefully. End result will be that some containers know how to self-recover while many do not. And as a user, you'll be without any tools to manage the ones that don't.

An Idea that just came into my mind: Instead of solving the problem by delaying container startups, Compose could try to recover the failed container.

Something like recover: auto would restart the failed container for 5 times in 2, 4, 8, 16 and 32 seconds and then give up completely.
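
Roughly what such a recover: auto might do, sketched as a host-side loop (the container name myapp is illustrative):

# back off 2, 4, 8, 16, 32 seconds between restart attempts, then give up
for delay in 2 4 8 16 32; do
    docker start myapp > /dev/null
    sleep "$delay"
    if [ "$(docker inspect -f '{{ .State.Running }}' myapp)" = "true" ]; then
        echo "myapp recovered"
        break
    fi
    echo "myapp exited again after waiting ${delay}s, retrying..."
done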

Has anyone _thought_ of the notion of a container dependency container?

For example:

``` #!yml
db:
  image: mysql

waitfor:
  links:
    - db
  volumes:
    - /var/run/docker.sock:/docker.sock
    - ${PWD}/docker-compose.yml:/docker-compose.yml
  command: docker-compose up -d app

app:
  image: myuser/myapp
  links:
    - db
```

The basic idea here is that you _solve_ the problem for containers that don't have self-recovery mechanisms by creating a dedicated reusable service that could be published to the Docker Hub that everyone can then just inject into their composition.

I _would_ even be willing to prototype such a service/container/image and let others play with this to see how it fares...

@prologic The problem with a dependency is, how do you make sure that the service you want to speak to is actually up?

Your db container might respond to a ping but be doing some pre-launch cleanup/initialization of the database before it is actually available for mysql/psql commands.

Can that test be defined in a configurable way and/or supplied in a script to a reusable waitfor-type service?

IMHO, this is a very common issue and everyone has their own specific requirements. As I commented before in this issue, I think docker could (and should) provide a way for a container to specify a simple command to check its own readiness. Obviously, we as developers would have to specifically indicate how to check the readiness of each container.

Some optional flag for docker run to indicate an internal check command would be great to later link it from another container using a special flag for the link.

Something like:

$ sudo docker run -d --name db training/postgres --readiness-check /bin/sh -c "is_ready.sh"
$ sudo docker run -d -P --name web --link db:db --wait-for-readiness db training/webapp python app.py

Where is_ready.sh is a simple boolean test which is in charge of the decision of when the container is considered as ready.

It could be also a command for a container to check manually its readiness.

Where is_ready.sh is a simple boolean test which is in charge of the decision of when the container is considered as ready.

which means that every developer should prepare his images/containers to include _something_ that can be used for checking if the container is ready.

Which brings us back to square 1; developers are the ones responsible for making their containers resilient to service outage / startup time, because they are the only ones who can tell "what" that means for their situation?

Or am I overlooking something here?

I agree. The responsibility lies with the developer / container / service


Yes, of course. For me, the only one that really knows when a container is ready is the container itself. Docker can't know anything about the content of a container. It's a black box. The only thing it could do is ask the container (with a custom action specified when you run it, like I proposed, or any other common way to test it). And obviously the developer is the only one who knows what he needs and what's inside that black box.

Yeah that's right!


OK - for the sake of all the in-development or legacy software out there that can't handle network failure, let's suppose we want to solve this problem after all. I'm not saying we do, I just want to get a feel for syntax, semantics and complexity.

The minimal set of requirements seems to be:

  • I want Compose to wait to start a service until another service is "ready".
  • I want to define "ready" as "is accepting TCP connections on port X", or something else.

Let's also suppose that health checks aren't going to make it into Docker for a while.

I wonder if it could be solved in the general case by making it possible to _wait for another service's containers to exit_. You could then write your health check as just another service.

web:
  image: mywebapp
  links: ["db"]
  wait_for: ["db_wait"]

db_wait:
  image: netcat
  links: ["db"]
  command: sh -c "while ! nc -w 1 -z db 5432; do sleep 1; done"

db:
  image: postgres

If you want some kind of custom health check, define it in the "wait" service. Here, db_wait will only exit once mytable exists in the mydb database:

db_wait:
  image: postgres
  links: ["db"]
  command: sh -c "while ! psql --host db --dbname mydb -c "\d mytable"; do sleep 1; done"

If you have a db preparation script to run first, you can make that the thing to wait for:

web:
  image: mywebapp
  links: ["db"]
  wait_for: ["prepare_db"]

prepare_db:
  image: prepare_db
  links: ["db"]
  command: ./prepare.sh

db:
  image: postgres

The first case (wait until the container is accepting TCP connections) might be common enough to be worth supporting out-of-the-box.

web:
  image: mywebapp
  links: ["db"]
  wait_for_tcp: ["db:5432"]

db:
  image: postgres

There's a hidden implication in all of this: docker-compose up -d would have to block while the intermediary health check or preparation service is running, so that it can start the consumer service(s) once it's done.

Yes. However, in my opinion, docker itself should provide a way to determine when a container is ready, and then compose could manage it. We could then raise a new issue directly in docker. For me, it could be something like:

web:
  image: mywebapp
  links: ["db"]
  wait_for: ["db"]

db:
  image: postgres
  ready_when: sh -c "while ! psql --host db --dbname mydb -c "\d mytable"; do sleep 1; done"

Then, there's no need to create new services as a workaround

Agreed. Give the flexibility to the developer. Also pausing the container might not be what the developer needs that container to do while they wait. Perhaps there is some of its own init that needs to happen but then wait for the db to be ready for connections. So if it's a simple shared env variable, it enables the developer to use it as they need to.

How would a shared environment variable be updated? As far as I'm aware, you can't make changes to a process' set of environment variables once it's started.

You can change them in e.g. a bash session, but they won't get propagated in all other layers.

To do that, the change has to be committed and the container restarted. This is pretty useless in this scenario.

What's more, even if the variable could change, the process in the container would need to know to poll it, which makes this a non-solution for the legacy software that this feature is ostensibly for.

Since we're really only talking about solving this problem for legacy containers in development environments, I think it could be solved as a tool that sits on top of compose, using docker-compose ps -s (list services in dependency order, #1077) together with docker-compose up -d --no-recreate.

With these two commands, a tool could be written that does something like this:

  1. Run docker-compose ps -s to get the list of services names in dependency order
  2. Run docker-compose up -d --no-recreate <first service from the list>
  3. Run the "healthcheck" command for that service until it goes healthy or it hits the timeout. This could be an HTTP request, or a docker exec call
  4. Repeat 2 and 3 for each service in the list
  5. Run docker-compose logs (or don't if -d is passed)

This way the "healthcheck" and "wait for" config can be external to compose, but I don't think the developer is required to provide anything above what would be required if it were implemented as part of compose itself.

Any reason this wouldn't work?
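
A rough sketch of what that wrapper could look like, with the caveat that docker-compose ps -s is still only a proposal and the healthchecks file mapping services to check commands is entirely hypothetical:

#!/bin/sh
# assumes the proposed `docker-compose ps -s` prints one service name per line,
# and a hypothetical "healthchecks" file with "<service> <check command...>" lines
docker-compose ps -s | while read -r service; do
    docker-compose up -d --no-recreate "$service"
    check=$(awk -v s="$service" '$1 == s { $1 = ""; print }' healthchecks)
    [ -z "$check" ] && continue      # no healthcheck defined; just move on
    for i in $(seq 1 30); do         # ~30s timeout per service
        sh -c "$check" && break
        sleep 1
    done
done
docker-compose logs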

I'm glad we've limited the scope to legacy containers, that's much more reasonable :+1:

@dnephin I think that's great from the immediate perspective that it's flexible and that Compose doesn't have to have some built-in support for legacy containers, but I see an immediate problem similar to what @mattwallington described: what if the other containers can run some things (e.g. init) before connecting to this service? Blocking on that container _works_, but isn't ideal (that being said, I'm not sure there is an ideal solution for a legacy container). That would solve my problem at least, now I have to find that ticket!

+1

for being able to specify dependencies in docker-compose files ...

... not using links (as they are incompatible with net=host). I would have thought it is the job or at least the concern of a multi-container management tool to know what order things should start up in, much like puppet has its own dependency tree, but sometimes the user knows best and can override. Reading all of the above, it seems to me the only difficulty is deciding when a container is "up" so that the next container in the dependency chain can be started. This could be exactly the same mechanism as the link mechanism (for now) - anything is better than nothing. Later, the meaning of "fulfilling a dependency" could be specified in the docker-compose file; for example, for this dependency it matters that the container is running:

depends_on:
  container: foo
  requires: running

or, for this dependency, it matters that the container's TCP ports are listening:

depends_on:
  container: foo
  requires: listening

Saying it's the job of some external tool or script atop docker-compose is tantamount to saying docker-compose has no real interest in or responsibility for the orchestration of running 2 or more containers on the same machine. So what's its purpose?

I would have thought it is the job or at least the concern of a multi-container management tool to know what order things should start up in

No, not necessarily. For the reasons laid out several times already in this thread, I believe there are only two excuses to hand this job off to a container management tool:

  1. You're running off-the-shelf container images that aren't resilient to the unavailability of the upstream services they depend on, and for technical or business reasons you can't change or extend them.
  2. You're running software purely in a development environment, not a production one, and you've got better things to spend time on than implementing resilience.

However, if you have control over your software and you're planning to deploy it to production, you cannot hope to rely on an external tool that merely starts things in the right order, even if you've carefully defined your readiness conditions. It's a non-solution to the fundamental problem, and your system will fall over the moment there's a network hiccup. At best, you'll have to automatically restart all your web frontend containers every time that happens, and I can't see that being an acceptable amount of downtime for anyone.

Agree with you that well written software will cope with network outages. I think we both agree that not all software is well-written, and developers don't always think about all possible usecases.

Perhaps I want to run a client's container running a JVM, and another monitoring tool to attach to it, both in the host PID namespace so one can monitor the other, and I am only supported and licensed to run such a tool with the vendor's authorised image. If the monitoring tool monitors existing JVMs, then it matters which order they start up in (obviously). There are probably hundreds of use cases where order matters, some involving networks (mysql, elasticsearch, service discovery clusters have all been mentioned), but some involving other things which can be shared by using the different namespaces effectively.

So I definitely agree with use case (1), in that in some circumstances you just can't change a container.

But also, as soon as a tool concerns itself with multiple anything, it immediately comes up against ordering. If ordering is important and the only way to guarantee an order is to write a bash script around docker-compose to first up something and then up something else, docker-compose might as well not even exist in the chain, other than the fact that JSON/YAML is prettier than cmdline args.

IMHO ultimately a tool is either useful or not for a bunch of use cases. Docker-compose is clearly useful for unordered startups of multiple containers on the same host. If enough people and enough use cases are about ordering and a tool doesn't address them, people will just go elsewhere for those use cases, which is a shame.

Anyway I see I'm just rehashing old arguments, so I'll eject from this thread now..

This is the same argument that is stated across thousands of threads on so many different development topics: "If all code was written properly by everyone, there would be no need for this, so let's not do it". That is equivalent to saying: if all people in the world stopped burning fossil fuels or using electricity, we could fix the problems we are causing our planet. You are correct. If all apps did what they should, we wouldn't be here talking about this. But we live on earth. A place that has massive imperfections, with a species of beings that tend to have to learn things the hard way. The same people only get companies off the ground because they build a "minimum viable product" and pray that they will make it to the next round of funding or the next customer, which might someday allow them to build the version they wish they could.

The world is not and will never be perfect and so we can only do what we have in our own control. And in this case the only thing I have in my control is to try to convince you all (a.k.a. The people who develop the tool that I absolutely love and would use like crazy if it were to only have this one feature) to build it in a way where it can handle the software that exists in the world we live in. Not the one we wished we lived in.

On Jul 31, 2015, at 3:42 AM, Aanand Prasad [email protected] wrote:

I would have thought it is the job or at least the concern of a multi-container management tool to know what order things should start up in

No, not necessarily. For the reasons laid out several times already in this thread, I believe there are only two excuses to hand this job off to a container management tool:

You're running off-the-shelf container images that aren't resilient to the unavailability of the upstream services they depend on, and for technical or business reasons you can't change or extend them.

You're running software purely in a development environment, not a production one, and you've got better things to spend time on than implementing resilience.

However, if you have control over your software and you're planning to deploy it to production, you cannot hope to rely on an external tool that merely starts things in the right order, even if you've carefully defined your readiness conditions. It's a non-solution to the fundamental problem, and your system will fall over the moment there's a network hiccup. At best, you'll have to automatically restart all your web frontend containers every time that happens, and I can't see that being an acceptable amount of downtime for anyone.



+1 @aanand. This is not something you're going to be able to limit in scope. The feature, if done, needs to be something people can count on. They've been coding for "durable" infrastructure for a long time and it is taking a long time to convert the masses. They _will_ use this for a long time to come.

I'd like to reiterate that we haven't ruled out implementing something to alleviate the problem of startup timing dependencies between containers - I sketched out one possible solution just yesterday, in this very thread.

But I want us to be on the same page regarding whom this feature is for, what problems it will solve, what problems it _won't_ solve and to what degree it can be relied on. That's what I'm trying to feel out.

Furthermore, given the multiple definitions of readiness that exist ("container has started" vs "container is accepting TCP connections" vs "custom health check passes"), and the differing complexity of implementation of each of them, I want to get a comparative idea of how many people would benefit from out-of-the-box support for each one.

A common technique for synchronizing services is something like a "service registry" using etcd or similar. How about a wait_for_service:

web:
  image: mywebapp
  links: ["db"]
  wait_for_service:
    type: etcd (or consul, or zk)    -- or use swarm type notation
    addr: http://my.etcd.com/
    path: postgres.service

db:
  image: postgres

Compose doesn't know if the service is really ready, but it can look for the data in the registry and start the dependent container based on that. It's the responsibility of the services (like postgres) to publish their availability to the registry, so for legacy applications some sort of wrapping script would do that: start the app, watch for the port to go live, then publish to the registry.
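
As a rough sketch of such a wrapper, reusing the etcd address and key from the example above (how the service is started, the port, and the "ready" value are all placeholders):

#!/usr/bin/env bash
# start the real service in the background (stand-in for however the image normally starts it)
postgres &

# watch for the port to go live
until nc -z localhost 5432; do sleep 1; done

# publish availability to the registry (etcd v2 keys API)
curl -s -X PUT "http://my.etcd.com/v2/keys/postgres.service" -d value="ready"

# stay attached to the service process
wait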

Hi @aanand.

I was discussing this today with @bfirsh on the twitters as it's something I've encountered.

Specifically, when building a distributed system out of many small dockerized components I found a need to have integration tests which spun up all the main interface apps and their dependencies, then ran various tests against them before tearing it all down again.

This led to issues with, e.g., Riak taking a while to start compared to just about anything else that uses it.

I wouldn't be especially opposed to y'all saying "containers start async, deal with it", but designing around each service exposing a consistent healthcheck would at least contain the "dealing with it" to one implementation per service rather than having code to deal with retrying connections in each app that relies on the service.

Services defining their own healthcheck is also beneficial for monitoring purposes.

@elliotcm Agreed on both points.

For testing on our CI, we built a small utility that can be used in a Docker container to wait for linked services to be ready. It automatically finds all linked TCP services from their environment variables and repeatedly and concurrently tries to establish TCP connections until it succeeds or times out.

We also wrote a blog post describing why we built it and how we use it.

@meeee that looks really nice and useful! Can you show us examples of how it's spun up, in particular how it's used with docker-compose?

I had a trickier case today, where I was starting a mysql container for the first time. The container bootstraps itself when first run, and restarts the database daemon when it's configured. This caused my port-availability checks to trigger prematurely, and dependent containers started but failed to connect. This was in a CI environment, where we want to have things set up completely from scratch.

I'm very keen for compose to support some kind of wait / check behaviour, but there are some deceptively tricky cases out there. I'm happy to participate in discussions.

@prologic All you have to do is run the waitforservices command in a Docker container depending on other services/containers before running your application or tests. It finds all linked services and runs until it can connect to them or a certain time has passed (60 seconds by default). Just run all code that depends on other services after the binary has exited (you might want to check the exit status though).

@pugnascotia Your database server could listen on localhost only while being bootstrapped - you'll have to expose some kind of indicator whether the container is ready anyway. We don't use MySQL, but waitforservices works perfectly with the official postgres image.

@aanand Postgres is actually an excellent example to choose, because waiting for the TCP port to open is _not_ enough - in this type of (docker) scenario you will sometimes get hit with an error like FATAL: the database system is starting up if your other container connects too quickly after the TCP connection is open. So psql does seem necessary if you want to be sure that postgres is ready - I use select version() but maybe there's a lighter alternative.
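
For example, a minimal retry loop along those lines (assuming the database is reachable as db and psql is installed in the waiting container):

until psql --host=db --username=postgres -c 'SELECT version();' >/dev/null 2>&1; do
  sleep 1
done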

There's a particular Maven docker plugin that has an interesting wait implementation. At the moment I'm using a bash wrapper to start each service, which rather defeats the point of using compose.

Using this as a workaround (not sure it's bulletproof):

db:
  image: postgres:9.3
  ports:
    - "5432:5432"
createdbs:
  image: postgres:9.3
  links:
    - db
  command: >
    /bin/bash -c "
      while ! psql --host=db --username=postgres; do sleep 1; done;
      psql --host=db --username=postgres -c 'CREATE DATABASE \"somedatabase\";';
    "

I've been using similar methods to @olalonde. When using single quotes for the command that is executed after /bin/bash -c, I'm also able to use the environment variables that links inject from other containers, so I can use usernames and passwords without having to maintain them in two places. This works well for situations where I have a service, such as an API, that needs the database to be up and the proper data bootstrapped, by running a query that checks whether a specific table or record exists. It does mean I need some sort of client installed in the container to query the database properly, but it works.
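
For illustration, a command along those lines might look like the following, where db is the link alias, DB_ENV_MYSQL_USER and DB_ENV_MYSQL_PASSWORD are the variables legacy links inject from the linked container's environment, and mydb.users and start-api are placeholder names:

/bin/bash -c 'until mysql --host=db --user="$DB_ENV_MYSQL_USER" --password="$DB_ENV_MYSQL_PASSWORD" --execute="SELECT 1 FROM mydb.users LIMIT 1" >/dev/null 2>&1; do sleep 1; done; exec start-api'

Because the outer quotes are single quotes, the variables are expanded inside the container at runtime rather than on the host.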

+1 I'm really interested in this functionality

+1 to dependencies. I agree that in principle architecture should be robust enough to support any start up order. BUT, often times, doing so is not practical.

It feels like the pushback against this feature stems from others trying to dictate architecture from afar, where they really have no right to do so. This feature would allow for working around time-consuming refactors with little long-term value (refactoring for the sake of refactoring).

Yes, I said "work around"; and I feel dirty. But compose to me is really about enabling others to be productive. This simple configuration enables that.

If this feature existed in the toolset, I could solve my problem in a minute and move on to adding real value. Instead, I'm banging my head against a wall trying to work around startup order issues with external dependencies.

@beardface, and everyone else: Which feature, specifically, would enable you to move on with developing your app?

  1. The ability to specify that service A must wait to start until service B has started? (which, keep in mind, still won't solve the race condition when a container's started but not ready to accept connections)
  2. The ability to specify that a service A must wait to start until service B is accepting connections? (which, keep in mind, still won't solve the race condition when a container's listening but hasn't finished initialising - e.g. a postgres container creating a database on startup, to use @rarkins' example)
  3. The ability to define a health check on service B, and to specify that service A must wait to start until service B's health check passes?
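
To make the difference between 2 and 3 concrete for Postgres (assuming a service reachable as db and the Postgres client tools installed in the checking container):

# option 2: the port is open...
nc -z db 5432
# option 3: ...but only an application-level check confirms the server actually answers
pg_isready --host=db --port=5432 --username=postgres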

@aanand Number 3 is my vote. It gives the flexibility to the developer to choose the health check logic instead of again relying on the service to decide when it's time. For this particular use case, more developer freedom is better. Impossible to anticipate all the types of apps that will be installed in containers.

+3

Would be nice to have some basic health checks included, though. Something like _http 200 ok on port 80_ is so common that it might be worth the effort.

It would be nice to have a "batteries included" number 3 (so dare I say all of the above?)

I.e. built-in capability for "container is up", "file is present" and "port is open" types of waits, and then a way to let people define their own "application layer" checks.

3 gets my vote

3 is a more general 2, which is a more general 1. Everyone will prefer 3, but 2 or 1 will be good enough for some.

Vote for 3

Obviously the 3rd option. There are loads of cases in this discussion alone. The 1st and 2nd options would be great and many people would be happy, but the issue would remain open.

Vote for 3

Vote for 3

Vote for 3. Interested in beta testing it too.

Vote for 3.

Vote for 3. Interested in beta testing it too.

Vote for 3

3

Thanks everyone. You can stop voting now - I think the message is clear.

I'd be interested in feedback on the design I proposed in https://github.com/docker/compose/issues/374#issuecomment-126312313 - any alternate design proposals, along with discussions of their strengths/weaknesses.

The wait_for_tcp convenience method would be useful, but it isn't clear to me how having a separate container to do the health check is any easier than doing it in the same container as described by @olalonde and @mbentley above.

What about doing something like what alexec is doing in the docker-maven-plugin? He explicitly designed his configuration to be similar to docker-compose/fig, and so far it has worked great for my projects.

healthChecks:
  pings:
     # check this URL for 200 OK
     - https://localhost:8446/info
     # check another URL with non-default time out, with a pattern, and non checking SSL certificates
     - url: https://localhost:8446/info
       timeout: 60000
       pattern: pattern that must be in the body of the return value
       sslVerify: false
  logPatterns:
     - pattern that must be in log file
     - pattern: another pattern with non-default timeout
       timeout: 30000

Source: https://github.com/alexec/docker-maven-plugin/blob/master/USAGE.md

Checking that the TCP connection is up is not enough to know that the database has started. I think it's better to have some command which checks database health.

@ceagan Like that. Custom commands would be useful too. e.g.

healthChecks:
  custom:
    # retry this command until it returns success exit code
    - cmd: psql --host=localhost --username=postgres
      sleep: 1s

I also think checks would be better than healthChecks, because then there's no case convention to remember. It might also be useful to have wait (how many seconds to wait before starting to run health checks), attempts (how many times health should be checked before giving up) and retire (how many seconds to wait before giving up completely) parameters.

This goes in the direction of how the Marathon framework solved it: basic dependencies plus health checks. As it is very popular for starting Docker containers, it's worth checking its options (timeouts, interval, response codes, etc.) to adopt them for Compose.

Same here, option 3.

+1

+1

+1

I'll stick with a custom wait script for now; still, it would be very nice to have.

+1

+1

Option 3 (healthChecks) sounds good.

+1

+1

+1

+3.
To add some other thoughts to the discussion: the choices @aanand proposed are really saying that the state of containers is the responsibility of Docker, not docker-compose. Docker-compose could implement all these use cases in a clean and elegant way if Docker provided state information. But Docker doesn't. It seems to stick to the vision of immediate, stateless containers, launching so fast that these kinds of synchronization issues are unimportant.
In my case, I pursue the idea of being able to choose the best architecture for my services, for each case. For example, sometimes I'd want multiple MariaDB instances, each one serving a single application. Other times I'd want a single MariaDB instance, serving multiple applications. I don't want Docker to tell me what is best, or what I should be doing instead. Docker always seems to have these kinds of temptations ;).
I think the best solution is to convince Docker to let containers declare arbitrary metadata about themselves, and use that feature to let docker-compose find out whether a container is considered "ready", so that others can rely upon it.
As for the "single db-multiple apps" approach, I'd like a definition such as:

db:
  image: postgres:9.3
  ports:
    - "5432:5432"
app1:
  image: wordpress
  links:
    - db [WP]
app2:
  image: ghost
  links:
    - db [GST]

Docker Compose would launch "db", and ask about its metadata (relevant to Docker Compose). From the yml file, it knows "app1" expects "db" to be "wordpress ready" (meaning not only accepting connections, but also with the required objects).

I don't have a simple solution for this situation. I currently do it manually, in two steps: a custom postgresql-bootstrap image, in which I create the db and the database user to access it; and a custom liquibase-postgresql image, to generate the database objects from the DDLs provided by (or extracted from) the Wordpress container. Only then can I launch "app1".
That forces me to separate the containers in "infrastructure" and "apps" groups, if the "infrastructure" containers are serving different applications.

Docker Compose wants to be as stateless as Docker itself. I don't know if that's possible if it wants to be really useful.

+1 for option 3

+1

+1 for option 3.

+1 for option 3

+1 for option 3

This issue has been around a while; what is the solution?

@bweston92 I think the status is that @aanand proposed a solution earlier in this thread and is looking for

any alternate design proposals, along with discussions of their strengths/weaknesses.

Personally, I think @aanand's proposed solution makes a lot of sense. It seems very explicit to me, while also being flexible. It would cover my needs of waiting for a TCP port to open or just waiting a fixed amount of time.

My use case is just for testing. My tests will fail if they start before the database has been created, so I changed their command to bash -c "sleep 2; python manage.py test --keepdb", like this:

db:
    image: postgres:9.5
test:
    build: .
    command: bash -c "sleep 2; python manage.py test --keepdb"
    volumes:
        - ./test_project:/app
    links:
        - db
        - selenium
    environment:
        - EXTERNAL_TEST_SERVER=http://testserver:8000/
        - SELENIUM_HOST=http://selenium:4444/wd/hub
selenium:
    image: selenium/standalone-chrome:2.48.2
    links:
        - testserver
testserver:
    build: .
    command: bash -c "sleep 5; python manage.py testserver 8000 --static"
    volumes:
        - ./test_project:/app
    ports:
      - "8000:8000"
    links:
        - db

so that I can run docker-compose run test without starting the database first and waiting.

It's hard to tell which issue these days is the "right" place to vote for a new docker compose functionality that allows explicit dependencies to be declared, but consider my vote a strong one. With the new Docker 1.9 networking functionality, and looming deprecation of container links in favor of it, there is now no great way to make sure that container A starts up before container B — because if you use Docker 1.9 user-defined networking, you can no longer specify container links. That's... broken.

I agree. Is there a timeline for getting option 3? It would be great to have this expedited.

It's worth noting that dependency ordering doesn't actually fix this problem. This issue applies to links as well. In many cases a container starts fast enough that you don't notice the issue, but the issue is still there.

What is necessary to solve this problem is an application-aware healthcheck. A healthcheck is basically a loop which retries some operation until either: the operation is successful, or a timeout is hit. In the case of an HTTP service this might be making http requests until you get a 2xx code. For a database it might be connecting and selecting from a table.

Whatever the case, it is application specific, so needs to be defined by the developer. If we were to implement option 3 from https://github.com/docker/compose/issues/374#issuecomment-135090543, you would still need to implement this healthcheck logic.
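
(For illustration, such a loop for an HTTP service might look like the following, with the host, path and 60-second bound as placeholders:)

timeout 60 bash -c 'until curl -fsS http://api:80/health >/dev/null; do sleep 1; done'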

It's been mentioned a few times in this issue already (https://github.com/docker/compose/issues/374#issuecomment-53036154, https://github.com/docker/compose/issues/374#issuecomment-71342299), but to re-iterate, you can solve this problem today by making your application resilient to failure by retrying a connection. You need to do this anyway for any production system.

It turns out the functionality to make your application resilient to failure is effectively the same logic as a healthcheck. So, either way, you still need to implement the same logic. The only difference would be where you include it. Right now you can include it in your application, or an entrypoint script. With the proposed change you would be able to define it in the Compose file. Either way, you're still having to implement a healthcheck for each service.

Is there a significant advantage to including it in the Compose file instead of the entrypoint script? That is maybe still up for debate.

The big _disadvantage_ to putting it in the Compose file is that it makes docker-compose up significantly slower.

With the new networking we can make up happen in parallel (like we do for stop, rm, and scale). Every container can start at once, do some initialization, then wait for its dependencies to be available before proceeding. This makes starting an environment very fast.

If Compose has to wait for a healthcheck to complete, startup is effectively sequential. Container startup and application initialization doesn't happen in parallel, and everything is slower.

Most apps already have health checks because they sit behind load balancers, are monitored externally, etc. Opening up a new one is not difficult. So if Compose supports it, it's a choice that folks can use; it's not mandatory. In the real world folks have to deal with a variety of applications, and the notion that suddenly all apps can be made smart is unrealistic and impractical. And wrapper logic in the entrypoint is just ugly. I think there has been enough demand in the community for this feature, and as you can see, option 3 got a lot of votes.


@dnephin The end result is that you have the same wrapper scripts for each and every service. My point is that there are some things that are so common (like HTTP 200 on 80 and 443 or TCP on 5432) that it is a good idea to ship them with compose.

Sure, it would be cool to solve all this at the application level, but in reality you'll only have control over your own application and not over all the other moving parts like the database, cache or message queue.

I agree with both @mbdas and @jayfk, and will just add: if the resistance to this is that even with dependency-specifications and resultant ordering of container startup, there will be failures, then the use of container links and volumes-from to control container startup order shouldn't ever have happened — all we're asking for is that now that the new network model means that links are being deprecated (and that the new network model literally can't coexist with links), the same startup-order functionality that links allowed be given back to us in some way. Sure, any failure cases that might've happened with link-based container ordering might still happen with the new network model and container dependencies, but we've all learned to live with that.

@delfuego: can you elaborate on how links are being deprecated, and especially what they are being replaced by? A link to some docs/examples is enough.

@h17liner: yes it is, interesting! thanks

While I agree with @dnephin that

It's been mentioned a few times in this issue already, but to re-iterate, you can solve this problem today by making your application resilient to failure by retrying a connection. You need to do this anyway for any production system.

I don't see this as making sense for something like running tests. If I am just testing whether a model saves correctly in a Django app, I am not sure how much sense it makes to add resiliency to the database connection.

@delfuego I think you were in the right place originally (#686) for that issue. This issue is not about ordering, it's about artificial delay in startup (when an order already exists). While these things are related, they are separate issues.

I disagree: with links not supported in user-created bridge networks, and documented as deprecated in general, there is no ordering. So option 3 takes care of both the ordering and the when-to-start issue.


I'd like to propose option 4 (some may say it's actually a variation of 3).
A container is not ready until all of its startup commands have finished with a 0 exit code. It must be possible to define these startup commands in the yml file for each container. These commands would be executed like you would with "docker exec" against the running container. Think of the setUp() and tearDown() methods in classic unit testing. Yes, we could have "shutdown" commands as well.
Obviously the next container in the hierarchy is not launched until all containers it depends on are ready.
P.S. Thanks for a great DockerCon EU 2015.

A HEALTHCHECK directive is much more flexible (i.e. can be used at any later point) and helpful. Setup should be done within the CMD or (better yet) ENTRYPOINT script, teardown by handling process signals.

I think the crux of the problem here is that people want a single docker-compose up command to bring up a stack and everything just magically works.

Based upon all of the feedback there are clearly many solutions for different use cases but no "one size fits all".

You can execute "initialisation" tasks quite easily by executing multiple docker-compose commands - and I think this approach is the most generic and flexible approach.

For example, I run an Ansible playbook in an "agent" container with a single task that waits for my database (MySQL) container to be listening on port 3306. This "agent" container is linked to my "db" container, so it automatically starts it when the following is executed:

$ docker-compose run --rm agent
Creating db_1

PLAY [Probe Host] *************************************************************

TASK: [Set facts] *************************************************************
ok: [localhost]

TASK: [Message] ***************************************************************
ok: [localhost] => {
    "msg": "Probing db:3306 with delay=0s and timeout=180s"
}

TASK: [Waiting for host to respond...] ****************************************
ok: [localhost -> 127.0.0.1]

PLAY RECAP ********************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0

After which I can run docker-compose up knowing that the db container is fully operational.

Here is a simple docker-compose.yml file that supports this:

...
...
db:
  image: mysql
  hostname: db
  expose:
    - "3306"
  environment:
    MYSQL_DATABASE: xxx
    MYSQL_USER: xxx
    MYSQL_PASSWORD: xxx
    MYSQL_ROOT_PASSWORD: xxx

agent:
  image: cloudhotspot/ansible
  links:
    - db
  volumes:
    - ../../ansible/probe:/ansible
  environment:
    PROBE_HOST: "db"
    PROBE_PORT: "3306"

The "agent" container runs a playbook named site.yml in the /ansible mounted volume which is shown below:

- name: Probe Host
  hosts: localhost
  connection: local
  gather_facts: no
  tasks: 
    - name: Set facts
      set_fact: 
        probe_host: "{{ lookup('env','PROBE_HOST') }}"
        probe_port: "{{ lookup('env','PROBE_PORT') }}"
        probe_delay: "{{ lookup('env','PROBE_DELAY') | default(0, true) }}"
        probe_timeout: "{{ lookup('env','PROBE_TIMEOUT') | default (180, true) }}"
    - name: Message
      debug: msg="Probing {{ probe_host }}:{{ probe_port }} with delay={{ probe_delay }}s and timeout={{ probe_timeout}}s"
    - name: Waiting for host to respond...
      local_action: >
        wait_for host={{ probe_host }}
        port={{ probe_port }}
        delay={{ probe_delay }}
        timeout={{ probe_timeout }}
      sudo: false

One solution to the single docker-compose up goal might be to introduce a "workflow" feature to docker compose and include a optional workflow spec file that allows for more complex and controlled orchestration scenarios by specifying one or more docker-compose commands as "tasks" that should be executed:

# The default workflow, specified tasks will be run before docker-compose up
# The "up" task is implicit and automatically invoked for the default workflow
# The "up" task is explicit for custom workflows as some workflows may not want docker-compose up
default:
  tasks:
    - run --rm agent
    - up

# Custom workflows that can be invoked via a new docker-compose command option
# This example:
# 1. Runs agent container that waits until database container is up on port 3306
# 2. Runs Django database migrations from app container
# 3. Runs Django collect static task from app container
# 4. Runs test container that runs acceptance tests against linked app container
# Does not execute a docker-compose up afterwards

test:
  tasks:
    - run --rm agent 
    - run --rm app manage.py migrate
    - run --rm app manage.py collectstatic --noinput
    - run --rm test

Today I achieve the above using Makefiles, which provide a higher-order ability to define my own workflows for different scenarios.

It would be great if a "workflow" feature or similar could be introduced into docker compose, which would provide a very generic and flexible solution to this particular issue and many more.

Agreed, the problem is that people expect docker-compose to be sufficient for production deployments. Personally, I think it'll be a long time before that becomes feasible and Kubernetes / Helm seem to be a lot closer to that goal.

@olalonde, sure we'd love compose to be production-ready... but we'll TAKE it supporting important and existing functionality that, given the deprecation of container links, is going to disappear unless it is replicated over to the new user-created-networks model. (Again, maybe this request isn't perfectly aligned with this specific issue -- it remains unclear to me whether just getting container startup ordering "belongs" here or in issue #686...)

With a HEALTHCHECK Docker directive, the depends_on functionality can either wait for container startup (no healthcheck) or for the healthcheck script to exit successfully (exit code 0). This is as flexible as it gets (you can define arbitrary logic) and keeps the healthcheck logic where it belongs (within the container that is checked).

@delfuego even for development and testing, this functionality would be helpful. Personally, I wanna be able to do docker-compose run test and have it work without bringing up services beforehand and manually waiting. Although this is possible, it just makes it a bit more painful to get started with the project and adds more ways testing can fail.

+1

I think the resolution requires a middle way - compose will never be able to account for all the different ways in which applications can be considered available or not. The idea of a health check will mean different things to different people, and might not be as simple as "is it up or not". In production, you might take a container down if it was exhibiting unusually long response times, even it was passing any HTTP checks.

I therefore feel that basic support for HTTP responses, open ports, files created or log lines emitted should be enough for development. Anything more advanced than that almost immediately becomes application specific. I also like the idea of encouraging developers to make the individual parts of their application stacks more robust.

@pugnascotia thanks, that's a constructive comment, and a reasonable approach ("best of both worlds"?)

The currently discussed solutions don't seem to actually address the _originally reported_ issue, which is far simpler: it is NOT about waiting for a service to be available, but about waiting for a service to EXIT.

I have a use-case where I have two containers which expose the same ports. The first runs for 15-60 seconds, then exits. Then the second service should start. There is no (obvious?) way to do this in compose today as it will detect the port conflict and will exit; not even 'restart: always' is a solution.
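
One workaround today, outside of Compose's own configuration, is a small wrapper script that lets docker wait block on the first container before bringing up the second (service names are placeholders):

docker-compose up -d first-job
docker wait "$(docker-compose ps -q first-job)"   # blocks until the container exits
docker-compose up -d second-service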

Yes, Compose is not designed for that use-case. Compose is focused around runtime environments, not build pipelines. I don't think that's what the original reported issue is about.

There have been a few requests for more build-oriented features, but I don't think they make sense for compose. The two functions are very different and trying to make them fit in the same configuration format is likely to lead to a lot of confusions and a bad user experience.

@ewindisch your use case can be generalized to running a chain of batch jobs. Compose is helpful for that case (despite not being designed for it) because it maintains dependencies between services - e.g. those chains. But it does not handle sequencing and IMHO it shouldn't because Compose is _outside_ of containers and has no idea what a process _inside_ a container is going to do.

This part of the Compose documentation covers the question of why Compose doesn't have this capability:

https://docs.docker.com/compose/faq/#how-do-i-get-compose-to-wait-for-my-database-to-be-ready-before-starting-my-application

However these pages don't mention the issue at all:

https://docs.docker.com/compose/django/
https://docs.docker.com/compose/rails/
https://docs.docker.com/compose/wordpress/

At the very least, these pages should include an acknowledgment that Compose won't wait for a database container to be ready. They could also include examples of ways to deal with it.

@ewindisch Actually, that's exactly what I was proposing in https://github.com/docker/compose/issues/374#issuecomment-126312313 - with the hypothesis that solving _that_ problem also gives users the tools to solve the startup ordering problem (if not the problem of longer-term resilience).

I'm still interested in exploring that solution space, if anyone else is.

I am.

Me too.

+1

+3

+1

+1 for implementing the https://github.com/docker/compose/issues/374#issuecomment-126312313 solution.

Massively upvoting!
Currently this affects the usage of tools that run in a container but rely on docker events (e.g. jwilder/nginx-proxy). The way I do it is to docker-compose up the listener manually and run all other containers afterwards (which spoils all the beauty of docker-compose up as a single entry point).

@meetmatt have you tried running jwilder/nginx-proxy afterwards? Starting order shouldn't matter for that, it'll pick up existing (running) containers when started

+1

+1

I'd really like to see a transparent channel-based solution. Basically like libchan. That way if I query a database the request is buffered until the database is ready.

I really don't think load order is a sufficient solution in distributed systems. What if, for example, you need to restart your database but other services might crash as a result? A real solution would handle this use case as well.

Count me in, as predictability of the execution pipeline is a key thing for what we do at work. +1

+1

I must be missing something.

Why is no one advocating adding a "wait until" to docker run (the Docker engine itself)? In all the cases I can think of, the dependent container knows when it is "ready", but Docker doesn't respect that.

In the original case (mysql loading a large data set and alfresco), the mysql container can return or signal when it is ready and the alfresco container wouldn't start until then.

I would like to run arbitrary logic and signal to docker when I am ready as decided upon by me (e.g. when a certain message in the log appears-->signal CONTAINER_UP).

net: "container:[name or id]" why not ordering the startups of my containers? I had to drop links because it will be deprecated and I want the whole stack to use net: "host" networking. Unfortunately this is not allowed with links. Is there another way to change the boot order of the containers or I have to share useless volumes between them?

Update:

I just did the re-ordering with useless volumes instead of links:

base:
  build: ./base
  net: "host"
  volumes:
    - /root/lemp_base
phpmyadmin:
  build: ./phpmyadmin
  net: "host"
  volumes_from:
    - base
  volumes:
    - /root/lemp_phpmyadmin
ffmpeg:
  build: ./ffmpeg
  net: "host"
  volumes_from:
    - phpmyadmin
  volumes:
    - /root/lemp_ffmpeg
mariadb:
  build: ./mariadb
  net: "host"
  volumes_from:
    - ffmpeg
  volumes:
    - /root/lemp_mariadb
php:
  build: ./php
  net: "host"
  volumes_from:
    - mariadb
  volumes:
    - /root/lemp_php
nginx:
  build: ./nginx
  net: "host"
  volumes_from:
    - php
  volumes:
    - /root/lemp_nginx

(I cleared other shared volumes from the stack and another infos like container_name, ports to look simple.)

If I want to use net: "container:base", I get an error message from the docker-compose build command.

ERROR: Service "mariadb" is trying to use the network of "lemp_base", which is not the name of a service or container.

What I don't like about this solution is that every other container gets the webserver files in the /var/www folder from base.

EDIT:
For some reason this stack deletes the entire /var/www folder on startup.

My humble opinion is that any mechanism that ends up with Docker Compose knowing about dependencies between containers goes against separation of concerns. Docker Compose is responsible for running containers A and B. Containers A and B are responsible for their own service. If B depends on A to work correctly, it is B's responsibility to wait for A to be in working condition. As has been said in the discussion, this can be done through a timeout, retries or whatever, but this is B's problem, not Docker Compose's nor A's. SoC is of paramount importance for service independence and proper scaling.

Is there any work on your idea 3, @aanand? It would be good to know if there's any progress; it sounded like a promising start which would help out a few very common use cases, even if it's not a perfect solution.

+1

+1

Maybe I'm wrong, but can args: buildno: order the containers in docker-compose.yml version 2?

I tend to agree that this is a concern that doesn't belong in Compose. @jwilder's excellent Dockerize just got support to wait for dependent containers and you can specify the protocol/port which you are waiting on. I would suggest this suits most of the use cases described here:

api:
  build: .
  ports:
   - "8000:80"
  expose:
  - "80"

test:
  build: test
  command: dockerize -wait http://api:80 -wait tcp://db:5432 somecommand -some arg -another arg2
  links:
    - api:api

Ideally we use the Docker Events API to automatically detect this, but that would mean every container would also need access to the Docker runtime which probably isn't feasible/something we would want.

I think that the wait should be done outside of compose. In my dev setup I'm going to use a hybrid of what @mefellows suggested and timercheck.io status pages. I think that will give me exactly what I need without the hassle of using RabbitMQ or something similar.

We are using a shell script entrypoint that waits for the open port, with a timeout of 15 seconds:

#!/usr/bin/env bash

# wait for db to come up before starting tests, as shown in https://github.com/docker/compose/issues/374#issuecomment-126312313
# uses bash instead of netcat, because netcat is less likely to be installed
# strategy from http://superuser.com/a/806331/98716
set -e

echoerr() { echo "$@" 1>&2; }

echoerr wait-for-db: waiting for db:5432

timeout 15 bash <<EOT
while ! (echo > /dev/tcp/db/5432) >/dev/null 2>&1;
    do sleep 1;
done;
EOT
RESULT=$?

if [ $RESULT -eq 0 ]; then
  # sleep another second for so that we don't get a "the database system is starting up" error
  sleep 1
  echoerr wait-for-db: done
else
  echoerr wait-for-db: timed out after 15 seconds waiting for db:5432
fi

exec "$@"

This should be solved by the forthcoming (and apparently imminent, with the docs being updated) depends_on, yeah?

Nope. depends_on is only ordering. To actually delay the starting of another container there would need to be some way to detect when a process has finished initializing itself.

Ah, thanks for the clarification. =)

I have written a pure bash command line utility called wait-for-it that can be included in docker deployments to help synchronize service deployments.

To me it's not a good idea to hard-code an arbitrary collection of "availability checks". There are numerous situations that are specific to one kind of deployment and you can never cover them all. Just as an example, in my multi-container app I need to wait for a certain log message to appear in a certain log file - only then will the container service be ready.
Instead what's needed is an SPI that I can implement. If Docker provides some example implementations for the most frequent use cases (e.g. TCP connect), that's fine. But there needs to be a way for me to plug in my own functionality and have Docker call it.
Docker Compose is pretty much useless to me as a whole product, if I can't get my containers up and running dependably. So a stable and uniform "container service readiness SPI" is needed. And "ready" should not be a boolean, as there are possibly more levels of readiness (such as: "now you can read" and "now you can write").
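
As an illustration of the log-message case using nothing but shell (the service name and the message are placeholders):

until docker-compose logs myservice 2>/dev/null | grep -q "Server startup complete"; do
  sleep 1
done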

@realulim Good writeup. I fully agree with the idea of letting us define what a service's "ready" state means via plugins. I also think it's a good idea to have a default for the plugin that only checks that a service is listening to a http/tcp connection. That would cover a majority of cases right there.

This is what I came up with, in the entrypoint file:

until netcat -z -w 2 database 5432; do sleep 1; done
# do the job here, database host on port 5432 accepts connections

@kulbida ,
I do something very similar with MySQL. "database" in this case is a link in a compose file.

if [[ "$APP_ENV" == "local" ]]; then
    while ! mysqladmin ping -h database --silent; do
        sleep 1
    done
    # Load in the schema or whatever else is needed here.
fi

There have been some comments in this thread which claim that startup ordering is only a subset of application level error recovery, which your application should be handling anyway. I would like to offer up one example to illustrate where this might not always be the case. Consider if some services depend on a clustered database, and whenever a quorum is lost due to a crash etc, you do _not_ want to automatically retry from the app. This could be the case for example if database recovery requires some manual steps, and you need services to remain unambiguously down until those steps are performed.

Now the app's error handling logic may be quite different from the startup logic:

  • If the db is down because we're just starting up, wait for it to become available.
  • If the db is down because it crashed, log a critical error and die.

It may not be the most common scenario, but you do see this pattern occasionally. In this case, clustering is used to solve the "network is unreliable" problem in the general case, which changes some of the expectations around which error conditions should be retried in the app. Cluster crashes can be rare enough, and automatically restarting them can be risky enough, that manually restarting services is preferred to retrying in the application. I suspect there are other scenarios as well which might challenge assumptions around when to retry.

More generally, I'm claiming that startup ordering and error handling are not always equivalent, and that it's appropriate for a framework to provide (optional) features to manage startup order. I do wonder if this belongs in docker-engine, though, rather than compose. It could be needed anytime docker starts up, regardless of whether compose is used.
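
To make that distinction concrete, a rough entrypoint sketch (the db host, port and 60-second bound are placeholders): the wait is bounded and only applies at startup, while a later outage is left to the application's own policy.

#!/usr/bin/env bash
# wait for the database only during startup, with a hard bound
for _ in $(seq 1 60); do
  nc -z db 5432 && break
  sleep 1
done
nc -z db 5432 || { echo "database did not come up during startup" >&2; exit 1; }

# from here on, a lost connection is handled by the application itself
# (e.g. log a critical error and die instead of silently retrying)
exec "$@"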

There is a discussion starting on the docker engine repo in proposal https://github.com/docker/docker/issues/21142 to add support for health checking. Once this support is available it will be possible for Compose to provide a way to configure it, and use it for a delayed start up.

How about using the filesystem to check for the existence of a file?

ready_on: /tmp/this_container_is_up_and_ready

That way it's up to the container developer to decide when things are UP, but compose can wait until the container declares itself ready. It's an explicit convention, but could be easily added as an additional layer to images that don't have that behaviour..
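
A minimal sketch of such an entrypoint, where do-bootstrap.sh stands in for whatever initialisation the image needs:

#!/usr/bin/env bash
set -e

# run whatever bootstrapping this image needs (placeholder script)
/do-bootstrap.sh

# declare readiness via the agreed-upon file
touch /tmp/this_container_is_up_and_ready

exec "$@"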

Built-in support for health checks will be good; in the meantime here's the hack I got working in my local docker-compose setup:

    nginx:
        image: nginx:latest
        command: /bin/bash -c "sleep 2 && echo starting && nginx -g 'daemon off;'"
        ...

(In production, my app proxies to a few already-running upstream servers using proxy_pass; in local dev and test, I start docker instances of these, and nginx needs to wait a bit for them to start, else it crashes and dies. The daemon off thing keeps nginx in a single process, else docker will stop the container as soon as the parent process spawns its daemon child.)

Just to add my two cents, if you happen to be using the ANT build tool it comes with builtin support to delay execution until a certain socket is open.

Our Jenkins CI server spins up the project containers with Docker Compose and then runs ANT from within the main container, like this:

docker-compose up -d
docker exec -it projectx-fpm-jenkins ant -f /var/www/projectX/build.xml

This is the relevant piece of configuration from the docker-compose.yml file. Note that, as discussed above, making fpm depend on mysql is not enough to guarantee that the MySQL service will be ready when it is actually needed.

version: '2'
services:
  nginx:
    build: ./docker/nginx
    depends_on:
      - fpm
  fpm:
    build: ./docker/fpm
    depends_on:
      - mysql
  mysql:
    image: mysql:5.7
    environment:
      - MYSQL_ROOT_PASSWORD=projectx
      - MYSQL_DATABASE=projectx

But you can wait for it during the ANT task:

<!-- other targets... -->

<target name="setup db">
    <!-- wait until the 3306 TCP port in the "mysql" host is open -->
    <waitfor>
        <socket server="mysql" port="3306"/>
    </waitfor>

    <exec executable="php">
        <arg value="${consoledir}/console"/>
        <arg value="doctrine:database:create"/>
        <arg value="--no-interaction"/>
    </exec>
</target>

@kulbida That did the trick, thanks. Something a bit faster:

while ! nc -w 1 -z db 5432; do sleep 0.1; done

_depends_on_ might solve the issue.
From docker-compose documentation.
Express dependency between services, which has two effects:

  1. docker-compose up will start services in dependency order. In the following example, db and redis will be started before web.
  2. docker-compose up SERVICE will automatically include SERVICE’s dependencies. In the following example, docker-compose up web will also create and start db and redis.

version: '2'
services:
  web:
    build: .
    depends_on:
      - db
      - redis
  redis:
    image: redis
  db:
    image: postgres

@alexch: in a customer-side performance test (a micro-service routed via nginx+), the dockerized nginx test showed a dip in load from very high to near zero, repeating every 1-2 minutes. We finally decided to go with non-dockerized Nginx running in a VM (just because of the huge performance difference); maybe a network driver plugin / libnetwork issue.

@syamsathyan depends_on doesn't appear to help.

@skorokithakis, @kulbida this is a nice solution. Unfortunately, netcat isn't available by default in any of the services that I need to connect to my database (including postgres). Do you know of any alternative method?

@nottrobin I'm afraid not, I just installed it in my image :/

@nottrobin my team is working on this, will let you know in a day or two!

For those having recent bash, there is a netcat-free solution (inspired by: http://stackoverflow.com/a/19866239/1581069):

while ! timeout 1 bash -c 'cat < /dev/null > /dev/tcp/db/5432'; do sleep 0.1; done

or less verbose version:

while ! timeout 1 bash -c 'cat < /dev/null > /dev/tcp/db/5432' >/dev/null 2>/dev/null; do sleep 0.1; done

@typekpb that works perfectly. Thanks!

Now that HEALTHCHECK support is merged upstream as per https://github.com/docker/docker/pull/23218 - this can be considered to determine when a container is healthy prior to starting the next in the order. Half of the puzzle solved :)

Looks good. How do we implement it in docker-compose.yml?

The other piece of the puzzle will be having docker-compose watch for healthy containers, and use something like the depends_on syntax mentioned further up in this issue. Will require patches to docker-compose to get things working.

Also note that the health check feature in Docker is currently unreleased, so will probably need to align with a Docker/Docker Compose release cycle.
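
In the meantime, once an image defines a HEALTHCHECK, a wrapper script can already poll the status Docker reports; a rough sketch (the container name mydb is a placeholder):

until [ "$(docker inspect --format '{{.State.Health.Status}}' mydb)" = "healthy" ]; do
  sleep 1
done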

I wrote a js library that has a method .waitForPort(). Just like was mentioned before, this might not work for all situations, but it could do just fine for the majority of use cases.
See my blog.

The HEALTHCHECK merge is great news.

In the meantime, this document describes the problem and some solutions.

@pablofmorales Nope, because depends_on just checks that the container is up.

Some daemons need some extra time to bootstrap themselves and start listening to their assigned ports and addresses, most notably MySQL.

I still think a "READY_ON" declaration is the best overall. It leaves the decision about when something is ready to the container itself, regardless of image; it's an explicit opt-in, and the resource-path (within container) functionality in the Docker Remote API ensures minimal changes are needed.

The only effect this should have is on when a container is reported as "up": it will only report as "up" when the READY_ON file exists.

I think this is 90% of the behaviour that everyone's been discussing. I think "healthcheck" here is conflating two different events and trying to cram them into one. One is "ready", for the chain of events when spinning up infrastructure; the other is "health", so that infrastructure can be kept up.

"ready" is totally an appropriate place for docker to be helping out. As for "health", it's so varied in terms of systems, I think it's up to the container to deal with that.

For a better alternative to healthcheck, you might want to look at something like containerpilot, which covers not just health, but service discovery and monitoring too. https://github.com/joyent/containerpilot

Yes, this is an accurate and important distinction. However, how will containers write that file without images becoming significantly more complicated? It seems to me that it would require a wrapper script for every single container that wants to use this.

Well, you'd have to kick off a script to initialize the instance anyway... the last thing that script needs to do is touch a file. To me, that seems much easier than attempting to run an exec on a remote machine to do a health check. At least with a touch file, it can be watched, etc. entirely via the API, passively, without needing to enter the context of the container.

I agree, but many containers don't use a script, they just install a service like Postgres or Redis and let it start up without watching it.

In my case, I'm using Kong API Gateway

Before running the kong container I just check if Cassandra is working with this script:

while true; do
    CHECK=`kong-database/check`
    if [[ $CHECK =~ "system.dateof" ]]; then
        break
    fi
    sleep 1;
done

the check file contain this

#!/bin/bash
docker cp cassandra-checker kong-database:/root/
docker exec -i kong-database cqlsh -f /root/cassandra-checker

cassandra-checker is just a simple query

SELECT dateof(now()) FROM system.local ;

Sure, but the alternative is a healthcheck, which requires a script that you'd have to write anyway, so there's no difference in overhead. It's also an explicit opt-in, which means that you're stating you want this behaviour. As for something that doesn't run a script, you could always have a ready_on path check for a pid file or a unix socket, which wouldn't require a script.

That's true, you're right.

Checking for the existence of a file may be fine for a lot of cases, but forcing containers to use a startup script when they wouldn't otherwise need one is a nuisance. Why can't there also be checks for other very simple conditions? Especially useful would be waiting until the process is listening on a particular tcp port.

This idea is opt-in, so there's no forcing of anything. In fact, you're being explicit in saying what should be expected.

A TCP port listening may not be sufficient to tell when a container has been initialized, as there may be a bunch of setup data that still needs to be loaded. Hell, if you connect to a postgres container too quickly, even over TCP, you'll get an error stating that the db isn't ready yet.

If I understand you correctly, it's "opt-in, or else you can't use this feature". Ergo, if I need this feature and my app doesn't use a pid file, I'm forced to use a startup script.

For MySQL (the OP's case), once it's listening, it's ready. They go to a lot of trouble to ensure that's true, probably for cases much like this one. My take is that there is probably a short list of conditions that could be enumerated such that you could "opt-in" configuring a ready check against any of those conditions. I see no reason it has to be done one and only one way.

For mysql, once it's listening, it's not ready. In the simple one-node case it'll be ready, but if you have more than one node, then it certainly won't be ready yet. I understand what you mean by "one and only one way", but I think as a base abstraction it's just perfect. I see it more as a spot where you can apply whatever tooling you want. Heck, your script could even communicate with external services and have them verify the container, in which case your external services could signal your container agent to write the file. Flexibility ftw.

If you attempt anything in this list of "conditions" there will ALWAYS be a case where it doesn't work. However, touching a file will always work, since the image knows when it believes it's ready (oh, I have to wait on other hosts, I need files to be downloaded, I need to make sure $external_service is also available, I spun up properly but for some reason I don't have the correct permissions to the database, why is this image read-only... etc. etc.).

These sorts of scripts already exist all over the place... hell it's already been necessary to write these scripts because we haven't had functionality like this before. So dropping in a script like this is minimal, since it's likely a script already exists.

Another likely case, is that you'd have something like chef or ansible run against that host and then write the file.

If it's a question of a Docker-side check, then something like:

UPCHECK --port=7474 --interval=0.5s --response="Please log in"

For the record I think the file solution has a lot of merit, but it also introduces complexity.
80% of the time, verifying the tcp response would work just fine.

Well... I suppose:

UPCHECK --file=/tmp/container_is_ready --interval=0.5s --timeout=2m

is just the same.

I'm actually working on a re-implementation of docker-compose that adds functionality to wait for specific conditions. It uses libcompose (so I don't have to rebuild the docker interaction) and adds a bunch of config commands for this. Check it out here: https://github.com/dansteen/controlled-compose

Note that the code is finished, but I'm waiting on a couple of upstream issues to be resolved before this will be really usable.

Goss can be used as a fairly flexible shim to delay container startup; I've written a blog post explaining how this can be accomplished with a minor change to your image here:

Kubernetes has the concept of init-containers; I wonder if compose/swarm would benefit from a similar concept.

+1

I think it's better to let the service you are exposing on a container decide whether or not it is ready or capable of exposing its service.

For example, a PHP application might depend on a MySQL connection. So in the ENTRYPOINT of the PHP container, I wrote something like this.

#!/bin/bash
cat << EOF > /tmp/wait_for_mysql.php
<?php
\$connected = false;
while(!\$connected) {
    try{
        \$dbh = new PDO(
            'mysql:host=mysql;port=3306;dbname=db_name', 'db_user', 'db_pass',
            array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION)
        );
        \$connected = true;
    }
    catch(PDOException \$ex){
        error_log("Could not connect to MySQL");
        error_log(\$ex->getMessage());
        error_log("Waiting for MySQL Connection.");
        sleep(5);
    }
}
EOF
php /tmp/wait_for_mysql.php
# Rest of entry point bootstrapping

This way, I can add any logic to ensure that the dependencies of the service I am exposing (i.e. PHP) have been resolved.

Nabin Nepal wrote:

I think it's better to let the service you are exposing on a container decide whether or not it is ready or capable of exposing its service.

You can of course hardcode this behavior into every container that uses your
MySql container. But if something in your MySql service changes, then you are
changing all dependent containers, not to speak of the repetitive coding
needed in each. This is not DRY, there is no stable contract and thus it will
lead to brittle systems.

From a software craftsmanship standpoint there should be some kind of
"Container Readiness SPI", which the container developer can implement. On
the other side there should be a "Container Readiness API", which the
services can depend on.

Ulrich

@realulim I agree that any change in the MySQL container has to be replicated or propagated to all affected or linked containers.

However, if the change is about parameters like DB_HOST, DB_NAME, DB_USER and DB_PASSWORD, these could be passed as an ARG (argument) and shared by all related containers. If you are using a docker-compose.yml file, the change happens in one file.

And I totally agree that having an API to check a container's readiness would be the real way of solving this, but I still believe that the service being exposed is a better candidate to declare it.

A workaround: `until nc -z localhost 27017; do echo Waiting for MongoDB; sleep 1; done`
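A slightly more defensive variant of that one-liner, with a bounded number of retries (the host name, port and retry count are just placeholders):

#!/bin/sh
# Wait for MongoDB to accept TCP connections, but give up after 60 attempts.
HOST=mongo     # placeholder service name
PORT=27017
RETRIES=60

until nc -z "$HOST" "$PORT"; do
  RETRIES=$((RETRIES - 1))
  if [ "$RETRIES" -le 0 ]; then
    echo "MongoDB never became reachable on $HOST:$PORT" >&2
    exit 1
  fi
  echo "Waiting for MongoDB..."
  sleep 1
done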

@piotr-s-brainhub The comments above mention that having an open port does not mean that the service is ready.

Can we have an optional readiness condition which can be triggered either by logs, a port opening or a time delay? Something like:

ready_when:
  in_logs: `MySQL init process done`
  ports_open:
  - 3306

I just realized that waiting for dependency containers to become ready can easily be implemented with tools like Ansible. Has anyone used that approach? Can you easily replace docker-compose with Ansible/Chef/Puppet? Is there any project on GitHub demonstrating this approach?

Note: I understand the importance of writing a robust service that can run even when its dependencies are unavailable at the moment. That's not the question.

I solved this nowadays with a tool I wrote: https://github.com/betalo-sweden/await

It can wait until a given list of resources is available, and continue with what you want to continue, either by going to the next command implicitly or calling it explicitly.

@djui, what does await do while it is waiting for a given resource?

@derekmahar It polls. It has a default timeout of 60 seconds. Every time it can't see the resource, it will just retry in 1s intervals. Currently it doesn't do concurrent resource detection, so it's sequential, but that turned out to be good enough and can be fixed.

I use it in the following scenario:

I spin up a docker-compose infrastructure and then run an integration test driver. The driver service gets started only after all components in the infrastructure are available, using await; so await eventually calls the driver's run command.

Here's a way to do this with the new Docker HEALTHCHECK directive using make:

https://gist.github.com/mixja/1ed1314525ba4a04807303dad229f2e1

[UPDATE: updated the gist to deal with the case where the container exits with an error code, as Docker 1.12 somewhat stupidly reports the healthcheck status of a stopped container as "starting"]
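For anyone who doesn't want to dig through the gist, the core of that approach is simply polling the health status that Docker 1.12+ exposes through docker inspect; a stripped-down sketch, not the exact content of the gist (the container name and timeout are placeholders):

#!/bin/sh
# Block until a container with a HEALTHCHECK reports "healthy", or give up.
CONTAINER=myproject_db_1   # placeholder container name
TIMEOUT=120

elapsed=0
until [ "$(docker inspect -f '{{.State.Health.Status}}' "$CONTAINER")" = "healthy" ]; do
  if [ "$elapsed" -ge "$TIMEOUT" ]; then
    echo "$CONTAINER did not become healthy within ${TIMEOUT}s" >&2
    exit 1
  fi
  sleep 2
  elapsed=$((elapsed + 2))
done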

Thanks @mixja, nice solution.

@mixja, nice solution! That's exactly the functionality I would expect to come out of the box. But now the question is: if you start your containers manually, why do you need docker-compose at all?

For testing I use https://github.com/avast/docker-compose-gradle-plugin and it uses Docker healthcheck as well - no more artificial pauses, faster builds.

@korya - Docker compose is not really an orchestration tool - it is more of an environment specification and management tool. I use Make to provide procedural style orchestration over Docker Compose and Docker (and other tools as required). The combination of Make, Docker and Docker Compose is very powerful and you can achieve a lot of different scenarios with these building blocks.

@mixja well, maybe you are right. But as many people have pointed out in this thread, orchestration functionality is badly needed in test environments, and when docker-compose is in your toolbox it is very tempting to expect this kind of functionality from it.

Indeed, according to the docs, "Compose is a tool for defining and running multi-container Docker applications". Although it does not say that compose is an orchestration tool, I think that from a user's perspective (e.g. mine) it is natural to expect "a tool for defining and running multi-container Docker applications" to support basic dependency management between the managed containers out of the box.

I am not saying that the tool has to support it. All I am saying is that it is very natural to expect it. Otherwise everyone has to come up with their super smart ways to do it. In fact, we use a bash script doing something similar to what your makefile does.

@mixja @korya I would like to improve my tool await and would like to ask for your feedback: what do your Makefile versions provide that is missing/more convenient/enabling compared to await?

It seems the healthcheck+make version is a "global" view: no single container knows the global state (but the makefile does), while await is a "local" view: each enabled container knows (only) what it needs to know, similar to depends_on or links. Furthermore, you prefer to ship the container with the tools required for the healthcheck (which is sometimes the default, e.g. mysqlshow) and otherwise leave the Dockerfile untouched. Additionally, you seem to use docker-compose not mainly for the composition anymore but mainly for the flexible configuration (e.g. docker-compose up -d mysql should be equivalent to docker run -d -e ... -v ... -p ... mysql).

Hi @djui - it's probably a philosophical point of view, but I think the whole premise of the HEALTHCHECK is promoting the right behaviour - i.e. a container can provide a means of establishing container health, without any external dependencies.

This by no means detracts from the value of having something external verify connectivity; however, I would typically run a suite of acceptance tests to cover this, since you want to verify connectivity and a whole lot more (i.e. application functionality). Of course you can't generally run this level of testing until a complete environment has been established. The scope of your await tool, and of other approaches I've used in the past (Ansible playbooks wrapped in an agent container), is really focused on getting the environment setup orchestrated correctly (not the end goal of acceptance testing), and until now that was really the only approach available in a Docker world.

With Docker 1.12 we now have a means to introspect the Docker environment and the ability to use well-established constructs (i.e. bash/shell mechanisms) to "await" a certain state, of course as long as our containers have defined their own health checks. I see more value in leveraging the native capabilities of the platform and encouraging container owners to define their own health checks, rather than relying on the historical external (I've started my application process, it's no longer my problem) approach we have had to resort to.

As a related analogy, consider AWS CloudFormation and the concept of autoscaling groups and orchestrating rolling updates. How does CloudFormation know if a new instance is "healthy" and ready to go, so that we can kill an old instance and roll in another new one? Do we write an external healthcheck or do we rely on the instance itself to signal health? The answer is the latter: it means the instance owner can set whatever success criteria are required for his/her instance, and then signal to the overarching orchestration system (i.e. CloudFormation) that the instance is "healthy".

With regards to your comments about Docker Compose - it is a tool that can provide both aspects you mention. The docker-compose.yml part is the desired-state compositional environment specification, whilst the various docker-compose commands provide the ability to interact with the environment in a number of ways. For now we need external orchestration tools because fundamentally docker-compose does not perform dependency management between services well enough. As docker-compose gets features like native health check support, the goal of a single docker-compose up command will be more realistic, assuming we'll be able to specify, for example, that a service must be marked healthy before it is considered "up", which then means our dependent services effectively wait until the dependency is healthy.

@mixja Thanks for the detailed explanation. I think

I see more value in leveraging the native capabilities of the platform

is a good/the main point. I'm just waiting for Docker Compose to leverage the health checks natively, either in depends_on or via a new key such as await. I just wonder if it should/will go even a step further and basically bring down linked containers if, e.g., --abort-on-container-exit is set and a health check during runtime sets the health status to _unhealthy_.

A possible temporary workaround for those of you who are looking for delay functionality to run tests:

I have two docker-compose yml files. One is for testing and the other one is for development. The difference is just in having a sut container in docker-compose.test.yml. The sut container runs pytest. My goal was to run the test docker-compose and, if the pytest command in the sut container fails, not run the development docker-compose. Here is what I came up with:

# launch test docker-compose; note: I'm starting it with -p argument
docker-compose -f docker-compose.test.yml -p ci up --build -d
# simply get ID of sut container
tests_container_id=$(docker-compose -f docker-compose.test.yml -p ci ps -q sut)
# wait for sut container to finish (pytest will return 0 if all tests passed)
docker wait $tests_container_id
# get exit code of sut container
tests_status=$(docker-compose -f docker-compose.test.yml -p ci ps -q sut | xargs docker inspect -f '{{ .State.ExitCode  }}' | grep -v 0 | wc -l | tr -d ' ')
# print logs if tests didn't pass and return exit code
if [ $tests_status = "1" ] ; then
    docker-compose -f docker-compose.test.yml -p ci logs sut
    return 1
else
    return 0
fi

Now you can use the code above in any function of your choice (mine is called test) and do something like this:

test
test_result=$?
if [[ $test_result -eq 0 ]] ; then
    docker-compose -f docker-compose.yml up --build -d
fi

Works well for me, but I'm still looking forward to seeing docker-compose support this kind of thing natively :)
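A small side note on the snippet above: docker wait already prints the stopped container's exit code on stdout, so the inspect/grep pipeline can be collapsed if you prefer; a shortened sketch of the same idea:

# docker wait blocks until the container stops and prints its exit code,
# so the test status can be captured directly.
tests_container_id=$(docker-compose -f docker-compose.test.yml -p ci ps -q sut)
tests_status=$(docker wait "$tests_container_id")

if [ "$tests_status" -ne 0 ] ; then
    docker-compose -f docker-compose.test.yml -p ci logs sut
    return 1
else
    return 0
fi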

+1

Perhaps things that are considered outside the core of docker-compose could be supported by allowing plugins? Similar to request #1341, it seems there is additional functionality that some would find useful but that doesn't necessarily fully align with the current vision. Supporting a plugin system such as the one proposed in #3905 would let compose focus on a core set of capabilities, and those who want this for their particular use case could write a plugin to handle performing up differently.

It would be nice to have docker-compose act as the entry point to all the projects we have locally around Docker environment setup, rather than needing to put a script in front of it as the default entry point, with people having to remember to run the script in the odd cases.

Here's a way to do it with healthcheck and docker-compose 2.1+:

version: "2.1"
services:
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: password
    healthcheck:
      test: mysqladmin -uroot -ppassword ping
      interval: 2s
      timeout: 5s
      retries: 30
  web:
    image: nginx:latest # your image
    depends_on:
      db:
        condition: service_healthy

Here docker-compose up will start the web container only after the db container is considered healthy.

Sorry if it was mentioned already, but I don't think a full solution was posted.

Here's a way for PostgreSQL.

Thanks @Silex 👍

version: '2.1'
services:
  db:
    image: postgres:9.6.1
    healthcheck:
      test: "pg_isready -h localhost -p 5432 -q -U postgres"
      interval: 3s
      timeout: 5s
      retries: 5

@Silex sadly with version "3" and this format:

    image: nginx:latest # your image
    depends_on:
      db:
        condition: service_healthy

I get ERROR: The Compose file './docker-compose.yml' is invalid because: depends_on contains an invalid type, it should be an array

2.1 continues to support it and will not be deprecated. 3.x is mainly for swarm services mode (non local).

2.1 continues to support it and will not be deprecated. 3.x is mainly for swarm services mode (non local).

Thanks!

@vladikoff: more info about version 3 at https://github.com/docker/compose/issues/4305

Basically, it won't be supported, you have to make your containers fault-tolerant instead of relying on docker-compose.
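For what it's worth, "fault-tolerant" often just comes down to retrying the startup command itself; a rough sketch of an entrypoint wrapper that does this (the retry count and delay are arbitrary):

#!/bin/sh
# Retry the real application command instead of assuming its dependencies are up.
MAX_ATTEMPTS=10
DELAY=3

attempt=1
until "$@"; do
  if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
    echo "Command failed after $MAX_ATTEMPTS attempts: $*" >&2
    exit 1
  fi
  echo "Attempt $attempt failed, retrying in ${DELAY}s..." >&2
  attempt=$((attempt + 1))
  sleep "$DELAY"
done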

I believe this can be closed now.

Unfortunately, condition is not supported anymore in v3. Here is a workaround that I've found:

website:
    depends_on:
      - 'postgres'
    build: .
    ports:
      - '3000'
    volumes:
      - '.:/news_app'
      - 'bundle_data:/bundle'
    entrypoint: ./wait-for-postgres.sh postgres 5432

postgres:
    image: 'postgres:9.6.2'
    ports:
      - '5432'

wait-for-postgres.sh:

#!/bin/sh

postgres_host=$1
postgres_port=$2
shift 2
cmd="$@"

# wait for the postgres docker to be running
while ! pg_isready -h $postgres_host -p $postgres_port -q -U postgres; do
  >&2 echo "Postgres is unavailable - sleeping"
  sleep 1
done

>&2 echo "Postgres is up - executing command"

# run the command
exec $cmd

@slava-nikulin a custom entrypoint is a common practice; it is almost the only (Docker-native) way you can define and check all the conditions you need before starting your app in a container.

The truth is there was a lot of debate, and I think the 2.x conditional support that natively integrates with health checks and orders startup was much needed. Docker does not support a local pod of containers natively, and when it does it will have to support something similar again, just as Kubernetes, for example, provides those semantics.

Compose 3.x is a series that brings swarm support into compose, and hence a bunch of options have been dropped with the distributed nature in mind.

The 2.x series preserves the original compose/local topology features.

Docker has to figure out how to merge these two versions, because forcing swarm onto compose by reducing compose's feature set is not a welcome direction.

I was able to do something like this
// start.sh

#!/bin/sh
set -eu

docker volume create --name=gql-sync
echo "Building docker containers"
docker-compose build
echo "Running tests inside docker container"
docker-compose up -d pubsub
docker-compose up -d mongo
docker-compose up -d botms
docker-compose up -d events
docker-compose up -d identity
docker-compose up -d importer
docker-compose run status
docker-compose run testing

exit $?

// status.sh

#!/bin/sh

set -eu

echo "Attempting to connect to bots"
until nc -zv botms 3000; do
    printf '.'
    sleep 5
done
echo "Attempting to connect to events"
until nc -zv events 3000; do
    printf '.'
    sleep 5
done
echo "Attempting to connect to identity"
until nc -zv identity 3000; do
    printf '.'
    sleep 5
done
echo "Attempting to connect to importer"
until nc -zv importer 8080; do
    printf '.'
    sleep 5
done
echo "Was able to connect to all"

exit 0

// in my docker compose file

  status:
    image: yikaus/alpine-bash
    volumes:
      - "./internals/scripts:/scripts"
    command: "sh /scripts/status.sh"
    depends_on:
      - "mongo"
      - "importer"
      - "events"
      - "identity"
      - "botms"

I have a similar problem but a bit different. I have to wait for MongoDB to start and initialize a replica set.
I'm doing the whole procedure in Docker, i.e. creating and authenticating the replica set. But I have another Python script in which I have to connect to the primary node of the replica set, and I'm getting an error there.

docker-compose.txt
Dockerfile.txt
and in the Python script I'm trying to do something like this:

for x in range(1, 4):
    client = MongoClient(host='node' + str(x), port=27017, username='admin', password='password')
    if client.is_primary:
        print('the client.address is: ' + str(client.address))
        print(dbName)
        print(collectionName)
        break

I'm having difficulty doing so; does anyone have any ideas?
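One way around this is to block in a small shell script until the replica set has elected a primary before launching the Python script; a rough sketch, assuming the legacy mongo shell is available in the container and reusing the host names and credentials from the snippet above:

#!/bin/sh
# Wait until one of the replica set members reports itself as primary.
until mongo --host node1 --port 27017 -u admin -p password \
      --authenticationDatabase admin --quiet \
      --eval 'db.isMaster().ismaster' | grep -q true; do
  echo "Waiting for the replica set to elect a primary..."
  sleep 5
done

exec python /app/script.py   # placeholder path for the Python script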

@patrickml If I don't use docker-compose, how do I do it with a Dockerfile?
I need 'cqlsh' to execute my build_all.cql. However, 'cqlsh' is not ready... I have to wait about 60 seconds for it to be ready.

cat Dockerfile

FROM store/datastax/dse-server:5.1.8

USER root

RUN apt-get update
RUN apt-get install -y vim

ADD db-scripts-2.1.33.2-RFT-01.tar /docker/cms/
COPY entrypoint.sh /entrypoint.sh

WORKDIR /docker/cms/db-scripts-2.1.33.2/
RUN cqlsh -f build_all.cql

USER dse

=============

Step 8/9 : RUN cqlsh -f build_all.cql
---> Running in 08c8a854ebf4
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
The command '/bin/sh -c cqlsh -f build_all.cql' returned a non-zero code: 1
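The RUN step fails because nothing is listening during the image build; cqlsh can only succeed once the DSE server is actually running, i.e. at container start. A rough sketch of what entrypoint.sh could do instead (the server start command and the marker file are assumptions about the base image, not verified):

#!/bin/sh
# Start DSE in the background, wait until cqlsh can connect,
# load the schema once, then stay attached to the server process.
MARKER=/docker/cms/.schema_loaded    # hypothetical marker so the CQL only runs once

dse cassandra -f &    # assumption: foreground server, backgrounded by the shell

until cqlsh -e 'DESCRIBE CLUSTER' > /dev/null 2>&1; do
  echo "Waiting for Cassandra to accept CQL connections..."
  sleep 5
done

if [ ! -f "$MARKER" ]; then
  cqlsh -f /docker/cms/db-scripts-2.1.33.2/build_all.cql
  touch "$MARKER"
fi

wait    # keep the container alive as long as the server runs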

