Celery: Can preforked worker processes ignore TERM signals?

Created on 4 Jun 2018  ·  3 Comments  ·  Source: celery/celery

General

Affected version: latest (4.1.1)

Expected behavior

Preforked worker processes ignore the SIGTERM signal; only the parent process, which is responsible for the warm shutdown, reacts to it and gracefully shuts down the child processes.

Current behavior

When a preforked worker process receives SIGTERM (e.g. via kill <pid>), it shuts down immediately:

celery_1        | [2018-06-04 09:19:57,376: ERROR/MainProcess] Process 'ForkPoolWorker-2' pid:26 exited with 'signal 15 (SIGTERM)'

When SIGTERM reaches the parent worker process, a warm shutdown is performed. But if the preforked workers have also received a TERM signal in the meantime, their running tasks are simply killed.

Relevance

I cannot find a way to shut down Celery properly in a Docker container with an init system
(see mailing list: https://groups.google.com/forum/#!topic/celery-users/9UF_VyzRt8Q). It appears that I have to make sure the TERM signal reaches only the parent process and not any of the preforked workers. This seems very difficult when a bash script starts several Celery instances (e.g. beat and two queue workers).

Possible solution

Would it be possible to add a feature that allows preforked worker processes to ignore SIGTERM?

Deployment Prefork Workers Pool

All 3 comments

In order to have proper signal propagation when starting multiple processes inside a Docker container, you probably need a process manager such as supervisord.

For more information, here's an overview of how Docker stops containers: https://www.ctl.io/developers/blog/post/gracefully-stopping-docker-containers/#docker-stop
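
For example, a minimal supervisord.conf could look something like the sketch below (the proj module, queue names and paths are placeholders). Note that by default supervisord sends the stop signal only to the process it started, i.e. the Celery parent, which is exactly the behavior you are after:

    ; minimal sketch of a supervisord.conf for a container running beat and two queue workers
    [supervisord]
    nodaemon=true                    ; stay in the foreground so supervisord is the container's main process

    [program:worker-queue1]
    command=celery -A proj worker -Q queue1 --loglevel=info
    stopsignal=TERM                  ; sent to the parent worker only -> warm shutdown
    stopwaitsecs=600                 ; time allowed for running tasks to finish before a hard kill

    [program:worker-queue2]
    command=celery -A proj worker -Q queue2 --loglevel=info
    stopsignal=TERM
    stopwaitsecs=600

    [program:beat]
    command=celery -A proj beat --loglevel=info
    stopsignal=TERM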

Also, if you're already using Docker, I would recommend running one worker per container.
It's easier to manage, and containers do not add much overhead. It's also easier to do auto-scaling with tools like k8s.
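
As a rough illustration of that layout (the proj module, image name and queue names below are made up), a docker-compose file could look like:

    # illustrative docker-compose.yml; one Celery process per container
    version: "3"
    services:
      worker-queue1:
        image: myapp
        command: celery -A proj worker -Q queue1 --loglevel=info
        stop_grace_period: 10m    # time Docker allows for the warm shutdown before sending SIGKILL
      worker-queue2:
        image: myapp
        command: celery -A proj worker -Q queue2 --loglevel=info
        stop_grace_period: 10m
      beat:
        image: myapp
        command: celery -A proj beat --loglevel=info

With the command written like this, the Celery parent process is PID 1 in its container, so docker stop delivers SIGTERM straight to it and it performs a warm shutdown of its preforked children.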

One last thing: if you really want to roll your own bash script to manage multiple workers, you would have to fire up each worker and then sleep indefinitely until a SIGTERM comes along.
When that happens, you can get each worker's PID and gracefully stop each one,
which is basically what supervisord does, as @georgepsarakis pointed out.
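
A rough sketch of that approach (untested; proj and the queue names are placeholders):

    #!/usr/bin/env bash
    # Start each Celery process in the background and remember the parent PIDs.
    celery -A proj beat --loglevel=info &
    celery -A proj worker -Q queue1 --loglevel=info &
    celery -A proj worker -Q queue2 --loglevel=info &
    PIDS=$(jobs -p)

    # On TERM/INT, forward the signal to the Celery parent processes only;
    # each parent then performs the warm shutdown of its preforked children.
    term_handler() {
        kill -TERM $PIDS 2>/dev/null
    }
    trap term_handler TERM INT

    # Block until a signal is trapped (the first wait returns at that point),
    # then wait again so the workers get to finish their warm shutdown.
    wait
    wait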

Thanks @georgepsarakis and @xirdneh for your helpful suggestions. I managed to get a working setup with supervisord! 🎉
Having one worker per container is a very good suggestion, too! Thanks a lot!
