Gunicorn: Gunicorn process fails to stop by Supervisor's stop command and keeps locking ports

Created on 26 Apr 2013 · 17 Comments · Source: benoitc/gunicorn

I am having an issue very similar to #291. Application processes (gunicorn based) are managed by supervisor, and after a fresh deploy is done, the supervised processes are ordered to restart.

In most cases this works as expected, but for one specific application it always fails and prevents the application from restarting properly. The old gunicorn processes hang, blocking the ports for the new ones. After a few minutes they finally die, but this is still very inconvenient since it makes the app unavailable for too long.

The supervisor config for the app is as follows:

[program:iw2_admin]
command=/srv/fragaria/iw2/bin/gunicorn --name=gunicorn_iw2_admin --bind=10.0.0.50:13000 --workers=2 --max-requests=5000 --timeout=500 --user=www-data --group=www-data --worker-class sync --worker-connections 1000  iw2.wsgi:application
environment=DJANGO_SETTINGS_MODULE='iw2.admin.settings',LANG='cs_CZ.utf8',LC_ALL='cs_CZ.UTF-8',LC_LANG='cs_CZ.UTF-8'
redirect_stderr=True
stdout_logfile=/var/log/supervisor/iw2_admin.log

Gunicorn's log doesn't show anything interesting except this:

2013-04-26 11:35:16 [30352] [INFO] Starting gunicorn 0.15.0
2013-04-26 11:35:16 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:16 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:17 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:17 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:18 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:18 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:19 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:19 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:20 [30352] [INFO] Listening at: http://10.0.0.50:13000 (30352)
2013-04-26 11:35:20 [30352] [INFO] Using worker: sync
2013-04-26 11:35:20 [30355] [INFO] Booting worker with pid: 30355
2013-04-26 11:35:20 [30356] [INFO] Booting worker with pid: 30356

The ERRORs are repeated for quite a while as mentioned above...

Env:
Python 2.6.6
Debian squeeze
Gunicorn 0.15.0

I might try to fix it by using a newer version of gunicorn. Do you think that might be the solution? I don't want to risk upgrading if it won't help anyway.

xaralis

All 17 comments

Any progress on this?

xaralis on 29 Apr 2013

Same issue. Bump!

a2 on 8 May 2013

@a2 what is your command line and version?

@xaralis yes, using the 0.17.x version is always better. It is the version that is actually supported. Anyway, what happens if you add the setting stopsignal = QUIT to your program section?

benoit

benoitc on 8 May 2013

@benoitc

Ubuntu 12.04.2 LTS
gunicorn 0.17.4
Python 2.7.3

supervisor config:

[program:gunicorn]
command=/srv/example.com/www/start.sh
process_name=%(program_name)s
directory=/srv/example.com/www
user=web
autostart=true
autorestart=true
redirect_stderr=true
stopsignal=KILL

[program:watchmedo]
command=/usr/local/bin/watchmedo shell-command --patterns "*.py;*.txt;*.scss" --recursive --command='/usr/local/bin/supervisorctl restart gunicorn' /srv/example.com/www
process_name=%(program_name)s
directory=/srv/example.com/www
autostart=true
autorestart=true
redirect_stderr=true

start.sh:

#!/bin/bash
/usr/local/bin/compass compile --boring --trace
source venv/bin/activate
pip install -r requirements.txt
if [ -e "env.sh" ]
then
    source env.sh
fi
gunicorn app:app -c gunicorn.conf.py

I've gathered that the problem is that some of the worker processes are still using port 7200 but aren't killed by supervisor when the process restarts? I really have no idea. I'm sort of a noob but I'm trying to learn quickly.

Thanks so much for your speedy response, Benoit.

a2 on 8 May 2013

@a2 can you replace the line stopsignal=KILL with stopsignal=QUIT in your config and let me know the results?

benoitc on 8 May 2013

@benoitc If I change that line it makes no difference. If I touch a file monitored by watchmedo, then I get the same errors because the gunicorn process was restarted. I have to stop supervisord, pkill gunicorn, and then start supervisord to stop the errors.

a2 on 8 May 2013

It sounds like you have long-lived connections, and the high timeout combined with graceful restart is causing workers to exit slowly.

Try TERM or INT instead of QUIT.

tilgovi on 11 May 2013

@tilgovi Same result.

a2 on 11 May 2013

What do you mean by restarting supervisor? Sending a HUP? In that case, I remember there is a setting to send a HUP signal on reload. Or maybe this is just in gaffer.

Also, if you can, I would use systemd, which can pass a socket to gunicorn in the latest version.

benoit

benoitc on 15 May 2013

Signals have been switched in 81241907ffcf94517ffa14b8427205906b61b540. Closing this issue, thanks for the feedback!

benoitc on 9 Mar 2014

@a2 it's because your bash script (via bash shell process) gets supervised, not gunicorn. Try using exec gunicorn ... in your script

fillest on 7 Jul 2014

👍14 🎉5 ❤1
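
To illustrate why exec matters here, the following is a minimal sketch, assuming bash is available. It uses a nested bash as a stand-in for gunicorn (an assumption made only for the demonstration) and compares the PID of the wrapper shell with the PID of the command it launches. The variable names plain and replaced are hypothetical.

```shell
#!/bin/bash
# Each case prints two PIDs: the wrapper shell's, then the launched command's.

# Case 1: plain invocation. The wrapper forks a child, so the PIDs differ.
# A supervisor that started the wrapper only knows the wrapper's PID.
# The trailing ':' stops bash from silently exec-ing its last command.
plain=$(bash -c 'echo $$; bash -c "echo \$\$"; :')

# Case 2: exec. The command replaces the wrapper process, so the PID is
# unchanged and signals sent to that PID reach the real command.
replaced=$(bash -c 'echo $$; exec bash -c "echo \$\$"')

echo "plain:    $(echo $plain)"
echo "replaced: $(echo $replaced)"
```

In a start script like the one posted earlier in this thread, the equivalent change would be to make the last line exec gunicorn app:app -c gunicorn.conf.py, so that the PID supervisor tracks is gunicorn's own.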

I know this is an old thread, but for anyone else who has landed here and is using make + gunicorn + supervisor, the above comment is the solution. Supervisor needs to run the gunicorn command directly to be able to kill the process; if it runs a make command that in turn runs gunicorn, gunicorn will not be killed. Something to do with make having its own shell, maybe.
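
The failure mode discussed in this thread can be reproduced without gunicorn, make or supervisor. The sketch below is only an illustration under stated assumptions: sleep stands in for the server, a bash -c wrapper stands in for the start script or make recipe, and a direct kill stands in for what supervisor does with stopsignal=KILL. The names pidfile, wrapper_pid, child_pid and orphan_alive are hypothetical.

```shell
#!/bin/bash
pidfile=$(mktemp)

# Wrapper shell: starts a long-lived child (stand-in for gunicorn), records
# the child's PID in the temp file, then waits on it like a start script would.
bash -c 'sleep 30 & echo $! > "$0"; wait' "$pidfile" &
wrapper_pid=$!

# Wait until the wrapper has recorded the child's PID.
while [ ! -s "$pidfile" ]; do sleep 0.1; done
child_pid=$(cat "$pidfile")

# Signal only the wrapper, the one PID a supervisor would know about.
kill -KILL "$wrapper_pid"
wait "$wrapper_pid" 2>/dev/null || true

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$child_pid" 2>/dev/null; then
    orphan_alive=yes
else
    orphan_alive=no
fi
echo "child alive after wrapper was killed: $orphan_alive"

# Clean up the stand-in process and the temporary file.
kill "$child_pid" 2>/dev/null
rm -f "$pidfile"
```

If the child survives, as it should here, it keeps whatever it had open. For a real server that includes the listening socket, which would be consistent with the "Connection in use" retries in the log at the top of this issue.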