Gunicorn: Clarify what/how timeout and graceful_timeout work

Created on 3 Apr 2017  ·  30 comments  ·  Source: benoitc/gunicorn

(Sorry for the monologue here: simple things got complicated and I ended up digging through the stack. Hopefully what I've documented is helpful for the reader, however.)

As I've understood, by default:

  • After 30 seconds (configurable with timeout) of request processing, the gunicorn master process sends SIGTERM to the worker process, to initiate a graceful restart.
  • If the worker does not shut down within another 30 seconds (configurable with graceful_timeout), the master process sends SIGKILL. Seems like this signal is also sent when the worker _does_ gracefully shut down during the graceful_timeout period (https://github.com/benoitc/gunicorn/commit/d1a09732256fa8db900a1fe75a71466cf2645ef9). A minimal sketch of both settings follows below.
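For reference, this is the configuration I mean (a minimal sketch, values purely illustrative - whether they behave as I've described above is exactly what I'm asking):

# gunicorn.conf.py - minimal sketch, values purely illustrative
# run with: gunicorn -c gunicorn.conf.py app:app

workers = 1
worker_class = "sync"

# Seconds of silence from a worker before the master considers it dead/stuck.
timeout = 30

# Extra seconds a worker gets to finish up during a graceful stop/restart
# before it is force-killed.
graceful_timeout = 30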

The questions:

  • Are the signals correct?
  • What actually happens when a gunicorn (sync) worker receives these signals? How does it tell the WSGI app that the signal was caught and that something should happen (OK, I assume it just "passes it on")?
  • How does, for example, Flask handle the SIGTERM signal - in practice, what happens during request processing? Does it just set a flag for the WSGI application (at the Werkzeug level) so that it shuts down after the current request completes? Or does SIGTERM already somehow affect ongoing request processing - killing I/O connections or something to speed request processing up...?

On SIGKILL, I guess request processing is just forcefully aborted.

I could file a tiny PR to improve the docs about this, once I understand how things actually work.

Labels: Discussion, Documentation

Most helpful comment

@tuukkamustonen --timeout is not meant as a request timeout. It's meant as a liveness check for workers. For sync workers, this functions as a request timeout because the worker cannot do anything other than process the request. The asynchronous workers heartbeat even while they are handling long running requests, so unless the worker blocks/freezes it won't be killed.

Maybe it would be a good idea for us to change the name if other people find this confusing.

All 30 comments

Hmm, I think https://github.com/benoitc/gunicorn/issues/1236#issuecomment-254059927 confirms my assumption about SIGTERM: it simply tells the worker to shut down after request processing is complete (and to stop accepting any new connections).

Seems like my interpretation of timeout and graceful_timeout was wrong. Both periods actually count from the start of request processing. So, by default, because both settings are 30 seconds, there is effectively no graceful restart. If I do something like --graceful-timeout 15 --timeout 30, that should mean a graceful restart is initiated at 15 seconds and the worker is force-killed at 30 seconds if the request did not complete before that.

However, it seems that if the response is returned between graceful_timeout and timeout, the worker is not restarted after all? Shouldn't it be?

I tested with this app.py:

import time
from flask import Flask

app = Flask(__name__)

@app.route('/foo')
def foo():
    time.sleep(3)
    return 'ok'

Then:

12:51 $ gunicorn app:app --timeout 5 --graceful-timeout 1
[2017-04-03 12:51:37 +0300] [356] [INFO] Starting gunicorn 19.6.0
[2017-04-03 12:51:37 +0300] [356] [INFO] Listening at: http://127.0.0.1:8000 (356)
[2017-04-03 12:51:37 +0300] [356] [INFO] Using worker: sync
[2017-04-03 12:51:37 +0300] [359] [INFO] Booting worker with pid: 359

Then I send curl localhost:8000/foo, which returns after 3 seconds. But nothing happens in gunicorn - I see no trace of a graceful restart being initiated or having happened?

It seems that on timeout, SystemExit(1,) is raised, aborting the current request processing in Flask. What code or signal generates it, I can't say.

This exception gets raised through the Flask stack, and any teardown_request handlers catch it. There's enough time to log something, but if you do time.sleep(1) or something else time-consuming in the handler, it gets silently killed. It's as if there were 100-200 ms before the process is actually forcefully terminated, and I'm wondering where this delay comes from. It's not the graceful timeout - that setting has no impact on the delay. I would expect the process to just be forcefully killed in place, rather than seeing SystemExit raised through the stack but then potentially having the process killed mid-flight anyway.
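The teardown handler I used to observe this was roughly the following (a sketch - the handler name and log message are mine, not anything gunicorn provides):

import logging
import sys

from flask import Flask

app = Flask(__name__)
log = logging.getLogger(__name__)

@app.teardown_request
def log_teardown(exc):
    # Depending on the Flask version, exc may be None here because
    # SystemExit is not a subclass of Exception, but sys.exc_info()
    # still shows what is propagating while the stack unwinds.
    log.warning("teardown_request: exc=%r exc_info=%r", exc, sys.exc_info())
    # Anything slow here (e.g. time.sleep(1)) never finishes - the worker
    # gets killed shortly after the exception is raised.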

In fact, I don't see graceful_timeout doing anything - maybe it's not supported for sync workers, or maybe it doesn't work "stand-alone" (or together with timeout) - only when you manually send SIGTERM?

Also, what might be weird is that https://github.com/benoitc/gunicorn/blob/master/gunicorn/arbiter.py#L392 does not check the graceful flag at all. I guess https://github.com/benoitc/gunicorn/blob/master/gunicorn/arbiter.py#L390 ensures that self.WORKERS is empty, so the graceful timeout is not waited for when doing a non-graceful stop.
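For my own understanding, the stop() logic reads roughly like this (my simplified paraphrase of what I see in arbiter.py, not the actual source):

import signal
import time

def stop(self, graceful=True):
    # Paraphrase: close listeners so no new connections are accepted,
    # then ask workers to exit (TERM if graceful, QUIT otherwise),
    # wait up to graceful_timeout for them, and finally SIGKILL the rest.
    self.LISTENERS = []
    sig = signal.SIGTERM if graceful else signal.SIGQUIT
    limit = time.time() + self.cfg.graceful_timeout
    self.kill_workers(sig)
    while self.WORKERS and time.time() < limit:
        time.sleep(0.1)
    self.kill_workers(signal.SIGKILL)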

@benoitc @tilgovi Care to give a hand here? Hopefully my writings above make sense...

@tuco86 The graceful timeout is only applied when you quit the arbiter, upgrade it (USR2), send a HUP signal to the arbiter, or send a QUIT signal to the worker. I.e. it's only used when the action is a normal one.

The timeout is there to prevent busy workers from blocking other requests. If a worker doesn't notify the arbiter within the timeout, it is simply exited and the connection with the client is closed.
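Roughly, that liveness check can be pictured like this (a simplified illustration, not the actual gunicorn code - in reality each worker keeps refreshing a per-worker temp file and the arbiter checks how stale it is):

import os
import time

def handle_one_request():
    time.sleep(1)                        # placeholder for real work

def worker_loop(heartbeat_path):
    # Worker side: refresh the timestamp of a per-worker file on every
    # iteration of its main loop, then go handle a request (which, for a
    # sync worker, may block for a long time).
    while True:
        os.utime(heartbeat_path, None)   # "I'm still alive"
        handle_one_request()

def find_stale_workers(workers, timeout):
    # Arbiter side: any worker whose heartbeat file has not been touched
    # within `timeout` seconds is assumed dead or stuck and gets killed.
    now = time.time()
    return [pid for pid, path in workers.items()
            if now - os.path.getmtime(path) > timeout]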

Uh, alright. Does timeout have any effect when you:

quit the arbiter, upgrade it (USR2), send a HUP signal to the arbiter, or send a QUIT signal to the worker

I mean, what if the worker doesn't shut down within graceful_timeout - will timeout kick in after that and the worker get forcefully killed, or is it left up to the user to send SIGQUIT in case workers don't die gracefully?

QUIT signal to the worker

I assume you meant TERM here (as QUIT is documented as _quick shutdown_ for both master and workers)?

If the worker doesn't shut down within the graceful timeout, it will be killed without any further delay.

Of course. Thanks for clearing things up!

@benoitc Asking in the context of this old ticket - what does the last sentence in the timeout documentation actually mean?

Generally set to thirty seconds. Only set this noticeably higher if you’re sure of the repercussions for sync workers. For the non sync workers it just means that the worker process is still communicating and is not tied to the length of time required to handle a single request.

Not being a native English speaker, I have a hard time understanding this. Does it mean that timeout is not supported for non-sync workers (because that's what I seem to be witnessing: I'm using gthread workers and timeout is not kicking in and killing requests that are too slow)?

@tuukkamustonen --timeout is not meant as a request timeout. It's meant as a liveness check for workers. For sync workers, this functions as a request timeout because the worker cannot do anything other than process the request. The asynchronous workers heartbeat even while they are handling long running requests, so unless the worker blocks/freezes it won't be killed.

Maybe it would be a good idea for us to change the name if other people find this confusing.

@tilgovi timeout is just fine, although something like worker_timeout might be more descriptive. I just got initially confused because timeout and graceful_timeout are declared next to each other in the documentation, so my brain assumed they are tightly connected, while they are actually not.

For sync workers, this functions as a request timeout because the worker cannot do anything other than process the request. The asynchronous workers heartbeat even while they are handling long running requests, so unless the worker blocks/freezes it won't be killed.

Would you have an example of when timeout kicks in with non-sync workers? Is it something that should never happen, really - maybe only if there's a bug that causes the worker to block/freeze?

That's correct. An asynchronous worker that relies on an event loop core might perform a CPU intensive procedure that doesn't yield within the timeout.

Not only a bug, in other words. Although sometimes it may indicate a bug, such as a call to a blocking I/O function when an asyncio protocol would be more appropriate.
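A contrived illustration of the difference (plain asyncio, nothing gunicorn-specific):

import asyncio
import time

async def well_behaved():
    # Yields control back to the event loop, so a worker built on that
    # loop can keep heartbeating and never trips --timeout.
    await asyncio.sleep(60)

async def blocks_the_loop():
    # A blocking call inside a coroutine stalls the whole event loop -
    # including the worker's heartbeat - for 60 seconds.
    time.sleep(60)

async def cpu_hog():
    # CPU-bound work that never awaits has the same effect.
    return sum(i * i for i in range(10**8))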

Getting stuck in a CPU-intensive task is a good example, thanks.

Calling blocking I/O in async code is one as well, but I'm not sure how it applies in this context - I'm running a traditional Flask app with blocking code, but running it with an async worker (gthread) without any sort of monkey patching. And it works just fine. I know this isn't really within the scope of this ticket anymore, but doesn't mixing and matching async/sync code like this cause problems?

Also, what is the heartbeat interval? What would be a sane value to use for timeout with non-sync workers?

The gthread worker is not asynchronous, but it does have a main thread for the heartbeat, so it won't time out either. In the case of that worker, you probably won't see a timeout unless the worker is very overloaded or, more likely, you call a C extension module that doesn't release the GIL.

You probably don't have to change the timeout unless you start seeing worker timeouts.
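In other words, something like this toy example with plain threads (not gunicorn's actual code): the main thread keeps "heartbeating" while a request thread sleeps or does blocking I/O, because those calls release the GIL; a C extension that holds the GIL for the whole call would starve the main thread instead, and the arbiter would eventually time the worker out.

import threading
import time

def heartbeat():
    # Stand-in for the gthread worker's main thread notifying the arbiter.
    for _ in range(5):
        print("heartbeat")
        time.sleep(1)

def slow_request():
    # Stand-in for a long request handled in a worker thread. time.sleep()
    # (like most blocking I/O) releases the GIL, so the heartbeat keeps
    # printing while this runs.
    time.sleep(5)
    print("request done")

threading.Thread(target=slow_request).start()
heartbeat()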

Alright. Just one more thing:

The gthread worker is not asynchronous

It may be a little bit confusing that the gthread worker is not async but is listed under "AsyncIO Workers" at http://docs.gunicorn.org/en/stable/design.html#asyncio-workers. Besides that, using "threads" doesn't require asyncio, so that also raises questions for the reader. Just saying this from the perspective of a naive user - I'm sure it's all technically well-founded.

In a nutshell, the gthread worker is implemented with the asyncio lib but it spawns threads to handle sync code. Correct me if I'm wrong.

Glad you asked!

The threaded worker does not use asyncio and does not inherit from the base asynchronous worker class.

We should clarify the documentation. I think it may have been listed as async because the worker timeout is handled concurrently, making it behave more like the async workers than the sync worker with respect to its ability to handle long requests and concurrent requests.

It would be great to clarify the documentation and make it more accurately describe all the workers.

yeah, the gthread worker shouldn't be listed under the asyncio workers. maybe having a section that describes the design of each worker would be better?

Re-opening this so we can track it as work to clarify the section on worker types and timeouts.

@tilgovi

--timeout is not meant as a request timeout. It's meant as a liveness check for workers. For sync workers, this functions as a request timeout because the worker cannot do anything other than process the request. The asynchronous workers heartbeat even while they are handling long running requests, so unless the worker blocks/freezes it won't be killed.

Is there a request timeout option available for async workers? In other words, how do I make the arbiter kill a worker that did not process a request within a specified time?

@aschatten there is not, unfortunately. See also #1658.

kill a worker that did not process a request within a specified time

As a worker may be processing multiple requests concurrently, killing the whole worker because one request times out sounds pretty extreme. Wouldn't that result in all the other requests getting killed in vain?

I recall uWSGI was planning to introduce thread-based killing in 2.1 or so, though even that probably applies to sync/threaded workers only (and my recollection on this is vague).

As a worker may be processing multiple requests concurrently, killing the whole worker because one request times out sounds pretty extreme. Wouldn't that result in all the other requests getting killed in vain?

The approach could be the same as for max_requests, where there is a separate implementation for every worker type.

We are working on a release this week, at which point it _may_ be time to branch for R20, where we plan to tackle a few major things. That might be the right time to make the current timeout into a proper request timeout for every worker type.

Commenting here instead of filing a separate issue since I'm trying to understand how timeout is supposed to work and I'm not sure whether this is a bug or not.

The behaviour I'm seeing, which is IMO unexpected, is this:

Every max-requests'th request (the one after which the worker will be restarted) hits the timeout, whereas the other requests complete successfully. In the example below, 4 requests are performed; requests 1, 2, and 4 succeed, whereas request 3 fails.

Relevant configuration:

  • gthread worker
  • serving a request takes longer than timeout
  • max-requests is non-zero

import time

def app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain; charset=utf-8')])
    time.sleep(5)
    return [b"Hello World\n"]

gunicorn:

gunicorn --log-level debug -k gthread -t 4 --max-requests 3 "app:app"
...
[2018-02-08 10:11:59 +0200] [28592] [INFO] Starting gunicorn 19.7.1
[2018-02-08 10:11:59 +0200] [28592] [DEBUG] Arbiter booted
[2018-02-08 10:11:59 +0200] [28592] [INFO] Listening at: http://127.0.0.1:8000 (28592)
[2018-02-08 10:11:59 +0200] [28592] [INFO] Using worker: gthread
[2018-02-08 10:11:59 +0200] [28595] [INFO] Booting worker with pid: 28595
[2018-02-08 10:11:59 +0200] [28592] [DEBUG] 1 workers
[2018-02-08 10:12:06 +0200] [28595] [DEBUG] GET /
[2018-02-08 10:12:11 +0200] [28595] [DEBUG] Closing connection.
[2018-02-08 10:12:15 +0200] [28595] [DEBUG] GET /
[2018-02-08 10:12:20 +0200] [28595] [DEBUG] Closing connection.
[2018-02-08 10:12:23 +0200] [28595] [DEBUG] GET /
[2018-02-08 10:12:23 +0200] [28595] [INFO] Autorestarting worker after current request.
[2018-02-08 10:12:27 +0200] [28592] [CRITICAL] WORKER TIMEOUT (pid:28595)
[2018-02-08 10:12:27 +0200] [28595] [INFO] Worker exiting (pid: 28595)
[2018-02-08 10:12:28 +0200] [28595] [DEBUG] Closing connection.
[2018-02-08 10:12:28 +0200] [28599] [INFO] Booting worker with pid: 28599
[2018-02-08 10:12:32 +0200] [28599] [DEBUG] GET /
[2018-02-08 10:12:37 +0200] [28599] [DEBUG] Closing connection.
^C[2018-02-08 10:12:39 +0200] [28592] [INFO] Handling signal: int

Client:

[salonen@mac ~]$ curl http://127.0.0.1:8000
Hello World
[salonen@mac ~]$ curl http://127.0.0.1:8000
Hello World
[salonen@mac ~]$ curl http://127.0.0.1:8000
curl: (52) Empty reply from server
[salonen@mac ~]$ curl http://127.0.0.1:8000
Hello World

what should be the plan here? I have the following in mind:

  • [ ] update worker description (if still needed)
  • [ ] document the protocol to detect dead or blocked workers

Should it be in 20.0, or could we postpone it?

postponing.

Hey, so this won't be part of 20.0?

That might be the right time to make the current timeout into a proper request timeout for every worker type.

clarified. @lucas03 it's unclear what a request timeout means there. please open a ticket if you need something specific.
