Gunicorn: Sockets

Created on 5 Nov 2018  ·  87 Comments  ·  Source: benoitc/gunicorn

The service is running in a Kubernetes pod, and out of nowhere, without any specific cause, this happens on and off:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
[2018-11-04 17:57:55 +0330] [31] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
Investigation help wanted

Most helpful comment

I sent in a pull request with a fix that I've been running in production for a few days now. The bug is caused by a race condition, where the socket can be closed (by either client, OS, etc.) before getpeername() is called, so it correctly raises an exception. However, gunicorn isn't catching this exception, so it's bubbling up and crashing the server. My fix is just to catch the exception and raise it as a NoMoreData exception, which is already handled by other code further up the stack. This prevents a poorly-timed disconnection from crashing the server.

All 87 comments

Is there anything interesting upstream of gunicorn in your pod, like a reverse proxy or nginx?

No, no proxies, no nginx

I'm having the exact same problem :confused:

Upstream we have HAProxy and on its HTTP log format, the session state at disconnection (see http://cbonte.github.io/haproxy-dconv/1.8/configuration.html#8.5) it logs those errors as CH-- meaning:

  • C : the TCP session was unexpectedly aborted by the client.
  • H : the proxy was waiting for complete, valid response HEADERS from the server (HTTP only).

So, if I understand that correctly, the client closed the connection while gunicorn was still sending the response.

Any clues what makes the client abort? Was it waiting a long time for gunicorn to send complete response headers?

@javabrett it does not seem like that, at least on the few log messages I looked up, it is mostly images or other assets, so it should not be taking much time.

The client might have closed the browser or any other action that would abruptly close the connection? :thinking:

@gforcada are you using the proxy protocol with haproxy?

@benoitc not that I'm aware of

anyone get anywhere with this? I don't have much to contribute except the exact same error.

My configuration consists of a load balancer that's being used to terminate SSL and forward requests to a Django app running in a Docker container. I'm not sure what the LB is implemented with - it's a Digital Ocean product.

I'm fairly certain it's related to the load balancer, because I have the same app running in another container that isn't behind an LB and it's never had this problem.

Any ideas on the root cause and how to prevent?

I wonder if there's any action to take here. If this is a regular client disconnect, we could silence the error and perhaps log a disconnect in the access log, but otherwise I'm not sure what to do.

I just had the same error which crashed our monitoring webserver:

[2019-06-10 11:38:25 +0200] [27989] [CRITICAL] WORKER TIMEOUT (pid:17906)
[2019-06-10 11:38:25 +0200] [17906] [INFO] Worker exiting (pid: 17906)
[2019-06-10 11:38:25 +0200] [17924] [INFO] Booting worker with pid: 17924
[2019-06-10 11:38:37 +0200] [17922] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
[2019-06-10 11:38:47 +0200] [27989] [CRITICAL] WORKER TIMEOUT (pid:17920)
[2019-06-10 11:38:47 +0200] [17920] [INFO] Worker exiting (pid: 17920)

I had the same with a pod running the Docker image dpage/pgadmin4:4.2

OSError: [Errno 107] Socket not connected
[2019-06-14 12:20:32 +0000] [77] [ERROR] Socket error processing request.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/gthread.py", line 274, in handle
req = six.next(conn.parser)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
super(Request, self).__init__(cfg, unreader)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
unused = self.parse(self.unreader)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
self.headers = self.parse_headers(data[:idx])
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
remote_addr = self.unreader.sock.getpeername()

I'm getting this error occasionally on hosted Google Cloud Run. Below is a simplified version of our container definition:

FROM ubuntu:18.04

ENV APP_HOME /app
WORKDIR $APP_HOME

RUN apt-get update \
  && apt-get install --no-install-recommends -y python3 python3-pip \
  && rm -rf /var/lib/apt/lists/*

RUN pip3 install --compile --no-cache-dir --upgrade pip setuptools

RUN mkdir invoice_processing && \
    pip install --compile --disable-pip-version-check --no-cache-dir flask gunicorn

COPY app.py ./
CMD exec gunicorn --bind :$PORT --workers 1 --threads 1 app:app

Stackdriver shows the following stacktrace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/base.py", line 134, in init_process
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/sync.py", line 124, in run
    self.run_for_one(timeout)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/sync.py", line 68, in run_for_one
    self.accept(listener)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/sync.py", line 27, in accept
    client, addr = listener.accept()
  File "/usr/lib/python3.6/socket.py", line 205, in accept
    fd, addr = self._accept()
OSError: [Errno 107] Transport endpoint is not connected

Same issue as OP here. Using Google Cloud Platform, Python 3.7, gunicorn 19.9.0

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/env/lib/python3.7/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
 timestamp:  "2019-07-30T15:23:55.435130Z"

I'm having the exact same problem 😕

Exact same problem as GAEfan. Running a Flask app with Python 3.7 in App Engine Standard Env.

Same issue here

I'm having the same issue running Django app with Python 3.7 in Google App Engine.

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/env/lib/python3.7/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected

Same issue running GAE python 3.7 gunicorn and fastapi/uvicorn.

Same issue Google Cloud Run

Which kind of request are we talking about?

Same issue in Google App Engine. POST request. Happens inconsistently. Flask app. @benoitc please let me know what info would be useful and I can post it.

Same issue as well, Google App Engine, POST request too, Flask app. It seemed to have started when I changed to a custom entrypoint code instead of letting the default one. Custom entrypoint is the following (in Google App Engine you set it inside an app.yaml file):

gunicorn -b :$PORT --timeout 1200 server.main:app

Default entrypoint is not setting anything (don't know what is used as default entrypoint though).

Not sure if it started because of that, but I noticed this when I made this change (among other changes).

No, no proxies, no nginx

I was using gunicorn without nginx. I was getting the same issue. My setup is running on Openshift.
gunicorn --chdir /src/app wsgi:application --bind 0.0.0.0:8000 --workers 4 --timeout 180 -k gevent

https://stackoverflow.com/questions/58389201/gunicorn-is-failing-with-oserror-errno-107-transport-endpoint-is-not-connecte

the question stands

Same issue as well, Google App Engine, POST request too, Flask app. It seemed to have started when I changed to a custom entrypoint code instead of letting the default one. Custom entrypoint is the following (in Google App Engine you set it inside an app.yaml file):

gunicorn -b :$PORT --timeout 1200 server.main:app

Default entrypoint is not setting anything (don't know what is used as default entrypoint though).

Not sure if it started because of that, but I noticed this when I made this change (among other changes).

what do you mean by entry point? can you post a debug log and the way the request is done? (raw http would help)

the question stands

Same issue as well, Google App Engine, POST request too, Flask app. It seemed to have started when I changed to a custom entrypoint code instead of letting the default one. Custom entrypoint is the following (in Google App Engine you set it inside an app.yaml file):
gunicorn -b :$PORT --timeout 1200 server.main:app
Default entrypoint is not setting anything (don't know what is used as default entrypoint though).
Not sure if it started because of that, but I noticed this when I made this change (among other changes).

what do you mean by entry point? can you post a debug log and the way the request is done? (raw http would help)

I think he's referring to the fact that you're explicitly specifying the app path that _gunicorn_ should import and run, like the server.main:app in his example.

Edit: Maybe the updated example over here helps: https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/appengine/standard_python37/hello_world (so basically you have to let the service handle how the server should be started)

Firstly, @benoitc THANK YOU. Your work is awesome.

I'm also experiencing this same issue on Google Cloud Run w/gunicorn. I'm posting what I have, though it's likely not unique, perusing the above. I'm running a Flask app with Gunicorn as the server (and no proxy) in a Docker container.

The traceback (from GC console):

  File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 104, in init_process
    super(ThreadWorker, self).init_process()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.run()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 211, in run
    callback(key.fileobj)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 127, in accept
    sock, client = listener.accept()
  File "/usr/local/lib/python3.7/socket.py", line 212, in accept
    fd, addr = self._accept()
OSError: [Errno 107] Transport endpoint is not connected

And Google's parsed output of the above:

OSError: [Errno 107] Transport endpoint is not connected
at accept (/usr/local/lib/python3.7/socket.py:212)
at accept (/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py:127)
at run (/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py:211)
at init_process (/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py:134)
at init_process (/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py:104)
at spawn_worker (/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py:583)

If there is anything else I can provide or do to help here, please let me know.

A PR would be welcome to handle ENOTCONN gracefully for all the workers. Please post here if you start working on this and I would be happy to review a PR. I'm sure some on this thread would be happy to help test a branch.

Same issue, Google App Engine, gunicorn serving a Django app, a small percentage of requests die like this:

[screenshot of the error omitted]

entrypoint: gunicorn -b :$PORT wsgi_api:application

A PR would be welcome to handle ENOTCONN gracefully for all the workers. Please post here if you start working on this and I would be happy to review a PR. I'm sure some on this thread would be happy to help test a branch.

I am one of those on this thread happy to help test, so please ping me if I can assist.

I'm happy to help testing the PR. If you need any assistance please ping me.

A PR would be welcome to handle ENOTCONN gracefully for all the workers. Please post here if you start working on this and I would be happy to review a PR. I'm sure some on this thread would be happy to help test a branch.

The root cause of this issue seems to be in the latest version, gunicorn==19.9.0. I switched to the older gunicorn==19.7.1 and was able to run without any issue. Please try the older version.

Rather, try the latest master. 20.0 will finally come today; I have been sidetracked.

How does your health check endpoint look? How do you respond to it, are you closing the connection? Do you set the Content-Length header and co? I'm not sure why a health check would be done via POST. Sounds weird...

@cmin764 does Flask set some default header? I will try, but it would be interesting to see how the response looks

@benoitc When I was using gunicorn==19.9.0 the endpoint health check was really bad. Database connections were waiting for a long time, and loading the application would go down entirely.

With the older gunicorn==19.7.1 the endpoints are not breaking and the endpoint health check looks good. I was not closing any connections and did not set the length header.

I will also test the latest version 20.0.

No, no proxies, no nginx

I was using gunicorn without nginx. I was getting the same issue. My setup is running on Openshift.
gunicorn --chdir /src/app wsgi:application --bind 0.0.0.0:8000 --workers 4 --timeout 180 -k gevent

https://stackoverflow.com/questions/58389201/gunicorn-is-failing-with-oserror-errno-107-transport-endpoint-is-not-connecte

With the older version gunicorn==19.7.1 I was not able to run with the gevent worker, so I changed my gunicorn command:

gunicorn apps.wsgi:application --bind 0.0.0.0:8000 --workers 4 --timeout 180

I am looking at the changes since this version to see if there is any reason for it. Thanks for the feedback!

Using 19.7.1 (downgrade from most recent) worked in google app engine environment with push queues feeding workers, and workers talking to each other over http.

Using 19.7.1 (downgrade from most recent) worked in google app engine environment with push queues feeding workers, and workers talking to each other over http.

This is my use case. I’ll be giving this a shot.

did you try the latest version?

Tried latest 20.0.0 on Openshift (openshift v3.11.135, kubernetes v1.11.0) - the same error occurs. What I observed is that the error is triggered on higher load (integration tests running 20 parallel workers). Raising the number of pods reduces the occurrence of the error; leaving a single pod results in a guaranteed error. It's a 3 sync workers config. 19.7.1 just shows no error in the pod logs, but the external consumer experiences the same unexpected EOF on the connection as with the newest version. So downgrading the version does not help.

2019-11-12 16:08:56,982 ERROR gunicorn.error glogging 277 glogging.py Socket error processing request.
Traceback (most recent call last):
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected

Can you try increasing the router timeout?

Can you try increasing the router timeout?

Following this lead we found the issue: the Openshift readiness probe setup was too optimistic (the app took too long for some requests), and on failure the external load balancer (AVI) picked up this event and kicked the pod out of the load-balancing pool.

I recently experienced this issue with gunicorn=19.9.0. A restart resolved the issue. I am deployed on Google Kubernetes Engine. The application is a flask app -
entry:
command: ["sh", "-c", "gunicorn -b 0.0.0.0:$${PORT} -c gunicorn_config.py run:app"]
config:

worker_temp_dir = '/dev/shm'
worker_class = 'gthread'
worker = 2
threads = 2
worker_connections = 1000
timeout = 180
keepalive = 2
backlog = 2048
accesslog = '-'
errorlog = '-'

Error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/gunicorn/workers/gthread.py", line 274, in handle
    req = six.next(conn.parser)
  File "/usr/local/lib/python2.7/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python2.7/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/local/lib/python2.7/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python2.7/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python2.7/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
  File "/usr/local/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 107] Transport endpoint is not connected

Is the current solution to downgrade to 19.7.1?

Can anyone share a repository with deployment instructions that I can use to reproduce the issue? I'm happy to look into it, but I want to make sure I know exactly how to set it up.

Hi @tilgovi. FastAPI uses gunicorn in production.
This repository has a minimal version that showed the same error. You can try it; I have seen this error on App Engine and I don't know if it replicates in other environments. Could that repository help to reproduce the issue?

Same issue here:
gunicorn 19.9.0 + GKE also occurred when we were dealing with high load.


Not sure, but everything seems back to normal now.

This is my app.yaml

runtime: python37
entrypoint: gunicorn -b :$PORT truestory:app


handlers:
- url: /static
  static_dir: truestory/static

- url: /favicon\.ico
  static_files: truestory/static/img/favicon.ico
  upload: truestory/static/img/favicon\.ico

- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto

And production run of Makefile:

run: export FLASK_CONFIG = production
run:
    # Run main server in production mode with Gunicorn (remote database).
    @echo "[i] Starting server with Gunicorn."
    gunicorn -b :$(PORT) truestory:app

So maybe it was something temporary on GAE.

I have the same issue. Gunicorn with gevent, just a Google HTTP LB in front. (no Nginx, or other reverse proxy). Stuff works fine for weeks but once in a while I get:

Traceback (most recent call last):
  File "XXX/gunicorn/workers/base_async.py", line 65, in handle
    util.reraise(*sys.exc_info())
  File "XXX/gunicorn/util.py", line 625, in reraise
    raise value
  File "XXX/gunicorn/workers/base_async.py", line 48, in handle
    req = next(parser)
  File "XXX/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "XXX/gunicorn/http/message.py", line 186, in __init__
    super().__init__(cfg, unreader)
  File "XXX/gunicorn/http/message.py", line 53, in __init__
    unused = self.parse(self.unreader)
  File "XXX/gunicorn/http/message.py", line 235, in parse
    self.headers = self.parse_headers(data[:idx])
  File "XXX/gunicorn/http/message.py", line 73, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected

Gunicorn 20.0.4.

The fact that several people reported that this "happens under high load" or that, in my case, this thing happens "a few times per month" makes it look like some sort of race condition.

@JordanP do you have any error on the Google side? How is the ping sent from the Google LB? Does it time out on the Google LB side?

It's running in Kubernetes, the health check is an HTTP one, very conservative (timeout 5 sec, 10 consecutive failures before marking the containers dead).

On the Google side, the HTTP LB in front of Gunicorn returned more than 40k 502 errors (in a couple of minutes) with the following reason: "backend_timeout".
[screenshot of the load balancer errors omitted]

I got 4 replicas (4 containers), and they all crashed at ~the same time that night. So it's a wild guess, but maybe Google had to restart their load balancer to deploy a new version, a fix, whatever (it's all software after all), so the client (as seen by Gunicorn) may have disconnected in an unfriendly/unexpected way. Anyway, Gunicorn should be resilient to whatever client situation happens.

Ignoring ENOTCONN looks okay-ish, it was discussed to do that directly in some modules of the Python stdlib, for some operations: https://bugs.python.org/issue30319#msg297643
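For illustration, here is a rough sketch of that "ignore ENOTCONN" idea applied to the accept path shown in some of the tracebacks above (the Cloud Run and gthread ones). The helper name is made up and this is not gunicorn's actual code, just the general shape of the handling:

import errno
import socket

def accept_or_none(listener: socket.socket):
    """Accept a connection, treating an already-gone client as 'nothing to accept'."""
    try:
        return listener.accept()  # (conn, addr) on success
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            # The client disappeared between the readiness notification and
            # accept(); skip it instead of letting the exception kill the worker.
            return None
        raise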

Same error.

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected

Two Flask apps (api and interface) in separate Docker containers, each running gunicorn. The error occurs in Chrome/Chromium (not Firefox) when I post to the api via a form in the interface app. It might be connected to Chrome's preemptive TCP connections. Since nginx should be able to handle these, I have put it in front of the containers. It changes nothing.

@uree which worker? How do you launch gunicorn?

CMD ["gunicorn", "app:app", "-b", "0.0.0.0:8001", "-t 90"]
I've also tried
CMD ["gunicorn", "app:app", "-b", "0.0.0.0:8001", "-t 90", "--preload"]

I'm seeing the same issue when running django with gunicorn with docker-compose on Digital Ocean.
Gunicorn version 20.0.4

version: '3.7'

services:
  backend:
    build: .
    command: gunicorn --workers=2 --thread=2 --log-file=- --certfile=/etc/nginx/ssl/xxx.crt --keyfile=/etc/nginx/ssl/xxx.key backend.config.wsgi:application --bind 0.0.0.0:8000
    restart: unless-stopped
    volumes:
      - .:/usr/src/app/
      - ../media:/backend/media
      - /root/certs/:/etc/nginx/ssl/
    ports:
      - 8000:8000
    env_file:
      - ./.env.dev
    environment:
      - Debug=True
      # - GUNICORN_WORKERS=2
      # - GUNICORN_ERRORLOG=-
      # - GUNICORN_THREADS=4
      # - GUNICORN_ACCESSLOG=-
    depends_on:
      - db
  db:
    image: postgres:12.0-alpine
    restart: unless-stopped
    volumes:
      - ../postgres_data:/var/lib/postgresql/data/
    environment:
      - POSTGRES_USER=xxxx
      - POSTGRES_PASSWORD=xxxx
      - POSTGRES_DB=archlink
  frontend:
    build: ./frontend
    volumes:
      - ./frontend:/app
      - /app/node_modules
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - BACKEND_URL=http://142.93.235.130:8000/
    depends_on:
      - backend
    # command: npm start
  nginx:
    image: nginx:latest
    restart: unless-stopped
    volumes:
      - ./nginx/nginx-proxy.conf:/etc/nginx/conf.d/default.conf:ro
      - ./frontend/build:/var/www/frontend # maps frontend build inside nginx
      - /root/certs/:/etc/nginx/ssl/
    ports:
      - 80:8080
      - 443:443
    depends_on:
      - frontend

This error occurs every 4-5 minutes:

backend_1   | [2020-03-04 12:05:58 +0000] [18] [ERROR] Socket error processing request.
backend_1   | Traceback (most recent call last):
backend_1   |   File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/gthread.py", line 266, in handle
backend_1   |     req = next(conn.parser)
backend_1   |   File "/usr/local/lib/python3.8/site-packages/gunicorn/http/parser.py", line 41, in __next__
backend_1   |     self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
backend_1   |   File "/usr/local/lib/python3.8/site-packages/gunicorn/http/message.py", line 186, in __init__
backend_1   |     super().__init__(cfg, unreader)
backend_1   |   File "/usr/local/lib/python3.8/site-packages/gunicorn/http/message.py", line 53, in __init__
backend_1   |     unused = self.parse(self.unreader)
backend_1   |   File "/usr/local/lib/python3.8/site-packages/gunicorn/http/message.py", line 198, in parse
backend_1   |     self.get_data(unreader, buf, stop=True)
backend_1   |   File "/usr/local/lib/python3.8/site-packages/gunicorn/http/message.py", line 189, in get_data
backend_1   |     data = unreader.read()
backend_1   |   File "/usr/local/lib/python3.8/site-packages/gunicorn/http/unreader.py", line 37, in read
backend_1   |     d = self.chunk()
backend_1   |   File "/usr/local/lib/python3.8/site-packages/gunicorn/http/unreader.py", line 64, in chunk
backend_1   |     return self.sock.recv(self.mxchunk)
backend_1   |   File "/usr/local/lib/python3.8/ssl.py", line 1226, in recv
backend_1   |     return self.read(buflen)
backend_1   |   File "/usr/local/lib/python3.8/ssl.py", line 1101, in read
backend_1   |     return self._sslobj.read(len)
backend_1   | OSError: [Errno 0] Error

Update: In my case the issue was elsewhere. It was caused by an onclick jQuery event on a submit button. I had to post asynchronously with AJAX to resolve the issue.

Are there any updates on this error?

Are there any updates on this error?

Well, can you describe the context in which it is happening? Also, for all of you using Kubernetes, can you describe how your health check is configured so we may be able to reproduce it?

What makes you think it's related to Kubernetes? No misbehaving client or half-closed connection should ever completely crash a Gunicorn worker, whether it's running in Kubernetes, Mesos, Docker, or bare metal: Gunicorn must be resilient.

I have not found a reliable/easy reproducer, but if I do, I think I may be able to crash every single gunicorn webserver directly exposed to the Internet.
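In case it helps anyone hunting for a reproducer, here is a crude, untested sketch of the kind of client behaviour that might provoke the race: connect and immediately close with an RST, in a tight loop. The host, port and loop count are placeholders, and this is not verified to actually trigger the bug.

import socket
import struct

HOST, PORT = "127.0.0.1", 8000  # placeholder bind address

def slam(n=10000):
    for _ in range(n):
        s = socket.create_connection((HOST, PORT))
        # SO_LINGER with a zero timeout makes close() send a TCP RST instead of
        # a normal FIN, i.e. the "unfriendly" disconnect described above.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
        s.close()

if __name__ == "__main__":
    slam()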

Well, I never had such a crash when gunicorn is behind nginx, and some issues reported here seem related to Kubernetes.

On which worker is it happening? Is gunicorn used behind a proxy? Which one?


Same error with an AWS ECS service behind an AWS load balancer.
It happened once, at the same time, on all the replicas (containers/tasks).
Gunicorn as pip package. No Nginx, no proxy.
Python 3.7.6
Gunicorn version: 20.0.4
Run like:
gunicorn --bind 0.0.0.0:8000 --workers 1 --threads 5 --max-requests 100 --timeout 300 application.wsgi
Log:
[2020-03-10 22:28:38 +0100] [105] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 266, in handle
    req = next(conn.parser)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/parser.py", line 41, in __next__
22:28:37.814 WARNING [django.request:log_response #228] Not Found: /443
22:28:36.176 WARNING [django.request:log_response #228] Not Found: /443
[2020-03-10 22:28:35 +0100] [105] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 266, in handle
    req = next(conn.parser)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 186, in __init__
    super().__init__(cfg, unreader)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 53, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 235, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 73, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
[2020-03-10 22:28:35 +0100] [105] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 266, in handle
    req = next(conn.parser)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 186, in __init__
    super().__init__(cfg, unreader)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 53, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 235, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 73, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected

Are there any updates on this error?

well can you describe the context in which it is happening? Also for all that using that kubernetes can you describe how your health check is configured so we may be able to reproduce it?

I don't have anything specific. The error occurs out of nowhere; no noticeable actions happen that cause it.
In the Django app, other than the regular REST API endpoints, there is a Django job scheduler. Everything else you can see in the docker-compose.yml.

I can provide some more data. I am seeing this occasionally with gunicorn 19.9.0 running behind haproxy as a reverse proxy (just using HTTP, not using the PROXY protocol).

Mar 17 21:38:07 redacted.com gunicorn[25470]: https://redacted.com/redacted[2020-03-17 21:38:07 +0000] [25495] [ERROR] Socket error processing request.
Mar 17 21:38:07 redacted.com gunicorn[25470]: Traceback (most recent call last):
Mar 17 21:38:07 redacted.com gunicorn[25470]:   File "/var/venvs/software-venv/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
Mar 17 21:38:07 redacted.com gunicorn[25470]:     req = six.next(parser)
Mar 17 21:38:07 redacted.com gunicorn[25470]:   File "/var/venvs/software-venv/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
Mar 17 21:38:07 redacted.com gunicorn[25470]:     self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
Mar 17 21:38:07 redacted.com gunicorn[25470]:   File "/var/venvs/software-venv/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
Mar 17 21:38:07 redacted.com gunicorn[25470]:     super(Request, self).__init__(cfg, unreader)
Mar 17 21:38:07 redacted.com gunicorn[25470]:   File "/var/venvs/software-venv/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
Mar 17 21:38:07 redacted.com gunicorn[25470]:     unused = self.parse(self.unreader)
Mar 17 21:38:07 redacted.com gunicorn[25470]:   File "/var/venvs/software-venv/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
Mar 17 21:38:07 redacted.com gunicorn[25470]:     self.headers = self.parse_headers(data[:idx])
Mar 17 21:38:07 redacted.com gunicorn[25470]:   File "/var/venvs/software-venv/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
Mar 17 21:38:07 redacted.com gunicorn[25470]:     remote_addr = self.unreader.sock.getpeername()
Mar 17 21:38:07 redacted.com gunicorn[25470]: OSError: [Errno 107] Transport endpoint is not connected


The server was handling about 30 requests/second at the time. As you can see, the first log line was mangled, presumably due to buffered output and multiple workers.

Gunicorn is being run with systemd: ExecStart=/var/venvs/software-venv/bin/gunicorn -b 0.0.0.0:6000 -w 4 app:app and LimitNOFILE=49152.

I sent in a pull request with a fix that I've been running in production for a few days now. The bug is caused by a race condition, where the socket can be closed (by either client, OS, etc.) before getpeername() is called, so it correctly raises an exception. However, gunicorn isn't catching this exception, so it's bubbling up and crashing the server. My fix is just to catch the exception and raise it as a NoMoreData exception, which is already handled by other code further up the stack. This prevents a poorly-timed disconnection from crashing the server.
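For reference, a minimal sketch of what the approach described above could look like (illustrative only, not the exact code from the pull request): wrap the getpeername() call so a socket that disconnected in the meantime is treated like ordinary end-of-data.

import errno
import socket

from gunicorn.http.errors import NoMoreData

def safe_getpeername(sock: socket.socket):
    try:
        return sock.getpeername()
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:  # [Errno 107] Transport endpoint is not connected
            # Raise NoMoreData so the existing handling further up the stack
            # treats this as a normal client disconnect instead of crashing.
            raise NoMoreData()
        raise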

I'm using Kubernetes (1.16.8-gke.15), the latest Gunicorn (20.0.4), and Python 3.7. If I make a POST request and increase a time delay, starting at 1 second and growing with each iteration, it stops working when the delay reaches 360 seconds. The job inside Gunicorn finishes, and a few minutes later it returns this error:

Socket error processing request.
OSError: [Errno 107] Transport endpoint is not connected

When the connection drops between Kubernetes and Gunicorn, the Kubernetes endpoint and the client are still connected. The health checks look good, but it's possible they are misconfigured somehow. I haven't found any logs on the Kubernetes side to identify the problem.

I had the same result for Gunicorn (19.7.1).

I've added the timeout flag for Gunicorn, and I'm using the default GKE Loadbalancer with BackendConfig for Kubernetes GKE. I've also tried with an NGINX Ingress and adding annotations to handle any timeouts. Gunicorn command:

gunicorn --bind="0.0.0.0:5000" --workers=1 --timeout=1200 --keep-alive=1200 main:app

When I run Gunicorn locally without anything in front of it, it works fine. This might be more of a Kubernetes issue, however, the response is lost.

Has anyone had any luck with this problem? Or a good way to debug it?

Docker version 19.03.8, build afacb8b7f0

Python 3.8.2 (default, Feb 26 2020, 15:09:34)
[GCC 8.3.0] on linux

import multiprocessing
import os

bind = '0.0.0.0:8889'
max_requests = 100000
timeout = 60
graceful_timeout = 60
if os.environ.get('WEB_WORKERS') is None:
    _cpu_count = multiprocessing.cpu_count()
    workers = 2 * _cpu_count + 1
else:
    workers = int(os.environ['WEB_WORKERS'])
limit_request_line = 4094 * 4  # 4x the default value
errorlog = '/var/log/krapi/gunicorn.error.log'
accesslog = '/var/log/krapi/gunicorn.access.log'
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 133, in handle
    req = next(parser)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/http/message.py", line 186, in __init__
    super().__init__(cfg, unreader)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/http/message.py", line 53, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/http/message.py", line 235, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.8/site-packages/gunicorn/http/message.py", line 73, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected

In my case, I've discovered that any HEAD request emits this.

I'm using Django behind gunicorn, and I suspect that the application wants to write a response body (it shouldn't), but I haven't confirmed that to be the case yet.
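To check that suspicion, a small WSGI middleware along these lines could log whenever a HEAD request yields a non-empty body chunk. The class name is made up; wrap your own application object with it (e.g. application = BodyOnHeadLogger(application)).

import logging

class BodyOnHeadLogger:
    """Wraps a WSGI app and logs any body chunks produced for HEAD requests."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "")
        for chunk in self.app(environ, start_response):
            if method == "HEAD" and chunk:
                logging.warning("HEAD %s produced a %d-byte body chunk",
                                environ.get("PATH_INFO"), len(chunk))
            yield chunk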

same behavior

I think this might be fixed by #2277

In my case, Ansible's wait_for module is the cause.

I use Ansible to deploy a gunicorn + flask server (specifically Python 3.6.12, gunicorn 19.9.0, Flask 1.4.1).

After starting the service, I use the wait_for module to make sure the service is up and running.
This module probably breaks the connection immediately after it validates that the service is up (not waiting for gunicorn to respond), and thus gunicorn raises this error.

I guess other monitoring systems do the same.
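For context, a rough illustration of what such a "wait until the port is open" probe does (in the spirit of Ansible's wait_for, not its actual code): it connects and closes immediately without sending a request, which is exactly the kind of short-lived connection that can race with getpeername() in the worker.

import socket

def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True  # connected; the socket is closed on leaving the block
    except OSError:
        return False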

I got the same error.. hmm.
Currently we get huge traffic, 100-1000 TPS, and some requests fail randomly.

Python 3.8
Flask
Gunicorn

With the Docker setup below:

FROM python:3-slim

RUN apt-get update && apt-get -y install gcc

ENV PYTHONUNBUFFERED True

COPY . /app

WORKDIR /app/src

RUN pip install Flask requests gunicorn
RUN pip install -U flask-cors
RUN pip install requests
RUN pip install DateTime
RUN pip install timedelta

RUN chmod 444 app.py

CMD exec gunicorn -b :443 --workers 5 --threads 8 --timeout 10 app:app --reload

Any solution?

Are there any updates on this?
It seems there are multiple PRs to fix it; do we have a timeline to release them?

Hi @tilgovi,
Do we have a timeline to release this new version? It seems the Gunicorn package has not been updated for a long time...

I will make a release, probably today. I will recheck this ENOTCONN issue as I am not happy with the solution committed. @tilgovi has another fix that can be tested.

?

Did you test the other patch to help?

Thanks. I am wondering whether there is any update on the pip package?

@yehjames is master working for you? A release is planned for today, but any feedback on how master works on different platforms is welcome.

@benoitc Any update on this? Using 20.0.4 in production and implemented the change suggested by @asantoni (as a monkey-patch) to avoid frequent crashes. But Veracode static code scan doesn't like the patch, so trying to fix it now. Thank you!

We'll work to get a release out as soon as we can. We cannot promise a day, but we're working to figure out what remains for this release and to improve the release management for the future.

Please use GitHub's "Watch" feature for the repository and watch for releases if you want to be notified.

Hi. I am having the same Issue with HAProxy + Gunicorn + Django.

My HAProxy backend loses almost all its servers due to unanswered health checks, and the Gunicorn logs are plagued with:

[2021-07-23 18:16:27 -0500] [13] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 133, in handle
    req = next(parser)
  File "/usr/local/lib/python3.9/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.9/site-packages/gunicorn/http/message.py", line 186, in __init__
    super().__init__(cfg, unreader)
  File "/usr/local/lib/python3.9/site-packages/gunicorn/http/message.py", line 53, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.9/site-packages/gunicorn/http/message.py", line 235, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.9/site-packages/gunicorn/http/message.py", line 73, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected

I am working with gunicorn==20.0.4, Django==3.1.5, HA-Proxy version 2.2.11-1ppa1~bionic

Any clue on how to proceed?

This is in TCP mode, no SSL, under Locust stress testing.

Someone please share the solution to this issue.

@krishnamanchikalapudi @ricarhincapie please upgrade to the latest release of Gunicorn :)
