Gunicorn: OSError: [Errno 0] Error

Created on 8 May 2018  ·  30 comments  ·  Source: benoitc/gunicorn

I'm running the app
gunicorn -w 2 -b 'localhost:8585' --timeout=200 --certfile=crt.crt --keyfile=key.key service:app

I get the following error. It does not happen every time: most requests are handled correctly, but sometimes this error occurs:

[2018-05-08 14:53:36 +0500] [11227] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/usr/lib/python3/dist-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/lib/python3/dist-packages/gunicorn/http/message.py", line 153, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/lib/python3/dist-packages/gunicorn/http/message.py", line 53, in __init__
    unused = self.parse(self.unreader)
  File "/usr/lib/python3/dist-packages/gunicorn/http/message.py", line 165, in parse
    self.get_data(unreader, buf, stop=True)
  File "/usr/lib/python3/dist-packages/gunicorn/http/message.py", line 156, in get_data
    data = unreader.read()
  File "/usr/lib/python3/dist-packages/gunicorn/http/unreader.py", line 38, in read
    d = self.chunk()
  File "/usr/lib/python3/dist-packages/gunicorn/http/unreader.py", line 65, in chunk
    return self.sock.recv(self.mxchunk)
  File "/usr/lib/python3.5/ssl.py", line 922, in recv
    return self.read(buflen)
  File "/usr/lib/python3.5/ssl.py", line 799, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.5/ssl.py", line 585, in read
    v = self._sslobj.read(len)
OSError: [Errno 0] Error

All 30 comments

From my memory, this error happens when a client tries to connect without SSL. Could that be the case for you?

I see your post on the other issue that I closed. My apologies if my guess turns out not to be the cause.

Is there a pattern to which requests fail this way?

@usmetanina what kind of clients connect to Gunicorn? Do you use any SSL options explicitly when connecting to it?

Is this solved already, @usmetanina? I have exactly the same issue.

@benoitc I see @usmetanina's exact error frequently using python3.6 and gunicorn 19.9.0.

I use the following command to start Gunicorn with a Flask app running inside a Docker container:

gunicorn --workers=3 --bind=0.0.0.0:8000 --config=gunicorn_config.py --preload main

The config file looks like this (domain-with-cert.com of course is a placeholder for the actual domain name):

workers = 3
bind = '0.0.0.0:443'
certfile = '/etc/letsencrypt/live/domain-with-cert.com/fullchain.pem'
keyfile = '/etc/letsencrypt/live/domain-with-cert.com/privkey.pem'

Any thoughts on debugging this would be helpful. If you need further info, just let me know.

@willpatera, see my comment:

From my memory, this error happens when a client tries to connect without SSL. Could that be the case for you?

@tilgovi I saw the above comment. I am pretty sure that the client is connecting over SSL. Any debugging suggestions?

@willpatera I would say, turn on the access logs and see if you can determine which request causes the issue. If you have a reverse proxy in front of gunicorn make sure it has access logs so you can maybe see which request causes an error with gunicorn even if gunicorn never logs it.
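
For reference, turning those logs on from a Gunicorn Python config file (like the gunicorn_config.py above) might look like the sketch below. `accesslog`, `errorlog`, `loglevel` and `access_log_format` are Gunicorn settings; sending everything to the console and appending `%(D)s` (request time in microseconds) to the default format are assumptions suited to a container setup, not requirements.

```python
# gunicorn_config.py (sketch)
accesslog = "-"   # "-" writes the access log to stdout; the default (None) disables it
errorlog = "-"    # "-" writes the error log to stderr (also the default)
loglevel = "debug"

# Gunicorn's default access log format with %(D)s (request time in
# microseconds) appended, to help spot slow requests piling up under load
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'
```

Note that the traceback is raised while the request is still being parsed, so a connection that dies this way may never reach the access log at all; that is where a proxy's own logs, as suggested above, can fill the gap.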

@tilgovi I am having the same issues. Had to edit the following information a bit as it was incorrect:
The request that is being made to gunicorn is always the exact same request (but with a different body). So there is no doubt that it is https and not http.
What I do notice is that it always happens when the number of requests goes up. When the server is busy, it seems to have trouble handling the requests properly.

Maybe this has to do with the workers or something like that? If you have any configuration suggestions, I'd gladly test them.

Hi guys, I am still looking for a way to solve this. Currently the only option we have is to downgrade to plain HTTP, which is not feasible at all.

I've witnessed the same thing. Had a production server running Gunicorn + Flask (behind a load balancer) that worked fine for months, then suddenly every request yielded this error until I restarted Gunicorn:

[2019-11-21 07:27:36 +0000] [24245] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/http/message.py", line 193, in parse
    self.get_data(unreader, buf, stop=True)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/http/message.py", line 184, in get_data
    data = unreader.read()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/http/unreader.py", line 38, in read
    d = self.chunk()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/http/unreader.py", line 65, in chunk
    return self.sock.recv(self.mxchunk)
  File "/usr/lib/python3.6/ssl.py", line 997, in recv
    return self.read(buflen)
  File "/usr/lib/python3.6/ssl.py", line 874, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.6/ssl.py", line 633, in read
    v = self._sslobj.read(len)
OSError: [Errno 0] Error

Nothing in the logs preceding these errors hints at what the trigger could've been.

This was with Gunicorn 19.9.0 running with 3 workers on a single-cored server.

Since this is the first time I've seen this issue, I can't promise I'll ever reproduce it. However, if there's any kind of logging or other diagnostic code anyone would like me to add in on our server that might provide some useful information in the event that this happens again, I'm all ears.

Does your LB call a specific endpoint? How does the app respond to the LB's requests?

When I said "Load Balancer", I really ought to have said CDN or caching layer. Specifically, it's Amazon CloudFront. It just forwards requests to our Gunicorn server (running on an EC2 instance) and caches the results for a while.

Hrm, shouldn't Amazon CloudFront terminate the SSL connection for you, @ExplodingCabbage? Why does Gunicorn have to listen on SSL behind it?

@benoitc So, there are two layers with SSL involved in the architecture. Members of the public connect to our website via our CloudFront domain over HTTPS, and then CloudFront makes a request to our backend node running Gunicorn, also using HTTPS (with a different domain name and cert), caches the result, and serves it to the public.

I guess you may be wondering what the point of using SSL for that second, internal request is. It's certainly arguable that it's pointless (although possibly not: it stops Amazon snooping on our comms in their internal network, and, given my company's industry, there are also regulatory reasons, which I won't go into, why we might need to ensure we've got encryption all the way along the pipeline). Whether pointless or not, we do it. ¯\_(ツ)_/¯

Could it be that CloudFront is sending a plain HTTP request to your endpoint, though? If you have access to the CloudFront logs, you should be able to see it.

@benoitc I don't think CloudFront exposes any logs that would be useful, but I'm sure that it wasn't trying to connect over HTTP, since:

  • Our distribution is configured in the CloudFront console to connect to the Gunicorn origin over "HTTPS only"
  • Gunicorn isn't listening on port 80
  • If I try to connect to our backend server over HTTP (including forcing HTTP on port 443) it doesn't reproduce the OSError quoted above
  • When I was getting the OSError quoted above, restarting Gunicorn on the backend server instantly fixed the problem, which points to something wrong on the Gunicorn end, not the CloudFront end

@ExplodingCabbage ok I will have a look on it after the 20.0.1 is out. One last thing, which version of Python are you using?

3.6.8

I realise I left out a detail from my story above: before restarting Gunicorn, I also updated the SSL certificate Gunicorn uses with LetsEncrypt. I hadn't thought to mention this because I had wrongly concluded yesterday that there was no way that a certificate would've expired on the day that the errors began and that the certificate update had not in fact been relevant to fixing the problem.

However, from checking some logs, I now realise that the errors in fact began on the day that a previous certificate was due to expire.

There's still some mystery here, and some potential room for improvement (what exactly does this error signify, and why can't Gunicorn give a more useful message?), but the narrative I gave before - in which this error started out of the blue with no apparent cause - isn't right. I'd guess that CloudFront was terminating the connection in response to seeing an expired certificate from the Gunicorn server, and that Gunicorn, rather than being able to understand that and report it meaningfully, lets a messageless OSError bubble up.

I apologise for not having my ducks in a row before reporting. On the other hand, perhaps this'll make it easier to reproduce this exception at will if you want to try and handle the scenario more elegantly.
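
Since the errors turned out to line up with a certificate expiry, checking which certificate an origin is actually serving may help rule that in or out. A minimal standard-library sketch follows; the function names are hypothetical, and origin.example.com in the usage line is a placeholder.

```python
import socket
import ssl
import time


def cert_not_after(host, port=443, timeout=5.0):
    """Return the served certificate's notAfter string, e.g. 'Jan  5 09:34:43 2018 GMT'.

    Uses the default verifying context, so an already-expired certificate
    fails the handshake instead (ssl.SSLCertVerificationError on Python 3.7+).
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days left before `not_after` (negative once it has passed)."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400
```

Usage: `days_until_expiry(cert_not_after("origin.example.com"))` returns the remaining validity in days as a float.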

@ExplodingCabbage oh, that's quite interesting; it should be reproducible at some point, then. Thanks for the additional details!

I have just reproducibly run into the same problem, and I'm somewhat confident it's the consequence of some kind of resource exhaustion.

For me it was triggered by forgetting a timeout on a blocking call and requests piling up.

HTH
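
The failure mode described here, a blocking call without a timeout tying up workers until requests pile up, can be guarded against with the standard library alone. A minimal sketch with a hypothetical `query_backend` helper: `socket.create_connection(..., timeout=...)` sets the timeout on the socket itself, so the connect, `sendall()` and `recv()` calls all give up instead of blocking forever.

```python
import socket


def query_backend(host, port, payload, timeout=2.0):
    """Send `payload` and return the first reply chunk, or None on timeout."""
    try:
        # The timeout applies to every blocking operation on this socket
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload)
            return sock.recv(4096)
    except socket.timeout:
        # Give the worker back instead of waiting indefinitely
        return None
```

The 2-second default is illustrative; the right bound depends on how long the backend legitimately takes and on Gunicorn's own `--timeout`.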

Hello! I'm experiencing this exact issue. I have a gunicorn/flask service running on an ECS cluster behind a network load balancer. Some version specifics:

python    - 3.7.4
gunicorn  - 19.9.0
flask     - 1.0.4

The service is able to respond to requests coming from a client using TLS without issue, however my logs are flooded with OSErrors. As far as I can tell, these are resulting from the health check requests coming from the load balancer (TCP).

I was able to reproduce the error locally by opening and closing a TCP connection manually on the listening port (8000 in this case):

$ nc -vz 127.0.0.1 8000
localhost [127.0.0.1] 8000 (irdmi) open

Which resulted in the following error being thrown:

Traceback (most recent call last):
    File "/nix/store/nh3v0c2nipihwblkdn0mh2kqyv3jq9nz-python3-3.7.4-env/lib/python3.7/site-packages/gunicorn/workers/sync.py" line 134 in handle
        req = six.next(parser)
    File "/nix/store/nh3v0c2nipihwblkdn0mh2kqyv3jq9nz-python3-3.7.4-env/lib/python3.7/site-packages/gunicorn/http/parser.py" line 41 in __next__
        self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
    File "/nix/store/nh3v0c2nipihwblkdn0mh2kqyv3jq9nz-python3-3.7.4-env/lib/python3.7/site-packages/gunicorn/http/message.py" line 181 in __init__
        super(Request, self).__init__(cfg, unreader)
    File "/nix/store/nh3v0c2nipihwblkdn0mh2kqyv3jq9nz-python3-3.7.4-env/lib/python3.7/site-packages/gunicorn/http/message.py" line 54 in __init__
        unused = self.parse(self.unreader)
    File "/nix/store/nh3v0c2nipihwblkdn0mh2kqyv3jq9nz-python3-3.7.4-env/lib/python3.7/site-packages/gunicorn/http/message.py" line 193 in parse
        self.get_data(unreader, buf, stop=True)
    File "/nix/store/nh3v0c2nipihwblkdn0mh2kqyv3jq9nz-python3-3.7.4-env/lib/python3.7/site-packages/gunicorn/http/message.py" line 184 in get_data
        data = unreader.read()
    File "/nix/store/nh3v0c2nipihwblkdn0mh2kqyv3jq9nz-python3-3.7.4-env/lib/python3.7/site-packages/gunicorn/http/unreader.py" line 38 in read
        d = self.chunk()
    File "/nix/store/nh3v0c2nipihwblkdn0mh2kqyv3jq9nz-python3-3.7.4-env/lib/python3.7/site-packages/gunicorn/http/unreader.py" line 65 in chunk
        return self.sock.recv(self.mxchunk)
    File "/nix/store/azwzsm1pkbzjxpkiq88w68p4jdghgasl-python3-3.7.4/lib/python3.7/ssl.py" line 1056 in recv
        return self.read(buflen)
    File "/nix/store/azwzsm1pkbzjxpkiq88w68p4jdghgasl-python3-3.7.4/lib/python3.7/ssl.py" line 931 in read
        return self._sslobj.read(len)
OSError: [Errno 0] Error

Hope this helps!
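
The `nc -vz` reproduction above can also be scripted, which may be handy when testing a fix. This sketch (the `tcp_probe` name is made up) opens a TCP connection and closes it without sending a byte, so no TLS handshake ever starts; that is roughly what a plain TCP health check does.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Connect and immediately disconnect, like `nc -vz host port`.

    Returns True if the TCP connection was accepted, False otherwise.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True  # closed on exit without sending any data
    except OSError:
        return False
```

Per the report above, pointing it at a Gunicorn started with --certfile/--keyfile, e.g. `tcp_probe("127.0.0.1", 8000)`, should produce the same `OSError: [Errno 0] Error` in the Gunicorn log on the affected Python versions.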

Hmm, well after some additional research it seems that this may actually be a bug in the way the python ssl library handles ragged EOFs on linux: https://bugs.python.org/issue31122

As @shevisjohnson mentioned, running "nc -vz hostname port_no" makes this error appear.
We can keep this error out of our application's log file by using the logging setup below.

$ cat logging_config.yml

version: 1

formatters:
  simple:
    format: " %(asctime)s || %(name)s || %(levelname)s || %(message)s"

  test_api:
    format: "[%(asctime)s] [%(process)s] [%(levelname)s] %(message)s"

handlers:

  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: simple
    stream: ext://sys.stdout

  test_api_file_handler:
     class: logging.handlers.RotatingFileHandler
     level: DEBUG
     formatter: test_api
     filename: logs/test.log
     maxBytes: 2000000000
     backupCount: 1
     encoding: utf8

loggers:

  test_api: 
    level: DEBUG
    handlers: [test_api_file_handler]
    propagate: 0

root:
  level: DEBUG
  handlers: [console]

Here is the Python file:

import logging
import logging.config  # dictConfig lives in this submodule; "import logging" alone does not load it
import yaml
from flask import Flask

app = Flask(__name__)

def logSetter(logger_name: str) -> logging.Logger:
    with open("logging_config.yml", "r") as f:
        config = yaml.safe_load(f)
    logging.config.dictConfig(config)
    logger = logging.getLogger(logger_name)
    return logger

logger = logSetter(logger_name="test_api")

@app.route("/api/test")
def hello():
    # Log through the configured "test_api" logger so the record goes to logs/test.log
    logger.info("hey from api")
    return "Hello from Python!"

Hope it helps.

We've intermittently observed several Gunicorn apps failing with this error in production while under concurrent load.

It only took a moment to come up with a reliable reproduction: use hey to send 100 concurrent requests to the latest Gunicorn (20.0.4) running the gthread worker:

$ hey -n 100 -c 100 https://127.0.0.1:8000

```
$ gunicorn app:app -k gthread --certfile=... --keyfile=...
...
[2020-07-11 19:10:58 +0000] [3628247] [ERROR] Socket error processing request.
Traceback (most recent call last):
return self._sslobj.read(len)
OSError: [Errno 0] Error
```

Using a Debian 9 / Linux 4.14.67 based environment.

The WSGI app needed to reproduce it can be as simple as:
```python
# app.py
def app(environ, start_response):
    start_response("200 OK", [])
    # PEP 3333: the response body must be an iterable of bytes
    return [b""]
```

In case this helps too!
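
For anyone without hey installed, a rough Python stand-in for `hey -n 100 -c 100` is sketched below using only the standard library. It is not a faithful port of hey; `hammer` and `insecure_context` are illustrative names, and disabling certificate verification is only meant for a local self-signed test certificate.

```python
import ssl
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def insecure_context():
    """TLS context that skips verification; for local self-signed test certs only."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def hammer(url, total=100, concurrency=100, context=None):
    """Send `total` GET requests to `url`, at most `concurrency` at a time.

    Returns one entry per request: the HTTP status code, or the exception
    class name if the request failed.
    """
    def one(_):
        try:
            with urllib.request.urlopen(url, timeout=10, context=context) as resp:
                return resp.status
        except Exception as exc:  # record failures instead of aborting the run
            return type(exc).__name__

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(one, range(total)))
```

Usage: `codes = hammer("https://127.0.0.1:8000", context=insecure_context())`, then look for entries other than 200 while watching the Gunicorn log.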

If the root cause is in fact https://bugs.python.org/issue31122:

  • There was a fix submitted on March 4 (python/cpython#18772), but it still has not been acknowledged by a core developer. Perhaps a Gunicorn maintainer leaving a comment there or on BPO-31122 saying that it's affecting gunicorn users would help?
  • Gunicorn would still need to work around this for supported versions of Python that predate that fix being released. Worth also asking if there's a workaround in that same comment?
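
One possible shape for such a workaround, sketched under the assumption that a bare errno-0 `OSError` from an SSL socket's `recv()` is only ever the ragged-EOF case from BPO-31122: treat it as end-of-stream, the way `suppress_ragged_eofs=True` already does for a properly reported unexpected EOF, and re-raise everything else. `recv_or_eof` is a hypothetical helper, not a Gunicorn API.

```python
def recv_or_eof(sock, bufsize):
    """Call sock.recv(bufsize), mapping the bare "[Errno 0] Error" to a clean EOF."""
    try:
        return sock.recv(bufsize)
    except OSError as exc:
        # Only the errno-0 case is swallowed; ECONNRESET and any other
        # error still propagate to the caller.
        if exc.errno == 0:
            return b""
        raise
```

Where Gunicorn would call this (for example around the `self.sock.recv(self.mxchunk)` in gunicorn/http/unreader.py seen in the tracebacks) is left open here.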

This is affecting my organization in prod as well.

I noticed that the bugfix landed in 3.8 and 3.9 branches, but they're considering <= 3.7 EOL and we're still kinda stuck on 3.6 for the time being. Is there a known workaround to this issue at this time in gunicorn itself? Is there anything planned?

We're looking into what could be calling the service so much to trigger this, but I'm just trying to figure out what could be done, as this results in huge resource spikes on the affected nodes.

In addition to jriddy's comment regarding no intention to backport prior to 3.8, if anyone else is having this issue, also note that the fix is set to be included in CPython 3.8.6.

I'm having trouble telling exactly where this traceback emanates from. In my case I'm using gevent as the WSGI app server directly, so I'm assuming it's a logging call somewhere within gevent/greenlet, but I haven't found it yet. For Gunicorn's synchronous workers, it happens here:

https://github.com/benoitc/gunicorn/blob/e636bf81989bb833d2b99104feb11e86c3f2c43a/gunicorn/workers/sync.py#L150

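In the Gunicorn case, if you're just concerned about noise in the logs, you might be able to do something like:

import logging

class HandshakeFilter(logging.Filter):
    # example: https://docs.python.org/3/howto/logging-cookbook.html
    # I have not tested this
    def filter(self, record):
        # Returning False drops the record, so keep everything *except*
        # the "Socket error processing request." message
        return "socket error processing request" not in record.getMessage().casefold()

# Gunicorn logs this message through its "gunicorn.error" logger; a filter on a
# parent logger is not consulted for records logged by its child loggers
logging.getLogger("gunicorn.error").addFilter(HandshakeFilter())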

Related gevent issue: https://github.com/gevent/gevent/issues/1671
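
A narrower variant of the message-matching filter idea keys on the exception attached to the log record rather than on the message text, so other "Socket error processing request." entries (say, a genuine connection reset) still get logged. This is a sketch and has not been tested against a live Gunicorn; `ErrnoZeroFilter` is a hypothetical name.

```python
import logging


class ErrnoZeroFilter(logging.Filter):
    """Drop log records whose attached exception is an OSError with errno 0."""

    def filter(self, record):
        exc = record.exc_info[1] if record.exc_info else None
        # Returning False suppresses the record; everything else passes.
        return not (isinstance(exc, OSError) and exc.errno == 0)


# Gunicorn's error messages go through the "gunicorn.error" logger.
logging.getLogger("gunicorn.error").addFilter(ErrnoZeroFilter())
```

One place to run this is the Gunicorn config file, since it is loaded before the workers are forked.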
