(Updated, because I realised I had been attempting to fix this, so got confused by the original issue and the issue that made me give up)
I'm making a chunked request to an app that uses Werkzeug (current git master). get_input_stream ends up calling LimitedStream(wsgi.input, min(content_length, max_content_length))
. min(content_length, max_content_length)
will always be None if using chunked encoding.
In an attempt to make progress I tried using LimitedStream(stream, content_length or max_content_length)
, but then I end up getting a ClientDisconnected
exception, because we can't tell the difference between a client disconnecting and the chunked request simply being done, as wsgi.input.read()
just returns a 0-length string in both cases.
This is more than just the simple bug with min
. After fixing that, the problem is that doing wsgi.input.read(max_content_length)
with Werkzeug blocks forever if there's less than max_content_length
to read. Or using Gunicorn, which supports chunks by parsing and buffering the whole message, the length comes out to less than max_content_length
, so LimitedStream
thinks the client disconnected.
I tested this out on Django just to be sure we're not doing something weird, it sees an empty stream too.
The simplest way forward is to provide a middleware that sets environ['wsgi.input_terminated']
and tell people to use it when using a server that supports chunked transfer.
Which wsgi currently supports input_terminated
? After burning all day on debugging why the data is empty, then why stream is empty then realizing it's due to chunked stream encoding I now can't find anything that supports this :/
In addition to checking environ.get('wsgi.input_terminated')
should we also check environ.get('HTTP_TRANSFER_ENCODING') == 'chunked'
and in that case also return the unwrapped stream directly? That fixes problems like a simple curl -H 'Transfer-Encoding: chunked' ...
returning an empty stream.
Per RFC2616
4.4.2
If a Transfer-Encoding header field (section 14.41) is present and
has any value other than "identity", then the transfer-length is
defined by use of the "chunked" transfer-coding (section 3.6),
unless the message is terminated by closing the connection.
4.4.3
If a Content-Length header field (section 14.13) is present, its
decimal value in OCTETs represents both the entity-length and the
transfer-length. The Content-Length header field MUST NOT be sent
if these two lengths are different (i.e., if a Transfer-Encoding
header field is present). If a message is received with both a
Transfer-Encoding header field and a Content-Length header field,
the latter MUST be ignored.
... noting that Transfer-Encoding
of chunked
demands that any Content-Length
header be ignored, and so no stream-processing should be done based on it. This is currently the case unless wsgi.input_terminated
is set.
Should it really be necessary to set wsgi.input_terminated
in order for normal chunked encoding e.g. of a 10 byte JSON payload to work?
Most helpful comment
Which wsgi currently supports
input_terminated
? After burning all day on debugging why the data is empty, then why stream is empty then realizing it's due to chunked stream encoding I now can't find anything that supports this :/