Werkzeug: Chunked requests still don't really work

Created on 16 Jul 2017 · 4 Comments · Source: pallets/werkzeug

(Updated: I realised I had already been attempting to fix this, so I had confused the original issue with the one that made me give up.)

I'm making a chunked request to an app that uses Werkzeug (current git master). get_input_stream ends up calling LimitedStream(wsgi.input, min(content_length, max_content_length)). A chunked request carries no Content-Length header, so content_length is None and min(content_length, max_content_length) will always be None.

In an attempt to make progress I tried using LimitedStream(stream, content_length or max_content_length), but then I end up getting a ClientDisconnected exception, because we can't tell the difference between a client disconnecting and the chunked request simply being done, as wsgi.input.read() just returns a 0-length string in both cases.
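The ambiguity can be illustrated with a minimal sketch. SketchLimitedStream and ClientDisconnectedSketch below are hypothetical, heavily simplified stand-ins, not Werkzeug's actual LimitedStream or ClientDisconnected, and the sketch assumes the server hands the application an already de-chunked body:

```python
import io


class ClientDisconnectedSketch(Exception):
    """Hypothetical stand-in for a 'client went away' error."""


class SketchLimitedStream:
    """Simplified length-limited reader (illustration only)."""

    def __init__(self, stream, limit):
        self.stream = stream
        self.limit = limit
        self.pos = 0

    def read(self, size):
        if self.pos >= self.limit:
            return b""  # limit reached: normal end of body
        data = self.stream.read(min(size, self.limit - self.pos))
        if not data:
            # An empty read before the limit looks like a disconnect,
            # even if the body simply ended earlier than the limit.
            raise ClientDisconnectedSketch()
        self.pos += len(data)
        return data


def read_all(stream, chunk_size=4):
    """Read the stream in small pieces until it reports end of data."""
    parts = []
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        parts.append(data)
    return b"".join(parts)


body = b"0123456789"  # 10-byte body whose length the wrapper does not know

# Limit equals the real length: the body is read without error.
exact = read_all(SketchLimitedStream(io.BytesIO(body), len(body)))

# Limit is only an upper bound (like max_content_length): the normal end
# of the body is misreported as a disconnect.
try:
    read_all(SketchLimitedStream(io.BytesIO(body), 1024))
    misreported = False
except ClientDisconnectedSketch:
    misreported = True
```

With an upper bound as the limit, the wrapper has no way to tell "body finished" from "client vanished", which matches the behaviour described above.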

bug

sorenh

Most helpful comment

Which WSGI server currently supports wsgi.input_terminated? I burned all day debugging why the data was empty, then why the stream was empty, and finally realized it was due to chunked transfer encoding. Now I can't find anything that supports this :/

iScrE4m on 16 Aug 2017

👍2

All 4 comments

This is more than just the simple bug with min. After fixing that, the problem is that calling wsgi.input.read(max_content_length) with Werkzeug blocks forever if there is less than max_content_length to read. With Gunicorn, which supports chunked requests by parsing and buffering the whole message, the length comes out to less than max_content_length, so LimitedStream thinks the client disconnected.

I tested this out on Django just to be sure we're not doing something weird; it sees an empty stream too.

davidism on 21 Jul 2017

The simplest way forward is to provide a middleware that sets environ['wsgi.input_terminated'] and tell people to use it when using a server that supports chunked transfer.
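A minimal sketch of such a middleware follows. The class name InputTerminatedMiddleware and the demo application are hypothetical, and the sketch assumes the server in front of it really does de-chunk the body and end wsgi.input at the end of the request:

```python
import io


class InputTerminatedMiddleware:
    """Hypothetical helper: mark wsgi.input as terminated by the server.

    Only safe behind a server that de-chunks the request body and signals
    end of input itself; otherwise reads may block.
    """

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        environ["wsgi.input_terminated"] = True
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Tiny WSGI app that echoes the body, reading until end of input."""
    body = environ["wsgi.input"].read()
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [body]


wrapped = InputTerminatedMiddleware(demo_app)

# Simulated request: the server has already de-chunked the 10-byte body.
demo_environ = {"wsgi.input": io.BytesIO(b"0123456789")}
demo_status = []
demo_result = wrapped(demo_environ, lambda status, headers: demo_status.append(status))
```

Wrapping is a one-liner, for example app = InputTerminatedMiddleware(app), and it should only be applied when the deployment is known to use a server with chunked support.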

davidism on 21 Jul 2017

In addition to checking environ.get('wsgi.input_terminated'), should we also check environ.get('HTTP_TRANSFER_ENCODING') == 'chunked' and in that case return the unwrapped stream directly as well? That fixes problems like a simple curl -H 'Transfer-Encoding: chunked' ... returning an empty stream.

Per RFC 2616:

4.4.2
If a Transfer-Encoding header field (section 14.41) is present and
has any value other than "identity", then the transfer-length is
defined by use of the "chunked" transfer-coding (section 3.6),
unless the message is terminated by closing the connection.

4.4.3
If a Content-Length header field (section 14.13) is present, its
decimal value in OCTETs represents both the entity-length and the
transfer-length. The Content-Length header field MUST NOT be sent
if these two lengths are different (i.e., if a Transfer-Encoding
header field is present). If a message is received with both a
Transfer-Encoding header field and a Content-Length header field,
the latter MUST be ignored.

... noting that a Transfer-Encoding of chunked demands that any Content-Length header be ignored, so no stream processing should be based on it. Currently, stream processing is still based on Content-Length unless wsgi.input_terminated is set.

Should it really be necessary to set wsgi.input_terminated in order for normal chunked encoding e.g. of a 10 byte JSON payload to work?
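The proposed check could look roughly like the sketch below. The helper names is_chunked, body_length_hint and read_body are hypothetical and are not part of Werkzeug. The sketch assumes the server de-chunks the body and ends the stream itself; without that, an unbounded read() may block, as noted earlier in the thread.

```python
import io


def is_chunked(environ):
    """True if the request declares Transfer-Encoding: chunked."""
    value = environ.get("HTTP_TRANSFER_ENCODING", "")
    return value.strip().lower() == "chunked"


def body_length_hint(environ):
    """Return how many bytes to read, or None for 'read until end of input'."""
    if environ.get("wsgi.input_terminated") or is_chunked(environ):
        # Per RFC 2616 section 4.4, Content-Length must be ignored here.
        return None
    try:
        return max(0, int(environ.get("CONTENT_LENGTH") or 0))
    except ValueError:
        return 0  # malformed Content-Length: treat as no body


def read_body(environ):
    """Read the request body according to the length hint."""
    stream = environ["wsgi.input"]
    length = body_length_hint(environ)
    if length is None:
        return stream.read()  # assumes the server terminates the stream
    return stream.read(length)


payload = b'{"a": 1}'

# Chunked request: a stray Content-Length is ignored, the full body is read.
chunked_body = read_body({
    "wsgi.input": io.BytesIO(payload),
    "HTTP_TRANSFER_ENCODING": "chunked",
    "CONTENT_LENGTH": "3",
})

# Plain request: exactly Content-Length bytes are read.
sized_body = read_body({"wsgi.input": io.BytesIO(payload), "CONTENT_LENGTH": "3"})

# Neither header present: the body is treated as empty.
empty_body = read_body({"wsgi.input": io.BytesIO(payload)})
```

The trade-off is that trusting the Transfer-Encoding header alone relies on the server handling chunked input correctly, which is exactly what wsgi.input_terminated is meant to signal explicitly.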