Werkzeug: Chunked requests still don't really work

Created on 16 Jul 2017  ·  4 Comments  ·  Source: pallets/werkzeug

(Updated: I realised I had already been attempting to fix this, so I had confused the original issue with the one that made me give up.)

I'm making a chunked request to an app that uses Werkzeug (current git master). get_input_stream ends up calling LimitedStream(wsgi.input, min(content_length, max_content_length)). With chunked encoding there is no Content-Length, so min(content_length, max_content_length) will always be None.

In an attempt to make progress I tried using LimitedStream(stream, content_length or max_content_length), but then I get a ClientDisconnected exception. We can't tell the difference between a client disconnecting and the chunked request simply being done, because wsgi.input.read() returns a 0-length string in both cases.
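To make the ambiguity concrete, here is a minimal sketch using only the standard library. ToyLimitedStream and the local ClientDisconnected class are hypothetical stand-ins written for this example, not Werkzeug's actual implementation; they only assume the behaviour described above, namely that an empty read before the limit is reached is treated as a disconnect.

```python
import io


class ClientDisconnected(Exception):
    """Local stand-in for the exception named in the issue (hypothetical)."""


class ToyLimitedStream:
    """Toy wrapper, NOT Werkzeug's LimitedStream: reads at most `limit` bytes."""

    def __init__(self, stream, limit):
        self._stream = stream
        self._limit = limit
        self._pos = 0

    def read(self, size=-1):
        remaining = self._limit - self._pos
        if remaining <= 0:
            return b""  # limit reached: a clean end of body
        if size < 0 or size > remaining:
            size = remaining
        data = self._stream.read(size)
        if not data:
            # The underlying stream is exhausted before the limit. An empty
            # read looks the same whether the body ended or the client left.
            raise ClientDisconnected()
        self._pos += len(data)
        return data


def read_all(stream):
    """Read until the wrapper reports end of body."""
    chunks = []
    while True:
        chunk = stream.read(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)
```

With a 10-byte body, a limit of 10 (a real Content-Length) reads cleanly, while a limit of 1024 (standing in for max_content_length) raises ClientDisconnected even though the body arrived in full.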

bug

All 4 comments

This is more than just the simple bug with min. After fixing that, the problem is that doing wsgi.input.read(max_content_length) with Werkzeug blocks forever if there's less than max_content_length to read. With Gunicorn, which supports chunked requests by parsing and buffering the whole message, the body comes out shorter than max_content_length, so LimitedStream thinks the client disconnected.

I tested this on Django just to be sure we're not doing something weird; it sees an empty stream too.

The simplest way forward is to provide a middleware that sets environ['wsgi.input_terminated'] and tell people to use it when using a server that supports chunked transfer.
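A minimal sketch of what such a middleware could look like. The class name InputTerminatedMiddleware is made up for this example; the only thing taken from the discussion is the environ['wsgi.input_terminated'] key. It assumes the server in front of the app already decodes chunks and signals end of input on wsgi.input.

```python
class InputTerminatedMiddleware:
    """Hypothetical WSGI middleware: marks wsgi.input as safely terminated.

    Only use this behind a server that decodes chunked bodies itself and
    returns an empty read at the end of the request body.
    """

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Tell the application it may read wsgi.input until EOF.
        environ["wsgi.input_terminated"] = True
        return self.app(environ, start_response)
```

Usage would be wrapping the WSGI callable once at startup, for example application = InputTerminatedMiddleware(application).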

Which WSGI server currently supports input_terminated? After burning all day debugging why the data is empty, then why the stream is empty, then realizing it's due to chunked transfer encoding, I now can't find anything that supports this :/

In addition to checking environ.get('wsgi.input_terminated') should we also check environ.get('HTTP_TRANSFER_ENCODING') == 'chunked' and in that case also return the unwrapped stream directly? That fixes problems like a simple curl -H 'Transfer-Encoding: chunked' ... returning an empty stream.

Per RFC 2616, section 4.4:

4.4.2
If a Transfer-Encoding header field (section 14.41) is present and
has any value other than "identity", then the transfer-length is
defined by use of the "chunked" transfer-coding (section 3.6),
unless the message is terminated by closing the connection.

4.4.3
If a Content-Length header field (section 14.13) is present, its
decimal value in OCTETs represents both the entity-length and the
transfer-length. The Content-Length header field MUST NOT be sent
if these two lengths are different (i.e., if a Transfer-Encoding
header field is present). If a message is received with both a
Transfer-Encoding header field and a Content-Length header field,
the latter MUST be ignored.

... noting that a Transfer-Encoding of chunked demands that any Content-Length header be ignored, so no stream processing should be based on it. Currently, Content-Length is still used for that unless wsgi.input_terminated is set.

Should it really be necessary to set wsgi.input_terminated in order for normal chunked encoding e.g. of a 10 byte JSON payload to work?
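The check proposed in this comment might look roughly like the sketch below. choose_input_stream and its wrap_limited parameter are hypothetical names for illustration; wrap_limited(stream, length) stands in for whatever length-limiting wrapper the framework uses. Returning the raw stream for chunked requests is an assumption that the server has already decoded the chunks and signals EOF; on a server that does not, a read could block.

```python
import io


def choose_input_stream(environ, wrap_limited):
    """Hypothetical helper: pick how to expose the request body.

    wrap_limited(stream, length) is a stand-in for a length-limited wrapper.
    """
    stream = environ["wsgi.input"]

    # The server promises that wsgi.input ends with an EOF.
    if environ.get("wsgi.input_terminated"):
        return stream

    # Proposal from the comment: treat chunked like a terminated stream and
    # ignore any Content-Length, as the quoted RFC text requires.
    transfer_encoding = environ.get("HTTP_TRANSFER_ENCODING", "")
    if transfer_encoding.strip().lower() == "chunked":
        return stream

    # Otherwise rely on Content-Length; without one, expose an empty body.
    try:
        length = int(environ.get("CONTENT_LENGTH") or "")
    except ValueError:
        return io.BytesIO()
    return wrap_limited(stream, max(length, 0))
```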
