requests has poor performance streaming large binary responses

Created on 5 Dec 2014  ·  40 Comments  ·  Source: psf/requests

https://github.com/alex/http-client-bench contains the benchmarks I used.

The results are something like:

| | requests/http | socket |
| --- | --- | --- |
| CPython | 12MB/s | 200MB/s |
| PyPy | 80MB/s | 300MB/s |
| Go | 150MB/s | n/a |

requests imposes a considerable overhead compared to a socket, particularly on CPython.
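For reference, the two extremes being compared look roughly like this (a minimal sketch under assumed host/port/chunk-size values, not the actual http-client-bench code):

```python
import socket

# Illustrative values, not the benchmark's real configuration.
HOST, PORT = "127.0.0.1", 8080
CHUNK = 65536

def bench_requests(url):
    """Stream the response body through requests, discarding the data."""
    import requests
    total = 0
    with requests.get(url, stream=True) as resp:
        for chunk in resp.iter_content(chunk_size=CHUNK):
            total += len(chunk)
    return total

def bench_socket():
    """Read the raw bytes straight off a socket (no HTTP parsing at all)."""
    sock = socket.create_connection((HOST, PORT))
    sock.sendall(b"GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n"
                 % HOST.encode())
    total = 0
    while True:
        data = sock.recv(CHUNK)
        if not data:   # server closed the connection
            break
        total += len(data)
    sock.close()
    return total
```

The socket version never parses headers or chunk framing, which is part of why it is not an apples-to-apples comparison (a point raised later in this thread).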


All 40 comments

That overhead is unexpectedly large. However, avoiding it might be tricky.

The big problem is that we do quite a lot of processing per chunk. That's all the way down the stack: requests, urllib3 and httplib. It would be extremely interesting to see where the time is being spent to work out who is causing the inefficiency.

I guess a next step would be to try profiling httplib / urllib3 to see where the time is going?


Just ran benchmarks with urllib3:

PyPy: 120MB/s
CPython: 70MB/s

And I re-ran CPython + requests: 35MB/s

(My machine seems to be experiencing a fair bit of noise in benchmarks; if anyone has a quieter system they can run these on, that'd be awesome.)

I tried running these on my machine after shutting down every other application and terminal window, and got a fair amount of noise as well: the socket benchmark was anywhere from 30MB/s to 460MB/s.


I made the benchmarks easier to run now, so other folks can hopefully verify my numbers:

CPython:

BENCH SOCKET:
   8GiB 0:00:22 [ 360MiB/s] [======================================================>] 100%
BENCH HTTPLIB:
   8GiB 0:02:34 [53.1MiB/s] [======================================================>] 100%
BENCH URLLIB3:
   8GiB 0:01:30 [90.2MiB/s] [======================================================>] 100%
BENCH REQUESTS
   8GiB 0:01:30 [90.7MiB/s] [======================================================>] 100%
BENCH GO HTTP
   8GiB 0:00:26 [ 305MiB/s] [======================================================>] 100%

PyPy:

BENCH SOCKET:
   8GiB 0:00:22 [ 357MiB/s] [======================================================>] 100%
BENCH HTTPLIB:
   8GiB 0:00:43 [ 189MiB/s] [======================================================>] 100%
BENCH URLLIB3:
   8GiB 0:01:07 [ 121MiB/s] [======================================================>] 100%
BENCH REQUESTS
   8GiB 0:01:09 [ 117MiB/s] [======================================================>] 100%
BENCH GO HTTP
   8GiB 0:00:26 [ 307MiB/s] [======================================================>] 100%

Uh...those numbers are weird. CPython's httplib is slower than requests or urllib3, even though both libraries use httplib? That just cannot be right.

They reproduce consistently for me -- can you try the benchmarks and see if
you can reproduce? Assuming you can, do you see anything wrong with the
benchmarks?


I'm just grabbing a known-quiet machine now. Should take a few minutes to become available because it's a physical box that has to get installed (god I love MAAS).

CPython 2.7.8

BENCH SOCKET:
   8GiB 0:00:26 [ 309MiB/s] [================================>] 100%
BENCH HTTPLIB:
   8GiB 0:02:24 [56.5MiB/s] [================================>] 100%
BENCH URLLIB3:
   8GiB 0:01:42 [79.7MiB/s] [================================>] 100%
BENCH REQUESTS
   8GiB 0:01:45 [77.9MiB/s] [================================>] 100%
BENCH GO HTTP
   8GiB 0:00:27 [ 297MiB/s] [================================>] 100%

For what it's worth:

This patch, CPython 3.4.2:

BENCH SOCKET:
   8GiB 0:00:27 [ 302MiB/s] [================================>] 100%
BENCH HTTPLIB:
   8GiB 0:00:53 [ 151MiB/s] [================================>] 100%
BENCH URLLIB3:
   8GiB 0:00:54 [ 149MiB/s] [================================>] 100%
BENCH REQUESTS
   8GiB 0:00:56 [ 144MiB/s] [================================>] 100%
BENCH GO HTTP
   8GiB 0:00:31 [ 256MiB/s] [================================>] 100%

You should be able to get that same effect on Python 2 with env PYTHONUNBUFFERED= or the -u flag.


@alex Interestingly, neither env PYTHONUNBUFFERED= nor -u has the same effect on Python 2. Results from my machine incoming.

Alright, the below data comes from a machine that is doing nothing else but running these tests. The last test was run with the Python -u flag set, and as you can see that flag has no effect.

Python 2.7.6
go version go1.2.1 linux/amd64
BENCH SOCKET:
   8GiB 0:00:16 [ 500MiB/s] [================================>] 100%
BENCH HTTPLIB:
   8GiB 0:01:32 [88.6MiB/s] [================================>] 100%
BENCH URLLIB3:
   8GiB 0:01:20 [ 101MiB/s] [================================>] 100%
BENCH REQUESTS
   8GiB 0:01:21 [ 100MiB/s] [================================>] 100%
BENCH GO HTTP
   8GiB 0:00:21 [ 385MiB/s] [================================>] 100%
Python 2.7.6
go version go1.2.1 linux/amd64
BENCH SOCKET:
   8GiB 0:00:16 [ 503MiB/s] [================================>] 100%
BENCH HTTPLIB:
   8GiB 0:01:33 [87.8MiB/s] [================================>] 100%
BENCH URLLIB3:
   8GiB 0:01:20 [ 101MiB/s] [================================>] 100%
BENCH REQUESTS
   8GiB 0:01:22 [99.3MiB/s] [================================>] 100%
BENCH GO HTTP
   8GiB 0:00:20 [ 391MiB/s] [================================>] 100%
Python 2.7.6
go version go1.2.1 linux/amd64
BENCH SOCKET:
   8GiB 0:00:16 [ 506MiB/s] [================================>] 100%
BENCH HTTPLIB:
   8GiB 0:01:31 [89.1MiB/s] [================================>] 100%
BENCH URLLIB3:
   8GiB 0:01:20 [ 101MiB/s] [================================>] 100%
BENCH REQUESTS
   8GiB 0:01:20 [ 101MiB/s] [================================>] 100%
BENCH GO HTTP
   8GiB 0:00:21 [ 389MiB/s] [================================>] 100%

These numbers are extremely stable, and show the following features:

  1. Raw socket reads are fast (duh).
  2. Go is about 80% the speed of a raw socket read.
  3. urllib3 is about 20% the speed of a raw socket read.
  4. requests is slightly slower than urllib3, which makes sense as we add a couple of stack frames for the data to pass through.
  5. httplib is slower than requests/urllib3. That's just impossible, and I suspect that we must be configuring httplib or the sockets library in a way that httplib is not.

FWIW, I just merged adding buffering=True from @kevinburke, do your runs
include that?


Cory: see the latest version of the bench client, which turns on buffering=True in httplib (as requests/urllib3 do).
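For context on what that buffering buys: on Python 3, http.client always hands the parser a buffered file object, which is what Python 2's httplib only did when you opted in with getresponse(buffering=True). A rough sketch of a drain loop that checks this (the read_all helper and server details are my own illustrative assumptions, not thread code):

```python
import http.client
import io

def read_all(host, port, path="/", chunk_size=65536):
    """Drain an HTTP response body; returns the number of bytes read."""
    conn = http.client.HTTPConnection(host, port)
    conn.request("GET", path)
    resp = conn.getresponse()
    # The response's file object is buffered on Python 3, so the parser's
    # many small reads (chunk lengths, CRLF terminators) hit memory
    # rather than issuing separate recv() calls.
    assert isinstance(resp.fp, io.BufferedIOBase)
    total = 0
    while True:
        data = resp.read(chunk_size)
        if not data:
            break
        total += len(data)
    conn.close()
    return total
```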


Yeah, that fixes the performance behaviour of httplib to make far more sense.

New results and conclusions:

Python 2.7.6
go version go1.2.1 linux/amd64
BENCH SOCKET:
   8GiB 0:00:16 [ 499MiB/s] [================================>] 100%
BENCH HTTPLIB:
   8GiB 0:01:12 [ 113MiB/s] [================================>] 100%
BENCH URLLIB3:
   8GiB 0:01:21 [ 100MiB/s] [================================>] 100%
BENCH REQUESTS
   8GiB 0:01:20 [ 101MiB/s] [================================>] 100%
BENCH GO HTTP
   8GiB 0:00:20 [ 391MiB/s] [================================>] 100%
  1. Raw socket reads are fast (duh).
  2. Go is about 80% the speed of a raw socket read.
  3. httplib is just under 25% the speed of a raw socket read.
  4. urllib3 is about 20% the speed of a raw socket read, adding some small overhead to httplib.
  5. requests is slightly slower than urllib3, which makes sense as we add a couple of stack frames for the data to pass through.

So, arguably the real cost here is httplib. Speeding this up requires getting httplib out of the way.

I'm interested to work out what part of httplib is costing us though. I think profiling bench_httplib.py is a good next step.

I've ruled out the conversion of the socket to a file object through socket.makefile by adding that line to the bench_socket.py test; it doesn't slow it down at all. Weirdly, it appears to make it faster.
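That control experiment looks roughly like this (a hedged sketch, not the actual bench_socket.py; host and port are illustrative):

```python
import socket

def bench_socket_makefile(host, port, chunk_size=65536):
    """Read raw bytes through a file object over the socket, as httplib does."""
    sock = socket.create_connection((host, port))
    sock.sendall(b"GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n"
                 % host.encode())
    fp = sock.makefile("rb")   # buffered file interface over the socket
    total = 0
    while True:
        data = fp.read(chunk_size)
        if not data:           # EOF when the server closes the connection
            break
        total += len(data)
    fp.close()
    sock.close()
    return total
```

The buffered file object can amortize many small reads into fewer large recv() calls, which is a plausible reason the makefile variant can come out slightly faster.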

The answer is almost certainly the transfer-encoding: chunked handling.
See: https://github.com/alex/http-client-bench/pull/6 , switching to
Content-Length on the server produces some unexpected results.


Interesting.

The chunked handling is almost certainly the problem, and I'm not really surprised that go handles it better, especially as chunked is the default HTTP mode for go.

However, requests being faster than a raw socket is...unexpected!

One thing worth noting: if the socket wasn't decoding the chunked encoding in the previous tests then it got an unfair advantage, as it was actually reading less data than the other methods were! They were all reading the chunked headers as well as the 8GB of data.

This leads to a follow-on question: do we still think all of these methods are actually reading the same amount of data?

Yes, the socket layer was cheating: it didn't decode the chunked metadata, and technically read a bit less. It was there as a baseline for "how fast can we read", not to prove anything.


I wouldn't be surprised if this is related to the chunk size that we're reading off the socket at a time.

Cake for @alex for being super helpful :cake:

@nelhage did some stracing of the various examples (in the transfer-encoding: chunked case); the results are at https://gist.github.com/nelhage/dd6490fbc5cfb815f762. It looks like there's a bug in httplib which results in it not always reading a full chunk off the socket.


So what we have here is a bug in a standard library that no one is really maintaining? (@Lukasa has at least 2 patch sets that have been open for >1 year.) Maybe I'll raise a stink on a list somewhere tonight.

Someone (I might get to it, unclear) probably needs to drill down with pdb
or something and figure out what exact code is generating those 20-byte
reads so we can put together a good bug report.


I'll try to fit that in tonight or tomorrow if no one else gets to it.

So, any news on the root cause? What's generating these short reads, and how much does the situation improve without them?

@kislyuk Not as far as I'm aware. Hopefully I'll have some time to chase it down this Christmas holiday.

Thanks @Lukasa. I'm dealing with a performance issue where download speed on a chunked response using urllib3/requests is much slower than with curl and other libraries, and trying to understand if this is the culprit.

I was poking around with this a little. The short reads come from the _read_chunked function in httplib:

https://fossies.org/linux/misc/Python-2.7.9.tgz/Python-2.7.9/Lib/httplib.py#l_585

The 2-byte reads seem to come primarily from line 622.

I got a slightly different strace pattern to the one posted earlier:
recvfrom(3, "400\r\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 0, NULL, NULL) = 8192
recvfrom(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 54, 0, NULL, NULL) = 54
recvfrom(3, "\r\n", 2, 0, NULL, NULL) = 2

This pattern can be explained as follows:

  • the self.fp.readline (line 591) triggers a buffered read for 8192 bytes (in socket.readline)
  • each chunk that is consumed is 1031 bytes (5 byte chunk length ("400\r\n") + 1024 bytes of data + 2 bytes of terminator)
  • we can consume 7 such chunks out of the buffered 8192 bytes which leaves us with 975 bytes
  • we then read the next chunk length (5 bytes), which leaves us with 970 bytes
  • we now only have 970 bytes which is insufficient to fulfil the current chunk (1024) so we go back to the network for the shortfall of 54 bytes
  • to accomplish this httplib does a sock.read(54) on the outstanding bytes. socket.read in this case (with an explicit length) will opt to go to the network for the specified 54 bytes (rather than buffering another 8192)
  • we then get to reading the chunk terminator which is 2 bytes and again that is the same scenario as above

The pattern then repeats (go back to step 1).
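The accounting above can be checked directly (assuming the server's 0x400-byte chunks, as in the strace):

```python
# Each chunk occupies 5 ("400\r\n") + 1024 (data) + 2 ("\r\n") bytes on the
# wire, and the buffered readline pulls in 8192 bytes at a time.
CHUNK_ON_WIRE = 5 + 1024 + 2          # 1031 bytes per chunk
BUFFER = 8192

full_chunks, leftover = divmod(BUFFER, CHUNK_ON_WIRE)
# 7 full chunks fit in the buffer, leaving 975 bytes
after_len_line = leftover - 5         # consume the next "400\r\n": 970 left
shortfall = 1024 - after_len_line     # bytes missing from the current chunk
print(full_chunks, leftover, after_len_line, shortfall)   # 7 975 970 54
```

The 54-byte shortfall matches the recvfrom(3, ..., 54, ...) call in the strace, followed by the separate 2-byte terminator read.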

FWIW, I found that a modest (20% or so) speed-up could be made here by rolling the 2-byte chunk terminator read into the chunk body read, i.e. rather than this:

            value.append(self._safe_read(chunk_left)) 
            amt -= chunk_left

        self._safe_read(2)  # toss the CRLF at the end of the chunk

do this instead:

            value.append(self._safe_read(chunk_left + 2)[:-2]) 
            amt -= chunk_left

Really, though, it would probably be better if the read for the 54 bytes could buffer more than 54 bytes (i.e. another 8192), which would mean the buffered socket would not be empty when it comes to the 2-byte read.

Further to this: I'm not sure the small reads are the main factor in the loss of throughput (at least not on localhost). I played around with the socket buffer size such that it was a multiple of 1031 bytes, and although the strace no longer showed small reads, it didn't have much impact on the throughput.

I think the loss of throughput may be more to do with how socket.py deals with small reads. Here is the relevant code (from socket.read):

https://fossies.org/linux/misc/Python-2.7.9.tgz/Python-2.7.9/Lib/socket.py#l_336

When you pass an explicit length into socket.read and it can be fulfilled from existing buffered data then this is the code path:

        buf = self._rbuf
        buf.seek(0, 2)  # seek end

        #.....

        # Read until size bytes or EOF seen, whichever comes first
        buf_len = buf.tell()
        if buf_len >= size:
            # Already have size bytes in our buffer?  Extract and return.
            buf.seek(0)
            rv = buf.read(size)
            self._rbuf = StringIO()
            self._rbuf.write(buf.read())
            return rv

The issue I perceive here is that even a 2-byte read means copying the unread remainder into a fresh StringIO. That looks like it will become very expensive for a lot of small reads. If a given StringIO could somehow be drained on each read, rather than the current pattern of copying the unread remainder into a fresh StringIO, then I expect that would help the throughput.
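To illustrate the two buffer strategies (hypothetical helper names, not httplib/socket.py code):

```python
import io

def read_copying(buf_bytes, size):
    """Old-style: return `size` bytes, then rebuild the buffer with the rest."""
    buf = io.BytesIO(buf_bytes)
    rv = buf.read(size)
    rest = io.BytesIO()
    rest.write(buf.read())      # O(remaining) copy on *every* small read
    return rv, rest.getvalue()

class DrainingBuffer(object):
    """Sketch: consume from a fixed buffer by advancing an offset instead."""
    def __init__(self, data):
        self._data = data
        self._pos = 0
    def read(self, size):
        rv = self._data[self._pos:self._pos + size]
        self._pos += len(rv)    # no copy of the unread remainder
        return rv
```

With many 2-byte and 54-byte reads against a mostly-full 8KB buffer, the first pattern repeatedly copies kilobytes to hand back a few bytes; the second only copies what is returned (a memoryview could avoid even that).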

@gardenia I haven't had a chance to absorb all of this, but thank you so much for your effort and work here. @shazow perhaps you'd find @gardenia's research interesting.

:+1: thanks @gardenia. Incidentally, my own research into performance in my use case uncovered that in my case the responses are not chunked, but urllib3 performs 20+% faster than requests, so there is some overhead being introduced that I want to characterize. Still in line with the title of this issue, but different root cause.

Fascinating, thanks for sharing! :)

Seems like a great goal for @Lukasa's Hyper to address, too.

@alex - I toyed around a little with the urllib3 vs requests non-chunked performance issue you mentioned. I think I see a similar 20% drop in requests.

In requests I speculatively tried replacing the call to self.raw.stream with the inlined implementation of stream() (from urllib3). It seemed to bring the throughput a lot closer between requests and urllib3, at least on my machine:

--- requests.repo/requests/models.py    2015-03-06 16:05:52.072509869 +0000
+++ requests/models.py  2015-03-07 20:49:25.618007438 +0000
@@ -19,6 +19,7 @@
 from .packages.urllib3.fields import RequestField
 from .packages.urllib3.filepost import encode_multipart_formdata
 from .packages.urllib3.util import parse_url
+from .packages.urllib3.util.response import is_fp_closed
 from .packages.urllib3.exceptions import (
     DecodeError, ReadTimeoutError, ProtocolError, LocationParseError)
 from .exceptions import (
@@ -652,8 +654,12 @@
             try:
                 # Special case for urllib3.
                 try:
-                    for chunk in self.raw.stream(chunk_size, decode_content=True):
-                        yield chunk
+                    while not is_fp_closed(self.raw._fp):
+                        data = self.read(amt=chunk_size, decode_content=True)
+
+                        if data:
+                            yield data
+
                 except ProtocolError as e:
                     raise ChunkedEncodingError(e)
                 except DecodeError as e:

Maybe you could try the same on your machine to see if it makes a difference for you too.

(Yes, I know the call to is_fp_closed is encapsulation-busting; it isn't meant as a serious patch, just a data point.)

@shazow It's my hope that the BufferedSocket that hyper uses should address a lot of that inefficiency, by essentially preventing small reads. I wonder if httplib on Py3 has this problem, because it uses io.BufferedReader extensively, which should provide roughly the same kind of benefit as the BufferedSocket.

Certainly, however, when hyper grows enough HTTP/1.1 functionality to be useful we should try to benchmark it alongside these other implementations and make efforts to make hyper as fast as possible.

Inactive for almost a year. Closing.

I'm seeing similar issues, 10x less throughput using requests in comparison with urllib3.

I think the issue lives within urllib3's HTTPResponse class: when it is read as an iterator, its throughput is just really bad. I got my code working with a very ugly hack: I return the underlying httplib.HTTPResponse object used by urllib3, and that seems to fix my throughput problem.

Interesting fact: urllib3's HTTPResponse superclass is io.IOBase, while Python 3's httplib.HTTPResponse superclass is io.BufferedIOBase. I wonder if that has anything to do with this.
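For what it's worth, the "ugly hack" described above can be sketched like this (hedged: `stream_raw` is a hypothetical helper, and `_fp` is a private urllib3 attribute that may change between versions):

```python
def stream_raw(resp, chunk_size=65536):
    """Drain a urllib3 HTTPResponse via its wrapped httplib response.

    `resp` is assumed to come from something like
    urllib3.PoolManager().request("GET", url, preload_content=False);
    `resp._fp` is the underlying httplib/http.client response object.
    Reading it directly skips urllib3's per-chunk iterator overhead,
    but also skips its content decoding.
    """
    fp = resp._fp
    while True:
        data = fp.read(chunk_size)
        if not data:
            break
        yield data
```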
