Aiohttp: Degrading performance over time...

Created on 17 Apr 2017  ·  44 Comments  ·  Source: aio-libs/aiohttp

I wrote a quick AsyncScraper class below:

import logging, datetime, time
import aiohttp
import asyncio
import uvloop

# asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger.addHandler(logging.StreamHandler())

class AsyncScraper(object):
    headers = {"User-Agent" : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36'}
    def __init__(self, max_connections=1000, timeout=10):
        self.max_connections = max_connections
        self.timeout = timeout

    async def get_response(self, url, session):
        with aiohttp.Timeout(timeout=self.timeout):
            async with session.get(url, allow_redirects=True, headers=AsyncScraper.headers, timeout=self.timeout) as response:
                try:
                    content = await response.text()
                    return {'error': "", 'status': response.status, 'url':url, 'content': content, 'timestamp': str(datetime.datetime.utcnow())}
                except Exception as err:
                    return {'error': err, 'status': "", 'url':url, 'content': "", 'timestamp': str(datetime.datetime.utcnow())}
                finally:
                    response.close()

    def get_all(self, urls):
        loop = asyncio.get_event_loop()
        with aiohttp.ClientSession(loop=loop, connector=aiohttp.TCPConnector(keepalive_timeout=10, limit=self.max_connections, verify_ssl=False)) as session:
            tasks = asyncio.gather(*[self.get_response(url, session) for url in urls], return_exceptions=True)
            results = loop.run_until_complete(tasks)
            return results

def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]

def process_urls(urls, chunk_size=1000):
    scraper = AsyncScraper()

    results = []
    t0 = time.time()
    for i, urls_chunk in enumerate(chunks(sorted(set(urls)), chunk_size)):
        t1 = time.time()
        result = scraper.get_all(urls_chunk)
        success_size = len( [_ for _ in result if ((isinstance(_, Exception) is False) and (_['status']==200)) ] )
        results.extend(result)
        logger.debug("batch {} => success: {} => iteration time: {}s =>  total time: {}s => total processed {}".format(i+1, success_size, time.time()-t1, time.time()-t0, len(results)))
    return results

and I've run into two main issues:

  1. If I pass in a flat list of URLs, say 100k (via the get_all method), I get flooded with errors:

    2017-04-17 15:50:53,541 - asyncio - ERROR - Fatal error on SSL transport
    protocol:
    transport: <_SelectorSocketTransport closing fd=612 read=idle write=>
    Traceback (most recent call last):
    File "/Users/vgoklani/anaconda3/lib/python3.6/asyncio/sslproto.py", line 639, in _process_write_backlog
    ssldata = self._sslpipe.shutdown(self._finalize)
    File "/Users/vgoklani/anaconda3/lib/python3.6/asyncio/sslproto.py", line 151, in shutdown
    raise RuntimeError('shutdown in progress')
    RuntimeError: shutdown in progress

  2. I then batched the URLs in chunks of 1,000, and timed the response between batches. And I was clearly able to measure the performance decay over time (see below). Moreover, the number of errors increased over time... What am I doing wrong?

    iteration 0 done in 16.991s
    iteration 1 done in 39.376s
    iteration 2 done in 35.656s
    iteration 3 done in 19.716s
    iteration 4 done in 29.331s
    iteration 5 done in 19.708s
    iteration 6 done in 19.572s
    iteration 7 done in 29.907s
    iteration 8 done in 23.379s
    iteration 9 done in 21.762s
    iteration 10 done in 22.091s
    iteration 11 done in 22.940s
    iteration 12 done in 31.285s
    iteration 13 done in 24.549s
    iteration 14 done in 26.297s
    iteration 15 done in 23.816s
    iteration 16 done in 29.094s
    iteration 17 done in 24.885s
    iteration 18 done in 26.456s
    iteration 19 done in 27.412s
    iteration 20 done in 29.969s
    iteration 21 done in 28.503s
    iteration 22 done in 28.699s
    iteration 23 done in 31.570s
    iteration 26 done in 31.898s
    iteration 27 done in 33.553s
    iteration 28 done in 34.022s
    iteration 29 done in 33.866s
    iteration 30 done in 36.351s
    iteration 31 done in 40.060s
    iteration 32 done in 35.523s
    iteration 33 done in 36.607s
    iteration 34 done in 36.325s
    iteration 35 done in 38.425s
    iteration 36 done in 39.106s
    iteration 37 done in 38.972s
    iteration 38 done in 39.845s
    iteration 39 done in 40.393s
    iteration 40 done in 40.734s
    iteration 41 done in 47.799s
    iteration 42 done in 43.070s
    iteration 43 done in 43.365s
    iteration 44 done in 42.081s
    iteration 45 done in 44.118s
    iteration 46 done in 44.955s
    iteration 47 done in 45.400s
    iteration 48 done in 45.987s
    iteration 49 done in 46.041s
    iteration 50 done in 45.899s
    iteration 51 done in 49.008s
    iteration 52 done in 49.544s
    iteration 53 done in 55.432s
    iteration 54 done in 52.590s
    iteration 55 done in 50.185s
    iteration 56 done in 52.858s
    iteration 57 done in 52.698s
    iteration 58 done in 53.048s
    iteration 59 done in 54.120s
    iteration 60 done in 54.151s
    iteration 61 done in 55.465s
    iteration 62 done in 56.889s
    iteration 63 done in 56.967s
    iteration 64 done in 57.690s
    iteration 65 done in 57.052s
    iteration 66 done in 67.214s
    iteration 67 done in 58.457s
    iteration 68 done in 60.882s
    iteration 69 done in 58.440s
    iteration 70 done in 60.755s
    iteration 71 done in 58.043s
    iteration 72 done in 65.076s
    iteration 73 done in 63.371s
    iteration 74 done in 62.800s
    iteration 75 done in 62.419s
    iteration 76 done in 61.376s
    iteration 77 done in 63.164s
    iteration 78 done in 65.443s
    iteration 79 done in 64.616s
    iteration 80 done in 69.544s
    iteration 81 done in 68.226s
    iteration 82 done in 78.050s
    iteration 83 done in 67.871s
    iteration 84 done in 69.780s
    iteration 85 done in 67.812s
    iteration 86 done in 68.895s
    iteration 87 done in 71.086s
    iteration 88 done in 68.809s
    iteration 89 done in 70.945s
    iteration 90 done in 72.760s
    iteration 91 done in 71.773s
    iteration 92 done in 72.522s

The time here corresponds to the iteration time to process 1,000 URLs. Please advise. Thanks



All 44 comments

What version of aiohttp do you use? Also check for open tcp connection, does number grow?


I'm using '2.0.5'. How do I check for the open tcp connection?

What version of python do you use?

To check connections use “netstat” command, take several measurements


I'm using python 3.6.0

Am I making an obvious mistake in the code?

I do not see anything obvious. I just want to work out which part of aiohttp is the source of the problem.
Also, Python 3.6 leaks SSL sockets.


Is there anything specific I can do to help you test? I'm not clear on what you want me to do with netstat?

does number of opened connections grow?

If you're on Linux, run this in a console while processing the first, second, middle, and last chunks of URLs:

PID=pid_of_server
netstat -npt | grep $PID/ | wc -l

I've just encountered this. 10k async requests using a TCPConnector with a limit of 100 initially start well, but immediately the connections seen via netstat drop away. It's a long tail, but eventually it hits 0, at which point all remaining requests throw.

Using an asyncio.Semaphore to rate limit seems to workaround this issue (netstat reports at or close to the limit for the duration), although I'm relatively new to aio python, so I'm unsure at what cost, if any.

Edit: Seen on Ubuntu 16.04 with Python 3.5.2 using:

aiohttp==2.0.7
appdirs==1.4.3
async-timeout==1.2.1
chardet==3.0.3
multidict==2.1.5
packaging==16.8
pyparsing==2.2.0
six==1.10.0
yarl==0.10.2

@vgoklani @mal can you try the same test disabling the timeout? I know its not ideal but its to check if the error is because of task cancellation or something else

Just ran it again with timeout=None in the session.request call, doesn't appear to have changed behaviour any, connections still dry up. I've cleaned up my testcase a bit and made it reproducible locally. Hopefully this will help: https://gist.github.com/mal/53506cd81fa27c39afc4f7fbef7b7b5a

FWIW, it also runs through requests about 4 times faster using asyncio.Semaphore locally.

highly recommend trying with 3.5.2 to see if it's related to the ssl leak. Also I use a pattern like this for doing multiple tasks in parallel: https://gist.github.com/thehesiod/7081ab165b9a0d4de2e07d321cc2391d. Spawning too many async tasks makes things very slow as it has to navigate through the giant list of tasks.

@thehesiod I have the same issue, running 3.5.2. I ran this snippet overnight:

import asyncio, datetime, time
import aiohttp

async def page_get(url):
    while True:
        try:
            t = time.time()
            async with aiohttp.request("GET", url) as req:
                resp_content = await req.read()
            print(time.time() - t, "seconds taken")
        except Exception as e:
            print(str(datetime.datetime.now()) + " - Connection error: {0}".format(str(e)))
        await asyncio.sleep(2)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait([page_get('http://www.google.com')]))

0.05777549743652344 seconds taken
0.05143165588378906 seconds taken
0.045487165451049805 seconds taken
0.0573725700378418 seconds taken
0.049524545669555664 seconds taken

Today when I checked it, I was getting:

1.3857007026672363 seconds taken
0.47487688064575195 seconds taken
0.23848772048950195 seconds taken
2.2507810592651367 seconds taken
0.25020551681518555 seconds taken
0.12477540969848633 seconds taken
2.4836790561676025 seconds taken
1.1492459774017334 seconds taken

@thehesiod running the testcase I posted against 3.5.2, 3.5.3 and for good measure 3.6.1, all show the same degradation around the 7k mark.

Mind taking snapshots when fast and slow with pyvmmonitor?

For whatever reason pyvmmonitor won't play ball (X11 issues, missing python modules, unmentioned missing shared object files). Happy to provide cProfiles if those are helpful, but if you need the pyvmmonitor output it looks like your best bet is running the testcase yourself I'm afraid, sorry I can't be of more help :slightly_frowning_face:

Of course as soon as I said that, I realised what was wrong. >_<

The testcase now includes fast and slow variants (fast = semaphore, slow = limit param) and the pstat file from running each.

nice, that should help us, thanks!

looks like in the slow case for some reason ~60% of the time is spent in select.epoll. I can't even find it in the fast case yet on my laptop monitor. Also I noticed in slow.py you're not draining the connection before returning it to the pool (reading body). I forget if this is now done automatically by ClientResponse.__aexit__. Between your fast and slow you don't pass limit to fast, so I'm guessing the 'slowness' is aiohttp waiting to return connections to the pool. Any ideas @fafhrd91 ?

@mal have you checked SSL vs non-SSL? Would help narrow the problem

if SSL only may want to try with https://github.com/python/cpython/pull/2270

Sorry for the delay, I was on vacation last week. I've not tried SSL vs non-SSL but the testcase I was using is plain old HTTP not HTTPS. Is there any reason to suspect that performance would be better with the additional overhead of HTTPS?

Ah, OK. No, just thought I'd check; if it were HTTPS-only, it could be explained by existing issues.

Guys could you test against new aiohttp 2.2.0 release?

Just re-ran the testcase using 2.2.0 issue still exists. 😞

Just re-ran the testcase against 2.3.6 and can confirm that the issue is still present 🙁 In this latest run using the internal limit resulted in a total time for 10000 that was ~4x slower (20.13s) than the manually implemented semaphore approach (5.83s).

I'll try to find time, if nobody from @aio-libs/aiohttp-committers picks up the issue first.

It seems that @dionhaefner fixed it on https://github.com/DHI-GRAS/itsybitsy/issues/1 maybe we can have some info here?

Not exactly fixed - I just used @mal's workaround to use a Semaphore for rate-limiting rather than the limit keyword:

sem = asyncio.Semaphore(max_connections)
connector = aiohttp.TCPConnector(limit=None)
session = aiohttp.ClientSession(connector=connector)

async def get_page(url):
    async with sem:
        async with session.get(url):
            ...

performs well for me.
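The effect of that semaphore pattern can be demonstrated without any network I/O. This sketch isolates the workaround from the snippet above; `fake_fetch` is an illustrative stand-in for `session.get(url)` that records how many "requests" are in flight at once:

```python
import asyncio

peak = 0      # highest number of "requests" in flight at once
active = 0

async def fake_fetch(i):
    # Stand-in for `session.get(url)`; records peak concurrency.
    global peak, active
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0)  # yield control, like awaiting the network
    active -= 1
    return i

async def limited_gather(coros, max_concurrency):
    # The Semaphore workaround from the snippet above, in isolation:
    # gather everything, but let only max_concurrency awaits proceed.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(bounded(c) for c in coros))

results = asyncio.run(limited_gather([fake_fetch(i) for i in range(50)], 8))
```

Because each coroutine only does work while holding the semaphore, `peak` never exceeds the limit even though all 50 tasks exist at once.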

@dteh it's hard to use connections to google.com as a testcase given its response times can vary widely, I suggest trying again with a local server.

@mal I'll let @asvetlov comment on the non-semaphore case, but my understanding is that adding a lot of tasks will make everything slow as the message loop is going to walk through all the tasks on each blocking call. The semaphore / workerpool (https://gist.github.com/thehesiod/7081ab165b9a0d4de2e07d321cc2391d) is the better strategy to only have as many work tasks active as you are able to process.
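A minimal, network-free sketch of that worker-pool idea (`run_worker_pool` and `fake_handler` are illustrative names, not taken from the gist): only `num_workers` tasks ever exist, regardless of how many URLs are queued, so the event loop never has to walk a giant task list.

```python
import asyncio

async def run_worker_pool(urls, handler, num_workers):
    # A fixed pool of workers drains a shared queue.
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results = []

    async def worker():
        while not queue.empty():
            url = queue.get_nowait()   # safe: no await between check and get
            results.append(await handler(url))

    await asyncio.gather(*(worker() for _ in range(num_workers)))
    return results

async def fake_handler(url):
    # Stand-in for an aiohttp request.
    await asyncio.sleep(0)
    return (url, 200)

out = asyncio.run(run_worker_pool(["u%d" % i for i in range(25)], fake_handler, 4))
```

Note the `empty()`/`get_nowait()` pair is only safe because asyncio is single-threaded and there is no `await` between the check and the get.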

The implementations of asyncio.Semaphore, asyncio.Queue, and the aiohttp.BaseConnector.connect that applies back pressure to limit the number of concurrent connections all share the same pattern and are quite identical. So, either:

1) aiohttp.BaseConnector.connect has a bug in its implementation of the connection limits.
2) There is a bug between the .get and .connect functions that appears when connections are limited and is avoided by using an outer semaphore (the back pressure is applied at the beginning).
3) Something else.

@thehesiod it's been a while since I looked at this, but it sounds like you're suggesting that having to use a semaphore or queue to get it performant is expected? If that is the case, I'd question what the limit parameter is meant to achieve, because (from memory) its description suggests it would support the identified use case. If it's not designed to do that, then maybe that's a feature that could be considered?

To me, this issue seems blocking enough to consider deploying a solution that works but might not be perfect. I know it's been there for a year now and there hasn't been much complaint, but... imagine people who want to run aiohttp in production: they can't because of an issue like this.

@asvetlov I propose we double check and confirm this issue, then apply this manual semaphore mechanism directly to aiohttp on all maintained versions (as it seems to be the best way to solve it). We can still improve/replace it later if needed. What do you think?

As @pfreixes commented if aiohttp has a bug -- it should be fixed.
We need to discover what's going on first.

@mal the limit parameter is to limit the number of concurrent connections, not in general to apply any type of pressure. Just as a note I use aiohttp to make millions of requests every day in a prod environment across multiple services for years and haven't run into this issue by using things like worker pools and semaphores.

Maybe the issue is one of expectation? The crux of this seems to be "is this a bug, or is another method meant to be used to support this use case?" and no one really seems to be sure. I'd argue that right now it's at best unexpected behaviour that looks like a bug to the user, which results in a response ranging from mild frustration to simply not utilising the library.

@mal after further reading of your code I didn't realize how similar you made the two codepaths whereby in both cases you have the same number of outstanding tasks, in one case they're waiting on a semaphore to unblock them, and in the other on the TCPConnector limiting. I did another change to make the fast case use a LIMIT as well to ensure both cases are similar and it had no effect on performance for the fast case which points to an issue in aiohttp.

So this is rather interesting. we know per the performance trace that the selectors.py:select method is causing things to run slowly (doesn't even show up in the fast trace), so it sounds like aiohttp is sub-optimally limiting connections as you suggested. I now think this is definitely a performance issue in aiohttp. I'm guessing it's due to the waiters mechanism in BaseConnector. You can envision that at start LIMIT connections get created, then LIMIT * 99 end up as "waiters". I'm wondering if this can be replaced with a simple semaphore or condition.
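The cost of the waiter bookkeeping can be sketched abstractly. This is an illustrative limiter, not aiohttp's actual BaseConnector code: the point is that waking exactly one queued waiter on release is O(1), whereas re-scanning the whole waiter list on every released connection compounds badly when thousands of requests are queued behind a small limit.

```python
import asyncio
import collections

class SketchLimiter:
    """Illustrative connection limiter (not aiohttp's BaseConnector):
    release() wakes exactly one queued waiter instead of scanning all."""
    def __init__(self, limit):
        self._limit = limit
        self._in_use = 0
        self._waiters = collections.deque()

    async def acquire(self):
        if self._in_use < self._limit:
            self._in_use += 1
            return
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        await fut              # parked until a release() wakes us
        self._in_use += 1

    def release(self):
        self._in_use -= 1
        # O(1): pop one waiter and let it proceed.
        while self._waiters:
            fut = self._waiters.popleft()
            if not fut.cancelled():
                fut.set_result(None)
                break

async def demo():
    lim = SketchLimiter(2)
    order = []
    async def job(i):
        await lim.acquire()
        order.append(i)
        await asyncio.sleep(0)
        lim.release()
    await asyncio.gather(*(job(i) for i in range(5)))
    return order

order = asyncio.run(demo())
```

All five jobs complete even though only two may hold the limiter at once; the deque guarantees each release does constant work.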

apologies for making an assumption about your example. Thanks again for pushing this issue! I think I'll have some time to experiment a bit with this today but hopefully @asvetlov can pursue this separately. I'm not a contributor but am an interested party :)

so I created a PR which greatly speeds up both cases. What it does is avoid iterating through the list of waiters in non-exceptional cases.

on my machine it went from ~13s to 3s in the slow case

anyone want to verify the results from my PR?

Should be fixed by #2937 and #2964

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
If you feel there are important points made in this discussion, please include those excerpts in the new issue.
