Please bear with me, as I'm quite new to Python and GitHub in general.
I have been using requests to scrape data from the Play Store. I need to make a large number of requests (about 20k). It works great for about 3000-4000 requests but then fails with an SSL error. I am not familiar with SSL and requests, so I don't know what causes this.
Error:
(SSLError Traceback (most recent call last)
<ipython-input-23-1da544640d89> in <module>()
53 time.sleep(0.1)
54
---> 55 r = requests.get('https://play.google.com' + link + '&hl=en')
56 link_tree = html.fromstring(r.content)
57 description = link_tree.xpath('//div[@jsname="C4s9Ed"]/text()') + link_tree.xpath('//div[@jsname="C4s9Ed"]/p/text()')
C:\Users\Nathan\AppData\Local\Enthought\Canopy\User\lib\site-packages\requests\api.pyc in get(url, params, **kwargs)
65
66 kwargs.setdefault('allow_redirects', True)
---> 67 return request('get', url, params=params, **kwargs)
68
69
C:\Users\Nathan\AppData\Local\Enthought\Canopy\User\lib\site-packages\requests\api.pyc in request(method, url, **kwargs)
51 # cases, and look like a memory leak in others.
52 with sessions.Session() as session:
---> 53 return session.request(method=method, url=url, **kwargs)
54
55
C:\Users\Nathan\AppData\Local\Enthought\Canopy\User\lib\site-packages\requests\sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
466 }
467 send_kwargs.update(settings)
--> 468 resp = self.send(prep, **send_kwargs)
469
470 return resp
C:\Users\Nathan\AppData\Local\Enthought\Canopy\User\lib\site-packages\requests\sessions.pyc in send(self, request, **kwargs)
574
575 # Send the request
--> 576 r = adapter.send(request, **kwargs)
577
578 # Total elapsed time of the request (approximately)
C:\Users\Nathan\AppData\Local\Enthought\Canopy\User\lib\site-packages\requests\adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
445 except (_SSLError, _HTTPError) as e:
446 if isinstance(e, _SSLError):
--> 447 raise SSLError(e, request=request)
448 elif isinstance(e, ReadTimeoutError):
449 raise ReadTimeout(e, request=request)
SSLError: EOF occurred in violation of protocol (_ssl.c:590) )
The-efi, a user on GitHub, seems to have had the same problem in this thread: https://github.com/kennethreitz/requests/issues/3006 (see below; he was not the OP), but I was not able to find the thread he opened for further assistance. I use Python 2.7 as well.
I have been stuck on this for quite a while now and I can't find an answer here or on Stack Overflow (the answer was probably right under my nose, but I've had trouble understanding the existing answers because of my lack of knowledge of SSL and requests).
Thank you in advance for your help, and sorry if something is unclear -- please let me know.
When you say you get stuck, is it just that the exception fires? Or do follow-up requests not work? I ask because transient network errors _do_ occur, and if you're making large numbers of web requests you should consider implementing some kind of retry logic in the face of them.
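As a rough illustration of what retry logic could look like, here is a minimal sketch. The helper name call_with_retry, its parameters, and the exponential backoff schedule are assumptions made for this example; none of them is part of requests.

```python
import time


def call_with_retry(func, retry_on=(Exception,), attempts=5,
                    base_delay=0.5, sleep=time.sleep):
    """Call func(); if it raises one of retry_on, wait and try again.

    Hypothetical helper: the delay doubles after every failed attempt,
    and the last failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)
```

With requests, this could be used as call_with_retry(lambda: requests.get(url), retry_on=(requests.exceptions.SSLError,)), so that only the transient SSL failure is retried and other errors still surface immediately.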
The exception fires. Follow-up requests seem to work, but I have not yet tried to implement retry. I guess I was afraid I was breaking some rule against making too many requests to a server.
I will definitely try that and update this thread. Thanks!
Well, for what it's worth, because you're using the requests.* functions, which create a new session (and therefore a new connection) on every call, you're putting yourself at more risk of overloading the network resources between you and the server. You should try using a session.
For anyone with this problem:
I've fixed it by following @Lukasa's suggestions and adding this just after importing requests:
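import requests
# One shared session reuses connections instead of opening a new one per call
sess = requests.Session()
# Retry failed connection attempts up to 20 times
adapter = requests.adapters.HTTPAdapter(max_retries=20)
sess.mount('http://', adapter)
# The Play Store URLs are https://, so the adapter must be mounted there too
sess.mount('https://', adapter)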
Then, where I was using requests.get() before, I used sess.get().
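For a long run like the 20k requests in the original post, it may also help to record failures instead of letting a single exception stop the loop. The sketch below assumes a session object like sess above; the function name fetch_all, the timeout value, and the return shape are made up for illustration.

```python
import time


def fetch_all(session, links, base='https://play.google.com', delay=0.1):
    """Fetch every link through one shared session, recording failures.

    Hypothetical helper: `session` is assumed to behave like a
    requests.Session (a get() method returning an object with .content).
    """
    pages, failed = {}, []
    for link in links:
        try:
            r = session.get(base + link + '&hl=en', timeout=30)
            pages[link] = r.content
        except IOError:
            # requests' exceptions (SSLError included) derive from IOError,
            # so a failed link is noted here and the loop carries on.
            failed.append(link)
        time.sleep(delay)  # small pause between requests, as in the original loop
    return pages, failed
```

The links collected in failed can then be passed through fetch_all a second time once the first pass has finished.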
Hopefully this helps, and thanks for your help, @Lukasa!
I had exactly the same error message; the problem was that I didn't have ndg-httpsclient installed.
@variable I installed ndg-httpsclient but still get the same error: urllib.error.URLError:
https://github.com/kennethreitz/requests/issues/3605