Requests: unusual URL string crashes in py3

Created on 19 Jun 2017  ·  4 Comments  ·  Source: psf/requests

I installed the current master with pip install from the zip file in a recent python3 conda environment.

base_url = 'http://............127.0.0.1:8082'
requests.get(base_url)
crashes

and ends with a UnicodeError:
python3.6/encodings/idna.py",
line 165, in encode
raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

Maybe you can catch this?

All 4 comments

For posterity, the complete traceback is this:

>>> requests.get(base_url)
Traceback (most recent call last):
  File "/Users/cory/.pyenv/versions/3.6.0/lib/python3.6/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/cory/Documents/Python/requests_org/requests/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/cory/Documents/Python/requests_org/requests/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/cory/Documents/Python/requests_org/requests/requests/sessions.py", line 493, in request
    prep.url, proxies, stream, verify, cert
  File "/Users/cory/Documents/Python/requests_org/requests/requests/sessions.py", line 666, in merge_environment_settings
    env_proxies = get_environ_proxies(url, no_proxy=no_proxy)
  File "/Users/cory/Documents/Python/requests_org/requests/requests/utils.py", line 692, in get_environ_proxies
    if should_bypass_proxies(url, no_proxy=no_proxy):
  File "/Users/cory/Documents/Python/requests_org/requests/requests/utils.py", line 676, in should_bypass_proxies
    bypass = proxy_bypass(netloc)
  File "/Users/cory/.pyenv/versions/3.6.0/lib/python3.6/urllib/request.py", line 2616, in proxy_bypass
    return proxy_bypass_macosx_sysconf(host)
  File "/Users/cory/.pyenv/versions/3.6.0/lib/python3.6/urllib/request.py", line 2593, in proxy_bypass_macosx_sysconf
    return _proxy_bypass_macosx_sysconf(host, proxy_settings)
  File "/Users/cory/.pyenv/versions/3.6.0/lib/python3.6/urllib/request.py", line 2566, in _proxy_bypass_macosx_sysconf
    hostIP = socket.gethostbyname(hostonly)
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

I don't think there's much we can do about this. The error is coming out of the standard library (specifically, in the urllib proxy_bypass function). It's present only on Python 3, which feels the need to call socket.gethostbyname. This function will automatically IDNA-encode a unicode hostname, even in situations like this where it's simply not necessary, and its IDNA encoder correctly rejects this.
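The rejection described above can be reproduced without requests or any network access, since it is the standard library's idna codec that raises. A minimal sketch, assuming only the standard library; the helper name idna_encodable is hypothetical and is not part of requests or urllib:

```python
def idna_encodable(host):
    """Return True if the stdlib 'idna' codec accepts this host string.

    Hypothetical helper for illustration; not part of requests or urllib.
    """
    try:
        host.encode("idna")
    except UnicodeError:
        # Raised for empty labels (consecutive dots) and over-long labels.
        return False
    return True


# The host from the report starts with a run of dots, i.e. empty labels.
print(idna_encodable("............127.0.0.1"))
print(idna_encodable("127.0.0.1"))
```

A caller who wants to fail early could run a check like this on the host before handing the URL to requests, though that only moves the error; it does not make the URL valid.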

The only way we can fix this is by moving to a much smarter URL handling implementation that normalizes URLs in some form. The best candidate is hyperlink, but hyperlink also barfs on this for a similar reason (it tries to IDNA-encode and fails).

This means that at best we could fix this by extending hyperlink with a URL host normalizer and then handle it. However, the WHATWG URL specification also appears to forbid this form of URL. If it does I'm not sure why, as Chrome normalizes it (though Safari does not).

Given the amount of work required to do this, I don't see any reason to tolerate it. The URL is just spectacularly far away from anything that can reasonably be expected to work, so I'm inclined to just close this as a won't fix.

I'm encountering this for a URL of the format:

https://key:[email protected]/path/file.json

and length of 132 characters.

@johnpaulhayes That's still not an issue with the requests library, but as I'm also running into it I figure I'll drop an update.

It's not the total length of the URL that seems to do it, just a section of it. The idna encoder seems to break on URLs when the first label of the host name (the part before the first dot) is longer than 63 characters. For whatever reason, it's including the key and secret in there as well. So either avoid Python 3 or avoid long "key:secret@example" strings (likely by avoiding long API keys) until the underlying functions are fixed. I submitted a bug for it to the Python tracker yesterday.
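The behaviour described in this comment can be illustrated with the standard library alone. A minimal sketch, assuming a placeholder host example.com and made-up 40-character credentials (neither comes from this thread): the full netloc, credentials included, has a first dot-separated label longer than 63 characters and is rejected by the idna codec, while the hostname on its own encodes fine.

```python
from urllib.parse import urlsplit

# Placeholder URL: the key, secret and host are made up for illustration.
url = "https://" + "k" * 40 + ":" + "s" * 40 + "@example.com/path/file.json"
parts = urlsplit(url)


def encodes_as_idna(value):
    """Return True if the stdlib 'idna' codec accepts the string."""
    try:
        value.encode("idna")
    except UnicodeError:
        return False
    return True


# First label of the netloc is "kkk...:sss...@example", 89 characters long.
first_label = parts.netloc.split(".")[0]
print(len(first_label))

# netloc still carries "key:secret@", so its first label is too long.
print(encodes_as_idna(parts.netloc))

# hostname has the credentials stripped and is short enough.
print(encodes_as_idna(parts.hostname))
```

If the credentials are meant for HTTP basic auth, passing them through the auth argument of requests.get instead of embedding them in the URL keeps them out of the host string. Whether that avoids the failure in a given environment is an assumption, not something confirmed in this thread.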

For those interested in the issue on the python side
https://bugs.python.org/issue32958
