Requests: UnicodeEncodeError: 'latin-1' codec can't encode characters

Created on 20 Dec 2013  ·  7 Comments  ·  Source: psf/requests

I am using the latest version of Requests.
When I try to post data that contains Chinese characters, this exception is thrown.

Traceback (most recent call last):
  File "X/threading.py", line 639, in _bootstrap_inner
  File "X/threading.py", line 596, in run
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\salesforce\api.py", line 546, in execute_anonymous
    headers=headers)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\api.py", line 88, in post
    return request('post', url, data=data, **kwargs)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\sessions.py", line 338, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\sessions.py", line 441, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\adapters.py", line 292, in send
    timeout=timeout
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\packages\urllib3\connectionpool.py", line 428, in urlopen
    body=body, headers=headers)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\packages\urllib3\connectionpool.py", line 280, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "X/http/client.py", line 1049, in request
  File "X/http/client.py", line 1086, in _send_request
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1632-1633: ordinal not in range(256)

All 7 comments

File "X/http/client.py"

Did you write X because that's a path to a local file? If so, your directory structure may be confusing urllib3. If not, then you should probably raise this on bugs.python.org, since I don't think this is something Requests should be handling. This looks like it's arising from httplib (or http.client on Python 3, which I'm guessing you're using).

@sigmavirus24,

I use Requests in a Sublime Text plugin. If the soap_body in the statement below doesn't contain any Chinese characters, there is no exception.

response = requests.post(self.apex_url, soap_body, verify=False, headers=headers)

Firstly, unless you're using a different version of Sublime Apex to the one in their public repository, Requests is _not_ the latest version, it's version 1.2.3. Can you tell me what version of Sublime Text you're using?

It's Sublime Text build 3056.

So, ST 3, but not the most recent revision. Ok, that gives us something. Specifically, Sublime Text 3 uses Python 3.3, not Python 2.7 (which Sublime Text 2 used). This means all the default strings in Sublime Apex are unicode strings.

If you open up the Python 3.3 http.client file, you'll find that the _send_request() function looks like this:

# Honor explicitly requested Host: and Accept-Encoding: headers.
header_names = dict.fromkeys([k.lower() for k in headers])
skips = {}
if 'host' in header_names:
    skips['skip_host'] = 1
if 'accept-encoding' in header_names:
    skips['skip_accept_encoding'] = 1

self.putrequest(method, url, **skips)

if body is not None and ('content-length' not in header_names):
    self._set_content_length(body)
for hdr, value in headers.items():
    self.putheader(hdr, value)
if isinstance(body, str):
    # RFC 2616 Section 3.7.1 says that text default has a
    # default charset of iso-8859-1.
    body = body.encode('iso-8859-1')
self.endheaders(body)

Now, ISO-8859-1 is an alias for Latin-1, which is the codec we're having trouble with. The problem we've got is that Sublime Apex is providing a unicode string body to Requests, which httplib needs to encode into bytes. Taking the default from RFC 2616, it concludes you want Latin-1, which doesn't include any Chinese characters. Clearly then, encoding fails, and you get the exception in question.
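The encoding step described above can be reproduced without any network traffic. The following is a minimal sketch, assuming Python 3; `encode_body_like_http_client` is a hypothetical helper that mirrors only the `isinstance(body, str)` branch quoted above, not the full `http.client` code path.

```python
def encode_body_like_http_client(body):
    """Hypothetical helper: mirrors the str branch of _send_request() quoted above."""
    if isinstance(body, str):
        # Same default as the quoted code: text is assumed to be ISO-8859-1.
        body = body.encode('iso-8859-1')
    return body


# Western European text fits in Latin-1, so it encodes without error.
latin_ok = encode_body_like_http_client("caf\u00e9")

# Chinese characters have code points above 255, so the same step fails.
try:
    encode_body_like_http_client("\u4e2d\u6587")
    chinese_failed = False
except UnicodeEncodeError:
    chinese_failed = True

# Bytes pass through untouched, which is why pre-encoding avoids the error.
already_bytes = encode_body_like_http_client("\u4e2d\u6587".encode('utf-8'))
```

The last call shows why the suggested fix works: once the body is already bytes, the Latin-1 default never gets applied.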

Considering that Sublime Apex claims in the headers it sends to be sending UTF-8 encoded data (which is a lie currently), Sublime Apex wants to be encoding the data as UTF-8 before sending it. This means any line sending data (in this case line 545 of salesforce/api.py) should read like this:

response = requests.post(self.apex_url, soap_body.encode('utf-8'), verify=False, headers=headers)
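If the same fix is needed in several places, it may help to keep the bytes and the declared charset together. This is only a sketch: `build_utf8_payload` and the `text/xml` content type are assumptions for illustration, not Sublime Apex's actual code or headers.

```python
def build_utf8_payload(text, headers=None):
    """Hypothetical helper: encode text as UTF-8 and declare the same charset."""
    body = text.encode('utf-8')
    merged = dict(headers or {})
    # Assumption: a SOAP-style endpoint that accepts text/xml.
    # An existing Content-Type supplied by the caller is left alone.
    merged.setdefault('Content-Type', 'text/xml; charset=UTF-8')
    return body, merged


text = "<name>\u4e2d\u6587</name>"
body, headers = build_utf8_payload(text)

# The byte length differs from the character count for non-ASCII text,
# so the body must be encoded before anything measures its length.
char_count = len(text)
byte_count = len(body)
```

The resulting pair could then be passed along as `requests.post(url, data=body, headers=headers)`.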

For the sake of anyone else who wants to confirm my diagnosis, here's a quick bit of sample code that confirms the problem:

import requests

a = "\u13E0\u19E0\u1320"
a.encode('latin1')  # Throws UnicodeEncodeError, proves that this can't be expressed in ISO-8859-1.
a.encode('utf-8')  # Totally fine.
r = requests.post('http://httpbin.org/post', data=a)  # Using unicode string, throws UnicodeEncodeError blaming Latin1.
r = requests.post('http://httpbin.org/post', data=a.encode('utf-8'))  # Works fine.
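To find which characters in a larger payload trigger the error (the traceback only gives positions such as 1632-1633), the exception object itself can be inspected. A minimal sketch, with `find_unencodable` as a hypothetical helper name:

```python
def find_unencodable(text, codec='latin-1'):
    """Hypothetical helper: return (start, end, characters) for the first span
    the codec cannot encode, or None if the whole text encodes cleanly."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        # exc.end is exclusive, whereas the position range printed in the
        # error message is inclusive.
        return exc.start, exc.end, text[exc.start:exc.end]
    return None


sample = "<body>price: \u4ef7\u683c</body>"
span = find_unencodable(sample)
clean = find_unencodable("plain ascii")
```

Here `span` holds the offsets and the offending characters, and `clean` is `None` because plain ASCII always fits in Latin-1.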

Thanks for raising this with us, but this is not a Requests bug. =)

Thanks.

r = requests.post('http://httpbin.org/post', data=a.encode('utf-8'))
Very useful, thank you!
