pip issues UnicodeDecodeError on Windows 10 for Russian language

Created on 24 Jan 2017  ·  16 comments  ·  Source: pypa/pip

  • Pip version: 9.0.1
  • Python version: 3.6.0
  • Operating system: Microsoft Windows 10 Home Edition [Version 10.0.10586] for Russian language

Description:

pip raises UnicodeDecodeError on byte 0x8d on Russian-language Windows 10.
The problem does not occur on English-language Windows 7 Ultimate SP1.
It probably has something to do with the default CMD encoding; please fix it.

What I've run:

C:\WINDOWS\system32>pip install pyyaml
Collecting pyyaml
  Using cached PyYAML-3.12.tar.gz
Building wheels for collected packages: pyyaml
  Running setup.py bdist_wheel for pyyaml ... error
  Failed building wheel for pyyaml
  Running setup.py clean for pyyaml
Failed to build pyyaml
Installing collected packages: pyyaml
  Running setup.py install for pyyaml ... error
Exception:
Traceback (most recent call last):
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\compat\__init__.py", line 73, in console_to_str
    return s.decode(sys.__stdout__.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 68: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\commands\install.py", line 342, in run
    prefix=options.prefix_path,
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\req\req_set.py", line 784, in install
    **kwargs
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\req\req_install.py", line 878, in install
    spinner=spinner,
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\utils\__init__.py", line 676, in call_subprocess
    line = console_to_str(proc.stdout.readline())
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\compat\__init__.py", line 75, in console_to_str
    return s.decode('utf_8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 68: invalid start byte

All 16 comments

This is likely because Python 3.6 on Windows switched to using UTF-8 for console IO. The code runs a subprocess and then guesses that the subprocess output has the same encoding as sys.stdout, which was true in Python <3.6 (arguably more by luck than anything else) but is no longer true in 3.6+.

The simplest fix is probably to use locale.getpreferredencoding(False) for the encoding, as that's the default encoding used in io.TextIOWrapper and for subprocess when universal_newlines is True.
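
A minimal sketch of that suggestion, not pip's actual fix: decode the subprocess output with the locale's preferred encoding instead of the encoding of sys.stdout. The function reuses the name console_to_str from the traceback for readability only, and the errors="replace" fallback is an assumption added here so that a stray byte degrades to a replacement character rather than aborting the install.

```python
import locale


def console_to_str(data: bytes) -> str:
    """Decode subprocess output using the locale's preferred encoding.

    Hypothetical stand-in for pip's helper of the same name; the
    errors="replace" fallback is an assumption, not part of the suggestion.
    """
    encoding = locale.getpreferredencoding(False)
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # Degrade to U+FFFD markers instead of crashing on undecodable bytes.
        return data.decode(encoding, errors="replace")
```

With this shape, a line such as console_to_str(proc.stdout.readline()) would no longer depend on what sys.stdout happens to report.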

I thought pip was supposed to be easy for users. Is it possible to hide these problems from us? :)

Encodings are not easy for anyone :-) It's certainly possible to deal with this as I said. Just the first time it's come up (it's a Python 3.6 change).

I easily reproduced the problem on my VM running Windows 6.1.7601 (Windows 7 SP1, Russian).
In cmd, chcp reports 866.
Switching with chcp 65001 (UTF-8) doesn't help.
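
The byte in the traceback is consistent with code page 866. As an illustration (the sample string is made up, not taken from the failing build's output): in cp866 the byte 0x8d encodes the Cyrillic capital letter Н, while in UTF-8 a byte in the range 0x80 to 0xBF can only continue a sequence, never start one.

```python
# Hypothetical line of Russian console output; not the real build output.
text = "Нет такого файла"
raw = text.encode("cp866")

print(hex(raw[0]))                   # 0x8d: the letter 'Н' in cp866
print(raw.decode("cp866") == text)   # True: the OEM code page round-trips

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)                # invalid start byte
```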

Here is a workaround:
Open a new cmd.exe console.
Run chcp.
It shows the system default code page, for example 936.
Open Lib/site-packages/pip/compat/__init__.py.
Around line 75, change return s.decode('utf_8') to return s.decode('cp936'), using the code page number that chcp reported.

It's just a workaround. I think pip needs to solve this issue as soon as possible, because the solution is not easy to find.

This may have a general solution using cdll.
I'm not sure this is the best solution on Windows, but I made a PR for this issue anyway.
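
One possible reading of "using cdll", sketched under assumptions: ask Windows for the OEM code page through ctypes instead of hard-coding it. The helper name console_encoding is hypothetical, and the PR mentioned above may do something different. GetOEMCP is a kernel32 function. Whether it or GetConsoleOutputCP is the right source for a given child process is left open here.

```python
import ctypes
import locale
import sys


def console_encoding() -> str:
    """Return a codec name for decoding console subprocess output.

    Assumption: illustrative only; not taken from the PR in this thread.
    """
    if sys.platform == "win32":
        # GetOEMCP returns the system OEM code page number, e.g. 866 or 936.
        return "cp%d" % ctypes.windll.kernel32.GetOEMCP()
    # Other platforms have no OEM code page; fall back to the locale.
    return locale.getpreferredencoding(False)
```

The returned name could then replace the hard-coded 'cp936' in the manual workaround.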

Actually, it was easier to use easy_install as a workaround...

Closing as a duplicate of https://github.com/pypa/pip/issues/4110.

What is the official workaround? How do you update pip if it is itself broken?

@zed does #4280 fix this ?

@xavfernandez What do you mean? Are you suggesting to edit the installed pip/compat.py file manually? I meant something like: set PYTHONLEGACYWINDOWSSTDIO=nonempty before running pip.

What is the right way to fix this?

Hey @JoeVogel!

pip 10 is currently in beta and has a fix for this. You can upgrade to it (if you don't mind using a beta version) by running pip install -U --pre pip.

win10
E:\>pip -V
pip 10.0.1 from d:\program files\python\python35\lib\site-packages\pip (python 3.5)

I still have the same problem when I install lupa 1.6 with "pip install lupa":

    Using bundled Lua
    building without Cython
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\dell\AppData\Local\Temp\pip-req-build-zth2l84p\setup.py", line 308, in <module>
        for text_file in ['README.rst', 'INSTALL.rst', 'CHANGES.rst', "LICENSE.txt"]])
      File "C:\Users\dell\AppData\Local\Temp\pip-req-build-zth2l84p\setup.py", line 308, in <listcomp>
        for text_file in ['README.rst', 'INSTALL.rst', 'CHANGES.rst', "LICENSE.txt"]])
      File "C:\Users\dell\AppData\Local\Temp\pip-req-build-zth2l84p\setup.py", line 298, in read_file
        return f.read()
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 1183: illegal multibyte sequence

I'm Chinese, so the system default encoding is cp936, which is 'gbk'. Switching the console encoding to UTF-8 (chcp 65001) makes no difference.

So I downloaded the lupa 1.6 tarball from https://pypi.org/project/lupa/#files and found the code that raises the error:

# line 295
def read_file(filename):
    with open(os.path.join(basedir, filename)) as f:
        return f.read()


def write_file(filename, content):
    with open(os.path.join(basedir, filename), 'w') as f:
        f.write(content)


long_description = '\n\n'.join([
    read_file(text_file)
    for text_file in ['README.rst', 'INSTALL.rst', 'CHANGES.rst', "LICENSE.txt"]])

write_file(os.path.join('lupa', 'version.py'), "__version__ = '%s'\n" % VERSION)

The files ('README.rst', 'INSTALL.rst', 'CHANGES.rst', "LICENSE.txt") are encoded in UTF-8, but the open calls do not specify the encoding argument. After I added encoding='utf-8', the problem was solved.

def read_file(filename):
    with open(os.path.join(basedir, filename), 'r', encoding='utf-8') as f:
        return f.read()


def write_file(filename, content):
    with open(os.path.join(basedir, filename), 'w', encoding='utf-8') as f:
        f.write(content)
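
The effect can be reproduced on any system with a small sketch. The file content below is invented for illustration, and passing encoding="gbk" explicitly is an assumption standing in for what a bare open() would pick on a cp936 system. A UTF-8 en dash ends in the byte 0x93, which GBK cannot pair with the space that follows it.

```python
import os
import tempfile

# A UTF-8 file containing an en dash (bytes E2 80 93), as a README might.
path = os.path.join(tempfile.mkdtemp(), "README.rst")
with open(path, "w", encoding="utf-8") as f:
    f.write("Lupa \u2013 Lua in Python\n")

# Explicit encoding: the result no longer depends on the system code page.
with open(path, encoding="utf-8") as f:
    text = f.read()

# encoding="gbk" simulates the default a bare open() uses on a cp936 system.
try:
    with open(path, encoding="gbk") as f:
        f.read()
    gbk_failed = False
except UnicodeDecodeError:
    gbk_failed = True

print(gbk_failed)  # True
```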

@changnet Please open a new issue.

@changnet However, this appears to be a problem with the setup.py for the lupa project, so you should probably raise it with them, rather than here.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
