Pip: Filename encoding error in some environments with PAX sdist

Created on 28 Jan 2020  ·  3 Comments  ·  Source: pypa/pip

Environment

  • pip version: any
  • Python version: 2.7
  • OS: Windows, non-Windows in C locale

(pip Windows CI hits this)

Description
The wheel 0.34.1 sdist, which is a PAX format tarball, fails to install with a UnicodeEncodeError on Python 2.7 on Windows, and on non-Windows systems in a non-UTF-8 locale: https://github.com/pypa/wheel/issues/331

Expected behavior
The Unicode filename from the PAX tarball is correctly encoded for the local filesystem.

How to Reproduce
Attempt to install a PAX formatted tarball containing a file name that cannot be encoded to the default code page (Windows) or the default locale encoding (non-Windows).

In GNU tar, the affected paths are pre-mangled to something ASCII compatible, but PAX tar preserves them correctly, so the installer needs to handle them itself.
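
The reproduction steps above can be approximated without a real sdist. The following is a minimal sketch using only the standard library `tarfile` module; the member name and the helper names (`build_pax_tarball`, `member_names`, `can_encode`) are made up for illustration and are not taken from pip or wheel. It builds a PAX tarball in memory and checks whether the member name survives encoding to a given codec, which is roughly the step that fails when the target encoding is ASCII or a narrow code page.

```python
import io
import tarfile

# Hypothetical member name; any character outside the target encoding will do.
MEMBER_NAME = u"pkg-0.1/unicode_\u00e6\u0250.txt"


def build_pax_tarball(name=MEMBER_NAME, payload=b"hello"):
    """Return the bytes of an in-memory PAX tarball holding one file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def member_names(data):
    """List member names as tarfile reports them when reading the archive."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        return tar.getnames()


def can_encode(name, encoding):
    """Return True if `name` can be represented in `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for name in member_names(build_pax_tarball()):
        for encoding in ("ascii", "utf-8"):
            print("%r encodable as %s: %s" % (name, encoding, can_encode(name, encoding)))
```

This only demonstrates the encoding step in isolation; it does not extract anything to disk, so it will not raise the installer's actual traceback.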

Output

See https://dev.azure.com/pypa/pip/_build/results?buildId=18040&view=logs&j=404e6841-f5ba-57d9-f2c8-8c5322057572&t=0219f6bf-240d-5b08-c877-377b12af5079&l=309 for a Windows example in the pip test suite.

The wheel issue linked above has some Linux examples.

python 2 only bug

All 3 comments

@ncoghlan Just an FYI, the issue I noted on https://github.com/pypa/wheel/issues/331 was using Python 3.6 (in case that has any bearing here).

In the process of justifying not fixing this, I figured out enough to fix it. :( See #7668.

@johnthagen Yeah, the non-universal locale encoding problem I mention in https://github.com/pypa/pip/pull/7668#issuecomment-579706165 will apply to Python 3 as well.

However, 3.7+ mitigates it significantly, as it doesn't believe the OS when it claims to be using ASCII, and automatically switches to using UTF-8 instead.
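
To see which encodings a given interpreter has settled on, something like the following sketch may help. The function name `encoding_report` is made up for illustration; the calls it uses are standard library ones. Running it under different interpreters with the locale forced to C (for example `LC_ALL=C` on Linux) should show whether the filesystem encoding stays ASCII or gets switched to UTF-8, as the comment above describes for 3.7+.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings this interpreter uses for filenames and text."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        # Encoding used to convert Unicode filenames for the OS.
        "filesystem": sys.getfilesystemencoding(),
        # Locale-derived default text encoding (False: do not call setlocale).
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print("%s: %s" % (key, value))
```

The exact values depend on the platform and locale configuration, so no expected output is shown here.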
