Pandas: read_csv(filename_with_asian_locale) fails on Python 3.6 for Windows

Created on June 5, 2017 · 3 comments · Source: pandas-dev/pandas

Code:

Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
>>> pd.__version__
'0.20.1'
>>> import platform
>>> platform.platform()
'Windows-7-6.1.7601-SP1'
>>> import pandas as pd
>>> df = pd.read_csv(r'c:\tmp\中文.csv')
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-0cd6317422e5>", line 1, in <module>
    df = pd.read_csv(r'c:\tmp\中文.csv')
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 405, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 762, in __init__
    self._make_engine(self.engine)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 966, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1582, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 394, in pandas._libs.parsers.TextReader.__cinit__ (pandas\_libs\parsers.c:4209)
  File "pandas\_libs\parsers.pyx", line 712, in pandas._libs.parsers.TextReader._setup_parser_source (pandas\_libs\parsers.c:8895)
OSError: Initializing from file failed

Problem description

Python 3.6 on Windows changed sys.getfilesystemencoding() to return "utf-8" instead of "mbcs".
See PEP 529.

How to fix

The problem is here, in parsers.pyx:
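
The effect of that change on a non-ASCII file name can be illustrated without Windows. The following is a minimal sketch, not pandas code: misread_name is a hypothetical helper, and cp936 is assumed as a stand-in for the ANSI code page that "mbcs" would resolve to on a Chinese-locale system.

```python
import sys


def misread_name(name, ansi_codepage="cp936"):
    """Encode a file name as UTF-8, then decode the bytes as a legacy code page.

    ansi_codepage is an assumption: cp936 stands in for whatever ANSI code
    page 'mbcs' would resolve to on a Chinese-locale Windows system.
    """
    utf8_bytes = name.encode("utf-8")
    # errors="replace" keeps the demo from raising on byte sequences that
    # are not valid in the legacy code page.
    return utf8_bytes.decode(ansi_codepage, errors="replace")


# 'utf-8' on Python 3.6+ for Windows (PEP 529); may differ on other systems.
print(sys.getfilesystemencoding())

# A pure-ASCII name survives the round trip unchanged.
print(misread_name("data.csv") == "data.csv")

# A Chinese name does not: the UTF-8 bytes spell a different name in cp936.
print(misread_name("中文.csv") == "中文.csv")
```

Any API that treats the UTF-8 bytes as ANSI text therefore looks for a file whose name differs from the one on disk.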

if isinstance(source, basestring):
     if not isinstance(source, bytes):
         source = source.encode(sys.getfilesystemencoding() or 'utf-8')

The source parameter is the file name, and on Python 3.6 it is encoded as 'utf-8' rather than the legacy 'mbcs'.
It is finally passed to open() in io.c:new_file_source,
where the bytes are interpreted as an mbcs string, so a "file not found" error is to be expected.
Arguably it should be Cython's responsibility to handle this, since Python 3.6 uses the Unicode version of the Windows API.
For now, though, just replace sys.getfilesystemencoding() with "mbcs".
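
The stopgap described above could look roughly like this in plain Python. This is a sketch under assumptions: encode_path_for_c_open is a hypothetical helper, not a pandas function, and it is not necessarily the fix pandas adopted. The "mbcs" codec exists only on Windows, so the sketch branches on the platform.

```python
import sys


def encode_path_for_c_open(source, platform=sys.platform):
    """Return the bytes to hand to a C-level open() call.

    Hypothetical helper for illustration only. On Windows the C runtime's
    open() expects ANSI ('mbcs') bytes, so the file system encoding reported
    by Python 3.6+ ('utf-8') is not used there.
    """
    if isinstance(source, bytes):
        # Already encoded by the caller; pass through untouched.
        return source
    if platform == "win32":
        # The 'mbcs' codec is only available on Windows builds of Python.
        encoding = "mbcs"
    else:
        encoding = sys.getfilesystemencoding() or "utf-8"
    return source.encode(encoding)
```

A name containing characters outside the active ANSI code page still cannot be represented in "mbcs", which is why this is only a stopgap compared with calling the Unicode version of the Windows API.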

Duplicate IO CSV Unicode

All 3 comments

There is a workaround, at the cost of speed:
df = pd.read_csv(r'c:\tmp\中文.csv', engine='python')

But modifying every single call to read_csv in every project is dirty work.
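
One way to avoid touching every call site is to route reads through a single helper that opens the file in Python and hands the open handle to the reader, so the file name never reaches the C-level open(). A minimal sketch, assuming a UTF-8 encoded file: read_csv_via_handle and its reader parameter are hypothetical names introduced here, and nobody in this thread has confirmed that this avoids the failure on the affected pandas version.

```python
def read_csv_via_handle(path, reader=None, encoding="utf-8", **kwargs):
    """Open `path` in Python and pass the file handle to a CSV reader.

    Hypothetical helper. `reader` defaults to pandas.read_csv, which accepts
    an open file object; it is a parameter only so the sketch can be tried
    with another reader when pandas is not installed.
    """
    if reader is None:
        # Imported lazily so the helper itself has no hard pandas dependency.
        import pandas as pd
        reader = pd.read_csv
    # Python's own open() uses the Unicode Windows API on Python 3.6+,
    # so non-ASCII names are resolved correctly at this step.
    with open(path, encoding=encoding, newline="") as handle:
        return reader(handle, **kwargs)
```

With pandas installed, a call such as read_csv_via_handle(r'c:\tmp\中文.csv') would replace the direct pd.read_csv call; the encoding argument is an assumption about the file contents and should match the actual file.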

This is a duplicate of https://github.com/pandas-dev/pandas/issues/15086

A PR was attached there, but unfortunately it did not go through.

We would certainly take a fix for this.

Do not use Chinese in the file name; change it to English.
