Pandas: OSError when reading a file with accents in the file path

Created on 9 Jan 2017 · 27 comments · Source: pandas-dev/pandas

Code sample, a copy-pastable example if possible

test.txt and test_é.txt are the same file; only the name differs.

pd.read_csv('test.txt')
Out[3]: 
   1 1 1
0  1 1 1
1  1 1 1

pd.read_csv('test_é.txt')
Traceback (most recent call last):

  File "<ipython-input-4-fd67679d1d17>", line 1, in <module>
    pd.read_csv('test_é.txt')

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 730, in __init__
    self._make_engine(self.engine)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)

  File "pandas\parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4184)

  File "pandas\parser.pyx", line 669, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:8471)

OSError: Initializing from file failed
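
The reproduction above relies on two files that already exist on disk. A minimal sketch of a self-contained version follows; it writes a small file under a chosen name in a temporary directory and reports whether a reader accepts the path. The helper name `try_reader_on_name` and the file contents are assumptions made for illustration, not part of the original report.

```python
import os
import tempfile


def try_reader_on_name(file_name, reader):
    """Write a tiny file named file_name in a temp dir and call reader(path).

    Returns (True, None) on success, or (False, repr(error)) if the reader
    raises OSError, which is the failure reported in this issue.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, file_name)
        # Same content for every name, so only the file name varies.
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("1 1 1\n1 1 1\n1 1 1\n")
        try:
            reader(path)
        except OSError as exc:
            return False, repr(exc)
        return True, None


# Hypothetical usage, assuming pandas is installed and imported as pd:
#   try_reader_on_name("test.txt", pd.read_csv)
#   try_reader_on_name("test_é.txt", pd.read_csv)
```

Comparing the two results on the same machine should show whether only the accented name fails.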

Problem description

Pandas raises an OSError when trying to read a file with accents in the file path.

The problem is new (it appeared after I upgraded to Python 3.6 and Pandas 0.19.2).

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 32.3.1
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.9.3
boto: None
pandas_datareader: None

Bug IO CSV Unicode Windows

All 27 comments

Just my two cents. I quickly tried this on Mac OS X and Ubuntu with no problems. See below.

Could this be an environment/platform problem? I noticed that LOCALE is set to None.None. Unfortunately I do not have a Windows machine to try this example on. Admittedly, this would not explain why you have only seen it after upgrading to python3.6 and pandas 0.19.2.

Note: I just set up a virtualenv with python3.6 and installed pandas 0.19.2 using pip.

>>> import pandas as pd
>>> pd.read_csv('test_é.txt')
   a  b  c
0  1  2  3
1  4  5  6

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-57-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 32.3.1
Cython: None
numpy: 1.11.3
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

I believe 3.6 switches the file system encoding on Windows to utf8 (from ascii). Apart from that, we don't have testing enabled on Windows for 3.6 yet (some of the required packages are only just becoming available).

So I just added build support on appveyor (Windows) for 3.6; it would be good to push up the tests and see whether they work.

I faced the same problem: my program stopped at pd.read_csv(file_path). As in your case, it started after I upgraded Python to 3.6 (I'm not sure exactly which version I had installed before, maybe 3.5...).

@jreback what is the next step towards fixing this?
You mentioned a PR that got 'blown away'. What did you mean by that?

While I do not use Windows myself, I could try to help (I have a VM for debugging code that does not work on Windows).

BTW, a workaround: pass a file handle instead of a name
pd.read_csv(open('test_é.txt', 'r'))
(the related issues list several workarounds, but I have not seen this one)
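
The file-handle workaround can be wrapped so that the handle is always closed and the text encoding is explicit. This is only a sketch: `read_via_handle` is a hypothetical helper, and the `utf-8` default is an assumption about the file contents. It also assumes the reader consumes the whole file before returning, which is not the case when pandas is asked for an iterator or chunks.

```python
def read_via_handle(path, reader, encoding="utf-8", **kwargs):
    """Open path in Python and pass the file object, not the name, to reader.

    Python itself resolves the non-ASCII path, so the reader never has to
    open the file by name.
    """
    with open(path, "r", encoding=encoding, newline="") as handle:
        return reader(handle, **kwargs)


# Hypothetical usage, assuming pandas is installed and imported as pd:
#   df = read_via_handle('test_é.txt', pd.read_csv)
```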

@tpietruszka see the comments on the PR: https://github.com/pandas-dev/pandas/pull/15092 (it was removed from a private fork, but it was pretty much there).

Basically, paths need to be encoded differently on py3.6 on Windows (compared with other Pythons). Essentially this needs implementing: https://docs.python.org/3/whatsnew/3.6.html#pep-529-change-windows-filesystem-encoding-to-utf-8
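
To see which encoding a given interpreter applies to paths, the standard library exposes it directly. The sketch below uses only documented `sys` and `os` calls; the helper name `describe_fs_encoding` is made up for illustration.

```python
import os
import sys


def describe_fs_encoding(name):
    """Report how this interpreter converts a str path to bytes."""
    encoded = os.fsencode(name)
    return {
        # 'utf-8' on Windows since Python 3.6 (PEP 529), 'mbcs' before that
        "filesystem_encoding": sys.getfilesystemencoding(),
        "error_handler": sys.getfilesystemencodeerrors(),
        "encoded_bytes": encoded,
        "round_trips": os.fsdecode(encoded) == name,
    }


# Example: print(describe_fs_encoding("test_é.txt"))
```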

My old code (does not run):

import pandas as pd
import os
file_path='./dict/字典.csv'
df_name = pd.read_csv(file_path,sep=',' )

New code (works):

import pandas as pd
import os
file_path='./dict/dict.csv'
df_name = pd.read_csv(file_path,sep=',' )

I think this bug is a file name problem.
After changing the file name from Chinese to English, the code runs.
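
Before renaming anything, it can help to find out which parts of a path contain non-ASCII characters. A small sketch, assuming Python 3.7 or later for `str.isascii`; the function name `non_ascii_parts` is hypothetical.

```python
from pathlib import PurePath


def non_ascii_parts(path):
    """Return the path components that contain non-ASCII characters."""
    return [part for part in PurePath(path).parts if not part.isascii()]


# non_ascii_parts('./dict/dict.csv') gives an empty list, while
# non_ascii_parts('./dict/字典.csv') lists the Chinese file name.
```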

If somebody comes here because they have the same problem, here is a workaround until pandas is fixed to work with PEP 529 (basically, any non-ASCII character in the path or file name results in an error).

Insert the following two lines at the beginning of your code to revert to the old way of handling paths on Windows:

import sys
sys._enablelegacywindowsfsencoding()
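
`sys._enablelegacywindowsfsencoding()` exists only on Windows builds of Python 3.6 and later, so calling it unconditionally fails on other platforms. A guarded variant might look like the sketch below; the wrapper name is an assumption, and it does not change the fact that this is a process-wide switch.

```python
import sys


def enable_legacy_windows_fs_encoding():
    """Switch to the legacy Windows path encoding when the hook is available.

    Returns True if the switch was made and False on platforms or Python
    versions that do not provide it.
    """
    if sys.platform != "win32":
        return False
    enable = getattr(sys, "_enablelegacywindowsfsencoding", None)
    if enable is None:
        return False
    enable()
    return True


# Call once at program start, before any file is opened by name.
enable_legacy_windows_fs_encoding()
```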

I used the solution above and it works. Thank you very much, @fotisj!
However, I am still confused about why DataFrame.to_csv() does not hit the same problem. In other words, for a Unicode file path, writing is fine but reading is not.

path = os.path.join('E:语料', 'sina.csv')
pd.read_csv(open(path, 'r', encoding='utf8'))

This worked.

Can someone with an affected system check whether changing this line

https://github.com/pandas-dev/pandas/blob/e8620abc12a4c468a75adb8607fd8e0eb1c472e7/pandas/io/common.py#L209

to

 return _expand_user(os.fsencode(filepath_or_buffer)), None, compression

fixes it?

No, it doesn't.
Result: OSError: Expected file path name or file-like object, got <class 'bytes'> type
(Windows 10)

    OSError                                   Traceback (most recent call last)
    <ipython-input-2-e8247998d6d4> in <module>()
      1 
----> 2 df = pd.read_csv(r'D:\mydata\Dropbox\uni\progrs\test öäau\n\teu.csv', sep='\t')

C:\conda\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    707                     skip_blank_lines=skip_blank_lines)
    708 
--> 709         return _read(filepath_or_buffer, kwds)
    710 
    711     parser_f.__name__ = name

C:\conda\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    447 
    448     # Create the parser.
--> 449     parser = TextFileReader(filepath_or_buffer, **kwds)
    450 
    451     if chunksize or iterator:

C:\conda\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    816             self.options['has_index_names'] = kwds['has_index_names']
    817 
--> 818         self._make_engine(self.engine)
    819 
    820     def close(self):

C:\conda\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
   1047     def _make_engine(self, engine='c'):
   1048         if engine == 'c':
-> 1049             self._engine = CParserWrapper(self.f, **self.options)
   1050         else:
   1051             if engine == 'python':

C:\conda\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
   1693         kwds['allow_leading_cols'] = self.index_col is not False
   1694 
-> 1695         self._reader = parsers.TextReader(src, **kwds)
   1696 
   1697         # XXX

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

OSError: Expected file path name or file-like object, got <class 'bytes'> type

Oh, sorry. Does fsdecode work there?



No. Using fsdecode produces the same error we originally had (error_msg.txt: https://github.com/pandas-dev/pandas/files/1691837/error_msg.txt).

Thanks for trying.



I talked with Steve Dower today, and he suspects this may be the problematic line: https://github.com/pandas-dev/pandas/blob/e8f206d8192b409bc39da1ba1b2c5bcd8b65cc9f/pandas/_libs/src/parser/io.c#L30

IIUC, the Windows filesystem API expects those bytes to be in MBCS, but we are using utf-8.
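
The mismatch can be illustrated without Windows. The `mbcs` codec is only available there, so this sketch assumes `cp1252` as a stand-in for a Western European ANSI code page; the function name is hypothetical.

```python
def show_encoding_mismatch(name, ansi_codec="cp1252"):
    """Compare UTF-8 path bytes with what an ANSI-code-page API expects.

    Returns (utf8_bytes, ansi_bytes, misread), where misread is the name an
    ANSI API would see if it were handed the UTF-8 bytes.
    """
    utf8_bytes = name.encode("utf-8")
    ansi_bytes = name.encode(ansi_codec)
    misread = utf8_bytes.decode(ansi_codec, errors="replace")
    return utf8_bytes, ansi_bytes, misread


utf8_bytes, ansi_bytes, misread = show_encoding_mismatch("test_é.txt")
# utf8_bytes == b'test_\xc3\xa9.txt'
# ansi_bytes == b'test_\xe9.txt'
# misread    == 'test_Ã©.txt', a file name that does not exist on disk
```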

A user-level workaround is to explicitly encode the file name as mbcs before passing the byte string to pandas. https://www.python.org/dev/peps/pep-0529/#explicitly-using-mbcs

pd.read_csv(filename.encode('mbcs'))

Is anyone able to test that workaround?

Fixing this only needs a small change in the parser code (there was a PR doing this, but it was deleted).

@TomAugspurger that does not work. read_csv expects a str, not a bytes value. It fails with

OSError: Expected file path name or file-like object, got <class 'bytes'> type

Thanks for checking.


Just pinging: I have the same issue. I'm using the workaround, but it would be nice not to need it.

This needs a community patch.

I ran into this issue. I'd like to try contributing a patch. Are there any pointers on how to fix this?

๋‚˜๋Š” ์ด๊ฒƒ์„ ์žฌํ˜„ ํ•  ์ˆ˜์žˆ๋Š” ์‹œ์Šคํ…œ์— ์ ‘๊ทผ ํ•  ์ˆ˜์žˆ๋Š” ๊ด€๋ฆฌ์ž๊ฐ€ ์—†๋‹ค๊ณ  ์ƒ๊ฐํ•œ๋‹ค.

Perhaps some of the others in this issue can help put together a solution.

Hello, I have just hit this problem with pandas 1.0.3, and the sys._enablelegacywindowsfsencoding() workaround has stopped working. I have ą and ź in the file path.
The error also occurs with pandas 0.25.3, but 0.23.4 seems to work fine when the workaround is used (I did not check other versions). I'd be happy to provide more information.

If the file is stored in a folder that has the same name as the file, remove the file from that folder.
You only need to move the file out of that folder.
Do not save the file in a folder with the same name.
Then it works.

@pranjulknit If I understand you correctly, you suggest moving the file to a folder whose path has no such problematic characters. That is not always possible. If you are instead suggesting that the folder name and the file name must differ, that is not the issue described here, and I had no such problem.

Actually, I have this problem while reading a csv file in a Jupyter notebook.
