Pandas: set_index does not work with drop

Created on 14 Jul 2016  ·  6 comments  ·  Source: pandas-dev/pandas

Code sample, a copy-pastable example if possible

from io import StringIO
from pandas import read_csv

dtf = read_csv(StringIO("DATE_TIME,A\n2/8/2015  6:00:30,1"))

print(dtf)

dtf.set_index(dtf.DATE_TIME, drop=True, inplace=True)
print(dtf.columns)
print(dtf)

Current output

           DATE_TIME  A
0  2/8/2015  6:00:30  1
Index(['DATE_TIME', 'A'], dtype='object')
                           DATE_TIME  A
DATE_TIME                              
2/8/2015  6:00:30  2/8/2015  6:00:30  1

Expected output

           DATE_TIME  A
0  2/8/2015  6:00:30  1
Index(['A'], dtype='object')
                           A
DATE_TIME                              
2/8/2015  6:00:30  1

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 20.6.7
Cython: None
numpy: 1.11.1
scipy: 0.16.1
statsmodels: None
xarray: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.3.5
xlrd: 1.0.0
xlwt: 1.0.0
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.0
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
None
Labels: Bug, Error Reporting, Reshaping

All 6 comments

Thanks, this looks like a bug. If the input is a Series sliced from the original frame, the corresponding column should be dropped.

It works correctly if you pass the column name.

dtf.set_index('DATE_TIME', drop=True, inplace=True)
dtf.columns
# Index(['A'], dtype='object')
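
As a rough illustration of the workaround above, the sketch below rebuilds the frame from the report and compares passing the column label with passing the Series. The helper name `set_index_and_drop` is hypothetical and not part of pandas; it assumes the Series is named after the column it was taken from.

```python
from io import StringIO

import pandas as pd


def set_index_and_drop(frame, key):
    """Hypothetical helper: index by a Series, then drop its source column."""
    indexed = frame.set_index(key)
    # errors="ignore" keeps this safe if the column is already gone.
    return indexed.drop(columns=[key.name], errors="ignore")


dtf = pd.read_csv(StringIO("DATE_TIME,A\n2/8/2015  6:00:30,1"))

# Passing the label lets set_index locate and drop the column itself.
by_label = dtf.set_index("DATE_TIME")

# Passing the Series and dropping the column explicitly gives the same shape.
by_series = set_index_and_drop(dtf, dtf["DATE_TIME"])

print(list(by_label.columns))
print(list(by_series.columns))
```

Neither call uses inplace=True, so dtf itself is left untouched and the two results can be compared side by side.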

Not a bug. This violates the guarantees of set_index.

Passing an actual column here is not valid.

It is not the same as actually assigning the index.

There is a PR that attempts to do this, but it is inherently ambiguous.

I'm not even sure we can warn about this
(though I think using inplace and drop together is an error).

"Not a bug. This violates the guarantees of set_index."

Could you elaborate on the guarantees of set_index? In particular, it is confusing that with drop=True no error is raised when the drop is, for whatever reason, not allowed or not possible.


Passing a list of keys is, by definition, setting the index. But you could think that [58] is the actual result of [57].

In [55]: df = pd.DataFrame({'A':range(2),'B':range(2),'C':range(2)})

In [56]: df
Out[56]: 
   A  B  C
0  0  0  0
1  1  1  1

In [57]: df.set_index(['A','B'])
Out[57]: 
     C
A B   
0 0  0
1 1  1

In [58]: df.index=['A','B']

In [59]: df
Out[59]: 
   A  B  C
A  0  0  0
B  1  1  1

In [54]: DataFrame.set_index?
Signature: DataFrame.set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)
Docstring:
Set the DataFrame index (row labels) using one or more existing
columns. By default yields a new object.

Parameters
----------
keys : column label or list of column labels / arrays
drop : boolean, default True
    Delete columns to be used as the new index
append : boolean, default False
    Whether to append columns to existing index
inplace : boolean, default False
    Modify the DataFrame in place (do not create a new object)
verify_integrity : boolean, default False
    Check the new index for duplicates. Otherwise defer the check until
    necessary. Setting to False will improve the performance of this
    method

Examples
--------
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])

Returns
-------
dataframe : DataFrame
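
To make the docstring's "column label or list of column labels / arrays" wording concrete, here is a small sketch that reuses the two-row frame from the session above. It is only an illustration of how the different kinds of keys interact with drop, not a statement of what pandas guarantees; the array values and the index name "key" are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(2), "B": range(2), "C": range(2)})

# Labels: the named columns become index levels and are dropped.
by_labels = df.set_index(["A", "B"])

# Array-like values: used as the index directly, so no column is dropped.
by_values = df.set_index(pd.Index(["x", "y"], name="key"))

# Mixed: only the label "A" refers to a column, so only "A" is dropped.
mixed = df.set_index(["A", np.array([10, 20])])

# drop=False keeps the label columns in the frame as well.
kept = df.set_index(["A", "B"], drop=False)

for name, frame in [("labels", by_labels), ("values", by_values),
                    ("mixed", mixed), ("kept", kept)]:
    print(name, list(frame.columns))
```

In this sketch drop only ever removes columns that were named by label; values passed as arrays have no column to remove, which is the situation the thread is discussing.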

Are there any plans to fix this issue?
