Pandas: Not working set_index with drop

Created on 14 Jul 2016  ·  6 comments  ·  Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

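from io import StringIO
from pandas import read_csv

dtf = read_csv(StringIO("DATE_TIME,A\n2/8/2015  6:00:30,1"))
print(dtf)

# Pass the column itself (a Series), not its label, with drop=True
dtf.set_index(dtf.DATE_TIME, drop=True, inplace=True)
print(dtf.columns)
print(dtf)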

Current output

           DATE_TIME  A
0  2/8/2015  6:00:30  1
Index(['DATE_TIME', 'A'], dtype='object')
                           DATE_TIME  A
2/8/2015  6:00:30  2/8/2015  6:00:30  1

Expected Output

           DATE_TIME  A
0  2/8/2015  6:00:30  1
Index(['A'], dtype='object')
2/8/2015  6:00:30  1
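Until this is settled, one workaround that produces the expected output without relying on `set_index` to drop a Series key is to move the column into the index yourself. A minimal sketch: `DataFrame.pop` removes the column and returns it as a Series, which is then assigned to `.index`. This is an alternative route, not a change to `set_index`.

```python
from io import StringIO

import pandas as pd

dtf = pd.read_csv(StringIO("DATE_TIME,A\n2/8/2015  6:00:30,1"))

# pop() removes DATE_TIME from the frame and returns it as a Series,
# so assigning it to .index both sets the index and drops the column.
dtf.index = dtf.pop("DATE_TIME")

print(dtf.columns)
print(dtf)
```

The index keeps the Series name (`DATE_TIME`), so the result matches what `set_index("DATE_TIME")` would give.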

output of pd.show_versions()

commit: None
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 20.6.7
Cython: None
numpy: 1.11.1
scipy: 0.16.1
statsmodels: None
xarray: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.3.5
xlrd: 1.0.0
xlwt: 1.0.0
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.0
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
Labels: Bug, Error Reporting, Reshaping

All 6 comments

Thanks, it looks like a bug. If the input is a Series sliced from the original, the corresponding column should be dropped.

It works fine if we pass the column name:

dtf.set_index('DATE_TIME', drop=True, inplace=True)
# Index(['A'], dtype='object')
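To make the difference concrete, the sketch below runs both forms side by side, assuming current `set_index` behavior: a column label is looked up and, with `drop=True`, removed; a Series is used only for its values and name, so no column is removed.

```python
from io import StringIO

import pandas as pd

raw = "DATE_TIME,A\n2/8/2015  6:00:30,1"

# Key given as a column label: set_index looks it up and, because
# drop=True, removes it from the columns.
by_label = pd.read_csv(StringIO(raw)).set_index("DATE_TIME", drop=True)

# Key given as the column itself (a Series): set_index takes its values
# and name for the index but does not remove any column.
df = pd.read_csv(StringIO(raw))
by_series = df.set_index(df["DATE_TIME"], drop=True)

print(list(by_label.columns))   # ['A']
print(list(by_series.columns))  # ['DATE_TIME', 'A']
```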

Not a bug: this violates the guarantees of set_index.

It's not valid to pass an actual column here; it's not the same as actually assigning the index.

There is a PR where we try to make this work, but it's inherently ambiguous.

Not even sure you could warn about this
(though it IS an error to use inplace and drop, I think).
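The ambiguity can be illustrated with a Series that merely shares a column's name. In this sketch the frame and the `other` Series are made up for illustration: `set_index` has no reliable way to tell whether `other` is the `key` column or just named like it, so it takes the Series values positionally for the index and leaves the `key` column in place.

```python
import pandas as pd

df = pd.DataFrame({"key": [10, 20, 30], "val": [1, 2, 3]})

# A Series that shares the column's name but not its values; nothing
# tells set_index whether it "is" the key column or just named like it.
other = pd.Series([7, 8, 9], name="key")

out = df.set_index(other, drop=True)
print(out.index.tolist())   # [7, 8, 9]
print(list(out.columns))    # ['key', 'val']
```

Dropping `key` here would silently discard data that is not in the new index, which is one argument for only ever dropping keys given as labels.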

> not a bug - this violates the guarantees of set_index

Could you elaborate on which guarantee of set_index this violates? I find it confusing that I can explicitly pass drop=True and get no error when, for some reason, dropping is not allowed or possible.
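One way to get the error asked for here, without changing pandas, is a thin wrapper. `strict_set_index` below is a hypothetical helper, not a pandas API: it raises when `drop=True` is combined with an array-like key (Series, Index, ndarray or list), since `set_index` only drops keys given as column labels.

```python
import numpy as np
import pandas as pd

# Key types that set_index treats as arrays (never as column labels).
ARRAY_LIKE_KEYS = (pd.Series, pd.Index, np.ndarray, list)


def strict_set_index(df, keys, drop=True, **kwargs):
    """Hypothetical wrapper around DataFrame.set_index (not a pandas API).

    set_index only drops keys that are column labels; for array-like keys
    drop=True is silently ignored. This helper raises instead.
    """
    key_list = keys if isinstance(keys, list) else [keys]
    if drop:
        for key in key_list:
            if isinstance(key, ARRAY_LIKE_KEYS):
                raise ValueError(
                    f"drop=True cannot drop a {type(key).__name__} key; "
                    "pass the column label instead"
                )
    return df.set_index(keys, drop=drop, **kwargs)


df = pd.DataFrame({"DATE_TIME": ["2/8/2015  6:00:30"], "A": [1]})

result = strict_set_index(df, "DATE_TIME")
print(list(result.columns))  # ['A']

try:
    strict_set_index(df, df["DATE_TIME"])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Labels that are missing from the frame are passed through unchanged, so pandas still reports those with its own `KeyError`.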


When you pass a list for the keys, it is by definition setting the index. However, one could possibly think that [58] is the actual result of [57].

In [55]: df = pd.DataFrame({'A':range(2),'B':range(2),'C':range(2)})

In [56]: df
   A  B  C
0  0  0  0
1  1  1  1

In [57]: df.set_index(['A','B'])
     C
A B   
0 0  0
1 1  1

In [58]: df.index=['A','B']

In [59]: df
   A  B  C
A  0  0  0
B  1  1  1
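The two steps in this session can be checked directly. A minimal sketch reproducing [57] and [58]: a list of labels passed to `set_index` selects columns to build a MultiIndex, while assigning the same list to `.index` uses the strings as literal row labels.

```python
import pandas as pd

df = pd.DataFrame({"A": range(2), "B": range(2), "C": range(2)})

# [57]: the list holds column labels, so A and B become a two-level
# index (and are dropped from the columns, since drop defaults to True).
indexed = df.set_index(["A", "B"])
print(type(indexed.index).__name__)  # MultiIndex
print(list(indexed.columns))         # ['C']

# [58]: assigning a list to .index uses the strings themselves as labels;
# no column is looked up and nothing is removed.
relabelled = df.copy()
relabelled.index = ["A", "B"]
print(relabelled.index.tolist())     # ['A', 'B']
print(list(relabelled.columns))      # ['A', 'B', 'C']
```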
In [54]: DataFrame.set_index?
Signature: DataFrame.set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)
Set the DataFrame index (row labels) using one or more existing
columns. By default yields a new object.

Parameters
----------
keys : column label or list of column labels / arrays
drop : boolean, default True
    Delete columns to be used as the new index
append : boolean, default False
    Whether to append columns to existing index
inplace : boolean, default False
    Modify the DataFrame in place (do not create a new object)
verify_integrity : boolean, default False
    Check the new index for duplicates. Otherwise defer the check until
    necessary. Setting to False will improve the performance of this
    method

Examples
--------
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])

Returns
-------
dataframe : DataFrame
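The second docstring example mixes a label and an array in one key list, which is where the label-only behavior of `drop` shows. A sketch assuming a six-row frame (columns `B` and `C` are invented so the example runs): only the level that came from column `A` is dropped; the raw list has no column to remove.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 0, 1, 2], "B": list("abcabc"), "C": range(6)})

# "A" is a column label, so drop=True removes it; the plain list is
# treated as an array, so there is no column to remove for that level.
mixed = df.set_index(["A", [0, 1, 2, 0, 1, 2]], drop=True)

print(list(mixed.index.names))  # ['A', None]
print(list(mixed.columns))      # ['B', 'C']
```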

Any plans to fix this?
