I am trying to merge multiple DataFrames with consecutive merge operations, and I want to add a suffix to the names of the newly merged columns. A simplified version of my code looks like this:
import pandas as pd
f0 = pd.DataFrame(columns=['data'], data=[1,2,3], index=['a','b','c'])
f1 = pd.DataFrame(columns=['data'], data=[4,5,6], index=['c','b','a'])
f2 = pd.DataFrame(columns=['data'], data=[7,8,9], index=['a','c','b'])
merged = f0
merged = merged.merge(f1, left_index=True, right_index=True, suffixes=("_0", "_1"))
# intended to rename the third 'data' column to 'data_2'
merged = merged.merge(f2, left_index=True, right_index=True, suffixes=("", "_2"))
print(merged.columns)
With pandas 0.15.2 on Python 2.7 this returns:
Index([u'data_0', u'data_1', u'data'], dtype='object')
while I would have expected
Index([u'data_0', u'data_1', u'data_2'], dtype='object')
It seems you just want this?
In [45]: pd.concat([f0,f1,f2],axis=1,ignore_index=True)
Out[45]:
   0  1  2
a  1  6  7
b  2  5  9
c  3  4  8
Suffixes are only applied when column names overlap. After the first merge the columns are data_0 and data_1, and neither clashes with the third frame's data column, so the second merge has nothing to suffix:
In [46]: merged1 = merged.merge(f1, left_index=True, right_index=True, suffixes=("_0", "_1"))
In [47]: merged1
Out[47]:
   data_0  data_1
c       3       4
b       2       5
a       1       6
In [48]: merged1.merge(f2, left_index=True, right_index=True, suffixes=("", "_2"))
Out[48]:
   data_0  data_1  data
a       1       6     7
c       3       4     8
b       2       5     9
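If you want to keep the chained merges rather than switch to concat, one option is to make every column name unique before merging, so the suffixes argument is never needed. This is a minimal sketch, assuming Python 3 and a recent pandas (not the 0.15.2 / 2.7 setup above); `DataFrame.add_suffix` and `functools.reduce` are standard, and the `_0`/`_1`/`_2` naming simply follows the scheme asked for in the question.

```python
import pandas as pd
from functools import reduce

f0 = pd.DataFrame({'data': [1, 2, 3]}, index=['a', 'b', 'c'])
f1 = pd.DataFrame({'data': [4, 5, 6]}, index=['c', 'b', 'a'])
f2 = pd.DataFrame({'data': [7, 8, 9]}, index=['a', 'c', 'b'])

# Rename each frame's columns up front ('data' -> 'data_0', 'data_1', ...),
# so no two frames share a column name and merge never needs suffixes.
frames = [f.add_suffix(f'_{i}') for i, f in enumerate([f0, f1, f2])]

# Merge the frames pairwise on their index, left to right.
merged = reduce(
    lambda left, right: left.merge(right, left_index=True, right_index=True),
    frames,
)
print(merged.columns.tolist())  # ['data_0', 'data_1', 'data_2']
```

Because merge defaults to how='inner', index labels missing from any frame are dropped; pass how='outer' inside the lambda if you need to keep them.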
Ah, thanks for the explanation; I missed the fact that suffixes are only applied to duplicate column names. And indeed the concat solution is simpler. For the record: concat(..., ignore_index=True) is exactly the opposite of what I want, but with concat(..., ignore_index=False) it works nicely.
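To illustrate the difference: with the default ignore_index=False, concat keeps each frame's original column name, so the result has three columns that are all called data, aligned on the index. A minimal sketch, assuming Python 3 and a recent pandas:

```python
import pandas as pd

f0 = pd.DataFrame({'data': [1, 2, 3]}, index=['a', 'b', 'c'])
f1 = pd.DataFrame({'data': [4, 5, 6]}, index=['c', 'b', 'a'])
f2 = pd.DataFrame({'data': [7, 8, 9]}, index=['a', 'c', 'b'])

# ignore_index=False is the default: the original column labels are kept.
combined = pd.concat([f0, f1, f2], axis=1)
print(combined.columns.tolist())  # ['data', 'data', 'data']

# Selecting the duplicated name returns all three columns as a DataFrame,
# not a single column.
print(combined['data'].shape)  # (3, 3)
```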
Great.
Keep in mind that you generally don't want duplicate column names.
You might want a multi-level result instead: use the keys argument to concat.
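A sketch of the keys suggestion, assuming Python 3 and a recent pandas: each key becomes the outer level of a column MultiIndex, so the three data columns stay distinguishable. Using the integers 0, 1, 2 as keys is an illustrative choice; flattening the two levels then reproduces the data_0 / data_1 / data_2 names asked for in the question.

```python
import pandas as pd

f0 = pd.DataFrame({'data': [1, 2, 3]}, index=['a', 'b', 'c'])
f1 = pd.DataFrame({'data': [4, 5, 6]}, index=['c', 'b', 'a'])
f2 = pd.DataFrame({'data': [7, 8, 9]}, index=['a', 'c', 'b'])

# Each key becomes the outer level of a two-level column MultiIndex.
combined = pd.concat([f0, f1, f2], axis=1, keys=[0, 1, 2])
print(combined.columns.tolist())  # [(0, 'data'), (1, 'data'), (2, 'data')]

# Select the columns that came from one source frame by its key.
print(combined[1])

# Optionally flatten the two levels into single names like 'data_1'.
flat = combined.copy()
flat.columns = [f'{name}_{key}' for key, name in combined.columns]
print(flat.columns.tolist())  # ['data_0', 'data_1', 'data_2']
```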