Pandas: BUG: fillna with inplace does not work with multiple columns selection by loc

Created on 11 Dec 2016  ·  3 Comments  ·  Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
df.iloc[1, 2:4] = np.nan
df.loc[:, ['C', 'D']].fillna(-1, inplace=True)
display(df)  # display() is provided by IPython/Jupyter

Output:

          A         B         C         D
0  1.387547 -1.299578  0.360015  1.290783
1 -0.395182 -0.112581       NaN       NaN
2 -0.649372 -1.831869 -0.103746  0.533153

Problem description

The NaN values are expected to be replaced with -1, but they are not.

Please see the following comparisons.

Comparison (1)

In contrast, the following code behaves as expected.
(The only difference is that the selection uses iloc instead of loc.)

df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
df.iloc[1, 2:4] = np.nan
df.iloc[:, 2:4].fillna(-1, inplace=True)
display(df)

Output:

          A         B         C         D
0 -0.522821 -1.600520 -1.468871  0.715790
1  0.493071  0.722474 -1.000000 -1.000000
2  0.545852 -0.877946  0.993169 -0.582661

Comparison (2)

When only one column is selected with loc, it behaves properly.

df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
df.iloc[1, 2:4] = np.nan
df.loc[:, 'C'].fillna(-1, inplace=True)
display(df)

Output:

          A         B         C         D
0 -0.549106  0.261093 -1.278554  2.017178
1 -1.424498  0.439482 -1.000000       NaN
2 -1.281520  1.190736  0.356319  0.416363

Expected Output of the first code sample

          A         B         C         D
0  1.181106  1.101231 -0.198445  0.295238
1 -0.654265 -1.129840 -1.000000 -1.000000
2 -1.070404  0.096556  0.499020 -1.835347

Output of pd.show_versions()


INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-358.14.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: zh_TW.big5
LOCALE: zh_TW.big5

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Labels: Indexing, Missing-data, Usage Question

All 3 comments

You are filling a copy. Using inplace is an anti-pattern. Most operations like this show a SettingWithCopyWarning, but in this case the copy is not easily detectable.
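To make the "filling a copy" point concrete, here is a minimal sketch. It assumes a reasonably recent pandas and uses made-up deterministic data instead of the random values in the report; the variable name `subset` is only for illustration. Selecting columns with a list of labels returns a new DataFrame, so an inplace fill on that object never reaches `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic data in place of np.random.randn
df = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), columns=list("ABCD"))
df.iloc[1, 2:4] = np.nan

# Selecting with a list of labels produces a new DataFrame, not a view of df
subset = df.loc[:, ["C", "D"]]
subset.fillna(-1, inplace=True)  # fills the separate object only

nan_left_in_df = int(df[["C", "D"]].isna().sum().sum())
nan_left_in_subset = int(subset.isna().sum().sum())
print(nan_left_in_df, nan_left_in_subset)  # 2 0
```

The original one-liner does the same thing, except that the filled object is a temporary that is discarded immediately.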

Use

In [11]: df[['C', 'D']] = df[['C', 'D']].fillna(-1)

In [12]: df
Out[12]: 
          A         B         C         D
0  0.236782  1.408896 -0.199882  0.803165
1 -1.763881  0.232414 -1.000000 -1.000000
2  0.878515 -0.394800  0.429696 -1.829569
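As another way to restrict the fill to specific columns without inplace, `fillna` also accepts a dict mapping column names to fill values. The sketch below is only an illustration with made-up data, not something proposed in the thread; the name `filled` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the whole of row 1 is NaN so the per-column effect is visible
df = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), columns=list("ABCD"))
df.iloc[1, :] = np.nan

# Only the columns named in the dict are filled; A and B keep their NaN
filled = df.fillna({"C": -1, "D": -1})

filled_cd = filled.loc[1, ["C", "D"]].tolist()
nan_left_ab = int(filled[["A", "B"]].isna().sum().sum())
print(filled_cd, nan_left_ab)  # [-1.0, -1.0] 2
```

Because the result is assigned to a new name (or back to `df`), no copy-versus-view question arises.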

Try this:
df.loc[:, ['C', 'D']] = df.loc[:, ['C', 'D']].fillna(-1)
I was having the same difficulty with a .replace in my code. This worked.

This affects not only multiple columns, but also a single column.
df.loc[df.id==123, 'num'].fillna(0, inplace=True)
doesn't work, but
df.loc[df.id==123, 'num'] = 123
works.

Why not change the fillna function to handle this in the future?
It seems like a bug.
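For the row-filtered, single-column case, the assign-back pattern suggested earlier in the thread applies as well. This is a minimal sketch with made-up data; the column names `id` and `num` come from the comment above, while the values and the name `mask` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data using the column names from the comment
df = pd.DataFrame({"id": [123, 123, 456], "num": [1.0, np.nan, np.nan]})

mask = df["id"] == 123
# Fill on the right-hand side, then write the result back through loc
df.loc[mask, "num"] = df.loc[mask, "num"].fillna(0)

result = df["num"].tolist()
print(result)  # [1.0, 0.0, nan]
```

The assignment through `loc` writes into `df` itself, which is why it succeeds where the inplace call on the selected values does not.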
