df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
df.iloc[1, 2:4] = np.nan
df.loc[:, ['C', 'D']].fillna(-1, inplace=True)
display(df)
Output:
A B C D
0 1.387547 -1.299578 0.360015 1.290783
1 -0.395182 -0.112581 NaN NaN
2 -0.649372 -1.831869 -0.103746 0.533153
I expected the NaN values to be replaced with -1, but they are NOT.
Please see the following comparisons.
In contrast, the following code behaves as expected.
(The only difference is selecting with iloc instead of loc.)
df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
df.iloc[1, 2:4] = np.nan
df.iloc[:, 2:4].fillna(-1, inplace=True)
display(df)
Output:
A B C D
0 -0.522821 -1.600520 -1.468871 0.715790
1 0.493071 0.722474 -1.000000 -1.000000
2 0.545852 -0.877946 0.993169 -0.582661
When only one column is selected with loc, it behaves properly.
df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
df.iloc[1, 2:4] = np.nan
df.loc[:, 'C'].fillna(-1, inplace=True)
display(df)
Output:
A B C D
0 -0.549106 0.261093 -1.278554 2.017178
1 -1.424498 0.439482 -1.000000 NaN
2 -1.281520 1.190736 0.356319 0.416363
A B C D
0 1.181106 1.101231 -0.198445 0.295238
1 -0.654265 -1.129840 -1.000000 -1.000000
2 -1.070404 0.096556 0.499020 -1.835347
pd.show_versions()
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-358.14.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: zh_TW.big5
LOCALE: zh_TW.big5
pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
You are filling a copy. Using inplace is an anti-pattern. Most operations will show a SettingWithCopyWarning, but in this case it is not easily detectable.
Use
In [11]: df[['C', 'D']] = df[['C', 'D']].fillna(-1)
In [12]: df
Out[12]:
A B C D
0 0.236782 1.408896 -0.199882 0.803165
1 -1.763881 0.232414 -1.000000 -1.000000
2 0.878515 -0.394800 0.429696 -1.829569
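As an alternative to assigning the filled selection back, fillna also accepts a dict that maps column labels to fill values, which avoids the intermediate selection altogether. A minimal sketch, assuming the same column names as in the report; the seed is arbitrary and only there to make the example reproducible:

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is arbitrary).
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(3, 4), columns=list('ABCD'))
df.iloc[1, 2:4] = np.nan

# A dict maps column labels to fill values; columns not listed are left alone.
df = df.fillna({'C': -1, 'D': -1})

print(df)
```

Because the result is rebound to df, nothing depends on whether a selection is a view or a copy.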
Try this:
df.loc[:, ['C', 'D']] = df.loc[:, ['C', 'D']].fillna(-1)
I was having the same difficulty with a .replace in my code. This worked.
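The same assign-back pattern can be sketched for replace. The data below is hypothetical (the commenter's actual code is not shown), with -999 standing in for a sentinel value:

```python
import pandas as pd

# Hypothetical data: -999 stands in for a sentinel value to be replaced.
df = pd.DataFrame({'A': [1.0, 2.0], 'C': [-999.0, 3.0], 'D': [4.0, -999.0]})

# replace() returns a new object; assigning it back through loc updates df.
df.loc[:, ['C', 'D']] = df.loc[:, ['C', 'D']].replace(-999.0, 0.0)

print(df)
```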
This affects not only multiple columns, but also a single column.
df.loc[df.id==123, 'num'].fillna(0, inplace=True)
does not work, but
df.loc[df.id==123, 'num'] = 123
works.
Why not change the fillna function to handle this case in the future?
It seems like a bug.
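For the row-filtered case, the assign-back approach suggested earlier in the thread can be applied with a boolean mask. A minimal sketch, assuming a hypothetical frame that reuses the id and num column names from the comment:

```python
import numpy as np
import pandas as pd

# Hypothetical frame using the column names from the comment above.
df = pd.DataFrame({'id': [123, 123, 456], 'num': [1.0, np.nan, np.nan]})

mask = df['id'] == 123
# Fill only the selected rows, then assign the result back through loc.
df.loc[mask, 'num'] = df.loc[mask, 'num'].fillna(0)

print(df)
```

Only the rows where id is 123 are filled; the NaN in the row with id 456 is left untouched.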