Both of the examples below fail with the same error:
df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])
df.loc[0, 'a'] = dict(x=2)
df.iloc[0, 0] = dict(x=2)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-282-62f3ee5ff885> in <module>()
1 # file_map.loc[file_no, 'Q_step_length'] = dict(a=1)
2 df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])
----> 3 df.iloc[0, 0] = dict(x=2)
4 df['a'] = df['a'].apply(lambda x: x[0] if not pd.isnull(x) else x)
5 df
...\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
177 key = com._apply_if_callable(key, self.obj)
178 indexer = self._get_setitem_indexer(key)
--> 179 self._setitem_with_indexer(indexer, value)
180
181 def _has_valid_type(self, k, axis):
...\lib\site-packages\pandas\core\indexing.py in _setitem_with_indexer(self, indexer, value)
603
604 if isinstance(value, (ABCSeries, dict)):
--> 605 value = self._align_series(indexer, Series(value))
606
607 elif isinstance(value, ABCDataFrame):
...\lib\site-packages\pandas\core\indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
743 return ser.reindex(ax)._values
744
--> 745 raise ValueError('Incompatible indexer with Series')
746
747 def _align_frame(self, indexer, df):
ValueError: Incompatible indexer with Series
This works, but it places a list into the dataframe:
df.loc[0, 'a'] = [dict(x=2)]
It is possible to get the dict directly in the dataframe by using a very inelegant construct like this:
df['a'] = df['a'].apply(lambda x: x[0] if not pd.isnull(x) else x)
Since it is possible to store a dict in a dataframe, an assignment like the ones above should not fail. I am aware that df.loc[...] = dict(...) assigns the values in the dict to the corresponding columns if present (is that documented?) and has its own issues, but this behaviour should not apply when accessing a single location of the dataframe.
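One possible way around the error, which is not taken from the report above and has not been checked against pandas 0.20.3, is the scalar accessor `.at`. In recent pandas versions it sets a single cell without trying to align the value as a Series. A minimal sketch, assuming an object-dtype column:

```python
import pandas as pd

# Same empty frame as in the report; both columns have object dtype.
df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])

# Assumption: .at treats the dict as one scalar value for the cell
# instead of aligning its keys against the columns.
df.at[0, 'a'] = dict(x=2)

cell = df.at[0, 'a']
print(type(cell), cell)
```

If this holds for the pandas version in use, it avoids both the list wrapper and the follow-up `apply` call.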
Expected output: a dataframe with a dict inside the specified location.
Output of pd.show_versions():
commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.5.0
Cython: 0.26
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
This is pretty non-idiomatic, and you are pretty much on your own here. You could do it by just wrapping the dict in a list/tuple:
In [14]: df.loc[0, 'a'] = [dict(x=2)]
In [15]: df
Out[15]:
            a    b
0  [{'x': 2}]  NaN
1         NaN  NaN
2         NaN  NaN
Encountered the same issue, had two thoughts:
Storing a dict within a DataFrame is unusual, but there are valid cases where software may be using Pandas as a way to represent and manipulate arbitrary key/value style data where the data is indexed in a way that makes sense for panel representation.
The behavior that location based indexing will update columns based on the keys/values of a provided dictionary was a surprise to me. This is a cool convenience feature that makes sense when an explicit column is not referenced. For example, when providing:
df.loc[row, :] = dict(key1=value1, key2=value2)
It makes sense that the keys of the dictionary might be written as columns and that df.loc[row, key1] == value1. However, when providing an explicit column index, inferring the target columns from a provided dictionary is (to me) counter-intuitive. If I instead supply:
df.loc[row, col] = dict(key=value)
I am explicitly denoting that I want to store the entire value in the col column, and I would expect the dictionary to be inserted as-is.
Anyways, I agree with @jreback that this is somewhat non-idiomatic BUT I am sympathetic to the original issue raised by @andreas-thomik. I encountered a problem where trying to store a dict to an element of a dataframe using this syntax made sense for the particular problem I was facing, so he isn't entirely on his own with this request.
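The column-alignment behaviour described in this comment can be illustrated with a small sketch. It assumes a recent pandas version and the same empty frame as in the original report; the exact behaviour may differ between versions.

```python
import pandas as pd

df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])

# No explicit column: the dict keys are matched against the column labels,
# so 'a' and 'b' in row 0 receive the corresponding values.
df.loc[0, :] = dict(a=1, b=2)

row = df.loc[0]
print(row['a'], row['b'])
```

With an explicit column, as in df.loc[0, 'a'] = dict(x=2), the same alignment step is attempted, which is where the ValueError in the traceback above is raised.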
@aaclayton this is related to #18955. We could/should probably support setting scalars of dicts better (and other iterables). It's a bit tricky though.