Pandas: Storing a dict in a DataFrame fails

Created on 4 Oct 2017  ·  3Comments  ·  Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

Both of the examples below fail with the same error

df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])

df.loc[0, 'a'] = dict(x=2)
df.iloc[0, 0] = dict(x=2)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-282-62f3ee5ff885> in <module>()
      1 # file_map.loc[file_no, 'Q_step_length'] = dict(a=1)
      2 df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])
----> 3 df.iloc[0, 0] = dict(x=2)
      4 df['a'] = df['a'].apply(lambda x: x[0] if not pd.isnull(x) else x)
      5 df

...\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
    177             key = com._apply_if_callable(key, self.obj)
    178         indexer = self._get_setitem_indexer(key)
--> 179         self._setitem_with_indexer(indexer, value)
    180 
    181     def _has_valid_type(self, k, axis):

...\lib\site-packages\pandas\core\indexing.py in _setitem_with_indexer(self, indexer, value)
    603 
    604             if isinstance(value, (ABCSeries, dict)):
--> 605                 value = self._align_series(indexer, Series(value))
    606 
    607             elif isinstance(value, ABCDataFrame):

...\lib\site-packages\pandas\core\indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
    743             return ser.reindex(ax)._values
    744 
--> 745         raise ValueError('Incompatible indexer with Series')
    746 
    747     def _align_frame(self, indexer, df):

ValueError: Incompatible indexer with Series

This works, but is placing a list into the dataframe

df[0, 'a'] = [dict(x=2)]

It is possible to get the dict directly in the dataframe by using a very inelegant construct like this:

df['a'] = df['a'].apply(lambda x: x[0] if not pd.isnull(x) else x)

Problem description

Since it is possible to store a dict in a dataframe, trying an assignment as above should not fail. I am aware that df.loc[...] = dict(...) will assign values in the dict to the corresponding columns if present (is that documented?) and has its own issues but this behaviour should not apply when accessing a single location of the dataframe

Expected Output

A dataframe with a dict inside the specified location.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.5.0
Cython: 0.26
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Indexing

Most helpful comment

Encountered the same issue, had two thoughts:

Storing a dict within a DataFrame is unusual, but there are valid cases where software may be using Pandas as a way to represent and manipulate arbitrary key/value style data where the data is indexed in a way that makes sense for panel representation.

The behavior that location based indexing will update columns based on the keys/values of a provided dictionary was a surprise to me. This is a cool convenience feature that makes sense when an explicit column is not referenced. For example, when providing:

df.loc[row, :] = dict(key1=value1, key2=value2)

It makes sense that the keys of the dictionary might be written as columns and that df.loc[row, key1] == value1. However, when providing an explicit column index, inferring the target columns from a provided dictionary is (to me) counter-intuitive. If I instead supply:

df.loc[row, col] = dict(key=value)

I am explicitly denoting that I want to store the entire value in the col column, and I would expect the dictionary to be inserted as-is.

Anyways, I agree with @jreback that this is somewhat non-idiomatic BUT I am sympathetic to the original issue raised by @andreas-thomik. I encountered a problem where trying to store a dict to an element of a dataframe using this syntax made sense for the particular problem I was facing, so he isn't entirely on his own with this request.

All 3 comments

this is pretty non-idiomatic, and you are pretty much on your own here. you could do it by just using a list/tuple around it

In [14]: df.loc[0, 'a'] = [dict(x=2)]

In [15]: df
Out[15]: 
            a    b
0  [{'x': 2}]  NaN
1         NaN  NaN
2         NaN  NaN

Encountered the same issue, had two thoughts:

Storing a dict within a DataFrame is unusual, but there are valid cases where software may be using Pandas as a way to represent and manipulate arbitrary key/value style data where the data is indexed in a way that makes sense for panel representation.

The behavior that location based indexing will update columns based on the keys/values of a provided dictionary was a surprise to me. This is a cool convenience feature that makes sense when an explicit column is not referenced. For example, when providing:

df.loc[row, :] = dict(key1=value1, key2=value2)

It makes sense that the keys of the dictionary might be written as columns and that df.loc[row, key1] == value1. However, when providing an explicit column index, inferring the target columns from a provided dictionary is (to me) counter-intuitive. If I instead supply:

df.loc[row, col] = dict(key=value)

I am explicitly denoting that I want to store the entire value in the col column, and I would expect the dictionary to be inserted as-is.

Anyways, I agree with @jreback that this is somewhat non-idiomatic BUT I am sympathetic to the original issue raised by @andreas-thomik. I encountered a problem where trying to store a dict to an element of a dataframe using this syntax made sense for the particular problem I was facing, so he isn't entirely on his own with this request.

@aaclayton this is related to #18955 . We could/should prob supporting setting scalars of dicts better (and other iterables). Its a bit tricky though.

Was this page helpful?
0 / 5 - 0 ratings