Pandas: Dataclass support

Created on 14 Jul 2018  ·  3 Comments  ·  Source: pandas-dev/pandas

Proposal description

Dataclasses were added in Python 3.7.

It would be nice for pandas to support dataclasses. For example, it could be possible to construct a DataFrame from a list of dataclass instances by calling .from_dataclasses or just .DataFrame(data=dataclass_list). There should also be a way to go in the other direction with .to_dataclasses.

Expected Behaviour

from dataclasses import dataclass
import pandas as pd

@dataclass
class SimpleDataObject(object):
  field_a: int
  field_b: str

dataclass_object1 = SimpleDataObject(1, 'a')
dataclass_object2 = SimpleDataObject(2, 'b')

# Dataclasses to DataFrame via a dedicated function (proposed, does not exist yet)
df = pd.from_dataclasses([dataclass_object1, dataclass_object2])
list(df.columns) == ['field_a', 'field_b']  # expected: True
# int maps to int64 and str to object in pandas terms
[str(t) for t in df.dtypes] == ['int64', 'object']  # expected: True

# Dataclasses to DataFrame via the constructor (proposed)
df = pd.DataFrame(data=[dataclass_object1, dataclass_object2])
list(df.columns) == ['field_a', 'field_b']  # expected: True
[str(t) for t in df.dtypes] == ['int64', 'object']  # expected: True

# DataFrame to Dataclasses (proposed method, does not exist yet)
df = pd.DataFrame(columns=['field_a', 'field_b'], data=[[1, 'a'], [2, 'b']])
dataclass_list = df.to_dataclasses()
dataclass_list == [dataclass_object1, dataclass_object2]  # expected: True

Labels: API Design, Needs Discussion

All 3 comments

AFAIK it is not guaranteed that you can tell whether a given instance is a dataclass. E.g. classes do not inherit from dataclass.

From your example:

@dataclass
class SimpleDataObject(object):
  field_a: int
  field_b: str

x = SimpleDataObject(field_a=2, field_b='f')

I don't think you could even tell from introspection that x is a dataclass, correct? If that's the case, this isn't possible to do.

The dataclasses module has is_dataclass and fields introspection functions, so that part shouldn't be an issue.
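
To make that concrete, here is a minimal sketch of the two introspection helpers applied to the example class from this thread. The variable names (is_instance, field_names, field_types) are only illustrative.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class SimpleDataObject:
    field_a: int
    field_b: str


x = SimpleDataObject(field_a=2, field_b='f')

# is_dataclass() returns True for a dataclass and for its instances,
# so exclude types explicitly when only instances are wanted.
is_instance = is_dataclass(x) and not isinstance(x, type)

# fields() returns the declared fields in definition order.
field_names = [f.name for f in fields(x)]
field_types = [f.type for f in fields(x)]

print(is_instance, field_names, field_types)
```

Note that f.type holds the annotation as written, so it is only a hint about the column dtype and not something pandas could rely on blindly.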

That said, I'm not sure we should quickly commit to any specific API/support here. For now, the asdict helper from the dataclasses module can help with the ingest use case.

In [18]: from dataclasses import asdict

In [19]: pd.DataFrame([asdict(x) for x in [dataclass_object1, dataclass_object2]])
Out[19]:
   field_a field_b
0        1       a
1        2       b
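
The reverse direction can be handled in user code as well. Below is a rough sketch of a round trip; to_dataclasses is a hypothetical helper written for this example, not a pandas function, and it assumes the column names match the dataclass field names exactly.

```python
from dataclasses import asdict, dataclass

import pandas as pd


@dataclass
class SimpleDataObject:
    field_a: int
    field_b: str


def to_dataclasses(df, cls):
    """Hypothetical helper: build one `cls` instance per DataFrame row."""
    # Each record is a dict keyed by column name, passed on as keyword arguments.
    return [cls(**row) for row in df.to_dict(orient="records")]


objects = [SimpleDataObject(1, 'a'), SimpleDataObject(2, 'b')]

# Ingest: asdict() turns each instance into a plain dict of its fields.
df = pd.DataFrame([asdict(o) for o in objects])

# Export: rebuild the instances from the rows.
round_tripped = to_dataclasses(df, SimpleDataObject)
```

One caveat: asdict recurses into nested dataclasses, lists and dicts, so a dataclass with nested fields would produce nested dicts in the cells instead of flat columns.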

I put together a solution in this PR that checks the data provided during __init__. However, it looks like the testing pipeline is set up to support multiple Python versions, so I may need a bit more time to make this happen.
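
For illustration only, a check of that kind might look roughly like the sketch below. This is not the code from the PR; normalize_records is a hypothetical name, and the guarded import is one assumed way to cope with Python versions older than 3.7, where the dataclasses module is not available.

```python
try:
    from dataclasses import asdict, is_dataclass
except ImportError:  # Python < 3.7: no dataclasses module in the stdlib
    asdict = None

    def is_dataclass(obj):
        return False


def normalize_records(data):
    """Hypothetical pre-processing step for constructor input.

    Converts a non-empty list of dataclass instances to a list of dicts
    and returns any other input unchanged.
    """
    if (
        isinstance(data, list)
        and data
        and all(is_dataclass(item) and not isinstance(item, type) for item in data)
    ):
        return [asdict(item) for item in data]
    return data


if __name__ == "__main__":
    from dataclasses import dataclass

    @dataclass
    class SimpleDataObject:
        field_a: int
        field_b: str

    print(normalize_records([SimpleDataObject(1, 'a'), SimpleDataObject(2, 'b')]))
    print(normalize_records([[1, 'a'], [2, 'b']]))
```

Keeping the check in a small standalone function like this means the version-dependent import lives in one place, which may make it easier to test across interpreters.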
