Pandas: Dataframe creation: Specifying dtypes with a dictionary

Created on 18 Jan 2015  ·  3 Comments  ·  Source: pandas-dev/pandas

Apologies if this feature has been suggested before. Many of the IO functions (e.g. read_csv) allow us to easily specify the dtype for each column using a dictionary. As far as I understand, this is not possible with the regular DataFrame constructor, e.g.:

df = pd.DataFrame(data=data, columns=columns, dtypes={'colname1': str, 'colname2': np.int})
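For reference, the read_csv behaviour being asked for elsewhere looks like this (a minimal sketch; the column names and inline CSV are invented for illustration):

import io
import pandas as pd

csv_data = io.StringIO("colname1,colname2\na,1\nb,2")
# read_csv applies the requested dtype to each named column at parse time
df = pd.read_csv(csv_data, dtype={'colname1': str, 'colname2': 'int64'})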

Even better, it would be great if one could change the dtypes of the dataframe columns using a similar construction, e.g.:

df.change_types({'colname1': str, 'colname2': np.int})

Is anything like this already planned?

Labels: API Design, Dtypes, Duplicate, Reshaping

All 3 comments

See #9133 and #4464; it's not that difficult.
Want to give it a try?

This way actually works:

data_df = data_df.astype(dtype={"wheel_number": "int64", "car_name": "object", "minutes_spent": "float64"})
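A self-contained version of that pattern, for anyone who wants to run it (a sketch; the sample values are invented):

import pandas as pd

data_df = pd.DataFrame({'wheel_number': ['4', '6'],
                        'car_name': ['sedan', 'truck'],
                        'minutes_spent': ['12.5', '7.0']})
# astype accepts a column-name -> dtype mapping and converts each column
data_df = data_df.astype({'wheel_number': 'int64',
                          'car_name': 'object',
                          'minutes_spent': 'float64'})
print(data_df.dtypes)  # wheel_number int64, car_name object, minutes_spent float64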

One difference between read_csv(csvFile, dtype={...}) and df.astype(dtype={...}):
In read_csv's case, it's fine if the supplied dict contains extra columns that aren't in the CSV; they are ignored gracefully. astype(), on the other hand, raises an error if any column named in the dict isn't present in the data.
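A quick illustration of that asymmetry (a sketch; the column names and inline CSV are invented, and the extra key is deliberately absent from the data):

import io
import pandas as pd

dtypes = {'a': 'int64', 'b': 'float64', 'missing': 'int64'}

# read_csv quietly skips dtype keys that don't match a column
df = pd.read_csv(io.StringIO("a,b\n1,2.5"), dtype=dtypes)

try:
    # astype raises KeyError because 'missing' is not a column
    df = df.astype(dtypes)
except KeyError:
    print("astype rejects dtype keys that aren't columns")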

It should be more like read_csv, because incoming dicts may cover some columns and not others. Right now this is the workaround I have to use:

df = pd.DataFrame(incoming_data)
gtfs_dtypes = {...}  # my master dtypes dict, having all possible column names
# keep only the dtypes for columns actually present, defaulting to str
gtfs_dtypes_specific = {x: gtfs_dtypes.get(x, 'str') for x in df.columns}
df = df.astype(dtype=gtfs_dtypes_specific)