Pandas: DataFrame creation: Specifying dtypes with a dictionary

Created on 18 Jan 2015  ·  3 comments  ·  Source: pandas-dev/pandas

Apologies if this feature has been suggested before. Many of the IO functions (e.g. read_csv) let you use a dictionary to easily specify the dtype of each column. As far as I understand, this is not possible with regular DataFrame construction, e.g.:

df = pd.DataFrame(data=data, columns=columns, dtypes={'colname1': str, 'colname2': np.int})

Even better would be the ability to change the dtypes of a DataFrame's columns with a similar construction, e.g.:

df.change_types({'colname1': str, 'colname2': np.int})

Is something like this already planned?

API Design Dtypes Duplicate Reshaping

All 3 comments

See #9133 and #4464. It's not that difficult.
Would you like to give it a try?

This method actually works:
data_df = data_df.astype(dtype={"wheel_number": "int64", "car_name": "object", "minutes_spent": "float64"})
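A minimal sketch of that approach, assuming a small made-up frame (the column names and values below are illustrative only). It shows `DataFrame.astype` taking a mapping from column name to dtype:

```python
import pandas as pd

# Illustrative data only; every column initially holds strings.
data_df = pd.DataFrame(
    {
        "wheel_number": ["4", "6"],
        "car_name": ["sedan", "truck"],
        "minutes_spent": ["12.5", "30"],
    }
)

# Cast each column using a single column-name -> dtype mapping.
data_df = data_df.astype(
    dtype={"wheel_number": "int64", "car_name": "object", "minutes_spent": "float64"}
)

print(data_df.dtypes)
```

Columns not named in the mapping keep their existing dtype.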

Difference between read_csv(csvFile, dtype={..}) and df.astype(dtype={..}):
With read_csv, it is fine if the supplied dict contains extra columns that are not in the CSV; they are ignored gracefully. With astype(), an error is raised if any column named in the dict is missing from the data.
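The difference can be reproduced with a small sketch. The CSV text and the mapping are made up for illustration, and the exception type (`KeyError`) is what recent pandas versions raise; older versions may differ.

```python
import io

import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"
# The mapping names a column ("c") that the data does not contain.
dtypes = {"a": "int64", "b": "float64", "c": "str"}

# read_csv: the extra "c" entry is silently ignored.
from_csv = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(from_csv.dtypes)

# astype: the same mapping raises because "c" is not a column.
plain = pd.read_csv(io.StringIO(csv_text))
try:
    plain.astype(dtype=dtypes)
    astype_failed = False
except KeyError as exc:
    astype_failed = True
    print("astype raised KeyError:", exc)
```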

It should behave more like read_csv, because an incoming dictionary may name some columns that are present and others that are not. The workaround I have to use right now is:

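import pandas as pd

df = pd.DataFrame(incoming_data)
gtfs_dtypes = { ... }  # my master dtypes dict, having all possible column names
# Build a mapping for only the columns actually present; fall back to 'str' for columns missing from the master dict.
gtfs_dtypes_specific = {col: gtfs_dtypes.get(col, 'str') for col in df.columns}
df = df.astype(dtype=gtfs_dtypes_specific)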
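A variation on this workaround, sketched under stated assumptions: `astype_existing` is a hypothetical helper, not part of pandas, and the column names and data are invented for illustration. Unlike the snippet above, it leaves columns that are missing from the master mapping untouched instead of casting them to 'str'.

```python
import pandas as pd


def astype_existing(df, dtype_map):
    """Cast only the columns of df that appear in dtype_map (hypothetical helper)."""
    present = {col: dtype_map[col] for col in df.columns if col in dtype_map}
    return df.astype(dtype=present)


# Illustrative master mapping; "stop_lat" is absent from the frame below.
master_dtypes = {"stop_id": "str", "stop_sequence": "int64", "stop_lat": "float64"}
df = pd.DataFrame(
    {"stop_id": [101, 102], "stop_sequence": ["1", "2"], "note": ["x", "y"]}
)

result = astype_existing(df, master_dtypes)
print(result.dtypes)
```

Which fallback is preferable depends on the data: defaulting to 'str' makes every column's type explicit, while filtering preserves whatever dtype pandas already inferred for unlisted columns.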