Data.table: [R-Forge #5222] 'not found' when doing DT[, list(sum(non-.SD-col), lapply(.SD, mean)), by = ..., .SDcols = ...]

Created on 8 Jun 2014  ·  12 comments  ·  Source: Rdatatable/data.table

Submitted by: Matt Weller; Assigned to: Nobody; R-Forge link

When using .SDcols (to apply a function to multiple columns), it is not possible to refer to other columns of the original table (here v1) with the following syntax:

dt = data.table(grp=c(2,3,3,1,1,2,3), v1=1:7, v2=7:1, v3=10:16)
dt.out = dt[, list(v1 = sum(v1),  lapply(.SD,mean)), by = grp, .SDcols = v2:v3]
# Error in `[.data.table`(dt, , list(v1 = sum(v1), lapply(.SD, mean)), by = grp,  : 
#   object 'v1' not found

A similar error occurs when using c in place of list. Clearly the column v1 is not accessible from within the j clause.

๋‚˜๋Š” ๊ทธ๊ฒƒ์ด lapply ๋ถ€๋ถ„์— ํฌํ•จ๋˜๋Š” ๊ฒƒ์„ ์›ํ•˜์ง€ ์•Š์ง€๋งŒ ๊ณ„์‚ฐ ํ›„์— ๊ทธ๊ฒƒ์„ ์‚ญ์ œํ•ด์•ผํ–ˆ์ง€๋งŒ, ์—ด v1์„ ํฌํ•จํ•˜๋Š” ๋‹ค์Œ ์ฝ”๋“œ์— ์˜์ง€ํ–ˆ์Šต๋‹ˆ๋‹ค.

sd.cols = c("v1","v2", "v3")
dt.out = dt[, c(sum.v1 = sum(v1), lapply(.SD,mean)), by = grp, .SDcols = sd.cols]

According to eddi on Stack Overflow this is a bug, and he asked me to report it. I can't give much more detail because I'm not sure exactly which part he believes is the bug. The accepted answer by Arun should highlight where the problem lies.

Here is the relevant SO post.

High bug

All 12 comments

Another post to update: http://stackoverflow.com/questions/27755518/data-table-sd-lapply-multiple-columns-in-argument

A bit late, but adding my question to this pile.

I didn't even think of it as a bug. I usually just supply the additional required fields in .SDcols and then use .SD[, !"total", with=FALSE] in j to exclude the unwanted column.
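For reference, a minimal sketch of that workaround on the example table from the report. Here v1 stands in for the total column mentioned above; that substitution and the name sum.v1 are assumptions made for illustration.

```r
library(data.table)

dt <- data.table(grp = c(2, 3, 3, 1, 1, 2, 3), v1 = 1:7, v2 = 7:1, v3 = 10:16)

# Workaround: list v1 in .SDcols so that it is visible in j,
# then drop it from .SD before taking the per-group means.
out <- dt[, c(list(sum.v1 = sum(v1)),
              lapply(.SD[, !"v1", with = FALSE], mean)),
          by = grp, .SDcols = c("v1", "v2", "v3")]

print(out)
```

Subsetting .SD inside j runs once per group, so this is likely slower than a direct column reference on large tables.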

That's another good workaround. I wonder how its performance compares with using dt$total. And yes, this kind of thing dances on the border between a feature request and a bug, IMO.

Bumping this again. This could be a very important fix. This question, which involves DT[, (deltaColsNewNames) := lapply(.SD, normalDelta, price), .SDcols = deltaColsNames], looks related and could potentially be solved by it.
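A sketch of what that call looks like once columns outside .SDcols are visible in j, assuming a data.table version that includes the fix. The body of normalDelta and the sample data are hypothetical, since the original definition is not shown in this thread; price is deliberately left out of .SDcols.

```r
library(data.table)

# Hypothetical helper: relative difference of each value from price.
normalDelta <- function(x, price) (x - price) / price

# Hypothetical sample data.
DT <- data.table(price = c(100, 200, 400),
                 bid   = c(99, 198, 404),
                 ask   = c(101, 202, 396))

deltaColsNames    <- c("bid", "ask")
deltaColsNewNames <- paste0(deltaColsNames, "_delta")

# price is not in .SDcols but is still passed to normalDelta from j.
DT[, (deltaColsNewNames) := lapply(.SD, normalDelta, price),
   .SDcols = deltaColsNames]

print(DT)
```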

Another simple case where this would be useful: http://stackoverflow.com/a/32498711/1191259

Another simple case that suffers from this: http://stackoverflow.com/questions/32944060/using-data-table-to-calculate-new-columns/32944519#32944519

Another item to update once this is fixed: http://stackoverflow.com/q/32915770/1191259

Yay! Now you can do this:

require(data.table)
dt = data.table(grp=c(2,3,3,1,1,2,3), v1=1:7, v2=7:1, v3=10:16)
dt.out = dt[, c(v1 = sum(v1),  lapply(.SD,mean)), by = grp, .SDcols = v2:v3]
#    grp v1  v2   v3
# 1:   2  7 4.5 12.5
# 2:   3 12 4.0 13.0
# 3:   1  9 3.5 13.5

I've updated all the SO posts linked here. Thanks to everyone.

Thank you, @arunsrinivasan. I've been waiting for this fix for years.

Awesome! Thank you
