Data.table: Grouping sets

Created on 5 Oct 2015  ·  3 comments  ·  Source: Rdatatable/data.table

Some keywords: GROUPING SETS, ROLLUP, CUBE, GROUPING
Some references: postgres, Oracle, SQL Server, "Grouping combined with arbitrary function"

_Grouping sets_ and friends are useful for pre-calculating the various aggregation levels that are often wanted. The data.table API for this is not very friendly; see "Aggregating sub totals and grand totals with data.table".

In the case of _rollup_, it is a top-down aggregation over the provided `by`. See the explanation from the postgres manual and the example code below.

ROLLUP ( e1, e2, e3, ... )

is equivalent to

GROUPING SETS (
    ( e1, e2, e3, ... ),
    ...
    ( e1, e2 ),
    ( e1 ),
    ( )
)
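
As a rough illustration of that equivalence in R, here is a minimal sketch. It assumes a data.table version that ships `rollup()` and `groupingsets()` (the development version discussed in this thread, or later). The three `by` columns from `mtcars` are picked only for the example.

```r
library(data.table)

DT <- as.data.table(mtcars)
by_cols <- c("vs", "am", "gear")  # illustrative choice of grouping columns

# rollup over (vs, am, gear)
r1 <- rollup(DT, j = .(agg = mean(mpg)), by = by_cols)

# the same aggregation spelled out as explicit grouping sets,
# from the full set down to the empty set (grand total)
r2 <- groupingsets(
    DT, j = .(agg = mean(mpg)), by = by_cols,
    sets = list(c("vs", "am", "gear"), c("vs", "am"), "vs", character(0))
)

all.equal(r1, r2)
```

The empty set `character(0)` plays the role of `( )` in the SQL form and produces the grand total row.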

I wonder whether there could be a valuable speed-up in the process. This is potentially a heavy computing operation. It would be good to have the computation of the _grouping sets_ feature developed in C, so that _rollup/cube_ and the others could more easily be built on top of _grouping sets_ in R while still getting top speed.
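
To make the layering idea concrete, here is a sketch of how a cube could be expressed in plain R on top of `groupingsets()`. The helper `cube_sets` is hypothetical and not part of data.table, the toy data is made up, and the sketch assumes a data.table version where `groupingsets()` and `cube()` exist.

```r
library(data.table)

# Hypothetical helper (not part of data.table): every subset of `by`,
# from the full set down to the empty set (the grand total).
cube_sets <- function(by) {
    subsets <- lapply(rev(seq_along(by)), function(k) combn(by, k, simplify = FALSE))
    c(unlist(subsets, recursive = FALSE), list(character(0)))
}

# Made-up toy data; integer values keep the comparison exact.
DT <- data.table(
    group = c("a", "a", "b", "b", "b"),
    year  = c(2010L, 2011L, 2010L, 2011L, 2011L),
    v     = 1:5
)
by_cols <- c("group", "year")

via_sets <- groupingsets(DT, j = .(total = sum(v)), by = by_cols, sets = cube_sets(by_cols))
via_cube <- cube(DT, j = .(total = sum(v)), by = by_cols)

# Same rows either way; only the row order may differ.
all.equal(via_sets, via_cube, ignore.row.order = TRUE)
```

With this shape, only the grouping sets computation would need to be fast; rollup and cube reduce to generating the right list of sets.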


Answers to update when closing:

library(plyr)
grp.cols <- c("vs", "am", "gear", "carb", "cyl")
# aggregate on each leading prefix of grp.cols, then stack the results
plyr.r = do.call(
    rbind.fill,
    lapply(seq_along(grp.cols), function(x) ddply(mtcars, grp.cols[1:x], summarize, agg=mean(mpg)))
)

library(data.table) # 1.9.7+
dt.r = rollup(as.data.table(mtcars), j = .(agg=mean(mpg)), by=grp.cols)
all.equal(
    as.data.table(plyr.r),
    dt.r[-.N], # exclude grand total, not present in BrodieG answer
    ignore.row.order = TRUE,
    ignore.col.order = TRUE
)
#[1] TRUE

# install.packages("data.table", type = "source", repos = "https://Rdatatable.github.io/data.table")
library(data.table)
set.seed(1)
DT = data.table(
    group=sample(letters[1:2],100,replace=TRUE), 
    year=sample(2010:2012,100,replace=TRUE),
    v=runif(100))

cube(DT, mean(v), by=c("group","year"))
#    group year        V1
#1:     a 2011 0.4176346
#2:     b 2010 0.5231845
#3:     b 2012 0.4306871
#4:     b 2011 0.4997119
#5:     a 2012 0.4227796
#6:     a 2010 0.2926945
#7:    NA 2011 0.4463616
#8:    NA 2010 0.4278093
#9:    NA 2012 0.4271160
#10:     a   NA 0.3901875
#11:     b   NA 0.4835788
#12:    NA   NA 0.4350153
cube(DT, mean(v), by=c("group","year"), id=TRUE)
#    grouping group year        V1
#1:        0     a 2011 0.4176346
#2:        0     b 2010 0.5231845
#3:        0     b 2012 0.4306871
#4:        0     b 2011 0.4997119
#5:        0     a 2012 0.4227796
#6:        0     a 2010 0.2926945
#7:        2    NA 2011 0.4463616
#8:        2    NA 2010 0.4278093
#9:        2    NA 2012 0.4271160
#10:        1     a   NA 0.3901875
#11:        1     b   NA 0.4835788
#12:        3    NA   NA 0.4350153
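
Reading the `grouping` column in the output above: the value looks like a bit mask over the `by` columns, with the first column as the highest bit and a set bit meaning that column was aggregated out (2 for `group`, 1 for `year`, 3 for both). Below is a small sketch of decoding it, assuming that reading holds. The data is made up and the `*_aggregated` column names are only for illustration.

```r
library(data.table)

# Made-up toy data with no missing values
DT <- data.table(
    group = c("a", "a", "b", "b"),
    year  = c(2010L, 2011L, 2010L, 2011L),
    v     = 1:4
)
res <- cube(DT, j = .(total = sum(v)), by = c("group", "year"), id = TRUE)

# Assumed bit layout, inferred from the output above: group = 2, year = 1.
res[, group_aggregated := bitwAnd(grouping, 2L) > 0L]
res[, year_aggregated  := bitwAnd(grouping, 1L) > 0L]

# Subtotals per group only (year aggregated out, group kept).
res[year_aggregated & !group_aggregated]
```

This can help tell a column that was aggregated out apart from a genuine NA in the data.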


Another question could get a new answer too.

feature request

All 3 comments

+1

library(data.table) # version 1.10.5 required
dt = data.table(ggplot2::diamonds)
groupingsets(dt, c(lapply(.SD, mean), list(COUNT = .N)), 
     by = names(dt)[2:4], .SDcols = 5:10, id = FALSE,
     sets = as.list(names(dt)[2:4]))
          cut color clarity    depth    table    price        x        y        z COUNT
 1:     Ideal    NA      NA 61.70940 55.95167 3457.542 5.507451 5.520080 3.401448 21551
 2:   Premium    NA      NA 61.26467 58.74610 4584.258 5.973887 5.944879 3.647124 13791
 3:      Good    NA      NA 62.36588 58.69464 3928.864 5.838785 5.850744 3.639507  4906
 4: Very Good    NA      NA 61.81828 57.95615 3981.760 5.740696 5.770026 3.559801 12082
 5:      Fair    NA      NA 64.04168 59.05379 4358.758 6.246894 6.182652 3.982770  1610
 6:        NA     E      NA 61.66209 57.49120 3076.752 5.411580 5.419029 3.340689  9797
 7:        NA     I      NA 61.84639 57.57728 5091.875 6.222826 6.222730 3.845411  5422
 8:        NA     J      NA 61.88722 57.81239 5323.818 6.519338 6.518105 4.033251  2808
 9:        NA     H      NA 61.83685 57.51781 4486.669 5.983335 5.984815 3.695965  8304
10:        NA     F      NA 61.69458 57.43354 3724.886 5.614961 5.619456 3.464446  9542
11:        NA     G      NA 61.75711 57.28863 3999.136 5.677543 5.680192 3.505021 11292
12:        NA     D      NA 61.69813 57.40459 3169.954 5.417051 5.421128 3.342827  6775
13:        NA    NA     SI2 61.77217 57.92718 5063.029 6.401370 6.397826 3.948478  9194
14:        NA    NA     SI1 61.85304 57.66254 3996.001 5.888383 5.888256 3.639845 13065
15:        NA    NA     VS1 61.66746 57.31515 3839.455 5.572178 5.581828 3.441007  8171
16:        NA    NA     VS2 61.72442 57.41740 3924.989 5.657709 5.658859 3.491478 12258
17:        NA    NA    VVS2 61.66378 57.02499 3283.737 5.218454 5.232118 3.221465  5066
18:        NA    NA    VVS1 61.62465 56.88446 2523.115 4.960364 4.975075 3.061294  3655
19:        NA    NA      I1 62.73428 58.30378 3924.169 6.761093 6.709379 4.207908   741
20:        NA    NA      IF 61.51061 56.50721 2864.839 4.968402 4.989827 3.061659  1790

This is just awesome. It makes doing pivot tables the Shiny way easier.
