Data.table: keyby feature breaks

Created on May 16, 2016 · 3 comments · Source: Rdatatable/data.table

๋‚˜๋Š” ์ด๊ฒƒ์„ ์ผ์œผํ‚ค๋Š” ์›์ธ์ด ๋ฌด์—‡์ธ์ง€ ์™„์ „ํžˆ ํ™•์‹ ํ•˜์ง€ ๋ชปํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์—ฌ๊ธฐ ๋‚ด๊ฐ€ ์ฐพ์„ ์ˆ˜ ์žˆ๋Š” ๊ฐ€์žฅ ์ตœ์†Œํ•œ์˜ WE๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

library(data.table) # Tested on v 1.9.7
dt <-  data.table( origin = c("A", "A", "A", "A", "A", "A", "B", "B", "A", "A", "C", "C", "B", "B", "B", "B", "B", "C", "C", "B", "A", "C", "C", "C", "C", "C", "A", "A", "C", "C", "B", "B"),
                   destination = c("A", "A", "A", "A", "B", "B", "A", "A", "C", "C", "A", "A", "B", "B", "B", "C", "C", "B", "B", "A", "B", "C", "C", "C", "A", "A", "C", "C", "B", "B", "C", "C"),
                   points_in_dest = c(5, 5, 5, 5, 4, 4, 5, 5, 3, 3, 5, 5, 4, 4, 4, 3, 3, 4, 4, 5, 4, 3, 3, 3, 5,5, 3, 3, 4, 4, 3, 3),
                   depart_time = c(7, 8, 16, 18, 7, 8, 16, 18, 7, 8, 16, 18, 7, 8, 16, 7, 8, 16, 18, 8, 16, 7, 8, 18, 7, 8, 16, 18, 7, 8, 16, 18),   
                   travel_time = c(0, 0, 0, 0, 70, 10, 70, 10, 10, 10, 70, 70, 0, 0, 0, 70, 10, 10, 70, 70, 10, 0, 0, 0, 10, 70, 10, 70, 10, 70, 70, 10) )

dt[ depart_time<=8  & travel_time < 60, condition1 := TRUE]
dt[ depart_time>=16 & travel_time < 60, condition2 := TRUE] 

setkey(dt, origin, destination)
res <- unique(dt[(condition1)])[unique(dt[(condition2)]), 
                                on = c(destination = "origin", origin = "destination"), 
                                nomatch = 0L]
res[, .(points = sum(points_in_dest)),  keyby = origin]
#    origin points
#1:      A      5
#2:      A      4
#3:      B      4
#4:      B      3
#5:      C      5
#6:      C      4
#7:      C      3

As you can see, by did not work as intended and every row was returned instead of one row per origin. This is clearly a key problem, because the following fixes it:

setattr(res, "sorted", NULL)
res[, .(points = sum(points_in_dest)), keyby = origin]
#    origin points
#1:      A      9
#2:      B      7
#3:      C     12

Or, alternatively, convert origin to a factor first:

res[, .(points = sum(points_in_dest)), keyby = factor(origin)]
#    factor points
#1:      A      9
#2:      B      7
#3:      C     12
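Both workarounds suggest that the "sorted" attribute on res claims an ordering the rows do not actually have. A rough way to check that on any table is sketched below. Note that key_is_valid is a hypothetical helper written for this illustration, not a data.table function, and the small ok and bad tables are made up rather than taken from the example above. The bad table is mislabelled by hand with setattr, so this only imitates the symptom and does not reproduce the join bug itself.

```r
library(data.table)

# Hypothetical helper (not part of data.table): TRUE when the rows really are
# in the order that key(x) claims.
key_is_valid <- function(x) {
  k <- key(x)                        # reads the "sorted" attribute
  if (is.null(k)) return(TRUE)       # no key, so nothing to contradict
  cols <- unname(as.list(x)[k])
  # Stable radix ordering; an identity permutation means already sorted.
  ord <- do.call(order, c(cols, list(method = "radix")))
  identical(ord, seq_len(nrow(x)))
}

# Made-up data: one properly keyed table, one mislabelled on purpose.
ok  <- data.table(origin = c("A", "A", "B"), points = 1:3, key = "origin")
bad <- data.table(origin = c("A", "B", "A"), points = 1:3)
setattr(bad, "sorted", "origin")     # for illustration only, never do this

ok_valid  <- key_is_valid(ok)        # key matches the row order
bad_valid <- key_is_valid(bad)       # key does not match the row order

# Same workaround as above: drop the stale key, then group.
setattr(bad, "sorted", NULL)
fixed <- bad[, .(points = sum(points)), keyby = origin]
```

If such a check returns FALSE for a table, dropping the key before grouping, as in the first workaround, should be the safer route until the underlying bug is fixed.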

This was taken from this SO question: http://stackoverflow.com/questions/37239649/aggregate-data-table-based-on-condition-in-another-row

High bug

All 3 comments

Very nice example. Will fix. Thanks.

I must say, that is a creative way to spell "feature"!

Fixed....
