Data.table: Joining a keyed table from an unkeyed table sometimes doesn't work

Created on March 4, 2019  ·  3 comments  ·  Source: Rdatatable/data.table

I am joining two data.table objects: dt_tbl (which has a key created automatically by dcast) onto Y (which has no key), by a column called ROLE_TYPE. I expect the NumTxns column of the final object to have the value 86 for ROLE_TYPE == "A", but I get NA instead.

Interestingly, the first join (of dt_tbl to the dcast-ed object on ROLE_TYPE) works correctly.

Reproducible example

library(data.table)

dt_tbl <- data.table(
  ROLE_TYPE = c("D", "A"), 
  CountCases = c(16L, 25L)
)

X <- data.table(
  outlier = c(FALSE, TRUE), 
  ROLE_TYPE = c("A", "A"),
  N = c(220L, 29L)
  )

# a dcast-ed table is now keyed
str(dcast(X, ROLE_TYPE ~ outlier, value.var = "N", fill = 0)) 

# cast and join
dt_tbl <- dcast(X, ROLE_TYPE ~ outlier, value.var = "N", fill = 0)[
  dt_tbl,
  on = "ROLE_TYPE"
  ]
# this is correct
dt_tbl
str(dt_tbl)

Y <- data.table(ROLE_TYPE = "A", NumTxns = 86L)

dt_tbl <- Y[
  dt_tbl,
  on = "ROLE_TYPE"
  ]
# why is NumTxns NA?
dt_tbl
#    ROLE_TYPE NumTxns FALSE TRUE CountCases
# 1:         D      NA    NA   NA         16
# 2:         A      NA   220   29         25
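
One way to sidestep the NA, sketched under the assumption that the result of the first join carries a key attribute even though its rows are not sorted by ROLE_TYPE, is to drop that key with setkey(x, NULL) before joining again. The names cases, wide, joined and result are introduced here for illustration only; they stand in for the repeated reassignment of dt_tbl.

```r
library(data.table)

cases <- data.table(ROLE_TYPE = c("D", "A"), CountCases = c(16L, 25L))
X <- data.table(
  outlier = c(FALSE, TRUE),
  ROLE_TYPE = c("A", "A"),
  N = c(220L, 29L)
)
Y <- data.table(ROLE_TYPE = "A", NumTxns = 86L)

# First join: the dcast-ed (keyed) table is x, the unkeyed table is i
wide <- dcast(X, ROLE_TYPE ~ outlier, value.var = "N", fill = 0)
joined <- wide[cases, on = "ROLE_TYPE"]

# Assumed workaround: remove any key left on the join result, so the
# second join cannot rely on a sort order that does not hold
setkey(joined, NULL)

# Second join: NumTxns should now be matched for ROLE_TYPE == "A"
result <- Y[joined, on = "ROLE_TYPE"]
result
```

Calling setkey(joined, NULL) is harmless when the table has no key, so the same script should run on versions with and without the problem.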

Output of sessionInfo()

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0        rstudioapi_0.7    magrittr_1.5      usethis_1.4.0     devtools_2.0.1    pkgload_1.0.2     R6_2.3.0          rlang_0.3.1      
 [9] tools_3.5.1       pkgbuild_1.0.2    sessioninfo_1.1.1 cli_1.0.1         withr_2.1.2       remotes_2.0.2     yaml_2.2.0        assertthat_0.2.0 
[17] digest_0.6.18     rprojroot_1.3-2   crayon_1.3.4      processx_3.2.0    callr_3.0.0       base64enc_0.1-3   fs_1.2.6          ps_1.2.1         
[25] curl_3.3          testthat_2.0.0    glue_1.3.0        memoise_1.1.0     compiler_3.5.1    desc_1.2.0        backports_1.1.2   prettyunits_1.0.2

All 3 comments

Yes, the key of x should not be retained after x[i, on=key(x)], so the first join is not correct either, and that is where the problem starts.

library(data.table)
dx = data.table(id = "A", key = "id")
di = list(c("D", "A"))
(res <- dx[di])
#    id
# 1:  D
# 2:  A
key(res)
# [1] "id"

A table that reports a key must be sorted by that key, and res is not.
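
The invariant can be checked directly. The sketch below uses a hypothetical helper, key_is_consistent, which is not part of data.table; it compares the claimed key against the actual row order using base R's order(). What it returns for res depends on the installed data.table version, so no expected output is shown.

```r
library(data.table)

# Hypothetical helper (not a data.table function): TRUE when `dt` has no key,
# or when its rows really are in ascending order of the key columns.
key_is_consistent <- function(dt) {
  k <- key(dt)
  if (is.null(k)) return(TRUE)
  cols <- unname(as.list(dt)[k])
  # method = "radix" compares strings in the C locale and na.last = FALSE
  # puts NAs first; both are chosen to mirror how data.table orders keys
  ord <- do.call(order, c(cols, list(method = "radix", na.last = FALSE)))
  identical(ord, seq_len(nrow(dt)))
}

dx <- data.table(id = "A", key = "id")
res <- dx[list(c("D", "A"))]

key(res)                # the key the result claims, if any
key_is_consistent(res)  # FALSE would mean the claimed key is not honoured
```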

Btw, overwriting objects / reusing names makes the example more confusing than it needs to be.

Thanks - sorry for the confusing example :)

Patch submitted.
