I'm joining two data.table objects: dt_tbl (which has a key created automatically by dcast) on Y (which has no key), by a column called ROLE_TYPE. I expect the NumTxns column in the final object to have the value 86 for ROLE_TYPE == "A", but I get NA instead.
Interestingly, the first join on ROLE_TYPE (of the dcast-ed object to dt_tbl) works correctly.
library(data.table)
dt_tbl <- data.table(
ROLE_TYPE = c("D", "A"),
CountCases = c(16L, 25L)
)
X <- data.table(
outlier = c(FALSE, TRUE),
ROLE_TYPE = c("A", "A"),
N = c(220L, 29L)
)
# a dcast-ed table is now keyed
str(dcast(X, ROLE_TYPE ~ outlier, value.var = "N", fill = 0))
# cast and join
dt_tbl <- dcast(X, ROLE_TYPE ~ outlier, value.var = "N", fill = 0)[
dt_tbl,
on = "ROLE_TYPE"
]
# this is correct
dt_tbl
str(dt_tbl)
Y <- data.table(ROLE_TYPE = "A", NumTxns = 86L)
dt_tbl <- Y[
dt_tbl,
on = "ROLE_TYPE"
]
# why is NumTxns NA?
dt_tbl
# ROLE_TYPE NumTxns FALSE TRUE CountCases
# 1: D NA NA NA 16
# 2: A NA 220 29 25
sessionInfo()
Output:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 rstudioapi_0.7 magrittr_1.5 usethis_1.4.0 devtools_2.0.1 pkgload_1.0.2 R6_2.3.0 rlang_0.3.1
[9] tools_3.5.1 pkgbuild_1.0.2 sessioninfo_1.1.1 cli_1.0.1 withr_2.1.2 remotes_2.0.2 yaml_2.2.0 assertthat_0.2.0
[17] digest_0.6.18 rprojroot_1.3-2 crayon_1.3.4 processx_3.2.0 callr_3.0.0 base64enc_0.1-3 fs_1.2.6 ps_1.2.1
[25] curl_3.3 testthat_2.0.0 glue_1.3.0 memoise_1.1.0 compiler_3.5.1 desc_1.2.0 backports_1.1.2 prettyunits_1.0.2
Yes, the key of x should not be retained after x[i, on=key(x)], so the first join is also incorrect, and that is where the problem starts.
library(data.table)
dx = data.table(id = "A", key = "id")
di = list(c("D", "A"))
(res <- dx[di])
# id
# 1: D
# 2: A
key(res)
# [1] "id"
res still reports id as its key, but a keyed table must be sorted by that key, and the row order D, A is not.
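One way to spot this condition is sketched below. The helper name `has_stale_key` is hypothetical and not part of data.table. The `setattr(..., "sorted", ...)` call is used only to simulate a wrongly retained key on any data.table version; it is not something to do in real code.

```r
library(data.table)

# Hypothetical helper: TRUE when `dt` claims a key but its rows are not
# actually in key order.
has_stale_key <- function(dt) {
  k <- key(dt)
  if (is.null(k)) return(FALSE)
  cols <- lapply(k, function(col) dt[[col]])
  # method = "radix" sorts character columns in the C locale, which is the
  # ordering data.table uses for keys.
  ord <- do.call(order, c(cols, list(method = "radix")))
  # A table that is already sorted yields the identity permutation 1..n.
  is.unsorted(ord)
}

# Simulate the bug: mark an unsorted table as keyed (assumption, demo only).
stale <- data.table(id = c("D", "A"))
setattr(stale, "sorted", "id")
has_stale_key(stale)

# A table keyed the normal way is physically sorted by its key.
good <- data.table(id = c("D", "A"), key = "id")
has_stale_key(good)
```

Applied to `res` from the example above, the result depends on the installed data.table version, since a version that no longer retains the key returns `NULL` from `key(res)`.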
Btw, overwriting objects / reusing names makes the example more confusing than it needs to be.
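For comparison, here is a sketch of the same reproduction with a distinct name for each intermediate table. The names `cases`, `wide`, `joined` and `final` are hypothetical, and `fill = 0L` replaces `fill = 0` to match the integer column. The `setkey(joined, NULL)` line is an assumed workaround that is not taken from this thread: it drops whatever key the first join may have carried over before the second join runs.

```r
library(data.table)

cases <- data.table(ROLE_TYPE = c("D", "A"), CountCases = c(16L, 25L))
X <- data.table(
  outlier = c(FALSE, TRUE),
  ROLE_TYPE = c("A", "A"),
  N = c(220L, 29L)
)
Y <- data.table(ROLE_TYPE = "A", NumTxns = 86L)

# Step 1: cast X to wide form; the dcast result is keyed by ROLE_TYPE.
wide <- dcast(X, ROLE_TYPE ~ outlier, value.var = "N", fill = 0L)

# Step 2: first join; the rows follow the order of `cases` (D, A),
# so the result is not sorted by ROLE_TYPE.
joined <- wide[cases, on = "ROLE_TYPE"]
key(joined)  # inspect whether a key was carried over

# Assumed workaround: remove any retained key before joining again.
setkey(joined, NULL)

# Step 3: second join; NumTxns for ROLE_TYPE "A" should be 86, not NA.
final <- Y[joined, on = "ROLE_TYPE"]
final[ROLE_TYPE == "A", NumTxns]
```

On a data.table version that does not retain the key, `setkey(joined, NULL)` simply has nothing to remove.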
Thanks, and sorry for the confusing example :)
Patch submitted.