I am joining two data.table objects: dt_tbl (which has a key automatically created by dcast) on Y (which does not have a key), on a column called ROLE_TYPE. I am expecting the NumTxns column in the final object to have value 86 for ROLE_TYPE == "A", but instead I get NA.
Interestingly, the first join on ROLE_TYPE (dt_tbl on the dcast-ed object) works fine.
library(data.table)
dt_tbl <- data.table(
  ROLE_TYPE = c("D", "A"),
  CountCases = c(16L, 25L)
)
X <- data.table(
  outlier = c(FALSE, TRUE),
  ROLE_TYPE = c("A", "A"),
  N = c(220L, 29L)
)
# a dcast-ed table is now keyed
str(dcast(X, ROLE_TYPE ~ outlier, value.var = "N", fill = 0))
# cast and join
dt_tbl <- dcast(X, ROLE_TYPE ~ outlier, value.var = "N", fill = 0)[
  dt_tbl,
  on = "ROLE_TYPE"
]
# this is correct
dt_tbl
str(dt_tbl)
Y <- data.table(ROLE_TYPE = "A", NumTxns = 86L)
dt_tbl <- Y[
  dt_tbl,
  on = "ROLE_TYPE"
]
# why is NumTxns NA?
dt_tbl
#    ROLE_TYPE NumTxns FALSE TRUE CountCases
# 1:         D      NA    NA   NA         16
# 2:         A      NA   220   29         25
sessionInfo()
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 rstudioapi_0.7 magrittr_1.5 usethis_1.4.0 devtools_2.0.1 pkgload_1.0.2 R6_2.3.0 rlang_0.3.1
[9] tools_3.5.1 pkgbuild_1.0.2 sessioninfo_1.1.1 cli_1.0.1 withr_2.1.2 remotes_2.0.2 yaml_2.2.0 assertthat_0.2.0
[17] digest_0.6.18 rprojroot_1.3-2 crayon_1.3.4 processx_3.2.0 callr_3.0.0 base64enc_0.1-3 fs_1.2.6 ps_1.2.1
[25] curl_3.3 testthat_2.0.0 glue_1.3.0 memoise_1.1.0 compiler_3.5.1 desc_1.2.0 backports_1.1.2 prettyunits_1.0.2
Yeah, the key for x should not be preserved after x[i, on=key(x)], so the first join is also incorrect and is where the problem started.
library(data.table)
dx = data.table(id = "A", key = "id")
di = list(c("D", "A"))
(res <- dx[di])
# id
# 1: D
# 2: A
key(res)
# [1] "id"
A keyed table should be sorted by its key, but here res reports key "id" while its rows are in the order D, A.
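One way to spot this state is to compare what the table claims about its key with the actual row order. The sketch below is only an illustration: has_stale_key is a hypothetical helper (not part of data.table), it handles single-column keys only, and it simulates the bad state with setattr so that it behaves the same whether or not the installed version still has the bug.

```r
library(data.table)

# Hypothetical helper: TRUE when a table claims a single-column key
# but that column is not actually in sorted order.
has_stale_key <- function(dt) {
  k <- key(dt)
  if (is.null(k)) return(FALSE)   # no key claimed, nothing to check
  stopifnot(length(k) == 1L)      # sketch covers single-column keys only
  is.unsorted(dt[[k]])
}

# Simulate the bad state independently of the data.table version by
# forcing the key attribute onto unsorted data (assumption: this mimics
# what the join above leaves behind).
forged <- data.table(id = c("D", "A"))
setattr(forged, "sorted", "id")

# A table keyed the normal way, for comparison.
proper <- data.table(id = c("D", "A"), key = "id")

stale_forged <- has_stale_key(forged)
stale_proper <- has_stale_key(proper)
```

Note that is.unsorted compares strings using the session locale, whereas data.table sorts keys in C-locale order, so the two can disagree for mixed-case or non-ASCII values.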
Btw, overwriting objects / reusing names makes the example more confusing than it needs to be.
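For reference, here is a sketch of the same reproduction with a distinct name for each step, plus a possible workaround for affected versions. The names counts, outliers, txns, wide, step1 and step2 are made up for illustration, and dropping the key with setkey(step1, NULL) is only a stopgap that assumes the wrongly retained key is the cause, as described above.

```r
library(data.table)

counts <- data.table(
  ROLE_TYPE = c("D", "A"),
  CountCases = c(16L, 25L)
)
outliers <- data.table(
  outlier = c(FALSE, TRUE),
  ROLE_TYPE = c("A", "A"),
  N = c(220L, 29L)
)
txns <- data.table(ROLE_TYPE = "A", NumTxns = 86L)

# dcast returns a table keyed by ROLE_TYPE
wide <- dcast(outliers, ROLE_TYPE ~ outlier, value.var = "N", fill = 0L)

# first join: rows follow the order of counts (D, A)
step1 <- wide[counts, on = "ROLE_TYPE"]

# workaround (assumption): drop whatever key step1 carries, so the next
# join cannot rely on a sort order that does not hold
setkey(step1, NULL)

# second join
step2 <- txns[step1, on = "ROLE_TYPE"]
step2
```

With the key removed, key(step1) is NULL and the second join has to match ROLE_TYPE values without assuming step1 is sorted.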
Thanks - sorry for the confusing example :)
patch submitted