This issue relates to my question on Stack Overflow.
For a specific setup of two data.tables, a join does not deliver the correct result.
library(data.table)
# In the code below the join does not deliver the result I would expect
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[colname_with_suffix == "not found", ] # automatically creates index on colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
# PLEASE NOTE: same result with slightly different syntax: DT1[DT2, lookup_result := i.lookup_result, on=c(colname="lookup")][]
#    colname  colname_with_suffix lookup_result
# 1:   test1                other            NA
# 2:   test2                 test            NA
# 3:   test2 includes test within            NA
# 4:   test3                other             3
# Expected result:
#    colname  colname_with_suffix lookup_result
# 1:   test1                other             1
# 2:   test2                 test             2
# 3:   test2 includes test within             2
# 4:   test3                other             3
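For the following variations the join works as expected. The unexpected behavior above seems to occur only when there is an index on a column whose name begins with the name of the join column (here colname_with_suffix and colname) and both columns have similar text content.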
# For all following alternatives the join delivers the correct result
# (a) Same data tables as above, but no index
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
# (b) Index on DT1, but completely different values in indexed column than in join column
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","other","other","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[colname_with_suffix == "not found", ] # automatically creates index on colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
# (c) Index on DT1, similar values in indexed column, but join column name is not a prefix of indexed column name
DT1 <- data.table(colname=c("test1","test2","test2","test3"), x.colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[x.colname_with_suffix == "not found", ] # automatically creates index on x.colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
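Since the variations above point at the automatically created index, one way to check for it and sidestep it on an affected version is sketched below. It reuses DT1 and DT2 from the report. indices() and setindex() are regular data.table functions, but using them as a workaround here is an assumption and is not something confirmed in this thread.

```r
library(data.table)

DT1 <- data.table(
  colname = c("test1", "test2", "test2", "test3"),
  colname_with_suffix = c("other", "test", "includes test within", "other")
)
DT2 <- data.table(lookup = c("test1", "test2", "test3"), lookup_result = c(1, 2, 3))

# This subset may auto-create an index on colname_with_suffix
# (governed by the datatable.auto.index option).
DT1[colname_with_suffix == "not found", ]

# List the columns that currently carry an index (NULL if there are none).
print(indices(DT1))

# Assumed workaround, not the upstream fix: drop all indices before joining,
# which puts DT1 in the same state as variation (a).
setindex(DT1, NULL)

DT1[DT2, lookup_result := i.lookup_result, on = c(colname = "lookup")]
print(DT1)
```

Setting options(datatable.auto.index = FALSE) before the subset should prevent the index from being created in the first place, at the cost of losing the speed-up that auto-indexing gives repeated subsets.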
Session info:
# R version 3.3.2 (2016-10-31)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
#
# locale:
# [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C LC_TIME=German_Germany.1252
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] data.table_1.10.0
#
# loaded via a namespace (and not attached):
# [1] tools_3.3.2
The same behavior occurs with data.table 1.10.4 and R version 3.4.2 on Windows and Ubuntu Linux 14.04.
This looks like a bug worth fixing. I will take it and try to fix it as my time permits.
This was an easy fix (see the pull request). Thanks for reporting the bug; hopefully the problem is solved now.
@MarkusBonsch I just applied your commit to the most recent development version of data.table and tested it with the two examples from the linked SO question.
Both examples work as expected!
Many thanks for the quick fix.