This issue is based on my question on Stack Overflow.
For a specific setup of two data.tables, a join does not deliver the expected results.
library(data.table)
# In the code below the join does not deliver the result I would expect
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[colname_with_suffix == "not found", ] # automatically creates index on colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
# PLEASE NOTE: same result with slightly different syntax: DT1[DT2, lookup_result := i.lookup_result, on=c(colname="lookup")][]
#    colname  colname_with_suffix lookup_result
# 1:   test1                other            NA
# 2:   test2                 test            NA
# 3:   test2 includes test within            NA
# 4:   test3                other             3
# Expected result:
#    colname  colname_with_suffix lookup_result
# 1:   test1                other             1
# 2:   test2                 test             2
# 3:   test2 includes test within             2
# 4:   test3                other             3
For the following alternatives, the join works as expected. The unexpected behaviour described above seems to occur only if an index exists on a column whose name starts with the name of the join column (here `colname_with_suffix` and `colname`) and both columns have similar text content.
# For all following alternatives the join delivers the correct result
# (a) Same data tables as above, but no index
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
# (b) Index on DT1, but completely different values in indexed column than in join column
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","other","other","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[colname_with_suffix == "not found", ] # automatically creates index on colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
# (c) Index on DT1, similar values in indexed column, but join column name is not a prefix of indexed column name
DT1 <- data.table(colname=c("test1","test2","test2","test3"), x.colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[x.colname_with_suffix == "not found", ] # automatically creates index on x.colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
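Since alternative (a) suggests the join is correct when no index is present, one possible interim workaround is to inspect and drop the secondary indices on DT1 before joining. This is only a sketch under that assumption and has not been confirmed in this thread as the recommended approach; it uses `indices()` and `setindex(DT1, NULL)` from data.table.

```r
library(data.table)

DT1 <- data.table(colname = c("test1", "test2", "test2", "test3"),
                  colname_with_suffix = c("other", "test", "includes test within", "other"))
DT2 <- data.table(lookup = c("test1", "test2", "test3"), lookup_result = c(1, 2, 3))

DT1[colname_with_suffix == "not found", ]  # may automatically create an index on colname_with_suffix
indices(DT1)                               # list the secondary indices currently set on DT1

setindex(DT1, NULL)                        # assumption: dropping all indices avoids the problem, as in (a)
DT1[DT2, lookup_result := i.lookup_result, on = c("colname" = "lookup")][]
```

Automatic index creation can also be switched off with `options(datatable.auto.index = FALSE)`, at the cost of losing the speed-up for repeated subsets.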
SessionInfo:
# R version 3.3.2 (2016-10-31)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
#
# locale:
# [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C LC_TIME=German_Germany.1252
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] data.table_1.10.0
#
# loaded via a namespace (and not attached):
# [1] tools_3.3.2
Note that the same behaviour occurs with data.table 1.10.4 and R version 3.4.2 on Windows, and also on Ubuntu Linux 14.04.
This looks like a bug that deserves fixing. I will dig into it and try to fix it when my time allows.
This was easy to fix (see the pull request). Thanks for reporting the bug; I hope the problem is now solved.
@MarkusBonsch I have just applied your fix to the latest development version of data.table
and tested it with the two examples from the linked SO question.
Both examples work as expected!
Thanks a lot for the quick fix.