Data.table: Join with index gives unexpected results if the join column's name is a prefix of the indexed column's name.

Created on 6 Nov 2017  ·  3 comments  ·  Source: Rdatatable/data.table

This issue is based on my question on Stack Overflow.

For a specific setup of two data.tables, the join does not deliver the expected results.

library(data.table)

# In the code below the join does not deliver the result I would expect
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[colname_with_suffix == "not found", ]  # automatically creates index on colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
# PLEASE NOTE: same result with slightly different syntax: DT1[DT2, lookup_result := i.lookup_result, on=c(colname="lookup")][]
#    colname  colname_with_suffix lookup_result
# 1:   test1                other         NA
# 2:   test2                 test         NA
# 3:   test2 includes test within         NA
# 4:   test3                other          3


# Expected result:
#    colname  colname_with_suffix lookup_result
# 1:   test1                other          1
# 2:   test2                 test          2
# 3:   test2 includes test within          2
# 4:   test3                other          3

For the following variants the join works as expected. The unexpected behavior described above seems to occur only if an index exists on a column whose name starts with the join column's name (here `colname_with_suffix` and `colname`) and both columns have similar text content.

# For all following alternatives the join delivers the correct result

# (a) Same data tables as above, but no index
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]

# (b) Index on DT1, but completely different values in indexed column than in join column
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","other","other","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[colname_with_suffix == "not found", ]  # automatically creates index on colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]

# (c) Index on DT1, similar values in indexed column, but indexed column name does not start with the join column name
DT1 <- data.table(colname=c("test1","test2","test2","test3"), x.colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[x.colname_with_suffix == "not found", ]  # automatically creates index on x.colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
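
Since variant (a) suggests the problem only appears when an index exists, one possible workaround on affected versions is to inspect and drop the secondary indices before joining. The following is a minimal sketch under that assumption, not a confirmed fix: it uses `indices()` to list existing indices and `setindex(DT1, NULL)` to remove them, then repeats the join from the failing example.

```r
library(data.table)

# Same tables as in the failing example
DT1 <- data.table(colname = c("test1", "test2", "test2", "test3"),
                  colname_with_suffix = c("other", "test", "includes test within", "other"))
DT2 <- data.table(lookup = c("test1", "test2", "test3"), lookup_result = c(1, 2, 3))

# This subset automatically creates an index on colname_with_suffix
DT1[colname_with_suffix == "not found", ]

# List the secondary indices that currently exist on DT1
indices(DT1)

# Workaround sketch (assumption: dropping the index restores the behavior of variant (a))
setindex(DT1, NULL)
indices(DT1)

# Repeat the join; the expected lookup_result values are 1, 2, 2, 3
DT1[DT2, lookup_result := i.lookup_result, on = c(colname = "lookup")][]

# Alternative, at the cost of losing auto-indexing speedups:
# options(datatable.auto.index = FALSE)
```

Dropping the index only costs the speedup on later subsets of `colname_with_suffix`; the index is rebuilt automatically the next time such a subset runs, so the drop would need to be repeated before each affected join.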

SessionInfo:

# R version 3.3.2 (2016-10-31)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
# 
# locale:
#     [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                    LC_TIME=German_Germany.1252    
# 
# attached base packages:
#     [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#     [1] data.table_1.10.0
# 
# loaded via a namespace (and not attached):
#     [1] tools_3.3.2

Note that the same behavior occurs with data.table 1.10.4 and R version 3.4.2 on Windows, as well as on Ubuntu Linux 14.04.

All 3 comments

This looks like a bug worth fixing. I will dig into it and try to fix it when my time permits.

This was easy to fix (see the pull request). Thanks for reporting the bug; hopefully the problem is now solved.

@MarkusBonsch I just applied your commit to the latest development version of data.table and tested it with the two examples from the linked SO question.

Both examples work as expected!

Many thanks for the quick fix.
