data.table: [bug] %in% statement fails when categories contain both lowercase and uppercase letters

Created on 15 May 2018  ·  6 comments  ·  Source: Rdatatable/data.table

In version 1.11.2, when %in% and & statements are used together, %in% does not take into account factor levels that start with an uppercase letter. For example:

install.packages('data.table')
packageVersion("data.table")   # '1.11.2'
data("iris")
library(data.table)
iris <- data.table(iris)
iris$grp <- c('A', 'B')

[Issue]
After capitalizing the first letter of 'virginica', an %in% statement combined with an & statement fails to return both groups. See below:

iris[, Species1 := factor(Species, levels = c('setosa', 'versicolor', 'virginica'), labels = c('setosa', 'versicolor', 'Virginica'))]

iris[Species1 %in% c('setosa', 'Virginica') & grp == 'B', table(Species1)]
# Species1
# setosa versicolor  Virginica 
# 0          0         25 

[Examples]
I tried the few examples below, and they work fine.
When the subset contains only lowercase groups, both groups are found.

iris[Species1 %in% c('setosa', 'versicolor') & grp == 'B', table(Species1)]
# Species1
# setosa versicolor  Virginica 
# 25         25          0 

Alternatively, adding parentheses around either statement also returns both groups.

iris[(Species1 %in% c('setosa', 'Virginica')) & grp == 'B', table(Species1)]
# Species1
# setosa versicolor  Virginica 
# 25          0         25 
iris[Species1 %in% c('setosa', 'Virginica') & (grp == 'B'), table(Species1)]
# Species1
# setosa versicolor  Virginica 
# 25          0         25 

I also tried this statement with the subset function, and it works.

table(subset(iris, Species1 %in% c('setosa', 'Virginica') & grp == 'B')$Species1)
# setosa versicolor  Virginica 
# 25          0         25 

This works in an earlier version of the data.table package (version 1.10.4-3 here).

devtools::install_version("data.table", version = "1.10.4-3", repos = "http://cran.us.r-project.org")

packageVersion("data.table")   # '1.10.4.3'
data("iris")
library(data.table)
iris <- data.table(iris)
iris$grp <- c('A', 'B')

iris[, Species1 := factor(Species, levels = c('setosa', 'versicolor', 'virginica'), labels = c('setosa', 'versicolor', 'Virginica'))]

iris[Species1 %in% c('setosa', 'Virginica') & grp == 'B', table(Species1)]
# Species1
# setosa versicolor  Virginica 
# 25          0         25 

[Session info]

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.2

loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4    yaml_2.1.18   

Labels: bug, regression

All 6 comments

@MarkusBonsch could you take a look? This looks odd.

verbose = TRUE

Optimized subsetting with index 'grp__Species1'
on= matches existing index, using index
Coercing character column i.'Species1' to factor to match type of x.'Species1'. If possible please change x.'Species1' to character. Character columns are now preferred in joins.
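
For anyone wanting to reproduce this output, here is a minimal sketch of how the verbose trace above can be requested. It assumes the same setup as the original report; the names dt and res are only illustrative, and the group column is built with rep() because newer data.table releases do not recycle a length-2 vector silently.

```r
library(data.table)

# Rebuild the example table from the report (dt is an illustrative name).
dt <- as.data.table(iris)
dt[, grp := rep(c("A", "B"), length.out = .N)]
dt[, Species1 := factor(Species,
                        levels = c("setosa", "versicolor", "virginica"),
                        labels = c("setosa", "versicolor", "Virginica"))]

# verbose = TRUE asks [.data.table to print the subsetting steps it takes.
res <- dt[Species1 %in% c("setosa", "Virginica") & grp == "B",
          table(Species1), verbose = TRUE]

# Show the names of any secondary indices currently stored on dt.
indices(dt)
```

The exact messages printed depend on the installed data.table version, so they may not match the trace quoted above.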

๋‚˜๋Š” ์ด๊ฒƒ์ด ์ ์–ด๋„ ๊ฒฝ๊ณ  ๋ฉ”์‹œ์ง€ ์—ฌ์•ผํ•œ๋‹ค๊ณ  ์ƒ๊ฐํ•œ๋‹ค.

@ddong63 Using %in% with a mix of character and factor is definitely something to avoid. Coerce to the appropriate data type before using match.
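
As a sketch of that advice, the factor column can be converted to character once, so that both sides of %in% have the same type. The column name Species1_chr and the object names dt and counts are assumptions made for this illustration; they do not come from the report.

```r
library(data.table)

# Same setup as in the report (dt is an illustrative name).
dt <- as.data.table(iris)
dt[, grp := rep(c("A", "B"), length.out = .N)]
dt[, Species1 := factor(Species,
                        levels = c("setosa", "versicolor", "virginica"),
                        labels = c("setosa", "versicolor", "Virginica"))]

# Coerce once, up front: a character copy of the factor column
# (Species1_chr is a hypothetical name).
dt[, Species1_chr := as.character(Species1)]

# Both sides of %in% are now character vectors.
counts <- dt[Species1_chr %in% c("setosa", "Virginica") & grp == "B",
             .N, by = Species1_chr]
counts
```

Keeping the original factor column alongside the character copy leaves the factor levels available for plotting or modelling while the character column is used for filtering.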

@HughParsonage That is coming soon; it is pending in https://github.com/Rdatatable/data.table/pull/2734

Very strange. I will investigate and fix it as soon as possible. Thanks for reporting.

@jangorecki was right. It works correctly when both columns have the same data type (character or factor).
Thank you, @MarkusBonsch.
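
One possible way to check whether an installed version is affected is to compare the row count from the data.table filter with a count computed from a plain logical vector outside [.data.table. This is only a sketch for this one example; the names dt, keep, n_dt, n_ref and agrees are illustrative.

```r
library(data.table)

# Same setup as in the report (dt is an illustrative name).
dt <- as.data.table(iris)
dt[, grp := rep(c("A", "B"), length.out = .N)]
dt[, Species1 := factor(Species,
                        levels = c("setosa", "versicolor", "virginica"),
                        labels = c("setosa", "versicolor", "Virginica"))]

keep <- c("setosa", "Virginica")

# Row count using the i expression in the form reported as failing.
n_dt <- dt[Species1 %in% keep & grp == "B", .N]

# Reference count from a logical vector built with base R operators.
n_ref <- sum(dt$Species1 %in% keep & dt$grp == "B")

# TRUE means both approaches agree for this example on the installed version.
agrees <- identical(n_dt, n_ref)
agrees
```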

I have created a PR that fixes the issue. It is a regression introduced by one of my PRs.
