http://stackoverflow.com/questions/22523131
Not sure what the interface to this should be. It should probably default to drop = FALSE.
Thanks for opening this issue, Hadley!
(+1) I ran into the same problem today. drop = FALSE would be a big help to me!
Any idea of the timeframe for getting an equivalent of .drop = FALSE into dplyr? I need it to render certain rCharts correctly.
In the meantime, I got the answer from your link.
http://stackoverflow.com/questions/22523131
I grouped by two variables.
空ã®ã°ã«ãŒããåé€ããªããªãã·ã§ã³ã®+1
(May partially overlap with #486 and #413.)
空ã®ã°ã«ãŒããåé€ããªããšéåžžã«äŸ¿å©ã§ãã ãµããªãŒããŒãã«ãäœæãããšãã«ããå¿ èŠã«ãªããŸãã
+1. This is a big issue for many analyses.
I agree with all of the above; this would be very useful.
@romainfrancois Currently, build_index_cpp() doesn't respect the drop attribute:
t1 <- data_frame(
x = runif(10),
g1 = rep(1:2, each = 5),
g2 = factor(g1, 1:3)
)
g1 <- grouped_df(t1, list(quote(g2)), drop = FALSE)
attr(g1, "group_size")
# should be c(5L, 5L, 0L)
attr(g1, "indices")
# should be list(0:4, 5:9, integer(0))
The drop attribute only applies when grouping by factors. In that case we need one group per factor level, regardless of whether the level actually appears in the data.
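For illustration only, here is a base R sketch of the bookkeeping the comments above expect. It does not touch dplyr's internal `group_size`/`indices` attributes. It relies on `split()`, which keeps unused factor levels by default, so splitting 0-based row positions by the factor gives one index vector per level, and the unused level gets an empty one. The names `indices` and `group_size` are chosen to mirror the attributes; they are not dplyr objects.

```r
# Grouping factor from the example above: levels 1:3, level 3 never used.
g2 <- factor(rep(1:2, each = 5), levels = 1:3)

# 0-based row positions per level; split() keeps empty levels by default.
indices <- split(seq_along(g2) - 1L, g2)

# One size per level, including 0 for the unused level.
group_size <- lengths(indices)

unname(group_size)  # [1] 5 5 0
```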
This will also affect the single-table verbs in the following ways:
- select(): no effect
- arrange(): no effect
- summarise(): functions applied to zero-row groups need to be given zero-length integers. n() should return 0, and mean(x) should return NaN
- filter(): the set of groups should stay constant, even if some groups end up with no rows
- mutate(): doesn't need to evaluate expressions for empty groups
Eventually drop = FALSE will become the default. If it's a pain to write both drop = FALSE and drop = TRUE branches, I'd be happy to drop support for drop = TRUE (since you can always relevel factors yourself, or use character vectors instead).
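The zero-length semantics listed above can be checked in base R without dplyr. The sketch below is a stand-in for a grouped summarise() and uses invented data. It applies each function to every level of a factor, including an empty level. `length()` (what `n()` would count) gives 0 and `mean()` gives `NaN`. The last lines show the filter() point: subsetting keeps the factor's levels, so the set of groups does not shrink.

```r
x <- c(2, 4, 6, 8)
g <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))  # "c" has no rows

# One element per level; "c" becomes numeric(0).
groups <- split(x, g)

# Apply each summary to every group, the empty one included.
n_per_group    <- vapply(groups, length, integer(1))  # what n() would report
mean_per_group <- vapply(groups, mean, numeric(1))

n_per_group     # a = 2, b = 2, c = 0
mean_per_group  # a = 3, b = 7, c = NaN

# filter()-like subsetting keeps the factor levels, so there are still 3 groups.
keep <- x > 5
length(split(x[keep], g[keep]))  # 3
```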
Does this make sense? If it's a lot of work, we can push it back to 0.4.
@statwonk, @wsurles, @jennybc, @slackline, @mcfrank, @eipi10: if you'd like to help, the first thing to do is to work on a set of test cases that exercise all the ways the different verbs can interact with zero-length groups.
I don't think I understood what drop was doing. This makes it clear. I don't think it's a lot of work.
I've opened pull request #833, which tests whether the single-table verbs above handle zero-length groups correctly. Of course, most of the tests are commented out, since dplyr currently fails them.
+1, any status update here? I love summarise, but I need to keep empty levels!
@ebergelson Here's my current hack for getting zero-length groups. I often need it so that my bar charts stack properly.
Here df has 3 columns: name, group, and metric.
df2 <- expand.grid(name = unique(df$name), group = unique(df$group)) %>%
left_join(df, by=c("name","group")) %>%
mutate(metric = ifelse(is.na(metric),0,metric))
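The same fill-in-the-missing-combinations idea can also be sketched in base R with `expand.grid()` and a left join via `merge(all.x = TRUE)`. The toy `df` below (columns `name`, `group`, `metric`) is invented to match the description above. It is an illustration, not the commenter's actual data.

```r
# Toy data: name "b" has no row for group "y".
df <- data.frame(
  name   = c("a", "a", "b"),
  group  = c("x", "y", "x"),
  metric = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Every combination of the observed names and groups.
grid <- expand.grid(name = unique(df$name), group = unique(df$group),
                    stringsAsFactors = FALSE)

# Left join the data onto the grid, then treat missing metrics as 0.
df2 <- merge(grid, df, by = c("name", "group"), all.x = TRUE)
df2$metric[is.na(df2$metric)] <- 0

df2
```

Filling with 0 is right for counts and sums. For other summaries, it is a choice that the analyst has to make explicitly.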
I do something similar: check for missing groups, then left_join with all combinations.
Unfortunately this issue doesn't seem to be getting much attention... probably because simple workarounds like these exist.
@wsurles, @bpbond Thanks, I used a workaround similar to what you suggested! I'd love to see a built-in fix like .drop.
Just adding my agreement to everyone above. This is a very important aspect of many analyses. I'd love to see it implemented.
Need some more details here.
If I have this:
> df <- data_frame( x = c(1,1,1,2,2), f = factor( c(1,2,3,1,1) ) )
> df
Source: local data frame [5 x 2]
x f
1 1 1
2 1 2
3 1 3
4 2 1
5 2 1
and I group by x and then f, I end up with 6 (2x3) groups, where groups (2, 2) and (2, 3) are empty. That's fine. I think I can manage to implement that.
Now what happens if I have this:
> df <- data_frame( f = factor( c(1,1,2,2), levels = 1:3), x = c(1,2,1,4) )
> df
Source: local data frame [4 x 2]
f x
1 1 1
2 1 2
3 2 1
4 2 4
and I group by f and then x. What should the groups be? @hadley
In this case, stats::aggregate and plyr::ddply both return 4 groups (1,1; 1,2; 2,1; and 2,4), so I'd suggest that this is the conforming behaviour.
代ããã«table()
ã«åæããã¹ãã§ã¯ãããŸããããã€ãŸãã9ã€ã®ã°ã«ãŒããè¿ããŸããïŒ
> table(df$f, df$x)
1 2 4
1 1 1 0
2 1 0 1
3 0 0 0
I'd expect df %>% group_by(f, x) %>% tally to give essentially the same result as with(df, as.data.frame(table(f, x))) and ddply(df, .(f, x), nrow, .drop=FALSE).
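Here is a base R comparison on the same `df` that makes the two candidate answers concrete. `table()` counts every combination of the levels of `f` with the distinct values of `x`, giving 9 cells with 5 of them empty. `aggregate()`, with its default `drop = TRUE`, returns only the 4 combinations that actually occur. The helper column `n` is introduced only for counting.

```r
df <- data.frame(f = factor(c(1, 1, 2, 2), levels = 1:3), x = c(1, 2, 1, 4))

# table()-style: all 3 x 3 cells, empty ones with Freq = 0.
full <- as.data.frame(table(f = df$f, x = df$x))

# aggregate()/ddply()-style: only the combinations present in the data.
observed <- aggregate(data.frame(n = rep(1L, nrow(df))),
                      by = list(f = df$f, x = df$x), FUN = sum)

nrow(full)      # 9
nrow(observed)  # 4
```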
I thought our preferred behaviour was to keep zero-length groups when they come from factors (like .drop in plyr), so I think @huftis's suggestion is what's needed. However, I'd suggest defaulting to drop = TRUE so that the default behaviour doesn't change; see @bpbond's suggestion on this.
It's hard to wrap my head around exactly what the right behaviour is. Do these very simple thought experiments look right?
df <- data_frame(x = 1, y = factor(1, levels = 2))
df %>% group_by(x) %>% summarise(n())
#> x n
#> 1 1
df %>% group_by(y) %>% summarise(n())
#> y n
#> 1 1
#> 2 0
df %>% group_by(x, y) %>% summarise(n())
#> x y n
#> 1 1 1
#> 1 2 0
But what if x has multiple values? Should it behave like this?
df <- data_frame(x = 1:2, y = factor(1, levels = 2))
df %>% group_by(x, y) %>% summarise(n())
#> x y n
#> 1 1 1
#> 2 1 1
#> 1 2 0
#> 2 2 0
空ã®ã°ã«ãŒããä¿æããããšã¯ãåäžã®å€æ°ã§ã°ã«ãŒãåããå Žåã«ã®ã¿æå³ããããŸããïŒ ããçŸå®çã«ãã¬ãŒã åããå Žåãããšãã°data_frame(age_group = c(40, 60), sex = factor(M, levels = c("F", "M"))
ã¯ãæ¬åœã«å¥³æ§ã®ã«ãŠã³ããå¿
èŠã§ããïŒ æã
ããããããšãããã°ãããããªãããšããããšæããŸãã ãã¹ãŠã®çµã¿åãããå±éããããšã¯ãç§ã«ã¯å€å°ç°ãªãæäœã®ããã«æããŸãïŒãããŠå åã®äœ¿çšãšã¯ç¡é¢ä¿ã§ãïŒã
Maybe group_by needs both drop and expand arguments? drop = FALSE would keep all size-zero groups generated by factor levels that don't appear in the data. expand = TRUE would keep all size-zero groups generated by combinations of values that don't appear in the data.
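Here is a base R sketch that makes the two proposed notions concrete. This is not dplyr's API, and the data are invented. Keeping unused factor levels for a single factor is what `table()` does. Expanding over the observed values of two plain columns is `expand.grid()` over their `unique()` values. The two mechanisms produce different empty groups.

```r
df <- data.frame(
  age_group = c(40, 60, 60),
  answer    = c("yes", "no", "yes"),
  sex       = factor(c("M", "M", "M"), levels = c("F", "M")),
  stringsAsFactors = FALSE
)

# drop = FALSE idea: one group per factor level, even an unused one ("F").
levels_kept <- table(df$sex)
levels_kept  # F = 0, M = 3

# expand = TRUE idea: every combination of observed values,
# including (40, "no"), which never occurs in the data.
observed_combos <- unique(df[c("age_group", "answer")])
expanded_combos <- expand.grid(age_group = unique(df$age_group),
                               answer = unique(df$answer),
                               stringsAsFactors = FALSE)

nrow(observed_combos)  # 3
nrow(expanded_combos)  # 4
```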
@hadley Your example looks right to me (assuming you meant levels = 1:2 and not levels = 2). I also think keeping empty groups makes sense when grouping by several variables. For example, suppose the variables are sex (male and female) and answer (on a questionnaire, with levels disagree, neutral, agree), and you want to count the frequency of each answer for each sex (e.g. for a table, or for plotting later). It would be wrong to drop an answer category just because no women gave that answer.
I would also expect factor variables to remain factor variables in the resulting data_frame (not converted to strings), with the _original_ levels. (Then, when plotting the data, the answer categories would appear in the correct order, not in the alphabetical order agree, disagree, neutral.)
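Here is a small base R illustration of why the original levels matter, using invented survey answers. Counting a factor keeps its declared level order and any unused level. Counting the same data as character falls back to alphabetical order and loses the unused category entirely.

```r
answers <- c("agree", "disagree", "agree", "disagree")
answer_levels <- c("disagree", "neutral", "agree")

# Counted as a factor: declared level order, unused "neutral" kept with 0.
as_factor <- table(factor(answers, levels = answer_levels))

# Counted as character: alphabetical order, "neutral" disappears.
as_char <- table(answers)

names(as_factor)  # "disagree" "neutral" "agree"
names(as_char)    # "agree" "disagree"
```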
In the last example, there are cases where it's natural to drop the empty level of the sex variable (e.g. if women were deliberately not surveyed), and cases where it isn't (e.g. when counting the number of birth defects stratified by sex (and perhaps year)). But this can easily be handled after aggregating the data (and that's where it should be handled). (Another solution would be to accept a _vector-valued_ .drop argument. That would be nice, but I think it could get complicated.)
(Another solution would be to accept a vector-valued .drop argument. That would be nice, but I think it could get complicated.)
Yes, that's probably too complicated. Otherwise, I agree with @huftis's comment.
@hadley
I think YES: all combinations of values that exist in the data should be expanded in group_by.
But NOT factor levels that don't exist in the data.
My most frequent use case is preparing a summarized data set for graphing (during exploration). The graph needs all combinations of values; for example, you can't stack a bar chart without all the combinations. But it doesn't need factor levels that are 0 for every group. Factor values that don't exist in the data aren't needed: when stacked they would be 0, and they would show up as empty entries in the legend.
I think it's much easier (and more intuitive) to filter out the 0 cases after grouping if needed, so expanding all values in group_by should be the default. I don't think a .drop argument is needed, because filtering out the 0 cases afterwards is easy. We don't use extra arguments like this in other functions, so it would break the pattern. By default, group_by should show results for all combinations of the existing values.
I think this would be the correct default behaviour. Here, unique() expands only the existing values of the factor, not all of its levels. (This is what I run after a group_by that may have produced 0 values.)
## Expand data so plot groups work correctly
df2 <- expand.grid(name = unique(df$name), group = unique(df$group)) %>%
left_join(df, by=c("name","group")) %>%
mutate(
measure = ifelse(is.na(measure),0,measure)
)
The only case where I can see wanting values even when every group is zero is time data. Maybe a day of data is missing somewhere in the middle. There, expanding over a date range and joining is still needed, and factor levels don't apply. I think it's fair for the data wrangler to handle missing dates themselves.
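For the time-data case just described, here is a base R sketch of expanding over a full date range and joining. The dates and counts are invented. `seq()` on `Date` objects builds the complete daily range, and `merge(all.x = TRUE)` leaves the missing day as `NA`, which is then filled with 0.

```r
daily <- data.frame(
  date  = as.Date(c("2018-01-01", "2018-01-02", "2018-01-04")),
  count = c(5, 3, 7)
)

# Complete daily range from the first to the last observed date.
all_days <- data.frame(date = seq(min(daily$date), max(daily$date), by = "day"))

# Left join onto the full range; the missing day (2018-01-03) gets count 0.
filled <- merge(all_days, daily, by = "date", all.x = TRUE)
filled$count[is.na(filled$count)] <- 0

filled
```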
Thanks for the great work on this library. 90% of my work uses dplyr. :)
I strongly agree with @huftis.
I don't think dropping levels or level combinations should depend on the data. You might be prototyping a function or figure with a small sample. Or you might be doing a split-apply-combine operation, in which case you want to guarantee that each group's output matches all the others.
To put my position another way: I think it's worth considering whether the default behaviour should differ depending on whether the grouping variable is already a proper factor or is being coerced to one. I can see that in the coerced case there is less obligation to keep unused levels. But if I've gone to the trouble of making something a factor and controlling its levels... there's usually a good reason, and I don't want to have to fight constantly to keep them.
For what it's worth, I'd love to see this feature. I have a scenario similar to the one @huftis described, and I have to jump through hoops to get the results I need.
Came here from SO. tidyr's complete should help with this.
Yes, it does. I actually learned about complete recently, and it seems like a thoughtful way to achieve this.
Implementing this for SQL backends looks harder, since they drop all groups by default. Maybe leave it as is and implement tidyr::complete() for SQL?
I created issue #3033 without realising this issue already existed; apologies for the duplicate. To add my own humble suggestion: I'm currently working around this issue with pull() and forcats::fct_count().
I'm not a fan of this approach. fct_count() appears to break the tidy principle of always producing output of the same type as the input (it creates a tibble from a vector), and it renames the output columns. As a result, what should be a single dplyr::count() call becomes three steps (pull() %>% fct_count() %>% rename()). It would be great if forcats::fct_count() and dplyr::count() could be integrated somehow and forcats::fct_count() deprecated.
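As a point of comparison only (nobody proposed it in the thread), base R can already produce a level-preserving count with chosen column names in one step. Naming the `table()` argument sets the key column's name, and `as.data.frame()` on a table accepts `responseName` for the count column. The factor `f` is invented for the example.

```r
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

# One row per level (including unused "c"), with columns named f and n.
counts <- as.data.frame(table(f = f), responseName = "n")

counts
```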
Doesn't tidyr::complete() work for factors?
All factor levels and combinations of factor levels should be kept by default. This behaviour could be controlled by parameters such as drop and expand. The default behaviour of dplyr::count() would then be:
df <- data.frame(x = 1:2, y = factor(c(1, 1), levels = 1:2))
df %>% dplyr::count(x, y)
#> # A tibble: 4 x 3
#> x y n
#> <int> <fct> <int>
#> 1 1 1 1
#> 2 2 1 1
#> 3 1 2 0
#> 4 2 2 0
Zero-length groups (combinations of groups) can be filtered out later. But for exploratory analysis, you need to see the full picture.
2) Yes, definitely
1) There are some technical implementation issues with this, but I'll look into it in the next few weeks.
次ã®ããã«ãäºåŸã«ããŒã¿ãæ¡åŒµããããšã§ãããåé¿ã§ããå¯èœæ§ããããŸãã
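library(tidyverse)
# Group as usual, then add the missing (zero-row) groups to the grouping metadata.
# Relies on the "labels", "indices" and "group_sizes" attributes of grouped_df at the time.
truly_group_by <- function(data, ...){
  dots <- quos(...)
  data <- group_by( data, !!!dots )
  # Observed group keys, each paired with its row indices
  labels <- attr( data, "labels" )
  labnames <- names(labels)
  labels <- mutate( labels, ..index.. = attr(data, "indices") )
  # All combinations of the grouping variables (all levels for factors);
  # combinations absent from the data get an empty index vector
  expanded <- labels %>%
    tidyr::expand( !!!dots ) %>%
    left_join( labels, by = labnames ) %>%
    mutate( ..index.. = map(..index.., ~if(is.null(.x)) integer() else .x ) )
  indices <- pull( expanded, ..index..)
  group_sizes <- map_int( indices, length)
  labels <- select( expanded, -..index..)
  # Overwrite the grouping metadata with the expanded version
  attr(data, "labels") <- labels
  attr(data, "indices") <- indices
  attr(data, "group_sizes") <- group_sizes
  data
}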
df <- data_frame(
x = 1:2,
y = factor(c(1, 1), levels = 1:2)
)
tally( truly_group_by(df, x, y) )
#> # A tibble: 4 x 3
#> # Groups: x [?]
#> x y n
#> <int> <fct> <int>
#> 1 1 1 1
#> 2 1 2 0
#> 3 2 1 1
#> 4 2 2 0
tally( truly_group_by(df, y, x) )
#> # A tibble: 4 x 3
#> # Groups: y [?]
#> y x n
#> <fct> <int> <int>
#> 1 1 1 1
#> 2 1 2 1
#> 3 2 1 0
#> 4 2 2 0
Obviously, in the future this would be handled internally, without needing tidyr or purrr.
This seems to handle the original question:
> df = data.frame(a=rep(1:3,4), b=rep(1:2,6))
> df$b = factor(df$b, levels=1:3)
> df %>%
+ group_by(b) %>%
+ summarise(count_a=length(a), .drop=FALSE)
# A tibble: 2 x 3
b count_a .drop
<fct> <int> <lgl>
1 1 6 FALSE
2 2 6 FALSE
> df %>%
+ truly_group_by(b) %>%
+ summarise(count_a=length(a), .drop=FALSE)
# A tibble: 3 x 3
b count_a .drop
<fct> <int> <lgl>
1 1 6 FALSE
2 2 6 FALSE
3 3 0 FALSE
The key here is this:
tidyr::expand( !!!dots ) %>%
which means expanding all possibilities, regardless of whether the variables are factors.
I think we could do either of these:
- expand everything when drop=FALSE, possibly ending up with lots of zero-length groups
- do what we do now when drop=TRUE
Maybe with the ability to toggle between them?
This is a relatively cheap operation, since it only involves manipulating metadata, so maybe it's less risky to do it in R first?
Perhaps crossing() instead of expand()?
Looking at the internals, do you agree that to make this happen we would only need to change build_index_cpp(), specifically the generation of the labels data frame?
Would you recommend expanding only factors with drop = FALSE? I considered a "natural" syntax, but it might end up being confusing (and perhaps not powerful enough):
group_by(data, crossing(col1, col2), col3)
Meaning: use all combinations of col1 and col2, and then the existing combinations with col3.
Yes, I think this only affects build_index_cpp and the generation of the labels, indices and group_sizes attributes, which I'd like to squash into a tidy structure as part of #3489.
The "expand only factors" part of this discussion is what took a long time.
What should the results of these be?
library(dplyr)
d <- data_frame(
f1 = factor( rep( c("a", "b"), each = 4 ), levels = c("a", "b", "c") ),
f2 = factor( rep( c("d", "e", "f", "g"), each = 2 ), levels = c("d", "e", "f", "g", "h") ),
x = 1:8,
y = rep( 1:4, each = 2)
)
f <- function(data, ...){
group_by(data, !!!quos(...)) %>%
tally()
}
f(d, f1, f2, x)
f(d, x, f1, f2)
f(d, f1, f2, x, y)
f(d, x, f1, f2, y)
Ignoring row order, f(d, f1, f2, x) should give the same result as f(d, x, f1, f2). The same goes for the other two.
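Here is a base R sketch of the order-invariance property, assuming full expansion over every grouping column's levels or distinct values, which is what `table()` does. It checks that semantics only, not dplyr's. `count_all()` is a hypothetical helper, and `d` is rebuilt with `data.frame()` from the columns used above. After aligning columns and rows, the counts for `f1, f2, x` and `x, f1, f2` agree.

```r
d <- data.frame(
  f1 = factor(rep(c("a", "b"), each = 4), levels = c("a", "b", "c")),
  f2 = factor(rep(c("d", "e", "f", "g"), each = 2), levels = c("d", "e", "f", "g", "h")),
  x  = 1:8
)

# Hypothetical helper: counts over the full cross-product of the columns.
count_all <- function(data, cols) {
  as.data.frame(table(data[cols]), responseName = "n")
}

a <- count_all(d, c("f1", "f2", "x"))
b <- count_all(d, c("x", "f1", "f2"))

# Align b's columns and both row orders, then compare.
b <- b[c("f1", "f2", "x", "n")]
a <- a[order(a$f1, a$f2, a$x), ]
b <- b[order(b$f1, b$f2, b$x), ]
rownames(a) <- rownames(b) <- NULL

nrow(a)           # 3 * 5 * 8 = 120
all.equal(a, b)   # TRUE
```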
Also interesting:
f(d, f2, x, f1, y)
d %>% sample_frac(0.3) %>% f(...)
I like the idea of implementing full expansion only for factors. For non-character data (including logicals), we could define/use a factor-like class that inherits from the respective data type. Perhaps provided by forcats
Implementation in progress in #3492
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- data_frame( f = factor( c(1,1,2,2), levels = 1:3), x = c(1,2,1,4) )
( res1 <- tally(group_by(df,f,x, drop = FALSE)) )
#> # A tibble: 9 x 3
#> # Groups: f [?]
#> f x n
#> <fct> <dbl> <int>
#> 1 1 1. 1
#> 2 1 2. 1
#> 3 1 4. 0
#> 4 2 1. 1
#> 5 2 2. 0
#> 6 2 4. 1
#> 7 3 1. 0
#> 8 3 2. 0
#> 9 3 4. 0
( res2 <- tally(group_by(df,x,f, drop = FALSE)) )
#> # A tibble: 9 x 3
#> # Groups: x [?]
#> x f n
#> <dbl> <fct> <int>
#> 1 1. 1 1
#> 2 1. 2 1
#> 3 1. 3 0
#> 4 2. 1 1
#> 5 2. 2 0
#> 6 2. 3 0
#> 7 4. 1 0
#> 8 4. 2 1
#> 9 4. 3 0
all.equal( res1, arrange(res2, f, x) )
#> [1] TRUE
all.equal( filter(res1, n>0), tally(group_by(df, f, x)) )
#> [1] TRUE
all.equal( filter(res2, n>0), tally(group_by(df, x, f)) )
#> [1] TRUE
Created on 2018-04-10 by the reprex package (v0.2.0).
I'm not sure complete() solves the problem. Whatever summary you compute, its behaviour on empty vectors should be preserved, not assigned after the fact. For example:
data.frame(x=factor(1, levels=1:2), y=4:5) %>%
group_by(x) %>%
summarize(min=min(y), sum=sum(y), prod=prod(y))
# Should be:
#> x min sum prod
#> 1 4 9 20
#> 2 Inf 0 1
sum and prod (and, to a lesser extent, min), along with various other functions, have very clearly defined semantics for empty vectors. Using complete() afterwards means redefining those behaviours.
@kenahoo Right. This is what you get with the current development version. So the only thing you don't get is the warning from min().
library(dplyr)
data.frame(x=factor(1, levels=1:2), y=4:5) %>%
group_by(x) %>%
summarize(min=min(y), sum=sum(y), prod=prod(y))
#> # A tibble: 2 x 4
#> x min sum prod
#> <fct> <dbl> <int> <dbl>
#> 1 1 4 9 20
#> 2 2 Inf 0 1
min(integer())
#> Warning in min(integer()): no non-missing arguments to min; returning Inf
#> [1] Inf
sum(integer())
#> [1] 0
prod(integer())
#> [1] 1
Created on 2018-05-15 by the reprex package (v0.2.0).
@romainfrancois Great, I ... you've already ... this implementation
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex) and link to this issue. https://reprex.tidyverse.org/