...as well as individuals within groups.
species <- iris %>%
group_by(Species) %>%
summarise(wt = sum(Sepal.Length)) %>%
sample_n(5, replace = TRUE, weight = wt) %>%
select(-wt)
inner_join(species, iris)
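Since dplyr 1.0.0, `sample_n()` has been superseded by `slice_sample()`, whose `weight_by` argument takes over the role of `weight`. The sketch below does the same weighted sampling of whole species with it. It assumes dplyr >= 1.1.0, because it passes the `relationship` argument to the join.

```r
library(dplyr)

set.seed(42)
# One row per species, weighted by that species' total sepal length
species <- iris %>%
  group_by(Species) %>%
  summarise(wt = sum(Sepal.Length)) %>%
  # summarise() dropped the only grouping level, so this samples across species
  slice_sample(n = 5, replace = TRUE, weight_by = wt) %>%
  select(-wt)

# A species drawn k times contributes its rows k times.
# Both tables repeat keys, so declare the many-to-many join explicitly
# (this silences the dplyr >= 1.1.0 warning about it).
sampled <- inner_join(species, iris, by = "Species", relationship = "many-to-many")
```

Every species in `iris` has 50 rows, so five draws always yield 250 rows. The draws themselves vary with the seed.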
I'm wondering why this was closed? It seems like a potentially useful feature, e.g. for selecting all of the data from a random species:
iris %>%
group_by(Species) %>%
sample_n(1)
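In current dplyr, the pipeline above samples one row *within* each species rather than one whole species. The sketch below keeps one entire randomly chosen species instead. It assumes dplyr >= 1.0.0 for `cur_group_id()`. The group index is drawn once, outside `filter()`, so that every group is compared against the same draw.

```r
library(dplyr)

set.seed(1)
grouped <- iris %>% group_by(Species)

# Draw one group index once, outside filter(); calling sample() inside
# filter() would make an independent draw for every group.
picked <- sample.int(n_groups(grouped), 1)

# Keep every row of the drawn species
one_species <- grouped %>% filter(cur_group_id() %in% picked)
```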
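I don't think the behavior of sample_n should change with grouping, since sampling within groups is its intuitive behavior. However, it is often convenient to be able to sample groups as a whole, and that should be a separate function. Here is my implementation: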
sample_n_groups = function(tbl, size, replace = FALSE, weight=NULL) {
# regroup when done
grps = tbl %>% groups %>% unlist %>% as.character
# check length of groups non-zero
keep = tbl %>% summarise() %>% sample_n(size, replace, weight)
# keep only selected groups, regroup because joins change count.
# regrouping may be unnecessary but joins do something funky to grouping variable
tbl %>% semi_join(keep) %>% group_by_(grps)
}
@rcorty's example works fine:
iris %>% group_by(Species) %>% sample_n_groups(1)
+1
Edit: changes to dplyr have broken this solution.
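For anyone arriving here from a search engine looking for this feature: @MarcusWalz's implementation does not actually sample with replacement when replace = TRUE, because semi_join keeps each matching row of tbl only once, however many times its group was drawn. The implementation needs to use right_join (or inner_join, or left_join with keep as the left-hand table) to keep the duplicates: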
sample_n_groups = function(tbl, size, replace = FALSE, weight=NULL) {
# regroup when done
grps = tbl %>% groups %>% unlist %>% as.character
# check length of groups non-zero
keep = tbl %>% summarise() %>% sample_n(size, replace, weight)
# keep only selected groups, regroup because joins change count.
# regrouping may be unnecessary but joins do something funky to grouping variable
tbl %>% right_join(keep, by=grps) %>% group_by_(grps)
}
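The difference between the two versions comes down to filtering joins versus mutating joins. The toy example below illustrates it. The tables `tbl` and `keep` are hypothetical, and the `relationship` argument assumes dplyr >= 1.1.0. A group drawn twice survives `semi_join()` only once, whereas `right_join()` returns its rows once per draw.

```r
library(dplyr)

tbl  <- tibble(g = c("a", "a", "b"), x = 1:3)
keep <- tibble(g = c("a", "a"))  # group "a" was drawn twice, "b" not at all

# Filtering join: keeps rows of tbl whose key appears in keep, never duplicates them
filtered <- semi_join(tbl, keep, by = "g")

# Mutating join: each row of tbl is repeated once per matching row of keep
joined <- right_join(tbl, keep, by = "g", relationship = "many-to-many")
```

Here `filtered` has 2 rows (group "a" once), while `joined` has 4 (group "a" twice).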
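Cluster bootstrapping is a common use case for this feature.
@drhagen, in your implementation, do you have any suggestions for how to generate new unique group IDs?
Actually, it's quite simple: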
sample_n_groups = function(tbl, size, replace = FALSE, weight=NULL) {
# regroup when done
grps = tbl %>% groups %>% unlist %>% as.character
# check length of groups non-zero
keep = tbl %>% summarise() %>% sample_n(size, replace, weight) %>%
mutate(unique_id = 1:NROW(.))
# keep only selected groups, regroup because joins change count.
# regrouping may be unnecessary but joins do something funky to grouping variable
tbl %>% right_join(keep, by=grps) %>% group_by_(grps)
}
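Note that the version above regroups by the original grouping columns. Repeated draws of the same cluster therefore land in one group unless you group by `unique_id` instead.

A join-free alternative follows, using a hypothetical helper named `sample_groups_split`. It assumes dplyr >= 1.0.0 for `group_split()`. The function splits the data into one tibble per group and draws group indices with `sample.int()`. It then lets `bind_rows(.id = ...)` number the draws, so a cluster drawn twice gets two distinct IDs. The `prob` argument (one probability per group, in group order) is also an assumption of this sketch.

```r
library(dplyr)

# Hypothetical helper: sample whole groups and number each draw.
sample_groups_split <- function(tbl, size, replace = FALSE, prob = NULL) {
  pieces <- group_split(tbl)  # one tibble per group, in group order
  picked <- sample.int(length(pieces), size, replace = replace, prob = prob)
  draws <- lapply(picked, function(i) pieces[[i]])
  # .id numbers the unnamed draws "1", "2", ..., so repeated clusters stay distinct
  bind_rows(draws, .id = "unique_id") %>%
    group_by(unique_id)
}

set.seed(7)
boot <- iris %>% group_by(Species) %>% sample_groups_split(3, replace = TRUE)
```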
Building on @drhagen's code above:
sample_n_groups = function(tbl, size, replace = FALSE, weight = NULL) {
# regroup when done
grps = tbl %>% groups %>% lapply(as.character) %>% unlist
# check length of groups non-zero
keep = tbl %>% summarise() %>% ungroup() %>% sample_n(size, replace, weight)
# keep only selected groups, regroup because joins change count.
# regrouping may be unnecessary but joins do something funky to grouping variable
tbl %>% right_join(keep, by=grps) %>% group_by_(.dots = grps)
}
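The underscore verbs such as `group_by_()` have been deprecated since dplyr 0.7.0. The sketch below applies the same right_join approach with current APIs and assumes dplyr >= 1.1.0. It uses `group_vars()` for the grouping column names and `group_keys()` for one row per group. `slice_sample()` replaces `sample_n()`, and `rlang::syms()` restores the grouping from a character vector. The name `sample_n_groups2` is hypothetical, and weights are left out to keep the sketch short.

```r
library(dplyr)

sample_n_groups2 <- function(tbl, size, replace = FALSE) {
  grps <- group_vars(tbl)  # grouping column names as a character vector
  # One row per group, ungrouped, so slice_sample() draws across groups
  keep <- group_keys(tbl) %>%
    slice_sample(n = size, replace = replace) %>%
    mutate(unique_id = row_number())  # one id per draw, e.g. for cluster bootstrap
  # right_join repeats a group's rows once for each time it was drawn
  tbl %>%
    ungroup() %>%
    right_join(keep, by = grps, relationship = "many-to-many") %>%
    group_by(!!!rlang::syms(grps))
}

set.seed(3)
res <- iris %>% group_by(Species) %>% sample_n_groups2(2, replace = TRUE)
```

As with @drhagen's version, regrouping uses the original columns. Use `group_by(unique_id)` on the result to treat each draw as its own cluster.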