Dplyr: НСобходимо ΠΈΠΌΠ΅Ρ‚ΡŒ Π²ΠΎΠ·ΠΌΠΎΠΆΠ½ΠΎΡΡ‚ΡŒ Π²Ρ‹Π±ΠΈΡ€Π°Ρ‚ΡŒ Π³Ρ€ΡƒΠΏΠΏΡ‹

Π‘ΠΎΠ·Π΄Π°Π½Π½Ρ‹ΠΉ Π½Π° 28 ΠΌΠ°Ρ€. 2014  Β·  9ΠšΠΎΠΌΠΌΠ΅Π½Ρ‚Π°Ρ€ΠΈΠΈ  Β·  Π˜ΡΡ‚ΠΎΡ‡Π½ΠΈΠΊ: tidyverse/dplyr

А Ρ‚Π°ΠΊΠΆΠ΅ ΠΎΡ‚Π΄Π΅Π»ΡŒΠ½Ρ‹Π΅ Π»ΠΈΡ†Π° Π² Π³Ρ€ΡƒΠΏΠΏΠ°Ρ…

Π‘Π°ΠΌΡ‹ΠΉ ΠΏΠΎΠ»Π΅Π·Π½Ρ‹ΠΉ ΠΊΠΎΠΌΠΌΠ΅Π½Ρ‚Π°Ρ€ΠΈΠΉ

ΠžΡ‚Π²Π΅Ρ‚ @drhagen Π²Ρ‹ΡˆΠ΅ выглядит ΡƒΡΡ‚Π°Ρ€Π΅Π²ΡˆΠΈΠΌ. ΠšΠ°ΠΆΠ΅Ρ‚ΡΡ, сСйчас это Ρ€Π°Π±ΠΎΡ‚Π°Π΅Ρ‚:

sample_n_groups = function(tbl, size, replace = FALSE, weight = NULL) {
  # regroup when done
  grps = tbl %>% groups %>% lapply(as.character) %>% unlist
  # check length of groups non-zero
  keep = tbl %>% summarise() %>% ungroup() %>% sample_n(size, replace, weight)
  # keep only selected groups, regroup because joins change count.
  # regrouping may be unnecessary but joins do something funky to grouping variable
  tbl %>% right_join(keep, by=grps) %>% group_by_(.dots = grps)
}

ВсС 9 ΠšΠΎΠΌΠΌΠ΅Π½Ρ‚Π°Ρ€ΠΈΠΉ

species <- iris %.% 
  group_by(Species) %.% 
  summarise(wt = sum(Sepal.Length)) %.%
  sample_n(5, replace = T, weight = wt) %.%
  select(-wt)

inner_join(species, iris)

Π˜Π½Ρ‚Π΅Ρ€Π΅ΡΠ½ΠΎ, ΠΏΠΎΡ‡Π΅ΠΌΡƒ это Π±Ρ‹Π»ΠΎ Π·Π°ΠΊΡ€Ρ‹Ρ‚ΠΎ? ΠŸΠΎΡ…ΠΎΠΆΠ΅ Π½Π° ΠΏΠΎΡ‚Π΅Π½Ρ†ΠΈΠ°Π»ΡŒΠ½ΠΎ ΠΏΠΎΠ»Π΅Π·Π½ΡƒΡŽ Ρ„ΡƒΠ½ΠΊΡ†ΠΈΡŽ

iris %>%
    group_by(Species) %>%
    sample_n(1)

Ρ‡Ρ‚ΠΎΠ±Ρ‹ Π²Ρ‹Π±Ρ€Π°Ρ‚ΡŒ всС Π΄Π°Π½Π½Ρ‹Π΅ ΠΈΠ· случайного Π²ΠΈΠ΄Π°, Π½Π°ΠΏΡ€ΠΈΠΌΠ΅Ρ€

Π― Π½Π΅ Π΄ΡƒΠΌΠ°ΡŽ, Ρ‡Ρ‚ΠΎ ΠΏΠΎΠ²Π΅Π΄Π΅Π½ΠΈΠ΅ sample_n Π΄ΠΎΠ»ΠΆΠ½ΠΎ ΠΈΠ·ΠΌΠ΅Π½ΠΈΡ‚ΡŒΡΡ для Π³Ρ€ΡƒΠΏΠΏ, ΠΏΠΎΡ‚ΠΎΠΌΡƒ Ρ‡Ρ‚ΠΎ Π²Ρ‹Π±ΠΎΡ€ΠΊΠ° Π²Π½ΡƒΡ‚Ρ€ΠΈ Π³Ρ€ΡƒΠΏΠΏ - это Π΅Π³ΠΎ ΠΈΠ½Ρ‚ΡƒΠΈΡ‚ΠΈΠ²Π½ΠΎΠ΅ ΠΏΠΎΠ²Π΅Π΄Π΅Π½ΠΈΠ΅. Однако часто Π±Ρ‹Π²Π°Π΅Ρ‚ ΡƒΠ΄ΠΎΠ±Π½ΠΎ Π²Ρ‹Π±ΠΈΡ€Π°Ρ‚ΡŒ Π³Ρ€ΡƒΠΏΠΏΡ‹ Π² Ρ†Π΅Π»ΠΎΠΌ. Π­Ρ‚ΠΎ Π΄ΠΎΠ»ΠΆΠ½Π° Π±Ρ‹Ρ‚ΡŒ вторая функция. Π’ΠΎΡ‚ моя рСализация:

sample_n_groups = function(tbl, size, replace = FALSE, weight=NULL) {
   # regroup when done
   grps = tbl %>% groups %>% unlist %>% as.character
   # check length of groups non-zero
   keep = tbl %>% summarise() %>% sample_n(size, replace, weight)
   # keep only selected groups, regroup because joins change count.
   # regrouping may be unnecessary but joins do something funky to grouping variable
   tbl %>% semi_join(keep) %>% group_by_(grps) 
}

ΠŸΡ€ΠΈΠΌΠ΅Ρ€ @rcorty Ρ€Π°Π±ΠΎΡ‚Π°Π΅Ρ‚, ΠΊΠ°ΠΊ ΠΈ оТидалось

iris %>% group_by(Species) %>% sample_n_groups(1)

+1

Π˜Π·ΠΌΠ΅Π½ΠΈΡ‚ΡŒ: ΠΈΠ·ΠΌΠ΅Π½Π΅Π½ΠΈΠ΅ Π½Π° dplyr Π½Π°Ρ€ΡƒΡˆΠΈΠ»ΠΎ это Ρ€Π΅ΡˆΠ΅Π½ΠΈΠ΅;


Для Ρ‚Π΅Ρ… ΠΈΠ· вас, ΠΊΡ‚ΠΎ ΠΏΡ€ΠΈΡˆΠ΅Π» сюда Ρ‡Π΅Ρ€Π΅Π· ΠΏΠΎΠΈΡΠΊΠΎΠ²ΡƒΡŽ систСму Π² поисках этой Ρ„ΡƒΠ½ΠΊΡ†ΠΈΠΎΠ½Π°Π»ΡŒΠ½ΠΎΡΡ‚ΠΈ, рСализация @MarcusWalz Π½Π΅ Π²ΠΊΠ»ΡŽΡ‡Π°Π΅Ρ‚ Π²Ρ‹Π±ΠΎΡ€ΠΊΡƒ с Π·Π°ΠΌΠ΅Π½ΠΎΠΉ, ΠΊΠΎΠ³Π΄Π° replace = TRUE . Π’ Ρ€Π΅Π°Π»ΠΈΠ·Π°Ρ†ΠΈΠΈ Π½Π΅ΠΎΠ±Ρ…ΠΎΠ΄ΠΈΠΌΠΎ ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚ΡŒ right_join (ΠΈΠ»ΠΈ left_join ΠΈΠ»ΠΈ inner_join ) для сохранСния Π΄ΡƒΠ±Π»ΠΈΠΊΠ°Ρ‚ΠΎΠ²:

sample_n_groups = function(tbl, size, replace = FALSE, weight=NULL) {
  # regroup when done
  grps = tbl %>% groups %>% unlist %>% as.character
  # check length of groups non-zero
  keep = tbl %>% summarise() %>% sample_n(size, replace, weight)
  # keep only selected groups, regroup because joins change count.
  # regrouping may be unnecessary but joins do something funky to grouping variable
  tbl %>% right_join(keep, by=grps) %>% group_by_(grps) 
}

Π‘Π°ΠΌΠΎΠ·Π°Π³Ρ€ΡƒΠ·ΠΊΠ° кластСра - это самый распространСнный Π²Π°Ρ€ΠΈΠ°Π½Ρ‚ использования этой Ρ„ΡƒΠ½ΠΊΡ†ΠΈΠΈ.

@drhagen , Π² вашСй Ρ€Π΅Π°Π»ΠΈΠ·Π°Ρ†ΠΈΠΈ Π΅ΡΡ‚ΡŒ Π»ΠΈ Ρƒ вас прСдлоТСния ΠΏΠΎ созданию Π½ΠΎΠ²ΠΎΠ³ΠΎ ΡƒΠ½ΠΈΠΊΠ°Π»ΡŒΠ½ΠΎΠ³ΠΎ ΠΈΠ΄Π΅Π½Ρ‚ΠΈΡ„ΠΈΠΊΠ°Ρ‚ΠΎΡ€Π° Π³Ρ€ΡƒΠΏΠΏΡ‹?

На самом Π΄Π΅Π»Π΅ это довольно просто:

sample_n_groups = function(tbl, size, replace = FALSE, weight=NULL) {
  # regroup when done
  grps = tbl %>% groups %>% unlist %>% as.character
  # check length of groups non-zero
  keep = tbl %>% summarise() %>% sample_n(size, replace, weight) %>% 
    mutate(unique_id = 1:NROW(.))
  # keep only selected groups, regroup because joins change count.
  # regrouping may be unnecessary but joins do something funky to grouping variable
  tbl %>% right_join(keep, by=grps) %>% group_by_(grps) 
}

ΠžΡ‚Π²Π΅Ρ‚ @drhagen Π²Ρ‹ΡˆΠ΅ выглядит ΡƒΡΡ‚Π°Ρ€Π΅Π²ΡˆΠΈΠΌ. ΠšΠ°ΠΆΠ΅Ρ‚ΡΡ, сСйчас это Ρ€Π°Π±ΠΎΡ‚Π°Π΅Ρ‚:

sample_n_groups = function(tbl, size, replace = FALSE, weight = NULL) {
  # regroup when done
  grps = tbl %>% groups %>% lapply(as.character) %>% unlist
  # check length of groups non-zero
  keep = tbl %>% summarise() %>% ungroup() %>% sample_n(size, replace, weight)
  # keep only selected groups, regroup because joins change count.
  # regrouping may be unnecessary but joins do something funky to grouping variable
  tbl %>% right_join(keep, by=grps) %>% group_by_(.dots = grps)
}
Π‘Ρ‹Π»Π° Π»ΠΈ эта страница ΠΏΠΎΠ»Π΅Π·Π½ΠΎΠΉ?
0 / 5 - 0 Ρ€Π΅ΠΉΡ‚ΠΈΠ½Π³ΠΈ

Π‘ΠΌΠ΅ΠΆΠ½Ρ‹Π΅ вопросы

tobadia picture tobadia  Β·  4ΠšΠΎΠΌΠΌΠ΅Π½Ρ‚Π°Ρ€ΠΈΠΈ

chrislad picture chrislad  Β·  5ΠšΠΎΠΌΠΌΠ΅Π½Ρ‚Π°Ρ€ΠΈΠΈ

slyrus picture slyrus  Β·  3ΠšΠΎΠΌΠΌΠ΅Π½Ρ‚Π°Ρ€ΠΈΠΈ

nalimilan picture nalimilan  Β·  3ΠšΠΎΠΌΠΌΠ΅Π½Ρ‚Π°Ρ€ΠΈΠΈ

profdave picture profdave  Β·  3ΠšΠΎΠΌΠΌΠ΅Π½Ρ‚Π°Ρ€ΠΈΠΈ