samplify()
creates a resampled tibble with virtual groups.
samplify(data, times, size, ..., replace = FALSE, key = ".sample")
Argument | Description |
---|---|
data | A tbl. |
times | A single integer specifying the number of resamples. If the data is grouped, this is the number of resamples per group. |
size | A single integer specifying the size of each resample. For a grouped data frame, this is also allowed to be an integer vector with size equal to the number of groups in data. |
... | Not used. |
replace | Whether or not to sample with replacement. |
key | A single character specifying the name of the virtual group that is added. |
A resampled_df with an extra group specified by the key.
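To make the idea of a virtual group more concrete, the following base R sketch stores each resample as a vector of row indices and only copies rows when they are materialized. It is a conceptual illustration under that assumption, not strapgod's internal implementation; `virtual_idx` and `materialized` are hypothetical names chosen for this example.

```r
# Conceptual sketch only: base R, not strapgod internals.
set.seed(123)

times <- 3   # number of resamples
size  <- 20  # rows per resample

# Each resample is kept as a vector of row indices into the original data,
# so no rows are copied until they are actually needed.
virtual_idx <- lapply(seq_len(times), function(i) {
  sample.int(nrow(iris), size = size, replace = FALSE)
})

# "Materialize" the resamples: look up the rows and tag them with a
# .sample column, similar in spirit to what collect() returns.
materialized <- do.call(rbind, lapply(seq_len(times), function(i) {
  cbind(.sample = i, iris[virtual_idx[[i]], ])
}))

dim(materialized)  # times * size rows, one extra column for .sample

# One summary value per resample.
per_sample_mean <- tapply(materialized$Petal.Width, materialized$.sample, mean)
```

Keeping only indices is what lets the printed resampled_df show the original row count while still reporting the extra group.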
The following functions have special / interesting behavior when used with a resampled_df:
Other virtual samplers: bootstrapify
#> # A tibble: 150 x 5
#> # Groups:   .sample [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>
#>  1          5.1         3.5          1.4         0.2 setosa
#>  2          4.9         3            1.4         0.2 setosa
#>  3          4.7         3.2          1.3         0.2 setosa
#>  4          4.6         3.1          1.5         0.2 setosa
#>  5          5           3.6          1.4         0.2 setosa
#>  6          5.4         3.9          1.7         0.4 setosa
#>  7          4.6         3.4          1.4         0.3 setosa
#>  8          5           3.4          1.5         0.2 setosa
#>  9          4.4         2.9          1.4         0.2 setosa
#> 10          4.9         3.1          1.5         0.1 setosa
#> # … with 140 more rows

#> # A tibble: 3 x 2
#>   .sample per_strap_mean
#>     <int>          <dbl>
#> 1       1           1.17
#> 2       2           1.36
#> 3       3           1.26

iris %>%
  group_by(Species) %>%
  samplify(times = 3, size = 20) %>%
  summarise(per_strap_species_mean = mean(Petal.Width))
#> # A tibble: 9 x 3
#> # Groups:   Species [3]
#>   Species    .sample per_strap_species_mean
#>   <fct>        <int>                  <dbl>
#> 1 setosa           1                  0.265
#> 2 setosa           2                  0.215
#> 3 setosa           3                  0.25
#> 4 versicolor       1                  1.29
#> 5 versicolor       2                  1.36
#> 6 versicolor       3                  1.36
#> 7 virginica        1                  2.10
#> 8 virginica        2                  2.04
#> 9 virginica        3                  2.05

# Alter the name of the group with `key`
# Materialize them with collect()
samps <- samplify(iris, times = 3, size = 5, key = ".samps")

collect(samps)
#> # A tibble: 15 x 6
#> # Groups:   .samps [3]
#>    .samps Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>     <int>        <dbl>       <dbl>        <dbl>       <dbl> <fct>
#>  1      1          5.9         3            4.2         1.5 versicolor
#>  2      1          5.7         2.8          4.5         1.3 versicolor
#>  3      1          6.2         2.2          4.5         1.5 versicolor
#>  4      1          4.4         3.2          1.3         0.2 setosa
#>  5      1          4.8         3            1.4         0.1 setosa
#>  6      2          5           2            3.5         1   versicolor
#>  7      2          5.8         2.7          5.1         1.9 virginica
#>  8      2          5.5         2.6          4.4         1.2 versicolor
#>  9      2          4.9         3.6          1.4         0.1 setosa
#> 10      2          6           3            4.8         1.8 virginica
#> 11      3          6.5         3            5.5         1.8 virginica
#> 12      3          5.2         2.7          3.9         1.4 versicolor
#> 13      3          5.5         4.2          1.4         0.2 setosa
#> 14      3          5.4         3.7          1.5         0.2 setosa
#> 15      3          5.6         3            4.5         1.5 versicolor

#> # A tibble: 15 x 8
#> # Groups:   .samps [3]
#>    .samps   .id .orig_id Sepal.Length Sepal.Width Petal.Length Petal.Width
#>     <int> <int>    <int>        <dbl>       <dbl>        <dbl>       <dbl>
#>  1      1     1       62          5.9         3            4.2         1.5
#>  2      1     2       56          5.7         2.8          4.5         1.3
#>  3      1     3       69          6.2         2.2          4.5         1.5
#>  4      1     4       43          4.4         3.2          1.3         0.2
#>  5      1     5       13          4.8         3            1.4         0.1
#>  6      2     1       61          5           2            3.5         1
#>  7      2     2      102          5.8         2.7          5.1         1.9
#>  8      2     3       91          5.5         2.6          4.4         1.2
#>  9      2     4       38          4.9         3.6          1.4         0.1
#> 10      2     5      139          6           3            4.8         1.8
#> 11      3     1      117          6.5         3            5.5         1.8
#> 12      3     2       60          5.2         2.7          3.9         1.4
#> 13      3     3       34          5.5         4.2          1.4         0.2
#> 14      3     4       11          5.4         3.7          1.5         0.2
#> 15      3     5       67          5.6         3            4.5         1.5
#> # … with 1 more variable: Species <fct>

#----------------------------------------------------------------------------

# Be careful not to specify a `size` larger
# than one of your groups! This will throw an error.
iris_group_sizes_of_50_and_5 <- iris[1:55, ] %>%
  group_by(Species) %>%
  group_trim()

count(iris_group_sizes_of_50_and_5, Species)
#> # A tibble: 2 x 2
#> # Groups:   Species [2]
#>   Species        n
#>   <fct>      <int>
#> 1 setosa        50
#> 2 versicolor     5

# size = 10 > min_group_size = 5
# \dontrun{
iris_group_sizes_of_50_and_5 %>%
  samplify(times = 2, size = 10)
#> Error: `size` (10) must be less than or equal to the size of the data / current group (5), set `replace = TRUE` to use sampling with replacement.
# }

# Instead, pass a vector of sizes to `samplify()` if this
# structure is absolutely required for your use case.

# size of 10 for the first group
# size of 5 for the second group
# total number of rows is 10 * 2 + 5 * 2 = 30
iris_group_sizes_of_50_and_5 %>%
  samplify(times = 2, size = c(10, 5)) %>%
  collect()
#> # A tibble: 30 x 6
#> # Groups:   Species, .sample [4]
#>    .sample Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>      <int>        <dbl>       <dbl>        <dbl>       <dbl> <fct>
#>  1       1          4.8         3.4          1.9         0.2 setosa
#>  2       1          5           3            1.6         0.2 setosa
#>  3       1          4.8         3            1.4         0.3 setosa
#>  4       1          5.1         3.5          1.4         0.3 setosa
#>  5       1          4.9         3.1          1.5         0.2 setosa
#>  6       1          5.7         3.8          1.7         0.3 setosa
#>  7       1          5           3.2          1.2         0.2 setosa
#>  8       1          5.7         4.4          1.5         0.4 setosa
#>  9       1          4.8         3            1.4         0.1 setosa
#> 10       1          5.3         3.7          1.5         0.2 setosa
#> # … with 20 more rows
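
The size rule and the row arithmetic from the grouped example can be checked with a small base R sketch. It assumes the rule stated in the error message (without replacement, each group's size may not exceed that group's row count) and uses a hypothetical helper, `resample_group_rows()`, that is not part of strapgod.

```r
# Conceptual sketch in base R; resample_group_rows() is a hypothetical helper.
set.seed(42)

small_iris <- droplevels(iris[1:55, ])   # 50 setosa rows, 5 versicolor rows
group_rows <- split(seq_len(nrow(small_iris)), small_iris$Species)

resample_group_rows <- function(group_rows, times, size, replace = FALSE) {
  # Recycle a single size across all groups, mirroring the scalar case.
  if (length(size) == 1L) size <- rep(size, length(group_rows))
  stopifnot(length(size) == length(group_rows))

  # Without replacement, a resample cannot be larger than its group.
  if (!replace && any(size > lengths(group_rows))) {
    stop("`size` exceeds the size of a group; set `replace = TRUE` ",
         "or pass one size per group.")
  }

  # For each group, draw `times` index vectors of that group's size.
  Map(function(rows, n) {
    replicate(times,
              rows[sample.int(length(rows), n, replace = replace)],
              simplify = FALSE)
  }, group_rows, size)
}

res <- resample_group_rows(group_rows, times = 2, size = c(10, 5))
total_rows <- sum(lengths(unlist(res, recursive = FALSE)))
total_rows  # 10 * 2 + 5 * 2 = 30

# A single size of 10 fails for the 5-row group unless replace = TRUE.
scalar_error <- tryCatch(
  resample_group_rows(group_rows, times = 2, size = 10),
  error = function(e) conditionMessage(e)
)
with_replacement <- resample_group_rows(group_rows, times = 2, size = 10,
                                        replace = TRUE)
```

Sampling with replacement is the alternative the error message itself suggests; a per-group size vector keeps sampling without replacement at the cost of unequal resample sizes.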