set.seed(123)
library(strapgod)
library(dplyr)

Introduction

strapgod aims to let you use any dplyr function on the resampled_df object returned by bootstrapify() and samplify(). Some functions, like summarise(), have specialized behavior, while most others simply call collect() to materialize the bootstrap rows before passing them on to the underlying dplyr function.
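As a rough illustration of the specialized summarise() behavior, the sketch below bootstraps a data frame and computes one statistic per resample. The iris dataset, the number of resamples, and the column name per_strap_mean are choices made for this example only.

```r
library(strapgod)
library(dplyr)

set.seed(123)

# Create 10 virtual bootstrap resamples of iris (iris is an illustrative choice).
iris_boots <- bootstrapify(iris, 10)

# summarise() evaluates the expression once per bootstrap resample.
boot_means <- iris_boots %>%
  summarise(per_strap_mean = mean(Petal.Width))

boot_means
```

The result should have one row per resample, with the bootstrap identifier alongside the summary column.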

What follows is a list of the dplyr functions that have “special” properties when used on a resampled_df.

collect()

The most important dplyr function for strapgod is collect(). In dplyr, it is typically used to force the computation of a database query and return the results as a tibble, and it plays a similar role here: collect() forces the materialization of the virtual groups and returns the full grouped tibble.
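A minimal sketch of materializing the virtual groups, assuming the iris dataset and 3 resamples purely for illustration:

```r
library(strapgod)
library(dplyr)

set.seed(123)

# The resampled_df only stores the virtual groups until collect() is called.
iris_boots <- bootstrapify(iris, 3)

# collect() expands the virtual groups into actual rows.
iris_collected <- collect(iris_boots)

# With 3 resamples of a 150-row data frame we expect 3 * 150 rows.
nrow(iris_collected)
```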

When calling collect() directly, there are two arguments available to extract extra information about the bootstraps.

id adds a sequence of integers from 1:n for each bootstrap group. It is equivalent to adding row_number() by group after the collect(), but saves some typing.

original_id adds the original row number of each bootstrap observation. It is generally more useful than id, as it provides a way to link the bootstrap rows back to the original data.
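The two arguments might be used as in the sketch below. It assumes that both id and original_id accept a single string naming the new column; the names ".id" and ".original_id" and the iris dataset are illustrative choices.

```r
library(strapgod)
library(dplyr)

set.seed(123)

# Assumption: id and original_id each take a column name as a string.
iris_ids <- iris %>%
  bootstrapify(3) %>%
  collect(id = ".id", original_id = ".original_id")

# .original_id should point back to a row of the original iris data,
# so it can be used to look up or join against the source rows.
range(iris_ids$.original_id)
```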

do()

While dplyr::do() has largely been superseded by group_modify(), it is still occasionally useful. Like summarise(), do() materializes the groups only when they are required. Here we run the same linear model on each bootstrapped set of data.
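One way to fit the same linear model on every resample is sketched here. The iris dataset, the model formula, and the list-column name model are assumptions made for the example.

```r
library(strapgod)
library(dplyr)

set.seed(123)

# Fit one lm() per bootstrap resample; `.` is the data for the current resample.
# The formula and the column name `model` are illustrative.
boot_models <- iris %>%
  bootstrapify(5) %>%
  do(model = lm(Sepal.Width ~ Sepal.Length + Species, data = .))

# Each element of the list-column holds a fitted model object.
coef(boot_models$model[[1]])
```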

group_split()

group_split() allows you to materialize all of the bootstrap tibbles into separate tibbles, all bundled together into a list.

You can specify keep = FALSE to drop the bootstrap columns from the resulting tibbles.
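A short sketch of both forms follows, using iris and 3 resamples as illustrative choices. The keep argument is spelled as in the text above; more recent dplyr releases name the equivalent argument .keep, so the second call may need adjusting depending on the installed versions.

```r
library(strapgod)
library(dplyr)

set.seed(123)

iris_boots <- bootstrapify(iris, 3)

# One tibble per bootstrap resample, bundled into a list.
boot_list <- group_split(iris_boots)
length(boot_list)

# Same split, but without the bootstrap identifier column.
# Assumption: `keep` is accepted as written in the text; newer dplyr uses `.keep`.
boot_list_no_key <- group_split(iris_boots, keep = FALSE)
names(boot_list_no_key[[1]])
```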

group_modify()

group_modify() is similar to do(), but (as of dplyr 0.8.0.1) always returns a data frame and gives you separate access to the non-group data and the group data.

Like do(), it can be a convenient way to run multiple models as long as you return a data frame from each one.
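As a sketch of this pattern, the example below fits a model per resample and returns its coefficients as a small data frame. The iris dataset, the formula, and the output columns term and estimate are illustrative; a tidier such as broom::tidy() could be swapped in, but base R is used here to avoid an extra dependency.

```r
library(strapgod)
library(dplyr)

set.seed(123)

# .x is the data for the current resample (without the grouping column).
# The function must return a data frame for each group.
boot_coefs <- iris %>%
  bootstrapify(5) %>%
  group_modify(~ {
    fit <- lm(Sepal.Width ~ Sepal.Length, data = .x)
    tibble(term = names(coef(fit)), estimate = unname(coef(fit)))
  })

boot_coefs
```

Because every group returns a data frame with the same columns, the per-resample results are stacked into a single tibble, which makes it straightforward to summarise the coefficient estimates across resamples.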