Bootstrapping Methods
Bootstrapping
• In statistics and machine learning, bootstrapping is a resampling technique that repeatedly draws samples from our source data with replacement, often to estimate a population parameter.
• By “with replacement”, we mean that the same data point may be included in a resampled dataset multiple times.
• Typically our source data is only a small sample of the ground truth. Bootstrapping is loosely based on the law of large numbers, which says that with enough data the empirical distribution will be a good approximation of the true distribution.
• Using bootstrapping, we can generate a distribution of estimates rather than a single point estimate. The spread of that distribution tells us how certain (or uncertain) our estimate is; a minimal sketch follows this list.
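As a rough illustration, here is a minimal sketch in Python (using NumPy) of drawing bootstrap samples to estimate the distribution of a sample mean. The data, the seed, and the number of bootstrap samples are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical source data: a small sample (n=50) from an unknown population.
data = rng.normal(loc=5.0, scale=2.0, size=50)

S = 1000                   # number of bootstrap samples (arbitrary choice)
boot_means = np.empty(S)
for i in range(S):
    # Draw a sample of the same size, with replacement: the same data
    # point may appear in `sample` multiple times.
    sample = rng.choice(data, size=len(data), replace=True)
    boot_means[i] = sample.mean()

# A distribution of estimates, not a single point estimate.
print(f"point estimate of the mean: {data.mean():.3f}")
print(f"bootstrap standard error:   {boot_means.std(ddof=1):.3f}")
```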
Let’s use the same approach to calculate a confidence interval when evaluating the accuracy of a model on a held-out test set. Steps (sketched in code after the list):
• Draw a sample of size N from the original dataset with replacement. This is a bootstrap sample.
• Repeat step 1 S times, so that we have S bootstrap samples.
• Compute our estimate (here, model accuracy) on each of the bootstrap samples, so that we have S estimates.
• Use the distribution of estimates for inference (for example, estimating confidence intervals).
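A minimal sketch of these steps in Python, assuming NumPy and a percentile-based confidence interval. The function name `bootstrap_accuracy_ci`, the labels, and the predictions are all hypothetical stand-ins for a real test set.

```python
import numpy as np

def bootstrap_accuracy_ci(y_true, y_pred, S=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for test-set accuracy."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)

    accs = np.empty(S)
    for i in range(S):
        # Step 1: draw a bootstrap sample of size N with replacement,
        # resampling (label, prediction) pairs together.
        idx = rng.integers(0, n, size=n)
        # Step 3: estimate accuracy on this bootstrap sample.
        accs[i] = np.mean(y_true[idx] == y_pred[idx])

    # Step 4: use the distribution of estimates for inference.
    lo, hi = np.percentile(accs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return np.mean(y_true == y_pred), (lo, hi)

# Usage with made-up predictions: roughly 80% accurate on 200 test points.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=200)
flip = rng.random(200) < 0.2
y_pred = np.where(flip, 1 - y_true, y_true)

acc, (lo, hi) = bootstrap_accuracy_ci(y_true, y_pred)
print(f"accuracy: {acc:.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```

Note that label/prediction pairs are resampled together, so each bootstrap sample behaves like a plausible alternative test set drawn from the same distribution.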