Description
Hello! I am writing to describe some errors in the bootstrap function. I am happy to help fix these errors if we agree that they are indeed bugs.
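The issue arises when you give the bootstrap a statistic which depends on a two-dimensional sample X of shape (n, d) together with a one-dimensional sample y of shape (n,). I would expect `bootstrap([X,y], statistic, n_resamples=7, paired=True)` to give us an array of size 7; it should just resample the paired observations, that is, rows of X together with the matching entries of y.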
However, this is not what happens. In the MWE below, the function fails to run because of errors in the resampling protocol.
- Firstly, to be consistent with the rest of scipy, the first axis should index the observations in the sample, and the remaining axes should index dimensions.
- Running `_bootstrap_resample(X, n_resamples=7)` returns an array with shape `(n, n_resamples, d)`. Running `_bootstrap_resample(y, n_resamples=7)` gives an array of shape `(n_resamples, n)`.
- Calculating `statistic(*resampled_data, axis=-1)` will fail due to a dimension mismatch. In some other examples it will fail silently, which can be worse, because the errors then propagate into downstream statistical analyses.
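To make the expected behaviour concrete, here is a minimal sketch of paired resampling done by hand with NumPy only. The helper `paired_resample_statistic` is a hypothetical name used for illustration and is not part of scipy; it assumes observations are indexed along the first axis of both samples.

```python
import numpy as np


def paired_resample_statistic(X, y, statistic, n_resamples, rng=None):
    """Hypothetical helper (not a scipy function): resample (X[i], y[i]) pairs."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    out = np.empty(n_resamples)
    for k in range(n_resamples):
        # One set of indices is applied to both samples, so pairs stay aligned.
        idx = rng.integers(0, n, size=n)
        out[k] = statistic(X[idx], y[idx])
    return out


def statistic(x, y):
    # x has shape (n, 2), y has shape (n,); the result is a scalar.
    v = np.ones((2,))
    return (v @ x.T) @ y


X = np.column_stack((np.zeros(10), np.ones(10)))
Y = 100 * np.ones(10)
dist = paired_resample_statistic(X, Y, statistic, n_resamples=7, rng=0)
print(dist.shape)
```

The result has one entry per resample, which is the shape I would expect `bootstrap_distribution` to have for this kind of input.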
Thank you for your consideration!
import numpy as np
from scipy.stats import bootstrap
def statistic(x,y):
    # x has shape (n, 2) and y has shape (n,); the result is a scalar
    v = np.ones((2,))
    vXt = v@x.T
    return vXt@y
X = np.column_stack((0*np.ones((10,)), np.ones((10,))))
Y = 100*np.ones((10,))
data = [X,Y]
n_resamples = 100
statistic_original = statistic(*data)
print("Original statistic (working shape): ", statistic_original)
print("The following line fails due to bad shape!")
bsd = bootstrap([X,Y], statistic=statistic, n_resamples=15, paired=True).bootstrap_distribution
print(bsd, bsd.shape)
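For comparison, a possible workaround sketch that uses only the public `bootstrap` interface is to resample row indices instead of the data itself. This assumes that bootstrapping a one-dimensional index array behaves as for any other one-dimensional sample; `index_statistic` is a hypothetical wrapper name, and `method='percentile'` is chosen here only to keep the example simple.

```python
import numpy as np
from scipy.stats import bootstrap


def statistic(x, y):
    v = np.ones((2,))
    return (v @ x.T) @ y


X = np.column_stack((np.zeros(10), np.ones(10)))
Y = 100 * np.ones(10)


def index_statistic(idx):
    # Hypothetical wrapper: cast defensively in case the indices arrive as floats.
    idx = np.asarray(idx).astype(int)
    return statistic(X[idx], Y[idx])


res = bootstrap((np.arange(X.shape[0]),), index_statistic, n_resamples=15,
                vectorized=False, method='percentile')
dist = res.bootstrap_distribution
print(dist.shape)
```

This sidesteps the shape mismatch because only a one-dimensional sample is ever resampled, but it does not address the underlying behaviour described above.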