Bootstrap: Estimate Statistical Uncertainties
Abstract
Bootstrapping is a method to estimate the statistical uncertainty of some quantity when straightforward error propagation is not feasible. In this note, we will investigate the technique of bootstrapping through examples using Python.
This document is available in many formats at https://cholmcc.gitlab.io/nbi-python
Contents
1 Introduction
2 The method
3 Implementation
5 Example
6 Simulating
7 Confidence intervals
7.1 Normal confidence interval
7.2 Quantile confidence interval
7.3 Pivotal confidence interval
8 Another example
9 The jackknife
10 When not to do bootstrap or jackknife
11 Summary
1 Introduction
The technique of bootstrapping is a way to estimate the statistical uncertainty of some quantity. It is most often used when the variance of the quantity (or, more formally, estimator) is not feasible to calculate directly from the data. Examples are complicated quantities such as the azimuthal anisotropic flow calculated from so-called Q-cumulants (see e.g. here). That is, if we are estimating a simple quantity like the mean of a sample, we would not use the bootstrap method, since the variance of the mean
Var[x̄] = Var[x]/N ,
with sample size N, is easily computed from the sample directly.
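This is easy to verify numerically. A minimal sketch (the seed and sample sizes below are arbitrary):

```python
import random
import statistics

random.seed(123456)
N = 100

# draw many samples of size N from U(0,1) and look at the spread of their means
means = [statistics.fmean(random.random() for _ in range(N)) for _ in range(2000)]
var_of_mean = statistics.pvariance(means)

# for U(0,1) we have Var[x] = 1/12, so Var[xbar] should be close to 1/(12*N)
```

The observed variance of the means comes out close to 1/1200 ≈ 0.00083, as the formula predicts.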
2 The method
The method of bootstrapping was invented by Bradley Efron (see for example L. Wasserman, All of Statistics, Chapter 8), and goes roughly like this:
• Suppose we are interested in the quantity T calculated over the data X (more formally, T is a statistic). We estimate T via the estimator T̂ over the sample X₁, X₂, …, X_N, of size N. We are interested in estimating the variance Var(T(X)).
• First, calculate the estimate T̂ over our sample X.
• Secondly, for some number of iterations B, do
– Select, at random with replacement, N samples from the original sample
– Calculate the estimate T over this sample
• Finally, calculate the variance of T estimated over the B generated samples.
The underlying reasoning hinges on the law of large numbers: as the number of independent, identically distributed simulations grows, the sample variance of the simulated estimates converges to the variance of the estimator. Thus, by making a large number B of simulations, we can approximate the original estimator variance by the variance over the simulations.
Each simulation is performed by sampling the original sample X₁, X₂, …, X_N exactly N times with replacement. By with replacement we mean that the probability to draw Xᵢ is exactly 1/N for each of the N draws in the simulation. Thus, in our simulation sample the multiplicity of any Xᵢ is anywhere from 0 to N.
3 Implementation
We can implement a general solution to the bootstrap method in Python. The first thing we need is the
ability to make our simulation samples. Here, we can use the standard function random.choices. To
see this, let us pick as the sample the numbers between 0 and 9 (inclusive), and make some simulation
samples
import random

random.seed(123456)
data = list(range(10))
for _ in range(5):
    print(random.choices(data, k=len(data)))
[8, 7, 0, 1, 0, 6, 0, 2, 1, 2]
[4, 1, 0, 9, 3, 6, 3, 8, 7, 4]
[4, 8, 5, 5, 8, 0, 7, 0, 5, 0]
[8, 7, 5, 4, 7, 3, 3, 2, 1, 8]
[2, 2, 1, 9, 4, 3, 9, 0, 4, 1]
Secondly, our solution will need to accept some estimator function T̂ to operate on our simulation samples. We will simply take that as an argument in the form of a callable. The final input to our solution is the choice of the number of simulations B. Since we may want to calculate other statistics than the variance on our bootstrap sample, we will return the entire list of B estimates of T over all simulations. Thus, our solution becomes
def bootstrap(data, estimator, size=1000, *args):
    """Perform the bootstrap simulation run.

    Parameters
    ----------
    data :
        The data to analyse. This can be any indexable object. That is, we must be
        able to do

        >>> v = data[i]

    estimator : callable
        This function is evaluated over data (with the same type as the data argument)
        repeatedly to calculate the estimator on the bootstrap simulation. It must
        accept a single argument of the same type as data. Additional arguments can
        be passed in the args argument.
    size : int, positive
        The number of bootstrap simulations. This number should be large (>1000).
    *args : dict
        Additional arguments to pass to estimator function

    Returns
    -------
    value : generator
        The estimator function evaluated over size bootstrap simulations. One can
        calculate the variance of this list to get the estimate of the estimator
        variance
    """
    from random import choices
    return (estimator(choices(data, k=len(data)), *args)
            for _ in range(size))
Thus, to calculate the bootstrap estimate of the variance of an estimator, we simply pass in our indexable data and our estimator function, and get back a generator (which we can evaluate immediately using list, if needed) on which we can calculate the variance.
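As a quick check that the generator behaves as intended, we can bootstrap the mean of a uniform sample, where the answer Var[x̄] = Var[x]/N is known. A self-contained sketch (the bootstrap function is restated so the snippet runs on its own; the seed is arbitrary):

```python
import random

def bootstrap(data, estimator, size=1000):
    """Evaluate estimator over `size` bootstrap resamplings of data."""
    from random import choices
    return (estimator(choices(data, k=len(data))) for _ in range(size))

def mean(x):
    return sum(x) / len(x)

random.seed(123456)
data = [random.random() for _ in range(200)]

boot = list(bootstrap(data, mean))
m = sum(boot) / len(boot)
var = sum((b - m)**2 for b in boot) / len(boot)
# var approximates Var[xbar] = Var[x]/200, roughly (1/12)/200 ~ 0.0004
```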
Below we want to calculate the variance and quantiles of samples, so we will define a few helper
functions. The first one will return the mean and the variance of a sample
def meanVar(x, ddof=0):
    n = len(x)
    m = sum(x) / n
    v = sum((xx - m)**2 for xx in x) / (n - ddof)
    return m, v
Again, we could have used NumPy for this, but for the sake of illustration we code it up ourselves.
Let us make a sample ∼ U (0, 1) and calculate the mean (0.5) and variance (1/12):
m, v = meanVar([random.random() for _ in range(1000)])
print('{:.3f} +/- {:.3f} (expect {:.3f} and {:.3f})'.format(m, v, .5, 1/12))

0.500 +/- 0.082 (expect 0.500 and 0.083)
The next function will calculate the α quantile of a sample. Essentially what we need to do is order
the data and return the element at index αN where N is the number of samples.
def quantile(x, alpha, key=None):
    return sorted(x, key=key)[int(alpha * len(x))]
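As a quick sanity check (the function is repeated so the snippet is self-contained), the 0.5 quantile of the numbers 0 to 99 lands on 50:

```python
def quantile(x, alpha, key=None):
    return sorted(x, key=key)[int(alpha * len(x))]

print(quantile(list(range(100)), 0.5))   # -> 50
print(quantile(list(range(100)), 0.95))  # -> 95
```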
5 Example
The following example is due to Bradley Efron (reproduced in L. Wasserman, All of Statistics, Chapter 8). A law school is interested in the correlation between LSAT (Law School Admission Test) and GPA (Grade Point Average) scores. That is, the quantity of interest is

θ̂ = ∑ᵢ (Yᵢ − Ȳ)(Zᵢ − Z̄) / ( √(∑ᵢ (Yᵢ − Ȳ)²) √(∑ᵢ (Zᵢ − Z̄)²) ) ,
where Y is the LSAT score, and Z the GPA score. First, let us get some data to work on.
lsat = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]
gpa = [3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96]
We could have used NumPy here to perform this calculation more easily, but for the sake of illustration
we write it out.
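A direct transcription of the formula could look like the following (the name corr is an assumption, chosen to match how the function is used below):

```python
import math

def corr(y, z):
    """Sample correlation coefficient between the sequences y and z."""
    n = len(y)
    my = sum(y) / n
    mz = sum(z) / n
    num = sum((yy - my) * (zz - mz) for yy, zz in zip(y, z))
    den = math.sqrt(sum((yy - my)**2 for yy in y)
                    * sum((zz - mz)**2 for zz in z))
    return num / den
```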
Now, our general bootstrap function expects the callable to take a single data argument, so we will
wrap corr in another function below. Let us write a function that calculates θ̂ and estimates the
standard deviation of θ̂ using the bootstrap method.
def corrLsatGpa(lsat, gpa, b=1000):
    def est(data):
        """Wrapper"""
        d = list(data)
        y = [lsat for lsat, _ in d]
        z = [gpa for _, gpa in d]
        return corr(y, z)
    data = list(zip(lsat, gpa))
    theta = est(data)
    boot = list(bootstrap(data, est, b))
    _, var = meanVar(boot)
    return theta, math.sqrt(var), boot

The function returns
• θ̂, the estimate of the correlation,
• se(θ̂) = √(Var_boot[θ̂]), the bootstrap estimate of the standard deviation, and
• the list of estimates over the bootstrap simulations.
The last return value is mainly done in the interest of visualising the simulation. Let us run the
example and plot
(Figure 1)
6 Simulating
It is worth noting that the method of bootstrapping is based on the law of large numbers. That is what necessitates that we perform a relatively large number of simulations to get an estimate of the variance of our estimator.
Figure 1: Left: GPA versus LSAT data. Right: Bootstrap estimate of uncertainty
To see this, let us run the above example with a varying number of steps ranging from 3 to 10000 and
then plot the estimated standard deviation as a function of the number of steps.
bs = [3, 6, 10, 30, 60, 100, 300, 600, 1000, 3000, 6000, 10000]
ob = []
os = []
for b in bs:
    ob.append([b]*10)
    os.append([corrLsatGpa(lsat, gpa, b)[1] for _ in range(10)])
(Figure 2)
The exact shape of the curve depends on the state of the random number generator used by random.choices, but in general we see that the estimate of se(θ̂) does not stabilize until B is sufficiently large. Thus, we must ensure a sufficiently large number of simulations when applying the bootstrap method, or our estimate of the variance of the estimator is wholly uncertain.
7 Confidence intervals
We can estimate confidence intervals from our bootstrap estimate of the variance in three ways.

7.1 Normal confidence interval

In this method, we assume that the estimator is roughly normal, and we can give the standard 2σ confidence limits
( θ̂ − 2 se(θ̂), θ̂ + 2 se(θ̂) ) .
def bootstrapNormalCL(theta, boot, z=2):
    """Calculate the normal confidence limits on the estimate
    theta from the bootstrap sample boot

    Parameters
    ----------
    theta : value
        Estimate
    boot : data
        Bootstrap sample
    z : factor
        Number of standard errors
    Return
    ------
    low, high : tuple
        Confidence interval
    """
    _, var = meanVar(boot)
    se = math.sqrt(var)
    return theta - z * se, theta + z * se
Let us calculate the confidence interval for the LSAT versus GPA example above.
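Putting the pieces together for the LSAT example, a self-contained sketch (the helpers are restated and the seed is arbitrary, so the exact numbers will vary from run to run):

```python
import math
import random

def mean_var(x):
    n = len(x)
    m = sum(x) / n
    return m, sum((xx - m)**2 for xx in x) / n

def corr(y, z):
    my, vy = mean_var(y)
    mz, vz = mean_var(z)
    cov = sum((yy - my) * (zz - mz) for yy, zz in zip(y, z)) / len(y)
    return cov / math.sqrt(vy * vz)

lsat = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]
gpa = [3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96]
data = list(zip(lsat, gpa))

def est(d):
    return corr([y for y, _ in d], [z for _, z in d])

random.seed(123456)
theta = est(data)
boot = [est(random.choices(data, k=len(data))) for _ in range(1000)]

_, var = mean_var(boot)
se = math.sqrt(var)
low, high = theta - 2 * se, theta + 2 * se
```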
7.2 Quantile confidence interval

An alternative, which does not assume that θ̂ is roughly normal, but which will tend to underestimate the confidence range, is to calculate the α and 1 − α quantiles. That is, we quote the confidence limits as

( Qα , Q1−α ) ,

where Qα is the α quantile of the bootstrap samples. Again, we will code this up in a function.
def bootstrapQuantileCL(theta, boot, alpha=0.05):
    """Calculate the quantile confidence limits on the estimate
    theta and the bootstrap sample boot, where alpha is the percentile
    below and above

    Parameters
    ----------
    theta : value
        Estimate
    boot : data
        Bootstrap sample
    alpha : percentage
        Percentage below and above the confidence limits
    Return
    ------
    low, high : tuple
        Confidence interval
    """
    return quantile(boot, alpha), quantile(boot, 1 - alpha)
Let us, again, calculate the 5% and 95% confidence limits on the LSAT versus GPA example above
qlim = bootstrapQuantileCL(theta, boot, 0.05)
print('Confidence limits (quantile): {:.3f},{:.3f}'.format(*qlim))

Confidence limits (quantile): 0.531,0.950
7.3 Pivotal confidence interval

This method uses the estimate θ̂ and the α quantiles of the bootstrap simulations, and gives the confidence limits as

( 2θ̂ − Q1−α , 2θ̂ − Qα ) .

def bootstrapPivotCL(theta, boot, alpha=0.05):
    """Calculate the pivotal confidence limits on the estimate
    theta and the bootstrap sample boot, where alpha is the percentile
    below and above

    Parameters
    ----------
    theta : value
        Estimate
    boot : data
        Bootstrap sample
    alpha : percentage
        Percentage below and above the confidence limits
    Return
    ------
    low, high : tuple
        Confidence interval
    """
    return 2*theta - quantile(boot, 1 - alpha), 2*theta - quantile(boot, alpha)
(Figure 3)
We note that the normal and pivot confidence limits exceed 1 on the high end, which indicates that these two estimates tend to overestimate the size of the confidence interval. The quantile confidence limits, on the other hand, are probably on the low side, but do reflect the distribution of the bootstrap sample in this example.
8 Another example
This example comes from the exercises of L.Wasserman All of Statistics, Chapter 8.
We have a sample of 100 observations X ∼ N(5, 1), and we are interested in the statistic θ = e^µ, for which we will use the estimator θ̂ = e^X̄. We will use the bootstrap method to calculate the standard uncertainty and 95% confidence limits on θ̂.
First, let us make our sample, and calculate our estimator
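The sample and the estimate can be generated along these lines (a sketch; the seed is an assumption made for reproducibility):

```python
import math
import random

random.seed(123456)

# 100 observations X ~ N(5, 1)
data = [random.normalvariate(5, 1) for _ in range(100)]

# the estimator theta-hat = exp(xbar)
theta = math.exp(sum(data) / len(data))
```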
Next, we generate our bootstrap sample and calculate the standard uncertainty and confidence limits using all three methods above, and plot them with the distribution of e^X as well as the bootstrap distribution.
boot = list(bootstrap(data, lambda d: math.exp(sum(d)/len(d))))
_, var = meanVar(boot)
(Figure 4)
We immediately see that the bootstrap sample is much more narrowly centred around the estimate, and the width of the distribution reflects well the expected variance

Var[θ] ≈ (∂θ/∂x̄)² Var[x̄] = e^{2x̄} Var[x̄] = e^{2x̄} Var[x]/N ,
9 The jackknife

This approach was developed by Maurice Quenouille (see appendix to L. Wasserman, All of Statistics, Chapter 8) and predates the bootstrap method. The idea is again to use the observed data to simulate variations in the sample and then estimate the sample variance from these simulations.
• Suppose we are interested in the quantity T calculated over the data X (more formally, T is a statistic). We estimate T via the estimator T̂ over the sample X₁, X₂, …, X_N, of size N. We are interested in estimating the variance Var(T(X)).
• Then, for each i = 1, …, N, do
  – For the ith iteration, calculate the estimate T leaving out the ith data point. That is, we take the sample X₁, …, Xᵢ₋₁, Xᵢ₊₁, …, X_N and calculate the estimate on that sample.
• Finally, calculate the variance of T estimated over the N generated samples, given by

Var[T] = ((N − 1)/N) ∑ᵢ (Tᵢ − T̄)² ,

where Tᵢ is the estimate calculated over the ith jackknife sample and T̄ is the mean of the estimates calculated over all jackknife samples.
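For the sample mean the formula reduces exactly to the familiar s²/N with s² the (ddof = 1) sample variance, which makes a handy sanity check (the data below is made up):

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

# leave-one-out (jackknife) estimates of the mean
jack = [(sum(data) - x) / (n - 1) for x in data]
tbar = sum(jack) / n

# jackknife variance: (N - 1)/N * sum_i (T_i - Tbar)^2
var_jack = (n - 1) / n * sum((t - tbar)**2 for t in jack)

# direct estimate s^2/N of the variance of the mean
var_direct = statistics.variance(data) / n
```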
We can code this up in a general function. As before, we expect an indexable data set and a function to calculate the estimator.

def jackknife(data, estimator, *args):
    """Generate the jackknife samples and evaluate the estimator over
    these.

    Parameters
    ----------
    data :
        The data to calculate the jackknife samples over
    estimator : callable
        The function to calculate the estimator

    Returns
    -------
    jack :
        The estimator calculated over all jackknife samples
    """
    def _inner(data, estimator, i):
        return estimator((data[j] for j in range(len(data)) if j != i), *args)
    return (_inner(data, estimator, i) for i in range(len(data)))
Let us apply this method to our example above of the correlation between LSAT and GPA.
def jkLsatGpa(lsat, gpa):
    def est(data):
        """Wrapper"""
        d = list(data)
        y = [lsat for lsat, _ in d]
        z = [gpa for _, gpa in d]
        return corr(y, z)
    data = list(zip(lsat, gpa))
    theta = est(data)
    jk = list(jackknife(data, est))
    _, var = meanVar(jk)
    std = math.sqrt(var * (len(data) - 1))
    return theta, std, jk
We run this example and compare to the previous result of 0.776 ± 0.127
theta, std, jk = jkLsatGpa(lsat, gpa)
print("LSAT versus GPA correlation: {:.3f} +/- {:.3f}".format(theta, std))
plt.figure()
plot1(plt.gca(), jk, theta, std, 'Jackknife')

LSAT versus GPA correlation: 0.776 +/- 0.143
Jackknife 0.77637 +/- 0.14252
(Figure 5)
We use the jackknife method on our generated data from above. First, we calculate our estimate θ̂ = e^X̄ of the sample X ∼ N(5, 1),
which is clearly the same as before. We then perform our jackknife analysis to find the variance, and plot the result as before:
def est(data):
    d = list(data)
    return math.exp(sum(d)/len(d))

jk = list(jackknife(data, est))
_, var = meanVar(jk)
std = math.sqrt(var * (len(data) - 1))
plot1(plt.gca(), jk, theta, std, 'Jackknife')
Jackknife 169.10419 +/- 15.88762
(Figure 6)
Clearly, the jackknife method does not produce as wide simulated distributions as the bootstrap method does, and consequently its estimates of the variance are more uncertain. If possible, one should opt for the bootstrap method over the jackknife method.
10 When not to do bootstrap or jackknife

Suppose we have analysed millions of events {E₁, …} for a particular observable X. We have split our events Eᵢ into N sub-samples Sᵢ such that

S₁ ∪ … ∪ S_N = {E₁, …} ,

and on each sub-sample we calculate the observable, giving the sample {X₁, …, X_N}.
All the observations Xᵢ are independent, identically distributed (iid) random variables, in that

∀ i, j ∈ {1, …, N} ∧ i ≠ j : Sⱼ ∩ Sᵢ = ∅ ,

and the events Eᵢ are assumed to be equal in some meaning of that word. Thus, we want to estimate θ and its variance. Our estimator is then the mean of the N samples
θ̂ = (1/N) ∑ᵢ Xᵢ ,
and we will use the bootstrap and jackknife methods for estimating the variance and standard uncertainty.
Here, we will choose N = 10 and X ∼ N(0, 1) without loss of generality. Thus, we expect to find θ̂ = 0 with variance 1/N = 0.1, i.e. a standard uncertainty of roughly 0.32.
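The sub-sample results can be simulated directly (a sketch; the seed is an assumption):

```python
import random

random.seed(123456)

# N = 10 sub-sample results, each X ~ N(0, 1)
data = [random.normalvariate(0, 1) for _ in range(10)]
```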
We can of course calculate the mean and the variance directly from this sample to obtain the sample mean and standard uncertainty

m, v = meanVar(data, 1)
e = math.sqrt(v/len(data))
mes = '{:10s} mean = {:.3f} and variance = {:.3f} -> {:.3f} +/- {:.3f}'
print(mes.format('Sample', m, v, m, e))

Sample     mean = 0.070 and variance = 0.530 -> 0.070 +/- 0.230
(Figure 7)
As is clear from the results above, it makes little sense to use the bootstrap or jackknife methods for
estimating the variance if the estimator in question is a simple estimator such as the mean.
Suppose, again, we are analysing millions of events which we may split into some number N of sub-samples. For each sub-sample i we calculate some quantity from which we will derive a complicated estimator θ̂ᵢ. This could for example be

θ̂ᵢ = (−a + 2b) / c³ ,
where a, b, and c are calculated over the sub-samples. The final estimator over the sub-samples is
then the average
θ̂ = (1/N) ∑ᵢ θ̂ᵢ .
Let us try to simulate this case. We will generate 1000 events with
• a ∼ N (1, 1)
• b ∼ N (5, 1)
• c ∼ N (3, 1)
from which we will select N = 10 sub-samples and calculate the means of a, b, and c.
events = [(random.normalvariate(1, 1),
           random.normalvariate(5, 1),
           random.normalvariate(3, 1))
          for _ in range(1000)]
data = list(zip(*[events[i::len(events)//10]
                  for i in range(len(events)//10)]))
data = [(sum(a for a, _, _ in sub)/len(sub),
         sum(b for _, b, _ in sub)/len(sub),
         sum(c for _, _, c in sub)/len(sub))
        for sub in data]
Let us define the estimator function, which calculates the average over θ̂ᵢ, and evaluate it on the 10 sub-samples
def est(data):
    def _inner(a, b, c):
        return (-a + 2*b) / c**3
    d = list(data)
    return sum(_inner(a, b, c) for a, b, c in d)/len(d)

theta = est(data)
print('Estimator {}'.format(theta))
Estimator 0.3319428026358412
Let us use the bootstrap and jackknife methods to estimate the variance of θ̂, and compare with the direct estimate from the N sub-sample results.
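One way to obtain the bootstrap and jackknife estimates boot, bstd, jack, and jstd used in the plotting code is sketched below (self-contained: the helpers are restated and the sub-sample data regenerated with the parameters assumed above, so the exact numbers will differ):

```python
import math
import random

def mean_var(x):
    n = len(x)
    m = sum(x) / n
    return m, sum((xx - m)**2 for xx in x) / n

def bootstrap(data, estimator, size=1000):
    return (estimator(random.choices(data, k=len(data))) for _ in range(size))

def jackknife(data, estimator):
    return (estimator([data[j] for j in range(len(data)) if j != i])
            for i in range(len(data)))

def est(data):
    d = list(data)
    return sum((-a + 2 * b) / c**3 for a, b, c in d) / len(d)

random.seed(123456)
# ten sub-samples; each entry holds the means of a ~ N(1,1), b ~ N(5,1),
# and c ~ N(3,1) over 100 events
data = [tuple(sum(random.normalvariate(mu, 1) for _ in range(100)) / 100
              for mu in (1, 5, 3))
        for _ in range(10)]

theta = est(data)

boot = list(bootstrap(data, est))
_, bvar = mean_var(boot)
bstd = math.sqrt(bvar)

jack = list(jackknife(data, est))
_, jvar = mean_var(jack)
jstd = math.sqrt(jvar * (len(data) - 1))
```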
dirc = [(-a + 2*b)/c**3 for a, b, c in data]
dmean, dvar = meanVar(dirc)
dstd = math.sqrt(dvar / len(dirc))
fig, ax = plt.subplots(ncols=3, figsize=(10, 6), sharex=True)
plot1(ax[0], dirc, theta, dstd, 'Sub-samples')
plot1(ax[1], boot, theta, bstd, 'Bootstrap')
plot1(ax[2], jack, theta, jstd, 'Jackknife')
fig.tight_layout()

Sub-samples 0.33194 +/- 0.00882
Bootstrap 0.33194 +/- 0.00842
Jackknife 0.33194 +/- 0.00929
(Figure 8)
Again, we see that the bootstrap and jackknife methods do not provide significant advantages over direct calculation of the variance from the N sub-samples. This is, of course, because the final estimator is a simple average over the sub-samples.
We will continue the example above, but now we store a, b, and c as calculated in each event, and our final estimator becomes
θ̂ = (−ā + 2b̄) / c̄³ .
We will thus perform the bootstrap analysis by sampling new events from our empirical distributions
of a, b, and c and calculate the estimator value for each of those samples. Note, in this case, it is not
easy to calculate the variance directly from the data, so we will refrain from doing so.
We use the events generated above to do our estimate and variance estimates, but first, we need a
function to calculate the mean of a, b, and c over all events.
def est(data):
    d = list(data)
    a = sum(aa for aa, _, _ in d) / len(d)
    b = sum(bb for _, bb, _ in d) / len(d)
    c = sum(cc for _, _, cc in d) / len(d)
    return (-a + 2*b) / c**3
For comparison, standard error propagation gives the expected variance

Var[θ] ≈ (−1/c̄³)² Var[a]/N + (2/c̄³)² Var[b]/N + (−3(−ā + 2b̄)/c̄⁴)² Var[c]/N
       + 2(−1/c̄³)(2/c̄³) Cov[a, b]/N
       + 2(−1/c̄³)(−3(−ā + 2b̄)/c̄⁴) Cov[a, c]/N
       + 2(2/c̄³)(−3(−ā + 2b̄)/c̄⁴) Cov[b, c]/N
     = (1/(c̄⁶ N)) [ Var[a] + 4(Var[b] − Cov[a, b])
       + (3(−ā + 2b̄)/c̄) ( (3(−ā + 2b̄)/c̄) Var[c] + 2 Cov[a, c] − 4 Cov[b, c] ) ] .
Figure 9: Estimates of uncertainty se(θ̂) with samples from the full event data. Left: Bootstrap, right: Jackknife.
n = len(data)
meana, vara = meanVar([aa for aa, _, _ in data])
meanb, varb = meanVar([bb for _, bb, _ in data])
meanc, varc = meanVar([cc for _, _, cc in data])
covab = sum((aa - meana)*(bb - meanb) for aa, bb, _ in data)/n
covac = sum((aa - meana)*(cc - meanc) for aa, _, cc in data)/n
covbc = sum((bb - meanb)*(cc - meanc) for _, bb, cc in data)/n
tmp = 3*(-meana + 2*meanb)/meanc
dvar = 1/meanc**6/n * (vara + 4*(varb + covab) + tmp*(tmp*varc + covac - 2*covbc))
dstd = math.sqrt(dvar)
print('{:10s} {:.5f} +/- {:.5f}'.format('Direct', theta, dstd))

Direct     0.33046 +/- 0.00884
fig, ax = plt.subplots(ncols=2, sharex=True, figsize=(10, 6))
print('{:10s} {:.5f} +/- {:.5f}'.format('Direct', theta, dstd))
plot1(ax[0], boot, theta, bstd, 'Bootstrap')
plot1(ax[1], jack, theta, jstd, 'Jackknife')
fig.tight_layout()

Direct     0.33046 +/- 0.00884
Bootstrap 0.33046 +/- 0.01103
Jackknife 0.33046 +/- 0.01103
(Figure 9)
The three estimates all agree to one significant digit.
11 Summary
The bootstrap and jackknife methods for estimating the variance of an estimator are powerful tools, but they are not generally applicable. The key take-away from the examples above is that these methods pay off for complicated estimators, whereas for simple estimators, such as the mean, the variance is just as easily computed directly from the sample.