by Guangming Lang
3 min read

Categories

  • r

When calculating the bias and precision of a sample estimate to the popluation parameter, because we often don’t know the true value of the population parameter and we often only have one sample, we use a procedure called bootstrap. The idea is simple:

  1. treat the sample as the population
  2. understand the sampling scheme, i.e., how the sample was taken from the population
  3. sample the same number of observations with replacement from the sample according to the same sampling scheme
  4. for each of the new samples, calculate its sample statistic. For example, if you’re interested in the mean of the population, you just calculate the sample average. If you’re interested in the sd of the population, you just calculate the sample sd.
  5. these sample statistics form a distribution. Take its average and use it as a proxy to the true value of the population parameter. Plug it and the sample statistic of your original sample into the formulas of bias and precision.

The following is a concrete example implementing the above bootstrap procedure using R and some stock price data.

Step 1. Download the monthly adjusted closing price data of VTSMX since Sept 2005 from Yahoo!. Change the class of the time index to yearmon. And calculate continuously compounded returns.

library(zoo)
library(tseries)
## Error in library(tseries): there is no package called 'tseries'
VTSMX = get.hist.quote(instrument="VTSMX", start="2005-09-30", 
                       end="2014-09-30", origin="1970-01-01",
                       quote="AdjClose", provider="yahoo",
                       compression="m", retclass="zoo")
## Error in get.hist.quote(instrument = "VTSMX", start = "2005-09-30", end = "2014-09-30", : could not find function "get.hist.quote"
index(VTSMX) = as.yearmon(index(VTSMX))
## Error in index(VTSMX): object 'VTSMX' not found
colnames(VTSMX) = "VTSMX"
## Error in colnames(VTSMX) = "VTSMX": object 'VTSMX' not found
ret.cc = diff(log(VTSMX))
## Error in diff(log(VTSMX)): object 'VTSMX' not found
head(ret.cc, 4)
## Error in head(ret.cc, 4): object 'ret.cc' not found

Step 2. Calculate the sample average c.c. returns

muhat.VTSMX = mean(ret.cc)
## Error in mean(ret.cc): object 'ret.cc' not found
cat(paste0(round(muhat.VTSMX*100, 2), "%"))
## Error in paste0(round(muhat.VTSMX * 100, 2), "%"): object 'muhat.VTSMX' not found

Step 3. Calculate bias and precision of the mean using bootstrap.

VTSMX = coredata(ret.cc)
## Error in coredata(ret.cc): object 'ret.cc' not found
n.samples = 999
muhat.boot = rep(0, n.samples)
nobs = length(ret.cc)
## Error in eval(expr, envir, enclos): object 'ret.cc' not found
for (i in 1:n.samples) {
        boot.data = sample(VTSMX, nobs, replace=TRUE)
        muhat.boot[i] = mean(boot.data)
}
## Error in sample(VTSMX, nobs, replace = TRUE): object 'VTSMX' not found
# bootstrap bias
bias = mean(muhat.boot) - muhat.VTSMX
## Error in eval(expr, envir, enclos): object 'muhat.VTSMX' not found
cat(paste0(round(bias*100, 2), "%"))
## Error in paste0(round(bias * 100, 2), "%"): object 'bias' not found
# bootstrap SE
precision = sd(muhat.boot)
cat(paste0(round(precision*100, 2), "%"))
## 0%

Step 4. Instead of doing the bootstrap procedure ourselves, we can use the boot() function in the boot library. For example, we can use the following code to calculate the bootstrap bias and precision for the volatility (standard deviation).

library(boot)

# define a function to be passed in the boot() function
sd.boot = function(x, idx) {
        # x: data to be resampled
        # idx: vector of scrambled indices created by boot() function
        #
        # returns: sd value computed using resampled data
        sd(x[idx])
}

# pass sd.boot to the boot() function to calculate the bootstrap sd's
VTSMX.sd.boot = boot(VTSMX, statistic=sd.boot, R=999)
## Error in NROW(data): object 'VTSMX' not found
class(VTSMX.sd.boot)
## Error in eval(expr, envir, enclos): object 'VTSMX.sd.boot' not found
VTSMX.sd.boot
## Error in eval(expr, envir, enclos): object 'VTSMX.sd.boot' not found
plot(VTSMX.sd.boot)
## Error in plot(VTSMX.sd.boot): object 'VTSMX.sd.boot' not found

Step 5. We can also calculate the bootstrap 95% confidence intervals of the volatility.

# bootstrap confidence interval
boot.ci(VTSMX.sd.boot, conf=0.95, type=c("norm", "perc"))
## Error in boot.ci(VTSMX.sd.boot, conf = 0.95, type = c("norm", "perc")): object 'VTSMX.sd.boot' not found

Step 6. A common measure in risk management is Value at Risk (VaR). We can find the bootstrap 95% confidence intervals of the VaR assuming the initial investment is $100,000.

# define a function to calculate VaR
VaR.boot = function(x, idx, p=0.05, w=100000) {
        # w: initial investment
        q = mean(x[idx]) + sd(x[idx]) * qnorm(p)
        w * (exp(q) - 1)
}

# pass VaR.boot to the boot() function to calculate the bootstrap VaRs
VTSMX.VaR.boot = boot(VTSMX, statistic=VaR.boot, R=999)
## Error in NROW(data): object 'VTSMX' not found
boot.ci(VTSMX.VaR.boot, conf=0.95, type=c("norm", "perc"))
## Error in boot.ci(VTSMX.VaR.boot, conf = 0.95, type = c("norm", "perc")): object 'VTSMX.VaR.boot' not found