Sunday, June 6, 2010

Jackknife

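The jackknife is a resampling method that partitions the data much like k-fold cross-validation, but its purpose is closer to that of the bootstrap: estimating the bias and standard error of a statistic (mean, variance, correlation coefficient, etc.). With $T$ the statistic computed on the full sample of $n$ observations, $T^{(-i)}$ the same statistic computed with the $i$-th observation left out, and $\bar T_{jack}$ the mean of the $n$ values $T^{(-i)}$, the jackknife estimates are: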

\[\hat{Bias}_{jack}(T)=(n-1)(\bar T_{jack}-T)\]
\[\hat{SE}_{jack}(T)={\left[\frac{n-1}{n}\sum\limits_{i=1}^n{(T^{(-i)}-\bar T_{jack})}^2\right]}^{1/2}\]
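One common use of the bias estimate is bias correction. As a short worked step (the symbol $\tilde T_{jack}$ is notation assumed here, not used elsewhere in this post), subtracting the estimated bias from $T$ gives
\[\tilde T_{jack}=T-\hat{Bias}_{jack}(T)=T-(n-1)(\bar T_{jack}-T)=nT-(n-1)\bar T_{jack}\]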
The following code applies the jackknife to the correlation coefficient $\rho$ between the test scores (lsat) and the GPA in a sample of 15 law students, leaving out one observation at a time.

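% Load the data: column 1 holds the LSAT scores, column 2 the GPAs
load law;
lsat=law(:,1);
gpa=law(:,2);
% Correlation coefficient on the full sample
tmp=corrcoef(gpa,lsat);
T=tmp(1,2);

n=length(gpa);
reps=zeros(1,n);
for i=1:n
    % Leave out observation i and recompute the statistic
    lsatt=lsat([1:i-1,i+1:end]);
    gpat=gpa([1:i-1,i+1:end]);
    tmp=corrcoef(gpat,lsatt);
    reps(i)=tmp(1,2);
end
% Jackknife estimates of the standard error and the bias
mureps=mean(reps);
sehat=sqrt((n-1)/n * sum((reps-mureps).^2))
biashat = (n-1)*(mureps-T)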

sehat =
    0.1425

biashat =
   -0.0065
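
As a quick sanity check of the formulas, here is a minimal sketch that runs the same loop on the sample mean. The toy vector x is made up for illustration and is not part of the law data. For the mean, the jackknife standard error should agree with the usual std(x)/sqrt(n), and the bias estimate should be zero up to rounding.

```matlab
% Hypothetical toy data, chosen only for illustration
x = [2 4 4 5 7 9];
n = numel(x);
T = mean(x);                         % statistic on the full sample

% Leave-one-out replicates of the mean
reps = zeros(1,n);
for i = 1:n
    reps(i) = mean(x([1:i-1, i+1:end]));
end

% Jackknife estimates, same formulas as above
mureps  = mean(reps);
sehat   = sqrt((n-1)/n * sum((reps-mureps).^2));
biashat = (n-1)*(mureps-T);

% Textbook standard error of the mean, for comparison
sedirect = std(x)/sqrt(n);
```

Here sehat and sedirect should match and biashat should be numerically zero, which is an easy way to catch a misplaced factor of (n-1)/n.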

Alternatively, we can use the MATLAB jackknife() function from the Statistics Toolbox, which returns the leave-one-out replicates directly.

% m(i) is the correlation computed with observation i left out
m=jackknife(@corr,gpa,lsat);
mum=mean(m);
sejhatM=sqrt((n-1)/n * sum((m-mum).^2))
biasjhatM = (n-1)*(mum-T)


sejhatM =
    0.1425

biasjhatM =
   -0.0065


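For comparison, the following code uses the bootstrap with 200 resamples to estimate the bias and standard error of the same statistic. Because the resamples are drawn at random, the numbers vary slightly from run to run.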
 
% 200 bootstrap replicates of the correlation coefficient
bootstat=bootstrp(200,@corr,gpa,lsat);
sebhat=std(bootstat)
biasbhat=mean(bootstat)-T

sebhat =
    0.1372

biasbhat =
   -0.0041
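
For readers without the Statistics Toolbox, the resampling that bootstrp() performs can be sketched by hand. This is only an illustrative sketch under stated assumptions: the paired vectors x and y are made-up toy data (not the law data), B = 200 mirrors the call above, and rng(1) is added just to make the run repeatable.

```matlab
% Hypothetical paired toy data, chosen only for illustration
x = [1 2 3 4 5 6 7 8]';
y = [2 1 4 3 6 5 8 7]';
n = numel(x);
B = 200;                             % number of bootstrap resamples

% Statistic on the full sample
tmp = corrcoef(x, y);
T = tmp(1,2);

rng(1);                              % fix the seed for repeatability
bootstat = zeros(B,1);
for b = 1:B
    idx = randi(n, n, 1);            % draw n row indices with replacement
    tmp = corrcoef(x(idx), y(idx));  % resample the pairs together
    bootstat(b) = tmp(1,2);
end

% Bootstrap estimates of the standard error and the bias
sebhat   = std(bootstat);
biasbhat = mean(bootstat) - T;
```

The pairs are resampled together through the shared index vector idx, so each resample keeps every (x, y) observation intact.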
