Saturday, June 19, 2010

Akaike and Bayesian Information Criteria

Usually, when we have data, we want to know which (parametric) model the data came from. In other words, we want to fit a model to the data. Candidate models may differ in the number of parameters or in the parameters themselves.

When there are several models to choose from, we estimate the parameters of each model by maximum likelihood estimation. We can always try to increase the likelihood by adding more parameters, but that may result in overfitting.
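As a minimal sketch of what maximum likelihood estimation means in practice, consider a Poisson model, whose rate has the closed-form estimate "sample mean". The data vector and the variable names below (`logLik`, `lambdaHat`, `llAtMle`, `llNearby`) are made up for illustration and are not the data used later in this post.

```matlab
% Toy counts, made up purely for illustration.
x = [3 5 4 6 7 5 4];

% Poisson log-likelihood as a function of the rate lambda.
% gammaln(x+1) equals log(x!) and avoids overflow for large counts.
logLik = @(lambda) sum(x.*log(lambda) - lambda - gammaln(x+1));

lambdaHat = mean(x);                  % closed-form MLE of the Poisson rate
llAtMle   = logLik(lambdaHat);        % log-likelihood at the MLE
llNearby  = logLik(lambdaHat + 0.5);  % a different rate gives a lower value

fprintf('lambdaHat = %.4f, logL = %.4f (nearby rate: %.4f)\n', ...
        lambdaHat, llAtMle, llNearby);
```

The log-likelihood evaluated at `lambdaHat` is the quantity that enters the criteria as $\ln(L)$.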

The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) address the overfitting problem by adding a penalty that grows with the number of parameters in the model.

The formula for the AIC is

\[AIC = 2k - 2\ln(L)\]
and the formula for the BIC is

\[-2\ln p(x|k)\approx BIC = -2\ln(L) + k\ln(n)\]
where $x$ is the observed data, $n$ is the number of data points in $x$, $k$ is the number of parameters, $p(x|k)$ is the likelihood of the observed data given the number of parameters, and $L$ is the value of the likelihood function for the estimated model. For both criteria, a lower value indicates a better model.
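The two formulas can be evaluated by hand in a few lines. This is only a sketch: the values of `logL`, `k` and `n` are hypothetical numbers picked to show the arithmetic, and `aicManual` and `bicManual` are illustrative names.

```matlab
% Hypothetical inputs, chosen only to illustrate the arithmetic.
logL = -100;   % maximized log-likelihood, ln(L)
k    = 2;      % number of parameters
n    = 50;     % number of data points

aicManual = 2*k - 2*logL;         % AIC = 2k - 2 ln(L)
bicManual = -2*logL + k*log(n);   % BIC = -2 ln(L) + k ln(n); log() is the natural log

fprintf('AIC = %.3f, BIC = %.3f\n', aicManual, bicManual);
```

With these inputs the AIC is 204 and the BIC is about 207.824; the BIC penalty is larger than the AIC penalty whenever $\ln(n) > 2$.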

For example, let us generate some random values
x = binornd(10,0.5,1,1000);

and pretend that we do not know which distribution they came from. There are two candidate models we want to compare against the data: $X\sim Bino(10,0.5)$ and $X\sim Poiss(5)$.

First, plot the data together with the two theoretical probability mass functions.
[n,xi]=hist(x);  % bin counts and bin centers
bar(xi,n/(sum(n)*(xi(2)-xi(1))),1)  % rescale the counts so the bars have unit area
hold on;
l=min(x):max(x);
px=binopdf(l,10,0.5);
py=poisspdf(l,5);
plot(l,px,'r.');  % theoretical binomial dist
plot(l,py,'rx');  % theoretical poisson dist
hold off;
axis tight;

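Now, use the AIC and BIC for model selection.
px=binopdf(x,10,0.5);  % binomial probability of each observation
px(px == 0)=[];        % drop zero-probability points so that log() stays finite
llx=sum(log(px));      % log-likelihood under Bino(10,0.5)
py=poisspdf(x,5);      % Poisson probability of each observation
py(py == 0)=[];
lly=sum(log(py));      % log-likelihood under Poiss(5)
% arguments: log-likelihoods, parameter counts (2 and 1), numbers of observations
[aic,bic]=aicbic([llx,lly],[2,1],[length(px),length(py)])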


aic =
  1.0e+003 *
    3.7569    3.9542

bic =
  1.0e+003 *
    3.7667    3.9591


The first model, $X\sim Bino(10,0.5)$, has the lower AIC and BIC, so it is the better model for the data.
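As a quick consistency check on the printed output, subtracting the AIC formula from the BIC formula gives a gap that depends only on $k$ and $n$. The check below assumes $n = 1000$ for both models, which holds here because no binomial draw has zero probability under either model, so no points were dropped.

\[BIC - AIC = k\ln(n) - 2k = k\,(\ln(n) - 2)\]

\[k=2:\quad 2\,(\ln 1000 - 2) \approx 9.8, \qquad 3756.9 + 9.8 = 3766.7\]

\[k=1:\quad 1\,(\ln 1000 - 2) \approx 4.9, \qquad 3954.2 + 4.9 = 3959.1\]

Both results match the reported BIC values.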

Next, draw the data from a Poisson distribution instead and rerun the model-selection code above.

x = poissrnd(5,1,10000);

aic =
  1.0e+004 *
    4.4925    4.4069

bic =
  1.0e+004 *
    4.4939    4.4076


This time the second model, $X\sim Poiss(5)$, has the lower AIC and BIC, so it is the better model for the data.
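For completeness, the second experiment can be sketched as one script. It reuses the same toolbox functions as the code earlier in the post (`poissrnd`, `binopdf`, `poisspdf`, `aicbic`); the variable `numObs` is introduced here only for readability. Because the draws are random, the numbers will differ slightly from run to run.

```matlab
% Sketch of the second experiment: the data now come from Poiss(5).
x = poissrnd(5,1,10000);

px = binopdf(x,10,0.5);    % zero for every draw larger than 10
px(px == 0) = [];          % those draws are dropped, as in the earlier code
llx = sum(log(px));        % log-likelihood under Bino(10,0.5)

py = poisspdf(x,5);
py(py == 0) = [];
lly = sum(log(py));        % log-likelihood under Poiss(5)

% The two models may now be scored on different numbers of points,
% because only the binomial assigns zero probability to some draws.
numObs = [length(px), length(py)];
[aic,bic] = aicbic([llx,lly],[2,1],numObs)
```

Note that dropping the zero-probability draws shields the binomial model from observations it cannot produce at all, so the comparison is, if anything, tilted in its favor; the Poisson model still comes out ahead.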
