When choosing among several candidate models, we typically estimate each model's parameters by maximum likelihood. Adding more parameters can increase the likelihood, but it may also result in overfitting.
The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) address the overfitting problem by penalizing the number of parameters in the model.
The formula for the AIC is
\[AIC = 2k - 2\ln(L)\]
and the formula for the BIC is
\[-2\cdot\ln p(x|k)\approx BIC=-2\cdot\ln L + k\ln(n)\]
where $x$ is the observed data, $n$ the number of data points in $x$, $k$ the number of parameters, $p(x|k)$ the likelihood of the observed data given the number of parameters, and $L$ the maximized value of the likelihood function for the estimated model.
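As a quick check of the two formulas, they can be evaluated directly in MATLAB. The following is a minimal sketch; the log-likelihood `logL`, the parameter count `k`, and the sample size `n` are illustrative values assumed here, not the result of an actual fit.

```matlab
% Illustrative inputs (assumed values, not taken from a real fit)
logL = -1876.45;   % maximized log-likelihood, ln(L)
k    = 2;          % number of model parameters
n    = 1000;       % number of data points

aic = 2*k - 2*logL;          % AIC = 2k - 2 ln(L)
bic = -2*logL + k*log(n);    % BIC = -2 ln(L) + k ln(n)

fprintf('AIC = %.1f, BIC = %.1f\n', aic, bic);   % prints AIC = 3756.9, BIC = 3766.7
```

Since $\ln(1000)\approx 6.9$ is larger than 2, the BIC charges more per parameter than the AIC at this sample size.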
For example, let us generate some random values
x = binornd(10,0.5,1,1000);
and pretend that we do not know which distribution they came from. There are two candidate models we want to fit to the data, $X\sim Bino(10,0.5)$ and $X\sim Poiss(5)$.
First, plot the data together with the theoretical probabilities under the two models.
[n,xi]=hist(x,min(x):max(x)); % one bin centered on each integer value
bar(xi,n/(sum(n)*(xi(2)-xi(1))),1)
hold on;
l=min(x):max(x);
px=binopdf(l,10,0.5);
py=poisspdf(l,5);
plot(l,px,'r.'); % theoretical binomial dist
plot(l,py,'rx'); % theoretical poisson dist
hold off;
axis tight;
Now, use the AIC and BIC for model selection:
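px=binopdf(x,10,0.5); % likelihood of each observation under Bino(10,0.5)
px(px == 0)=[]; % drop zero-probability observations so log() stays finite
llx=sum(log(px)); % binomial log-likelihood
py=poisspdf(x,5); % likelihood of each observation under Poiss(5)
py(py == 0)=[];
lly=sum(log(py)); % Poisson log-likelihood
% aicbic(logL,numParam,numObs): 2 parameters for the binomial, 1 for the Poisson
[aic,bic]=aicbic([llx,lly],[2,1],[length(px),length(py)])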
aic =
1.0e+003 *
3.7569 3.9542
bic =
1.0e+003 *
3.7667 3.9591
The first model, $X\sim Bino(10,0.5)$, has the lower AIC and BIC, so it is the better model for this data.
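The same comparison can be made programmatically. This is a small sketch, assuming the `aic` and `bic` vectors hold one entry per candidate model in the order used above; the values are copied from the output above, and the names `models`, `bestAic`, `bestBic`, and `deltaAic` are introduced here purely for illustration.

```matlab
% Criterion values copied from the output above (binomial first, Poisson second)
aic = [3756.9 3954.2];
bic = [3766.7 3959.1];
models = {'Bino(10,0.5)', 'Poiss(5)'};

[~, bestAic] = min(aic);   % index of the model with the smallest AIC
[~, bestBic] = min(bic);   % index of the model with the smallest BIC

fprintf('Lowest AIC: %s\n', models{bestAic});
fprintf('Lowest BIC: %s\n', models{bestBic});

% Gap between each model and the best one; a larger gap means weaker support
deltaAic = aic - min(aic);
```

Only the differences between criterion values matter for the comparison; the absolute size of an AIC or BIC value has no meaning on its own.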
Next, repeat the same steps with random values drawn from a different distribution.
x = poissrnd(5,1,10000);
aic =
1.0e+004 *
4.4925 4.4069
bic =
1.0e+004 *
4.4939 4.4076
This time the second model, $X\sim Poiss(5)$, has the lower AIC and BIC, so it is the better model for the data.
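One detail of the code above deserves care: removing zero-probability observations keeps the log-likelihood finite, but it also hides the fact that a model rules those observations out entirely. The sketch below uses a small hand-picked sample (an assumption for illustration, not data from the runs above) to show the difference; `binopdf` is the same function used earlier, and the names `llAll`, `pxKept`, and `llKept` are introduced here.

```matlab
x = [3 5 12];                 % hand-picked sample; 12 is impossible under Bino(10,0.5)

px = binopdf(x,10,0.5);       % likelihood of each observation; the last entry is 0
llAll = sum(log(px));         % -Inf: the binomial model cannot produce x = 12

pxKept = px(px > 0);          % same filtering idea as in the code above
llKept = sum(log(pxKept));    % finite, but based on only 2 of the 3 observations

fprintf('all data: %g, zeros dropped: %g\n', llAll, llKept);
```

With Poisson-distributed data, values above 10 occur occasionally, so the binomial log-likelihood on the full sample would be $-\infty$. The filtered version therefore compares the two models on slightly different sets of observations, which is worth keeping in mind when reading the numbers.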