Tuesday, May 11, 2010

Kernel Density Estimation

Nonparametric probability density estimation can be done using kernel estimators. For a simple univariate case, the kernel estimator is given by:
\[ {\mathop f\limits ^\wedge} _{Ker} (x) = \frac{1}{{nh}}\sum\limits_{i = 1}^n {K(\frac{{x - {X_i}}}{h})} \]
where the function K(t) is call a kernel ($\int {K(t)dt = 1} $).
If we define $ {K_h}(t) = K(t/h)/h$, then
\[ {\mathop f\limits ^\wedge} _{Ker} (x) = \frac{1}{{n}}\sum\limits_{i = 1}^n {K({x - {X_i}})} \]
The smoothing parameter h is called window width, or bandwidth and 1/h is called weight.  A small h yields rough curve while large h yields smooth curve.

The choice of h by Silverman [1986] is $1.06(\sigma){n^{ - 1/5}}$ or $0.786(IQR){n^{ - 1/5}}$ whatever is smaller.

For example, using normal kernel $ K(t) = \frac{1}{{\sqrt {2\pi } }}\exp \left\{ {\frac{{ - {t^2}}}{2}} \right\} $, the following MATLAB code illustrates a kernel density estimation for n=10.

n=10;
kCenter=randn(1,n);
x=linspace(-5,5,100);
fhat=zeros(size(x));
h=1.06*n^(-1/5);
figure;
hold on;
for i=1:n
    f=exp(-((x-kCenter(i))/h).^2/2)/...
      (sqrt(2*pi)*h) /n;
    plot(x,f);
    fhat = fhat + f;
end
plot(x,fhat,'r');
hold off;


The area under the density should be unity. This can be checked by the command:

>> sum(fhat*(x(2)-x(1)))

ans =

    1.0000

Matlab already has the built-in function ksdensity() for the same purpose. 

The method the kernel density estimation here is known as "Parzen Window" method.
ref:  Kernel Density Estimation

No comments:

Post a Comment