Saturday, July 10, 2010

Likelihood Function in Machine Learning

The probability function $p({\bf x}|\theta)$ assigns a probability (density) to any joint configuration of variables $\bf x$ given fixed parameters $\theta$.

Instead, we can think of $p({\bf x}|\theta)$ as a function of $\theta$ for fixed $\bf x$:
\[L(\theta;{\bf x}) = p({\bf x}|\theta)\]
\[l(\theta;{\bf x})=\log p({\bf x}|\theta)\]
This function is called the (log) "likelihood".
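
As a concrete sketch (an assumed example, not from the post), the snippet below evaluates the log-likelihood of a univariate Gaussian with known variance as a function of its mean $\theta$, holding the sample $\bf x$ fixed:

```python
# A minimal sketch: the log-likelihood of a univariate Gaussian with known
# variance (sigma = 1, assumed), viewed as a function of the mean theta
# for a fixed sample x.
import numpy as np
from scipy.stats import norm

x = np.array([1.2, 0.8, 1.5, 0.9, 1.1])      # fixed observed data
thetas = np.linspace(0.0, 2.0, 201)          # candidate parameter values

# l(theta; x) = sum_i log p(x_i | theta)
log_lik = np.array([norm.logpdf(x, loc=t, scale=1.0).sum() for t in thetas])

theta_ml = thetas[np.argmax(log_lik)]        # maximizer of the log-likelihood
print(theta_ml)                              # close to x.mean()
```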

Learning then amounts to choosing the $\theta$ that maximizes some cost function $c(\theta)$:

$c(\theta) = l(\theta;D)$  ==> maximum likelihood (ML)
$c(\theta) = l(\theta;D)+r(\theta)$ ==> maximum a posteriori (MAP) / penalized likelihood (BIC, AIC, ...)
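
For instance (a sketch under assumed choices, not from the post), with a Gaussian prior $N(0,\tau^2)$ on $\theta$ playing the role of the penalty $r(\theta)$, the ML and MAP estimates for the Gaussian-mean example above differ only by the added log-prior term:

```python
# Sketch of the two cost functions for the Gaussian-mean example, assuming
# a Gaussian prior N(0, tau^2) on theta so that r(theta) = log p(theta).
import numpy as np
from scipy.stats import norm

x = np.array([1.2, 0.8, 1.5, 0.9, 1.1])
thetas = np.linspace(0.0, 2.0, 201)
tau = 0.5                                             # prior scale (assumed)

log_lik = np.array([norm.logpdf(x, loc=t, scale=1.0).sum() for t in thetas])
log_prior = norm.logpdf(thetas, loc=0.0, scale=tau)

theta_ml  = thetas[np.argmax(log_lik)]                # c(theta) = l(theta; D)
theta_map = thetas[np.argmax(log_lik + log_prior)]    # c(theta) = l(theta; D) + r(theta)
print(theta_ml, theta_map)                            # MAP estimate is shrunk toward 0
```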

Statistical Machine Learning Problems

There are four basic problems in the machine learning community: density estimation, clustering, classification, and regression.

These problems can also be formulated in a statistical framework.

Regression: $p({\bf y}|{\bf x})=p({\bf y,x})/p({\bf x})=p({\bf y,x})/\int p({\bf y,x})d{\bf y}$
Classification: $p(c|{\bf x})=p(c,{\bf x})/p({\bf x})=p(c,{\bf x})/\sum_c p(c,{\bf x})$
Clustering: $p(c|{\bf x})=p(c,{\bf x})/p({\bf x})$,  $c$ is unobserved
Density Estimation:  $p({\bf y}|{\bf x})=p({\bf y,x})/p({\bf x})$, $\bf x$ is unobserved
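
As a toy illustration (an assumed discrete case), the classification formula above can be computed directly from a joint table $p(c,{\bf x})$ by normalizing over $c$:

```python
# Classification as conditioning on a discrete joint:
# p(c | x) = p(c, x) / sum_c p(c, x).
import numpy as np

# rows: classes c in {0, 1}; columns: feature values x in {0, 1, 2}
joint = np.array([[0.10, 0.20, 0.10],
                  [0.05, 0.15, 0.40]])        # entries sum to 1

p_x = joint.sum(axis=0)                       # marginal p(x) = sum_c p(c, x)
p_c_given_x = joint / p_x                     # broadcasting normalizes each column

print(p_c_given_x[:, 2])                      # posterior over classes for x = 2
```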