Tuesday, June 15, 2010

Model-based Clustering

Model-Based Clustering (MBC) assumes that the data come from several subpopulations, each modeled separately by a density $g_k({\bf x}; {\bf \theta_k})$. The overall population is therefore described by a finite mixture model.

\[f({\bf x};p,{\bf \theta})=\sum\limits_{k=1}^c p_k g_k({\bf x}; {\bf \theta_k})\]
Here $c$ is the number of components. The weights $p_k$, also called the mixing proportions or mixing coefficients, are nonnegative and sum to one.

The most commonly used component density is the multivariate normal (Gaussian), so MBC typically assumes that the data come from a finite mixture of multivariate Gaussians.
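
As a rough illustration of the mixture density, here is a minimal Python sketch. To keep it short it uses univariate normal components instead of multivariate ones, and the function names (`normal_pdf`, `gaussian_mixture_pdf`) and the example weights, means, and standard deviations are made up for this post.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def gaussian_mixture_pdf(x, weights, means, sds):
    """f(x) = sum_k p_k * g_k(x), with each g_k a univariate normal density."""
    if any(p < 0 for p in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must be nonnegative and sum to 1")
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(weights, means, sds))

# Hypothetical two-component mixture (values chosen only for illustration).
weights = [0.3, 0.7]
means = [0.0, 4.0]
sds = [1.0, 1.5]

density_at_1 = gaussian_mixture_pdf(1.0, weights, means, sds)

# Crude Riemann sum over a wide grid: a valid density should integrate to about 1.
step = 0.01
area = sum(gaussian_mixture_pdf(-10.0 + i * step, weights, means, sds) * step
           for i in range(2501))
```

Because the weights sum to one and each component is itself a density, the mixture integrates to one as well, which is what the Riemann sum checks numerically.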

MBC has three components:
1. Agglomerative clustering is used to partition the data. Each partition then supplies the starting values for the EM algorithm.
2. The EM algorithm is used to estimate the parameters of the finite mixture.
3. The Bayesian Information Criterion (BIC) is used to choose the best grouping and number of clusters, given the data. BIC serves as an approximation to Bayes factors when comparing models.
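
The three steps above can be sketched in plain Python. This is only a toy version under several assumptions of mine: the data are one-dimensional, each component has its own variance, the agglomerative step uses simple centroid linkage (real MBC software may use a model-based merging criterion), and BIC is written as $-2\log L + m\log n$, so lower is better (some software reports the negated quantity). All function names and the simulated data are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def agglomerative_partition(data, c):
    """Step 1: merge the two clusters with the closest means until c remain."""
    clusters = [[x] for x in sorted(data)]
    while len(clusters) > c:
        means = [sum(g) / len(g) for g in clusters]
        # In 1-D the clusters stay sorted, so the closest pair is always adjacent.
        i = min(range(len(clusters) - 1), key=lambda j: means[j + 1] - means[j])
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters

def init_from_partition(clusters, n, min_var=1e-3):
    """Turn a partition into starting values (weight, mean, variance) for EM."""
    params = []
    for g in clusters:
        mean = sum(g) / len(g)
        var = sum((x - mean) ** 2 for x in g) / len(g)
        params.append((len(g) / n, mean, max(var, min_var)))
    return params

def log_likelihood(data, params):
    return sum(math.log(sum(p * normal_pdf(x, m, v) for p, m, v in params))
               for x in data)

def em(data, params, max_iter=200, tol=1e-8, min_var=1e-3):
    """Step 2: EM for a univariate Gaussian mixture."""
    n, prev = len(data), -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [p * normal_pdf(x, m, v) for p, m, v in params]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of the proportions, means, and variances.
        new_params = []
        for k in range(len(params)):
            nk = sum(r[k] for r in resp)
            mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk
            new_params.append((nk / n, mean, max(var, min_var)))
        params = new_params
        loglik = log_likelihood(data, params)
        if loglik - prev < tol:
            break
        prev = loglik
    return params, loglik

def bic(loglik, c, n):
    """Step 3: BIC with m = (c - 1) weights + c means + c variances."""
    m = 3 * c - 1
    return -2.0 * loglik + m * math.log(n)

def fit_mbc(data, max_c=4):
    results = {}
    for c in range(1, max_c + 1):
        start = init_from_partition(agglomerative_partition(data, c), len(data))
        params, loglik = em(data, start)
        results[c] = (bic(loglik, c, len(data)), params)
    best_c = min(results, key=lambda c: results[c][0])  # lowest BIC wins
    return best_c, results

# Simulated data from two well-separated groups (illustration only).
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(150)]
        + [random.gauss(12.0, 1.0) for _ in range(150)])
best_c, results = fit_mbc(data)
```

After running this, `best_c` holds the number of components with the lowest BIC and `results[c]` holds the BIC value and fitted parameters for each candidate `c`. The variance floor `min_var` is a crude guard against a component collapsing onto a single point; it is a convenience of this sketch, not part of the method described above.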
