Sunday, May 23, 2010

Maximum Likelihood Estimation

In statistics, there are two schools of thought, the frequentist and the Bayesian (more about them in later posts). One of the most popular tools for the frequentist is Maximum Likelihood Estimation (MLE). MLE picks the parameter value $\theta$ under which the observed data ${x_1},...,{x_n}$ are most likely:

 \[\hat\theta = \mathop{\arg\max}\limits_{\theta} L(\theta \mid x_1, \ldots, x_n)\]

The procedure goes like this.

1. Collect the data. For example, height, weight, etc.
2. Make an assumption about the distribution of the data, e.g. a Normal distribution with unknown mean and variance. This fixes the shape of the likelihood $f({x_1},...,{x_n}|\mu ,\sigma)$. The mean and variance are the parameters we want to estimate.
3. There are many candidate values for the mean and variance, i.e. different values of $\mu$ and $\sigma$. Which mean gives the highest likelihood? And which variance gives the highest likelihood?
4. It turns out that, for this case, the answers are the familiar sample mean $\hat\mu = \frac{\sum x_i}{n}$ and sample variance $\hat\sigma^2 = \frac{\sum (x_i - \hat\mu)^2}{n}$, so $\hat\sigma$ is the square root of that quantity. Note that the MLE divides by $n$, not $n-1$.
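The four steps can be checked numerically. The following is a minimal Python sketch, not the method used in this post: the data are simulated (200 made-up heights, assumed independent and Normal), and the helper names `normal_log_likelihood` and `normal_mle` are invented for this illustration. It computes the closed-form estimates and confirms that nudging either parameter never increases the likelihood.

```python
import math
import random

def normal_log_likelihood(data, mu, sigma2):
    """Log-likelihood of i.i.d. Normal(mu, sigma2) observations."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)

def normal_mle(data):
    """Closed-form MLE: sample mean and the divide-by-n variance."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat

# Hypothetical data: 200 simulated heights in cm (not from the post).
random.seed(0)
heights = [random.gauss(170.0, 8.0) for _ in range(200)]

mu_hat, sigma2_hat = normal_mle(heights)
best = normal_log_likelihood(heights, mu_hat, sigma2_hat)

# Nudging either parameter away from the MLE should never raise the likelihood.
nearby = [
    (mu_hat + d_mu, sigma2_hat * scale)
    for d_mu in (-1.0, -0.1, 0.0, 0.1, 1.0)
    for scale in (0.8, 0.95, 1.0, 1.05, 1.25)
]
mle_is_best = all(normal_log_likelihood(heights, m, s) <= best for m, s in nearby)
print(mu_hat, sigma2_hat, mle_is_best)
```

The log of the likelihood is used here instead of the likelihood itself because it is easier to compute and has its maximum at the same parameter values.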

See the proof here.
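
For reference, the standard argument can be sketched in a few lines, assuming the observations are independent and identically distributed (an assumption not spelled out above). Write $\ell$ for the log of the likelihood, which has its maximum at the same place as the likelihood:

\[\ell(\mu ,\sigma^2) = \log f({x_1},...,{x_n}|\mu ,\sigma) = -\frac{n}{2}\log (2\pi \sigma^2) - \frac{1}{2\sigma^2}\sum\limits_{i = 1}^n {({x_i} - \mu)^2}\]

\[\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum\limits_{i = 1}^n {({x_i} - \mu)} = 0 \quad \Rightarrow \quad \hat\mu = \frac{\sum {x_i}}{n}\]

\[\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum\limits_{i = 1}^n {({x_i} - \hat\mu)^2} = 0 \quad \Rightarrow \quad \hat\sigma^2 = \frac{\sum {({x_i} - \hat\mu)^2}}{n}\]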

Another example: A coin was tossed 80 times and yielded 49 heads and 31 tails, but we do not know which coin it was. There are three coins in the basket, with probabilities of heads 1/3, 1/2 and 2/3. Which coin was used?

Naive answer: The proportion of heads is 49/80 = 0.6125. The three coins have head probabilities [0.3333 0.5 0.6667]. 0.6125 is closer to 0.6667 than to either of the other values. Therefore, we guess that coin #3 is the one.
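
The arithmetic behind the naive answer fits in a few lines. This is only an illustrative Python sketch; the variable names are made up for the example.

```python
# Observed proportion of heads and the three candidate head probabilities.
observed = 49 / 80
candidates = [1 / 3, 1 / 2, 2 / 3]

# Pick the candidate closest to the observed proportion.
distances = [abs(observed - p) for p in candidates]
closest = candidates[distances.index(min(distances))]
print(observed, closest)
```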

Better answer: The number of heads in 80 tosses follows a binomial distribution. Evaluating the probability of 49 heads for each value of p gives:
>> binopdf(49,80,1/3)
ans =
  2.0789e-007

>> binopdf(49,80,1/2)
ans =
    0.0118

>> binopdf(49,80,2/3)
ans =
    0.0545


Coin #3 gives the observed data the highest likelihood of the three, so coin #3 is the maximum likelihood choice.
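
The same comparison can be reproduced outside MATLAB. The sketch below is a Python stand-in for `binopdf`: the helper name `binomial_pmf` is made up for this illustration, and `math.comb` assumes Python 3.8 or later. The printed likelihoods should agree with the MATLAB values above to the precision shown there.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

heads, tosses = 49, 80
coins = [1 / 3, 1 / 2, 2 / 3]

# Likelihood of the observed data under each candidate coin.
likelihoods = [binomial_pmf(heads, tosses, p) for p in coins]
best_coin = likelihoods.index(max(likelihoods)) + 1  # 1-based coin number

for number, (p, lik) in enumerate(zip(coins, likelihoods), start=1):
    print(f"coin #{number}: p = {p:.4f}, likelihood = {lik:.4e}")
print("most likely coin:", best_coin)
```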

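The next question: what is the MLE of p if p can be any value between 0 and 1, not just one of the three coins? Dropping the constant binomial coefficient, we differentiate the likelihood with respect to p: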
>> syms p
>> simplify(diff(p^49 * (1-p)^31))
ans =
-p^48*(80*p - 49)*(p - 1)^30

Setting this derivative to zero gives p = 0, p = 1 and p = 49/80. The likelihood is zero at p = 0 and p = 1, so the maximum is at p = 49/80. Thus the MLE for p is 49/80 = 0.6125, the same as the proportion of heads in the data.
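
As a sanity check on the calculus, the maximum can also be located by brute force. The Python sketch below is an illustration only: it swaps the symbolic derivative for a grid search, and it maximizes the log of the likelihood, which peaks at the same p because the logarithm is increasing. The grid size of 8000 is an arbitrary choice that happens to include 49/80 exactly.

```python
import math

heads, tails = 49, 31

def log_likelihood(p):
    """Binomial log-likelihood, up to an additive constant."""
    return heads * math.log(p) + tails * math.log(1 - p)

# Grid over the open interval (0, 1); the endpoints are left out
# because log(0) is undefined (the likelihood itself is zero there).
steps = 8000
grid = [i / steps for i in range(1, steps)]

p_hat = max(grid, key=log_likelihood)
print(p_hat, heads / (heads + tails))
```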
