
Maximum Likelihood

The procedure of finding the value of one or more parameters of a given distribution which makes the Likelihood of the observed data a Maximum. The maximum likelihood estimate for a parameter $\mu$ is denoted $\hat\mu$.

For a Bernoulli Distribution,

\begin{displaymath}
{d\over d\theta} \left[{{N\choose Np} \theta^{Np}(1-\theta )^{Nq}}\right]= 0
\quad\Rightarrow\quad Np(1-\theta)-Nq\theta = 0,
\end{displaymath} (1)

so maximum likelihood occurs for $\theta=p$. If $p$ is not known ahead of time, the likelihood function is

$\displaystyle f(x_1,\ldots,x_n\vert p)$ $\textstyle =$ $\displaystyle P(X_1=x_1,\ldots,X_n=x_n\vert p)$  
  $\textstyle =$ $\displaystyle p^{x_1}(1-p)^{1-x_1}\cdots p^{x_n}(1-p)^{1-x_n}$  
  $\textstyle =$ $\displaystyle p^{\Sigma x_i}(1-p)^{\Sigma (1-x_i)} = p^{\Sigma x_i}(1-p)^{n-\Sigma x_i},$ (2)

where each $x_i=0$ or 1 for $i=1$, ..., $n$. The log-likelihood is

\begin{displaymath}
\ln f=\sum x_i\ln p+\left({n-\sum x_i}\right)\ln(1-p)
\end{displaymath} (3)

\begin{displaymath}
{d(\ln f)\over dp} = {\sum x_i\over p}-{n-\sum x_i\over 1-p} = 0
\end{displaymath} (4)

\begin{displaymath}
\sum x_i-p\sum x_i = np-p\sum x_i
\end{displaymath} (5)

\begin{displaymath}
\hat p={\sum x_i\over n}.
\end{displaymath} (6)

For a Gaussian Distribution,

\begin{displaymath}
f(x_1,\ldots,x_n\vert\mu,\sigma) = \prod {1\over\sigma\sqrt{2\pi}}\mathop{\rm exp}\nolimits \left[{-{(x_i-\mu)^2\over 2\sigma^2}}\right] = (2\pi\sigma^2)^{-n/2}\mathop{\rm exp}\nolimits \left[{-{\sum (x_i-\mu)^2\over 2\sigma^2}}\right]
\end{displaymath} (7)

\begin{displaymath}
\ln f=-{\textstyle{1\over 2}}n\ln(2\pi)-n\ln\sigma-{\sum(x_i-\mu)^2\over 2\sigma^2}
\end{displaymath} (8)

\begin{displaymath}
{\partial(\ln f)\over\partial\mu} = {\sum (x_i-\mu)\over\sigma^2}=0
\end{displaymath} (9)

\begin{displaymath}
\hat \mu = {\sum x_i\over n}.
\end{displaymath} (10)

\begin{displaymath}
{\partial(\ln f)\over\partial\sigma} = -{n\over\sigma}+{\sum(x_i-\mu)^2\over\sigma^3} = 0
\end{displaymath} (11)

\begin{displaymath}
\hat\sigma=\sqrt{\sum(x_i-\hat\mu)^2\over n}.
\end{displaymath} (12)

Note that in this case, the maximum likelihood Standard Deviation is the sample Standard Deviation computed with denominator $n$ (rather than $n-1$), which is a Biased Estimator for the population Standard Deviation.
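A short numerical sketch (not from the original article; the data are hypothetical) of equations (10) and (12): the Gaussian MLEs are the sample mean and the $n$-denominator standard deviation.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical sample
n = len(x)

mu_hat = x.sum() / n                                  # eq. (10): sample mean
sigma_hat = np.sqrt(((x - mu_hat) ** 2).sum() / n)    # eq. (12): divides by n
```

Here `sigma_hat` matches `np.std(x)` (NumPy's default `ddof=0` is exactly the biased, $n$-denominator estimator), not `np.std(x, ddof=1)`.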

For a weighted Gaussian Distribution,

\begin{displaymath}
f(x_1,\ldots,x_n\vert\mu,\sigma_1,\ldots,\sigma_n) = \prod {1\over\sigma_i\sqrt{2\pi}}\mathop{\rm exp}\nolimits \left[{-{(x_i-\mu)^2\over 2{\sigma_i}^2}}\right] = {(2\pi)^{-n/2}\over\prod\sigma_i}\mathop{\rm exp}\nolimits \left[{-\sum{(x_i-\mu)^2\over 2{\sigma_i}^2}}\right]
\end{displaymath} (13)

\begin{displaymath}
\ln f=-{\textstyle{1\over 2}}n\ln(2\pi)-\sum \ln\sigma_i-\sum {(x_i-\mu)^2\over 2{\sigma_i}^2}
\end{displaymath} (14)

\begin{displaymath}
{\partial(\ln f)\over\partial \mu} = \sum {(x_i-\mu)\over {\sigma_i}^2} = \sum{x_i\over{\sigma_i}^2}-\mu\sum{1\over{\sigma_i}^2}=0
\end{displaymath} (15)

\begin{displaymath}
\hat \mu = {\sum {x_i\over {\sigma_i}^2}\over \sum{1\over{\sigma_i}^2}}.
\end{displaymath} (16)

The Variance of the Mean is then
\begin{displaymath}
{\sigma_\mu}^2 = \sum {\sigma_i}^2\left({\partial \mu\over\partial x_i}\right)^2.
\end{displaymath} (17)

\begin{displaymath}
{\partial \mu\over\partial x_i} = {\partial\over\partial x_i}\, {\sum (x_j/{\sigma_j}^2)\over \sum (1/{\sigma_j}^2)} = {1/{\sigma_i}^2\over \sum (1/{\sigma_i}^2)},
\end{displaymath} (18)

$\displaystyle {\sigma_\mu}^2$ $\textstyle =$ $\displaystyle \sum {\sigma_i}^2 \left({1/{\sigma_i}^2\over \sum (1/{\sigma_i}^2)}\right)^2$  
  $\textstyle =$ $\displaystyle \sum {1/{\sigma_i}^2\over \left[{\sum (1/{\sigma_i}^2)}\right]^2} = {1\over\sum (1/{\sigma_i}^2)}.$ (19)
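A numerical sketch of equations (16) and (19) (not from the original article; the measurements and uncertainties are hypothetical): the MLE for the mean of points with individual uncertainties $\sigma_i$ is the inverse-variance weighted mean, and its variance is the reciprocal of the summed weights.

```python
import numpy as np

x = np.array([10.0, 12.0, 11.0])      # hypothetical measurements
sigma = np.array([1.0, 2.0, 1.0])     # their standard deviations

w = 1.0 / sigma**2                    # inverse-variance weights
mu_hat = (w * x).sum() / w.sum()      # eq. (16): weighted mean
var_mu = 1.0 / w.sum()                # eq. (19): variance of the mean
```

Note how the noisier point ($\sigma=2$) pulls `mu_hat` only a quarter as hard as the $\sigma=1$ points.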

For a Poisson Distribution,

\begin{displaymath}
f(x_1,\ldots,x_n\vert\lambda) = {e^{-\lambda} \lambda^{x_1}\over x_1!}\cdots{e^{-\lambda} \lambda^{x_n}\over x_n!} = {e^{-n\lambda}\lambda^{\sum x_i} \over x_1!\cdots x_n!}
\end{displaymath} (20)

\begin{displaymath}
\ln f=-n\lambda+(\ln\lambda)\sum x_i-\ln\left({\prod x_i!}\right)
\end{displaymath} (21)

\begin{displaymath}
{d(\ln f)\over d\lambda}=-n+{\sum x_i\over\lambda}=0
\end{displaymath} (22)

\begin{displaymath}
\hat \lambda={\sum x_i\over n}.
\end{displaymath} (23)

See also Bayesian Analysis




© 1996-9 Eric W. Weisstein