
Maximum Likelihood

The procedure of finding the value of one or more parameters of a given distribution which makes the Likelihood of the observed data a Maximum. The maximum likelihood estimate for a parameter $\mu$ is denoted $\hat\mu$.

For a Bernoulli Distribution,

\begin{displaymath}
{d\over d\theta} \left[{{N\choose Np} \theta^{Np}(1-\theta )^{Nq}}\right]= 0
\quad\Rightarrow\quad Np(1-\theta)-Nq\theta = 0,
\end{displaymath} (1)

so maximum likelihood occurs for $\theta=p$. If $p$ is not known ahead of time, the likelihood function is

$\displaystyle f(x_1,\ldots,x_n\vert p)$ $\textstyle =$ $\displaystyle P(X_1=x_1,\ldots,X_n=x_n\vert p)$  
  $\textstyle =$ $\displaystyle p^{x_1}(1-p)^{1-x_1}\cdots p^{x_n}(1-p)^{1-x_n}$  
  $\textstyle =$ $\displaystyle p^{\Sigma x_i}(1-p)^{\Sigma (1-x_i)} = p^{\Sigma x_i}(1-p)^{n-\Sigma x_i},$ (2)

where each $x_i=0$ or 1 for $i=1$, ..., $n$. The log-likelihood is

\begin{displaymath}
\ln f=\sum x_i\ln p+\left({n-\sum x_i}\right)\ln(1-p)
\end{displaymath} (3)

\begin{displaymath}
{d(\ln f)\over dp} = {\sum x_i\over p}-{n-\sum x_i\over 1-p} = 0
\end{displaymath} (4)

\begin{displaymath}
\sum x_i-p\sum x_i = np-p\sum x_i
\end{displaymath} (5)

\begin{displaymath}
\hat p={\sum x_i\over n}.
\end{displaymath} (6)

For a Gaussian Distribution,

\begin{displaymath}
f(x_1,\ldots,x_n\vert\mu,\sigma) = \prod {1\over\sigma\sqrt{2\pi}}\mathop{\rm exp}\nolimits \left[{-{(x_i-\mu)^2\over 2\sigma^2}}\right] = (2\pi\sigma^2)^{-n/2}\mathop{\rm exp}\nolimits \left[{-{\sum (x_i-\mu)^2\over 2\sigma^2}}\right]
\end{displaymath} (7)

\begin{displaymath}
\ln f=-{\textstyle{1\over 2}}n\ln(2\pi)-n\ln\sigma-{\sum(x_i-\mu)^2\over 2\sigma^2}
\end{displaymath} (8)

\begin{displaymath}
{\partial(\ln f)\over\partial\mu} = {\sum (x_i-\mu)\over\sigma^2}=0
\end{displaymath} (9)

\begin{displaymath}
\hat \mu = {\sum x_i\over n}.
\end{displaymath} (10)

\begin{displaymath}
{\partial(\ln f)\over\partial\sigma} = -{n\over\sigma}+{\sum(x_i-\mu)^2\over\sigma^3} = 0
\end{displaymath} (11)

\begin{displaymath}
\hat\sigma=\sqrt{\sum(x_i-\hat\mu)^2\over n}.
\end{displaymath} (12)

Note that in this case, the maximum likelihood Standard Deviation is the sample Standard Deviation computed with denominator $n$ (rather than $n-1$), which is a Biased Estimator for the population Standard Deviation.
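A short numerical sketch (not from the original article; the data are hypothetical) of equations (10) and (12): the Gaussian MLEs are the sample mean and the $n$-denominator standard deviation.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical sample
n = len(x)

mu_hat = x.sum() / n                                  # eq. (10): sample mean
sigma_hat = np.sqrt(((x - mu_hat) ** 2).sum() / n)    # eq. (12): divides by n
```

Here `sigma_hat` matches `np.std(x)` (NumPy's default `ddof=0` is exactly the biased, $n$-denominator estimator), not `np.std(x, ddof=1)`.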

For a weighted Gaussian Distribution,

\begin{displaymath}
f(x_1,\ldots,x_n\vert\mu,\sigma_1,\ldots,\sigma_n) = \prod {1\over\sigma_i\sqrt{2\pi}}\mathop{\rm exp}\nolimits \left[{-{(x_i-\mu)^2\over 2{\sigma_i}^2}}\right] = {(2\pi)^{-n/2}\over\prod\sigma_i}\mathop{\rm exp}\nolimits \left[{-\sum{(x_i-\mu)^2\over 2{\sigma_i}^2}}\right]
\end{displaymath} (13)

\begin{displaymath}
\ln f=-{\textstyle{1\over 2}}n\ln(2\pi)-\sum \ln\sigma_i-\sum {(x_i-\mu)^2\over 2{\sigma_i}^2}
\end{displaymath} (14)

\begin{displaymath}
{\partial(\ln f)\over\partial \mu} = \sum {(x_i-\mu)\over {\sigma_i}^2} = \sum{x_i\over{\sigma_i}^2}-\mu\sum{1\over{\sigma_i}^2}=0
\end{displaymath} (15)

\begin{displaymath}
\hat \mu = {\sum {x_i\over {\sigma_i}^2}\over \sum{1\over{\sigma_i}^2}}.
\end{displaymath} (16)

The Variance of the Mean is then
\begin{displaymath}
{\sigma_\mu}^2 = \sum {\sigma_i}^2\left({\partial \mu\over\partial x_i}\right)^2.
\end{displaymath} (17)

\begin{displaymath}
{\partial \mu\over\partial x_i} = {\partial\over\partial x_i}\, {\sum (x_j/{\sigma_j}^2)\over \sum (1/{\sigma_j}^2)} = {1/{\sigma_i}^2\over \sum (1/{\sigma_i}^2)},
\end{displaymath} (18)

$\displaystyle {\sigma_\mu}^2$ $\textstyle =$ $\displaystyle \sum {\sigma_i}^2 \left({1/{\sigma_i}^2\over \sum (1/{\sigma_i}^2)}\right)^2$  
  $\textstyle =$ $\displaystyle \sum {1/{\sigma_i}^2\over \left[{\sum (1/{\sigma_i}^2)}\right]^2} = {1\over\sum (1/{\sigma_i}^2)}.$ (19)
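A numerical sketch of equations (16) and (19) (not from the original article; the measurements and uncertainties are hypothetical): the MLE for the mean of points with individual uncertainties $\sigma_i$ is the inverse-variance weighted mean, and its variance is the reciprocal of the summed weights.

```python
import numpy as np

x = np.array([10.0, 12.0, 11.0])      # hypothetical measurements
sigma = np.array([1.0, 2.0, 1.0])     # their standard deviations

w = 1.0 / sigma**2                    # inverse-variance weights
mu_hat = (w * x).sum() / w.sum()      # eq. (16): weighted mean
var_mu = 1.0 / w.sum()                # eq. (19): variance of the mean
```

Note how the noisier point ($\sigma=2$) pulls `mu_hat` only a quarter as hard as the $\sigma=1$ points.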

For a Poisson Distribution,

\begin{displaymath}
f(x_1,\ldots,x_n\vert\lambda) = {e^{-\lambda} \lambda^{x_1}\over x_1!}\cdots{e^{-\lambda} \lambda^{x_n}\over x_n!} = {e^{-n\lambda}\lambda^{\sum x_i} \over x_1!\cdots x_n!}
\end{displaymath} (20)

\begin{displaymath}
\ln f=-n\lambda+(\ln\lambda)\sum x_i-\ln\left({\prod x_i!}\right)
\end{displaymath} (21)

\begin{displaymath}
{d(\ln f)\over d\lambda}=-n+{\sum x_i\over\lambda}=0
\end{displaymath} (22)

\begin{displaymath}
\hat \lambda={\sum x_i\over n}.
\end{displaymath} (23)

See also Bayesian Analysis




© 1996-9 Eric W. Weisstein