Gaussian Distribution

$\begin{figure}\begin{center}\BoxedEPSF{NormalDistribution.epsf scaled 650}\end{center}\end{figure}$

The Gaussian probability distribution with Mean $\mu$ and Standard Deviation $\sigma$ is a Gaussian Function of the form

$\begin{displaymath} P(x) = {1\over\sigma \sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2}, \end{displaymath}$

(1)

where $P(x)\,dx$ gives the probability that a variate with a Gaussian distribution takes on a value in the range $[x, x+dx]$. This distribution is also called the Normal Distribution or, because of its curved flaring shape, the Bell Curve. The distribution $P(x)$ is properly normalized for $x\in(-\infty, \infty)$ since

$\begin{displaymath} \int_{-\infty}^\infty P(x)\,dx=1. \end{displaymath}$

(2)

The cumulative Distribution Function, which gives the probability that a variate will assume a value $\leq x$ , is then

$\begin{displaymath} D(x)\equiv \int_{-\infty}^x P(x')\,dx' = {1\over \sigma\sqrt{2\pi}}\int_{-\infty}^x e^{-(x'-\mu)^2/2\sigma^2}\,dx'. \end{displaymath}$

(3)
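Equations (1) through (3) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the test values of $\mu$ and $\sigma$, and the truncation of the infinite range to $\mu\pm 10\sigma$ are illustrative assumptions, and the closed form of $D(x)$ in terms of Erf is the standard identity rather than something derived at this point in the text.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density P(x) from equation (1)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution D(x) of equation (3), written with erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

mu, sigma = 1.5, 2.0   # arbitrary illustrative parameters

# Equation (2): total probability, with the range truncated to mu +/- 10 sigma.
total = trapezoid(lambda t: gaussian_pdf(t, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)

# Equation (3): integrate the density up to x and compare with the erf form.
x = 2.7
numeric_cdf = trapezoid(lambda t: gaussian_pdf(t, mu, sigma), mu - 10 * sigma, x)

print(total, numeric_cdf, gaussian_cdf(x, mu, sigma))
```

The truncated integral should agree with 1 to well within the accuracy of the quadrature, and the two values of $D(x)$ should agree closely.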

Gaussian distributions have many convenient properties, so random variates with unknown distributions are often assumed to be Gaussian, especially in physics and astronomy. Although this can be a dangerous assumption, it is often a good approximation due to a surprising result known as the Central Limit Theorem. This theorem states that the distribution of the Mean of $N$ independent variates drawn from any distribution having a finite Mean and Variance tends to a Gaussian distribution as $N$ becomes large. Many common attributes such as test scores and height follow roughly Gaussian distributions, with few members at the high and low ends and many in the middle.
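The Central Limit Theorem can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the parent distribution (uniform on (0,1)), the number of variates per mean, the number of trials, and the seed are all arbitrary choices made here.

```python
import random
import statistics

rng = random.Random(12345)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n independent Uniform(0, 1) variates (a non-Gaussian parent)."""
    return sum(rng.random() for _ in range(n)) / n

n = 30            # variates averaged per mean
trials = 20000    # number of means drawn
means = [sample_mean(n) for _ in range(trials)]

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the means should be
# approximately Gaussian with mean 1/2 and standard deviation sqrt(1/(12 n)).
expected_sd = (1.0 / (12.0 * n)) ** 0.5
observed_mean = statistics.fmean(means)
observed_sd = statistics.pstdev(means)

# For a Gaussian, about 68.27% of values lie within one standard deviation.
within_one_sd = sum(abs(m - 0.5) <= expected_sd for m in means) / trials

print(observed_mean, observed_sd, expected_sd, within_one_sd)
```

The fraction of means within one standard deviation of 1/2 should come out close to the Gaussian value of roughly 0.683, even though the individual variates are uniform.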

Making the transformation

$\begin{displaymath} z\equiv{x-\mu\over\sigma} \end{displaymath}$

(4)

so that $dz=dx/\sigma$ gives a variate with unit Variance and 0 Mean

$\begin{displaymath} P(x)\,dx={1\over\sqrt{2\pi}}e^{-z^2/2}\,dz, \end{displaymath}$

(5)

known as a standard Normal Distribution. So defined, $z$ is known as a z-Score.

The Normal Distribution Function gives the probability that a standard normal variate assumes a value in the interval $[0, z]$,

$\begin{displaymath} \Phi(z)\equiv {1\over\sqrt{2\pi}}\int_0^z e^{-x^2/2}\,dx = {\textstyle{1\over 2}}\mathop{\rm erf}\nolimits \left({z\over\sqrt{2}}\right). \end{displaymath}$

(6)

Here, Erf is a function sometimes called the error function. Neither $\Phi(z)$ nor Erf can be expressed in terms of finite additions, subtractions, multiplications, and root extractions, and so both must be either computed numerically or otherwise approximated. The value of $a$ for which $x$ falls within the interval $[-a, a]$ with a given probability $P$ is called the $P$ Confidence Interval.
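Because Erf has no closed-form inverse in elementary functions, the half-width of a Confidence Interval is usually found numerically. The following sketch, for a standard normal variate, solves $\mathop{\rm erf}(a/\sqrt{2})=P$ by bisection; the function names and the tolerance are assumptions made for illustration, and bisection is just one simple choice of root finder.

```python
import math

def central_probability(a):
    """Probability that a standard normal variate lies in [-a, a]."""
    return math.erf(a / math.sqrt(2.0))

def confidence_half_width(p, tol=1e-12):
    """Solve central_probability(a) = p for a by bisection, with 0 < p < 1."""
    lo, hi = 0.0, 1.0
    while central_probability(hi) < p:   # grow the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if central_probability(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in (0.6827, 0.95, 0.99):
    print(p, confidence_half_width(p))
```

For a variate with Mean $\mu$ and Standard Deviation $\sigma$, the corresponding interval is $[\mu-a\sigma, \mu+a\sigma]$ by the z-Score transformation.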

The Gaussian distribution is also closely related to the Chi-Squared Distribution, since the square of a standardized Gaussian variate is Chi-Squared distributed. Substituting

$\begin{displaymath} z\equiv {(x-\mu)^2\over\sigma^2} \end{displaymath}$

(7)

so that

$\begin{displaymath} \vert dz\vert={2\vert x-\mu\vert\over\sigma^2}\,dx = {2\sqrt{z}\over\sigma}\,dx \end{displaymath}$

(8)

(where an extra factor of 2 must be included in the density of $z$, since the two values $x-\mu=\pm\sigma\sqrt{z}$ map to the same $z$, which runs from 0 to $\infty$ instead of from $-\infty$ to $\infty$ ), gives

$\displaystyle P(z)\,dz$	$\textstyle =$	$\displaystyle 2\,{1\over\sigma\sqrt{2\pi}}e^{-z/2}\,{\sigma\over 2\sqrt{z}}\,dz = {1\over\sqrt{2\pi}}e^{-z/2}z^{-1/2}\,dz$
	$\textstyle =$	$\displaystyle {1\over 2^{1/2}\Gamma({\textstyle{1\over 2}})}e^{-z/2}z^{-1/2}\,dz,$	(9)

which is a Chi-Squared Distribution in $z$ with $r=1$ degree of freedom (i.e., a Gamma Distribution with $\alpha=1/2$ and $\theta=2$ ).
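The relation to the Chi-Squared Distribution can be checked by simulation. The sketch below draws Gaussian variates, forms $z$ as in equation (7), and compares the empirical distribution with the one-degree-of-freedom Chi-Squared cumulative distribution, which equals $\mathop{\rm erf}(\sqrt{t/2})$. The parameter values, sample size, and seed are arbitrary assumptions.

```python
import math
import random

rng = random.Random(2024)       # fixed seed for repeatability
mu, sigma = 3.0, 1.7            # arbitrary illustrative parameters
n = 100000

# z = (x - mu)^2 / sigma^2 as in equation (7), for Gaussian x.
z_values = [((rng.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(n)]

def chi2_one_dof_cdf(t):
    """CDF of a Chi-Squared variate with one degree of freedom."""
    return math.erf(math.sqrt(t / 2.0))

for t in (0.5, 1.0, 2.0, 4.0):
    empirical = sum(z <= t for z in z_values) / n
    print(t, empirical, chi2_one_dof_cdf(t))

# A Gamma Distribution with alpha = 1/2 and theta = 2 has mean alpha * theta = 1.
sample_mean = sum(z_values) / n
print(sample_mean)
```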

Cramer showed in 1936 that if $X$ and $Y$ are Independent variates and $X+Y$ has a Gaussian distribution, then both $X$ and $Y$ must be Gaussian (Cramer's Theorem).

The ratio of two independent Gaussian-distributed variates with zero Mean is distributed with a Cauchy Distribution. This can be seen as follows. Let $X$ and $Y$ both have Mean 0 and standard deviations of $\sigma_x$ and $\sigma_y$ , respectively. Then the joint probability density function is the Gaussian Bivariate Distribution with $\rho=0$ ,

$\begin{displaymath} f(x,y)={1\over 2\pi\sigma_x\sigma_y} e^{-[x^2/(2{\sigma_x}^2)+y^2/(2{\sigma_y}^2)]}. \end{displaymath}$

(10)

From Ratio Distribution, the distribution of $U=Y/X$ is

$\displaystyle P(u)$	$\textstyle =$	$\displaystyle \int_{-\infty}^\infty \vert x\vert f(x,ux)\,dx$
	$\textstyle =$	$\displaystyle {1\over 2\pi\sigma_x\sigma_y} \int_{-\infty}^\infty \vert x\vert e^{-[x^2/(2{\sigma_x}^2)+u^2 x^2/(2{\sigma_y}^2)]}\,dx$
	$\textstyle =$	$\displaystyle {1\over\pi\sigma_x\sigma_y} \int_0^\infty x\mathop{\rm exp}\nolimits \left[{-x^2\left({{1\over 2{\sigma_x}^2}+{u^2\over 2{\sigma_y}^2}}\right)}\right]\,dx.$
			(11)

But

$\begin{displaymath} \int_0^\infty xe^{-ax^2}\,dx =\left[{-{1\over 2a} e^{-ax^2}}\right]_0^\infty={1\over 2a}[0-(-1)] = {1\over 2a}, \end{displaymath}$

(12)

so

$\displaystyle P(u)$	$\textstyle =$	$\displaystyle {1\over\pi\sigma_x\sigma_y} {1\over 2\left({{1\over 2{\sigma_x}^2}+{u^2\over 2{\sigma_y}^2}}\right)} = {1\over\pi} {\sigma_x\sigma_y\over u^2{\sigma_x}^2+{\sigma_y}^2}$
	$\textstyle =$	$\displaystyle {1\over\pi} {{\sigma_y\over\sigma_x}\over u^2+\left({\sigma_y\over\sigma_x}\right)^2},$	(13)

which is a Cauchy Distribution with Median $\mu=0$ (the Mean of a Cauchy Distribution is undefined) and full width

$\begin{displaymath} \Gamma={2\sigma_y\over\sigma_x}. \end{displaymath}$

(14)
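The Cauchy result can be checked by simulation. Since a Cauchy Distribution has no finite moments, the sketch below compares sample quantiles instead: for half-width $\Gamma/2=\sigma_y/\sigma_x$ the quartiles lie at $\pm\Gamma/2$ and the median at 0. The values of $\sigma_x$ and $\sigma_y$, the sample size, and the seed are arbitrary assumptions.

```python
import random

rng = random.Random(7)          # fixed seed for repeatability
sigma_x, sigma_y = 2.0, 3.0     # arbitrary illustrative parameters
n = 200000

# u = y / x for independent zero-mean Gaussian x and y, sorted for quantiles.
ratios = sorted(rng.gauss(0.0, sigma_y) / rng.gauss(0.0, sigma_x) for _ in range(n))

half_width = sigma_y / sigma_x  # Gamma / 2 from equation (14)

# Sample quartiles and median read off the sorted list.
q1 = ratios[n // 4]
median = ratios[n // 2]
q3 = ratios[(3 * n) // 4]

print(q1, median, q3, half_width)
```

The quartiles should land near $-\sigma_y/\sigma_x$ and $+\sigma_y/\sigma_x$, whereas the sample mean of the ratios would not settle down as the sample grows.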

The Characteristic Function for the Gaussian distribution is

$\begin{displaymath} \phi(t)=e^{i\mu t-\sigma^2 t^2/2}, \end{displaymath}$

(15)

and the Moment-Generating Function is

$\displaystyle M(t)$	$\textstyle =$	$\displaystyle \left\langle{e^{tx}}\right\rangle{} = \int_{-\infty}^\infty {e^{tx}\over \sigma \sqrt{2\pi }} e^{-(x-\mu)^2/2\sigma^2}\,dx$
	$\textstyle =$	$\displaystyle {1\over \sigma \sqrt{2\pi}} \int_{-\infty}^\infty \!\mathop{\rm exp}\nolimits \left\{{- {1\over 2\sigma^2} \left[{x^2-2(\mu +\sigma^2t)x+\mu^2}\right]}\right\}\,dx.$	(16)

Completing the Square in the exponent,

$\begin{displaymath} {1\over 2\sigma^2} [x^2-2(\mu +\sigma^2t)x+\mu^2]= {1\over 2\sigma^2}\left\{{[x-(\mu +\sigma^2t)]^2+[\mu^2-(\mu +\sigma^2t)^2]}\right\}. \end{displaymath}$

(17)

Let

$\displaystyle y$	$\textstyle \equiv$	$\displaystyle x-(\mu +\sigma^2t)$	(18)
$\displaystyle dy$	$\textstyle =$	$\displaystyle dx$	(19)
$\displaystyle a$	$\textstyle \equiv$	$\displaystyle {1\over 2\sigma^2}.$	(20)

The integral then becomes

$\displaystyle M(t)$	$\textstyle =$	$\displaystyle {1\over\sigma\sqrt{ 2\pi}} \int_{-\infty}^\infty \mathop{\rm exp}\nolimits \left[{-ay^2 + {2\mu\sigma^2t+\sigma^4t^2\over 2\sigma^2}}\right]\,dy$
	$\textstyle =$	$\displaystyle {1\over\sigma\sqrt{2\pi}} \int_{-\infty}^\infty \mathop{\rm exp}\nolimits [-ay^2+\mu t+{\textstyle{1\over 2}}\sigma^2 t^2]\,dy$
	$\textstyle =$	$\displaystyle {1\over\sigma\sqrt{2\pi}} e^{\mu t+\sigma^2t^2/2}\int_{-\infty}^\infty e^{-ay^2}\,dy$
	$\textstyle =$	$\displaystyle {1\over\sigma\sqrt{2\pi}} \sqrt{\pi\over a} \,e^{\mu t+\sigma^2t^2/2}$
	$\textstyle =$	$\displaystyle {\sqrt{2\sigma^2\pi}\over \sigma\sqrt{2\pi}}\, e^{\mu t+\sigma^2t^2/2} = e^{\mu t + \sigma^2t^2/2},$	(21)

$\displaystyle M'(t)$	$\textstyle =$	$\displaystyle (\mu +\sigma^2t)e^{\mu t+\sigma^2 t^2/2}$	(22)
$\displaystyle M''(t)$	$\textstyle =$	$\displaystyle \sigma^2e^{\mu t+\sigma^2t^2/2} +e^{\mu t+\sigma^2t^2/2}(\mu +t\sigma^2)^2,$	(23)

and

$\displaystyle \mu$	$\textstyle =$	$\displaystyle M'(0) = \mu$	(24)
$\displaystyle \sigma^2$	$\textstyle =$	$\displaystyle M''(0)-[M'(0)]^2$
	$\textstyle =$	$\displaystyle (\sigma^2+\mu^2)-\mu^2=\sigma^2.$	(25)

These can also be computed using

$\displaystyle R(t)$	$\textstyle =$	$\displaystyle \ln[M(t)] = \mu t + {\textstyle{1\over 2}}\sigma^2t^2$	(26)
$\displaystyle R'(t)$	$\textstyle =$	$\displaystyle \mu + \sigma^2t$	(27)
$\displaystyle R''(t)$	$\textstyle =$	$\displaystyle \sigma^2,$	(28)

yielding, as before,

$\displaystyle \mu$	$\textstyle =$	$\displaystyle R'(0) = \mu$	(29)
$\displaystyle \sigma^2$	$\textstyle =$	$\displaystyle R''(0) = \sigma^2.$	(30)
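The derivative relations in equations (22) through (30) can be checked with finite differences. This is a minimal sketch; the parameter values, the step size, and the helper names are illustrative assumptions, and central differences are used only as a convenient numerical stand-in for the analytic derivatives.

```python
import math

mu, sigma = 0.8, 1.3   # arbitrary illustrative parameters

def M(t):
    """Moment-generating function from equation (21)."""
    return math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

def R(t):
    """Logarithm of M(t), equation (26)."""
    return math.log(M(t))

def first_derivative(f, t, h=1e-4):
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def second_derivative(f, t, h=1e-4):
    """Central-difference estimate of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

# Equations (24) and (25): mean and variance from M.
mean_from_M = first_derivative(M, 0.0)
var_from_M = second_derivative(M, 0.0) - mean_from_M ** 2

# Equations (29) and (30): the same quantities from R = ln M.
mean_from_R = first_derivative(R, 0.0)
var_from_R = second_derivative(R, 0.0)

print(mean_from_M, var_from_M, mean_from_R, var_from_R)
```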

The moments can also be computed directly by computing the Moments about the origin $\mu'_n\equiv\left\langle{x^n}\right\rangle{}$ ,

$\begin{displaymath} \mu'_n={1\over\sigma\sqrt{2\pi}} \int_{-\infty}^\infty x^ne^{-(x-\mu)^2/2\sigma^2}\,dx. \end{displaymath}$

(31)

Now let

$\displaystyle u$	$\textstyle \equiv$	$\displaystyle {x-\mu\over\sqrt{2}\sigma}$	(32)
$\displaystyle du$	$\textstyle =$	$\displaystyle {dx\over\sqrt{2}\sigma}$	(33)
$\displaystyle x$	$\textstyle =$	$\displaystyle \sigma u\sqrt{2}+\mu,$	(34)

giving

$\begin{displaymath} \mu'_n={\sqrt{2}\,\sigma\over\sigma\sqrt{2\pi}} \int_{-\infty}^\infty x^n e^{-u^2}\,du = {1\over\sqrt{\pi}} \int_{-\infty}^\infty x^n e^{-u^2}\,du, \end{displaymath}$

(35)

$\displaystyle \mu'_0$	$\textstyle =$	$\displaystyle 1$	(36)
$\displaystyle \mu'_1$	$\textstyle =$	$\displaystyle {1\over\sqrt{\pi}} \int_{-\infty}^\infty xe^{-u^2}\,du$
	$\textstyle =$	$\displaystyle {1\over\sqrt{\pi}} \int_{-\infty}^\infty (\sqrt{2}\,\sigma u+\mu)e^{-u^2}\,du$
	$\textstyle =$	$\displaystyle [\sqrt{2}\sigma H_1(1)+\mu H_0(1)] = (0+\mu)=\mu$	(37)
$\displaystyle {\mu'}_2$	$\textstyle =$	$\displaystyle {1\over\sqrt{\pi}} \int_{-\infty}^\infty x^2e^{-u^2}\,du$
	$\textstyle =$	$\displaystyle {1\over\sqrt{\pi}} \int_{-\infty}^\infty (2\sigma^2u^2+2\sqrt{2}\,\sigma\mu u+\mu^2)e^{-u^2}\,du$
	$\textstyle =$	$\displaystyle [2\sigma^2 H_2(1)+2\sqrt{2}\,\sigma\mu H_1(1)+\mu^2 H_0(1)]$
	$\textstyle =$	$\displaystyle (2\sigma^2{\textstyle{1\over 2}}+0+\mu^2)=\mu^2+\sigma^2$	(38)
$\displaystyle \mu'_3$	$\textstyle =$	$\displaystyle {1\over\sqrt{\pi}} \int_{-\infty}^\infty x^3e^{-u^2}\,du$
	$\textstyle =$	$\displaystyle {1\over\sqrt{\pi}} \int_{-\infty}^\infty (2\sqrt{2}\,\sigma^3 u^3+6\mu\sigma^2u^2+3\sqrt{2}\,\mu^2\sigma u+\mu^3)e^{-u^2}\,du$
	$\textstyle =$	$\displaystyle [2\sqrt{2}\,\sigma^3 H_3(1)+6\mu\sigma^2 H_2(1)+3\sqrt{2}\, \mu^2\sigma H_1(1)+\mu^3 H_0(1)]$
	$\textstyle =$	$\displaystyle (0+6\mu\sigma^2{\textstyle{1\over 2}}+0+\mu^3) = \mu(\mu^2+3\sigma^2)$	(39)
$\displaystyle \mu'_4$	$\textstyle =$	$\displaystyle {1\over\sqrt{\pi}} \int_{-\infty}^\infty x^4e^{-u^2}\,du$
	$\textstyle =$	$\displaystyle {1\over\sqrt{\pi}} \int_{-\infty}^\infty (4\sigma^4u^4+8\sqrt{2}\,\mu\sigma^3 u^3+12\mu^2\sigma^2 u^2+4\sqrt{2}\,\mu^3\sigma u+\mu^4)e^{-u^2}\,du$
	$\textstyle =$	$\displaystyle [4\sigma^4H_4(1)+8\sqrt{2}\,\mu\sigma^3 H_3(1)+12\mu^2\sigma^2 H_2(1)+4\sqrt{2}\,\mu^3\sigma H_1(1)+\mu^4 H_0(1)]$
	$\textstyle =$	$\displaystyle (4\sigma^4 {\textstyle{3\over 4}}+0+12\mu^2\sigma^2{\textstyle{1\over 2}}+0+\mu^4)$
	$\textstyle =$	$\displaystyle \mu^4+6\mu^2\sigma^2+3\sigma^4,$	(40)

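where $H_n(1)\equiv{1\over\sqrt{\pi}}\int_{-\infty}^\infty u^ne^{-u^2}\,du$ are Gaussian Integrals, with $H_0(1)=1$, $H_2(1)={\textstyle{1\over 2}}$, $H_4(1)={\textstyle{3\over 4}}$, and $H_n(1)=0$ for odd $n$.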

Now find the Moments about the Mean,

$\displaystyle \mu_1$	$\textstyle \equiv$	$\displaystyle 0$	(41)
$\displaystyle \mu_2$	$\textstyle \equiv$	$\displaystyle \mu'_2-(\mu'_1)^2=(\mu^2+\sigma^2)-\mu^2=\sigma^2$	(42)
$\displaystyle \mu_3$	$\textstyle \equiv$	$\displaystyle \mu'_3-3\mu'_2\mu'_1+2(\mu'_1)^3$
	$\textstyle =$	$\displaystyle \mu(\mu^2+3\sigma^2)-3(\sigma^2+\mu^2)\mu+2\mu^3=0$	(43)
$\displaystyle \mu_4$	$\textstyle \equiv$	$\displaystyle \mu'_4-4\mu'_3\mu'_1+6\mu'_2(\mu'_1)^2-3(\mu'_1)^4$
	$\textstyle =$	$\displaystyle (\mu^4+6\mu^2\sigma^2+3\sigma^4)-4(\mu^3+3\mu\sigma^2)\mu+6(\mu^2+\sigma^2)\mu^2-3\mu^4$
	$\textstyle =$	$\displaystyle 3\sigma^4,$	(44)

so the Variance, Standard Deviation, Skewness, and Kurtosis are given by

$\displaystyle \mathop{\rm var}\nolimits (x)$	$\textstyle \equiv$	$\displaystyle \mu_2 = \sigma^2$	(45)
$\displaystyle {\rm stdv}\,(x)$	$\textstyle \equiv$	$\displaystyle \sqrt{\mathop{\rm var}\nolimits (x)} = \sigma$	(46)
$\displaystyle \gamma_1$	$\textstyle =$	$\displaystyle {\mu_3\over\sigma^3}=0$	(47)
$\displaystyle \gamma_2$	$\textstyle =$	$\displaystyle {\mu_4\over\sigma^4}-3 = {3\sigma^4\over\sigma^4}-3=0.$	(48)
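The moment formulas above can be verified by direct numerical integration of equation (31). In the sketch below, the parameter values, the trapezoidal rule, and the truncation of the range to $\mu\pm 12\sigma$ are assumptions chosen for illustration.

```python
import math

mu, sigma = 1.2, 0.7   # arbitrary illustrative parameters

def pdf(x):
    """Gaussian density of equation (1)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def raw_moment(n, steps=40000, span=12.0):
    """Trapezoidal estimate of <x^n>, equation (31), over mu +/- span*sigma."""
    a, b = mu - span * sigma, mu + span * sigma
    h = (b - a) / steps
    total = 0.5 * (a ** n * pdf(a) + b ** n * pdf(b))
    for i in range(1, steps):
        x = a + i * h
        total += x ** n * pdf(x)
    return total * h

m1, m2, m3, m4 = (raw_moment(n) for n in (1, 2, 3, 4))

# Moments about the mean, equations (42) to (44).
c2 = m2 - m1 ** 2
c3 = m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3
c4 = m4 - 4.0 * m3 * m1 + 6.0 * m2 * m1 ** 2 - 3.0 * m1 ** 4

skewness = c3 / c2 ** 1.5              # equation (47)
kurtosis_excess = c4 / c2 ** 2 - 3.0   # equation (48)

print(m1, m2, m3, m4)
print(c2, skewness, kurtosis_excess)
```

The raw moments should match equations (37) to (40), and both the Skewness and the Kurtosis excess should vanish to within numerical error.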

The Variance of the Sample Variance $s^2$ for a sample of size $N$ taken from a population with a Gaussian distribution is, in terms of the Moments about the Mean $\mu_2=\sigma^2$ and $\mu_4=3\sigma^4$,

$\displaystyle \mathop{\rm var}\nolimits (s^2)$	$\textstyle =$	$\displaystyle {(N-1)[(N-1)\mu_4-(N-3){\mu_2}^2]\over N^3}$
	$\textstyle =$	$\displaystyle {(N-1)\over N^3} [3(N-1)\sigma^4-(N-3)\sigma^4].$	(49)

Because the Moments about the Mean do not depend on $\mu$, this expression simplifies for any $\mu$ to

$\begin{displaymath} \mathop{\rm var}\nolimits (s^2) = {2(N-1)N\sigma^4\over N^3} = {2\sigma^4(N-1)\over N^2}, \end{displaymath}$

(50)

and the Standard Error of $s^2$, the square root of this Variance, is

$\begin{displaymath} \hbox{[standard error]}={\sigma^2\sqrt{2(N-1)}\over N}. \end{displaymath}$

(51)
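Equation (50) can be compared against a Monte Carlo estimate. The sketch below assumes that the Sample Variance is normalized by $1/N$ (the moment about the sample mean), which is the convention consistent with the $N^2$ in the denominator; it uses $\mu=0$, and the values of $\sigma$, $N$, the number of trials, and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(99)   # fixed seed for repeatability
sigma = 1.5               # arbitrary illustrative value
N = 10                    # sample size
trials = 50000            # number of independent samples

def sample_variance(values):
    """Sample variance with 1/N normalization (assumed convention)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Draw many samples with mu = 0 and record the sample variance of each.
s2 = [sample_variance([rng.gauss(0.0, sigma) for _ in range(N)]) for _ in range(trials)]

observed = statistics.pvariance(s2)
predicted = 2.0 * sigma ** 4 * (N - 1) / N ** 2   # equation (50)

print(observed, predicted)
```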

The Cumulant-Generating Function for a Gaussian distribution is

$\begin{displaymath} K(h)=\ln(e^{\nu_1h}e^{\sigma^2 h^2/2}) = \nu_1h+{\textstyle{1\over 2}}\sigma^2 h^2, \end{displaymath}$

(52)

$\displaystyle \kappa_1$	$\textstyle =$	$\displaystyle \nu_1$	(53)
$\displaystyle \kappa_2$	$\textstyle =$	$\displaystyle \sigma^2$	(54)
$\displaystyle \kappa_r$	$\textstyle =$	$\displaystyle 0 \quad {\rm for\ } r>2.$	(55)

For Gaussian variates, $\kappa_r=0$ for $r>2$, so the variance of the k-Statistic $k_3$ is

$\displaystyle \mathop{\rm var}\nolimits (k_3)$	$\textstyle =$	$\displaystyle {\kappa_6\over N}+{9\kappa_2\kappa_4\over N-1}+{9{\kappa_3}^2\over N-1}+ {6N{\kappa_2}^3\over (N-1)(N-2)}$
	$\textstyle =$	$\displaystyle {6N{\kappa_2}^3\over (N-1)(N-2)}.$	(56)

Also,

$\displaystyle \mathop{\rm var}\nolimits (k_4)$	$\textstyle =$	$\displaystyle {24{k_2}^4N(N-1)^2\over (N-3)(N-2)(N+3)(N+5)}$	(57)
$\displaystyle \mathop{\rm var}\nolimits (g_1)$	$\textstyle =$	$\displaystyle {6N(N-1)\over (N-2)(N+1)(N+3)}$	(58)
$\displaystyle \mathop{\rm var}\nolimits (g_2)$	$\textstyle =$	$\displaystyle {24N(N-1)^2\over (N-3)(N-2)(N+3)(N+5)},$	(59)

where

$\displaystyle g_1$	$\textstyle \equiv$	$\displaystyle {k_3\over {k_2}^{3/2}}$	(60)
$\displaystyle g_2$	$\textstyle \equiv$	$\displaystyle {k_4\over {k_2}^2}.$	(61)

If $P(x)$ is a Gaussian distribution, then

$\begin{displaymath} D(x)={1\over 2}\left[{1+\mathop{\rm erf}\nolimits \left({x-\mu\over \sigma\sqrt{2}}\right)}\right], \end{displaymath}$

(62)

so variates $x_i$ with a Gaussian distribution can be generated from variates $y_i$ having a Uniform Distribution in (0,1) via

$\begin{displaymath} x_i=\sigma\sqrt{2}\,\mathop{\rm erf}\nolimits ^{-1}(2y_i-1)+\mu. \end{displaymath}$

(63)

However, a simpler way to obtain numbers with a Gaussian distribution is to use the Box-Muller Transformation.
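Both generation methods can be sketched in a few lines. The Python `math` module has no inverse Erf, so the sketch below substitutes `statistics.NormalDist().inv_cdf`, which gives the same mapping because $\sqrt{2}\,\mathop{\rm erf}^{-1}(2y-1)$ is the inverse of the standard normal cumulative distribution. The Box-Muller code is the basic textbook form, written from general knowledge rather than taken from this article, and the parameters, sample size, seed, and function names are illustrative assumptions.

```python
import math
import random
import statistics

rng = random.Random(42)                 # fixed seed for repeatability
mu, sigma = 10.0, 2.0                   # arbitrary illustrative parameters
std_normal = statistics.NormalDist()    # used only for its inverse CDF

def gaussian_by_inversion(y):
    """Equation (63), with inv_cdf(y) standing in for sqrt(2) * erfinv(2y - 1)."""
    return sigma * std_normal.inv_cdf(y) + mu

def box_muller(u1, u2):
    """Map two uniforms (u1 in (0, 1]) to two independent standard normals."""
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

def uniform_open():
    """Uniform variate strictly inside (0, 1), as inv_cdf requires."""
    y = rng.random()
    while y == 0.0:
        y = rng.random()
    return y

n = 50000
inv_samples = [gaussian_by_inversion(uniform_open()) for _ in range(n)]

bm_samples = []
for _ in range(n // 2):
    # 1 - random() lies in (0, 1], which keeps log(u1) finite.
    z1, z2 = box_muller(1.0 - rng.random(), rng.random())
    bm_samples.extend((mu + sigma * z1, mu + sigma * z2))

print(statistics.fmean(inv_samples), statistics.pstdev(inv_samples))
print(statistics.fmean(bm_samples), statistics.pstdev(bm_samples))
```

Box-Muller avoids evaluating an inverse special function, at the cost of consuming two uniform variates for each pair of Gaussian variates.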

The Gaussian distribution is an approximation to the Binomial Distribution in the limit of large numbers,

$\begin{displaymath} P(n_1) = {1\over\sqrt{ 2\pi Npq}}\,\, \mathop{\rm exp}\nolimits \left[{-{(n_1-Np)^2\over 2Npq}}\right], \end{displaymath}$

(64)

where $n_1$ is the number of steps in the Positive direction, $N$ is the number of trials ( $N\equiv n_1+n_2$ ), and $p$ and $q$ are the probabilities of a step in the Positive direction and Negative direction ( $q\equiv 1-p$ ).
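The quality of the approximation in equation (64) can be inspected by comparing it with the exact Binomial probabilities. The sketch below uses `math.comb` for the exact values; the choice of $N=1000$ and $p=0.3$ is an arbitrary assumption, as are the function names.

```python
import math

def binomial_pmf(n1, N, p):
    """Exact probability of n1 positive steps in N trials."""
    return math.comb(N, n1) * p ** n1 * (1.0 - p) ** (N - n1)

def gaussian_approx(n1, N, p):
    """Equation (64): Gaussian with mean N p and variance N p q."""
    q = 1.0 - p
    var = N * p * q
    return math.exp(-((n1 - N * p) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

N, p = 1000, 0.3   # arbitrary illustrative values

# Compare the two near the peak at n1 = N p = 300.
for n1 in (270, 285, 300, 315, 330):
    print(n1, binomial_pmf(n1, N, p), gaussian_approx(n1, N, p))

# Largest absolute discrepancy over all possible outcomes.
max_abs_error = max(
    abs(binomial_pmf(k, N, p) - gaussian_approx(k, N, p)) for k in range(N + 1)
)
print(max_abs_error)
```

The agreement is best near the peak and degrades in relative terms far out in the tails, where both probabilities are very small.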

The differential equation having a Gaussian distribution as its solution is

$\begin{displaymath} {dy\over dx} = {y(\mu-x)\over \sigma^2}, \end{displaymath}$

(65)

since

$\begin{displaymath} {dy\over y}={\mu-x\over \sigma^2}\,dx \end{displaymath}$

(66)

$\begin{displaymath} \ln\left({y\over y_0}\right)= -{1\over 2\sigma^2} (\mu-x)^2 \end{displaymath}$

(67)

$\begin{displaymath} y=y_0e^{-(x-\mu)^2/2\sigma^2}. \end{displaymath}$

(68)
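Equation (65) can also be integrated numerically and compared with the closed form (68). The sketch below uses a classical fourth-order Runge-Kutta step starting at $x=\mu$, where $y=y_0$; the integrator, step count, parameter values, and the choice of $y_0$ as the normalized peak height are assumptions made for illustration.

```python
import math

mu, sigma = 0.5, 1.2   # arbitrary illustrative parameters

def rhs(x, y):
    """Right-hand side of equation (65)."""
    return y * (mu - x) / sigma ** 2

def rk4(x0, y0, x1, steps=2000):
    """Classical fourth-order Runge-Kutta integration from x0 to x1."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + 0.5 * h, y + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h, y + 0.5 * h * k2)
        k4 = rhs(x + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x += h
    return y

# y0 is the value at x = mu; this choice makes y the normalized density.
y0 = 1.0 / (sigma * math.sqrt(2.0 * math.pi))

def closed_form(x):
    """Equation (68) with the same y0."""
    return y0 * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Integrate forward and backward from the peak and compare.
for x_end in (mu + sigma, mu - 2.0 * sigma, mu + 3.0 * sigma):
    print(x_end, rk4(mu, y0, x_end), closed_form(x_end))
```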

This equation has been generalized to yield more complicated distributions which are named using the so-called Pearson System.
