
Hypergeometric Distribution

Let there be $n$ ways for a trial to be successful and $m$ ways for it to be unsuccessful, out of a total of $n+m$ possibilities. Take $N$ samples and let $x_i$ equal 1 if selection $i$ is successful and 0 if it is not. Let $x$ be the total number of successful selections,

\begin{displaymath}
x \equiv \sum_{i=1}^N x_i.
\end{displaymath} (1)
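The sampling scheme above can be sketched with a short simulation (a minimal illustration; the function name `draw_x` and the parameter values are made up for the example):

```python
import random

# Simulate the model: a pool of n success states and m failure states;
# draw N of the n + m possibilities without replacement and count the
# successes, x = sum_{i=1}^N x_i.
def draw_x(n, m, N, rng):
    pool = [1] * n + [0] * m          # 1 = success, 0 = failure
    return sum(rng.sample(pool, N))   # x for one experiment

rng = random.Random(0)
n, m, N = 5, 10, 4
samples = [draw_x(n, m, N, rng) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)
print(empirical_mean)  # close to Np = 4/3
```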

The probability of $i$ successful selections is then

\begin{eqnarray*}
P(x=i) &=& {\hbox{[\# ways for $i$ successes][\# ways for $N-i$ unsuccesses]}\over \hbox{[total number of ways to select]}}\\
&=& {{n\choose i}{m\choose N-i}\over{n+m\choose N}}
= {{n!\over i!(n-i)!}\,{m!\over (N-i)!(m-N+i)!}\over{(n+m)!\over N!(n+m-N)!}}
= {n!\,m!\,N!\,(n+m-N)!\over i!\,(n-i)!\,(m+i-N)!\,(N-i)!\,(n+m)!}.
\end{eqnarray*} (2)
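Equation (2) translates directly into code; the sketch below (function name and parameter values are illustrative) builds the PMF from binomial coefficients and checks that it sums to 1:

```python
from math import comb

def hypergeom_pmf(i, n, m, N):
    """Eq. (2): probability of exactly i successes in N draws without
    replacement from n success and m failure possibilities."""
    return comb(n, i) * comb(m, N - i) / comb(n + m, N)

n, m, N = 5, 10, 4
probs = [hypergeom_pmf(i, n, m, N) for i in range(N + 1)]
print(probs)
print(sum(probs))  # a valid PMF: the probabilities sum to 1
```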

The $i$th selection is equally likely to be any of the $n+m$ possibilities, so the fraction of successful possibilities $p$ is
\begin{displaymath}
p \equiv {n\over n+m}
\end{displaymath} (3)

and

\begin{displaymath}
P(x_i = 1) = {n\over n+m} \equiv p.
\end{displaymath} (4)

The expectation value of $x$ is
\begin{eqnarray*}
\mu \equiv \langle x\rangle = \left\langle \sum_{i=1}^N x_i \right\rangle &=& \sum_{i=1}^N \langle x_i\rangle\\
&=& \sum_{i=1}^N {n\over n+m} = {nN\over n+m} = Np.
\end{eqnarray*} (5)
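The expectation can be checked exactly by summing $i\,P(x=i)$ with rational arithmetic (an illustrative sketch using Python's `fractions`; `pmf` is a helper defined for the example):

```python
from fractions import Fraction
from math import comb

def pmf(i, n, m, N):
    # eq. (2) as an exact rational number
    return Fraction(comb(n, i) * comb(m, N - i), comb(n + m, N))

n, m, N = 5, 10, 4
mean = sum(i * pmf(i, n, m, N) for i in range(N + 1))
print(mean)  # 4/3, i.e. Np = N*n/(n+m)
```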

The Variance is
\begin{displaymath}
\mathop{\rm var}\nolimits(x) \equiv \sum_{i=1}^N \mathop{\rm var}\nolimits(x_i) + \sum_{i=1}^N \sum^N_{\scriptstyle j=1\atop\scriptstyle j\not=i} \mathop{\rm cov}\nolimits(x_i,x_j).
\end{displaymath} (6)

Since $x_i$ is a Bernoulli variable,
\begin{eqnarray*}
\mathop{\rm var}\nolimits(x_i) &=& p(1-p) = {n\over n+m}\left(1 - {n\over n+m}\right)\\
&=& {n\over n+m}\left({n+m-n\over n+m}\right) = {nm\over(n+m)^2},
\end{eqnarray*} (7)

\begin{displaymath}
\sum_{i=1}^N \mathop{\rm var}\nolimits(x_i) = {Nnm\over(n+m)^2}.
\end{displaymath} (8)

For $i \not= j$, the Covariance is
\begin{displaymath}
\mathop{\rm cov}\nolimits(x_i,x_j) = \langle x_ix_j\rangle - \langle x_i\rangle\langle x_j\rangle.
\end{displaymath} (9)

The probability that both the $i$th and $j$th selections are successful, for $i \not= j$, is
\begin{eqnarray*}
P(x_i = 1,\ x_j = 1) &=& P(x_i = 1)P(x_j = 1\vert x_i = 1)\\
&=& {n\over n+m}\,{n-1\over n+m-1} = {n(n-1)\over(n+m)(n+m-1)}.
\end{eqnarray*} (10)

But since $x_i$ and $x_j$ are random Bernoulli variables (each 0 or 1), their product is also a Bernoulli variable. In order for $x_ix_j$ to be 1, both $x_i$ and $x_j$ must be 1,
\begin{eqnarray*}
\langle x_ix_j\rangle &=& P(x_ix_j = 1) = P(x_i = 1,\ x_j = 1)\\
&=& {n\over n+m}\,{n-1\over n+m-1} = {n(n-1)\over(n+m)(n+m-1)}.
\end{eqnarray*} (11)

Combining (11) with
\begin{displaymath}
\left\langle x_i\right\rangle\left\langle x_j\right\rangle = {n\over n+m}\,{n\over n+m} = {n^2\over(n+m)^2}
\end{displaymath} (12)

gives

\begin{eqnarray*}
\mathop{\rm cov}\nolimits(x_i,x_j) &=& {n(n-1)\over(n+m)(n+m-1)} - {n^2\over(n+m)^2}\\
&=& {(n+m)(n^2-n) - n^2(n+m-1)\over(n+m)^2(n+m-1)}\\
&=& {n^3+mn^2-n^2-mn-n^3-n^2m+n^2\over(n+m)^2(n+m-1)}\\
&=& -{mn\over(n+m)^2(n+m-1)}.
\end{eqnarray*} (13)
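Equation (13) can be confirmed in exact arithmetic by subtracting (12) from (11) (the values of $n$ and $m$ below are chosen only for illustration):

```python
from fractions import Fraction

n, m = 5, 10
# <x_i x_j> from eq. (11) and <x_i><x_j> from eq. (12)
both = Fraction(n * (n - 1), (n + m) * (n + m - 1))
prod = Fraction(n, n + m) ** 2
cov = both - prod
print(cov)  # equals -mn / ((n+m)^2 (n+m-1)), eq. (13)
```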

There are a total of $N^2$ terms in a double summation over $i$ and $j$. However, $i = j$ for $N$ of these, so there are $N^2 - N = N(N-1)$ terms in the Covariance summation, and
\begin{displaymath}
\sum_{i=1}^N \sum^N_{\scriptstyle j=1\atop\scriptstyle j\not=i} \mathop{\rm cov}\nolimits(x_i,x_j) = -{N(N-1)mn\over(n+m)^2(n+m-1)}.
\end{displaymath} (14)

Combining equations (6), (8), and (14) gives the Variance
\begin{eqnarray*}
\mathop{\rm var}\nolimits(x) &=& {Nmn\over(n+m)^2} - {N(N-1)mn\over(n+m)^2(n+m-1)}\\
&=& {Nmn\over(m+n)^2}\left(1 - {N-1\over n+m-1}\right)\\
&=& {Nmn\over(n+m)^2}\left({n+m-1-N+1\over n+m-1}\right)\\
&=& {Nmn(n+m-N)\over(n+m)^2(n+m-1)},
\end{eqnarray*} (15)

so the final result is
\begin{displaymath}
\left\langle x\right\rangle = Np
\end{displaymath} (16)

and, since
\begin{displaymath}
1-p = {m\over n+m}
\end{displaymath} (17)

and

\begin{displaymath}
p(1-p) = {mn\over(n+m)^2},
\end{displaymath} (18)

we have
\begin{eqnarray*}
\sigma^2 = \mathop{\rm var}\nolimits(x) &=& Np(1-p)\left(1 - {N-1\over n+m-1}\right)\\
&=& {mnN(m+n-N)\over(m+n)^2(m+n-1)}.
\end{eqnarray*} (19)
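As a check on (19), the variance can be computed exactly from the PMF of (2) and compared with the closed form (again an illustrative sketch; `pmf` is a helper defined for the example):

```python
from fractions import Fraction
from math import comb

def pmf(i, n, m, N):
    # eq. (2) as an exact rational number
    return Fraction(comb(n, i) * comb(m, N - i), comb(n + m, N))

n, m, N = 5, 10, 4
mu = Fraction(N * n, n + m)                                 # eq. (16)
var = sum((i - mu)**2 * pmf(i, n, m, N) for i in range(N + 1))
print(var)  # equals mnN(m+n-N) / ((m+n)^2 (m+n-1)), eq. (19)
```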

The Skewness is (writing $q \equiv 1-p$)
\begin{eqnarray*}
\gamma_1 &=& {q-p\over\sqrt{Npq}}\left({n+m-2N\over n+m-2}\right)\sqrt{n+m-1\over n+m-N}\\
&=& {(m-n)(m+n-2N)\over m+n-2}\sqrt{m+n-1\over mnN(m+n-N)},
\end{eqnarray*} (20)

and the Kurtosis is
\begin{displaymath}
\gamma_2 = {F(m,n,N)\over mnN(-3+m+n)(-2+m+n)(-m-n+N)},
\end{displaymath} (21)

where

\begin{eqnarray*}
F(m,n,N) &=& m^3 - m^5 + 3m^2n - 6m^3n + m^4n + 3mn^2\\
&& -\,12m^2n^2 + 8m^3n^2 + n^3 - 6mn^3 + 8m^2n^3\\
&& +\,mn^4 - n^5 - 6m^3N + 6m^4N + 18m^2nN\\
&& -\,6m^3nN + 18mn^2N - 24m^2n^2N - 6n^3N\\
&& -\,6mn^3N + 6n^4N + 6m^2N^2 - 6m^3N^2\\
&& -\,24mnN^2 + 12m^2nN^2 + 6n^2N^2 + 12mn^2N^2 - 6n^3N^2.
\end{eqnarray*} (22)
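The closed form (20) for the Skewness can be checked numerically against the third central moment computed from the PMF (illustrative parameter values; `pmf` is a helper defined for the example):

```python
from math import comb, sqrt

def pmf(i, n, m, N):
    # eq. (2) as a float
    return comb(n, i) * comb(m, N - i) / comb(n + m, N)

n, m, N = 5, 10, 4
mu = N * n / (n + m)                                        # eq. (16)
var = m * n * N * (m + n - N) / ((m + n)**2 * (m + n - 1))  # eq. (19)
mu3 = sum((i - mu)**3 * pmf(i, n, m, N) for i in range(N + 1))
gamma1 = mu3 / var**1.5                # skewness = mu_3 / sigma^3
closed_form = ((m - n) * (m + n - 2*N) / (m + n - 2)
               * sqrt((m + n - 1) / (m * n * N * (m + n - N))))
print(gamma1, closed_form)  # agree to floating-point precision
```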

The Characteristic Function is
\begin{displaymath}
\phi(t) = {{m\choose N}\over{n+m\choose N}}\,{}_2F_1(-N,-n;m-N+1;e^{it}),
\end{displaymath} (23)

where ${}_2F_1(a,b;c;z)$ is the Hypergeometric Function.
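Because the first argument of ${}_2F_1$ in (23) is a negative integer, the series terminates, so $\phi(t)$ can be evaluated as a finite sum and compared with the direct expectation $\langle e^{itx}\rangle$ (a sketch with illustrative values; `hyp2f1` is a helper written for the example):

```python
import cmath
from math import comb

def pmf(i, n, m, N):
    # eq. (2) as a float
    return comb(n, i) * comb(m, N - i) / comb(n + m, N)

def hyp2f1(a, b, c, z):
    # 2F1(a,b;c;z) for a a negative integer, where the series terminates
    total, term = 0 + 0j, 1 + 0j
    for k in range(-a + 1):
        total += term
        term *= (a + k) * (b + k) * z / ((c + k) * (k + 1))
    return total

n, m, N = 5, 10, 4
t = 0.7
phi = comb(m, N) / comb(n + m, N) * hyp2f1(-N, -n, m - N + 1, cmath.exp(1j * t))
direct = sum(pmf(i, n, m, N) * cmath.exp(1j * t * i) for i in range(N + 1))
print(abs(phi - direct))  # ~0: the two expressions agree
```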

If the hypergeometric distribution is written

\begin{displaymath}
h_n(x,s) = {{np\choose x}{nq\choose s-x}\over{n\choose s}},
\end{displaymath} (24)

where $n$ now denotes the total number of possibilities, $np$ the number of successes, and $s$ the number of samples, then

\begin{displaymath}
\sum_{x=0}^s h_n(x,s)u^x = A\,{}_2F_1(-s,-np;nq-s+1;u).
\end{displaymath} (25)




© 1996-9 Eric W. Weisstein