
Hypergeometric Distribution

Let there be $n$ ways for a trial to be successful and $m$ ways for it to be unsuccessful, out of a total of $n+m$ possibilities. Take $N$ samples and let $x_i$ equal 1 if selection $i$ is successful and 0 if it is not. Let $x$ be the total number of successful selections,

\begin{displaymath}
x \equiv \sum_{i=1}^N x_i.
\end{displaymath} (1)
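The sampling scheme above can be sketched with a short simulation (a minimal illustration; the function name `draw_x` and the parameter values are made up for the example):

```python
import random

# Simulate the model: a pool of n success states and m failure states;
# draw N of the n + m possibilities without replacement and count the
# successes, x = sum_{i=1}^N x_i.
def draw_x(n, m, N, rng):
    pool = [1] * n + [0] * m          # 1 = success, 0 = failure
    return sum(rng.sample(pool, N))   # x for one experiment

rng = random.Random(0)
n, m, N = 5, 10, 4
samples = [draw_x(n, m, N, rng) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)
print(empirical_mean)  # close to Np = 4/3
```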

The probability of $i$ successful selections is then

\begin{eqnarray*}
P(x=i) &=& {\hbox{[\# ways for $i$ successes][\# ways for $N-i$ unsuccesses]}\over \hbox{[total number of ways to select]}}\\
&=& {{n\choose i}{m\choose N-i}\over{n+m\choose N}}
= {{n!\over i!(n-i)!}\,{m!\over (N-i)!(m-N+i)!}\over{(n+m)!\over N!(n+m-N)!}}
= {n!\,m!\,N!\,(n+m-N)!\over i!\,(n-i)!\,(m+i-N)!\,(N-i)!\,(n+m)!}.
\end{eqnarray*} (2)
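Equation (2) translates directly into code; the sketch below (function name and parameter values are illustrative) builds the PMF from binomial coefficients and checks that it sums to 1:

```python
from math import comb

def hypergeom_pmf(i, n, m, N):
    """Eq. (2): probability of exactly i successes in N draws without
    replacement from n success and m failure possibilities."""
    return comb(n, i) * comb(m, N - i) / comb(n + m, N)

n, m, N = 5, 10, 4
probs = [hypergeom_pmf(i, n, m, N) for i in range(N + 1)]
print(probs)
print(sum(probs))  # a valid PMF: the probabilities sum to 1
```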

The $i$th selection is equally likely to be any of the $n+m$ possibilities, so the fraction of successful possibilities $p$ is
\begin{displaymath}
p \equiv {n\over n+m}
\end{displaymath} (3)

and

\begin{displaymath}
P(x_i = 1) = {n\over n+m} \equiv p.
\end{displaymath} (4)

The expectation value of $x$ is
\begin{eqnarray*}
\mu \equiv \langle x\rangle = \left\langle \sum_{i=1}^N x_i \right\rangle &=& \sum_{i=1}^N \langle x_i\rangle\\
&=& \sum_{i=1}^N {n\over n+m} = {nN\over n+m} = Np.
\end{eqnarray*} (5)
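The expectation can be checked exactly by summing $i\,P(x=i)$ with rational arithmetic (an illustrative sketch using Python's `fractions`; `pmf` is a helper defined for the example):

```python
from fractions import Fraction
from math import comb

def pmf(i, n, m, N):
    # eq. (2) as an exact rational number
    return Fraction(comb(n, i) * comb(m, N - i), comb(n + m, N))

n, m, N = 5, 10, 4
mean = sum(i * pmf(i, n, m, N) for i in range(N + 1))
print(mean)  # 4/3, i.e. Np = N*n/(n+m)
```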

The Variance is
\begin{displaymath}
\mathop{\rm var}\nolimits(x) \equiv \sum_{i=1}^N \mathop{\rm var}\nolimits(x_i) + \sum_{i=1}^N \sum^N_{\scriptstyle j=1\atop\scriptstyle j\not=i} \mathop{\rm cov}\nolimits(x_i,x_j).
\end{displaymath} (6)

Since $x_i$ is a Bernoulli variable,
\begin{eqnarray*}
\mathop{\rm var}\nolimits(x_i) &=& p(1-p) = {n\over n+m}\left(1 - {n\over n+m}\right)\\
&=& {n\over n+m}\left({n+m-n\over n+m}\right) = {nm\over(n+m)^2},
\end{eqnarray*} (7)

\begin{displaymath}
\sum_{i=1}^N \mathop{\rm var}\nolimits(x_i) = {Nnm\over(n+m)^2}.
\end{displaymath} (8)

For $i \not= j$, the Covariance is
\begin{displaymath}
\mathop{\rm cov}\nolimits(x_i,x_j) = \langle x_ix_j\rangle - \langle x_i\rangle\langle x_j\rangle.
\end{displaymath} (9)

The probability that both the $i$th and $j$th selections are successful, for $i \not= j$, is
\begin{eqnarray*}
P(x_i = 1,\ x_j = 1) &=& P(x_i = 1)P(x_j = 1\vert x_i = 1)\\
&=& {n\over n+m}\,{n-1\over n+m-1} = {n(n-1)\over(n+m)(n+m-1)}.
\end{eqnarray*} (10)

But since $x_i$ and $x_j$ are random Bernoulli variables (each 0 or 1), their product is also a Bernoulli variable. In order for $x_ix_j$ to be 1, both $x_i$ and $x_j$ must be 1,
\begin{eqnarray*}
\langle x_ix_j\rangle &=& P(x_ix_j = 1) = P(x_i = 1,\ x_j = 1)\\
&=& {n\over n+m}\,{n-1\over n+m-1} = {n(n-1)\over(n+m)(n+m-1)}.
\end{eqnarray*} (11)

Combining (11) with
\begin{displaymath}
\left\langle x_i\right\rangle\left\langle x_j\right\rangle = {n\over n+m}\,{n\over n+m} = {n^2\over(n+m)^2}
\end{displaymath} (12)

gives

\begin{eqnarray*}
\mathop{\rm cov}\nolimits(x_i,x_j) &=& {n(n-1)\over(n+m)(n+m-1)} - {n^2\over(n+m)^2}\\
&=& {(n+m)(n^2-n) - n^2(n+m-1)\over(n+m)^2(n+m-1)}\\
&=& {n^3+mn^2-n^2-mn-n^3-n^2m+n^2\over(n+m)^2(n+m-1)}\\
&=& -{mn\over(n+m)^2(n+m-1)}.
\end{eqnarray*} (13)
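Equation (13) can be confirmed in exact arithmetic by subtracting (12) from (11) (the values of $n$ and $m$ below are chosen only for illustration):

```python
from fractions import Fraction

n, m = 5, 10
# <x_i x_j> from eq. (11) and <x_i><x_j> from eq. (12)
both = Fraction(n * (n - 1), (n + m) * (n + m - 1))
prod = Fraction(n, n + m) ** 2
cov = both - prod
print(cov)  # equals -mn / ((n+m)^2 (n+m-1)), eq. (13)
```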

There are a total of $N^2$ terms in a double summation over $i$ and $j$. However, $i = j$ for $N$ of these, so there are $N^2 - N = N(N-1)$ terms in the Covariance summation, and
\begin{displaymath}
\sum_{i=1}^N \sum^N_{\scriptstyle j=1\atop\scriptstyle j\not=i} \mathop{\rm cov}\nolimits(x_i,x_j) = -{N(N-1)mn\over(n+m)^2(n+m-1)}.
\end{displaymath} (14)

Combining equations (6), (8), and (14) gives the Variance
\begin{eqnarray*}
\mathop{\rm var}\nolimits(x) &=& {Nmn\over(n+m)^2} - {N(N-1)mn\over(n+m)^2(n+m-1)}\\
&=& {Nmn\over(m+n)^2}\left(1 - {N-1\over n+m-1}\right)\\
&=& {Nmn\over(n+m)^2}\left({n+m-1-N+1\over n+m-1}\right)\\
&=& {Nmn(n+m-N)\over(n+m)^2(n+m-1)},
\end{eqnarray*} (15)

so the final result is
\begin{displaymath}
\left\langle x\right\rangle = Np
\end{displaymath} (16)

and, since
\begin{displaymath}
1-p = {m\over n+m}
\end{displaymath} (17)

and

\begin{displaymath}
p(1-p) = {mn\over(n+m)^2},
\end{displaymath} (18)

we have
\begin{eqnarray*}
\sigma^2 = \mathop{\rm var}\nolimits(x) &=& Np(1-p)\left(1 - {N-1\over n+m-1}\right)\\
&=& {mnN(m+n-N)\over(m+n)^2(m+n-1)}.
\end{eqnarray*} (19)
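As a check on (19), the variance can be computed exactly from the PMF of (2) and compared with the closed form (again an illustrative sketch; `pmf` is a helper defined for the example):

```python
from fractions import Fraction
from math import comb

def pmf(i, n, m, N):
    # eq. (2) as an exact rational number
    return Fraction(comb(n, i) * comb(m, N - i), comb(n + m, N))

n, m, N = 5, 10, 4
mu = Fraction(N * n, n + m)                                 # eq. (16)
var = sum((i - mu)**2 * pmf(i, n, m, N) for i in range(N + 1))
print(var)  # equals mnN(m+n-N) / ((m+n)^2 (m+n-1)), eq. (19)
```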

The Skewness is (writing $q \equiv 1-p$)
\begin{eqnarray*}
\gamma_1 &=& {q-p\over\sqrt{Npq}}\left({n+m-2N\over n+m-2}\right)\sqrt{n+m-1\over n+m-N}\\
&=& {(m-n)(m+n-2N)\over m+n-2}\sqrt{m+n-1\over mnN(m+n-N)},
\end{eqnarray*} (20)

and the Kurtosis is
\begin{displaymath}
\gamma_2 = {F(m,n,N)\over mnN(-3+m+n)(-2+m+n)(-m-n+N)},
\end{displaymath} (21)

where

\begin{eqnarray*}
F(m,n,N) &=& m^3 - m^5 + 3m^2n - 6m^3n + m^4n + 3mn^2\\
&& -\,12m^2n^2 + 8m^3n^2 + n^3 - 6mn^3 + 8m^2n^3\\
&& +\,mn^4 - n^5 - 6m^3N + 6m^4N + 18m^2nN\\
&& -\,6m^3nN + 18mn^2N - 24m^2n^2N - 6n^3N\\
&& -\,6mn^3N + 6n^4N + 6m^2N^2 - 6m^3N^2\\
&& -\,24mnN^2 + 12m^2nN^2 + 6n^2N^2 + 12mn^2N^2 - 6n^3N^2.
\end{eqnarray*} (22)
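The closed form (20) for the Skewness can be checked numerically against the third central moment computed from the PMF (illustrative parameter values; `pmf` is a helper defined for the example):

```python
from math import comb, sqrt

def pmf(i, n, m, N):
    # eq. (2) as a float
    return comb(n, i) * comb(m, N - i) / comb(n + m, N)

n, m, N = 5, 10, 4
mu = N * n / (n + m)                                        # eq. (16)
var = m * n * N * (m + n - N) / ((m + n)**2 * (m + n - 1))  # eq. (19)
mu3 = sum((i - mu)**3 * pmf(i, n, m, N) for i in range(N + 1))
gamma1 = mu3 / var**1.5                # skewness = mu_3 / sigma^3
closed_form = ((m - n) * (m + n - 2*N) / (m + n - 2)
               * sqrt((m + n - 1) / (m * n * N * (m + n - N))))
print(gamma1, closed_form)  # agree to floating-point precision
```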

The Characteristic Function is
\begin{displaymath}
\phi(t) = {{m\choose N}\over{n+m\choose N}}\,{}_2F_1(-N,-n;m-N+1;e^{it}),
\end{displaymath} (23)

where ${}_2F_1(a,b;c;z)$ is the Hypergeometric Function.
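Because the first argument of ${}_2F_1$ in (23) is a negative integer, the series terminates, so $\phi(t)$ can be evaluated as a finite sum and compared with the direct expectation $\langle e^{itx}\rangle$ (a sketch with illustrative values; `hyp2f1` is a helper written for the example):

```python
import cmath
from math import comb

def pmf(i, n, m, N):
    # eq. (2) as a float
    return comb(n, i) * comb(m, N - i) / comb(n + m, N)

def hyp2f1(a, b, c, z):
    # 2F1(a,b;c;z) for a a negative integer, where the series terminates
    total, term = 0 + 0j, 1 + 0j
    for k in range(-a + 1):
        total += term
        term *= (a + k) * (b + k) * z / ((c + k) * (k + 1))
    return total

n, m, N = 5, 10, 4
t = 0.7
phi = comb(m, N) / comb(n + m, N) * hyp2f1(-N, -n, m - N + 1, cmath.exp(1j * t))
direct = sum(pmf(i, n, m, N) * cmath.exp(1j * t * i) for i in range(N + 1))
print(abs(phi - direct))  # ~0: the two expressions agree
```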

If the hypergeometric distribution is written

\begin{displaymath}
h_n(x,s) = {{np\choose x}{nq\choose s-x}\over{n\choose s}},
\end{displaymath} (24)

where $n$ now denotes the total number of possibilities, $np$ the number of successes, and $s$ the number of samples, then

\begin{displaymath}
\sum_{x=0}^s h_n(x,s)u^x = A\,{}_2F_1(-s,-np;nq-s+1;u).
\end{displaymath} (25)




© 1996-9 Eric W. Weisstein