
Correlation Coefficient--Gaussian Bivariate Distribution

For a Gaussian Bivariate Distribution, the distribution of the sample correlation Coefficient $r$ computed from $N$ observations is given by


$\displaystyle P(r)$ $\textstyle =$ $\displaystyle {1\over \pi} (N-2)(1-r^2)^{(N-4)/2}(1-\rho^2)^{(N-1)/2}\int_0^\infty {d\beta\over (\cosh\beta -\rho r)^{N-1}}$
  $\textstyle =$ $\displaystyle {1\over \pi} (N-2)(1-r^2)^{(N-4)/2}(1-\rho^2)^{(N-1)/2} \sqrt{\pi\over 2}\, {\Gamma(N-1)\over \Gamma(N-{\textstyle{1\over 2}})}$
  $\textstyle \phantom{=}$ $\displaystyle \mathop{\times} (1-\rho r)^{-(N-3/2)}\,{}_2F_1({\textstyle{1\over 2}}, {\textstyle{1\over 2}}, {\textstyle{2N-1\over 2}}; {\textstyle{\rho r+1\over 2}})$
  $\textstyle =$ $\displaystyle {(N-2)\Gamma(N-1)(1-\rho^2)^{(N-1)/2}(1-r^2)^{(N-4)/2}\over\sqrt{2\pi}\,\Gamma(N-{\textstyle{1\over 2}})(1-\rho r)^{N-3/2}}$
  $\textstyle \phantom{=}$ $\displaystyle \mathop{\times} \left[{1+{1\over 4} {\rho r+1\over 2N-1} +{9\over 16} {(\rho r+1)^2\over (2N-1)(2N+1)} +\cdots}\right],$ (1)

where $\rho$ is the population correlation Coefficient, ${}_2F_1(a,b;c;x)$ is a Hypergeometric Function, and $\Gamma(z)$ is the Gamma Function (Kenney and Keeping 1951, pp. 217-221). The Moments are

$\displaystyle \left\langle{r}\right\rangle{}$ $\textstyle =$ $\displaystyle \rho-{\rho(1-\rho^2)\over 2n}$ (2)
$\displaystyle \mathop{\rm var}\nolimits (r)$ $\textstyle =$ $\displaystyle {(1-\rho^2)^2\over n} \left({1+{11\rho^2\over 2n}+\cdots}\right)$ (3)
$\displaystyle \gamma_1$ $\textstyle =$ $\displaystyle {6\rho\over \sqrt{n}}\left({1+{77\rho^2-30\over 12 n}+\cdots}\right)$  
$\displaystyle \gamma_2$ $\textstyle =$ $\displaystyle {6\over n}(12\rho^2-1)+\ldots,$ (4)

where $n\equiv N-1$. If the variates are uncorrelated, then $\rho=0$ and
$\displaystyle {}_2F_1({\textstyle{1\over 2}}, {\textstyle{1\over 2}}, {\textstyle{2N-1\over 2}}; {\textstyle{\rho r+1\over 2}})$ $\textstyle =$ $\displaystyle {}_2F_1({\textstyle{1\over 2}}, {\textstyle{1\over 2}}, {\textstyle{2N-1\over 2}}; {\textstyle{1\over 2}})$  
  $\textstyle =$ $\displaystyle {\Gamma(N-{\textstyle{1\over 2}})2^{3/2-N}\sqrt{\pi}\over [\Gamma({\textstyle{N\over 2}})]^2},$ (5)

so


$\displaystyle P(r)$ $\textstyle =$ $\displaystyle {(N-2)\Gamma(N-1)\over \sqrt{2\pi}\,\Gamma(N-{\textstyle{1\over 2}})} (1-r^2)^{(N-4)/2}\, {\Gamma(N-{\textstyle{1\over 2}})\, 2^{3/2-N}\sqrt{\pi}\over [\Gamma({\textstyle{N\over 2}})]^2}$
  $\textstyle =$ $\displaystyle {2^{1-N}(N-2)\Gamma(N-1)\over [\Gamma({\textstyle{N\over 2}})]^2} (1-r^2)^{(N-4)/2}.$ (6)

But from the Legendre Duplication Formula,
\begin{displaymath}
\sqrt{\pi}\,\Gamma(N-1)=2^{N-2}\Gamma({\textstyle{N\over 2}})\Gamma({\textstyle{N-1\over 2}}),
\end{displaymath} (7)

so
$\displaystyle P(r)$ $\textstyle =$ $\displaystyle {2^{1-N}\,2^{N-2}(N-2)\Gamma({\textstyle{N\over 2}})\Gamma({\textstyle{N-1\over 2}})\over\sqrt{\pi}\,[\Gamma({\textstyle{N\over 2}})]^2} (1-r^2)^{(N-4)/2}$
  $\textstyle =$ $\displaystyle {(N-2)\Gamma({\textstyle{N-1\over 2}})\over 2\sqrt{\pi}\,\Gamma({\textstyle{N\over 2}})}(1-r^2)^{(N-4)/2}$
  $\textstyle =$ $\displaystyle {1\over\sqrt{\pi}} {{\nu\over 2}\,\Gamma({\textstyle{\nu+1\over 2}})\over\Gamma({\textstyle{\nu\over 2}}+1)} (1-r^2)^{(\nu-2)/2}$
  $\textstyle =$ $\displaystyle {1\over\sqrt{\pi}} {\Gamma({\textstyle{\nu+1\over 2}})\over \Gamma({\textstyle{\nu\over 2}})}(1-r^2)^{(\nu-2)/2},$ (8)

where $\nu\equiv N-2$.

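As an illustration, the following minimal Python sketch (assuming scipy is available; the choices of $N$, $\rho$, and the test values of $r$ are arbitrary) evaluates the integral representation (1) by quadrature, checks that it reduces to the closed form (8) when $\rho=0$, and checks that it integrates to unity over $-1<r<1$.

\begin{verbatim}
# Illustrative sketch only: quadrature check of equations (1) and (8).
from math import cosh, gamma, inf, pi, sqrt
from scipy.integrate import quad

def P_r(r, rho, N):
    """Density of the sample correlation coefficient r, equation (1)."""
    integral, _ = quad(lambda b: (cosh(b) - rho * r) ** (-(N - 1)), 0, inf)
    return ((N - 2) / pi * (1 - r**2) ** ((N - 4) / 2)
            * (1 - rho**2) ** ((N - 1) / 2) * integral)

def P_r_uncorrelated(r, N):
    """Closed form (8) for rho = 0, with nu = N - 2."""
    nu = N - 2
    return (gamma((nu + 1) / 2) / (sqrt(pi) * gamma(nu / 2))
            * (1 - r**2) ** ((nu - 2) / 2))

N = 10
for r in (-0.5, 0.0, 0.7):
    assert abs(P_r(r, 0.0, N) - P_r_uncorrelated(r, N)) < 1e-8

total, _ = quad(lambda r: P_r(r, 0.6, N), -1, 1)   # normalization at rho = 0.6
assert abs(total - 1) < 1e-6
\end{verbatim}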

The uncorrelated case can be derived more simply by letting $\beta$ be the true slope, so that $\eta=\alpha+\beta x$. Then

\begin{displaymath}
t\equiv (b-\beta) {{\rm s}_x\over {\rm s}_y} \sqrt{N-2\over 1-r^2}
= {(b-\beta)r\over b} \sqrt{N-2\over 1-r^2}
\end{displaymath} (9)

is distributed as Student's t-Distribution with $\nu\equiv N-2$ Degrees of Freedom. If the population correlation Coefficient $\rho$ is 0, then $\beta=0$, so
\begin{displaymath}
t=r\sqrt{\nu\over 1-r^2},
\end{displaymath} (10)

and the distribution is
\begin{displaymath}
P(t)\,dt = {1\over\sqrt{\nu\pi}}\, {\Gamma({\textstyle{\nu+1\over 2}})\over \Gamma({\textstyle{\nu\over 2}})\left({1+{t^2\over\nu}}\right)^{(\nu+1)/2}}\,dt.
\end{displaymath} (11)

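A small Monte Carlo sketch (assuming numpy and scipy; the sample size, number of trials, and seed are arbitrary) illustrates (10) and (11): for uncorrelated Gaussian pairs, $t=r\sqrt{\nu/(1-r^2)}$ should follow Student's t-Distribution with $\nu=N-2$ Degrees of Freedom.

\begin{verbatim}
# Illustrative sketch only: t = r*sqrt(nu/(1-r^2)) should follow Student's t
# with nu = N - 2 degrees of freedom when the variates are uncorrelated.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
N, trials = 8, 20000
nu = N - 2

x = rng.standard_normal((trials, N))
y = rng.standard_normal((trials, N))
xc = x - x.mean(axis=1, keepdims=True)        # centered samples
yc = y - y.mean(axis=1, keepdims=True)
r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

t = r * np.sqrt(nu / (1 - r**2))
print(kstest(t, 't', args=(nu,)))             # p-value should typically be large
\end{verbatim}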
Plugging in for $t$ and using
$\displaystyle dt$ $\textstyle =$ $\displaystyle \sqrt{\nu} \left[{\sqrt{1-r^2}-r({\textstyle{1\over 2}})(-2r)(1-r^2)^{-1/2}\over 1-r^2}\right]\,dr$  
  $\textstyle =$ $\displaystyle \sqrt{\nu\over 1-r^2} \left({1-r^2+r^2\over 1-r^2}\right)\,dr$  
  $\textstyle =$ $\displaystyle \sqrt{\nu\over (1-r^2)^3}\,dr$ (12)

gives
$\displaystyle P(t) \,dt$ $\textstyle =$ $\displaystyle {1\over \sqrt{\nu\pi}}\, {\Gamma({\textstyle{\nu+1\over 2}})\over \Gamma({\textstyle{\nu\over 2}})\left[{1+{r^2\nu\over(1-r^2)\nu}}\right]^{(\nu+1)/2}}\sqrt{\nu\over (1-r^2)^3}\,dr$
  $\textstyle =$ $\displaystyle {(1-r^2)^{-3/2}\over\sqrt{\pi}}\, {\Gamma({\textstyle{\nu+1\over 2}})\over \Gamma({\textstyle{\nu\over 2}}) \left({1\over 1-r^2}\right)^{(\nu+1)/2}}\,dr$
  $\textstyle =$ $\displaystyle {1\over\sqrt{\pi}}{\Gamma({\textstyle{\nu+1\over 2}})\over \Gamma({\textstyle{\nu\over 2}})}(1-r^2)^{-3/2}(1-r^2)^{(\nu+1)/2}\,dr$
  $\textstyle =$ $\displaystyle {1\over\sqrt{\pi}}{\Gamma({\textstyle{\nu+1\over 2}})\over\Gamma({\textstyle{\nu\over 2}})}(1-r^2)^{(\nu-2)/2}\,dr,$ (13)

so
\begin{displaymath}
P(r)={1\over\sqrt{\pi}} {\Gamma\left({\nu+1\over 2}\right)\over\Gamma\left({\nu\over 2}\right)}(1-r^2)^{(\nu-2)/2}
\end{displaymath} (14)

as before. See Bevington (1969, pp. 122-123) or Pugh and Winslow (1966, §12-8). If we are interested instead in the probability of obtaining a correlation Coefficient whose magnitude is $\geq \vert r\vert$, where $r$ is the observed Coefficient, then
$\displaystyle P_c(r,N)$ $\textstyle =$ $\displaystyle 2\int_{\vert r\vert}^1 P(r',N)\,dr' = 1-2\int_0^{\vert r\vert} P(r',N)\,dr'$
  $\textstyle =$ $\displaystyle 1-{2\over\sqrt{\pi}} {\Gamma({\textstyle{\nu+1\over 2}})\over\Gamma({\textstyle{\nu\over 2}})}\int_0^{\vert r\vert}(1-r'^2)^{(\nu-2)/2}\,dr'.$ (15)

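A minimal Python sketch (assuming scipy) evaluates (15) by direct quadrature; the closed forms derived below for Even and Odd $\nu$ can be checked against this helper.

\begin{verbatim}
# Illustrative sketch only: P_c(r, N) by numerical integration of (15).
from math import gamma, pi, sqrt
from scipy.integrate import quad

def P_c_numeric(r, N):
    """P_c(r, N) = 1 - 2*int_0^|r| P(r') dr' for rho = 0, nu = N - 2."""
    nu = N - 2
    coeff = 2 * gamma((nu + 1) / 2) / (sqrt(pi) * gamma(nu / 2))
    integral, _ = quad(lambda rp: (1 - rp**2) ** ((nu - 2) / 2), 0, abs(r))
    return 1 - coeff * integral

print(P_c_numeric(0.5, 10))   # chance of |r| >= 0.5 from 10 uncorrelated pairs, ~0.14
\end{verbatim}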
Let $I\equiv {1\over 2}(\nu-2)$. For Even $\nu$, the exponent $I$ is an Integer so, by the Binomial Theorem,
\begin{displaymath}
(1-r^2)^I = \sum_{k=0}^I {I\choose k} (-r^2)^k
\end{displaymath} (16)

and
$\displaystyle P_c(r)$ $\textstyle =$ $\displaystyle 1-{2\over\sqrt{\pi}} {\Gamma({\textstyle{\nu+1\over 2}})\over \Gamma({\textstyle{\nu\over 2}})} \int_0^{\vert r\vert} \sum_{k=0}^I (-1)^k {I!\over(I-k)!k!}\, r'^{2k}\,dr'$
  $\textstyle =$ $\displaystyle 1-{2\over\sqrt{\pi}} {\Gamma({\textstyle{\nu+1\over 2}})\over \Gamma({\textstyle{\nu\over 2}})} \sum_{k=0}^I \left[{(-1)^k{I!\over (I-k)!k!} {\vert r\vert^{2k+1}\over 2k+1}}\right].$ (17)

For Odd $\nu$, the integral is
$\displaystyle P_c(r)$ $\textstyle =$ $\displaystyle 1-2\int_0^{\vert r\vert} P(r')\,dr'$  
  $\textstyle =$ $\displaystyle 1-{2\over\sqrt{\pi}}{\Gamma({\textstyle{\nu+1\over 2}})\over\Gamma({\textstyle{\nu\over 2}})}\int_0^{\vert r\vert}(\sqrt{1-r^2}\,)^{\nu-2}\,dr.$ (18)

Let $r\equiv\sin x$ so $dr=\cos x \,dx$, then
$\displaystyle P_c(r)$ $\textstyle =$ $\displaystyle 1-{2\over\sqrt{\pi}} {\Gamma({\textstyle{\nu+1\over 2}})\over \Gamma({\textstyle{\nu\over 2}})} \int_0^{\sin^{-1} \vert r\vert} \cos^{\nu-2}x \cos x \,dx$
  $\textstyle =$ $\displaystyle 1-{2\over\sqrt{\pi}} {\Gamma({\textstyle{\nu+1\over 2}})\over \Gamma({\textstyle{\nu\over 2}})} \int_0^{\sin^{-1} \vert r\vert} \cos^{\nu-1}x\,dx.$ (19)

But $\nu$ is Odd, so $\nu-1\equiv 2n$ is Even. Therefore
$\displaystyle {2\over \sqrt{\pi}} {\Gamma({\textstyle{\nu+1\over 2}})\over \Gamma({\textstyle{\nu\over 2}})}$ $\textstyle =$ $\displaystyle {2\over \sqrt{\pi}} {\Gamma(n+1)\over \Gamma(n+{\textstyle{1\over 2}})} = {2\over \sqrt{\pi}}
{n!\over {(2n-1)!!\sqrt{\pi}\over 2^n}}$  
  $\textstyle =$ $\displaystyle {2\over \pi} {2^n n!\over (2n-1)!!} = {2\over \pi}{(2n)!!\over
(2n-1)!!}.$ (20)

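Identity (20) can be spot-checked numerically with the Python standard library alone, as in the following illustrative sketch.

\begin{verbatim}
# Illustrative sketch only: check 2*Gamma(n+1)/(sqrt(pi)*Gamma(n+1/2))
#                                = 2*(2n)!!/(pi*(2n-1)!!) for small n.
from math import gamma, pi, prod, sqrt

def double_factorial(m):
    return prod(range(m, 0, -2)) if m > 0 else 1

for n in range(1, 8):
    lhs = 2 / sqrt(pi) * gamma(n + 1) / gamma(n + 0.5)
    rhs = 2 / pi * double_factorial(2 * n) / double_factorial(2 * n - 1)
    assert abs(lhs - rhs) < 1e-9 * rhs
\end{verbatim}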
Combining with the result from the Cosine Integral gives


\begin{displaymath}
P_c(r)=1-{2\over \pi}{(2n)!!\,(2n-1)!!\over (2n-1)!!\,(2n)!!}\left[{\sin x\sum_{k=0}^{n-1} {(2k)!!\over (2k+1)!!}\cos^{2k+1}x+ x}\right]_0^{\sin^{-1}\vert r\vert}.
\end{displaymath} (21)

At the upper limit $\sin x=\vert r\vert$, so use
\begin{displaymath}
\cos^{2k+1} x=(1-r^2)^{(2k+1)/2} = (1-r^2)^{k+1/2},
\end{displaymath} (22)

and define $J\equiv n-1=(\nu-3)/2$, then


\begin{displaymath}
P_c(r)= 1-{2\over \pi} \left[{\sin^{-1}\vert r\vert+\vert r\vert\sum_{k=0}^J {(2k)!!\over (2k+1)!!} (1-r^2)^{k+1/2}}\right].
\end{displaymath} (23)

(This formula is given incorrectly in Bevington 1969.) Combining the solutions for Even and Odd $\nu$ gives


\begin{displaymath}
P_c(r) = \cases{
1-{2\over\sqrt{\pi}} {\Gamma[(\nu+1)/2]\over \Gamma(\nu/2)}\sum_{k=0}^I \left[{(-1)^k{I!\over (I-k)!k!} {\vert r\vert^{2k+1}\over 2k+1}}\right]\cr
\quad {\rm for\ }\nu{\rm\ even}\cr
1-{2\over \pi}\left[{\sin^{-1}\vert r\vert+\vert r\vert\sum_{k=0}^J {(2k)!!\over (2k+1)!!}(1-r^2)^{k+1/2}}\right]\cr
\quad {\rm for\ }\nu{\rm\ odd.}\cr}
\end{displaymath} (24)

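The combined result (24) is straightforward to implement; the following Python sketch (assuming scipy for the cross-check; the test values are arbitrary) handles either parity of $\nu$ and compares the answer with the equivalent two-sided Student's $t$ tail probability implied by (10).

\begin{verbatim}
# Illustrative sketch only: closed form (24) for P_c(r, N), either parity of nu,
# cross-checked against 2*P(|t'| >= |t|) with t = r*sqrt(nu/(1-r^2)).
from math import asin, comb, gamma, pi, prod, sqrt
from scipy.stats import t as student_t

def P_c(r, N):
    """Probability of obtaining |r'| >= |r| from N uncorrelated Gaussian pairs."""
    nu = N - 2
    r = abs(r)
    if nu % 2 == 0:                                   # even nu, equation (17)
        I = (nu - 2) // 2
        coeff = 2 * gamma((nu + 1) / 2) / (sqrt(pi) * gamma(nu / 2))
        return 1 - coeff * sum((-1)**k * comb(I, k) * r**(2*k + 1) / (2*k + 1)
                               for k in range(I + 1))
    J = (nu - 3) // 2                                 # odd nu, equation (23)
    dfact = lambda m: prod(range(m, 0, -2)) if m > 0 else 1
    s = sum(dfact(2*k) / dfact(2*k + 1) * (1 - r**2) ** (k + 0.5)
            for k in range(J + 1))
    return 1 - 2 / pi * (asin(r) + r * s)

for N in (5, 6, 9, 12):
    nu, r = N - 2, 0.55
    tail = 2 * student_t.sf(r * sqrt(nu / (1 - r**2)), nu)
    assert abs(P_c(r, N) - tail) < 1e-10
\end{verbatim}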
If $\rho\not=0$, a skew distribution is obtained, but the variable $z$ defined by

\begin{displaymath}
z\equiv \tanh^{-1} r
\end{displaymath} (25)

is approximately normal with
$\displaystyle \mu_z$ $\textstyle =$ $\displaystyle \tanh^{-1}\rho$ (26)
$\displaystyle {\sigma_z}^2$ $\textstyle =$ $\displaystyle {1\over N-3}$ (27)

(Kenney and Keeping 1962, p. 266).
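As an illustration of how (25)-(27) are used, the following minimal sketch (assuming numpy; the 95% critical value 1.96, the observed $r$, and the sample size are arbitrary choices) sets an approximate confidence interval on $\rho$.

\begin{verbatim}
# Illustrative sketch only: approximate CI for rho from Fisher's z.
import numpy as np

def rho_confidence_interval(r, N, z_crit=1.96):
    """CI for rho using z = arctanh(r), sigma_z = 1/sqrt(N - 3)."""
    z = np.arctanh(r)
    half = z_crit / np.sqrt(N - 3)
    return np.tanh(z - half), np.tanh(z + half)

print(rho_confidence_interval(0.55, 30))   # roughly (0.24, 0.76)
\end{verbatim}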


Let $b_j$ be the regression Coefficients (slopes) of a least-squares fit to several variates $x_j$; then the multiple correlation Coefficient is

\begin{displaymath}
R^2\equiv \sum_{j=1}^n\left({b_j {{s_{jy}}^2\over {s_y}^2}}\right)= \sum_{j=1}^n \left({b_j {s_j\over s_y} r_{jy}}\right),
\end{displaymath} (28)

where ${s_{jy}}^2=r_{jy}s_js_y$ is the sample covariance of $x_j$ and $y$, and ${s_y}^2$ is the sample Variance of $y$.

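An illustrative sketch (assuming numpy; the data are randomly generated) evaluates (28) with ${s_{jy}}^2$ read as the sample covariance and compares it with the usual $R^2=1-{\rm SS}_{\rm res}/{\rm SS}_{\rm tot}$ of a least-squares fit.

\begin{verbatim}
# Illustrative sketch only: multiple correlation coefficient, equation (28).
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_pred = 200, 3
X = rng.standard_normal((n_obs, n_pred))
y = X @ np.array([0.8, -0.3, 0.5]) + rng.standard_normal(n_obs)

# least-squares fit y ~ a + sum_j b_j x_j
A = np.column_stack([np.ones(n_obs), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b = coef[1:]

# equation (28): R^2 = sum_j b_j * cov(x_j, y) / var(y)
cov_xy = np.array([np.cov(X[:, j], y)[0, 1] for j in range(n_pred)])
R2_formula = np.sum(b * cov_xy / np.var(y, ddof=1))

resid = y - A @ coef
R2_direct = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
assert abs(R2_formula - R2_direct) < 1e-10
\end{verbatim}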

On the surface of a Sphere,

\begin{displaymath}
r\equiv {\int fg\,d\Omega\over \sqrt{\int f^2\,d\Omega\int g^2\,d\Omega}},
\end{displaymath} (29)

where $d\Omega$ is a differential Solid Angle. This definition guarantees that $-1<r<1$. If $f$ and $g$ are expanded in Real Spherical Harmonics,


$\displaystyle f(\theta,\phi)$ $\textstyle \equiv$ $\displaystyle \sum_{l=0}^\infty \sum_{m=0}^l [C_l^m {Y_l^m}^c(\theta,\phi)+S_l^m {Y_l^m}^s(\theta, \phi)]$ (30)
$\displaystyle g(\theta,\phi)$ $\textstyle \equiv$ $\displaystyle \sum_{l=0}^\infty \sum_{m=0}^l [A_l^m {Y_l^m}^c(\theta,\phi)+B_l^m {Y_l^m}^s(\theta, \phi)].$ (31)

Then
\begin{displaymath}
r_l ={\sum_{m=0}^l (C_l^mA_l^m+S_l^mB_l^m)\over \sqrt{\sum_{m=0}^l ({C_l^m}^2+{S_l^m}^2)}\,\sqrt{\sum_{m=0}^l ({A_l^m}^2+{B_l^m}^2)}}.
\end{displaymath} (32)

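A minimal sketch (assuming numpy; the coefficients are random) computes (32) for a single degree $l$ from coefficient arrays.

\begin{verbatim}
# Illustrative sketch only: degree-l correlation of two spherical harmonic
# expansions, equation (32).
import numpy as np

def r_l(C, S, A, B):
    """Correlation of degree-l coefficients (C, S) with (A, B)."""
    num = np.sum(C * A + S * B)
    den = np.sqrt(np.sum(C**2 + S**2)) * np.sqrt(np.sum(A**2 + B**2))
    return num / den

l = 4
rng = np.random.default_rng(2)
C, S, A, B = rng.standard_normal((4, l + 1))   # coefficients for m = 0, ..., l
S[0] = B[0] = 0.0                              # no sine term for m = 0
print(r_l(C, S, A, B))
\end{verbatim}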
The confidence levels are then given by
$\displaystyle G_1(r)$ $\textstyle =$ $\displaystyle r$  
$\displaystyle G_2(r)$ $\textstyle =$ $\displaystyle r(1+{\textstyle{1\over 2}}s^2)={\textstyle{1\over 2}}r(3-r^2)$  
$\displaystyle G_3(r)$ $\textstyle =$ $\displaystyle r[1+{\textstyle{1\over 2}}s^2(1+{\textstyle{3\over 4}} s^2)]={\textstyle{1\over 8}} r(15-10r^2+3r^4)$  
$\displaystyle G_4(r)$ $\textstyle =$ $\displaystyle r\{1+{\textstyle{1\over 2}}s^2[1+{\textstyle{3\over 4}}s^2(1+{\textstyle{5\over 6}}s^2)]\}$  
  $\textstyle =$ $\displaystyle {\textstyle{1\over 16}}r(35-35r^2+21r^4-5r^6),$  

where
\begin{displaymath}
s\equiv \sqrt{1-r^2}
\end{displaymath} (33)

(Eckhardt 1984).
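A short numerical check (assuming numpy) confirms that the nested $s^2$ forms of $G_2$, $G_3$, and $G_4$ above agree with their expanded polynomial forms in $r$.

\begin{verbatim}
# Illustrative sketch only: nested and expanded forms of G_2, G_3, G_4 agree.
import numpy as np

r = np.linspace(-0.99, 0.99, 7)
s2 = 1 - r**2
G2 = r * (1 + s2 / 2)
G3 = r * (1 + s2 / 2 * (1 + 3 * s2 / 4))
G4 = r * (1 + s2 / 2 * (1 + 3 * s2 / 4 * (1 + 5 * s2 / 6)))
assert np.allclose(G2, r * (3 - r**2) / 2)
assert np.allclose(G3, r * (15 - 10 * r**2 + 3 * r**4) / 8)
assert np.allclose(G4, r * (35 - 35 * r**2 + 21 * r**4 - 5 * r**6) / 16)
\end{verbatim}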

See also Fisher's z'-Transformation, Spearman Rank Correlation Coefficient, Spherical Harmonic


References

Bevington, P. R. Data Reduction and Error Analysis for the Physical Sciences. New York: McGraw-Hill, 1969.

Eckhardt, D. H. ``Correlations Between Global Features of Terrestrial Fields.'' Math. Geology 16, 155-171, 1984.

Kenney, J. F. and Keeping, E. S. Mathematics of Statistics, Pt. 1, 3rd ed. Princeton, NJ: Van Nostrand, 1962.

Kenney, J. F. and Keeping, E. S. Mathematics of Statistics, Pt. 2, 2nd ed. Princeton, NJ: Van Nostrand, 1951.

Pugh, E. M. and Winslow, G. H. The Analysis of Physical Measurements. Reading, MA: Addison-Wesley, 1966.



