Chi Square
Introduction
The Chi-Square distribution is a staple of statistical analysis. It is often used to judge how far observed values fall from expected values. The simplest place to start is this: the Chi-Square distribution is what you get if you take observations from a standard Normal distribution, square them, and add them up. If we use $Z_1, Z_2$, and so forth to refer to draws from $N(0,1)$, then
$$Z_1^2 + Z_2^2 + \cdots + Z_N^2 = \sum_{i=1}^{N} Z_i^2 \sim \chi^2_N$$
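A quick simulation can make this concrete. The following sketch (my own illustration, not part of the original text; the sample sizes are arbitrary) sums squared standard Normal draws and compares the result against R's built-in Chi-Square quantiles:

N <- 5
sums <- replicate(10000, sum(rnorm(N)^2))
## The simulated quantiles should sit close to the theoretical ones:
quantile(sums, c(0.25, 0.50, 0.75))
qchisq(c(0.25, 0.50, 0.75), df = N)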
That means the sum of squared $Z$s has a Chi-Square distribution with $N$ degrees of freedom. The term degrees of freedom has some emotional and cognitive implications for psychologists, but it is really just a parameter for us. Things that are sums of squares have $\chi^2$ distributions. Now, suppose the numbers being added up are not standardized, but they are centered. That is to say, they have a Normal distribution with a mean of 0 and a standard deviation of $sd$. That means we would have to divide each observation by $sd$ in order to obtain the $Z_i$s, which are standardized Normal observations. Obviously,

$$\left(\frac{Y_1}{sd}\right)^2 + \left(\frac{Y_2}{sd}\right)^2 + \cdots + \left(\frac{Y_N}{sd}\right)^2 \sim \chi^2_N$$
Equivalently, suppose you think of the $Y_i$ as being proportional to the $Z_i$ in this way: $Y_i = sd \cdot Z_i$. The coefficient $sd$ is playing the role of a scaling coefficient, and without too much effort you find out that if some variable $x = \sum_i Z_i^2$ has a Chi-square distribution, $\chi^2_N$, then $sd \cdot x$ has a distribution equal to $sd \cdot \chi^2_N$. The elementary laws of expected values and variances dictate that

$$E(sd \cdot x) = sd \, E(x) \quad \text{and} \quad Var(sd \cdot x) = sd^2 \, Var(x)$$
In other words, the Chi-square distribution applies not just for a sum of squares of a standardized Normal distribution; up to a scaling factor, it describes a sum of squares of any Normal distribution that is centered around zero.
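To see the scaling at work, here is a small simulation sketch (my own; I write $s$ for the standard deviation the text calls $sd$, and the particular values are arbitrary). Summing squared draws from $N(0, s)$ gives a scaled Chi-Square, and dividing by $s^2$ recovers the standard one:

s <- 3
N <- 5
raw <- replicate(10000, sum(rnorm(N, mean = 0, sd = s)^2))
mean(raw)                 ## close to s^2 * N = 45
## Rescaling by s^2 recovers the standard Chi-Square:
quantile(raw / s^2, c(0.25, 0.50, 0.75))
qchisq(c(0.25, 0.50, 0.75), df = N)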
Mathematical Description
The probability density of $x = \sum_{i=1}^{N} Z_i^2$ is defined as:

$$f(x) = \frac{1}{2^{N/2}\,\Gamma(N/2)}\, x^{(N/2 - 1)}\, e^{-x/2}$$
It is defined on a range of positive numbers, $0 \leq x < \infty$. Because we are thinking of this value as a sum of squared values, it could not possibly be smaller than zero. It also assumes that $N > 0$, which is obviously true because we are thinking of the variable as a sum of $N$ squared items. Why does the $\chi^2$ have that functional form? Well, write down the probability model for a standardized Normal distribution, and then realize that the probability of a squared value of that standardized Normal is EXTREMELY easy to calculate if you know a little bit of mathematical statistics. The only fancy bit is that this formula uses our friend the Gamma function (see my handout on the Gamma distribution) to represent a factorial. But we have it on good authority (Robert V. Hogg and Allen T. Craig, Introduction to Mathematical Statistics, 4th ed., New York: Macmillan, 1978, p. 115) that $\Gamma(1/2) = \sqrt{\pi}$.
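As a quick numerical check (my own sketch, with arbitrary values of $N$ and $x$), the formula above can be typed out by hand and compared against R's dchisq:

gamma(1/2)           ## equals sqrt(pi), about 1.7724539
sqrt(pi)
N <- 4; x <- 3
x^(N/2 - 1) * exp(-x/2) / (2^(N/2) * gamma(N/2))   ## hand-rolled density
dchisq(x, df = N)                                  ## should agree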
Illustrations
The probability density function of the Chi-Square distribution changes quite a bit when one puts in different values of the parameter. With well-chosen parameter settings, a clear illustration of the Chi-square can be produced. Consider the following code, which can be used to create the illustration of 2 possible Chi-Square density functions in Figure 1.

xvals <- seq(0, 10, length.out = 1000)
chisquare1 <- dchisq(xvals, df = 1)
chisquare2 <- dchisq(xvals, df = 6)
matplot(xvals, cbind(chisquare1, chisquare2), type = "l",
        xlab = "possible values of x", ylab = "probability of x",
        ylim = c(0, 1), main = "Chi-Square Probability Densities")
text(0.4, 0.9, "df=1", pos = 4, col = 1)
text(4, 0.2, "df=6", pos = 4, col = 2)

The shape of the Chi-Square depends primarily on the degrees of freedom, and adjusting the degrees of freedom has a substantial impact on the shape of the distribution. Code along the following lines will produce example density functions for a variety of degrees of freedom; examples are shown in Figure 2.
Figure 1: $\chi^2$ Density Functions
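Here is a sketch of code that could produce Figure 2 (the df values 1, 2, 3, and 6 are my own choices for illustration; the original may have used different settings):

xvals <- seq(0, 20, length.out = 1000)
dfs <- c(1, 2, 3, 6)
dens <- sapply(dfs, function(d) dchisq(xvals, df = d))
matplot(xvals, dens, type = "l",
        xlab = "possible values of x", ylab = "probability of x",
        main = "Chi-Square Densities for Several df")
legend("topright", legend = paste("df =", dfs), col = 1:4, lty = 1:4)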
The Chi-Square distribution is a form of the Gamma distribution, and most treatments of the Chi-Square rely on the general results about the Gamma to state the characteristics of the special-case Chi-square. The Gamma distribution $G(\alpha, \beta)$ is a two-parameter distribution, with parameters shape ($\alpha$) and scale ($\beta$):

$$\text{Gamma probability density} = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha - 1}\, e^{-x/\beta}$$
Note that if the shape parameter of a Gamma distribution is $N/2$ and the scale parameter is equal to 2, then this probability density is identical to the Chi-square distribution with degrees of freedom equal to $N$. Since it is known that the expected value of a Gamma distribution is $\alpha\beta$ and the variance is $\alpha\beta^2$, that means that the expected value of a Chi-square for $N$ observations is

$$E(x) = N$$

and the variance of a Chi-square variable is

$$Var(x) = 2N$$
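That identity is easy to confirm numerically; a one-line sketch (the df value is arbitrary):

x <- seq(0.1, 10, by = 0.1)
all.equal(dchisq(x, df = 5), dgamma(x, shape = 5/2, scale = 2))  ## TRUE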
Figure 2: $\chi^2$ density functions for a variety of degrees of freedom
Now, if a variable is proportional to a Chi-Square $x_i$, say $y_i = \theta x_i$, we know that $y_i$ has a distribution $y_i \sim \theta \chi^2_N$ and the probability density is (via a change of variables)

$$f(y_i) = \frac{y_i^{(N/2 - 1)} \exp(-y_i / 2\theta)}{\theta^{N/2}\, 2^{N/2}\, \Gamma[N/2]}$$

and

$$E(y_i) = \theta N \qquad Var(y_i) = 2\theta^2 N$$

The mode (for $N > 2$) is

$$mode(y_i) = \theta(N - 2)$$

The Chi-Square is related to the Poisson distribution with parameter and expected value equal to $x_i/2$ by the following identity (for even $n$):

$$P[ChiSquare(n) \geq x_i] = P\left[Poisson\left(\frac{x_i}{2}\right) \leq \frac{n}{2} - 1\right]$$
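A numerical spot-check of this identity (a sketch; the values of n and x are chosen arbitrarily, with n even):

n <- 6; x <- 4.2
pchisq(x, df = n, lower.tail = FALSE)   ## P[ChiSquare(6) >= 4.2]
ppois(n/2 - 1, lambda = x/2)            ## P[Poisson(2.1) <= 2]; should match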
In statistical problems, we often confront 2 kinds of parameters. The slope coefficients of a regression model are one type, and we usually have priors that are single-peaked and symmetric. The prior for such a coefficient might be Uniform, Normal, or any other mathematically workable distribution. Sometimes other coefficients are not supposed to be symmetrical. For example, the variance of a distribution cannot be negative, so we need a distribution that is shaped to have a minimum at zero. The Gamma, or its special case the Chi-square, is an obvious candidate. The most important aspect of the Chi-square, however, is that it is very mathematically workable! If one is discussing a Normal distribution, for example, $N(\mu, \sigma^2)$, one must specify prior beliefs about the distributions of $\mu$ and $\sigma^2$. Recall that in Bayesian updating, we calculate the posterior probability as the product of the likelihood times the prior, so some formula that makes that result as simple as possible would be great.

$$p(\sigma^2 | y) \propto p(y | \sigma^2)\, p(\sigma^2)$$

From the story that we told about where Chi-square variables come from, it should be very obvious that if $y$ is Normal, we can calculate $p(y | \sigma^2)$ (assuming $\mu$ is taken as given for the moment). So all we need is a prior that makes $p(\sigma^2 | y)$ as simple as possible. If you choose $p(\sigma^2)$ to be Chi-squared, then it turns out to be very workable. Suppose you look at the numerator from the Chi-Square, and guess that you want to put $1/\sigma^2$ in place of $x_i$. You describe your prior opinion about $\sigma^2$ as

$$p(\sigma^2) \propto (\sigma^2)^{-N/2 - 1} \exp\left(-\frac{1}{2} S_0 / \sigma^2\right)$$
We use $N$ and $S_0$ as scaling factors to describe how our beliefs vary from one situation to another. $N$ is the degrees of freedom. Note that this is very convenient if your Normal theory for $y$ says:

$$p(y_i | \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_i - \mu)^2\right)$$
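As a sanity check on that formula, here is a tiny sketch (my own; the values of $y_i$, $\mu$, and $\sigma^2$ are arbitrary) comparing the hand-written density against R's dnorm:

yi <- 1.3; mu <- 0; sigma2 <- 2
(1 / sqrt(2 * pi * sigma2)) * exp(-(yi - mu)^2 / (2 * sigma2))
dnorm(yi, mean = mu, sd = sqrt(sigma2))   ## should agree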
Suppose the sample size of the dataset is $n$. If you let

$$S = \sum_{i=1}^{n} (y_i - \mu)^2$$

represent the sum of squares, then we rearrange to find a posterior:
$$p(\sigma^2 | y) \propto (\sigma^2)^{-(N+n)/2 - 1} \exp\left(-\frac{1}{2}(S_0 + S) / \sigma^2\right)$$

Look how similar the prior is to the posterior. It gets confusing discussing $\sigma^2$ and $1/\sigma^2$. Bayesians don't usually talk about estimating the variance $\sigma^2$, but rather the precision, which is defined as

$$\tau = \frac{1}{\sigma^2}$$
Hence, the distribution of the precision is given as a Chi-Square variable, and if your prior is

$$prior: \quad p(\tau) \propto \tau^{N/2 - 1} \exp\left(-\frac{1}{2} S_0 \tau\right)$$

then the posterior is a Chi-Square variable:

$$(S_0 + S)\,\tau \sim \chi^2_{N+n}$$

If you really do want to talk about the variance, rather than the precision, then you are using a prior that is an INVERSE Chi-Square. Your prior is the inverse chi-square

$$\frac{S_0}{\sigma^2} \sim \chi^2_N$$
As a result, a prior for a variance parameter is often given as an inverse Chi-square, while the prior for a precision parameter is given as a Chi-square.
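To make the update concrete, here is a small simulation sketch (entirely my own; the prior settings and the simulated data are hypothetical), drawing posterior samples of $\sigma^2$ from the inverse Chi-Square form:

set.seed(1234)
mu <- 0                              ## treat mu as known, as in the text
N <- 3; S0 <- 3                      ## hypothetical prior df and prior sum of squares
y <- rnorm(20, mean = mu, sd = 2)    ## fake data with true variance 4
n <- length(y); S <- sum((y - mu)^2)
## Posterior says (S0 + S)/sigma^2 ~ ChiSquare(N + n), so invert the draws:
sigma2.draws <- (S0 + S) / rchisq(5000, df = N + n)
mean(sigma2.draws)                   ## should be near the true variance, 4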