Chi Square Distribution


Chi Square Distribution & Hypothesis Test

Posted by Ted Hessing

The chi square (χ²) distribution is commonly used to test a population
variance against a known or assumed value of the population variance. A chi
square distribution is a continuous distribution with ϑ degrees of freedom. It
also describes the distribution of a sum of squared standard normal random
variables. In addition, it is used to test the goodness of fit of a
distribution of data, to test whether data series are independent, and to
estimate confidence intervals for the variance and standard deviation of a
random variable from a normal distribution.

History of Chi Square

Karl Pearson (1857-1936), often called the father of modern statistics (he
established the first statistics department in the world at University College
London), came up with the chi squared distribution. Pearson’s work in
statistics began with developing mathematical methods for studying the
processes of heredity and evolution. The chi squared distribution came about
later, as Pearson was attempting to find a measure of the goodness of fit of
other distributions to random variables in his heredity and evolutionary
modeling.

Chi Square Statistic

The chi square distribution is skewed to the right, with a long tail toward the
large values of the distribution. The overall shape of the distribution depends
on the number of degrees of freedom in a given problem. When testing a single
population variance, the degrees of freedom are 1 less than the sample size.

Chi Square Properties

• The mean of the distribution is equal to the number of degrees of
freedom: μ = ϑ.
• The variance is equal to two times the number of degrees of freedom:
σ² = 2ϑ.
• When the degrees of freedom are greater than or equal to 2, the
maximum value of the density occurs at χ² = ϑ - 2.
• As the degrees of freedom increase, the chi square curve approaches a
normal distribution.
• As the degrees of freedom increase, the symmetry of the graph also
increases.
• Because the random variable on which it is based is squared, the
distribution has no negative values.

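The formula for the probability density function of the chi square distribution
is

$$f(x) = \frac{x^{\vartheta/2 - 1}\, e^{-x/2}}{2^{\vartheta/2}\, \Gamma(\vartheta/2)}, \qquad x \ge 0$$

Where ϑ is the shape parameter (the degrees of freedom) and Γ is the gamma
function.

The formula for the gamma function is

$$\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\, dt$$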
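As a rough numerical check of the properties listed earlier (mean ϑ, variance 2ϑ, peak at ϑ - 2), the sketch below evaluates the standard chi square density in plain Python and integrates it with a simple midpoint rule. The function names, the choice of 6 degrees of freedom, and the integration limit are assumptions made for illustration only.

```python
import math

def chi2_pdf(x, df):
    """Standard chi square density with df degrees of freedom, for x > 0."""
    if x <= 0:
        return 0.0
    # Work in logs so large df values do not overflow the gamma function.
    log_pdf = ((df / 2 - 1) * math.log(x) - x / 2
               - (df / 2) * math.log(2) - math.lgamma(df / 2))
    return math.exp(log_pdf)

def integrate(func, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width

df = 6          # illustrative choice of degrees of freedom
upper = 200.0   # assumed cut-off; the tail beyond it is negligible for df = 6

total = integrate(lambda x: chi2_pdf(x, df), 0.0, upper)
mean = integrate(lambda x: x * chi2_pdf(x, df), 0.0, upper)
variance = integrate(lambda x: (x - mean) ** 2 * chi2_pdf(x, df), 0.0, upper)

print(total, mean, variance)
```

For 6 degrees of freedom the three printed values should come out close to 1, 6 and 12, and the density should be highest near x = 4.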

Chi Square (χ²) Hypothesis Test

Usually the objective of the Six Sigma team is to find the level of variation of
the output, not just the mean of the population. Most importantly, the team
would like to know how much variation the production process exhibits about the
target to see what adjustments are needed to reach a defect-free process.

For a comparison between several sample variances, or a comparison between
frequency proportions, the standard test statistic called the chi square (χ²)
statistic is used. The distribution of the chi square statistic is called the
chi square distribution.

Types of Chi square Hypothesis Tests

There are basically two types of chi square tests:

• Chi Square Test of Independence: Determines whether there is any
association between two categorical variables by comparing observed and
expected frequencies of test outcomes when there is no defined population
variance.
• Chi Square Test of Variance: Compares variances when the variance of the
population is known.

Chi-Square test of Independence


Chi square test of independence determines whether there is an association
between two categorical variables (like gender, course selection). For example,
chi square test of independence examines the association between one
category like gender (male and female) and the other category like percentage
of absenteeism in a school. Chi square test of independence is a non-
parametric test. In other words, the assumption of normality is not required to
perform the test.

Chi square test utilizes a contingency table to analyze the data. Each row
shows the categories of one variable. Similarly, each column shows the
categories of another variable. Each variable must have two or more
categories. Each cell reflects the total number of cases for a specific pair of
categories.

Assumptions of Chi-square test of independence

• The variables must be nominal or categorical
• The categories of each variable are mutually exclusive
• The sampling method is simple random sampling
• The data in the contingency table are frequencies or counts

Steps to perform Chi Square test of independence

Step 1: Define the null hypothesis and alternative hypothesis

• Null hypothesis (H0): There is no association between the two categorical
variables
• Alternative hypothesis (H1): There is a significant association between
the two categorical variables

Step 2: Specify the level of significance

Step 3: Compute χ² statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

• O is the observed frequency
• E is the expected frequency

Expected frequency is calculated for each cell = (column total * row total) / n,
where n is the total number of observations

Step 4: Calculate the degrees of freedom = (number of rows - 1) * (number of
columns - 1) = (r-1) * (c-1)

Step 5: Find the critical value, based on the degrees of freedom and the level
of significance

Step 6: Finally, draw the statistical conclusion: If the test statistic value is
greater than the critical value, reject the null hypothesis, and hence we can
conclude that there is a significant association between two categorical
variables.
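The steps above can be sketched in a few lines of plain Python. The function name `chi_square_independence` and the observed counts are hypothetical and used only for illustration; they are not data from this article. The critical value 5.991 is the table value for 2 degrees of freedom at α = 0.05.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom, expected) for a table of counts.

    `table` is a list of rows; each row is a list of observed frequencies.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected frequency of each cell = (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Chi square statistic = sum over all cells of (O - E)^2 / E
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, expected

# Hypothetical counts, for illustration only (not taken from the article).
observed = [
    [30, 20, 50],   # group A
    [20, 40, 40],   # group B
]

statistic, dof, expected = chi_square_independence(observed)
critical_value = 5.991  # table value for 2 degrees of freedom, alpha = 0.05
reject_null = statistic > critical_value

print(statistic, dof, reject_null)
```

With these made-up counts the statistic is larger than the critical value, so the null hypothesis of no association would be rejected.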

Chi Square test of independence example


Example: 1000 middle school students are asked which is their favorite
superhero: Superman, Ironman or Spiderman. At the 95% confidence level, would
you conclude that there is a relationship between gender and favorite
superhero?

• Null hypothesis (H0): There is no association between gender and
favorite superhero characters.
• Alternative hypothesis (H1): There is a significant association between
gender and favorite superhero characters.

Level of significance: α=0.05

First calculate the expected frequency


For the cell (Boys, Superman) = (200*600)/1000 = 120

Similarly, calculate the expected frequency of all cells

Degrees of freedom = (r-1)*(c-1) = (2-1)*(3-1) =2

Chi-square critical value for 2 degrees of freedom =5.991


The test statistic value is greater than the critical value, hence we can reject
the null hypothesis

So, we can conclude that there is a significant association between gender and
favorite superhero characters.


Chi Square test – Comparing Variances


The chi square test is a good option for two applications.

• Case I: Comparing variances when the variance of the population is known.
• Case II: Comparing observed and expected frequencies of test outcomes
when there is no defined population variance.

When the population follows a normal distribution, a sample variance s² can be
tested against a hypothesized population variance σ². The test statistic is
given by

$$\chi^2 = \frac{(n - 1)\, s^2}{\sigma^2}$$

Where n is the sample size and s² is the sample variance. The χ² distribution
is not symmetrical like the normal curve, and its shape depends on the degrees
of freedom.
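A minimal sketch of this calculation starting from raw measurements is shown below. The statistic is (n - 1) times the sample variance, divided by the hypothesized population variance. The measurements are hypothetical values chosen for illustration, and `statistics.variance` from the Python standard library is used because it computes the sample variance with an n - 1 denominator.

```python
import statistics

# Hypothetical measurements, for illustration only.
measurements = [12, 15, 9, 20, 14, 11, 17]
sigma0 = 16  # hypothesized population standard deviation

n = len(measurements)
sample_variance = statistics.variance(measurements)  # n - 1 denominator

# Chi square statistic for a single variance: (n - 1) * s^2 / sigma0^2
statistic = (n - 1) * sample_variance / sigma0 ** 2
degrees_of_freedom = n - 1

print(sample_variance, statistic, degrees_of_freedom)
```

The statistic is then compared with a chi square table value for n - 1 degrees of freedom.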

Hypothesis testing

A statistical hypothesis is an assumption about a population parameter. The
assumption may or may not be true. A one-tailed test is a test of hypothesis
where the area of rejection is only in one direction. A two-tailed test, in
contrast, tests against the alternative that the actual variance is either
greater than or less than the particular value. The selection of a one or
two-tailed test depends upon the problem.

Left tail and Right tailed χ² distribution

The chi square test has the following properties

• Evaluates sample variances
• Chi square is non-negative.
• Chi square is non-symmetric.
• The degrees of freedom when working with a single population variance
is n-1.
• You do not need knowledge of population variation
Left-tailed chi square test example


The standard deviation of an airline’s passenger waiting time for a single
queue is 16 minutes. Accordingly, the population variance is 256 (square of the
standard deviation). In a pilot project with separate queues, the standard
deviation of the waiting time for a sample of 7 passengers is 8 minutes. Thus,
the sample variance is 64 (square of the standard deviation). Check, at the 95%
confidence level, whether the variation in wait time has been reduced.

The null hypothesis is H0: σ₁² ≥ (16)²

The alternative hypothesis is H1: σ₁² < (16)²

Let’s look at the chi square table. Because s is less than σ, this is a left
tail test, with df = 7-1 = 6. The critical value for 95% confidence is 1.63

The test statistic is

$$\chi^2 = \frac{(n-1)\, s^2}{\sigma^2} = \frac{(7-1)(8)^2}{(16)^2} = 1.5$$

The test statistic (1.5) is less than the critical value (1.63), and it is in
the rejection region. Hence the null hypothesis must be rejected. The variation
in wait time decreased with the separate lines.

Example 1:

The Barnes Company manufactures a DVD player and claims that the mean
number of hours of use before repairs is 400, with a standard deviation of 10
hours.

The specified variance, therefore, is σ₀² = 10² = 100 hours². A new company
marketing representative suspects that the “before repair” variance is actually
less than 100 hours². To verify this, she tests nine machines and finds a sample
mean of 410 hours and a standard deviation of 5.5 hours. Is the sample variance
statistically significantly less than the currently claimed variance? Use
α = 0.05.
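One way to work this example is sketched below in Python, treating it as a left-tailed test of a single variance. The critical value 2.733 is an assumption here: it is the left-tail value for 8 degrees of freedom at α = 0.05 as read from a standard chi square table, not a number given in this example.

```python
n = 9                  # machines tested
s = 5.5                # sample standard deviation, hours
sigma0_sq = 10 ** 2    # claimed variance, hours squared

df = n - 1

# Chi square statistic for a single variance: (n - 1) * s^2 / sigma0^2
statistic = df * s ** 2 / sigma0_sq

# Assumption: left-tail critical value for df = 8 at alpha = 0.05,
# read from a standard chi square table (not stated in the example).
critical_value = 2.733

# Left-tailed test: reject H0 when the statistic falls below the critical value.
reject_null = statistic < critical_value

print(df, statistic, reject_null)
```

Under these assumptions the statistic is about 2.42, which is below 2.733, so the sample would support the suspicion that the variance is less than 100 hours².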

Right-tailed chi square test example

A smartwatch manufacturer received customer complaints about the XYZ model,
whose battery lasts a shorter time than the previous model. The variance of
the battery life of the previous model is 49 hours². 11 watches were tested, and
the battery life standard deviation was 9 hours. Assuming that the data are
normally distributed, can we conclude at the 5% significance level that the
battery life variance of the new model is greater than that of the previous
model?

Population variance σ₁² = 49 hours², so σ₁ = 7 hours

Sample standard deviation s = 9 hours

The null hypothesis is H0: σ₁² ≤ (7)²

The alternative hypothesis is H1: σ₁² > (7)²

Let’s look at the chi square table. Because s is greater than σ, this is a right
tail test, with df = 11-1 = 10. The critical value for 95% confidence is 18.307.

The test statistic is

$$\chi^2 = \frac{(n-1)\, s^2}{\sigma^2} = \frac{(11-1)(9)^2}{(7)^2} \approx 16.53$$

The test statistic is less than the critical value and it is not in the
rejection region. Hence we fail to reject the null hypothesis. There is not
sufficient evidence to claim that the battery life variance of the new model
has increased.

Two-tailed chi square test example


Company HR believes that the variation in the salaries of new digital
technology employees is not the same as that of Java technology employees. From
historical data, the standard deviation of salaries of the Java employees is
$49K. Salaries of 30 new digital technology employees were collected, and their
standard deviation is $70K. Assuming that the data are normally distributed,
could the HR claim be validated with 95% confidence?

Population standard deviation σ₁ = 49

Sample standard deviation s = 70

The null hypothesis is H0: σ₁² = (49)²

The alternative hypothesis is H1: σ₁² ≠ (49)²

df = 30-1 = 29.

Since the alternative is that the variance is not equal to (49)², this is a two
tail test. So α/2 = 0.05/2 = 0.025

For 29 degrees of freedom, the left tail critical value (1-α/2 = 1-0.025 =
0.975) is 16.047

And the right tail critical value (α/2 = 0.025) is 45.722

The test statistic is

$$\chi^2 = \frac{(n-1)\, s^2}{\sigma^2} = \frac{(30-1)(70)^2}{(49)^2} \approx 59.18$$

The test statistic is more than 45.722 and it is in the rejection region. Hence,
we can reject the null hypothesis.
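The decision rule for the left-tailed, right-tailed and two-tailed examples can be summarized in one small Python sketch. The function name `variance_test` and its arguments are hypothetical; the sample sizes, standard deviations and critical values are the ones quoted in the three examples above.

```python
def variance_test(n, s, sigma0, tail, critical):
    """Chi square test of a single variance.

    tail is "left", "right" or "two"; critical is one table value for a
    one-tailed test, or a (lower, upper) pair for a two-tailed test.
    Returns (statistic, reject_null).
    """
    statistic = (n - 1) * s ** 2 / sigma0 ** 2
    if tail == "left":
        reject = statistic < critical
    elif tail == "right":
        reject = statistic > critical
    elif tail == "two":
        lower, upper = critical
        reject = statistic < lower or statistic > upper
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return statistic, reject

# Values quoted in the three worked examples.
airline = variance_test(7, 8, 16, "left", 1.63)
smartwatch = variance_test(11, 9, 7, "right", 18.307)
salaries = variance_test(30, 70, 49, "two", (16.047, 45.722))

print(airline, smartwatch, salaries)
```

The sketch rejects the null hypothesis for the airline and salary examples and fails to reject it for the smartwatch example, matching the conclusions in the text.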

Chi Square Sample Size


Chi-Square tests are sensitive to sample size. As the sample size grows, the
absolute differences become a smaller and smaller proportion of the expected
value. In other words, a strong association may not show up as significant if
the sample size is small, while with a very large sample a finding can be
statistically significant without being practically significant.

Where

• n is the sample size with continuity correction
• n’ is the sample size without continuity correction
• P1 and P2 are the proportions in each group
• Q1 = 1-P1
• P̅ = (P1+P2)/2

Chi Square Videos


https://www.youtube.com/watch?v=53kYOOr5Yhk

Additional Chi Square Examples and Helpful Links:


Chi Square Tables

• National Institute of Standards and Technology
• Nice Chi Square one-pager from Richland Community College
• https://people.richland.edu/james/lecture/m170/tbl-chi.html
• https://www.statisticshowto.datasciencecentral.com/tables/chi-squared-table-right-tail/
• https://www.socscistatistics.com/tests/chisquare2/default2.aspx

Chi Square Sample Size Calculation

• https://stats.stackexchange.com/questions/340291/estimate-sample-size-for-chi-squared-test
• http://www.statskingdom.com/sample_size_chi2.html

Other Uses of Chi Square

• Chi-square test for goodness of fit: It is a statistical hypothesis test to
see how well sample data fit into population characteristics.
• Chi Square contingency table.

Six Sigma Black Belt Certification Chi Square Questions:


Question: The time for a fail-safe device to trip is thought to be a discrete
uniform distribution from 1 to 5 seconds. To test this hypothesis, 100 tests are
conducted with results as shown below.

On the basis of these data, what are the chi square (χ²) value and the number
of degrees of freedom (df)?

(A) χ² value = 57.5, degrees of freedom = 4
(B) χ² value = 57.5, degrees of freedom = 5
(C) χ² value = 1,150.0, degrees of freedom = 4
(D) χ² value = 1,150.0, degrees of freedom = 5

Answer: 


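A: 57.5 and 4 degrees of freedom.

First of all, we will figure out the degrees of freedom. It’s an easy way to
eliminate half the answers on the page.

This is a goodness-of-fit test with 5 categories (trip times of 1 to 5
seconds), so degrees of freedom = (number of categories - 1) = 5 - 1 = 4.
Treating the chart as 5 rows and 2 columns gives the same result:
(rows - 1) * (columns - 1) = (5-1) * (2-1) = 4 * 1 = 4.

Now we’ll run the equation. Under a uniform distribution, the expected
frequency for each of the 5 outcomes is E = 100 / 5 = 20, so Chi Squared = χ² =
Σ ((O-E)² / E) = 100 / 20 + 25 / 20 + 900 / 20 + 25 / 20 + 100 / 20 =
1150 / 20 = 57.5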
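The arithmetic in this answer can be checked with a short Python sketch. Only the squared deviations that appear in the sum above are used, since the observed counts themselves are not reproduced here; the variable names are chosen for illustration.

```python
n_tests = 100
n_outcomes = 5

# Expected frequency per outcome under a discrete uniform distribution.
expected = n_tests / n_outcomes

# Squared deviations (O - E)^2 as they appear in the sum above.
squared_deviations = [100, 25, 900, 25, 100]

chi_square = sum(d / expected for d in squared_deviations)
degrees_of_freedom = n_outcomes - 1

print(chi_square, degrees_of_freedom)
```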
