
Lecture LAB5 Chi Square

The document discusses how to perform and interpret a chi-square test of independence. It provides examples of using chi-square to test for associations between categorical variables. It also discusses assumptions of chi-square and using alternative tests like Fisher's exact test if assumptions are not met.

Uploaded by

Samar

LAB 5: Contingency Table Analysis (Chi-Square Analysis)
Introduction

• Objective: be able to correctly use and interpret the Pearson Chi-Square test, a test of independence between two qualitative variables.
• The Pearson Chi-Square test is used to examine the relationship between two qualitative variables.
• The null hypothesis: there is no association (relationship) between the two variables.
• The alternative hypothesis: the two variables are associated.
• Scenarios in which you would use the Chi-Square test:
  o Does smoking status at baseline depend on gender?
  o Is there a relationship between coffee consumption and age group?
Assumptions
• Random sample: the sample should be randomly selected from the population.
• Independence: observations must be independent of each other (not matched pairs, e.g. a matched case-control study).
• Adequate sample size:
  o No more than 20% of the cells have an expected count < 5.
  o The expected frequency (count) in each cell must be ≥ 1.
• If the expected frequencies are too small, FISHER’S EXACT TEST should be used in place of the Pearson Chi-square.
• If independence cannot be assumed, another non-parametric test, the MCNEMAR TEST, must be used.
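To make the Fisher's exact alternative concrete, here is a minimal Python sketch of the two-sided test for a 2×2 table. It is an illustration, not the SPSS procedure: it handles only 2×2 tables, the function name `fisher_exact_2x2` is made up for this example, and the counts are hypothetical. The p-value is the sum of the hypergeometric probabilities of all tables (with the same margins) that are no more likely than the observed one.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    observed_probability = probability(a)
    # Sum every table that is no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed_probability * (1 + 1e-9)
    )

# Hypothetical counts, invented purely for illustration.
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(f"Fisher's exact p-value = {p_value:.4f}")
```

Because the test enumerates exact probabilities instead of relying on the chi-square approximation, it does not need the expected-count conditions listed above.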


Chi-Square Test Statistics

2 Qualitative (Nominal) Variables

H0: There is no association between the 2 variables (the 2 variables are independent)
Ha: There is an association between the 2 variables (the 2 variables are not independent)

Contingency Table Analysis: which test to use
• All assumptions are met: Pearson Chi-Square
• Sample size assumption is not met (the expected frequencies are too small): Fisher’s Exact Test
• The assumption of independence is not met (paired observations): McNemar Test
Evidence or Proof: P-value (Sig.)

• If p-value ≤ 0.05, we reject the null hypothesis
• If p-value > 0.05, we fail to reject the null hypothesis
Example 1: Does smoking status at baseline depend on gender?

H0: Smoking status at baseline is independent of gender

Observed Values

BASELINE SMOKING STATUS * GENDER

Smoking Status   Male   Female   Total
Smoker             16       17      33
Non-smoker        143      230     373
Total             159      247     406
Expected Values

$E = \dfrac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$

BASELINE SMOKING STATUS * GENDER

Smoking Status   Male                     Female                   Total
Smoker           (33*159)/406 = 12.9      (33*247)/406 = 20.1         33
Non-smoker       (373*159)/406 = 146.1    (373*247)/406 = 226.9      373
Total            159                      247                        406
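The expected-value calculation can be sketched in a few lines of Python. This is only an illustration of the formula (row total × column total / grand total) applied to the observed smoking-by-gender counts; the function name `expected_counts` is made up for this example.

```python
def expected_counts(observed):
    """Expected cell counts under independence for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each expected count is (row total * column total) / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: smoker, non-smoker. Columns: male, female.
observed = [[16, 17], [143, 230]]
expected = expected_counts(observed)
for label, row in zip(["Smoker", "Non-smoker"], expected):
    print(label, [round(value, 1) for value in row])
```

Rounded to one decimal place, the results should match the expected values in the table (12.9, 20.1, 146.1, 226.9).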


Observed and Expected Values

BASELINE SMOKING STATUS * GENDER

Smoking Status              Male   Female   Total
Smoker       Observed         16       17      33
             Expected       12.9     20.1
Non-smoker   Observed        143      230     373
             Expected      146.1    226.9
Total                        159      247     406
(O  E ) 2
Test Statistics i


i


2
  1.31
E
i

Using the observed and expected values from the BASELINE SMOKING STATUS * GENDER table:

$\chi^2 = \dfrac{(16 - 12.9)^2}{12.9} + \dfrac{(17 - 20.1)^2}{20.1} + \dfrac{(143 - 146.1)^2}{146.1} + \dfrac{(230 - 226.9)^2}{226.9} = 1.31$

(The value 1.31 is obtained with the unrounded expected counts.)
Degrees of freedom: (rows - 1)(columns - 1) = 1

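Chi-square test statistic = 1.31
Critical value from table (α = 0.05, df = 1) = 3.841

Since 1.31 < 3.841, the test statistic falls in the fail-to-reject (FTR) region of the chi-square distribution.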
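The hand calculation above can be checked with a short Python sketch. It uses only the standard library; the function name `pearson_chi_square` is made up for this example, and the p-value line relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so it applies to df = 1 only.

```python
from math import erfc, sqrt

def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Rows: smoker, non-smoker. Columns: male, female.
observed = [[16, 17], [143, 230]]
chi2, df = pearson_chi_square(observed)

# Upper-tail probability of the chi-square distribution; valid for df = 1 only.
p_value = erfc(sqrt(chi2 / 2))

critical_value = 3.841  # chi-square table value for alpha = 0.05, df = 1
decision = "reject H0" if chi2 > critical_value else "fail to reject H0"
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.3f}, decision: {decision}")
```

Comparing the statistic with the critical value and comparing the p-value with 0.05 are two views of the same decision, so both should lead to failing to reject H0 here.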
[Screenshots: SPSS Variable View and Data View]

Using Weight cases option

Data → Weight Cases → choose "Weight cases by" → move the weighting variable (frequency) in → OK
To Obtain Chi-Square:
From the menus choose:

Analyze → Descriptive Statistics → Crosstabs
• Select smoking as the row variable
• Select gender as the column variable
• Click Cells and select column and row percentages, then click Continue
• Click Statistics and select Chi-square, then click Continue, then OK
SPSS output

[SPSS output screenshot]

Assumptions are met.
Since the p-value is > 0.05, there is no significant association between smoking status at baseline and gender.
Summary Table

Characteristic                         Female (n=247)   Male (n=159)   P-value
Smoking Status at Baseline [n (%)]
  Smoker                               17 (6.9)         16 (10.1)      0.252*
  Non-smoker                           230 (93.1)       143 (89.9)
*Pearson Chi-Square

Decision: Fail to reject H0

Conclusion: at the 0.05 level of significance, there is no significant association between smoking status at baseline and gender, χ²(1, N = 406) = 1.31, p = 0.252.


Example 2: Using the Framingham dataset, is there an association between BMI (categories) and coronary heart disease (CHD)?
Hypothesis:
H0: No association between BMI categories and coronary heart disease
Ha: There is an association between BMI categories and coronary heart disease
Since both variables are qualitative, chi-square is the appropriate test to evaluate the association.
Assumptions:
• Random sample: the sample is randomly selected from the population
• Independence: observations are independent of each other (not matched pairs)
• Sample size:
  o No more than 20% of the cells have an expected count < 5
  o The expected frequency (count) in each cell must be ≥ 1
SPSS output

[SPSS output screenshots, including BMI CATEGORIES]

Assumptions are met, so use the Pearson Chi-Square p-value.
Summary Table

                      BMI Groups
Characteristic        Underweight   Normal        Overweight    Obese        P-value
                      (n=71)        (n=2152)      (n=1866)      (n=601)
CHD Status [n (%)]
  CHD (No)            63 (88.7)     1622 (75.4)   1180 (63.2)   353 (58.7)   <0.01*
  CHD (Yes)           8 (11.3)      530 (24.6)    686 (36.8)    248 (41.3)
*Pearson Chi-Square

Decision: Reject H0

Conclusion: at the 0.05 level of significance, there is an association between BMI groups and coronary heart disease risk, χ²(3, N = 4690) = 111.26, p < 0.01.
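The reported statistic for this 2×4 table can be reproduced from the counts in the summary table. The sketch below is a standard-library illustration, not the SPSS computation. The helper `chi_square_sf_df3` is a made-up name and uses the closed-form upper-tail probability of the chi-square distribution that holds for 3 degrees of freedom only.

```python
from math import erfc, exp, pi, sqrt

def chi_square_sf_df3(x):
    """Upper-tail probability P(X > x) for chi-square with 3 df (closed form)."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

# Columns: underweight, normal, overweight, obese (counts from the summary table).
chd_no = [63, 1622, 1180, 353]
chd_yes = [8, 530, 686, 248]
observed = [chd_no, chd_yes]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi_square_sf_df3(chi2)

# Column percentages of CHD (Yes), as shown in the summary table.
percent_chd_yes = [round(100 * yes / total, 1) for yes, total in zip(chd_yes, col_totals)]

print(f"N = {grand_total}, chi-square = {chi2:.2f}, df = {df}, p < 0.01: {p_value < 0.01}")
print("CHD (Yes) % by BMI group:", percent_chd_yes)
```

The rising CHD (Yes) percentages across BMI groups show the direction of the association that the test detects.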


Example 3: Is there a relationship between marital status and exercise levels at baseline?
Hypothesis:
H0: No association between marital status and exercise
Ha: There is an association between marital status and exercise
Assumptions:
• Random sample: we will assume that the sample is randomly selected from the population
• Independence: observations are independent of each other (not matched pairs)
• Sample size:
  o No more than 20% of the cells have an expected count < 5
  o The expected frequency (count) in each cell must be ≥ 1


SPSS output

[SPSS output screenshots]

Assumptions are not met, so use Fisher’s Exact test.
To Obtain Fisher’s Exact Test
(Please note that this option may not be available for Mac users.)

Analyze → Descriptive Statistics → Crosstabs
• Select exercise levels at baseline as the row variable
• Select marital status as the column variable
• Click Cells and select column and row percentages, then click Continue
• Click Statistics and select Chi-square
• Click Exact and select Monte Carlo with 99% CI, then click Continue, then OK
SPSS output
Summary table

                                   Marital status
Exercise levels     Single        Married       Divorced      Widowed       p-value*
at baseline         N     %       N     %       N     %       N     %
None                6     35.3    127   38.3    7     21.2    10    41.7    0.142
Mild                8     47.1    96    28.9    13    39.4    8     33.3
Moderate            2     11.8    93    28.0    8     24.2    6     25.0
Vigorous            1     5.9     16    4.8     5     15.2    0     0.0
*p-value of Fisher’s exact test
Decision:
Fail to reject the null hypothesis (FTR)
Since the p-value (0.142) is greater than α = 0.05, we fail to reject the null hypothesis.
Conclusion (Interpretation): at the 0.05 level of significance, there is no significant association between marital status and exercise levels at baseline.
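To see why the sample size assumption fails in this example, the expected counts can be recomputed from the summary table. The sketch below is an illustration of the two rules from the Assumptions slide (no more than 20% of cells with expected count < 5, and every expected count ≥ 1); the function name `check_expected_counts` is made up for this example.

```python
def check_expected_counts(observed):
    """Apply the chi-square sample size rules to a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [r * c / grand_total for r in row_totals for c in col_totals]
    small_cells = sum(1 for value in expected if value < 5)
    share_small = small_cells / len(expected)
    min_expected = min(expected)
    # Both rules must hold for the Pearson chi-square test to be appropriate.
    assumption_met = share_small <= 0.20 and min_expected >= 1
    return small_cells, share_small, min_expected, assumption_met

# Rows: none, mild, moderate, vigorous exercise.
# Columns: single, married, divorced, widowed (counts from the summary table).
observed = [
    [6, 127, 7, 10],
    [8, 96, 13, 8],
    [2, 93, 8, 6],
    [1, 16, 5, 0],
]
small_cells, share_small, min_expected, assumption_met = check_expected_counts(observed)
print(f"cells with expected count < 5: {small_cells} ({share_small:.0%})")
print(f"minimum expected count: {min_expected:.2f}")
print("use Pearson chi-square" if assumption_met else "use Fisher's exact test")
```

With these counts both rules are violated, which is consistent with the choice of Fisher’s exact test for this example.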
In-class Assignment
Using the Framingham dataset, answer the following question:
• Is there an association between systolic blood pressure (categories) and the risk of coronary heart disease?
• Check the assumptions
• Use the following table to categorize systolic blood pressure:
Answer
Hypothesis

Assumptions
SPSS Output:
• Cross tabulation
• Chi-square tests

Summary Table

Decision/interpretation/conclusion
Thank you
