Categorical Data
CATEGORICAL DATA
(CHI-SQUARE & FISHER EXACT TEST)
Dr. Mohd Razif Shahril
School of Nutrition & Dietetics
Faculty of Health Sciences
Universiti Sultan Zainal Abidin
Topic Learning Outcomes
At the end of this lecture, students should be able to:
• identify types of categorical data analysis and their use
• explain the assumptions to be met when using the chi-square
and Fisher's exact tests
• perform the chi-square and Fisher's exact tests using SPSS
• explain how to interpret the SPSS outputs from the chi-square
and Fisher's exact tests
SCHOOL OF NUTRITION AND DIETETICS • UNIVERSITI SULTAN ZAINAL ABIDIN
What is categorical data analysis?
• Independent (Explanatory) Variable is
Categorical (Nominal or Ordinal)
• Dependent (Response) Variable is Categorical
(Nominal or Ordinal)
• Most common contingency tables:
– 2x2 (Each variable has 2 levels)
– Nominal/Nominal
– Nominal/Ordinal
– Ordinal/Ordinal
Contingency Table
• Tables representing all combinations of levels of
explanatory and response variables
• Numbers in table represent Counts of the
number of cases in each cell
• Row and column totals are called Marginal
counts
Example of Contingency Table
• Response Variable
– Cognitive Level (Low, High)
• Explanatory Variable
– BMI (Underweight, Normal, Overweight, Obese)

                 Cognitive
BMI            Low    High    Total
Underweight     59     232     291
Normal          54     367     421
Overweight     114     101     215
Obese          173      54     227
Total          400     754    1154

• Cell values are counts; the Total row and the Total column hold the marginal counts.
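The marginal counts in the table above can be recomputed from the cell counts alone. The following is a minimal Python sketch, not part of the SPSS workflow taught in this lecture; the variable names are illustrative assumptions, and the counts are copied from the table.

```python
# Counts copied from the BMI x Cognitive Level table above.
# Each value is (Low, High) cognitive level for one BMI category.
counts = {
    "Underweight": (59, 232),
    "Normal": (54, 367),
    "Overweight": (114, 101),
    "Obese": (173, 54),
}

# Row marginal counts: total number of cases in each BMI category.
row_totals = {bmi: sum(cells) for bmi, cells in counts.items()}

# Column marginal counts: total number of cases in each cognitive level.
col_totals = tuple(sum(col) for col in zip(*counts.values()))

# Grand total: every case falls in exactly one cell.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

Summing the row totals and summing the column totals must give the same grand total, which is a quick check that a table was transcribed correctly.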
2 x 2 Contingency Table
• Each variable has 2 levels
– Explanatory Variable – Groups (Typically based on
demographics, exposure, or treatment)
– Response Variable – Outcome (Typically presence or
absence of a characteristic)
                 Cognitive
BMI            Low    High    Total
≤ 24.9         113     599     712
> 24.9         287     155     442
Total          400     754    1154
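The 2 x 2 counts above equal the 4 x 2 BMI table with Underweight and Normal merged into "≤ 24.9" and Overweight and Obese merged into "> 24.9". The sketch below reproduces that merge in Python; the grouping is inferred from the arithmetic of the two tables, and the function name `collapse_rows` is a hypothetical helper, not something from SPSS.

```python
# 4 x 2 counts from the earlier BMI table: (Low, High) cognitive level.
four_by_two = {
    "Underweight": (59, 232),
    "Normal": (54, 367),
    "Overweight": (114, 101),
    "Obese": (173, 54),
}

# Assumed grouping: which original BMI categories feed each 2 x 2 row.
groups = {
    "<= 24.9": ("Underweight", "Normal"),
    "> 24.9": ("Overweight", "Obese"),
}


def collapse_rows(table, grouping):
    """Sum the counts of the rows listed under each new group label."""
    collapsed = {}
    for label, members in grouping.items():
        rows = [table[member] for member in members]
        collapsed[label] = tuple(sum(col) for col in zip(*rows))
    return collapsed


two_by_two = collapse_rows(four_by_two, groups)

# Proportion of Low cognitive level within each BMI group.
low_share = {
    label: low / (low + high) for label, (low, high) in two_by_two.items()
}

print(two_by_two)
print(low_share)
```

The two proportions in `low_share` are the quantities compared when a 2 x 2 table is tested.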
Chi-Square Test (χ²)
• The chi-square test for independence, also called Pearson's
chi-square test or the chi-square test of association, is used to
discover whether there is a relationship between two categorical
variables.
• Hypothesis:
– Comparing two or more proportions
– H0: P1 = P2
• Assumptions:
– Random samples (based on study design & method)
– Observations are independent (based on study design & method)
– The number of cells with an Expected Count (EC) less than 5
must be less than 20% of the total number of cells (calculate the
expected count for each cell; SPSS will do it)
– The smallest EC must be at least 2
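SPSS computes the expected counts and the test statistic automatically, but the arithmetic is short enough to sketch by hand. The Python below is a minimal illustration under stated assumptions: it uses the standard Pearson formula (expected count = row total × column total / grand total, statistic = sum of (observed − expected)² / expected), applies no continuity correction, and reuses the 2 x 2 BMI by cognitive level counts from the earlier slide. The function names are illustrative, not SPSS terminology.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Rows: BMI <= 24.9, BMI > 24.9; columns: Low, High cognitive level.
observed = [[113, 599], [287, 155]]

expected = expected_counts(observed)
statistic = chi_square_statistic(observed)
df = degrees_of_freedom(observed)

print(expected)
print(statistic, df)
```

The expected counts keep the same row and column totals as the observed table, so a large statistic reflects cells that sit far from what independence would predict.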
Example Chi-Square Test (χ²) – (1)
• Hypothesis:
– Association between gender and Knowledge on
Nutrition (KoN)
– Comparing the proportion of Low KoN between
genders
– H0: P(KoN)male = P(KoN)female
• Assumptions:
– Random samples [ √ ]
– Observations are independent [ √ ]
– The number of cells with an Expected Count (EC) less
than 5 must be less than 20% of the total number of
cells (calculated by SPSS)
– The smallest EC must be at least 2 (calculated by SPSS)
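The two expected-count rules listed above can be checked mechanically once the expected counts are known. The slides do not give the gender by KoN counts, so the sketch below applies the rules to two stand-in tables: the 2 x 2 BMI by cognitive level table from earlier in the lecture, and a small table with hypothetical counts chosen only to show a failing case. The function name and its parameters are illustrative assumptions.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_expected_counts(table, low_cutoff=5, max_low_share=0.20, min_allowed=2):
    """Apply the two EC rules from the lecture to a table of observed counts."""
    cells = [e for row in expected_counts(table) for e in row]
    # Rule 1: fewer than 20% of cells may have an expected count below 5.
    share_below = sum(e < low_cutoff for e in cells) / len(cells)
    # Rule 2: the smallest expected count must be at least 2.
    smallest = min(cells)
    return {
        "share_below_cutoff": share_below,
        "smallest_expected": smallest,
        "assumptions_met": share_below < max_low_share and smallest >= min_allowed,
    }


# Stand-in 1: BMI x cognitive level counts from the lecture.
large_table = check_expected_counts([[113, 599], [287, 155]])

# Stand-in 2: hypothetical small counts, used only to show a failing check.
small_table = check_expected_counts([[3, 1], [1, 3]])

print(large_table)
print(small_table)
```

In the small table every expected count is 2, so the second rule passes but all four cells fall below 5 and the first rule fails.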
Chi-square using SPSS - procedure:
Chi-square using SPSS - Output:
• Descriptive statistics for each group
• The 2 EC assumptions are met:
– Smallest EC must be ≥ 2
– Percentage of cells with an EC less than 5 must be < 20%
Chi-square using SPSS – Table and Interpretation:
What if assumptions were not met?
• Combine adjacent columns and/or rows to
increase the EC, if possible.
• If the expected count assumptions are still not met,
Fisher’s exact (FE) test can be applied (in SPSS, only
for 2 x 2 tables).
Example Chi-Square Test (χ²) – (2)
• Hypothesis:
– Association between ethnicity and Knowledge on Nutrition
(KoN)
– Comparing the proportion of Low KoN between ethnic groups
– H0: P(KoN)Malay = P(KoN)Chinese = P(KoN)Indian = P(KoN)others
• Assumptions:
– Random samples [ √ ]
– Observations are independent [ √ ]
– The number of cells with an Expected Count (EC) less than
5 must be less than 20% of the total number of cells
(calculated by SPSS)
– The smallest EC must be at least 2 (calculated by SPSS)
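With more than two groups the chi-square statistic is compared against a chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom. The slides do not give the ethnicity by KoN counts, so the sketch below only shows how a p-value follows from a statistic and its degrees of freedom. It assumes KoN is recorded at two levels, giving a 4 x 2 table with 3 degrees of freedom, and it uses the closed-form upper-tail formulas for integer degrees of freedom. The function name is an illustrative assumption.

```python
from math import erfc, exp, pi, sqrt


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    half = statistic / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of correction terms.
    tail = erfc(sqrt(half))
    term = sqrt(2 * statistic / pi) * exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= statistic / (2 * k + 1)
    return tail


# Assumed layout: 4 ethnic groups (rows) by 2 KoN levels (columns).
rows, cols = 4, 2
df = (rows - 1) * (cols - 1)

# 7.815 is the usual tabulated 5% critical value for 3 degrees of freedom.
p_at_critical = chi_square_p_value(7.815, df)
print(df, p_at_critical)
```

A statistic larger than the critical value gives a p-value below 0.05, which is the comparison read off the "Asymp. Sig." style column of a chi-square output table.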
Chi-square using SPSS - Output:
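• Descriptive statistics for each group
• The 2 EC assumptions are met:
– Smallest EC must be ≥ 2
– Percentage of cells with an EC less than 5 must be < 20%
• If the EC assumptions are still not met, see "What if assumptions were not met?"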
Chi-square using SPSS – Table and Interpretation:
Thank You