Categorical Data

STUDY MATERIALS FOR BIOSTATISTICS

BIOSTATISTICS (HFS3283)

CATEGORICAL DATA
(CHI-SQUARE & FISHER EXACT TEST)
Dr. Mohd Razif Shahril
School of Nutrition & Dietetics
Faculty of Health Sciences
Universiti Sultan Zainal Abidin

KNOWLEDGE FOR THE BENEFIT OF HUMANITY

Topic Learning Outcomes
At the end of this lecture, students should be able to:
• identify types of categorical data analysis and their use
• explain the assumptions to be met when using the chi-square
test and Fisher's exact test
• perform the chi-square test and Fisher's exact test using SPSS
• explain how to interpret the SPSS outputs from the chi-square
test and Fisher's exact test

SCHOOL OF NUTRITION AND DIETETICS • UNIVERSITI SULTAN ZAINAL ABIDIN
What is categorical data analysis?
• Independent (Explanatory) Variable is
Categorical (Nominal or Ordinal)
• Dependent (Response) Variable is Categorical
(Nominal or Ordinal)
• Most common:
– 2x2 (each variable has 2 levels)
– Nominal/Nominal
– Nominal/Ordinal
– Ordinal/Ordinal
• All of these are presented as a CONTINGENCY TABLE

Contingency Table
• Tables representing all combinations of levels of
explanatory and response variables
• Numbers in table represent Counts of the
number of cases in each cell
• Row and column totals are called Marginal
counts

Example of Contingency Table
• Response Variable
– Cognitive Level (Low, High)
• Explanatory Variable
– BMI (Underweight, Normal, Overweight, Obese)

                 Cognitive
BMI              Low     High    Total
Underweight       59      232      291
Normal            54      367      421
Overweight       114      101      215
Obese            173       54      227
Total            400      754     1154

• The inner cells are Counts; the Total row and the Total column are Marginal Counts
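The marginal counts in the table above are just row and column sums of the cell counts. A minimal Python sketch of that arithmetic, using the counts from the slide (the variable names are illustrative, not part of the lecture):

```python
# Cell counts copied from the BMI x Cognitive Level contingency table.
table = {
    "Underweight": {"Low": 59, "High": 232},
    "Normal": {"Low": 54, "High": 367},
    "Overweight": {"Low": 114, "High": 101},
    "Obese": {"Low": 173, "High": 54},
}

# Row marginals: one total per BMI category.
row_totals = {bmi: sum(cells.values()) for bmi, cells in table.items()}

# Column marginals: one total per cognitive level.
col_totals = {
    level: sum(cells[level] for cells in table.values())
    for level in ("Low", "High")
}

# Grand total: the number of cases in the whole table.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```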
2 x 2 Contingency Table
• Each variable has 2 levels
– Explanatory Variable – Groups (Typically based on
demographics, exposure, or treatment)
– Response Variable – Outcome (Typically presence or
absence of a characteristic)

             Cognitive
BMI          Low     High    Total
≤ 24.9       113      599      712
> 24.9       287      155      442
Total        400      754     1154
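The counts in this 2 x 2 table equal the four-level BMI table with Underweight and Normal merged into "≤ 24.9" and Overweight and Obese merged into "> 24.9". That grouping is inferred from the numbers rather than stated on the slide. A small Python sketch of the collapsing step, with a hypothetical helper name `collapse_rows`:

```python
# Four-level table from the earlier contingency table example.
four_level = {
    "Underweight": {"Low": 59, "High": 232},
    "Normal": {"Low": 54, "High": 367},
    "Overweight": {"Low": 114, "High": 101},
    "Obese": {"Low": 173, "High": 54},
}

# Assumed grouping of BMI categories into the two BMI levels.
groups = {
    "<= 24.9": ("Underweight", "Normal"),
    "> 24.9": ("Overweight", "Obese"),
}


def collapse_rows(table, groups):
    """Merge rows of a contingency table by summing their cell counts."""
    collapsed = {}
    for new_label, old_labels in groups.items():
        collapsed[new_label] = {
            level: sum(table[old][level] for old in old_labels)
            for level in ("Low", "High")
        }
    return collapsed


two_by_two = collapse_rows(four_level, groups)
print(two_by_two)
```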
Chi-Square Test (χ²)
The chi-square test for independence, also called Pearson's chi-square test or
the chi-square test of association, is used to discover if there is a
relationship between two categorical variables.
• Hypothesis:
– Comparing two or more proportions
– H0: P1 = P2
• Assumptions
– Random samples (based on study design & method)
– Observations are independent (based on study design & method)
– The number of cells with Expected Count (EC) less than 5 must be less
than 20% of the total number of cells. (Calculate the expected count for
each cell; SPSS will do it.)
– The smallest EC must be at least 2. (Also taken from the SPSS output.)
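SPSS reports the expected counts, but the calculation is simple: for each cell, EC = (row total × column total) / grand total. The Python sketch below computes the expected counts and checks the two EC rules listed above. The function names are illustrative, and the example uses the 2 x 2 BMI table from the earlier slide.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def ec_assumptions_met(observed):
    """Check the two EC rules from the slide.

    Rule 1: fewer than 20% of cells have EC < 5.
    Rule 2: the smallest EC is at least 2.
    """
    expected = [e for row in expected_counts(observed) for e in row]
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return share_below_5 < 0.20 and min(expected) >= 2


# Rows: BMI <= 24.9, BMI > 24.9; columns: Low, High cognitive level.
bmi_2x2 = [[113, 599], [287, 155]]

print(expected_counts(bmi_2x2))
print(ec_assumptions_met(bmi_2x2))
```

With counts this large, every expected count is far above 5, so both rules hold.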
Example Chi-Square Test (χ²) – (1)
• Hypothesis:
– Association between gender and Knowledge on
Nutrition (KoN)
– Comparing the proportion of Low KoN between
genders
– H0: P(Low KoN)male = P(Low KoN)female
• Assumptions
– Random samples [ √ ]
– Observations are independent [ √ ]
– The number of cells with Expected Count (EC) less
than 5 must be less than 20% of the total number of
cells (calculated by SPSS)
– The smallest EC must be at least 2 (calculated by SPSS)
Chi-square using SPSS - procedure:
[Figure: SPSS screenshots of the procedure steps]
Chi-square using SPSS - Output:
[Figure: SPSS output tables]
• Descriptive statistics for each group
• Chi-square statistic = 0.417; df = 1; P-value = 0.518
• The 2 EC assumptions are met:
– Cells with EC less than 5 must be < 20%
– The smallest EC must be ≥ 2
Chi-square using SPSS – Table and Interpretation:

Table 1: Factors (categorical variable) associated with Knowledge on Nutrition

                        Low KoN      High KoN     χ² statistic a
Variable           n    Freq (%)     Freq (%)     (df)             P-value
Gender
  Male            39    19 (48.7)    20 (51.3)    0.417 (1)        0.518
  Female          34    14 (41.2)    20 (58.8)
Ethnicity
  Malay
  Others
Education Level
  Low
  High
a Chi-square test for independence

The prevalence (proportion) of Low Knowledge on Nutrition is not
significantly different between males and females (P = 0.518). Therefore,
there is no significant association between gender and Knowledge on Nutrition.
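The gender row of Table 1 can be checked by hand. The sketch below computes the Pearson chi-square statistic as the sum of (observed − expected)² / expected over all cells, without a continuity correction, and gets the P-value for df = 1 from the identity P(χ² > x) = erfc(√(x/2)). The function names are illustrative; this is a verification aid in Python, not the SPSS procedure itself.

```python
import math


def pearson_chi_square(observed):
    """Return (statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 df only."""
    return math.erfc(math.sqrt(statistic / 2))


# Rows: Male, Female; columns: Low KoN, High KoN (counts from Table 1).
gender = [[19, 20], [14, 20]]
stat, df = pearson_chi_square(gender)
p = p_value_df1(stat)
print(round(stat, 3), df, round(p, 3))  # 0.417 1 0.518
```

The result agrees with the values reported in the table (χ² = 0.417, df = 1, P = 0.518).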

What if assumptions were not met?
• Combine adjacent columns and/or rows to
increase the EC, if possible.
• If the expected count assumptions are still not met,
Fisher's exact (FE) test can be applied (only
for 2 x 2 tables in SPSS).

Example Chi-Square Test (χ²) – (2)
• Hypothesis:
– Association between ethnicity and Knowledge on Nutrition
(KoN)
– Comparing the proportion of Low KoN between ethnic groups
– H0: P(Low KoN)malay = P(Low KoN)chinese = P(Low KoN)indian = P(Low KoN)others
• Assumptions
– Random samples [ √ ]
– Observations are independent [ √ ]
– The number of cells with Expected Count (EC) less than
5 must be less than 20% of the total number of cells (calculated by SPSS)
– The smallest EC must be at least 2 (calculated by SPSS)

Chi-square using SPSS - Output:
[Figure: SPSS output tables]
• Descriptive statistics for each group
• 4 (50%) cells have EC less than 5. The smallest EC is 1.36.
• The 2 EC assumptions are not met:
– Cells with EC less than 5 must be < 20%
– The smallest EC must be ≥ 2
• One remedy may be to combine Indian and Others (or even to
combine 3 levels) and call the result "Others".
(The combination should be interpretable/meaningful.)
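Chi-square using SPSS - Output:
[Figure: SPSS output tables after combining categories]
• Descriptive statistics for each group
• Chi-square statistic = 0.072; df = 1; P-value = 0.788
• The 2 EC assumptions are met:
– Cells with EC less than 5 must be < 20%
– The smallest EC must be ≥ 2
• If the EC assumptions are still not met, use the Fisher's exact test result instead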
Chi-square using SPSS – Table and Interpretation:

Table 1: Factors (categorical variable) associated with Knowledge on Nutrition

                        Low KoN      High KoN     χ² statistic a
Variable           n    Freq (%)     Freq (%)     (df)             P-value
Gender
  Male            39    19 (48.7)    20 (51.3)    0.417 (1)        0.518
  Female          34    14 (41.2)    20 (58.8)
Ethnicity
  Malay           43    20 (46.5)    23 (53.5)    0.072 (1)        0.788
  Others          30    13 (43.3)    17 (56.7)
Education Level
  Low
  High
a Chi-square test for independence

The prevalence (proportion) of Low Knowledge on Nutrition is not
significantly different between Malay and other ethnicities (P = 0.788).
Therefore, there is no significant association between
ethnicity and Knowledge on Nutrition.
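For a 2 x 2 table with cells a, b (first row) and c, d (second row), the Pearson statistic has a well-known shortcut: χ² = N(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)]. The Python sketch below applies it to the ethnicity row of Table 1 as a cross-check; the function name is illustrative and the P-value formula is valid for df = 1 only.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Malay: 20 Low, 23 High; Others: 13 Low, 17 High (counts from Table 1).
stat = chi_square_2x2(20, 23, 13, 17)

# Upper-tail probability of a chi-square variable with 1 df.
p = math.erfc(math.sqrt(stat / 2))

print(round(stat, 3), round(p, 3))  # 0.072 0.788
```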
Fisher's Exact Test
• Fisher's exact test is a test for independence in a 2 x 2
table.
• It is most useful when the total sample size and the
expected values are small.
– Useful when expected cell counts are < 5.
• The output contains more than one p-value:
– Choose Exact Sig. (2-sided)
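To show what an exact two-sided p-value means, the Python sketch below enumerates every 2 x 2 table with the same margins, computes each table's hypergeometric probability, and sums the probabilities of tables no more likely than the observed one. This is one common definition of the two-sided p-value; whether it matches the "Exact Sig. (2-sided)" figure from SPSS in every case is an assumption here. The small table is hypothetical and is not taken from the lecture data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all four margins are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_observed = prob(a)
    tolerance = 1 + 1e-9           # guards against floating-point ties
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )
    return min(1.0, total)


# Hypothetical small-sample table, for illustration only.
p_small = fisher_exact_two_sided(3, 1, 1, 3)
print(p_small)
```

For this hypothetical table the feasible top-left counts 0 to 4 have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so the two-sided p-value is 34/70, about 0.486.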

Thank You
