
CLASS 5: CONTINUOUS MEASURE BY CATEGORICAL GROUPS (THREE OR MORE GROUPS)
SOWK 2144 Introduction to Social Data Analysis
Yu-Chih Chen, PhD


HOUSEKEEPING
We have Class 6 (Oct. 16) during the reading week
Project updates—see my response
OUTLINE
Review of t-test
ANOVA
R Lab (next week)
REVIEW OF KEY CONCEPT
Two-sample t test
 Assumptions?
 DV?
 IV?
 # obs?
 Requirement for normality?

Paired t test
 Assumptions?
 DV?
 IV?
REVIEW OF KEY CONCEPT
Sample statistics and population parameters
 We are not able to obtain population parameters directly (why? Time and money issues—a full census is rarely feasible)
 We use sample statistics instead (with the assumption that the sample is randomly selected and representative of the population)
 We then conduct tests based on the samples we get. If we find an effect in the samples, we infer that this effect can also be observed in the population
REVIEW OF KEY CONCEPT
Parametric estimates vs. non-parametric estimates?
 Parametric estimates: when all the required statistical assumptions are met, we can use parametric estimates
 If some statistical assumptions are violated, you will need to consider non-parametric estimates (Class 9, categorical data analysis)

Statistical significance vs. practical significance
 Statistical significance: the results are significant (because the power is high enough to detect the difference—many times due to a large sample size)
 Practical significance: the difference is big enough to have some clinical/practical meaning (effect size, which will not be covered in this course because it is more clinical in nature)
TEST OF MEANS ACROSS 3+ GROUPS
(ANALYSIS OF VARIANCE, ANOVA)
STUDY QUESTIONS
Our world is not simple and cannot always be sliced into two groups (male/female or pro-democracy/pro-establishment). Our world often has multiple categories
People often make comparisons among multiple groups, such as
 Does waiting time in the hospital vary by area (HK Island East and West; Kowloon Central, East, and West; New Territories East and West)?
 Does educational achievement (number of education years) vary by socioeconomic status (measured by income quintile)?
 Does job discrimination vary across the younger, middle-aged, and older populations?

If you have a similar study question that involves a grouping IV with three or more categories and an outcome on a continuous measure, you are using a method called the analysis of variance (ANOVA) or F test (Fisher's test)
ANOVA (ANALYSIS OF VARIANCE) FAMILY
ANOVA is a big family
 One-way ANOVA (Fisher’s test or F-test)*
 DV is one continuous variable
 IV is one categorical (nominal, 3+ groups) variable
 Two-way ANOVA (factorial ANOVA; two-factors ANOVA)
 DV is one continuous variable
 IV is two categorical variables
 MANOVA (Multivariate ANOVA)
 DV is multiple continuous variables
 IV is one categorical (nominal, 3+ group) variable
 ANCOVA (Analysis of Covariance)
 DV is a continuous variable
 IV is one categorical (nominal, 3+ group) variable
 Covariates can be any type of variables
WHAT DOES THE ANOVA TEST TELL ME?
It shows you the “group effect,” this is originally that ANOVA comes from clinical
research
An ANOVA can also be used to describe differences between groups where there is
no experimental design or intervention. Difference?

Difference? Difference?
T-TEST VS. ANOVA
Independent t-test
 IV: Nominal (two levels. Example: males or females; young or old…)
 DV: Interval/ratio measurement; independence of observation; normal distribution; equal variance
ANOVA (F test)
 IV: Nominal (three or more levels. Example: Caucasian, African American, Hispanics…)
 DV: Interval/ratio measurement; independence of observation; normal distribution; equal variance
HYPOTHESIS TESTING
Research hypothesis:
 There is a difference in the mean of DV among groups A, B, & C (at least one group mean differs; MeanA ≠ MeanB ≠ MeanC)

Null hypothesis:
 There is no difference in the means of DV among groups A, B, & C (MeanA = MeanB = MeanC)

Note: The IV doesn't have to be limited to three groups, but it is better to keep the number of groups below 6 because the post-hoc comparisons become complicated as the number of groups increases.
ASSUMPTIONS
IV: nominal variable with 3 or more groups
DV: continuous and normally distributed
Independence of observations
Rules of 30 (i.e., 30+ per group robust rule applies)
Equal variance
 Rule of thumb: If the largest group variance is no more than 1.5 times larger than the smallest, the test
will be robust
 But it is safer to do other formal tests (what tests?)
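A quick way to screen the variance ratio in R (a minimal sketch; data$DV and data$IV are the generic placeholder names used in the lab code later in this deck):
 group.var <- tapply(data$DV, data$IV, var, na.rm = TRUE)   # variance of the DV within each group
 max(group.var) / min(group.var)                            # if this ratio exceeds ~1.5, run a formal test (e.g., Levene's test)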
WHY NOT JUST RUN MULTIPLE T-TESTS?
If we had three groups, why don't we just run the t-test 3 times?
 It will inflate the Type I error (false positive) because we run the same test multiple times on the same data
 We control our Type I error at 5%, meaning we limit the probability of getting significant results by chance to only 5%
WHY NOT JUST RUN MULTIPLE T-TESTS?
Why can't we do 3 t-tests? Because it will inflate my Type I error
If each of these t-tests uses a 0.05 level of significance (i.e., 5%), that means for each test, the probability of falsely rejecting the null hypothesis (known as Type I error) is only 5%. Therefore, the probability of NO Type I error is 95%
If we assume the 3 tests are independent, then the overall probability of no Type I error is 0.95 × 0.95 × 0.95 = 0.857
Type I error from doing 3 t-tests = 1 − 0.857 = 0.143 (or 14.3%). This means my Type I error inflates from 5% to 14.3%
The ANOVA test takes these multiple (pairwise) comparisons into account in a SINGLE test and controls my Type I error at 5% (no inflation). Consequently, it is called an "omnibus" test
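The same arithmetic in R (a small sketch that simply reproduces the numbers above):
 alpha <- 0.05
 k <- 3                        # number of pairwise t-tests for 3 groups
 1 - (1 - alpha)^k             # familywise Type I error = 0.142625, i.e., about 14.3%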
HOW ARE RESULTS SIGNIFICANT?
BSS = between sum of squares. This can be thought of as the distance between the group means and the overall (grand) mean

WSS = within (or error) sum of squares. This can be thought of as the distance between individual points and their own group mean.

If BSS is large relative to WSS (after adjusting for degrees of freedom), then p < .05


WHAT IS THE COMPUTING DOING?
The F-test is essentially a ratio of the between-group variance to the within-group variance.
First we compute the sums of squares: the within-group SS is the summed squared difference of the observations from their group mean, and the between-group SS is the summed squared difference of the group means from the overall mean, weighted by the group n-sizes
MSb = SSb / dfb
MSw = SSw / dfw
dfb = # of groups − 1
dfw = Total N − # of groups
F = MSb / MSw
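To make the formulas concrete, here is a minimal sketch that computes the sums of squares, mean squares, and F by hand on toy data (made-up numbers, not the class data) and checks the result against aov():
 y <- c(4, 5, 6, 7, 8, 9, 10, 11, 12)                 # toy outcome
 g <- factor(rep(c("A", "B", "C"), each = 3))         # three groups of 3
 grand.mean <- mean(y)
 group.mean <- tapply(y, g, mean)
 group.n <- tapply(y, g, length)
 SSb <- sum(group.n * (group.mean - grand.mean)^2)    # between-group sum of squares
 SSw <- sum((y - ave(y, g))^2)                        # within-group sum of squares
 dfb <- nlevels(g) - 1                                # # of groups - 1
 dfw <- length(y) - nlevels(g)                        # Total N - # of groups
 (SSb / dfb) / (SSw / dfw)                            # F = MSb / MSw
 summary(aov(y ~ g))                                  # same F value from aov()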
HOW DOES THE F TEST WORK?
Step 1: Check the assumptions and the variance of the outcome variable in each group.
 Normality by each group?
 Group size (30 rule)
 Variance ratio < 1.5 (a rule of thumb for equal variance—a crude approach)
 Variance ratio = 321.70 / 281.96 = 1.14

Descriptive output by group:
Elementary = 1:   N = 1077, Mean = 15.47, Std Dev = 17.94, Variance = 321.70, Skewness = 3.23, Kurtosis = 14.89
Middle = 1:       N = 445,  Mean = 15.24, Std Dev = 17.83, Variance = 318.02, Skewness = 2.59, Kurtosis = 9.00
High School = 1:  N = 959,  Mean = 11.70, Std Dev = 16.79, Variance = 281.96, Skewness = 3.86, Kurtosis = 19.97
HOW DOES THE F TEST WORK?
Step 2: Check the test for equal variance
 We use Levene's test to test equal variance across groups (below shows the hypothesis testing for Levene's test)
 Null hypothesis H0: Var1 = Var2 = Var3 (equal variance)
 Research hypothesis H1: Var1 ≠ Var2 ≠ Var3 (unequal variance; at least one group's variance differs)
 If Levene's test is significant (i.e., p < 0.05), we reject H0, which means we have unequal variance
 If Levene's test is not significant (i.e., p > 0.05), we fail to reject H0, which means we can assume equal variance
HOW DOES THE F TEST WORK?
Step 3: If the overall test (F test) is significant, it shows that at least one pair of comparisons has a difference.
 The F test is like a t-test. If the test is SIGNIFICANT, it says that at least one group's mean is different from another group's mean—but you don't know which comparison makes the difference
 Because you want to know which comparison has the mean difference, you will need to do another test called the "post-hoc test."
 The post-hoc test tells you which pairs of groups, if any, differ
 Before you do a post-hoc test, you have to make sure the equal variance assumption holds
 Yes (equal variance): proceed to the post-hoc test
 No (unequal variance): apply Welch's correction to the F test, then proceed to the post-hoc test
POST-HOC TEST

If your F test is NOT SIGNIFICANT, then you should STOP

You do a post-hoc test only when you have a SIGNIFICANT F test
HOW DOES THE F TEST WORK?
Step 4: Conduct the post-hoc test to find out which pair of comparisons has a
difference via either an experiment-wise (or called family-wise) test (more
conservative) or a pair-wise test (more liberal).
Experiment-wise (family-wise) test: All the comparisons use the same criterion. This test is much more conservative (it increases the Type II error but controls the Type I error).
 The common family-wise tests are the Tukey HSD (honestly significant difference), Bonferroni, Scheffé, and Dunnett tests.
Pair-wise test: Different pairs of comparisons use different criteria. This test is very liberal (it increases the Type I error but decreases the Type II error).
 The common methods are the LSD (least significant difference) and studentized Newman-Keuls (SNK) tests.
POST HOC TEST: WHICH ONE SHOULD I USE?
Some tests control the experiment-wise alpha (= 0.05), and others control the alpha for each comparison. An experiment-wise approach (Tukey is one) is more conservative, which means it has a higher likelihood of a Type II error instead of a Type I.
Many researchers use Tukey because of its performance in simulations and the fact that it can generate confidence intervals and handle unequal sample sizes (recommended)
The Scheffé test used to be the test of choice because it handled unequal group sizes, but the Tukey test can also handle this
Dunnett's test is useful if you specifically want to test groups against a single control, and you have to specify which group is the control
The Newman-Keuls test is a common comparison-wise approach. It may give you more power to detect differences, but it also has some problems when there are more than 3 groups.
You rarely see people use the more specialized tests for non-equal variances (though they exist), as the typical tests can handle moderate violations, just like the ANOVA itself
SOME LITERATURE: POST-HOC TESTS
Some illustration…

McHugh, M. (2011). Multiple comparison analysis testing in ANOVA. Biochemia Medica, 21, 203-209.
REVIEW OF THE TESTS WE HAVE LEARNED
If Rachel wants to examine the changes of knowledge on math
using pre- and post-test
1. What is this test?
2. What are the assumptions?
3. What is the research question?
4. What key information should be reported?
REVIEW OF THE TESTS WE HAVE LEARNED
If Phoebe wants to know how aggressive levels (range: 10-60)
varied by gender
1. What is this test?
2. What are the assumptions?
3. What is the research question?
4. What key information should be reported?
REVIEW OF THE TESTS WE HAVE LEARNED
If Ross wants to examine the number of survival years among patients with different types of cancer (lung cancer, breast cancer, and prostate cancer)
1. What is this test?
2. What are the assumptions?
3. What is the research question?
4. What key information should be reported?
ANOVA WITH R CODE
STUDY QUESTION: IS THERE A DIFFERENCE IN SELF-
REPORTED HEALTH AND AGE BY EDUCATION LEVELS?
IV: Education level (low, medium, and high)
 1 = illiterate
 2 = did not finish primary
 3 = sishu
 4 = elementary school      } Low
 5 = middle school
 6 = high school
 7 = vocational school      } Medium
 8 = associate degree
 9 = college
 10 = postgraduate          } High

DV1: Self-rated health (we treat it as continuous, for now)
 1 = very good
 2 = good
 3 = fair
 4 = poor
 5 = very poor
DV2: Age (in years)
HYPOTHESIS (USING SRH AS EXAMPLE)
Self-rated health differences by education
 Null hypothesis: The mean of self-rated health is the same across different levels of education (Mlow =
Mmedium = Mhigh)
 Research hypothesis: The mean of self-rated health is not the same (different) across different levels
of education (Mlow ≠ Mmedium ≠ Mhigh)

Age differences by education


 Null hypothesis: The mean age is the same across different levels of education (Mlow = Mmedium = Mhigh)
 Research hypothesis: The mean age is not the same (different) across different levels of education
(Mlow ≠ Mmedium ≠ Mhigh)
STEPS
Step 0: Data preparation
Step 1: Check the assumptions for each variable
Step 2: Check the assumptions for each variable by group
Step 3: Check the test for equal variance test (Levene’s test)
Step 4: Conduct the F test (and considerations to use Welch correction)
Step 5: Conduct post-hoc test
Step 6: Provide your interpretation (write-up)
STEP 0: DATA PREPARATION (RECODE VARIABLE)
Recode/collapse a variable with more categories into fewer categories
We need to examine the distribution of the variable first
 In our education variable, the values range from 1 to 10
 No strange values were observed (e.g., missing)

How was missing coded?
 Missing = no data point. Common way: assign a dot (.) or leave the cell empty
 Other ways: assign "impossible" values, such as 9999 or -9999
STEP 0: DATA PREPARATION (RECODE VARIABLE)
What if the missing is coded as -9999?
Always set a boundary when recoding variables
Beware of missing values when you recode variables with logic
Example (see table below). We want to create 4 categories:
 Illiterate (1, 2)
 Low (3, 4)
 Middle (5, 6)
 High (7, 8, 9)
Logic
 Illiterate (1, 2): edu <= 2
 Low (3, 4): edu >= 3 & edu <= 4
 Middle (5, 6): edu >= 5 & edu <= 6
 High (7, 8, 9): edu >= 7 & edu <= 9 (recode #1) or edu >= 7 (recode #2)

edu                      N    Recode #1         Recode #2
1 = illiterate           2    Illiterate: 4     Illiterate: 4
2 = no primary school    2
3 = primary school       2    Low: 4            Low: 4
4 = middle school        2
5 = high school          2    Middle: 4         Middle: 4
6 = vocational school    2
7 = associate degree     2    High: 6           High: 8
8 = college              2
9 = post graduate        2
9999 = missing           2    Missing: 2        (the 9999 cases are swept into High)
STEP 0: DATA PREPARATION (RECODE VARIABLE)
R code
Method 1: data$new.var[data$old.var with logic] <- "category"
 data$edu4[data$edu >= 1 & data$edu <= 2] <- "illiterate"
 data$edu4[data$edu >= 3 & data$edu <= 4] <- "low"
 data$edu4[data$edu >= 5 & data$edu <= 6] <- "middle"
 data$edu4[data$edu >= 7 & data$edu <= 9] <- "high"

Method 2: using the cut() function with breaks and labels
 data$edu4 <- cut(data$edu, breaks = c(0, 2, 4, 6, 9), labels = c("illiterate", "low", "medium", "high"))
 0 and 9 are the boundaries
 2, 4, & 6 are the cutting points
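A minimal sketch of why the boundary matters, using a made-up vector where missing is coded as 9999 (hypothetical values, not the class data):
 edu <- c(1, 3, 5, 7, 9, 9999)                                                                    # toy values; 9999 = missing code
 # No upper boundary: the 9999 case is silently counted as "high"
 table(cut(edu, breaks = c(0, 2, 4, 6, Inf), labels = c("illiterate", "low", "medium", "high")))
 # Safer: convert the impossible code to NA first, then recode with explicit boundaries
 edu[edu == 9999] <- NA
 table(cut(edu, breaks = c(0, 2, 4, 6, 9), labels = c("illiterate", "low", "medium", "high")), useNA = "ifany")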
STEP 1: CHECK THE ASSUMPTIONS FOR EACH
VARIABLE
Two steps
 First, examine the variable (DV) alone
 Second, examine the DV by groups

In each step, we will examine


 Distribution (mean, SD, variance, skewness, others)
 Normality (visual approaches: Q-Q plot, histogram; formal test: K-S test)
STEP 1: CHECK THE ASSUMPTIONS FOR EACH
VARIABLE
Distribution (R code)
 summary(data$var): summary statistics (min, quartiles, mean, max) for a continuous variable
 table(data$var): frequency distribution for a categorical variable

Mean, SD, skewness, and other indices (R code) using the "psych" package
 install.packages("psych")
 library(psych)
 describe(data$var)

Normality (Q-Q plot, R code)


 install.packages("ggplot2")
 library(ggplot2)
 qqplot.var <- qplot(sample = data$var, stat="qq")
 qqplot.var
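Note: qplot() with stat = "qq" comes from older ggplot2 releases and may not run on a current version; an equivalent Q-Q plot on recent ggplot2 (a sketch, using the same data and data$var placeholders, with ggplot2 already loaded as above) is:
 ggplot(data, aes(sample = var)) + stat_qq() + stat_qq_line()   # Q-Q plot with a reference line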
STEP 1: CHECK THE ASSUMPTIONS FOR EACH
VARIABLE
Normality (histogram, R code)
 hist(data$var)

K-S test for normality


 install.packages("nortest")
 library(nortest)
 lillie.test(data$var)
STEP 2: CHECK THE ASSUMPTIONS FOR EACH
VARIABLE, BY GROUP
Distribution (mean, SD, skewness, etc.) by group using “by” statement (R code)
 by(data$DV, data$IV, describe)

Q-Q plot by group (R code)


 install.packages("ggpubr")
 library(ggpubr)
 ggqqplot(data, "DV", facet.by = "IV")
STEP 2: CHECK THE ASSUMPTIONS FOR EACH
VARIABLE, BY GROUP
R output (descriptive stats): Self-rated health, by education level (low, medium, high)

Check (1) sample size in each group, (2) mean, (3) Var, (4) skewness, and (5) others such as min, max, range…
STEP 2: CHECK THE ASSUMPTIONS FOR EACH
VARIABLE, BY GROUP
R output (Q-Q plot): Self-rated
health, by education level (low,
medium, high)
STEP 3: CHECK THE TEST FOR EQUAL VARIANCE
TEST (LEVENE’S TEST)
Levene’s test (R code)
 install.packages("car")
 library(car)
 leveneTest(data$DV, data$IV, center = median)

Test results:
 If the test is significant (p < 0.05), we don’t have equal variance
 If the test is not significant (p > 0.05), we have equal variance

Why? Thinking about the write-up for hypothesis testing


 Null hypothesis: var1 = var2 = var3
 Alternative hypothesis: var1 ≠ var2 ≠ var3
STEP 3: CHECK THE TEST FOR EQUAL VARIANCE
TEST (LEVENE’S TEST)
R output (Levene’s test) for testing variance of self-rated health by education

Equal variance or unequal variance?


STEP 4: CONDUCT F TEST
ANOVA, assuming equal variance (R code)
 model <- aov(DV ~ IV, data = yourdata)
 summary(model)

ANOVA, if equal variance does not hold. We request the Welch correction (R code)
 oneway.test(DV ~ IV, data = yourdata)
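Putting Steps 3–5 together on simulated data (a self-contained sketch; the names srh and edu3 and the simulated group means are made up, not the class data set):
 set.seed(2144)
 srh <- c(rnorm(50, 2.7, 0.9), rnorm(50, 2.9, 0.9), rnorm(50, 3.2, 0.9))   # simulated self-rated health
 edu3 <- factor(rep(c("low", "medium", "high"), each = 50))                # simulated education groups
 sim <- data.frame(srh, edu3)
 library(car)
 leveneTest(srh ~ edu3, data = sim)                # Step 3: test of equal variance
 model <- aov(srh ~ edu3, data = sim)              # Step 4: F test, assuming equal variance
 summary(model)
 oneway.test(srh ~ edu3, data = sim)               # Step 4 alternative: Welch correction (var.equal = FALSE by default)
 TukeyHSD(model)                                   # Step 5: Tukey HSD post-hoc comparisons (base R)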
STEP 4: CONDUCT F TEST
R output (ANOVA), self-rated health by education

Which hypothesis is supported?


• Null hypothesis: The mean of self-rated health is the same across different levels of education (Mlow =
Mmedium = Mhigh)
• Research hypothesis: The mean of self-rated health is not the same (different) across different levels of
education (Mlow ≠ Mmedium ≠ Mhigh)
STEP 4: CONDUCT F TEST
R output (Welch correction, assuming unequal variance), self-rated health by education

Even though we did Welch’s correction, the result remains significant! (but the value is a bit different)
But this is the correct result, not the prior one. Why???
STEP 5: CONDUCT POST-HOC TEST
If the ANOVA is significant, it means at least one pair of comparisons has a mean difference. We use a post-hoc test to identify which pair of mean comparisons causes the difference
R has the pairwise.t.test() function as part of the base system. It supports p-value adjustment methods such as Bonferroni, Holm, and others (see the sketch after this slide)
We will use Tukey HSD for post-hoc tests as it is more conservative. To use this method, you need the "multcomp" package with the glht() command
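A minimal sketch of pairwise.t.test() with a Bonferroni adjustment (using the simulated sim data and variable names from the earlier Step 4 sketch):
 pairwise.t.test(sim$srh, sim$edu3, p.adjust.method = "bonferroni")   # all pairwise t-tests with Bonferroni-adjusted p values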
STEP 5: CONDUCT POST-HOC TEST
Tukey HSD post-hoc test (R code)
 install.packages("multcomp")
 library(multcomp)
 model <- glht(aov.model, linfct = mcp(IV = "Tukey"))
 summary(model)
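Typical follow-ups once the glht model is fitted (a sketch; aov.model is assumed to be the aov() object from Step 4, and the IV must be stored as a factor):
 confint(model)          # Tukey-adjusted confidence intervals for each pairwise difference
 TukeyHSD(aov.model)     # base-R alternative that needs no extra package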
STEP 5: CONDUCT POST-HOC TEST
STEP 6: PROVIDE YOUR INTERPRETATION (WRITE-
UP)
You must report the overall F test, including the degrees of freedom, F value, and significance.
You need to report whether the equal variance assumption holds. If not, you will need to use the Welch correction.
Report the results from the post-hoc test (report the comparisons concisely). You will also need to report the mean and SD for each group.
EXAMPLES
This study examines the effect of education levels (low, medium, and high) on older adults' self-rated health. The significant Levene's test (F(2,3838) = 8.47, p < .001) indicated the equal variance assumption was violated, and therefore the Welch correction was applied.
The F-test results showed that education has a significant effect on self-rated health
(F(2,3838) = 24.676, p < .001). The Tukey post-hoc test indicated that older adults with
high education (M = 3.22; SD = 0.88) had higher levels of self-rated health
compared to those with medium education (M = 2.94; SD = 0.88) and low education
(M = 2.73; SD = 0.92). All the comparisons were statistically significant at the 0.05
level.
EXAMPLES OF TABLES
Do we need to do a post-hoc test for this ANOVA? No, why?
Do we need to do a post-hoc test for this ANOVA? Yes, why?
