
Class 13: Nonparametric Statistics

I. Parametric or nonparametric test?
II. One-sample and paired data (analog of the one-sample t-test)
III. Two independent samples (analog of the two-sample t-test)
IV. Comparing more than two groups (analog of one-way ANOVA)
I. Parametric or Nonparametric Test?

Background
• We have discussed several statistical tests, including the one-sample (or paired) and two-sample t-tests and ANOVA, that make assumptions about the distribution of the data (i.e., normally distributed data).
• These methods are called parametric because they are based on distributions that are defined by parameters.
• For example, normal distributions are defined by two parameters: the mean (μ) and the standard deviation (σ).

Background
If the assumptions needed to use these distributions are violated, two general approaches can be used:
1. Transform the data (the y and/or x variables) so that the distribution appears more normal. Log and square-root transformations are common. The transformed data are then analyzed using parametric methods that assume normality.
2. Use nonparametric methods. Nonparametric methods make fewer and more generic assumptions about the distribution of the data.
In this lecture, we will discuss some commonly used nonparametric tests.
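As a sketch of the first approach, the following R snippet uses simulated right-skewed data (artificial, for illustration only) to check normality before and after a log transformation:

```r
# Simulated right-skewed (lognormal) sample -- illustrative only
x <- qlnorm(ppoints(30))       # deterministic lognormal quantiles
hist(x)                        # strongly right-skewed

# Log transformation pulls the long right tail in
log_x <- log(x)
hist(log_x)                    # roughly symmetric

# Shapiro-Wilk test before and after transformation
shapiro.test(x)$p.value        # typically small: normality rejected
shapiro.test(log_x)$p.value    # larger: little evidence against normality
```

With these data, the p-value after transformation is larger than before, matching the visual impression from the histograms.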

Parametric or nonparametric test?

• There are no hard-and-fast rules to determine whether a parametric or nonparametric test is most appropriate.
• The decision may depend on the data being analyzed.
• The next few slides provide a set of guidelines to consider.
• A range of evidence should be used when deciding whether to use a parametric or nonparametric test.
• A nonparametric test will never be wrong, but the parametric analog is usually more powerful if its assumptions are valid.

Parametric or nonparametric test?

1. Visual inspection of the distribution of the data in each group. Skewed distributions and/or severe outliers may indicate that a nonparametric test is appropriate.
2. Comparison of the mean and median. In data from a normal distribution, the mean and median should be nearly identical. If these two measures of center differ substantially, a nonparametric test may be more appropriate.

Parametric or nonparametric test?

3. Test the null hypothesis that the data are from a normal distribution using the Shapiro-Wilk (S-W) test.
• In R: shapiro.test(x)

NOTE: The S-W test can be overly sensitive for large sample sizes and underpowered for small sample sizes. It may reject the null hypothesis of normality when a parametric test is appropriate, or fail to reject normality when a nonparametric test is appropriate.
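To make the note above concrete, here is a small sketch with simulated data (the samples are invented, so treat the p-values as illustrative of the sample-size caveat, not as fixed results):

```r
set.seed(42)

# A clearly normal sample of moderate size: S-W usually does not reject
x_norm <- rnorm(50)
shapiro.test(x_norm)

# A skewed sample of only n = 5: S-W is often underpowered here, so a
# non-significant result is weak evidence of normality
x_small <- rexp(5)
shapiro.test(x_small)
```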

Parametric or nonparametric test?

4. Prior knowledge of the variable. An ordinal variable with a limited number of categories will not have a normal distribution, so a nonparametric test is more appropriate.
5. t-tests are much less robust to violations of the normality assumption for small sample sizes, so with only a handful of observations a nonparametric test is more appropriate.
6. When the sample size is large enough, the Central Limit Theorem kicks in and the tests are much more robust to departures from normality; thus, a parametric test may be more appropriate.
Overview

  Data                      Parametric          Nonparametric
  One sample                One-sample t-test   Wilcoxon signed rank test
  Two paired samples        Paired t-test       Wilcoxon signed rank test
  Two independent samples   Unpaired t-test     Wilcoxon rank sum test
                                                (aka Mann-Whitney U test)
  >2 samples                ANOVA               Kruskal-Wallis test

II. One-sample and Paired Data

Wilcoxon Signed Rank Test (WSRT)

• The signed rank test uses information on the relative magnitude of the paired differences as well as their signs.

• Assumptions about the differences:
  • Independent observations
  • Continuous or ordinal observations (continuous is preferred)
  • The distribution is symmetric

Wilcoxon Signed Rank Test (WSRT)

# For the paired sample:
before <- c(38, 20, 76, 80, 60, 50, 52, 40, 60, 30)
after  <- c(66, 35, 89, 80, 56, 42, 76, 56, 64, 42)
diff   <- after - before

# Check the distribution of the differences
hist(diff)
qqnorm(diff)
qqline(diff)
shapiro.test(diff)

Diet Example
The biggest reason for using a nonparametric test here is the small sample size (n = 10).

Why we use the WSRT here

• The S-W test for normality is not rejected (p = 0.8135), so we don't have statistical evidence that the population from which the data were sampled is not normally distributed.

• However, our sample size of 10 is very small, so the Shapiro-Wilk test is underpowered to detect a departure from normality. We will therefore use a nonparametric test.

• Although it is difficult to tell because of the small sample size, the distribution does not look severely skewed, and we will assume the distribution of the population is symmetric. The assumption of symmetry enables us to use the Wilcoxon Signed Rank Test.

In R
• One sample:
wilcox.test(x, mu = 0)

• In our diet example (paired data):
wilcox.test(diff, mu = 0)
wilcox.test(before, after, paired = TRUE)
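Putting the pieces together for the diet data, the call below reproduces the reported test statistic. Because of the one zero difference and tied ranks, R drops the zero and falls back on a normal approximation, printing a warning about the exact p-value:

```r
before <- c(38, 20, 76, 80, 60, 50, 52, 40, 60, 30)
after  <- c(66, 35, 89, 80, 56, 42, 76, 56, 64, 42)

res <- wilcox.test(after - before, mu = 0)
res$statistic  # V = 40.5
res$p.value    # about 0.038
```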

Example: Reporting Results

• Hypotheses
H0: the median difference is zero
H1: the median difference is not zero

• Test statistic: V = 40.5

• Conclusion: We reject the null hypothesis that the median difference in rating of overall health before and after a 6-week diet and exercise program is zero (p = 0.03798). We conclude there is a median difference in health before vs. after.

• We performed the Wilcoxon Signed Rank test to assess whether there was a significant change in health rating before versus after a six-week diet and exercise program. The median difference in health rating was a 12.5-unit increase, which was significant (V = 40.5, p = 0.038) at the 5% level.

Wilcoxon Signed Rank Test (WSRT): Comparison to the paired t-test
• The WSRT is based on the signs and ranks of the data.
• The one-sample and paired t-tests assume independent observations from a normal distribution and test the null hypothesis that the mean difference is zero.
• Although they require the more stringent assumption of normality, the one-sample and paired t-tests are more powerful than the signed rank test if the assumptions are met or the sample size is large.
III. Two Independent Samples

Two Independent Samples: Wilcoxon Rank Sum Test (WRST)
• Nonparametric analog of the two-sample t-test
• Assumptions:
  • Independent observations
  • Continuous or ordinal observations
• Also called the Mann-Whitney U test

Wilcoxon Rank Sum Test: Hypotheses

H0: The distributions of the populations from which the two groups are sampled are the same.
H1: The distributions of the populations from which the two groups are sampled are different.

Note: to use the Wilcoxon rank sum test as a test of medians, the two population distributions from which the groups are sampled must have the same shape.

Wilcoxon Rank Sum Test: Example

A total of n = 8 children with asthma are randomized to two different asthma treatments (A or B). The outcome of interest is the number of asthma attacks that occur over a one-month period.

Observed data:
A: 1 7 11 16
B: 8 10 12 15
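In R, the test for these data can be run directly; with four observations per group and no ties, wilcox.test computes an exact p-value:

```r
A <- c(1, 7, 11, 16)   # asthma attacks, treatment A
B <- c(8, 10, 12, 15)  # asthma attacks, treatment B

res <- wilcox.test(A, B)
res$statistic  # W = 6
res$p.value    # 0.6857 (exact)
```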

Wilcoxon Rank Sum Test: Example

Results: The median number of asthma attacks in Group A is 9 and the median number of asthma attacks in Group B is 11. Our test statistic is W = 6 with 4 people in each group.

Conclusion: There is no significant difference in the distribution of the number of asthma attacks between treatments A and B (p = 0.6857) at the 0.05 level.

Wilcoxon Rank Sum Test: Reporting Results

Technical Summary
Hypotheses:
H0: The distribution of the number of asthma attacks is the same for both treatment groups.
H1: The distribution of the number of asthma attacks is not the same for the treatment groups.
The test was conducted at the α = 0.05 significance level.
Results: The median number of asthma attacks in Group A is 9 and the median number of asthma attacks in Group B is 11. Our Mann-Whitney test statistic is W = 6 with 4 people in each group.

Conclusion: There is no significant difference in the distribution of the number of asthma attacks between treatments A and B (p = 0.6857) at the 0.05 level.

Wilcoxon Rank Sum Test: Reporting Results

Write-up
We first checked for normality in each treatment group using the Shapiro-Wilk test of normality and by visually inspecting a histogram of asthma attacks for each treatment group. The S-W test was not significant for group A or B (p-values of 0.99 and 0.95), so the null hypothesis of normality is not rejected for either group. However, due to the small sample sizes of 4 per group, we chose to run the more conservative nonparametric Wilcoxon rank sum test to compare the distribution of the number of asthma attacks for the two treatment groups.

The median numbers of asthma attacks were 9 and 11 for treatments A and B, respectively. While treatment A resulted in slightly fewer asthma attacks than treatment B, the difference was not significant (W = 6, p = 0.6857) at the 0.05 level.
IV. Comparing More than Two Groups

Comparing more than two groups: the Kruskal-Wallis test
• Nonparametric analog of the one-way ANOVA F-test
• Makes no assumptions about normality or homoscedasticity, both of which are required for the ANOVA F-test.
• Works similarly to the Wilcoxon Rank Sum Test by comparing the ranks in each group.
• Substituting ranks for the original values makes this a less powerful test than the ANOVA F-test, so ANOVA should be used if its assumptions are not severely violated.
• In R: kruskal.test(response ~ group, data = ds)
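A minimal, self-contained sketch of the R call; the data frame and values are made up and chosen so that the ranks are easy to follow by hand:

```r
# Hypothetical data: three groups of three observations each
ds <- data.frame(
  response = c(1, 2, 3,  4, 5, 6,  7, 8, 9),
  group    = factor(rep(c("a", "b", "c"), each = 3))
)

res <- kruskal.test(response ~ group, data = ds)
res$statistic  # H = 7.2 on 2 degrees of freedom
res$p.value    # about 0.027
```

With no ties, the ranks equal the values here, so H = 12/(N(N+1)) · Σ Rj²/nj − 3(N+1) = 12/90 · (12 + 75 + 192) − 30 = 7.2.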
Review for Units 2-4
1. Histogram of a continuous variable. R code: hist(x)
2. Pearson and Spearman correlation of two continuous variables.

(1) How to write the null hypothesis and alternative hypothesis?
(2) R code:
cor.test(x, y, method = "pearson")
cor.test(x, y, method = "spearman")
(3) How to interpret the R results? Which one is the correlation coefficient? Which one is the p-value?
(4) How to write conclusions, including a comment on the strength and direction of the association?
Spearman correlation:
Null hypothesis (H0) in words: There is no monotone association between x and y. (ρ = 0)
Alternative hypothesis (Ha) in words: There is a monotone association between x and y. (ρ ≠ 0)

Pearson correlation:
Null hypothesis (H0) in words: There is no linear association between x and y. (ρ = 0)
Alternative hypothesis (Ha) in words: There is a linear association between x and y. (ρ ≠ 0)
Spearman:
Conclusion: Since the p-value of 0.0013 is less than α = 0.05, we reject the null hypothesis and conclude that there is significant evidence of a strong positive monotone association between x and y (r = 0.916, p-value = 0.0013).

• |r| ≥ 0.9 strong

• 0.7 ≤ |r| < 0.9 moderate-to-strong

• 0.5 ≤ |r| < 0.7 moderate

• 0.3 ≤ |r| < 0.5 moderate-to-weak

• 0 ≤ |r| < 0.3 weak
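As a quick, self-contained illustration of the Spearman call (the vectors here are made up; with a perfectly monotone relationship the estimated ρ is exactly 1 even though the relationship is nonlinear):

```r
# Hypothetical data with a perfectly monotone but nonlinear relationship
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- x^3

res <- cor.test(x, y, method = "spearman")
res$estimate  # rho = 1: perfect positive monotone association
res$p.value   # well below 0.05
```

A Pearson test on the same data would give |r| < 1, since Pearson correlation measures linear rather than monotone association.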


3. ANOVA: used to test the overall difference in means among multiple (>2) groups
(1) How to write the null and alternative hypotheses?
(2) How to write R code to get ANOVA results?
(3) How to interpret the ANOVA results from R? Which one is the test statistic? Which are the degrees of freedom? Which one is the p-value? What are the decision and conclusion?
(4) Given an ANOVA table with missing entries, how to fill in these missing entries?
(5) How to calculate and interpret R^2 from the ANOVA table?
ANOVA

ANOVA – Blister Example

• A study was conducted to compare the effectiveness of multiple treatments on the healing of fever blisters (number of days to heal).
• 30 subjects with fever blisters were randomly assigned to placebo or one of 4 treatments.
• μ1 is the mean number of days to heal in group 1 (placebo).
• H0: μ1 = μ2 = μ3 = μ4 = μ5 (the mean number of days to heal is the same in all groups)
• H1: at least one group differs from the others with respect to the mean number of days to heal
ANOVA in R
• Formula for one-way ANOVA: y ~ group
> fit <- aov(days ~ treatment, data = ds)
> summary(fit)

ANOVA table skeleton:

  Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
  Model
  Error
  Total
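A runnable sketch with fabricated data mirroring the blister design (5 groups of 6 subjects; the day counts are invented for illustration, so the F and p values are not those of the real study):

```r
# Hypothetical days-to-heal data: placebo + 4 treatments, 6 subjects each
ds <- data.frame(
  days = c(7, 8, 9, 8, 7, 9,    # placebo
           5, 6, 5, 7, 6, 5,    # treatment 1
           6, 7, 6, 5, 7, 6,    # treatment 2
           4, 5, 4, 6, 5, 4,    # treatment 3
           5, 4, 6, 5, 4, 5),   # treatment 4
  treatment = factor(rep(c("placebo", "t1", "t2", "t3", "t4"), each = 6))
)

fit <- aov(days ~ treatment, data = ds)
summary(fit)  # Model (treatment) row has 4 df; Error row has 25 df
```

With 5 groups and 30 subjects, the model degrees of freedom are k − 1 = 4 and the error degrees of freedom are N − k = 25, which is how the DF column of the skeleton table above gets filled in.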
Conclusion (if not significant): Since the p-value of ?? is greater than α = 0.05, we fail to reject H0 and conclude that there is no significant difference in the mean number of days to heal (replace with the true meaning of the response in your case) across the different treatment groups (replace with the true meaning of the groups in your case).

Conclusion (if significant): Since the p-value of ?? is less than α = 0.05, we reject H0 and conclude that there is a significant difference in the mean number of days to heal (replace with the true meaning of the response in your case) across the different treatment groups (replace with the true meaning of the groups in your case).

R^2 = Model SS / Total SS = 0.6

60% of the variability in days to heal is explained by the treatment groups.
4. Linear regression
(1) How to write R code to get multiple linear regression results?
(2) How to interpret the regression results from R?
(3) t-test for each variable:
How to write the null and alternative hypotheses?
What are the test statistic, degrees of freedom, effect estimate and 95% CI (confint function in R), p-value, decision, and conclusion?
(4) Write the equation for the best-fit line.
(5) Make predictions based on the best fit.
(6) When making predictions, be cautious!
I. Multivariable Linear Regression

Example
ds <- read.csv("fev.csv")
dim(ds)
head(ds)

summary(ds$age)
ds$smoking <- factor(ds$smoking)
res <- lm(fev ~ age + smoking, data = ds)
summary(res)
confint(res)

Parameter Estimates and Specific Tests

Conclusions for specific tests?

Equation for the best-fit line:

predicted FEV = 2.86 + 0.012·AGE − 0.895·SMOKING

DF for t-statistics: each t-statistic follows a t-distribution with n − p − 1 degrees of freedom.

Test for Age

• H0: There is no linear association between FEV and age, after adjusting for smoking status.
• H1: There is a linear association between FEV and age, after adjusting for smoking status.
• Level of significance: α = 0.05
• βage = 0.012, t = 1.151, df = 651, p = 0.25
• Interpretation of βage: for every one-year increase in age, on average there is a 0.012-unit increase in FEV, after adjusting for smoking status.
• Conclusion: Do not reject H0. There is no evidence of an association between FEV and age when adjusting for smoking status. For every one-year increase in age, on average there is a 0.012 (95% CI: −0.08, 0.03) unit increase in FEV after adjusting for smoking status, but there is no statistically significant association observed.

Prediction
• We can predict FEV from a combination of age and smoking status.

• A 10-year-old non-smoker has a predicted FEV of:
  2.86 + 0.012(10) − 0.895(0) = 2.98

• A 17-year-old smoker has a predicted FEV of:
  2.86 + 0.012(17) − 0.895(1) ≈ 2.17

Is it appropriate to predict FEV for a 30-year-old non-smoker?

No, this is not reasonable. Since the study was conducted among individuals aged 3–19 years, it is inappropriate to extrapolate the findings beyond this age range. Therefore, applying the model results to a 30-year-old, who is outside the study's maximum age, would not be valid.
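The same predictions can be computed in R straight from the fitted coefficients (the values below are the coefficients from the best-fit line above; with the fitted model object you would normally use predict() instead):

```r
# Coefficients from the fitted model: intercept, age, smoking
b0 <- 2.86; b_age <- 0.012; b_smoke <- -0.895

# Predicted FEV for a 10-year-old non-smoker
b0 + b_age * 10 + b_smoke * 0   # 2.98

# Predicted FEV for a 17-year-old smoker
b0 + b_age * 17 + b_smoke * 1   # about 2.17
```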
5. Logistic regression
(1) How to write R code to get multiple logistic regression results?
(2) Based on the odds ratio (OR) and the 95% CI of the OR, interpret the association between x and y. Knowing how to interpret the OR is important!
IV. Multivariable Logistic Regression

Multiple Logistic Regression

Interpretation of results??

Interpretation – Odds Ratio for Treatment

H0: There is no association between treatment group and developing CA after adjusting for age (odds ratio = 1).
H1: There is an association between treatment group and developing CA after adjusting for age (odds ratio ≠ 1).

• We reject the null hypothesis (p-value = 0.0020).

• The odds ratio is 0.197. Those treated with gamma globulin have 0.197 times the odds (i.e., an 80.3% decrease in odds) of having a coronary abnormality compared to those treated with aspirin, after adjusting for age. The 95% confidence interval for the OR is (0.063, 0.515).

Interpretation – Odds Ratio for Age

H0: There is no association between age and developing CA after adjusting for treatment group (odds ratio = 1).
H1: There is an association between age and developing CA after adjusting for treatment group (odds ratio ≠ 1).

• We fail to reject the null hypothesis (p-value = 0.57).

• We estimate that someone who is one year older has 0.93 times the odds of having a coronary abnormality compared to someone one year younger, after adjusting for treatment group. But this difference is not significant.

• Or we can say: the odds ratio is 0.93. Each one-year increase in age is associated with a 7% decrease in the odds of having a coronary abnormality, after adjusting for treatment group. But this difference is not significant.
What is the odds ratio for developing CA that is associated with a 5-year increase in age?

OR = e^(−0.07 × 5) = e^(−0.35) ≈ 0.70

For each 5-year increase in age, the odds of developing CA decrease by about 30%, after adjusting for treatment group.
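The arithmetic, checked in R (−0.07 is the age coefficient on the log-odds scale from the slides above):

```r
b_age <- -0.07            # log-odds coefficient for age

or_1yr <- exp(b_age)      # OR per 1-year increase: about 0.93
or_5yr <- exp(b_age * 5)  # OR per 5-year increase: about 0.70

c(or_1yr, or_5yr)
```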
6. Determine whether x1 is an effect modifier of the association between x2 and y
(1) Fit an interaction between x1 and x2, using y as the response: glm(y ~ x1 + x2 + x1:x2)
(2) Interpret the p-value for the interaction term. If the interaction term is not significant, then the conclusion is: since the interaction term for x1 and x2 is not statistically significant, our next step would be to refit the model with only the main effects for x1 and x2.
Effect Modification and Interaction
When a Confounder Moderates the Exposure-Outcome Relationship

Interaction
• Interaction means that the effect of a predictor x1 on the outcome differs according to the level of another predictor x2.

• Interaction is also referred to as effect modification.

• We can test for interaction/effect modification via hypothesis tests.

HERS Example
• Does the effect of hormone therapy (predictor) on LDL cholesterol (outcome) differ according to baseline statin use?

• Suppose allocation to hormone therapy and use of statins are each coded using dummy variables (0/1).

Regression Model with Interaction Term

• Now consider the following regression model:

  LDL cholesterol = β0 + β1·HT + β2·statins + β3·HT×statins

• HT is the indicator of assignment to hormone therapy.
• statins is the indicator of baseline statin use.
• The product HT×statins is the interaction term.
• To determine whether the interaction term is significant, we test whether β3 = 0.

Creating the Interaction Term

lm(ldl1 ~ HT * statins, data=hers)

or

lm(ldl1 ~ HT + statins + HT:statins, data=hers)
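A self-contained sketch with simulated data (the HERS data are not included here, so the variables and effect sizes below are invented; the point is that both formulas above produce an HT:statins coefficient whose t-test is the test for interaction):

```r
set.seed(1)

# Simulate 0/1 indicators and an outcome with a built-in interaction
n <- 200
HT      <- rbinom(n, 1, 0.5)
statins <- rbinom(n, 1, 0.5)
ldl1    <- 150 - 15 * HT - 10 * statins + 12 * HT * statins + rnorm(n, sd = 10)

fit <- lm(ldl1 ~ HT * statins)
summary(fit)$coefficients["HT:statins", ]  # estimate, SE, t value, p-value
```

The p-value in the HT:statins row is the p-value one reports for the test of β3 = 0.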

Test for Interaction

• The formal hypotheses for this test of interaction are:
• H0: The effect of HT on LDL does not vary by statin use (β3 = 0).
• H1: The effect of HT on LDL does vary by statin use (β3 ≠ 0).

Test for Interaction

We reject the null hypothesis (p = 0.0161): there is evidence of the need for an interaction term.
