Session 2 - Stats 2
Jadwiga Michlewicz
Overview of the sessions
Session 1:
■ Recap Statistical Inference
■ Simple Linear Regression
■ Correlation and Causation
■ Multiple Linear Regression
■ Interaction Effects
Session 2:
■ Recap of session 1
■ Partial and Semi Partial Correlation
■ Regression with Code Variables
■ Multiple Comparisons and Contrasts
■ ANOVA
■ Bayesian Statistics
■ Good and Bad Statistics
Session 3:
■ Practice Questions
Inference
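p-value: the probability of obtaining data at least as extreme as the observed
data, given that the null hypothesis is true. It is not the probability that
the null hypothesis is true given the data.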
Inference about linear regression
Confidence interval of correlation coefficient
Multiple Linear Regression: Partitioning of Variance
Inference in multiple linear regression - omnibus test
F-test with df = (p, n-p-1), where p is the number of predictors and n is the
sample size
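As a rough sketch of the omnibus test, the F statistic can be computed from R^2 with the degrees of freedom given above, taking p as the number of predictors and n as the sample size. The function name omnibus_f and the numbers are assumptions made up for illustration; they do not come from the slides.

```python
def omnibus_f(r_squared, n, p):
    """Return (F, df1, df2) for the omnibus test of a multiple regression.

    r_squared: proportion of variance explained by all p predictors together
    n: sample size, p: number of predictors
    """
    df1, df2 = p, n - p - 1
    # Explained variance per predictor divided by unexplained variance per residual df.
    f_value = (r_squared / df1) / ((1.0 - r_squared) / df2)
    return f_value, df1, df2


# Hypothetical numbers, chosen only for illustration.
f_value, df1, df2 = omnibus_f(r_squared=0.30, n=54, p=3)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

The resulting F is then compared with the critical value of the F distribution at (df1, df2).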
Inference in multiple linear regression - specific test
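Significance test: t = b_j / SE(b_j), with df = n-p-1
Confidence interval: b_j ± t* × SE(b_j), where t* is the critical t-value for
df = n-p-1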
Testing for interaction effects
Partial and semi-partial correlations
Partial and Semi-Partial Correlation: Ballantine-Venn Diagram
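The formulas from these slides are not preserved here, so the following is only a sketch using the standard textbook expressions for one outcome y and two predictors. The partial correlation removes predictor 2 from both y and predictor 1; the semi-partial correlation removes it from predictor 1 only. The function names and the three correlations are hypothetical.

```python
import math


def semipartial_corr(r_y1, r_y2, r_12):
    """Correlation of y with predictor 1 after removing predictor 2 from predictor 1 only."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1.0 - r_12 ** 2)


def partial_corr(r_y1, r_y2, r_12):
    """Correlation of y with predictor 1 after removing predictor 2 from both."""
    return (r_y1 - r_y2 * r_12) / math.sqrt((1.0 - r_y2 ** 2) * (1.0 - r_12 ** 2))


# Hypothetical pairwise correlations, for illustration only.
r_y1, r_y2, r_12 = 0.6, 0.5, 0.4
sr = semipartial_corr(r_y1, r_y2, r_12)
pr = partial_corr(r_y1, r_y2, r_12)
print(f"semi-partial = {sr:.3f}, partial = {pr:.3f}")
```

The squared semi-partial correlation is the share of the total variance of y that is uniquely explained by predictor 1; the squared partial correlation is that same unique part as a share of the variance of y not explained by predictor 2.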
Exercise
Partial and semi-partial correlation
Exercise
a. 0.648
b. 0.666
c. 0.572
d. 0.487
Exercise
a. 0.42
b. 0.37
c. 0.32
d. 0.56
Regression with code variables
Dummy coding
             z1  z2
Science       1   0
Math          0   1
Literature    0   0
Dummy coding - easy example
        z
Woman   1
Man     0
Y = a + b*z
If z = 1 → woman
If z = 0 → man
We found that the average height of men was 180 cm and the average
height of women was 167 cm.
So a = 180 (the mean of the reference group, men) and b = 167 - 180 = -13,
which gives Y = 180 - 13*z.
Dummy coding - more difficult example
             z1  z2
Math          0   1   8.12
Literature    0   0   8.08
Exercise
Find the regression equation.
             z1  z2  Depression score
CBT           1   0  17
Medication    0   1  23
Control       0   0  28
Y = 28 - 11*z1 - 5*z2
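The logic behind this answer can be sketched in a few lines: with dummy coding, the intercept is the mean of the reference group (all dummies 0) and each slope is the difference between a group mean and the reference mean. The helper names dummy_coefficients and predict are made up for this illustration.

```python
def dummy_coefficients(group_means, reference):
    """Return (intercept, slopes) for a dummy-coded regression on group means.

    The intercept is the reference group's mean; each slope is the difference
    between a group's mean and the reference mean.
    """
    intercept = group_means[reference]
    slopes = {g: m - intercept for g, m in group_means.items() if g != reference}
    return intercept, slopes


def predict(intercept, slopes, group):
    """Predicted value for a group: its own dummy is 1, all other dummies are 0."""
    return intercept + sum(b * (1 if g == group else 0) for g, b in slopes.items())


# Depression scores from the exercise, with Control as the reference group.
means = {"CBT": 17, "Medication": 23, "Control": 28}
intercept, slopes = dummy_coefficients(means, reference="Control")
print(intercept, slopes)
print(predict(intercept, slopes, "CBT"))
```

The same function applied to the height example (men as reference) returns an intercept of 180 and a slope of -13.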
Exercise
Three groups are compared. The model follows the regression equation:
μ = b0 + b1D1 + b2D2 using two dummy variables that are coded as follows:
D1 D2
Group 1 -1 -1
Group 2 1 0
Group 3 0 1
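The question that belongs to this exercise is not preserved here, but the coding shown (one group coded -1 on every variable) is effect coding, and its coefficients can be read off as follows: b0 is the unweighted mean of the three group means, b1 is the deviation of Group 2 from that mean and b2 is the deviation of Group 3. A minimal sketch, with hypothetical group means and a made-up function name:

```python
def effect_coefficients(mu1, mu2, mu3):
    """Coefficients for the coding Group 1 = (-1, -1), Group 2 = (1, 0), Group 3 = (0, 1)."""
    b0 = (mu1 + mu2 + mu3) / 3.0  # unweighted mean of the group means
    b1 = mu2 - b0                 # deviation of Group 2 from that mean
    b2 = mu3 - b0                 # deviation of Group 3 from that mean
    return b0, b1, b2


# Hypothetical group means, for illustration only.
b0, b1, b2 = effect_coefficients(4.0, 6.0, 11.0)

# Reconstruct the group means from the regression equation mu = b0 + b1*D1 + b2*D2.
group1 = b0 + b1 * (-1) + b2 * (-1)
group2 = b0 + b1 * 1 + b2 * 0
group3 = b0 + b1 * 0 + b2 * 1
print(b0, b1, b2, group1, group2, group3)
```

Group 1 has no coefficient of its own: its deviation from the mean is -(b1 + b2).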
Chance capitalization
Let's say we have 4 treatments and we want to compare them to each other. We
could conduct a t-test for each pairwise comparison (6 t-tests in total). Even
if each t-test uses a significance level of α = 0.05, the chance of at least
one Type I error increases when we do many comparisons at once:
Overall error rate = probability of at least one false rejection of H0
Overall error rate = 1 - (1 - α)^k
k = number of tests (assumed to be independent)
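To get a feeling for how quickly the overall error rate grows, the formula can be evaluated for the 4-treatment example. This is a small sketch that assumes the tests are independent; the function name is made up.

```python
from math import comb


def familywise_error_rate(alpha, k):
    """Probability of at least one false rejection of H0 in k independent tests."""
    return 1.0 - (1.0 - alpha) ** k


k = comb(4, 2)  # number of pairwise comparisons among 4 treatments
rate = familywise_error_rate(alpha=0.05, k=k)
print(f"{k} tests -> overall error rate = {rate:.3f}")
```

With 6 tests at α = 0.05 the overall error rate is roughly 0.26, about five times the level of a single test.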
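Contrasts
                z1  z2  Depression score
1. CBT           1   0  17
2. Medication    0   1  23
3. Control       0   0  28
H0: the average of the CBT and medication means equals the control mean
0.5μ1 + 0.5μ2 - μ3 = 0
Coefficients of this equation: 0.5, 0.5, -1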
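Contrasts
Relevant formulas:
c = a1*M1 + a2*M2 + ... + ak*Mk (M = group mean; the coefficients a sum to 0)
sp = sqrt( Σ (ni - 1)*si^2 / (N - k) ) (pooled standard deviation)
SEc = sp * sqrt( Σ ai^2 / ni )
t = c / SEc, with df = N - k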
Contrasts
                Depression score
1. CBT          17
2. Medication   23
3. Control      28
c = 17*0.5 + 23*0.5 - 28 = -8
Contrasts
H0: μ1 = μ2
μ1 - μ2 + 0*μ3 = 0
Coefficients of this equation: 1, -1, 0
Exercise
                Depression score   N    SD
1. CBT          17                 17   2.34
2. Medication   23                 22   3.14
3. Control      28                 19   2
Set up a contrast comparing the effectiveness of CBT and medication together
against control. What are the test statistic and its degrees of freedom?
a = 0.5, 0.5, -1
c = -8
sp = 2.58
SEc = 0.724
t = -8/0.724 = -11.05; df = 55
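The steps of this exercise can be checked with a short sketch. It assumes the usual pooled-variance contrast test (pooled SD across the three groups, df = N - k); the function name contrast_test is made up for this illustration.

```python
import math


def contrast_test(means, sds, ns, weights):
    """Return (c, sp, se, t, df) for a contrast on independent group means."""
    k = len(means)
    df = sum(ns) - k
    # Sample value of the contrast: weighted sum of the group means.
    c = sum(a * m for a, m in zip(weights, means))
    # Pooled standard deviation across all groups.
    sp = math.sqrt(sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / df)
    # Standard error of the contrast.
    se = sp * math.sqrt(sum(a ** 2 / n for a, n in zip(weights, ns)))
    return c, sp, se, c / se, df


# Data from the exercise: CBT, Medication, Control.
c, sp, se, t, df = contrast_test(
    means=[17, 23, 28],
    sds=[2.34, 3.14, 2],
    ns=[17, 22, 19],
    weights=[0.5, 0.5, -1],
)
print(f"c = {c}, sp = {sp:.2f}, SEc = {se:.3f}, t = {t:.2f}, df = {df}")
```

Without intermediate rounding the t-value comes out at about -11.04; the 11.05 in the answer results from dividing by the rounded SEc of 0.724. The sign is negative because the treated groups score lower than control.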
Multiple comparisons
Exercise
You have 5 groups and you want to compare the groups with each other. How
many comparisons do you need to do? If you want the overall significance level
to be α = 0.05, what is the level of α you need to use for each comparison?
5*4/2 = 10 comparisons
α = 0.05/10 = 0.005 (Bonferroni correction)
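The answer can be reproduced with a small sketch of the Bonferroni correction: count the pairwise comparisons, then divide the overall α by that count. The function name is made up for this illustration.

```python
from math import comb


def bonferroni_alpha(overall_alpha, n_groups):
    """Return (number of pairwise comparisons, per-comparison alpha)."""
    n_comparisons = comb(n_groups, 2)
    return n_comparisons, overall_alpha / n_comparisons


n_comparisons, per_test_alpha = bonferroni_alpha(overall_alpha=0.05, n_groups=5)

# Overall error rate actually reached with the corrected alpha (independent tests assumed).
achieved = 1.0 - (1.0 - per_test_alpha) ** n_comparisons
print(n_comparisons, per_test_alpha, round(achieved, 4))
```

The achieved overall error rate stays just below 0.05, which shows that the correction is slightly conservative.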
ANOVA
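F-test = GMS/RMS
(GMS = group mean square, RMS = residual mean square)
The null hypothesis is rejected if F is larger than the critical value of the F
distribution with df = (k-1, N-k), i.e. if p < α. When H0 is true, F is expected
to be close to 1. A significant F tells us that at least one group is different
from the other ones. But which one(s)?
To know that, we need local tests such as contrasts or pairwise comparisons.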
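A minimal sketch of how the F ratio is built up, using the GMS and RMS names from the slide and assuming a one-way design with k groups and N observations in total. The data are hypothetical and the function name is made up.

```python
def one_way_anova(groups):
    """Return (F, df_group, df_residual) for a one-way ANOVA on lists of scores."""
    all_values = [x for g in groups for x in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group means around the grand mean.
    ss_group = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Residual sum of squares: scores around their own group mean.
    ss_resid = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    gms = ss_group / (k - 1)        # group mean square
    rms = ss_resid / (n_total - k)  # residual mean square
    return gms / rms, k - 1, n_total - k


# Hypothetical scores for three groups, for illustration only.
f_value, df1, df2 = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

The F value is then compared with the critical value of the F distribution at (df1, df2), not with 1.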
Two-way ANOVA
Two-way ANOVA - Main and interaction effects
Two-way ANOVA - Partitioning of variance
Effect sizes
Eta squared
- In one-way ANOVA, it is the same as R^2
- The effects are additive
- It depends on the number and size of the remaining effects
- Biased estimator: it overestimates the proportion of variance accounted
for in the population
Omega squared
- Less biased estimator than eta squared; corrects for the overestimation of
the population effect
- Not additive
- Estimate can be negative
Exercise
Calculate:
a. Partial eta squared of gender = 0.0018
b. Eta squared of sports = 0.100
c. Omega squared of gender x sports = -0.0116
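The ANOVA table that belongs to this exercise is not preserved here, so the sketch below uses a hypothetical two-way table and does not reproduce the answers above. It assumes the standard formulas: eta squared = SS_effect / SS_total, partial eta squared = SS_effect / (SS_effect + SS_error), and omega squared = (SS_effect - df_effect * MS_error) / (SS_total + MS_error).

```python
def eta_squared(ss_effect, ss_total):
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    return ss_effect / (ss_effect + ss_error)


def omega_squared(ss_effect, df_effect, ss_total, ms_error):
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


# Hypothetical two-way ANOVA table (factors A and B), for illustration only.
ss_a, ss_b, ss_ab, ss_error = 32.0, 60.0, 8.0, 300.0
df_ab, df_error = 2, 60
ss_total = ss_a + ss_b + ss_ab + ss_error
ms_error = ss_error / df_error

partial_eta_a = partial_eta_squared(ss_a, ss_error)
eta_b = eta_squared(ss_b, ss_total)
omega_ab = omega_squared(ss_ab, df_ab, ss_total, ms_error)
print(round(partial_eta_a, 4), round(eta_b, 4), round(omega_ab, 4))
```

In this made-up table the interaction sum of squares is smaller than df_effect * MS_error, so omega squared comes out slightly negative, the same pattern as in answer c.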
Exercise
Source           SS    df   MS   F
Factor A         128   3    d    h
Factor B         a     b    e    i
Interaction AB   744   c    f    j
Bayesian statistics
1. Prior knowledge about the probability of some events: either all values
have the same probability, or some values are more likely than others
2. Collecting data
3. Updating beliefs
Bayesian statistics - is it going to rain today?
1. Prior knowledge
c. It rained yesterday
2. Data collection
3. Updating beliefs
Conditional probability
Exercise
c. 53%
d. 34%
Bayes theorem
p(θ|data) = p(data|θ) * p(θ) / p(data)
p(θ|data) = posterior
- Our belief about the probability of θ after looking at the data
p(θ) = prior
- Our belief about the probability of θ before looking at the data
p(data|θ) = likelihood
- The probability of observing the data given the θ
p(data) = marginal likelihood
- Probability of observing the data across all possible values of θ
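How the three ingredients combine can be sketched with a small grid approximation: θ is restricted to a handful of candidate values, the prior is multiplied by the likelihood, and the result is divided by p(data). The binomial example (7 successes in 10 trials), the uniform prior and the function name are all assumptions made up for illustration.

```python
from math import comb


def grid_posterior(thetas, prior, successes, trials):
    """Posterior over a grid of theta values for binomial data."""
    likelihood = [
        comb(trials, successes) * t ** successes * (1 - t) ** (trials - successes)
        for t in thetas
    ]
    unnormalised = [lik * p for lik, p in zip(likelihood, prior)]
    marginal = sum(unnormalised)  # p(data): summed over all grid values of theta
    return [u / marginal for u in unnormalised]


thetas = [i / 10 for i in range(11)]        # candidate values 0.0, 0.1, ..., 1.0
prior = [1 / len(thetas)] * len(thetas)     # all values equally likely beforehand
posterior = grid_posterior(thetas, prior, successes=7, trials=10)

most_probable = thetas[posterior.index(max(posterior))]
print(most_probable)
```

With a uniform prior the posterior peaks at the observed proportion; a prior that favours some values of θ would pull the peak towards those values.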
Bayesian statistics - distributions
Questionable research practices
Questionable research practices - solutions
Summary
1. Practice questions
Good job everyone!