Session 2 - Stats 2

The document outlines a series of sessions focused on statistical concepts including statistical inference, regression analysis, and ANOVA. It covers topics such as partial and semi-partial correlations, dummy coding for categorical variables, and methods to address chance capitalization in multiple comparisons. Additionally, it provides exercises for practical application of the concepts discussed.


Statistics 2

Jadwiga Michlewicz
Overview of the sessions

Session 1:
■ Recap Statistical Inference
■ Simple Linear Regression
■ Correlation and Causation
■ Multiple Linear Regression
■ Interaction Effects

Session 2:
■ Recap of session 1
■ Partial and Semi Partial Correlation
■ Regression with Code Variables
■ Multiple Comparisons and Contrasts
■ ANOVA
■ Bayesian Statistics
■ Good and Bad Statistics

Session 3:
■ Practice Questions

2
Overview of the sessions

Session 2:
■ Recap of session 1
■ Partial and Semi Partial Correlation
■ Regression with Code Variables
■ Multiple Comparisons and Contrasts
■ ANOVA
■ Bayesian Statistics
■ Good and Bad Statistics

3
Inference

- Null hypothesis (H0) - states no effect or difference; usually the opposite of what we think is true


- E.g. H0: Difference between A and B = 0
- Alternative hypothesis (Ha) - what we think is true
- E.g. Ha: Difference between A and B ≠ 0
Goal: reject the null hypothesis

p-value: the probability of observing data at least as extreme as the data at hand, assuming the null hypothesis is true (not the probability that H0 itself is true)

4
Inference about linear regression

Are a and b different from 0?


- H0: a = 0, b = 0
- Ha: a ≠ 0, b ≠ 0

5
Confidence interval of correlation coefficient

6
Multiple Linear Regression: Partitioning of Variance

• Partitioning of Variance will be important for ANOVA


• Total sum of squares (TSS) = Regression sum of squares (RSS) + Sum of squared
error (SSE)

7
Inference in multiple linear regression - omnibus test

We use an F-test to test the null hypothesis:

df = (p, n-p-1)

8
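The F statistic itself was an image on the slide; written out in the notation of the previous slide (TSS = RSS + SSE, p predictors), the omnibus test of H0: b1 = … = bp = 0 is presumably:

```latex
F = \frac{\mathrm{RSS}/p}{\mathrm{SSE}/(n-p-1)}, \qquad df = (p,\ n-p-1)
```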
Inference in multiple linear regression - specific test

Significance test:

Confidence interval:

9
Testing for interaction effects

We want to see if the proportion of explained variance is significantly higher in the
complete model than in the reduced model.

df1 = difference in the number of predictors between models


df2 = n-p-1 of the complete model

10
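The formula was an image on the slide; using the df definitions above, the model-comparison F-test is presumably:

```latex
F = \frac{(R^2_{\text{complete}} - R^2_{\text{reduced}})/df_1}{(1 - R^2_{\text{complete}})/df_2}
```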
Questions?

11
Overview of the sessions

Session 2:
■ Partial and Semi Partial Correlation
■ Regression with Code Variables
■ Multiple Comparisons and Contrasts
■ ANOVA
■ Bayesian Statistics
■ Good and Bad Statistics

12
Partial and semi-partial correlations

13
Partial and semi-partial correlations

Partial correlation - correlation between x1 and y controlling


for x2 (correlation between “y without x2” and “x1 without
x2”)

Semi-partial (part) correlation - controlling only for one/some


explanatory variables and not the response variable
(correlation between “y” and “x1 without x2”)

14
Partial and semi-partial correlations

Squared partial correlation - the proportion of variation in y


explained by x1, controlling for x2; the proportion of variance of
y not explained by other independent variables;

Squared semi-partial (part) correlation - how much of variation


in y is explained uniquely by one independent variable

15
Partial and Semi-Partial Correlation: Ballantine-Venn Diagram

(Ballantine-Venn diagrams were shown here, comparing the shaded variance regions for the full model, the partial correlation, and the semipartial correlation)

16
Exercise

You do a multiple regression and want to establish the amount of variance in y
that is uniquely explained by one of your predictors. What statistic do you
need?
a) Semi-partial correlation
b) Partial correlation
c) Partial Correlation Squared
d) Semi-Partial Correlation Squared

17
Exercise

You do a multiple regression and want to establish the amount of variance in y
that is uniquely explained by one of your predictors. What statistic do you
need?
a) Semi-partial correlation
b) Partial correlation
c) Partial Correlation Squared
d) Semi-Partial Correlation Squared

Answer: d) Semi-Partial Correlation Squared

18
Partial and semi-partial correlation

19
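The formulas on this slide were images; the definitions above can be checked numerically with the standard formulas for computing partial and semi-partial correlations from a correlation matrix. A minimal sketch (the correlation values are taken from the exercise that follows):

```python
import math

# Pairwise correlations (from the exercise's correlation matrix)
r_y1, r_y2, r_12 = 0.64, 0.47, 0.16  # corr(Y,X1), corr(Y,X2), corr(X1,X2)

# Partial correlation of Y and X1, controlling for X2 in both variables
partial_y1_2 = (r_y1 - r_y2 * r_12) / math.sqrt((1 - r_y2**2) * (1 - r_12**2))

# Semi-partial (part) correlation of Y and X2, removing X1 from X2 only
semipartial_y2_1 = (r_y2 - r_y1 * r_12) / math.sqrt(1 - r_12**2)

print(round(partial_y1_2, 3))      # 0.648
print(round(semipartial_y2_1, 2))  # 0.37
```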
Exercise

Calculate the Partial Correlation between Y and X1.


Y X1 X2
Y 1 0.64 0.47
X1 0.64 1 0.16
X2 0.47 0.16 1

a. 0.648
b. 0.666
c. 0.572
d. 0.487

20
Exercise

Calculate the Partial Correlation between Y and X1.


Y X1 X2
Y 1 0.64 0.47
X1 0.64 1 0.16
X2 0.47 0.16 1

a. 0.648
b. 0.666
c. 0.572
d. 0.487

Answer: a. 0.648

21
Exercise

Calculate the Semi-partial Correlation between Y and X2.


Y X1 X2
Y 1 0.64 0.47
X1 0.64 1 0.16
X2 0.47 0.16 1

a. 0.42
b. 0.37
c. 0.32
d. 0.56

22
Exercise

Calculate the Semi-partial Correlation between Y and X2.


Y X1 X2
Y 1 0.64 0.47
X1 0.64 1 0.16
X2 0.47 0.16 1

a. 0.42
b. 0.37
c. 0.32
d. 0.56

Answer: b. 0.37

23
Questions?

24
Overview of the sessions

Session 2:
■ Partial and Semi Partial Correlation
■ Regression with Code Variables
■ Multiple Comparisons and Contrasts
■ ANOVA
■ Bayesian Statistics
■ Good and Bad Statistics

25
Regression with code variables

So far, we have talked about regression models with only continuous
variables. But sometimes we want to include categorical
variables too.
E.g.
- Gender
- Educational level
- Whether someone has children or not

26
Dummy coding

Coding categorical variables into 1s and 0s.

Example: 3 subjects: Science, Math, Literature

z1 z2

Science 1 0

Math 0 1

Literature 0 0

27
Dummy coding - easy example

Regression model of height of men and women.

          z
Woman     1
Man       0

Y = a + b*z

If z = 1 → woman
If z = 0 → man

28
Dummy coding - easy example

          z
Woman     1
Man       0

Y = a + b*z

We found that the average height of men was 180 cm and the average height of
women was 167 cm.

180 = a + b*0  →  a = 180
167 = a + b*1  →  b = 167 - 180 = -13

Y = 180 - 13z

29
Dummy coding - more difficult example

Regression model of students’ GPA based on what their favourite subject is

             z1   z2
Science      1    0
Math         0    1
Literature   0    0

Y = a + b1*z1 + b2*z2

30
Dummy coding - more difficult example

Favourite subject   z1   z2   GPA
Science             1    0    7.89
Math                0    1    8.12
Literature          0    0    8.08

Y = a + b1*z1 + b2*z2

We can make 3 equations:

7.89 = a + b1*1 + b2*0 = a + b1
8.12 = a + b1*0 + b2*1 = a + b2
8.08 = a + b1*0 + b2*0 = a

Y = 8.08 - 0.19*z1 + 0.04*z2

31
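Since each dummy coefficient is just a difference of group means from the reference category, the solution above can be sketched in a few lines of Python (variable names are illustrative):

```python
# Group means (GPA) per favourite subject, from the slide
means = {"Science": 7.89, "Math": 8.12, "Literature": 8.08}

# Literature is the reference category (z1 = z2 = 0), so:
a = means["Literature"]    # intercept = mean of the reference group
b1 = means["Science"] - a  # each dummy coefficient = difference from the reference
b2 = means["Math"] - a

print(round(a, 2), round(b1, 2), round(b2, 2))  # 8.08 -0.19 0.04
```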
Exercise

Find the regression equation.

             z1   z2   Depression score
CBT          1    0    17
Medication   0    1    23
Control      0    0    28

Y = 28 - 11*z1 - 5*z2

32
Exercise

Three groups are compared. The model follows the regression equation:
μ = b0 + b1D1 + b2D2 using two dummy variables that are coded as follows:
D1 D2

Group 1 -1 -1

Group 2 1 0

Group 3 0 1

What is the regression equation for the second group?


a. μ = b0 + b1 + b2
b. μ = b0 – b1
c. μ = b0 – b1 – b2
d. μ = b0 + b1
33
Exercise

Three groups are compared. The model follows the regression equation:
μ = b0 + b1D1 + b2D2 using two dummy variables that are coded as follows:
D1 D2

Group 1 -1 -1

Group 2 1 0

Group 3 0 1

What is the regression equation for the second group?


a. μ = b0 + b1 + b2
b. μ = b0 – b1
c. μ = b0 – b1 – b2
d. μ = b0 + b1

Answer: d. μ = b0 + b1

34
Questions?

35
Overview of the sessions

Session 2:
■ Partial and Semi Partial Correlation
■ Regression with Code Variables
■ Multiple Comparisons and Contrasts
■ ANOVA
■ Bayesian Statistics
■ Good and Bad Statistics

36
Chance capitalization

Let’s say we have 4 treatments and we want to compare them to each other. We
could conduct a t-test for each comparison (we would need to do 6 t-tests). If
each t-test is conducted at a significance level of α = 0.05, then when we do many
comparisons at once, the chance of a Type I error increases:
Overall error rate = probability of at least one false rejection of H0
Overall error rate = 1 - (1 - α)^k
k - number of tests

37
Chance capitalization

Overall error rate = 1 - (1 - α)^k


k - number of tests

What’s the overall error rate for 6 tests at α = 0.05 (95% confidence)?

Overall error rate = 0.265

38
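A quick check of the computation above (the function name is illustrative):

```python
# Family-wise (overall) error rate for k independent tests at level alpha
def overall_error_rate(alpha, k):
    return 1 - (1 - alpha) ** k

# Six pairwise t-tests among 4 treatments, each at alpha = 0.05
print(round(overall_error_rate(0.05, 6), 3))  # 0.265
```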
Chance capitalization

How to deal with chance capitalization?


- Contrasts
- Multiple comparisons
- Bonferroni correction
- Tukey correction

39
Contrasts

Are CBT and medication better treatments of depression than control?

                z1   z2   Depression score
1. CBT          1    0    17
2. Medication   0    1    23
3. Control      0    0    28

0.5μ1 + 0.5μ2 - μ3 = 0
Coefficients of this equation: 0.5, 0.5, -1

40
Contrasts

Relevant formulas:

41
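The formulas were shown as images; judging from the calculations on the following slides, they are presumably the standard contrast formulas for group means ȳᵢ, coefficients aᵢ, and pooled standard deviation sₚ:

```latex
c = \sum_i a_i \bar{y}_i, \qquad
s_p = \sqrt{\frac{\sum_i (n_i - 1)\, s_i^2}{n - g}}, \qquad
SE_c = s_p \sqrt{\sum_i \frac{a_i^2}{n_i}}, \qquad
t = \frac{c}{SE_c}, \quad df = n - g
```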
Contrasts

                Depression score
1. CBT          17
2. Medication   23
3. Control      28

Calculate the value of c for this contrast.

c = 17*0.5 + 23*0.5 - 28 = -8

42
Contrasts

Is CBT better than medication as a depression treatment?

                z1   z2   Depression score
1. CBT          1    0    17
2. Medication   0    1    23
3. Control      0    0    28

H0: μ1 = μ2
μ1 - μ2 + 0*μ3 = 0
Coefficients of this equation: 1, -1, 0

43
Exercise

Set up a contrast comparing the effectiveness of CBT and medication together
against control. What is the value of the t-test and degrees of freedom?

                Depression score   N    SD
1. CBT          17                 17   2.34
2. Medication   23                 22   3.14
3. Control      28                 19   2

a = 0.5, 0.5, -1
c = -8
sp = 2.58
SEc = 0.724
t = -8/0.724 = -11.05; df = 55

44
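The exercise above can be verified step by step (a minimal sketch; names are illustrative):

```python
import math

# Group summaries from the exercise: (mean, n, sd)
groups = [(17, 17, 2.34), (23, 22, 3.14), (28, 19, 2.0)]  # CBT, Medication, Control
a = [0.5, 0.5, -1.0]  # contrast coefficients

means = [m for m, n, s in groups]
ns = [n for m, n, s in groups]
sds = [s for m, n, s in groups]

c = sum(ai * m for ai, m in zip(a, means))  # contrast value
df = sum(ns) - len(ns)                      # n - g
sp = math.sqrt(sum((n - 1) * s**2 for n, s in zip(ns, sds)) / df)  # pooled SD
se_c = sp * math.sqrt(sum(ai**2 / n for ai, n in zip(a, ns)))      # SE of contrast
t = c / se_c

print(round(c, 1), round(sp, 2), round(se_c, 3), round(t, 2), df)  # -8.0 2.58 0.724 -11.04 55
```

(The slide's -11.05 comes from dividing by the rounded SE of 0.724; with unrounded intermediates the t value is -11.04.)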
Multiple comparisons

Adjusting the significance level - Bonferroni method:


- If we have 4 pairs we want to compare and our desired significance level is
90%, we use 0.10/4 = 0.025 as an α for each comparison.

45
Multiple comparisons

Tukey correction - produces narrower intervals than the Bonferroni correction.

46
Exercise

You have 5 groups and you want to compare the groups with each other. How
many comparisons do you need to do? If you want the overall confidence level
to be 95% (overall α = 0.05), what is the level of α you need to use for each comparison?

10 comparisons
α = 0.05/10 = 0.005

47
Questions?

48
Overview of the sessions

Session 2:
■ Partial and Semi Partial Correlation
■ Regression with Code Variables
■ Multiple Comparisons and Contrasts
■ ANOVA
■ Bayesian Statistics
■ Good and Bad Statistics

49
ANOVA

ANOVA - analysis of variance


- Used to compare multiple groups
- A test of independence between the quantitative response
variable and the categorical explanatory variable that defines the
groups
- Analyses the variance between each sample mean and the
overall mean and the variability within each group. For the
groups to be different, the variability between groups should be
large and the variability within the group should be small

50
ANOVA

F-test = variability between groups/variability within groups

Total Group Residual

Sum of Squares TSS GSS RSS

Degrees of freedom n-1 g-1 n-g

Mean squares TSS/(n-1) = TMS GSS/(g-1) = GMS RSS/(n-g) = RMS

F-test = GMS/RMS

51
ANOVA

F-test = GMS/RMS
The null hypothesis is rejected if F exceeds the critical value for (g-1, n-g)
degrees of freedom (so F must be well above 1). That tells us that at least one
group is different from the other ones. But which one(s)?
To know that we need local tests:

52
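The local-test formula was an image on the slide; a common form (an assumption about what was shown) compares two group means using the pooled residual mean square RMS:

```latex
t = \frac{\bar{y}_i - \bar{y}_j}{\sqrt{RMS \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}}, \qquad df = n - g
```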
Two-way ANOVA

What if we have 2 independent variables instead of one?


E.g. we want to check the influence of gender and education on income.

We need to consider main and interaction effects.


- Main effect - effect of each variable individually (like in one-way ANOVA)
- Interaction effect - how the 2 variables affect each other and the
dependent variable

53
Two-way ANOVA - Main and interaction effects

54
Two-way ANOVA - Partitioning of variance

55
Effect sizes

How strong is the effect of the independent variable on the dependent


variable?
- Eta squared
- Partial eta squared
- Omega squared

56
Effect sizes

Eta squared
- In one-way ANOVA, it is the same as R^2
- The effects are additive
- It depends on the number and size of the remaining effects
- Does not estimate the proportion of variance accounted for in the
population: biased estimator (overestimates the actual variance)

57
Effect sizes

Partial eta squared


- In one-way ANOVA, it is the same as R^2
- Does not depend so much on the remaining effects
- Not additive
- Does not estimate the proportion of variance accounted for in the
population: biased estimator (overestimates the actual variance)

58
Effect sizes

Omega squared
- Unbiased estimator; does not overestimate the population effects
- Not additive
- Estimate can be negative

59
Exercise

Calculate:
a. Partial eta squared of gender
b. Eta squared of sports
c. Omega squared of gender x sports

60
Exercise

Calculate:
a. Partial eta squared of gender = 0.0018
b. Eta squared of sports = 0.100
c. Omega squared of gender x sports = -0.0116

61
Exercise

Which statement about the effects in this graph is true?


a. There is a main effect B and there is an interaction effect.
b. There is a main effect B but there is no interaction effect.
c. There is no main effect B but there is an interaction effect.
d. There is no main effect B and there is no interaction effect.

62
Exercise

Which statement about the effects in this graph is true?


a. There is a main effect B and there is an interaction effect.
b. There is a main effect B but there is no interaction effect.
c. There is no main effect B but there is an interaction effect.
d. There is no main effect B and there is no interaction effect.

63
Exercise

Fill in the two-way ANOVA table


                 SS     DF    MS   F
Factor A         128    3     d    h
Factor B         a      b     e    i
Interaction AB   744    c     f    j
Residual         1203   157   g
Total            2698   176

64
Exercise

Fill in the two-way ANOVA table


                 SS        DF       MS           F
Factor A         128       3        d = 42.67    h = 5.57
Factor B         a = 623   b = 4    e = 155.75   i = 20.33
Interaction AB   744       c = 12   f = 62       j = 8.09
Residual         1203      157      g = 7.662
Total            2698      176

65
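The missing entries follow from the additivity of the sums of squares and degrees of freedom; a minimal sketch of the calculation (the F values use the unrounded residual mean square, so they can differ slightly from answers computed with rounded intermediates):

```python
# Known entries of the two-way ANOVA table
ss_total, ss_a, ss_ab, ss_res = 2698, 128, 744, 1203
df_total, df_a, df_res = 176, 3, 157

# SS and df are additive across the rows of the table
ss_b = ss_total - ss_a - ss_ab - ss_res            # a = 623
# df_b + df_ab = df_total - df_a - df_res, and df_ab = df_a * df_b
df_b = (df_total - df_a - df_res) // (1 + df_a)    # b = 4
df_ab = df_a * df_b                                # c = 12

ms_a, ms_b, ms_ab = ss_a / df_a, ss_b / df_b, ss_ab / df_ab  # d, e, f
ms_res = ss_res / df_res                                     # g
f_a, f_b, f_ab = ms_a / ms_res, ms_b / ms_res, ms_ab / ms_res  # h, i, j

print(ss_b, df_b, df_ab)  # 623 4 12
print(round(ms_res, 3), round(f_a, 2), round(f_b, 2), round(f_ab, 2))  # 7.662 5.57 20.33 8.09
```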
Questions?

66
Overview of the sessions

Session 2:
■ Partial and Semi Partial Correlation
■ Regression with Code Variables
■ Multiple Comparisons and Contrasts
■ ANOVA
■ Bayesian Statistics
■ Good and Bad Statistics

67
Bayesian statistics

1. Prior knowledge about probability of some events - either that all values

have the same probability or that some values are more likely than

others

2. Collecting data

3. Re-allocating the probability based on the collected data

68
Bayesian statistics - is it going to rain today?

1. Prior knowledge

a. We are in the Netherlands

b. It usually rains this time of the year

c. It rained yesterday

69
Bayesian statistics - is it going to rain today?

2. Data collection

a. No clouds in the sky

b. Weather report says it’s not going to rain today

70
Bayesian statistics - is it going to rain today?

3. Updating beliefs

a. Probability of rain decreases

b. Probability of no rain increases

71
Conditional probability

Remember statistics 1b:


- P(B|A) - the probability that B happens given that A happened.

72
Exercise

In a school 78% of students have a smartphone, 34% of students have a


laptop and 18% of students have both. What is the probability that a student
owns a laptop given that they own a smartphone?
a. 44%
b. 23%
c. 53%
d. 34%

73
Exercise

In a school 78% of students have a smartphone, 34% of students have a


laptop and 18% of students have both. What is the probability that a student
owns a laptop given that they own a smartphone?
a. 44%
b. 23%
c. 53%
d. 34%

Answer: b. 23%

74
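A one-line check of the answer using the conditional-probability formula (a minimal sketch):

```python
# P(laptop | smartphone) = P(laptop and smartphone) / P(smartphone)
p_smartphone, p_both = 0.78, 0.18
p_laptop_given_smartphone = p_both / p_smartphone
print(round(p_laptop_given_smartphone, 2))  # 0.23, i.e. answer b
```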
Bayes theorem

Prior probability - probability distribution before collecting data


Posterior probability - probability distribution that incorporates collected data
- Computed by updating prior probability using Bayes theorem
- Posterior probability - the probability of event A occurring given that B
has already occurred (conditional probability)

75
Bayes theorem

p(θ) = prior
- Our belief about the probability of θ before looking at the data
p(data|θ) = likelihood
- The probability of observing the data given θ
p(data) = marginal likelihood
- Probability of observing the data across all possible values of θ

76
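The theorem itself was shown as an image; in the notation of this slide it reads:

```latex
p(\theta \mid \text{data}) = \frac{p(\text{data} \mid \theta)\; p(\theta)}{p(\text{data})}
```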
Bayesian statistics - distributions

77
Example

78
Questions?

79
Overview of the sessions

Session 2:
■ Partial and Semi Partial Correlation
■ Regression with Code Variables
■ Multiple Comparisons and Contrasts
■ ANOVA
■ Bayesian Statistics
■ Good and Bad Statistics

80
Questionable research practices

- Manipulation of data - modifying data, fabricating data, excluding data


that doesn’t align with the hypothesis
- Selective reporting of (dependent) variables
- Sequential sampling
- HARKing - Hypothesizing After the Results are Known

81
Questionable research practices

82
Questionable research practices - solutions

- Enhanced methodological training


- Open science
- Preregistration
- Replication
- Meta-analyses

83
Summary

■ Partial and Semi Partial Correlation
■ Regression with Code Variables
■ Multiple Comparisons and Contrasts
■ ANOVA
■ Bayesian Statistics
■ Good and Bad Statistics

84
Next session (14.01, 09:00-13:00)

1. Practice questions

85
Questions?

86
Good job everyone!