RiP Final Study

The document provides an overview of simple and multiple linear regression, including definitions of dependent and independent variables, the regression equation, and methods for determining the best fitting line. It also discusses the significance of relationships between variables, assumptions for regression analysis, and the importance of experimental research in establishing causality. Additionally, it outlines various experimental designs and potential threats to internal validity in experiments.

Simple Linear Regression

- Theoretical concept → conceptual definition → operational definition → variable

Prediction
- Can the PTSD scale scores be used to predict productivity at work?
From correlation to regression
- Correlation is used for:
o Measuring strength of a linear relationship
o Measuring direction of a linear relationship
- Regression is used for:
o Describing the linear relationship with an equation
o Making predictions using this equation
  ▪ For other groups
  ▪ For other situations
  ▪ When only data on the independent variable is available
Variables and predictions
- The variable being predicted
o Is denoted by Y
o Is called the dependent variable
- The variable used to make predictions from
o Is denoted by X
o Is called
 The independent variable
 The predictor
Regression line
- How do we determine the equation of the best fitting line?
- How do we determine which line fits best?
- We use a technique called Least Squares Regression
Residuals
- Residuals are the difference between the observed value of Y and
the predicted value of Y (= point on the line)
- When a line fits the data well, the residuals will tend to be small
- When a line does not fit the data well, the residuals will tend to be
large
Regression equation
- The regression equation is determined using these residuals
- The equation with the smallest sum of squared residuals is the
winner!
Residuals
- When there is little spread around the regression line, then
o The residuals tend to be small
o The predictions made with the regression equation will be very
accurate
- When there is more spread around the regression line, then
o The residuals tend to be larger
o The predictions made with the regression equation will be less
accurate
Accuracy
- A measure of accuracy of the predictions is called : Standard Error of
the Estimate, aka Root Mean Squared Error (RMSE)
o It is roughly, the average error we make when using the
regression equation to make predictions
Regression equation
- The regression equation
  o Is determined by mathematically determining the smallest sum of squared residuals (SSR)
  o Can be used to make predictions
  o Is written as ŷ = b₀ + b₁x
    ▪ b₀ is called the Y-intercept
    ▪ b₁ is called the slope
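The least-squares idea can be sketched in a few lines of Python. This is a minimal illustration with made-up data points, not part of the course material:

```python
# Minimal least-squares fit: b1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²), b0 = ȳ - b1·x̄.
# The data below are made up for illustration.
def fit_line(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope: co-variation of x and y relative to the variation in x
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
    b0 = mean_y - b1 * mean_x  # the fitted line passes through (x̄, ȳ)
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)  # b0 = 2.2, b1 = 0.6
```

Among all possible lines, ŷ = 2.2 + 0.6x has the smallest sum of squared residuals for these five points.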

Predictions

- Predicting a Y-value is simple: we use the linear function with the value of X in the right place

Residuals

- Residuals can be calculated for the dependent variable as the difference between the observed value (y) and the predicted value (ŷ)
- RMSE = the average error we make when using the regression equation to make predictions

- R² = measure of how well the line represents the data
  o Aka percentage variance explained
  o Aka coefficient of determination
  o Formally means: how much of the variation in Y can be explained by the linear relationship with X
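Both accuracy measures can be computed directly from the residuals. A sketch with made-up data (note that statistical software often divides the squared residuals by n − 2 rather than n when computing the Standard Error of the Estimate):

```python
import math

# RMSE and R² from residuals; the data and coefficients are made up for illustration.
def rmse_and_r2(x, y, b0, b1):
    predictions = [b0 + b1 * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predictions)]
    ss_res = sum(r ** 2 for r in residuals)        # unexplained variation
    rmse = math.sqrt(ss_res / len(y))              # average prediction error
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)   # total variation in Y
    return rmse, 1 - ss_res / ss_tot

rmse, r2 = rmse_and_r2([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
# r2 = 0.6: 60% of the variation in Y is explained by the linear relationship with X
```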

Significance

- To test if the linear relationship is a significant relationship, there are 2 tests we can conduct:
  o Option 1: test for the slope
  o Option 2: test for the explained variance
- Fact 1: A horizontal line has a slope of 0. The equation of a horizontal line looks like Y = 5, for instance
- Result 1: to test if there is a significant relationship between X and Y, we can test to see if the slope differs from 0
  o Test for the slope
    ▪ This test determines if the slope is significantly different from 0
    ▪ H0: β₁ = 0
    ▪ HA: β₁ ≠ 0
    ▪ Can be tested using a basic test called the t-test
    ▪ Results can be found in output

- Fact 2: A horizontal regression line means that there is no linear relationship between the two variables
- Result 2: to test if the model explains a significant portion of the variation, we can test to see if the proportion of the variation that is explained by the model is greater than 0
  o Test for explained variance
    ▪ This test determines if the proportion of explained variation is significantly larger than 0
    ▪ H0: ρ² = 0
    ▪ HA: ρ² > 0
    ▪ Can be tested using a test called the F-test
    ▪ Results can be found in output
- Fact 3: when no predictor variable is used, none of the variation can be explained
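The test for the slope can also be computed by hand: t = b₁ / SE(b₁), where SE(b₁) uses the residual standard deviation with n − 2 degrees of freedom. A sketch with made-up data and its least-squares fit:

```python
import math

# t-statistic for H0: beta1 = 0; the data and fitted coefficients are made up.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least-squares coefficients for these points

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(r ** 2 for r in residuals) / (len(x) - 2))   # residual SD, df = n - 2
mean_x = sum(x) / len(x)
se_b1 = s / math.sqrt(sum((xi - mean_x) ** 2 for xi in x))     # standard error of the slope
t = b1 / se_b1  # ≈ 2.12; software compares this to a t-distribution with n - 2 df
```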

Multiple Linear Regression


Coefficients
- So far: unstandardised regression coefficients, b₀ and b₁
- These are both expressed in the same units as the dependent variable
- Example equation: ŷ = 33.48 − 12.00x

Standardised coefficients: in Simple Linear Regression this is also called the coefficient beta. It is the expected result of a regression analysis where the underlying data have been standardised so that the variances of the dependent and independent variables are equal to 1. It is equal to the correlation coefficient r, and it measures the change in Y in standard deviations when X increases by 1 standard deviation.
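The claim that beta equals r in simple regression can be checked numerically. A sketch with made-up data (the slope 0.6 is assumed to come from a prior least-squares fit):

```python
from statistics import stdev, mean

# Standardised coefficient: beta = b1 * (SD of X / SD of Y); the data are made up.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1 = 0.6  # unstandardised slope from a least-squares fit of these points

beta = b1 * stdev(x) / stdev(y)

# In simple regression, beta equals the correlation coefficient r:
n = len(x)
r = sum((xi - mean(x)) * (yi - mean(y)) for xi, yi in zip(x, y)) \
    / ((n - 1) * stdev(x) * stdev(y))
# beta == r ≈ 0.775
```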

Why use this measure?

- It is useful in 2 scenarios:
  o When the units of measurement of X and Y differ dramatically
  o When we use more than 1 independent variable to predict Y and we wish to compare their importance
- The standardised regression coefficient is usually what is reported in journal articles

Multiple Linear Regression

- In a regression model, we can add multiple variables
- E.g.
  o Job satisfaction
  o Percentage of part-time jobs
- We write ŷ = b₀ + b₁x₁ + b₂x₂ + ⋯ + b_k x_k
- The regression equation is determined in the same way (smallest sum of squared residuals)
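Making a prediction from a multiple regression equation is just plugging the x-values into the linear function. A sketch with made-up coefficients:

```python
# Prediction with ŷ = b0 + b1*x1 + ... + bk*xk; all coefficients are made up.
def predict(b0, coefs, xs):
    return b0 + sum(b * x for b, x in zip(coefs, xs))

# e.g. b0 = 2.0, b1 = 0.5 (job satisfaction), b2 = -1.2 (% part-time jobs)
y_hat = predict(2.0, [0.5, -1.2], [10, 3])  # 2.0 + 5.0 - 3.6 ≈ 3.4
```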

A few things to know

- Adding more independent variables (predictors) to a model will always
  o Explain more of the variation of the dependent variable
  o Reduce the average prediction error
  o With more predictors, R² will always increase
  o With more predictors, the Standard Error will always decrease → accuracy of prediction increases
- Adding more predictors to a model may not always
  o Lead to a better model
  o Mean that these predictors are significant
  o The increase in R² and the decrease in the SE may not always be a significant change
  o The significance of individual predictors can be tested using the t-test in the output

Coefficients

- Standardised regression coefficients put variables on the same scale (SDs)
- The largest standardised coefficient indicates the predictor with the most impact on the predictions of Y
- When we talk about most impact, we mean most impact on changes in the Y-variable

Significance

- Significance testing in MLR works a bit differently:
  o Test for explained variance
    ▪ Same test as in SLR
    ▪ Tests the significance of the entire model (with all predictors)
  o Test for slope
    ▪ Tests the significance of a single predictor
    ▪ One at a time!!!
    ▪ Can be used to improve the model
    ▪ "As accurate though as simple as possible"

Test for explained variance

- This test determines if the proportion of explained variance is significantly larger than 0
  o H0: ρ² = 0
  o HA: ρ² > 0
- Can be tested using the same F-test as in SLR
- Results can be found in the output
  o Note: this is a test for significance of the entire model (with all predictors)
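The F-statistic for explained variance can be written directly in terms of R². A sketch with made-up numbers (k = number of predictors, n = sample size):

```python
# F = (R² / k) / ((1 - R²) / (n - k - 1)); the numbers below are made up.
def f_statistic(r2, n, k):
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f = f_statistic(0.6, 5, 1)  # = 4.5; with one predictor, F equals t² for the slope
```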

Test for a single predictor

- This test determines if the regression coefficient for a single predictor is significantly different from 0
- For example:
  o H0: βPTSD = 0
  o HA: βPTSD ≠ 0
- Can be tested using the t-test
- Results can be found in the JASP output
  o Note: if a predictor is not significant in the model, a simpler model could be used without that predictor! But only one at a time! Never remove multiple variables at once based on the t-test

Assumptions

- The assumptions for MLR are the same as for SLR
- We check them using graphs
- Example: use a histogram of the residuals to check the assumption of normality

Assumption: homoscedasticity

- A sequence of random variables is homoscedastic if all its random variables have the same finite variance; this is also known as homogeneity of variance. The complementary notion is called heteroscedasticity

Assumption: Linearity

- With so many independent variables, the condition of a linear relationship is difficult to check with a scatter plot
- We can check this with a residual plot too
- We check whether the residuals form a horizontal band

Assumption: no outliers

APA reporting – Regression

- Needed:
  o n
  o F and p
  o R²
  o Standardised coefficients and p-values
- Optional
  o SE
- Create a table with values of Beta and p for each variable in the rows
- Report n, F, p and R² in a note underneath the table
- Explain any abbreviations you may have used in the note as well

Intro into experimental research


Why causality?
- Why are researchers interested in causality?
  o To understand how (social) reality works
    ▪ For example:
      ▪ Effect of motivation on academic achievement
      ▪ Effect of SES on access to health care
  o To intervene in that reality
    ▪ For example:
      ▪ Effect of inquiry-based learning on university students' study motivation
      ▪ Effectiveness of an intervention directed at improving health-related behaviour among people living in a poor city district
Conditions for causality
1. Covariance
  a. There should be a relationship between cause and outcome
2. Temporal precedence
  a. The cause should precede the outcome in time
3. Internal validity
  a. Alternative explanations for the relationship should be ruled out

Causality
- The best way to meet all the conditions is by means of a randomised experiment
Randomised experiment: a research design where, by randomisation, groups can be assumed to be similar; one variable is manipulated (varied) by the researcher, and the researcher measures the effect of this manipulation on another variable (the outcome).

Covariance
- When do we speak of a relationship between note taking mode and
exam score?
o When we see a difference in exam scores between students
who use the two different note taking techniques

Temporal precedence
- An experiment allows the researcher to ensure that the cause
precedes the outcome
- By applying the manipulation before measuring the dependent
variable

Internal validity
- Alternative explanations for the relationship should be ruled out
- Is there a manipulated variable that explains the group difference, or is there an alternative explanation?
- Important role for:
  o Design of the experiment
  o Treatment of the participants
  o Etc.

Research question
- An experimental research question can be identified by the following elements:
- PICO:
  o Population → the group of people the researcher wishes to investigate
  o Intervention → the experimental condition
  o Comparison → the control group
  o Outcome → the dependent variable

Research design
- The researcher chooses among whom and how the data will be
collected
- The researcher starts with a sample of participants
o Preferably a random sample
- Randomised experiment:
o Experimental group
o Control group

Random assignment
- Key to the true experiment
- A random procedure determines group assignment
  o Treatment/experimental group
  o Control group/placebo group
- Observed and unobserved factors are equally likely in both groups
- Transparent, reproducible
- Allows causal claims
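A random assignment procedure is transparent and reproducible precisely because it can be written down as a short script. A sketch (hypothetical participant IDs; the fixed seed only makes the illustration reproducible):

```python
import random

# Randomly split participants into two equal groups, independent of any participant trait.
def assign_groups(participants, seed=2024):
    pool = list(participants)
    random.Random(seed).shuffle(pool)  # random order
    half = len(pool) // 2
    return pool[:half], pool[half:]    # treatment group, control group

treatment, control = assign_groups(range(8))
```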

Experimental designs
- Random assignment plays an integral role in experiments
- Experiments can be designed in different ways

Between vs. within

- When participants are divided into different groups and each group receives a different treatment, we call this a between-subject design
Between-subject design: when participants are divided into different groups and each group receives a different treatment.
- Aka independent-groups design
- The data is compared between the groups
- When all participants receive all different treatments (one after the other, possibly randomised in order), we call this a within-subject design
Within-subject design: when all participants receive all different treatments (one after the other, possibly randomised in order)
- Aka within-groups design
- We first compare the data within each person

Design 1: Posttest-Only Design
- First, subjects are randomly assigned to the experimental group and the control group
- After the treatment, the outcomes of the two groups are compared

Design 2: Pretest-Posttest-Design
- Aka the classical experiment
- A pre-test is added before the treatment

Design 3: Solomon Four-Group Design


- Pretest with and without treatment
- No pretest with and without treatment

Design 4: Repeated-Measures Design

- No random assignment
- Just pretest → treatment → posttest
- Or treatment → test → treatment → test

Design 5: Counterbalanced Measures Design


- With random assignment

Design 6: Quasi-experiments

Design 7: Interrupted Time Series Design


Lab experiments vs. Field experiments
- Field experiment:
  o An experiment with a close simulation of the conditions under which the process under study occurs, or in a natural setting
    ▪ Less control for the researcher than in a lab
    ▪ Challenging/expensive to implement

Threats to internal validity in experimental designs

1. Learning effects
2. Design confounds
3. Selection effects
4. Contamination
5. Maturation
6. History
7. Regression to the mean
8. Attrition
9. Instrumentation

1. Learning effects
- Repeated-measures design
Learning effects: aka order effects, practice effects, testing effects

2. Design confounds
- A confounding variable is a second variable that happens to vary systematically along with the intended independent variable
Confounding variable: a second variable that happens to vary systematically along with the intended independent variable
- This variable is therefore an alternative explanation for the results
- Note: this does not apply to variables that vary randomly between the groups/participants

3. Selection effects
- Were the groups comparable at the start of the experiment?
  o With respect to the dependent variable?
  o With respect to other variables (observed and unobserved)?
- If, for some reason, the groups turn out to be not comparable at the start of the experiment, we speak of a selection effect.
Selection effects: the groups turn out not to be comparable at the start of the experiment.
- Random assignment reduces selection effects to a minimum.
  o The goal: making sure that the mean and variance in scores, on all variables, measured and unmeasured, are similar for both groups at the onset of the study
  o Issues:
    ▪ Sometimes impossible
      ▪ Non-ethical
      ▪ Infeasible
    ▪ Sometimes possible, but things go wrong

4. Contamination
- Participants in the experimental group communicate with
participants in the control group
- Participants do not adhere to the treatment
- Influence from researcher(s)

8. Attrition
- When participants drop out during an experiment or study, this can affect the results; this is called attrition
Attrition: when participants drop out during an experiment or study
- This is especially a problem if the people who drop out are systematically different from the people who continue to participate

Independent t-test
Inferential statistics: experiments
- When researchers conduct experiments, they wish to test if there is a difference
  o Between groups or
  o Between times of measurement
- The process they follow is similar to the process for correlational research
  o 1. Follow the theory-data cycle
  o 2. Many researchers choose to follow the steps of NHST in the phase of data analysis

Inferential statistics: NHST steps

1. Formulate a hypothesis
2. Choose a test statistic and compute its value
3. Calculate the probability of the result or more extreme, given H0
4. Make a decision about H0 (reject or not)
5. (extra) State the conclusion

Example
- Randomised experiment
  o Group 1:
    ▪ Control group, n1 = 40
    ▪ Study a list of words in regular type font in normal size (e.g., Verdana font size 11)
  o Group 2:
    ▪ Experimental group, n2 = 40
    ▪ Study the same list of words in the same font but large size (e.g., size 16)
  o Response: Score on a recall test
  o Expectation: People who study words in large font score higher

1. Formulating hypotheses

- Research hypothesis:
  On average, people who study words in large font score higher on the recall test than people who study in regular font size
- Null hypothesis:
  On average, people who study words in large font and people who study words in regular font score the same on the recall test
- Statistical hypothesis:
  o H0: μL = μR, HA: μL > μR

2. Choose & compute test statistic

- Based on the hypothesis, a researcher selects the most appropriate test procedure
- When comparing the mean scores of two independent groups, researchers use the independent samples t-test
- When comparing the mean scores of repeated measures, researchers use a test called the paired t-test
- Things to consider
- Units of measurement
  o A difference of 2 cm is very different from 2 km
- Spread in measurement
  o If all measurements fall between 0 and 100 cm, a difference of 2 cm can be considered small
  o If all measurements fall between 18 and 22 cm, a difference of 2 cm may be considered large
- With the t-test we consider the relative difference between the groups, using:
  o The mean difference: M1 − M2
  o The spread in scores in both groups: SD1 and SD2
  o The group sizes: n1 and n2

Test statistic t

- Just like a sample correlation r will vary across multiple samples, the value of M1 − M2 also varies from sample to sample
- A standard error can be estimated for a mean difference too
- This standard error contains the group sizes (n1 and n2) and the spread in scores in both groups (SD1 and SD2)
- When we divide M1 − M2 by its standard error, we obtain a standardised score
- The units no longer play a role, since M1, M2, and the SE are all measured in the same units
- We call this the test statistic t → values of t are always on the same scale
- Standardised difference (= t)
- The idea behind the t-statistic
- When a lot of samples are drawn from a population in which H0 is true (no mean difference between the groups in the population):
  o The difference between the sample means will often be near zero
  o So, t will often be near zero, too
  o Values of t that are far from zero will be found less often
- In NHST we wish to decide if we:
  o Reject H0
  o Do not reject H0
- We use the p-value to make this decision:
  o Reject H0 if p < alpha

3. Compute probability of results given H0

- With the t-distribution, researchers can compute the p-value
- Remember:
  o Computed under the assumption that H0 is true (there is no difference)
  o P-value
  o = the probability of observing the value of the observed test statistic, or a value further away from zero
  o = the area under the curve in the tail of the distribution
  o Calculated by software
- Interpretation of the p-value
- Situation 1:
  o If the null hypothesis, that there is no difference, were true, then the chance of finding a difference of 2, or an even larger difference, equals .117
  o This means that in 11.7% of the cases that we conduct this experiment with the same number of people, when there really is no difference between using regular size font and large font, we would observe a difference of 2 or even larger

4. Decision about H0

- Depending on the p-value, the researcher decides to reject or not reject the null hypothesis

5. (extra) State the conclusion

- Decision:
  o Do not reject H0
- Conclusion:
  o The size of the font of the list of words has no significant effect on the recall scores
- Decision:
  o Reject H0
- Conclusion:
  o The size of the font of the list of words has a significant effect on the recall scores. On average, people who study in a large font score higher on the recall test than people who study words in regular size font

A closer look at t

Test statistic t

- What influences the t-statistic?
- Let's take a closer look at the formula:
  o Numerator: difference in means (M1 − M2)
  o Denominator: standard error (SE)
    ▪ How do we obtain the SE?
    ▪ Formula: SE = SDpooled × √(1/n1 + 1/n2)
    ▪ SDpooled is a weighted average of:
      ▪ The SD in sample 1, and
      ▪ The SD in sample 2
    ▪ The SE is dependent on:
      ▪ Group sizes (n1 and n2)
      ▪ Variation in scores in both groups (SD1 and SD2)
    ▪ A large difference in means → larger t
    ▪ More variation in scores → larger SE → smaller t
    ▪ Larger samples → smaller SE → larger t
  o So what influences the t-statistic?
    ▪ Difference in means (M1 − M2)
    ▪ Variation in scores (SD1 and SD2)
    ▪ Sample sizes per group (n1 and n2)
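The pieces above fit together in a few lines. A sketch of the independent-samples t-statistic with made-up recall scores:

```python
import math
from statistics import mean, stdev

# t = (M1 - M2) / SE, with SE = SDpooled * sqrt(1/n1 + 1/n2); the scores are made up.
def t_independent(group1, group2):
    n1, n2 = len(group1), len(group2)
    # pooled SD: weighted average of the two sample variances
    sd_pooled = math.sqrt(((n1 - 1) * stdev(group1) ** 2 +
                           (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2))
    se = sd_pooled * math.sqrt(1 / n1 + 1 / n2)
    return (mean(group1) - mean(group2)) / se

large_font = [13, 14, 12, 15, 13]
regular_font = [10, 12, 9, 11, 13]
t = t_independent(large_font, regular_font)  # ≈ 2.75
```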

Power
Choices
- When conducting a hypothesis test, researchers must always make a choice:
  o Reject H0
  o Do not reject H0
Making choices = sometimes making mistakes
- When the null hypothesis is true, and researchers choose to reject the null hypothesis, they make a type I error
- When the null hypothesis is not true, and the researcher does not reject the null hypothesis, they make a type II error

Type I error
- Researchers consider making a type I error the worse of the two mistakes
- In NHST the null hypothesis is protected by making the chance of making a type I error small
Choice of alpha
- Choice of alpha depends on
  o Research situation
  o Severity of consequences
- Imagine two researchers evaluate the effectiveness of a treatment for depression:
  o Mindfulness training: relatively cheap training, no adverse side effects
  o Lithium: relatively expensive drug, serious risk of side effects

Type II error
- Chance of a type II error = beta
- The value of beta is indirectly related to the value of alpha
  o If alpha is high, then beta is low
  o If alpha is low, then beta is high
  o NB: not by the same amount
The inverse of a type II error
- Researchers are interested in the inverse of a type II error:
  o Type II error: the researcher concludes – based on the sample evidence – that there is no difference between two groups, when – in reality – there is a difference in the population
  o Inverse: the researcher concludes – based on the sample evidence – that there is a difference between two groups, when – in reality – there is a difference in the population
Chances
- The chance of a type II error is denoted by beta
- The chance of the inverse of the type II error, the chance of finding the difference that actually exists, is then 1 − beta
- This chance is also referred to as the power of the test
Power = the chance of correctly rejecting H0

More about power

- Power is influenced by several factors:
  o In their research, researchers want the power to be high
- At the start of their research, researchers must choose a sample size
  o Large sample sizes provide more information, but
    ▪ Are expensive
    ▪ Are sometimes practically unfeasible
  o Large sample sizes provide more information, so
    ▪ Descriptive statistics are more accurate
    ▪ The standard error is smaller
    ▪ The ability to differentiate between groups increases (= power)
    ▪ The larger the sample size, the higher the power
- The power measures the chance that an existing difference in the population will be found by the sample data and the statistical test
  o Which difference in the population is easier to find in the sample? A large difference
    ▪ The larger the difference between the groups, the higher the power
- At the start of their research, researchers choose a value for alpha
- Many researchers in the social sciences choose 0.05
- When alpha increases, beta decreases
- When alpha increases, power increases
  o The larger the alpha, the higher the power
- Choosing a higher level of significance does come at a cost:
  o Higher alpha → greater chance of making a type I error
  o Researchers need to find a balance between a small value of alpha and high power
- When an expectation of direction is formed from the literature, researchers can conduct a one-sided test
- Disadvantage of directional (one-sided) testing:
  o If the difference turns out to be in the opposite direction, you can't reject H0 even though it may seem that p < alpha
- Should we then always conduct nondirectional (two-sided) tests?
  o Also has disadvantages

One-sided test

- Example:

Directional vs. nondirectional testing
- Disadvantage of directional (one-sided) testing:
  o If the difference turns out to be in the opposite direction, you can't reject H0, even though it may seem that p < alpha
- Disadvantage of nondirectional (two-sided) testing:
  o Less power
  o Not theory driven

- On experimental research:
  o Participants are randomly assigned to the groups
  o The independent variable is manipulated by the researcher

Comparing groups
- The t-test can be used to compare two groups
- Three scenarios:
  o 1. Two groups of a randomised experiment
  o 2. Two existing groups, where an independent variable is manipulated
    ▪ A kind of experiment without randomisation but with manipulation is called a quasi-experiment
  o 3. Two existing groups, where nothing is manipulated
    ▪ Comparison of two groups without randomisation and without manipulation is called a non-experiment
    ▪ This is no longer experimental research but correlational research

Scientific integrity
- European Code of Conduct
  o Four principles which are the basis of integrity in research
    ▪ Reliability
    ▪ Honesty
    ▪ Respect
    ▪ Accountability
Idea/Theory
- Theory:
  o Degradation of the (cleanliness of the) streets leads to stereotyping and discrimination
- Research question:
  o Do people exhibit more discriminatory behaviour at dirty stations than at clean stations?
Experiment

1. Utrecht CS during strike of cleaning personnel (= dirty station)
- Questionnaire on ethnicity/sexual preference and personality traits
- In fact: fill out a questionnaire on a bench
  o How far from a person with a different ethnic background does the participant sit down?

2. Utrecht CS after the strike (= clean station)
- Same questionnaire
- In fact: fill out a questionnaire on a bench
  o How far from a person with a different ethnic background does the participant sit down?
- People who filled out the questionnaire at the dirty station showed more stereotyping
- At the dirty station, participants were more likely to sit further away from the person on the bench if they had a different ethnic background

Major violations of scientific integrity

- Fabrication:
  o Making up data
  o Deliberate violation
- Plagiarism:
  o Copying other people's work
  o Deliberate violation
- = examples of honesty violations
- Data falsification:
  o Data falsification is deliberately:
    ▪ Not reporting certain findings
      ▪ File-drawer problem = non-significant results do not make it into an article, because scientific journals would like to publish interesting/innovative results that attract more readers
      ▪ And researchers need to publish enough to make a living
      ▪ This creates QRPs = Questionable Research Practices
    ▪ Confirmation bias = results that do not correspond to expectations are (un)deliberately ignored by the researcher
    ▪ Publication bias = absence of non-significant results leads to bias towards large effects
    ▪ Adjusting the data
      ▪ Removal of outliers
        ▪ Allowed if a mistake was made
    ▪ Misinterpreting the data
    ▪ HARKing
- Intentional errors = honesty violation
- Unintentional errors = reliability violation

QRP

- Conscious behaviour of researchers:
  o Removing outliers to make a difference significant
  o Adding a few more participants to make results significant
  o Running a different analysis than planned
  o P-hacking
- Searching for significant relationships in a dataset that contains a large set of variables
  o Researchers usually find at least something significant
  o Searching for connections is allowed, as long as it is presented as exploratory research
  o Hypothesising After the Results are Known
    ▪ In hindsight, formulating hypotheses and pretending that they were the main focus of the research all along
  o = HARKing

Solutions
- Retraction
  o Form of self-correction afterwards
  o Has drawbacks:
    ▪ Reputational damage for the researcher
    ▪ Reputational damage for science in general
    ▪ Often a long time between publication and retraction
- Post Publication Peer Review (PPPR)
  o Online discussion platform about publications
    ▪ Authors
    ▪ Editors
    ▪ Peers
- On honesty and accountability
- Pre-registration
  o Mandatory submission of the research protocol before execution of the actual research
    ▪ Hypotheses
    ▪ Methodology
    ▪ Expectations
  o Publication independent of outcome
- Replication
  o As a regular part of the research cycle

Statistical validity
Construct validity
- How well were the variables manipulated/measured?
- What was the manipulated variable?
o The independent variable
o The dependent variable:
 Score on 10 questions about the facts
 Score on 10 questions about relationships between facts

Internal validity: Priority!

- Design confounds
  o Was the manipulated variable the only difference in the treatment of the two groups?
  o During the revising, the revision group was actively working on their notes; did that make a difference?
  o For this reason, the control group was allowed to recopy their notes once (without making any improvements)
  o In this way, it was ensured that the revision was really the only difference between the two groups
- Selection effect
  o Were the two groups comparable at the start of the experiment?

External validity
- A non-random sample leads to lower external validity
- In experimental research, this is not always problematic

Statistical validity
- p < alpha, so H0 was rejected
  o The researchers conclude that there is an effect of interim revision of notes on learning achievement
  o The difference between group 1 and group 2 is significant, but:
    ▪ With a large sample, a small difference can already be significant
    ▪ A significant effect is not the same as a large effect
    ▪ Important question to ask: how big is that difference/effect?

Effect size
- Difference between the two groups: M1 − M2
- Intervention: revision
- Comparison: recopy
- Cohen's d
  o Measure of relevance
  o AKA standardised mean difference
  o Expresses the difference between the two means in the number of standard deviations: d = (M1 − M2) / SDpooled
  o Guidelines for interpretation: roughly, d ≈ 0.2 is small, d ≈ 0.5 is medium, d ≈ 0.8 is large
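Cohen's d uses the same pooled SD as the independent t-test. A sketch with made-up scores:

```python
import math
from statistics import mean, stdev

# d = (M1 - M2) / SDpooled; the scores below are made up for illustration.
def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    sd_pooled = math.sqrt(((n1 - 1) * stdev(group1) ** 2 +
                           (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / sd_pooled

d = cohens_d([13, 14, 12, 15, 13], [10, 12, 9, 11, 13])  # ≈ 1.74, a large effect
```

Unlike t, d does not grow with the sample size, which is why it measures relevance rather than significance.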

Confidence interval
- Another way to describe the size of
the difference between the two
groups is with a confidence
intervan (CI)
- What is a confidence interval?
o How can we use it?
- Recall:
o Every sample mean differs
from the population mean
o In the same way: the
difference between two
samples means differs from the difference in poupaltion
means
- Is this point estimate informative?
o Point estimate gives false certainty
o Better option is interval of probable values based on sample
data
- An interval of probable values can be:
o We expect thre true mean age of students at UCU to be
between 19.5 and 20.5: [19.5,20.5]
o The correlation between self esteem and extraversion is
estimated to fall between .10 and .25: [.10, .25]
o The difference between the mean scores using two different
teaching techniques is estimated to fall between 1.1 and 3.4
points: [1.1,3.4]
- The interval used in NHST is called the Confidence interval
o Width of interval says something about the accuracy of the
estimation
o Researchers would like to see a narrow interval
o Width of the interval depends on:
 Sample size
 Spread/variation in scores in population
 Chosen confidence level
Width of the confidence interval
- Width of the interval depends on:
o Sample size:
 Larger sample gives more information and therefore
more certainty
 Larger sample gives a smaller standard error  narrower
interval
o Spread in scores in population
 Greater spread in scores in population gives greater
spread in scores in sample, so more uncertainty  wider
interval
- Researcher often chooses level of confidence that matches the level
of significance
- Widely used significance level is alpha = .05, thus confidence level
of 95%
- A single confidence interval gives us an interval of plausible values
for the value in the population and we have confidence in the
process that is used
- With a single confidence interval, we don’t know if it is one of the 95% that capture the population value or one of the 5% that miss it
o Chosen confidence level
 Higher confidence level gives more certainty, but wider
interval
 With a 99%CI, the interval is more likely to fall around
the population value
 With a 90%CI, we have less certainty
 Higher confidence level  wider interval
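A minimal sketch (simulated, hypothetical data) showing how the interval width reacts to sample size and confidence level, using the usual t-based interval M ± t* · SE:

```python
import numpy as np
from scipy import stats

def mean_ci(data, confidence=0.95):
    """t-based confidence interval for a mean: M +/- t_crit * (s / sqrt(n))."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    se = data.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    return data.mean() - t_crit * se, data.mean() + t_crit * se

rng = np.random.default_rng(1)
small = rng.normal(loc=20, scale=2, size=25)   # hypothetical ages, n = 25
large = rng.normal(loc=20, scale=2, size=400)  # same population, n = 400

def width(ci):
    return ci[1] - ci[0]

# Larger sample -> narrower interval; higher confidence level -> wider interval
```

The two comparisons below mirror the bullet points: more data shrinks the interval, more confidence widens it.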

Confidence interval vs. hypothesis testing


- Under the null hypothesis the difference equals zero
- Interpretation of 95%CI:
o In the population we expect the difference to be somewhere
between 0.4 and 25.4 (example numbers)

Summary of statistical validity


- Four parts to evaluate the statistical validity:
o Significance is determined based on the test statistic t and the p-value
o Relevance is assessed using a measure of effect size, such as Cohen’s d
o Accuracy is assessed using a confidence interval
o Suitability of the statistical test is assessed by checking the
assumptions

Notes on relevance
- Relevance is assessed using a measure of effect size
o With a t-test, we use Cohen’s d
o With a regression analysis, the effect size is measured using R² (R-squared)
o With a Chi-squared test, we use a measure called Cramer’s V

Assumptions and alternatives


Three claims, Four validities
- Three claims:
o Frequency claim
o Association claim
o Causal claim
- Four validities
o Construct validity
o Internal validity
o External validity
o Statistical validity

Assessing statistical validity


- Significance is determined by the p-value
- Relevance is assessed using an effect size
- Accuracy is assessed using a confidence interval
- The suitability of the statistical test is assessed by checking the
assumptions and execution of the test
- To check suitability of a statistical test:
o 1. Check assumptions
o 2. Check if hypotheses match expectations
o 3. Check if results match hypotheses
Assumptions of the t-test
- Before researchers can use the t-test, some assumptions must be
met:
o 1. The sample is a random sample
o 2. Dependent variable is of interval or ratio measurement
level
o 3. The two groups are independent
o 4. Scores in both groups are normally distributed
o 5. Scores in both groups have equal spread
o Violating assumptions leads to lower statistical validity

Assumption 1: The sample is a random sample


- How do we check this assumption?
o Read methods section of an article
o How did researchers select the participants?
o Sampling method affects external validity
o Not always important in experiments
 Internal validity is main focus
- What if the sample is not a random sample?
o Caution should be used when interpreting statistical results
o Random sample ensures independence of observations
o (the math is based on this assumption)
Assumption 2: measurement level of Dependent Variable
- How can we check this assumption?
o Read methods section of an article
o How are the constructs (and especially the DV) operationally
defined?
o Example:
 Aggression measured on a 1-10 scale with 1 meaning
not aggressive and 10 meaning extremely aggressive
 Aggression measured as:
 Not aggressive
 Mildly aggressive
 Very aggressive
- What if the DV is not of interval or ratio measurement level?
o Other examples:
 Answer to yes/no question
 Do you enjoy watching TV at night?
 Mode of transportation to go to work
o Solution:
 Use statistical test for categorical variables
  the Chi-squared test of homogeneity
Chi-squared test of homogeneity
- Two independent samples
- DV is categorical
- Use to determine if the distribution of a categorical variable is the
same in two groups
o Can be used with more than 2 groups
- RQ example: Is the distribution of answers of people with treatment
the same as the distribution of answers of people without?

- Null hypothesis t-test:
o 𝐻0: 𝜇1 = 𝜇2
o With the assumption of normality and the assumption of equal variance, this is equivalent to:
 𝐻0: distribution 1 = distribution 2
o This is the general form of the null hypothesis
o It can be used for variables of all measurement levels
- Ho: the distribution of answers in the control condition is equal to the distribution of answers in the treatment condition
- Ha: the distribution of answers in the control condition is different from the distribution of answers in the treatment condition
- NOTE: the execution of this chi-squared test is identical to that of the Chi-squared test of independence
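A sketch with hypothetical counts; because the computation is shared with the test of independence, scipy's `chi2_contingency` can be used:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical answer counts (yes / no / maybe) in each condition
observed = np.array([[30, 15, 5],    # control condition
                     [18, 22, 10]])  # treatment condition

# H0: the distribution of answers is the same in both conditions
chi2, p, df, expected = chi2_contingency(observed)
# df = (rows - 1) * (columns - 1) = 1 * 2 = 2
```

A small p-value would suggest the answer distributions differ between the two conditions.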

Back to the assumptions


- Before researchers can use the t-test, some assumptions must be
met:
o 1. The sample is a random sample
o 2. Dependent variable is of interval or ratio measurement
level
o 3. The two groups are independent
o 4. Scores in both groups are normally distributed
o 5. Scores in both groups have equal spread
o Assumption 3. Is also needed for the Chi-squared test

Effect size
- Measure of effect size for Chi-squared test is Cramer’s V
- Value between 0 and 1
- Measures the strength of dependency between the two nominal
variables
- “kind of” similar to a correlation
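A sketch of how Cramer's V can be derived from the chi-squared statistic (hypothetical counts):

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1))), in [0, 1]."""
    table = np.asarray(table)
    chi2 = chi2_contingency(table)[0]
    n = table.sum()            # total number of observations
    k = min(table.shape) - 1   # smaller table dimension minus one
    return float(np.sqrt(chi2 / (n * k)))

# Hypothetical 2x3 table of answer counts per condition
v = cramers_v([[30, 15, 5], [18, 22, 10]])
```

Values near 0 indicate little dependency between the two nominal variables; values near 1 indicate strong dependency.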

Reporting in APA style


- Test statistic and p (inform of significance)
- Cramer’s V (inform of effect size)
- Confidence interval (inform of accuracy)  only for the t-test; for the Chi-squared test no CI is needed
- Chi-squared test results in APA style:
o 1. Test statistic: χ²(df, N = n)
o 2. P-value
o 3. Effect size (V)
Assessing statistical validity
- Significance is determined by the p-value
- Relevance is assessed using effect size
- Accuracy is assessed using a confidence interval
- The suitability of the statistical test is assessed by checking the
assumptions and execution of the test

Checking suitability of a statistical test


- 1. Check assumptions
- 2. Check if hypotheses match expectations
- 3. Check if results match hypotheses

Assumptions of the independent samples t-test


- Before researchers use the t-test, some assumptions must be met:
o 1. The sample is a random sample
o 2. Dependent variable is of interval or ratio measurement
level
o 3. The two groups are independent
o 4. Scores in both groups are normally distributed
o 5. Scores in both groups have equal spread

Assumption 3: two independent samples


- How do we check this assumption?
o Read methods section of an article
o Are the participants randomly assigned to two separate
groups?
o Is there a link between the measurements in the two groups?
 Independent samples
 Repeated measurement
 Paired measurement
- What if the two groups are not independent?
o Examples:
 Scores on pre-test and post-test
 Grades at the beginning of the year and at the end of
the year
 Height of the youngest and oldest twin
o Solution:
 Conduct a t-test for dependent samples

Paired samples t-test


- DV is of interval/ratio measurement level
o Just like in the independent samples t-test
- Two dependent samples
o Not like in the independent samples t-test
Example of a repeated measures experiment
- RQ:
o Do people improve their feeling of well-being using
meditation therapy?
o Participants fill in Quality of Well-Being Scale (QWB)
o Participants follow a month of meditation therapy
o Participants fill in QWB again

- Null hypothesis independent samples t-test:

o 𝐻0: 𝜇1 = 𝜇2 or 𝐻0: 𝜇1 − 𝜇2 = 0

- For dependent observations we define:

o 𝐷 = 𝑋after − 𝑋before

- We then get:

o 𝜇𝐷 = 𝜇after − 𝜇before

- And the null hypothesis for the paired samples t-test:

o 𝐻0: 𝜇𝐷 = 0

Effect size
- Also use Cohen’s d
- Formula is a little different: d = M_D / s_D (mean of the difference scores divided by their standard deviation)
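A sketch of the paired analysis (hypothetical QWB scores); one common paired-samples variant of Cohen's d divides the mean difference by the standard deviation of the difference scores:

```python
import numpy as np
from scipy import stats

# Hypothetical QWB scores before and after a month of meditation therapy
before = np.array([55, 60, 48, 52, 64, 58, 50, 61])
after = np.array([58, 63, 49, 57, 66, 60, 55, 62])

# Paired samples t-test of H0: mu_D = 0, where D = X_after - X_before
t, p = stats.ttest_rel(after, before)

D = after - before
d = D.mean() / D.std(ddof=1)  # effect size based on difference scores
```

Note that the paired test is equivalent to a one-sample t-test on the difference scores D.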

Assumption 4: normally distributed scores


- How do we check this assumption?
o Histogram
 Independent samples t-test: two histograms
 1 of the scores in control group
 1 of the scores in experimental group
 Paired t-test: one histogram
 Calculate difference scores
 Make histogram of difference scores
- What if the scores are not normally distributed?
o Solution:
 In case of minor deviations  you may still use t-test
 In case of large samples  you may still use t-test
 In case of small samples and large deviations  use an
alternative

Assumption 5: equal variances


- How can we check this assumption?
o 1. Use a graph
 Side-by-side boxplots
o 2. Use a test
 There are formal tests for equal variances
 Levene’s test
 Brown-Forsythe test
 F-max test
 Hypotheses:

 H0: σ1² = σ2²
 H1: σ1² ≠ σ2²

- What if the equal variance assumption is not satisfied?

o Solution:

 Use an alternative called Welch’s test

 Do that in JASP

 The t-test we use under the assumption of equal


variances has more power, so that option is preferred
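A sketch (simulated, hypothetical data) of checking equal variances with Levene's test and falling back to Welch's test when the assumption looks violated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group1 = rng.normal(loc=10, scale=1, size=40)  # small spread
group2 = rng.normal(loc=10, scale=4, size=40)  # much larger spread

# Levene's test: H0 says the two population variances are equal
lev_stat, lev_p = stats.levene(group1, group2)

# With clearly unequal variances, use Welch's test (no pooled variance)
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)
```

In JASP the same choice appears as the "Welch" option for the independent samples t-test.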

Notes about testing for equal variances

- Graphical check is preferred, because:

o Multiple testing may inflate the chance of a Type I error

o Variance tests are not robust

 When sample sizes are small:
 May indicate unequal variances when they are in fact equal
 When sample sizes vary substantially:
 May indicate equal variances when they are not

Bayesian testing
Inferential statistics: NHST steps
1. Formulate a hypothesis
2. Choose test statistic and compute its value
3. Calculate the p-value
4. Make a decision about Ho
5. State the conclusion
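The five steps can be sketched for an independent samples t-test (hypothetical scores):

```python
from scipy import stats

# 1. Hypotheses: H0: mu1 = mu2 vs. H1: mu1 != mu2
revision = [7.1, 6.8, 7.5, 8.0, 6.9, 7.3, 7.7, 7.0]
recopy = [6.2, 6.5, 6.0, 6.8, 6.4, 6.1, 6.6, 6.3]

# 2. + 3. Test statistic and its p-value
t, p = stats.ttest_ind(revision, recopy)

# 4. Decision about H0 at significance level alpha
alpha = 0.05
reject_h0 = p < alpha

# 5. Conclusion: if H0 is rejected, the group means differ significantly
```
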
- The p-value measures:
o Given the null hypothesis is true, what is the chance of
observing the data we observed?
- In order to use NHST, researchers must make many assumptions
- E.g.
o Distribution population is normal
o Variances in two populations are equal
o Null hypothesis is true
- Some researchers prefer not to make so many assumptions

Hypothesis testing
- An alternative way to test hypotheses is called
o Bayesian Testing
- What is the idea behind Bayesian testing?
o In Bayesian testing, we calculate:
 Given the data we observed, what is the chance the null
hypothesis is true?
o Compared to NHST:
 Given the null hypothesis is true, what is the chance of
observing the data we observed  what the p-value
measures

Bayesian testing
- In Bayesian testing, we do not report a p-value
- In Bayesian testing we report what is called the Bayes Factor
o It measures how much more likely the null hypothesis is as
compared to the alternative hypothesis, given the observed
data
o The Bayes Factor (BF) measures this using ratio (fraction)
o In Bayesian statistics we look at the relative support for one
hypothesis over the other:

o The Bayes Factor measures:


 How much more does the observed data support the
null hypothesis as compared to the alternative
hypothesis?
How do you interpret a Bayes factor?
- BF = 5 means that the support in the data for Ho is 5 times greater
than the support for H1
- How about a BF smaller than 1?
o BF = 0.4 would mean that the support for Ho is only 0.4 times as great as the support for H1
- The Bayes factor quantifies the relative support for one hypothesis
over the other
- This can be done with either Ho or H1 on top
- We use subscript to indicate this:
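The subscript convention can be sketched as follows: BF10 and BF01 are reciprocals, so flipping the subscript flips which hypothesis the support is expressed for (illustrative values only):

```python
def flip_bf(bf):
    """BF01 = 1 / BF10 (and vice versa)."""
    return 1.0 / bf

bf10 = 0.2            # data support H1 only 0.2 times as much as H0...
bf01 = flip_bf(bf10)  # ...so the data support H0 five times as much as H1
```
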
Reporting the BF
- The question “when is the BF large enough to choose Ho or H1?” is
not easily answered
o 1. If BF01 = 100, then there is hardly any doubt that Ho is supported more than H1
o 2. If BF01 = 25, there is still substantially more support for H0 than for H1
o 3. If BF01 = 5, then there is still more support for Ho, but not so much more that H1 can be disqualified completely
o 4. If BF01 = 1.5 then there is not really a preference for Ho or
H1
- Good idea to interpret each value of BF without referring to any kind
of cut-off value

Interval estimation
- In NHST
o Interval estimate to give the reader an idea of the size of the
effect:
 Confidence interval
o In Bayesian testing:
 Credible interval

Interpretation Credible Interval


- 95% Credible Interval for the mean score in condition A [37.3,56.5]
- Interpreted as:
o Given the evidence in the data, the mean score of condition A
has a 95% chance of falling between 37.3 and 56.5
o Note the difference in interpretation with a confidence interval

Correlation
- Bayesian statistics can also be used for other analysis techniques
- E.g. correlation
- Using Pearson’s r and BF instead of Pearson’s r and p-value

Categorical data
- Instead of χ² you get BF10 (BF10 is the default in JASP’s independent multinomial test)

Replication
Studies
Reproducibility Project
- Researchers from all over the world collaborated to replicate 100
empirical studies from three top psychology journals:
o Psychological science
o Journal of personality and social psychology
o Journal of experimental psychology
o Could they reproduce the results of the original studies?
- Significance
o In almost all original studies the null hypothesis was rejected
o Only in 1/3 of the replication studies was the null hypothesis
rejected
- Effect size
o The effect sizes (like Cohen’s d) were only half as large in the
replication studies as in the original studies
- Non-profit technology organisation with a mission to “increase the
openness, integrity, and reproducibility of scientific research”

- This became known as the replication crisis, because a lot of work was not replicable
- A major cause: many researchers had a career that depended on publishing, and journals mostly published ‘exciting news’. As a result, non-significant results went unpublished, and researchers felt pressure to manipulate data or to present any result from the data that could be framed as significant

- Open science – open access

o Everyone should have access to this scientific knowledge


o Everyone should be able to use it for the benefit of
science/society
o Open access aims for free and open online access to scientific
information
 No financial, legal, or technical barriers
o Everyone should be able to search, read, download, print,
copy, and distribute, the information
o Journals that publish scientific research results:
 Fully open access
 Hybrid
o Advantages:
 Increases citations
 Increases visibility of academic research
 Increases reusability of academic results
o Disadvantages for the consumer
 The range of high-quality, fully open access journals is
still limited
 The number of available reliable journals and articles
varies per discipline
o Check for quality:
 Directory of Open Access Journals (DOAJ)
 Web of Science
 Scopus
o Open science
 Possible due to digital age
 Developed due to desire for more honesty and
transparency in the research process
- FAIR principles  Findable, Accessible, Interoperable, Reusable
o Guidelines for how data/software/programming code should
be:
 Described
 Stored
 Shared
o Leads to:
 A greater efficiency of research process, because new
research questions do not always require the collection
of new data because suitable data are already available
 Better reproducibility and greater reliability of research
- Data Management Plan
o Researchers should think about:
 The way in which the data is stored
 The way in which data manipulation is administered
 How access to the data is arranged
 The way in which anonymised data may be shared with
third parties

- Replication studies
o Three types:
 Direct replication
 Advantages:
o Easy to compare
 Disadvantages:
o Problems with internal validity in original
research will still be present
 Conceptual replication
 Advantages:
o Ability to improve design
o Increase internal validity
 Disadvantages:
o Not as easy to compare
 Replication-plus-extension
 Advantages:
o Possibility to examine additional research
questions
 Disadvantages:
o Not as easy to compare
