L2 ResearchDesign - BRSM Lecture2
Design
BRSM
Measurement in the behavioral
sciences
Measurement
• Define the property you want to study
• Find a way to detect that property
Examples
• Aggression (operational definition? Measure?)
• Intelligence (operational definition? Measure?)
• Productivity in the office (operational definition? Measure?)
• Age (how do you measure this? Depends: developmental psych? Consumer research?)
Operational
definition
• A working definition of what
a researcher is measuring
Scales of measurement
• Nominal (categorical)
• Ordinal
• Interval
• Ratio
• If it is an ordinal scale measurement, there are some sensible ways to do this and others that don't make sense
• Again, the average does not make sense: an "average endorsed statement" of 1.97 here has no interpretation
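The point about averages on an ordinal scale can be sketched in Python. The response codes below are hypothetical (made up for illustration, not from the lecture). The mode and median always correspond to an option a participant could actually endorse, while the mean lands between options.

```python
from collections import Counter
from statistics import mean, median_low

# Hypothetical responses to a 4-option ordinal question (codes 1-4).
# The codes only encode order; the gaps between them are not defined.
responses = [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 4]

counts = Counter(responses)                # how often each option was endorsed
mode_option = counts.most_common(1)[0][0]  # most frequently endorsed option
median_option = median_low(responses)      # middle response; always a real option
naive_mean = mean(responses)               # computable, but not interpretable

print("counts:", dict(counts))
print("mode:", mode_option, "median:", median_option)
print("naive mean:", round(naive_mean, 2), "(not one of the options)")
```

Reporting counts, the mode, or the median respects the ordering without assuming equal spacing between options.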
Interval scale
• Both interval and ratio scales: the numerical value can now be interpreted directly
• Interval: differences between numbers make sense, but there is no natural "zero" on this scale
• Addition and subtraction make sense, but not multiplication or division
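A small Python sketch of the interval/ratio distinction, using temperature as the illustration. The choice of Celsius (interval, arbitrary zero) versus Kelvin (ratio, true zero) and the two example readings are assumptions added here, not taken from the slides.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return c + 273.15

# Two illustrative temperature readings in degrees Celsius.
a_c, b_c = 10.0, 20.0

# Differences agree on both scales, so subtraction is meaningful.
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios do not agree: the Celsius ratio depends on the arbitrary zero point.
ratio_c = b_c / a_c
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print("difference (C):", diff_c, "difference (K):", round(diff_k, 2))
print("ratio (C):", ratio_c, "ratio (K):", round(ratio_k, 4))
```

The Celsius ratio of 2.0 does not mean "twice as hot". Only the ratio computed on a scale with a natural zero can be read that way.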
Examples? What type of scale? Discrete or continuous?
• RTs?
• Year in which participants were born?
• Temperature?
• Your mode of transport to work?
• Place attained in a race?
Answers
• RTs - ratio scale and continuous
• Year in which participants were born - interval scale and discrete
• Temperature - interval scale and continuous
• Your mode of transport to work - nominal and discrete
• Place attained in a race - ordinal and discrete
Continuous vs discrete variables
Real world variables may not always
adhere to these classifications
• Likert scale
• Choose from the following options. You feel happy today:
1. Strongly disagree
2. Disagree
3. Neutral
4. Agree
5. Strongly agree
• What scale is this?
• Nominal? (hint: is there a natural ordering? If so, it can't be nominal)
• Ratio? (hint: is there a natural "zero"?)
• Ordinal or interval. Which one is it?
• Can we prove that everybody treats the difference between 1. and 2. the same as the
difference between 4. and 5.?
• In practice, most people treat the Likert scale as an interval scale, since many participants
treat the entire scale seriously (but this is very much dependent on the task and context).
Is the measurement any good?
E.g.
• Test-retest reliability
• Inter-rater reliability
• Parallel forms reliability
• Internal consistency reliability
Test-retest reliability
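One common way to quantify test-retest reliability is the correlation between scores from two administrations of the same measure. The sketch below assumes Pearson's r as the index and uses hypothetical scores for six participants. The function name `pearson_r` and the data are illustrative, not from the lecture.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both sessions")
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same six participants, tested twice.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

retest_r = pearson_r(time1, time2)
print("test-retest r:", round(retest_r, 3))
```

A value close to 1 suggests participants keep roughly the same relative standing across sessions. What counts as "good enough" depends on the measure and the field.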
terminology
Experimental Research
• The experimenter controls everything
• Manipulates the predictors and sees how the outcome changes
Practical issues
• We cannot possibly think of ALL the predictors that can influence the outcome
[Diagram: everyone is split into an experimental group and a control group; the outcome is aggression]
Test for a difference in the records
between game players and non-players
• Perhaps the people playing violent video games as young children are
also ones without proper parental support
• In the previous study, there was no consideration of this potential
confound
The ideal experiment?
then?
Incorporate confounds as covariates in your
statistical models!
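A minimal sketch of what adjusting for a confound can look like, on simulated data only. The variable names (`support`, `gaming`, `aggression`) and the data-generating process are hypothetical: the confound drives both predictor and outcome, and gaming has no true effect. The adjustment regresses the confound out of both variables and then relates the residuals, which gives the same slope as including the confound as a covariate in a linear regression.

```python
import random

def slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(x, y):
    """Part of y that is not linearly explained by x."""
    slope, intercept = slope_intercept(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 2000

# Simulated (hypothetical) data: low parental support raises both gaming
# and aggression; gaming itself has NO effect on aggression here.
support = [rng.gauss(0, 1) for _ in range(n)]
gaming = [-s + rng.gauss(0, 1) for s in support]
aggression = [-2 * s + rng.gauss(0, 1) for s in support]

# Ignoring the confound: gaming appears to predict aggression.
naive_slope, _ = slope_intercept(gaming, aggression)

# Adjusting for the confound: remove its linear influence from both variables.
adjusted_slope, _ = slope_intercept(
    residuals(support, gaming), residuals(support, aggression)
)

print("naive slope:", round(naive_slope, 2))
print("adjusted slope:", round(adjusted_slope, 2))
```

This only helps for confounds that were thought of and measured. Unmeasured confounds remain a problem.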
• Internal validity
• External validity
• Construct validity
• Face validity
• Ecological validity
Internal validity
• The ability to draw cause and effect inferences from the data
• The effect of COVID-19 (Delta variant) on IQ.
• Recruit government hospital patients. Compare with healthy controls who
responded to online ads for your study.
• Internal validity?
External validity
Face validity
• Does your test "appear" to be doing the job it says it will do?
• Doesn't really matter for scientists.
• Can matter if you're trying to convince policy makers, for example. Then their perception of the test would matter.
Ecological
validity
• Does the experiment closely mimic
real-world scenarios?
"Hawthorne" effect
fabrication https://retractionwatch.com/
E.g. use surveys whose purpose is self-evident, then sit back and let reactivity decide your
results for you. If reviewers don't see the full surveys, this may go
undetected.
Data mining and post-hoc
hypothesizing
• Data mining: I run 50 different variations of a model. Report only the one that
worked.
• If you are honest, your statistical methods should "correct" for the 50 times
you touched the data, because we want to know that the result obtained is a
real one that is unlikely to have come about by chance alone.
• Post-hoc hypothesizing: my initial hypothesis didn't work but as part of the
data mining effort above, I found something else and reported that I had
actually hypothesized it.
• Huge statistical issue when you do this because many frequentist statistical
methods depend on assumptions made about the null hypothesis
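A short Python sketch of why running 50 variations matters. It assumes a conventional threshold of 0.05 and, for the family-wise error calculation, that the 50 tests are independent. The Bonferroni correction shown is one simple option chosen here for illustration, not necessarily the method the lecture has in mind, and the p-values in the usage line are made up.

```python
n_tests = 50   # number of model variations tried
alpha = 0.05   # conventional per-test significance threshold (assumed)

# Chance of at least one false positive across all tests when every null
# hypothesis is true and the tests are independent.
family_wise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: require each test to pass a stricter threshold.
bonferroni_alpha = alpha / n_tests

def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values stay significant after Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

print("P(at least one false positive):", round(family_wise_error, 3))
print("per-test threshold after correction:", bonferroni_alpha)

# Hypothetical p-values from three tests; threshold becomes 0.05 / 3.
flags = bonferroni_significant([0.04, 0.0005, 0.2])
print("significant after correction:", flags)
```

With 50 uncorrected tests, a "significant" result somewhere is the expected outcome even when nothing is going on, which is why reporting only the one model that worked is misleading.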
Publication Bias
1. Be aware of all the different ways in which the data from a study may have issues with reliability/validity
2. Be aware of potential confounds
3. Address the confounds using statistical methods
4. Be aware of dubious practices such as data mining and post-hoc hypothesizing
Advanced topics
Install R and RStudio
• R: http://cran.r-project.org/
• RStudio: http://www.RStudio.org/