Psych Stat Hypotheses Analyzing Results Drawing Conclusions


lOMoARcPSD|15928116

Psych Stat - Hypotheses; Analyzing Results; Drawing Conclusions
Psychological Statistics, Lec (Ateneo de Manila University)

StuDocu is not sponsored or endorsed by any college or university


Downloaded by Carl Darcy ([email protected])

Chapter 13 – Why We Need Statistics

The Argument for Using Statistics
• Statistics – quantitative measurements of samples
o For objective, consensual techniques for describing their results
o Numerical index of characteristics of data
• Choosing statistical tests = central to hypothesis testing
• Allows for objective evaluation of data

Weighing the Evidence
• We can never actually prove an IV caused the changes we see in a DV
o Cannot establish truth of it by presenting evidence; logical arguments alone
• Statistical tests – determine if IV probably caused changes in the DV
o Must consider chance, coincidences

Statistical Inference – An Overview
• Population – consists of all individuals that have at least one characteristic in common
o Sample – group that represents larger population
• Randomly selected samples – inferences that effects are generalizable to same population OK
o Not randomly selected – generalizable only to extent samples truly represent larger population
§ Ex. unique, special characteristics
• Statistical Inference – making a statement about the population and all its samples based on what is seen in samples we have
o Averages – means; cannot be used for conclusions about separate populations alone
• Unusual scores = outliers

Defining Variability
• Variability – amount of change or fluctuation we see in something
• Statistical tests – checking whether pattern of results are significantly different from what is expected given usual variability among different people
• Scientific Method = objective way
o Statistical tests = standards
§ Conventions, guidelines on what is significant
o Outcomes of statistical tests used to check if IVs have effects in psych experiments

Null Hypothesis (H0)
• Differences between treatments = nothing; not significant
o Performance of treatment groups so similar that scores could have been sampled from same population
• Assume null hypothesis is correct until evidence shows it can be rejected
• Most aim to reject the null hypothesis
o Shows that effect from IV caused real differences in groups
§ Differences sufficiently large enough that they are unlikely to be encountered in same population (even with normal variability)
§ Data is so different that they look as if they came from different population
§ Normal variability on DV – not enough to account for results
o Change between groups as a result of experiment = confirmed; STATISTICALLY SIGNIFICANT
• Testing if population of treated subjects’ scores DIFFERS from population of untreated subjects’ scores
o Fail to reject the null hypothesis – scores are still so similar that exp. manipulation had little impact
o More likely due to normal variability

Alternative Hypothesis (H1)
• Research hypothesis
o Data comes from different populations

• No way to directly test – no way to really prove research hypotheses are correct
• Can only show unlikeliness of patterns coming from chance
o Null hypothesis is probably wrong

Testing Null Hypothesis
• Statistical tests – formulation of null hypotheses
o Stating that performance of treatment groups are so similar, they must belong to the same population
• Rejecting Null Hypothesis
o Difference between treatment groups so large that chance variations cannot explain it
• Effect of IV (ex. large effect)
o Can be shown by frequency distributions

The Process of Statistical Inference
1. Consider population to be sampled
a. Variability – individual scores on DV will differ
2. Consider different random samples in population
a. Scores on DV will differ because of normal variability
b. Assume null hypothesis is correct
3. Apply treatment conditions to randomly selected, randomly assigned samples
4. IF after treatment, samples now appear to belong to different populations = Reject the Null Hypothesis

• Whether or not we reject null hypothesis – depends on variability
o Great deal of variability = large differences
• More variability on DV = greater difference in treatment groups has to be before data can look like they come from different populations
• The more variability, the harder it will be to reject the null hypothesis

Applying Statistical Inference
(book has a long example lol)
o Ex. directional hypothesis – predicts way which difference between groups will go
• Consider variability in data - evaluate data using statistical tests
o State null hypothesis (ex. coming from same population)
• Normal curve -> symmetrical, bell-shaped curve
o Many scores fall close to the center
o Most close to the mean
§ Further away from mean = lower frequency with which they appear
• Most tests assume populations sampled are normally distributed on normal variable
o As if everyone in that population measured on that variable = normal curve frequency distribution
• Known distributions
o Can be used to make inferences about data
o May require knowledge of probability theory (ex. odds of strength of variability’s effect)
• Null hypothesis (H0) tested for two reasons:
o Most likely explanation of what has occurred
§ Expected variation between groups, times accounted for
o No way to directly verify the alternative to the null hypothesis
• H1 states that treatment means so different that they come from distinct populations
o However = there is always some chance that the results were caused by sampling error
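The point that two samples from the same population can still differ by chance (sampling error) can be illustrated with a small simulation. This is only a sketch and is not from the book: the population mean (50), SD (10), sample size (25), number of repetitions, and the helper name `simulate_mean_differences` are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def simulate_mean_differences(pop_mean=50.0, pop_sd=10.0, n=25, reps=2000, seed=1):
    """Draw pairs of random samples from ONE normal population (H0 is true)
    and return the difference between the two sample means for each pair."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        group_a = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        group_b = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        diffs.append(mean(group_a) - mean(group_b))
    return diffs


diffs = simulate_mean_differences()

# Standard error of a difference between two means (equal n, population SD known)
se_diff = 10.0 * (2 / 25) ** 0.5

# Share of chance differences that stay within 1.96 standard errors of zero
share_small = sum(abs(d) <= 1.96 * se_diff for d in diffs) / len(diffs)

print(f"average difference between sample means: {mean(diffs):.2f}")
print(f"share of differences within 1.96 SE of zero: {share_small:.3f}")
```

Most differences land close to 0, and only a small share fall far out in the tails. A difference that extreme in a real experiment is what would lead to rejecting the null hypothesis.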

Choosing a Significance Level
• Significance level = criterion for deciding whether to reject the null hypothesis or not
o Evaluation of whether it is probably correct + risks involved in making the wrong decision
• p < .05 = generally reject the null hypothesis, statistically significant
o Usually used in psych experiments
o Chance of obtaining data by chance alone is less than 5%
o It could have occurred by chance less than 5 times out of 100
• p < .01 = chance is less than 1 in 100
o Usually in pharmaceutical research
o medical research – p < .001
• Level should be stated in research report
o Should be thought of in advance; not when results are already obtained
o Allows readers to evaluate these on their own
• Data is “statistically significant”
o Another way of saying they look as if they came from different populations
§ IV had an effect
• Reality – many EVs can affect outcome of experiment = not true reflection of IV effects
o Experimental errors – variations in subject scores produced by EVs, experimenter bias, etc.; not related to the IV effects
o Overlap cannot be perfectly measured = conclusions can be wrong

Type 1 and Type 2 Errors
• Always some probability that null hypothesis could be true even if statistically significant

Type 1 Error
• Rejecting null hypothesis when it is really true
o Saying significant differences due to IV when really IV had little/no effect
• False positive
o Like putting an innocent person in jail
• Probability of making Type 1 error -> Greek letter alpha α
o 1 – α chance of failing to reject the null when it is the correct decision
• Odds of making Type 1 error (α) = equal to the value we choose as the significance level for rejecting the null hypothesis
o Ex. .05 significance; probability of Type 1 error also .05
• Can possibly choose more extreme significance level = but, may increase chances of Type 2 error

Type 2 Error
• Fail to reject the null hypothesis even though it is really false
o Concluding that results come from chance; when really IV caused differences
o Unable to detect IV that has produced an effect
• Less serious than making Type 1 error
• Treatment effects are easier to find when the IV has a very dramatic effect + not much variability on the DV
o May end up sampling from two completely distinct populations
• Probability affected by amount of overlap between population samples
o Ex. responses similar = hard to show that IV altered responses; more overlap = harder it is to detect the effect of the IV
• β – Greek letter beta represents probability of making Type 2 error
o exact value of beta found only through measuring all possible samples of both populations

o Odds of correctly rejecting null hypothesis when it is false ALWAYS equal to 1 – β
• 1 – β -> power of the statistical test

Reducing β
• Increase sample size
• Reducing variability in sample data (ex. controlling EVs; within-subjects/matched-groups)
• Accept less extreme significance level
o E.g more likely to detect differences at .05 than at .01
o However also increases chance of Type 1 error
• Using more powerful parametric tests

Parametric Tests
• Make certain assumptions about parameters of population represented by samples in experiment
o Ex. normally distributed data; comparable variability across groups; interval, ratio scale data, etc.
• Nonparametric tests – used when these assumptions cannot be met

Going Beyond Testing the Null Hypothesis
• Criticism of using null hypothesis testing
o Theoretical contributions have been made without using p values, statistics
§ Ex. Piaget, Freud, Skinner
o Nothing magic about the .05 criterion = arbitrary decision to make this the benchmark for statistically significant results
• Cohen (1994) – estimates of effect size = more compelling evidence for treatments VS p values for rejecting null
o Effect size – statistical estimate of size/magnitude of treatment effect
• Cohen – confidence intervals must be used
o Range of values that we have confidence that population/true mean is included
• Suggestion -> prep instead of p values (average probability of replication)
• Recommendation – null hypothesis significance testing should be supported by effect size estimates and confidence intervals

The Odds of Finding Significance
The Importance of Variability
• Different distribution can be made from finding differences between means (of all possible pairs of samples)
o Differences may fall right around 0/tend to be small since means are normally distributed
o Some differences may be large = at extremes of distribution
• Some differences between means more likely than others
o Normal distributions – able to calculate odds that each difference will occur
§ Chances of getting small differences between means of samples are high
§ Most sample means fall close to population mean = differences will be close to 0
o Odds of seeing larger differences less
§ Exact odds depend on variability in population
§ More likely -> high variability on the DV
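The ways of reducing β listed in these notes (bigger samples, less variability, a less extreme significance level) can be checked numerically. The sketch below is not from the book. It assumes a two-group comparison with a known population SD, so it uses the normal (z) approximation instead of a t distribution, and the function name `power_two_group_z` and all the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_group_z(effect, sigma, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-tailed, two-group z test.

    effect       true difference between the population means
    sigma        population SD on the DV (assumed known and equal in both groups)
    n_per_group  subjects in each group
    alpha        significance level (probability of a Type 1 error)
    """
    if n_per_group <= 0 or sigma <= 0:
        raise ValueError("n_per_group and sigma must be positive")
    z = NormalDist()  # standard normal curve
    z_crit = z.inv_cdf(1 - alpha / 2)        # cutoff for each tail
    se = sigma * sqrt(2 / n_per_group)       # SE of the difference between means
    shift = effect / se                      # how far the true effect moves the statistic
    # Probability the statistic lands in either critical region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


baseline = power_two_group_z(effect=5, sigma=10, n_per_group=20)
print(f"baseline power:            {baseline:.2f}")
print(f"bigger sample (n = 50):    {power_two_group_z(5, 10, 50):.2f}")
print(f"less variability (SD = 6): {power_two_group_z(5, 6, 20):.2f}")
print(f"stricter alpha (.01):      {power_two_group_z(5, 10, 20, alpha=0.01):.2f}")
```

With no real effect (`effect=0`) the function returns α itself, which matches the idea that the Type 1 error rate equals the chosen significance level.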

Critical Regions
• Ex. p < .05
o Parts of the distribution that make up most extreme 5% of the differences between means
o Differences large enough to fall within these areas will occur by chance less than 5% of the time
• Reject the null hypothesis if treatment groups differ by amounts that fall within these critical regions
o Different places/cutoffs for critical regions for each distribution
• As amount of variability in distribution increases, critical regions fall farther from center of distribution
o More variability = larger differences between means required/needed to reject the null hypothesis
• Ideal – treatment conditions should be only source of variability
o Reduce experimental error by controlling variables
• Population with high variability -> assume high variability on DV
o Large differences needed to be statistically significant
• Unnecessary sources of variation -> increase chances of Type 2 error

One-Tailed and Two-Tailed Tests

Two-Tailed Test
• Critical region of distribution divided between its two tails
o Ex. 2.5% on each side
• Non-directional hypotheses
o One that does not predict the exact pattern of results
o Direction of the effect through manipulation not yet predicted/proposed
• Want to know if pattern of results is so unlikely that it was most likely NOT caused by chance variations in the population
• EXTREME DIFFERENCES CAN GO IN EITHER DIRECTION
o Critical region split between both tails of the curve
o Any significant difference is postulated

One-Tailed Test
• Directional hypothesis
o 5% critical region located in just one tail of the distribution
• Advantage – size of critical region larger and closer to center of distribution
o Easier for differences between means to be large enough to fall there
o Statistical value needed to achieve p < .05 smaller in one-tailed tests
• Treatment effects do not need to be super dramatic to be significant
• Hypotheses must be thought of in advance and stuck to

Test Statistics
• Inferential/Test Statistics
o Can be used as indicators of what is going on in the population
o Can be used to evaluate results; numerical summary of what is going on in the data
• Transformation of relationship between treatment diff and variability -> simple quantitative measure
• LARGER VALUE OF TEST STATISTIC = MORE LIKELY IV HAD A SIGNIFICANT EFFECT
o Large – large VS amount of variability
§ More likely to reject the null
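The claim that a one-tailed test needs a smaller statistical value than a two-tailed test at the same significance level can be seen from the cutoffs themselves. This is a minimal sketch and not from the book: it assumes the test statistic follows the standard normal curve (a z test), and the helper name `critical_z` is made up for illustration. Cutoffs from a t distribution would be somewhat larger for small samples.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, tails=2):
    """Cutoff on the standard normal curve that marks the critical region.

    tails=2 splits alpha between both tails (non-directional hypothesis);
    tails=1 puts all of alpha in one tail (directional hypothesis).
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


for alpha in (0.05, 0.01):
    print(
        f"alpha = {alpha}: "
        f"two-tailed cutoff = {critical_z(alpha, 2):.3f}, "
        f"one-tailed cutoff = {critical_z(alpha, 1):.3f}"
    )
```

At α = .05 the two-tailed cutoff is about 1.96 and the one-tailed cutoff about 1.645, so the one-tailed critical region starts closer to the center of the distribution.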

• Each test statistic – own distribution of values

Organizing and Summarizing the Data
Organizing

Summarizing – Descriptives
• Raw data – data we record as we run an experiment
• Summary data – reported results of experiment
o If not small N – group data should be noted, not individual scores
• Descriptive statistics – shorthand ways of describing (group) data

Measures of Central Tendency
• Describe what is typical of a distribution of scores
o Mode – occurs most often
o Median – score that divides the distribution in half
§ Half above the median, half below the median
o Mean – arithmetic average
§ Sum of scores over total number of scores
• Indicators of shape of distributions
o Mean = median = mode -> symmetrical distribution
• Skewness – one tail of distribution is longer than the other
• Income -> subject to distortion
o Not symmetrical but positively skewed (long tail on right side/positive side)
o Mean higher than median or mode
• b) Bimodal – two scores can be mode
• c) positively skewed
• d) negatively skewed

Measures of Variability
• Defined numerically by one of several descriptive statistics:
o Range, variance, SD
o Comparison of variability of diff samples possible
• Range – difference between largest and smallest scores in data
o Indication of spread of scores
o Does not reflect precise amount of variability in all the scores
• Variance (s²) – average squared deviation of scores from their mean
o How much scores are dispersed/spread out around mean of data
• Standard deviation (s) – square root of the variance
o Average deviation of scores about the mean
o Total of the deviations from the mean is always 0

§ Square roots = return to original unsquared units of measurement
• Reporting of M, SD in place of raw data
o And then statistical test used to interpret results
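The descriptive statistics defined in this chapter can be computed directly with Python's standard library. The scores below are made up for illustration, and `describe` is a hypothetical helper. One assumption to note: `statistics.variance` and `statistics.stdev` divide by N - 1 (the sample formulas usually written as s² and s), which is slightly different from a literal "average" squared deviation that divides by N.

```python
from statistics import mean, median, mode, stdev, variance


def describe(scores):
    """Summarize a list of raw scores with the descriptive statistics in these notes."""
    m = mean(scores)
    deviations = [x - m for x in scores]  # distance of each score from the mean
    return {
        "mode": mode(scores),                  # score that occurs most often
        "median": median(scores),              # score that splits the distribution in half
        "mean": m,                             # sum of scores over number of scores
        "range": max(scores) - min(scores),    # largest minus smallest score
        "variance": variance(scores),          # sample variance (divides by N - 1)
        "sd": stdev(scores),                   # square root of the variance
        "sum_of_deviations": sum(deviations),  # always 0, apart from rounding
    }


raw_scores = [2, 4, 4, 5, 7, 8]  # hypothetical raw data
summary = describe(raw_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because the deviations always sum to 0, they are squared before averaging, and the square root then brings the result back to the original units of measurement.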

Chapter 14 – Analyzing Results

Which Test Do I Use?


Levels of Measurement
• RATIO
o Equal intervals between all values
o Absolute 0 point
o Ex. minutes – 2 minutes 2x as long as 1
• INTERVAL
o Magnitude/quantitative size
o Equal intervals
o No true 0 point
• ORDINAL
o Only magnitude differences
§ RANKS
o Cannot be sure that intervals are equal
o No true 0
• NOMINAL
o No quantitative relationship
o Least information
o No magnitude, equal intervals
• Most information – ratio, interval
o Most preferred by researchers

Selecting a Statistical Test
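The test-selection rules spread through this chapter can be gathered into one lookup. This is only a sketch of the cases these notes mention, not a complete decision chart: the function name `choose_test`, its argument names, and the choice to treat interval and ratio DVs alike for the parametric tests are assumptions.

```python
def choose_test(n_ivs, n_conditions, design, dv_level):
    """Suggest a test for the designs covered in these notes.

    n_ivs         number of independent variables (1 or 2)
    n_conditions  number of treatment conditions of the IV (used when n_ivs == 1)
    design        "between" (independent groups) or "within" (within-subjects or matched)
    dv_level      "nominal", "ordinal", "interval", or "ratio"
    """
    parametric_ok = dv_level in ("interval", "ratio")

    if n_ivs == 2 and parametric_ok:
        return "two-way ANOVA"
    if n_ivs == 1 and n_conditions == 2:
        if dv_level == "nominal" and design == "between":
            return "chi-square test"
        if parametric_ok and design == "between":
            return "t test for independent groups"
        if parametric_ok and design == "within":
            return "t test for matched groups"
    if n_ivs == 1 and n_conditions >= 3 and parametric_ok:
        if design == "between":
            return "one-way between-subjects ANOVA"
        if design == "within":
            return "one-way repeated measures ANOVA"
    return "not covered in these notes"


print(choose_test(1, 2, "between", "nominal"))
print(choose_test(1, 2, "within", "ratio"))
print(choose_test(1, 3, "between", "ratio"))
```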

Statistics for Two-Group Experiments

The Chi-Square (χ²) Test
• 1 IV; two treatment conditions; between-subjects; not matched; nominal DV
• Inferential statistic
o Computing for statistical value, compared with critical value
o Critical value – must be exceeded to reject the null hypothesis at a certain significance level
• Nonparametric – does not assume population has certain parameters (ex. normality, equality of variances)
o Comparison of frequencies – whether sample represents population
• When obtained frequencies exactly match expected frequencies (the pattern H0 predicts), χ² = 0
o Value of χ² increases as differences increase
o χ² > critical value = reject null at p < .05
• Independent samples
o Usually nominal data
• 2 x 2 contingency table
o tabulation of frequency
o Obtained frequencies (O) VS Expected frequencies (E)
§ O = E, fail to reject the null

Degrees of Freedom (df)
• How many members of a set of data could vary/change value without changing value of statistic already known
o Ex. how many members could change without affecting the mean
• Vary in a way related to number of subjects sampled

Interpreting the Chi-Square
• Value obtained must exceed critical value for appropriate df

The t Test
• 1 IV; two treatment conditions; between-subjects; not matched; ratio DV
• t – computational way of relating differences between treatment means
o Parametric test
• Evaluating likelihood of obtaining particular value of t – t-test

Effects of Sample Size
• Small samples vary more from the mean of population VS large samples
o Parametric tests – relationship between treatment effects and variability
o Sample size – affects variability and size of test statistics
• Distributions’ shape may change based on size of samples
o Ex. t-distribution = becomes more like normal curve as sample size increases
• Requirements of t-test – data must come from normally distributed populations
o Large samples better
§ Easier to reject null because critical value gets smaller as sample increases
o Fewer subjects – more chance of Type 2 error
• Change in t distribution = change of critical value of t needed to reject the null
o Fewer df = more variability between samples; more cases away from the mean
§ = large differences by chance
• T-test = ROBUST
o Assumptions of the test CAN be violated without changing rate of Type 1, Type 2 errors
§ Ex. normality, comparison of variability
• Effect size – transforming t values and dfs to correlation coefficient r
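The two-group statistics described in these notes can be worked through by hand or in a few lines of code. The sketch below is not from the book and uses the standard textbook formulas: χ² as the sum of (O - E)² / E, a pooled-variance t for independent groups, and r = sqrt(t² / (t² + df)) for the effect size. The function names and all the data are hypothetical, and the χ² function applies no continuity correction.

```python
from math import sqrt
from statistics import mean, variance


def chi_square(observed):
    """Chi-square statistic and df for a table of obtained frequencies (rows of counts)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obtained in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E under the null
            chi2 += (obtained - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def independent_t(group_a, group_b):
    """Pooled-variance t and df for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))  # SE of the difference between means
    return (mean(group_a) - mean(group_b)) / se, df


def r_from_t(t, df):
    """Effect size r computed from a t value and its degrees of freedom."""
    return sqrt(t ** 2 / (t ** 2 + df))


chi2, chi_df = chi_square([[10, 20], [20, 10]])  # hypothetical 2 x 2 frequencies
t, t_df = independent_t([5, 7, 9], [1, 3, 5])    # hypothetical ratio-scale scores
print(f"chi-square = {chi2:.2f} with df = {chi_df}")
print(f"t = {t:.2f} with df = {t_df}, effect size r = {r_from_t(t, t_df):.2f}")
```

For the hypothetical table, χ² is about 6.67 with df = 1, which exceeds the usual tabled critical value of 3.84 at p < .05.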

• CONFIDENCE INTERVALS – range of values above and below our sample mean that is likely to contain the population mean with the probability level that the true mean would actually fall somewhere in that range

The t Test for Matched Groups
• Within-subjects t-test (dependent)
o Interval, ratio data + normality on DV
• Different computations VS independent t-test
o df for matched groups N – 1 , where N is number of pairs
• Fewer df VS independent t-test
o Critical value of t needed to reject null gets BIGGER as df gets smaller
§ Fewer df – harder to reject the null
o Still used however bc of less variability VS between-groups
§ Computed value of t larger = less variability coming outside the IV
• T-test for matched groups MORE POWERFUL vs independent t-test
o Less chance of Type 2 error

Analyzing Multiple Groups and Factorial Experiments
Analysis of Variance
• ANOVA – evaluates differences among 3+ treatments means
o Division of variance in data = component parts
§ Within groups variability
§ Between groups variability
o Within groups variability
§ Differences WITHIN groups
o Between groups variability
§ Differences ACROSS groups
o Comparison + evaluation for statistical significance
§ Different proportions – vary depending on effect of IV
§ Likelihood that proportions could come from chance
• T – differences between treatment groups/means (independent); pairs of scores (matched-groups)
o Related to estimates of variability

Sources of Variability
• Individual differences, subject variables
o Random assignment, matching
• ERROR
o Individual differences, undetected mistakes in recording data, variations in testing conditions, other EVs
o Explain variability seen WITHIN groups
• Experimental Manipulation
o Different treatment conditions and different behaviour
§ Expecting conditions to create variability
o Only between responses of different treatment groups (not within or between specifically)
§ Between groups = ERROR + treatment effects
• F ratio – comparison of sizes of variability
• F = 1; IV had no effect
o Larger effect of IV = larger F
§ Between larger VS within
• Distribution of F – significance of computed F ratio
o Size of sample can affect shape of distribution
o Df used to select correct distribution, critical values
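The matched-groups t test described in these notes works on the difference within each pair, which is why its df is the number of pairs minus 1. Here is a minimal sketch using the standard formula, t = mean difference divided by the standard error of the differences. The function name `matched_t` and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def matched_t(first, second):
    """t and df for matched pairs (or the same subjects measured twice)."""
    if len(first) != len(second):
        raise ValueError("each score needs a partner")
    diffs = [b - a for a, b in zip(first, second)]  # one difference per pair
    n_pairs = len(diffs)
    se = stdev(diffs) / sqrt(n_pairs)               # SE of the mean difference
    return mean(diffs) / se, n_pairs - 1            # df = N pairs - 1


control = [10, 12, 9, 11]     # hypothetical scores, one per pair
treatment = [12, 15, 10, 13]  # partner scores, in the same pair order
t, df = matched_t(control, treatment)
print(f"t = {t:.2f} with df = {df}")
```

Only the variability of the pair differences enters the denominator, so stable individual differences between subjects drop out. This is the reason the notes describe the matched test as more powerful despite its fewer df.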

A One-Way ANOVA (Between-Subjects)
• Between-subjects, ratio, 1 IV
o Null – means of groups come from same population
• Treatment groups independent; random selection; normality on DV
o Equality of variances = HOMOGENEOUS
• Robust test, like t-test

Within and Between Groups Variability
• MS – mean square; variability, variance
o Average squared deviation
• F = ratio of MSB (between groups) to MSW (within groups)

Interpreting the Results
• F – only tests overall pattern of different means
o Significant F – across group means, there is a significant difference
§ 3+ groups – ANOVA does not identify specific differences between each pair of means
• POST-HOC TESTS – tests done after significant analyses
o Tukey, Scheffe
§ Pair-by-pair comparisons
o Can however result in less power to detect treatment effects + increase chances of Type 2 error
o Series of t-tests for pairwise comparisons = Type 1 error
• A PRIORI COMPARISONS
o Tests between specific treatment groups that were anticipated/planned before experiment was conducted
§ Planned comparisons/contrasts
o As long as number of comparisons less than number of treatment groups, Type 1 error chances not increased
o More powerful VS post hoc (less conservative)
• Effect size
o Proportion of variance in all the scores that can be explained by the treatment

Graphing the Results
• Data points usually represent group means
o Data of individual subjects usually not graphed unless small N

Statistical Control for Differences Between Groups
• Moderating Variable – one that can moderate/change the influence of IV
• ANCOVA – Analysis of Covariance
o Control statistically for potential moderating variables
o Removing variance on the moderating variable (covariate) from variance produced by error
§ Refine estimates of experimental error
§ Adjust treatment effects for any differences between groups that existed before the experiment
o Holding moderating variable constant OR statistically equating subjects
§ More sensitive to detecting IV effects
o Whether covariate was an important influence or not

One-Way Within-Subjects/Repeated Measures ANOVA
• Multiple groups with 1 IV, within-subjects
• Same as for independent groups but denominator calculated differently
o Between/within-subjects factorials
o Mixed factorial designs

Analyzing Data from a Between-Subjects Factorial Experiment
• Factorial – effects of more than one IV + interaction effects
o Main effects, interaction effects
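The one-way between-subjects ANOVA quantities in these notes (mean squares, the F ratio, and the proportion of variance explained) can be computed from raw scores as follows. This is a sketch using the standard sums-of-squares formulas, not code from the book. The proportion-of-variance effect size is computed here as SS between over SS total, commonly called eta squared, and that label is an assumption because the notes do not name the statistic. The data and the name `one_way_anova` are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """F ratio, df, and proportion of variance explained for independent groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)

    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each score sits from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between  # MSB
    ms_within = ss_within / df_within     # MSW

    return {
        "F": ms_between / ms_within,
        "df": (df_between, df_within),
        "eta_squared": ss_between / (ss_between + ss_within),
    }


result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])  # hypothetical scores
print(f"F({result['df'][0]}, {result['df'][1]}) = {result['F']:.2f}")
print(f"proportion of variance explained = {result['eta_squared']:.4f}")
```

A significant F here would only say that the group means differ somewhere. Finding which pairs differ still needs post hoc tests or planned comparisons.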

• Factorial design – determining treatment effects more complex than just one IV
o 2x2 – Two-Way ANOVA
• Between-groups variability
o From error and treatment effects (but this time, may come differently from each IV)

Two-Way ANOVA
• Based on same assumptions as one-way ANOVA
o Independent treatment groups, etc.
????

Evaluating the F-ratios
• Comparison to table values of F
o Locate critical value of F using df on the F ratio
o Numerator and denominator df
§ Each ratio has own critical value

Calculating Effect Sizes

Graphing Factorials
• vertical axis – DV
• horizontal axis – different levels of one IV
• line graphs -> each line represent levels of the IV
• graphs do not substitute for statistical analysis

Repeated Measures and Mixed Factorial Designs
• ANOVA for within-subjects factorial + mixed factorial
• Total variability broken down into component parts = treatment and error; F ratio computed
o Main effects and interactions compared to critical values needed to reject the null (must exceed)

Interpreting Significant Effects
• Two significant main effects + significant interaction = interpretation tricky
o Two-way interaction significant? Limited conclusion about any significant main effects
o Interaction – discuss effects of IV in combination
§ Value of the impact depends on the value of the other
• Post hoc tests – pinpoint differences among experimental conditions involved in interaction (Ex. 4 for 2x2)
o Null – groups sampled from the same population; interaction would tell us they are probably not
o Post hoc – which of the four groups are different from the other
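The idea that, in an interaction, the impact of one IV depends on the value of the other can be read straight off the four cell means of a 2x2 design. The sketch below is purely descriptive and is not from the book: it computes differences between means and does not test them for significance, which would still need the two-way ANOVA. The function name `two_by_two_effects` and the cell means are hypothetical.

```python
def two_by_two_effects(cell_means):
    """Describe a 2x2 factorial from its cell means.

    cell_means[i][j] is the mean for level i of IV A and level j of IV B.
    """
    (a1b1, a1b2), (a2b1, a2b2) = cell_means

    # Main effect: difference between the averages for the two levels of one IV
    main_a = (a2b1 + a2b2) / 2 - (a1b1 + a1b2) / 2
    main_b = (a1b2 + a2b2) / 2 - (a1b1 + a2b1) / 2

    # Interaction: is the effect of B different at A2 than at A1?
    effect_of_b_at_a1 = a1b2 - a1b1
    effect_of_b_at_a2 = a2b2 - a2b1
    interaction = effect_of_b_at_a2 - effect_of_b_at_a1

    return {"main_A": main_a, "main_B": main_b, "interaction": interaction}


effects = two_by_two_effects([[10, 12], [14, 24]])  # hypothetical cell means
print(effects)
```

In this made-up example B raises scores by 2 points at A1 but by 10 points at A2. That is why a main effect of B cannot be interpreted on its own once the interaction is significant.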

Chapter 15 – Drawing Conclusions: The Search for the Elusive Bottom Line

Evaluating the Experiment from the Inside: INTERNAL VALIDITY
• Internally valid
o When effects of EVs have not been mistaken for the effects of the IV/s
o Free of confounding – effects on DV are due only to the treatment
• Experiments with obvious source of confounding – rarely make it to print
• Importance of planning ahead
o Procedures – appropriate methods and control techniques
o Enough levels of the IV = adequate test of hypothesis
o Random assignment; constancy of conditions; counterbalancing
• Consider what occurs during actual experiment – MANIPULATION CHECK
o Verifies how successfully the experimenter manipulated the situation he/she intended to produce
• Informal interviews/questionnaires following experiments
o Sense of whether subjects guessed the hypothesis
• PACT OF IGNORANCE (Orne, 1969)
o Ss might be aware that if they guess the hypothesis of the experiment, their data will be discarded
§ If asked – might not reveal all that they know
o Subjects need to believe that you really want a truthful answer
§ Ex. incentives for guessing hypotheses in informal interviews
o Experimenters might not press subjects for info as it could produce unusable data
§ Refuse to test additional subjects = accepting reports at face value = less objective
o Always be open to other potential explanations of findings
• Spending too much time listing any possible EVs
o If events are infrequent and could occur randomly for Ss in any condition = not considered a problem
§ Adding error only; no need to list
• Spending too little time thinking of alternative explanations = problem
• If something other than the IV can explain results = NOT INTERNALLY VALID
o Review classic threats to internal validity
o Discuss problems openly
o We can never directly confirm/"prove" hypotheses

Statistical Conclusion Validity
• Validity of drawing conclusions about treatment effects from statistical results that were obtained
o Assumptions violated – statistical conclusion validity is in doubt
§ Ex. using test statistic inappropriately (ex. ANOVA for nominal) = doubt
• Number of statistical tests computed + power of statistical test = common sources of low validity
• Statistical tests = probability statements only
o Chance of Type 1 error – too many pairwise comparisons
o Lowering of statistical conclusion validity
• Inferring cause/effect much riskier
o Small sample – reduce power of statistical tests; easier to get statistical significance with big samples
§ Even if barely – be wary when drawing conclusions
• Findings can be statistically significant, but not meaningful
o Ex. small effect sizes; overlapping of confidence intervals (type 1 error)
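The warning about too many pairwise comparisons can be made concrete. If each test uses α = .05, the chance of at least one Type 1 error across the whole set of tests grows quickly. This sketch is not from the book: it uses the standard formula 1 - (1 - α)^k and assumes the k tests are independent, which pairwise comparisons on the same groups are not exactly, so the figures are a rough guide. The function names are hypothetical.

```python
from math import comb


def pairwise_comparisons(n_groups):
    """Number of distinct pairs of treatment groups that could be compared."""
    return comb(n_groups, 2)


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one Type 1 error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


for groups in (3, 4, 5):
    k = pairwise_comparisons(groups)
    print(
        f"{groups} groups -> {k} pairwise tests -> "
        f"chance of at least one Type 1 error = {familywise_error(k):.3f}"
    )
```

With 4 groups there are 6 possible pairs, and the chance of at least one false positive is already about .26 instead of .05.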

Taking a Broader Perspective: The Problem of EXTERNAL VALIDITY

Generalizing from the Results
• External Validity – can generalize from the results
o Making findings more universal than they actually are
o Inductive thinking
• Two basic requirements:
o Must be internally valid (cause/effect, free of confounding)
o Can be replicated
• Valid experimental findings appear again and again; similar effects in similar studies

Generalizing Across Subjects
• Samples should be taken from populations one wants to discuss
• Practical problems prevent truly random samples
o BIAS in way human subjects are chosen (Volunteers different)
• Generality of research constrained by practical problems
o Ex. College students VS average adults
• The further from the population sampled, the shakier the position
o May also accept generality of findings for ethical reasons (Ex. animal testing for drugs)

Generalizing from Procedures to Concepts: RESEARCH SIGNIFICANCE
• Hess (1975)
• Theoretical issues – variables with multiple operational definitions
o Ex. anxiety, anger, learning, frustration
o OD -> conclusions about the concept itself
o Desirable to view findings from abstract perspective
§ Induction – new theories, application
• Procedures may not be super reliable or valid
o Hedging when discussing generalizability – qualifying statements
§ “suggest”; “appears that” = cannot state with certainty
• Formulating general conclusions – moving further away from actual observations made
o Going beyond what was actually observed
o Qualifying of conclusions – no way to be certain of generalizations
§ Ex. an IV may only work in specific circumstances
• Generality – expect findings to be consistent with prior studies

Research Significance
• Consistent with prior studies?
o Clarify/extend our knowledge?
o Implications for broader theoretical issues?
• Placing findings in context of prior research
o Inconsistent? – suspect experiment
o Reconciling with what is already known
• Conflicting findings – subtle differences in ODs, procedures, subjects used
• Theory building – can stand as long as it can explain observed results

Generalizing Beyond the Laboratory
• Lab experiments – specific, controlled conditions
o Elimination of EVs = precise tool of measuring effects of IVs
• HOWEVER, variables we study usually do not occur under the same controlled conditions in real life
o EVs uncontrolled in real life






Increasing External Validity

AGGREGATION
• Epstein – grouping together and averaging of data gathered in various ways
o Ex. Meta-analysis = statistics, effect sizes of different studies (ex. on Rosenthal Effect)

Four Types of Aggregation (Epstein)
• Aggregation Over Subjects
o Combining data from several subjects (ex. Large N studies)
o Pooling of data = group averages
§ Large sample more likely to be representative = greater external validity
• Aggregation Over Stimuli or Situations
o Stimuli must be sampled as effectively as we sample subjects + context of experiment
§ Ex. holiday season VS others
§ Variety of situations = more validity
• Aggregation Over Trials or Occasions
o Many trials, combining multiple testing sessions = minimizing effects associated with specific trials
o Various occasions of testing = minimizes effects from uniqueness of each testing session
§ Physical, social, personality, context variables
§ Ex. similar to use of GPA
• Aggregation Over Measures
o Multiple measuring procedures
§ Offsetting of errors from using inadequate instruments
o Converging lines of evidence for explanation of behaviour
o More confidence that results can be reproduced using different subjects, situations, occasions, and ways

MULTIVARIATE DESIGNS
• Multiple DVs
o Look at many DVs in combination
o Measurements can be made on 1+ samples
• Ex. Multiple correlation, Factor Analysis, Multivariate Analysis of Variance (MANOVA)
• MANOVA
o Measuring effects of IVs as they affect sets of DVs
§ Whether IVs influence DVs as they occur in combination
o Tests the effects of the IV on the whole set of measures at once
§ + interactions on measures
§ Higher order interactions possible (ex. several IVs operating together)
o Differences in trends among various DVs
§ Ex. stronger effect of IV on some measures VS others
o Much more info VS just ANOVA
• Advantage – can look at combinations of variables (may be more representative of reality)
• Factorial designs – considered “multivariate”
o More than 1 IV
o HOWEVER, the term “multivariate” is usually used for designs with multiple DVs

NONREACTIVE MEASUREMENTS
• REACTIVITY – reaction to being subjects; being observed; etc.
o Increase external validity by minimizing reactivity
• Ex. demand characteristics; good subject phenomenon; pact of ignorance
• Avoid giving unnecessary cues; single or double-blind experiments
• Controlling for social, personality variables
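
The meta-analysis example under AGGREGATION (combining effect sizes of different studies) can be sketched as a weighted average. This is only one common way to pool effect sizes, a fixed-effect, inverse-variance weighting, and the notes do not say which method is meant. The function name and the three effect sizes and variances below are hypothetical values chosen for illustration, not data from any study.

```python
from math import sqrt


def pooled_effect(effects, variances):
    """Inverse-variance weighted mean effect size and its standard error.

    Assumption: a fixed-effect model, where every study estimates the
    same underlying effect and differs only by sampling error.
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one variance per effect size")
    # More precise studies (smaller variance) get larger weights.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effects)) / total_weight
    standard_error = sqrt(1.0 / total_weight)
    return mean, standard_error


# Hypothetical effect sizes (Cohen's d) and their sampling variances.
effects = [0.30, 0.50, 0.40]
variances = [0.04, 0.02, 0.04]

mean, se = pooled_effect(effects, variances)
print(f"pooled d = {mean:.3f}, SE = {se:.3f}")
```

With these made-up numbers the pooled standard error is smaller than that of any single study, which is the sense in which aggregating over studies gives a more stable estimate.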




Developing Unobtrusive Measures
• Measuring subjects’ behaviour without letting them know they are being measured
o Also called nonreactive measures
• Not influenced by subjects’ reactions = greater external validity
• May depend on physical aspects of environment; data collected for other purposes
o Ex. reviews of hospital charts; standardized tests; reading preferences
• Observing subjects unobtrusively
o Marston, London, Cooper, Cohen – diners in a restaurant (obese VS thin)
o Cameras
• ETHICS – informed consent
o Not required for observations of public behaviour (ex. restaurant diners; driving cars)

FIELD EXPERIMENTS
• Same experimental manipulation, but in a natural setting
o Ex. Blais and Bacher – punishment on economic crimes
• Limitations:
o Little control over choice of participants; samples may not be random
o Hard to specify characteristics of the sample
o Less able to control EV
• Also can be used to validate findings obtained in the lab
o Ex. Greenberg and equity theory

NATURALISTIC OBSERVATION
• Miller – can be used to validate or add substance to previously obtained lab findings
o Can be used in a complementary way
• Naturalistic – suggesting of hypotheses; test hypotheses in the lab
o Verification again in naturalistic setting
• Experiment – best to use in combination with other modes of research
o Using research methods in combination – maintaining precision without sacrificing relevance

Handling a Nonsignificant Outcome

Faulty Procedures
• Review everything done – control procedures; random assignment; counterbalancing; demand characteristics? Experimenter bias?
• Numerous uncontrolled variables may increase amount of variability between individuals’ scores
o Harder to detect treatment effects
o Ex. unreliable measuring instrument; scoring error = net effect of increasing error variance
• Manipulations may not be powerful enough to override effect of other sources of variation
o Type 2 error – failing to reject the null even though effect was there
o Lower statistical conclusion validity








• Check if sample is large enough – maybe not enough power (Type 2 Error)
• Manipulation may also be inadequate – defining treatment levels important
o Did it measure what was intended?

• CEILING EFFECT – subjects tend to “top out” on the scale
• FLOOR EFFECT – people only use bottom of the range
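
The earlier bullet about checking whether the sample is large enough (not enough power = Type 2 error) can be illustrated with a rough power calculation. This sketch assumes a two-group comparison with equal group sizes and uses a normal approximation rather than an exact t-test; the function name and the medium effect size of d = 0.5 are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison.

    Assumption: normal approximation with equal group sizes; an exact
    t-test would give slightly lower power for small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic when the effect is real.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Same assumed effect (d = 0.5), different numbers of subjects per group.
for n in (10, 20, 64, 200):
    power = approx_power(0.5, n)
    print(f"n = {n:3d} per group -> power {power:.2f}, Type 2 error rate {1 - power:.2f}")
```

The Type 2 error rate is 1 minus power, so under these assumptions a real effect is missed more often than not with 20 subjects per group, while about 64 per group brings power to around .80.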

Faulty Hypothesis


