Psych Stat Hypotheses Analyzing Results Drawing Conclusions
PRINTED BY: [email protected]. Printing is for personal, private use only. No part of this book may be reproduced or transmitted without publisher's prior permission. Violators will be prosecuted.
• No way to directly test research hypotheses – no way to really prove they are correct
• Can only show that the pattern of results is unlikely to have come from chance
o If so, the null hypothesis is probably wrong
Testing the Null Hypothesis
• Statistical tests – formulation of null hypotheses
o Stating that the performances of the treatment groups are so similar that they must belong to the same population
• Rejecting the Null Hypothesis
o Difference between treatment groups is so large that chance variations cannot explain it
• Effect of the IV (ex. large effect)
o Can be shown by frequency distributions
The Process of Statistical Inference
1. Consider the population to be sampled
a. Variability – individual scores on the DV will differ
2. Consider different random samples in the population
a. Scores on the DV will differ because of normal variability
b. Assume the null hypothesis is correct
3. Apply treatment conditions to randomly selected, randomly assigned samples
4. IF, after treatment, the samples now appear to belong to different populations = Reject the Null Hypothesis
• Whether or not we reject the null hypothesis depends on variability
o Great deal of variability = large differences among scores
• More variability on the DV = the greater the difference between treatment groups has to be before the data look like they come from different populations
• The more variability, the harder it will be to reject the null hypothesis
Applying Statistical Inference
(the book works through a long example)
o Ex. directional hypothesis – predicts the direction in which the difference between groups will go
• Consider variability in the data – evaluate the data using statistical tests
o State the null hypothesis (ex. samples come from the same population)
• Normal curve -> symmetrical, bell-shaped curve
o Many scores fall close to the center
o Most close to the mean
§ The further a score is from the mean, the lower the frequency with which it appears
• Most tests assume the populations sampled are normally distributed on the variable measured
o As if everyone in that population were measured on that variable = normal-curve frequency distribution
• Known distributions
o Can be used to make inferences about data
o May require knowledge of probability theory (ex. the odds that variability alone produced an effect this strong)
• Null hypothesis (H0) tested for two reasons:
o Most likely explanation of what has occurred
§ Differences between groups are usually accounted for by expected (chance) variation
o No way to directly verify the alternative to the null hypothesis
• H1 states that the treatment means are so different that they come from distinct populations
o However, there is always some chance that the results were caused by sampling error
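The inference logic above (chance variability sets how large a difference must be, and a known distribution turns that into a probability) can be sketched with Python's standard-library `NormalDist`. This is a simplified one-sample z test, assuming the population mean and SD are known; the notes themselves compare treatment groups, and the function name and all numbers here are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, pop_mean, pop_sd, n):
    """One-sample z test (simplified: assumes the population SD is known).

    Returns z and the two-tailed probability of a sample mean at least
    this far from pop_mean if the null hypothesis is true.
    """
    standard_error = pop_sd / sqrt(n)       # expected chance variation of a sample mean
    z = (sample_mean - pop_mean) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails of the normal curve
    return z, p


# Same 6-point difference from the population mean, two amounts of variability
z_low, p_low = z_test(106, 100, 15, 25)    # less variability
z_high, p_high = z_test(106, 100, 30, 25)  # more variability

print(f"SD 15: z = {z_low:.2f}, p = {p_low:.3f}")
print(f"SD 30: z = {z_high:.2f}, p = {p_high:.3f}")
```

With the smaller SD the 6-point difference gives z = 2.00 (p ≈ .046, below .05); doubling the SD halves z to 1.00 (p ≈ .317), so the same difference no longer looks unlikely under H0.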
Test Statistics
• Inferential/test statistics
o Can be used as indicators of what is going on in the population
o Can be used to evaluate results; a numerical summary of what is going on in the data
• Transform the relationship between the treatment difference and the variability into a simple quantitative measure
• LARGER VALUE OF TEST STATISTIC = MORE LIKELY IV HAD A SIGNIFICANT EFFECT
o Large = treatment difference is large relative to the amount of variability
§ More likely to reject the null
• Each test statistic has its own distribution of values
Organizing and Summarizing the Data
Organizing
Summarizing – Descriptives
• Raw data – data we record as we run an experiment
• Summary data – reported results of the experiment
o Unless N is small, group data should be reported, not individual scores
• Descriptive statistics – shorthand ways of describing (group) data
Measures of Central Tendency
• Describe what is typical of a distribution of scores
o Mode – score that occurs most often
o Median – score that divides the distribution in half
§ Half the scores fall above the median, half below
o Mean – arithmetic average
§ Sum of the scores divided by the total number of scores
• Indicators of the shape of distributions
o Mean = median = mode -> symmetrical distribution
• Skewness – one tail of the distribution is longer than the other
• Ex. income -> subject to distortion
o Not symmetrical but positively skewed (long tail on the right/positive side)
o Mean higher than median or mode
• b) Bimodal – two scores can both be the mode
• c) Positively skewed
• d) Negatively skewed
Measures of Variability
• Defined numerically by one of several descriptive statistics:
o Range, variance, SD
o Makes it possible to compare the variability of different samples
• Range – difference between the largest and smallest scores in the data
o Indication of the spread of scores
o Does not reflect the precise amount of variability in all the scores
• Variance ($s^2$) – average squared deviation of scores from their mean
o How much scores are dispersed/spread out around the mean of the data
• Standard deviation ($s$) – square root of the variance
o Average deviation of scores about the mean
o Total of the deviations from the mean is always 0 (which is why deviations are squared before averaging)
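The measures of central tendency, shape and variability above map onto Python's standard `statistics` module. A minimal sketch with made-up scores; note that `variance` and `stdev` use the sample formulas (dividing by N − 1) while `pvariance` divides by N, so it is worth checking which version a given textbook formula uses.

```python
from statistics import fmean, median, mode, multimode, pvariance, stdev, variance

scores = [2, 3, 3, 5, 7]  # hypothetical DV scores

# Central tendency
print(mode(scores), median(scores), fmean(scores))  # 3 3 4.0
print(multimode([1, 2, 2, 3, 4, 4]))                # [2, 4] -> bimodal

# Variability
score_range = max(scores) - min(scores)
deviations = [x - fmean(scores) for x in scores]
print(score_range, sum(deviations))                 # 5 0.0 (deviations always total 0)
# Sample variance/SD divide by N - 1; pvariance divides by N
print(float(variance(scores)), stdev(scores), pvariance(scores))  # 4.0 2.0 3.2

# Positive skew: a few very large values pull the mean above the median
incomes = [30, 35, 40, 45, 250]  # hypothetical incomes, in thousands
print(fmean(incomes), median(incomes))              # 80.0 40
```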
• Value obtained must exceed the critical value for the appropriate df
o Assumptions of the test CAN be violated (within limits) without much change in the rates of Type 1 and Type 2 errors (robust test)
§ Ex. normality, equality of variability across groups
• Effect size – transforming t values and dfs into a correlation coefficient r
o $r = \sqrt{t^2 / (t^2 + df)}$
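The t-test steps above (compute t, compare it with the critical value for its df, convert it to r) can be sketched with Python's standard library. A minimal sketch under stated assumptions: two independent groups with roughly equal variances (pooled variance), hypothetical data and function names, and a critical value of 2.306 taken from a standard t table for df = 8, two-tailed, α = .05. The effect size uses $r = \sqrt{t^2 / (t^2 + df)}$.

```python
from math import sqrt
from statistics import fmean, variance


def independent_t(group_a, group_b):
    """Independent-samples t (pooled variance) and its df."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their df
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (fmean(group_b) - fmean(group_a)) / standard_error, df


def t_to_r(t, df):
    """Effect size r from a t value and its df."""
    return sqrt(t ** 2 / (t ** 2 + df))


CRITICAL_T = 2.306  # from a t table: df = 8, two-tailed, alpha = .05

# Both pairs of groups differ by 5 points on average; the second pair is far more variable
t_low, df_low = independent_t([10, 11, 12, 13, 14], [15, 16, 17, 18, 19])
t_high, df_high = independent_t([4, 8, 12, 16, 20], [9, 13, 17, 21, 25])

for t, df in ((t_low, df_low), (t_high, df_high)):
    print(round(t, 2), df, abs(t) > CRITICAL_T, round(t_to_r(t, df), 2))
```

Both comparisons involve the same 5-point mean difference, but only the low-variability groups clear the critical value (t = 5.0 vs. 1.25), and their effect size is correspondingly larger (r ≈ .87 vs. .40).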
A One-Way ANOVA (Between-Subjects)
• Between-subjects, ratio data, 1 IV
o Null – means of the groups come from the same population
• Assumptions: treatment groups independent; random selection; normality on the DV
o Equality of variances = HOMOGENEOUS
• Robust test, like the t-test
• Effect size
o $\eta^2 = SS_B / SS_T$ – proportion of variance in all the scores that can be explained by the treatment
Within and Between Groups Variability
• MS – mean square; variability, variance
o Average squared deviation
• $F = MS_B / MS_W$ – between-groups variance over within-groups (error) variance
Graphing the Results
• Data points usually represent group means
o Data of individual subjects usually not graphed unless N is small
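The pieces of a between-subjects one-way ANOVA (mean squares, the F ratio and $\eta^2$) can be computed by hand in a few lines. A minimal sketch using only Python's standard library; the function name and the three groups of scores are hypothetical, and the critical value 3.89 is assumed from a standard F table for df = (2, 12) at α = .05.

```python
from statistics import fmean


def one_way_anova(*groups):
    """Return F, eta squared and (df_between, df_within) for independent groups."""
    scores = [x for group in groups for x in group]
    grand_mean = fmean(scores)
    # Between groups: spread of the group means around the grand mean (weighted by n)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of scores around their own group mean (error)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    ms_between = ss_between / df_between  # MS_B
    ms_within = ss_within / df_within     # MS_W
    f_ratio = ms_between / ms_within
    eta_squared = ss_between / (ss_between + ss_within)  # SS_B / SS_T
    return f_ratio, eta_squared, (df_between, df_within)


f_ratio, eta_sq, dfs = one_way_anova([2, 3, 4, 5, 6], [4, 5, 6, 7, 8], [6, 7, 8, 9, 10])
CRITICAL_F = 3.89  # from an F table: df = (2, 12), alpha = .05
print(f_ratio, round(eta_sq, 3), dfs, f_ratio > CRITICAL_F)  # 8.0 0.571 (2, 12) True
```

Here MS_B = 20 and MS_W = 2.5, so F = 8.0 exceeds the tabled value and about 57% of the variance in all the scores is attributable to the treatment.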
Interpreting the Results
• F only tests the overall pattern of differences among the means
o Significant F – somewhere across the group means there is a significant difference
§ 3+ groups – ANOVA does not identify the specific differences between each pair of means
• POST-HOC TESTS – tests done after a significant overall analysis
o Tukey, Scheffé
§ Pair-by-pair comparisons
o Can, however, result in less power to detect treatment effects + increase the chances of a Type 2 error
o A series of ordinary t-tests for all pairwise comparisons instead = inflated chance of a Type 1 error
• A PRIORI COMPARISONS
o Tests between specific treatment groups that were anticipated/planned before the experiment was conducted
§ Planned comparisons/contrasts
o As long as the number of comparisons is less than the number of treatment groups, the chance of a Type 1 error is not increased
o More powerful than post hoc tests (less conservative)
Statistical Control for Differences Between Groups
• Moderating variable – one that can moderate/change the influence of the IV
• ANCOVA – Analysis of Covariance
o Controls statistically for potential moderating variables
o Removes the variance associated with the moderating variable (covariate) from the error variance
§ Refines estimates of experimental error
§ Adjusts treatment effects for any differences between groups that existed before the experiment
o Holding the moderating variable constant OR statistically equating subjects
§ More sensitive to detecting IV effects
o Also shows whether or not the covariate was an important influence
One-Way Within-Subjects/Repeated Measures ANOVA
• Multiple groups with 1 IV, within-subjects
• Same as for independent groups, but the denominator (error term) is calculated differently
o Between/within-subjects factorials
o Mixed factorial designs
Analyzing Data from a Between-Subjects Factorial Experiment
• Factorial – effects of more than one IV + interaction effects
o Main effects, interaction effects
Evaluating the F-ratios
• Comparison to table values of F
o Locate the critical value of F using the df of the F ratio
o Numerator and denominator df
§ Each ratio has its own critical value
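What the factorial F-ratios test can be seen directly in the cell means. A minimal sketch for a hypothetical 2 × 2 between-subjects design (equal n per cell assumed; factor names, levels and means are invented): main effects compare marginal means, and the interaction asks whether the effect of one IV changes across levels of the other. This only describes the pattern; whether each effect is significant still depends on its own F ratio.

```python
from statistics import fmean

# Hypothetical cell means: (level of IV A, level of IV B) -> mean DV score
cells = {("a1", "b1"): 10, ("a1", "b2"): 10, ("a2", "b1"): 12, ("a2", "b2"): 20}


def marginal_mean(factor, level):
    """Average the cell means at one level of a factor (0 = A, 1 = B); equal n assumed."""
    return fmean(m for key, m in cells.items() if key[factor] == level)


main_effect_a = marginal_mean(0, "a2") - marginal_mean(0, "a1")  # 16 - 10
main_effect_b = marginal_mean(1, "b2") - marginal_mean(1, "b1")  # 15 - 11

# Interaction: difference between the effect of A at b2 and the effect of A at b1
effect_a_at_b1 = cells[("a2", "b1")] - cells[("a1", "b1")]
effect_a_at_b2 = cells[("a2", "b2")] - cells[("a1", "b2")]
interaction = effect_a_at_b2 - effect_a_at_b1

print(main_effect_a, main_effect_b, interaction)  # 6.0 4.0 8
```

A nonzero interaction (here A raises scores by 2 points at b1 but by 10 at b2) would show up as non-parallel lines in a graph of the cell means.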
Chapter 15 – Drawing Conclusions: The Search for the Elusive Bottom Line
Evaluating the Experiment from the Inside: INTERNAL VALIDITY
• Internally valid
o When the effects of EVs have not been mistaken for the effects of the IV(s)
o Free of confounding – effects on the DV are due only to the treatment
• Experiments with an obvious source of confounding rarely make it into print
• Importance of planning ahead
o Procedures – appropriate methods and control techniques
o Enough levels of the IV = adequate test of the hypothesis
o Random assignment; constancy of conditions; counterbalancing
• Consider what occurs during the actual experiment – MANIPULATION CHECK
o Verifies how successfully the experimenter produced the situation he/she intended to produce
• Informal interviews/questionnaires following experiments
o Sense of whether subjects guessed the hypothesis
• PACT OF IGNORANCE (Orne, 1969)
o Ss might be aware that if they guess the hypothesis of the experiment, their data will be discarded
§ If asked, they might not reveal all that they know
o Subjects need to believe that you really want a truthful answer
§ Ex. incentives for guessing hypotheses in informal interviews
o Experimenters might not press subjects for info, as it could produce unusable data
§ Reluctant to test additional subjects = accepting reports at face value = less objective
o Always be open to other potential explanations of findings
• Spending too much time listing every possible EV
o If events are infrequent and could occur randomly for Ss in any condition = not considered a problem
§ They only add error; no need to list them
• Spending too little time thinking of alternative explanations = problem
• If something other than the IV can explain the results = NOT INTERNALLY VALID
o Review classic threats to internal validity
o Discuss problems openly
o We can never directly confirm/“prove” hypotheses
Statistical Conclusion Validity
• Validity of drawing conclusions about treatment effects from the statistical results that were obtained
o Assumptions violated – statistical conclusion validity is in doubt
§ Ex. using a test statistic inappropriately (ex. ANOVA for nominal data) = doubt
• Number of statistical tests computed + power of the statistical test = common sources of low validity
• Statistical tests = probability statements only
o Chance of a Type 1 error rises with too many pairwise comparisons
o Lowers statistical conclusion validity
• Inferring cause/effect is much riskier
o Small samples reduce the power of statistical tests; statistical significance is easier to get with big samples
§ If results are only barely significant, be wary when drawing conclusions
• Findings can be statistically significant but not meaningful
o Ex. small effect sizes; overlapping confidence intervals (possible Type 1 error)
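Two of the threats to statistical conclusion validity listed above, running many tests and sample size, can be put in numbers. A minimal sketch with hypothetical function names: the familywise formula assumes the tests are independent (pairwise comparisons among the same groups are not, so treat it as an approximation), and the t formula assumes two equal-sized groups whose means differ by d pooled SDs (Cohen's d, a measure not covered in these notes).

```python
from math import sqrt


def familywise_error(alpha, n_tests):
    """Chance of at least one Type 1 error across n independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** n_tests


def t_for_effect(d, n_per_group):
    """t for two equal-sized groups whose means differ by d pooled SDs."""
    return d * sqrt(n_per_group / 2)


# 5 groups -> 10 pairwise t-tests at alpha = .05
print(round(familywise_error(0.05, 10), 3))  # 0.401

# The same small effect (d = 0.1) with small vs. very large samples
print(round(t_for_effect(0.1, 20), 2), round(t_for_effect(0.1, 2000), 2))  # 0.32 3.16
```

With 10 comparisons the chance of at least one false positive is roughly 40%, not 5%; and a trivially small effect yields a large t once samples are big enough, which is why a significant result is not automatically a meaningful one.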
Taking a Broader Perspective: The Problem of EXTERNAL VALIDITY
Generalizing from the Results
• External validity – how far we can generalize from the results
o Making findings more universal than the specific conditions actually tested
o Inductive thinking
• Two basic requirements:
o Must be internally valid (cause/effect, free of confounding)
o Can be replicated
• Valid experimental findings appear again and again; similar effects in similar studies
Generalizing Across Subjects
• Samples should be taken from the populations one wants to discuss
• Practical problems prevent truly random samples
o BIAS in the way human subjects are chosen (volunteers are different)
• Generality of research constrained by practical problems
o Ex. college students VS average adults
• The further from the population sampled, the shakier the position is
o May also accept generality of findings for ethical reasons (ex. animal testing for drugs)
Generalizing from Procedures to Concepts: RESEARCH SIGNIFICANCE
• Hess (1975)
• Theoretical issues – variables with multiple operational definitions
o Ex. anxiety, anger, learning, frustration
o OD -> conclusions about the concept itself
o Desirable to view findings from an abstract perspective
§ Induction – new theories, application
• Procedures may not be fully reliable or valid
o Hedging when discussing generalizability – qualifying statements
§ “suggest”; “appears that” = cannot state with certainty
• Formulating general conclusions – moving further away from the actual observations made
o Going beyond what was actually observed
o Qualifying of conclusions – no way to be certain of generalizations
§ Ex. an IV may only work under specific circumstances
• Generality – expect findings to be consistent with prior studies
Research Significance
• Consistent with prior studies?
o Clarify/extend our knowledge?
o Implications for broader theoretical issues?
• Placing findings in the context of prior research
o Inconsistent? – suspect the experiment
o Reconciling with what is already known
• Conflicting findings – subtle differences in ODs, procedures, subjects used
• Theory building – a theory can stand as long as it can explain the observed results
Generalizing Beyond the Laboratory
• Lab experiments – specific, controlled conditions
o Elimination of EVs = precise tool for measuring the effects of IVs
• HOWEVER, the variables we study usually do not occur under the same controlled conditions in real life
o EVs uncontrolled in real life
FIELD EXPERIMENTS
• Same experimental manipulation, but in a natural setting
o Ex. Blais and Bacher – punishment on economic crimes
• Limitations:
o Little control over choice of participants; samples may not be random
o Hard to specify characteristics of the sample
o Less able to control EVs
• Also can be used to validate findings obtained in the lab
o Ex. Greenberg and equity theory
Faulty Procedures
• Review everything done – control procedures? Random assignment? Counterbalancing? Demand characteristics? Experimenter bias?
• Numerous uncontrolled variables may increase the amount of variability between individuals' scores
o Treatment effects become harder to detect
o Ex. unreliable measuring instrument; scoring error = net effect of increasing error variance
• Manipulations may not be powerful enough to override the effect of other sources of variation
o Type 2 error – unable to reject the null even though the effect was there
o Lower statistical conclusion validity
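• CEILING EFFECT – subjects tend to “top out” on the scale, so scores pile up at the top and differences between groups are hidden
• FLOOR EFFECT – people only use the bottom of the range, so scores pile up at the bottom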
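A ceiling effect can hide a real treatment effect, which is one way faulty procedures lead to a Type 2 error. A minimal sketch, assuming a hypothetical 0 to 10 rating scale and made-up "true" scores: anything the scale cannot represent gets recorded at its nearest end point.

```python
from statistics import fmean

SCALE_MIN, SCALE_MAX = 0, 10  # hypothetical rating scale limits


def recorded(true_scores):
    """Scores beyond the ends of the scale are recorded at the nearest end point."""
    return [min(max(s, SCALE_MIN), SCALE_MAX) for s in true_scores]


control = [6, 7, 8, 9, 10]
treatment = [9, 10, 11, 12, 13]  # true effect: +3 points

true_difference = fmean(treatment) - fmean(control)
observed_difference = fmean(recorded(treatment)) - fmean(recorded(control))
print(true_difference, round(observed_difference, 2))  # 3.0 1.8
```

The treated group tops out at 10, so the recorded difference shrinks from 3 points to 1.8; the same clipping at SCALE_MIN would produce a floor effect.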
Faulty Hypothesis