Coursera Statistics One - Notes and Formulas
Raw score formula:
  r = SPxy / √(SSx · SSy)
  SSx = Σ(X - Mx)² = Σ[(X - Mx)(X - Mx)]
  SSy = Σ(Y - My)² = Σ[(Y - My)(Y - My)]
  SPxy = Σ[(X - Mx)(Y - My)]
  r = SPxy / √(SSx · SSy)
    = Σ[(X - Mx)(Y - My)] / √[Σ(X - Mx)² · Σ(Y - My)²]
Z-score formula:
  r = Σ(zx · zy) / N
  zx = (X - Mx) / SDx
  zy = (Y - My) / SDy
  SDx = √[Σ(X - Mx)² / N]
  SDy = √[Σ(Y - My)² / N]
Proof of equivalence:
  zx = (X - Mx) / √[Σ(X - Mx)² / N]
  zy = (Y - My) / √[Σ(Y - My)² / N]
  r = Σ{ [(X - Mx) / √(Σ(X - Mx)² / N)] · [(Y - My) / √(Σ(Y - My)² / N)] } / N
  r = Σ[(X - Mx)(Y - My)] / √[Σ(X - Mx)² · Σ(Y - My)²]
  r = SPxy / √(SSx · SSy)
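A quick R check of the raw-score formula against R's built-in cor(); the two vectors are made-up data:

    x <- c(3, 5, 6, 8, 10)
    y <- c(2, 4, 5, 9, 9)
    sp.xy <- sum((x - mean(x)) * (y - mean(y)))  # sum of cross products
    ss.x  <- sum((x - mean(x))^2)                # sum of squares for X
    ss.y  <- sum((y - mean(y))^2)                # sum of squares for Y
    sp.xy / sqrt(ss.x * ss.y)                    # r, identical to cor(x, y)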
Variance and covariance
  Variance = MS = SS / N
  Covariance = COV = SP / N
  Correlation is standardized COV
    Standardized so the value is in the range -1 to +1
  Note on the denominators:
    Correlation for descriptive purposes: divide by N
    Correlation for inferential purposes: divide by N - 1
- L4c: Interpreting correlations
Assumptions for correlation
  Normal distributions for X and Y
  Linear relationship between X and Y
  Homoskedasticity
Reliability of a correlation
  Does the correlation reflect more than just chance covariance?
  One approach to this question is to use NHST
  o H0 = null hypothesis: e.g., r = 0
  o HA = alternative hypothesis: e.g., r > 0
Truth vs. Decision:
                Retain H0                    Reject H0
  H0 true       Correct Decision             Type I error (False alarm)
                p = (1 - α)                  p = α
  H0 false      Type II error (Miss)         Correct Decision
                p = β = (1 - POWER)          p = (1 - β) = POWER
NHST
- p = P(D|H0)
- Given that the null hypothesis is true, the probability of these, or
  more extreme data, is p
- NOT: The probability of the null hypothesis being true is p
- In other words, P(D|H0) ≠ P(H0|D)
NHST can be applied to:
- r (Is the correlation significantly different from zero?)
- r1 vs. r2 (Is one correlation significantly larger than another?)
There are other correlation coefficients:
  Point biserial r => When 1 variable is continuous and 1 is dichotomous
  Phi coefficient => When both variables are dichotomous
  Spearman rank correlation => When both variables are ordinal (ranked data)
- L5a: Reliability & Validity
Reliability
- Classical test theory
  Raw scores (X) are not perfect
  They are influenced by bias and chance error
  In a perfect world, we would obtain a true score
- What is a regression?
  A statistical analysis used to predict scores on an outcome variable,
  based on scores on one or more predictor variables
  For example, we can predict how many runs a baseball player will score
  (Y) if we know the player's batting average (X)
Regression equation
  Y = B0 + B1X1 + e
  Ŷ = B0 + B1X1   # Ŷ is the predicted score on Y
  Y - Ŷ = e       # e is the prediction error (residual)
Estimation of coefficients
  The values of the coefficients (B) are estimated such that the model
  yields optimal predictions
  Minimize the residuals!
  The sum of the squared (SS) residuals is minimized:
  SS.RESIDUAL = Σ(Ŷ - Y)²
  ORDINARY LEAST SQUARES estimation
Sum of Squared deviation scores (SS) in variable X = SS.X; in Y = SS.Y
Sum of Cross Products (SP.XY), also called SS.MODEL
[Venn diagram: circles SS.X and SS.Y overlap in SP.XY (SS.MODEL); the
remainder of SS.Y is SS.RESIDUAL]
SS.Y = SS.MODEL + SS.RESIDUAL
How to calculate B (unstandardized)
  B = r × (SDy / SDx)
Standardized regression coefficient = β = r
  If X and Y are standardized then SDy = SDx = 1, so
  B = r × (SDy / SDx) = r
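A minimal R sketch of these identities with made-up data; the lm() slope reproduces r × (SDy/SDx), and with z-scored variables the slope is r itself:

    x <- c(1, 2, 4, 5, 8)
    y <- c(2, 3, 5, 4, 9)
    cor(x, y) * (sd(y) / sd(x))   # B by the formula
    coef(lm(y ~ x))["x"]          # B by OLS: same value
    zx <- as.vector(scale(x))     # standardize both variables
    zy <- as.vector(scale(y))
    coef(lm(zy ~ zx))["zx"]       # standardized slope = cor(x, y)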
- L7b: A Closer Look at NHST
H0 = null hypothesis: e.g., r = 0, B = 0
HA = alternative hypothesis: e.g., r > 0, B > 0
Assume H0 is true, then calculate the probability of observing data with
these characteristics, given that H0 is true
Thus, p = P(D|H0)
If p < α then Reject H0, else Retain H0
- t = B / SE
  B is the unstandardized regression coefficient
  SE = standard error
  SE = √[SS.RESIDUAL / (N - 2)]
Problems
- Biased by N
  The p-value is based on the t-value (t = B / SE, with
  SE = √[SS.RESIDUAL / (N - 2)]); other things being equal, larger
  samples produce larger t-values and smaller p-values even when the
  size of the effect is unchanged
- Binary outcome
  Technically speaking, one must Reject or Retain the Null Hypothesis
  What if p = .06?
- Null model is a weak hypothesis
  Demonstrating that your model does better than NOTHING is not very
  impressive
Alternatives to NHST
- Effect size
  Correlation coefficient (r)
  Standardized regression coefficient (β)
  Model R²
- Confidence intervals
  Sample statistics are point estimates
    Specific to the sample
    Will vary as a function of sampling error
  Instead, report interval estimates
    Width of the interval is a function of standard error
- Model comparison
  Propose multiple models: Model A, Model B
  Compare Model R²
- L8a: Introduction to Multiple Regression
Simple vs. multiple regression
  Simple regression => Just one predictor (X)
  Multiple regression => Multiple predictors (X1, X2, X3, ...)
Multiple regression equation
  Just add more predictors (multiple Xs):
  Ŷ = B0 + B1X1 + B2X2 + B3X3 + ... + BkXk
  Ŷ = B0 + Σ(BkXk)
  Ŷ = predicted value on the outcome variable Y
  B0 = predicted value on Y when all X = 0
  Xk = predictor variables
  Bk = unstandardized regression coefficients
  Y - Ŷ = residual (prediction error)
  k = the number of predictor variables
Model R and R²
  R = multiple correlation coefficient
  R = r(Ŷ, Y)
    The correlation between the predicted scores and the observed scores
  R²
    The percentage of variance in Y explained by the model
Types of multiple regression
  Standard
  Sequential (aka hierarchical)
  The difference between these approaches is how they handle the
  correlations among predictor variables
    If X1, X2, and X3 are not correlated then the type of regression
    analysis doesn't matter
    If the predictors are correlated then the different methods will
    return different results
  Standard
    o All predictors are entered into the regression equation at the
      same time
  Sequential
    o Predictors are entered into the regression equation in ordered
      steps; the order is specified by the researcher
    o Each predictor is assessed in terms of what it adds to the
      equation at its point of entry
    o Often useful to assess the change in R² from one step to another
      (see the sketch below)
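A minimal R sketch of the two approaches, assuming a hypothetical data frame d with outcome Y and predictors X1 and X2:

    fit.standard <- lm(Y ~ X1 + X2, data = d)   # standard: all at once
    summary(fit.standard)
    fit.step1 <- lm(Y ~ X1, data = d)           # sequential: Step 1
    fit.step2 <- lm(Y ~ X1 + X2, data = d)      # sequential: Step 2
    summary(fit.step2)$r.squared - summary(fit.step1)$r.squared  # change in R²
    anova(fit.step1, fit.step2)                 # F test of that change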
The inverse of a matrix is similar to the reciprocal of a scalar
- Raw data matrix
  Subjects as rows, variables as columns
- Row vector of means
  M1p = T1p × N⁻¹ = (34 35 34) × 10⁻¹ = (3.4 3.5 3.4)
- Matrix of means
- Sum of squares and cross-products matrix
- Variance-covariance matrix
- Diagonal matrix of standard deviations
- Correlation matrix
L8c: Estimation of Coefficients
Still ORDINARY LEAST SQUARES estimation, but using matrix algebra
  The values of the coefficients (B) are estimated such that the model
  yields optimal predictions
  o Minimize the residuals!
  o The sum of the squared (SS) residuals is minimized:
    SS.RESIDUAL = Σ(Ŷ - Y)²
  o ORDINARY LEAST SQUARES estimation
Regression equation
  o Ŷ = B0 + B1X1   # Ŷ is the predicted score on Y
  o Y - Ŷ = e       # e is the prediction error (residual)
Regression equation, matrix form
  o Ŷ = XB (design matrix X times coefficient vector B)
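The notes stop at the matrix form of the prediction equation; behind it sits the standard OLS solution B = (X′X)⁻¹X′Y. A minimal R sketch with simulated data, checked against lm():

    set.seed(1)
    n  <- 10
    x1 <- rnorm(n); x2 <- rnorm(n)
    y  <- 2 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
    X <- cbind(1, x1, x2)                  # design matrix, intercept column first
    B <- solve(t(X) %*% X) %*% t(X) %*% y  # B = (X'X)^-1 X'y
    cbind(B, coef(lm(y ~ x1 + x2)))        # identical estimates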
SES moderates the relationship between psychological trait and behavioral
outcome (e.g., true for high SES, but not for lower SES)
A mediation analysis is typically conducted to better understand an
observed correlation between X and Y
  E.g., why is extraversion correlated with happiness?
We know from simple regression analysis that if X and Y are correlated
then we can use regression to predict Y from X
  Y = B0 + B1X + e
  o lm(Y~X+M)
    Regression coefficient for M should be significant
    Regression coefficient for X? (If X becomes ns => full mediation; if
    X remains significant => partial mediation)
E.g.:
- Assume N = 188
- Participants surveyed and asked to report:
  o Happiness (happy)
  o Extraversion (extra)
  o Diversity of life experiences (diverse)
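A minimal R sketch of the mediation steps with these variable names, assuming they live in a hypothetical data frame d:

    lm(happy ~ extra, data = d)            # c path: X predicts Y
    lm(diverse ~ extra, data = d)          # a path: X predicts M
    fit <- lm(happy ~ extra + diverse, data = d)  # b and c' paths
    summary(fit)  # diverse should be significant; extra ns => full
                  # mediation, extra still significant => partial mediation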
- L10b: Path Analysis Method for Mediation
Mediation analyses are typically illustrated using path models
  Rectangles: Observed variables (X, Y, M)
  Circles: Unobserved variables (e)
  Triangles: Constants
  Arrows: Associations (more on these later)
Path model with a mediator
  To avoid confusion, let's label the paths:
  a: Path from X to M
  b: Path from M to Y
  c: Direct path from X to Y (before including M)
  c′: Direct path from X to Y (after including M)
How to test for mediation
  Three regression equations can now be re-written with the new notation:
  Y = B0 + cX + e
  Y = B0 + c′X + bM + e
  M = B0 + aX + e
- The Sobel test
  z = (Ba × Bb) / √(Ba² × SEb² + Bb² × SEa²)
  o The null hypothesis: the indirect effect is zero, (Ba × Bb) = 0
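A minimal R sketch of the Sobel z, assuming a hypothetical data frame d with columns X, M, and Y:

    fit.a <- lm(M ~ X, data = d)        # a path
    fit.b <- lm(Y ~ X + M, data = d)    # b and c' paths
    Ba  <- coef(fit.a)["X"]
    SEa <- coef(summary(fit.a))["X", "Std. Error"]
    Bb  <- coef(fit.b)["M"]
    SEb <- coef(summary(fit.b))["M", "Std. Error"]
    (Ba * Bb) / sqrt(Ba^2 * SEb^2 + Bb^2 * SEa^2)  # Sobel z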
Results in path model => Interpretation
  SES moderates the relationship between extraversion and happiness
Moral of the story: The picture can change, literally, when you consider
a new variable
Quick example:
  Working memory capacity (X)
  SAT (Y)
  Type of University (Z)
    o Large Public State University
    o Ivy League (ZAP!)
  => Interpretation: Type of University moderates the relationship
  between WMC and SAT
Moderation model
  Y = B0 + B1X + B2Z + B3(X*Z) + e
How to test for moderation
  Run just one regression model: lm(Y~X + Z + X*Z)
  o Need to create a new column for (X*Z)
  o Let's call it PRODUCT
Centering predictors
- To center means to put in deviation form
  XC = X - M
- Why center? Two reasons:
  o Conceptual reason
    Suppose Y = child's language development
      X1 = mother's vocabulary
      X2 = child's age
    The intercept, B0, is the predicted score on Y when all X are zero
    If X = zero is meaningless, or impossible, then B0 will be difficult
    to interpret
    If X = zero is the average then B0 is easy to interpret
    The regression coefficient B1 is the slope for X1 assuming an
    average score on X2
    No moderation implies that B1 is consistent across the entire
    distribution of X2
    However, moderation implies that B1 is NOT consistent across the
    entire distribution of X2
    Where in the distribution of X2 is B1 most representative?
    Let's look at this graphically
  o Statistical reason
    The predictors, X1 and X2, can become highly correlated with the
    product, X1*X2
    Can result in multicollinearity
Centering for moderation: Summary
  Center the predictors
  Run a sequential regression (2 steps)
    Step 1: Main effects
    Step 2: Moderation effect
  Evaluate B for PRODUCT, or the change in R² from Model 1 to Model 2
  (see the sketch below)
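A minimal R sketch of this summary, assuming a hypothetical data frame d with outcome Y, predictor X, and moderator Z:

    d$X.C     <- d$X - mean(d$X)       # center the predictor
    d$Z.C     <- d$Z - mean(d$Z)       # center the moderator
    d$PRODUCT <- d$X.C * d$Z.C         # moderation term
    fit.main <- lm(Y ~ X.C + Z.C, data = d)            # Step 1: main effects
    fit.mod  <- lm(Y ~ X.C + Z.C + PRODUCT, data = d)  # Step 2: moderation
    summary(fit.mod)$coefficients["PRODUCT", ]  # B for PRODUCT
    anova(fit.main, fit.mod)                    # test of the change in R²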
Dummy coding
  A system to code categorical predictors in a regression analysis
  Example
    IV: Area of research
      Cognitive
      Social
      Neuroscience
      Cognitive neuroscience
    DV: # of publications
  Regression model
    Ŷ = B0 + B1(C1) + B2(C2) + B3(C3)
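A minimal R sketch of dummy coding with made-up data; a k = 4 level factor needs k - 1 = 3 dummy codes (C1, C2, C3), which R builds automatically from a factor:

    area <- factor(c("Cognitive", "Social", "Neuro", "CogNeuro",
                     "Cognitive", "Social", "Neuro", "CogNeuro"))
    pubs <- c(10, 6, 12, 14, 8, 5, 15, 11)   # hypothetical # of publications
    contrasts(area)           # the 3 dummy codes; first level = reference group
    summary(lm(pubs ~ area))  # B0 = reference group mean; each B = difference
                              # between a group and the reference group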
Regression model: Before moderation
  Ŷ = B0 + B1(PUBS.C) + B2(C1) + B3(C2)
Interpretation of results
  The estimated salary for a Psychologist with 15.5 pubs is 58,482
  The average return per publication across all three departments is 926
  When taking into account publications, Historians earn 10,447 more than
  Psychologists
  When taking into account publication rate, Sociologists earn 8,282 more
  than Psychologists
Regression model: Moderation
  Ŷ = B0 + B1(PUBS.C) + B2(C1) + B3(C2) + B4(C1*PUBS.C) + B5(C2*PUBS.C)
Interpretation of results
  The estimated salary for a Psychologist with 15.5 pubs is 56,918
  (taking into account the rate of return for Psychologists)
  The average return per pub for Psychology is 1,373
  The difference in salary between Psychology and History is 9,796 (for a
  person with 15.5 pubs, taking into account rate of return)
  The difference in salary between Psychology and Sociology is 9,672 (for
  a person with 15.5 pubs, taking into account rate of return)
  The difference in the pubs-by-salary slope between Psychology and
  History is -961
  The difference in the pubs-by-salary slope between Psychology and
  Sociology is -1,115
Further questions
  Is the History slope significant?
  Is the Sociology slope significant?
  Is the difference in slope between History and Sociology significant?
  Re-code to make a different reference group and re-run the analysis
Test of simple slopes
  Don't enter the main effect of publications
  Create moderation terms that represent the slope for each group
Interpretation of results
  The Bs for the moderation terms are the simple slopes
    Psychology is significant
    History is not significant
    Sociology is not significant
  Department moderates the relationship between publications and salary
z = (observed - expected) / SE
t = (observed - expected) / SE
When to use z and t?
- z
  When comparing a sample mean to a population mean and the standard
  deviation of the population is known
- Single sample t
  When comparing a sample mean to a population mean and the standard
  deviation of the population is not known
- Dependent samples t
  When evaluating the difference between two related samples
- Independent samples t
  When evaluating the difference between two independent samples
Notation
  σ: population standard deviation
  μ: population mean
  SD: sample standard deviation
  M: sample mean
  SE: standard error
  SEM: standard error for a mean
  SEMD: standard error for a difference (dependent)
  SEDifference: standard error for a difference (independent)
p values for z and t
  Exact p value depends on:
    Directional or non-directional test?
    df (different t-distributions for different sample sizes)
  df by test:
    z: NA
    t (single sample): N - 1
    t (dependent): N - 1
    t (independent): (N1 - 1) + (N2 - 1)
Single sample t
  Compare a sample mean to a population mean
  t = (M - μ) / SEM
  SE²M = SD² / N
  SEM = SD / √N
  SD² = Σ(X - M)² / (N - 1) = SS/df = MS
Example: Suppose it takes rats just 2 trials to learn how to navigate a
maze to receive a food reward. A researcher surgically lesions part of
the brain and then tests the rats in the maze. Is the number of trials
to learn the maze significantly more than 2?
  SD² = Σ(X - M)² / (N - 1) = SS/df = 26 / 4 = 6.5
  SE²M = SD² / N = 6.5 / 5 = 1.3
  SEM = 1.14
  t = (M - μ) / SEM = (6 - 2) / 1.14 = 3.51
  Effect size (Cohen's d):
  d = (M - μ) / SD = (6 - 2) / 2.55 = 1.57
  For a directional test with alpha = .05, df = 4, p = .012 => Reject H0
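A minimal R sketch of this test; the trial counts are made up to match the worked example (N = 5, M = 6, SS = 26):

    trials <- c(3, 5, 6, 6, 10)                      # hypothetical data
    mean(trials)                                     # M = 6
    sum((trials - mean(trials))^2)                   # SS = 26
    t.test(trials, mu = 2, alternative = "greater")  # t = 3.51, df = 4, p ≈ .012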
- L13b: Dependent & Independent t-tests
Dependent means t
  The formulae are actually the same as for the single sample t, but the
  raw scores are difference scores: the mean is the mean of the
  difference scores, and SEM is based on the standard deviation of the
  difference scores
Suppose a researcher is testing a new technique to help people quit
smoking. The number of cigarettes smoked per day is measured before and
after treatment. Is the difference significant?
  SD² = Σ(D - MD)² / (N - 1) = SS/df = 48 / 3 = 16
  SE²MD = SD² / N = 16 / 4 = 4
  SEMD = 2
  t = (MD - μ) / SEMD = (-5 - 0) / 2 = -2.5
  t = MD / SEMD = -5 / 2 = -2.5
  For a directional test with alpha = .05, df = 3, p = .044 => Reject H0
  Effect size:
  d = (MD - μ) / SD = -5/4 = -1.25
  Note: μ = 0
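A minimal R sketch of the dependent means t; the before/after counts are made up so the difference scores match the worked example (N = 4, MD = -5, SS = 48):

    before <- c(20, 18, 15, 10)        # hypothetical cigarettes per day
    after  <- c(13, 11,  8, 11)
    after - before                     # difference scores: -7 -7 -7 1
    t.test(after, before, paired = TRUE, alternative = "less")
    # Equivalent to a single sample t on the differences:
    # t = -5 / 2 = -2.5, df = 3, p ≈ .044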
Independent means t
  Compares two independent groups
  For example, males and females, control and experimental, patients and
  normals, etc.
  t = (M1 - M2) / SEDifference
  SE²Difference = SE²M1 + SE²M2
  SE²M1 = SD²Pooled / N1
  SE²M2 = SD²Pooled / N2
  SD²Pooled = (df1/dfTotal)(SD²1) + (df2/dfTotal)(SD²2)
  Notice that this is just a weighted average of the sample variances
Group 1 (young adults): M1 = 350, SD1 = 20, N1 = 100
Group 2 (elderly adults): M2 = 360, SD2 = 30, N2 = 100
Null hypothesis: μ1 = μ2
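A minimal R sketch; the scores are simulated and rescaled so each group matches the summary statistics above exactly:

    set.seed(1)
    young   <- as.vector(scale(rnorm(100))) * 20 + 350  # M = 350, SD = 20
    elderly <- as.vector(scale(rnorm(100))) * 30 + 360  # M = 360, SD = 30
    t.test(young, elderly, var.equal = TRUE)  # pooled-variance independent t
    # By hand: SD2Pooled = (99/198)(400) + (99/198)(900) = 650
    (350 - 360) / sqrt(650/100 + 650/100)     # t = -2.77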
ANOVA: Appropriate when the predictors (IVs) are all categorical and the
outcome (DV) is continuous
  Most common application is to analyze data from randomized experiments
  More specifically, randomized experiments that generate more than 2
  means (if only 2 means, then use t-tests)
NHST may accompany ANOVA
  The test statistic is the F-test
  F = systematic variance / unsystematic variance
  Like the t-test and its family of t distributions, the F-test has a
  family of F distributions, depending on:
    Number of subjects per group
    Number of groups
L14b: One-way ANOVA - F ratio
F = systematic variance / unsystematic variance
F = between-groups variance / within-groups variance
F = MSBetween / MSWithin
F = MSA / MSS/A, with
  MSA = SSA / dfA
  MSS/A = SSS/A / dfS/A
  SSA = n · Σ(Ȳj - ȲT)²
  SSS/A = Σ(Yij - Ȳj)²
    Yij are individual scores, Ȳj are the treatment means, and ȲT is the
    grand mean
  dfA = a - 1
  dfS/A = a(n - 1)
  dfTotal = N - 1
Effect size
  R² = η² (eta-squared)
  η² = SSA / SSTotal
Assumptions
  DV is continuous
  DV is normally distributed
  Homogeneity of variance
    Within-groups variance is equivalent for all groups
    Levene's test (if Levene's test is significant then the homogeneity
    of variance assumption has been violated)
    => Conduct comparisons using a restricted error term
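A minimal R sketch of a one-way ANOVA, assuming a hypothetical data frame d with a continuous DV Y and a grouping factor A (Levene's test needs the add-on car package):

    fit <- aov(Y ~ A, data = d)
    summary(fit)                       # F = MS.between / MS.within
    ss <- summary(fit)[[1]][["Sum Sq"]]
    ss[1] / sum(ss)                    # eta-squared = SSA / SSTotal
    car::leveneTest(Y ~ A, data = d)   # significant => homogeneity violated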
Two IVs (treatments), one continuous DV (response)
Three F ratios: FA, FB, FAxB
- Main effect: the effect of one IV averaged across the levels of the
  other IV
- Interaction effect: the effect of one IV depends on the other IV (the
  simple effects of one IV change across the levels of the other IV)
- Simple effect: the effect of one IV at a particular level of the other
  IV
Main effects and the interaction effect are independent from one another
  Note that this is different from studies that don't employ an
  experimental design
  For example, in MR, when predicting faculty salary, the effects of
  publications and years since the Ph.D. were correlated
Factorial ANOVA is just a special case of multiple regression. It is a
multiple regression with perfectly independent predictors (IVs).
F ratios
  FA = MSA / MSS/AB
  FB = MSB / MSS/AB
  FAxB = MSAxB / MSS/AB
MS
  MSA = SSA / dfA
  MSB = SSB / dfB
  MSAxB = SSAxB / dfAxB
  MSS/AB = SSS/AB / dfS/AB
df
  dfA = a - 1
  dfB = b - 1
  dfAxB = (a - 1)(b - 1)
  dfS/AB = ab(n - 1)
  dfTotal = abn - 1 = N - 1
Follow-up tests
  Main effects
    Post-hoc tests
  Interaction
    Analysis of simple effects
    Conduct a series of one-way ANOVAs
    For example, we could conduct 3 one-way ANOVAs comparing high and low
    spans at each level of the other IV
Effect size
  Complete η²: η² = SSeffect / SStotal
  Partial η²: η² = SSeffect / (SSeffect + SSS/AB)
Assumptions
  Assumptions underlying the factorial ANOVA are the same as for the
  one-way ANOVA:
    DV is continuous
    DV is normally distributed
    Homogeneity of variance
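A minimal R sketch of a two-way factorial ANOVA, assuming a hypothetical data frame d with factors A and B and a continuous DV Y:

    fit <- aov(Y ~ A * B, data = d)        # expands to A + B + A:B
    summary(fit)                           # FA, FB, and FAxB
    ss <- summary(fit)[[1]][["Sum Sq"]]    # rows: A, B, A:B, Residuals
    ss[3] / (ss[3] + ss[4])                # partial eta-squared for AxB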
Factorial ANOVA & Model Comparison
- L16a: Benefits of Repeated Measures ANOVA
Benefits
  Less cost (fewer subjects required)
  More statistical power
    Variance across subjects may be systematic
    If so, it will not contribute to the error term
MS and F
  MSA = SSA / dfA
  MSAxS = SSAxS / dfAxS
  F = MSA / MSAxS
Post-hoc tests
  The error term MSAxS is NOT appropriate
  Need to calculate a new error term based on the conditions that are
  being compared
  Correct for multiple comparisons
    Bonferroni
Sphericity assumption
  Homogeneity of variance
  Homogeneity of correlation: r12 = r13 = r23
  How to test? Mauchly's test
    If significant, then report the p value from one of the corrected
    tests: Greenhouse-Geisser or Huynh-Feldt (see the sketch below)
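A minimal R sketch of a one-way repeated measures ANOVA, assuming a hypothetical long-format data frame d with a subject factor, a within-subjects factor A, and DV Y:

    fit <- aov(Y ~ A + Error(subject/A), data = d)  # subject must be a factor
    summary(fit)  # the Error: subject:A stratum supplies MSAxS, so F = MSA / MSAxS
    # Mauchly's test and the Greenhouse-Geisser / Huynh-Feldt corrections
    # are easiest via an add-on package such as ez (ezANOVA reports all three)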
- L16b: Risks of Repeated Measures ANOVA
- Order effects
- Counterbalancing
  o Consider a simple design with just two conditions, A1 and A2
  One approach is a Blocked Design
- Missing data
  Is the pattern random or lawful? This can easily be detected
  For any variable of interest (X) create a new variable (XM)
    XM = 0 if X is missing
    XM = 1 if X is not missing
  Conduct a t-test with XM as the IV and another observed variable as
  the DV
  If significant, then the pattern of missing data may be lawful (see
  the sketch below)
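A minimal R sketch of this check, assuming a hypothetical data frame d in which X has missing values and Y is another observed variable:

    d$XM <- ifelse(is.na(d$X), 0, 1)  # 0 = missing, 1 = not missing
    t.test(Y ~ XM, data = d)          # significant => missingness may be lawful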
- Remedies
  Drop all cases without a perfect profile
    Drastic
    Use only if you can afford it
  Keep all cases and estimate the values of the missing data points
    There are several options for how to estimate values
  Estimation methods
    Insert the mean
      Conservative
      Decreases variance
    Regression-based estimation
      More precise than using the mean, but confusion often arises over
      which variables to use as predictors in the regression equation
- L17a: Mixed Factorial ANOVA
One
IV
is
manipulated
between
groups
One
IV
is
manipulated
within
groups
Repeated
measures
Whats
new?
Partitioning
SS
Formulae
for
FA,
FB,
FAxB
Error
term
for
post-hoc
tests
Approach
to
simple
effects
analyses
Assumptions
df
  dfA = a - 1
  dfB = b - 1
  dfAxB = (a - 1)(b - 1)
  dfS/A = a(n - 1)
  dfBxS/A = a(b - 1)(n - 1)
  dfTotal = (a)(b)(n) - 1
MS
  MSA = SSA / dfA
  MSB = SSB / dfB
  MSAxB = SSAxB / dfAxB
  MSS/A = SSS/A / dfS/A
  MSBxS/A = SSBxS/A / dfBxS/A
F
  FA = MSA / MSS/A
  FB = MSB / MSBxS/A
  FAxB = MSAxB / MSBxS/A
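A minimal R sketch of a mixed factorial ANOVA, assuming a hypothetical long-format data frame d with a subject factor, a between-groups factor A, a repeated factor B, and DV Y. The Error() term splits MSS/A (error for FA) from MSBxS/A (error for FB and FAxB):

    fit <- aov(Y ~ A * B + Error(subject/B), data = d)
    summary(fit)  # Error: subject stratum tests A;
                  # Error: subject:B stratum tests B and A:B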
Must choose one approach or the other (to report both is redundant):
  Simple effects of the between-groups IV, or
  Simple effects of the repeated IV
Simple effects of the between-groups IV
  Simple effect of A at each level of B
    FA.at.b1 = MSA.at.b1 / MSS/A.at.b1
  Simple comparisons use the same error term, MSS/A.at.b1
- Within-subjects assumptions
  Sphericity: the variances of the different treatment scores (b) are
  the same and the correlations among pairs of treatment means are the
  same
  If violated, then report the Greenhouse-Geisser or Huynh-Feldt values