Coursera Statistics One - Notes and Formulas
Raw score formula:
  r = SPxy / √(SSx · SSy)
  SSx = Σ(X - Mx)² = Σ[(X - Mx)(X - Mx)]
  SSy = Σ(Y - My)² = Σ[(Y - My)(Y - My)]
  SPxy = Σ[(X - Mx)(Y - My)]
  r = SPxy / √(SSx · SSy)
    = Σ[(X - Mx)(Y - My)] / √[Σ(X - Mx)² · Σ(Y - My)²]
Z-score formula:
  r = Σ(zx · zy) / N
  zx = (X - Mx) / SDx
  zy = (Y - My) / SDy
  SDx = √[Σ(X - Mx)² / N]
  SDy = √[Σ(Y - My)² / N]
Proof of equivalence:
  zx = (X - Mx) / √[Σ(X - Mx)² / N]
  zy = (Y - My) / √[Σ(Y - My)² / N]
  r = Σ{ [(X - Mx) / √(Σ(X - Mx)² / N)] · [(Y - My) / √(Σ(Y - My)² / N)] } / N
  r = Σ[(X - Mx)(Y - My)] / √[Σ(X - Mx)² · Σ(Y - My)²]
  r = SPxy / √(SSx · SSy)
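A quick R check of the raw-score formula against R's built-in cor(); the two vectors are made-up data:

    x <- c(3, 5, 6, 8, 10)
    y <- c(2, 4, 5, 9, 9)
    sp.xy <- sum((x - mean(x)) * (y - mean(y)))  # sum of cross products
    ss.x  <- sum((x - mean(x))^2)                # sum of squares for X
    ss.y  <- sum((y - mean(y))^2)                # sum of squares for Y
    sp.xy / sqrt(ss.x * ss.y)                    # r, identical to cor(x, y)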
Variance and covariance
  Variance = MS = SS / N
  Covariance = COV = SP / N
  Correlation is standardized COV
    Standardized so the value is in the range -1 to +1
  Note on the denominators:
    Correlation for descriptive purposes: divide by N
    Correlation for inferential purposes: divide by N - 1
- L4c: Interpreting correlations
Assumptions for correlation
  Normal distributions for X and Y
  Linear relationship between X and Y
  Homoskedasticity
Reliability of a correlation
  Does the correlation reflect more than just chance covariance?
  One approach to this question is to use NHST
  o H0 = null hypothesis: e.g., r = 0
  o HA = alternative hypothesis: e.g., r > 0
Truth vs. Decision:
                Retain H0                    Reject H0
  H0 true       Correct Decision             Type I error (False alarm)
                p = (1 - α)                  p = α
  H0 false      Type II error (Miss)         Correct Decision
                p = β = (1 - POWER)          p = (1 - β) = POWER
NHST
- p = P(D|H0)
- Given that the null hypothesis is true, the probability of these, or
  more extreme data, is p
- NOT: The probability of the null hypothesis being true is p
- In other words, P(D|H0) ≠ P(H0|D)
NHST can be applied to:
- r (Is the correlation significantly different from zero?)
- r1 vs. r2 (Is one correlation significantly larger than another?)
There are other correlation coefficients:
  Point biserial r => When 1 variable is continuous and 1 is dichotomous
  Phi coefficient => When both variables are dichotomous
  Spearman rank correlation => When both variables are ordinal (ranked data)
- L5a: Reliability & Validity
Reliability
- Classical test theory
  Raw scores (X) are not perfect
  They are influenced by bias and chance error
  In a perfect world, we would obtain a true score
- What is a regression?
  A statistical analysis used to predict scores on an outcome variable,
  based on scores on one or more predictor variables
  For example, we can predict how many runs a baseball player will score
  (Y) if we know the player's batting average (X)
Regression equation
  Y = B0 + B1X1 + e
  Ŷ = B0 + B1X1   # Ŷ is the predicted score on Y
  Y - Ŷ = e       # e is the prediction error (residual)
Estimation of coefficients
  The values of the coefficients (B) are estimated such that the model
  yields optimal predictions
  Minimize the residuals!
  The sum of the squared (SS) residuals is minimized:
  SS.RESIDUAL = Σ(Ŷ - Y)²
  ORDINARY LEAST SQUARES estimation
Sum of Squared deviation scores (SS) in variable X = SS.X; in Y = SS.Y
Sum of Cross Products (SP.XY), also called SS.MODEL
[Venn diagram: circles SS.X and SS.Y overlap in SP.XY (SS.MODEL); the
remainder of SS.Y is SS.RESIDUAL]
SS.Y = SS.MODEL + SS.RESIDUAL
How to calculate B (unstandardized)
  B = r × (SDy / SDx)
Standardized regression coefficient = β = r
  If X and Y are standardized then SDy = SDx = 1, so
  B = r × (SDy / SDx) = r
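A minimal R sketch of these identities with made-up data; the lm() slope reproduces r × (SDy/SDx), and with z-scored variables the slope is r itself:

    x <- c(1, 2, 4, 5, 8)
    y <- c(2, 3, 5, 4, 9)
    cor(x, y) * (sd(y) / sd(x))   # B by the formula
    coef(lm(y ~ x))["x"]          # B by OLS: same value
    zx <- as.vector(scale(x))     # standardize both variables
    zy <- as.vector(scale(y))
    coef(lm(zy ~ zx))["zx"]       # standardized slope = cor(x, y)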
- L7b: A Closer Look at NHST
H0 = null hypothesis: e.g., r = 0, B = 0
HA = alternative hypothesis: e.g., r > 0, B > 0
Assume H0 is true, then calculate the probability of observing data with
these characteristics, given that H0 is true
Thus, p = P(D|H0)
If p < α then Reject H0, else Retain H0
- t = B / SE
  B is the unstandardized regression coefficient
  SE = standard error
  SE = √[SS.RESIDUAL / (N - 2)]
Problems
- Biased by N
  The p-value is based on the t-value (t = B / SE, with
  SE = √[SS.RESIDUAL / (N - 2)]); other things being equal, larger
  samples produce larger t-values and smaller p-values even when the
  size of the effect is unchanged
- Binary outcome
  Technically speaking, one must Reject or Retain the Null Hypothesis
  What if p = .06?
- Null model is a weak hypothesis
  Demonstrating that your model does better than NOTHING is not very
  impressive
Alternatives to NHST
- Effect size
  Correlation coefficient (r)
  Standardized regression coefficient (β)
  Model R²
- Confidence intervals
  Sample statistics are point estimates
    Specific to the sample
    Will vary as a function of sampling error
  Instead, report interval estimates
    Width of the interval is a function of standard error
- Model comparison
  Propose multiple models: Model A, Model B
  Compare Model R²
- L8a: Introduction to Multiple Regression
Simple vs. multiple regression
  Simple regression => Just one predictor (X)
  Multiple regression => Multiple predictors (X1, X2, X3, ...)
Multiple regression equation
  Just add more predictors (multiple Xs):
  Ŷ = B0 + B1X1 + B2X2 + B3X3 + ... + BkXk
  Ŷ = B0 + Σ(BkXk)
  Ŷ = predicted value on the outcome variable Y
  B0 = predicted value on Y when all X = 0
  Xk = predictor variables
  Bk = unstandardized regression coefficients
  Y - Ŷ = residual (prediction error)
  k = the number of predictor variables
Model R and R²
  R = multiple correlation coefficient
  R = r(Ŷ, Y)
    The correlation between the predicted scores and the observed scores
  R²
    The percentage of variance in Y explained by the model
Types of multiple regression
  Standard
  Sequential (aka hierarchical)
  The difference between these approaches is how they handle the
  correlations among predictor variables
    If X1, X2, and X3 are not correlated then the type of regression
    analysis doesn't matter
    If the predictors are correlated then the different methods will
    return different results
  Standard
    o All predictors are entered into the regression equation at the
      same time
  Sequential
    o Predictors are entered into the regression equation in ordered
      steps; the order is specified by the researcher
    o Each predictor is assessed in terms of what it adds to the
      equation at its point of entry
    o Often useful to assess the change in R² from one step to another
      (see the sketch below)
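A minimal R sketch of the two approaches, assuming a hypothetical data frame d with outcome Y and predictors X1 and X2:

    fit.standard <- lm(Y ~ X1 + X2, data = d)   # standard: all at once
    summary(fit.standard)
    fit.step1 <- lm(Y ~ X1, data = d)           # sequential: Step 1
    fit.step2 <- lm(Y ~ X1 + X2, data = d)      # sequential: Step 2
    summary(fit.step2)$r.squared - summary(fit.step1)$r.squared  # change in R²
    anova(fit.step1, fit.step2)                 # F test of that change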
The inverse of a matrix is similar to the reciprocal of a scalar
- Raw data matrix
  Subjects as rows, variables as columns
- Row vector of means
  M1p = T1p × N⁻¹ = (34 35 34) × 10⁻¹ = (3.4 3.5 3.4)
- Matrix of means
- Sum of squares and cross-products matrix
- Variance-covariance matrix
- Diagonal matrix of standard deviations
- Correlation matrix
L8c: Estimation of Coefficients
Still ORDINARY LEAST SQUARES estimation, but using matrix algebra
  The values of the coefficients (B) are estimated such that the model
  yields optimal predictions
  o Minimize the residuals!
  o The sum of the squared (SS) residuals is minimized:
    SS.RESIDUAL = Σ(Ŷ - Y)²
  o ORDINARY LEAST SQUARES estimation
Regression equation
  o Ŷ = B0 + B1X1   # Ŷ is the predicted score on Y
  o Y - Ŷ = e       # e is the prediction error (residual)
Regression equation, matrix form
  o Ŷ = XB (design matrix X times coefficient vector B)
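The notes stop at the matrix form of the prediction equation; behind it sits the standard OLS solution B = (X′X)⁻¹X′Y. A minimal R sketch with simulated data, checked against lm():

    set.seed(1)
    n  <- 10
    x1 <- rnorm(n); x2 <- rnorm(n)
    y  <- 2 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
    X <- cbind(1, x1, x2)                  # design matrix, intercept column first
    B <- solve(t(X) %*% X) %*% t(X) %*% y  # B = (X'X)^-1 X'y
    cbind(B, coef(lm(y ~ x1 + x2)))        # identical estimates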
SES moderates the relationship between psychological trait and behavioral
outcome (e.g., true for high SES, but not for lower SES)
A mediation analysis is typically conducted to better understand an
observed correlation between X and Y
  E.g., why is extraversion correlated with happiness?
We know from simple regression analysis that if X and Y are correlated
then we can use regression to predict Y from X
  Y = B0 + B1X + e
  o lm(Y~X+M)
    Regression coefficient for M should be significant
    Regression coefficient for X? (If X becomes ns => full mediation; if
    X remains significant => partial mediation)
E.g.:
- Assume N = 188
- Participants surveyed and asked to report:
  o Happiness (happy)
  o Extraversion (extra)
  o Diversity of life experiences (diverse)
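A minimal R sketch of the mediation steps with these variable names, assuming they live in a hypothetical data frame d:

    lm(happy ~ extra, data = d)            # c path: X predicts Y
    lm(diverse ~ extra, data = d)          # a path: X predicts M
    fit <- lm(happy ~ extra + diverse, data = d)  # b and c' paths
    summary(fit)  # diverse should be significant; extra ns => full
                  # mediation, extra still significant => partial mediation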
- L10b: Path Analysis Method for Mediation
Mediation analyses are typically illustrated using path models
  Rectangles: Observed variables (X, Y, M)
  Circles: Unobserved variables (e)
  Triangles: Constants
  Arrows: Associations (more on these later)
Path model with a mediator
  To avoid confusion, let's label the paths:
  a: Path from X to M
  b: Path from M to Y
  c: Direct path from X to Y (before including M)
  c′: Direct path from X to Y (after including M)
How to test for mediation
  Three regression equations can now be re-written with the new notation:
  Y = B0 + cX + e
  Y = B0 + c′X + bM + e
  M = B0 + aX + e
- The Sobel test
  z = (Ba × Bb) / √(Ba² × SEb² + Bb² × SEa²)
  o The null hypothesis: the indirect effect is zero, (Ba × Bb) = 0
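A minimal R sketch of the Sobel z, assuming a hypothetical data frame d with columns X, M, and Y:

    fit.a <- lm(M ~ X, data = d)        # a path
    fit.b <- lm(Y ~ X + M, data = d)    # b and c' paths
    Ba  <- coef(fit.a)["X"]
    SEa <- coef(summary(fit.a))["X", "Std. Error"]
    Bb  <- coef(fit.b)["M"]
    SEb <- coef(summary(fit.b))["M", "Std. Error"]
    (Ba * Bb) / sqrt(Ba^2 * SEb^2 + Bb^2 * SEa^2)  # Sobel z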
Results in path model => Interpretation
  SES moderates the relationship between extraversion and happiness
Moral of the story: The picture can change, literally, when you consider
a new variable
Quick example:
  Working memory capacity (X)
  SAT (Y)
  Type of University (Z)
    o Large Public State University
    o Ivy League (ZAP!)
  => Interpretation: Type of University moderates the relationship
  between WMC and SAT
Moderation model
  Y = B0 + B1X + B2Z + B3(X*Z) + e
How to test for moderation
  Run just one regression model: lm(Y~X + Z + X*Z)
  o Need to create a new column for (X*Z)
  o Let's call it PRODUCT
Centering predictors
- To center means to put in deviation form
  XC = X - M
- Why center? Two reasons:
  o Conceptual reason
    Suppose Y = child's language development
      X1 = mother's vocabulary
      X2 = child's age
    The intercept, B0, is the predicted score on Y when all X are zero
    If X = zero is meaningless, or impossible, then B0 will be difficult
    to interpret
    If X = zero is the average then B0 is easy to interpret
    The regression coefficient B1 is the slope for X1 assuming an
    average score on X2
    No moderation implies that B1 is consistent across the entire
    distribution of X2
    However, moderation implies that B1 is NOT consistent across the
    entire distribution of X2
    Where in the distribution of X2 is B1 most representative?
    Let's look at this graphically
  o Statistical reason
    The predictors, X1 and X2, can become highly correlated with the
    product, X1*X2
    Can result in multicollinearity
Centering for moderation: Summary
  Center the predictors
  Run a sequential regression (2 steps)
    Step 1: Main effects
    Step 2: Moderation effect
  Evaluate B for PRODUCT, or the change in R² from Model 1 to Model 2
  (see the sketch below)
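A minimal R sketch of this summary, assuming a hypothetical data frame d with outcome Y, predictor X, and moderator Z:

    d$X.C     <- d$X - mean(d$X)       # center the predictor
    d$Z.C     <- d$Z - mean(d$Z)       # center the moderator
    d$PRODUCT <- d$X.C * d$Z.C         # moderation term
    fit.main <- lm(Y ~ X.C + Z.C, data = d)            # Step 1: main effects
    fit.mod  <- lm(Y ~ X.C + Z.C + PRODUCT, data = d)  # Step 2: moderation
    summary(fit.mod)$coefficients["PRODUCT", ]  # B for PRODUCT
    anova(fit.main, fit.mod)                    # test of the change in R²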
Dummy coding
  A system to code categorical predictors in a regression analysis
  Example
    IV: Area of research
      Cognitive
      Social
      Neuroscience
      Cognitive neuroscience
    DV: # of publications
  Regression model
    Ŷ = B0 + B1(C1) + B2(C2) + B3(C3)
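A minimal R sketch of dummy coding with made-up data; a k = 4 level factor needs k - 1 = 3 dummy codes (C1, C2, C3), which R builds automatically from a factor:

    area <- factor(c("Cognitive", "Social", "Neuro", "CogNeuro",
                     "Cognitive", "Social", "Neuro", "CogNeuro"))
    pubs <- c(10, 6, 12, 14, 8, 5, 15, 11)   # hypothetical # of publications
    contrasts(area)           # the 3 dummy codes; first level = reference group
    summary(lm(pubs ~ area))  # B0 = reference group mean; each B = difference
                              # between a group and the reference group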
Regression model: Before moderation
  Ŷ = B0 + B1(PUBS.C) + B2(C1) + B3(C2)
Interpretation of results
  The estimated salary for a Psychologist with 15.5 pubs is 58,482
  The average return per publication across all three departments is 926
  When taking into account publications, Historians earn 10,447 more than
  Psychologists
  When taking into account publication rate, Sociologists earn 8,282 more
  than Psychologists
Regression model: Moderation
  Ŷ = B0 + B1(PUBS.C) + B2(C1) + B3(C2) + B4(C1*PUBS.C) + B5(C2*PUBS.C)
Interpretation of results
  The estimated salary for a Psychologist with 15.5 pubs is 56,918
  (taking into account the rate of return for Psychologists)
  The average return per pub for Psychology is 1,373
  The difference in salary between Psychology and History is 9,796 (for a
  person with 15.5 pubs, taking into account rate of return)
  The difference in salary between Psychology and Sociology is 9,672 (for
  a person with 15.5 pubs, taking into account rate of return)
  The difference in the pubs-by-salary slope between Psychology and
  History is -961
  The difference in the pubs-by-salary slope between Psychology and
  Sociology is -1,115
Further questions
  Is the History slope significant?
  Is the Sociology slope significant?
  Is the difference in slope between History and Sociology significant?
  Re-code to make a different reference group and re-run the analysis
Test of simple slopes
  Don't enter the main effect of publications
  Create moderation terms that represent the slope for each group
Interpretation of results
  The Bs for the moderation terms are the simple slopes
    Psychology is significant
    History is not significant
    Sociology is not significant
  Department moderates the relationship between publications and salary
z = (observed - expected) / SE
t = (observed - expected) / SE
When to use z and t?
- z
  When comparing a sample mean to a population mean and the standard
  deviation of the population is known
- Single sample t
  When comparing a sample mean to a population mean and the standard
  deviation of the population is not known
- Dependent samples t
  When evaluating the difference between two related samples
- Independent samples t
  When evaluating the difference between two independent samples
Notation
  σ: population standard deviation
  μ: population mean
  SD: sample standard deviation
  M: sample mean
  SE: standard error
  SEM: standard error for a mean
  SEMD: standard error for a difference (dependent)
  SEDifference: standard error for a difference (independent)
p values for z and t
  Exact p value depends on:
    Directional or non-directional test?
    df (different t-distributions for different sample sizes)
  df by test:
    z: NA
    t (single sample): N - 1
    t (dependent): N - 1
    t (independent): (N1 - 1) + (N2 - 1)
Single sample t
  Compare a sample mean to a population mean
  t = (M - μ) / SEM
  SE²M = SD² / N
  SEM = SD / √N
  SD² = Σ(X - M)² / (N - 1) = SS/df = MS
Example: Suppose it takes rats just 2 trials to learn how to navigate a
maze to receive a food reward. A researcher surgically lesions part of
the brain and then tests the rats in the maze. Is the number of trials
to learn the maze significantly more than 2?
  SD² = Σ(X - M)² / (N - 1) = SS/df = 26 / 4 = 6.5
  SE²M = SD² / N = 6.5 / 5 = 1.3
  SEM = 1.14
  t = (M - μ) / SEM = (6 - 2) / 1.14 = 3.51
  Effect size (Cohen's d):
  d = (M - μ) / SD = (6 - 2) / 2.55 = 1.57
  For a directional test with alpha = .05, df = 4, p = .012 => Reject H0
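A minimal R sketch of this test; the trial counts are made up to match the worked example (N = 5, M = 6, SS = 26):

    trials <- c(3, 5, 6, 6, 10)                      # hypothetical data
    mean(trials)                                     # M = 6
    sum((trials - mean(trials))^2)                   # SS = 26
    t.test(trials, mu = 2, alternative = "greater")  # t = 3.51, df = 4, p ≈ .012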
- L13b: Dependent & Independent t-tests
Dependent means t
  The formulae are actually the same as for the single sample t, but the
  raw scores are difference scores: the mean is the mean of the
  difference scores, and SEM is based on the standard deviation of the
  difference scores
Suppose a researcher is testing a new technique to help people quit
smoking. The number of cigarettes smoked per day is measured before and
after treatment. Is the difference significant?
  SD² = Σ(D - MD)² / (N - 1) = SS/df = 48 / 3 = 16
  SE²MD = SD² / N = 16 / 4 = 4
  SEMD = 2
  t = (MD - μ) / SEMD = (-5 - 0) / 2 = -2.5
  t = MD / SEMD = -5 / 2 = -2.5
  For a directional test with alpha = .05, df = 3, p = .044 => Reject H0
  Effect size:
  d = (MD - μ) / SD = -5/4 = -1.25
  Note: μ = 0
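A minimal R sketch of the dependent means t; the before/after counts are made up so the difference scores match the worked example (N = 4, MD = -5, SS = 48):

    before <- c(20, 18, 15, 10)        # hypothetical cigarettes per day
    after  <- c(13, 11,  8, 11)
    after - before                     # difference scores: -7 -7 -7 1
    t.test(after, before, paired = TRUE, alternative = "less")
    # Equivalent to a single sample t on the differences:
    # t = -5 / 2 = -2.5, df = 3, p ≈ .044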
Independent means t
  Compares two independent groups
  For example, males and females, control and experimental, patients and
  normals, etc.
  t = (M1 - M2) / SEDifference
  SE²Difference = SE²M1 + SE²M2
  SE²M1 = SD²Pooled / N1
  SE²M2 = SD²Pooled / N2
  SD²Pooled = (df1/dfTotal)(SD²1) + (df2/dfTotal)(SD²2)
  Notice that this is just a weighted average of the sample variances
Group 1 (young adults): M1 = 350, SD1 = 20, N1 = 100
Group 2 (elderly adults): M2 = 360, SD2 = 30, N2 = 100
Null hypothesis: μ1 = μ2
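A minimal R sketch; the scores are simulated and rescaled so each group matches the summary statistics above exactly:

    set.seed(1)
    young   <- as.vector(scale(rnorm(100))) * 20 + 350  # M = 350, SD = 20
    elderly <- as.vector(scale(rnorm(100))) * 30 + 360  # M = 360, SD = 30
    t.test(young, elderly, var.equal = TRUE)  # pooled-variance independent t
    # By hand: SD2Pooled = (99/198)(400) + (99/198)(900) = 650
    (350 - 360) / sqrt(650/100 + 650/100)     # t = -2.77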
ANOVA: Appropriate when the predictors (IVs) are all categorical and the
outcome (DV) is continuous
  Most common application is to analyze data from randomized experiments
  More specifically, randomized experiments that generate more than 2
  means (if only 2 means, then use t-tests)
NHST may accompany ANOVA
  The test statistic is the F-test
  F = systematic variance / unsystematic variance
  Like the t-test and its family of t distributions, the F-test has a
  family of F distributions, depending on:
    Number of subjects per group
    Number of groups
L14b: One-way ANOVA - F ratio
F = systematic variance / unsystematic variance
F = between-groups variance / within-groups variance
F = MSBetween / MSWithin
F = MSA / MSS/A, with
  MSA = SSA / dfA
  MSS/A = SSS/A / dfS/A
  SSA = n · Σ(Ȳj - ȲT)²
  SSS/A = Σ(Yij - Ȳj)²
    Yij are individual scores, Ȳj are the treatment means, and ȲT is the
    grand mean
  dfA = a - 1
  dfS/A = a(n - 1)
  dfTotal = N - 1
Effect size
  R² = η² (eta-squared)
  η² = SSA / SSTotal
Assumptions
  DV is continuous
  DV is normally distributed
  Homogeneity of variance
    Within-groups variance is equivalent for all groups
    Levene's test (if Levene's test is significant then the homogeneity
    of variance assumption has been violated)
    => Conduct comparisons using a restricted error term
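A minimal R sketch of a one-way ANOVA, assuming a hypothetical data frame d with a continuous DV Y and a grouping factor A (Levene's test needs the add-on car package):

    fit <- aov(Y ~ A, data = d)
    summary(fit)                       # F = MS.between / MS.within
    ss <- summary(fit)[[1]][["Sum Sq"]]
    ss[1] / sum(ss)                    # eta-squared = SSA / SSTotal
    car::leveneTest(Y ~ A, data = d)   # significant => homogeneity violated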
Two IVs (treatments), one continuous DV (response)
Three F ratios: FA, FB, FAxB
- Main effect: the effect of one IV averaged across the levels of the
  other IV
- Interaction effect: the effect of one IV depends on the other IV (the
  simple effects of one IV change across the levels of the other IV)
- Simple effect: the effect of one IV at a particular level of the other
  IV
Main effects and the interaction effect are independent from one another
  Note that this is different from studies that don't employ an
  experimental design
  For example, in MR, when predicting faculty salary, the effects of
  publications and years since the Ph.D. were correlated
Factorial ANOVA is just a special case of multiple regression. It is a
multiple regression with perfectly independent predictors (IVs).
F ratios
  FA = MSA / MSS/AB
  FB = MSB / MSS/AB
  FAxB = MSAxB / MSS/AB
MS
  MSA = SSA / dfA
  MSB = SSB / dfB
  MSAxB = SSAxB / dfAxB
  MSS/AB = SSS/AB / dfS/AB
df
  dfA = a - 1
  dfB = b - 1
  dfAxB = (a - 1)(b - 1)
  dfS/AB = ab(n - 1)
  dfTotal = abn - 1 = N - 1
Follow-up tests
  Main effects
    Post-hoc tests
  Interaction
    Analysis of simple effects
    Conduct a series of one-way ANOVAs
    For example, we could conduct 3 one-way ANOVAs comparing high and low
    spans at each level of the other IV
Effect size
  Complete η²: η² = SSeffect / SStotal
  Partial η²: η² = SSeffect / (SSeffect + SSS/AB)
Assumptions
  Assumptions underlying the factorial ANOVA are the same as for the
  one-way ANOVA:
    DV is continuous
    DV is normally distributed
    Homogeneity of variance
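A minimal R sketch of a two-way factorial ANOVA, assuming a hypothetical data frame d with factors A and B and a continuous DV Y:

    fit <- aov(Y ~ A * B, data = d)        # expands to A + B + A:B
    summary(fit)                           # FA, FB, and FAxB
    ss <- summary(fit)[[1]][["Sum Sq"]]    # rows: A, B, A:B, Residuals
    ss[3] / (ss[3] + ss[4])                # partial eta-squared for AxB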
Factorial ANOVA & Model Comparison
- L16a: Benefits of Repeated Measures ANOVA
Benefits
  Less cost (fewer subjects required)
  More statistical power
    Variance across subjects may be systematic
    If so, it will not contribute to the error term
MS and F
  MSA = SSA / dfA
  MSAxS = SSAxS / dfAxS
  F = MSA / MSAxS
Post-hoc tests
  The error term MSAxS is NOT appropriate
  Need to calculate a new error term based on the conditions that are
  being compared
  Correct for multiple comparisons
    Bonferroni
Sphericity assumption
  Homogeneity of variance
  Homogeneity of correlation: r12 = r13 = r23
  How to test? Mauchly's test
    If significant, then report the p value from one of the corrected
    tests: Greenhouse-Geisser or Huynh-Feldt (see the sketch below)
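A minimal R sketch of a one-way repeated measures ANOVA, assuming a hypothetical long-format data frame d with a subject factor, a within-subjects factor A, and DV Y:

    fit <- aov(Y ~ A + Error(subject/A), data = d)  # subject must be a factor
    summary(fit)  # the Error: subject:A stratum supplies MSAxS, so F = MSA / MSAxS
    # Mauchly's test and the Greenhouse-Geisser / Huynh-Feldt corrections
    # are easiest via an add-on package such as ez (ezANOVA reports all three)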
- L16b: Risks of Repeated Measures ANOVA
- Order effects
- Counterbalancing
  o Consider a simple design with just two conditions, A1 and A2
  One approach is a Blocked Design
- Missing data
  Is the pattern random or lawful? This can easily be detected
  For any variable of interest (X) create a new variable (XM)
    XM = 0 if X is missing
    XM = 1 if X is not missing
  Conduct a t-test with XM as the IV and another observed variable as
  the DV
  If significant, then the pattern of missing data may be lawful (see
  the sketch below)
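A minimal R sketch of this check, assuming a hypothetical data frame d in which X has missing values and Y is another observed variable:

    d$XM <- ifelse(is.na(d$X), 0, 1)  # 0 = missing, 1 = not missing
    t.test(Y ~ XM, data = d)          # significant => missingness may be lawful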
- Remedies
  Drop all cases without a perfect profile
    Drastic
    Use only if you can afford it
  Keep all cases and estimate the values of the missing data points
    There are several options for how to estimate values
  Estimation methods
    Insert the mean
      Conservative
      Decreases variance
    Regression-based estimation
      More precise than using the mean, but confusion often arises over
      which variables to use as predictors in the regression equation
- L17a: Mixed Factorial ANOVA
One
IV
is
manipulated
between
groups
One
IV
is
manipulated
within
groups
Repeated
measures
Whats
new?
Partitioning
SS
Formulae
for
FA,
FB,
FAxB
Error
term
for
post-hoc
tests
Approach
to
simple
effects
analyses
Assumptions
df
  dfA = a - 1
  dfB = b - 1
  dfAxB = (a - 1)(b - 1)
  dfS/A = a(n - 1)
  dfBxS/A = a(b - 1)(n - 1)
  dfTotal = (a)(b)(n) - 1
MS
  MSA = SSA / dfA
  MSB = SSB / dfB
  MSAxB = SSAxB / dfAxB
  MSS/A = SSS/A / dfS/A
  MSBxS/A = SSBxS/A / dfBxS/A
F
  FA = MSA / MSS/A
  FB = MSB / MSBxS/A
  FAxB = MSAxB / MSBxS/A
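A minimal R sketch of a mixed factorial ANOVA, assuming a hypothetical long-format data frame d with a subject factor, a between-groups factor A, a repeated factor B, and DV Y. The Error() term splits MSS/A (error for FA) from MSBxS/A (error for FB and FAxB):

    fit <- aov(Y ~ A * B + Error(subject/B), data = d)
    summary(fit)  # Error: subject stratum tests A;
                  # Error: subject:B stratum tests B and A:B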
Must choose one approach or the other (to report both is redundant):
  Simple effects of the between-groups IV, or
  Simple effects of the repeated IV
Simple effects of the between-groups IV
  Simple effect of A at each level of B
    FA.at.b1 = MSA.at.b1 / MSS/A.at.b1
  Simple comparisons use the same error term, MSS/A.at.b1
- Within-subjects assumptions
  Sphericity: the variances of the different treatment scores (b) are
  the same and the correlations among pairs of treatment means are the
  same
  If violated, then report the Greenhouse-Geisser or Huynh-Feldt values