This document defines key statistical concepts used in analyzing samples and populations, modeling relationships between variables, and testing hypotheses. It explains that the sample mean is calculated as the sum of all values divided by the sample size, and is an estimate of the population mean. It also defines concepts like variance, standard deviation, z-scores, confidence intervals, and linear regression models used to quantify relationships between variables and test whether relationships are statistically significant.
Copyright
© Attribution Non-Commercial (BY-NC)

Sample Mean: \bar{x} = \hat{\mu} = \frac{\sum_{i=1}^{n} x_i}{n}
  Sum of all x values divided by the sample size; the sample mean is an estimate of the population mean.
Sample Variance: s_x^2 = \hat{\sigma}_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}
  Sum of squared deviations of each x value from the mean, divided by the sample size less one; measures how far the x values lie from the mean.
Sample Standard Deviation: s_x = \hat{\sigma}_x = \sqrt{s_x^2}
  Square root of the variance; expresses variability in the population (how far values are dispersed from the estimated mean) in the same units as the data.
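As a quick sketch of the three definitions above, with an assumed toy sample: Python's `statistics` module implements exactly these formulas (its `variance` and `stdev` use the n − 1 denominator).

```python
# Sample mean, variance (n - 1 denominator), and standard deviation.
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # assumed toy sample

mean = statistics.mean(data)      # sum(x_i) / n
var = statistics.variance(data)   # sum((x_i - mean)^2) / (n - 1)
std = statistics.stdev(data)      # sqrt(variance)

print(mean, var, std)
```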
z-score: z = \frac{x - \mu}{\sigma}
  Subtract the mean from x and divide by the standard deviation; transforms data from a normal distribution into a standard normal distribution.
Standard Error: se(\bar{x}) = \frac{\sigma}{\sqrt{n}}
  Population standard deviation divided by the square root of the sample size; describes the spread of the sampling distribution of the sample mean.
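A minimal sketch of the two formulas above, with assumed population parameters and sample size:

```python
# z-score and standard error of the mean.
import math

mu, sigma = 100.0, 15.0   # assumed population mean and standard deviation
n = 36                    # assumed sample size
x = 115.0                 # assumed single observation

z = (x - mu) / sigma          # standardize: N(mu, sigma) -> N(0, 1)
se = sigma / math.sqrt(n)     # spread of the sampling distribution of the mean

print(z, se)
```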
Expected Value: E(\bar{x}) = \mu
  The average of the sample means should be the population average; the sample mean is an unbiased estimate of the population mean. In a normal distribution this value is also the mode.
Test Statistic: t = \frac{\hat{\theta} - \theta^*}{se(\hat{\theta})}
  The sample estimate of θ minus the hypothesized value of θ, divided by the standard error; if larger in magnitude than the critical value, we can reject the null hypothesis.
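The test statistic is a one-line computation; the values below are assumed for illustration:

```python
# Test statistic: (estimate - hypothesized value) / standard error.
theta_hat = 2.5    # assumed sample estimate of theta
theta_0 = 2.0      # assumed hypothesized value under the null
se_theta = 0.25    # assumed standard error of the estimate

t = (theta_hat - theta_0) / se_theta
print(t)  # compare |t| against the critical value t_{n-1, alpha/2}
```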
Critical Value: t_{n-1,\,\alpha/2}
  α is the significance level we are willing to accept; the cutoff value for determining whether a test is significant.
Confidence Interval: CI_{100(1-\alpha)\%}(\mu) = \left[\, \hat{\mu} \pm t_{n-1,\,\alpha/2}\, se(\hat{\mu}) \,\right]
  The confidence interval at level α for a population mean: the estimate of the mean plus or minus the margin of error. An interval of numbers within which we believe the parameter falls; if the hypothesized mean is not in the interval, reject the null.
Confidence Interval (95%): CI_{95\%} = \left[\, \bar{x} - 1.96\, se(\bar{x}),\ \bar{x} + 1.96\, se(\bar{x}) \,\right]
  With repeated sampling, 95% of such intervals will contain the true parameter μ.
Confidence Interval (99%): CI_{99\%} = \left[\, \bar{x} - 2.576\, se(\bar{x}),\ \bar{x} + 2.576\, se(\bar{x}) \,\right]
  With repeated sampling, 99% of such intervals will contain the true parameter μ.
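A sketch of the two interval formulas above, with an assumed sample mean, standard deviation, and size:

```python
# 95% and 99% confidence intervals for the mean.
import math

x_bar, s, n = 50.0, 8.0, 64   # assumed sample mean, std deviation, and size
se = s / math.sqrt(n)

ci_95 = (x_bar - 1.96 * se, x_bar + 1.96 * se)
ci_99 = (x_bar - 2.576 * se, x_bar + 2.576 * se)

print(ci_95, ci_99)   # the 99% interval is wider than the 95% interval
```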
Stochastic Model: y = f(x) + \varepsilon
  y is a function of x plus an error term that accounts for small differences; the relationship holds generally, but with some variation and uncertainty.
Simple Linear Regression Model: y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
  β0 is the y-intercept and β1 is the slope; a one-unit change in x changes y by β1 units. Under the null hypothesis β1 = 0, so y does not depend on x.
Estimated β1: \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
  The sum of the products of the x and y deviations from their means, divided by the sum of squared deviations of x from its mean. Gives the slope to use in the simple linear regression; a one-unit change in x predicts a β1 change in y.
Estimated β0: \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
  The mean value of y minus the mean value of x times β1; gives the intercept to use in the simple linear regression.
Residual: \hat{\varepsilon} = y - \hat{\beta}_0 - \hat{\beta}_1 x = y - \hat{y}
  The estimated error is the difference between observed y and estimated (fitted) y; the deviation of the original y left over once the relationship with x is taken into account.
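The estimator formulas above can be sketched directly on assumed toy data that lies exactly on a line, so the slope, intercept, and residuals are easy to check by hand:

```python
# OLS slope, intercept, and residuals from the estimator formulas.
x = [1.0, 2.0, 3.0, 4.0]   # assumed toy data lying exactly on y = 2x + 1
y = [3.0, 5.0, 7.0, 9.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# beta_1 hat: sum of cross-deviations over sum of squared x deviations
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
# beta_0 hat: mean of y minus slope times mean of x
b0 = y_bar - b1 * x_bar
# residuals: observed y minus fitted y
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(b1, b0, residuals)
```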
Sum of Squares Residual: SS_{Residual} = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  The sum of squared differences between the observed y values and the estimated y values; least squares chooses the line that minimizes this quantity.
Sum of Squares Total: SS_{Total} = \sum_{i=1}^{n} (y_i - \bar{y})^2
  Total variation: the sum of squared differences between the y values and the mean of y; all variation in a given simple linear regression model.
Sum of Squares Explained: SS_{Explained} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
  Explained variation: the sum of squared differences between the estimated y values and the mean of y; variation of the estimated y values around their mean.
Sum of Squares Identity: SS_{Total} = SS_{Residual} + SS_{Explained}
  Total variation is the sum of residual (unexplained) variation and explained variation.
Coefficient of Determination: R^2 = \frac{SS_{Explained}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}}
  Goodness of fit: the ratio of explained variance to total variance; a measure of how well the least-squares regression line fits the data.
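The sum-of-squares decomposition and R² can be verified numerically on assumed toy data; the identity SS_Total = SS_Residual + SS_Explained holds exactly for a least-squares fit with an intercept:

```python
# Sum-of-squares decomposition and R^2 for a fitted line (assumed toy data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * a for a in x]             # fitted values

ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained
ss_exp = sum((fi - y_bar) ** 2 for fi in y_hat)            # explained
ss_tot = sum((yi - y_bar) ** 2 for yi in y)                # total
r2 = 1 - ss_res / ss_tot

print(ss_res, ss_exp, ss_tot, r2)
```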
Standard Error of Regression: se_{Regression} = \sqrt{\hat{\sigma}_\varepsilon^2} = \hat{\sigma}_\varepsilon
  Square root of the residual variance; an estimate of the magnitude of the typical deviation from the regression line.
Residual Variance: \hat{\sigma}_\varepsilon^2 = \frac{SS_{Residual}}{n-2} = \frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^2}{n-2}
  Sum of squared residuals divided by the sample size less two; an estimate of the variance of the population errors. For multiple regression, divide by n − k − 1 instead.
Sample Covariance: s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}
  The sum of the products of the x and y deviations from their means, divided by the sample size less one; a measure of how two variables move or vary together.
Sample Correlation: r_{xy} = \frac{s_{xy}}{s_x s_y}
  Sample covariance divided by the product of the standard deviations of x and y; a measure of the strength of association between two variables.
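A sketch of the covariance and correlation formulas above, with assumed toy data chosen so the correlation is exactly 1:

```python
# Sample covariance and correlation (assumed toy data with y = 2x).
import statistics

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # perfectly linear, so r_xy should be 1

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
r_xy = s_xy / (statistics.stdev(x) * statistics.stdev(y))

print(s_xy, r_xy)
```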
Multiple Linear Regression Model: y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon
  Regresses multiple x variables on y; each additional regressor can only reduce the residual (unexplained) error.
F-Statistic: F = \frac{MS_{Model}}{MS_{Residuals}} = \frac{SS_{Model}/k}{SS_{Residuals}/(n-k-1)}
  The model sum of squares divided by the number of regressors, over the residual sum of squares divided by (number of observations − number of regressors − 1). Tests the overall significance of the regression model; if significant, at least one x variable is related to y.
OLS Formula for β1: \beta_1 = \frac{\sigma_{x_1 y}}{\sigma_{x_1}^2}
  Covariance of x1 and y divided by the variance of x1; the population parameter in a multiple regression. If relevant variables are omitted, this expression picks up additional bias terms (e.g., involving β2).
One-Way ANOVA Model: Y_{gi} = \mu + \tau_g + \varepsilon_{gi}, \quad \tau_g = \mu_g - \mu
  Population model for ANOVA; g indexes the group (level). Y is the overall mean across all groups (μ) plus the deviation of the group mean from the overall mean (τ), plus error.
Sum of Squares Between Groups: SS_{Between} = \sum_{g=1}^{G} n_g (\bar{y}_{g.} - \bar{y}_{..})^2
  Amount of variation between groups; G = number of groups, n_g = number of observations in group g.
Sum of Squares Within Groups: SS_{Within} = \sum_{g=1}^{G} \sum_{i=1}^{n_g} (y_{gi} - \bar{y}_{g.})^2
  Amount of variation within groups; i indexes observations within a group, N = overall number of observations.
Sum of Squares Total (One-Way): SS_{Total} = \sum_{g=1}^{G} \sum_{i=1}^{n_g} (y_{gi} - \bar{y}_{..})^2
  Total amount of variation for a one-way ANOVA. Hypotheses: H_0: \mu_1 = \mu_2 = \dots = \mu_G; H_a: at least two population means are different.
One-Way ANOVA Identity: SS_{Total} = SS_{Between} + SS_{Within}
  In one-way ANOVA, the within-group sum of squares is the error sum of squares.
Sample Group Mean: \bar{y}_{g.} = \frac{\sum_{i=1}^{n_g} y_{gi}}{n_g}
  Sum of all observations in group g divided by the number of observations in the group.
Sample Group Variance: s_g^2 = \frac{\sum_{i=1}^{n_g} (y_{gi} - \bar{y}_{g.})^2}{n_g - 1}
  Sum of squared deviations of the observations from the group mean, divided by n_g − 1; the sample variance for group g.
Grand Mean: \bar{y}_{..} = \frac{1}{N} \sum_{g=1}^{G} \sum_{i=1}^{n_g} y_{gi}
  Sum of all observations across all groups divided by the total number of observations; the overall mean.
F-Test for One-Way ANOVA: F = \frac{SS_{Between}/(G-1)}{SS_{Within}/(N-G)} = \frac{MS_{Between}}{MS_{Within}}
  Tests whether at least two group means differ from one another.
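The one-way ANOVA quantities above can be sketched on three assumed toy groups; the between/within decomposition and the F ratio then follow directly from the formulas:

```python
# One-way ANOVA sums of squares and F statistic (assumed toy groups).
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

N = sum(len(g) for g in groups)               # overall number of observations
G = len(groups)                               # number of groups
grand = sum(sum(g) for g in groups) / N       # grand mean
means = [sum(g) / len(g) for g in groups]     # sample group means

ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
ss_within = sum((yv - m) ** 2 for g, m in zip(groups, means) for yv in g)
ss_total = sum((yv - grand) ** 2 for g in groups for yv in g)

f = (ss_between / (G - 1)) / (ss_within / (N - G))   # MS_Between / MS_Within
print(ss_between, ss_within, ss_total, f)
```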
Two-Way ANOVA Model: Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
  Population model for a two-way ANOVA; observations depend on how one factor relates to another. i = level of factor A, j = level of factor B, k = kth observation in cell ij, (αβ) = interaction of the two factors, a = number of levels of factor A, b = number of levels of factor B.
Sum of Squares Total (Two-Way): SS_{Total} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{...})^2
  Total amount of variation for a two-way ANOVA.
Sum of Squares for Factor A: SS_A = bn \sum_{i=1}^{a} (\bar{y}_{i..} - \bar{y}_{...})^2
  Tests the main effect of factor A.
Sum of Squares for Factor B: SS_B = an \sum_{j=1}^{b} (\bar{y}_{.j.} - \bar{y}_{...})^2
  Tests the main effect of factor B.
Sum of Squares for Factor A×B: SS_{AB} = n \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})^2
  Tests the interaction effect of factors A and B.
Sum of Squares Error: SS_{Error} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} e_{ijk}^2
  The amount of unexplained (error) variation in the two-way ANOVA.
Two-Way ANOVA Identity: SS_{Total} = SS_A + SS_B + SS_{AB} + SS_{Error}
F-Test for Multiple Linear Restrictions: F = \frac{(SS_{ResidualsR} - SS_{ResidualsUR})/q}{SS_{ResidualsUR}/(n-k-1)}
  Tests the joint significance of a subset of variables in a regression, i.e., whether as a group they are significant. ResidualsR = residuals of the restricted model, ResidualsUR = residuals of the unrestricted model, q = number of restrictions, k = number of regressors. If F exceeds the critical value, reject H0.
Binary Response Model: P(y = 1 \mid x) = G(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)
  In binary response models, y is either 0 or 1; this models the probability that y = 1. G is a function that defines the relationship between y and the x's.
Logistic Function (Estimated Probability): G(z) = \Lambda(z) = \frac{e^z}{1 + e^z}, where z is the linear function of the x's
  G(z) is always strictly between 0 and 1, and by symmetry G(z) = 1 - G(-z).
Latent Continuous Variable (Population Logit): y_i^* = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i, with y_i = 1 if y_i^* > 0 and y_i = 0 if y_i^* \le 0
  εi follows a standard logistic distribution; the latent variable shifts a person from 0 to 1 at a certain threshold. The odds ratio for x1 is e^{\beta_1}.

Logistic Function Estimation: P(y = 1 \mid x) = \frac{e^z}{1 + e^z}, where z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
  The probability of y given x is the logistic function G(z); this form follows from the symmetry of the logistic function.
Odds: Odds = \frac{P(A)}{1 - P(A)} = e^z
  The probability that an event occurs divided by the probability that it does not; if greater than 1 the event is more likely to occur than not, if less than 1 it is less likely.
Odds Ratio: Odds\ Ratio = \frac{Odds_1}{Odds_0} = \frac{P_1(A)/(1 - P_1(A))}{P_0(A)/(1 - P_0(A))}
  Describes the relationship between predictor and outcome variables. Can never be negative; equals 1 if both groups are equally likely to have y = 1. When x1 increases by one unit, the odds of y = 1 multiply by e^{\beta_1}.
Logit (Log Odds): L = \ln\!\left[\frac{P}{1-P}\right] = z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
  The natural log of the odds; linear in the x's.
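The logistic function, odds, and log-odds entries above are linked: the odds equal e^z and the logit recovers z. A minimal sketch with an assumed linear index z:

```python
# Logistic function, odds, and log-odds round trip.
import math

def G(z):
    # logistic function: e^z / (1 + e^z), always strictly between 0 and 1
    return math.exp(z) / (1 + math.exp(z))

z = 0.0                      # assumed linear index beta_0 + beta_1*x_1 + ...
p = G(z)                     # estimated P(y = 1 | x)
odds = p / (1 - p)           # equals e^z
logit = math.log(odds)       # the log odds recover z

print(p, odds, logit)
```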
