Statistics Formula Cheatsheet

This cheatsheet provides an overview of statistical concepts including population, sample, mean, variance, and standard deviation, along with methods for descriptive statistics and hypothesis testing. It discusses the importance of sample size, the effects of outliers, and the use of linear transformations, as well as probability rules and distributions. It also covers confidence intervals, error types, and regression analysis, emphasizing the importance of understanding data relationships and statistical measures.

Population - the entire collection of objects or individuals about which information is desired
➔ it is usually easier to take a sample
    ◆ Sample - the part of the population that is selected for analysis
    ◆ Watch out for:
        ● a limited sample size that might not be representative of the population
    ◆ Simple Random Sampling - every possible sample of a certain size has the same chance of being selected

Observational Study - there can always be lurking variables affecting the results
➔ i.e., a strong positive association between shoe size and intelligence for boys
➔ **should never be used to show causation

Experimental Study - lurking variables can be controlled; can give good evidence for causation

➔ Mean - the arithmetic average of the data values
    ◆ **Highly susceptible to extreme values (outliers); the mean is pulled toward them
    ◆ The mean can never be larger than the max value or smaller than the min value, but it could equal the max/min value
➔ Median - in an ordered array, the median is the middle number
    ◆ **Not affected by extreme values
➔ Quartiles - split the ranked data into 4 equal groups
    ◆ Box and Whisker Plot
➔ Variance - the average squared distance from the mean

   s_x² = Σ(xᵢ - x̄)² / (n - 1)

    ◆ squaring gets rid of the negative values
    ◆ units are squared
➔ Standard Deviation - shows variation about the mean

   s = √( Σ(xᵢ - x̄)² / (n - 1) )

    ◆ highly affected by outliers
    ◆ has the same units as the original data
    ◆ in finance, a horrible measure of risk on its own (the trampoline example)
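A minimal sketch of these summary measures using Python's standard library (the data values are invented for illustration):

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 30]        # hypothetical data; 30 is an outlier

mean = statistics.mean(data)         # pulled toward the outlier
median = statistics.median(data)     # resistant to the outlier
var = statistics.variance(data)      # sample variance, divides by n - 1
sd = statistics.stdev(data)          # square root of the variance

print(mean, median, var, sd)
```

Note how the single extreme value drags the mean (about 8.7) well above the median (5), while the median stays put.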

Descriptive Statistics Part I

➔ Summary Measures
➔ Range = X_maximum - X_minimum
    ◆ Disadvantages: ignores the way in which the data are distributed; sensitive to outliers
➔ Interquartile Range (IQR) = 3rd quartile - 1st quartile
    ◆ not used that much
    ◆ not affected by outliers
➔ Effects of Linear Transformations (new data = a + bX):
    ◆ mean_new = a + b*mean
    ◆ median_new = a + b*median
    ◆ stdev_new = |b|*stdev
    ◆ IQR_new = |b|*IQR
➔ Z-score - the transformed data set will have mean 0 and variance 1

   z = (x - x̄) / s

Empirical Rule
➔ only for mound-shaped data
➔ approx. 95% of the data is in the interval (x̄ - 2s_x, x̄ + 2s_x), i.e. x̄ ± 2s_x
➔ only use it if all you have is the mean and the std. dev.

Chebyshev's Rule
➔ works for any set of data and for any number k greater than 1 (1.2, 1.3, etc.)
➔ at least 1 - 1/k² of the data falls within k standard deviations of the mean
➔ e.g., for k = 2 (2 standard deviations), at least 75% of the data falls within 2 standard deviations of the mean

Detecting Outliers
➔ Classic Outlier Detection
    ◆ doesn't always work
    ◆ flag x as an outlier if |z| = |(x - x̄)/s| ≥ 2
➔ The Boxplot Rule
    ◆ a value X is an outlier if:
      X < Q1 - 1.5(Q3 - Q1) or X > Q3 + 1.5(Q3 - Q1)
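A sketch of both outlier rules in Python (data invented; note that `statistics.quantiles` implements one of several quartile conventions, so Q1 and Q3 can differ slightly from hand calculations):

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 30]  # hypothetical data

# Classic rule: flag |z| >= 2
mean, sd = statistics.mean(data), statistics.stdev(data)
z_outliers = [x for x in data if abs((x - mean) / sd) >= 2]

# Boxplot rule: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
box_outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(z_outliers, box_outliers)   # both rules flag 30 here
```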
Descriptive Statistics Part II

Linear Transformations
➔ linear transformations change the center and spread of the data
➔ Average(a + bX) = a + b*Average(X)
➔ Var(a + bX) = b²Var(X)

Skewness
➔ measures the degree of asymmetry exhibited by the data
    ◆ negative values = skewed left
    ◆ positive values = skewed right
    ◆ if |skewness| < 0.8, you don't need to transform the data

Measurements of Association
➔ Covariance

   s_xy = Σ(xᵢ - x̄)(yᵢ - ȳ) / (n - 1)

    ◆ Covariance > 0: larger x goes with larger y
    ◆ Covariance < 0: larger x goes with smaller y
    ◆ units = (units of x)(units of y)
    ◆ only the sign of the covariance (+, -, or 0) is meaningful; its magnitude can be any number
➔ Correlation - measures the strength of a linear relationship between two variables

   r_xy = s_xy / (s_x s_y)

    ◆ correlation is between -1 and 1
    ◆ sign: direction of the relationship
    ◆ absolute value: strength of the relationship (-0.6 is a stronger relationship than +0.4)
    ◆ correlation doesn't imply causation
    ◆ the correlation of a variable with itself is one

Combining Data Sets
➔ Z = aX + bY
➔ Mean: z̄ = a x̄ + b ȳ
➔ Var(Z) = s_z² = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)

Portfolios
➔ Return on a portfolio: R_p = w_A R_A + w_B R_B
    ◆ the weights add up to 1
    ◆ return = mean
    ◆ risk = std. deviation
➔ Variance of the return of a portfolio:

   s_p² = w_A² s_A² + w_B² s_B² + 2 w_A w_B s_A,B

    ◆ risk (variance) is reduced when the stocks are negatively correlated (when there's a negative covariance)
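A quick numeric sketch of the portfolio formulas (the weights, returns, and risks below are invented for illustration):

```python
import math

# hypothetical two-stock portfolio
w_a, w_b = 0.6, 0.4            # weights add up to 1
r_a, r_b = 0.08, 0.12          # mean returns
s_a, s_b = 0.15, 0.25          # standard deviations (risk)
cov_ab = -0.01                 # negative covariance reduces risk

portfolio_return = w_a * r_a + w_b * r_b
portfolio_var = w_a**2 * s_a**2 + w_b**2 * s_b**2 + 2 * w_a * w_b * cov_ab
portfolio_risk = math.sqrt(portfolio_var)

print(portfolio_return, portfolio_risk)
```

With the negative covariance, the portfolio risk (about 0.115) comes out below the weighted average of the two individual risks (0.19).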
Probability
➔ a measure of uncertainty
➔ the set of outcomes has to be exhaustive (every possible option is included) and the outcomes mutually exclusive (no 2 outcomes can occur at the same time)

Probability Rules
1. Probabilities range from 0 to 1: 0 ≤ P(A) ≤ 1
2. The probabilities of all outcomes must add up to 1
3. The Complement Rule - A happens or A doesn't happen:
   P(Ā) = 1 - P(A), so P(A) + P(Ā) = 1
4. Addition Rule:
   P(A or B) = P(A) + P(B) - P(A and B)

Contingency/Joint Table
➔ to go from a contingency table (counts) to a joint table (probabilities), divide each cell by the total # of counts
➔ everything inside a joint table adds up to 1

Conditional Probability
➔ P(A|B) = P(A and B) / P(B)
➔ given that event B has happened, what is the probability event A will happen?
➔ look out for the words "given" and "if"

Independence
➔ A and B are independent if:
   P(A|B) = P(A) or P(B|A) = P(B)
➔ if the probabilities change, then A and B are dependent
➔ **hard to prove independence; you need to check every value

Multiplication Rules
➔ if A and B are INDEPENDENT:
   P(A and B) = P(A) P(B)
➔ another way to find a joint probability:
   P(A and B) = P(A|B) P(B)
   P(A and B) = P(B|A) P(A)
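A small sketch of these rules on a made-up 2 x 2 joint table (the four cell probabilities are invented and sum to 1):

```python
# Hypothetical joint table for events A / not-A and B / not-B
p_a_and_b = 0.20
p_a_and_not_b = 0.30
p_not_a_and_b = 0.10
p_not_a_and_not_b = 0.40

p_a = p_a_and_b + p_a_and_not_b            # marginal P(A) = 0.50
p_b = p_a_and_b + p_not_a_and_b            # marginal P(B) = 0.30

p_a_given_b = p_a_and_b / p_b              # conditional P(A|B) ~ 0.667
independent = abs(p_a_and_b - p_a * p_b) < 1e-12   # P(A and B) == P(A)P(B)?

p_a_or_b = p_a + p_b - p_a_and_b           # addition rule: 0.60

print(p_a_given_b, independent, p_a_or_b)
```

Here P(A|B) ≈ 0.667 differs from P(A) = 0.5, so A and B are dependent.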
Decision Analysis
➔ Maximax solution = the optimistic approach: always think the best is going to happen, so pick the choice with the best best-case payoff
➔ Maximin solution = the pessimistic approach: pick the choice with the best worst-case payoff
➔ Expected Value solution = pick the choice with the highest expected monetary value:

   EMV = X₁P₁ + X₂P₂ + ... + XₙPₙ

Decision Tree Analysis
➔ square = your choice
➔ circle = uncertain events
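A sketch comparing two hypothetical choices by EMV (payoffs and probabilities are invented):

```python
# Each choice is a list of (payoff, probability) branches from a chance node.
choices = {
    "risky venture": [(100_000, 0.3), (-20_000, 0.7)],
    "safe option":   [(25_000, 1.0)],
}

def emv(branches):
    """Expected monetary value: EMV = x1*p1 + x2*p2 + ... + xn*pn."""
    return sum(payoff * prob for payoff, prob in branches)

for name, branches in choices.items():
    print(name, emv(branches))
# risky venture: 100000*0.3 - 20000*0.7 = 16000; safe option: 25000
# The EMV criterion picks the safe option here.
```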
Discrete Random Variables
➔ P_X(x) = P(X = x)

Expectation
➔ μ_x = E(X) = Σ xᵢ P(X = xᵢ)
➔ Example: E(X) = (2)(0.1) + (3)(0.5) = 1.7 (for the probabilities to add up to 1, the remaining 0.4 must sit on the value 0)

Variance
➔ σ² = E(X²) - μ_x²
➔ Example (same variable): (2)²(0.1) + (3)²(0.5) - (1.7)² = 2.01

Rules for Expectation and Variance
➔ for S = a + bX:
   μ_s = E(S) = a + bμ_x
   Var(S) = b²σ²

Jointly Distributed Discrete Random Variables
➔ X and Y are independent if, for every pair of values x and y:
   P_X,Y(X = x and Y = y) = P_X(x) P_Y(y)
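A sketch of these two calculations from a pmf, reproducing the example above (with the assumed leftover probability 0.4 on the value 0):

```python
# pmf as {value: probability}; the probabilities must sum to 1
pmf = {0: 0.4, 2: 0.1, 3: 0.5}

mu = sum(x * p for x, p in pmf.items())         # E(X)   = 1.7
e_x2 = sum(x**2 * p for x, p in pmf.items())    # E(X^2) = 4.9
var = e_x2 - mu**2                              # 4.9 - 2.89 = 2.01

print(mu, var)
```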
➔ Combining Random Variables
    ◆ if X and Y are independent:
      E(X + Y) = E(X) + E(Y)
      Var(X + Y) = Var(X) + Var(Y)
    ◆ if X and Y are dependent:
      E(X + Y) = E(X) + E(Y)
      Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
➔ Covariance:
   Cov(X, Y) = E(XY) - E(X)E(Y)
➔ if X and Y are independent, Cov(X, Y) = 0

Binomial Distribution
➔ doing something n times
➔ only 2 outcomes: success or failure
➔ trials are independent of each other
➔ the probability of success p remains constant

1.) All failures: P(all failures) = (1 - p)ⁿ
2.) All successes: P(all successes) = pⁿ
3.) At least one success: P(at least 1 success) = 1 - (1 - p)ⁿ
4.) At least one failure: P(at least 1 failure) = 1 - pⁿ
5.) Binomial distribution formula for x = exact number of successes:

   P(X = x) = C(n, x) pˣ (1 - p)ⁿ⁻ˣ

6.) Mean (expectation): μ = E(X) = np
7.) Variance and standard deviation (with q = 1 - p):
   σ² = npq
   σ = √(npq)
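These formulas are easy to check with the standard library (the n and p below are arbitrary):

```python
from math import comb, sqrt

n, p = 10, 0.3                # hypothetical: 10 trials, P(success) = 0.3
q = 1 - p

def pmf(x):
    """P(X = x) for Binomial(n, p)."""
    return comb(n, x) * p**x * q**(n - x)

all_failures = q**n
at_least_one_success = 1 - q**n
mean, sd = n * p, sqrt(n * p * q)

# sanity check: the pmf sums to 1 over x = 0..n
assert abs(sum(pmf(x) for x in range(n + 1)) - 1) < 1e-12
print(pmf(3), all_failures, at_least_one_success, mean, sd)
```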

Continuous Probability Distributions
➔ the probability that a continuous random variable X will assume any particular value is 0
➔ Density Curves
    ◆ the area under the curve over a range of values is the probability that the variable falls in that range
    ◆ total area = 1

Uniform Distribution
➔ X ~ Unif(a, b)
➔ mean of the uniform distribution: E(X) = (a + b) / 2
➔ variance of the uniform distribution: Var(X) = (b - a)² / 12
Normal Distribution
➔ governed by 2 parameters: μ (the mean) and σ (the standard deviation)
➔ X ~ N(μ, σ²)
➔ Standardizing a normal distribution:

   Z = (X - μ) / σ

➔ the Z-score is the number of standard deviations the corresponding X value is away from its mean
➔ **for P(Z < some value), use the probability found on the table directly
➔ **for P(Z > some value), use 1 - (the probability found on the table)

Sums of Normals
➔ a sum of independent normals is itself normal: the means add, and since Cov(X, Y) = 0 when X and Y are independent, the variances add as well

Central Limit Theorem
➔ as n increases, x̄ should get closer to μ (the population mean)
➔ mean(x̄) = μ
➔ variance(x̄) = σ²/n
➔ X̄ ~ N(μ, σ²/n), so the standardized version is

   Z = (X̄ - μ) / (σ/√n)

    ◆ if the population is normally distributed, n can be any value
    ◆ for any other population, n needs to be ≥ 30
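A sketch of these lookups with `statistics.NormalDist` (the population numbers are invented; `cdf` plays the role of the Z table):

```python
from statistics import NormalDist

mu, sigma, n = 100, 15, 36        # hypothetical population and sample size
z_table = NormalDist()            # standard normal, N(0, 1)

# P(X < 110) for a single observation
z = (110 - mu) / sigma
p_below = z_table.cdf(z)          # P(Z < z), like a table lookup
p_above = 1 - p_below             # P(Z > z)

# CLT: the sample mean has standard error sigma / sqrt(n)
z_mean = (105 - mu) / (sigma / n**0.5)
p_mean_below = z_table.cdf(z_mean)   # P(sample mean < 105)

print(round(p_below, 4), round(p_above, 4), round(p_mean_below, 4))
```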
Confidence Intervals
➔ a confidence interval tells us how good our estimate is
➔ **want high confidence and a narrow interval
➔ **as the confidence level increases, the width of the interval also increases

A. One Sample Proportion
➔ p̂ = x/n = (number of successes in sample) / (sample size)
➔ 95% confidence interval (standard error √(p̂(1-p̂)/n); margin of error = 1.96 standard errors):

   p̂ ± 1.96 √( p̂(1-p̂)/n )

➔ we are thus 95% confident that the true population proportion is in the interval
➔ we are assuming that n is large (np̂ > 5) and that our sample size is less than 10% of the population size

B. One Sample Mean
➔ for samples with n > 30, we can substitute s for σ, so the confidence interval is:

   x̄ ± 1.96 (s/√n)

➔ for samples with n < 30, use the t-distribution:

   x̄ ± t (s/√n)

Determining Sample Size
➔ for a desired margin of error e:

   n = (1.96)² p̂(1 - p̂) / e²

➔ if given a confidence interval, p̂ is the middle number of the interval
➔ no confidence interval: use the worst-case scenario, p̂ = 0.5
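A sketch of the 95% intervals and the sample-size formula (the data summaries are invented; 1.96 is the z value the cheatsheet uses):

```python
from math import sqrt

Z95 = 1.96

# One sample proportion: 120 successes out of 400 (hypothetical)
x, n = 120, 400
p_hat = x / n
se_p = sqrt(p_hat * (1 - p_hat) / n)
ci_prop = (p_hat - Z95 * se_p, p_hat + Z95 * se_p)

# One sample mean, n > 30 (hypothetical summary statistics)
xbar, s, m = 52.3, 8.1, 64
se_mean = s / sqrt(m)
ci_mean = (xbar - Z95 * se_mean, xbar + Z95 * se_mean)

# Sample size for margin of error e = 0.03, worst case p_hat = 0.5
e = 0.03
n_needed = (Z95**2 * 0.5 * 0.5) / e**2   # ~1067.1, so round up to 1068

print(ci_prop, ci_mean, n_needed)
```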
T-Distribution used when:
➔ σ is not known, n < 30, and the data is normally distributed
➔ *Stata always uses the t-distribution when computing confidence intervals

Hypothesis Testing
➔ Null Hypothesis: H₀, a statement of no change, assumed true until evidence indicates otherwise
➔ Alternative Hypothesis: Hₐ, the statement that we are trying to find evidence to support
➔ Type I error: reject the null hypothesis when the null hypothesis is true (considered the worst error)
➔ Type II error: do not reject the null hypothesis when the alternative hypothesis is true

Methods of Hypothesis Testing
1. Confidence intervals **
2. Test statistic
3. P-values **
➔ C.I.s and p-values are always safe to use because you don't need to worry about the size of n (it can be bigger or smaller than 30)

One Sample Hypothesis Tests
1. Confidence Interval approach (can be used only for two-sided tests): reject H₀ if the hypothesized value falls outside the interval
2. Test Statistic approach (population mean):

   z = (x̄ - μ₀) / (s/√n)

3. Test Statistic approach (population proportion):

   z = (p̂ - p₀) / √( p₀(1-p₀)/n )

4. P-Values
➔ a number between 0 and 1
➔ the larger the p-value, the more consistent the data is with the null
➔ the smaller the p-value, the more consistent the data is with the alternative
➔ **if P is low (less than 0.05), H₀ must go - reject the null hypothesis
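A sketch of the test-statistic and p-value approaches for one sample mean (numbers invented; the two-sided p-value uses the standard normal, consistent with n > 30):

```python
from math import sqrt
from statistics import NormalDist

z_table = NormalDist()

# H0: mu = 50 vs Ha: mu != 50, with hypothetical sample summaries
mu0, xbar, s, n = 50, 52.3, 8.1, 64
z = (xbar - mu0) / (s / sqrt(n))

# two-sided p-value: probability of a statistic at least this extreme
p_value = 2 * (1 - z_table.cdf(abs(z)))

reject = p_value < 0.05   # "if P is low, H0 must go"
print(round(z, 3), round(p_value, 4), reject)
```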
Two Sample Hypothesis Tests
1. Comparing Two Proportions (independent groups)
    ◆ confidence interval:
      (p̂₁ - p̂₂) ± 1.96 √( p̂₁(1-p̂₁)/n₁ + p̂₂(1-p̂₂)/n₂ )
    ◆ test statistic for two proportions (the difference divided by its standard error; a pooled p̂ is sometimes used under H₀):
      z = (p̂₁ - p̂₂) / √( p̂₁(1-p̂₁)/n₁ + p̂₂(1-p̂₂)/n₂ )
2. Comparing Two Means (large independent samples, n > 30)
    ◆ confidence interval:
      (x̄₁ - x̄₂) ± 1.96 √( s₁²/n₁ + s₂²/n₂ )
    ◆ test statistic for two means:
      z = (x̄₁ - x̄₂) / √( s₁²/n₁ + s₂²/n₂ )

Matched Pairs
➔ the two samples are DEPENDENT
➔ take the difference within each pair, then run a one-sample test on the differences
Simple Linear Regression
➔ used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables)
➔ fitted line: Ŷ = b₀ + b₁X
➔ residual (fitting error): eᵢ = Yᵢ - Ŷᵢ = Yᵢ - b₀ - b₁Xᵢ
    ◆ e is the part of Y not related to X
➔ the values of b₀ and b₁ which minimize the residual sum of squares are:

   b₁ = r (s_y / s_x)   (slope)
   b₀ = Ȳ - b₁X̄   (intercept)

➔ interpretation of the slope - for each additional unit of x (e.g., one more mile on the odometer), the y value decreases/increases by b₁ on average
➔ interpretation of the y-intercept - plug in 0 for x and the value you get for ŷ is the y-intercept (e.g., for ŷ = 3.25 - 0.0614 × SkippedClass, a student who skips no classes has a gpa of 3.25)
➔ **danger of extrapolation - if an x value is outside of our data set, we can't confidently predict the fitted y value

Properties of the Residuals and Fitted Values
1. mean of the residuals = 0; sum of the residuals = 0
2. mean of the original values is the same as the mean of the fitted values: Ȳ = mean(Ŷ)
3. corr(Ŷ, e) = 0 - the fitted values carry no information about the residuals
4. Correlation Matrix (the pairwise correlations among the variables)

A Measure of Fit: R²
➔ SST = Σ(Yᵢ - Ȳ)² splits into SSR = Σ(Ŷᵢ - Ȳ)² (explained) and SSE = Σeᵢ² (unexplained): SST = SSR + SSE
➔ R²: the coefficient of determination

   R² = SSR/SST = 1 - SSE/SST

➔ R² is between 0 and 1; the closer R² is to 1, the better the fit
➔ good fit: SSR is big and SSE is small; if SST = SSR, the fit is perfect
➔ interpretation of R²: e.g., "65% of the variation in the selling price is explained by the variation in the odometer reading; the rest, 35%, remains unexplained by this model"
➔ **R² doesn't indicate whether the model is adequate**
➔ as you add more X's to the model, R² goes up
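A minimal sketch of the fitting formulas on toy data, using b₁ = r(s_y/s_x) and b₀ = Ȳ - b₁X̄ from above rather than a library fit (`statistics.correlation` needs Python 3.10+):

```python
import statistics as st

# toy data: x = odometer-style predictor, y = response
x = [10, 20, 30, 40, 50]
y = [3.0, 2.8, 2.9, 2.4, 2.3]

r = st.correlation(x, y)
b1 = r * st.stdev(y) / st.stdev(x)        # slope
b0 = st.mean(y) - b1 * st.mean(x)         # intercept

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

sst = sum((yi - st.mean(y)) ** 2 for yi in y)
sse = sum(e ** 2 for e in resid)
r_squared = 1 - sse / sst                 # equals r**2 in simple regression

print(round(b1, 4), round(b0, 4), round(r_squared, 3))
print(round(sum(resid), 10))              # residuals sum to ~0, as promised
```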
Assumptions of Simple Linear Regression
1. we model the AVERAGE of something rather than the thing itself
2. the model is Y = β₀ + β₁X + ε, with the noise ε independent, normally distributed with mean 0, and of constant variance σ²

Standard Errors for b₁ and b₀
➔ s_b₀ = the amount of uncertainty in our estimate of β₀ (small s good, large s bad)
➔ s_b₁ = the amount of uncertainty in our estimate of β₁
    ◆ as ε (the noise) gets bigger, it's harder to find the line
➔ n small → bad
   s_e big → bad
   s_x² small → bad (you want the x's spread out for a better guess at the slope)

Confidence Intervals for b₁ and b₀
➔ b₁ ± 1.96 s_b₁ and b₀ ± 1.96 s_b₀ (95% confidence)

Estimating S_e
➔ S_e² = SSE / (n - 2) is our estimate of σ²
➔ S_e = √(S_e²) is our estimate of σ
➔ 95% of the Y values should lie within the interval b₀ + b₁X ± 1.96 S_e (the basis of prediction intervals)

Regression Hypothesis Testing
*always a two-sided test
➔ we want to test whether the slope β₁ is needed in our model:
   H₀: β₁ = 0 (don't need x)
   Hₐ: β₁ ≠ 0 (need x)
➔ need X in the model if:
   a. 0 isn't in the confidence interval for b₁
   b. |t| > 1.96, where t = b₁ / s_b₁
   c. p-value < 0.05

Test Statistic for Slope/Y-intercept
➔ the |t| > 1.96 cutoff can only be used if n > 30
➔ if n < 30, use p-values
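A sketch of the slope inference, continuing the toy data above; the slope's standard error formula s_b₁ = S_e / √(Σ(xᵢ - x̄)²) is the standard one, assumed here since the original shows it only as an image (and with n = 5 the 1.96 cutoff is for illustration only):

```python
from math import sqrt
import statistics as st

x = [10, 20, 30, 40, 50]
y = [3.0, 2.8, 2.9, 2.4, 2.3]

b1 = st.correlation(x, y) * st.stdev(y) / st.stdev(x)
b0 = st.mean(y) - b1 * st.mean(x)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

n = len(x)
se = sqrt(sum(e**2 for e in resid) / (n - 2))     # S_e = sqrt(SSE/(n-2))
sxx = sum((xi - st.mean(x))**2 for xi in x)
sb1 = se / sqrt(sxx)                              # std. error of the slope

t = b1 / sb1                                      # tests H0: beta1 = 0
ci_b1 = (b1 - 1.96 * sb1, b1 + 1.96 * sb1)        # need X if 0 is outside

print(round(t, 2), tuple(round(v, 4) for v in ci_b1))
```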
Multiple Regression

➔ Variable Importance:
    ◆ higher t-value, lower p-value = the variable is more important
    ◆ lower t-value, higher p-value = the variable is less important (or not needed)

Adjusted R-squared
➔ k = # of X's
➔ Adj. R-squared will go down as you add junk x variables
➔ Adj. R-squared will only go up if the x you add in is very useful
➔ **want Adj. R-squared to go up and S_e to go down for a better model

The Overall F Test
➔ H₀: β₁ = β₂ = β₃ = ... = β_k = 0 (don't need any X's)
   Hₐ: at least one β ≠ 0 (need at least 1 X)
➔ always want to reject the F test (reject the null hypothesis)
➔ look at the p-value (if < 0.05, reject the null)
➔ if no x variables are needed, then SSR = 0 and SST = SSE

Modeling Regression: Backward Stepwise Regression
1. start with all variables in the model
2. at each step, delete the least important variable, based on the largest p-value above 0.05
3. stop when you can't delete any more
➔ you should see Adj. R-squared go up and S_e go down as you prune

Dummy Variables
➔ an indicator variable that takes on a value of 0 or 1; dummies allow the intercepts to change

How to Create Dummy Variables (Nominal Variables)
➔ if C is the number of categories, create (C - 1) dummy variables for describing the variable
➔ one category is always the "baseline", which is included in the intercept

Interaction Terms
➔ allow the slopes to change
➔ an interaction between 2 or more x variables that will affect the Y variable

Recoding Dummy Variables
➔ Example: how many hockey sticks are sold in the summer (original equation, Summer as the baseline):
   hockey = 100 + 10·Wtr - 20·Spr + 30·Fall
   Rewritten with Winter as the baseline:
   hockey = 110 + 20·Fall - 30·Spr - 10·Summer
➔ **a recoded equation always needs to give the same exact values as the original equation, so that we can compare the models (see the sketch below)
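A sketch of the (C - 1) dummy encoding and the hockey-stick recoding above, checking that both baselines give identical fitted values:

```python
# C = 4 seasons -> C - 1 = 3 dummies; the omitted category is the baseline.
def season_dummies(season, baseline):
    others = [s for s in ("Winter", "Spring", "Summer", "Fall") if s != baseline]
    return {s: int(season == s) for s in others}

# Original equation (Summer baseline): hockey = 100 + 10*Wtr - 20*Spr + 30*Fall
def hockey_summer_base(season):
    d = season_dummies(season, baseline="Summer")
    return 100 + 10 * d["Winter"] - 20 * d["Spring"] + 30 * d["Fall"]

# Recoded equation (Winter baseline): hockey = 110 + 20*Fall - 30*Spr - 10*Summer
def hockey_winter_base(season):
    d = season_dummies(season, baseline="Winter")
    return 110 + 20 * d["Fall"] - 30 * d["Spring"] - 10 * d["Summer"]

# Both parameterizations give the same exact fitted values.
for season in ("Winter", "Spring", "Summer", "Fall"):
    assert hockey_summer_base(season) == hockey_winter_base(season)
    print(season, hockey_summer_base(season))
```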
Regression Diagnostics
➔ standardize the residuals

Check Model Assumptions
➔ plot the residuals versus Ŷ (the fitted values)
➔ Outliers
    ◆ regression likes to move towards outliers (shows up as R² being really high)
    ◆ want to remove an outlier that is extreme in both x and y
➔ Nonlinearity (ovtest)
    ◆ plotting residuals vs. fitted values will show a relationship if the data is nonlinear (R² also high)
    ◆ ovtest: a significant test statistic indicates that polynomial terms should be added
    ◆ H₀: data needs no transformation
      Hₐ: data needs a transformation
➔ Normality (sktest)
    ◆ H₀: data is normal
      Hₐ: data is not normal
    ◆ here you don't want to reject the null hypothesis; the p-value should be big
➔ Homoskedasticity (hettest)
    ◆ H₀: data is homoskedastic
      Hₐ: data is heteroskedastic
    ◆ homoskedastic: the residuals form an even band around the fitted values
    ◆ heteroskedastic: as x goes up, the noise goes up (no more band; fan-shaped)
    ◆ if heteroskedastic, fix it by logging the Y variable
    ◆ if heteroskedastic, fix it by making the standard errors robust
➔ Log transformation
    ◆ accommodates non-linearity, reduces right skewness in the Y, eliminates heteroskedasticity
    ◆ **only take the log of the X variable if you want to compare models: you can't compare models if you take the log of Y
➔ Multicollinearity
    ◆ when x variables are highly correlated with each other
    ◆ warning signs: R² > 0.9, or a pairwise correlation > 0.9
    ◆ correlate all the x variables (include the y variable), then drop the x variable that is less correlated with y
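The Stata tests above (ovtest, sktest, hettest) have no standard-library Python equivalent, but the pairwise-correlation screen for multicollinearity is easy to sketch (toy data; `statistics.correlation` needs Python 3.10+, and 0.9 is the cutoff the cheatsheet uses):

```python
import statistics as st
from itertools import combinations

# toy predictors: x2 is nearly a copy of x1, x3 is unrelated noise
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9]
x3 = [5, 1, 4, 2, 8, 3, 7, 6]
y  = [2.2, 4.1, 6.3, 7.8, 10.2, 12.1, 14.3, 15.8]

xs = {"x1": x1, "x2": x2, "x3": x3}

# flag pairs of x's with |pairwise correlation| > 0.9
for (na, a), (nb, b) in combinations(xs.items(), 2):
    r = st.correlation(a, b)
    if abs(r) > 0.9:
        # drop whichever of the pair is less correlated with y
        weaker = min((na, nb), key=lambda name: abs(st.correlation(xs[name], y)))
        print(f"{na}-{nb}: r = {r:.3f} -> consider dropping {weaker}")
```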
