Cheat Sheet - Test 3
NORMAL DISTRIBUTION
> formula for z score: z = (x̄ − population mean) / stdev
> z table: use the row for the first decimal place (e.g., 2.x) and the column for the hundredths (e.g., .04). It gives the area from the mean up to that z-score.
> Both bounds below the mean: subtract the area for the nearer z-score from the area for the farther one.
> One bound below, one above: add the areas on either side of the mean.
> More than the mean (upper bound is infinity): subtract the area from the mean to X from 0.5.
> For cut-off points: identify the area (probability) first, and then use the z-score to calculate the value of X with the formula X = mean + Z × stdev.
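> A quick numeric check of the z-score and cut-off formulas above, as a minimal Python sketch (scipy assumed; the mean, stdev and x values are made up for illustration):

    from scipy.stats import norm

    mu, sigma = 100, 15                          # hypothetical population mean and stdev
    x = 124.0
    z = (x - mu) / sigma                         # z score: (x - mean) / stdev
    p_below = norm.cdf(z)                        # area to the left of x
    cutoff_top5 = mu + norm.ppf(0.95) * sigma    # X = mean + Z * stdev for the top 5% cut-off
    print(z, p_below, cutoff_top5)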
Excel usage:
Probability | area under the curve: NORM.DIST(x, mean, stdev, TRUE) — use when the mean and stdev are known; TRUE gives the cumulative area, FALSE gives the density at the exact value.
Less than | less than or equal to        =NORM.DIST(x, mean, stdev, TRUE)
Greater than | greater than or equal to  =1 - NORM.DIST(x, mean, stdev, TRUE)
At least                                 =1 - NORM.DIST(x, mean, stdev, TRUE)
Between                                  =NORM.DIST(x2, mean, stdev, TRUE) - NORM.DIST(x1, mean, stdev, TRUE)
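> The NORM.DIST usages above map directly onto the normal CDF; a hedged Python sketch with made-up numbers (scipy assumed):

    from scipy.stats import norm

    mu, sigma = 50, 8                              # assumed mean and stdev
    less_than = norm.cdf(60, mu, sigma)            # NORM.DIST(60, mean, stdev, TRUE)
    greater_than = 1 - norm.cdf(60, mu, sigma)     # 1 - NORM.DIST(60, mean, stdev, TRUE)
    between = norm.cdf(60, mu, sigma) - norm.cdf(45, mu, sigma)
    print(less_than, greater_than, between)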
Probability when only the z-score is known: NORM.S.DIST(z, TRUE)
Less than | less than or equal to        =NORM.S.DIST(z, TRUE)
Greater than | greater than or equal to  =1 - NORM.S.DIST(z, TRUE)
At least                                 =1 - NORM.S.DIST(z, TRUE)
Between                                  =NORM.S.DIST(z2, TRUE) - NORM.S.DIST(z1, TRUE)

Cut-off point: NORM.INV(probability, mean, stdev)
Top x% (cut-off raw score)               =NORM.INV(1 - x, mean, stdev)
Bottom x%                                =NORM.INV(x, mean, stdev)
If no more than x%, use =NORM.INV(x, mean, stdev).
Z-score: NORM.S.INV(probability); use only when the probability is known and the z-score is required.
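> Equivalent standard-normal and inverse lookups in Python (a sketch only; the probability and the mean/stdev in the last line are illustrative assumptions):

    from scipy.stats import norm

    p_less = norm.cdf(1.96)                      # NORM.S.DIST(1.96, TRUE)
    z_from_p = norm.ppf(0.975)                   # NORM.S.INV(0.975)
    top_10_cutoff = norm.ppf(1 - 0.10, 70, 5)    # NORM.INV(1 - x, mean, stdev), assumed mean 70, stdev 5
    print(p_less, z_from_p, top_10_cutoff)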
EXPONENTIAL DISTRIBUTION
> smaller value of lambda → flatter the curve.
> mean = stdev = 1/λ (λ is the rate constant); lambda = 1/mean.
<=, less than               =EXPON.DIST(x, lambda, TRUE)
Between                     =EXPON.DIST(b, lambda, TRUE) - EXPON.DIST(a, lambda, TRUE)
>=, greater than, at least  =1 - EXPON.DIST(x, lambda, TRUE)
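> A minimal Python sketch of the same EXPON.DIST calls (scipy assumed; the rate lambda is chosen arbitrarily; scipy parameterises by scale = 1/lambda):

    from scipy.stats import expon

    lam = 0.5                                            # assumed rate; mean = stdev = 1/lam = 2
    p_le_3 = expon.cdf(3, scale=1/lam)                   # EXPON.DIST(3, lambda, TRUE)
    p_between = expon.cdf(4, scale=1/lam) - expon.cdf(1, scale=1/lam)
    p_at_least_3 = 1 - expon.cdf(3, scale=1/lam)
    print(p_le_3, p_between, p_at_least_3)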
CONFIDENCE INTERVAL
Statistical inference: 2 types → (1) Estimation and (2) Hypothesis Testing.
> Estimation: determine the approximate value of a population parameter on the basis of sample statistics — two types: (1) Point and (2) Interval estimator.
> confidence interval: e.g. 95% | alpha = 100 − CL | alpha/2 = (100 − CL)/2; (100 − 95)/2 = 2.5%.
> confidence interval = sample mean ± margin of error | from a given interval: sample mean = (upper limit + lower limit)/2 and margin of error = (upper limit − lower limit)/2.
> probability for the z-score would be CL + alpha/2 = 97.5% in this case; use NORM.S.INV to get the z-score.
> interval width: a wide interval provides little information.
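> A sketch of the 95% CI recipe above in Python (scipy assumed; the sample mean, population stdev and n are made-up values):

    from math import sqrt
    from scipy.stats import norm

    xbar, sigma, n, cl = 52.0, 6.0, 36, 0.95     # assumed sample mean, population stdev, n, confidence level
    z = norm.ppf(cl + (1 - cl) / 2)              # 0.975 quantile for a 95% CI
    me = z * sigma / sqrt(n)                     # margin of error
    print(xbar - me, xbar + me)                  # lower and upper confidence limits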
Formulas
Item                     Population Mean                                 Population Proportion
Margin of error (ME)     z* × (σ/√n); use t* × (s/√n) with df = n − 1    z* × √(p̂(1 − p̂)/n)
                         when σ is unknown (t-test)                      required conditions: np >= 10, n(1 − p) >= 10
Lower confidence limit   x̄ − z*σx̄ (i.e. x̄ − ME)                          p̂ − z*σp̂
Upper confidence limit   x̄ + z*σx̄ (i.e. x̄ + ME)                          p̂ + z*σp̂
Sample size              n = (z* × σ / ME)²                              n = p̂(1 − p̂) × (z* / ME)²
p̂ = % of successes in the sample.

Margin of Error Theory: Sample Size, Confidence Interval, Confidence Level
A larger sample size narrows the confidence interval, increasing precision, while a higher confidence level widens the interval to ensure greater certainty that the interval contains the true parameter.
Relation: a larger sample size narrows the CI; a higher confidence level widens the CI (if the sample size is fixed).
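> Sample-size calculations corresponding to the formulas above, as a Python sketch (the margin of error, stdev and prior p̂ are assumed values; results are rounded up):

    from math import ceil
    from scipy.stats import norm

    z = norm.ppf(0.975)                            # 95% confidence
    n_mean = ceil((z * 10 / 2) ** 2)               # population mean: (z * sigma / ME)^2, assumed sigma = 10, ME = 2
    n_prop = ceil(0.5 * 0.5 * (z / 0.03) ** 2)     # proportion: p_hat(1 - p_hat)(z / ME)^2, assumed p_hat = 0.5, ME = 0.03
    print(n_mean, n_prop)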
T DISTRIBUTION
> n − 1 = degrees of freedom; the degrees of freedom define the t distribution, indicating how much data falls in the tails compared to a normal distribution. As the degrees of freedom increase, the t distribution approaches the standard normal distribution.
> T.DIST → left tailed: T.DIST(x, degrees of freedom, TRUE/FALSE); x is the t-value; gives P(t <= x).
> T.DIST.2T → two tailed: T.DIST.2T(x, degrees of freedom); extreme values at both tail ends.
> T.DIST.RT → right tailed: T.DIST.RT(x, degrees of freedom); complement of the left tail (greater than the specified value); P(t >= x).
Critical T value:
> T.INV → left tailed; finds the value of t when the probability is given: T.INV(probability, degrees of freedom).
> T.INV.2T is used for two-tailed t-value extraction: T.INV.2T(probability, degrees of freedom).
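> The four T functions above, expressed with scipy's t distribution (a sketch; x and the degrees of freedom are made-up values):

    from scipy.stats import t

    x, df = 2.1, 14                      # assumed t value and degrees of freedom
    left = t.cdf(x, df)                  # T.DIST(x, df, TRUE)
    right = 1 - t.cdf(x, df)             # T.DIST.RT(x, df)
    two_tail = 2 * t.sf(abs(x), df)      # T.DIST.2T(x, df)
    t_crit = t.ppf(0.95, df)             # T.INV(0.95, df)
    print(left, right, two_tail, t_crit)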
HYPOTHESIS TESTING
> null hypothesis H0: μ = μ0.
> alternative hypothesis → statement opposite to the null hypothesis ( >, <, <> ).
> A small p-value (< 0.05): reject H0. A large p-value (>= 0.05): do not reject H0.
> The p-value measures the strength of evidence against H0.

STEPS
1. State the hypotheses and define the type of test (one-tailed or two-tailed).
2. Draw the rejection-region graph(s).
3. Determine the critical value → z-score of the significance level.
   Alpha Level        Tail(s)   z-score calculation   Result
   0.05 (one-tailed)  Left      NORM.S.INV(0.05)      -1.6449
4. Calculate the test statistic (Zdata or Tdata):
   Situation                                           Formula                            Reasoning
   Population mean (μ) / sample mean (x̄)               Zdata = (x̄ − μ) / (σ/√n)           Population stdev is known. Assume data is normally distributed or n >= 30.
   Population mean (μ) / sample mean (x̄)               Tdata = (x̄ − μ) / (s/√n)           Population stdev is unknown & n < 30 (the sample-size condition is disregarded in actual application; just remember it for MCQ). T dist, df = n − 1.
   Population proportion (p) / sample proportion (p̂)   zdata = (p̂ − p) / √(p(1 − p)/n)    Population proportion is known. Assume np >= 5 and n(1 − p) >= 5.
5. Calculate the p-value:
   Test         Through z-score                             Through t-test
   Two tailed   2 × (1 − NORM.S.DIST(|z|, TRUE))            T.DIST.2T(|t|, degrees of freedom)
   One tailed   NORM.S.DIST(z, TRUE) (less than) or         T.DIST(t, df, TRUE) (less than) or
                1 − NORM.S.DIST(z, TRUE) (greater than)     1 − T.DIST(t, df, TRUE) (greater than)
6. Conclude whether to reject or fail to reject the null hypothesis:
   Test Type      Alternative Hypothesis (H1)   Reject the null hypothesis when:
   Left-tailed    H1: μ < μ0                    zdata ≤ zcritical | tdata ≤ tcritical
   Right-tailed   H1: μ > μ0                    zdata ≥ zcritical | tdata ≥ tcritical
   Two-tailed     H1: μ ≠ μ0                    |zdata| ≥ |zcritical| (same for t)
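> Steps 3–5 as a short Python sketch (illustrative numbers only; scipy assumed; σ is treated as unknown, so the t statistic is used):

    from math import sqrt
    from scipy.stats import norm, t

    xbar, mu0, s, n = 103.2, 100.0, 9.5, 25       # assumed sample mean, H0 mean, sample stdev, n
    tdata = (xbar - mu0) / (s / sqrt(n))           # step 4: Tdata, since sigma is unknown
    p_two_tailed = 2 * t.sf(abs(tdata), n - 1)     # step 5: two-tailed p-value through the t distribution
    z_crit = norm.ppf(0.05)                        # step 3: left-tailed critical value for alpha = 0.05
    print(tdata, p_two_tailed, z_crit)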
TYPE OF ERRORS
> Type I Error (False Positive): rejecting a true null hypothesis; probability = α.
> Type II Error (False Negative): failing to reject a false null hypothesis; probability = β.
Inverse relationship:
  Lower α (reduce Type I) → higher β (increase Type II).
  Higher α (increase Type I) → lower β (reduce Type II).
Mitigation: increasing the sample size reduces both errors.

Probability of a Type II Error
> relationship between Type I and Type II: use it to calculate and deduce.
> steps:
  (1) Lay down the hypotheses.
  (2) Calculate the critical value = μ0 + (zα × σ/√n).
  (3) Type II error calculation → z-score = (critical value − true mean) / standard error, where the standard error is population stdev / √n; the corresponding p-value through NORM.S.DIST is the probability of a Type II error.
> Cut-off weight = μ0 + (t-critical × s_x̄) when s is used instead of σ.
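> The Type II error steps as a Python sketch (all numbers are assumed; a right-tailed test is used to match the critical-value formula above):

    from math import sqrt
    from scipy.stats import norm

    mu0, sigma, n, alpha = 170.0, 12.0, 36, 0.05          # assumed H0 mean, population stdev, n, alpha
    mu_true = 174.0                                        # assumed true mean
    crit = mu0 + norm.ppf(1 - alpha) * sigma / sqrt(n)     # critical value = mu0 + z_alpha * sigma/sqrt(n)
    z = (crit - mu_true) / (sigma / sqrt(n))               # (critical value - true mean) / standard error
    beta = norm.cdf(z)                                     # probability of a Type II error
    print(crit, beta)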
Regression and ANOVA

REGRESSION
Item             Actual meaning                                      Formulaic relation
Multiple R       Coefficient of Correlation                          R = √R²
R Square         Coefficient of Determination (variation in the      R² = SSR/SST, or 1 − SSE/SST
                 dependent variable explained by variation in        (explained variation / total variation)
                 the independent variable)
Adjusted R       CoD adjusted for DF (R² adjusted for the            1 − [(1 − R²)(n − 1) / (n − k − 1)]
Square           number of predictors)
Standard Error   Average distance that the observed values           sε = √(SSE / (n − k − 1))
                 fall from the regression line
Observations     Number of observations                              n
k: the number of predictors (independent variables) in the regression model; k is 1 in simple linear regression.

> In general, the higher the value of R², the better the model fits the data.
  R² = 1: perfect match between the line and the data points.
  R² = 0: there is no linear relationship between x and y.
> Use R to determine the quality of the regression and R² to determine the percentage of variation explained.
> The variance of the error is supposed to be constant; violation of this is called heteroscedasticity.
> ε (Greek letter epsilon) → random term (error variable): the difference between the actual and estimated values. If the standard error sε is small, the fit is good; if sε is large, the model is poor (compare it with the sample mean of the dependent variable).

Four conditions
For these regression methods to be valid, the following four conditions for the error variable ε must be met:
• The probability distribution of ε is normal.
• The mean of the distribution is 0; that is, E(ε) = 0.
• The standard deviation of ε is σε, which is a constant regardless of the value of x.
• The value of ε associated with any particular value of y is independent of ε associated with any other value of y.
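> A small Python sketch that reproduces R², adjusted R² and the standard error from SSE and SST (toy data, not taken from any real example):

    import numpy as np

    # assumed toy data for a simple linear regression (k = 1 predictor)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1])
    n, k = len(y), 1

    b1, b0 = np.polyfit(x, y, 1)               # slope and intercept
    y_hat = b0 + b1 * x
    sse = np.sum((y - y_hat) ** 2)              # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)           # total variation
    r2 = 1 - sse / sst                          # coefficient of determination
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    s_e = np.sqrt(sse / (n - k - 1))            # standard error of the estimate
    print(r2, adj_r2, s_e)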
Additions after revisions
ANOVA
> Null hypothesis (H0): all group means are equal. | Alternative hypothesis (H1): they differ (at least 2 means differ — could be more, but the difference is the focus).
> Use the F-distribution to test the hypothesis. The total area under the F curve = 1.
> F is never negative; it starts at 0 and extends indefinitely to the right but never meets the horizontal axis. It is right skewed. There is a different F curve for each pair of degrees of freedom (df).

Source of variation   Sum of squares   Degrees of freedom   Mean square           F-test statistic
Treatment             SSTR             df1 = k − 1          MSTR = SSTR/(k − 1)   Fdata = MSTR/MSE
Error                 SSE              df2 = nt − k         MSE = SSE/(nt − k)
Total                 SST

> Mean Square Treatment (MSTR or MST): measures the variability in the sample means.
> Mean Square Error (MSE): measures the variability within the samples.
> Fdata = MSTR/MSE. Fdata measures the variability among the sample means compared to the variability within the samples.
> Fdata (MST compared to MSE) follows an F distribution with df1 = k − 1 and df2 = nt − k (k: number of groups or treatments being compared | nt: total number of observations across all groups).
> Compare Fdata and Fcritical to decide whether to reject the hypothesis. The p-value can also be used to draw the same conclusion.
To calculate:
  p-value:    =F.DIST.RT(fstat, df-treatment, df-error)
  Fcritical:  =F.INV.RT(significance alpha, df-treatment, df-error)
> Randomized Block Analysis of Variance: the purpose of designing a randomized block experiment is to reduce the within-treatments variation, to more easily detect differences between the treatment means.
> Blocking reduces variation in your results: it removes the effects of some outside variables by bringing those variables into the experiment to form the blocks. Separate conclusions can be made for each block, making for more precise conclusions.
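> One-way ANOVA computed by hand in Python, mirroring the table above (toy data with k = 3 assumed groups; scipy assumed):

    import numpy as np
    from scipy.stats import f

    groups = [np.array([23.0, 25.0, 27.0, 24.0]),
              np.array([30.0, 28.0, 29.0, 31.0]),
              np.array([22.0, 21.0, 24.0, 23.0])]
    k = len(groups)
    nt = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()

    sstr = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # treatment sum of squares
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)              # error sum of squares
    mstr, mse = sstr / (k - 1), sse / (nt - k)
    f_data = mstr / mse
    p_value = f.sf(f_data, k - 1, nt - k)        # like F.DIST.RT(fstat, df-treatment, df-error)
    f_crit = f.isf(0.05, k - 1, nt - k)          # like F.INV.RT(alpha, df-treatment, df-error)
    print(f_data, p_value, f_crit)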