FlowChart V20

The document outlines various data types including categorical (nominal and ordinal) and numerical (discrete and continuous), along with their definitions and examples. It also covers statistical concepts such as population, sample, parameter, statistic, and different sampling methods. Additionally, it discusses hypothesis testing steps and confidence intervals, providing formulas and calculations for various statistical measures.

Uploaded by

Wilson

Data Types

Categorical: Data that is grouped by categories.
Nominal: Categorical data that has no particular order.
Ordinal: Categorical data that has a logical order. Example: Likert scale.
Numerical: Data that is grouped by numbers.
Discrete: Data is grouped in fixed steps. Not possible to have an answer between steps. Example: number of children in a family.
Continuous: Data may have any value along a given scale. Example: weight, time, speed.
Interval: Numerical data with an arbitrary 0.
Ratio: Numerical data with a real 0.

Parameters and Statistics

Definition | Population Parameter | Pronunciation | Sample Statistic | Pronunciation
Mean | μ | mu | x̄ | x bar
Standard Deviation | σ | sigma | s | s
Variance | σ² | sigma squared | s² | s squared
Proportion | π | pi | p | p
Slope | β1 | beta one | b1 | b one
Coefficient of Correlation | ρ | rho | r | r

Inequalities

= (equal to): is
≠ (not equal to): different, not
≤ (less than or equal to): at most, at a maximum, no more than, value or less
≥ (greater than or equal to): at least, at a minimum, no less than, value or more
< (less than): under, fewer than, smaller
> (greater than): over, more than, larger, exceeds

Basic Vocab

Population: Entire group of interest.
Parameter: Some truth about the population.
Sample: Subset of population. A good sample is representative of the population.
Statistic: Some truth about the sample.
Variable: What is measured or observed.
Data: List of results from the sample.
Census: A sample where the entire population is measured or observed. Very rare.
Descriptive Statistic: Some result/truth from the sample.
Inferential Statistic: Results/truth from the sample that are used to make a conclusion about the entire population.

Purpose of Question

Purpose of Question: Reveal something previously unknown.
Observational Experiment: Recording something observed. There is no intervention here. Correlations may be found, but this does not establish cause and effect.
Experimental Experiment: Some treatment is given to a group. Results are observed and recorded. This is how cause and effect can be established.

Sampling Methods

Good Sample: Sample well represents the population.
Simple Random Sample: Every member of the population is equally likely to be observed. A subset of them is randomly selected.
Systematic Sample: Every member of the population is lined up and every kth member is chosen.
Cluster Sample: Population is broken up into groups. The groups are randomly selected. Every member in the selected groups is measured.
Stratified Sample: Some demographic is selected and the sample matches the ratio of the selected demographic.
Convenience Sample: Taking a sample just where it is easy to gather data.

Good Measurement

Good Measurement: Directly answers the question and is free of bias.
Survivorship Bias: Only measuring members who make it to the end of a study.
Recall Bias: Relying upon people's memory.
Funding Bias: Who is funding the study? Is it an impartial 3rd party or do they have an interest in making the study have a particular result?
Cause and Effect Bias: Correlation being misidentified as causation.
Selection Bias: People get the opportunity to opt in or out of the study.
Confirmation Bias: Only collecting data to support a conclusion.

Formula Syntax | Description | Example Usage
=SUM(range) | Adds all the numbers in a range. | =SUM(A1:A10)
=PRODUCT(range) | Multiplies all numbers in a range. | =PRODUCT(A1:A10)
=COUNTA(range) | Counts the number of non-empty cells in a range. | =COUNTA(A1:A10)
=COUNTIF(range, criteria) | Counts the number of cells that meet a specified condition. | =COUNTIF(A1:A10, ">10")
=ABS(number) | Returns the absolute value of a number. | =ABS(-5)
=ROUND(number, digits) | Rounds a number to a specified number of digits. | =ROUND(3.14159, 2)
=CEILING.MATH(number, significance) | Rounds a number up to the nearest multiple of a specified value. | =CEILING.MATH(7.3, 1)
=FLOOR.MATH(number, significance) | Rounds a number down to the nearest multiple of a specified value. | =FLOOR.MATH(7.3, 1)
=AVERAGE(range) | Calculates the average (mean) of numbers in a range. | =AVERAGE(A1:A10)
=MEDIAN(range) | Returns the median (middle value) of a range. | =MEDIAN(A1:A10)
=MODE.SNGL(range) | Returns the most frequently occurring value in a range (=MODE.MULT(range) returns all modes when there are ties). | =MODE.SNGL(A1:A10)
=STDEV.S(range) | Returns the sample standard deviation. | =STDEV.S(A1:A10)
=VAR.S(range) | Returns the sample variance of a dataset. | =VAR.S(A1:A10)
=MIN(range) | Finds the smallest value in a range. | =MIN(A1:A10)
=MAX(range) | Finds the largest value in a range. | =MAX(A1:A10)
=SKEW(range) | Finds the skewness of the data (- is left skew, + is right skew). | =SKEW(A1:A10)
=KURT(range) | Finds the excess kurtosis of the data (Mesokurtic ≈ 0, Leptokurtic > 0, Platykurtic < 0). | =KURT(A1:A10)
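For readers working outside a spreadsheet, several of the formulas in the table above have close counterparts in Python's standard library. The sketch below is only an illustration: the list `data` is a hypothetical sample standing in for a range such as A1:A10, and `=ROUND`, `=SKEW`, and `=KURT` are left out because the standard library has no exact equivalent.

```python
import math
import statistics

# Hypothetical sample standing in for a spreadsheet range such as A1:A10.
data = [2, 4, 4, 5, 7, 9, 11]

total = sum(data)                              # =SUM(range)
product = math.prod(data)                      # =PRODUCT(range)
count = len(data)                              # =COUNTA(range)
count_over_5 = sum(1 for v in data if v > 5)   # =COUNTIF(range, ">5")
mean = statistics.mean(data)                   # =AVERAGE(range)
median = statistics.median(data)               # =MEDIAN(range)
mode = statistics.mode(data)                   # =MODE.SNGL(range)
sample_sd = statistics.stdev(data)             # =STDEV.S(range)
sample_var = statistics.variance(data)         # =VAR.S(range)
smallest, largest = min(data), max(data)       # =MIN(range), =MAX(range)
rounded_up = math.ceil(7.3)                    # =CEILING.MATH(7.3, 1)
rounded_down = math.floor(7.3)                 # =FLOOR.MATH(7.3, 1)

print(total, mean, median, mode, sample_var)
```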
Prob and Quantile: Means

CLT Rules
Data: Numerical: Means
Sampling distribution of x̄ is approximately normal if either
1. the original dist. of x is normal
OR
2. n ≥ 30

Key Variables | Calculate
μ = pop. mean | Given
μx̄ = mean of sampling dist wrt x̄ | = μ
n = sample size | Given or =COUNTA(range)
σ = pop. std. dev. | Given
σx̄ = SE = std. error OR std. dev of sampling dist wrt x̄ | = σ/sqrt(n)
x̄ = sample mean | Given or =AVERAGE(range)
z = Z score (# of SE from mean) | =STANDARDIZE(x̄, μ, SE)

Probability (Area under curve)
Less than: Area left: P(x̄ < x̄crit) = ? | =norm.dist(x̄crit,μ,SE,TRUE)
Greater than: Area right: P(x̄ > x̄crit) = ? | =1 - norm.dist(x̄crit,μ,SE,TRUE)
Between: Area between: P(x̄lower < x̄ < x̄upper) = ? | =norm.dist(x̄upper,μ,SE,TRUE) - norm.dist(x̄lower,μ,SE,TRUE)

Quantiles (Crit. sample mean)
Cutoff for Bottom %: P(x̄ < ?) = prob | =norm.inv(prob,μ,SE)
Cutoff for Top %: P(x̄ > ?) = prob | =norm.inv(1-prob,μ,SE)
Cutoffs for Middle %: P(? < x̄ < ?) = prob | x̄lower =norm.inv(0.5-prob/2,μ,SE) | x̄upper =norm.inv(0.5+prob/2,μ,SE)

Z-Manipulation
Z score: Z = (x̄ - μ) / SE
Solve for x̄: x̄ = Z*SE + μ

Legend (color-coded on the original slide): Blue: what you provide to the function. Red: what the function calculates.
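The probability and quantile formulas for sample means can be tried out in Python with `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of NORM.DIST(..., TRUE) and NORM.INV. This is a minimal sketch under assumed numbers: the helper name `sampling_dist_of_mean` and the values μ = 100, σ = 15, n = 36 are hypothetical and do not come from the sheet.

```python
from math import sqrt
from statistics import NormalDist


def sampling_dist_of_mean(mu, sigma, n):
    """Normal model for the sample mean: center mu, standard error sigma/sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))


# Hypothetical scenario: mu = 100, sigma = 15, n = 36, so SE = 15 / 6 = 2.5.
dist = sampling_dist_of_mean(mu=100, sigma=15, n=36)

p_less = dist.cdf(98)                      # =NORM.DIST(98, mu, SE, TRUE)
p_greater = 1 - dist.cdf(103)              # =1 - NORM.DIST(103, mu, SE, TRUE)
p_between = dist.cdf(103) - dist.cdf(98)   # area between 98 and 103

bottom_10 = dist.inv_cdf(0.10)             # =NORM.INV(0.10, mu, SE)
top_10 = dist.inv_cdf(1 - 0.10)            # =NORM.INV(1 - 0.10, mu, SE)
middle_lo = dist.inv_cdf(0.5 - 0.95 / 2)   # lower cutoff for the middle 95%
middle_hi = dist.inv_cdf(0.5 + 0.95 / 2)   # upper cutoff for the middle 95%

z = (98 - dist.mean) / dist.stdev          # =STANDARDIZE(98, mu, SE)

print(round(p_less, 4), round(middle_lo, 2), round(middle_hi, 2), round(z, 2))
```

The three areas (left of 98, between 98 and 103, right of 103) add up to 1, which is a quick sanity check on any set of cutoffs.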
Prob and Quantile: Proportions

CLT Rules
Data: Categorical: Proportions
Sampling distribution of p is approximately normal if BOTH
n*π ≥ 15 (# of successes)
AND
n*(1-π) ≥ 15 (# of failures)

Key Variables | Calculate
π = pop. proportion | Given
μp = mean of sampling dist wrt p | = π
n = sample size | Given or =COUNTA(range)
σp = SE = std. error OR std. dev of sampling dist wrt p | = sqrt(π*(1-π)/n)
k = # with the trait of interest | Given or =COUNTIF(range,criteria)
p = sample proportion | Given or =k/n
Z = Z score (# of SE from mean) | =STANDARDIZE(p, π, SE)

Probability (Area under curve)
Less than: Area left: P(p < pcrit) = ? | =norm.dist(pcrit,π,SE,TRUE)
Greater than: Area right: P(p > pcrit) = ? | =1 - norm.dist(pcrit,π,SE,TRUE)
Between: Area between: P(plower < p < pupper) = ? | =norm.dist(pupper,π,SE,TRUE) - norm.dist(plower,π,SE,TRUE)

Quantiles (Crit. sample prop.)
Cutoff for Bottom %: P(p < ?) = prob | =norm.inv(prob,π,SE)
Cutoff for Top %: P(p > ?) = prob | =norm.inv(1-prob,π,SE)
Cutoffs for Middle %: P(? < p < ?) = prob | plower =norm.inv(0.5-prob/2,π,SE) | pupper =norm.inv(0.5+prob/2,π,SE)

Z-Manipulation
Z score: Z = (p - π) / SE
Solve for p: p = Z*SE + π
Solve for k: k = p * n [typically round up]
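The proportion formulas follow the same pattern. The sketch below is an illustration rather than part of the sheet: the helper `sampling_dist_of_proportion` and the values π = 0.40, n = 150 are hypothetical, and the check uses 15 expected successes and failures as the threshold, as on the sheet.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sampling_dist_of_proportion(pi, n):
    """Normal model for the sample proportion, after the successes/failures check."""
    if n * pi < 15 or n * (1 - pi) < 15:
        raise ValueError("need at least 15 expected successes and 15 expected failures")
    return NormalDist(pi, sqrt(pi * (1 - pi) / n))


# Hypothetical scenario: pi = 0.40, n = 150, so SE = sqrt(0.4 * 0.6 / 150) = 0.04.
n = 150
dist = sampling_dist_of_proportion(pi=0.40, n=n)

p_greater = 1 - dist.cdf(0.45)     # =1 - NORM.DIST(0.45, pi, SE, TRUE)
p_top_5 = dist.inv_cdf(1 - 0.05)   # =NORM.INV(1 - 0.05, pi, SE)
k_top_5 = ceil(p_top_5 * n)        # k = p * n, rounded up

print(round(p_greater, 4), round(p_top_5, 4), k_top_5)
```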
Confidence Interval: Means, Pop. Standard Deviation σ Known

Key Variables | Calculate
CL = conf. level | Given or 1 - α
α = CI miss rate | Given or 1 - CL
n = sample size | Given or =COUNTA(range)
x̄ = sample mean | Given or =AVERAGE(range)
σ = pop. st. dev | Given
SE = standard error | = σ/sqrt(n)
z2tail = 2 tailed zcrit | =NORM.S.INV(1 - α/2)
z1tail = 1 tailed zcrit | =NORM.S.INV(1 - α)
MoE = margin of error | = z2tail * SE = CONFIDENCE.NORM(α,σ,n)

2 Tailed: μ ∈ x̄ ± MoE
L: x̄ - MoE
U: x̄ + MoE

Lower Tail: μ ≥ x̄ - z1tail * SE
L: x̄ - z1tail * SE
U: INF

Upper Tail: μ ≤ x̄ + z1tail * SE
L: -INF
U: x̄ + z1tail * SE
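The σ-known interval can be sketched in Python as follows. The function name `z_confidence_interval`, its `tail` argument, and the example numbers (x̄ = 50, σ = 8, n = 64, α = 0.05) are assumptions made for illustration; `NormalDist().inv_cdf` stands in for NORM.S.INV.

```python
from math import inf, sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, alpha, tail="two"):
    """Return (L, U) for the population mean when sigma is known."""
    se = sigma / sqrt(n)
    if tail == "two":
        moe = NormalDist().inv_cdf(1 - alpha / 2) * se   # z2tail * SE
        return (xbar - moe, xbar + moe)
    z1 = NormalDist().inv_cdf(1 - alpha)                 # z1tail
    if tail == "lower":
        return (xbar - z1 * se, inf)                     # mu >= L
    if tail == "upper":
        return (-inf, xbar + z1 * se)                    # mu <= U
    raise ValueError("tail must be 'two', 'lower', or 'upper'")


# Hypothetical sample: xbar = 50, sigma = 8, n = 64, so SE = 1.
two_sided = z_confidence_interval(50, 8, 64, alpha=0.05)
lower_tail = z_confidence_interval(50, 8, 64, alpha=0.05, tail="lower")
upper_tail = z_confidence_interval(50, 8, 64, alpha=0.05, tail="upper")

print(two_sided, lower_tail, upper_tail)
```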

Confidence Interval: Means [Numerical Data], Unknown: μ, Sample Standard Deviation s Known

Key Variables | Calculate
CL = conf. level | Given or 1 - α
α = CI miss rate | Given or 1 - CL
n = sample size | Given or =COUNTA(range)
df = deg. of freedom | =n-1
x̄ = sample mean | Given or =AVERAGE(range)
s = sample st. dev | Given or =STDEV.S(range)
SE = standard error | = s/sqrt(n)
t2tail = 2 tailed tcrit | =T.INV.2T(α, df)
t1tail = 1 tailed tcrit | =T.INV(1 - α, df)
MoE = margin of error | = t2tail * SE

2 Tailed: μ ∈ x̄ ± MoE
L: x̄ - MoE
U: x̄ + MoE

Lower Tail: μ ≥ x̄ - t1tail * SE
L: x̄ - t1tail * SE
U: INF

Upper Tail: μ ≤ x̄ + t1tail * SE
L: -INF
U: x̄ + t1tail * SE

Concepts: Confidence Interval
Perform CI to estimate the range where the unknown μ or π lies.
CL is the % of time the CI captures the pop. parameter.
α is the % of time the CI misses the pop. parameter.
Both mean and proportion CIs hinge on CLT.
As n increases, the width of the CI decreases.
As α increases, the width of the CI decreases.

Statement
2 Tailed: We are (Confidence Level)% confident that the true (population parameter) for (state population) is somewhere between [Lower, Upper] (units).
Lower Tail (1 Tailed >): We are (Confidence Level)% confident that the true (population parameter) for (state population) is greater than [Lower] (units).
Upper Tail (1 Tailed <): We are (Confidence Level)% confident that the true (population parameter) for (state population) is less than [Upper] (units).

Estimate Min Sample Size for CI

Key Variables | Calculate
α = CI miss rate | Given or 1 - CL
z2tail = z(α/2) = 2 tailed crit | =NORM.S.INV(1-α/2)
σ = pop. st. dev | Given
p = sample prop | Given OR use 0.5 for worst case scenario
MoEt = target MoE | Given

Means: n = (z2tail*σ/MoEt)^2
Prop.: n = (z2tail*sqrt(p*(1-p))/MoEt)^2
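The minimum sample size formulas in this section, n = (z2tail*σ/MoEt)^2 for means and n = (z2tail*sqrt(p*(1-p))/MoEt)^2 for proportions, can be checked with a short Python sketch. The function names and the example inputs are hypothetical, and rounding the result up to the next whole subject is an assumption added here (the sheet does not state it).

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_n_for_mean(sigma, target_moe, alpha):
    """n = (z2tail * sigma / MoEt)^2, rounded up to a whole subject."""
    z2tail = NormalDist().inv_cdf(1 - alpha / 2)   # =NORM.S.INV(1 - alpha/2)
    return ceil((z2tail * sigma / target_moe) ** 2)


def min_n_for_proportion(target_moe, alpha, p=0.5):
    """n = (z2tail * sqrt(p*(1-p)) / MoEt)^2; p = 0.5 is the worst case."""
    z2tail = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z2tail * sqrt(p * (1 - p)) / target_moe) ** 2)


# Hypothetical targets: sigma = 15 with MoE 2, and a proportion within 0.03.
n_mean = min_n_for_mean(sigma=15, target_moe=2, alpha=0.05)
n_prop = min_n_for_proportion(target_moe=0.03, alpha=0.05)

print(n_mean, n_prop)
```

Using p = 0.5 maximizes p*(1-p), so any other assumed proportion gives a sample size that is no larger.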
1 Sample Hypothesis Testing

Hypothesis Testing Steps
1. Identify Data Type
2. Determine Population and Parameter
3. State Hypothesis
4. State α
5. Determine Testing Method
6. Check Assumptions
7. Design Experiment and Collect Data
8. Calculate Test Statistics and pvalue
9. Reject/Fail to Reject Ho
10. Make Conclusion

1. Identify Data Type
Decision: Is there a business decision we are trying to make?
Ask: What question do we ask subjects and how do they respond?

Data Type | Want to know | Symbol
Numerical | True Mean | μ
Categorical | True Proportion | π

2. Determine Population and Parameter
Population: Who we are trying to make a statement about.
Parameter: Specific μ or π of interest for the group.
3. State Hypothesis
Means Matched Pairs Proportions
Null Hypothesis Ho : μ (≤,≥,=) μ0 Ho : μd (≤,≥,=) μd0 Ho : π (≤,≥,=) π0
Alt. Hypothesis H1 : μ (<,>,≠) μ0 H1 : μd (<,>,≠) μd0 H1 : π (<,>,≠) π0
Variables μ: true pop mean μd: true pop mean difference π: true pop proportion
μ0: hypothesized pop mean μd0: theorized pop mean difference π0: theorized pop proportion
Inequality (<,>,≠) in alt hypothesis is established in the scenario. The null hypothesis is always opposite of this symbol.
Two tailed test if alternative hypothesis is ≠. One tailed test if alternative hypothesis is either > or <.
Null Hypothesis The baseline assumption about the population, assumed to be true.
Alt. Hypothesis An alternative scenario that we test with sample data in contrast to the null. This is typically what we want to see.
4. Establish α
Directly stated OR α = 1 - CL

Decision | The H0 is Correct | The H0 is Incorrect
Fail to Reject H0 | Correct decision | Type 2 Error: β
Reject H0 | Type 1 Error: α | Correct decision

Symbol | Definition | Layman's
α | Percent of the time that we incorrectly reject the null hypothesis. | The null hypothesis is correct, but the data that we collect suggest that it is incorrect.
β | Percent of the time that we incorrectly fail to reject the null hypothesis. | The null hypothesis is incorrect, but the data that we collect do not suggest that the null is incorrect.

5. Determine Testing Method
6. Check Assumptions

Check | Means | Matched Pairs | Proportions
CLT: Orig Dist Normal | Check to see if scenario states assumed normality OR… | Check to see if scenario states assumed normality OR… | Orig. distribution NEVER normal.
CLT: Sample Size | n ≥ 30 | nd ≥ 30 | n * π0 ≥ 15 & n * (1 - π0) ≥ 15
Good Sample? | Is the sample representative of the population? (applies to all three)

7. Design Experiment and Gather Data
Typically this is done for us in this class. We will do a few projects where we collect data.

8. Calculate Test Statistic, pvalue, and zcrit OR tcrit
Means (know σ)
Test Statistic: z = zscore = (x̄ - μ0) / SE
Variables: x̄ = sample mean; μ0 = hyp. true mean; σ = pop. stdev; n = sample size; SE = std. error = σ/sqrt(n)
pvalue 2 tailed (≠) = (1-norm.s.dist(abs(z),TRUE)) * 2
pvalue 1 tailed (<) = norm.s.dist(z,TRUE)
pvalue 1 tailed (>) = 1 - norm.s.dist(z,TRUE)
zcrit_2tail =NORM.S.INV(1 - α/2)
zcrit_1tail =NORM.S.INV(1 - α)

Means (know s)
Test Statistic: t = tscore = (x̄ - μ0) / SE
Variables: x̄ = sample mean; μ0 = hyp. true mean; n = sample size; df = deg. of freedom; s = sample stdev; SE = std. error = s/sqrt(n)
pvalue 2 tailed (≠) = t.dist.2t(abs(t),df)
pvalue 1 tailed (<) = t.dist(t,df,TRUE)
pvalue 1 tailed (>) = t.dist.rt(t,df)
tcrit_2tail =T.INV.2T(α, df)
tcrit_1tail =T.INV(1 - α, df)

Matched Pairs
Test Statistic: t = tscore = (x̄d - 0) / SE (the hypothesized mean difference μd0 is taken as 0 here)
Variables: x̄d = sample diff mean; sd = sample diff stdev; nd = sample diff size; df = deg. of freedom; SE = std. error = sd/sqrt(nd)
pvalue 2 tailed (≠) = t.dist.2t(abs(t),df)
pvalue 1 tailed (<) = t.dist(t,df,TRUE)
pvalue 1 tailed (>) = t.dist.rt(t,df)
tcrit_2tail =T.INV.2T(α, df)
tcrit_1tail =T.INV(1 - α, df)

Proportions
Test Statistic: z = zscore = (p - π0) / SE
Variables: k = # with trait of interest; n = sample size; p = sample prop. = k/n; π0 = hyp. true prop.; SE = std. error = sqrt(π0*(1-π0)/n)
pvalue 2 tailed (≠) = (1-norm.s.dist(abs(z),TRUE)) * 2
pvalue 1 tailed (<) = norm.s.dist(z,TRUE)
pvalue 1 tailed (>) = 1 - norm.s.dist(z,TRUE)
zcrit_2tail =NORM.S.INV(1 - α/2)
zcrit_1tail =NORM.S.INV(1 - α)
Vocab
Test Statistic: The number of standard errors the sample statistic is from the hypothesized population parameter.
pvalue: The probability of observing the sample statistic (or something more extreme) if the Null Hypothesis is true.
9. Fail to Reject or Reject the H0
IF pvalue ≥ α: Fail to reject the Null Hypothesis and continue under the baseline assertion.
ALSO: sample statistic outside rejection region; null parameter inside CI; test statistic t/z smaller than critical t/z.
IF pvalue < α: Reject the Null Hypothesis H0 and conclude the Alternative Hypothesis H1.
ALSO: sample statistic inside rejection region; null parameter outside CI; test statistic t/z larger than critical t/z.
10. Conclusion (and CI if necessary)

Fail to reject
Conclusion: We collected insufficient evidence (test statistic, pvalue, α) to reject the claim that (state H0 in words). We will continue under the assumption that the H0 is correct.
Confidence Interval Statement: Not Required: Since we failed to reject the null hypothesis, the CI would contain the Null population parameter.

Reject
Conclusion: We collected sufficient evidence (test statistic, pvalue, α) to reject the claim that (state H0 in words) and instead we conclude (state H1 in words).
Confidence Interval Statement:
2 Tailed (≠) (2 Tailed CI): We are (Confidence Level)% confident that the true (population parameter) for (state population) is somewhere between [Lower, Upper] (units).
1 Tailed (<) (Upper Tail CI): We are (Confidence Level)% confident that the true (population parameter) for (state population) is less than [Upper] (units).
1 Tailed (>) (Lower Tail CI): We are (Confidence Level)% confident that the true (population parameter) for (state population) is greater than [Lower] (units).
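Steps 8 and 9 for a one-sample proportion test can be sketched in Python as shown below. This is an illustration only: the function names, the `alternative` codes, and the scenario (66 successes out of 100, π0 = 0.5, α = 0.05) are hypothetical. Only the z-based tests are shown because the standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist


def z_p_value(z, alternative):
    """p-value for a z statistic; alternative is '!=', '<', or '>'."""
    cdf = NormalDist().cdf                 # plays the role of NORM.S.DIST(z, TRUE)
    if alternative == "!=":
        return 2 * (1 - cdf(abs(z)))
    if alternative == "<":
        return cdf(z)
    if alternative == ">":
        return 1 - cdf(z)
    raise ValueError("alternative must be '!=', '<', or '>'")


def one_sample_proportion_test(k, n, pi0, alpha, alternative):
    """Return (z, pvalue, reject_h0) using SE = sqrt(pi0 * (1 - pi0) / n)."""
    p = k / n
    se = sqrt(pi0 * (1 - pi0) / n)
    z = (p - pi0) / se
    p_value = z_p_value(z, alternative)
    return z, p_value, p_value < alpha     # reject H0 when pvalue < alpha


# Hypothetical scenario: H0: pi <= 0.5, H1: pi > 0.5, 66 of 100 have the trait.
z, p_value, reject = one_sample_proportion_test(66, 100, 0.5, 0.05, ">")
print(round(z, 2), round(p_value, 5), reject)
```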
2 Sample Hypothesis Testing

Hypothesis Testing Steps
1. Identify Data Type
2. Determine Population and Parameter
3. State Hypothesis
4. State α
5. Determine Testing Method
6. Check Assumptions
7. Design Experiment and Collect Data
8. Calculate Test Statistics and pvalue
9. Reject/Fail to Reject Ho
10. Make Conclusion

1. Identify Data Type
Ask: What question do we ask subjects and how do they respond?

Data Type | Want to know | Symbol
Numerical | Difference of True Means | μ1 - μ2
Categorical | Difference of True Proportion | π1 - π2

2. Determine Population and Parameter
Populations: The 2 groups of interest (g1 and g2).
Parameter: True mean (μ) or true proportion (π) of interest to compare between g1 and g2.
3. State Hypothesis
Means Proportions
Null Hypothesis Ho : μ1 - μ 2 = 0 OR μ1 = μ2 Ho : π1 - π 2 = 0 OR π1 = π2
Alt. Hypothesis H1 : μ1 - μ2 (<,>,≠) 0 OR μ1 (<,>,≠) μ2 H1 : π1 - π2 (<,>,≠) 0 OR π1 (<,>,≠) π2
Variables μ1: true pop mean of group 1 π1: true pop proportion of group 1
μ2: true pop mean of group 2 π2: true pop proportion of group 2
Inequality (<,>,≠) in alt hypothesis is established in the scenario.
Two tailed test if alternative hypothesis is ≠. One tailed test if alternative hypothesis is either > or <.
Null Hypothesis The baseline assumption about the population, assumed to be true. Typically assumed the groups have equal true parameters.
Alt. Hypothesis An alternative scenario that we test with sample data in contrast to the null. This is typically what we want to see.

4. Establish α
Directly stated OR α = 1 - CL

Decision | The H0 is Correct | The H0 is Incorrect
Fail to Reject H0 | Correct decision | Type 2 Error: β
Reject H0 | Type 1 Error: α | Correct decision

5. Determine Testing Method
Check for Equal Variance: Levene's Hypothesis F test for Equal Variance
H0: The two samples have equal variances
H1: The two samples have unequal variances
F and pvalue: software calculation
If pvalue < α: Reject: Use unequal variances t test
If pvalue ≥ α: Fail to Reject: Use equal variances t test

6. Check Assumptions

Check (Must pass for both g1 and g2) | Means | Proportions
CLT: Original Dist. Normal | g1 & g2 both stated to assume normality | Original distributions are never normal.
CLT: Sample Size | g1: n1 ≥ 30; g2: n2 ≥ 30 | g1: n1 * π1 ≥ 15 & n1 * (1 - π1) ≥ 15; g2: n2 * π2 ≥ 15 & n2 * (1 - π2) ≥ 15
Good Sample? | Is the sample representative of the population? (applies to both)

7. Design Experiment and Gather Data
How to Establish Cause and Effect
Treatment comes before the effect.
Found significant results (Reject H0, pvalue < α).
Utilized a true experiment. Eliminates other explanations.
8. Calculate Test Statistic and p value

Means (Equal Var)
Test Statistic: t statistic; report: (t(df)= , pvalue= , α=)
Key Values: x̄1 - x̄2 = sample mean diff; x̄1 & x̄2 sample means; n1 & n2 sample sizes; s1 & s2 sample stdev.

Means (Unequal Variance)
Test Statistic: t statistic; report: (t(df)= , pvalue= , α=)
Key Values: x̄1 - x̄2 = sample mean diff; x̄1 & x̄2 sample means; n1 & n2 sample sizes; s1 & s2 sample stdev.

Proportions
Test Statistic: Z statistic; report: (Z= , pvalue= , α=)
Key Values: p1 - p2 = sample prop. diff; p1 & p2 sample proportions; n1 & n2 sample sizes

Vocab
Test Stat.: The number of standard errors the sample stat. difference is from the H0.
pvalue: The prob. of observing the sample stat. difference (or more extreme) if the H0 is true.

9. Fail to Reject or Reject the H0
IF pvalue ≥ α: Fail to reject the Null Hypothesis and continue under the assumption that the H0 is correct.
IF pvalue < α: Reject the Null Hypothesis H0 and conclude the Alternative Hypothesis H1.

10. Conclusion (and CI if necessary)

Fail to reject
Conclusion: We collected insufficient evidence (test statistic, pvalue, α) to reject the claim that (state H0 in words). We will continue under the assumption that the H0 is correct.
Confidence Interval Statement: Not Required: Since we failed to reject the null hypothesis, the CI would contain the Null population parameter.

Reject
Conclusion: We collected sufficient evidence (test statistic, pvalue, α) to reject the claim that (state H0 in words) and instead we conclude (state H1 in words).
Confidence Interval Statement:
2 Tailed: We are (Confidence Level)% confident that the true difference of (population parameter) between the (state populations) is somewhere between [Lower, Upper] (units) with (state larger group) as larger.
1 Tailed <: We are (Confidence Level)% confident that the true (population parameter) for (state group 1) is at least [Upper] (units) less than that of (state group 2).
1 Tailed >: We are (Confidence Level)% confident that the true (population parameter) for (state group 1) is at least [Lower] (units) greater than that of (state group 2).
Regression Analysis

Hypothesis Testing Steps
1. Identify Data Type
2. Determine Population and Parameter
3. State Hypothesis
4. State α
5. Determine Testing Method
6. Check Assumptions
7. Design Experiment and Collect Data
8. Calculate Test Statistics and pvalue
9. Reject/Fail to Reject Ho
10. Make Conclusion

1. Identify Data Type
Ask: What question do we ask subjects and how do they respond?

Data Type | Want to know | Symbol
Numerical x 2+ | True Relationship(s) or True Slope(s) | β1 (, β2, β3… )

2. Determine Population and Parameter
Population: The group of interest that we are measuring.
Parameter(s): True slope (β1, β2, …) between the predictor(s) (x1, x2,…) and response (y).
Response [y]: This is the variable that we would like to be able to model. (dependent variable)
Predictor(s) [x1, x2,…]: At least 1 variable that we are using to make the prediction. (independent variable(s))
3. State Hypothesis
1 Predictor 2+ Predictors
Null Hypothesis Ho : β1 = 0 No relationship between x and y Ho : β1 = β2 = ... = 0 Predictors do not model response.
Alt. Hypothesis H1 : β1 (<,>,≠) 0 There is a relationship between x and y. H1 : (β1 or β2 or ... ) ≠ 0 At least 1 predictor models response.
Variables β1 : true slope between x and y β1 : true relationship between y and x1
β2 (…): true relationship between y and x2
Inequality (<,>,≠) in alt hypothesis is established in the scenario. Regression is most commonly 2 tailed and with multiple predictors is only 2 tailed.
Two tailed test if alternative hypothesis is ≠. One tailed test if alternative hypothesis is either > or <.
Null Hypothesis The baseline assumption about the population, assumed to be true.
Alt. Hypothesis An alternative scenario that we test with sample data in contrast to the null. This is typically what we want to see.
4. Establish α
Directly stated OR α = 1 - CL

Decision | The H0 is Correct | The H0 is Incorrect
Fail to Reject H0 | Correct decision | Type 2 Error: β
Reject H0 | Type 1 Error: α | Correct decision

5. Determine Testing Method

6. Check Assumptions

Residual Plot Checks | Good | Bad
Homoskedasticity | Constant variance looks like a cloud | Plot shows fanning (i.e. non constant variance)
Independence | No clustering of observations | Data is clustered with large empty spots
Linear | There is no pattern in the residuals | Visual pattern in the residual plots.
Centered | Residuals centered on 0 | Not centered on 0

QQ Plot Check | Good | Bad
Normality | Points lie on QQ plot line without major deviation. | Points deviate substantially from QQ plot line.

7. Design Experiment and Gather Data
How to Establish Cause and Effect
Treatment comes before the effect.
Found significant results (Reject H0, pvalue < α).
Utilized a true experiment. Eliminates other explanations.

8. Calculate Test Statistic and p value

1 and 2+ Predictors
Test Statistic, Overall Model: F statistic and pvalue (if significant, check each predictor)
Test Statistic per Predictor: t statistic and pvalue
Point Estimates: b0 = model intercept; b1 = slope between x1 and y; (b2 …) = slope between x2 and y

Model Building
Equation to make predictions: y = b0 + b1*x1 + (b2*x2 + …..)
b0, b1, … are the model coefficients (b0 is the intercept; b1, b2, … are the predictor slopes).
Plug in values of x1, x2 … (for all predictors) to make predictions of y.

Vocab
Test Stat: F statistic. The larger F is, the rarer the observation would be if the H0 were true.
pvalue: The prob. of observing the sample statistic difference (or more extreme) if the H0 is true.
R2: The percent of variability in the data that the model explains. Better models explain more of the variability.
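To make the prediction equation y = b0 + b1*x1 and the R2 definition concrete, the following Python sketch fits a one-predictor model by ordinary least squares. The sheet leaves the fitting to software, so the formulas inside `fit_simple_regression` are the standard least-squares expressions rather than something taken from the sheet, and the five data points are hypothetical. The F statistic, t statistics, and p-values are not computed here.

```python
from statistics import fmean


def fit_simple_regression(xs, ys):
    """Least-squares point estimates b0, b1 and R^2 for one predictor."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx                       # slope between x and y
    b0 = y_bar - b1 * x_bar              # model intercept
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot             # share of variability explained
    return b0, b1, r2


# Hypothetical data: five (x, y) observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1, r2 = fit_simple_regression(xs, ys)
prediction = b0 + b1 * 6                 # plug x = 6 into y = b0 + b1*x

print(round(b0, 2), round(b1, 2), round(r2, 2), round(prediction, 2))
```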
9. Fail to Reject or Reject the H0
IF pvalue ≥ α: Fail to reject the Null Hypothesis (H0) and continue under the assumption that the H0 is correct.
IF pvalue < α: Reject the Null Hypothesis (H0) and conclude the Alternative Hypothesis (H1).

10. Conclusion (and CI if necessary)

Fail to reject
Conclusion: We collected insufficient evidence (F=, pvalue =, α =) to reject the claim that (state H0 in words). We will continue under the assumption that the H0 is correct.
Confidence Interval Statement: Not Required: Since we failed to reject the null hypothesis, the CI would contain the Null population parameter.

Reject
Conclusion: We collected sufficient evidence (F=, pvalue =, α =) to reject the claim that (state H0 in words) and instead we conclude (state H1 in words).
Confidence Interval Statement:
1 Predictor: We are (Confidence Level)% confident that for 1 (x unit) increase in (x) that the (y) changes by somewhere between [Lower, Upper] (y units).
2 Predictors: (Note: We only report the confidence interval of the significant predictors) We are (Confidence Level)% confident that for 1 (x1 unit) increase in (x1) that the (y) changes by somewhere between [Lower x1, Upper x1] (y units) and for 1 (x2 unit) increase in (x2) that the (y) changes by somewhere between [Lower x2, Upper x2] ...
