
FlowChart V20

The document outlines various data types including categorical (nominal and ordinal) and numerical (discrete and continuous), along with their definitions and examples. It also covers statistical concepts such as population, sample, parameter, statistic, and different sampling methods. Additionally, it discusses hypothesis testing steps and confidence intervals, providing formulas and calculations for various statistical measures.

Uploaded by

Wilson


Data Types
Categorical: Data that is grouped by categories.
  Nominal: Categorical data that has no particular order.
  Ordinal: Categorical data that has a logical order. Example: Likert scale.
Numerical: Data that is grouped by numbers.
  Discrete: Data is grouped in fixed steps. It is not possible to have an answer between steps. Example: number of children in a family.
  Continuous: Data may have any value along a given scale. Example: weight, time, speed.
  Interval: Numerical data with an arbitrary 0.
  Ratio: Numerical data with a real 0.

Parameters (Population) and Statistics (Sample)
Definition | Population parameter (pronunciation) | Sample statistic (pronunciation)
Mean | μ (mu) | x̄ (x bar)
Standard Deviation | σ (sigma) | s (s)
Variance | σ² (sigma squared) | s² (s squared)
Proportion | π (pi) | p (p)
Slope | β1 (beta one) | b1 (b one)
Coefficient of Correlation | ρ (rho) | r (r)

Inequalities
Symbol | Meaning | Phrases in a scenario
= | equal to |
≠ | not equal to | is different, not
≤ | less than or equal to | at most, at a maximum, no more than, value or less
≥ | greater than or equal to | at least, at a minimum, no less than, value or more
< | less than | under, fewer than, smaller
> | greater than | over, more than, larger, exceeds

Basic Vocab
Population: Entire group of interest.
Parameter: Some truth about the population.
Sample: Subset of population. A good sample is representative of the population.
Statistic: Some truth about the sample.
Variable: What is measured or observed.
Data: List of results from the sample.
Census: A sample where the entire population is measured or observed. Very rare.
Descriptive Statistic: Some result/truth from the sample.
Inferential Statistic: Results/truth from the sample that are used to make a conclusion about the entire population.
Purpose of Question: Reveal something previously unknown.
Observational Experiment: Recording something observed. There is no intervention here. Correlations may be found, but this does not establish cause and effect.
Experimental Experiment: Some treatment is given to a group. Results are observed and recorded. This is how cause and effect can be established.

Sampling Methods
Good Sample: Sample well represents the population.
Simple Random Sample: Every member of the population is equally likely to be observed. A subset of them is randomly selected.
Systematic Sample: Every member of the population is lined up and every kth member is chosen.
Cluster Sample: Population is broken up into groups. The groups are randomly selected. Every member in the selected group is measured.
Stratified Sample: Some demographic is selected and the sample matches the ratio of the selected demographic.
Convenience Sample: Taking a sample just where it is easy to gather data.

Good Measurement
Good Measurement: Directly answers the question and is free of bias.
Survivorship Bias: Only measuring members who make it to the end of a study.
Recall Bias: Relying upon people's memory.
Funding Bias: Who is funding the study? Is it an impartial 3rd party or do they have an interest in making the study have a particular result?
Cause and Effect Bias: Correlation being misidentified as causation.
Selection Bias: People get the opportunity to opt in or out of the study.
Confirmation Bias: Only collecting data to support a conclusion.

Formulas
Syntax | Description | Example Usage
=SUM(range) | Adds all the numbers in a range. | =SUM(A1:A10)
=PRODUCT(range) | Multiplies all numbers in a range. | =PRODUCT(A1:A10)
=COUNTA(range) | Counts the number of non-empty cells in a range. | =COUNTA(A1:A10)
=COUNTIF(range, criteria) | Counts the number of cells that meet a specified condition. | =COUNTIF(A1:A10, ">10")
=ABS(number) | Returns the absolute value of a number. | =ABS(-5)
=ROUND(number, digits) | Rounds a number to a specified number of digits. | =ROUND(3.14159, 2)
=CEILING.MATH(number, significance) | Rounds a number up to the nearest multiple of a specified value. | =CEILING.MATH(7.3, 1)
=FLOOR.MATH(number, significance) | Rounds a number down to the nearest multiple of a specified value. | =FLOOR.MATH(7.3, 1)
=AVERAGE(range) | Calculates the average (mean) of numbers in a range. | =AVERAGE(A1:A10)
=MEDIAN(range) | Returns the median (middle value) of a range. | =MEDIAN(A1:A10)
=MODE.SNGL(range) | Returns the most frequently occurring value in a range. | =MODE.SNGL(A1:A10)
=STDEV.S(range) | Returns the sample standard deviation. | =STDEV.S(A1:A10)
=VAR.S(range) | Returns the sample variance of a dataset. | =VAR.S(A1:A10)
=MIN(range) | Finds the smallest value in a range. | =MIN(A1:A10)
=MAX(range) | Finds the largest value in a range. | =MAX(A1:A10)
=SKEW(range) | Finds the skewness of the data (- is left skew, + is right skew). | =SKEW(A1:A10)
=KURT(range) | Finds the excess kurtosis of the data (Mesokurtic ≈ 0, Leptokurtic > 0, Platykurtic < 0; on the non-excess scale these thresholds are 3, > 3, < 3). | =KURT(A1:A10)
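For readers working outside a spreadsheet, the descriptive formulas listed on this page have close counterparts in Python's standard library. The sketch below is only an illustration: the `data` list is made up, and the mapping to each spreadsheet function is noted in the comments.

```python
import statistics

# Made-up sample values, used only for illustration.
data = [2, 4, 4, 5, 7, 9]

summary = {
    "sum": sum(data),                              # =SUM(range)
    "count": len(data),                            # =COUNTA(range)
    "count_gt_4": sum(1 for v in data if v > 4),   # =COUNTIF(range, ">4")
    "mean": statistics.mean(data),                 # =AVERAGE(range)
    "median": statistics.median(data),             # =MEDIAN(range)
    "mode": statistics.mode(data),                 # =MODE.SNGL(range)
    "stdev": statistics.stdev(data),               # =STDEV.S(range), sample std. dev.
    "variance": statistics.variance(data),         # =VAR.S(range), sample variance
    "min": min(data),                              # =MIN(range)
    "max": max(data),                              # =MAX(range)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that `statistics.stdev` and `statistics.variance` divide by n - 1, which matches the sample versions (`STDEV.S`, `VAR.S`) used on this sheet.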
Prob and Quantile: Means

CLT Rules
Data: Numerical: Means
Sampling distribution of x̄ is approximately normal if either
1. the original dist. of x is normal
OR
2. n > 30

Key Variables | Calculate
μ = pop. mean | Given
μx̄ = mean of sampling dist wrt x̄ | = μ
n = sample size | Given or =COUNTA(range)
σ = pop. std. dev. | Given
σx̄ = SE = std. error OR std. dev of sampling dist wrt x̄ | = σ/sqrt(n)
x̄ = sample mean | Given or =AVERAGE(range)
z = Z score (# of SE from mean) | =STANDARDIZE(x̄, μ, SE)

Probability (Area under curve)
Less than: Area left: P(x̄ < x̄crit) = ? | =NORM.DIST(x̄crit, μ, SE, TRUE)
Greater than: Area right: P(x̄ > x̄crit) = ? | =1 - NORM.DIST(x̄crit, μ, SE, TRUE)
Between: Area between: P(x̄lower < x̄ < x̄upper) = ? | =NORM.DIST(x̄upper, μ, SE, TRUE) - NORM.DIST(x̄lower, μ, SE, TRUE)

Quantiles (Crit. sample mean)
Cutoff for Bottom %: P(x̄ < ?) = prob | =NORM.INV(prob, μ, SE)
Cutoff for Top %: P(x̄ > ?) = prob | =NORM.INV(1 - prob, μ, SE)
Cutoffs for Middle %: P(? < x̄ < ?) = prob | x̄lower =NORM.INV(0.5 - prob/2, μ, SE) and x̄upper =NORM.INV(0.5 + prob/2, μ, SE)

Z-Manipulation
Z score: Z = (x̄ - μ) / SE
Solve for x̄: x̄ = Z*SE + μ

Prob and Quantile: Proportions

CLT Rules
Data: Categorical: Proportions
Sampling distribution of p is approximately normal if BOTH
n*π > 15 (# of successes)
AND
n*(1-π) > 15 (# of failures)

Key Variables | Calculate
π = pop. proportion | Given
μp = mean of sampling dist wrt p | = π
n = sample size | Given or =COUNTA(range)
σp = SE = std. error OR std. dev of sampling dist wrt p | = sqrt(π*(1-π)/n)
k = # with the trait of interest | Given or =COUNTIF(range, criteria)
p = sample proportion | Given or =k/n
Z = Z score (# of SE from mean) | =STANDARDIZE(p, π, SE)

Probability (Area under curve)
Less than: Area left: P(p < pcrit) = ? | =NORM.DIST(pcrit, π, SE, TRUE)
Greater than: Area right: P(p > pcrit) = ? | =1 - NORM.DIST(pcrit, π, SE, TRUE)
Between: Area between: P(plower < p < pupper) = ? | =NORM.DIST(pupper, π, SE, TRUE) - NORM.DIST(plower, π, SE, TRUE)

Quantiles (Crit. sample prop.)
Cutoff for Bottom %: P(p < ?) = prob | =NORM.INV(prob, π, SE)
Cutoff for Top %: P(p > ?) = prob | =NORM.INV(1 - prob, π, SE)
Cutoffs for Middle %: P(? < p < ?) = prob | plower =NORM.INV(0.5 - prob/2, π, SE) and pupper =NORM.INV(0.5 + prob/2, π, SE)

Z-Manipulation
Z score: Z = (p - π) / SE
Solve for p: p = Z*SE + π
Solve for k: k = p * n [typically round up]
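The probability and quantile lookups for a sampling distribution of means can also be sketched in Python with `statistics.NormalDist`, whose `cdf` plays the role of `NORM.DIST(..., TRUE)` and whose `inv_cdf` plays the role of `NORM.INV`. The population values below (μ = 100, σ = 15, n = 36) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population mean, population std. dev., and sample size.
mu, sigma, n = 100.0, 15.0, 36

se = sigma / sqrt(n)                 # SE = sigma / sqrt(n) = 2.5
sampling_dist = NormalDist(mu, se)   # sampling distribution of x-bar under the CLT

# Probabilities (area under the curve)
p_less = sampling_dist.cdf(95.0)                                 # P(x-bar < 95)
p_greater = 1 - sampling_dist.cdf(105.0)                         # P(x-bar > 105)
p_between = sampling_dist.cdf(105.0) - sampling_dist.cdf(95.0)   # P(95 < x-bar < 105)

# Quantiles (critical sample means)
cutoff_bottom_5 = sampling_dist.inv_cdf(0.05)       # cutoff for the bottom 5%
cutoff_top_5 = sampling_dist.inv_cdf(1 - 0.05)      # cutoff for the top 5%

# Z-manipulation
z = (95.0 - mu) / se          # Z = (x-bar - mu) / SE
x_bar_back = z * se + mu      # solve for x-bar again

print(p_less, p_greater, p_between)
print(cutoff_bottom_5, cutoff_top_5, z, x_bar_back)
```

The same pattern applies to proportions by replacing `mu` with π and computing `se` as `sqrt(pi * (1 - pi) / n)`.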
Confidence Intervals: Means [Numerical Data], Unknown: μ

Pop. Standard Deviation (σ Known)
Key Variables | Calculate
CL = conf. level | Given or 1 - α
α = CI miss rate | Given or 1 - CL
n = sample size | Given or =COUNTA(range)
x̄ = sample mean | Given or =AVERAGE(range)
σ = pop. st. dev | Given
SE = standard error | = σ/sqrt(n)
z2tail = 2 tailed zcrit | =NORM.S.INV(1 - α/2)
z1tail = 1 tailed zcrit | =NORM.S.INV(1 - α)
MoE = margin of error | = z2tail * SE, or =CONFIDENCE.NORM(α, σ, n)

2 Tailed: μ ∈ x̄ ± MoE
  L: x̄ - MoE
  U: x̄ + MoE
Lower Tail: μ ≥ x̄ - z1tail * SE
  L: x̄ - z1tail * SE
  U: INF
Upper Tail: μ ≤ x̄ + z1tail * SE
  L: -INF
  U: x̄ + z1tail * SE

Sample Standard Deviation (s Known)
Key Variables | Calculate
CL = conf. level | Given or 1 - α
α = CI miss rate | Given or 1 - CL
n = sample size | Given or =COUNTA(range)
df = deg. of freedom | = n - 1
x̄ = sample mean | Given or =AVERAGE(range)
s = sample st. dev | Given or =STDEV.S(range)
SE = standard error | = s/sqrt(n)
t2tail = 2 tailed tcrit | =T.INV.2T(α, df)
t1tail = 1 tailed tcrit | =T.INV(1 - α, df)
MoE = margin of error | = t2tail * SE

2 Tailed: μ ∈ x̄ ± MoE
  L: x̄ - MoE
  U: x̄ + MoE
Lower Tail: μ ≥ x̄ - t1tail * SE
  L: x̄ - t1tail * SE
  U: INF
Upper Tail: μ ≤ x̄ + t1tail * SE
  L: -INF
  U: x̄ + t1tail * SE

Estimate Min Sample Size for CI
Means: n = (z2tail * σ / MoEt)^2
Prop.: n = (z2tail * sqrt(p*(1-p)) / MoEt)^2
Key Variables | Calculate
α = CI miss rate | Given or 1 - CL
z2tail = z(α/2) = 2 tailed crit | =NORM.S.INV(1 - α/2)
σ = pop. st. dev | Given
p = sample prop | Given OR use 0.5 for worst case scenario
MoEt = target MoE | Given

Concepts: Confidence Interval
Perform a CI to estimate the range where the unknown μ or π lies.
CL is the % of time the CI captures the pop. parameter.
α is the % of time the CI misses the pop. parameter.
Both mean and proportion CIs hinge on the CLT.
As n increases, the width of the CI decreases.
As α increases, the width of the CI decreases.

Statement
2 Tailed: We are (Confidence Level)% confident that the true (population parameter) for (state population) is somewhere between [Lower, Upper] (units).
Lower Tail (1 Tailed >): We are (Confidence Level)% confident that the true (population parameter) for (state population) is greater than [Lower] (units).
Upper Tail (1 Tailed <): We are (Confidence Level)% confident that the true (population parameter) for (state population) is less than [Upper] (units).
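As a rough companion to the σ-known interval and the minimum sample size formulas on this page, the following Python sketch uses `statistics.NormalDist().inv_cdf` in place of `NORM.S.INV`. The function names and the example numbers are assumptions made for illustration. The t-based interval is not shown because the standard library has no t distribution.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, cl=0.95):
    """Two-tailed CI for a mean when the population std. dev. is known."""
    alpha = 1 - cl
    se = sigma / sqrt(n)                            # SE = sigma / sqrt(n)
    z2tail = NormalDist().inv_cdf(1 - alpha / 2)    # NORM.S.INV(1 - alpha/2)
    moe = z2tail * se                               # MoE = z2tail * SE
    return x_bar - moe, x_bar + moe


def min_sample_size_mean(sigma, target_moe, cl=0.95):
    """n = (z2tail * sigma / MoEt)^2, rounded up to a whole observation."""
    z2tail = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    return ceil((z2tail * sigma / target_moe) ** 2)


def min_sample_size_prop(target_moe, p=0.5, cl=0.95):
    """n = (z2tail * sqrt(p(1-p)) / MoEt)^2; p = 0.5 is the worst case."""
    z2tail = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    return ceil((z2tail * sqrt(p * (1 - p)) / target_moe) ** 2)


# Hypothetical inputs: x-bar = 50, sigma = 10, n = 25, 95% confidence.
lower, upper = z_confidence_interval(50, 10, 25)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
print(min_sample_size_mean(sigma=10, target_moe=2))
print(min_sample_size_prop(target_moe=0.03))
```

Rounding up with `ceil` is a design choice: rounding down would leave the margin of error slightly larger than the target.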
1 Sample Hypothesis Testing

Hypothesis Testing Steps
1. Identify Data Type
2. Determine Population and Parameter
3. State Hypothesis
4. State α
5. Determine Testing Method
6. Check Assumptions
7. Design Experiment and Collect Data
8. Calculate Test Statistics and pvalue
9. Reject/Fail to Reject Ho
10. Make Conclusion

1. Identify Data Type
Decision: Is there a business decision we are trying to make?
Ask: What question do we ask subjects and how do they respond?
Data Type | Want to know | Symbol
Numerical | True Mean | μ
Categorical | True Proportion | π

2. Determine Population and Parameter
Population: Who we are trying to make a statement about.
Parameter: Specific μ or π of interest for the group.

3. State Hypothesis
Means
  Null Hypothesis: Ho : μ (≤,≥,=) μ0
  Alt. Hypothesis: H1 : μ (<,>,≠) μ0
  Variables: μ: true pop mean; μ0: hypothesized pop mean
Matched Pairs
  Null Hypothesis: Ho : μd (≤,≥,=) μd0
  Alt. Hypothesis: H1 : μd (<,>,≠) μd0
  Variables: μd: true pop mean difference; μd0: theorized pop mean difference
Proportions
  Null Hypothesis: Ho : π (≤,≥,=) π0
  Alt. Hypothesis: H1 : π (<,>,≠) π0
  Variables: π: true pop proportion; π0: theorized pop proportion
Inequality (<,>,≠) in the alt hypothesis is established in the scenario. The null hypothesis is always the opposite of this symbol.
Two tailed test if the alternative hypothesis is ≠. One tailed test if the alternative hypothesis is either > or <.
Null Hypothesis: The baseline assumption about the population, assumed to be true.
Alt. Hypothesis: An alternative scenario that we test with sample data in contrast to the null. This is typically what we want to see.

4. Establish α
Directly stated OR α = 1 - CL
Decision | The H0 is Correct | The H0 is Incorrect
Fail to Reject H0 | 😀 (correct decision) | Type 2 Error: β
Reject H0 | Type 1 Error: α | 😀 (correct decision)
α
  Definition: Percent of the time that we incorrectly reject the null hypothesis.
  Layman's: The null hypothesis is correct, but the data that we collect suggest that it is incorrect.
β
  Definition: Percent of the time that we incorrectly fail to reject the null hypothesis.
  Layman's: The null hypothesis is incorrect, but the data that we collect do not suggest that the null is incorrect.

5. Determine Testing Method

6. Check Assumptions
CLT: Orig Dist Normal
  Means: Check to see if scenario states assumed normality OR….
  Matched Pairs: Check to see if scenario states assumed normality OR….
  Proportions: Orig. distribution NEVER normal.
CLT: Sample Size
  Means: n ≥ 30
  Matched Pairs: nd ≥ 30
  Proportions: n * π0 ≥ 15 & n * (1 - π0) ≥ 15
Good Sample? Is the sample representative of the population?

7. Design Experiment and Gather Data
Typically this is done for us in this class. We will do a few projects where we collect data.

8. Calculate Test Statistic, pvalue, and zcrit OR tcrit
Means (know σ)
  Test Statistic: z = zscore = (x̄ - μ0) / SE
  Variables: x̄ = sample mean; μ0 = hyp. true mean; σ = pop. stdev; n = sample size
  SE = std. error = σ/sqrt(n)
  pvalue 2 tailed (≠): =(1 - NORM.S.DIST(ABS(z), TRUE)) * 2
  pvalue 1 tailed (<): =NORM.S.DIST(z, TRUE)
  pvalue 1 tailed (>): =1 - NORM.S.DIST(z, TRUE)
  zcrit_2tail: =NORM.S.INV(1 - α/2)
  zcrit_1tail: =NORM.S.INV(1 - α)
Means (know s)
  Test Statistic: t = tscore = (x̄ - μ0) / SE
  Variables: x̄ = sample mean; μ0 = hyp. true mean; n = sample size; df = deg. of freedom = n - 1; s = sample stdev
  SE = std. error = s/sqrt(n)
  pvalue 2 tailed (≠): =T.DIST.2T(ABS(t), df)
  pvalue 1 tailed (<): =T.DIST(t, df, TRUE)
  pvalue 1 tailed (>): =T.DIST.RT(t, df)
  tcrit_2tail: =T.INV.2T(α, df)
  tcrit_1tail: =T.INV(1 - α, df)
Matched Pairs
  Test Statistic: t = tscore = (x̄d - μd0) / SE, where μd0 is typically 0
  Variables: x̄d = sample diff mean; sd = sample diff stdev; nd = sample diff size; df = deg. of freedom = nd - 1
  SE = std. error = sd/sqrt(nd)
  pvalue 2 tailed (≠): =T.DIST.2T(ABS(t), df)
  pvalue 1 tailed (<): =T.DIST(t, df, TRUE)
  pvalue 1 tailed (>): =T.DIST.RT(t, df)
  tcrit_2tail: =T.INV.2T(α, df)
  tcrit_1tail: =T.INV(1 - α, df)
Proportions
  Test Statistic: z = zscore = (p - π0) / SE
  Variables: k = # with trait of interest; n = sample size; p = sample prop. = k/n; π0 = hyp. true prop.
  SE = std. error = sqrt(π0*(1-π0)/n)
  pvalue 2 tailed (≠): =(1 - NORM.S.DIST(ABS(z), TRUE)) * 2
  pvalue 1 tailed (<): =NORM.S.DIST(z, TRUE)
  pvalue 1 tailed (>): =1 - NORM.S.DIST(z, TRUE)
  zcrit_2tail: =NORM.S.INV(1 - α/2)
  zcrit_1tail: =NORM.S.INV(1 - α)
Vocab
Test Statistic: The number of standard errors the sample statistic is from the hypothesized population parameter.
pvalue: The probability of observing the sample statistic (or something more extreme) if the Null Hypothesis is true.

9. Fail to Reject or Reject the H0
IF pvalue ≥ α: Fail to reject the Null Hypothesis and continue under the baseline assertion.
  ALSO: sample statistic outside rejection region; null parameter inside CI; test statistic t/z smaller than critical t/z.
IF pvalue < α: Reject the Null Hypothesis H0 and conclude the Alternative Hypothesis H1.
  ALSO: sample statistic inside rejection region; null parameter outside CI; test statistic t/z larger than critical t/z.

10. Conclusion (and CI if necessary)
Fail to reject
  Conclusion: We collected insufficient evidence (test statistic, pvalue, α) to reject the claim that (state H0 in words). We will continue under the assumption that the H0 is correct.
  Confidence Interval Statement: Not Required: Since we failed to reject the null hypothesis, the CI would contain the Null population parameter.
Reject
  Conclusion: We collected sufficient evidence (test statistic, pvalue, α) to reject the claim that (state H0 in words) and instead we conclude (state H1 in words).
  Confidence Interval Statement:
    2 Tailed (≠) (2 Tailed CI): We are (Confidence Level)% confident that the true (population parameter) for (state population) is somewhere between [Lower, Upper] (units).
    1 Tailed (<) (Upper Tail CI): We are (Confidence Level)% confident that the true (population parameter) for (state population) is less than [Upper] (units).
    1 Tailed (>) (Lower Tail CI): We are (Confidence Level)% confident that the true (population parameter) for (state population) is greater than [Lower] (units).
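Steps 8 and 9 for the one-sample proportion case can be sketched in Python as follows. This is a minimal illustration, not the course's own tooling: the function name, the `alternative` labels, and the example counts are assumptions, and `statistics.NormalDist().cdf` stands in for `NORM.S.DIST(z, TRUE)`.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_test(k, n, pi0, alternative="two-sided", alpha=0.05):
    """z test for one proportion; returns (z, p_value, decision)."""
    p = k / n                               # sample proportion p = k/n
    se = sqrt(pi0 * (1 - pi0) / n)          # SE uses the hypothesized pi0
    z = (p - pi0) / se                      # number of SEs between p and pi0
    std_normal = NormalDist()

    if alternative == "two-sided":          # H1: pi != pi0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "less":             # H1: pi < pi0
        p_value = std_normal.cdf(z)
    elif alternative == "greater":          # H1: pi > pi0
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")

    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical data: 60 of 100 subjects have the trait; H0: pi = 0.5.
z, p_value, decision = one_sample_proportion_test(60, 100, 0.5)
print(f"z = {z:.3f}, pvalue = {p_value:.4f}, {decision}")
```

With these made-up counts, n * π0 = 50 and n * (1 - π0) = 50, so the sample size check in step 6 passes before the test is run.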
2 Sample Hypothesis Testing

Hypothesis Testing Steps
1. Identify Data Type
2. Determine Population and Parameter
3. State Hypothesis
4. State α
5. Determine Testing Method
6. Check Assumptions
7. Design Experiment and Collect Data
8. Calculate Test Statistics and pvalue
9. Reject/Fail to Reject Ho
10. Make Conclusion

1. Identify Data Type
Ask: What question do we ask subjects and how do they respond?
Data Type | Want to know | Symbol
Numerical | Difference of True Means | μ1 - μ2
Categorical | Difference of True Proportions | π1 - π2

2. Determine Population and Parameter
Populations: The 2 groups of interest (g1 and g2).
Parameter: True mean (μ) or true proportion (π) of interest to compare between g1 and g2.

3. State Hypothesis
Means
  Null Hypothesis: Ho : μ1 - μ2 = 0 OR μ1 = μ2
  Alt. Hypothesis: H1 : μ1 - μ2 (<,>,≠) 0 OR μ1 (<,>,≠) μ2
  Variables: μ1: true pop mean of group 1; μ2: true pop mean of group 2
Proportions
  Null Hypothesis: Ho : π1 - π2 = 0 OR π1 = π2
  Alt. Hypothesis: H1 : π1 - π2 (<,>,≠) 0 OR π1 (<,>,≠) π2
  Variables: π1: true pop proportion of group 1; π2: true pop proportion of group 2
Inequality (<,>,≠) in the alt hypothesis is established in the scenario.
Two tailed test if the alternative hypothesis is ≠. One tailed test if the alternative hypothesis is either > or <.
Null Hypothesis: The baseline assumption about the population, assumed to be true. Typically it is assumed that the groups have equal true parameters.
Alt. Hypothesis: An alternative scenario that we test with sample data in contrast to the null. This is typically what we want to see.

4. Establish α
Directly stated OR α = 1 - CL
Decision | The H0 is Correct | The H0 is Incorrect
Fail to Reject H0 | 😀 (correct decision) | Type 2 Error: β
Reject H0 | Type 1 Error: α | 😀 (correct decision)

5. Determine Testing Method
Check for Equal Variance
Levene's Hypothesis F test for Equal Variance
  H0 : The two samples have equal variances
  H1 : The two samples have unequal variances
  F = software calculation
  pvalue = software calculation
  If pvalue < α, Reject: Use the unequal variances t test.
  If pvalue ≥ α, Fail to Reject: Use the equal variances t test.

6. Check Assumptions
Check (must pass for both g1 and g2)
CLT: Original Dist. Normal
  Means: g1 & g2 both stated to assume normality
  Proportions: Original distributions are never normal.
CLT: Sample Size
  Means: g1: n1 ≥ 30; g2: n2 ≥ 30
  Proportions: g1: n1 * π1 ≥ 15 & n1 * (1 - π1) ≥ 15; g2: n2 * π2 ≥ 15 & n2 * (1 - π2) ≥ 15
Good Sample? Is the sample representative of the population?

7. Design Experiment and Gather Data
How to Establish Cause and Effect
  Treatment comes before the effect.
  Found significant results (Reject H0, pvalue < α).
  Utilized a true experiment. Eliminates other explanations.

8. Calculate Test Statistic and p value
Means (Equal Var)
  Test Statistic: t statistic; report: (t(df)= , pvalue= , α= )
  Key Values: x̄1 - x̄2 = sample mean diff; x̄1 & x̄2 sample means; n1 & n2 sample sizes; s1 & s2 sample stdev.
Means (Unequal Variance)
  Test Statistic: t statistic; report: (t(df)= , pvalue= , α= )
  Key Values: x̄1 - x̄2 = sample mean diff; x̄1 & x̄2 sample means; n1 & n2 sample sizes; s1 & s2 sample stdev.
Proportions
  Test Statistic: Z statistic; report: (Z= , pvalue= , α= )
  Key Values: p1 - p2 = sample prop. diff; p1 & p2 sample proportions; n1 & n2 sample sizes
Vocab
Test Stat.: The number of standard errors the sample stat. difference is from the Ho.
pvalue: The prob. of observing the sample stat. difference (or more extreme) if the Ho is true.

9. Fail to Reject or Reject the H0
IF pvalue ≥ α: Fail to reject the Null Hypothesis and continue under the baseline assertion.
IF pvalue < α: Reject the Null Hypothesis H0 and conclude the Alternative Hypothesis H1.

10. Conclusion (and CI if necessary)
Fail to reject
  Conclusion: We collected insufficient evidence (test statistic, pvalue, α) to reject the claim that (state H0 in words). We will continue under the assumption that the H0 is correct.
  Confidence Interval Statement: Not Required: Since we failed to reject the null hypothesis, the CI would contain the Null population parameter.
Reject
  Conclusion: We collected sufficient evidence (test statistic, pvalue, α) to reject the claim that (state H0 in words) and instead we conclude (state H1 in words).
  Confidence Interval Statement:
    2 Tailed: We are (Confidence Level)% confident that the true difference of (population parameter) between the (state populations) is somewhere between [Lower, Upper] (units) with (state larger group) as larger.
    1 Tailed <: We are (Confidence Level)% confident that the true (population parameter) for (state group 1) is at least [Upper] (units) less than that of (state group 2).
    1 Tailed >: We are (Confidence Level)% confident that the true (population parameter) for (state group 1) is at least [Lower] (units) greater than that of (state group 2).
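The sheet leaves the two-sample test statistics to software. For the proportions case, one common textbook choice is the pooled two-proportion z test, sketched below in Python. The pooled standard error formula is an assumption brought in from standard practice and is not stated on this sheet; the function name and example counts are likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_proportion_test(k1, n1, k2, n2, alpha=0.05):
    """Two-tailed pooled z test of H0: pi1 = pi2; returns (z, p_value, decision)."""
    p1, p2 = k1 / n1, k2 / n2
    # Assumption: under H0 both groups share one proportion, so the
    # successes are pooled to estimate it (standard textbook approach).
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se                              # SEs between p1 - p2 and 0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed pvalue
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical data: 60 of 100 in group 1 vs. 45 of 100 in group 2.
z, p_value, decision = two_sample_proportion_test(60, 100, 45, 100)
print(f"Z = {z:.3f}, pvalue = {p_value:.4f}, alpha = 0.05: {decision}")
```

The printed line follows the reporting pattern (Z= , pvalue= , α= ) from step 8. The t tests for means would need a t distribution, which the Python standard library does not provide.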
Regression Analysis

Hypothesis Testing Steps
1. Identify Data Type
2. Determine Population and Parameter
3. State Hypothesis
4. State α
5. Determine Testing Method
6. Check Assumptions
7. Design Experiment and Collect Data
8. Calculate Test Statistics and pvalue
9. Reject/Fail to Reject Ho
10. Make Conclusion

1. Identify Data Type
Ask: What question do we ask subjects and how do they respond?
Data Type | Want to know | Symbol
Numerical x 2+ | True Relationship(s) or True Slope(s) | β1 (, β2, β3… )

2. Determine Population and Parameter
Population: The group of interest that we are measuring.
Parameter(s): True slope (β1, β2, …) between the predictor(s) (x1, x2,…) and response (y).
Response [y]: This is the variable that we would like to be able to model. (dependent variable)
Predictor(s) [x1, x2,…]: At least 1 variable that we are using to make the prediction. (independent variable(s))

3. State Hypothesis
1 Predictor
  Null Hypothesis: Ho : β1 = 0. No relationship between x and y.
  Alt. Hypothesis: H1 : β1 (<,>,≠) 0. There is a relationship between x and y.
  Variables: β1 : true slope between x and y
2+ Predictors
  Null Hypothesis: Ho : β1 = β2 = ... = 0. Predictors do not model response.
  Alt. Hypothesis: H1 : (β1 or β2 or ... ) ≠ 0. At least 1 predictor models response.
  Variables: β1 : true relationship between y and x1; β2 (…): true relationship between y and x2
Inequality (<,>,≠) in the alt hypothesis is established in the scenario. Regression is most commonly 2 tailed, and with multiple predictors it is only 2 tailed.
Two tailed test if the alternative hypothesis is ≠. One tailed test if the alternative hypothesis is either > or <.
Null Hypothesis: The baseline assumption about the population, assumed to be true.
Alt. Hypothesis: An alternative scenario that we test with sample data in contrast to the null. This is typically what we want to see.

4. Establish α
Directly stated OR α = 1 - CL
Decision | The H0 is Correct | The H0 is Incorrect
Fail to Reject H0 | 😀 (correct decision) | Type 2 Error: β
Reject H0 | Type 1 Error: α | 😀 (correct decision)

5. Determine Testing Method

6. Check Assumptions
Residual Plot Checks | Good | Bad
Homoskedasticity | Constant variance looks like a cloud | Plot shows fanning (i.e. non constant variance)
Independence | No clustering of observations | Data is clustered with large empty spots
Linear | There is no pattern in the residuals | Visual pattern in the residual plots
Centered | Residuals centered on 0 | Not centered on 0
QQ Plot Check | Good | Bad
Normality | Points lie on QQ plot line without major deviation | Points deviate substantially from QQ plot line

7. Design Experiment and Gather Data
How to Establish Cause and Effect
  Treatment comes before the effect.
  Found significant results (Reject H0, pvalue < α).
  Utilized a true experiment. Eliminates other explanations.

8. Calculate Test Statistic and p value
1 and 2+ Predictors
Test Statistic, Overall Model: F statistic and pvalue (if significant, check each predictor)
Test Statistic per Predictor: t statistic and pvalue
Point Estimates
  b0 = model intercept
  b1 = slope between x1 and y
  (b2 …) = slope between x2 and y
Model Building
  Equation to make predictions: y = b0 + b1*x1 + (b2*x2 + …)
  b0, b1, … are the model coefficients (intercept and predictor slopes).
  Plug in values of x1, x2 … (for all predictors) to make predictions of y.
Vocab
Test Stat: F statistic. The larger F is, the rarer the observation would be if the Ho were true.
pvalue: The prob. of observing the sample statistic (or more extreme) if the Ho is true.
R2: The percent of variability in the data that the model explains. Better models explain more of the variability.

9. Fail to Reject or Reject the H0
IF pvalue ≥ α: Fail to reject the Null Hypothesis (H0) and continue under the baseline assertion.
IF pvalue < α: Reject the Null Hypothesis (H0) and conclude the Alternative Hypothesis (H1).

10. Conclusion (and CI if necessary)
Fail to reject
  Conclusion: We collected insufficient evidence (F=, pvalue =, α =) to reject the claim that (state H0 in words). We will continue under the assumption that the H0 is correct.
  Confidence Interval Statement: Not Required: Since we failed to reject the null hypothesis, the CI would contain the Null population parameter.
Reject
  Conclusion: We collected sufficient evidence (F=, pvalue =, α =) to reject the claim that (state H0 in words) and instead we conclude (state H1 in words).
  Confidence Interval Statement:
    1 Predictor: We are (Confidence Level)% confident that for 1 (x unit) increase in (x) the (y) changes by somewhere between [Lower, Upper] (y units).
    2 Predictors: (Note: We only report the confidence interval of the significant predictors.) We are (Confidence Level)% confident that for 1 (x1 unit) increase in (x1) the (y) changes by somewhere between [Lower x1, Upper x1] (y units) and for 1 (x2 unit) increase in (x2) the (y) changes by somewhere between [Lower x2, Upper x2] ...
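The point estimates b0 and b1, the prediction equation y = b0 + b1*x1, and R2 can be illustrated for the 1 predictor case with a short Python sketch. It assumes ordinary least squares, which this sheet does not spell out but which is the usual method behind regression output; the function name and the data are made up for illustration. The F and t statistics are left to software, as on the sheet.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares with one predictor; returns (b0, b1, r2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b1 = sxy / sxx              # slope between x and y
    b0 = y_bar - b1 * x_bar     # model intercept

    # R2 = share of the variability in y that the model explains.
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_resid = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1 - ss_resid / ss_total
    return b0, b1, r2


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1, r2 = fit_simple_regression(xs, ys)
prediction = b0 + b1 * 6    # plug x = 6 into y = b0 + b1*x
print(f"y = {b0:.2f} + {b1:.2f}*x, R2 = {r2:.2f}, prediction at x=6: {prediction:.2f}")
```

The residuals `y - (b0 + b1 * x)` computed inside the function are the same quantities that the residual plot checks in step 6 examine.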
