Study Guide - Inference Procedures
MA 217
              Binary           Mult. Cat.    Numerical
Binary        2 Samp. Prop.    χ²            2 Samp. Mean
Mult. Cat.    χ²               χ²            ANOVA
Numerical     2 Samp. Mean     ANOVA         Linear Reg.
(a) Both subsamples are SRS and are independent of each other; or the one sample
is an SRS. Which applies depends on how the sample(s) were taken.
(b) Both subpopulations are large, at least 20 times the subsample size; or the one
population is at least 20 times the one sample size. Which applies depends on how
the sample(s) were taken (rule of thumb for independence of individuals).
(c) 0:15:40 rule for each subsample/subpopulation (rule of thumb for normality).
This does not depend on how the sample(s) were taken.
Procedure: ANOVA
Use for: HT comparing the means of two or more populations, or relating a categorical
variable to a numerical.
Hints: Multiple columns of numbers. The words "mean" or "average". The words "difference" or "comparing", "independence" or "relationship". Multiple populations, or
a categorical variable with multiple different values.
Assumptions: Must meet all of
(a) All subsamples are SRS and are independent of each other; or the one sample is
an SRS. Which applies depends on how the sample(s) were taken.
(b) All subpopulations are large, at least 20 times the subsample size; or the one
population is at least 20 times the one sample size. Which applies depends on how
the sample(s) were taken (rule of thumb for independence of individuals).
(c) 0:15:40 rule for each subsample/subpopulation (rule of thumb for normality).
This does not depend on how the sample(s) were taken. If n is between 15 and 40,
it suffices if the skewness is between −1 and 1.
(d) (rule of thumb for equal standard deviations) must meet both of
(i) All samples have n ≥ 5.
(ii) Ratio of largest to smallest sample s.d. is at most 2.
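The equal standard deviations check in (d) is simple arithmetic, so it can be sketched in a few lines of Python. This is only an illustration: the function name equal_sd_rule_ok and the data are made up, and the sketch reads condition (i) as "every sample has at least 5 observations".

```python
from statistics import stdev

def equal_sd_rule_ok(groups):
    """Check rule of thumb (d) for a list of samples (hypothetical helper)."""
    # (i) assumed reading: every sample needs at least 5 observations
    if any(len(g) < 5 for g in groups):
        return False
    # (ii) largest sample s.d. divided by smallest sample s.d. must be at most 2
    sds = [stdev(g) for g in groups]
    if min(sds) == 0:
        return False  # ratio is undefined; treat as failing the check
    return max(sds) / min(sds) <= 2

# Made-up data, for illustration only
a = [10, 12, 11, 13, 14]
b = [20, 22, 21, 23, 24]
c = [5, 15, 25, 35, 45]
print(equal_sd_rule_ok([a, b]))     # both s.d. equal, ratio is 1
print(equal_sd_rule_ok([a, b, c]))  # c is far more spread out than a and b
```

The check says nothing about assumptions (a) through (c); those still have to be verified separately.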
Procedure: Linear Regression
Parameter: The slope of the least squares line for the entire population relating the explanatory variable and the response variable
Use for: CI for the slope of the least squares line for two numerical variables in a population, HT comparing that slope to 0 (equivalently, for independence of the two
variables).
Hints: Two columns of numbers. Single population, no categorical variable. The words
"slope" or "linear".
Assumptions: Must meet all of
(a) Sample is SRS.
(b) Large population, at least 20 times sample (rule of thumb for independence of
individuals).
(c) (rule of thumb for Linear Model) must meet all of
(i) n ≥ 40
(ii) Scatterplot shows no curvature, constant spread around the line,
and no major outliers (or use a scatterplot of the residuals for more detail).
(iii) Histogram of residuals is not too skewed and has no major outliers.
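Conditions (ii) and (iii) are judged from the residuals, that is, each observed response minus the value predicted by the least squares line. As a rough sketch of where those numbers come from (the function names and data are made up for illustration, and a statistics package would normally do this step):

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residuals(xs, ys):
    """Observed response minus the response predicted by the line."""
    slope, intercept = least_squares(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up data, for illustration only (far fewer points than condition (i) asks for)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = least_squares(xs, ys)
res = residuals(xs, ys)
print(slope, intercept)
print(res)
```

The list res is what would be plotted against xs for (ii) and drawn as a histogram for (iii).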
Procedure: One sample proportion (z-procedure)
Parameter: The proportion of successes or yes answers in the entire population
Use for: CI estimating the population proportion of a single binary (categorical) variable
in a single population, or HT comparing that proportion to a given test proportion.
Hints: A single count or proportion of individuals with a given property. The words
"proportion" or "percentage". No different treatments, no different populations.
Assumptions: Must meet all of
(a) SRS
(b) Large population, at least 20 times sample (rule of thumb for independence of
individuals)
(c) Rule of 15 (rule of thumb for normality). Depends on version:
(i) For CI, number of successes and failures must both be at least 15.
(ii) For HT, expected number of successes and failures, i.e. np₀ and n(1 − p₀),
must both be at least 15.
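The two versions of the Rule of 15 differ only in which counts are checked: observed counts for a CI, expected counts under the test proportion for an HT. A minimal sketch, with made-up function names and numbers:

```python
def rule_of_15_ci(successes, n):
    """CI version: observed successes and failures are both at least 15."""
    return successes >= 15 and (n - successes) >= 15

def rule_of_15_ht(n, p0):
    """HT version: expected counts n*p0 and n*(1 - p0) are both at least 15."""
    return n * p0 >= 15 and n * (1 - p0) >= 15

# Made-up numbers, for illustration only
print(rule_of_15_ci(successes=18, n=40))  # observed counts are 18 and 22
print(rule_of_15_ht(n=40, p0=0.3))        # expected counts are 12 and 28
```

The same sample of 40 can therefore pass the CI check and fail the HT check, because the HT check uses the test proportion rather than the data.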
Procedure: Two sample proportion (z-procedure)
Parameter: The difference in the proportion of yes or successes in the first population
minus the proportion of yes or successes in the second population
Use for: CI estimating the difference in population proportions between two binary (categorical) variables or one variable under two circumstances, or HT comparing these
two proportions.
Hints: Two counts or proportions of individuals with a given property. The words "proportion" or "percentage". The words "difference" or "comparing". Two different
populations, or a second categorical variable with two different values.
Assumptions: Must meet all of
(a) Both samples SRS and independent of each other.
(b) Both populations large, at least 20 times sample (rule of thumb for independence
of individuals)
(c) Rule of 10/5 (rule of thumb for normality). All of the numbers of successes and
failures in each of the two samples must be at least 10 (that is four numbers, all
at least 10). If you are using the two-tailed alternative hypothesis, use 5 instead
of 10.
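The Rule of 10/5 checks four numbers against one threshold. A small sketch of that check as stated above (the function name, argument names, and counts are made up for illustration):

```python
def rule_of_10_5(x1, n1, x2, n2, two_tailed=False):
    """x1, x2 are the success counts; n1, n2 are the sample sizes.

    All four counts (successes and failures in each sample) must reach
    the threshold: 10 by default, 5 for a two-tailed alternative.
    """
    threshold = 5 if two_tailed else 10
    counts = [x1, n1 - x1, x2, n2 - x2]
    return all(c >= threshold for c in counts)

# Made-up counts, for illustration only: 8 of 30 and 12 of 40
print(rule_of_10_5(8, 30, 12, 40))                   # the count 8 is below 10
print(rule_of_10_5(8, 30, 12, 40, two_tailed=True))  # all four counts reach 5
```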
Procedure: χ² or Chi-Squared Procedure
Use for: HT relating two categorical variables, or comparing a categorical variable in
multiple populations.
Hints: A rectangular table of counts. The words "proportion" or "percentage". The words
"difference" or "comparing", "independence" or "relationship". Multiple populations,
or a categorical variable with multiple different values.
Assumptions: Must meet all of
(a) All samples SRS and independent of each other.
(b) All populations large, at least 20 times sample (rule of thumb for independence
of individuals)
(c) Rule of 5 (rule of thumb for normality). At least 80% of all expected cell
counts must be at least 5.
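The Rule of 5 is about expected counts, not the observed counts in the table. For a two-way table, each expected count is (row total × column total) / grand total. A rough sketch of the check, with made-up function names and tables:

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def rule_of_5(table):
    """True if at least 80% of the expected cell counts are at least 5."""
    cells = [e for row in expected_counts(table) for e in row]
    share = sum(e >= 5 for e in cells) / len(cells)
    return share >= 0.8

# Made-up tables of observed counts, for illustration only
big = [[10, 20], [30, 40]]
sparse = [[1, 2], [3, 40]]
print(expected_counts(big))
print(rule_of_5(big), rule_of_5(sparse))
```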