STATISTICS AT A GLANCE (Cheat Sheet)
DESCRIPTIVE STATISTICS: the discipline of quantitatively describing the main features of a collection of data, using numerical and graphical summaries to characterize a dataset
The three main measures are:
CENTER --- measure of central tendency, the typical or average value (mean, median, mode)
SPREAD --- measure of dispersion or variability of the data (standard deviation, variance, min, max, range)
SHAPE --- symmetric or skewed data (bell-shaped, normal curve, left/negative skewed, right/positive skewed)
The tools used for describing a collection of data depend on the nature of the data. Two main data types: Quantitative (numerical) and Categorical
[Figures: right/positive skewed data (Mean > Median > Mode); left/negative skewed data (Mean < Median < Mode); categorical data by blood type (O, A, B, AB)]
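As a minimal sketch, the CENTER and SPREAD measures listed above can be computed with Python's standard statistics module. The describe helper and the sample data are hypothetical, and using the sample (n - 1) versions of standard deviation and variance is an assumption, since the sheet does not say whether sample or population versions are meant.

```python
import statistics


def describe(data):
    """Numerical summaries of center and spread for a list of numbers.

    Assumption: sample (n - 1) standard deviation and variance; use
    statistics.pstdev / statistics.pvariance for population versions.
    """
    lo, hi = min(data), max(data)
    return {
        # CENTER
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # SPREAD
        "std_dev": statistics.stdev(data),
        "variance": statistics.variance(data),
        "min": lo,
        "max": hi,
        "range": hi - lo,
    }


# Hypothetical right/positive skewed sample: one large value pulls the mean up.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 10]
summary = describe(right_skewed)
print(summary["mean"] > summary["median"] > summary["mode"])  # True
```

For this sample the mean (about 3.56) exceeds the median (3), which exceeds the mode (2), matching the Mean > Median > Mode pattern of right/positive skew.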
In most cases, the variables of interest can be assigned generic names that help define the relationship being examined. These two variable types are:
Explanatory Variable (aka… Independent or Predictor Variable) AND Response Variable (aka… Dependent or Outcome Variable)
The simplest type of inferential statistics covered here is bivariate analysis, which involves ONE EXPLANATORY variable and ONE RESPONSE variable
EXAMPLES: Does height predict weight? --- Does blood type explain cholesterol level? --- Does aspirin use explain the occurrence of heart attack?
One Quantitative Response Variable + One Quantitative Explanatory Variable --- Simple Linear Regression (SLR): used for prediction and to measure how much one variable increases/decreases per unit of change in the other variable
One Quantitative Response Variable + One Categorical Explanatory Variable --- T-test (1 or 2 groups/categories) or ANOVA (3 or more groups/categories)
One Categorical Response Variable + One Categorical Explanatory Variable --- Chi-square test of a relationship/association between two variables
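The mapping above from variable types to method can be written as a small lookup function. This is a hypothetical helper (choose_method and its string labels are illustrative, not from any library); it encodes only the three cases on this sheet, so any other combination of variable types raises an error.

```python
from typing import Optional


def choose_method(response: str, explanatory: str, n_groups: Optional[int] = None) -> str:
    """Pick a bivariate method from variable types (hypothetical helper).

    response, explanatory: "quantitative" or "categorical".
    n_groups: number of categories of a categorical explanatory variable.
    """
    if response == "quantitative" and explanatory == "quantitative":
        return "Simple linear regression (SLR)"
    if response == "quantitative" and explanatory == "categorical":
        if n_groups is None:
            raise ValueError("n_groups is required for a categorical explanatory variable")
        # T-test for 1 or 2 groups, ANOVA for 3 or more, as listed on the sheet.
        return "T-test" if n_groups <= 2 else "ANOVA"
    if response == "categorical" and explanatory == "categorical":
        return "Chi-square test"
    # Combinations not listed on the sheet are rejected rather than guessed.
    raise ValueError(f"no method listed for {response} response / {explanatory} explanatory")


print(choose_method("quantitative", "categorical", n_groups=4))  # ANOVA
```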
SIMPLE LINEAR REGRESSION (SLR)
H0: β1 = 0 (slope = 0, so y and x are not linearly related)
Ha: β1 ≠ 0 (slope ≠ 0, so y and x are linearly related)
Regression equation: E(Y) = β0 + β1x, where β0 is the y-intercept and β1 is the slope of the regression line
EXAMPLE: weight = -97.2 + 3.72 (height)
[Figure: scatter plot of y vs. x (weight in lbs vs. height) with the regression line]

T-TEST / ANOVA
Generic hypotheses for 2 or more samples:
H0: The means (µ) for the categories are equal
Ha: At least one mean (µ) for the categories differs
EXAMPLE: One-way ANOVA (ANALYSIS OF VARIANCE): test if there is a difference in mean cholesterol levels between 4 different blood types (O, A, B, AB)
EXAMPLE: Two-sample T-test: test if there is a difference in mean life spans between sexes (i.e., male vs. female)

CHI-SQUARE TEST
H0: The two variables are not related/associated
Ha: The two variables are related/associated
EXAMPLE: Chi-square test for a relationship between aspirin use and heart attack (MI)
Statistic     DF    Value      Prob
Chi-Square    1     25.0139    <.0001
Since the p-value (< .0001) is below .05, there is strong evidence of a statistically significant relationship between aspirin use and MI (at the .05 level).
[Figure: Risk of MI w/ Aspirin]
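The test statistics behind these methods can be sketched with the Python standard library alone. The function names and sample numbers below are hypothetical; the sketch assumes least-squares estimation for SLR, a pooled-variance (equal-variance) two-sample t-test, and the Pearson chi-square statistic. It computes statistics only, except for the 1-DF chi-square case, where the p-value has a closed form via the complementary error function. predict_weight reuses the example regression equation; the sheet does not state its height units.

```python
import math
from statistics import fmean, variance


def slr_fit(x, y):
    """Least-squares estimates (b0, b1) for E(Y) = b0 + b1*x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def predict_weight(height):
    """Example equation from the sheet: weight = -97.2 + 3.72 (height)."""
    return -97.2 + 3.72 * height


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic (assumes equal group variances)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = fmean([v for g in groups for v in g])
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_p_df1(stat):
    """Upper-tail p-value for 1 DF: P(Z**2 > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))


print(chi_square_p_df1(25.0139) < 0.0001)  # True
```

With two groups, the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic, so the two tests agree in that case. The 1-DF p-value for the reported statistic of 25.0139 comes out far below .0001.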