STATISTICS AT A GLANCE (Cheat Sheet)

Uploaded by

kim769502
Copyright
© © All Rights Reserved

DESCRIPTIVE/SUMMARY STATISTICS

Discipline of quantitatively describing the main features of a collection of data
Numerical and graphical summaries used to characterize a dataset
The three main measures are:
CENTER --- measure of central tendency, the typical or average value (mean, median, mode)
SPREAD --- measure of dispersion or variability of the data (standard deviation, variance, min, max, range)
SHAPE --- symmetric or skewed data (bell-shaped, normal curve, left/negative skewed, right/positive skewed)

The tools used for describing a collection of data are dependent on the nature of the data ----- Two main data types:

CATEGORICAL DATA (aka… qualitative) or QUANTITATIVE DATA (aka...numeric or measurement)


Categorical Data Fit into Defined Groups ----- Two types of categorical data:

NOMINAL DATA --- GROUPS HAVE NO NATURAL ORDERING
Examples: gender, race, blood type, eye color, political affiliation, country of residence

ORDINAL DATA (aka… ranked data) --- GROUPS HAVE A NATURAL ORDERING
Examples: satisfaction level (Likert scale), educational level, shirt size, medical condition (good, fair, serious, critical)

Quantitative Data ----- Two types of quantitative data:

CONTINUOUS --- data that have an infinite number of real values and there are no spaces/gaps between values (rounded to a specified precision)
Examples: BP, temperature, BMI, height, weight, blood serum level

DISCRETE --- data that have a finite number of values within a given interval and there are spaces/gaps between values (typically counts)
Examples: test score, pages in a book, population of a country, # of trees in a forest
NOMINAL DATA
Measures of Center: MODE = category w/ largest count
Measures of Spread: not germane with nominal data
Shape: not germane with nominal data
[Figure: Bar Chart / Bar Graph for Blood Type --- COUNT by blood type (O, A, B, AB), with the MODE marked]

ORDINAL DATA
Measures of Center:
MEDIAN = category containing middle value
MODE = category with largest count
Measures of Spread:
MIN = minimum category
MAX = maximum category
RANGE = min cat. to max cat.
Shape: seldom used (can be problematic due to possible unequal or unquantifiable changes/differences in magnitude among/between categories)
[Figure: Histogram for Blood Pressure Classification]

QUANTITATIVE DATA
Measures of Center:
MEAN = arithmetic average
MEDIAN = middle value
MODE = most numerous (most common) value
Measures of Spread:
STANDARD DEV = typical (roughly average) distance of the values from the mean
VARIANCE = (standard deviation)²
MIN = minimum value
MAX = maximum value
RANGE = maximum - minimum
IQR = range covered by the middle 50 percent of the data
Shape:
SYMMETRIC: bell-shaped? if yes, is it normal?
SKEWED: left/negative or right/positive
[Figure: distribution shapes --- No Skew / Symmetric (Mean = Median = Mode); right/positive skew (Mean > Median > Mode); left/negative skew (Mean < Median < Mode)]
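
The center and spread measures for quantitative data can be sketched with Python's standard `statistics` module. This is only an illustration: the `readings` values and the `describe` helper are made up for the example and do not come from the sheet, and `statistics.quantiles` uses its default quartile method, which is one of several conventions for the IQR.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg); not from the sheet.
readings = [118, 122, 125, 125, 130, 134, 141, 150]


def describe(values):
    """Return center and spread measures for a list of quantitative values."""
    # Quartile cut points (default "exclusive" method); IQR = Q3 - Q1.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),        # sample standard deviation
        "variance": statistics.variance(values),  # equals stdev squared
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


summary = describe(readings)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this made-up sample the mean exceeds the median, which exceeds the mode, the ordering usually associated with right/positive skew.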
INFERENTIAL STATISTICS
Inference Examines/Investigates a Possible Relationship between Variables
Representative Sample(s) of Data are used to make Conclusions about a Broader Population
Two most common procedures making up inferential statistics:
Hypothesis Testing --- Calculate a test statistic which is then used to determine a p-value
Significance if calculated p-value is ≤ level of significance (α), usually α = .05
Confidence Intervals (CI) --- CI = point estimate ± margin of error (confidence level usually 95%)
Significance if one CI does not capture a null value or if two CIs do not overlap
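
The two decision rules can be sketched for a single mean. Everything specific here is an assumption made for illustration: the `sample` values, the `null_value` of 200, and the use of a normal (z) approximation, where a t distribution would usually be preferred for a sample this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical cholesterol values (mg/dL) and null value; not from the sheet.
sample = [182, 195, 201, 188, 210, 176, 199, 205, 190, 194]
null_value = 200.0
alpha = 0.05  # level of significance

point_estimate = mean(sample)
std_error = stdev(sample) / sqrt(len(sample))

# Hypothesis test: test statistic, then two-sided p-value (normal approximation).
z = (point_estimate - null_value) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
significant_by_p = p_value <= alpha

# Confidence interval: point estimate ± margin of error.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% level
margin = z_crit * std_error
ci = (point_estimate - margin, point_estimate + margin)
significant_by_ci = not (ci[0] <= null_value <= ci[1])

print(f"z = {z:.3f}, p-value = {p_value:.4f}, significant: {significant_by_p}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), excludes null: {significant_by_ci}")
```

With these made-up numbers the p-value is about .07 and the interval captures the null value, so both rules report no significance at the .05 level.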

In most cases, the variables of interest can be assigned generic names that help define the relationship being examined – these two variable types are:
Explanatory Variable (aka… Independent or Predictor Variable) AND Response Variable (aka… Dependent or Outcome Variable)
The simplest type of inferential statistics is univariate analysis which involves ONE EXPLANATORY variable and ONE RESPONSE variable
EXAMPLES: height predicts weight? --- blood type explains cholesterol level? --- aspirin use explains occurrence of heart attack?

ONE QUANTITATIVE RESPONSE VARIABLE, ONE QUANTITATIVE EXPLANATORY VARIABLE

Simple linear regression (SLR)
Used for prediction and to measure how much one variable increases/decreases per unit of change in the other variable
H0: β1 = 0 (slope = 0, so y and x not linearly related)
Ha: β1 ≠ 0 (slope ≠ 0, so y and x linearly related)
Regression equation: E(Y) = β0 + β1 x
β0 is the y intercept
β1 is the slope of the regression line
Example: weight = -97.2 + 3.72 (height)
[Figure: scatter plot with regression line, y vs. x --- weight (lbs) vs. height (inches), r = +.85]
Correlation coefficient -- direction and strength of a linear relationship -- usually represented by r or ρ (rho) [ -1 ≤ r ≤ +1 ]
Positive correlation: r > 0 ↔ y ↑ as x ↑
Negative correlation: r < 0 ↔ y ↓ as x ↑
|r| ≤ .3 ↔ weak (none if r = 0)
.3 < |r| < .7 ↔ moderate
|r| ≥ .7 ↔ strong (perfect if r = ±1)

ONE QUANTITATIVE RESPONSE VARIABLE, ONE CATEGORICAL EXPLANATORY VARIABLE

ANOVA --- 3 or more groups/categories
T-test --- 1 or 2 groups/categories
Generic hypothesis for 2 or more samples:
H0: The means (µ) for the categories are equal
Ha: At least one mean (µ) for the categories differs
EXAMPLE: One-way ANOVA (ANALYSIS OF VARIANCE)
Test if there is a difference in mean cholesterol levels between 4 different blood types (O, A, B, AB)
EXAMPLE: Two-sample T-test
Test if there is a difference in mean life spans between sexes (i.e. male vs. female)
H0: µ male = µ female
Ha: µ male ≠ µ female
[Figure: Population Distributions for Male and Female Life Span, with µmale and µfemale marked]

ONE CATEGORICAL RESPONSE VARIABLE, ONE CATEGORICAL EXPLANATORY VARIABLE

Chi-square test of a relationship/association between two variables
H0: The two variables are not related/associated
Ha: The two variables are related/associated
EXAMPLE: Chi-square test for a relationship between aspirin use and heart attack (MI)

(two-way or contingency table)
              Heart Attack (MI)?
Treatment     Yes     No       Total
Aspirin       104     10933    11037
Placebo       189     10845    11034
Total         293     21778    22071

Statistic     DF    Value      Prob
Chi-Square    1     25.0139    .0001

Since p-value = .0001, there is strong evidence of a statistically significant relationship (at the .05 level) between aspirin use and MI
Risk of MI w/ Aspirin: 104 / 11037 = .0094
Odds of MI w/ Placebo: 189 / 10845 = .0174
Odds ratio for MI, Placebo vs. Aspirin: (189 / 10845) / (104 / 10933) = 1.8321
Hence, the odds of MI w/ Placebo trt are ≈ 1.8 times greater than w/ Aspirin trt
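
The aspirin example can be reproduced from its four cell counts. The sketch below uses the Pearson chi-square statistic without a continuity correction, which is an assumption about how the sheet's value was produced. The `pearson_chi_square` helper and the variable names are illustrative only.

```python
from math import erfc, sqrt

# Cell counts from the aspirin / heart attack (MI) table.
table = {
    "Aspirin": {"yes": 104, "no": 10933},
    "Placebo": {"yes": 189, "no": 10845},
}


def pearson_chi_square(tbl):
    """Pearson chi-square statistic for a two-way table (no continuity correction)."""
    rows = list(tbl)
    cols = ["yes", "no"]
    row_totals = {r: sum(tbl[r].values()) for r in rows}
    col_totals = {c: sum(tbl[r][c] for r in rows) for c in cols}
    grand_total = sum(row_totals.values())
    chi2 = 0.0
    for r in rows:
        for c in cols:
            # Expected count if treatment and MI were unrelated (H0).
            expected = row_totals[r] * col_totals[c] / grand_total
            chi2 += (tbl[r][c] - expected) ** 2 / expected
    return chi2


chi2 = pearson_chi_square(table)
# Upper-tail probability for a chi-square variable with 1 degree of freedom.
p_value = erfc(sqrt(chi2 / 2))

risk_aspirin = 104 / (104 + 10933)   # MI cases / everyone on aspirin
odds_placebo = 189 / 10845           # MI cases / non-cases on placebo
odds_aspirin = 104 / 10933           # MI cases / non-cases on aspirin
odds_ratio = odds_placebo / odds_aspirin

print(f"chi-square = {chi2:.4f}, p-value = {p_value:.2e}")
print(f"risk (aspirin) = {risk_aspirin:.4f}, odds (placebo) = {odds_placebo:.4f}")
print(f"odds ratio (placebo vs. aspirin) = {odds_ratio:.4f}")
```

The statistic and odds ratio match the values in the example. The computed p-value comes out well below .0001, so the .0001 in the example may reflect how the original software displays very small probabilities.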
