
Statistics Cheatsheet

1. This document provides a cheat sheet on key statistics concepts including: fundamentals like populations, samples, and variables; measures of central tendency and dispersion; distributions; bivariate relationships; experimental design; and probability concepts. 2. It defines important terms like mean, median, variance, standard deviation, and correlation and outlines statistical procedures like hypothesis testing, confidence intervals, and linear regression. 3. Key probability topics covered include sample spaces, empirical and theoretical probability, independent and mutually exclusive events, and conditional probability.

Uploaded by

naticool1115906
Copyright
© Attribution Non-Commercial (BY-NC)

Statistics Cheat Sheet
Mr. Roth, Mar 2004

1. Fundamentals
a. Population – everybody to be analysed
   - Parameter – # summarizing the population
b. Sample – subset of the population we collect data on
   - Statistic – # summarizing the sample
c. Quantitative variables – a number
   - Discrete – countable (# cars in family)
   - Continuous – measurements; there is always another # between any two values
d. Qualitative variables
   - Nominal – just a name
   - Ordinal – order matters (low, mid, high)

Choosing a Sample
• Sample frame – list of the population we choose the sample from
• Biased – sample differs from the population's characteristics
• Volunteer sample – any of the three types below may end up as a volunteer sample if people choose whether to respond

Sample Designs
e. Judgement sample: choose what we think represents the population
   - Convenience sample – easily accessed people
f. Probability sample: elements selected by probability
   - Simple random sample – every element has an equal chance
   - Systematic sample – almost random, but we choose by a method
g. Census – data on everyone/everything in the population

Stratified Sampling
Divide the population into subpopulations based upon characteristics
h. Proportional: in proportion to the total population
i. Stratified random: select at random within each stratum
j. Cluster: selection within representative clusters

Collect the Data
k. Experiment: control the environment
l. Observation:

2. Single Variable Data - Distributions
m. Graphing categorical data: pie & bar charts
n. Histogram (classes, count within each class)
o. – shape, center, spread. Symmetric, skewed right, skewed left
p. Stemplots
   0 | 11222      0 | 112233
   1 | 011333     0 | 56677
   2 | etc        1 |
q. Mean: x̄ = ∑xi / n
r. Median M: if n is odd, the center value; if even, the mean of the 2 center values
s. Boxplot: Min, Q1, M, Q3, Max
t. Variance: s² = ∑(x − x̄)² / (n − 1) = SSx / (n − 1)
u. p78: standard deviation, s = √(s²)
v. SSx = ∑(x − x̄)² = ∑x² − (∑x)² / n
w. Density curve – relative proportion within classes – area under curve = 1
x. Normal distribution: 68, 95, 99.7 % within 1, 2, 3 std deviations
y. p98: z-score z = (x − x̄) / s or (x − µ) / σ
z. Standard normal: N(0,1), obtained by standardizing N(μ,σ)

3. Bivariate - Scatterplots & Correlation
a. Explanatory – independent variable
b. Response – dependent variable
c. Scatterplot: form, direction, strength, outliers
d. – form is linear negative, …
e. – to add a categorical variable, use a different color/symbol
f. p147: Linear correlation – direction & strength of the linear relationship
g. Pearson's coefficient: −1 ≤ r ≤ 1; 1 is perfectly linear with + slope, −1 is perfectly linear with − slope
h. r = [1 / (n − 1)] ∑ [(x − x̄) / sx] [(y − ȳ) / sy] = SSxy / √(SSx SSy)
i. r = ∑(zx zy) / (n − 1)
j. SSxy = ∑xy − (∑x ∑y) / n

4. Regression
k. Least squares – sum of squares of vertical errors is minimized
l. p154: ŷ = b0 + b1x, or ŷ = a + bx
m. (same as y = mx + b)
n. b1 = ∑(x − x̄)(y − ȳ) / ∑(x − x̄)² = SSxy / SSx = r (sy / sx)
o. Then solve, knowing the line passes through the centroid (x̄, ȳ): a = ȳ − b x̄
p. b0 = (∑y − b1 ∑x) / n
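As a rough illustration of the correlation and regression formulas, the sketch below computes SSx, SSy, and SSxy with the shortcut sums, then r, b1, and b0. The function names `sums_of_squares` and `fit_line` and the five data points are assumptions made up for this example; they do not come from the sheet.

```python
import math


def sums_of_squares(xs, ys):
    """Return (SSx, SSy, SSxy) using the shortcut formulas, e.g. SSx = sum(x^2) - (sum x)^2 / n."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_x, ss_y, ss_xy


def fit_line(xs, ys):
    """Return (r, b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    ss_x, ss_y, ss_xy = sums_of_squares(xs, ys)
    r = ss_xy / math.sqrt(ss_x * ss_y)    # Pearson's r = SSxy / sqrt(SSx * SSy)
    b1 = ss_xy / ss_x                     # slope = SSxy / SSx
    b0 = (sum(ys) - b1 * sum(xs)) / n     # intercept = (sum y - b1 * sum x) / n
    return r, b0, b1


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, b0, b1 = fit_line(xs, ys)
print(f"r = {r:.4f}, y-hat = {b0:.2f} + {b1:.2f} x")
```

For these made-up points SSx = 10, SSy = 6, and SSxy = 6, so the line is ŷ = 2.2 + 0.6x with r ≈ 0.77. The slope also equals r (sy / sx), which is a quick way to cross-check a hand calculation.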
q. r² is the proportion of variation described by the linear relationship
r. Residual = y − ŷ = observed – predicted
s. Outliers: in the y direction -> large residuals; in the x direction -> often influential to the least squares line
t. Extrapolation – predicting beyond the domain studied
u. Lurking variable
v. Association doesn't imply causation

5. Data – Sampling
a. Population: entire group
b. Sample: part of the population we examine
c. Observation: measures but does not influence the response
d. Experiment: treatments controlled & responses observed
e. Confounded variables (explanatory or lurking): their effects on the response variable cannot be distinguished
f. Sampling types: voluntary response – biased toward the opinionated; convenience – easiest
g. Bias: systematically favors certain outcomes
h. Simple Random Sample (SRS): every set of n individuals has an equal chance of being chosen
i. Probability sample: chosen by known probability
j. Stratified random: SRS within strata divisions
k. Response bias – lying/behavioral influence

6. Experiments
a. Subjects: individuals in the experiment
b. Factors: explanatory variables in the experiment
c. Treatment: combination of specific values for each factor
d. Placebo: treatment to nullify confounding factors
e. Double-blind: treatments unknown to subjects & individual investigators
f. Control group: controls effects of lurking variables
g. Completely randomized design: subjects allocated randomly among treatments
h. Randomized comparative experiments: similar groups – nontreatment influences operate equally
i. Experimental design: control effects of lurking variables, randomize assignments, use enough subjects to reduce chance variation
j. Statistical significance: observations that would rarely occur by chance
k. Block design: randomization within a block of individuals with similarity (men vs women)

7. Probability & odds
a. 2 definitions:
b. 1) Experimental: observed likelihood of a given outcome within an experiment
c. 2) Theoretical: relative frequency/proportion of a given event given all possible outcomes (sample space)
d. Event: outcome of a random phenomenon
e. n(S) – number of points in the sample space
f. n(A) – number of points that belong to A
g. p 183: Empirical: P'(A) = n(A)/n = #observed/#attempted
h. p 185: Law of large numbers – experimental -> theoretical
i. p. 194: Theoretical: P(A) = n(A)/n(S) = favorable/possible
j. 0 ≤ P(A) ≤ 1, ∑ (all outcomes) P(A) = 1
k. p. 189: S = sample space, n(S) = # sample points. Represented as a listing {(, ), …}, tree diagram, or grid
l. p. 197: Complementary events: P(A) + P(Ā) = 1
m. p200: Mutually exclusive events: both can't happen at the same time
n. p203: Addition rule: P(A or B) = P(A) + P(B) – P(A and B) [P(A and B) = 0 if mutually exclusive]
o. p207: Independent events: occurrence (or not) of A does not impact P(B) & vice versa
p. Conditional probability: P(A|B) – probability of A given that B has occurred. P(B|A) – probability of B given that A has occurred
q. Independent events iff P(A|B) = P(A) and P(B|A) = P(B)
r. Special multiplication rule: P(A and B) = P(A)*P(B) [A and B independent]
s. General multiplication rule: P(A and B) = P(A)*P(B|A) = P(B)*P(A|B)
t. Odds / Permutations
u. Order important vs not (prob of picking four numbers)
v. Permutations: nPr = n!/(n – r)!, number of ways to pick r item(s) from n items if order is important. Note: arrangements of n items with repetitions, p alike and q alike = n!/(p!q!)
w. Combinations: nCr = n!/((n – r)!r!), number of ways to pick r item(s) from n items if order is NOT important
x. Replacement vs not (AAKKKQQJJJJ10): (a) Pick an A, replace, then pick a K. (b) Pick a K, keep it, pick another.
y. Fair odds – e.g. the odds of winning are 1/1000 and the payout is 1000. It may take 3000 plays to win; you may win after 200.

8. Probability Distribution
a. Refresher on number of heads from tossing 3 coins. Do grid {HHH, …, TTT}, then #Heads vs frequency chart {(0,1), (1,3), (2,3), (3,1)} – note Pascal's triangle
b. Random variable – circle #Heads on graph above. "Assumes unique numerical value for each outcome in sample space of probability experiment".
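The three-coin example can be enumerated directly. The sketch below, with variable names chosen only for illustration, lists the sample space, counts heads per outcome as the random variable, and turns the frequencies into theoretical probabilities P(x) = n(A)/n(S). It also compares the frequencies with the combinations 3Cx, which is one way to read the Pascal's triangle remark.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

# Sample space S for tossing 3 coins: HHH, HHT, ..., TTT.
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

# Random variable: the number of heads assigned to each outcome.
heads_frequency = Counter(outcome.count("H") for outcome in sample_space)

# Theoretical probability of each value: P(x) = n(A) / n(S).
n_s = len(sample_space)
distribution = {x: Fraction(heads_frequency[x], n_s) for x in sorted(heads_frequency)}

# Row 3 of Pascal's triangle, written as combinations 3Cx for x = 0..3.
pascal_row = [comb(3, x) for x in range(4)]

for x, p in distribution.items():
    print(f"{x} heads: frequency {heads_frequency[x]}, P(x) = {p}")
```

Using `Fraction` keeps the probabilities exact, so checking that they sum to 1 does not depend on floating-point rounding.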

42010936.doc -2- Printed 10/15/2010


c. Discrete – countable number
d. Continuous – infinite possible values
e. Probability distribution: add next to the coins frequency chart a P(x) column with values 1/8, 3/8, 3/8, 1/8
f. Probability function: obeys the two properties of probability (0 ≤ P(A) ≤ 1, ∑ (all outcomes) P(A) = 1)
g. Parameter: unknown # describing the population
h. Statistic: # computed from sample data

                        Sample    Population
   Mean                 x̄         μ (mu)
   Variance             s²        σ²
   Standard deviation   s         σ (sigma)

i. Base: x̄ = ∑x / n, s² = ∑(x − x̄)² / (n − 1)

             Frequency Dist                  Probability Distribution
   Mean      x̄ = ∑xf / ∑f                    µ = ∑[x P(x)]
   Var       s² = ∑(x − x̄)² f / (∑f − 1)     σ² = ∑[(x − µ)² P(x)]
   Std Dev   s = √(s²)                       σ = √(σ²)

j. P(x) acts as the relative frequency f / ∑f; the −1 in the denominator is dropped

9. Sampling Distribution
a. By the law of large #'s, as n -> population size, x̄ → µ
b. Given x̄ as the mean of an SRS of size n from a population with μ and σ: the mean of the sampling distribution of x̄ is μ and its standard deviation is σ / √n
c. If individual observations have normal distribution N(μ, σ), then x̄ of n observations has N(μ, σ / √n)
d. Central Limit Theorem: given an SRS of n from a population with μ and σ, when n is large the sample mean x̄ is approx normal

10. Binomial Distribution
a. Binomial experiment. Emphasize Bi – two possible outcomes (success, failure). n repeated identical trials that have complementary P(success) + P(failure) = 1. The binomial variable x is the count of successful trials, where 0 ≤ x ≤ n
b. p: probability of success of each observation
c. Binomial coefficient: nCk = n! / ((n – k)! k!)
d. Binomial probability: P(x = k) = nCk p^k (1 − p)^(n − k)
e. Binomial μ = np
f. Binomial σ = √(np(1 − p))

11. Confidence Intervals
a. Statistical inference: methods for inferring information about a population from a sample
b. If x̄ is unbiased, use it to estimate μ
c. Confidence interval: estimate +/- error margin
d. Confidence level C: probability the interval captures the true parameter value in repeated samples
e. Given an SRS of n & a normal population, the C confidence interval for μ is: x̄ ± z* σ / √n
f. Sample size for desired margin of error – set the +/- value above & solve for n

12. Tests of significance
g. Assess evidence supporting a claim about the population
h. Idea – an outcome that would rarely happen if the claim were true is evidence that the claim is not true
i. Ho – null hypothesis: the test is designed to assess evidence against Ho. Usually a statement of no effect
j. Ha – alternative hypothesis about the population parameter, opposed to the null
k. Two-sided: Ho: μ = 0, Ha: μ ≠ 0
l. P-value: probability, assuming Ho is true, that the test statistic would be as or more extreme (a smaller P-value means stronger evidence against Ho)
m. z = (x̄ − µ) / (σ / √n)
n. Significance level α: if α = .05, then happens no more than 5% of the time. "Results were significant (P < .01)"
o. A level α 2-sided test rejects Ho: μ = μo when μo falls outside a level 1 – α confidence interval
a. Complicating factors: not a complete SRS from the population, multistage & many-factor designs, outliers, non-normal distribution, σ unknown
b. Undercoverage and nonresponse are often more serious than the random sampling error accounted for by the confidence interval
c. Type I error: reject Ho when it's true – α gives the probability of this error
d. Type II error: accept (fail to reject) Ho when Ha is true
e. Power is 1 – probability of Type II error
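The binomial, confidence interval, and z-test formulas on this sheet can be sketched with Python's standard library. This is a minimal illustration that assumes σ is known and the population is normal. The function names and the example numbers (x̄ = 52, μo = 50, σ = 10, n = 100) are hypothetical, not taken from the sheet.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(x = k) = nCk * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Level C interval for mu: x-bar +/- z* * sigma / sqrt(n), sigma assumed known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def two_sided_z_test(x_bar, mu_0, sigma, n):
    """Return (z, P-value) for Ho: mu = mu_0 against Ha: mu != mu_0."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Binomial check: 3 fair coins, so n = 3 and p = 0.5.
coin_probs = [binomial_pmf(k, 3, 0.5) for k in range(4)]

# Hypothetical sample summary, for illustration only.
low, high = z_confidence_interval(x_bar=52, sigma=10, n=100)
z, p_value = two_sided_z_test(x_bar=52, mu_0=50, sigma=10, n=100)

print(coin_probs)
print(f"95% CI: ({low:.2f}, {high:.2f})")
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

With these numbers z = 2 and the P-value is a little under .05, while μo = 50 falls just outside the 95% interval. That agrees with the rule that a level α two-sided test rejects Ho exactly when μo lies outside the level 1 – α confidence interval.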
