Statistics: 1. Basics
1. Basics
a) Bernoulli(1/2): one trial with two possible outcomes, each with probability 1/2
b) Binomial distribution: n independent trials, each with 2 possible outcomes
1 −x / 22
2. Confidence interval
a) 95% CI for p (α = 0.05): the interval p ± 1.96 SD; the probability that it covers the true value is 95%.
99% CI for p (α = 0.01): the interval p ± 2.576 SD; the coverage probability is 99% (wider, hence more conservative)
b) Margin of error: CI = [a,b]; MOE = (b-a)/2
c) CI for a mean vs CI for a proportion: in both cases the confidence level is the probability that a CI
calculated from a sample will include the population value (mean or proportion)
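A minimal Python sketch of items a) and b), assuming that "SD" here means the standard error √(p(1 − p)/n) used in the sample-size section. The function name `proportion_ci` and the sample figures (400 respondents, observed proportion 0.5) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation CI for a proportion.

    Returns (low, high, margin_of_error).
    """
    alpha = 1 - confidence
    # Critical value: about 1.96 for 95%, about 2.576 for 99%
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)  # assumed meaning of "SD" in the notes
    moe = z * se
    return p_hat - moe, p_hat + moe, moe


# Hypothetical sample: n = 400, observed proportion 0.5
low95, high95, moe95 = proportion_ci(0.5, 400, 0.95)
low99, high99, moe99 = proportion_ci(0.5, 400, 0.99)

# MOE = (b - a) / 2, as in item b)
moe_from_interval = (high95 - low95) / 2
```

The 99% interval comes out wider than the 95% interval for the same data, which is what "more conservative" refers to.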
3. Sample size
a) 1.96 × √( p(1 − p) / n ) = Margin of error
Given p, select an appropriate margin of error and solve for n. Note that 1.96 is the Z(α/2) critical value
for a 95% CI
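Solving the margin-of-error equation for n gives n = (1.96 / MOE)² × p(1 − p), rounded up to a whole number. A short sketch follows; the function name `required_sample_size` and the inputs (p = 0.5, MOE = 0.03) are illustrative assumptions, not values from the notes.

```python
from math import ceil


def required_sample_size(p, margin_of_error, z=1.96):
    """Smallest n for which z * sqrt(p * (1 - p) / n) <= margin_of_error."""
    n = (z / margin_of_error) ** 2 * p * (1 - p)
    return ceil(n)  # round up: a fractional respondent is not possible


# Hypothetical planning values: p = 0.5 (the worst case), MOE = 3 points
n_needed = required_sample_size(p=0.5, margin_of_error=0.03)
```

Taking p = 0.5 maximises p(1 − p), so it gives the largest (safest) n when p is not known in advance.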
4. t-distribution
a) Slightly wider than the normal distribution because of the additional uncertainty owing to estimation of
the sample variance
b) Degrees of freedom = n-1
c) Replace 1.96 with the appropriate critical value from a t-table (it will be > 1.96)
d) Sample variance: s² = Σ (xi − x̄)² / (n − 1)
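The points above can be sketched in Python as a t-based interval for a mean. The data are made up for illustration, and the critical value 2.262 (95%, 9 degrees of freedom) is copied from a standard t-table rather than computed, since the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical measurements (n = 10)
data = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0, 5.1, 4.9]

n = len(data)
df = n - 1                    # degrees of freedom
x_bar = mean(data)
s2 = variance(data)           # sample variance: divides by n - 1

T_CRIT_95_DF9 = 2.262         # from a t-table; larger than 1.96
moe = T_CRIT_95_DF9 * sqrt(s2 / n)
ci_mean = (x_bar - moe, x_bar + moe)

# For comparison: the narrower interval the normal value 1.96 would give
moe_normal = 1.96 * sqrt(s2 / n)
```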
5. Matched pairs
a) Used when the two sets of observations are naturally paired, e.g. twins, before/after measurements on
the same subject, or two methods applied to the same data
b) The two samples are not independent
c) Analysis: find the mean of the pairwise differences and the SD of those differences; then use these to
build a one-sample CI or test on the differences
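A sketch of the matched-pairs analysis, assuming the mean and SD of the differences are then fed into an ordinary one-sample t interval (the usual approach; the notes do not spell this step out). The before/after readings are invented, and 2.365 is the 95% t-table value for 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical before/after readings on the same 8 subjects
before = [140, 135, 150, 145, 160, 155, 148, 152]
after = [135, 132, 148, 140, 153, 150, 147, 146]

# Work only with the pairwise differences
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
mean_diff = mean(diffs)
sd_diff = stdev(diffs)        # sample SD (n - 1 in the denominator)
se_diff = sd_diff / sqrt(n)

T_CRIT_95_DF7 = 2.365         # from a t-table, df = n - 1 = 7
ci_diff = (mean_diff - T_CRIT_95_DF7 * se_diff,
           mean_diff + T_CRIT_95_DF7 * se_diff)

# t statistic for H0: mean difference = 0
t_stat = mean_diff / se_diff
```

Because each subject serves as its own control, only the spread of the differences matters, not the spread between subjects.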
6. Comparing two proportions A & B (the two sets of data are independent, unlike
matched pairs)
a) Analysis: either compute a CI (the interval likely to contain the true difference at the 95% level)
or carry out a hypothesis test (H0 vs HA)
b) CI: 95% CI = (pA − pB) ± 1.96 × √Var(pA − pB)
Var(pA − pB) = pA(1 − pA)/nA + pB(1 − pB)/nB
c) Hypothesis method:
d) E.g.: A poll carried out in January surveyed 560 people and found that 45% of them supported a
political candidate. A poll about the same political candidate was also carried out in April and
showed that out of the 1100 people surveyed 52% supported the candidate. What is the CI? Is there
a significant difference between pA (0.45) & pB (0.52)?
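The CI part of this example can be worked through with the formula in item b). The sketch below uses only the figures given in the question; the function name `two_proportion_ci` is a hypothetical helper.

```python
from math import sqrt


def two_proportion_ci(p_a, n_a, p_b, n_b, z=1.96):
    """CI for pA - pB using Var = pA(1-pA)/nA + pB(1-pB)/nB."""
    diff = p_a - p_b
    var = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
    moe = z * sqrt(var)
    return diff - moe, diff + moe


# January poll: 45% of 560; April poll: 52% of 1100
ci_low, ci_high = two_proportion_ci(0.45, 560, 0.52, 1100)

# If 0 lies outside the interval, the difference is significant at the 5% level
excludes_zero = not (ci_low <= 0 <= ci_high)
```

With these inputs the interval is roughly (−0.121, −0.019). It does not contain 0, so at the 95% level the January and April proportions differ.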
7. Comparing two means A & B
b) For analysis of H0: µA = µB with Var A = Var B (i.e. pooled variance), use the same plug-in formula
above,
8. Regression
To answer the question "How close is the estimated slope b1 to the true slope β1?", you need to
consider R^2 and the SE of b1
a) Hypothesis testing:
b) Confidence Interval
c) Remember to check for outliers and confirm that the residuals are randomly scattered. If they are not,
the data suggest non-linearity.
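As an illustration of the quantities named above, here is a small pure-Python sketch of simple linear regression that returns b1, its SE, and R^2. The data are invented, the helper name `fit_line` is hypothetical, and the formulas are the standard least-squares ones (SE of b1 = √(SSE / (n − 2) / Sxx)), which the notes do not state explicitly. The critical value 3.182 is the 95% t-table entry for n − 2 = 3 degrees of freedom.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns b0, b1, SE(b1), R^2, residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)          # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)      # total variation
    se_b1 = sqrt(sse / (n - 2) / sxx)
    r2 = 1 - sse / sst
    return b0, b1, se_b1, r2, residuals


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, se_b1, r2, residuals = fit_line(x, y)

# a) Hypothesis test for H0: beta1 = 0
t_stat = b1 / se_b1

# b) 95% CI for beta1, using the t-table value for df = n - 2 = 3
T_CRIT_95_DF3 = 3.182
ci_slope = (b1 - T_CRIT_95_DF3 * se_b1, b1 + T_CRIT_95_DF3 * se_b1)
```

Inspecting `residuals` (for example by plotting them against x) is one way to carry out the check in item c): a visible pattern rather than random scatter points to a non-linear relationship.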