1a-Biostat Review
Workshop
Pham Ngoc Thach University of Medicine
Rishi Chakraborty1,2
1Center for AIDS Research, Duke University, North Carolina, USA
2Department of Biostatistics and Bioinformatics, Duke University, North Carolina, USA
Biostatistics Review
MODELS
LINEAR MODELS
Y is Normally distributed: Y ~ N(µ, σ²)
If all the independent variables are numeric,
we have regression.
Exceptions
The sciences do not try to explain, they hardly
even try to interpret, they mainly make
models. By a model is meant a mathematical
construct which, with the addition of certain
verbal interpretations, describes observed
phenomena. The justification of such a
mathematical construct is solely and precisely
that it is expected to work.
Inferential
Probability
Statistics
$\sum (X_i - \bar{X})^2$
0 < p < 1
Discrete Distributions
[Figure: Normal curve with the horizontal axis marked at µ−σ, µ, and µ+σ]
A standard Normal distribution is one where
µ = 0 and σ² = 1. This is denoted by Z.
Z ~ N(0 , 1)
[Figure: standard Normal curve, horizontal axis from -3 to 3]
Table A of the statistical tables gives cumulative
probabilities for a standard Normal distribution.
[Figure: standard Normal curve, horizontal axis from -3 to 3, with z = 1.27 marked]
Table A (continued)
Z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09 Z
0.00 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359 0.00
0.10 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753 0.10
0.20 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141 0.20
0.30 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517 0.30
0.40 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879 0.40
0.50 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224 0.50
0.60 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549 0.60
0.70 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852 0.70
0.80 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133 0.80
0.90 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389 0.90
1.00 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621 1.00
1.10 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830 1.10
1.20 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015 1.20
1.30 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177 1.30
1.40 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319 1.40
1.50 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441 1.50
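The table lookup can be cross-checked numerically. The sketch below is an illustration, not part of the original slides: it uses Python's standard-library `statistics.NormalDist` to reproduce the Table A entry for z = 1.27 (row 1.20, column .07).

```python
from statistics import NormalDist

# Standard Normal distribution: mu = 0, sigma = 1.
Z = NormalDist(mu=0.0, sigma=1.0)

# Cumulative probability P(Z < 1.27).
# Table A lists .8980 in row 1.20, column .07.
p = Z.cdf(1.27)
print(f"P(Z < 1.27) = {p:.4f}")
```

Any other cell of Table A can be checked the same way by changing the argument to `cdf`.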
For other Normal distributions, we can convert
to a standard Normal by standardizing.
$Z = \frac{Y - \mu}{\sigma} \sim N(0, 1)$
Example, with µ = 77 and σ = 11.6:
$P(Y < 60) = P\left(Z < \frac{60 - 77}{11.6}\right) = P(Z < -1.47) = .0708$
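The standardizing step can be sketched in a few lines of Python. This is an illustration only: the values µ = 77, σ = 11.6, and y = 60 come from the worked example, while the use of `statistics.NormalDist` is an assumption of this sketch. It computes the probability both ways: with z rounded to two decimals, as a printed table requires, and without rounding.

```python
from statistics import NormalDist

mu, sigma = 77.0, 11.6   # parameters from the worked example
y = 60.0

# Standardize: z = (y - mu) / sigma
z = (y - mu) / sigma

# A printed table is indexed to two decimals, so round z first.
z_table = round(z, 2)
p_table = NormalDist().cdf(z_table)

# Without rounding, work directly with N(mu, sigma^2).
p_exact = NormalDist(mu, sigma).cdf(y)

print(f"z = {z:.4f}, rounded to {z_table}")
print(f"table-style P(Y < 60) = {p_table:.4f}")
print(f"unrounded   P(Y < 60) = {p_exact:.4f}")
```

The two answers differ slightly in the third decimal place, which is purely the effect of rounding z before the lookup.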
Other Distributions
t
- one parameter, called the df
- similar to a Z, but with “fatter tails”
Table B
Percentiles of the t-Distribution
Other Distributions (continued)
F
- two parameters, called the numerator df
and the denominator df
- specific percentiles are in Tables D1 – D3
Sampling Distributions
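Sampling Distribution of $\bar{X}$
$E(\bar{X}) = \mu$
$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \;\Rightarrow\; \mathrm{s.e.}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
$\Rightarrow\; \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$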
Central Limit Theorem
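For large n, whatever the distribution of X (provided its variance is finite), approximately:
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$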
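A small simulation can make the Central Limit Theorem concrete. The sketch below is illustrative and not from the original slides: the exponential population (mean 1, standard deviation 1), the sample size n = 30, the 5000 replications, and the seed are all assumptions chosen for the demonstration. It draws repeated samples from a clearly non-Normal population and checks that the sample means behave as the sampling-distribution results predict.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so the run is reproducible

n = 30        # size of each sample (assumed for illustration)
reps = 5000   # number of repeated samples (assumed for illustration)
mu = 1.0      # mean of an exponential(rate=1) population
sigma = 1.0   # standard deviation of the same population

# Draw `reps` samples of size n and record each sample mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

# Theory: E(X-bar) = mu and s.e.(X-bar) = sigma / sqrt(n).
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = sigma / sqrt(n)

# Standardize each sample mean; about 95% should fall within +/- 1.96.
z_scores = [(m - mu) / theoretical_se for m in sample_means]
frac_within = sum(abs(z) < 1.96 for z in z_scores) / reps

print(f"mean of sample means: {mean_of_means:.3f} (theory {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (theory {theoretical_se:.3f})")
print(f"fraction with |z| < 1.96: {frac_within:.3f}")
```

The population here is strongly right-skewed, yet the standardized sample means already track the standard Normal closely at n = 30.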
Statistical Inference
- Estimation
- Hypothesis Testing
A point estimate is a single statistic that is
used to estimate a population parameter.
We can also estimate a parameter by a
100(1-α)% confidence interval.
$\bar{X} \pm t_{(n-1),\,1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$
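The interval formula can be worked through on a small data set. The sketch below is illustrative only: the ten observations are hypothetical, and the critical value 2.262 is the standard table entry for the 97.5th percentile of a t distribution with 9 df (α = .05), hard-coded here because Python's standard library has no t-distribution function.

```python
import statistics
from math import sqrt

# Hypothetical sample of n = 10 measurements (not from the slides).
data = [72, 75, 78, 80, 69, 74, 77, 81, 73, 76]

n = len(data)
xbar = statistics.mean(data)   # point estimate of mu
s = statistics.stdev(data)     # sample standard deviation (n - 1 divisor)

# t_{(n-1), 1-alpha/2} for n - 1 = 9 df and alpha = .05, from a t table.
t_crit = 2.262

# Half-width of the interval: t * s / sqrt(n)
half_width = t_crit * s / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"x-bar = {xbar:.2f}, s = {s:.3f}")
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

For a different sample size or confidence level, the hard-coded critical value must be replaced with the matching table entry.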
Hypothesis Testing
Hypothesis Testing Steps
1) Determine hypotheses
2) Decide on α ( .01 , .05 , .10 )