
Introductory HIV/AIDS Data Analysis

Workshop
Pham Ngoc Thach University of Medicine

Rishi Chakraborty1,2
1Center for AIDS Research, Duke University, North Carolina, USA
2Department of Biostatistics and Bioinformatics, Duke University, North Carolina, USA

March 19-20, 2024

Biostatistics Review

MODELS

Relate a dependent variable (also called the outcome or response), Y, to some other variable(s).

These other variable(s) are called independent or regressor variables, X's.

LINEAR MODELS

Y is Normally distributed: Y ~ N( µ , σ² )
If all the independent variables are numeric, we have regression.

If all the independent variables are categorical, we have analysis of variance.

If we have both types of independent variables, we have analysis of covariance.
Exceptions

Correlation - not a model at all

Logistic regression - dichotomous outcome

Poisson regression - count outcome

The sciences do not try to explain, they hardly
even try to interpret, they mainly make
models. By a model is meant a mathematical
construct which, with the addition of certain
verbal interpretations, describes observed
phenomena. The justification of such a
mathematical construct is solely and precisely
that it is expected to work.

John von Neumann


Population (parameters, N obs.)

Probability reasons from the population down to the sample; inferential statistics reasons from the sample back up to the population.

Sample (statistics, n obs.)
Statistics

Sample mean: X̄ = ( Σ Xi ) / n   (sum over i = 1, …, n)

Sample median

Sample variance: s² = Σ (Xi − X̄)² / (n − 1)

Sample standard deviation: s = √s²
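As a quick sketch (the data values below are made up for illustration), these sample statistics can be computed directly, e.g. in Python:

```python
# Sketch: computing the sample statistics defined above.
import math
import statistics

data = [4.0, 7.0, 5.0, 9.0, 6.0]  # hypothetical sample
n = len(data)

mean = sum(data) / n                                     # X̄ = (Σ Xi) / n
median = statistics.median(data)                         # middle ordered value
variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # s², divisor n − 1
sd = math.sqrt(variance)                                 # s = √s²

# Python's statistics.variance / statistics.stdev use the same n − 1 divisor:
assert abs(variance - statistics.variance(data)) < 1e-12
```

Note the n − 1 divisor in the sample variance; using n instead would give the (biased) population formula.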
Probability

0 ≤ p ≤ 1

Discrete Distributions

Binomial - number of successes in n trials

Poisson - number of events in an interval
Continuous Distributions

Normal: Y ~ N( µ , σ² )

[Figure: Normal density curve, with µ−σ, µ, and µ+σ marked on the horizontal axis]
A standard Normal distribution is one where µ = 0 and σ² = 1. This is denoted by Z.

Z ~ N(0 , 1)

[Figure: standard Normal density curve, horizontal axis from -3 to 3]
Table A of the statistical tables gives cumulative probabilities for a standard Normal distribution, e.g. P(Z < 1.27).

[Figure: standard Normal curve with the area below 1.27 shaded]
Table A (continued)

Cumulative Probabilities for the Standard Normal (Z) Distribution

Z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09 Z

0.00 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359 0.00
0.10 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753 0.10
0.20 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141 0.20
0.30 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517 0.30
0.40 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879 0.40

0.50 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224 0.50
0.60 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549 0.60
0.70 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852 0.70
0.80 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133 0.80
0.90 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389 0.90
1.00 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621 1.00

1.10 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830 1.10
1.20 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015 1.20
1.30 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177 1.30
1.40 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319 1.40
1.50 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441 1.50
From Table A, P(Z < 1.27) = .8980
For other Normal distributions, we can convert to a standard Normal by standardizing:

Z = (Y − µ) / σ ~ N(0 , 1)

Example: Y = diastolic blood pressure, Y ~ N(77 , 11.6²)

P(Y < 60) = P( Z < (60 − 77) / 11.6 )
          = P(Z < -1.47) = .0708
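This calculation can also be sketched in code, using the identity Φ(z) = ½(1 + erf(z/√2)) for the standard Normal CDF (pure standard-library Python, no table lookup):

```python
# Sketch: P(Y < y) for a Normal Y via standardization and the error function.
import math

def norm_cdf(y, mu=0.0, sigma=1.0):
    """P(Y < y) for Y ~ N(mu, sigma^2)."""
    z = (y - mu) / sigma                              # standardize
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Diastolic blood pressure example: Y ~ N(77, 11.6^2)
p = norm_cdf(60, mu=77, sigma=11.6)
print(round(p, 4))
```

The exact answer (about .0714) differs slightly from the slide's .0708 because the slide rounds z to -1.47 before consulting Table A.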
Other Distributions

t
- one parameter, called the df
- similar to a Z, but with "fatter tails"
- specific percentiles are in Table B

Example: find t(12),.95
Table B
Percentiles of the t-Distribution

df t.60 t.70 t.80 t.90 t.95 t.975 t.99 t.995 t.9995

1 0.325 0.727 1.376 3.078 6.314 12.706 31.821 63.657 636.619


2 0.289 0.617 1.061 1.886 2.920 4.303 6.965 9.925 31.599
3 0.277 0.584 0.978 1.638 2.353 3.182 4.541 5.841 12.924
4 0.271 0.569 0.941 1.533 2.132 2.776 3.747 4.604 8.610
5 0.267 0.559 0.920 1.476 2.015 2.571 3.365 4.032 6.869
6 0.265 0.553 0.906 1.440 1.943 2.447 3.143 3.707 5.959
7 0.263 0.549 0.896 1.415 1.895 2.365 2.998 3.499 5.408
8 0.262 0.546 0.889 1.397 1.860 2.306 2.896 3.355 5.041
9 0.261 0.543 0.883 1.383 1.833 2.262 2.821 3.250 4.781
10 0.260 0.542 0.879 1.372 1.812 2.228 2.764 3.169 4.587
11 0.260 0.540 0.876 1.363 1.796 2.201 2.718 3.106 4.437
12 0.259 0.539 0.873 1.356 1.782 2.179 2.681 3.055 4.318
13 0.259 0.538 0.870 1.350 1.771 2.160 2.650 3.012 4.221
14 0.258 0.537 0.868 1.345 1.761 2.145 2.624 2.977 4.140
15 0.258 0.536 0.866 1.341 1.753 2.131 2.602 2.947 4.073

From Table B, t(12),.95 = 1.782

For "lower tail" values, t(df),α = -t(df),1-α
χ²
- one parameter, called the df
- specific percentiles are in Table C

F
- two parameters, called the numerator df and the denominator df
- specific percentiles are in Tables D1 – D3
Sampling Distributions

The mean of a sampling distribution is called the expected value of the statistic.

The standard deviation of a sampling distribution is called the standard error of the statistic.
Sampling Distribution of X̄

E(X̄) = µ

Var(X̄) = σ²/n  ⇒  s.e.(X̄) = σ/√n

If X ~ N(µ , σ²), then X̄ ~ N(µ , σ²/n)

⇒  (X̄ − µ) / (σ/√n) ~ N(0 , 1)
Central Limit Theorem

For n sufficiently large, the sampling distribution of X̄ is at least approximately Normal for any underlying distribution!

(X̄ − µ) / (σ/√n) ~ N(0 , 1)  (approximately)
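A small simulation sketch (my own illustration, not from the slides) makes the theorem concrete: even for a strongly skewed population such as Exponential(1), the sample means cluster around µ with spread close to σ/√n.

```python
# Sketch: simulate the sampling distribution of the mean for a
# non-Normal population (Exponential with rate 1, so µ = σ = 1).
import math
import random

random.seed(1)
n = 50            # sample size
reps = 20_000     # number of samples drawn

# Draw many samples and record each sample mean
means = [sum(random.expovariate(1.0) for _ in range(n)) / n
         for _ in range(reps)]

avg = sum(means) / reps                                   # should be near µ = 1
se = math.sqrt(sum((m - avg) ** 2 for m in means) / (reps - 1))

# Theory: E(X̄) = µ = 1 and s.e.(X̄) = σ/√n = 1/√50 ≈ 0.141
print(round(avg, 3), round(se, 3))
```

Increasing n makes the histogram of `means` look ever more Normal, even though the underlying exponential population is far from it.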
Statistical Inference
- Estimation
- Hypothesis Testing

A point estimate is a single statistic that is used to estimate a population parameter.

We can also estimate a parameter by a 100(1-α)% confidence interval. This has a probability of "capture" of (1-α).
[Figure: many confidence intervals from repeated samples, plotted against a vertical line at µ; 100(1-α)% of these intervals will capture the parameter (µ)]
Form of most confidence intervals:

point estimate ± (table value)(std. error)

A 100(1-α)% C. I. for µ is:

X̄ ± t(n−1),1−α/2 · s/√n
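As a sketch (the data are made up), a 95% interval for µ from a hypothetical sample of n = 13 uses the Table B critical value t(12),.975 = 2.179:

```python
# Sketch: 95% confidence interval for µ via  X̄ ± t(n-1),.975 · s/√n
import math
import statistics

data = [72, 81, 77, 69, 85, 74, 78, 80, 71, 76, 83, 75, 79]  # hypothetical
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)    # sample sd, n − 1 divisor as defined earlier
t_crit = 2.179                # t(12),.975 from Table B (df = n − 1 = 12)

half_width = t_crit * s / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)
print(ci)
```

For a 90% interval one would instead use t(12),.95 = 1.782; the table value always cuts off α/2 in each tail.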
Hypothesis Testing

Test a null hypothesis, H0, against an alternative hypothesis, H1.

Two possible decisions:

- Reject H0 (in favor of H1)

- Fail to reject H0
                          TRUTH
DECISION             H0 true          H1 true

Reject H0            Type I error     correct

Fail to reject H0    correct          Type II error

α = P(Type I error) = P(Reject H0 | H0 true)

α is the significance level of the test

β = P(Type II error) = P(Fail to reject H0 | H1 true)

Power = P(Reject H0 | H1 true) = 1 - β
p-values

The probability of getting a test statistic at least as "extreme" (in the direction stated by H1) as the one observed.

Reject H0 if the p-value < α.
Hypothesis Testing Steps
1) Determine hypotheses
2) Decide on α ( .01 , .05 , .10 )
3 & 4) State rejection region, calculate test statistic
   (or)
   Calculate test statistic and p-value
5) Make decision (reject or not reject)
6) Write conclusions (interpret results), in the context of the problem
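The steps above can be sketched for a one-sample z-test with known σ (all numbers here are hypothetical, reusing the blood-pressure setting from the standardization example):

```python
# Sketch: the hypothesis-testing steps applied to a one-sample z-test.
import math

def norm_cdf(z):
    # Standard Normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# 1) Hypotheses: H0: µ = 77  vs  H1: µ ≠ 77 (two-sided)
mu0, sigma = 77.0, 11.6
# 2) Significance level
alpha = 0.05
# 3 & 4) Test statistic and p-value from a hypothetical sample
xbar, n = 72.5, 40
z = (xbar - mu0) / (sigma / math.sqrt(n))
p_value = 2 * (1 - norm_cdf(abs(z)))     # two-sided p-value
# 5) Decision
reject = p_value < alpha
print(round(z, 2), round(p_value, 4), reject)
# 6) Conclusion is then stated in context, e.g. whether mean diastolic
#    blood pressure differs from 77.
```

When σ is unknown, the same steps apply with s in place of σ and the t(n−1) distribution (Table B) in place of Z.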