Advanced Biostatistics LM

The document discusses linear regression analysis and statistical testing. It provides examples of using linear regression to analyze relationships between variables and assess the significance of predictors. It also covers assumptions of linear regression, including normality of residuals and homoscedasticity. Various statistical tests used in linear regression are explained such as t-tests, F-tests, and the calculation of R-squared.


Teshale Sori

 We have our senses to experience the world & make observations
 We have the ability to reason, which enables us to make logical inferences

Statistics: the science of drawing conclusions from data
10/22/23
Chance
Example: Pigs from one abattoir; one large farm
Statistical testing

 The principle: compare two or more means
 Take into account the variances & sample sizes
 Relates independent & dependent variables
 Independent variables: measurable with accuracy
 E.g. serum antibody vs disease; disease vs production
 Dependent variables: the variables of interest in the
observational study or experiment
 production,
 mortality,
 parasite burden, . . .
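The principle above — comparing two means while accounting for the variances and sample sizes — can be sketched as a small pure-Python helper (an illustrative sketch, not the course's Stata code; the data in the check are made up):

```python
import math

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic (equal variances assumed),
    mirroring the idea behind Stata's ttest with equal variances."""
    nx, ny = len(x), len(y)
    mean_x, mean_y = sum(x) / nx, sum(y) / ny
    var_x = sum((v - mean_x) ** 2 for v in x) / (nx - 1)  # sample variance
    var_y = sum((v - mean_y) ** 2 for v in y) / (ny - 1)
    pooled = ((nx - 1) * var_x + (ny - 1) * var_y) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))  # std. error of the difference
    return (mean_x - mean_y) / se, nx + ny - 2  # t statistic, degrees of freedom
```

In the milk example below, this same ratio (diff / its standard error, -194.9333 / 46.86421) is the t = -4.16 that Stata reports.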
Example: milk production
. ttest milk, by(Ration)

Two-sample t test with equal variances

Group Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

basic 30 799.3667 26.9389 147.5505 744.2704 854.4629


suppl 30 994.3 38.34775 210.0393 915.87 1072.73

combined 60 896.8333 26.47206 205.0517 843.8629 949.8038

diff -194.9333 46.86421 -288.7422 -101.1244

diff = mean(basic) - mean(suppl) t = -4.1595


Ho: diff = 0 degrees of freedom = 58

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0


Pr(T < t) = 0.0001 Pr(|T| > |t|) = 0.0001 Pr(T > t) = 0.9999
. regress milk i.Ration

Source SS df MS Number of obs = 60
F( 1, 58) = 17.30
Model 569985.067 1 569985.067 Prob > F = 0.0001
Residual 1910741.27 58 32943.8149 R-squared = 0.2298
Adj R-squared = 0.2165
Total 2480726.33 59 42046.209 Root MSE = 181.5

milk Coef. Std. Err. t P>|t| [95% Conf. Interval]

Ration
suppl 194.9333 46.86421 4.16 0.000 101.1244 288.7422
_cons 799.3667 33.138 24.12 0.000 733.0337 865.6996
Hypothesis testing
Analysis of continuous data:
Linear Regression

 Learning outcomes
 Identify whether the data should be analyzed by linear regression
 Construct a linear regression model
 Interpret the linear regression coefficients from both
technical and causal perspectives
 Assess the fit of a linear regression model
Analysis of continuous variables:
Linear regression

Used to estimate one variable from the other (asymmetric)
 Example: predict head circumference (HC) of children
 Interested in the probability distribution of HC in a
population of low birth weight children (<1500 gm)
o Mean = 27 cm
o SD = 2.5 cm
 Since the measurement is roughly normal, 95% of
the infants have HC between (22.1, 31.9)
 Suppose HC increases with gestational age
 Look at HC of infants at specific gestational ages
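The (22.1, 31.9) range comes from mean ± 1.96·SD; a one-line sketch (illustrative only):

```python
def normal_95_interval(mean, sd):
    # 95% reference range for a roughly normal measurement: mean ± 1.96·SD
    return (mean - 1.96 * sd, mean + 1.96 * sd)

low, high = normal_95_interval(27, 2.5)  # the head-circumference example
```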

10/22/23
25

10/22/23
26

10/22/23
27 10/22/23
y = 3x + 2; baseline prediction = mean = 4

SST = SUM((observed - mean)^2) = 24
SSM = 18
SSE = SUM((observed - predicted)^2) = SST - SSM = 6

R^2 = 1 - SSE/SST = SSM/SST
RMSE = (SSE/N)^(1/2)
Assumptions

For analysis:
 Linearity
 Normal distribution of the outcome variable
For inference:
 Normality of residuals
 Homoscedasticity
Simple Linear Regression: carcass price

Does the volume of sales (independent variable) influence
the price per unit (dependent variable)?
Regression

The t-test formula is a typical example of what we call a
statistical test:

The ratio between the difference of two averages and its
standard error

The probability of obtaining a value equal to or larger than this
ratio under the so-called null hypothesis is determined,
i.e. that the true difference between the two averages is zero
Regression in Stata:

Source SS df MS Number of obs = 10
F( 1, 8) = 7.99
Model 34.1342349 1 34.1342349 Prob > F = 0.0223
Residual 34.1867667 8 4.27334584 R-squared = 0.4996
Adj R-squared = 0.4371
Total 68.3210017 9 7.59122241 Root MSE = 2.0672

pricecarcass Coef. Std. Err. t P>|t| [95% Conf. Interval]

numberofpigssold1000000 -.366877 .1298104 -2.83 0.022 -.6662202 -.0675337


_cons 45.55915 9.783604 4.66 0.002 22.99812 68.12018

Fitted equation: price = 45.55915 - 0.366877 * number of pigs sold (millions)
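The fitted equation can be wrapped in a tiny prediction helper (a sketch using the coefficients from the output above):

```python
def predict_price(pigs_sold_millions):
    # carcass price predicted from the regression output above:
    # price = 45.55915 - 0.366877 * (number of pigs sold, in millions)
    return 45.55915 - 0.366877 * pigs_sold_millions
```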
Multivariable regression: relations among t-test,
ANOVA and linear regression

 Questions to be answered:
 Does supplementation have a significant effect on weight gain?
 Is growth significantly different among breeds?
 Is the effect of the supplement different among breeds?
Source SS df MS Number of obs = 24
F( 7, 16) = 40.84
Model 11114.625 7 1587.80357 Prob > F = 0.0000
Residual 622 16 38.875 R-squared = 0.9470
Adj R-squared = 0.9238
Total 11736.625 23 510.288043 Root MSE = 6.235

adwg Coef. Std. Err. t P>|t| [95% Conf. Interval]

breed
nh 45 5.090841 8.84 0.000 34.2079 55.7921
lw 54 5.090841 10.61 0.000 43.2079 64.7921
p 37 5.090841 7.27 0.000 26.2079 47.7921

ration
s 21 5.090841 4.13 0.001 10.2079 31.7921

breed#ration
nh#s -4 7.199537 -0.56 0.586 -19.26234 11.26234
lw#s -5 7.199537 -0.69 0.497 -20.26234 10.26234
p#s -4 7.199537 -0.56 0.586 -19.26234 11.26234

_cons 200 3.599769 55.56 0.000 192.3688 207.6312


Source SS df MS Number of obs = 24
F( 4, 19) = 81.80
Model 11092.5 4 2773.125 Prob > F = 0.0000
Residual 644.125 19 33.9013158 R-squared = 0.9451
Adj R-squared = 0.9336
Total 11736.625 23 510.288043 Root MSE = 5.8225

adwg Coef. Std. Err. t P>|t| [95% Conf. Interval]

breed
nh 43 3.361612 12.79 0.000 35.96406 50.03594
lw 51.5 3.361612 15.32 0.000 44.46406 58.53594
p 35 3.361612 10.41 0.000 27.96406 42.03594

ration
s 17.75 2.377019 7.47 0.000 12.77484 22.72516
_cons 201.625 2.657588 75.87 0.000 196.0626 207.1874
 1. Test of normality of residuals

 Before any other test, the normal distribution
of the residual values must be confirmed

 Stata offers two commands to this end:
the Shapiro-Wilk test (swilk) and the
Shapiro-Francia test (sfrancia)
 2. Residual values as a function of predicted values

 A graph of the residual values as a function of the
predicted values is produced with the command
rvfplot (residual-versus-fitted plot)
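The points behind such a plot are just the fitted values and residuals of the model; a minimal pure-Python sketch for a simple linear fit (illustrative, not Stata's rvfplot itself):

```python
def residuals_vs_fitted(x, y, intercept, slope):
    # points for a residual-versus-fitted plot of a simple linear model
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals
```

Plotting the residuals against the fitted values should show no systematic pattern if the model is adequate.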
 The command hettest (heteroscedasticity test) tests the
equality of the variances of the residual values
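The core idea behind such a test — regress the squared residuals on the fitted values and compare n·R² to a chi-square with 1 df — can be sketched in pure Python (an illustrative Koenker-style LM form, not Stata's exact hettest; the check's data are made up):

```python
def breusch_pagan_lm(fitted, residuals):
    """LM statistic: regress squared residuals on fitted values, return n * R^2.
    Under homoscedasticity it is approximately chi-square with 1 df."""
    n = len(fitted)
    u2 = [r * r for r in residuals]          # squared residuals
    mx, my = sum(fitted) / n, sum(u2) / n
    sxx = sum((x - mx) ** 2 for x in fitted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(fitted, u2))
    slope = sxy / sxx                        # auxiliary regression slope
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(fitted, u2))
    sst = sum((y - my) ** 2 for y in u2)
    return n * (1 - sse / sst)               # LM = n * R^2 of the auxiliary fit
```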
Source SS df MS Number of obs = 308
F( 7, 300) = 78.06
Model 6.35654316 7 .908077595 Prob > F = 0.0000
Residual 3.48991243 300 .011633041 R-squared = 0.6456
Adj R-squared = 0.6373
Total 9.8464556 307 .032073145 Root MSE = .10786

temp Coef. Std. Err. t P>|t| [95% Conf. Interval]

co2 .0062906 .0025726 2.45 0.015 .001228 .0113532


ch4 -.0004803 .0005816 -0.83 0.410 -.0016248 .0006642
n2o -.0248186 .0091634 -2.71 0.007 -.0428512 -.006786
cfc11 -.0080571 .0017144 -4.70 0.000 -.0114309 -.0046833
cfc12 .005394 .0010219 5.28 0.000 .0033829 .0074051
tsi .0726611 .0162568 4.47 0.000 .0406693 .1046529
aerosols -.8517611 .2332851 -3.65 0.000 -1.310843 -.3926787
_cons -93.3341 22.22649 -4.20 0.000 -137.0737 -49.59453

mei co2 ch4 n2o cfc11 cfc12 tsi

mei 1.0000
co2 -0.1529 1.0000
ch4 -0.1056 0.8723 1.0000
n2o -0.1624 0.9811 0.8944 1.0000
cfc11 0.0882 0.4013 0.7135 0.4122 1.0000
cfc12 -0.0398 0.8232 0.9582 0.8393 0.8314 1.0000
tsi -0.0768 0.0179 0.1463 0.0399 0.2846 0.1893 1.0000
aerosols 0.3524 -0.3693 -0.2904 -0.3535 -0.0323 -0.2438 0.0832
temp 0.1353 0.7485 0.6997 0.7432 0.3801 0.6889 0.1822

 Multicollinearity arises when independent
variables are highly correlated among themselves
 The effect of multicollinearity is to mask the real
effects of one or more independent variables
 Either by preventing the null hypothesis from being
rejected or even by reversing the sign of the
regression coefficient
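A quick way to screen for collinearity is the pairwise correlation, as in the matrix above; a pure-Python sketch (illustrative data in the check):

```python
def pearson(x, y):
    # Pearson correlation coefficient between two predictors
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5
```

Predictor pairs with |r| close to 1 (e.g. co2 and n2o at 0.9811 in the matrix above) are candidates for dropping or combining.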
 The results of this analysis are highly contradictory:
 On one hand, a very high R2 value is obtained
& a highly significant F-value
 On the other hand, none of the regression
coefficients are different from zero
 Such a result is almost always a very good
indication of the presence of multicollinearity
