13 Simple Linear Regression

The document discusses the evaluation of relationships between independent and dependent variables using linear regression. It covers concepts such as the formulation of the regression equation, assumptions of linearity, constant variance, normality, and independence, as well as methods for estimating parameters and testing hypotheses. Additionally, it illustrates these concepts with examples related to smoking, cognitive development, and health metrics like systolic blood pressure and weight.


Objective: To evaluate the relationship of one continuous variable to another continuous or discrete variable.

Examples:

Smoking and birth weight
Investigators wish to know if a newborn's birth weight is associated with how much the mother smoked during pregnancy.

Cognitive development in children
Does the age at which a child begins to read predict his mental abilities later in life, as measured by a standardized test?

Effects of weight on health
A doctor would like a formula that allows her to compute a patient's cholesterol based upon the patient's weight.
In all three examples, we are interested in describing the relationship that one variable, known as the independent variable, has to the other variable, known as the dependent variable.

             Independent variable   Dependent variable
Example 1:   Smoking                Birth weight
Example 2:   Reading age            Cognitive skill
Example 3:   Weight                 Cholesterol
Other common terminology:
Dependent variable = Outcome variable = Response variable
Independent variable = Predictor variable = Explanatory variable
Statistical shorthand:
X = Independent variable
Y = Dependent variable

The dependent variable must be continuous!
The independent variable can be discrete or continuous!

We can visualize the relationship of two variables with a scatter plot.
Would you expect systolic blood pressure to be related to the weight of the person?

We often assume that the two variables have a linear association. Is this assumption valid?

In an ideal world, we could use the equation for a line to express exactly how systolic blood pressure is related to the weight of the person:

Y = β0 + β1X
In the above equation,
Y is the dependent variable
X is the independent variable
β0 is the intercept
β1 is the slope

Suppose we know that β0 = 100 and β1 = 0.5.

Then, if the line is correct, every person with a weight of 75 kg will have a systolic blood pressure of:
100 + [0.5 × 75] = 137.5 mm Hg

However, will every person who weighs 75 kg have a systolic blood pressure of exactly 137.5 mm Hg? Why?
We can see that the line does not perfectly predict systolic BP for every person in the study.

There is variation around the line that we also need to consider!

Thus, our equation must not only include the line, but quantify the variation around the line:

Y = β0 + β1X + e
One role of biostatistics is to assess the magnitude of the errors (e's).
If e has a tendency to be large, then the line fits the data poorly.
If e has a tendency to be small, then the line fits the data well.

In reality, we have no idea what range of e's is possible, so we make another assumption:

e ~ N(0, σ²)

The e's are normally distributed with mean zero and equal variance.

Thus, our linear model has two components:

Y = [β0 + β1X] + [e]

The first term is the fixed ("systematic", "deterministic") component; e is the random component.
Is Y a random variable? For a given value of X, what is the distribution of Y?

For each value of X, we assume that the corresponding Y value has a normal distribution around β0 + β1X.

The variance of Y is determined by σ²; we'll discuss later how to use the data to estimate σ².
Let's apply these concepts to our example:
Y = Systolic BP
X = Weight
Y|X is normally distributed such that
E(Y|X) = 100 + 0.5X
Var(Y|X) = σ² (let's assume we know σ = 15)

Thus, individuals who weigh 75 kg have a normally distributed systolic BP that averages 137.5 with a standard deviation of 15 mm Hg.

Likewise, individuals who weigh 100 kg have a normally distributed systolic BP that averages 150 with a standard deviation of 15 mm Hg.
Thus, as the value of X (weight) changes, the (normal) distribution of Y (systolic BP) shifts location (mean), but the spread of values (variance) does not change.

We have just described three core assumptions of linear regression:

1. Linearity:
The average value of Y changes exactly with X in a linear fashion. The slope β1 quantifies how much the average value of Y changes with a one-unit change in X.

2. Constant variance (homoscedasticity):
The variability of Y is constant across all values of X; X supplies no information regarding the variance of Y.

3. Normality:
Each Y value has a normal distribution with mean depending on X and constant variance:
E(Y|X) = β0 + β1X (dependent on X)
Var(Y|X) = σ² (independent of X)

More specifically, the errors are normally distributed with mean zero and constant variance:
E(e) = 0 and Var(e) = σ²
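To make these assumptions concrete, here is a minimal simulation sketch in Python. The values β0 = 100, β1 = 0.5, and σ = 15 come from the running example; the sample size and the weight range are arbitrary choices:

import numpy as np

rng = np.random.default_rng(0)

beta0, beta1, sigma = 100, 0.5, 15   # true parameters from the running example
n = 490                              # sample size (an arbitrary choice)

x = rng.uniform(50, 110, size=n)     # weights in kg (assumed range)

# Normality + constant variance: every error is N(0, sigma^2), regardless of X
e = rng.normal(loc=0.0, scale=sigma, size=n)

# Linearity: the mean of Y moves along the line beta0 + beta1 * X
y = beta0 + beta1 * x + e

# People near 75 kg should average about 137.5 mm Hg with SD about 15
near_75 = y[np.abs(x - 75) < 2]
print(near_75.mean(), near_75.std())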
We have described our model and assumptions.

How do we determine what the data say are the "best" values for the unknown parameters β0, β1, and σ²?

Suppose we draw a line with an intercept of 100 (β0 = 100) and a slope of 0.5 (β1 = 0.5) on the scatter plot:

[Scatter plot of systolic BP versus weight with the line Y = 100 + 0.5X drawn through the points]
For a given line, we can compute the residual for each woman:

êᵢ = Yᵢ − Ŷᵢ
(residual = actual value − predicted value)

Note how êᵢ is related to eᵢ.

If we add up the (squared) residuals for all 45 women, we get a measurement of the total error of the line.

Therefore, we can measure the total fit of any line as:

SSE = Σ êᵢ² = Σ (Yᵢ − Ŷᵢ)² = Σ (Yᵢ − β0 − β1Xᵢ)²   (sums over i = 1, …, n)

= sum of squares due to error
SSE is our measurement of how good a line is:

As SSE increases, our total error increases (bad!) → the line is a poorer fit.

As SSE decreases, our total error decreases (good!) → the line is a better fit.

Thus, we want to choose a line (choose values of β0 and β1) so that SSE is as small as possible.

How do we do that?
Answer: The method of least squares
It can be proved that the values of β0 and β1 that make SSE as small as possible are:

β̂1 = Σ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σ (Xᵢ − X̄)²   (sums over i = 1, …, n)

β̂0 = Ȳ − β̂1X̄
For the systolic BP & weight data, these equations give β̂0 = 98.463 and β̂1 = 0.428:

Ŷ = 98.463 + 0.428X
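These two formulas are easy to compute directly. The sketch below uses a small made-up dataset (the weight and sysbp arrays are hypothetical, not the slides' data); only the formulas themselves come from the slides:

import numpy as np

# Hypothetical data for illustration (not the slides' dataset)
weight = np.array([62.0, 75.0, 88.0, 70.0, 95.0, 102.0, 81.0, 68.0])         # kg
sysbp  = np.array([121.0, 135.0, 140.0, 128.0, 139.0, 146.0, 132.0, 126.0])  # mm Hg

xbar, ybar = weight.mean(), sysbp.mean()

# Least-squares estimates, straight from the formulas above
b1 = np.sum((weight - xbar) * (sysbp - ybar)) / np.sum((weight - xbar) ** 2)
b0 = ybar - b1 * xbar

print(f"fitted line: Yhat = {b0:.3f} + {b1:.3f} X")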
Suppose we take a person whose weight is 85 kg. How do we predict his/her systolic BP?
Ŷ = 98.463 + 0.428 × 85 ≈ 134.8 mm Hg

What is our interpretation of the slope β̂1?
We have estimated the systematic component of our model:

Ê(Y|X) = β̂0 + β̂1X

Let the estimated coefficients be written as b0 and b1, so the fitted line is Ŷ = b0 + b1X.

What about the random component σ² = Var(Y|X)?

σ² measures the leftover variation in Y (systolic BP) after accounting for the relationship with X (weight):
If σ² is small, the points are scattered very close to our fitted line.
If σ² is large, the points are scattered far away from our fitted line.

Put another way, σ² describes the average squared deviation of the Y values around the line.
Thus, we could compute the estimate:

σ̂² = (1/n) Σ (Yᵢ − β0 − β1Xᵢ)²   (sum over i = 1, …, n)

However, we have to make two adjustments to the above equation:
1. We don't know β0 and β1 in the numerator.
2. We have to reduce the denominator by one for every estimated parameter in the numerator.
As a result, our estimate of σ² is:

σ̂² = Σ (Yᵢ − b0 − b1Xᵢ)² / (n − 2)
   = SSE / (n − 2)
   = MSE
   = S²Y|X
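Continuing with the same hypothetical arrays as before, SSE and MSE would be computed as follows (a sketch, not the slides' data):

import numpy as np

# Same hypothetical data and least-squares fit as in the previous sketch
weight = np.array([62.0, 75.0, 88.0, 70.0, 95.0, 102.0, 81.0, 68.0])
sysbp  = np.array([121.0, 135.0, 140.0, 128.0, 139.0, 146.0, 132.0, 126.0])
b1 = (np.sum((weight - weight.mean()) * (sysbp - sysbp.mean()))
      / np.sum((weight - weight.mean()) ** 2))
b0 = sysbp.mean() - b1 * weight.mean()

# SSE and MSE: divide by n - 2, one df lost per estimated parameter
resid = sysbp - (b0 + b1 * weight)
SSE = np.sum(resid ** 2)
MSE = SSE / (len(weight) - 2)
print(f"SSE = {SSE:.2f}, MSE = {MSE:.2f}")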

We now have point estimates for the three parameters β0, β1, and σ²:
b0 is the intercept of our estimated line
b1 is the slope of our estimated line
MSE is the estimated variability of the data around the line

Our interest is in the value of b1:
β1 describes how X and Y are associated with each other.

However, what we have from the data is b1, which is an estimate of β1.
b1 is a random variable because it is a function of the Y values (systolic BP), which are themselves random variables.

If we were to compute b1 from another sample, we would get a different value than we did from the first sample.

Thus, we want to know how "certain" we are about our estimate b1, or how much b1 will vary from sample to sample, through confidence intervals.
Thus, b1 has its own distribution, with corresponding mean and variance:

E(b1) = μ(b1) = β1 and Var(b1) = σ²(b1) = σ² / [(n − 1)Sx²]

where Sx² = (1/(n − 1)) Σ (xᵢ − X̄)²   (sum over i = 1, …, n)

However, we don't know σ², so we replace it with our estimate S²Y|X = MSE:

S²b1 = S²Y|X / [(n − 1)Sx²]

The denominator of MSE, n − 2, is referred to as the degrees of freedom.
Recall the three core assumptions of linear regression: linearity, constant variance, and normality.

Our formula for the variance of b1 is only valid if we make one more assumption:

4. Independence:
The Y values must be independent of each other; knowing some of the Y values tells you nothing about what the other Y values could be.
Now that we have an estimate for β1 and a variance estimate for that estimate, we can compute a confidence interval for β1.

A (1 − α)100% confidence interval for the population slope β1:

b1 ± tα/2 · Sb1 = b1 ± tα/2 · √[ MSE / ((n − 1)Sx²) ]

The degrees of freedom associated with the t statistic are defined by the denominator of MSE, i.e., n − 2.
Consider the example on systolic BP and weight:
b1 = 0.428, Sx = 15.46, MSE = 220.899, se(b1) = 0.043, and take t0.975(488) ≈ 1.96.

A 95% CI for β1 will be:
0.428 ± 1.96 × 0.043 = (0.342, 0.513)
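As a check, the interval can be reproduced in code from the slide's summary statistics alone (n = 490 is inferred from the 488 degrees of freedom):

import numpy as np
from scipy import stats

# Summary statistics from the slide; n = 490 is implied by df = 488
b1, Sx, MSE, n = 0.428, 15.46, 220.899, 490

se_b1 = np.sqrt(MSE / ((n - 1) * Sx ** 2))   # standard error of the slope
t_crit = stats.t.ppf(0.975, df=n - 2)        # ~1.96 for large n

lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(f"se(b1) = {se_b1:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")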
Recall that our goal is to determine whether or not Y (systolic BP) and X (weight) have a linear relationship.
If there is no linear relationship between X and Y, what is our value of β1 (slope)?

We can test the hypothesis of no linear relationship:

H0: β1 = 0 versus HA: β1 ≠ 0

with a typical t statistic:

t = (b1 − 0) / √[ MSE / ((n − 1)Sx²) ]

which has a t-distribution with (n − 2) degrees of freedom when H0 is true. If n is large, the distribution of the test statistic will be close to the standard normal.
1. Rejection region approach
We reject H0 if our test statistic is large in magnitude:
We reject H0 if |t| > tα/2,
where ±tα/2 denote the boundaries of the rejection region.

2. P-value approach
We reject H0 if our observed statistic has very low probability when H0 is true:
We reject H0 if our p-value is less than the Type I error rate α.

3. Confidence interval approach
We reject H0 if our (1 − α)100% CI does not contain 0.
We can apply this to the systolic BP & weight data:

H0: Systolic BP and weight have no linear relationship (β1 = 0)
versus
HA: Systolic BP and weight have a linear relationship (β1 ≠ 0)

Our test statistic is:

t = (b1 − 0) / √[ MSE / ((n − 1)Sx²) ] = 0.428 / √[ 220.899 / (489 × 15.46²) ] = 9.838
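The same statistic can be verified in code from the slide's summary numbers (a sketch; the two-sided p-value line is an addition, not on the slide):

import numpy as np
from scipy import stats

b1, Sx, MSE, n = 0.428, 15.46, 220.899, 490   # summary numbers from the slide

t_stat = (b1 - 0) / np.sqrt(MSE / ((n - 1) * Sx ** 2))
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided p-value (an addition)

print(f"t = {t_stat:.2f}, p = {p_val:.1e}")     # t is about 9.8, p far below 0.05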
Since our test statistic, and values larger than it, occur so rarely when the null hypothesis is true, we reject H0.

Recall that our 95% confidence interval was (0.342, 0.513).

This interval does not contain 0; therefore, we once again reject H0 at a level of 5%.

What do we conclude if we fail to reject H0: β1 = 0?
We say that we have no evidence that X (weight) and Y (systolic BP) have a linear relationship.
However, some other association between X and Y may exist.
Example: Suppose we have data:
X: 1 2 3 4 5 6
Y: 3 2 0 1 4 7

The slope would be b1 = 0.77 with SE(b1) = 0.54.

Thus a 95% CI for β1 would be (-0.28, 2.05), and we would fail to reject H0: β1 = 0.
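A quick check of these two numbers, reusing the least-squares formulas from earlier (a sketch):

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([3, 2, 0, 1, 4, 7], dtype=float)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

MSE = np.sum((y - (b0 + b1 * x)) ** 2) / (len(x) - 2)
se_b1 = np.sqrt(MSE / Sxx)

print(f"b1 = {b1:.2f}, SE(b1) = {se_b1:.2f}")   # b1 = 0.77, SE = 0.54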
However, a scatter plot displays an obvious relationship:

[Scatter plot of Y versus X: the points fall and then rise, tracing a clear curved (non-linear) pattern]
When we fail to reject H0: β1 = 0, we can also explain our conclusion in another way.

We can rewrite our fitted values as:

Ŷᵢ = b0 + b1Xᵢ = (Ȳ − b1X̄) + b1Xᵢ = Ȳ + b1(Xᵢ − X̄)

We see our fitted value is the sum of two components:
(1) the sample mean of Y
(2) some fraction, b1, of how far Xᵢ is from its mean.

Thus, if we fail to reject H0: β1 = 0, X does not help us predict Y, and we do just as well predicting Y with its mean.
The following plot demonstrates this concept:

[Scatter plot showing the fitted line and a horizontal line at the mean of Y]
Our regression line is an estimate of a parameter:
Parameter: E(Y|X0) = μY|x0 = β0 + β1X0
Estimate: Ê(Y|X0) = Ŷx0 = b0 + b1X0

For every value of X0, we can compute our estimate Ŷx0; connecting all these estimates gives us our estimated regression line.
Since Ŷx0 is a random variable, it has its own variance:

Var(Ŷx0) = σ²Ŷx0 = σ² [ 1/n + (X0 − X̄)² / ((n − 1)Sx²) ]

We don't know σ², so we substitute MSE for σ² and estimate the standard error of Ŷx0 to be:

SŶx0 = SY|X √[ 1/n + (X0 − X̄)² / ((n − 1)Sx²) ]
From the point estimate Ŷx0 and its standard error estimate SŶx0, we create a confidence interval for μY|x0:

Ŷx0 ± tn−2, 1−α/2 · SŶx0

where:
α is the Type I error rate
tn−2, 1−α/2 is the confidence coefficient from a t distribution with (n − 2) degrees of freedom.
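A sketch of this interval in code, again using the hypothetical arrays from the earlier sketches; X0 = 85 is an arbitrary choice:

import numpy as np
from scipy import stats

# Hypothetical data, as in the earlier sketches
weight = np.array([62.0, 75.0, 88.0, 70.0, 95.0, 102.0, 81.0, 68.0])
sysbp  = np.array([121.0, 135.0, 140.0, 128.0, 139.0, 146.0, 132.0, 126.0])
n = len(weight)

Sxx = np.sum((weight - weight.mean()) ** 2)    # equals (n - 1) * Sx^2
b1 = np.sum((weight - weight.mean()) * (sysbp - sysbp.mean())) / Sxx
b0 = sysbp.mean() - b1 * weight.mean()
MSE = np.sum((sysbp - (b0 + b1 * weight)) ** 2) / (n - 2)

x0 = 85.0                                      # arbitrary point of interest
yhat0 = b0 + b1 * x0
se_mean = np.sqrt(MSE * (1 / n + (x0 - weight.mean()) ** 2 / Sxx))
t_crit = stats.t.ppf(0.975, df=n - 2)

print(f"95% CI for E(Y|X0={x0}): "
      f"({yhat0 - t_crit * se_mean:.1f}, {yhat0 + t_crit * se_mean:.1f})")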
For every value of X0, we can compute a confidence interval for μY|x0; connecting all these intervals gives us confidence bands for our fitted regression line:

[Plot of the fitted line with confidence bands]
From any regression equation, we can compute predicted values:
Sys BP = 98.5 + 0.43 (Weight)
For a person weighing 65 kg, his sys BP is predicted to be:
98.5 + 0.43 × 65 = 126.5 mm Hg

Thus, the point estimate Ŷx0 is used to estimate two different quantities:
μY|x0 = the average Y value for everyone in the population with X value equal to X0
Y|X0 = the Y value for a single person in the population with X value equal to X0
Recall that we can compute a confidence interval for μY|x0 because μY|x0 is a parameter.

We can compute a similar interval for Y|X0 (not a parameter), but we call the interval a prediction interval.

In order to compute a prediction interval, we need an appropriate standard error estimate.

Intuitively, the variation of a single person's predicted value is greater than the variation of the population's estimated mean.
Specifically, when computing a predicted value, we are implicitly doing two steps:
1. Computing Ŷx0 = b0 + b1X0, which has its own variability σ²Ŷx0
2. Selecting a value from a population with mean Ŷx0 and additional variation σ² around it

Thus, the total variance of a predicted value for a single individual is:

Var(Ŷx0) + Var(Y) = σ² [ 1/n + (X0 − X̄)² / ((n − 1)Sx²) ] + σ²

As usual, we don't know σ², so we use MSE as a substitute.
From the point estimate Ŷx0 and the inflated variance estimate S²Ŷx0 + S²Y|X, we create a prediction interval:

Ŷx0 ± tn−2, 1−α/2 · SY|X √[ 1 + 1/n + (X0 − X̄)² / ((n − 1)Sx²) ]

where:
α is the Type I error rate
tn−2, 1−α/2 is the confidence coefficient from a t distribution with (n − 2) degrees of freedom.
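A matching sketch for the prediction interval, with the same hypothetical data; note the extra "1 +" under the square root compared with the confidence interval for the mean:

import numpy as np
from scipy import stats

# Hypothetical data, as in the earlier sketches
weight = np.array([62.0, 75.0, 88.0, 70.0, 95.0, 102.0, 81.0, 68.0])
sysbp  = np.array([121.0, 135.0, 140.0, 128.0, 139.0, 146.0, 132.0, 126.0])
n = len(weight)

Sxx = np.sum((weight - weight.mean()) ** 2)
b1 = np.sum((weight - weight.mean()) * (sysbp - sysbp.mean())) / Sxx
b0 = sysbp.mean() - b1 * weight.mean()
MSE = np.sum((sysbp - (b0 + b1 * weight)) ** 2) / (n - 2)

x0 = 85.0
yhat0 = b0 + b1 * x0
t_crit = stats.t.ppf(0.975, df=n - 2)

# Prediction interval: the extra "1 +" adds the individual's variation sigma^2
se_pred = np.sqrt(MSE * (1 + 1 / n + (x0 - weight.mean()) ** 2 / Sxx))

print(f"95% prediction interval at X0={x0}: "
      f"({yhat0 - t_crit * se_pred:.1f}, {yhat0 + t_crit * se_pred:.1f})")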
For every value of X0, we can compute a prediction interval; connecting all these intervals gives us prediction bands around our fitted regression line:

[Plot of the fitted line with confidence bands and wider prediction bands]
Prediction within the observed range of X values is often reliable.

Caution must be exercised when computing a fitted value for a value of X outside the observed range: extrapolation is dangerous!
