
Simple Linear Regression

• YILMA CHISHA
• ASSISTANT PROFESSOR, BHI
• AMU, SPH, CMHS
Correlation

Finding the relationship between two quantitative variables without being able to infer causal relationships.

Correlation is a statistical technique used to determine the degree to which two variables are related.
Scatter diagram
• Rectangular coordinates
• Two quantitative variables
• One variable is called independent (X) and the second is called dependent (Y)
• Points are not joined
• No frequency table
Example

Wt. (kg)     67   69   85   83   74   81   97   92  114   85
SBP (mmHg)  120  125  140  160  130  180  150  140  200  130
[Figure: Scatter diagram of weight (kg, x-axis) against systolic blood pressure (mmHg, y-axis) for the data above]
Scatter plots

The pattern of the data indicates the type of relationship between your two variables:
 positive relationship
 negative relationship
 no relationship
Positive relationship

[Figure: Scatter plot of height in cm against age in weeks, showing a positive relationship]


Negative relationship

[Figure: Scatter plot of reliability against age of car, showing a negative relationship]

No relationship

[Figure: Scatter plot showing no relationship between the variables]
Correlation Coefficient

A statistic showing the degree of relation between two variables.
Simple Correlation Coefficient (r)

 It is also called Pearson's correlation or the product-moment correlation coefficient.
 It measures the nature and strength of the relationship between two quantitative variables.
The sign of r denotes the nature of the association, while the value of r denotes the strength of the association.
 If the sign is +ve, the relation is direct: an increase in one variable is associated with an increase in the other variable, and a decrease in one variable is associated with a decrease in the other variable.

 If the sign is -ve, the relationship is inverse or indirect: an increase in one variable is associated with a decrease in the other.
How to compute the simple correlation coefficient (r)

r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}
Example:

A sample of 6 children was selected; data about their age in years and weight in kilograms were recorded as shown in the following table. It is required to find the correlation between age and weight.

Serial No   Age (years)   Weight (kg)
1           7             12
2           6              8
3           8             12
4           5             10
5           6             11
6           9             13
These two variables are of the quantitative type: one variable (age) is the independent variable, denoted X, and the other (weight) is the dependent variable, denoted Y. To find the relation between age and weight, compute the simple correlation coefficient using the formula above.
n       Age (x)   Weight (y)    xy      x²      y²
1       7         12            84      49      144
2       6          8            48      36       64
3       8         12            96      64      144
4       5         10            50      25      100
5       6         11            66      36      121
6       9         13           117      81      169
Total   Σx = 41   Σy = 66    Σxy = 461  Σx² = 291  Σy² = 742
r = \frac{461 - \frac{41 \times 66}{6}}{\sqrt{\left[291 - \frac{(41)^2}{6}\right]\left[742 - \frac{(66)^2}{6}\right]}}

r = 0.759: a strong direct correlation.
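To make the computation reproducible, here is a minimal Python sketch of the computational formula above (the helper name pearson_r is ours, not from any library); it recovers the hand-computed value:

```python
import math

def pearson_r(x, y):
    """Pearson's r via the computational (sums-of-products) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    num = sum_xy - sum_x * sum_y / n
    den = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return num / den

age = [7, 6, 8, 5, 6, 9]           # x
weight = [12, 8, 12, 10, 11, 13]   # y
print(round(pearson_r(age, weight), 3))  # ~0.76, matching the slide's r = 0.759
```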
EXAMPLE: Relationship between Anxiety and Test Scores

Anxiety (X)   Test score (Y)   X²     Y²    XY
10             2               100     4    20
 8             3                64     9    24
 2             9                 4    81    18
 1             7                 1    49     7
 5             6                25    36    30
 6             5                36    25    30
ΣX = 32       ΣY = 32        ΣX² = 230   ΣY² = 204   ΣXY = 129
Calculating the Correlation Coefficient

r = \frac{(6)(129) - (32)(32)}{\sqrt{[6(230) - 32^2][6(204) - 32^2]}} = \frac{774 - 1024}{\sqrt{(356)(200)}} = -0.94

r = -0.94: an indirect strong correlation.
Spearman Rank Correlation Coefficient (rs)

It is a non-parametric measure of correlation. This procedure makes use of the two sets of ranks that may be assigned to the sample values of X and Y.

The Spearman rank correlation coefficient can be computed in the following cases:
 Both variables are quantitative.
 Both variables are qualitative ordinal.
 One variable is quantitative and the other is qualitative ordinal.
Procedure:

1. Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.
2. Rank the values of Y from 1 to n.
3. Compute the value of di for each pair of observations by subtracting the rank of Yi from the rank of Xi.
4. Square each di and compute Σdi², the sum of the squared values.
5. Apply the following formula:

r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}

The value of rs denotes the magnitude and nature of the association, with the same interpretation as the simple r.
Example

In a study of the relationship between level of injury and income, the following data were obtained. Find the relationship between them and comment.

Sample   Level of injury (X)   Income (Y)
A        Moderate              25
B        Mild                  10
C        Fatal                  8
D        Severe                10
E        Severe                15
F        Normal                50
G        Fatal                 60
Answer:

Sample   X          Y    Rank (X)   Rank (Y)   di     di²
A        Moderate   25   5          3           2      4
B        Mild       10   6          5.5         0.5    0.25
C        Fatal       8   1.5        7          -5.5   30.25
D        Severe     10   3.5        5.5        -2      4
E        Severe     15   3.5        4          -0.5    0.25
F        Normal     50   7          2           5     25
G        Fatal      60   1.5        1           0.5    0.25

Σdi² = 64
r_s = 1 - \frac{6 \times 64}{7(7^2 - 1)} = 1 - \frac{384}{336} = -0.14

Comment:
There is an indirect weak correlation between level of injury and income.
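A minimal Python sketch of the d² formula (helper names are ours). Two caveats: with ties present, mid-ranks are used and the d² formula is only an approximation to Pearson's r computed on the ranks; and the sign of rs depends on the direction of ranking, so the ranks below are taken exactly as assigned on the slide (injury ranked from fatal upward, income ranked from highest downward):

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rs from pre-assigned ranks via the d^2 formula.
    With ties (mid-ranks) this approximates Pearson's r on the ranks."""
    n = len(rank_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Ranks exactly as assigned on the slide (ties share mid-ranks):
rank_injury = [5, 6, 1.5, 3.5, 3.5, 7, 1.5]  # X: fatal = 1.5, severe = 3.5, ..., normal = 7
rank_income = [3, 5.5, 7, 5.5, 4, 2, 1]      # Y: highest income ranked 1
print(round(spearman_rs(rank_injury, rank_income), 2))  # -0.14
```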
Exercise
What is regression analysis?
• An extension of correlation
• A way of measuring the relationship between two or more variables
• Used to calculate the extent to which one variable (the DV) changes when the other variable(s) (the IVs) change
• Used to help understand possible causal effects of one variable on another
What is linear regression (LR)?
• Involves:
  – one predictor (IV) and
  – one outcome (DV)
• Explains a relationship using a straight line fit to the data.
Least squares criterion

[Figure: Illustration of the least squares criterion]
Least-Squares Regression

The most common method for fitting a regression line is the method of least squares.

This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies on the fitted line exactly, its vertical deviation is 0). Because the deviations are first squared, then summed, there are no cancellations between positive and negative values.
Linear Regression - Model

[Figure: Fitted line Y = b0 + b1X with an observed point (Xi, Yi) and its error ei]

Regression coefficients for a . . .

Population:    Y_i = \alpha + \beta X_i + \epsilon_i

Sample:        Y_i = b_0 + b_1 X_i + e_i

Fitted line:   \hat{Y} = b_0 + b_1 X_i
Simple Linear Regression Model

• The population simple linear regression model:

    y = \alpha + \beta x + \epsilon,    so that    \mu_{y|x} = \alpha + \beta x

where \alpha + \beta x is the nonrandom or systematic component and \epsilon is the random component.

• Where
  • y is the dependent (response) variable, the variable we wish to explain or predict;
  • x is the independent (explanatory) variable, also called the predictor variable; and
  • \epsilon is the error term, the only random component in the model, and thus the only source of randomness in y.
Cont…
• \mu_{y|x} is the mean of y when x is specified, also called the conditional mean of Y.

• \alpha is the intercept of the systematic component of the regression relationship.

• \beta is the slope of the systematic component.
Picturing the Simple Linear Regression Model

• The simple linear regression model posits an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:

    \mu_{y|x} = \alpha + \beta x

• Actual observed values of Y (y) differ from the expected value (\mu_{y|x}) by an unexplained or random error (\epsilon):

    y = \mu_{y|x} + \epsilon = \alpha + \beta x + \epsilon

[Figure: Regression plot showing intercept \alpha, slope \beta (rise per unit run), and an observed y lying a distance \epsilon from the population line]
Assumptions of the Simple Linear Regression Model

• The relationship between X and Y is a straight-Line (linear) relationship.
• The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term \epsilon.
• The errors \epsilon are uncorrelated (i.e. Independent) in successive observations.
• The errors \epsilon are Normally distributed with mean 0 and variance \sigma^2 (Equal variance). That is: \epsilon \sim N(0, \sigma^2).

[Figure: LINE assumptions of the Simple Linear Regression Model: identical normal distributions of errors, N(\mu_{y|x}, \sigma_{y|x}^2), all centered on the regression line \mu_{y|x} = \alpha + \beta x]
Fitting a Regression Line

[Figure: Four panels: the data; three errors from a fitted line; three errors from the least squares regression line; errors from the least squares regression line are minimized]
Errors in Regression

[Figure: The fitted regression line \hat{y} = a + bx; for an observation (x_i, y_i), \hat{y}_i is the predicted value of Y at x_i, and the error is e_i = y_i - \hat{y}_i]
Sums of Squares, Cross Products, and Least Squares Estimators

Sums of squares and cross products:

l_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}

l_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}

l_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}

Least-squares regression estimators:

b = \frac{l_{xy}}{l_{xx}},    a = \bar{y} - b\bar{x},    \hat{y} = a + bx
Example

Patient    x      y        x²        y²        x·y
1         22.4   134.0    501.76   17956.0    3001.60
4         25.1    80.2    630.01    6432.0    2013.02
8         32.4    97.2   1049.76    9447.8    3149.28
2         51.6   167.0   2662.56   27889.0    8617.20
3         58.1   132.3   3375.61   17503.3    7686.63
5         65.9   100.0   4342.81   10000.0    6590.00
7         75.3   187.2   5670.09   35043.8   14096.16
6         79.7   139.1   6352.09   19348.8   11086.27
10        85.7   199.4   7344.49   39760.4   17088.58
9         96.4   192.3   9292.96   36979.3   18537.72
Total    592.6  1428.7  41222.14  220360.5   91866.46

l_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 41222.14 - \frac{592.6^2}{10} = 6104.66

l_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 220360.47 - \frac{1428.70^2}{10} = 16242.10

l_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 91866.46 - \frac{592.6 \times 1428.70}{10} = 7201.70

b = \frac{l_{xy}}{l_{xx}} = \frac{7201.70}{6104.66} = 1.18

a = \bar{y} - b\bar{x} = \frac{1428.7}{10} - 1.18 \times \frac{592.6}{10} = 72.96

Regression equation:  \hat{y} = 72.96 + 1.18x
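As a cross-check on the hand computation, here is a minimal hand-rolled least-squares sketch in Python (no statistics library; the helper name is ours). On the ten patients above it recovers a ≈ 72.96 and b ≈ 1.18:

```python
def least_squares_fit(x, y):
    """Least-squares intercept a and slope b via b = l_xy / l_xx."""
    n = len(x)
    lxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    lxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    b = lxy / lxx                       # slope
    a = sum(y) / n - b * sum(x) / n     # intercept: a = y-bar - b * x-bar
    return a, b

x = [22.4, 25.1, 32.4, 51.6, 58.1, 65.9, 75.3, 79.7, 85.7, 96.4]
y = [134.0, 80.2, 97.2, 167.0, 132.3, 100.0, 187.2, 139.1, 199.4, 192.3]
a, b = least_squares_fit(x, y)
print(round(a, 2), round(b, 2))  # 72.96 1.18, matching the slide
```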
Linear Regression - Variation

SST = SSR + SSE

SSR: variation due to the regression (explained).
SSE: random/unexplained variation.
Linear Regression - Variation

SST = \sum (Y_i - \bar{Y})^2
SSR = \sum (\hat{Y}_i - \bar{Y})^2
SSE = \sum (Y_i - \hat{Y}_i)^2

[Figure: Decomposition of the deviation of Y_i from \bar{Y} into the explained part (\hat{Y}_i - \bar{Y}) and the residual (Y_i - \hat{Y}_i)]
Contents of correlation and linear regression
• Correlation
• Introduction to simple linear regression
• Least-squares estimation of the parameters
Introduction
• Correlation and regression – for quantitative variables
  – Correlation: assessing the association between quantitative variables
  – Simple linear regression: description and prediction of one quantitative variable from another
• Only considering linear relationships
• When considering correlation or carrying out a regression analysis between two variables, always plot the data on a scatter plot first
Scatter plot

[Figure: example scatter plots]

Pearson Correlation Coefficient

[Figure: the Pearson correlation coefficient formula and illustrations]

Correlation – Linear Relationship

[Figures: linear relationships of varying direction and strength]
Correlation Does Not Imply Causation
• Correlation does not mean causation
• If we observe high correlation between two variables, this does
not necessarily imply that because one variable has a high
value it causes the other to have a high value
• There may be a third variable causing a simultaneous change in
both variables.
• Example:
– Suppose we measured children’s shoe size and reading skills
– There would be a high correlation between these two variables, as
the shoe size increases so too do the child’s reading abilities
– But one does not cause the other, the underlying variable is age
– As age increases so too does shoes size and reading ability

Example: Percentage of children immunized against DPT and under-five mortality rate for 20 countries, 1992

Nation        % immunized   Mortality/1000   Nation           % immunized   Mortality/1000
Bolivia       77            118              Greece           54              9
Brazil        69             65              India            89            124
Cambodia      32            184              Italy            95             10
Canada        85              8              Japan            87              6
China         94             43              Mexico           91             33
Czech Rep.    99             12              Poland           98             16
Egypt         89             55              Russian Fed.     73             32
Ethiopia      13            208              Senegal          47            145
Finland       95              7              Turkey           76             87
France        95              9              United Kingdom   90              9
Nation     xi    yi    xi - x̄   yi - ȳ     Nation      xi    yi    xi - x̄   yi - ȳ
Bolivia    77    118   -0.4      59         Greece      54      9
Brazil     69     65                        India       89    124
Cambodia   32    184                        Italy       95     10
Canada     85      8                        Japan       87      6
China      94     43                        Mexico      91     33
Czech      99     12                        Poland      98     16
Egypt      89     55                        Russia      73     32
Ethiopia   13    208                        Senegal     47    145
Finland    95      7                        Turkey      76     87
France     95      9                        United K.   90      9
Mean       77.4   59
Non-Parametric Correlation
• Rank correlation may be used whatever type of pattern is seen in the scatter diagram; it doesn't specifically assess linear association, but more general association
• Spearman's rank correlation rho
  – Non-parametric measure of correlation – doesn't make any assumptions about the particular nature of the relationship between the variables, doesn't assume a linear relationship
  – rho is a special case of Pearson's r in which the two sets of data are converted to rankings
  – can test the null hypothesis that the correlation is zero and calculate confidence intervals
Formula

r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
Linear regression
• Is used to explore the nature of the relationship between two "continuous" normally distributed random variables.
• Enables us to investigate the change in the response variable that corresponds to a given change in the explanatory variable.
• The ultimate objective of regression analysis is to predict or estimate the value of the response that is associated with a fixed value of the explanatory variable.
Example: Cigarettes & coronary heart disease

IV = Cigarette consumption;  DV = Coronary Heart Disease (CHD)

• IV = X = Average no. of cigarettes per adult per day
• DV = Y = Coronary Heart Disease mortality (rate of deaths per 10,000 per year due to CHD)
• Unit of analysis = Country
• How fast does CHD mortality rise with a one-unit increase in smoking?
Data

Cigarettes   CHD   Cigarettes   CHD
11           26    5             4
9            21    5            18
9            24    5            12
9            21    5             3
8            19    4            11
8            13    4            15
8            19    4             6
6            11    3            13
6            23    3             4
5            15    3            14
5            13

[Figure: Scatterplot of CHD mortality against cigarette consumption, with the line of best fit]
Simple linear regression
• It is a model with a single regressor x that has a linear relationship with a response y.
• The simple linear regression model is

    y = \beta_0 + \beta_1 x + \varepsilon

• Where:
  – y = response variable        – \beta_1 = slope
  – x = regressor variable       – \varepsilon = random error component
  – \beta_0 = intercept
• X is
  – a controlled variable, not a random variable
  – a deterministic or mathematical variable
• Y
  – is a random variable and can't be controlled
  – depends on the regressor variable
Basic assumptions on the model

    y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,   i = 1, ..., n

1. \varepsilon_i is a random variable with zero mean and (unknown) variance \sigma^2, i.e. E(\varepsilon_i) = 0 and V(\varepsilon_i) = \sigma^2.
2. \varepsilon_i and \varepsilon_j are uncorrelated for i \neq j, so Cov(\varepsilon_i, \varepsilon_j) = 0.
3. \varepsilon_i is a normally distributed random variable with mean zero and variance \sigma^2: \varepsilon_i \sim N(0, \sigma^2).

The part of the model without the error is not a random variable; it is the true population mean value, \beta_0 + \beta_1 x_i (not random because it contains no error). Adding the error to it gives the actual or observed value. Thus the conditional mean is E(y|x) = \beta_0 + \beta_1 x_i, while the observed value is y_i = \beta_0 + \beta_1 x_i + \varepsilon_i; the two agree on average because E(\varepsilon) = 0.
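To make assumptions 1 to 3 concrete, here is a small simulation sketch in Python with NumPy; the parameter values β0 = 2, β1 = 1.5, σ = 3 are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1, sigma = 2.0, 1.5, 3.0        # hypothetical values, for illustration only

x = np.linspace(0, 10, 50)                 # X is held fixed, not random
eps = rng.normal(0.0, sigma, size=x.size)  # independent N(0, sigma^2) errors
y = beta0 + beta1 * x + eps                # observed y = systematic part + error

# beta0 + beta1 * x is the (non-random) conditional mean E(y|x);
# all of the randomness in y enters through eps.
```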
Estimation: The Method of Least Squares

Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.

The estimated regression equation:

    y = a + bx + e

where a estimates the intercept of the population regression line, \alpha; b estimates the slope of the population regression line, \beta; and e stands for the observed errors: the residuals from fitting the estimated regression line a + bx to a set of n points.

The estimated regression line:

    \hat{y} = a + bx

where \hat{y} (y-hat) is the value of Y lying on the fitted regression line for a given value of X.
Fitting a Regression Line

[Figure: Four panels: the data; three errors from a fitted line; three errors from the least squares regression line; errors from the least squares regression line are minimized]

The parameters \beta_0 and \beta_1 are unknown and must be estimated using sample data: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).

The line fitted by least squares is the one that makes the sum of squares of all vertical discrepancies as small as possible.
[Figure: Scatterplot of CHD mortality per 10,000 against cigarette consumption per adult per day, built up over several slides: the fitted line \hat{Y} = \beta_0 + \beta_1 x, the observed point (x_9, y_9), its prediction (x_9, \hat{y}_9), and the residual \varepsilon_9 = y_9 - \hat{y}_9]
General

[Figure: Observation y_i, fitted value \hat{y}_i, and error e_i = y_i - \hat{y}_i at x_i]

Least squares estimation minimizes

    SS_{residuals} = \sum_{i=1}^{n} \varepsilon_i^2
Now we can estimate the parameters (\beta_0 and \beta_1), because the sum of squares of all the differences between the observations y_i and the fitted line is a minimum.
• The least squares estimators of \beta_0 and \beta_1 must satisfy the following two normal equations:

    \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0

    \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0

• We have two normal equations and two unknowns, and they are independent; therefore we can uniquely fit \beta_0 and \beta_1.

• So the estimators are the solution of these equations:

    b_1 = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}},    b_0 = \bar{y} - b_1 \bar{x}
Regression Statistics

SST = \sum (Y - \bar{Y})^2
SSR = \sum (\hat{Y} - \bar{Y})^2
SSE = \sum (Y - \hat{Y})^2

SST = SSR + SSE

[Figure: Venn diagram: the variance to be explained by predictors (SST) splits into the variance explained by X1 (SSR) and the variance NOT explained by X1 (SSE)]
Regression Statistics

R^2 = \frac{SSR}{SST}

Coefficient of Determination: used to judge the adequacy of the regression model.

R = \sqrt{R^2} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}

Correlation measures the strength of the linear association between two variables.
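A short Python sketch of this decomposition (the helper name r_squared is ours), applied to the cigarettes/CHD data from the Data slide; it confirms SST = SSR + SSE and gives R² ≈ 0.51, matching the explained-variance slide later on:

```python
def r_squared(x, y):
    """R^2 = SSR / SST for a simple least-squares fit (hand-rolled sketch)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    y_hat = [a + b * xi for xi in x]
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((yh - ybar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    assert abs(sst - (ssr + sse)) < 1e-9 * sst  # SST = SSR + SSE
    return ssr / sst

# Cigarettes/CHD data from the earlier Data slide:
x = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
y = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]
print(round(r_squared(x, y), 2))  # ~0.51
```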
Regression Statistics

Standard error for the regression model:

S_e = \sqrt{S_e^2} = \sqrt{\hat{\sigma}_e^2}

S_e^2 = \frac{SSE}{n - 2},   where   SSE = \sum (Y - \hat{Y})^2

S_e^2 = MSE
ANOVA

H_0: \beta_1 = 0
H_A: \beta_1 \neq 0

Source        df              SS     MS         F_cal       P-value
Regression    k - 1 = 1       SSR    SSR / df   MSR / MSE   P(F)
Residual      n - k = n - 2   SSE    SSE / df
Total         n - 1           SST

(k = 2 estimated parameters in the simple linear model)

If P(F) < \alpha, then we know that we get significantly better prediction of Y from the regression model than by just predicting the mean of Y. The ANOVA is used to test the significance of the regression model.
Hypothesis Tests for Regression Coefficients

H_0: \beta_i = 0
H_1: \beta_i \neq 0

t_{(n-k-1)} = \frac{b_i - \beta_i}{S_{b_i}}
Hypothesis Tests for Regression Coefficients

H_0: \beta_1 = 0
H_A: \beta_1 \neq 0

t_{(n-k-1)} = \frac{b_1 - \beta_1}{S_e(b_1)} = \frac{b_1 - \beta_1}{\sqrt{\frac{S_e^2}{S_{xx}}}}
Confidence Interval on Regression Coefficients of the Linear Model

b_1 - t_{\alpha/2,(n-k-1)} \sqrt{\frac{S_e^2}{S_{xx}}} \le \beta_1 \le b_1 + t_{\alpha/2,(n-k-1)} \sqrt{\frac{S_e^2}{S_{xx}}}

Confidence interval for \beta_1.
Hypothesis Tests on Regression Coefficients

H_0: \beta_0 = 0
H_A: \beta_0 \neq 0

t_{(n-k-1)} = \frac{b_0 - \beta_0}{S_e(b_0)} = \frac{b_0 - \beta_0}{\sqrt{S_e^2 \left(\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}\right)}}
Confidence Interval on Regression Coefficients

b_0 - t_{\alpha/2,(n-k-1)} \sqrt{S_e^2\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}\right)} \le \beta_0 \le b_0 + t_{\alpha/2,(n-k-1)} \sqrt{S_e^2\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}\right)}

Confidence interval for the intercept.
Hypothesis Test on the Correlation Coefficient

Using a t-test:

H_0: \rho = 0
H_A: \rho \neq 0

T_0 = \frac{R\sqrt{n - 2}}{\sqrt{1 - R^2}}

We would reject the null hypothesis if |t_0| > t_{\alpha/2, n-2}.
Diagnostic Tests For Regressions

[Figure: Expected distribution of residuals (\varepsilon_i against \hat{Y}_i) for a linear model with a normal distribution of residuals (errors)]

[Figure: Residuals for a non-linear fit]

[Figure: Residuals for a quadratic function or polynomial]

[Figure: Residuals that are not homogeneous (increasing in variance)]
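Such diagnostic plots are easy to produce. A minimal sketch with matplotlib (assumed available; the helper name is ours) that fits by least squares and plots residuals against fitted values; look for curvature (non-linearity) or a funnel shape (unequal variance):

```python
import matplotlib.pyplot as plt  # assumed available

def residual_plot(x, y):
    """Plot residuals against fitted values from a least-squares fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    fitted = [a + b * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    plt.scatter(fitted, residuals)
    plt.axhline(0, linestyle="--")  # a patternless band around 0 is what we hope to see
    plt.xlabel("Fitted values")
    plt.ylabel("Residuals")
    plt.show()
```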
Regression – important points

1. Ensure that the range of values sampled for the predictor variable is large enough to capture the full range of responses by the response variable. This means that the range of the sampled variable should be wide enough to accommodate all values of the response variable.

[Figure: Two scatter plots contrasting a narrow sampled range of X with a sufficiently wide one]
Regression – important points

2. Ensure that the distribution of predictor values is approximately uniform within the sampled range.

[Figure: Two scatter plots contrasting uneven and approximately uniform coverage of X]
Readings

• Howell (2004) – Fundamentals – Regression (Ch 10)
• Howell (2007) – Methods – Correlation & Regression (Ch 9)
• Francis (2004) – Relationships Between Metric Variables – Section 3.1
Linear Regression Assumptions
• The values of the dependent variable Y should be Normally distributed for each value of the independent variable X (needed for hypothesis testing and confidence intervals)
• The variability of Y (variance or standard deviation) should be the same for each value of X (homoscedasticity)
• The relationship between the two variables should be linear
• The observations should be independent or uncorrelated
• Both variables do not have to be random: the values of X do not have to be random, and they don't have to be Normally distributed
Cigarettes (xi)  CHD (yi)  xi - x̄   yi - ȳ     Cigarettes (xi)  CHD (yi)  xi - x̄   yi - ȳ
11               26         5.05     11.48      5                 4        -0.95    -10.52
9                21         3.05      6.48      5                18        -0.95      3.48
9                24         3.05      9.48      5                12        -0.95     -2.52
9                21         3.05      6.48      5                 3        -0.95    -11.52
8                19         2.05      4.48      4                11        -1.95     -3.52
8                13         2.05     -1.52      4                15        -1.95      0.48
8                19         2.05      4.48      4                 6        -1.95     -8.52
6                11         0.05     -3.52      3                13        -2.95     -1.52
6                23         0.05      8.48      3                 4        -2.95    -10.52
5                15        -0.95      0.48      3                14        -2.95     -0.52
5                13        -0.95     -1.52
Mean: x̄ = 5.95, ȳ = 14.52
Making a prediction
• Assume that we want to predict CHD mortality when cigarette consumption is 6.

    \hat{Y} = bX + a = 2.04X + 2.37

    \hat{Y} = 2.04 \times 6 + 2.37 = 14.61

• We predict that 14.61 people per 10,000 in that country will die of coronary heart disease.
Accuracy of prediction
• Finnish smokers smoke 6 cigarettes/adult/day
• We predict 14.61 deaths/10,000
• They actually observed 23 deaths/10,000
• Our error ("residual") = 23 - 14.61 = 8.39

[Figure: Scatterplot of CHD mortality per 10,000 against cigarette consumption per adult per day, marking the prediction at X = 6 and the residual for Finland]
Errors of prediction
• Residual variance
  – The variability of the observed values around the predicted values

    s^2_{Y \cdot \hat{Y}} = \frac{\sum (Y - \hat{Y})^2}{N - 2}

• Standard error of estimate
  – The standard deviation of the observed values around the predicted values

    s_{Y \cdot \hat{Y}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N - 2}}

• A common measure of the accuracy of our predictions
  – We want it to be as small as possible
  – It has an inverse relationship to r^2 (i.e., when r^2 is large, the standard error of the estimate will be small, and vice versa)
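A minimal sketch of the standard error of estimate in Python (the helper name is ours), following the formula above:

```python
import math

def standard_error_of_estimate(x, y):
    """s = sqrt(sum((Y - Yhat)^2) / (N - 2)): the typical size of a residual."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))
```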
Explained variance
• r = .71
• r^2 = .71^2 = .51
• Approximately 50% of the variability in the incidence of CHD mortality is associated with variability in smoking.
Residuals
Regression Coefficient
• Regression coefficient:
  – this is the slope of the regression line
  – indicates the strength of the relationship between the two variables
  – interpreted as the expected change in y for a one-unit change in x
  – can calculate a standard error for the regression coefficient
  – can calculate a confidence interval for the coefficient
  – can test the hypothesis that b = 0, i.e., that there is no relationship between the two variables
Intercept
• Intercept:
  – the estimated intercept a gives the value of y that is expected when x = 0
  – often not very useful, as in many situations it may not be realistic or relevant to consider x = 0
  – it is possible to get a confidence interval and to test the null hypothesis that the intercept is zero, and most statistical packages will report these
Coefficient of Determination, R-Squared
• The coefficient of determination or R-squared is the amount of variability in the data set that is explained by the statistical model
  – Often expressed as a percentage
  – A high R-squared says that the majority of the variability in the data is explained by the model (good!)
• Used as a measure of how good predictions from the model will be
• In linear regression, R-squared is the square of the correlation coefficient
• The regression analysis can be displayed as an ANOVA table; many statistical packages present the regression analysis in this format
Adjusted R-Squared
• Adjusted R-squared
  – Sometimes an adjusted R-squared will be presented in the output as well as the R-squared
  – Adjusted R-squared is a modification to the R-squared to compensate for the number of explanatory or predictor variables in the model (more relevant when considering multiple regression)
  – The adjusted R-squared will only increase if the addition of the new predictor improves the model more than would be expected by chance
Interpolation and Extrapolation
• Interpolation
  – Making a prediction for Y within the range of values of the predictor X in the sample used in the analysis
  – Generally this is fine
• Extrapolation
  – Making a prediction for Y outside the range of values of the predictor X in the sample used in the analysis
  – No way to check linearity outside the range of values sampled; not a good idea to predict outside this range
