Lecture 7 - Regression

Uploaded by Dr. Anis Fatima

Course contact information

• Course Provider: Dr Anis Fatima
• Office: Room 5, Second Floor IM Building
• Phone Ext: 2250
• Email: [email protected]
• OneDrive link: https://1drv.ms/f/s!AkgKqDvMcQJRgjeZOGE0fgYluSRX
Books and lecture notes

Correlation vs. Regression
• A scatter diagram can be used to show the relationship between two variables
• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
– Correlation is concerned only with the strength of the relationship
– No causal effect is implied by correlation
– Scatter diagrams and correlation were first presented earlier
Introduction to Regression Analysis
• Regression analysis is used to:
– Predict the value of a dependent variable based on the value of at least one independent variable
– Explain the impact of changes in an independent variable on the dependent variable
• Dependent variable: the variable we wish to predict or explain
• Independent variable: the variable used to explain the dependent variable
Simple Linear Regression Model
• Only one independent variable, X
• The relationship between X and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in X
Types of Relationships
[Scatter plots: linear relationships (positive and negative slope) vs. curvilinear relationships]

Types of Relationships (continued)
[Scatter plots: strong relationships vs. weak relationships]

Types of Relationships (continued)
[Scatter plot: no relationship]
Simple Linear Regression Model

Yᵢ = β₀ + β₁Xᵢ + εᵢ

where Yᵢ is the dependent variable, β₀ is the population Y-intercept, β₁ is the population slope coefficient, Xᵢ is the independent variable, and εᵢ is the random error term. β₀ + β₁Xᵢ is the linear component; εᵢ is the random error component.
Simple Linear Regression Model (continued)

[Diagram: individual observations around the true regression line Yᵢ = β₀ + β₁Xᵢ + εᵢ, showing the observed value of Y for Xᵢ, the predicted value of Y for Xᵢ, the random error εᵢ for this Xᵢ value, slope = β₁, and intercept = β₀]
Simple Linear Regression
• Simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.
• At each level of x, Y is a random variable with expected value

E(Y|x) = β₀ + β₁x

• We assume that each observation Y can be described by the model

Y = β₀ + β₁x + ε
Simple Linear Regression: Least Squares Estimates
The least-squares estimates of the intercept and slope in the simple linear regression model are

β̂₀ = ȳ − β̂₁x̄   (11-1)

β̂₁ = [Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n] / [Σxᵢ² − (Σxᵢ)²/n]   (11-2)

where the sums run over i = 1, …, n, ȳ = (1/n)Σyᵢ, and x̄ = (1/n)Σxᵢ.
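As an illustrative sketch, the closed-form estimates (11-1) and (11-2) can be computed directly from running sums; the function name and variable names here are our own, not part of the lecture.

```python
def least_squares(x, y):
    """Least-squares estimates for simple linear regression,
    using the summation forms of Equations (11-1) and (11-2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Equation (11-2): slope from corrected sums of cross-products and squares
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Equation (11-1): intercept from the means and the slope
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Points lying exactly on y = 1 + 2x recover b0 = 1, b1 = 2
b0, b1 = least_squares([1, 2, 3], [3, 5, 7])
```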
Simple Linear Regression: Notation

Sₓₓ = Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)²/n

Sₓᵧ = Σ(xᵢ − x̄)(yᵢ − ȳ) = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n
11-2: Simple Linear Regression
The fitted or estimated regression line is therefore

ŷ = β̂₀ + β̂₁x   (11-3)

Note that each pair of observations satisfies the relationship

yᵢ = β̂₀ + β̂₁xᵢ + eᵢ,  i = 1, 2, …, n

where eᵢ = yᵢ − ŷᵢ is called the residual. The residual describes the error in the fit of the model to the ith observation yᵢ.
Example

Oxygen Purity
We will fit a simple linear regression model to the oxygen purity data in Table 11-1. The following quantities may be computed:

n = 20   Σxᵢ = 23.92   Σyᵢ = 1,843.21
x̄ = 1.1960   ȳ = 92.1605
Σyᵢ² = 170,044.5321   Σxᵢ² = 29.2892
Σxᵢyᵢ = 2,214.6566

Sₓₓ = Σxᵢ² − (Σxᵢ)²/20 = 29.2892 − (23.92)²/20 = 0.68088

and

Sₓᵧ = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/20 = 2,214.6566 − (23.92)(1,843.21)/20 = 10.17744
Oxygen Purity - continued
Therefore, the least squares estimates of the slope and intercept are

β̂₁ = Sₓᵧ/Sₓₓ = 10.17744/0.68088 = 14.94748

and

β̂₀ = ȳ − β̂₁x̄ = 92.1605 − (14.94748)(1.196) = 74.28331

The fitted simple linear regression model (with the coefficients reported to three decimal places) is

ŷ = 74.283 + 14.947x
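As a check, the fit above can be reproduced in a few lines from the summary sums listed for the oxygen purity data; this is a sketch, not the lecture's own computation.

```python
# Reproduce the oxygen purity fit from the summary sums in Table 11-1
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_xy = 29.2892, 2214.6566

sxx = sum_x2 - sum_x ** 2 / n      # corrected sum of squares, 0.68088
sxy = sum_xy - sum_x * sum_y / n   # corrected cross-products, 10.17744
b1 = sxy / sxx                     # slope estimate, about 14.94748
b0 = sum_y / n - b1 * sum_x / n    # intercept estimate, about 74.28331
```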
Simple Linear Regression: Estimating σ²
The error sum of squares is

SS_E = Σeᵢ² = Σ(yᵢ − ŷᵢ)²

It can be shown that the expected value of the error sum of squares is E(SS_E) = (n − 2)σ².
Simple Linear Regression: Estimating σ² (continued)
An unbiased estimator of σ² is

σ̂² = SS_E/(n − 2)   (11-4)

where SS_E can be easily computed using

SS_E = SS_T − β̂₁Sₓᵧ   (11-5)
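Equations (11-4) and (11-5) can be sketched numerically with the oxygen purity sums, using the standard identity SS_T = Σyᵢ² − (Σyᵢ)²/n for the total corrected sum of squares; the value σ̂² ≈ 1.18 agrees with the one quoted later in the confidence-interval example.

```python
# Estimate sigma^2 for the oxygen purity data via Equations (11-4) and (11-5)
n = 20
sum_y, sum_y2 = 1843.21, 170044.5321
b1, sxy = 14.94748, 10.17744

sst = sum_y2 - sum_y ** 2 / n   # total corrected sum of squares, about 173.38
sse = sst - b1 * sxy            # Equation (11-5)
sigma2_hat = sse / (n - 2)      # Equation (11-4), about 1.18
```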
Properties of the Least Squares Estimators

• Slope: E(β̂₁) = β₁ and V(β̂₁) = σ²/Sₓₓ

• Intercept: E(β̂₀) = β₀ and V(β̂₀) = σ²[1/n + x̄²/Sₓₓ]
Confidence Intervals
11-5.1 Confidence Intervals on the Slope and Intercept
Definition
Under the assumption that the observations are normally and independently distributed, a 100(1 − α)% confidence interval on the slope β₁ in simple linear regression is

β̂₁ − t₍α/2, n−2₎ √(σ̂²/Sₓₓ) ≤ β₁ ≤ β̂₁ + t₍α/2, n−2₎ √(σ̂²/Sₓₓ)   (11-11)

Similarly, a 100(1 − α)% confidence interval on the intercept β₀ is

β̂₀ − t₍α/2, n−2₎ √(σ̂²[1/n + x̄²/Sₓₓ]) ≤ β₀ ≤ β̂₀ + t₍α/2, n−2₎ √(σ̂²[1/n + x̄²/Sₓₓ])   (11-12)
Confidence Intervals
EXAMPLE: Oxygen Purity Confidence Interval on the Slope
We will find a 95% confidence interval on the slope of the regression line using the data in Example 11-1. Recall that β̂₁ = 14.947, Sₓₓ = 0.68088, and σ̂² = 1.18 (see Table 11-2). Then, from Equation 11-11 we find

β̂₁ − t₀.₀₂₅,₁₈ √(σ̂²/Sₓₓ) ≤ β₁ ≤ β̂₁ + t₀.₀₂₅,₁₈ √(σ̂²/Sₓₓ)

or

14.947 − 2.101 √(1.18/0.68088) ≤ β₁ ≤ 14.947 + 2.101 √(1.18/0.68088)

This simplifies to

12.181 ≤ β₁ ≤ 17.713

Practical Interpretation: This CI does not include zero, so there is strong evidence (at α = 0.05) that the slope is not zero. The CI is reasonably narrow (half-width 2.766) because the error variance is fairly small.
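The arithmetic above can be sketched directly; the critical value 2.101 is the tabulated t₀.₀₂₅,₁₈ quoted in the example (n − 2 = 18 degrees of freedom).

```python
import math

# 95% CI on the slope for the oxygen purity example (Equation 11-11)
b1_hat = 14.947
sxx = 0.68088
sigma2_hat = 1.18
t_crit = 2.101               # t_{0.025, 18} from a t-table

half_width = t_crit * math.sqrt(sigma2_hat / sxx)
lower = b1_hat - half_width  # about 12.181
upper = b1_hat + half_width  # about 17.713
```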
Adequacy of the Regression Model: Residual Analysis
• The residuals from a regression model are eᵢ = yᵢ − ŷᵢ, where yᵢ is an actual observation and ŷᵢ is the corresponding fitted value from the regression model.
• Analysis of the residuals is frequently helpful in checking the assumption that the errors are approximately normally distributed with constant variance, and in determining whether additional terms in the model would be useful.
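A residual computation can be sketched against the fitted oxygen purity line; the (x, y) observation below is a hypothetical stand-in for a row of Table 11-1, not taken from the lecture.

```python
# Residual e_i = y_i - y_hat_i for the fitted model y_hat = 74.283 + 14.947 x
b0, b1 = 74.283, 14.947

def residual(x, y):
    """Error in the fit of the model to one observation."""
    y_hat = b0 + b1 * x
    return y - y_hat

e = residual(1.00, 90.01)   # hypothetical observation; residual is about 0.78
```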
Adequacy of the Regression Model
EXAMPLE: Oxygen Purity Residuals
The regression model for the oxygen purity data in the example is ŷ = 74.283 + 14.947x.

Table 11-4 presents the observed and predicted values of y at each value of x from this data set, along with the corresponding residual. These values were computed using Minitab and show the number of decimal places typical of computer output.

A normal probability plot of the residuals is shown in Fig. 11-10. Since the residuals fall approximately along a straight line in the figure, we conclude that there is no severe departure from normality.

The residuals are also plotted against the predicted value ŷᵢ in Fig. 11-11 and against the hydrocarbon levels x in Fig. 11-12.
Adequacy of the Regression Model: Example

[Figure 11-10: Normal probability plot of residuals, Example 11-7.]

[Figure 11-11: Plot of residuals versus predicted oxygen purity, ŷ, Example 11-7.]
Adequacy of the Regression Model: Coefficient of Determination (R²)
• The quantity

R² = SS_R/SS_T = 1 − SS_E/SS_T

is called the coefficient of determination and is often used to judge the adequacy of a regression model.
• 0 ≤ R² ≤ 1
• We often refer (loosely) to R² as the amount of variability in the data explained or accounted for by the regression model.
Adequacy of the Regression Model: Coefficient of Determination (R²) (continued)
• For the oxygen purity regression model,

R² = SS_R/SS_T = 152.13/173.38 = 0.877

• Thus, the model accounts for 87.7% of the variability in the data.
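The R² figure can be sketched from quantities computed earlier; SS_R = β̂₁Sₓᵧ follows from Equation (11-5), since SS_T = SS_R + SS_E.

```python
# R^2 for the oxygen purity model, from quantities computed earlier
b1 = 14.94748
sxy = 10.17744
sst = 173.38

ssr = b1 * sxy   # regression sum of squares, about 152.13
r2 = ssr / sst   # about 0.877
```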
Some Useful Transformations to Linearize
[Diagrams depicting the functions listed in Table 11.6]

Data for Example
[Pressure and volume data and fitted regression]
Example
• A study was made on the amount of converted sugar in a certain process at various temperatures. The data were coded and recorded as follows:
• Estimate the linear regression line.
• Estimate the mean amount of converted sugar produced when the coded temperature is 1.75.
• Plot the residuals versus temperature. Comment.