Lecture 7 - Regression

Uploaded by Dr. Anis Fatima

Course contact information

• Course Provider: Dr Anis Fatima
• Office: Room 5, Second Floor IM Building
• Phone Ext: 2250
• Email: [email protected]
• OneDrive link: https://1drv.ms/f/s!AkgKqDvMcQJRgjeZOGE0fgYluSRX
Books and lecture notes

Correlation vs. Regression
• A scatter diagram can be used to show the relationship between two variables
• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
– Correlation is concerned only with the strength of the relationship
– No causal effect is implied by correlation
– Scatter diagrams and correlation were first presented earlier
Introduction to Regression Analysis
• Regression analysis is used to:
– Predict the value of a dependent variable based on the value of at least one independent variable
– Explain the impact of changes in an independent variable on the dependent variable
• Dependent variable: the variable we wish to predict or explain
• Independent variable: the variable used to explain the dependent variable
Simple Linear Regression Model
• Only one independent variable, X
• The relationship between X and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in X
Types of Relationships
[Scatter plots: linear relationships (positive and negative slope) vs. curvilinear relationships]

Types of Relationships (continued)
[Scatter plots: strong relationships vs. weak relationships]

Types of Relationships (continued)
[Scatter plot: no relationship]
Simple Linear Regression Model

Yᵢ = β₀ + β₁Xᵢ + εᵢ

where Yᵢ is the dependent variable, β₀ is the population Y-intercept, β₁ is the population slope coefficient, Xᵢ is the independent variable, and εᵢ is the random error term. β₀ + β₁Xᵢ is the linear component; εᵢ is the random error component.
Simple Linear Regression Model (continued)

[Diagram: individual observations around the true regression line Yᵢ = β₀ + β₁Xᵢ + εᵢ, showing the observed value of Y for Xᵢ, the predicted value of Y for Xᵢ, the random error εᵢ for this Xᵢ value, slope = β₁, and intercept = β₀]
Simple Linear Regression
• Simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.
• At each level of x, Y is a random variable with expected value

E(Y|x) = β₀ + β₁x

• We assume that each observation Y can be described by the model

Y = β₀ + β₁x + ε
Simple Linear Regression: Least Squares Estimates
The least-squares estimates of the intercept and slope in the simple linear regression model are

β̂₀ = ȳ − β̂₁x̄   (11-1)

β̂₁ = [Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n] / [Σxᵢ² − (Σxᵢ)²/n]   (11-2)

where the sums run over i = 1, …, n, ȳ = (1/n)Σyᵢ, and x̄ = (1/n)Σxᵢ.
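As an illustrative sketch, the closed-form estimates (11-1) and (11-2) can be computed directly from running sums; the function name and variable names here are our own, not part of the lecture.

```python
def least_squares(x, y):
    """Least-squares estimates for simple linear regression,
    using the summation forms of Equations (11-1) and (11-2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Equation (11-2): slope from corrected sums of cross-products and squares
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Equation (11-1): intercept from the means and the slope
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Points lying exactly on y = 1 + 2x recover b0 = 1, b1 = 2
b0, b1 = least_squares([1, 2, 3], [3, 5, 7])
```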
Simple Linear Regression: Notation

Sₓₓ = Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)²/n

Sₓᵧ = Σ(xᵢ − x̄)(yᵢ − ȳ) = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n
11-2: Simple Linear Regression
The fitted or estimated regression line is therefore

ŷ = β̂₀ + β̂₁x   (11-3)

Note that each pair of observations satisfies the relationship

yᵢ = β̂₀ + β̂₁xᵢ + eᵢ,  i = 1, 2, …, n

where eᵢ = yᵢ − ŷᵢ is called the residual. The residual describes the error in the fit of the model to the ith observation yᵢ.
Example

Oxygen Purity
We will fit a simple linear regression model to the oxygen purity data in Table 11-1. The following quantities may be computed:

n = 20   Σxᵢ = 23.92   Σyᵢ = 1,843.21
x̄ = 1.1960   ȳ = 92.1605
Σyᵢ² = 170,044.5321   Σxᵢ² = 29.2892
Σxᵢyᵢ = 2,214.6566

Sₓₓ = Σxᵢ² − (Σxᵢ)²/20 = 29.2892 − (23.92)²/20 = 0.68088

and

Sₓᵧ = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/20 = 2,214.6566 − (23.92)(1,843.21)/20 = 10.17744
Oxygen Purity - continued
Therefore, the least squares estimates of the slope and intercept are

β̂₁ = Sₓᵧ/Sₓₓ = 10.17744/0.68088 = 14.94748

and

β̂₀ = ȳ − β̂₁x̄ = 92.1605 − (14.94748)(1.196) = 74.28331

The fitted simple linear regression model (with the coefficients reported to three decimal places) is

ŷ = 74.283 + 14.947x
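As a check, the fit above can be reproduced in a few lines from the summary sums listed for the oxygen purity data; this is a sketch, not the lecture's own computation.

```python
# Reproduce the oxygen purity fit from the summary sums in Table 11-1
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_xy = 29.2892, 2214.6566

sxx = sum_x2 - sum_x ** 2 / n      # corrected sum of squares, 0.68088
sxy = sum_xy - sum_x * sum_y / n   # corrected cross-products, 10.17744
b1 = sxy / sxx                     # slope estimate, about 14.94748
b0 = sum_y / n - b1 * sum_x / n    # intercept estimate, about 74.28331
```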
Simple Linear Regression: Estimating σ²
The error sum of squares is

SS_E = Σeᵢ² = Σ(yᵢ − ŷᵢ)²

It can be shown that the expected value of the error sum of squares is E(SS_E) = (n − 2)σ².
Simple Linear Regression: Estimating σ² (continued)
An unbiased estimator of σ² is

σ̂² = SS_E/(n − 2)   (11-4)

where SS_E can be easily computed using

SS_E = SS_T − β̂₁Sₓᵧ   (11-5)
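Equations (11-4) and (11-5) can be sketched numerically with the oxygen purity sums, using the standard identity SS_T = Σyᵢ² − (Σyᵢ)²/n for the total corrected sum of squares; the value σ̂² ≈ 1.18 agrees with the one quoted later in the confidence-interval example.

```python
# Estimate sigma^2 for the oxygen purity data via Equations (11-4) and (11-5)
n = 20
sum_y, sum_y2 = 1843.21, 170044.5321
b1, sxy = 14.94748, 10.17744

sst = sum_y2 - sum_y ** 2 / n   # total corrected sum of squares, about 173.38
sse = sst - b1 * sxy            # Equation (11-5)
sigma2_hat = sse / (n - 2)      # Equation (11-4), about 1.18
```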
Properties of the Least Squares Estimators

• Slope: E(β̂₁) = β₁ and V(β̂₁) = σ²/Sₓₓ

• Intercept: E(β̂₀) = β₀ and V(β̂₀) = σ²[1/n + x̄²/Sₓₓ]
Confidence Intervals
11-5.1 Confidence Intervals on the Slope and Intercept
Definition
Under the assumption that the observations are normally and independently distributed, a 100(1 − α)% confidence interval on the slope β₁ in simple linear regression is

β̂₁ − t₍α/2, n−2₎ √(σ̂²/Sₓₓ) ≤ β₁ ≤ β̂₁ + t₍α/2, n−2₎ √(σ̂²/Sₓₓ)   (11-11)

Similarly, a 100(1 − α)% confidence interval on the intercept β₀ is

β̂₀ − t₍α/2, n−2₎ √(σ̂²[1/n + x̄²/Sₓₓ]) ≤ β₀ ≤ β̂₀ + t₍α/2, n−2₎ √(σ̂²[1/n + x̄²/Sₓₓ])   (11-12)
Confidence Intervals
EXAMPLE: Oxygen Purity Confidence Interval on the Slope
We will find a 95% confidence interval on the slope of the regression line using the data in Example 11-1. Recall that β̂₁ = 14.947, Sₓₓ = 0.68088, and σ̂² = 1.18 (see Table 11-2). Then, from Equation 11-11 we find

β̂₁ − t₀.₀₂₅,₁₈ √(σ̂²/Sₓₓ) ≤ β₁ ≤ β̂₁ + t₀.₀₂₅,₁₈ √(σ̂²/Sₓₓ)

or

14.947 − 2.101 √(1.18/0.68088) ≤ β₁ ≤ 14.947 + 2.101 √(1.18/0.68088)

This simplifies to

12.181 ≤ β₁ ≤ 17.713

Practical Interpretation: This CI does not include zero, so there is strong evidence (at α = 0.05) that the slope is not zero. The CI is reasonably narrow (half-width 2.766) because the error variance is fairly small.
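The arithmetic above can be sketched directly; the critical value 2.101 is the tabulated t₀.₀₂₅,₁₈ quoted in the example (n − 2 = 18 degrees of freedom).

```python
import math

# 95% CI on the slope for the oxygen purity example (Equation 11-11)
b1_hat = 14.947
sxx = 0.68088
sigma2_hat = 1.18
t_crit = 2.101               # t_{0.025, 18} from a t-table

half_width = t_crit * math.sqrt(sigma2_hat / sxx)
lower = b1_hat - half_width  # about 12.181
upper = b1_hat + half_width  # about 17.713
```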
Adequacy of the Regression Model: Residual Analysis
• The residuals from a regression model are eᵢ = yᵢ − ŷᵢ, where yᵢ is an actual observation and ŷᵢ is the corresponding fitted value from the regression model.
• Analysis of the residuals is frequently helpful in checking the assumption that the errors are approximately normally distributed with constant variance, and in determining whether additional terms in the model would be useful.
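A residual computation can be sketched against the fitted oxygen purity line; the (x, y) observation below is a hypothetical stand-in for a row of Table 11-1, not taken from the lecture.

```python
# Residual e_i = y_i - y_hat_i for the fitted model y_hat = 74.283 + 14.947 x
b0, b1 = 74.283, 14.947

def residual(x, y):
    """Error in the fit of the model to one observation."""
    y_hat = b0 + b1 * x
    return y - y_hat

e = residual(1.00, 90.01)   # hypothetical observation; residual is about 0.78
```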
Adequacy of the Regression Model
EXAMPLE: Oxygen Purity Residuals
The regression model for the oxygen purity data in the example is ŷ = 74.283 + 14.947x.

Table 11-4 presents the observed and predicted values of y at each value of x from this data set, along with the corresponding residual. These values were computed using Minitab and show the number of decimal places typical of computer output.

A normal probability plot of the residuals is shown in Fig. 11-10. Since the residuals fall approximately along a straight line in the figure, we conclude that there is no severe departure from normality.

The residuals are also plotted against the predicted value ŷᵢ in Fig. 11-11 and against the hydrocarbon levels x in Fig. 11-12.
Adequacy of the Regression Model: Example

[Figure 11-10: Normal probability plot of residuals, Example 11-7.]

[Figure 11-11: Plot of residuals versus predicted oxygen purity, ŷ, Example 11-7.]
Adequacy of the Regression Model: Coefficient of Determination (R²)
• The quantity

R² = SS_R/SS_T = 1 − SS_E/SS_T

is called the coefficient of determination and is often used to judge the adequacy of a regression model.
• 0 ≤ R² ≤ 1
• We often refer (loosely) to R² as the amount of variability in the data explained or accounted for by the regression model.
Adequacy of the Regression Model: Coefficient of Determination (R²) (continued)
• For the oxygen purity regression model,

R² = SS_R/SS_T = 152.13/173.38 = 0.877

• Thus, the model accounts for 87.7% of the variability in the data.
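The R² figure can be sketched from quantities computed earlier; SS_R = β̂₁Sₓᵧ follows from Equation (11-5), since SS_T = SS_R + SS_E.

```python
# R^2 for the oxygen purity model, from quantities computed earlier
b1 = 14.94748
sxy = 10.17744
sst = 173.38

ssr = b1 * sxy   # regression sum of squares, about 152.13
r2 = ssr / sst   # about 0.877
```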
Some Useful Transformations to Linearize
[Diagrams depicting the functions listed in Table 11.6]

Data for Example
[Pressure and volume data and fitted regression]
Example
• A study was made on the amount of converted sugar in a certain process at various temperatures. The data were coded and recorded as follows:
• Estimate the linear regression line.
• Estimate the mean amount of converted sugar produced when the coded temperature is 1.75.
• Plot the residuals versus temperature. Comment.