
BSC 311: Design and Analysis of Experiments First Semester 2021/22 Academic Year

Lecture 14: Regression Analysis

A. Introduction
 Regression analysis is the statistical method used when both the response variable and the
explanatory variable are continuous variables.
 A useful rule of thumb: regression is the appropriate analysis whenever a scatterplot is the
appropriate graphic.
 From the scatter plot, it is often possible to visualize a smooth curve that approximates the
data. Such a curve is called an approximating curve.
 If the data appear to be approximated well by a straight line, we say that a linear relationship
exists between the variables.
 If a relationship exists between two variables and the relationship is not linear, we call it a
nonlinear relationship.
 The general problem of finding equations of approximating curves that fit given sets of data
is called curve fitting.

B. Equations of approximating curves


 Several common types of approximating curves and their equations are listed below for
reference purposes.

No | Type of curve | Equation
1. Straight line: \(Y = \beta_0 + \beta_1 X\)
2. Parabola or quadratic curve: \(Y = \beta_0 + \beta_1 X + \beta_2 X^2\)
3. Cubic curve: \(Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3\)
4. Quartic curve: \(Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \beta_4 X^4\)
5. nth-degree curve: \(Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_n X^n\)

 The right sides of the above equations are called polynomials of the first, second, third, fourth
and nth degrees, respectively.
 The functions defined by the first four equations are sometimes called linear, quadratic,
cubic and quartic functions, respectively.
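 Where a scatterplot suggests curvature, any of these polynomials can be fitted by least squares. Below is a minimal Python sketch using NumPy's polyfit; the data are made up for illustration and are not from this lecture:

```python
import numpy as np

# Illustrative data only (hypothetical, not from the lecture).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 8.8, 16.1, 25.9, 38.2])

# np.polyfit returns the coefficients of an nth-degree polynomial,
# ordered from the highest power down to the intercept (beta_0 last).
for degree in (1, 2, 3):
    coefficients = np.polyfit(x, y, deg=degree)
    print(f"degree {degree}:", np.round(coefficients, 3))
```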


C. The simple linear regression model

 The simplest type of approximating curve is a straight line, whose equation can be written as
follows:

\[ Y = \beta_0 + \beta_1 X \]

where \(\beta_0\) is the y-intercept and \(\beta_1\) is the slope of the line that describes the relationship of Y and X.

 The slope \(\beta_1\) measures the change in Y for a corresponding change in X:

\[ \beta_1 = \frac{Y_2 - Y_1}{X_2 - X_1} \]

D. The least-squares regression line

 A least-squares regression line (or line of best fit) is the line through the data points \((x_i, y_i)\)
that has the smallest possible sum of squared vertical deviations from the line.
 The method of finding this line is called least-squares estimation.

a) Let the true linear regression, \(Y = \beta_0 + \beta_1 X\), be estimated by \(\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X\).

b) The point estimate of \(\beta_1\) is \(\hat{\beta}_1 = \dfrac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}\). This is the definitional formula.

c) The point estimate of \(\beta_0\) is \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\).
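 A minimal sketch of these point estimates in Python, computed straight from the definitional formula (NumPy is assumed available; the function name is mine, not from the lecture):

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (beta0_hat, beta1_hat) from the definitional formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # beta1_hat = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # beta0_hat = ybar - beta1_hat * xbar
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1
```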


 A regression line obtained in this way, from sample data, is valid only for the observed range
of the data. The linear relationship may not hold for values of x outside that range, so it should not be
extrapolated beyond the range of values used in obtaining the regression line.
 The definitional formula \(\hat{\beta}_1 = \dfrac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}\) can be replaced by a computational version.
Recall that:

\[ \sum(x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n} \]

\[ \sum(y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n} \]


\[ \sum(x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n} \]

 Therefore, the computational version of \(\hat{\beta}_1\) becomes:

\[ \hat{\beta}_1 = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}} \]

 From the computational formula, we notice the following quantities (Recall the
computational formulas for sum of squares from Lecture 4):

a) The corrected sum of squares for the response variable:

\[ SSY = \sum y_i^2 - \frac{(\sum y_i)^2}{n} \]

b) The corrected sum of squares of the explanatory variable, shown in the denominator:

\[ SSX = \sum x_i^2 - \frac{(\sum x_i)^2}{n} \]

c) The corrected sum of products, shown in the numerator:

\[ SSXY = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n} \]
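 These three quantities are easy to compute directly from the raw totals; here is a minimal sketch in plain Python (the helper name corrected_sums is mine, not from the lecture):

```python
def corrected_sums(x, y):
    """Return (SSX, SSY, SSXY) using the computational formulas."""
    n = len(x)
    ssx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    ssy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
    ssxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    return ssx, ssy, ssxy
```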

Example 13.1. Given the response variable Y and the explanatory variable X shown in the table
below, determine the linear regression parameter estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\).

X Y
1 6
1 5
3 10
3 14
5 12
5 18


 We start out by calculating the 'famous five'. These are \(\sum y_i^2\) and \(\sum y_i\), \(\sum x_i^2\) and \(\sum x_i\), and the
sum of cross products, \(\sum x_i y_i\).
 Here is the table showing computations:

x      y      x²     y²     xy
1      6      1      36     6
1      5      1      25     5
3      10     9      100    30
3      14     9      196    42
5      12     25     144    60
5      18     25     324    90
Total  18     65     70     825    233
Mean   3      10.8333
 From the table, we get the following quantities:

o \(\sum y_i^2 = 825\) and \(\sum y_i = 65\)
o \(\sum x_i^2 = 70\) and \(\sum x_i = 18\)
o \(\sum x_i y_i = 233\)

 We can, therefore, make the following computations:


\[ SSY = \sum y_i^2 - \frac{(\sum y_i)^2}{n} = 825 - \frac{(65)^2}{6} = 120.8333 \]

\[ SSX = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 70 - \frac{(18)^2}{6} = 16 \]

\[ SSXY = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n} = 233 - \frac{(18)(65)}{6} = 38 \]

 We can now calculate the value of \(\hat{\beta}_1\):

\[ \hat{\beta}_1 = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}} = \frac{SSXY}{SSX} = \frac{38}{16} = 2.375 \]

 Finally, we can calculate the value of \(\hat{\beta}_0\):

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 10.8333 - (2.375)(3) = 3.708 \]

 The regression equation is: \(\hat{Y} = 3.708 + 2.375X\)
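 As a quick check, the whole of Example 13.1 can be reproduced in a few lines, reusing the corrected_sums helper sketched earlier; NumPy's polyfit gives the same coefficients:

```python
import numpy as np

x = [1, 1, 3, 3, 5, 5]
y = [6, 5, 10, 14, 12, 18]

ssx, ssy, ssxy = corrected_sums(x, y)               # 16, 120.8333, 38
beta1 = ssxy / ssx                                  # 2.375
beta0 = sum(y) / len(y) - beta1 * sum(x) / len(x)   # 3.708

print(np.polyfit(x, y, deg=1))                      # [2.375  3.7083...]
```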



Interpretation of the regression parameters

 The intercept, \(\hat{\beta}_0 = 3.708\): when \(X = 0\), the predicted value of Y is 3.708.

 The slope, \(\hat{\beta}_1 = 2.375\): each unit increase in X is associated with an estimated increase of 2.375 in Y.

E. Regression analysis

a) Testing the fit of the regression model

 Now that we have the estimates of the regression line, we need to test if the regression model
fits the data well.
 This is done by comparing the mean squares due to regression (\(MS_{reg}\)) to the residual mean
squares (\(MS_{res}\)).
 The two mean squares, being variances, can be compared through an F-test:

\[ F = \frac{\sigma^2_{reg}}{\sigma^2_{res}} \]

Example 13.2. For the data in Example 13.1, perform an analysis of variance to determine if the
model fits the data well.

Solution.

 We set the hypotheses as follows:

\[ H_0: \frac{\sigma^2_{reg}}{\sigma^2_{res}} = 1 \]

\[ H_a: \frac{\sigma^2_{reg}}{\sigma^2_{res}} > 1 \]

 We start out by calculating the sums of squares:

i) Total sums of squares

Total sums of squares = 𝑆𝑆𝑌 = 120.8333

ii) Regression sums of squares

\[ SS_{reg} = (\hat{\beta}_1)(SSXY) = 2.375 \times 38 = 90.25 \]

iii) Residual sums of squares

\[ SS_{res} = SS_{total} - SS_{reg} = 120.8333 - 90.25 = 30.5833 \]

 We then calculate the degrees of freedom as follows:

i) Two parameters, \(\beta_0\) and \(\beta_1\), were estimated in the simple linear regression; the
regression degrees of freedom are therefore 2 − 1 = 1.
ii) The residual degrees of freedom are the number of observations minus the number of
estimated parameters, i.e., 6 − 2 = 4.
 The mean squares are calculated by dividing the sums of squares by their degrees of freedom.
 The F-value for the model is calculated by dividing the mean squares for regression by the mean
squares for residuals.
 The following is the ANOVA table for the regression model in Example 13.1.

Source of variation   Sums of squares   Degrees of freedom   Mean squares   F-value   \(F_{0.05(1),1,4}\)
Regression            90.2500           1                    90.2500        11.80     7.71
Error                 30.5833           4                    7.6458
Total                 120.8333          5
 Since the calculated F is greater than the tabulated F, we reject the null hypothesis. We
conclude that the model fits the data well, i.e., \(\sigma^2_{reg}/\sigma^2_{res} > 1\). In other words, the regression model
is significant, or well defined.
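 A sketch of this ANOVA in Python, using SciPy's F distribution for the critical value and p-value (SciPy is assumed available):

```python
from scipy import stats

ss_total, ss_reg = 120.8333, 90.25
ss_res = ss_total - ss_reg                     # 30.5833
df_reg, df_res = 1, 4

ms_reg = ss_reg / df_reg                       # 90.25
ms_res = ss_res / df_res                       # 7.6458
f_value = ms_reg / ms_res                      # 11.80

f_crit = stats.f.ppf(0.95, df_reg, df_res)     # 7.71
p_value = stats.f.sf(f_value, df_reg, df_res)  # ~0.026
print(f_value > f_crit)                        # True -> reject H0
```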

b) Descriptive statistics from the regression analysis

 The following descriptive statistics can be calculated from the regression analysis:

i) Standard error of the estimate

\[ S_{y \cdot x} = \sqrt{MS_{res}} \]

Example 13.3. Calculate the standard error of the estimate for the model in Example
13.2.

\[ S_{y \cdot x} = \sqrt{MS_{res}} = \sqrt{7.6458} = 2.7651 \]


 Since the standard error of estimate is a descriptive statistic, describing the spread of
‘observations’ around the regression line, it behaves like a ‘standard deviation’ and,
technically, should not be called ‘standard error’: this term is generally reserved for
describing the variation of a ‘statistic’.
 Some books do not use the term ‘standard error of estimate’ at all, using either ‘square
root residual variance’ or ‘RMSE’ (root mean square error) instead.

ii) Coefficient of determination

The coefficient of determination shows the proportion of the variation in the response
variable that is explained by the variation in the explanatory variable:

\[ R^2 = \frac{SS_{reg}}{SS_{total}} \qquad \text{or} \qquad R^2 = 1 - \frac{SS_{res}}{SS_{total}} \]

Example 13.4. Calculate the coefficient of determination for the model in Example 13.2
and interpret it.

\[ R^2 = \frac{SS_{reg}}{SS_{total}} = \frac{90.25}{120.8333} = 0.7469 \]

or

\[ R^2 = 1 - \frac{SS_{res}}{SS_{total}} = 1 - \frac{30.5833}{120.8333} = 0.7469 \]

This means that 74.7% of the variation in Y is explained by the variation in X.
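Both descriptive statistics from this subsection follow in a few lines of Python, using the values from Example 13.2:

```python
import math

ms_res, ss_reg, ss_res, ss_total = 7.6458, 90.25, 30.5833, 120.8333

s_yx = math.sqrt(ms_res)        # 2.7651, standard error of the estimate
r_squared = ss_reg / ss_total   # 0.7469

# Both forms of R^2 agree.
assert abs(r_squared - (1 - ss_res / ss_total)) < 1e-9
```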

c) Testing the regression coefficients

 The standard error of the y-intercept is as follows:

\[ S_{\hat{\beta}_0} = \sqrt{\left(\frac{MS_{res}}{n}\right)\left(\frac{\sum_{i=1}^{n} x_i^2}{SSX}\right)} \]

 The standard error of the slope is as follows:

\[ S_{\hat{\beta}_1} = \sqrt{\frac{MS_{res}}{SSX}} \]


Example 13.5. Calculate the standard errors of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) in Example 13.1.

\[ S_{\hat{\beta}_0} = \sqrt{\left(\frac{MS_{res}}{n}\right)\left(\frac{\sum x_i^2}{SSX}\right)} = \sqrt{\left(\frac{7.6458}{6}\right)\left(\frac{70}{16}\right)} = 2.3612 \]

\[ S_{\hat{\beta}_1} = \sqrt{\frac{MS_{res}}{SSX}} = \sqrt{\frac{7.6458}{16}} = 0.6913 \]
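The same standard errors in a short Python sketch, using the values from Examples 13.1 and 13.2:

```python
import math

ms_res, ssx, n, sum_x_sq = 7.6458, 16, 6, 70

se_b0 = math.sqrt((ms_res / n) * (sum_x_sq / ssx))  # 2.3612
se_b1 = math.sqrt(ms_res / ssx)                     # 0.6913
print(round(se_b0, 4), round(se_b1, 4))
```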

Example 13.6. Is it reasonable to believe that the unknown population intercept of the Y-versus-X
regression model in Example 13.1 is zero (i.e., the line runs through the origin)? Use α = 0.05.

Solution. Since there is no prior knowledge available about the directional preference for the
rejection of null hypothesis, a two-tailed test will be used.

\(H_0: \beta_0 = 0\)

\(H_a: \beta_0 \neq 0\)

\(\alpha = 0.05\)

We know that \(\hat{\beta}_0 = 3.708\) and \(S_{\hat{\beta}_0} = 2.3612\).

We use a t-test:

\[ t = \frac{3.708 - 0}{2.3612} = 1.57 \]

\[ t_{0.025,4} = 2.776 \quad \text{(with } n - 2 = 4 \text{ degrees of freedom)} \]

Since the calculated t of 1.57 is less than the tabulated t of 2.776, we fail to reject the null
hypothesis. We conclude that the y-intercept is not significantly different from zero at the 5% level
of significance.

Example 13.7. Is it reasonable to believe that the unknown population slope in Example 13.1 is
zero (i.e. it is a flat, horizontal line, meaning that there is no relationship between X and Y)? Use
a 0.05 level of significance.


Solution

\(H_0: \beta_1 = 0\)

\(H_a: \beta_1 \neq 0\)

\(\alpha = 0.05\)

We know that \(\hat{\beta}_1 = 2.375\) and \(S_{\hat{\beta}_1} = 0.6913\).

We use a t-test:

\[ t = \frac{2.375 - 0}{0.6913} = 3.4356 \]

\[ t_{0.025,4} = 2.776 \]

Since the calculated t of 3.44 is greater than the tabulated t of 2.776, we reject the null
hypothesis. We conclude that the slope is significantly different from zero at the 5% level of
significance.
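A sketch of both two-tailed t-tests (Examples 13.6 and 13.7) with SciPy, using df = n − 2 = 4:

```python
from scipy import stats

df = 4                              # n - 2
t_crit = stats.t.ppf(0.975, df)     # 2.776, two-tailed at alpha = 0.05

t_intercept = (3.708 - 0) / 2.3612  # 1.57 -> fail to reject H0
t_slope = (2.375 - 0) / 0.6913      # 3.44 -> reject H0
for t in (t_intercept, t_slope):
    print(round(t, 2), abs(t) > t_crit)
```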

Example 13.8. Is it reasonable to assume that the unknown population slope is at least 2? In
other words, for every cm increase in X, does Y in Example 13.1 increase by at least 2? Use a
0.05 level of significance.

Solution.

\(H_0: \beta_1 = 2\)

\(H_a: \beta_1 > 2\) (a one-tailed test, since the question asks whether the slope is greater than 2)

\(\alpha = 0.05\)

We know that \(\hat{\beta}_1 = 2.375\) and \(S_{\hat{\beta}_1} = 0.6913\).

We use a t-test:

\[ t = \frac{2.375 - 2}{0.6913} = 0.5425 \]

\[ t_{0.05,4} = 2.132 \]

Since the calculated t of 0.5425 is less than the tabulated t of 2.132, we fail to reject the null
hypothesis. We conclude that the slope is not significantly greater than 2 at the 5% level of significance.
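The one-tailed version of this test in Python, again with df = 4:

```python
from scipy import stats

t = (2.375 - 2) / 0.6913      # 0.5425
t_crit = stats.t.ppf(0.95, 4) # 2.132, one-tailed at alpha = 0.05
print(t > t_crit)             # False -> fail to reject H0
```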
