Regression

This document provides examples and explanations of linear regression, non-linear regression through transformations, and polynomial regression. It includes: 1) An example of using linear regression to fit a straight line to x and y data. 2) An example of using logarithmic transformation to apply linear regression to non-linear power equation data. 3) An overview of using polynomial regression to fit data to higher-order polynomials, and the equations to calculate the coefficients and standard error.


Course: EMT3200 - Statistics, Numerical Analysis and Computing
Lecturer: Abdu Yearwood
February 28, 2024

[Worked examples with detailed explanation]

Linear Regression
Example 1: Fit a straight line to the x and y values in the first two columns of Table 1.

Table 1: Computations for linear regression.


xi   yi    (yi − ȳ)²   (yi − a0 − a1 xi)²
1 0.5 8.5765 0.1687
2 2.5 0.8622 0.5625
3 2.0 2.0408 0.3473
4 4.0 0.3265 0.3265
5 3.5 0.0051 0.5896
6 6.0 6.6122 0.7972
7 5.5 4.2908 0.1993
Sum 24.0 22.7143 2.9911

Solution. To determine the coefficients and the equation of the line, the following quantities are computed:

n = 7
Σ xi yi = 119.5
Σ xi² = 140
Σ xi = 28, thus x̄ = Σ xi / n = 28/7 = 4
Σ yi = 24, thus ȳ = Σ yi / n = 24/7 ≈ 3.428571
Thus, we obtain the following,

a1 = (n Σ xi yi − Σ xi Σ yi) / (n Σ xi² − (Σ xi)²) ≈ 0.8392857   (1)

a0 = ȳ − a1 x̄ ≈ 0.07142857   (2)

The least-squares fit is y = 0.07142857 + 0.8392857x. This line represents the best linear approximation
to the given data points that minimises the sum of the squares of vertical distances between the data points
and the line.
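The calculation above can be reproduced programmatically. The following is a minimal sketch in Python (numpy is assumed to be available), applying the least-squares formulas (1) and (2) to the data of Table 1:

```python
import numpy as np

# Data from the first two columns of Table 1
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])

n = len(x)
# Slope from Eq. (1) and intercept from Eq. (2)
a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a0 = np.mean(y) - a1 * np.mean(x)

print(a0, a1)  # ≈ 0.07142857 and 0.8392857
```

The same fit can be obtained with np.polyfit(x, y, 1), which returns the slope and intercept in that order.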

Non-linearity Through Transformation


Some data cannot be modeled well by linear regression. In such cases a non-linear approach may be needed, but transformations can often be applied first to express the (curvilinear) data in a form that is compatible with linear regression. One example is the exponential model described here:

y = a1 e^(b1 x)
where a1 and b1 are constants.
Another example of a non-linear model is the simple power equation, given here:

y = a2 x^(b2)
For more information, see: Numerical Methods for Engineers by Chapra and Canale (Least-Squares Regression, Section 17.1.5 Linearization of Nonlinear Relationships).
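As a sketch of the idea for the exponential model, taking natural logarithms of y = a1 e^(b1 x) gives ln y = ln a1 + b1 x, which is linear in x. The short Python example below (numpy assumed; the data are synthetic, generated from chosen constants purely for illustration) recovers the constants from the transformed data:

```python
import numpy as np

# Synthetic data generated from y = a1 * exp(b1 * x) with chosen constants
a1_true, b1_true = 2.0, 0.5
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = a1_true * np.exp(b1_true * x)

# Linear regression on (x, ln y): slope = b1, intercept = ln a1
b1, ln_a1 = np.polyfit(x, np.log(y), 1)
a1 = np.exp(ln_a1)
print(a1, b1)  # recovers ≈ 2.0 and 0.5
```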

Table 2: Data to be fitted to the power equation.
x   y     log x   log y
1   0.5   0       −0.301
2   1.7   0.301   0.226
3   3.4   0.477   0.534
4   5.7   0.602   0.753
5   8.4   0.699   0.922

Example 2: Fit the equation y = a2 x^(b2) to the data in Table 2 using a logarithmic transformation of the data.

In this example, the solution given is:

log y = 1.75 log x − 0.300

To obtain this equation, we take the logarithm of both sides of the original power equation y = a2 x^(b2). Any consistent base can be used; base-10 logarithms are used here, because the intercept can then be converted back to a2 by taking the antilogarithm as a power of 10.

The power equation is:

y = a2 x^(b2)

Taking the logarithm of both sides:

log y = log(a2 x^(b2))

Using the product property of logarithms, we can rewrite the right side as:

log y = log(a2) + log(x^(b2))

Then, using the power rule of logarithms (log_b(x^a) = a log_b(x)), we simplify further:

log y = log(a2) + b2 log x

Hint: base-10 and natural logarithms are related by

log(x) = log10(x) = ln x / ln 10

so the same derivation holds in any base, provided one base is used consistently throughout. In this context, log y and log x denote the base-10 logarithmic transformations of y and x; these are the values tabulated in Table 2.

Given the data in Table 2, we can perform linear regression on log x and log y to find the coefficients. The regression yields a slope of approximately 1.75 for b2 and an intercept of approximately −0.300 for log(a2). Thus:

log y = 1.75 log x − 0.300

This equation is the logarithmically transformed form of the original power equation, which makes linear regression applicable even when the relationship between x and y is nonlinear. The intercept, log a2, equals −0.300 and therefore, by taking the antilogarithm, a2 = 10^(−0.3) ≈ 0.5. The slope is b2 = 1.75.
The power equation is y = 0.5x^(1.75).
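Example 2 can be checked numerically. A minimal Python sketch (numpy assumed) that regresses log10 y on log10 x for the Table 2 data:

```python
import numpy as np

# Data from Table 2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.5, 1.7, 3.4, 5.7, 8.4])

# Linear regression on (log10 x, log10 y): slope = b2, intercept = log10(a2)
b2, log_a2 = np.polyfit(np.log10(x), np.log10(y), 1)
a2 = 10.0 ** log_a2
print(a2, b2)  # ≈ 0.5 and 1.75
```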

Polynomial Regression
The least-squares method can be extended to fit the data to a higher-order polynomial. For example, suppose
we want to fit to the following second-order polynomial:

y = a0 + a1 x + a2 x2 + e
The sum of squares of residuals (denoted as Sr) represents the sum of the squared differences between the observed values (yi) and the values predicted by the model (ŷi), and is given as:

Sr = Σ (yi − a0 − a1 xi − a2 xi²)²,  summed over i = 1 to n   (Eq.1)

Note that Sr in this case is of higher order when compared with linear regression, where

Sr = Σ ei² = Σ (yi,measured − yi,model)² = Σ (yi − a0 − a1 xi)²

In expanded form:

Sr = Σ (yi² − 2yi a0 − 2a1 xi yi − 2a2 xi² yi
        + a0² + 2a0 a1 xi + 2a0 a2 xi²
        + a1² xi² + 2a1 a2 xi³ + a2² xi⁴)

Here,

- n is the number of data points.

- yi is the observed value (actual value) at data point i.

- ŷi is the predicted value by the model at data point i.

Taking the derivative of Eq. 1 with respect to each of the unknown coefficients of the polynomial gives

∂Sr/∂a0 = −2 Σ (yi − a0 − a1 xi − a2 xi²)

∂Sr/∂a1 = −2 Σ xi (yi − a0 − a1 xi − a2 xi²)

∂Sr/∂a2 = −2 Σ xi² (yi − a0 − a1 xi − a2 xi²)

(each sum taken over i = 1 to n)

These equations can be set equal to zero and rearranged to develop the following set of normal equations:

(n) a0 + (Σ xi) a1 + (Σ xi²) a2 = Σ yi
(Σ xi) a0 + (Σ xi²) a1 + (Σ xi³) a2 = Σ xi yi
(Σ xi²) a0 + (Σ xi³) a1 + (Σ xi⁴) a2 = Σ xi² yi   (Eq.2)

The standard error is given as

sy/x = sqrt( Sr / (n − (m + 1)) )   (Eq.3)

where Sr is the sum of squares of residuals, n is the total number of data points, and m is the order of the polynomial. The coefficient of determination can also be computed for polynomial regression with:

r² = (St − Sr) / St   (Eq.4)
Example 3: Fit a second-order polynomial to the data in the first two columns of Table 3.

Table 3: Dataset to be fitted to Polynomial Regression


Data   xi   yi      (yi − ȳ)²   (yi − a0 − a1 xi − a2 xi²)²
1      0    2.1     544.44      0.14332
2      1    7.7     314.47      1.00286
3      2    13.6    140.03      1.08158
4      3    27.2    3.12        0.80491
5      4    40.9    239.22      0.61951
6      5    61.1    1272.11     0.09439
Sum    15   152.6   2513.39     3.74657

Solution. From the given data,

m = 2         Σ xi = 15       Σ xi⁴ = 979
n = 6         Σ yi = 152.6    Σ xi yi = 585.6
x̄ = 2.5       Σ xi² = 55      Σ xi² yi = 2488.8
ȳ = 25.433    Σ xi³ = 225

Therefore, the simultaneous linear equations are:

[  6    15    55 ] [a0]   [  152.6 ]
[ 15    55   225 ] [a1] = [  585.6 ]
[ 55   225   979 ] [a2]   [ 2488.8 ]

Note: you may need to revise the Gauss elimination method. Solving the equations using Gauss elimination gives
a0 = 2.47857, a1 = 2.35929, and a2 = 1.86071. Therefore, the least-squares quadratic equation is:

y = 2.47857 + 2.35929x + 1.86071x²
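The normal equations of Eq. 2 can also be assembled and solved programmatically. A minimal sketch in Python (numpy assumed; np.linalg.solve stands in for hand Gauss elimination) that reproduces the coefficients above:

```python
import numpy as np

def polyfit_normal(x, y, m):
    """Fit an m-th order polynomial by solving the normal equations (Eq. 2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # A[j, k] = sum of x_i^(j+k);  b[j] = sum of x_i^j * y_i
    A = np.array([[np.sum(x ** (j + k)) for k in range(m + 1)] for j in range(m + 1)])
    b = np.array([np.sum(x ** j * y) for j in range(m + 1)])
    return np.linalg.solve(A, b)  # coefficients a0, a1, ..., am

# Second-order fit to the data of Table 3
coeffs = polyfit_normal([0, 1, 2, 3, 4, 5], [2.1, 7.7, 13.6, 27.2, 40.9, 61.1], 2)
print(coeffs)  # ≈ [2.47857, 2.35929, 1.86071]
```

Note that for high-order polynomials the normal-equation matrix becomes ill-conditioned; np.polyfit, which uses a more stable least-squares solver, is preferable in practice.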


To compute Sr , we use:
Sr = Σ (yi − a0 − a1 xi − a2 xi²)²,  summed over i = 1 to n

For the six datapoints we have,

Sr = (y1 − a0 − a1 x1 − a2 x1²)²
   + (y2 − a0 − a1 x2 − a2 x2²)²
   + (y3 − a0 − a1 x3 − a2 x3²)²
   + (y4 − a0 − a1 x4 − a2 x4²)²
   + (y5 − a0 − a1 x5 − a2 x5²)²
   + (y6 − a0 − a1 x6 − a2 x6²)²

Sr = (2.1 − 2.47857 − 0 − 0)²
   + (7.7 − 2.47857 − 2.35929 − 1.86071)²
   + (13.6 − 2.47857 − 2.35929 × 2 − 1.86071 × 2²)²
   + (27.2 − 2.47857 − 2.35929 × 3 − 1.86071 × 3²)²
   + (40.9 − 2.47857 − 2.35929 × 4 − 1.86071 × 4²)²
   + (61.1 − 2.47857 − 2.35929 × 5 − 1.86071 × 5²)²

Sr = 0.14332 + 1.00286 + 1.08158 + 0.80491 + 0.61951 + 0.09439
Sr = 3.74657
The standard error of the estimate based on the regression polynomial is computed from Eq. 3:

sy/x = sqrt( Sr / (n − (m + 1)) ) = sqrt( 3.74657 / (6 − (2 + 1)) ) = 1.12

St = Σ (yi − ȳ)² = Σ (yi − ȳ)(yi − ȳ) = Σ (yi² − 2yi ȳ + ȳ²)


For the six datapoints we have,

St = (y1 − ȳ)² + (y2 − ȳ)² + (y3 − ȳ)²
   + (y4 − ȳ)² + (y5 − ȳ)² + (y6 − ȳ)²
   = (2.1 − 25.433)² + (7.7 − 25.433)² + (13.6 − 25.433)²
   + (27.2 − 25.433)² + (40.9 − 25.433)² + (61.1 − 25.433)²
   = 544.44 + 314.47 + 140.03 + 3.12 + 239.22 + 1272.11
   = 2513.39

The coefficient of determination is:


r² = (St − Sr) / St = (2513.39 − 3.74657) / 2513.39 = 0.99851

and thus the correlation coefficient is r = 0.99925, which indicates that 99.851 percent of the original
uncertainty has been explained by the model, in agreement with the result given in the text.
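The error statistics of Example 3 can be verified with a few lines of Python (numpy assumed), computing Sr, St, the standard error, and r² directly from the data and the fitted coefficients:

```python
import numpy as np

# Data and fitted coefficients from Example 3
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])
a0, a1, a2 = 2.47857, 2.35929, 1.86071
n, m = len(x), 2

y_model = a0 + a1 * x + a2 * x**2
Sr = np.sum((y - y_model) ** 2)        # sum of squares of residuals
St = np.sum((y - y.mean()) ** 2)       # total sum of squares
syx = np.sqrt(Sr / (n - (m + 1)))      # standard error of the estimate (Eq. 3)
r2 = (St - Sr) / St                    # coefficient of determination (Eq. 4)

print(Sr, St, syx, r2)  # ≈ 3.74657, 2513.39, 1.12, 0.99851
```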

1 Homework
Bridge Load Analysis: As part of a civil engineering project, you are tasked with designing a bridge to
support heavy loads passing over a river. The bridge will span 50 meters and must be able to withstand
varying loads, including vehicles and pedestrians. To aid with your design, you have collected random data
on the loads experienced by similar bridges in the region over the past decade; x represents the averaged
combined weight measured by a bridge-based sensor referenced at a single point (Bridge Weight in kN),
while y is the observed deflection using a level gauge (Deflection in mm).

Table 4: Dataset of weight loading and deflections


Year   x, kN (Weight)   y, mm (Deflection)   log x   log y   (y − ȳ)²   (y − a0 − a1 x − a2 x²)²
1 10 8.1
2 10.2 7.8
3 10.1 8.3
4 9.8 9
5 10.3 14
6 8.8 7.2
7 10 7.5
8 9.5 7
9 10 8.2
10 10.1 8.5

Do:

- Apply a second-order polynomial model to the data using regression analysis techniques.

- Determine the coefficients a0, a1, and a2 for the polynomial equation y = a0 + a1 x + a2 x².

- Use Gauss elimination techniques to solve the simultaneous linear equations.

- Calculate the sum of squares of residuals Sr and the total sum of squares St.

- Compute the standard error of the estimate sy/x using:

  sy/x = sqrt( Sr / (n − (m + 1)) )

- Determine the coefficient of determination r² using:

  r² = (St − Sr) / St
