Regression
Computing
Lecturer: Abdu Yearwood
February 28, 2024
Linear Regression
Example 1: Fit a straight line to the x and y values in the first two columns of Table 1.
Solution. To determine the coefficients and the equation of the line, the following quantities are computed:
\[ n = 7, \qquad \sum x_i y_i = 119.5, \qquad \sum x_i^2 = 140 \]
\[ \sum x_i = 28, \quad \text{thus} \quad \bar{x} = \frac{\sum x_i}{n} = \frac{28}{7} = 4 \]
\[ \sum y_i = 24, \quad \text{thus} \quad \bar{y} = \frac{\sum y_i}{n} = \frac{24}{7} \approx 3.428571 \]
Thus, we obtain the following:
\[ a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} \approx 0.8392857 \tag{1} \]
\[ a_0 = \bar{y} - a_1 \bar{x} \approx 0.07142857 \tag{2} \]
The least-squares fit is y = 0.07142857 + 0.8392857x. This line represents the best linear approximation
to the given data points that minimises the sum of the squares of vertical distances between the data points
and the line.
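As a check, the arithmetic above can be reproduced in a few lines of Python. The summary sums come from the worked example (Table 1 itself is not reproduced here), so they are hard-coded:

```python
# Linear least-squares fit using the summary statistics from Example 1.
# The individual (x, y) pairs of Table 1 are not listed in the text,
# so the precomputed sums are hard-coded.
n = 7
sum_x, sum_y = 28.0, 24.0
sum_xy, sum_x2 = 119.5, 140.0

x_bar = sum_x / n            # 4.0
y_bar = sum_y / n            # ~3.428571

# Slope and intercept of the least-squares line y = a0 + a1 * x
a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a0 = y_bar - a1 * x_bar

print(f"a1 = {a1:.7f}")  # 0.8392857
print(f"a0 = {a0:.7f}")  # 0.0714286
```

The same formulas apply to any data set once the five sums are accumulated.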
One example of a non-linear model is the exponential equation,
\[ y = a_1 e^{b_1 x} \]
where \( a_1 \) and \( b_1 \) are constants.
Another example of a non-linear model is the simple power equation, given here:
\[ y = a_2 x^{b_2} \]
For more information, see: Numerical Methods for Engineers by Chapra and Canale (Least-Squares Regression, Section 17.1.5 Linearization of Nonlinear Relationships).
Table 2: Data to be fitted to the power equation.

x    y      log x    log y
1    0.5    0       -0.301
2    1.7    0.301    0.226
3    3.4    0.477    0.534
4    5.7    0.602    0.753
5    8.4    0.699    0.922
Example 2: Fit the power equation \( y = a_2 x^{b_2} \) to the data in Table 2 using a logarithmic transformation of the data.
To obtain a linear form, we perform a logarithmic transformation on the original power equation \( y = a_2 x^{b_2} \). The transformation involves taking the logarithm (any consistent base; base 10 is used in the computations below) of both sides of the equation.
Using the product property of logarithms, we can rewrite the right side as:
\[ \log y = \log a_2 + \log\left( x^{b_2} \right) \]
Then, using the power rule of logarithms (\( \log_b(x^a) = a \log_b x \)), we simplify further:
\[ \log y = \log a_2 + b_2 \log x \]
Here, \( \log y \) and \( \log x \) denote the base-10 logarithms of \( y \) and \( x \) respectively; Table 2 lists these transformed values. If the natural logarithm is preferred instead, the two are related by:
\[ \log_{10}(x) = \frac{\ln x}{\ln 10} \]
Either base yields the same slope \( b_2 \); only the intercept depends on the base, and the base-10 convention is used below. With the transformed variables, the power equation becomes the linear relation:
\[ \log y = \log a_2 + b_2 \log x \]
Given the data in Table 2, we can perform linear regression on \( \log x \) and \( \log y \) to find the coefficients. The linear regression yields a slope of approximately 1.75 for \( b_2 \) and an intercept of approximately \(-0.300\) for \( \log a_2 \). Substituting back the definitions of \( \log y \) and \( \log x \), we get:
\[ \log y = -0.300 + 1.75 \log x \]
This equation is the logarithmically transformed form of the original power equation, which is useful for linear regression analysis when the relationship between \( x \) and \( y \) is nonlinear. The intercept \( \log a_2 \) equals \(-0.300\), and therefore, taking the antilogarithm, \( a_2 = 10^{-0.300} \approx 0.5 \). The slope is \( b_2 = 1.75 \). The power equation is \( y = 0.5 x^{1.75} \).
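The transformation in Example 2 can be reproduced with a short script. Only the \((x, y)\) pairs of Table 2 are used, and `math.log10` supplies the base-10 logarithm:

```python
import math

# (x, y) pairs from Table 2
xs = [1, 2, 3, 4, 5]
ys = [0.5, 1.7, 3.4, 5.7, 8.4]

# Base-10 logarithmic transformation
lx = [math.log10(x) for x in xs]
ly = [math.log10(y) for y in ys]

# Ordinary linear least squares on (log x, log y)
n = len(xs)
sx, sy = sum(lx), sum(ly)
sxy = sum(a * b for a, b in zip(lx, ly))
sxx = sum(a * a for a in lx)

b2 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope, ~1.75
log_a2 = sy / n - b2 * sx / n                    # intercept, ~ -0.300
a2 = 10 ** log_a2                                # antilogarithm, ~0.5

print(f"b2 = {b2:.3f}, a2 = {a2:.3f}")  # power equation: y = a2 * x ** b2
```

The small differences from the rounded values in the text come from carrying full precision through the regression.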
Polynomial Regression
The least-squares method can be extended to fit the data to a higher-order polynomial. For example, suppose we want to fit the following second-order polynomial:
\[ y = a_0 + a_1 x + a_2 x^2 + e \]
The sum of squares of residuals (denoted \( S_r \)) is the sum of the squared differences between the observed values \( y_i \) and the values predicted by the model \( \hat{y}_i \), and is given as:
\[ S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)^2 \tag{Eq. 1} \]
Note, \( S_r \) in this case is of higher order when compared with linear regression, as shown in the following:
\[ S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_{i,\text{measured}} - y_{i,\text{model}} \right)^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 \]
In expanded form:
\[ S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)^2 \]
\[ S_r = \sum_{i=1}^{n} \left( y_i^2 - 2 y_i a_0 - 2 a_1 x_i y_i - 2 a_2 x_i^2 y_i + a_0^2 + 2 a_0 a_1 x_i + 2 a_0 a_2 x_i^2 + a_1^2 x_i^2 + 2 a_1 a_2 x_i^3 + a_2^2 x_i^4 \right) \]
Taking the derivative of Eq. 1 with respect to each of the unknown coefficients of the polynomial gives:
\[ \frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right) \]
\[ \frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right) \]
\[ \frac{\partial S_r}{\partial a_2} = -2 \sum_{i=1}^{n} x_i^2 \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right) \]
These equations can be set equal to zero and rearranged to develop the following set of normal equations:
\[ n \, a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i \]
\[ \left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i \]
\[ \left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i \tag{Eq. 2} \]
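The normal equations can be assembled mechanically from any data set. A minimal sketch in plain Python; the function name and the sample data are illustrative, not from the lecture:

```python
# Assemble the 3x3 normal-equation system (Eq. 2) for a second-order
# polynomial fit y = a0 + a1*x + a2*x^2.
def normal_equations(xs, ys):
    n = len(xs)
    Sx = lambda p: sum(x ** p for x in xs)                      # sum of x_i^p
    Sxy = lambda p: sum((x ** p) * y for x, y in zip(xs, ys))   # sum of x_i^p * y_i
    A = [[n,     Sx(1), Sx(2)],
         [Sx(1), Sx(2), Sx(3)],
         [Sx(2), Sx(3), Sx(4)]]
    b = [Sxy(0), Sxy(1), Sxy(2)]
    return A, b

# Sanity check with data generated from y = 1 + 2x + 3x^2
# (the true coefficients should solve the resulting system exactly):
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
A, b = normal_equations(xs, ys)
print(A, b)
```

The same pattern extends to an m-th order polynomial: the coefficient matrix becomes \((m+1) \times (m+1)\) with entries \( \sum x_i^{j+k} \).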
The standard error is given as:
\[ s_{y/x} = \sqrt{ \frac{S_r}{n - (m + 1)} } \tag{Eq. 3} \]
where \( S_r \) is the sum of squares of residuals, \( n \) is the total number of data points, and \( m \) is the order of the polynomial. The coefficient of determination can also be computed for polynomial regression with:
\[ r^2 = \frac{S_t - S_r}{S_t} \tag{Eq. 4} \]
Example 3: Fit a second-order polynomial to the data in the first two columns of Table 3.
Note: you need to brush up on the Gauss elimination method. Solving the normal equations using Gauss elimination gives \( a_0 = 2.47857 \), \( a_1 = 2.35929 \), and \( a_2 = 1.86071 \). Therefore, the least-squares quadratic equation is:
\[ y = 2.47857 + 2.35929 x + 1.86071 x^2 \]
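As a refresher on the Gauss elimination step, here is a minimal solver with partial pivoting. It is a sketch applied to a generic 3×3 system, not to the system of Example 3, since Table 3 is not reproduced in these notes:

```python
def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies in an augmented matrix [A | b].
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        # Partial pivoting: move the largest pivot in column k into row k.
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        # Eliminate column k from all rows below the pivot.
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

# Example: this system has the exact solution x = [2, 3, -1].
x = gauss_solve([[2.0, 1.0, -1.0],
                 [-3.0, -1.0, 2.0],
                 [-2.0, 1.0, 2.0]],
                [8.0, -11.0, -3.0])
print(x)
```

Feeding it the normal-equation matrix and right-hand side from Eq. 2 recovers the polynomial coefficients \( a_0, a_1, a_2 \).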
Written out for the six data points, the sum of squares of residuals is:
\[ S_r = \left( y_1 - a_0 - a_1 x_1 - a_2 x_1^2 \right)^2 + \left( y_2 - a_0 - a_1 x_2 - a_2 x_2^2 \right)^2 + \left( y_3 - a_0 - a_1 x_3 - a_2 x_3^2 \right)^2 + \left( y_4 - a_0 - a_1 x_4 - a_2 x_4^2 \right)^2 + \left( y_5 - a_0 - a_1 x_5 - a_2 x_5^2 \right)^2 + \left( y_6 - a_0 - a_1 x_6 - a_2 x_6^2 \right)^2 \]
Homework
Bridge Load Analysis: As part of a civil engineering project, you are tasked with designing a bridge to
support heavy loads passing over a river. The bridge will span 50 meters and must be able to withstand
varying loads, including vehicles and pedestrians. To aid with your design, you have collected random data
on the loads experienced by similar bridges in the region over the past decade; x represents the averaged
combined weight measured by a bridge-based sensor referenced at a single point (Bridge Weight in kN),
while y is the observed deflection using a level gauge (Deflection in mm).
Do:
Fit a second-order polynomial model to the data using regression analysis techniques.
Calculate the sum of squares of residuals Sr and the total sum of squares St .