Lec 18 Linear Regression 02122022 010556pm
Linear Regression
Topic: Straight line fitting
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
Curve Fitting
From a scatter diagram it is often possible to visualize a smooth curve
that approximates the data. Such a curve is called an approximating curve.
For example, the data plotted here appear to lie close to a straight line,
so we say that a linear relationship exists between x and y.
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
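The equation of a straight line is
$$y = a_0 + a_1 x$$
Given two points (x1, y1) and (x2, y2) on a line, we can find the constants a0 and a1.
The resulting equation is
$$y - y_1 = \frac{y_2 - y_1}{x_2 - x_1}\,(x - x_1)$$
or
$$y - y_1 = m\,(x - x_1), \qquad m = \frac{y_2 - y_1}{x_2 - x_1}$$
m is called the slope of the line.
[Figure: a straight line through (x1, y1) and (x2, y2), showing the rise y2 - y1 and the run x2 - x1.]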
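The two-point form can be turned into a few lines of code. The following is a minimal sketch in Python; the function name `line_through` and the sample points are illustrative assumptions, not part of the lecture.

```python
def line_through(p1, p2):
    """Return (a0, a1) for the line y = a0 + a1*x through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line has no finite slope and cannot be written as y = a0 + a1*x.
        raise ValueError("the two points must have different x values")
    a1 = (y2 - y1) / (x2 - x1)  # slope m
    a0 = y1 - a1 * x1           # intercept: set x = 0 in y - y1 = m (x - x1)
    return a0, a1


# Illustrative points (hypothetical, chosen only to exercise the function).
a0, a1 = line_through((0.0, 1.0), (2.0, 5.0))
print(f"y = {a0} + {a1} x")
```

The intercept is obtained by rearranging y - y1 = m (x - x1) into y = (y1 - m x1) + m x, so a0 = y1 - m x1 and a1 = m.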
Straight Line
Example 1: Let us construct a line that approximates the data
x 2 3 5 7 9 10
y 1 3 7 11 15 17
$$y = a_0 + a_1 x$$
Straight Line
Example 1 (continued): To determine the equation, start from the general form of a straight line:
$$y = a_0 + a_1 x$$
Only two points from the figure are needed to
find a0 and a1.
Let us select (2, 1) and (3, 3) as two points.
Putting x = 2 and y = 1, we get:
1 = a0 + 2a1 (1)
Then putting x = 3 and y = 3, we get:
3 = a0 + 3a1 (2)
Solving Eqs. (1) and (2), we get a0 = -3 and a1 = 2.
Therefore, the straight line is y = -3 + 2x; the y-intercept is -3 and the slope is +2.
As a check, verify that the remaining data points also lie on this line.
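The check can be done point by point. The following is a small Python sketch that evaluates y = -3 + 2x at every x in the Example 1 data and compares it with the tabulated y; the variable names are illustrative.

```python
# Data of Example 1.
xs = [2, 3, 5, 7, 9, 10]
ys = [1, 3, 7, 11, 15, 17]


def line(x):
    """Line found from the points (2, 1) and (3, 3): y = -3 + 2x."""
    return -3 + 2 * x


# Difference between each tabulated y and the value on the line.
residuals = [y - line(x) for x, y in zip(xs, ys)]
all_on_line = all(r == 0 for r in residuals)

print(residuals)
print("every point lies on the line:", all_on_line)
```

Because the data and the coefficients are integers, the comparison is exact and no rounding tolerance is needed.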
Straight Line
Example 2: Let us construct a line that approximates the data
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
Let us compare the values estimated from the equation of this straight line with the
actual data.
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
yest 1.5 2.6 3.2 4.3 5.3 5.9 7.0 8.6
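To make the comparison concrete, the gap between each actual y and its estimate can be computed directly from the table. This is a minimal Python sketch that uses only the tabulated values; the variable names are illustrative, and the differences are rounded to two decimals only to suppress floating-point noise.

```python
# Values copied from the comparison table above.
y_actual = [1, 2, 4, 4, 5, 7, 8, 9]
y_est = [1.5, 2.6, 3.2, 4.3, 5.3, 5.9, 7.0, 8.6]

# Difference between actual and estimated value at each data point.
differences = [round(y - ye, 2) for y, ye in zip(y_actual, y_est)]

# Sum of the squared differences: a single number measuring the overall misfit.
sum_of_squares = sum(d * d for d in differences)

print(differences)
print(round(sum_of_squares, 2))
```

A smaller sum of squared differences means the line follows the data more closely, which is the idea behind the least-squares criterion.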
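The best fit is generally based on minimizing the sum of the squares of the
residuals, Sr.
The residual at a point (xi, yi) is
$$\varepsilon_i = y_i - f(x_i)$$
For the straight line y = a0 + a1 x this becomes
$$\varepsilon_i = y_i - a_0 - a_1 x_i, \qquad S_r = \sum_{i=1}^{N} \varepsilon_i^2$$
The constants a0 and a1 are determined by solving
$$\sum y = a_0 N + a_1 \sum x$$
$$\sum xy = a_0 \sum x + a_1 \sum x^2$$
These are called the normal equations
for the least-squares line.
[Figure: data points (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn) scattered about the line y = a0 + a1 x.]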
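The normal equations come from requiring that Sr be a minimum with respect to both constants. The short derivation below is a sketch under the assumption that the residual is written as the difference between y_i and a0 + a1 x_i; the symbol used for the residual is a notational choice.

```latex
S_r = \sum_{i=1}^{N} \left( y_i - a_0 - a_1 x_i \right)^2

\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{N} \left( y_i - a_0 - a_1 x_i \right) = 0
\;\Longrightarrow\; \sum y = a_0 N + a_1 \sum x

\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{N} x_i \left( y_i - a_0 - a_1 x_i \right) = 0
\;\Longrightarrow\; \sum xy = a_0 \sum x + a_1 \sum x^2

a_1 = \frac{N \sum xy - \sum x \sum y}{N \sum x^2 - \left( \sum x \right)^2},
\qquad
a_0 = \frac{\sum y - a_1 \sum x}{N}
```

The last line is the solution of the two normal equations for a1 and a0; it requires that the x values are not all equal, so that the denominator is nonzero.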
Examples on least square line fitting
Example 3: Given the data points below, find the best-fit straight line using
least-squares fitting:
x     y     x²     xy     y²
1     1     1      1      1
3     2     9      6      4
4     4     16     16     16
6     4     36     24     16
8     5     64     40     25
9     7     81     63     49
11    8     121    88     64
14    9     196    126    81
Σ 56  40    524    364    256
With N = 8, Σx = 56, Σy = 40, Σx² = 524 and Σxy = 364, the normal equations are
40 = 8a0 + 56a1
364 = 56a0 + 524a1
Solving these, we get a0 = 6/11 and a1 = 7/11.
So the least-squares line is
y = 0.545 + 0.636x
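The arithmetic of Example 3 can be reproduced exactly with rational numbers. The following Python sketch solves the two normal equations in closed form; the function name `least_squares_line` is an illustrative assumption, and the use of `fractions.Fraction` assumes integer data, as in this example.

```python
from fractions import Fraction


def least_squares_line(xs, ys):
    """Solve the normal equations for y = a0 + a1*x (integer data assumed)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a1 = Fraction(n * sxy - sx * sy, det)  # slope
    a0 = (sy - a1 * sx) / n                # intercept
    return a0, a1


# Data of Example 3.
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]

a0, a1 = least_squares_line(xs, ys)
print(a0, a1)                               # exact fractions
print(round(float(a0), 3), round(float(a1), 3))
```

Working in exact fractions avoids rounding until the final step, which makes it easy to compare against the hand calculation.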
Examples on least square line fitting
Example 4: Farm employment in a country, in millions of workers, is given
by year in the table below:
Year                        1935  1940  1945  1950  1955  1960  1965  1970  1975  1980
Farm employment (millions)  12.7  11.0  10.0  9.9   8.4   7.1   5.6   4.5   4.3   3.7
Let x = (year - 1935)/5, so that x = 0, 1, ..., 9, and let y be the farm employment in millions.
Year x y x² xy
1935 0 12.7 0 0
1940 1 11.0 1 11.0
1945 2 10.0 4 20.0
1950 3 9.9 9 29.7
1955 4 8.4 16 33.6
1960 5 7.1 25 35.5
1965 6 5.6 36 33.6
1970 7 4.5 49 31.5
1975 8 4.3 64 34.4
1980 9 3.7 81 33.3
Σx = 45    Σy = 77.2    Σx² = 285    Σxy = 262.6
Examples on least square line fitting
Example 4 (continued):
N = 10, so the normal equations are
77.2 = 10a0 + 45a1
262.6 = 45a0 + 285a1
Year x y yest
1935 0 12.7 12.35
1940 1 11.0 11.32
1945 2 10.0 10.29
1950 3 9.9 9.27
1955 4 8.4 8.24
1960 5 7.1 7.21
1965 6 5.6 6.18
1970 7 4.5 5.15
1975 8 4.3 4.13
1980 9 3.7 3.10
Example on least square curve fitting
yest = 12.346 - 1.028 x
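The fit for Example 4 can be recomputed from the raw table. This is a minimal Python sketch that codes the years as x = (year - 1935)/5, forms the sums, and solves the normal equations in closed form; the variable names are illustrative, and the result should match the coefficients above up to rounding in the last digit.

```python
# Data of Example 4: farm employment in millions of workers.
years = [1935, 1940, 1945, 1950, 1955, 1960, 1965, 1970, 1975, 1980]
employment = [12.7, 11.0, 10.0, 9.9, 8.4, 7.1, 5.6, 4.5, 4.3, 3.7]

# Code the years as x = 0, 1, ..., 9 (one unit per five years).
xs = [(year - 1935) // 5 for year in years]

n = len(xs)
sx = sum(xs)
sy = sum(employment)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, employment))

# Closed-form solution of the two normal equations.
a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a0 = (sy - a1 * sx) / n

print(f"sums: {sx}, {sy:.1f}, {sxx}, {sxy:.1f}")
print(f"yest = {a0:.3f} + ({a1:.3f}) x")
```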
(c) The year 1990 corresponds to x = 11. Putting this into the line
equation above, we get
y = 12.346 - 1.028(11) = 1.038
This result agrees with the newer data, which show that about a million workers
were employed in farming in 1990.
For the year 2000, x = 13, which gives
y = 12.346 - 1.028(13) = -1.018
A negative number of workers is impossible. We therefore conclude that the
linear trend does not continue indefinitely, and projections for 2000
based on this trend will be wrong.
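The extrapolation can be explored numerically. The following Python sketch uses the coefficients quoted for the fitted line (12.346 and -1.028) and the year coding x = (year - 1935)/5; the helper names `year_to_x` and `predict` are illustrative assumptions. It also locates the year at which the fitted line reaches zero, which marks the point beyond which the linear trend cannot be meaningful.

```python
# Coefficients of the fitted line as quoted in the lecture: yest = 12.346 - 1.028 x.
A0 = 12.346
A1 = -1.028


def year_to_x(year):
    """Code a calendar year as x, with x = 0 in 1935 and one unit per five years."""
    return (year - 1935) / 5


def predict(year):
    """Farm employment (millions) predicted by the fitted line for a given year."""
    return A0 + A1 * year_to_x(year)


for year in (1990, 2000):
    print(year, year_to_x(year), round(predict(year), 3))

# Year at which the fitted line crosses zero employment.
zero_year = 1935 + 5 * (-A0 / A1)
print("line reaches zero near", round(zero_year, 1))
```

Any prediction for a year after the zero crossing is negative, so the line is only usable as a description of the trend within, or slightly beyond, the range of the data.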