Lec 18 Linear Regression 02122022 010556pm


Uploaded by

sadia sagheer

Numerical Methods

Linear Regression
Topic: Straight line fitting

Dr. Nasir M Mirza


Email: [email protected]
Curve Fitting
To determine an equation that connects
variables, the first step is to collect data,
for example (x1, y1), (x2, y2), (x3, y3),
and so on.
The next step is to plot the data points
on a rectangular coordinate system.
This plot is called a scatter diagram.

For example, consider the following data (scatter diagram not reproduced here):

x   1   3   4   6   8   9   11   14
y   1   2   4   4   5   7    8    9
Curve Fitting
From a scatter diagram it is often
possible to visualize a smooth curve that approximates the data.
This is called an approximating curve.
For example, the data plotted here
appear to lie close to a straight line.
So, we say a linear relationship exists
between x and y.

x   1   3   4   6   8   9   11   14
y   1   2   4   4   5   7    8    9

The general problem of finding an equation or
relationship between x and y is called
curve fitting.
Curve Fitting
Several equations can be used to approximate curves. A few are shown
here:
straight line: $y = a_0 + a_1 x$
quadratic curve: $y = a_0 + a_1 x + a_2 x^2$
cubic curve: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$
hyperbola: $y = \dfrac{1}{a_0 + a_1 x}$
exponential curve: $y = a b^x$
geometric curve: $y = a x^b$

Straight Line
The simplest of all curve fits is a straight line. Its equation is

$y = a_0 + a_1 x$

Given two points (x1, y1) and (x2, y2) on a
line, we can find the constants a0 and a1.
The resulting equation is

$y - y_1 = \dfrac{y_2 - y_1}{x_2 - x_1}\,(x - x_1)$

or

$y - y_1 = m\,(x - x_1)$, where $m = \dfrac{y_2 - y_1}{x_2 - x_1}$

m is called the slope of the line.

[Figure: a line through (x1, y1) and (x2, y2), showing the rise y2 - y1 and the run x2 - x1.]
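The two-point construction can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name line_through and the sample points are assumptions chosen for the example.

```python
def line_through(p1, p2):
    """Return (a0, a1) for the line y = a0 + a1*x through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # The slope formula divides by (x2 - x1), so a vertical line is excluded.
        raise ValueError("slope is undefined when x1 == x2")
    m = (y2 - y1) / (x2 - x1)  # slope of the line
    a0 = y1 - m * x1           # intercept: set x = 0 in y - y1 = m*(x - x1)
    return a0, m


# Hypothetical sample points, used only to exercise the function.
a0, a1 = line_through((1.0, 2.0), (3.0, 8.0))
print(a0, a1)
```

The slope m plays the role of a1, and the intercept a0 follows by evaluating the two-point form at x = 0.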
Straight Line
Example 1: Let us construct a line that approximates the data

x   2   3   5    7    9   10
y   1   3   7   11   15   17

First let us plot the data on a coordinate
system.
It is clear from the figure that all points
lie on a straight line.
These data fit a straight line exactly.
Now the question is: what is the equation of this
straight line?

$y = a_0 + a_1 x$
Straight Line
Example 1 (continued): To determine the equation, start from the general form of a straight line,

$y = a_0 + a_1 x$

Only two points from the figure are needed to
find a0 and a1.
Let us select (2, 1) and (3, 3) as the two points.
Putting x = 2 and y = 1, we get:
1 = a0 + 2a1 (1)
Then putting x = 3 and y = 3, we get:
3 = a0 + 3a1 (2)
Solving Eqs. (1) and (2), we get a0 = -3 and a1 = 2.
Therefore, the straight line is y = -3 + 2x; the y-intercept is -3 and the slope is +2.
As a check, verify that the remaining data points also lie on this line; for example, x = 5 gives y = -3 + 10 = 7.
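The elimination step and the final check for Example 1 can be reproduced with a short script. It is a sketch under the assumption that the same two points, (2, 1) and (3, 3), are used; the variable names are illustrative.

```python
# Equations (1) and (2):  1 = a0 + 2*a1  and  3 = a0 + 3*a1.
# Subtracting (1) from (2) eliminates a0 and leaves a1.
a1 = (3 - 1) / (3 - 2)
a0 = 1 - 2 * a1  # back-substitute into Eq. (1)

xs = [2, 3, 5, 7, 9, 10]
ys = [1, 3, 7, 11, 15, 17]

# Check whether each data point satisfies y = a0 + a1*x.
on_line = [a0 + a1 * x == y for x, y in zip(xs, ys)]
print(a0, a1, all(on_line))
```

Because every point satisfies the equation, any other pair of points from the table would lead to the same line.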
Straight Line
Example 2: Let us construct a line that approximates the data

x   1   3   4   6   8   9   11   14
y   1   2   4   4   5   7    8    9

First let us plot the data on a coordinate
system.
A straight line is drawn through the data
freehand.
There are several possible choices for
such a line, and we selected one.
Now the question is: what is the equation of this
straight line?

$y = a_0 + a_1 x$
Straight Line
Example 2 (continued): The general form of the straight-line equation is

$y = a_0 + a_1 x$

Only two points on the freehand line are needed to
find a0 and a1.

Let us select (0, 1) and (12, 7.5) as the two points.

Putting x = 0 and y = 1, we get:
1 = a0 + 0 (1)
Then putting x = 12 and y = 7.5, we get:
7.5 = a0 + 12a1 (2)

Solving Eqs. (1) and (2), we get a0 = 1 and a1 = 6.5/12 = 0.542.

Therefore, the straight line is y = 1 + 0.542x; the y-intercept is 1 and the slope is +0.542.
Straight Line
Example 2:

Let us compare the values estimated from this straight-line equation with the actual
data.

x      1     3     4     6     8     9     11    14
y      1     2     4     4     5     7     8     9
yest   1.5   2.6   3.2   4.3   5.3   5.9   7.0   8.6

The estimated values differ from the actual data: the freehand line has errors at every point.
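The comparison table can be regenerated with a few lines of Python. This sketch assumes the freehand line through (0, 1) and (12, 7.5), that is, a0 = 1 and a1 = 6.5/12; the variable names are illustrative.

```python
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]

a0, a1 = 1.0, 6.5 / 12  # freehand line through (0, 1) and (12, 7.5)

y_est = [a0 + a1 * x for x in xs]               # estimated values
errors = [y - ye for y, ye in zip(ys, y_est)]   # actual minus estimated

print("  x   y   yest    error")
for x, y, ye, e in zip(xs, ys, y_est, errors):
    print(f"{x:3d} {y:3d} {ye:6.2f} {e:+8.2f}")
```

The signed errors show that this line sits above some points and below others, which is what motivates a systematic definition of "best fit".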


What is Regression?
What is regression? Given n data points (x1, y1), (x2, y2), ..., (xn, yn),
what is the best fit y = f(x) to the data?

The best fit is generally based on minimizing the sum of the squares of the
residuals, Sr.

The residual at a point is

$\varepsilon_i = y_i - f(x_i)$

The sum of the squares of the residuals is

$S_r = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2$

Figure. Basic model for regression
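The quantity Sr can be computed directly from its definition. The sketch below is illustrative only: the function name sum_squared_residuals is an assumption, and the second candidate line (slope 0.7) is a hypothetical alternative introduced purely to show that Sr changes with the choice of line.

```python
def sum_squared_residuals(f, xs, ys):
    """Return Sr = sum over i of (y_i - f(x_i))**2 for a model function f."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]


def freehand(x):
    # Freehand line from Example 2: y = 1 + (6.5/12) x
    return 1.0 + (6.5 / 12) * x


def steeper(x):
    # Hypothetical alternative line, not from the slides.
    return 1.0 + 0.7 * x


sr_freehand = sum_squared_residuals(freehand, xs, ys)
sr_steeper = sum_squared_residuals(steeper, xs, ys)
print(sr_freehand, sr_steeper)
```

A smaller Sr means the line passes closer to the data in the least-squares sense, so Sr gives an objective way to rank candidate lines.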


Linear Regression-Straight line fitting
To avoid individual judgment in fitting lines, it is necessary to agree on
a definition of the best-fitting line,

$y = a_0 + a_1 x$

The figure shows linear regression of y vs. x data, showing the residual at a
typical point x_i:

$\varepsilon_i = y_i - a_0 - a_1 x_i$

Of all the curves approximating the data set, the curve having
the property that the sum of all squared residuals is
a minimum is called the best-fitting curve. That is,

$\sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon_1^2 + \varepsilon_2^2 + \varepsilon_3^2 + \varepsilon_4^2 + \cdots + \varepsilon_n^2$ is a minimum.

[Figure: data points (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn) scattered about the line, with the residual marked at a typical point (xi, yi).]
Linear Regression-Straight line fitting
The best-fitting line is then called the least-square line. The least-square line
that approximates the data (x1, y1), (x2, y2), . . . , (xn, yn) is

$y = a_0 + a_1 x$

where the constants a0 and a1 are determined by solving the following equations, in which N is the number of data points:

$\sum y = a_0 N + a_1 \sum x$

$\sum xy = a_0 \sum x + a_1 \sum x^2$

These are called the normal equations
for the least-square line.
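The normal equations form a 2-by-2 linear system, so they can be solved in closed form. The following is a minimal sketch using Cramer's rule; the function name least_squares_line is an assumption, and the data used to exercise it are the exactly linear points of Example 1.

```python
def least_squares_line(xs, ys):
    """Solve the normal equations for (a0, a1) in y = a0 + a1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx = sum(xs)                              # sum of x
    sy = sum(ys)                              # sum of y
    sxx = sum(x * x for x in xs)              # sum of x^2
    sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x*y
    det = n * sxx - sx * sx                   # determinant of the system
    if det == 0:
        raise ValueError("all x values are equal; the line is not unique")
    a0 = (sy * sxx - sx * sxy) / det
    a1 = (n * sxy - sx * sy) / det
    return a0, a1


# Example 1 data lie exactly on y = -3 + 2x, so the fit should recover that line.
a0, a1 = least_squares_line([2, 3, 5, 7, 9, 10], [1, 3, 7, 11, 15, 17])
print(a0, a1)
```

When the data are exactly linear, every residual is zero and the least-square line coincides with the line through any two of the points.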
Examples on least square line fitting
Example 3: Given the data points, find the best-fit straight line using
least-square fitting:

x      y      x^2     xy     y^2
1      1      1       1      1
3      2      9       6      4
4      4      16      16     16
6      4      36      24     16
8      5      64      40     25
9      7      81      63     49
11     8      121     88     64
14     9      196     126    81

Σx = 56   Σy = 40   Σx^2 = 524   Σxy = 364   Σy^2 = 256

Examples on least square line fitting
Example 3 (continued):

To find the constants a0 and a1, let us use the normal equations:

$\sum y = a_0 N + a_1 \sum x$

$\sum xy = a_0 \sum x + a_1 \sum x^2$

N = 8

Σx = 56   Σy = 40   Σx^2 = 524   Σxy = 364   Σy^2 = 256

Putting the values of the sums into the above equations:

40 = 8a0 + 56a1
364 = 56a0 + 524a1
Solving these, we get a0 = 6/11 and a1 = 7/11.
So, the least-square line equation is
y = 0.545 + 0.636x
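The sums and the fractions 6/11 and 7/11 can be checked with exact rational arithmetic. This is a sketch using Python's standard fractions module; the variable names are illustrative.

```python
from fractions import Fraction

xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# Solve  sy = a0*n + a1*sx  and  sxy = a0*sx + a1*sxx  by Cramer's rule.
det = n * sxx - sx * sx
a0 = Fraction(sy * sxx - sx * sxy, det)
a1 = Fraction(n * sxy - sx * sy, det)

print(sx, sy, sxx, sxy)
print(a0, a1, float(a0), float(a1))
```

Keeping the coefficients as fractions avoids rounding until the last step, where 6/11 and 7/11 become 0.545 and 0.636 to three decimals.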
Examples on least square line fitting
Example 4: Farm employment in a country as a function of year is given
in millions of workers, as shown in the table below:

Year                         1935   1940   1945   1950   1955   1960   1965   1970   1975   1980
Farm employment (millions)   12.7   11.0   10.0   9.9    8.4    7.1    5.6    4.5    4.3    3.7

(a) Graph the data.
(b) Find an equation for the least-square line fitting the data.
(c) Try to predict farm employment in the years 1990 and 2000,
assuming that the trend continues.
Examples on least square line fitting
Example 4 (continued):
(a) Graph of the data, with year on the x-axis and farm employment (in millions) on the y-axis.

Year    Employment (in millions)
1935    12.7
1940    11.0
1945    10.0
1950    9.9
1955    8.4
1960    7.1
1965    5.6
1970    4.5
1975    4.3
1980    3.7
Examples on least square line fitting
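Example 4 (continued):
(b) To keep the numbers small, the years are coded as x = (year - 1935)/5, so that 1935 corresponds to x = 0 and 1980 to x = 9; y is farm employment in millions.

Year    x    y       x^2    xy
1935    0    12.7    0      0
1940    1    11.0    1      11.0
1945    2    10.0    4      20.0
1950    3    9.9     9      29.7
1955    4    8.4     16     33.6
1960    5    7.1     25     35.5
1965    6    5.6     36     33.6
1970    7    4.5     49     31.5
1975    8    4.3     64     34.4
1980    9    3.7     81     33.3

Σx = 45   Σy = 77.2   Σx^2 = 285   Σxy = 262.6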
Examples on least square line fitting
Example 4 (continued):

To find the constants a0 and a1, let us use the normal equations:

$\sum y = a_0 N + a_1 \sum x$

$\sum xy = a_0 \sum x + a_1 \sum x^2$

N = 10

Σx = 45   Σy = 77.2   Σx^2 = 285   Σxy = 262.6

Putting the values of the sums into the above equations, and dividing each equation by its coefficient of a0:

77.2 = 10a0 + 45a1, which gives 7.72 = a0 + 4.5a1
262.6 = 45a0 + 285a1, which gives 5.8356 = a0 + 6.3333a1

Solving these, we get a0 = 12.3455 and a1 = -1.0279.

So, the least-square line equation is
y = 12.346 - 1.028x
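The whole calculation for part (b) can be sketched in Python. The coding x = (year - 1935)/5 is inferred from the x column of the table, and the variable names are illustrative.

```python
years = list(range(1935, 1985, 5))  # 1935, 1940, ..., 1980
employment = [12.7, 11.0, 10.0, 9.9, 8.4, 7.1, 5.6, 4.5, 4.3, 3.7]  # millions

# Code the years as x = 0, 1, ..., 9 to keep the sums small.
xs = [(year - 1935) // 5 for year in years]
ys = employment

n = len(xs)
sx = sum(xs)
sy = sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# Solve the normal equations by Cramer's rule.
det = n * sxx - sx * sx
a0 = (sy * sxx - sx * sxy) / det
a1 = (n * sxy - sx * sy) / det

print(f"sums: {sx} {sy:.1f} {sxx} {sxy:.1f}")
print(f"a0 = {a0:.4f}, a1 = {a1:.4f}")
```

The negative slope says that, on this fit, farm employment falls by roughly one million workers per five-year step.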
Examples on least square line fitting
Example 4 (continued): Estimated values from yest = 12.346 - 1.028x

Year    x    y       yest
1935    0    12.7    12.35
1940    1    11.0    11.32
1945    2    10.0    10.29
1950    3    9.9     9.26
1955    4    8.4     8.23
1960    5    7.1     7.21
1965    6    5.6     6.18
1970    7    4.5     5.15
1975    8    4.3     4.12
1980    9    3.7     3.09
Example on least square curve fitting
yest = 12.346 - 1.028x

(c) The year 1990 corresponds to x = 11. Putting this into the above line
equation, we get
y = 1.038
This result agrees with later data showing that about a million workers were
employed in farming in 1990.
Now, for the year 2000, x = 13 and
y = -1.018
This result is impossible, because employment cannot be negative. We therefore conclude that the linear trend
does not continue for long, and projections for 2000 based on this trend will be
wrong.
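The extrapolation in part (c) can be reproduced with a small helper. This is a sketch that assumes the rounded coefficients of the fitted line, 12.346 and -1.028, and the coding x = (year - 1935)/5; the function name predict_employment is hypothetical.

```python
def predict_employment(year, a0=12.346, a1=-1.028, base_year=1935, step=5):
    """Evaluate the fitted line at a calendar year (result in millions)."""
    x = (year - base_year) / step  # coded year: 1935 -> 0, 1940 -> 1, ...
    return a0 + a1 * x


for year in (1990, 2000):
    y = predict_employment(year)
    note = "" if y >= 0 else "  (negative: outside the range where the line is meaningful)"
    print(f"{year}: {y:.3f}{note}")
```

The fitted data end at x = 9, so both predictions are extrapolations; the negative value for 2000 is a reminder that a line fitted over one range need not hold outside it.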
