Linear Regression

Simple linear regression is a statistical approach to modelling the relationship between two variables. The variable being predicted is called the dependent variable, and the variable or variables used to predict its value are called the independent variables. In the case of one independent variable, the relationship is described by the equation of a straight line:

y = mx + c

where m is the slope of the line and c is the intercept on the y-axis.
Regression Model and Regression Equation

The regression model used in simple linear regression is as follows:

y = β0 + β1x + ε

The relationship between x and y is described by the population parameters β0 and β1. In practice these parameters are unknown, and the sample statistics b0 and b1 (computed from sample data) are used as estimates of the population parameters β0 and β1.
Estimated Regression Equation

Substituting the values of the computed sample statistics b0 and b1 into the regression equation, we obtain the estimated regression equation:

ŷ = b0 + b1x

Least Squares Method

The least squares method is a procedure for using sample data to find the estimated regression equation.
Example. The table below gives, for a sample of 10 restaurants, the student population of the surrounding area (in 1000s) and the quarterly sales (in $1000s):

Restaurant i   Student population xi (1000s)   Quarterly sales yi ($1000s)
     1                    2                            58
     2                    6                           105
     3                    8                            88
     4                    8                           118
     5                   12                           117
     6                   16                           137
     7                   20                           157
     8                   20                           169
     9                   22                           149
    10                   26                           202

We can choose a simple linear model to represent the relationship between quarterly sales and student population. Our next task is to use the sample data in the table to determine the values of b0 and b1 in the estimated regression equation ŷ = b0 + b1x.
The least squares estimates are

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
b0 = ȳ − b1x̄

where xi is the value of the independent variable for observation i, yi is the value of the dependent variable for observation i, x̄ and ȳ are the means of the xi and yi, and n is the total number of observations.

Calculations:

 i    xi    yi   xi−x̄   yi−ȳ   (xi−x̄)(yi−ȳ)   (xi−x̄)²
 1     2    58    −12    −72        864          144
 2     6   105     −8    −25        200           64
 3     8    88     −6    −42        252           36
 4     8   118     −6    −12         72           36
 5    12   117     −2    −13         26            4
 6    16   137      2      7         14            4
 7    20   157      6     27        162           36
 8    20   169      6     39        234           36
 9    22   149      8     19        152           64
10    26   202     12     72        864          144
Σ    140  1300                     2840          568

Here x̄ = 140/10 = 14 and ȳ = 1300/10 = 130.
The calculation of the slope:

b1 = 2840/568 = 5

The calculation of the y-intercept:

b0 = ȳ − b1x̄ = 130 − 5(14) = 130 − 70 = 60

Hence the estimated regression equation is ŷ = 60 + 5x.
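The slope and intercept computation above can be sketched in a few lines of Python; the data values are taken from the table as recovered from the notes:

```python
# Least squares estimates b1 and b0 for the restaurant example.
# Data: x = student population (1000s), y = quarterly sales ($1000s).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
x_bar = sum(x) / n          # 14.0
y_bar = sum(y) / n          # 130.0

# Deviation sums used in the slope formula
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 2840.0
s_xx = sum((xi - x_bar) ** 2 for xi in x)                        # 568.0

b1 = s_xy / s_xx            # slope
b0 = y_bar - b1 * x_bar     # intercept

print(f"y-hat = {b0:g} + {b1:g}x")  # y-hat = 60 + 5x
```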
Example. Fit the best parabola y = a + bx + cx² to the given data, for which the x-values are 1, 2, …, 9.

To simplify the calculations, shift the origin: let X = x − 5 and Y = y − 8, so that ΣX = 0 and ΣX³ = 0. The equation of the parabola becomes

Y = a + bX + cX²

The normal equations are

ΣY   = na + bΣX + cΣX²
ΣXY  = aΣX + bΣX² + cΣX³
ΣX²Y = aΣX² + bΣX³ + cΣX⁴

From the table of values, n = 9, ΣX = 0, ΣX² = 60, ΣX³ = 0, ΣX⁴ = 708, ΣY = −4, ΣXY = 51 and ΣX²Y = −47.4, so the normal equations reduce to

9a + 60c = −4
60b = 51
60a + 708c = −47.4

Solving, b = 51/60 = 0.85, a = 0.0043 and c = −0.0673. Hence

Y = 0.0043 + 0.85X − 0.0673X²

i.e.

y − 8 = 0.0043 + 0.85(x − 5) − 0.0673(x − 5)²

Expanding, the required parabola is

y = 2.0718 + 1.523x − 0.0673x²
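As a check on the arithmetic, the reduced normal equations recovered from the notes (9a + 60c = −4, 60b = 51, 60a + 708c = −47.4, in the shifted variables X = x − 5, Y = y − 8) can be solved directly; a minimal sketch, treating those coefficients as assumptions:

```python
import numpy as np

# Normal equations in the shifted variables X = x - 5, Y = y - 8
# (coefficients as recovered from the notes):
#   9a  +  0b + 60c  = -4
#   0a  + 60b +  0c  = 51
#   60a +  0b + 708c = -47.4
A = np.array([[9.0, 0.0, 60.0],
              [0.0, 60.0, 0.0],
              [60.0, 0.0, 708.0]])
rhs = np.array([-4.0, 51.0, -47.4])

a, b, c = np.linalg.solve(A, rhs)
print(f"{a:.4f} {b:.2f} {c:.4f}")  # 0.0043 0.85 -0.0673
```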
Least Squares for a Function of Two Variables (Multiple Linear Regression)

A linear function Z = a0 + a1x + a2y is fitted to the data (xi, yi, zi), i = 1, 2, …, m, by choosing a0, a1 and a2 so that the sum of squares

S = Σ [zi − (a0 + a1xi + a2yi)]²

is minimum.
CHAPTER 4: Least Squares and Fourier Transforms

4.1 INTRODUCTION
In experimental work, we often encounter the problem of fitting a curve to data which are subject to errors. The strategy for such cases is to derive an approximating function that broadly fits the data without necessarily passing through the given points. The curve drawn is such that the discrepancy between the data points and the curve is least. In the method of least squares, the sum of the squares of the errors is minimized. For continuous functions, Fourier series are the usual tool for periodic systems; but for aperiodic systems, the Fourier transform is the primary tool available. The computations of the discrete Fourier transform and the Fast Fourier Transform (FFT) are discussed in detail in Section 4.6.
4.2 LEAST SQUARES CURVE FITTING PROCEDURES
If S denotes the sum of the squares of the errors, then the method of least squares consists in minimizing S. In the following sections, we shall study the linear and nonlinear least squares fitting to given data (xi, yi), i = 1, 2, …, m.
Let Y = a0 + a1x be the straight line to be fitted to the given data, viz. (xi, yi), i = 1, 2, …, m. Then, corresponding to Eq. (4.2), we have

∂S/∂a0 = 0 = −2[y1 − (a0 + a1x1)] − 2[y2 − (a0 + a1x2)] − … − 2[ym − (a0 + a1xm)]   (4.4a)

and

∂S/∂a1 = 0 = −2x1[y1 − (a0 + a1x1)] − 2x2[y2 − (a0 + a1x2)] − … − 2xm[ym − (a0 + a1xm)]   (4.4b)

These simplify to

ma0 + a1(x1 + x2 + … + xm) = y1 + y2 + … + ym   (4.5a)

and

a0(x1 + x2 + … + xm) + a1(x1² + x2² + … + xm²) = x1y1 + x2y2 + … + xmym   (4.5b)

or more compactly to

ma0 + a1 Σxi = Σyi   (4.6a)

and

a0 Σxi + a1 Σxi² = Σxiyi   (4.6b)

where the sums run over i = 1 to m.
Equations (4.6) are called the normal equations, and can be solved for a0 and a1, since xi and yi are known quantities. We can easily obtain

a1 = [m Σxiyi − Σxi Σyi] / [m Σxi² − (Σxi)²]   (4.7)

and

a0 = ȳ − a1x̄   (4.8)

Since ∂²S/∂a0² and ∂²S/∂a1² are both positive at the points a0 and a1, it follows that these values provide a minimum of S. In Eq. (4.8), x̄ and ȳ are the means of x and y, respectively. From Eq. (4.8), we have

ȳ = a0 + a1x̄
which shows that the fitted straight line passes through the centroid of the
data points.
Sometimes, a measure of the goodness of fit is adopted. The correlation coefficient (cc) is defined as

cc = √[(St − S) / St]   (4.9)

where

St = Σ (yi − ȳ)²   (4.10)

the sum being taken over i = 1 to m.
Example 4.1 Find the best values of a0 and a1 if the straight line Y = a0 + a1x is fitted to the data (xi, yi):

(1, 0.6), (2, 2.4), (3, 3.5), (4, 4.8), (5, 5.7)

Find also the correlation coefficient.

From the table of values given below, we find x̄ = 3, ȳ = 3.4, and

a1 = [5(63.6) − 15(17)] / [5(55) − (15)²] = 63/50 = 1.26

so that a0 = ȳ − a1x̄ = 3.4 − 1.26(3) = −0.38.
 x     y    x²    xy    (yi − ȳ)²   (Yi − yi)²
 1    0.6    1    0.6     7.84       0.0784
 2    2.4    4    4.8     1.00       0.0676
 3    3.5    9   10.5     0.01       0.0100
 4    4.8   16   19.2     1.96       0.0196
 5    5.7   25   28.5     5.29       0.0484
15   17.0   55   63.6    16.10       0.2240

The correlation coefficient = √[(16.10 − 0.2240)/16.10] = 0.9930.
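Example 4.1 can be verified numerically; this sketch recomputes a1, a0 and the correlation coefficient of Eq. (4.9) from the raw data:

```python
import math

# Example 4.1 check: straight line Y = a0 + a1*x by least squares,
# plus the correlation coefficient cc = sqrt((St - S)/St).
pts = [(1, 0.6), (2, 2.4), (3, 3.5), (4, 4.8), (5, 5.7)]
m = len(pts)
sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)

a1 = (m * sxy - sx * sy) / (m * sxx - sx * sx)   # 1.26
a0 = sy / m - a1 * sx / m                        # -0.38

y_bar = sy / m
S = sum((y - (a0 + a1 * x)) ** 2 for x, y in pts)   # sum of squared errors
St = sum((y - y_bar) ** 2 for _, y in pts)          # total sum of squares
cc = math.sqrt((St - S) / St)

print(round(a1, 2), round(a0, 2), round(cc, 4))  # 1.26 -0.38 0.993
```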
Example 4.2 Certain experimental values of x and y are given below:

(0, −1), (2, 5), (5, 12), (7, 20)

If the straight line Y = a0 + a1x is fitted to the above data, find the approximate values of a0 and a1.

The table of values is given below.
 x    y    x²    xy
 0   −1     0     0
 2    5     4    10
 5   12    25    60
 7   20    49   140
14   36    78   210
The normal equations are

4a0 + 14a1 = 36

and

14a0 + 78a1 = 210

Solving the two equations, we obtain a0 = −1.1379 and a1 = 2.8966, so that the fitted line is Y = −1.1379 + 2.8966x.
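The 2×2 system of Example 4.2 solves directly by Cramer's rule:

```python
# Example 4.2: solve the 2x2 normal equations
#   4*a0  + 14*a1 = 36
#   14*a0 + 78*a1 = 210
# by Cramer's rule.
det = 4 * 78 - 14 * 14            # 116
a0 = (36 * 78 - 14 * 210) / det   # a0 column replaced by RHS
a1 = (4 * 210 - 14 * 36) / det    # a1 column replaced by RHS
print(round(a0, 4), round(a1, 4))  # -1.1379 2.8966
```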
If the function z = a0 + a1x + a2y is fitted to given data (xi, yi, zi), then setting the partial derivatives of S to zero gives

∂S/∂a0 = −2 Σ (zi − a0 − a1xi − a2yi) = 0

∂S/∂a1 = −2 Σ xi (zi − a0 − a1xi − a2yi) = 0

and

∂S/∂a2 = −2 Σ yi (zi − a0 − a1xi − a2yi) = 0

which lead to the normal equations

ma0 + a1 Σxi + a2 Σyi = Σzi   (4.11)

a0 Σxi + a1 Σxi² + a2 Σxiyi = Σxizi

a0 Σyi + a1 Σxiyi + a2 Σyi² = Σyizi
Example 4.3 Fit the function z = a0 + a1x + a2y to the following data:

 x    y    z    x²   xy    xz    y²    yz
 0    0    2     0    0     0     0     0
 1    1    4     1    1     4     1     4
 2    3    3     4    6     6     9     9
 4    2   16    16    8    64     4    32
 6    8    8    36   48    48    64    64
13   14   33    57   63   122    78   109

The normal equations are

5a0 + 13a1 + 14a2 = 33
13a0 + 57a1 + 63a2 = 122
14a0 + 63a1 + 78a2 = 109

Solving, we obtain a0 = 2, a1 = 5 and a2 = −3, so that the required function is z = 2 + 5x − 3y.
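The same answer can be recovered programmatically; this sketch builds the design matrix from the tabulated points (as reconstructed from the table above) and solves the least squares problem:

```python
import numpy as np

# Plane fit z = a0 + a1*x + a2*y: least squares on the tabulated points.
pts = [(0, 0, 2), (1, 1, 4), (2, 3, 3), (4, 2, 16), (6, 8, 8)]
A = np.array([[1.0, x, y] for x, y, _ in pts])   # design matrix [1, x, y]
z = np.array([p[2] for p in pts], dtype=float)

coef, *_ = np.linalg.lstsq(A, z, rcond=None)
a0, a1, a2 = (float(v) for v in coef)
print(round(a0, 4), round(a1, 4), round(a2, 4))  # 2.0 5.0 -3.0
```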
4.2.3 Nonlinear Curve Fitting

Certain nonlinear relationships can be reduced to the linear form Y = A0 + A1X by a change of variables.

(a) y = ax^b

Taking logarithms of both sides, we obtain

log10 y = log10 a + b log10 x

This can be written as Y = A0 + A1X, where

Y = log10 y, X = log10 x, A0 = log10 a and A1 = b.

(b) y = ab^x

Taking logarithms of both sides, we obtain

log10 y = log10 a + x log10 b

so that Y = A0 + A1X, where

Y = log10 y, X = x, A0 = log10 a and A1 = log10 b.

(c) y = ae^(bx)

Taking natural logarithms of both sides,

ln y = ln a + bx

so that Y = A0 + A1X, where

Y = ln y, X = x, A0 = ln a and A1 = b.
Example 4.4 Fit a curve of the form y = ae^(bx) to the data tabulated below.

We have

y = ae^(bx)

Therefore,

ln y = ln a + bx

i.e. Y = A0 + A1X, where

Y = ln y, A0 = ln a, A1 = b and X = x.
The table of values is given below.

 x    Y = ln y    x²      xY
 1     0.905       1     0.905
 3     1.905       9     5.715
 5     2.905      25    14.525
 7     3.905      49    27.335
 9     4.905      81    44.145
25    14.525     165    92.625

We obtain x̄ = 5, Ȳ = 2.905 and

A1 = [5(92.625) − 25(14.525)] / [5(165) − (25)²] = 100/200 = 0.5 = b.

Then

A0 = Ȳ − A1x̄ = 2.905 − 0.5(5) = 0.405.

Hence,

a = e^A0 = e^0.405 = 1.499.

It follows that the required curve is of the form

y = 1.499 e^(0.5x)
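Example 4.4 can be checked with a short script; the ln y values are taken from the worked table:

```python
import math

# Example 4.4: fit y = a*e^(b*x) by taking natural logs, Y = ln y,
# so that Y = ln a + b*x is a straight-line fit.
x = [1, 3, 5, 7, 9]
Y = [0.905, 1.905, 2.905, 3.905, 4.905]   # ln y from the table

m = len(x)
sx, sY = sum(x), sum(Y)
sxx = sum(xi * xi for xi in x)
sxY = sum(xi * yi for xi, yi in zip(x, Y))

b = (m * sxY - sx * sY) / (m * sxx - sx * sx)   # slope = b
ln_a = sY / m - b * sx / m                      # intercept = ln a
a = math.exp(ln_a)

print(round(b, 3), round(a, 3))  # 0.5 1.499
```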
Example 4.5 Using the method of least squares, fit a curve of the form

y = x / (a + bx)

to the following data.

Writing the relation as

1/y = (a + bx)/x = b + a(1/x)

i.e. Y = b + aX with X = 1/x and Y = 1/y, the problem reduces to a straight-line fit.
The table of values is given below.
X = 1/x    Y = 1/y     X²      XY
 0.333      0.140     0.111   0.047
 0.200      0.098     0.040   0.020
 0.125      0.074     0.016   0.009
 0.083      0.061     0.007   0.005
 0.741      0.373     0.174   0.081

We obtain X̄ = 0.185, Ȳ = 0.093 and

A1 = a = [4(0.081) − 0.741(0.373)] / [4(0.174) − (0.741)²] = 0.324

and A0 = b = Ȳ − aX̄ = 0.093 − 0.324(0.185) = 0.0331.
Hence the required fit is Y = 0.0331 + 0.324X, which corresponds to

y = x / (0.324 + 0.0331x)

Note: The given data were obtained from the relation y = x/(0.3162 + 0.0345x).
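Carrying the sums at full precision (instead of the three-decimal column totals used in the worked text) gives slightly different constants, closer to the generating relation quoted in the note; a sketch:

```python
# Example 4.5 recomputed at full precision: y = x/(a + b*x)
# linearized as Y = a*X + b with X = 1/x, Y = 1/y (values from the table).
X = [0.333, 0.200, 0.125, 0.083]
Y = [0.140, 0.098, 0.074, 0.061]

m = len(X)
sX, sY = sum(X), sum(Y)
sXX = sum(v * v for v in X)
sXY = sum(u * v for u, v in zip(X, Y))

a = (m * sXY - sX * sY) / (m * sXX - sX ** 2)  # slope = a
b = sY / m - a * sX / m                        # intercept = b
print(round(a, 3), round(b, 4))  # 0.316 0.0346
```

The worked values a = 0.324 and b = 0.0331 result from rounding the column sums to three decimals before solving.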
4.2.4 Curve Fitting by Polynomials

Let the polynomial of the nth degree

y = a0 + a1x + a2x² + … + anx^n

be fitted to the data points (xi, yi), i = 1, 2, …, m. Equating to zero the first partial derivatives of S and simplifying, we obtain the normal equations:

ma0 + a1 Σxi + … + an Σxi^n = Σyi
a0 Σxi + a1 Σxi² + … + an Σxi^(n+1) = Σxiyi
…
a0 Σxi^n + a1 Σxi^(n+1) + … + an Σxi^(2n) = Σxi^n yi   (4.14)

This system of (n + 1) equations can be solved for the (n + 1) unknowns, but for larger n the system tends to be sensitive, so that round-off errors in the data may cause large changes in the solution. Such systems occur quite often in practical problems and are called ill-conditioned systems. Orthogonal polynomials are most suited to solve such systems, and one particular form of these polynomials, the Chebyshev polynomials, will be discussed later in this chapter.
Example 4.6 Fit a polynomial of the second degree to the data points (x, y) given by

(0, 1), (1, 6) and (2, 17).

The table of values is:

x    y    x²   x³   x⁴    xy   x²y
0    1     0    0    0     0     0
1    6     1    1    1     6     6
2   17     4    8   16    34    68
3   24     5    9   17    40    74

The normal equations are

3a0 + 3a1 + 5a2 = 24
3a0 + 5a1 + 9a2 = 40
5a0 + 9a1 + 17a2 = 74

Solving, a0 = 1, a1 = 2 and a2 = 3. The required polynomial is given by Y = 1 + 2x + 3x², and it can be seen that this polynomial passes exactly through the three data points.
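Example 4.6 in code: forming the normal equations (AᵀA)a = Aᵀy for the quadratic and solving them reproduces the coefficients:

```python
import numpy as np

# Example 4.6: second-degree polynomial through (0,1), (1,6), (2,17)
# via the least squares normal equations.
pts = [(0, 1), (1, 6), (2, 17)]
A = np.array([[1.0, x, x * x] for x, _ in pts])   # design matrix [1, x, x^2]
y = np.array([p[1] for p in pts], dtype=float)

# Normal equations: (A^T A) coef = A^T y
coef = np.linalg.solve(A.T @ A, A.T @ y)
a0, a1, a2 = (float(v) for v in coef)
print(round(a0), round(a1), round(a2))  # 1 2 3
```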
4.2.5 Curve Fitting by a Sum of Exponentials

Suppose that a function of the form

y = A1 e^(λ1 x) + A2 e^(λ2 x)   (4.16)

is to be fitted to given data. Such a function satisfies a linear second-order differential equation of the form

y'' = a y' + b y   (4.17)

whose characteristic roots are λ1 and λ2. Integrating Eq. (4.17) twice over the range of the data and approximating the integrals numerically from the tabulated values (Eqs. 4.18 to 4.24) leads to a pair of linear equations for a and b. Then λ1 and λ2 are obtained as the roots of

λ² − aλ − b = 0   (4.25)

Finally, A1 and A2 can be obtained by the method of least squares or by the method of averages.
Example 4.7 Fit a curve of the form

y = A1 e^(λ1 x) + A2 e^(λ2 x)

to data tabulated at x = 1.0, 1.2, 1.4, …. Carrying out the procedure described above, we obtain

λ1 = 0.988 ≈ 0.99 and λ2 = −0.96.

Using the method of least squares, we finally obtain

A1 = 0.499 and A2 = 0.491.

The above data was actually constructed from the function y = cosh x, so that A1 = A2 = 0.5, λ1 = 1.0 and λ2 = −1.0.
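The last step, determining A1 and A2 once λ1 and λ2 are fixed, is an ordinary linear least squares problem, since y is linear in A1 and A2. A sketch under an assumed sample grid (x = 1.0 to 2.0 in steps of 0.2; the full table is not preserved here, so the exact fitted values differ slightly from the text):

```python
import math

# Given lambda1, lambda2 (from the text), fit A1, A2 in
#   y = A1*exp(l1*x) + A2*exp(l2*x)
# by solving the 2x2 linear least squares normal equations.
l1, l2 = 0.99, -0.96
xs = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]   # assumed grid
ys = [math.cosh(x) for x in xs]       # data constructed from cosh x

f1 = [math.exp(l1 * x) for x in xs]   # first basis function
f2 = [math.exp(l2 * x) for x in xs]   # second basis function

s11 = sum(u * u for u in f1); s12 = sum(u * v for u, v in zip(f1, f2))
s22 = sum(v * v for v in f2)
t1 = sum(u * y for u, y in zip(f1, ys)); t2 = sum(v * y for v, y in zip(f2, ys))

det = s11 * s22 - s12 * s12
A1 = (t1 * s22 - s12 * t2) / det
A2 = (s11 * t2 - s12 * t1) / det
print(round(A1, 3), round(A2, 3))   # both close to 0.5
```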
4.3 WEIGHTED LEAST SQUARES APPROXIMATION

In the weighted least squares approximation, each data point is assigned a weight Wi > 0 and we minimize

S = Σ Wi [yi − (a0 + a1xi)]²

For maxima or minima, we have

∂S/∂a0 = ∂S/∂a1 = 0   (4.28)

which give

∂S/∂a0 = −2 Σ Wi [yi − (a0 + a1xi)] = 0   (4.29)

and

∂S/∂a1 = −2 Σ Wi xi [yi − (a0 + a1xi)] = 0   (4.30)

These simplify to

a0 ΣWi + a1 ΣWixi = ΣWiyi   (4.31)

and

a0 ΣWixi + a1 ΣWixi² = ΣWixiyi   (4.32)
which are the normal equations in this case, and are solved to obtain a0 and a1. We consider Example 4.2 again to illustrate the use of weights.

Example 4.9 Suppose that in the data of Example 4.2, the point (5, 12) is known to be more reliable than the others. Then we prescribe a weight (say, 10) corresponding to this point, and all other weights are taken as unity. The following table is then obtained.
 x    y    W    Wx    Wx²    Wy    Wxy
 0   −1    1     0      0    −1      0
 2    5    1     2      4     5     10
 5   12   10    50    250   120    600
 7   20    1     7     49    20    140
14   36   13    59    303   144    750
The normal Eqs. (4.31) and (4.32) then give

13a0 + 59a1 = 144
59a0 + 303a1 = 750

which, on solving, yield

y = −1.349345 + 2.73799x.
Example 4.10 We consider Example 4.9 again with an increased weight, say 100, corresponding to y(5.0). The following table is then obtained.
 x    y     W     Wx    Wx²    Wy    Wxy
 0   −1     1      0      0    −1      0
 2    5     1      2      4     5     10
 5   12   100    500   2500  1200   6000
 7   20     1      7     49    20    140
14   36   103    509   2553  1224   6150
The normal equations in this case are

103a0 + 509a1 = 1224
509a0 + 2553a1 = 6150
Solving the preceding equations, we obtain

a0 = −1.41258 and a1 = 2.69056.

The required linear least squares approximation is therefore given by

y = −1.41258 + 2.69056x,

and the value of y(5) = 12.0402.

It follows that the approximation becomes better when the weight is increased.
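Examples 4.9 and 4.10 can be reproduced with a small weighted least squares routine implementing Eqs. (4.31) and (4.32):

```python
# Weighted least squares line: minimize S = sum W_i (y_i - (a0 + a1*x_i))^2.
# Normal equations: a0*SW + a1*SWx = SWy ; a0*SWx + a1*SWxx = SWxy.
def weighted_line(pts, w):
    sW = sum(w)
    sWx = sum(wi * x for wi, (x, _) in zip(w, pts))
    sWxx = sum(wi * x * x for wi, (x, _) in zip(w, pts))
    sWy = sum(wi * y for wi, (_, y) in zip(w, pts))
    sWxy = sum(wi * x * y for wi, (x, y) in zip(w, pts))
    det = sW * sWxx - sWx * sWx
    a0 = (sWy * sWxx - sWx * sWxy) / det
    a1 = (sW * sWxy - sWx * sWy) / det
    return a0, a1

pts = [(0, -1), (2, 5), (5, 12), (7, 20)]
print(weighted_line(pts, [1, 1, 10, 1]))    # ≈ (-1.349345, 2.73799)
print(weighted_line(pts, [1, 1, 100, 1]))   # ≈ (-1.41258, 2.69056)
```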
More generally, to fit the polynomial y = a0 + a1x + … + anx^n with weights Wi, we minimize

S(a0, a1, …, an) = Σ Wi [yi − (a0 + a1xi + … + anxi^n)]²   (4.34)

Setting

∂S/∂a0 = ∂S/∂a1 = … = ∂S/∂an = 0   (4.35)

leads to the normal equations

a0 ΣWi + a1 ΣWixi + … + an ΣWixi^n = ΣWiyi
a0 ΣWixi + a1 ΣWixi² + … + an ΣWixi^(n+1) = ΣWixiyi
…
a0 ΣWixi^n + a1 ΣWixi^(n+1) + … + an ΣWixi^(2n) = ΣWixi^n yi   (4.36)

where the sums run over i = 1 to m.
Setting the partial derivatives of the corresponding sum of squares to zero,

∂S/∂a0 = … = ∂S/∂an = 0   (4.39)

yields the normal equations (4.40) and (4.41). These are (n + 1) equations in the (n + 1) unknowns, viz. a0, a1, a2, …, an, and they always possess a 'unique' solution.