Curve Fitting, Regression and
Correlation Analysis
INTRODUCTION
An approximating curve is the graph of data obtained
through measurement or observation. Curve fitting
is the process of finding the "best fit" curve, since
different approximating curves can be obtained for
the same data. The least squares method is the best curve
fitting method and is more easily implemented on computers
than other methods such as the method of moments, the
method of group averages and the graphical method. We also
consider curve fitting by a sum of exponentials and by
linear weighted and non-linear weighted least squares
approximation.
In the univariate case, a single variable, say the
height of an Indian, is analyzed. In the bivariate
case, two "numerical" variables are measured,
resulting in a pair of measurements for each member,
say the height and weight of an Indian; the age
and the blood pressure of a person; the amount spent
on advertising and the volume of sales; the intake of
nutritious food and the I.Q. of a student, etc. In correlation
analysis, one wishes to find whether a (mathematical)
relationship exists and to measure the strength of such
a relationship. In regression analysis, the exact nature
and form of the mathematical equation (of the relationship)
is obtained. While the correlation coefficient measures
the "closeness", the regression equation is used
for prediction (or estimation).
CURVE FITTING
Curve fitting is the process of finding the equation
of a curve that approximates a given set of data. On the basis
of this mathematical equation, predictions can be made in
many statistical investigations.
Relationship
A relationship (or association) between two (or more)
variables may exist.
Examples: Blood pressure and age, rainfall and
crop yield, volume of a cube and length of its side,
consumption of food and weight gain, intake of drug
and heart rate, height and weight, income and medical
care, nutrition and I.Q.
Scatter Diagram
To find a mathematical relationship (equation) between,
say, two variables X and Y, plot the set
of N given paired observations of X and Y, i.e.
$(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N)$, in the XY-plane.
The resulting set of points is known as a scatter diagram.
Approximating Curve
It is a smooth curve that approximates the given set
of N data points plotted in the scatter diagram.
Collocational polynomial: For unequally spaced
X's, the N coefficients $a_0, a_1, a_2, \ldots, a_{N-1}$ in the
collocational polynomial
$$Y(X) = a_0 + a_1 X + a_2 X^2 + \cdots + a_{N-1} X^{N-1}$$
can be determined so that the given set of N data
points $(X_1, Y_1), \ldots, (X_N, Y_N)$ lies (collocates) on the
curve (satisfies the above equation).
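The collocation condition can be illustrated with a minimal sketch. It assumes Lagrange's interpolation formula as the way to evaluate the collocational polynomial; the function name `lagrange_value` and the three data points are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def lagrange_value(xs, ys, x):
    """Evaluate at x the polynomial of degree <= N-1 through the N points (xs[i], ys[i])."""
    if len(set(xs)) != len(xs):
        raise ValueError("X values must be distinct")
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Lagrange basis term: equals yi at xi and 0 at every other data abscissa.
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


# Hypothetical, unequally spaced data generated from Y = X^2 + 1.
xs = [0, 1, 3]
ys = [1, 2, 10]

# The polynomial collocates: it reproduces every given data point.
collocated = [lagrange_value(xs, ys, x) for x in xs]

# Value of the same polynomial at a point between the data abscissae.
midpoint = lagrange_value(xs, ys, 2)
```

Exact rational arithmetic (`Fraction`) is used here so that the collocation property can be checked without rounding error; with floating point data a tolerance would be needed instead.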
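Lagrange's interpolation formula is used to determine such a
collocational polynomial. This approach is not suitable when the
number of data points is large and the empirical or observed data
contains errors (i.e., the observations are random variables). In
such cases, an approximating curve Y = f(X) containing only a few
unknown parameters is fitted instead, for example a parabola
(quadratic curve), an nth degree polynomial
$Y = a_0 + a_1 X + \cdots + a_n X^n$, or an exponential curve.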
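Best fitting curve by method of least squares: Let
$(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N)$ be a given set of N
data points. Let $d_i = Y_i - \hat{Y}_i$ denote the difference
between the observed value $Y_i$ and the corresponding value
$\hat{Y}_i$ on the approximating curve.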
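As a small illustration of these deviations, the sketch below computes $\sum d_i^2$ for two candidate lines; in the least squares sense, the curve with the smaller sum fits better. The data set and both candidate lines are hypothetical, and the helper name `sum_squared_deviations` is an assumption made for this example.

```python
def sum_squared_deviations(f, xs, ys):
    """Return the sum of d_i^2, where d_i = Y_i - f(X_i)."""
    deviations = [y - f(x) for x, y in zip(xs, ys)]
    return sum(d * d for d in deviations)


def candidate_a(x):
    # Hypothetical candidate line Y = 2.2 + 0.6 X.
    return 2.2 + 0.6 * x


def candidate_b(x):
    # Hypothetical candidate line Y = 1 + X.
    return 1.0 + 1.0 * x


# Hypothetical observations, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sse_a = sum_squared_deviations(candidate_a, xs, ys)
sse_b = sum_squared_deviations(candidate_b, xs, ys)

# The candidate with the smaller sum of squared deviations is the better fit.
better = "a" if sse_a < sse_b else "b"
```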
Curve Fitting by Least Squares
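(i) Least squares straight line
Let the straight line
$$Y = a_0 + a_1 X \qquad (1)$$
be fitted to the N data points, so that $d_i = Y_i - a_0 - a_1 X_i$.
The sum of the squares of the deviations,
$\sum d_i^2 = \sum (Y_i - a_0 - a_1 X_i)^2$, is a minimum when its
partial derivatives with respect to $a_0$ and $a_1$ vanish:
$$\frac{\partial}{\partial a_0} \sum d_i^2 = -2 \sum (Y_i - a_0 - a_1 X_i) = 0$$
$$\frac{\partial}{\partial a_1} \sum d_i^2 = -2 \sum X_i (Y_i - a_0 - a_1 X_i) = 0$$
i.e.,
$$\sum Y = N a_0 + a_1 \sum X$$
$$\sum XY = a_0 \sum X + a_1 \sum X^2$$
These normal equations are linear and can be solved for $a_0$ and
$a_1$. Thus the two unknown parameters $a_0$ and $a_1$ of the least
squares line are determined.
Note: The normal equations can be obtained formally by multiplying
(1) on both sides by 1 and X in turn and summing over the data. For
an nth degree polynomial, its equation is multiplied by
$1, X, \ldots, X^n$ in turn and summed.
Interpolation (extrapolation) is to find Y corresponding
to a value of X included between (exterior
to) the given values of X.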
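The two normal equations of the least squares straight line can be solved directly. The following is a minimal sketch, not a library routine: the function name `least_squares_line` and the five data points are hypothetical, and the closed form used is simply the solution of the 2 by 2 linear system for $a_0$ and $a_1$.

```python
def least_squares_line(xs, ys):
    """Solve the normal equations
        sum(Y)  = N*a0 + a1*sum(X)
        sum(XY) = a0*sum(X) + a1*sum(X^2)
    and return (a0, a1) for the line Y = a0 + a1*X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Determinant of the 2x2 system; zero when all X values coincide.
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("all X values are equal; the line is not determined")
    a1 = (n * sum_xy - sum_x * sum_y) / det
    a0 = (sum_y - a1 * sum_x) / n
    return a0, a1


# Hypothetical observations, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a0, a1 = least_squares_line(xs, ys)

# Interpolation: estimate Y at an X lying between the given X values.
y_at_2_5 = a0 + a1 * 2.5
```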
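Result 1: Prove that the least squares line (L.S.L.) always passes
through the point $(\bar{X}, \bar{Y})$.
Solution: Let
$$Y = a_0 + a_1 X \qquad (1)$$
be the L.S.L. of Y on X. Its first normal equation is
$$\sum Y = N a_0 + a_1 \sum X.$$
Dividing this normal equation by N, we get
$$\frac{\sum Y}{N} = a_0 + a_1 \frac{\sum X}{N}, \quad \text{i.e.,} \quad \bar{Y} = a_0 + a_1 \bar{X}.$$
So the L.S.L. passes through $(\bar{X}, \bar{Y})$.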
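Result 2: Prove that the L.S.L. of Y on X can be expressed
as
$$y = \frac{\sum xy}{\sum x^2}\, x, \quad \text{where } x = X - \bar{X}, \; y = Y - \bar{Y}.$$
Solution: Since the L.S.L. $Y = a_0 + a_1 X$ passes through
$(\bar{X}, \bar{Y})$ (Result 1), we have $\bar{Y} = a_0 + a_1 \bar{X}$.
Subtracting,
$$Y - \bar{Y} = a_1 (X - \bar{X}), \quad \text{i.e.,} \quad y = a_1 x.$$
It is enough to show that $a_1 = \frac{\sum xy}{\sum x^2}$. Solving the
normal equations $\sum Y = N a_0 + a_1 \sum X$ and
$\sum XY = a_0 \sum X + a_1 \sum X^2$ for $a_1$ gives
$$a_1 = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}.$$
Since $\sum xy = \sum XY - \frac{\sum X \sum Y}{N}$ and
$\sum x^2 = \sum X^2 - \frac{(\sum X)^2}{N}$, dividing the numerator and
the denominator by N gives $a_1 = \frac{\sum xy}{\sum x^2}$.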
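Results 1 and 2 can be checked numerically. The sketch below computes the slope once from the raw sums and once from the deviations $x = X - \bar{X}$, $y = Y - \bar{Y}$, then confirms that the resulting line passes through $(\bar{X}, \bar{Y})$. The data set and all variable names are hypothetical and serve only as an illustration.

```python
# Hypothetical observations, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope from the raw sums (solution of the normal equations).
sum_x, sum_y = sum(xs), sum(ys)
sum_xx = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))
slope_raw = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)

# Slope from the deviations x = X - X_bar, y = Y - Y_bar (Result 2).
dx = [x - x_bar for x in xs]
dy = [y - y_bar for y in ys]
slope_dev = sum(u * v for u, v in zip(dx, dy)) / sum(u * u for u in dx)

# Intercept from the first normal equation divided by N (Result 1).
a0 = y_bar - slope_dev * x_bar

# The fitted line evaluated at X_bar should return Y_bar.
value_at_mean = a0 + slope_dev * x_bar
```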
-0.12 at the 0.01 level of significance.
Hint: N = 6, $\sum X = 42.0$, $\sum Y = 7.8$, $\sum X^2 = 364$,
$\sum XY = 48.6$, $\sum Y^2 = 10.68$,
$S_{XX} = 70.0$, $S_{YY} = 0.54$, $S_{XY} = -6$, $s = 0.08017$.
Ans. a. Y(X) = 1.9 - 0.086X, Y(5) = 1.47
confidence interval for $\alpha$: (1.6933, 2.1067)
t = 3.58, reject the null hypothesis
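The figures in this hint can be cross-checked from the sums alone. The sketch below rests on two assumptions: the sums are read as N = 6, ΣX = 42.0, ΣY = 7.8, ΣX² = 364, ΣXY = 48.6 and ΣY² = 10.68, and the hypothesis being tested is that the slope equals -0.12. The variable names are chosen for this example only.

```python
import math

# Sums as read from the hint (an assumption where the print is unclear).
n = 6
sum_x, sum_y = 42.0, 7.8
sum_xx, sum_xy, sum_yy = 364.0, 48.6, 10.68

# Corrected sums of squares and products.
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares line Y = a + b X and the prediction at X = 5.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n
y_at_5 = a + b * 5

# Standard error of estimate with N - 2 degrees of freedom.
s = math.sqrt((s_yy - b * s_xy) / (n - 2))

# t statistic for the assumed null hypothesis: slope = -0.12.
t = (b - (-0.12)) / (s / math.sqrt(s_xx))
```

Under these assumptions the computed line, prediction, standard error and t statistic agree with the printed answers to the quoted number of figures.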
b. Determine a 95% confidence interval for
$\alpha$.
c. Test the hypothesis $\beta = 1.0$ against $\beta < 1.0$.
d. Test the hypothesis that $\alpha = 0$ against $\alpha \neq 0$
at the 0.05 level of significance.
Ans. a. Y = 3.8296 + 0.9036X, Y(25) = 26.4196
08012