Curve Fitting: There Are Two General Approaches to Curve Fitting
Mathematical Background

• Arithmetic mean: $\bar{y} = \dfrac{\sum y_i}{n}, \quad i = 1, \ldots, n$

• Standard deviation: $S_y = \sqrt{\dfrac{S_t}{n-1}}$, where $S_t = \sum (y_i - \bar{y})^2$
Mathematical Background (cont’d)
• Variance (spread around the mean):

$S_y^2 = \dfrac{\sum (y_i - \bar{y})^2}{n-1} \quad\text{or, equivalently,}\quad S_y^2 = \dfrac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n-1}$
• Coefficient of variation: quantifies the spread of the data relative to its mean.

$\text{c.v.} = \dfrac{S_y}{\bar{y}} \times 100\%$
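As a minimal numerical sketch of these background statistics (not part of the original slides), the formulas translate directly into Python with NumPy; the data array below reuses the y-values from the straight-line example later in this section:

```python
import numpy as np

# y-values from the straight-line worked example later in this section
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])

ybar = y.mean()                   # arithmetic mean
St = np.sum((y - ybar) ** 2)      # total sum of squares about the mean
Sy = np.sqrt(St / (len(y) - 1))   # sample standard deviation
cv = Sy / ybar * 100              # coefficient of variation, in percent

print(ybar, St, Sy, cv)           # approx. 3.4286, 22.7143, 1.9457, 56.7
```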
Least Squares Regression
Linear Regression
Fitting a straight line to a set of paired
observations: (x1, y1), (x2, y2),…,(xn, yn).
$y = a_0 + a_1 x + e$

$a_1$ – slope
$a_0$ – intercept
$e$ – error, or residual, between the model and the observations
Linear Regression: Residual

[Figure: data points scattered about a candidate straight line; the residual $e$ is the vertical distance between each measurement and the line.]

Linear Regression: Question

Why not choose the line that minimizes the plain sum of the residuals? Because positive and negative residuals cancel: in the figure, $e_1 = -e_2$, so a poor line can still give a sum of zero.
Linear Regression: Criteria for a “Best” Fit

• Minimize the sum of the absolute values of the residuals:

$\min \sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$

• Minimize the maximum residual (minimax criterion):

$\min \max_i |e_i| = \min \max_i |y_i - a_0 - a_1 x_i|$

Neither criterion yields a unique, easily computed line; the least-squares criterion below does.
Linear Regression: Least Squares Fit

$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_{i,\text{measured}} - y_{i,\text{model}}\right)^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$
The best-fit line minimizes the total squared error:

$\min S_r = \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$
Setting $\partial S_r / \partial a_0 = 0$ and $\partial S_r / \partial a_1 = 0$ gives the normal equations:

$\sum y_i = n\,a_0 + a_1 \sum x_i$

$\sum x_i y_i = a_0 \sum x_i + a_1 \sum x_i^2$

Two equations with two unknowns, which can be solved simultaneously.
Linear Regression: Determination of $a_0$ and $a_1$

$a_1 = \dfrac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$

$a_0 = \bar{y} - a_1 \bar{x}$
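As a quick sketch (the slides do not prescribe an implementation), these two formulas map directly onto NumPy:

```python
import numpy as np

def slope_intercept(x, y):
    """Evaluate the least-squares formulas for a1 (slope) and a0 (intercept)."""
    n = len(x)
    a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / \
         (n * np.sum(x ** 2) - np.sum(x) ** 2)
    a0 = y.mean() - a1 * x.mean()
    return a0, a1
```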
Error Quantification of Linear Regression

• Total sum of squares around the mean of the dependent variable: $S_t = \sum (y_i - \bar{y})^2$
• Sum of squares of the residuals around the regression line: $S_r = \sum e_i^2 = \sum (y_i - a_0 - a_1 x_i)^2$
• Coefficient of determination: $r^2 = \dfrac{S_t - S_r}{S_t}$

Least Squares Fit of a Straight Line: Example

Fit a straight line to the $(x_i, y_i)$ data tabulated on the next slide ($n = 7$, $\sum x_i = 28$, $\sum y_i = 24$, $\sum x_i y_i = 119.5$, $\sum x_i^2 = 140$):

$a_1 = \dfrac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \dfrac{7(119.5) - 28(24)}{7(140) - 28^2} = 0.8392857$

$a_0 = \bar{y} - a_1 \bar{x} = 3.428571 - 0.8392857(4) = 0.07142857$

$y = 0.07142857 + 0.8392857\,x$
Least Squares Fit of a Straight Line: Example (Error Analysis)

x_i     y_i     (y_i − ȳ)²     e_i² = (y_i − ŷ_i)²
1       0.5     8.5765         0.1687
2       2.5     0.8622         0.5625
3       2.0     2.0408         0.3473
4       4.0     0.3265         0.3265
5       3.5     0.0051         0.5896
6       6.0     6.6122         0.7972
7       5.5     4.2908         0.1993
Σ 28    24.0    22.7143        2.9911

$S_t = \sum (y_i - \bar{y})^2 = 22.7143, \qquad S_r = \sum e_i^2 = 2.9911$

$r^2 = \dfrac{S_t - S_r}{S_t} = \dfrac{22.7143 - 2.9911}{22.7143} = 0.868, \qquad r = \sqrt{0.868} = 0.932$
Least Squares Fit of a Straight Line: Example (Error Analysis)

• The standard deviation of the data (spread around the mean):

$s_y = \sqrt{\dfrac{S_t}{n-1}} = \sqrt{\dfrac{22.7143}{7-1}} = 1.9457$

• The standard error of the estimate (spread around the regression line):

$s_{y/x} = \sqrt{\dfrac{S_r}{n-2}} = \sqrt{\dfrac{2.9911}{7-2}} = 0.7735$

Because $s_{y/x} < s_y$, the linear regression model has merit: the line describes the data better than the mean alone.
Algorithm for linear regression
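The deck presents the algorithm as a flowchart; the sketch below is one illustrative Python version (not the deck's own code) that combines the fit with the error measures defined above:

```python
import numpy as np

def linear_regression(x, y):
    """Fit y = a0 + a1*x by least squares and report error statistics."""
    n = len(x)
    a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / \
         (n * np.sum(x ** 2) - np.sum(x) ** 2)
    a0 = y.mean() - a1 * x.mean()
    St = np.sum((y - y.mean()) ** 2)      # spread around the mean
    Sr = np.sum((y - a0 - a1 * x) ** 2)   # spread around the regression line
    syx = np.sqrt(Sr / (n - 2))           # standard error of the estimate
    r2 = (St - Sr) / St                   # coefficient of determination
    return a0, a1, syx, r2

# Data from the worked example above:
x = np.array([1.0, 2, 3, 4, 5, 6, 7])
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
print(linear_regression(x, y))  # approx. (0.0714, 0.8393, 0.7735, 0.868)
```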
Linearization of Nonlinear Relationships

• Linear regression presumes that the relationship between the dependent and independent variables is linear; this is not always the case.
• However, a few types of nonlinear functions can be transformed into linear regression problems:
 – The exponential equation
 – The power equation
 – The saturation-growth-rate equation
Linearization of Nonlinear Relationships

1. The exponential equation:

$y = a_1 e^{b_1 x} \quad\Rightarrow\quad \ln y = \ln a_1 + b_1 x$

i.e., $y^* = a_0 + a_1 x$ with $y^* = \ln y$, intercept $\ln a_1$, and slope $b_1$.
Linearization of Nonlinear Relationships

2. The power equation:

$y = a_2 x^{b_2} \quad\Rightarrow\quad \log y = \log a_2 + b_2 \log x$

i.e., $y^* = a_0 + a_1 x^*$ with $y^* = \log y$, $x^* = \log x$, $a_0 = \log a_2$, $a_1 = b_2$.

3. The saturation-growth-rate equation:

$y = a_3 \dfrac{x}{b_3 + x} \quad\Rightarrow\quad \dfrac{1}{y} = \dfrac{1}{a_3} + \dfrac{b_3}{a_3} \cdot \dfrac{1}{x}$

i.e., $y^* = a_0 + a_1 x^*$ with $y^* = 1/y$, $x^* = 1/x$, $a_0 = 1/a_3$, $a_1 = b_3/a_3$.
Example

Fit the power equation $y = a_2 x^{b_2}$ to data. Regressing $\log y$ on $\log x$ gives

$\log y = -0.334 + 1.75 \log x$

so $b_2 = 1.75$ and $a_2 = 10^{-0.334} = 0.46$, and the fitted curve is

$y = 0.46\,x^{1.75}$
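A sketch of this linearization recipe in code (the helper name fit_power is an assumption; the slides show only the hand calculation): fit a straight line to (log x, log y), then back-transform the intercept.

```python
import numpy as np

def fit_power(x, y):
    """Fit y = a2 * x**b2 by linear regression on the log10-transformed data."""
    b2, log_a2 = np.polyfit(np.log10(x), np.log10(y), 1)  # slope, intercept
    return 10 ** log_a2, b2                               # back-transform a2
```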
Polynomial Regression

Some data, although exhibiting a marked pattern, is poorly represented by a straight line; a parabola (second-order polynomial) is preferable.
Polynomial Regression (cont’d)

For a second-order polynomial $y = a_0 + a_1 x + a_2 x^2 + e$, the sum of squared residuals is:

$S_r = \sum e_i^2 = \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)^2$
Polynomial Regression (cont’d)

Setting the partial derivatives to zero:

$\dfrac{\partial S_r}{\partial a_0} = -2 \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right) = 0$

$\dfrac{\partial S_r}{\partial a_1} = -2 \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right) x_i = 0$

$\dfrac{\partial S_r}{\partial a_2} = -2 \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right) x_i^2 = 0$

yields the normal equations:

$\sum y_i = n\,a_0 + a_1 \sum x_i + a_2 \sum x_i^2$

$\sum x_i y_i = a_0 \sum x_i + a_1 \sum x_i^2 + a_2 \sum x_i^3$

$\sum x_i^2 y_i = a_0 \sum x_i^2 + a_1 \sum x_i^3 + a_2 \sum x_i^4$

Three linear equations with three unknowns $(a_0, a_1, a_2)$, which can be solved simultaneously.
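This 3x3 system maps directly onto a linear solve; one possible implementation (a sketch, not the deck's own code):

```python
import numpy as np

def quadratic_fit(x, y):
    """Assemble and solve the 3x3 normal equations for y = a0 + a1*x + a2*x**2."""
    A = np.array([
        [len(x),         x.sum(),        (x ** 2).sum()],
        [x.sum(),        (x ** 2).sum(), (x ** 3).sum()],
        [(x ** 2).sum(), (x ** 3).sum(), (x ** 4).sum()],
    ])
    b = np.array([y.sum(), (x * y).sum(), (x ** 2 * y).sum()])
    return np.linalg.solve(A, b)  # a0, a1, a2
```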
Polynomial Regression (cont’d)

$s_{y/x} = \sqrt{\dfrac{S_r}{n-3}} \qquad r^2 = \dfrac{S_t - S_r}{S_t}$
Polynomial Regression (cont’d)

General: the mth-order polynomial:

$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + e$

• A system of (m+1) linear equations in (m+1) unknowns must be solved to determine the coefficients of the mth-order polynomial.
• The standard error: $s_{y/x} = \sqrt{\dfrac{S_r}{n - (m+1)}}$
• The coefficient of determination: $r^2 = \dfrac{S_t - S_r}{S_t}$
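For the general mth-order case, forming the Vandermonde matrix and calling a least-squares solver is a common alternative to assembling the normal equations explicitly; a minimal sketch:

```python
import numpy as np

def poly_regression(x, y, m):
    """mth-order polynomial least-squares fit with standard error."""
    V = np.vander(x, m + 1, increasing=True)   # columns: 1, x, x^2, ..., x^m
    a, *_ = np.linalg.lstsq(V, y, rcond=None)  # coefficients a0..am
    Sr = np.sum((y - V @ a) ** 2)
    syx = np.sqrt(Sr / (len(x) - (m + 1)))     # standard error of the estimate
    return a, syx
```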
Polynomial Regression – Example

Fit a second-order polynomial to the data:

x_i    y_i      x_i²   x_i³   x_i⁴   x_i y_i   x_i² y_i
0      2.1      0      0      0      0         0
1      7.7      1      1      1      7.7       7.7
2      13.6     4      8      16     27.2      54.4
3      27.2     9      27     81     81.6      244.8
4      40.9     16     64     256    163.6     654.4
5      61.1     25     125    625    305.5     1527.5
Σ 15   152.6    55     225    979    585.6     2488.8

The normal equations become:

$6 a_0 + 15 a_1 + 55 a_2 = 152.6$
$15 a_0 + 55 a_1 + 225 a_2 = 585.6$
$55 a_0 + 225 a_1 + 979 a_2 = 2488.8$

Solving simultaneously gives $a_0 = 2.47857$, $a_1 = 2.35929$, $a_2 = 1.86071$, so

$y = 2.47857 + 2.35929\,x + 1.86071\,x^2$
Polynomial Regression – Example (cont’d)

x_i    y_i      y_model    e_i²       (y_i − ȳ)²
0      2.1      2.4786     0.14332    544.42889
1      7.7      6.6986     1.00286    314.45929
2      13.6     14.64      1.08158    140.01989
3      27.2     26.303     0.80491    3.12229
4      40.9     41.687     0.61951    239.22809
5      61.1     60.793     0.09439    1272.13489
Σ 15   152.6               3.74657    2513.39333

$S_r = \sum e_i^2 = 3.74657, \qquad S_t = \sum (y_i - \bar{y})^2 = 2513.39$

• The standard error of the estimate:

$s_{y/x} = \sqrt{\dfrac{3.74657}{6-3}} = 1.12$

• The coefficient of determination:

$r^2 = \dfrac{2513.39 - 3.74657}{2513.39} = 0.99851, \qquad r = \sqrt{r^2} = 0.99925$
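The example's numbers can be checked with NumPy's built-in polynomial fit (note that np.polyfit returns the highest power first):

```python
import numpy as np

x = np.array([0.0, 1, 2, 3, 4, 5])
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])

a2, a1, a0 = np.polyfit(x, y, 2)  # highest power first
print(a0, a1, a2)                 # approx. 2.4786, 2.3593, 1.8607
```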
Using the Regression Equation
• Before using the regression model, we
need to assess how well it fits the data.
• If we are satisfied with how well the
model fits the data, we can use it to
predict the values of y.
• To make a prediction we use
– Point prediction, and
– Interval prediction
Point Prediction

• Example
 – Predict the selling price of a three-year-old Taurus with 40,000 miles on the odometer.

A point prediction:

$\hat{y} = 17067 - 0.0623x = 17067 - 0.0623(40{,}000) = 14{,}575$
Interval Estimates: Example

• Example – continued
 – Provide an interval estimate for the bidding price on a Ford Taurus with 40,000 miles on the odometer.
 – Two types of predictions are required:
  • A prediction for a specific car
  • An estimate for the average price per car
Interval Estimates: Example

• Solution
 – A prediction interval provides the price estimate for a single car:

$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

 – Here $t_{\alpha/2} = t_{.025,98}$ (with $n - 2 = 98$ degrees of freedom, approximately 1.98).
Interval Estimates: Example

• Solution – continued
 – A confidence interval provides the estimate of the mean price per car for a Ford Taurus with 40,000 miles on the odometer.
 – The 95% confidence interval:

$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\dfrac{1}{n} + \dfrac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
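A sketch of both intervals in Python with SciPy (illustrative only; names such as x_g and interval_estimates are assumptions, not from the slides):

```python
import numpy as np
from scipy import stats

def interval_estimates(x, y, x_g, alpha=0.05):
    """Prediction interval (single y) and confidence interval (mean y) at x_g."""
    n = len(x)
    a1, a0 = np.polyfit(x, y, 1)
    y_hat = a0 + a1 * x_g
    s_eps = np.sqrt(np.sum((y - a0 - a1 * x) ** 2) / (n - 2))  # standard error
    t = stats.t.ppf(1 - alpha / 2, n - 2)                      # t_(alpha/2), n-2 d.f.
    lever = 1 / n + (x_g - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
    half_pred = t * s_eps * np.sqrt(1 + lever)  # single observation
    half_conf = t * s_eps * np.sqrt(lever)      # mean response
    return (y_hat - half_pred, y_hat + half_pred), \
           (y_hat - half_conf, y_hat + half_conf)
```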
The effect of the given $x_g$ on the length of the interval

– As $x_g$ moves away from $\bar{x}$, the interval becomes longer. That is, the shortest interval is found at $x_g = \bar{x}$.

$\hat{y} = b_0 + b_1 x_g$

$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\dfrac{1}{n} + \dfrac{(x_g - \bar{x})^2}{(n-1) s_x^2}}$

– At $x_g = \bar{x} \pm 1$ the half-width is $t_{\alpha/2}\, s_\varepsilon \sqrt{\dfrac{1}{n} + \dfrac{1^2}{(n-1) s_x^2}}$.

– At $x_g = \bar{x} \pm 2$ it grows to $t_{\alpha/2}\, s_\varepsilon \sqrt{\dfrac{1}{n} + \dfrac{2^2}{(n-1) s_x^2}}$.

[Figure: interval bands around the regression line, narrowest at $\bar{x}$ and widening symmetrically as $x_g$ moves away from $\bar{x}$.]
Regression Diagnostics – I

• The three conditions required for the validity of the regression analysis are:
 – The error variable is normally distributed.
 – The error variance is constant for all values of x.
 – The errors are independent of each other.
• How can we diagnose violations of these conditions?
Residual Analysis

• Examining the residuals (or standardized residuals) helps detect violations of the required conditions.
• Example – continued:
 – Non-normality:
  • Use Excel to obtain the standardized residual histogram.
  • Examine the histogram and look for a bell-shaped diagram with a mean close to zero.
Residual Analysis

A partial list of standardized residuals:

Observation   Predicted Price   Residual    Standard Residual
1             14736.91          -100.91     -0.33
2             14277.65          -155.65     -0.52
3             14210.66          -194.66     -0.65
4             15143.59          446.41      1.48
5             15091.05          476.95      1.58

For each residual we estimate its standard deviation as:

$s_{r_i} = s_\varepsilon \sqrt{1 - h_i}, \qquad h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{(n-1) s_x^2}$

Standardized residual $i$ = residual $i$ / its standard deviation.
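A minimal sketch of this calculation in Python (an illustration, not the deck's code; the function name is an assumption):

```python
import numpy as np

def standardized_residuals(x, y):
    """Residuals scaled by their estimated standard deviation s*sqrt(1 - h_i)."""
    n = len(x)
    a1, a0 = np.polyfit(x, y, 1)
    e = y - (a0 + a1 * x)                      # raw residuals
    s_eps = np.sqrt(np.sum(e ** 2) / (n - 2))  # standard error of the estimate
    h = 1 / n + (x - x.mean()) ** 2 / ((n - 1) * x.var(ddof=1))  # leverage h_i
    return e / (s_eps * np.sqrt(1 - h))
```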
Residual Analysis

[Figure: histogram of the standardized residuals, binned from -2 to 2; the shape is roughly bell-shaped with a mean close to zero.]
Heteroscedasticity

• When the requirement of a constant variance is violated, we have a condition of heteroscedasticity.
• Diagnose heteroscedasticity by plotting the residuals against the predicted values ŷ.

[Figure: residuals plotted against ŷ; the spread of the residuals increases with ŷ, a sign of heteroscedasticity.]
Homoscedasticity

• When the requirement of a constant variance is not violated, we have a condition of homoscedasticity.
• Example – continued:

[Figure: residuals (roughly -1000 to 1000) plotted against predicted price (13,500 to 16,000); the spread stays roughly constant, indicating homoscedasticity.]
Non-Independence of Error Variables

– A time series is constituted if the data were collected over time.
– Examining the residuals over time, no pattern should be observed if the errors are independent.
– When a pattern is detected, the errors are said to be autocorrelated.
– Autocorrelation can be detected by graphing the residuals against time; a numerical check is sketched after the figure below.
Non-Independence of Error Variables

[Figure: two residual-versus-time plots. Left: note the runs of positive residuals, replaced by runs of negative residuals. Right: note the oscillating behavior of the residuals around zero. Both patterns indicate autocorrelated errors.]
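The slides recommend graphing; as a numerical complement (not from the slides), the lag-1 autocorrelation of the residuals flags both patterns: runs give a large positive value, oscillation a large negative one.

```python
import numpy as np

def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of residuals ordered in time.
    Values well away from zero (positive or negative) suggest the
    errors are not independent."""
    e = residuals - residuals.mean()
    return np.sum(e[:-1] * e[1:]) / np.sum(e ** 2)
```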
Outliers
• An outlier is an observation that is unusually small or
large.
• Several possibilities need to be investigated when an
outlier is observed:
– There was an error in recording the value.
– The point does not belong in the sample.
– The observation is valid.
• Identify outliers from the scatter diagram.
• It is customary to suspect an observation is an outlier
if its |standard residual| > 2
[Figure: left panel, an outlier lying far from the fitted line; right panel, an influential observation that pulls the fitted line toward itself. Some outliers may be very influential.]
Procedure for Regression Diagnostics

• Develop a model that has a theoretical basis.
• Gather data for the two variables in the model.
• Draw the scatter diagram to determine whether a linear model appears to be appropriate.
• Determine the regression equation.
• Check the required conditions for the errors.
• Check for the existence of outliers and influential observations.
• Assess the model fit.
• If the model fits the data, use the regression equation.