Module 5: Advanced Math
Lesson 1
INTERPOLATION
Interpolation is the process of finding a value between two given points on a line or curve. The prefix "inter" means "between": we estimate values that lie inside the existing data set. This tool is useful not only in statistics but also in science, business, and many other real-life applications where the quantity of interest falls between two existing data points.
Linear interpolation
Linear interpolation is the simplest form of interpolation, used to estimate
values between two known data points on a straight line. It assumes a linear
relationship between the data points.
Given two points (x₁, y₁) and (x₂, y₂) where x₁ < x < x₂, the interpolated value y at a
point x can be calculated using the equation of a straight line:
y = y₁ + (x − x₁) × (y₂ − y₁) / (x₂ − x₁)
Example:
At the supermarket, one kilogram of dressed chicken costs 180 pesos and 2 kilograms cost 360 pesos. As the butcher weighs your chosen cuts, the scale reads 1.65 kilograms. How much should you pay?
Given:
x₁ = 1    y₁ = 180
x₂ = 2    y₂ = 360
x = 1.65
Find: y
Using the linear interpolation formula,
y = y₁ + (x − x₁) × (y₂ − y₁) / (x₂ − x₁)
y = 180 + (1.65 − 1) × (360 − 180) / (2 − 1)
y = 180 + 0.65 × 180
y = 297 pesos
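As a quick check, the formula translates directly into a few lines of Python (a minimal sketch; the function name linear_interpolate is our own):

def linear_interpolate(x1, y1, x2, y2, x):
    """Estimate y at x on the straight line through (x1, y1) and (x2, y2)."""
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

# Supermarket example: 1 kg costs 180 pesos and 2 kg costs 360 pesos.
price = linear_interpolate(1, 180, 2, 360, 1.65)
print(price)  # 297.0 pesos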
Polynomial interpolation
Polynomial interpolation is a method used to estimate values between known
data points using a polynomial function. Unlike linear interpolation, which uses a
straight line to connect two points, polynomial interpolation employs a polynomial
curve to pass through multiple data points.
The general form of a polynomial interpolation function is:
P(x) = a₀ + a₁x + a₂x² + … + aₙxⁿ
There are several types of polynomial interpolation methods, each with its own
approach to fitting a polynomial curve to a set of data points. Here are three common
types of polynomial interpolation methods along with short descriptions and a
comparison of their characteristics:
1. Lagrange interpolation
Lagrange interpolation writes the interpolating polynomial as a weighted sum of basis polynomials, each of which equals 1 at one data point and 0 at all the others.
It is conceptually simple and needs no preliminary table of coefficients.
However, if a data point is added or changed, the entire polynomial must be recomputed.
2. Newton interpolation
yₚ = y₀ + pΔy₀ + [p(p − 1)/2!]Δ²y₀ + [p(p − 1)(p − 2)/3!]Δ³y₀ + …

where p = (x − x₀)/h and h is the spacing between the equally spaced x-values.
It starts with a difference table (divided differences in general, or the forward differences used above for equally spaced data), which is used to calculate the coefficients of the interpolating polynomial.
Newton interpolation is generally more stable numerically than Lagrange
interpolation.
It is efficient for adding new data points, as the divided difference table can be
updated without recalculating the entire polynomial.
However, constructing the divided difference table can require more
computational effort than Lagrange interpolation.
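As an illustration, here is a minimal Python sketch of the forward-difference formula above for equally spaced nodes (the function name and test data are our own):

import numpy as np

def newton_forward(x0, h, y, x):
    """Evaluate Newton's forward-difference polynomial at x,
    given the first node x0, the spacing h, and node values y."""
    # Leading entries Delta^j y_0 of the forward-difference table.
    diffs = np.array(y, dtype=float)
    deltas = [diffs[0]]
    for _ in range(1, len(y)):
        diffs = np.diff(diffs)
        deltas.append(diffs[0])
    p = (x - x0) / h
    total, term = 0.0, 1.0
    for j, d in enumerate(deltas):
        if j > 0:
            term *= (p - (j - 1)) / j  # builds p(p-1)...(p-j+1)/j!
        total += term * d
    return total

# Nodes 0, 1, 2, 3 of f(x) = x^2; the quadratic is reproduced exactly.
print(newton_forward(0.0, 1.0, [0, 1, 4, 9], 1.5))  # 2.25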
3. Hermite interpolation
P(x) = ∑ hᵢ(x)yᵢ + ∑ gᵢ(x)f′(xᵢ), where both sums run over i = 0, 1, …, n − 1
Hermite interpolation is used when both function values and derivative values
are known at the data points.
It constructs a polynomial that not only passes through the data points but also
matches the derivative values at those points.
Hermite interpolation is useful for interpolating curves with known slopes or
when the underlying function is better represented by its derivatives.
However, it can be more complex to implement compared to Lagrange and
Newton interpolation methods.
Hermite interpolation may also introduce oscillations in the interpolating
polynomial, especially if the derivative values are not accurately known.
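Constructing the Hermite polynomial by hand is involved, so as a sketch, SciPy's CubicHermiteSpline can be used instead: it matches both values and derivatives at the nodes, although piecewise rather than with the single global polynomial P(x) above. The test function f(x) = sin(x) is our own choice:

import numpy as np
from scipy.interpolate import CubicHermiteSpline

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)      # function values y_i
dydx = np.cos(x)   # derivative values f'(x_i)

spline = CubicHermiteSpline(x, y, dydx)
print(spline(1.5), np.sin(1.5))  # interpolant vs. true value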
Extrapolation
Extrapolation involves predicting values beyond the range of known data points, while
interpolation estimates values within the range of known data points.
Interpolation:
- Estimates values within the range of known data points.
- Used to approximate values between data points, to fill in missing information, or to create a smooth curve that passes through the given data.
- Assumes that the relationship between the data points is reasonably continuous and that the estimated values will be consistent with the trend observed in the existing data.

Extrapolation:
- Estimates values beyond the range of known data points.
- Used to predict or forecast values outside the range of the available data, based on the trend observed within the data range.
- Assumes that the trend observed within the data range will continue beyond the known data points, which may not always be valid, especially if the underlying relationship is not well understood or is subject to change.
Interpolation error
Interpolation error refers to the discrepancy between the true (actual) value and the
estimated value obtained through an interpolation method at a specific point within
the range of known data points.
The formula for interpolation error depends on the specific interpolation method
being used and the context of the problem. However, a general formula to estimate
the interpolation error is often based on the difference between the true function
value and the interpolated function value at a given point.
E(x) = |f(x) − P(x)|

Where:
E(x) = the interpolation error
f(x) = the true function
P(x) = the interpolating polynomial evaluated at point x
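For instance, the sketch below measures E(x) when a known function is interpolated linearly (the test function f(x) = sin(x) is our own choice):

import math

def linear_interpolate(x1, y1, x2, y2, x):
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

# Interpolate f(x) = sin(x) linearly between x = 0 and x = 1.
x1, x2, x = 0.0, 1.0, 0.5
p = linear_interpolate(x1, math.sin(x1), x2, math.sin(x2), x)
error = abs(math.sin(x) - p)  # E(x) = |f(x) - P(x)|
print(p, error)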
Lesson 2
LEAST SQUARES APPROXIMATION
In many scientific investigations, data are collected that relate two variables. For example, if x is the number of dollars spent on advertising by a manufacturer and y is the value of sales in the region in question, the manufacturer could generate data by spending x₁, x₂, …, xₙ dollars at different times and measuring the corresponding sales values y₁, y₂, …, yₙ.
Suppose it is known that a linear relationship exists between the variables x and y; in other words, that y = a + bx for some constants a and b. If the data are plotted, the points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) may appear to lie on a straight line, and estimating a and b requires finding the "best-fitting" line through these data points. For example, if five data points occur as shown in the diagram, line 1 is clearly a better fit than line 2. In general, the problem is to find the values of the constants a and b such that the line y = a + bx best approximates the data in question. Note that an exact fit would be obtained if a and b were such that yᵢ = a + bxᵢ were true for each data point (xᵢ, yᵢ). But this is too much to expect. Experimental errors in measurement are bound to occur, so the choice of a and b should be made in such a way that the errors between the observed values yᵢ and the corresponding fitted values a + bxᵢ are in some sense minimized. Least squares approximation is a way to do this.
The first thing we must do is explain exactly what we mean by the best fit of a line y = a + bx to an observed set of data points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ). For convenience, write the linear function r₀ + r₁x as

f(x) = r₀ + r₁x

The second diagram is a sketch of what the line y = f(x) might look like. For each i, the observed data point (xᵢ, yᵢ) and the fitted point (xᵢ, f(xᵢ)) need not be the same, and the distance dᵢ between them measures how far the line misses the observed point. For this reason, dᵢ is often called the error at xᵢ, and a natural measure of how close the line y = f(x) is to the observed data points is the sum d₁ + d₂ + … + dₙ of all these errors. However, it turns out to be better to use the sum of squares

S = d₁² + d₂² + … + dₙ²

as the measure of error, and the line y = f(x) is to be chosen so as to make this sum as small as possible. This line is said to be the least squares approximating line for the data points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ).
The square of the error dᵢ is given by dᵢ² = [yᵢ − f(xᵢ)]² for each i. So the quantity S to be minimized is

S = [y₁ − f(x₁)]² + [y₂ − f(x₂)]² + … + [yₙ − f(xₙ)]²

Note that all the numbers xᵢ and yᵢ are given here; what is required is that the function f be chosen in such a way as to minimize S. Because f(x) = r₀ + r₁x, this amounts to choosing r₀ and r₁ to minimize S. This problem can be solved using the best approximation theorem. The following notation is convenient.
x = (x₁, x₂, …, xₙ)ᵀ    y = (y₁, y₂, …, yₙ)ᵀ    and    f(x) = (f(x₁), f(x₂), …, f(xₙ))ᵀ = (r₀ + r₁x₁, r₀ + r₁x₂, …, r₀ + r₁xₙ)ᵀ

Here x, y, and f(x) are regarded as column vectors.
Then the problem takes the following form: choose r₀ and r₁ such that

S = [y₁ − f(x₁)]² + [y₂ − f(x₂)]² + … + [yₙ − f(xₙ)]² = ‖y − f(x)‖²

is as small as possible. Now define

M =
[ 1  x₁ ]
[ 1  x₂ ]
[ ⋮   ⋮ ]
[ 1  xₙ ]

and r = (r₀, r₁)ᵀ

Then Mr = f(x), so we are looking for a column r = (r₀, r₁)ᵀ such that ‖y − f(x)‖² is as small as possible. In other words, we are looking for a best approximation z to the system Mr = y. Hence the best approximation theorem applies directly, and we have:
Suppose that n data points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) are given, where at least two of x₁, x₂, …, xₙ are distinct. Put

y = (y₁, y₂, …, yₙ)ᵀ    M =
[ 1  x₁ ]
[ 1  x₂ ]
[ ⋮   ⋮ ]
[ 1  xₙ ]

Then the least squares approximating line for these data points has equation

y = z₀ + z₁x

where z = (z₀, z₁)ᵀ is found by Gaussian elimination from the normal equations

(MᵀM)z = Mᵀy

The condition that at least two of x₁, x₂, …, xₙ are distinct ensures that MᵀM is an invertible matrix, so z is unique:

z = (MᵀM)⁻¹Mᵀy
For example, suppose the line is to be fitted to n = 5 data points for which

∑xᵢ = 21,  ∑xᵢ² = 111,  ∑yᵢ = 15,  ∑xᵢyᵢ = 78

Then

MᵀM =
[ 5            x₁ + ⋯ + x₅  ]   [ 5    21 ]
[ x₁ + ⋯ + x₅  x₁² + ⋯ + x₅² ] = [ 21  111 ]

and

Mᵀy =
[ y₁ + ⋯ + y₅        ]   [ 15 ]
[ x₁y₁ + ⋯ + x₅y₅   ] = [ 78 ]

So the normal equations (MᵀM)z = Mᵀy for z = (z₀, z₁)ᵀ become

[ 5    21 ] [ z₀ ]   [ 15 ]
[ 21  111 ] [ z₁ ] = [ 78 ]
The solution using Gaussian elimination is

z = (z₀, z₁)ᵀ = (0.24, 0.66)

to two decimal places, so the least squares approximating line for these data points is y = 0.24 + 0.66x.
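A minimal NumPy sketch of this computation, using the normal-equation entries from the example above:

import numpy as np

MTM = np.array([[5.0, 21.0], [21.0, 111.0]])  # [[n, sum x], [sum x, sum x^2]]
MTy = np.array([15.0, 78.0])                  # [sum y, sum xy]
z0, z1 = np.linalg.solve(MTM, MTy)
print(round(z0, 2), round(z1, 2))  # 0.24 0.66, so y = 0.24 + 0.66x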
This method is very sensitive to outliers, which can skew the results of the least squares analysis.

In linear regression, the line of best fit is a straight line. When measuring the errors, vertical offsets are generally used in surface, polynomial, and hyperplane problems, while perpendicular offsets are utilized in common practice.
The least squares method finds the curve that best fits a set of observations with a minimum sum of squared residuals or errors. Let us assume that the given data points are (x₁, y₁), (x₂, y₂), (x₃, y₃), …, (xₙ, yₙ), in which all x's are independent variables, while all y's are dependent ones. This method is used to find a linear line of the form y = mx + b, where y and x are variables, m is the slope, and b is the y-intercept. The formulas to calculate the slope m and the value of b are:

m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)

b = (∑y − m∑x) / n

Here, n is the number of data points.
Following are the steps to calculate the least squares line using the above formulas.

Step 1: Draw a table with 4 columns where the first two columns are for x and y points.
Step 2: In the next two columns, find xy and x².
Step 3: Find ∑x, ∑y, ∑xy, and ∑x².
Step 4: Find the value of the slope m using the above formula.
Step 5: Calculate the value of b using the above formula.
Step 6: Substitute the values of m and b in the equation y = mx + b.
Example:
1) Let us say we have data as shown below.
x  1  2  3  4  5
y  2  5  3  8  7

First, construct the table of x, y, xy, and x²:

x    y    xy    x²
1    2    2     1
2    5    10    4
3    3    9     9
4    8    32    16
5    7    35    25

∑x = 15, ∑y = 25, ∑xy = 88, ∑x² = 55

m = [(5 × 88) − (15 × 25)] / [(5 × 55) − (15)²]
m = (440 − 375) / (275 − 225)
m = 65/50 = 13/10 = 1.3

b = (∑y − m∑x)/n = (25 − 1.3 × 15)/5 = (25 − 19.5)/5 = 1.1

So the required equation of least squares is y = mx + b = 1.3x + 1.1.
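The same result can be verified with a short Python computation of the slope and intercept formulas:

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
b = (sum_y - m * sum_x) / n                                   # intercept
print(m, b)  # 1.3 1.1, so y = 1.3x + 1.1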
2) Consider the set of points: (1, 1), (-2, -1), and (3, 2). Plot these points and the
least-squares regression line in the same graph.
x    y    xy    x²
1    1    1     1
−2   −1   2     4
3    2    6     9

∑x = 2, ∑y = 2, ∑xy = 9, ∑x² = 14

m = [(3 × 9) − (2 × 2)] / [(3 × 14) − (2)²] = (27 − 4)/(42 − 4) = 23/38

b = (∑y − m∑x)/n = (2 − (23/38) × 2)/3 = (30/38)/3 = 5/19

So, the required equation of least squares is y = mx + b = (23/38)x + 5/19. [Figure: the three points plotted together with this least squares line.]
3) Consider the set of points: (-1, 0), (0, 2), (1, 4), and (k, 5). The values of slope
and y-intercept in the equation of least squares are 1.7 and 1.9 respectively. Can
you determine the value of k?
x    y
−1   0
0    2
1    4
k    5

∑x = k, ∑y = 11

Now, to evaluate the value of the unknown k, substitute m = 1.7, b = 1.9, ∑x = k, and ∑y = 11 in the formula

b = (∑y − m∑x)/n
1.9 × 4 = 11 − 1.7k
1.7k = 11 − 7.6
k = 3.4/1.7
k = 2

Therefore, the value of k is 2.
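The same substitution can be checked in a couple of lines of Python:

# Solve b = (sum_y - m * sum_x) / n for sum_x = k, with m = 1.7 and b = 1.9.
m, b, n, sum_y = 1.7, 1.9, 4, 11
k = (sum_y - b * n) / m  # k = (11 - 7.6) / 1.7
print(k)  # 2.0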
Supplementary Questions:
1. The following data shows the sales (in million dollars) of a company.

Year  2015  2016  2017  2018  2019
y     12    19    29    37    45

Can you estimate the sales in the year 2020 using the regression line? ANS: 53.6
2. Use the least squares method to determine the equation of the line of best fit for the data below. Then plot the line.

xᵢ  8  3  2  10  11  3  6  5  6  8
yᵢ  4  12  1  12  9  4  9  6  1  14

ANS: y = 3.0026 + 0.677x
Lesson 3
CURVE FITTING

All engineering experiments result in the collection of data with discrete values. This section deals with techniques to fit curves to such data in order to obtain intermediate estimates.
The simplest method for fitting a curve to data is to plot the points and then sketch a line that visually conforms to the data. But this approach is subjective: different people sketching a line through the same points will produce different results.
Fitted curves can be used for data visualization, for inferring function values where no data are available, and for summarizing the relationships between two or more variables. Extrapolation is the use of a fitted curve outside the range of the observed data, and it is subject to some uncertainty since it may reflect the method used to generate the curve as much as the actual data.

In linear algebraic data analysis, "fitting" often refers to attempting to identify the curve that minimizes the vertical (y-axis) displacement of a point from the curve (e.g., ordinary least squares). However, for graphical and image applications, geometric fitting aims to offer the best visual fit, which often entails minimizing the orthogonal distance to the curve (e.g., total least squares) or including both axes of displacement of a point from the curve.
Fitting of straight line

1. Find a least squares straight line for the following data:

X  1  2  3  4  5  6
Y  6  4  3  5  4  2

and estimate (predict) Y at X = 4 and X at Y = 4.

Solution:

X    Y    X²    Y²    XY
1    6    1     36    6
2    4    4     16    8
3    3    9     9     9
4    5    16    25    20
5    4    25    16    20
6    2    36    4     12
∑    21   24    91    106   75

So
∑X = 21
∑Y = 24
∑X² = 91
∑Y² = 106
∑XY = 75
N = 6
The least squares straight line (L.S.S.L.) of Y on X is Y = a + bX, where

b = (N∑XY − ∑X∑Y) / (N∑X² − (∑X)²) = (450 − 504) / (546 − 441) = −54/105 ≈ −0.514
a = (∑Y − b∑X) / N = (24 + 0.514 × 21) / 6 ≈ 5.8

so Y = 5.8 − 0.514X and, at X = 4, Y ≈ 5.8 − 0.514(4) = 3.74.
Similarly, the L.S.S.L. of X on Y is assumed to be

X = b₀ + b₁Y

where b₁ = (N∑XY − ∑X∑Y) / (N∑Y² − (∑Y)²) = −54/60 = −0.9 and b₀ = (∑X − b₁∑Y) / N = (21 + 0.9 × 24) / 6 = 7.1, so X = 7.1 − 0.9Y and, at Y = 4, X = 7.1 − 0.9(4) = 3.5.
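Both fitted lines can be verified with NumPy's polyfit, which returns the least squares slope and intercept (a sketch):

import numpy as np

X = np.array([1, 2, 3, 4, 5, 6], dtype=float)
Y = np.array([6, 4, 3, 5, 4, 2], dtype=float)

b_yx, a_yx = np.polyfit(X, Y, 1)  # Y on X: minimizes vertical errors in Y
b_xy, a_xy = np.polyfit(Y, X, 1)  # X on Y: roles of the variables swapped

print(a_yx, b_yx)  # about 5.8 and -0.514, i.e. Y = 5.8 - 0.514X
print(a_xy, b_xy)  # about 7.1 and -0.9,   i.e. X = 7.1 - 0.9Y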
2. Fit a straight line to the following yearly data (using the coded variables X = x − 1971 and Y = y − 11) and estimate y when x = 1987.

Calculations of ∑X, ∑Y, etc.:

x      y    X = x − 1971   Y = y − 11   X²    XY
1951   10   −20            −1           400   20
…

Now, Y = a + bX. Therefore,

∑Y = Na + b∑X
∑XY = a∑X + b∑X²

Putting the above values (N = 5, ∑X = 0, ∑Y = 0):

0 = 5a + 0, therefore a = 0
b = ∑XY / ∑X² = 0.08

Hence, the equation is Y = 0.08X; that is, y − 11 = 0.08(x − 1971), or y = −146.68 + 0.08x. Putting x = 1987:

y = −146.68 + 0.08(1987) = 12.28
3. Fit a straight line to each of the following data sets:

(ii)
x  5   10  15  20  25
y  15  19  23  26  30
[Answer: y = 0.74x + 11.5]

(iii)
x  1   2   3   4   5
y  14  27  40  55  68
[Answer: y = 13.6x]

(iv)
x  1     2    3    4    5    6
y  1200  900  600  200  110  50
[Answer: y = 1361.97 − 243.42x]
(v) Fit a straight line to the following data and estimate the production in the year 1957.
4. Fit a first-degree curve to the following data and estimate the value of y when x = 73.
x 10 20 30 40 50 60 70 80
y 1 3 5 10 6 4 2 1
[Answer: y = 4 − 0.071u, where u = (x − 45)/5; y = 3.595 when x = 73]
5. Fit a straight line to the following data and estimate y when x = 12.

x  1     2     3     4     5     6     7     8     9     10
y  52.5  58.7  65.0  70.2  75.4  81.1  87.2  95.5  102.2 108.4

[Answer: y = 79.62 + 3.08X, where X = 2(x − 5.5); when x = 12, y = 119.66]
Polynomial regression refers to the case where we want to fit a polynomial of a specific order to our data:

linear when y = ax + b
quadratic when y = ax² + bx + c
cubic when y = ax³ + bx² + cx + d
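With NumPy, each of these fits is one call to polyfit (a sketch; the sample data are hypothetical):

import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.0, 5.2, 9.1, 15.3, 23.0])  # hypothetical data

linear = np.polyfit(x, y, 1)     # coefficients a, b of y = ax + b
quadratic = np.polyfit(x, y, 2)  # coefficients a, b, c of y = ax^2 + bx + c
cubic = np.polyfit(x, y, 3)      # coefficients of y = ax^3 + bx^2 + cx + d

print(np.polyval(quadratic, 2.5))  # evaluate the quadratic fit at x = 2.5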
EXACT FIT
The samples are not noisy, and we wish to fit a curve that passes through every point. This is useful when developing finite-difference approximations or when determining a function's minima, maxima, and zero crossings. [Figure: an example of an exact fit obtained with polynomial interpolation.]
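As a sketch of an exact fit, SciPy's lagrange routine returns the unique polynomial of degree n − 1 through all n points (the sample points are hypothetical):

import numpy as np
from scipy.interpolate import lagrange

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.0, 5.0])  # hypothetical noise-free samples

poly = lagrange(x, y)  # degree-3 polynomial through all four points
print(poly(x))         # reproduces y exactly (up to rounding)
print(poly(1.5))       # interpolated value between the samples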
Curve fitting can have certain drawbacks for optimization, such as overfitting, which occurs when your model or system becomes overly complicated or too specific to the data, losing its capacity to generalize. This might result in poor performance or excessive risk in real trading, since your approach may not capture changing market conditions. Curve fitting may also lead to data snooping, which occurs when you use the same data to build and test your model or system, potentially inflating performance metrics or statistics.
Submitted By:
Alunday, Jerome
Baoas, Shien Grace
Caluza, Gerone
Datay, Hashley L.
Ducusin, John Rafael
Guerrero, Irish Mae
Iglesia, Haslyn
Marquez, Ryan Carylle
Quinoane, Cleyand
Tumlos, Rojen
Submitted To:
Engr. Bonifacio Cabradilla, Jr.