Data Smoothing Functions in Mathcad
Data smoothing takes a set of data and returns a new set that contains less
noise than the original set but retains the basic shape and important properties
of the original data. Mathcad provides three data smoothing functions:
medsmooth, ksmooth and supsmooth.
Here's an example provided by Stuart Bruff of what the smoothers do. The
matrix contains five sets of stress-strain data from one-dimensional
compression tests on different specimens.
M :=
i := 0 .. cols(M) - 1
Each column of the matrix is plotted below. The data are very noisy, but the
trend is apparent.
[Plot: columns M<0> through M<4>; horizontal axis 0 to 0.1, logarithmic vertical axis 100 to 1·10^6]
Now see what each of Mathcad's smoothing functions does to smooth out the
curves in the plots: medsmooth, supsmooth, and ksmooth.
Ma<i> := medsmooth(M<i>, 55)
[Plot: medsmooth results Ma<0> through Ma<4>; horizontal axis 0 to 0.1, logarithmic vertical axis 100 to 1·10^6]
Ma<i> := supsmooth(…, M<i>)
[Plot: supsmooth results Ma<0> through Ma<4>; horizontal axis 0 to 0.1, logarithmic vertical axis 100 to 1·10^6]
Ma<i> := ksmooth(…, M<i>, 23 0)
[Plot: ksmooth results Ma<0> through Ma<4>; horizontal axis 0 to 0.1, logarithmic vertical axis 100 to 1·10^6]
To see how each of these smoothers works, use the following set of data.
ORIGIN := 1
i := 1 .. 300        t_i := i / 150
f(t) := 5 e sin(t)        [exponents not recoverable]
Q_i := f(t_i)
j := 1 .. 150
The rnd function adds random noise to every other point in the data set. You
get a different set of data each time this worksheet is calculated.
Q_(2j) := Q_(2j) + rnd(1) - 1/2
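As a rough illustration of this setup outside Mathcad, the sketch below builds a comparable test set in Python. The signal `f` is a hypothetical stand-in, not the worksheet's own function, and the zero-centred uniform noise is an assumption about how the rnd term is used. The helper name `make_test_data` is likewise invented for this example.

```python
import math
import random


def make_test_data(n=300, seed=0):
    """Return (t, clean, noisy) lists with t_i = i/150 for i = 1..n."""
    rng = random.Random(seed)

    # Hypothetical stand-in signal; the worksheet's f(t) is not reproduced here.
    def f(t):
        return 5.0 * math.exp(-t) * math.sin(2.0 * math.pi * t)

    t = [i / 150 for i in range(1, n + 1)]
    clean = [f(ti) for ti in t]
    noisy = list(clean)
    # Worksheet elements 2, 4, 6, ... (ORIGIN = 1) are list positions 1, 3, 5, ...
    for k in range(1, n, 2):
        noisy[k] += rng.random() - 0.5  # assumed zero-centred uniform noise
    return t, clean, noisy


t, clean, Q = make_test_data()
```

Fixing the seed makes the run repeatable, whereas the worksheet deliberately produces new noise on every recalculation.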
[Plot: Q versus t, shown as points and as a line trace]
Medsmooth returns a modified vector of the same size as the original data.
For almost all points in a data set, Medsmooth takes a window of data
around a given data point and replaces that point with the median of the
values in the window. The window size, n, must be an odd number so that
there are (n-1)/2 points on either side of the point being replaced.
Qmed_5 := medsmooth ( Q , 5)
[Table: elements of Q and Qmed_5 side by side]
Notice that elements 1 and 2 of the smoothed data remain the same as the
original data. This is because there are too few elements on either side of
these points for Mathcad to evaluate the entire window. For a window size n,
the first (n-1)/2 data points are unchanged.
Element 3 has changed. Where does this new value come from?
Since the window size is 5, Mathcad replaces element 3 by the median of
elements 1, 2, 3, 4, and 5.
Element 4 is replaced by the median of elements 2, 3, 4, 5, and 6, and so on.
This process continues until the window size is too large to accommodate all
the necessary points at the end of the data. As with the beginning of the data,
the last (n-1)/2 points are also unchanged.
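The behavior described above can be sketched in a few lines of Python. This is a minimal illustration of a running median with untouched end points, not Mathcad's implementation; the function name `median_smooth` and the sample data are made up for the example.

```python
from statistics import median


def median_smooth(values, window):
    """Running median; the first and last (window - 1) // 2 points are copied unchanged."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = (window - 1) // 2
    out = list(values)
    # Only points with a full window on both sides are replaced.
    for k in range(half, len(values) - half):
        out[k] = median(values[k - half : k + half + 1])
    return out


data = [1.0, 9.0, 2.0, 8.0, 3.0, 7.0, 4.0]
smoothed = median_smooth(data, 5)
# smoothed is [1.0, 9.0, 3.0, 7.0, 4.0, 7.0, 4.0]
```

With a window of 5, the first two and last two values pass through unchanged, and each interior value is the median of the five original values centered on it.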
The new data set is much smoother than the original. However, there still are
several bumpy areas.
[Plot: data and medsmooth result, window = 5; horizontal axis 0 to 2]
A larger window may help, but then a larger number of values at the endpoints
are not smoothed.
[Plot: data and medsmooth result, window = 21; horizontal axis 0 to 2]
Here's a Mathcad program that uses a smaller window size when necessary.
[Plot: data, medsmooth result, and program result; horizontal axis 0 to 2]
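The worksheet's program is not reproduced here. As one possible way to shrink the window near the ends, the Python sketch below uses the largest symmetric window that fits around each point. The shrinking rule and the name `median_smooth_shrinking` are assumptions made for illustration.

```python
from statistics import median


def median_smooth_shrinking(values, window):
    """Running median that narrows the window symmetrically near both ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    max_half = (window - 1) // 2
    n = len(values)
    out = []
    for k in range(n):
        # Use as many neighbours as are available on BOTH sides, up to max_half.
        half = min(max_half, k, n - 1 - k)
        out.append(median(values[k - half : k + half + 1]))
    return out


data = [1.0, 9.0, 2.0, 8.0, 3.0, 7.0, 4.0]
smoothed = median_smooth_shrinking(data, 5)
# smoothed is [1.0, 2.0, 3.0, 7.0, 4.0, 4.0, 4.0]
```

Under this rule only the very first and last points are left unchanged, since their window shrinks to a single element.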
Like medsmooth, ksmooth replaces each point in the data set with a
modified version of itself based on the values of the surrounding points.
The ksmooth function takes 3 arguments:
ksmooth ( x , y , bandwidth )
x - vector of real numbers, must be in ascending order
y - vector of real numbers
bandwidth - size of the smoothing window. Typically a few times the
spacing between the x data points
i := 1 .. 5
x := (0.5, 2, 3, 4, 5)^T        y := (1, 2, 3, 2, 1)^T

[Plot: horizontal axis -1 to 6]

w := (1, 2, 3, 2, 1)^T
The point at x = 3 is weighted three times more heavily than the points at
x = 1 and x = 5.
Since the weights are actually percentages of the total contribution,
divide by the sum of the weights.
w := w / Σw        w = (0.111, 0.222, 0.333, 0.222, 0.111)^T
The weighted average of element 3 is:
w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 = 2.944
See how this differs from the median method used in medsmooth:
median(x_1, x_2, x_3, x_4, x_5) = 3
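The arithmetic can be checked with a short Python sketch. It takes the x values as 0.5, 2, 3, 4, 5, which is the reading of the worksheet's table that reproduces the quoted result of 2.944; the variable names are chosen for this example only.

```python
from statistics import median

x = [0.5, 2.0, 3.0, 4.0, 5.0]
raw_weights = [1, 2, 3, 2, 1]

# Normalise so the weights are fractions of the total contribution.
total = sum(raw_weights)
w = [r / total for r in raw_weights]

weighted_average = sum(wi * xi for wi, xi in zip(w, x))  # 26.5 / 9
middle_median = median(x)

print(round(weighted_average, 3))  # 2.944
print(middle_median)               # 3.0
```

The weighted average is pulled slightly below 3 by the low first value, while the median ignores how far that value sits from the rest.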
Kernels - The distance to each point affects how much of an impact an
element has on the final result. In the example above, elements 1 and 5
have the same weight because they are each 2 elements away from element
3. However, because element 1 (x = 0.5) is farther from the point being
modified (element 3, x = 3) than element 5 (x = 5) is, it should have a
smaller effect on the result.
Manually changing the vector of weights is not efficient because when you
modify other elements, they may not fall into this same pattern.
$K(d) \ge 0$ for all d
$\int K(x)\,dx = 1$        All weights "add" to 1

[Plot: K(d) versus d, for d from -2 to 2]
Bandwidth controls the number of points considered. Unlike the window size
used in medsmooth, this is not a number of points. Rather, it is the actual size
of the window on the x-axis. This value should typically be a few times the spacing
between values on the x-axis so that several points are considered.
The kernel function used in ksmooth can be manually defined as follows:
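$$S(j, bw) := \frac{\displaystyle\sum_{i=1}^{\mathrm{last}(x)} K\!\left(\frac{x_i - x_j}{bw}\right) y_i}{\displaystyle\sum_{i=1}^{\mathrm{last}(x)} K\!\left(\frac{x_i - x_j}{bw}\right)}$$

bw - bandwidth
x_j - the point being replaced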
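The weighted sum S(j, bw) translates almost directly into Python. The sketch below assumes a standard normal density for the kernel K; the scaling Mathcad's ksmooth applies internally may differ, so the numbers need not match ksmooth for the same bandwidth. The names `gaussian_kernel` and `kernel_smooth` are illustrative.

```python
import math


def gaussian_kernel(d):
    """Standard normal density (an assumed choice of K): non-negative, integrates to 1."""
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)


def kernel_smooth(x, y, bw, kernel=gaussian_kernel):
    """Replace each y_j by the kernel-weighted average S(j, bw) of all y_i."""
    smoothed = []
    for xj in x:
        weights = [kernel((xi - xj) / bw) for xi in x]
        numerator = sum(w * yi for w, yi in zip(weights, y))
        smoothed.append(numerator / sum(weights))
    return smoothed


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0, 2.0, 1.0]
wide = kernel_smooth(x, y, bw=1.0)     # neighbours pull the peak down
narrow = kernel_smooth(x, y, bw=0.01)  # almost no smoothing
```

Dividing by the sum of the weights plays the same role as normalizing the weight vector by hand: a small bandwidth leaves the data nearly unchanged, and a large one flattens it toward the overall mean.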
[Plot: data and ksmooth results for bandwidth = 0.1, 0.2, and 0.5; horizontal axis 0 to 2]
The supsmooth function takes two arguments:
supsmooth ( x , y)
x - vector of real numbers, must be in strictly increasing order (no
repeating x values)
y - vector of real numbers
The supsmooth algorithm uses a local smoother that does a localized linear fit.
Consider the first five points in Q and t. The line function can be used to find
a best-fit line through these points. Essentially, just as medsmooth replaced
the point with the median value of the 5 points, supsmooth replaces the point
with the value of the best-fit line evaluated at the same x value.
(b, m)^T := line((t_1, t_2, t_3, t_4, t_5)^T, (Q_1, Q_2, Q_3, Q_4, Q_5)^T)
f(x) := m x + b
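The local linear step can be sketched in Python with an ordinary least-squares line through five points. This illustrates only the single fixed-window fit described here, not the adaptive supsmooth algorithm. The sample values and the names `fit_line` and `local_linear_value` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line through the points; returns (b, m), intercept first."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xv - mean_x) ** 2 for xv in xs)
    sxy = sum((xv - mean_x) * (yv - mean_y) for xv, yv in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return b, m


def local_linear_value(xs, ys, x0):
    """Value of the best-fit line at x0, used in place of the original point."""
    b, m = fit_line(xs, ys)
    return m * x0 + b


# Hypothetical five-point window (not the worksheet's data).
ts = [1.0, 2.0, 3.0, 4.0, 5.0]
qs = [1.0, 3.0, 2.0, 4.0, 5.0]
b, m = fit_line(ts, qs)                       # b is about 0.3, m is about 0.9
replacement = local_linear_value(ts, qs, 3.0)  # about 3.0, replacing 2.0
```

Here the middle value 2.0 sits below the trend of its neighbours, and the fitted line moves it back up to about 3.0.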
Unlike the other smoothers, supsmooth does not take an argument such as
window size or bandwidth. The supsmooth function adaptively adjusts the
size of the window based on the behavior of the data.
Qsup := supsmooth ( t , Q)
[Plot: supsmooth result; horizontal axis 0 to 2]