ME5001 PHMMT 2021 Introduction Combined
Dr. N. Arunachalam
Associate Professor
Manufacturing Engineering Section
Department of Mechanical Engineering
IIT Madras, Chennai-36
ME5326 – PHMMT
Objective
Benefit
Source :
Proactive or predictive maintenance through PHM methodology is key for the successful
operation of any machine tool.
Course Content
Course Structure
• Introduction
• Sensors and signals
• Data types & Preprocessing
• Feature Extraction
• Feature Selection
• Model building – Diagnostics and Prognostics
• Decision making
Evaluation Pattern
Five quizzes of 1 hour duration - 30%
Project - 50 %
Tutorial - 20%
Text Books
1. Rolf Isermann, Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance, Springer, 2011
2. Hassan El-Hofy, Fundamentals of Machining Processes: Conventional and Nonconventional Processes, CRC Press, 2013
References
1. M. G. Pecht, Prognostics and Health Management of Electronics, Wiley-Interscience, New York, NY, 2008
2. W. J. Staszewski, C. Boller and G. R. Tomlinson, Health Monitoring of Aerospace Structures: Smart Sensor Technologies and Signal Processing, John Wiley & Sons, 2004
3. G. Vachtsevanos, F. L. Lewis, M. Roemer, A. Hess and B. Wu, Intelligent Fault Diagnosis and Prognosis for Engineering Systems, John Wiley & Sons, 2006
4. Geoffrey Boothroyd and Winston Anthony Knight, Fundamentals of Machining and Machine Tools, Taylor & Francis, 2006
5. Seifedine Kadry, Diagnostics and Prognostics of Engineering Systems, IGI Global, 2012
http://phmap.org/data-challenge/
Data Pre-processing
Where does Feature Extraction fit in PHM?
Advisory
Feature extraction: what and why
What:
Spindle drive
Servo Motor
Tool holder
Tool
12 current sensors
Load of operation
Raw Data
First-order descriptive statistics: Mean, Variance
Flakes
Gear – pitting
Signal Generation
tc = gauspuls('cutoff',50e3,0.6,[],-40);
t1 = -tc : 1e-6 : tc;
y1 = gauspuls(t1,50e3,0.6);
t2 = linspace(-5,5);
y2 = sinc(t2);
subplot(2,1,1)
plot(t1*1e3,y1)
xlabel('Time (ms)')
ylabel('Amplitude')
title('Gaussian Pulse')
subplot(2,1,2)
plot(t2,y2)
xlabel('Time (sec)')
ylabel('Amplitude')
title('Sinc Function')
Sampling rate
Ask the students to generate signals with different frequencies and sample rates.
Generate 1.5 seconds of a 50 Hz sawtooth (respectively square) wave with a sample rate of 10 kHz
Program to Generate Periodic Signals
fs = 10000;
t = 0:1/fs:1.5;
x1 = sawtooth(2*pi*50*t);
x2 = square(2*pi*50*t);
subplot(2,1,1)
plot(t,x1)
axis([0 0.2 -1.2 1.2])
xlabel('Time (sec)')
ylabel('Amplitude')
title('Sawtooth Periodic Wave')
subplot(2,1,2)
plot(t,x2)
axis([0 0.2 -1.2 1.2])
xlabel('Time (sec)')
ylabel('Amplitude')
title('Square Periodic Wave')
Aperiodic Waveforms
To generate 2 seconds of a triangular (respectively rectangular) pulse with a sample rate of 10 kHz and a
width of 20 ms
Program to Generate Aperiodic Signals
fs = 10000;
t = -1:1/fs:1;
x1 = tripuls(t,20e-3);
x2 = rectpuls(t,20e-3);
subplot(2,1,1)
plot(t,x1)
axis([-0.1 0.1 -0.2 1.2])
xlabel('Time (sec)')
ylabel('Amplitude')
title('Triangular Aperiodic Pulse')
subplot(2,1,2)
plot(t,x2)
axis([-0.1 0.1 -0.2 1.2])
xlabel('Time (sec)')
ylabel('Amplitude')
title('Rectangular Aperiodic Pulse')
Noise – unwanted information buried in the original signal collected using the sensor and associated data acquisition system
Hardware of DAQ
ADC
Quantization error
Resolution
What is Data Smoothing?
• Using filters to smooth out data by removing noise and allowing
important patterns to stand out.
• The smoothing is quantified using two parameters: (i) SNR and (ii) RMSE
where y is the variable, t is the current time period, and n is the number of time
periods in the average
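The moving-average smoothing described above can be sketched in Python (the lecture's examples use MATLAB); the noisy sine below is illustrative, not the lecture's data, and the SNR is computed in dB here, which may differ from the slide's convention:

```python
import numpy as np

def moving_average(y, n):
    """n-point moving average (centered, zero-padded at the edges)."""
    return np.convolve(y, np.ones(n) / n, mode="same")

# Illustrative noisy signal (the lecture's data is not reproduced here)
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

smoothed = moving_average(noisy, 11)

# Quantify the smoothing with the two slide metrics, RMSE and SNR
rmse = np.sqrt(np.mean((smoothed - clean) ** 2))
snr = 10 * np.log10(np.sum(clean ** 2) / np.sum((smoothed - clean) ** 2))
print(f"RMSE = {rmse:.4f}, SNR = {snr:.2f} dB")
```

A larger window n removes more noise but blurs fast signal variations, which is the trade-off the slide's n = 11 result illustrates.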
n=11 SNR=1.2135 RMSE = 0.0785
Anomalies – abnormal events
First-order statistics
Mean
Variance
Skewness – the skewness of a normal distribution is 0
Kurtosis – the peakedness of the distribution (3 for a normal distribution)
Energy
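These first-order statistics are straightforward to compute; a Python sketch on an illustrative normal sample (not lecture data), which also confirms the slide's point that a normal distribution has skewness 0 and kurtosis 3:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)  # illustrative signal, not lecture data

mean = np.mean(x)
variance = np.var(x)
skewness = stats.skew(x)                 # ~0 for a normal distribution
kurt = stats.kurtosis(x, fisher=False)   # "peakedness"; 3 for a normal distribution
energy = np.sum(x ** 2)

print(mean, variance, skewness, kurt, energy)
```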
Non-stationary signal: original signal + rand (added noise), with the noise level measured as sum(abs(N))
Surface
H= [ 1 0 -1]
H= [ 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1]
Weighted Moving Average
A weighted average is an average that has
multiplying factors to give
different weights to data at different positions in the
sample window.
k=11 SNR= 1.3001 RMSE= 0.0759
size(filtered signal)
Artifact: RC filter
2-RC filter
Edge effect
Sampling length
Cutoff = 0.8 mm
• The forecast F(t+1) is based on weighting the most recent observation y(t) with a weight α and weighting the most recent forecast F(t) with a weight of (1 − α)
Simple Exponential Smoothing Method
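The update F(t+1) = α·y(t) + (1 − α)·F(t) can be sketched in Python; initializing the forecast with the first observation is a common convention and an assumption here:

```python
import numpy as np

def simple_exponential_smoothing(y, alpha):
    """Forecasts F with F[t+1] = alpha*y[t] + (1-alpha)*F[t].
    F[0] is initialized to y[0] (a common convention, assumed here)."""
    f = np.empty(len(y))
    f[0] = y[0]
    for t in range(len(y) - 1):
        f[t + 1] = alpha * y[t] + (1 - alpha) * f[t]
    return f

y = np.array([3.0, 5.0, 9.0, 20.0, 12.0, 17.0])
f = simple_exponential_smoothing(y, alpha=0.4)
print(f)
```

A larger α tracks recent observations more closely; a smaller α smooths more heavily.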
The smoothing process is considered local because, like the moving average
method, each smoothed value is determined by neighboring data points defined
within the span. The process is weighted because a regression weight function is
defined for the data points contained within the span. In addition to the
regression weight function, you can use a robust weight function, which makes
the process resistant to outliers. Finally, the methods are differentiated by the
model used in the regression: lowess uses a linear polynomial, while loess uses a
quadratic polynomial.
y = a0 + a1x – linear polynomial (lowess)
y = a0 + a1x + a2x² – quadratic polynomial (loess)
The span can be even or odd.
You can specify the span as a percentage of the total number of data points in the
data set. For example, a span of 0.1 uses 10% of the data points.
Source : Matlab Central
Lowess and Loess
Compute the regression weights for each data point in the span. The weights are
given by the tricube function shown below.
x is the predictor value associated with the response value to be smoothed, xi are the nearest neighbors of x as defined by the span, and d(x) is the distance along the abscissa from x to the most distant predictor value within the span. The weights have these characteristics:
The data point to be smoothed has the largest weight and the most influence on the
fit.
Data points outside the span have zero weight and no influence on the fit.
A weighted linear least-squares regression is performed. For lowess, the
regression uses a first degree polynomial. For loess, the regression uses a second
degree polynomial.
The smoothed value is given by the weighted regression at the predictor value of
interest. Source : Matlab Central
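The steps above (tricube weights, then weighted linear least squares at the point of interest) can be sketched in Python; the span handling and toy data are illustrative assumptions, and loess would simply use a quadratic instead of a linear fit:

```python
import numpy as np

def tricube(u):
    """Tricube weight function: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = np.abs(u)
    w = (1 - u**3) ** 3
    w[u >= 1] = 0.0
    return w

def lowess_point(x, y, x0, span_frac=0.5):
    """Lowess-smoothed value at x0: tricube-weighted linear least-squares
    fit over the nearest span_frac fraction of the data points."""
    k = max(2, int(np.ceil(span_frac * len(x))))
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]              # nearest neighbors in the span
    w = tricube(d[idx] / d[idx].max())   # d(x): distance to farthest neighbor
    # polyfit minimizes sum((w_i*(y_i - p(x_i)))^2), so pass sqrt(w)
    a1, a0 = np.polyfit(x[idx], y[idx], 1, w=np.sqrt(w))
    return a0 + a1 * x0

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
y = 2 * x + 0.05 * rng.standard_normal(50)
val = lowess_point(x, y, 0.5)
print(val)  # close to 1.0 since y ≈ 2x
```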
Local Regression Weight Function
Plots (a) and (b) use an asymmetric weight function, while plots (c) and (d) use a
symmetric weight function. Source : Matlab Central
Residual Analysis
residual = data – fit
r = y – ŷ
Compute the robust weights for each data point in the span
where ri is the residual of the ith data point produced by the regression
smoothing procedure, and MAD is the median absolute deviation of the
residuals.
Source : Matlab Central
The median absolute deviation is a measure of how spread out the residuals are. If ri is small compared to 6·MAD, then the robust weight is close to 1. If ri is greater than 6·MAD, the robust weight is 0 and the associated data point is excluded from the smooth calculation.
Smooth the data again using the robust weights. The final smoothed value is calculated using both the local regression weight and the robust weight.
Repeat the previous two steps for a total of five iterations.
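The robust (bisquare) weighting described above can be sketched; the exact MAD centering convention is an assumption here (median of |r| is used):

```python
import numpy as np

def robust_weights(r):
    """Bisquare robust weights as described above: (1 - (r/(6*MAD))^2)^2
    for |r| < 6*MAD, zero otherwise. MAD is taken as median(|r|) here
    (the exact centering convention is an assumption)."""
    mad = np.median(np.abs(r))
    u = r / (6 * mad)
    w = (1 - u**2) ** 2
    w[np.abs(u) >= 1] = 0.0
    return w

r = np.array([0.1, -0.2, 0.05, 5.0, -0.1])  # one gross outlier
w = robust_weights(r)
print(w)  # the outlier's weight is 0, so it is excluded from the smooth
```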
Dimensionality reduction
Evaluation Strategies
• Filter Methods
– Evaluation is independent
of the classification
algorithm.
– The objective function
evaluates feature subsets
by their information
content, typically interclass
distance, statistical
dependence or
information-theoretic
measures.
Evaluation Strategies
• Wrapper Methods
– Evaluation uses criteria
related to the classification
algorithm.
– The objective function is a
pattern classifier, which
evaluates feature subsets
by their predictive
accuracy (recognition rate
on test data) by statistical
resampling or
cross-validation.
Filter vs Wrapper Approaches
Variance-Normalized Class Separation Distance
A successful classification requires that the features used have good class separability.
One estimation of this property is the variance-normalized class separation distance D
which for a feature x distributed amongst two classes j and k is given by
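The slide's equation for D is not reproduced in this text, so the sketch below assumes a common form of the variance-normalized separation, D = |μj − μk| / √(σj² + σk²); the two classes are hypothetical feature samples:

```python
import numpy as np

def class_separation(xj, xk):
    """Variance-normalized class separation distance for one feature.
    Assumed form (the slide's equation is not reproduced here):
    D = |mean_j - mean_k| / sqrt(var_j + var_k)."""
    return abs(np.mean(xj) - np.mean(xk)) / np.sqrt(np.var(xj) + np.var(xk))

rng = np.random.default_rng(3)
healthy = rng.normal(0.0, 1.0, 1000)  # hypothetical feature values, class j
faulty = rng.normal(4.0, 1.0, 1000)   # hypothetical feature values, class k
d = class_separation(healthy, faulty)
print(d)  # well-separated classes give D >> 1
```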
Correlation of feature set
Another useful assessment of the two-dimensional feature
space is the correlation of the two features x and y in a
particular class j.
• Disadvantage
– Feature correlation is not considered.
– Best pair of features may not even contain the best individual
feature.
Sequential forward selection (SFS)
(heuristic search)
• First, the best single feature is selected
(i.e., using some criterion function).
• Then, pairs of features are formed using
one of the remaining features and this best
feature, and the best pair is selected.
• Next, triplets of features are formed using
one of the remaining features and these two
best features, and the best triplet is
selected.
• This procedure continues until a predefined
number of features are selected.
SFS performs best when the optimal subset is small.
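The greedy procedure above can be sketched in Python; the criterion function here (R² of a least-squares fit to a target) is an illustrative assumption, not the lecture's criterion:

```python
import numpy as np

def sfs(X, score, n_select):
    """Sequential forward selection: start empty, then repeatedly add the
    single remaining feature whose addition maximizes score(column_indices)."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select:
        best = max(remaining, key=lambda j: score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy criterion function (an assumption, not from the lecture): R^2 of a
# least-squares fit of the chosen feature columns to a target y.
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 6))
y = 3 * X[:, 2] + 1.5 * X[:, 5] + 0.1 * rng.standard_normal(200)

def r2(cols):
    coef = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
    resid = y - X[:, cols] @ coef
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

sel = sfs(X, r2, 2)
print(sel)  # the two informative features, best single feature first
```

Because the search is greedy, SFS can miss the best pair when features interact, which is exactly the disadvantage noted above.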
Example
features added at
each iteration
Feature
Selection
(GA)
Wavelet Families
1. Haar Wavelets
2. Daubechies Wavelets
3. Biorthogonal Wavelets
4. Coiflets
5. Symlets
6. Morlet Wavelets
7. Mexican Hat Wavelets
8. Meyer Wavelets
Haar
Wavelets
Haar wavelets are the oldest and simplest wavelets. The function is discontinuous and exists only on the interval of 0 to 1. Between 0 and 0.5 the value is 1, and between 0.5 and 1 it is -1.
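The Haar mother wavelet as just defined is easy to write down directly; a Python sketch:

```python
import numpy as np

def haar(t):
    """Haar mother wavelet: 1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

print(haar([-0.1, 0.25, 0.75, 1.2]))  # [ 0.  1. -1.  0.]
```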
The Continuous Wavelet Transform
• wavelet
• decomposition
The Continuous Wavelet Transform
• Example : The mexican hat wavelet
The Continuous Wavelet Transform
• reconstruction
• admissible wavelet :
Averaging and
differencing
The Haar wavelet
Orthogonal and Bi-orthogonal Functions
Wavelets that use the same filters for decomposition (analysis) and reconstruction (synthesis) belong to the orthogonal wavelet family. For instance, if ϕn(t) spans some n-spaces, with some input signal dn, then the transmit sequence can be obtained as follows.
In a reverse form, dn can be obtained as the inner product of s(t) and ϕ(t),
with the interpretation that ϕn(t) spans the space R and is the basis set of R if the set of {dn} differs for any given s(t) ∈ R.
If <ϕn(t), ϕm(t)> = 0, then the basis sets are orthogonal, and wavelets constructed from this form of scaling function are orthogonal wavelets. Examples of orthogonal base wavelets in the literature are the Daubechies wavelets.
In the MATLAB environment, the Daubechies wavelets are designated as 'dbN', where N stands for the effective filter length. These wavelets are both orthogonal and orthonormal according to the following:
Daubechies Wavelets
The Daubechies wavelets, based on the work of Ingrid Daubechies, are a family of orthogonal wavelets defining
a discrete wavelet transform and characterized by a maximal number of vanishing moments for some given
support.
Scaling function
Signal length : 8
Inverse Transform
Forward Transform
Plot and interpret how the detail coefficients vary with the decomposition level.
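The averaging-and-differencing construction of the Haar forward and inverse transforms can be sketched for a length-8 signal; the signal values below are hypothetical, not the lecture's data:

```python
import numpy as np

def haar_forward(x):
    """Full Haar decomposition by repeated averaging and differencing.
    The length of x must be a power of two."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    while n > 1:
        half = n // 2
        avg = (x[0:n:2] + x[1:n:2]) / 2    # pairwise averages
        diff = (x[0:n:2] - x[1:n:2]) / 2   # pairwise differences (details)
        x[:half], x[half:n] = avg, diff
        n = half
    return x

def haar_inverse(c):
    """Invert haar_forward by undoing averaging and differencing."""
    c = np.asarray(c, dtype=float).copy()
    n = 1
    while n < len(c):
        avg, diff = c[:n].copy(), c[n:2 * n].copy()
        c[0:2 * n:2], c[1:2 * n:2] = avg + diff, avg - diff
        n *= 2
    return c

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])  # hypothetical signal
c = haar_forward(x)
print(c)                 # first entry is the overall average
print(haar_inverse(c))   # recovers x exactly
```

The first coefficient is the overall average of the signal; the remaining entries hold the detail coefficients level by level, which is what the plotting exercise above examines.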
Introduction
Data characteristics
Application & domain
Feature extraction methods
Feature dimensionality reduction
Issues in real applications
Summary
Fourier Transform
Mathematical Background: Complex Numbers
• Addition:
• Multiplication:
Mathematical Background: Complex Numbers
Magnitude:
Phase:
φ
Magnitude-Phase notation:
Mathematical Background: Complex Numbers
• Complex conjugate
• Properties
Mathematical Background: Complex Numbers
• Euler’s formula
• Properties
Mathematical Background:
Sine and Cosine Functions
• Periodic functions
• General form of sine and cosine functions:
Mathematical Background:
Sine and Cosine Functions
(Figure: IFT/DFT relationship and plots of sine and cosine with axis ticks at π/2, π, 3π/2.)
A shorter period means a higher frequency (i.e., the signal oscillates faster).
(Figure: component sinusoids f1, f2, f3 and their sum f = f1 + f2 + f3.)
Image Transforms
• Many times, image processing tasks are
best performed in a domain other than the
spatial domain.
• Key steps (figure): the image f(x,y) is transformed into coefficients α1, α2, α3, … and processed in the transform domain.
Continuous Fourier Transform (FT)
• Transforms a signal (i.e., function) from the
spatial (x) domain to the frequency (u)
domain.
where
Why is FT Useful?
• Easier to remove undesirable frequencies.
Set F(u) = 0 for undesirable frequencies.
(Figure: a noisy signal passed through a band-pass filter yields the output image; F(u), F(u + Δu).)
Frequency Filtering Steps
1. Take the FT of f(x):
• Magnitude of FT (spectrum):
• Phase of FT:
• Magnitude-Phase representation:
magnitude
• Inverse FT
Example: 2D rectangle function
• FT of 2D rectangle function
2D sinc()
Discrete Fourier Transform (DFT)
Discrete Fourier Transform (DFT) (cont’d)
• Forward DFT
• Inverse DFT
1/NΔx
Example
B (Hz) bandwidth, T seconds of signal: the DFT maps N samples to N samples.
n = 0, 1, 2, …, N−1
t = 0, Δt, 2Δt, …; 1/Δt = N/T is the sampling frequency
k = 0, 1, 2, …, N−1
For each k: N complex multiplications and N−1 complex additions, giving O(N²) computations in total.
The FFT reduces this to O(N log₂ N).
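The O(N²) direct computation can be sketched and checked against the O(N log N) FFT; NumPy's fft is used as the reference here:

```python
import numpy as np

def naive_dft(x):
    """Direct DFT: for each of the N outputs, N complex multiplications
    and N-1 additions, i.e. O(N^2) operations in total."""
    N = len(x)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)  # twiddle matrix W_N^{kn}
    return W @ x

x = np.random.default_rng(5).standard_normal(64)
same = np.allclose(naive_dft(x), np.fft.fft(x))
print(same)  # direct DFT matches the FFT result
```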
WN = e^(−j2π/N) (the twiddle factor), periodic in both k and n.
Decimation in time FFT algorithm
Even samples: n = 2r; odd samples: n = 2r + 1, r = 0, 1, 2, …, N/2 − 1.
N/2-point DFT of even samples, Xe[k]; N/2-point DFT of odd samples, Xo[k].
(Figure: 8-point decimation-in-time butterfly diagram combining the even-sample DFT Xe[k] and odd-sample DFT Xo[k] with twiddle factors W8^0–W8^3 to produce X[0]–X[7].)
Cost after one split: 2(N/2)² + N = N²/2 + N operations.
• Forward DFT
• Inverse DFT:
Extending DFT to 2D (cont’d)
• Special case: f(x,y) is N × N.
• Forward DFT
• Inverse DFT
x,y = 0,1,2, …, N-1
Extending DFT to 2D (cont’d)
2D cos/sin functions
Visualizing DFT
• Typically, we visualize |F(u,v)|.
• The dynamic range of |F(u,v)| is typically very large.
• Apply stretching: D(u,v) = c·log(1 + |F(u,v)|), where c is a constant.
|F(u,v)| → |D(u,v)|
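The log stretch D(u,v) = c·log(1 + |F(u,v)|) can be sketched on an illustrative random image (not a lecture image):

```python
import numpy as np

rng = np.random.default_rng(6)
f = rng.random((64, 64))               # illustrative "image"
F = np.fft.fftshift(np.fft.fft2(f))    # centered 2-D spectrum
mag = np.abs(F)

c = 1.0
D = c * np.log1p(mag)                  # D(u,v) = c*log(1 + |F(u,v)|)

# The log stretch compresses the huge dynamic range of |F(u,v)|
print(mag.max() / mag.mean(), D.max() / D.mean())
```

Without the stretch, the large DC term would dominate the display and the rest of the spectrum would appear black.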
Forward DFT: the kernel is separable:
DFT Properties: (1) Separability (cont’d)
• Rewrite F(u,v) as follows:
• Let’s set:
• Then:
DFT Properties: (1) Separability (cont’d)
• How can we compute F(x,v)?
(Figure: spectrum with no translation vs. after translation.)
DFT Properties: (5) Rotation
• Rotating f(x,y) by θ rotates F(u,v) by θ
DFT Properties: (8) Average value
Average:
So:
Magnitude and Phase of DFT
• What is more important, magnitude or phase?
• Hint: use the inverse DFT to reconstruct the input image using magnitude-only or phase-only information.
Magnitude and Phase of DFT (cont’d)
Reconstructed image using
magnitude only
(i.e., magnitude determines the
strength of each component!)
Arunachalam et al [2008]
P = VI cos φ
P = I2 R
Time domain
Frequency domain
Magnitude 2 = Power
clc
clear all
S = a*sin(w*t);
F = abs(fft(S));
plot(F)
Power spectral density (PSD): how is the power of the signal distributed over the frequency content representing the signal?
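One common PSD estimator is Welch's method (an assumption here; the lecture does not name a specific estimator). A Python sketch with an illustrative two-tone signal:

```python
import numpy as np
from scipy import signal

fs = 1000                                     # illustrative sample rate
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

f, Pxx = signal.welch(x, fs=fs, nperseg=512)  # averaged-periodogram PSD
peak = f[np.argmax(Pxx)]
print(peak)  # dominant power near 50 Hz, the stronger component
```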
Power spectral density plots
Arunachalam et al [2008]
(Figure: fft2 followed by fftshift moves the origin (0,0) of the 2-D spectrum to the center, with the u axis running from −u/2 to u/2 and the v axis from −v/2 to v/2.)
v
Image-based surface roughness assessment of shaped specimens
1.6 µm 12.5 µm 50 µm
G(u,v) = F(u,v) H(u,v) (pointwise multiplication by the filter transfer function)
Short Time Fourier Transform (STFT)
A, m/s²
T, sec
Time
(Figure: fft applied to successive signal segments; mean information is not retained.)
Frequency-domain features: Variance, Energy, Entropy, Amplitude, Skewness, Kurtosis, Frequency, Dominant frequency
(Table: per-segment amplitude–frequency pairs A1 f1, A2 f2, A3 f3, A4 f4 – the short-time Fourier transform.)
(inverse DFT)
F1(u)
F2(u)
F3(u)
Fourier Analysis – Examples (cont’d)
F4(u)
Limitations of Fourier Transform
Fourier Analysis – Examples (cont’d)
Provides excellent localization in the frequency domain but poor localization in the time domain.
F4(u)
Limitations of Fourier Transform (cont’d)
Three frequency
components,
present at all
times!
F4(u)
Stationary vs non-stationary signals (cont’d)
Non-stationary signal
(varying frequency):
Three frequency
components,
NOT present at all
times!
F5(u)
Stationary vs non-stationary signals (cont’d)
Non-stationary signal
(varying frequency):
F5(u)
Limitations of Fourier Transform (cont’d)
FT
Representing discontinuities or sharp corners (cont’d)
(Figures: the original signal vs. reconstructions using 1, 2, 7, 23, 39, 63, 95, and 127 Fourier coefficients – the reconstruction sharpens as coefficients are added.)
2D function
f(t)
[0 – 300] ms 🡪 75 Hz sinusoid
[300 – 600] ms 🡪 50 Hz sinusoid
[600 – 800] ms 🡪 25 Hz sinusoid
[800 – 1000] ms 🡪10 Hz sinusoid
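The piecewise signal above can be constructed and examined with a spectrogram (the STFT magnitude); the sample rate and window parameters below are illustrative choices, not from the slide:

```python
import numpy as np
from scipy import signal

fs = 2000  # sample rate (an illustrative choice, not from the slide)

def seg(freq, t0, t1):
    t = np.arange(t0, t1, 1 / fs)
    return np.sin(2 * np.pi * freq * t)

# Piecewise signal from the slide: 75, 50, 25, then 10 Hz sinusoids
x = np.concatenate([seg(75, 0.0, 0.3), seg(50, 0.3, 0.6),
                    seg(25, 0.6, 0.8), seg(10, 0.8, 1.0)])

f, t, Sxx = signal.spectrogram(x, fs=fs, nperseg=256, noverlap=128)
first = f[np.argmax(Sxx[:, 0])]  # dominant frequency in the first window
mid = f[np.argmax(Sxx[:, 5])]    # a window inside the 50 Hz segment
print(first, mid)
```

Unlike the plain Fourier transform of this signal, the spectrogram shows which frequency is present at which time.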
Example
f(t)
W(t)
scaled: t/20
Choosing Window W(t)
• What shape should it have?
– Rectangular, Gaussian, Elliptic …
scaled:
t/20
Example (cont’d)
scaled:
t/20
Heisenberg (or Uncertainty) Principle
Time resolution: how well two spikes in time can be separated from each other in the time domain.
Frequency resolution: how well two spectral components can be separated from each other in the frequency domain.
Heisenberg (or Uncertainty) Principle