ME5001 PHMMT 2021 Introduction Combined

This document provides information about the course ME 5326 Prognostics and Health Management of Machine Tools taught by Dr. N. Arunachalam at IIT Madras. The objectives of the course are to teach students prognostics and health management technologies to predict failures, enable condition-based maintenance, assess life extension, and improve future product designs. Upon completing the course, students will be able to develop and implement PHM concepts for electrical, mechanical, and electromechanical systems. The course content includes topics such as sensors, data preprocessing, feature extraction, modeling, and decision making. Students are evaluated based on quizzes, a project, and tutorials.


ME 5326

PROGNOSTICS AND HEALTH MANAGEMENT OF MACHINE TOOLS

Dr. N. Arunachalam
Associate Professor
Manufacturing Engineering Section
Department of Mechanical Engineering
IIT Madras, Chennai-36
ME5326 – PHMMT

Objective

The knowledge of prognostics and health management technologies will prepare the students to develop and implement PHM methodologies in real time to predict failures, to convert scheduled maintenance into condition-based maintenance, to assess the life-extension capability, and to improve future designs of the product (design for serviceability).

Benefit

On completion of this course, the students will have the fundamental knowledge and skills to develop and implement PHM concepts for electrical, mechanical, and electro-mechanical systems.
Industrial Relevance
• To reduce the manufacturing variability across multiple components produced by the same machine.
• To enable the machine to self-assess its capability before it starts producing the parts.
• To move from unplanned to planned maintenance through CBM.
• To extend life through adaptive control.

Proactive or predictive maintenance through the PHM methodology is key for the successful operation of any machine tool.
ME5326 – PHMMT
Course Content
Course Structure
• Introduction
• Sensors and signals
• Data types & Preprocessing
• Feature Extraction
• Feature Selection
• Model building – Diagnostics and Prognostics
• Decision making
ME5326 – PHMMT
Evaluation Pattern
Five quizzes of 1 hour duration each - 30%

Project - 50%

Tutorial - 20%
ME5326 – PHMMT
Text Books
1. Rolf Isermann, Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance, Springer, 2011
2. Hassan El-Hofy, Fundamentals of Machining Processes: Conventional and Nonconventional Processes, CRC Press, 2013
References
1. M. G. Pecht, Prognostics and Health Management of Electronics, Wiley-Interscience, New York, NY, August 2008
2. W. J. Staszewski, C. Boller and G. R. Tomlinson, Health Monitoring of Aerospace Structures: Smart Sensor Technologies and Signal Processing, John Wiley & Sons, Ltd, 2004
3. G. Vachtsevanos, F. L. Lewis, M. Roemer, A. Hess, and B. Wu, Intelligent Fault Diagnosis and Prognosis for Engineering Systems, John Wiley & Sons, Ltd, 2006
4. Geoffrey Boothroyd, Winston Anthony Knight, Fundamentals of Machining and Machine Tools, Taylor & Francis, 2006
5. Seifedine Kadry, Diagnostics and Prognostics of Engineering Systems, IGI Global, 2012
http://phmap.org/data-challenge/
Data Pre-processing
Where Feature Extraction fits in PHM?

• Data Acquisition (DA)
• Data Manipulation (DM) – normalization, smoothing, outlier removal, missing-data imputation; this is where feature extraction sits in data-driven PHM solutions
• State Detection (SD)
• Health Assessment (HA)
• Prognostics Assessment (PA)
• Advisory
Feature extraction: what and why

What:
Feature extraction transforms raw signals into more informative signatures or fingerprints of a system.
Why:
• Extract information from data
• Serve the needs of follow-up modeling procedures
• Achieve the intended objectives
Problem: bearing health assessment
Data: vibration (from accelerometers)
Extract frequency-domain features:

• Segment the data with a certain time window
• Transform each segment into a frequency spectrum with the FFT
• Calculate the energy for each frequency band around an interested frequency F, E_F = Σ A_f², where A_f is the amplitude at frequency f
• Obtain the feature vector [E_F1, E_F2, ...] (see the sketch below)
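A minimal MATLAB sketch of these steps for one vibration segment; the sampling rate, test signal and band edges are assumptions for illustration, not values from the slides.

fs = 10000;                                  % assumed sampling rate, Hz
t  = 0:1/fs:1-1/fs;
x  = sin(2*pi*157*t) + 0.3*randn(size(t));   % stand-in for one vibration segment

X  = fft(x);
N  = numel(x);
A  = abs(X(1:N/2))/N;                        % single-sided amplitude spectrum A_f
f  = (0:N/2-1)*fs/N;                         % frequency axis, Hz

bands = [100 200; 200 400; 400 800];         % assumed bands around frequencies of interest, Hz
E = zeros(1,size(bands,1));
for k = 1:size(bands,1)
    idx  = f >= bands(k,1) & f < bands(k,2);
    E(k) = sum(A(idx).^2);                   % band energy E_F = sum of A_f^2
end
disp(E)                                      % feature vector [E_F1, E_F2, ...]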


Schematic of a three-axis milling machine (figure): spindle drive with servo motor, tool holder and tool; feed drives (servo motors) for the X, Y, Z axes in closed loop with feedback; spindle and coolant on/off; spindle speed and feed-drive speed commands. Material removal results from the relative movement between the cutting tool and the workpiece. Instrumentation: 12 current sensors capturing the load of operation.
Raw data → first-order descriptive statistics (mean, variance).

Typical degradation mechanisms: gear – pitting (flaking); bearing – fatigue, brinelling; HRSG – thermal cycles; battery – cyclic load.


Signal Generation
Signal Generation

fs = 10000; t = -1:1/fs:1;
x1 = tripuls(t,20e-3);              % triangular pulse, 20 ms wide
x2 = rectpuls(t,20e-3);             % rectangular pulse, 20 ms wide
subplot(2,1,1)
plot(t,x1)
axis([-0.1 0.1 -0.2 1.2])
xlabel('Time (sec)'), ylabel('Amplitude'), title('Triangular Aperiodic Pulse')
subplot(2,1,2)
plot(t,x2)
axis([-0.1 0.1 -0.2 1.2])
xlabel('Time (sec)'), ylabel('Amplitude'), title('Rectangular Aperiodic Pulse')
Signal Generation

tc = gauspuls('cutoff',50e3,0.6,[],-40);   % cutoff time for a 50 kHz, 60% bandwidth pulse
t1 = -tc : 1e-6 : tc;
y1 = gauspuls(t1,50e3,0.6);                % Gaussian-modulated sinusoidal pulse

t2 = linspace(-5,5);
y2 = sinc(t2);                             % sinc function
subplot(2,1,1)
plot(t1*1e3,y1)
xlabel('Time (ms)')
ylabel('Amplitude')
title('Gaussian Pulse')
subplot(2,1,2)
plot(t2,y2)
xlabel('Time (sec)')
ylabel('Amplitude')
title('Sinc Function')
Sampling rate
Exercise: generate signals with different frequencies and sample rates, and estimate different time-domain parameters, as in the sketch below.
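A minimal MATLAB sketch for this exercise; the 50 Hz test signal and noise level are assumptions. skewness and kurtosis need the Statistics and Machine Learning Toolbox, rms the Signal Processing Toolbox.

fs = 10000;  t = 0:1/fs:1-1/fs;
x  = sin(2*pi*50*t) + 0.1*randn(size(t));    % assumed noisy 50 Hz sine

m     = mean(x);                             % mean
v     = var(x);                              % variance
s     = skewness(x);                         % skewness
k     = kurtosis(x);                         % kurtosis
xrms  = rms(x);                              % root mean square
crest = max(abs(x))/xrms;                    % crest factor
fprintf('mean=%.3f var=%.3f skew=%.3f kurt=%.3f rms=%.3f crest=%.3f\n', m, v, s, k, xrms, crest)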
Signal Generation and Visualization

Generate 1.5 seconds of a 50 Hz sawtooth (respectively square) wave with a sample rate of 10 kHz.
Program to Generate Periodic Signals
fs = 10000;
t = 0:1/fs:1.5;
x1 = sawtooth(2*pi*50*t);
x2 = square(2*pi*50*t);
subplot(2,1,1)
plot(t,x1)
axis([0 0.2 -1.2 1.2])
xlabel('Time (sec)')
ylabel('Amplitude')
title('Sawtooth Periodic Wave')
subplot(2,1,2)
plot(t,x2)
axis([0 0.2 -1.2 1.2])
xlabel('Time (sec)')
ylabel('Amplitude')
title('Square Periodic Wave')
Aperiodic Waveforms

To generate 2 seconds of a triangular (respectively rectangular) pulse with a sample rate of 10 kHz and a
width of 20 ms
Program to Generate Aperiodic
Signals
fs = 10000;
t = -1:1/fs:1;
x1 = tripuls(t,20e-3);
x2 = rectpuls(t,20e-3);
subplot(2,1,1)
plot(t,x1)
axis([-0.1 0.1 -0.2 1.2])
xlabel('Time (sec)')
ylabel('Amplitude')
title('Triangular Aperiodic Pulse')
subplot(2,1,2)
plot(t,x2)
axis([-0.1 0.1 -0.2 1.2])
xlabel('Time (sec)')
ylabel('Amplitude')
title('Rectangular Aperiodic Pulse')
Noise – unwanted information buried in the original signal collected using the sensor and the associated data acquisition (DAQ) system.

Sensor – limited by a lower sampling rate
DAQ – one sensor signal at a time, e.g. 10 kS/s

Undersampling leads to harmonic distortion and shape distortion of the signal.
An analog filter is used to remove the noise before digitization.

DAQ hardware: ADC, quantization error, resolution.
What is Data Smoothing?
• Using filters to smooth out data by removing noise and allowing important patterns to stand out.
• The smoothing is quantified using two parameters: (i) SNR and (ii) RMSE.
• SNR (Signal-to-Noise Ratio): the ratio of signal power to noise power, often expressed in decibels.
• RMSE (Root Mean Square Error): a measure of the differences between values predicted by a model or an estimator and the values actually observed.
SNR (Signal-to-Noise Ratio): SNR = P_signal / P_noise, or in decibels SNR_dB = 10 log10(P_signal / P_noise).
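A minimal MATLAB sketch of how SNR and RMSE quantify a smoother; the reference signal, noise level and 11-point moving average are assumptions for illustration.

t        = 0:0.001:1;
clean    = sin(2*pi*5*t);                    % assumed noise-free reference
noisy    = clean + 0.2*randn(size(clean));
smoothed = movmean(noisy, 11);               % any smoother can be used here

snr_db = 10*log10(sum(clean.^2) / sum((smoothed - clean).^2));   % signal power / residual noise power, in dB
rmse   = sqrt(mean((smoothed - clean).^2));                      % RMS difference from the reference
fprintf('SNR = %.2f dB, RMSE = %.4f\n', snr_db, rmse)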
Some Common Data Smoothing
Techniques
• Moving Average
• Exponential Moving Average
• Weighted Moving Average
• Savitzky - Golay Filter
• Loess & Robust Loess Filter
1. Moving Average

• A calculation to analyze data points by creating a series of averages of different subsets of the full data set.
• Simple Moving Average (SMA): the unweighted mean of the previous n data points,

SMA_t = (y_t + y_{t-1} + ... + y_{t-n+1}) / n

where y is the variable, t is the current time period, and n is the number of time periods in the average.

n = 11: SNR = 1.2135, RMSE = 0.0785
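A minimal MATLAB sketch of the simple moving average via convolution; the test signal is an assumption, and n = 11 matches the window length quoted above.

n = 11;
t = 0:0.001:1;
y = sin(2*pi*2*t) + 0.3*randn(size(t));      % assumed noisy signal
sma = conv(y, ones(1,n)/n, 'same');          % each output sample = unweighted mean of n neighbours
plot(t, y); hold on; plot(t, sma, 'LineWidth', 1.5); legend('noisy', 'SMA, n = 11')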

Anomalies – abnormal events

First-order statistics:
• Mean
• Variance
• Skewness
• Kurtosis
• Energy

The skewness of a normal distribution is 0; kurtosis describes the peakedness of the distribution.

Non-stationary example: original signal + rand noise; energy estimated as sum(abs(N)).
Surface texture example: assume the process of surface production is a stationary process; the local variance of the data needs to be retained.

Example filter kernels:
H = [1 0 -1]
H = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1]
Weighted Moving Average
A weighted average is an average that has multiplying factors to give different weights to data at different positions in the sample window.

k = 11: SNR = 1.3001, RMSE = 0.0759
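A minimal MATLAB sketch of a weighted moving average; the triangular weights and test signal are assumptions, not the values behind the SNR/RMSE figures above.

w = [1 2 3 4 5 6 5 4 3 2 1];                 % triangular weights over an 11-point window
w = w / sum(w);                              % normalise so the weights sum to 1
t = 0:0.001:1;
y = sin(2*pi*2*t) + 0.3*randn(size(t));
wma = conv(y, w, 'same');                    % centre sample gets the largest weight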

size(filtered signal) – edge effects (artifacts) appear at the ends of the filtered record, analogous to an RC or 2-RC filter in surface metrology.

Sampling length example: cutoff = 0.8 mm; evaluation length = 5 × 0.8 = 4 mm; measuring length = 7 × 0.8 = 5.6 mm.


Simple Exponential Smoothing Method

• Formally, the exponential smoothing equation is

Ft+1 = α·yt + (1 − α)·Ft

• Ft+1 = forecast for the next period.
• α = smoothing constant.
• yt = observed value of the series in period t.
• Ft = old forecast for period t.

• The forecast Ft+1 is based on weighting the most recent observation yt with a weight α and weighting the most recent forecast Ft with a weight of 1 − α.

Simple Exponential Smoothing Method

• The implication of exponential smoothing can be better seen if the previous equation is expanded by replacing Ft with its components:

Ft+1 = α·yt + α(1 − α)·yt−1 + α(1 − α)²·yt−2 + … + (1 − α)^t·F1
Simple Exponential Smoothing Method

• The weight assigned to an observation k periods old is α(1 − α)^k. The weights assigned to past observations yt, yt−1, yt−2, … are therefore:
  α = 0.2: 0.2, 0.16, 0.128, …
  α = 0.4: 0.4, 0.24, 0.144, …
  α = 0.6: 0.6, 0.24, 0.096, …
  α = 0.8: 0.8, 0.16, 0.032, …
  α = 0.9: 0.9, 0.09, 0.009, …
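A minimal MATLAB sketch of simple exponential smoothing, Ft+1 = α·yt + (1 − α)·Ft; the series and α = 0.4 are assumptions for illustration.

alpha = 0.4;
y = cumsum(randn(1,100));                    % assumed observed series
F = zeros(size(y));
F(1) = y(1);                                 % initialise the first forecast with the first observation
for t = 1:numel(y)-1
    F(t+1) = alpha*y(t) + (1 - alpha)*F(t);  % weight the newest observation by alpha
end
plot(y); hold on; plot(F, 'LineWidth', 1.5); legend('observed', 'exponentially smoothed')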
Savitzky-Golay Filter
One approach for smoothing the time series is to replace each value of the series
with a new value which is obtained from a polynomial fit to 2n+1 neighboring
points (including the point to be smoothed), with n being equal to, or greater
than the order of the polynomial.
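A minimal MATLAB sketch using sgolayfilt (Signal Processing Toolbox); the signal, polynomial order 3 and 21-point frame are assumptions for illustration.

x  = linspace(0, 1, 500);
y  = exp(-5*x).*sin(40*x) + 0.05*randn(size(x));   % assumed noisy signal
ys = sgolayfilt(y, 3, 21);                         % order must be less than the (odd) frame length
plot(x, y); hold on; plot(x, ys, 'LineWidth', 1.5); legend('noisy', 'Savitzky-Golay')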
Method of Least Squares
Local Regression Smoothing

Lowess and Loess

The Local Regression Method

Robust Local Regression

Source : Matlab Central


Lowess and Loess
The names “lowess” and “loess” are derived from the term “locally weighted
scatter plot smooth,” as both methods use locally weighted linear regression to
smooth data.

The smoothing process is considered local because, like the moving average
method, each smoothed value is determined by neighboring data points defined
within the span. The process is weighted because a regression weight function is
defined for the data points contained within the span. In addition to the
regression weight function, you can use a robust weight function, which makes
the process resistant to outliers. Finally, the methods are differentiated by the
model used in the regression: lowess uses a linear polynomial, while loess uses a
quadratic polynomial.
y = a0 + a1x – linear polynomial (lowess)
y = a0 + a1x + a2x² – quadratic polynomial (loess)
The span can be even or odd.

You can specify the span as a percentage of the total number of data points in the
data set. For example, a span of 0.1 uses 10% of the data points.
Source : Matlab Central
Lowess and Loess
Compute the regression weights for each data point in the span. The weights are given by the tricube function

w_i = (1 − |(x − x_i)/d(x)|³)³

where x is the predictor value associated with the response value to be smoothed, x_i are the nearest neighbors of x as defined by the span, and d(x) is the distance along the abscissa from x to the most distant predictor value within the span. The weights have these characteristics:

The data point to be smoothed has the largest weight and the most influence on the fit.

Data points outside the span have zero weight and no influence on the fit.
A weighted linear least-squares regression is performed. For lowess, the
regression uses a first degree polynomial. For loess, the regression uses a second
degree polynomial.

The smoothed value is given by the weighted regression at the predictor value of
interest. Source : Matlab Central
Local Regression Weight Function

Source : Matlab Central


Smoothing Function

Plots (a) and (b) use an asymmetric weight function, while plots (c) and (d) use a
symmetric weight function. Source : Matlab Central
Residual Analysis
residual = data – fit
r=y–ŷ

Source : Matlab Central


Residual Analysis

Source : Matlab Central


Robust Weighted Regression
If your data contains outliers, the smoothed values can become distorted, and not
reflect the behavior of the bulk of the neighboring data points. To overcome this
problem, you can smooth the data using a robust procedure that is not influenced
by a small fraction of outliers.

Calculate the residuals from the smoothing procedure described in the


previous section.

Compute the robust weights for each data point in the span. The weights are given by the bisquare function

w_i = (1 − (r_i / 6·MAD)²)²  for |r_i| < 6·MAD,  w_i = 0 otherwise

where r_i is the residual of the ith data point produced by the regression smoothing procedure, and MAD is the median absolute deviation of the residuals.
Source: Matlab Central
The median absolute deviation is a measure of how spread out the residuals are. If r_i is small compared to 6·MAD, then the robust weight is close to 1. If r_i is greater than 6·MAD, the robust weight is 0 and the associated data point is excluded from the smooth calculation.
Smooth the data again using the robust weights. The final smoothed value is calculated using both the local regression weight and the robust weight.
Repeat the previous two steps for a total of five iterations.

Plot (a) shows that the outlier


influences the smoothed value
for several nearest neighbors.
Plot (b) suggests that the
residual of the outlier is greater
than six median absolute
deviations. Therefore, the
robust weight is zero for this
data point. Plot (c) shows that
the smoothed values
neighboring the outlier reflect
the bulk of the data.
Source : Matlab Central
Examples
x = 15*rand(150,1);
y = sin(x) + 0.5*(rand(size(x))-0.5);     % noisy sine
y(ceil(length(x)*rand(2,1))) = 3;         % add two outliers
yy1 = smooth(x,y,0.1,'loess');            % loess, span = 10% of the data
yy2 = smooth(x,y,0.1,'rloess');           % robust loess

Source : Matlab Central


FEATURE SELECTION
Feature Selection
Feature selection aims to select a feature subset that
has discriminative information from the original feature
set. In practice, we do not know what classifier is used
beforehand, and it is preferable to find a feature subset
that is universally effective for any classifier. Such a trial
is called classifier-independent feature selection and can
be made by removing garbage features that have no
discriminative information.
Feature Selection
• Given a set of n features, the role of feature
selection is to select a subset of size d (d < n) that
leads to the smallest classification error.

• Fundamentally different from dimensionality reduction (e.g., PCA or LDA)
Why is feature selection important?

• Features may be expensive to obtain


– You evaluate a large number of features (sensors) in the test bed
and select only a few for the final implementation
• You may want to extract meaningful rules from your
classifier
– When you transform or project, the measurement units (length,
weight, etc.) of your features are lost
• Features may not be numeric
– A typical situation in the machine learning domain
Feature Selection Methods
• Feature selection is an
optimization problem.
– Search the space of possible
feature subsets.

– Pick the subset that is optimal


or near-optimal with respect to
an objective function.
Feature Selection Methods
• Feature selection is an optimization problem.
– Search the space of possible feature subsets.
– Pick the subset that is optimal or near-optimal with respect to
a certain criterion.

Search strategies:
– Optimum
– Heuristic
– Randomized

Evaluation strategies:
– Filter methods
– Wrapper methods
Search Strategies
• Assuming m features, an exhaustive search would
require:

– Examining all possible subsets of size d.

– Selecting the subset that performs the best according to the


criterion function.
• The number of subsets grows combinatorially, making
exhaustive search impractical.
• Iterative procedures are often used based on heuristics
but they cannot guarantee the selection of the optimal
subset.

Evaluation Strategies
• Filter Methods
– Evaluation is independent
of the classification
algorithm.
– The objective function
evaluates feature subsets
by their information
content, typically interclass
distance, statistical
dependence or
information-theoretic
measures.
Evaluation Strategies
• Wrapper Methods
– Evaluation uses criteria
related to the classification
algorithm.
– The objective function is a
pattern classifier, which
evaluates feature subsets
by their predictive
accuracy (recognition rate
on test data) by statistical
resampling or
cross-validation.
Filter vs Wrapper Approaches
Variance-Normalized Class Separation Distance
A successful classification requires that the features used have good class separability. One estimate of this property is the variance-normalized class separation distance D; a common form (assumed here, since the slide's equation is not reproduced) for a feature x distributed amongst two classes j and k is

D_jk = |μ_j − μ_k| / √(σ_j² + σ_k²)

Correlation of a Feature Set
Another useful assessment of the two-dimensional feature space is the correlation ρ_j of the two features x and y in a particular class j,

ρ_j = (1/N) Σ_i (x_i − x̄_j)(y_i − ȳ_j) / (σ_xj·σ_yj)

where x̄_j and ȳ_j are the means and σ_xj and σ_yj are the standard deviations of the N samples in class j.
Filter vs Wrapper Approaches (cont’d)
Naïve Search
• Sort the given n features in order of their probability of
correct recognition.

• Select the top d features from this sorted list.

• Disadvantage
– Feature correlation is not considered.
– Best pair of features may not even contain the best individual
feature.
Sequential forward selection (SFS)
(heuristic search)
• First, the best single feature is selected
(i.e., using some criterion function).
• Then, pairs of features are formed using
one of the remaining features and this best
feature, and the best pair is selected.
• Next, triplets of features are formed using
one of the remaining features and these two
best features, and the best triplet is
selected.
• This procedure continues until a predefined
number of features are selected.
SFS performs best when the optimal subset is small.
Example
features added at
each iteration

Results of sequential forward feature selection for classification of a satellite image


using 28 features. x-axis shows the classification accuracy (%) and y-axis shows the
features added at each iteration (the first iteration is at the bottom). The highest
accuracy value is shown with a star.
Sequential backward selection (SBS)
(heuristic search)
• First, the criterion function is computed for all
n features.
• Then, each feature is deleted one at a time,
the criterion function is computed for all
subsets with n-1 features, and the worst
feature is discarded.
• Next, each feature among the remaining n-1
is deleted one at a time, and the worst
feature is discarded to form a subset with n-2
features.
• This procedure continues until a predefined
number of features are left.

SBS performs best when the optimal subset is large.
Example
features removed at
each iteration

Results of sequential backward feature selection for classification of a satellite image


using 28 features. x-axis shows the classification accuracy (%) and y-axis shows the
features removed at each iteration (the first iteration is at the top). The highest accuracy
value is shown with a star.
Bidirectional Search (BDS)

• BDS applies SFS and SBS simultaneously:


– SFS is performed from the empty set
– SBS is performed from the full set
• To guarantee that SFS and SBS converge
to the same solution
– Features already selected by SFS are not
removed by SBS
– Features already removed by SBS are not
selected by SFS
“Plus-L, minus-R” selection (LRS)
• A generalization of SFS and SBS
– If L > R, LRS starts from the empty set and:
• Repeatedly adds L features
• Repeatedly removes R features
– If L < R, LRS starts from the full set and:
• Repeatedly removes R features
• Repeatedly adds L features

• LRS attempts to compensate for the weaknesses of


SFS and SBS with some backtracking capabilities.
Sequential floating selection
(SFFS and SFBS)
• An extension to LRS with flexible backtracking capabilities
– Rather than fixing the values of L and R, floating methods
determine these values from the data.
– The dimensionality of the subset during the search can be
thought to be “floating” up and down

• There are two floating methods:


– Sequential floating forward selection (SFFS)
– Sequential floating backward selection (SFBS)

P. Pudil, J. Novovicova, J. Kittler, Floating search methods in feature


selection, Pattern Recognition Lett. 15 (1994) 1119–1125.
Sequential floating selection
(SFFS and SFBS)
• SFFS
– Sequential floating forward selection (SFFS) starts from the
empty set.
– After each forward step, SFFS performs backward steps as long
as the objective function increases.
• SFBS
– Sequential floating backward selection (SFBS) starts from the full
set.
– After each backward step, SFBS performs forward steps as long
as the objective function increases.
Feature Selection using
Genetic Algorithms (GAs)
(randomized search)

GAs provide a simple, general, and powerful framework


for feature selection.

(Block diagram: Pre-Processing → Feature Extraction → Feature Subset → Classifier, with the Feature Selection (GA) block choosing the feature subset fed to the classifier.)
Wavelet Families

1. Haar Wavelets
2. Daubechies Wavelets
3. Biorthogonal Wavelets
4. Coiflets
5. Symlets
6. Morlet Wavelets
7. Mexican Hat Wavelets
8. Meyer Wavelets
Haar Wavelets
Haar wavelets are the oldest and simplest wavelets. The function is discontinuous and exists only on the interval of 0 to 1. Between 0 and 0.5 the value is 1, and between 0.5 and 1 it is -1.
The Continuous Wavelet Transform

• Wavelet family: ψ_a,b(t) = (1/√a)·ψ((t − b)/a), with scale a > 0 and translation b
• Decomposition: W(a,b) = ∫ x(t)·ψ*_a,b(t) dt

The Continuous Wavelet Transform
• Example: the Mexican hat wavelet

The Continuous Wavelet Transform
• Reconstruction requires an admissible wavelet: C_ψ = ∫ |Ψ(ω)|²/|ω| dω < ∞
• A simpler condition: a zero-mean wavelet, ∫ ψ(t) dt = 0

Practically speaking, the reconstruction formula is of no use. This motivates discrete wavelet transforms which preserve exact reconstruction.
The Haar wavelet

•A basis for L2( R) :

Averaging and
differencing
The Haar wavelet
Orthogonal and Bi-orthogonal Functions

Wavelets that use the same filters for decomposition (analysis) and reconstruction (synthesis) belong to the orthogonal wavelet family. For instance, if ϕn(t) spans some n-space, with some input sequence dn, then the transmit signal can be obtained as

s(t) = Σn dn·ϕn(t)

In reverse form, dn can be obtained as the inner product of s(t) and ϕn(t),

dn = ⟨s(t), ϕn(t)⟩

with the interpretation that ϕn(t) spans the space R and is the basis set of R if the set of {dn} differs for any given s(t) ∈ R.

If ⟨ϕn(t), ϕm(t)⟩ = 0 for n ≠ m, then the basis sets are orthogonal, and wavelets constructed from this form of scaling function are orthogonal wavelets. Examples of orthogonal base wavelets in the literature are the Daubechies wavelets.

In the MATLAB environment, the Daubechies wavelets are designated as 'dbN', where 'N' stands for the effective filter length. These wavelets are both orthogonal and orthonormal, i.e. ⟨ϕn(t), ϕm(t)⟩ = δnm.
Daubechies Wavelets
The Daubechies wavelets, based on the work of Ingrid Daubechies, are a family of orthogonal wavelets defining
a discrete wavelet transform and characterized by a maximal number of vanishing moments for some given
support.

Scaling function

The wavelet function coefficient values are: g0 = h3, g1 = -h2, g2 = h1, g3 = -h0


Daubechies Wavelets

Signal length : 8

Forward Transform
In the case of the forward transform, with a finite data set (as opposed to the mathematician's imaginary infinite data set), i will be incremented until it is equal to N-2. In the last iteration the inner product will be calculated from s[N-2], s[N-1], s[N] and s[N+1]. Since s[N] and s[N+1] don't exist (they are beyond the end of the array), this presents a problem.

Inverse Transform
A similar problem exists in the case of the inverse transform. Here the inverse transform coefficients extend beyond the beginning of the data, where the first two inverse values are calculated from s[-2], s[-1], s[0] and s[1].
Bi-orthogonal Wavelets
Biorthogonal wavelets constitute a generalization of
orthogonal wavelets. Under this framework, instead of a
single orthogonal basis, a pair of dual biorthogonal basis
functions is employed: One for the analysis step and the
other for the synthesis step
If the scaling functions basis set arises from different
sources but still satisfy the orthogonal and orthonormal
properties such as

where ϕn(t) and ξm(t) are different orthogonal basis sets,


then the scaling function basis sets are said to be
biorthogonal. The basis sets are among themselves
biorthogonal but not orthogonal to each other
Bi-orthogonal Wavelets
Coiflets
Coiflets are discrete wavelets designed by Ingrid Daubechies to have scaling functions with vanishing moments. The wavelet is near symmetric; the wavelet functions have N/3 vanishing moments and the scaling functions N/3-1.
Symlets
They are a modified version of Daubechies wavelets with increased
symmetry.
Morlet Wavelets
The real-valued Morlet wavelet is defined as: ψ(x) = C·e^(−x²)·cos(5x)
Mexican Hat Wavelets
Meyer Wavelets
The Meyer wavelet is an orthogonal wavelet that is indefinitely differentiable with infinite support. The Meyer scale function and wavelet are defined in the frequency domain in terms of a function v by means of well-known equations.
Consider any random signal or periodic signal with
noise
Based on the sampling frequency of 256Hz, the signal was decomposed into 5 levels with each level having a
detail coefficient, D(n) and an approximate coefficient, A(n)
Wavelet Scale to Frequency

A way to do it is to compute the center frequency, Fc, of the wavelet and


to use the following relationship. Fa=Fc/(a⋅Δ) where a is a scale. Δ is the
sampling period. Fc is the center frequency of a wavelet in Hz. Fa is the
pseudo-frequency corresponding to the scale a, in Hz. The idea is to
associate with a given wavelet a purely periodic signal of frequency Fc.
For the generated signal:

• Get the approximation and detail coefficients.
• Plot and interpret how the detail coefficients vary with the decomposition level.
• How do you select a particular mother wavelet for a given signal based on the entropy value? (See the sketch below.)
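A minimal MATLAB sketch for this exercise using the Wavelet Toolbox; the test signal, candidate mother wavelets and the Shannon entropy criterion are assumptions for illustration.

fs = 256;  t = 0:1/fs:4-1/fs;
x  = sin(2*pi*10*t) + 0.5*randn(size(t));    % assumed noisy periodic signal

[c, l] = wavedec(x, 5, 'db4');               % 5-level DWT with db4
A5 = appcoef(c, l, 'db4', 5);                % level-5 approximation coefficients A(5)
for k = 1:5
    D{k} = detcoef(c, l, k);                 % detail coefficients D(1)..D(5)
end

% Compare candidate mother wavelets through the entropy of their coefficient vectors
for w = {'haar', 'db4', 'sym4', 'coif3'}
    cw = wavedec(x, 5, w{1});
    fprintf('%-6s entropy = %.2f\n', w{1}, wentropy(cw, 'shannon'));
end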
Feature Extraction methods
Summary

Introduction
Data characteristics
Application & domain
Feature extraction methods
Feature dimensionality reduction
Issues in real applications
Summary


T=f

Cooley and Tukey, 1965 – the Fast Fourier Transform (FFT)

Fourier Transform
Mathematical Background: Complex Numbers

• A complex number x is of the form x = a + jb, where a is the real part and b is the imaginary part.

• Addition: (a + jb) + (c + jd) = (a + c) + j(b + d)

• Multiplication: (a + jb)·(c + jd) = (ac − bd) + j(ad + bc)
Mathematical Background: Complex Numbers

• Magnitude-Phase (i.e., vector) representation

Magnitude: |x| = √(a² + b²)

Phase: φ = tan⁻¹(b/a)

Magnitude-Phase notation: x = |x|·e^(jφ)
Mathematical Background: Complex Numbers

• Multiplication using magnitude-phase representation: x1·x2 = |x1|·|x2|·e^(j(φ1+φ2))

• Complex conjugate: x* = a − jb = |x|·e^(−jφ)

• Properties: |x*| = |x| and x·x* = |x|²
Mathematical Background: Complex Numbers

• Euler's formula: e^(jθ) = cos θ + j·sin θ

• Properties: cos θ = (e^(jθ) + e^(−jθ))/2, sin θ = (e^(jθ) − e^(−jθ))/(2j)
Mathematical Background:
Sine and Cosine Functions
• Periodic functions
• General form of sine and cosine functions: y(t) = A·sin(αt + b)
Mathematical Background:
Sine and Cosine Functions
IFT / DFT

Special case: A = 1, b = 0, α = 1 (plots of sin t and cos t, with axis ticks at π/2, π, 3π/2).

MSE of zero – perfect reconstruction of the original signal.


Mathematical Background:
Sine and Cosine Functions (cont’d)

• Shifting or translating the sine function by a constant b

Note: cosine is a shifted sine function: cos(αt) = sin(αt + π/2)


Mathematical Background:
Sine and Cosine Functions (cont’d)
• Changing the amplitude A
Mathematical Background:
Sine and Cosine Functions (cont’d)
• Changing the period T=2π/|α|
consider A=1, b=0: y=cos(αt)
α =4
period 2π/4=π/2

shorter period
higher frequency
(i.e., oscillates faster)

Frequency is defined as f=1/T

Alternative notation: cos(αt)=cos(2πt/T)=cos(2πft)


(Figure: three sinusoids f1, f2, f3 and their sum f = f1 + f2 + f3.)
Image Transforms
• Many times, image processing tasks are
best performed in a domain other than the
spatial domain.
• Key steps:
(1) Transform the image f(x,y)
(2) Carry out the task(s) in the transformed domain.
(3) Apply the inverse transform to return to the spatial domain.
Notation
• Continuous Fourier Transform (FT)

• Discrete Fourier Transform (DFT)

• Fast Fourier Transform (FFT) algorithm to


implement DFT
Fourier Series Theorem

• Any periodic function f(t) can be expressed


as a weighted sum (infinite) of sine and
cosine functions of varying frequency:

is called the “fundamental frequency”

Example: with a 100 Hz fundamental, 200, 400, 600 Hz are even harmonics of the fundamental frequency, and 100, 300, 500, 700 Hz are odd harmonics of the fundamental frequency.
Fourier Series (cont’d)

(Figure: Fourier series coefficients α1, α2, α3.)
Continuous Fourier Transform (FT)
• Transforms a signal (i.e., function) from the
spatial (x) domain to the frequency (u)
domain.

F(u) = ∫ f(x)·e^(−j2πux) dx, where u denotes frequency
Why is FT Useful?
• Easier to remove undesirable frequencies.

• Faster perform certain operations in the


frequency domain than in the spatial domain.
Example: Removing undesirable frequencies

(Figure: noisy signal → F(u) → set F(u) = 0 for the high frequencies → reconstructed signal.)

To remove certain frequencies, set their corresponding F(u) coefficients to zero!
How do frequencies show up in an image?

• Low frequencies correspond to slowly varying


information (e.g., continuous surface).
• High frequencies correspond to quickly
varying information (e.g., edges)

Original Image Low-passed


Example of noise reduction using FT

Input image Spectrum

Band-pass Output
filter image
Frequency Filtering Steps
1. Take the FT of f(x): F(u)

2. Remove undesired frequencies: G(u) = H(u)·F(u)

3. Convert back to a signal: g(x) = inverse FT of G(u)

We'll talk more about these steps later; a minimal sketch follows.
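A minimal MATLAB sketch of the three steps on a 1-D signal; the test signal and 50 Hz cutoff are assumptions for illustration.

fs = 1000;  t = 0:1/fs:1-1/fs;
f_sig = sin(2*pi*5*t) + 0.4*sin(2*pi*120*t);   % wanted 5 Hz component plus unwanted 120 Hz component

F = fft(f_sig);                                % 1. take the FT
freqs = (0:numel(F)-1)*fs/numel(F);
F(freqs > 50 & freqs < fs-50) = 0;             % 2. zero the coefficients above 50 Hz (both spectrum halves)
f_rec = real(ifft(F));                         % 3. convert back to a signal

plot(t, f_sig); hold on; plot(t, f_rec, 'LineWidth', 1.5); legend('noisy', 'reconstructed')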


Definitions
• F(u) is a complex function: F(u) = R(u) + jI(u)

• Magnitude of FT (spectrum): |F(u)| = √(R²(u) + I²(u))

• Phase of FT: φ(u) = tan⁻¹(I(u)/R(u))

• Magnitude-Phase representation: F(u) = |F(u)|·e^(jφ(u))

• Power of f(x): P(u) = |F(u)|² = R²(u) + I²(u)


Example: rectangular pulse

magnitude

rect(x) function sinc(x)=sin(x)/x


Extending FT in 2D
• Forward FT

• Inverse FT
Example: 2D rectangle function
• FT of 2D rectangle function

2D sinc()
Discrete Fourier Transform (DFT)
Discrete Fourier Transform (DFT) (cont’d)

• Forward DFT: F(u) = Σ_{x=0}^{N−1} f(x)·e^(−j2πux/N), u = 0, 1, …, N−1

• Inverse DFT: f(x) = (1/N) Σ_{u=0}^{N−1} F(u)·e^(j2πux/N)

• Frequency resolution: Δu = 1/(NΔx)
Example
A signal of bandwidth B (Hz), observed for T seconds and sampled as N samples, gives an N-sample DFT.

n = 0, 1, 2, …, N−1
t = 0, Δt, 2Δt, …; 1/Δt = N/T is the sampling frequency and T/N = Δt is the sampling interval.

The DFT gives both magnitude and phase information.
Computation required

For each k = 0, 1, 2, …, N−1: N complex multiplications and N−1 complex additions.

O(N²) computations for the direct DFT; O(N·log2 N) for the FFT.

N          10³       10⁶        10⁹
N²         10⁶       10¹²       10¹⁸      (10¹⁸ ns = 31.2 years)
N·log2 N   10⁴       20×10⁶     30×10⁹    (30×10⁹ ns = 30 sec)
FFT makes use of the symmetry and periodicity (in k and n) of the twiddle factor W_N = e^(−j2π/N).

Decimation-in-time FFT algorithm

Build the big DFT from smaller ones, with N = 2^m. Separate x[n] into even- and odd-indexed subsequences:

n = 2r and n = 2r + 1, with r = 0, 1, 2, …, N/2 − 1

X[k] = Xe[k] + W_N^k·Xo[k], where Xe[k] is the N/2-point DFT of the even samples and Xo[k] is the N/2-point DFT of the odd samples: the N-point DFT is the sum of two N/2-point DFTs.


Example: N = 8 (butterfly diagram). Inputs X[0], X[2], X[4], X[6] feed one N/2-point DFT giving Xe[0..3]; inputs X[1], X[3], X[5], X[7] feed another N/2-point DFT giving Xo[0..3]. The two halves are combined with the twiddle factors W8^0 … W8^7 to produce the outputs X[0] … X[7].

Cost: 2·(N/2)² + N = N²/2 + N multiplies.


Keep splitting: each N/2-point DFT becomes two N/4-point DFTs, and so on through N/2, N/4, N/8, …, N/2^(p−1), N/2^p = 1, where p = log2 N stages.

Splitting to N/2:    2(N/2)² + N = N²/2 + N
Splitting to N/4:    2(2(N/4)² + N/2) + N = N²/4 + 2N
…
Stage p (N/2^p = 1): N²/2^p + pN = N²/N + N·log2 N

O(N·log2 N) operations for large N. A recursive sketch follows.
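A minimal recursive sketch of the radix-2 decimation-in-time idea, assuming a row-vector input whose length is a power of two (save as myfft.m); MATLAB's built-in fft is the production routine.

function X = myfft(x)
% Recursive radix-2 decimation-in-time FFT for a row vector x, numel(x) = 2^m
N = numel(x);
if N == 1
    X = x;                               % the DFT of a single sample is the sample itself
else
    Xe = myfft(x(1:2:end));              % N/2-point DFT of the even-indexed samples
    Xo = myfft(x(2:2:end));              % N/2-point DFT of the odd-indexed samples
    W  = exp(-2i*pi*(0:N/2-1)/N);        % twiddle factors W_N^k
    X  = [Xe + W.*Xo, Xe - W.*Xo];       % butterfly: combine into the N-point DFT
end
end

% Check against the built-in FFT:  x = randn(1,8);  max(abs(myfft(x) - fft(x)))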


Extending DFT to 2D
N
• Assume that f(x,y) is M x N.
M

• Forward DFT

• Inverse DFT:
Extending DFT to 2D (cont’d)
• Special case: f(x,y) is N x N. N

N
• Forward DFT

u,v = 0,1,2, …, N-1

• Inverse DFT
x,y = 0,1,2, …, N-1
Extending DFT to 2D (cont’d)
2D cos/sin functions
Visualizing DFT
• Typically, we visualize |F(u,v)|
• The dynamic range of |F(u,v)| is typically very large
• Apply stretching: D(u,v) = c·log(1 + |F(u,v)|), where c is a constant

(Figure: original image; |F(u,v)| before stretching; |D(u,v)| after stretching.)


DFT Properties: (1) Separability

• The 2D DFT can be computed using 1D


transforms only:

Forward DFT:
kernel is
separable:
DFT Properties: (1) Separability (cont’d)
• Rewrite F(u,v) as follows:

• Let’s set:

• Then:
DFT Properties: (1) Separability (cont’d)
• How can we compute F(x,v)?

N x DFT of rows of f(x,y)

• How can we compute F(u,v)?


DFT of cols of F(x,v)
DFT Properties: (1) Separability (cont’d)
DFT Properties: (4) Translation (cont’d)

no after
translation translation
DFT Properties: (5) Rotation
• Rotating f(x,y) by θ rotates F(u,v) by θ
DFT Properties: (8) Average value

Average:

F(u,v) at u=0, v=0:

So:
Magnitude and Phase of DFT
• What is more important: magnitude or phase?
• Hint: use the inverse DFT to reconstruct the input image using magnitude-only or phase-only information.
Magnitude and Phase of DFT (cont’d)
Reconstructed image using
magnitude only
(i.e., magnitude determines the
strength of each component!)

Reconstructed image using


phase only
(i.e., phase determines
the phase of each component!)
Magnitude and Phase of DFT (cont’d)
The FFT Algorithm

The running time is O(n log n). [inverse FFT is similar]


Grinding wheel images

(a) Fresh wheel (b) After 30 minutes (c) After 60 minutes

(d) After 90 minutes (e) Worn-out wheel- 120 minutes

Arunachalam et al [2008]
P = VI·cos φ
P = I²R

Time domain ↔ Frequency domain

Magnitude² = Power

clc
clear all
a = 1; w = 2*pi*50;        % assumed amplitude and angular frequency
t = 0:1/1000:1;            % assumed time base
S = a*sin(w*t);
F = abs(fft(S));
plot(F)
Power spectral density (PSD): how is the power of the signal distributed over the frequency content representing the signal?
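A minimal MATLAB sketch of a PSD estimate with Welch's method (pwelch, Signal Processing Toolbox); the vibration-like test signal and the 1024-sample Hann segments with 50% overlap are assumptions for illustration.

fs = 10000;  t = 0:1/fs:2-1/fs;
x  = sin(2*pi*157*t) + 0.5*randn(size(t));       % assumed vibration-like signal

[pxx, f] = pwelch(x, hann(1024), 512, 1024, fs); % averaged periodograms of overlapping segments
plot(f, 10*log10(pxx))
xlabel('Frequency (Hz)'), ylabel('PSD (dB/Hz)')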
Power spectral density plots

(a) Fresh wheel (b) After 30 minutes

(c) After 60 minutes (d) After 90 minutes

(e) Worn-out wheel

Arunachalam et al [2008]
(Figure: fft2 of an image f(x,y) produces a spectrum with the zero frequency (0,0) at the corner; fftshift moves it to the centre, with the axes spanning -u/2 to u/2 and -v/2 to v/2.)
Shaped specimens image based surface roughness
assessment
1.6 µm 12.5 µm 50 µm
G( u,v ) = F(u,v)*H(u,v)
Short Time Fourier Transform (STFT)

(Figure: a time signal, amplitude A in m/s² versus time T in sec, is split into segments and an FFT is taken of each segment. Frequency-domain features per segment: variance, energy, entropy, skewness, kurtosis, dominant frequency, and amplitude-frequency pairs (A1, f1), (A2, f2), (A3, f3), (A4, f4). Mean information is not retained.)

The time of occurrence of a particular event is very important for fault diagnosis.


Fourier Transform

• Fourier Transform reveals which frequency


components are present in a given function:

(inverse DFT)

where: (forward DFT)


Tutorial Examples
Examples (cont’d)

F1(u)

F2(u)

F3(u)
Fourier Analysis – Examples (cont’d)

F4(u)
Limitations of Fourier Transform
Fourier Analysis – Examples (cont'd)

F4(u): provides excellent localization in the frequency domain but poor localization in the time domain.
Limitations of Fourier Transform
(cont’d)

1. Cannot provide simultaneous time and frequency localization.

2. Not very useful for analyzing time-variant,


non-stationary signals.
Stationary vs non-stationary signals
(cont’d)
Stationary signal
(non-varying frequency):

Three frequency
components,
present at all
times!
F4(u)
Stationary vs non-stationary signals (cont’d)

Non-stationary signal
(varying frequency):

Three frequency
components,
NOT present at all
times!
F5(u)
Stationary vs non-stationary signals (cont’d)

Non-stationary signal
(varying frequency):

Perfect knowledge of what


frequencies exist, but no
information about where
these frequencies are
located in time!

F5(u)
Limitations of Fourier Transform
(cont’d)

1. Cannot provide simultaneous time and frequency localization.

2. Not very useful for analyzing time-variant,


non-stationary signals.

3. Not efficient for representing discontinuities


or sharp corners.
Representing discontinuities or sharp corners
Representing discontinuities or sharp corners
(cont’d)

FT
Representing discontinuities or sharp corners
(cont’d)
Original

Reconstructed

1
Representing discontinuities or sharp corners
(cont’d)
Original

Reconstructed

2
Representing discontinuities or sharp corners
(cont’d)
Original

Reconstructed

7
Representing discontinuities or sharp corners
(cont’d)
Original

Reconstructed

23
Representing discontinuities or sharp corners
(cont’d)
Original

Reconstructed

39
Representing discontinuities or sharp corners
(cont’d)
Original

Reconstructed

63
Representing discontinuities or sharp corners
(cont’d)
Original

Reconstructed

95
Representing discontinuities or sharp corners
(cont’d)
Original

Reconstructed

127

A large number of Fourier components


is needed to represent discontinuities.
Short Time Fourier Transform (STFT)

• Segment signal into narrow time intervals (i.e., narrow


enough to be considered stationary) and take the FT of each
segment.
• Each FT provides the spectral information of a separate
time-slice of the signal, providing simultaneous time and
frequency information.
STFT - Steps
(1) Choose a window of finite length
(2) Place the window on top of the signal at t=0
(3) Truncate the signal using this window
(4) Compute the FT of the truncated signal, save results.
(5) Incrementally slide the window to the right
(6) Go to step 3, until the window reaches the end of the signal (a sketch follows)
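A minimal MATLAB sketch of these steps using spectrogram (Signal Processing Toolbox); the piecewise-frequency test signal, 128-sample Hamming window and overlap are assumptions for illustration.

fs = 1000;  t = 0:1/fs:1-1/fs;
x  = [sin(2*pi*75*t(t<0.3)), sin(2*pi*50*t(t>=0.3 & t<0.6)), sin(2*pi*25*t(t>=0.6))];

window   = hamming(128);                 % finite-length window (step 1)
noverlap = 120;                          % slide the window by 8 samples each time (steps 5-6)
nfft     = 256;                          % FT length for each truncated segment (step 4)
spectrogram(x, window, noverlap, nfft, fs, 'yaxis')   % time-frequency map of the signal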
STFT - Definition

The STFT of f(t) is a 2D function of a time parameter t' and a frequency parameter u: the signal to be analyzed is multiplied by a windowing function centered at t = t', and the FT is computed for each window position.
Example

f(t)

[0 – 300] ms 🡪 75 Hz sinusoid
[300 – 600] ms 🡪 50 Hz sinusoid
[600 – 800] ms 🡪 25 Hz sinusoid
[800 – 1000] ms 🡪10 Hz sinusoid
Example

f(t)

W(t)
scaled: t/20
Choosing Window W(t)
• What shape should it have?
– Rectangular, Gaussian, Elliptic …

• How wide should it be?


– Should be narrow enough to ensure that the
portion of the signal falling within the window is
stationary.
– Very narrow windows, however, do not offer good
localization in the frequency domain.
STFT Window Size

W(t) infinitely long: 🡪 STFT turns into FT, providing excellent


frequency localization, but no time localization.

W(t) infinitely short: 🡪 results in the time signal (with a


phase factor), providing excellent time localization but no
frequency localization.
STFT Window Size (cont’d)
• Wide window 🡪 good frequency resolution,
poor time resolution.

• Narrow window 🡪 good time resolution, poor


frequency resolution.
Example
different size windows

(Four frequencies, non-stationary)


Example (cont’d)

scaled:
t/20
Example (cont’d)

scaled:
t/20
Heisenberg (or Uncertainty) Principle

Time resolution: How well two spikes in Frequency resolution: How well two
time can be separated from each other in spectral components can be separated
the frequency domain. from each other in the time domain
Heisenberg (or Uncertainty) Principle

• We cannot know the exact time-frequency


representation of a signal.
• We can only know what interval of frequencies are
present in which time intervals.
