Measurement: Abhishek Dhananjay Patange, Jegadeeshwaran R

Measurement 173 (2021) 108649
Contents lists available at ScienceDirect
Measurement
journal homepage: www.elsevier.com/locate/measurement
A machine learning approach for vibration-based multipoint tool insert

health prediction on vertical machining centre (VMC)
Abhishek Dhananjay Patange a, b, *, Jegadeeshwaran R. a
a
School of Mechanical Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
b
Department of Mechanical Engineering, College of Engineering, Pune, Maharashtra, India
A R T I C L E I N F O A B S T R A C T
Keywords: A cutting tool condition is always considered as a key contributor to the machining operation. In-process failure
Vibration-based health prediction of the tool directly affects the surface roughness of a workpiece, the power consumption of prime mover, and
Multipoint tool insert endurance of the process, etc. Thus supervisory system envisioning its health prediction is drawing industry
Machine learning
attention. This needs to be adopted by a framework which institutes knowledge built data with the intent to
Statistical features
Decision tree algorithm
predict defects earlier and prevent it from failure. In this context, the application of ‘Machine Learning’ (ML)
Tree family classifiers methodology would assist the classification of tool condition and its prediction. In an attempt to monitor the
multipoint tool insert health, an ML-based approach is presented in this paper. The time-domain vibration
response for defect-free and various faulty configurations of four insert tool were collected during face milling
performed on a vertical machining center (VMC). Further statistical features were extracted by designing an
event-driven algorithm in Visual Basic Environment (VBE). Features exhibited in the Decision Tree (DT)
generated by the J48 algorithm serve as ‘most significant’ amongst all extracted features; hence were selected for
further classification. Finally, various tool conditions were categorized using six different ‘supervised-tree-based’
algorithms and a comparative study is presented to find the best possible classifier.
1. Introduction techniques which employ the use of laser scatter pattern, image pro
cessing techniques, scanning electron microscopy, etc. are listed under
The subjects of tool-faults, tool-life, and machinability have attracted the ‘Offline / Direct’ approach. This is typically applicable for investi
extensive research interest most of which have been directed toward gating faults with unpredictable nature and thus acts inappropriate for a
milling operations. This is because of the complexity of milling opera knowledge-based machine learning scheme [3,4]. On the other side,
tions, resulting from the discontinuous nature of the process and the online approach involves acquisition of parameters like cutting forces,
varying chip thickness during the cut. The phenomenon of tool-wear in acoustic emissions, surface roughness, noise, feed rate, cutting speed,
cutting tools is extremely complex, involving and affected by a variety of tool temperature, vibration etc. using various sensors/transducers. Thus
disparate parameters and cutting conditions. Since the type and amount the change in the response of these parameters with respect to time is
of wear to which a tool is subjected ultimately determine its total use useful to examine gradual tool deterioration i.e. for monitoring soft
fulness, tool-life is of extreme importance in any consideration of the faults [5,6]. This makes the online approach most suitable for
economics of machining and the quality of finished products [1]. This knowledge-based machine learning schemes and so advocated herein.
necessitates monitoring of the cutting tool to track the defects and avoid Some of the prominent research investigations related to the study of
them before they get worse. vibration signals to interpret tool health are reviewed here. The signal
The influence of in-process generated defects is untraceable by processing serves as an important stage in attributes computation in
analytical and numerical models proposed by researchers and appears vibration analysis. Besides, it is a challenging facet of health prediction
inapt for real-time tool condition monitoring. The models constructed due to the complex processes involved in the system responses. The
using real-time data reflecting the dynamics behavior of a process serves signal processing mainly aims at the investigation of raw vibration sig
to be most realistic [2]. It is broadly categorized into two classes i.e. nals, which are often complex & noisy, and extract reliable attributes
‘Online / Indirect’ and ‘Offline / Direct’ approach. The vision-based from them. The strategic monitoring of vibrations is beneficial to
* Corresponding author.
E-mail address: [email protected] (Abhishek Dhananjay Patange).
https://doi.org/10.1016/j.measurement.2020.108649
Received 23 December 2019; Received in revised form 14 September 2020; Accepted 20 October 2020
Available online 27 October 2020
0263-2241/© 2020 Elsevier Ltd. All rights reserved.
Abhishek Dhananjay Patange and Jegadeeshwaran R. Measurement 173 (2021) 108649
examine the relationship between tool condition and dimensional var conventional lathe and found 85.28% and 86.34% accuracy respectively
iations, the surface roughness of work-piece over other parameters such [11]. Further Karandikar [24] used a naïve Bayes classifier for both
as noise, temperature, and spindle load to name a few [7]. Vibration continuous and discrete types of signal. The ‘mean force’ feature
evolved during milling is categorized into two types such as dependent extracted from time domain response and summation of tooth passing
and independent. Vibration developed due to inaccuracies or failure of frequency and its harmonics observed from frequency domain response
any machine parts is defined under independent type. On the other were utilized for classification and its effect on tool wear was presented.
hand, periodic variation in machining parameters leads to interrupted Madhusudana [20] used a K-star classifier which is based on the prin
cutting leading dependent type vibrations. Correspondingly it is labeled ciple of ‘entropic measure’ using the probability of converting class into
as self-induced vibrations influenced due to a large range of speeds and another by arbitrarily selecting amongst all possible conversions. The
lesser cutting forces [2]. Usually, the vibration/FFT analyzers are classification accuracy above 90% was obtained using the K-star algo
employed to notice the change in vibration signals to decide fault clas rithm. Painuli demonstrated the use Locally Weighted Learning (LWL) as
sification. Such direct decision making fully depends on costlier classifiers and obtained a classification accuracy of 78.69% [18]. Bin
instrumentation, its software environment, and skilled human inspector saeid [25] used three ML classifiers i. e. RBF-NN (Neural Networks of
having prior knowledge of faults. The study using the time-domain Radial Basis Function), SVM (Support Vector Machine), and MLP
signals reported that the time-dependent vibration response (Multilayer Perceptron), & found that SVM has the highest accuracy.
adequately able to inspects the deterioration level of a cutting tool [8]. Navneet Bohara et al. [26] classified the carbide coated insert condition
The study by Dimla [9] analyzed cutting tool vibrations to map it with for turning operation on the conventional lathe. The dynamics of
the tool wear. The catastrophic failure observed for nose wear with machining and signature analysis when a single insert is employed is
magnitude more than 0.2 mm and linear progress in notch wear was different than multi-inserts. This makes the main distinction in the
perceived at the commencement of machining and retained unchanged approach used by Bohara [26] and presented in the current research.
for the remaining tool life. The experiment carried by Prasad et al. [10], A finding of literature review of the current study is stated herewith:
efforts to find the relationship between flank wear and surface rough
ness by considering cutting tool vibrations. It was observed that the • Selection of conservative factors like spindle speed, cutting depth,
progression of tool wear indicates a rise in the displacement. M. Elan and feed rate to accommodate most adverse cutting conditions for
govan et al. [11] and V. Sugumaran et al. [12] presented an investiga safe & quality machining and deriving arbitrarily remarks about tool
tion on vibration based condition monitoring using time-domain condition is a usual practice. However beyond that there exist un
statistical and histogram features. It was showed that time-based sta known moments leading to the development of faults and need to be
tistical features exhibit more accurate than histogram features. In the explored.
study carried by Drouillet et al. [13], the cutting tool degradation trend • The vibration signal acquisition requires an accelerometer and a data
of a milling process was stated using time-domain analysis and found acquisition system accompanied by lesser supplementary instru
that signature of root mean square power (Prms) is a significant feature mentation. The vibration signals are considered as more informative,
owing to its sensitive nature perceived from change in the fault. The self-explanatory, responsive, attentive, and reliable hence recom
investigation by Mark A. Rubeo et al. [14] analyzed the peak-to-peak mended for onboard health monitoring. Hence, the current study
plot of vibration-time domain signal in milling operation. It was evi exclusively made confined to the investigation of machining vibra
denced that the time domain features exhibits accurate classification tions in time-domain and corresponding cutting tool condition.
and prediction of cutting forces including the flute-to-flute run out ef • Statistical feature extraction and feature selection employing deci
fect. The same plots were also used for the examination of spindle de sion trees algorithm consume negligible time. Also, as far as feature
flections and motion study. In very recent times, Patange et al. [15], classification is considered, the use of ‘supervised-tree-based’ algo
Aralikatti et al. [16], Alemelu et al. [17] demonstrated promising in rithms for training real-time tool insert faults has not appeared yet.
vestigations on condition monitoring using time-domain signals alone. Thus, the application of the Random Forest, Functional tree, Logistic
After acquiring raw signals, the approaches such as descriptive sta Model Regression tree, Best First tree, and Functional tree classifiers
tistics, auto-regressive moving average (ARMA), wavelet transform, is proposed.
decomposition by variational mode, histogram, decomposition by • The studies have been reported for conventional lathe [34,36,42,45]
empirical mode, etc. stand popular to quantify the signals. In experi and conventional milling machines [5,6,7,8,10,38]. Thus the inves
mentation carried by Painuli [18], the classification accuracy of 76.9% tigation for health monitoring envisioning multi-point tool insert
was observed when only 5 features (kurtosis, mode, std. error, variance, fault classification & its prediction on Industrial computer numerical
and standard deviation) were used and accuracy of 72% was obtained controlled vertical machining center (CNC-VMC) is needed.
when all 13 features were involved. The novel method demonstrating
the statistical analysis using the I-Kaz technique was presented by A finding of the literature review has motivated to advocate ML
Ahmad et al [19]. This approach established the association of I-Kaz framework for vibration-based multipoint tool insert health classifica
coefficients with Ƶ∞ values. A reduction in Ƶ∞ values exhibited a rise in tion and prediction on a CNC-VMC.
flank wear magnitude. Madhusudana et al. [20] extracted twelve
descriptive statistical attributes from vibration response of various fault 2. Methodology & contributions
free and faulty conditions of inserts and used only six attributes i.e.
mode, kurtosis, range, standard error, median & mean to obtain the Fig. 1 presents the graphic methodology of the current research.
highest accuracy. The major three methodologies usually employed for A ML-based approach is proposed in this investigation, in an attempt
feature selection includes the use of the J48 DT, the contribution of to demonstrate the supervisory scheme envisioning multipoint tool
features and attribute selection using Principal component analysis insert health prediction. The time-domain vibration response for defect-
(PCA). Sugumaran et al [21] utilized a decision tree based on ‘infor free and various faulty configurations of four insert tool were collected
mation gain’ and the combination of features for choosing attributes during face milling performed on a vertical machining center (VMC).
accordingly. The PCA is a multidimensional features reduction tool Tool faults such as notch wear, wear at nose & flank are considered in
based on the nonparametric method. It retains substantial dissimilarity the current study. Further statistical features were extracted by
in the primary database while dimensionality reduction [22]. Elangovan designing an event-driven algorithm in Visual Basic Environment (VBE).
[23] also employed the use of PCA for upgrading classifier efficiency by Features exhibited in the decision tree generated by the J48 algorithm
dimensionality reduction of data. Two Bayes classifiers (i.e. BayesNet & serve as ‘most significant’ amongst all extracted features; hence were
NaiveBayes) were used in the study of a single-point cutting tool on a selected for further classification. Finally, various tool conditions were
2
Fig. 1. Methodology for multipoint tool inserts health prediction.
categorized using six different ‘supervised-tree-based’ algorithms and a The remaining paper is arranged as: Section 2 describes the experi
comparative study is presented to find the best possible classifier. mentation arrangement with the elaboration of input conditions,
The main contributions are enlisted herein: experimental setup, and vibration signal acquisition. Section 3, 4, and 5
presents step by step Machine Learning approach applied for class pre
• Demonstrates a framework for real-time monitoring of multi-point diction. Results and discussion is stated in Section 6. Section 7 represents
tool inserts faults on industrial CNC-VMC incorporating vibration the limitations of the current study & future scope and section 8 con
signal acquisition and ML approach. cludes the paper.
• Characterization of vibration signatures evolved from unknown
moments leading to in-process tool defects 3. Experimentation and signal acquisition
• Design and training of six different ‘supervised-tree-based’ algo
rithms for classification of fault The setup for experimentation is presented in Fig. 2. It mainly con
• Class prediction considering test data sets based on trained model of sisted of a Vertical Machining Centre (CNC-VMC), an accelerometer, a
tree family classifiers and validation using ‘k-fold’ test mode data acquisition system. The specifications are listed as follows:
3
Fig. 2. Experimental setup.
• Machine tool: CNC-VMC VTEC-CCS00605 (X = 4200 mm, Y = 2600 The orientation and location of the accelerometer play important
mm, Z = 920 mm) role in collecting the desired response. Vibration response is generally
• Work-piece material: Mild Steel collected from the location near to the dynamic component of machines.
• Work-piece shape: Rectangular hollow cube of form ‘C’ As far as the machining on CNC-VMC is considered, the vibration
• Workpiece size: Length = 650, Breadth: 250, Height = 100 (all di evolved during the machining interrupts the cutting tool and therefore
mensions are in mm) to the spindle. Thus the accelerometer was fixed on the spindle frame.
• Milling tool: Milling cutter of dia 63 mm Besides, the accelerometer was held in a vertical direction because the
• Number of inserts: 4 (carbide coated) drive shaft revolves vertically. The time-domain vibration response for
• Operation performed: Face milling defect-free and various faulty configurations of four insert tool were
• Machining parameters: collected.
• Spindle speed = 900 revolutions per minute The sampler frequency was chosen as 20 kilo-Hertz with the help of
• Feed = 2000 mm/min the Nyquist theorem. The sampling length of 2048 was arbitrarily cho
• Cut Depth = 0.25 mm sen which corresponds to 211 such that approximately 2048 data points
• Data acquisition (DAQ): Vibration analyzer (Make DEWE 43 A) of acceleration amplitude (readings). In the beginning, few passes with
• Sensor: Piezoelectric accelerometer (Make PCB-J356A43, accelera rough milling were carried out to remove unwanted deposits and
tion: 500 g, sensitivity: 10 mV/g) eliminate irregularity of work sample until the stability in machining is
achieved. Fig. 4 (a) – Fig. 4 (f) represents the time-dependent vibration
The fundamental machining inputs (speed of the drive, table feed, signatures corresponding to various configurations of a multi-insert tool.
and cutting depth) drive the condition of the tool and overall perfor The time-domain plots reflect the major change in amplitude of accel
mance of machining. A single level of these machining parameters was eration for operation 1 to 6. This is due to the change in insert condition.
selected. The parameters were estimated as per the industry standard. The amplitude is significantly increased for 5th and 6th operation since
Six machining operations were performed considering machining inputs the combination of faults were considered. However, this visual in
as stated above and different tool configurations as specified in Table 1. spection of change in amplitude of analog plots is not useful for accurate
Three most commonly occurring fault types namely notch wear, wear at distinction of various faulty conditions. Thus statistical feature extrac
nose & flank were considered in the study. Fig. 3 (a) – Fig. 3 (d) shows tion and classification using ML algorithms is advocated.
the tool insert with different states such as defect-free (good), notch
wear, wear at nose & flank respectively. The vibration signal was 4. Feature extraction
captured for each of the six machining operations.
To quantify variation between the faulty and fault-free classes of the
tool, the extraction of descriptive statistics representing the signal was
Table 1
Different tool configurations considered for six operations. undertaken. Total 13 descriptive statistical attributes such as (1) Kur
tosis, (2) Standard Error, (3) Maximum value, (4) Skewness, (5) Mini
Insert No. I II III IV Label for
mum value, (6) Range, (7) Count, (8) Summation, (9) Variance, (10)
Operation 1 New New New New A insert class
No. 2 New New New Flank B Standard Deviation, (11) Mode, (12) Median (13) Mean were estimated
wear from the coding. The event-driven algorithm was programmed in Visual
3 New New New Nose C Basic Environment (VBE) of Microsoft Excel and the flowchart of
wear
pseudo-code is represented in Fig. 5.
4 New New New Notch D
wear
5 New New Flank Nose E 5. Feature selection
wear wear
6 New New Flank Notch F The feature selection serves as a decisive step in the ML scheme
wear wear
because it decides the efficiency of the classifiers by suggesting the most
4
a. Defect free (new) b. Wear at flank
c. Wear at nose d. Notch wear

Fig. 3. Defect-free and various defective configurations of the insert.
distinctive attributes over a complete set of descriptive statistics. It ‘information gain’. At last, select the utmost information gained attri
majorly includes the dimensionality reduction of data sets; increases the bute as primary root, distribute the dataset by branches, and follow the
rate of classification; augments classification efficiency and outputs the same on each branch [29,30].
most coherent results [27,28]. Here the ‘J48 DT algorithm’ is considered
for attributes selection and validated using the ‘effect of the number of
5.2. Study to validate feature selection
attributes’ technique.
The features selection can be validated by observing & comparing
5.1. Decision tree algorithm plots of attributes for each tool condition and examining the influence of
the number of attributes on efficiency. Here, the significant feature is
The decision (J48) algorithm categorizes instances into discrete form easily identified by visualizing variation in a class as well as between
and provides structural arrangement containing homogeneous instances classes. The variation between classes should be ‘maximum’ to serve it
i.e. instances with similar values. The J48 algorithm comprises elements as a significant feature. The list of attributes exhibited by the J48 al
such as branches, leaves, nodes, and primary root. The categorization of gorithm can be considered. The inexistent features do not contribute to
condition takes place in top-down ordinance i.e. from primary root to efficiency that’s why kept out from the decision tree. The steps involved
leaf through various branches. The primary root describes the most in evaluating the effect of features on classification accuracy are stated
significant attribute, every inner node describes a trial on a feature, each here. First of all, the feature exhibited at the primary node of the deci
branch describes an outcome and all terminating leaf to describe a class sion tree alone must be used for classification. After, the node with
(condition). The attributes exhibited by a tree are considered to be subsequent significant features accompanied by the previous one can be
significant attributes amongst all others. The thumb rule states that used for classification. Further, the same procedure is to be carried out
‘broader the tree, the more compound the decision rubrics and hence using subsequent significant features exhibited by the decision tree.
more accurate selection & classification’. The key factors which affect Lastly, all other inexistent features can also be used to examine their
the selection of significant attributes are ‘information gain’ and ‘entropy effect on classification accuracy.
reduction’. The J48 algorithm utilizes entropy to measure disorder
which will be further useful to define similarity in a sample. It is easy to 6. Feature classification
interpret results that the entropy is zero if the sample is completely
homogenous and the entropy is unity if the sample is equally distributed. A recent development in various families of classifiers (Tree, Per
The expected decrease of entropy by dividing datasets on basis of at ceptron, Lazy, Bayes, SVM, Regressions, ANN, etc) has shown
tributes is nothing but an ‘information gain’. The creation of a decision imrpovement in classification tasks in terms of accuracy, precision etc.
tree determines the most significant attribute i.e. it earns the utmost The ‘supervised-tree-based’ algorithms exhibit superior classification
information gain. First of all, entropy is determined. The dataset is accuracy and as well as consume negligible time to build the model
extracted into samples. The total entropy is estimated by considering the concerning the comparison of all other ML algorithms, hence selected
proportional summation of entropy of each branch. This subsequent for the current study. Therefore, the principle of tress such as J48,
entropy is deducted from the entropy with no split. This gives the Random Forest (RFT), Best First (BFT), Logistic Model (LMT) Functional
5
(a) Opearation 1 with tool configuration ‘A’ (b) Opearation 2 with tool configuration ‘b’
(c) Opearation 3 with tool configuration ‘C’ (d) Opearation 4 with tool configuration ‘D’
(e) Opearation 5 with tool configuration ‘E’ (f) Opearation 6 with tool configuration ‘F’
Fig. 4. Time domain signals.
(FT), and Simple Cart (SCT) classifiers are discussed. Dichotomiser 3 (ID3) algorithm which turns very remarkable these days
for creating a decision tree for practical datasets. The ID3 algorithm
6.1. Decision tree (J48) classifier starts with the initial set A considering at primary root node and per
forms iterations with unused sub-attribute at the new node and mea
A decision tree serves as an efficient tool in the domain of judgmental sures the disorder i.e. using entropy h(A) in terms of information gain IG
investigation. It is a special algorithm that performs a dual role i.e. while (A). Later, it chooses the minimum entropy attributes (in other words
constructing the classification model it assists in the selection of attri maximum information gain). The segregation of set A is subsequently
butes. It exhibits a hierarchical pattern of decisions similar to tree completed by previously preferred attributes to generate subsets. The
structure as per the corresponding influences based on ‘condition gov thumb rule states that the process takes place repetitively for the attri
erned rubrics’. The general structure of the decision tree includes 3 key butes which not ever designated earlier [32–34].
elements i.e. primary root node at the top denotes ‘test’ on an attribute ID3 is the predecessor to the C4.5 algorithm often utilized for attri
test, braches of the tree denotes ‘classification’ for expected outcomes, butes’ classification. C4.5 uses the information entropy perception as the
decisive leaf node at the bottom denotes ‘actual outcome’ for various same as ID3. The training data consists of formerly categorized samples’
classes. This procedure is to be carried out for each sub-tree if any e.g. set A = a1, a2, …. ai. Every sample Ai is described by a p-dimensional
formed at the new node. The rule sets generated during this process are vector (z1, i, z2,i,….., zp, i). Here attribute value in samples is denoted by
easily tracked by a hierarchical chain from root to leaf [31]. zj, along with the category of ai. If entire samples in the set appear in the
Several algorithms are designed, developed, and implemented suc same category then the algorithm simply roots a leaf node suggesting a
cessfully for optimized classification e.g. versions of Iterative Dichoto selection of that category. The case where not a single of the attribute
miser (ID3, ID4, ID5), Classification and Regression Tree (CART), exhibits information gain, then algorithm roots a higher upped decisive
Concept Learning Systems (CLS), etc. Ross Quinlan designed Iterative node with the projected value of the category. Once more, it roots a
6
but the LMT constructs a standard structure of the tree using logistic
regression (LR) at each leaf; similar to the model tree. Fig. 6 represents a
generation of the tree.
The leaf node gets separated as two child nodes according to the
threshold. The division to the right-hand side exhibits the value of
feature larger as compared to the threshold and the division to the left-
hand side exhibits the value of feature smaller than the threshold. A
Logit boost algorithm is usually employed to assign the threshold value.
It creates an LR model at each node followed by a partition with aid of
the C4.5 decision tree criterion. Every Logit Boost call is initiated from
its output in the parental node. The cross-validation is used to estimate
the possibility of Logic Boost iterations which does not lead to over-
fitting of training data sets. This makes LMTs more competent as good
as other commanding algorithms such as boosted decision trees [36,37].
It principally involves a set of non-terminal or internal nodes ‘Ni’ & a set
of terminal nodes ‘Tn’. Now we will assume ‘Si’ represents the space for
complete occurrences, covering all attributes in the training data. Then
subdivision splitting of Si into regions Ssb can be witnessed by observing
the tree structure and a leaf of the tree indicates each region as:
∐
Si = = t’
Ssb , Ssb ∩ St’ = ∅fort ∕ (1)
t∈Tn
Similar to the normal decision tree, the terminal nodes t ∈ Tn exhibits

ft as a function of associated LR in place of class tag alone. The regression
function ft considers a subclass Vt ⊆V of total features in a dataset (For
regression with an assumption of binarization of nominal features), and
models the class membership probabilities as
eFj (x)
Pr(G = j|X = x) = ∑J (2)
Fk (x)
k=1 e
where
∑
Fj (x) = αj0 + αj0 ∙v (3)
v∈Vt
or, equivalently,
∑m j
Fj (x) = αj0 + α ∙vk
k=1 vk
(4)
j
if αvk = 0 for vk ∕∈ Vt
The classification model of the LMT is represented by,
∑
f (x) = ft (x)∙I(x ∈ Ssb ) (5)
t∈Tn
here I(x ∈ Ssb ) is ‘one’ if x ∈ Ssb & ‘zero’ otherwise.

Fig. 5. Flowchart of pseudo-code for statistical features extraction. It is very important to notice that both standalone simple DT and LR
are featured cases of the LMT. Initially, LMT pruned to the origin and
higher upped decisive node with the projected value for another case then a tree for which Vt = ∅for entire t ∈ Tn . Idyllically, the LMT is
where the formerly hidden category comes across. C4.5 is advanced as expected to adapt to a varying dataset. A simple LR provides a superior
compare to ID3 concern to: bias-variance trade-off for smaller dataset, whereas the LMT just com
prises of a single LR model so that it pruned to the origin or else, a more
• In C4.5, both continuous and discrete features can be treated. A illustrative structure is satisfactory for other datasets [38].
threshold is extracted amongst the data and then features are cate
gorized according to threshold with the condition of ‘more or less
than or equal to’. 6.3. Random Forest classifier
• The disappeared features are denoted by ‘?’ and are not considered in
the computation of information gain. A (binary) decision tree is a series of (binary) questions which can be
• It eliminates irrelevant branches & substitutes it by leaves; which is programmatically schematized as nested if-then-else statements and
nothing but ‘Pruning post-generation’ [35]. represented as an acyclic graph directed outwards from a root node and
having the following properties:
6.2. Logistic model (LM) tree classifier • Each node can have zero (a leaf) or two descendants (a test node)
• Each node has one parent, except for a single node called the root
It is a dual-scheme classifier incorporating modeling logistic which has no parent.
regression while training of a decision rule sets. The classic decision tree
having constants located at leaves generates a piecewise-constant model Tree growing uses a top-down divide and conquer strategy to build a
7
Fig. 6. Generation of LMT.
tree of small complexity fitting very well the learning sample. The goal input feature and a question about its values that will divide the
of tree growth is to divide the attribute space into as small as a possible current learning set in two parts.
number of regions that contain essentially only samples of a single class. • Evaluation of the splits: A score measure has to be used to decide
In a nutshell, the method starts with a trivial tree composed only of its which is the best question to ask, what is the best feature to use at the
root and the complete learning sample: it then tries to split the learning current node, and with which threshold? Since the idea is to generate
sample by finding a test (or a question) based on one of the input fea the purest learning subsets in terms of the output labels, the basic
tures, in such a way that objects of different classes correspond as much principle is to measure the purity improvement made by any
as possible to different outcomes of the test. Once this test has been candidate split, yielding score measures based on the class purity
found, the method splits the learning sample into two subsamples cor improvement between the sub-sample reaching the current node and
responding to the two possible outcomes of the test and proceeds by those of its resulting two sub-sub-samples.
recursively building the corresponding subtrees based on these sub • Deciding under which conditions should a node become a leaf: the
samples. Notice that before deciding to expand any node the algorithm stopping rules will ensure that the tree has a finite number of nodes?
verifies whether or not the current node should or not become a leaf of Different rules exist; some of them are data-driven while the others
the final tree. Fig. 7 represents the generation of a binary tree [39]. are user-defined [40].
Thus, the three key ingredients for growing a decision tree are the
following: Random forest classifier is based on the principle that it develops a
cluster of judgmental structures at a training period followed by inte
• Definition of a set of candidate splits: Based on observations, we need gration with each other to achieve enhanced, consistent, and trust
to find a ‘question’ that will partition the learning set in two sub worthy classification and prediction. The set of trees categorizing with
groups. In the standard method, a split is defined by choosing an oblique hyper planes achieves better accuracy as it propagates with no
Fig. 7. Generation of a binary tree.
8
over-fitting. The assumption is made that the set of trees are randomly the purest nodes splitting for every stage. The objective of splitting is to
constrained to be sensitive to only nominated distinct attributes. It estimating the split leading to the maximum reduction in impurity.
rectifies the ‘over-fitting’ issue to corresponding training data sets. It Thus, the split that offers maximum Gini or information gain must be
investigates the utmost significant attribute between random forests found out. The estimation of maximum information or Gini gain
instead of searching while partition of the node; hence it provides a considering a split at a predecessor nodes drives the minimum value of
better model due to enlarged motley collection. The output of the the weighted summation of the Gini or information gain of its successor
random forest classifier is exhibited in terms of an average or weighted ones. Similar to normal DT, the BFDT is modeled with the ‘divide &
average of total negotiated leaf nodes. The error rate is indicated by the conquer’ style. Every non-terminal node tests an attribute and assigns a
amount of inter tree correlation, the greater error rate is a consequence classification [44]. Three significant considerations shall be employed
of inter tree correlation, and therefore as uncorrelated as possible is during modeling of the BF tree as stated here:
desirable. Fig. 8 shows the generation of random forest tree [41].
The following are two strategies help de-correlate the individual • To identify the most significant features for splitting at every node.
trees in the forest, which is what improves generalization: • To identify which node in the node, the list is to be expanded next.
• To make the decision for stopping tree growth.
• For each tree, train on a sub-sample of a training datasets (with
replacement). The size of the subsample should be the square root of The BF algorithm generates the fully-enlarged tree using the rule of
the total number of training samples. ‘split-and-sort’ [54]. The Gini Index is used to develop the tree. The
• When training, only use a subset of variables (features) for each equations to determine Gini gain is given as.
node. ∑
gini(p1 , p2 , ⋯., xn ) = px py (6)
This method introduces two meta-parameters:
y∕
=x
∑ ( ) ∑
gini(p1 , p2 , ⋯., pn ) = py 1 − py = 1 − p2y (7)
• Here ‘T’ signifies number of desired trees in the ensemble: Choice of y∕
=x y
T is essentially driven by time/computation limitation. Indeed, the
ory and empirical results demonstrate as larger T the superior. Of here ‘px’ denotes the probability of × class, ‘py’ is the probability of y
course, for given data, after a certain tree in the ensemble, the result class [17].
converges. The first node is sorted and subsequently expanded then. For a case,
• K the number of tested variables at each node: the choice here is if the impurity reduction of this node is zero, then the reduction of entire
dependent on the nature of the problem. If we know that many nodes in the line would be zero, and splitting stops. Relating to the
variables are relevant, a small value of K would be a good choice, on criteria of a stopping, standard DT ceases the expansion of a tree for a
the other hand, when only a few descriptors are informative, a large case where all nodes are pure. However, a fixed number of expansions
value of K would be well suited [42]. can be specified in BF tree training. A BF tree ceases expansion as a fixed
number of expansions are attained. This particular stoppage scheme
enables the BF tree for investigation of new pre-pruning and post-
6.4. Best First (BF) tree classifier
pruning [43,44].
Best First search algorithm expands the most significant node using
heuristic estimation function ‘f(n)’ based on factors like an illustration of 6.5. Functional tree classifier
‘n’, the expected outcome, the information gained during the search,
and any other additional information about datasets. Many times spe A tree induced using features’ fusion is considered as a multivariate
cifically it is performed prediction of paths which are judged to be closer decision tree. In a regression setting, it explores multiple representation
to a solution hence are corresponding expansion is performed first. The languages using linear models at leaf nodes. The functional tree studies
‘significant’ node is described by the property of categorization in such a the influence of features’ fusion at various parts of the decision tree i.e.
fashion that it provides the greatest drop in impurity amongst all nodes all types of node points, leaves, etc. A comprehensive integrated model
[43]. Any tree-based classifier always aims for correct and miniature for a multivariate tree is introduced by ‘Joao Gama’ [45] to evaluate the
model output. Heuristically, to construct the miniature tree, testing of effect of using functional nodes at various positions. It uses the fusion of
complete attributes is needed to select the one attribute which exhibits a univariate tree and a linear function via structural training. Decision
trees induced using the proposed model utilizes decisive nodes and leaf
nodes through multivariate tests and linear functions respectively. The
construction of multivariate decision nodes occurs at the development of
the tree, whereas functional leaves get construct at the pruning of the
tree. It exhibits the same accuracy with the benefit of the use of multi-
variant databases. The multivariate decisive nodes affect the bias of
the error and multivariate decisive leaves affect the variance in the
classification and regression modeling [46].
6.6. Simple CART classifier
Classification & Regression Tree is a decision tree-based supervised

learning approach for building models in ML. CART is a hypernym that
describes decision tree algorithms used for classification and regression
(supervised) learning tasks. CART is Classification and Regression Trees
where the dataset is split into several trees depending on the criteria of
splitting. These criteria are Gini, Entropy, and Variance. The splitting is
done until the terminal node of the tree is reached/created. The limit to
Fig. 8. Generation of random forest tree. which the trees must be created is explicitly given by the model. It is
9
used for predicting classification variables or regression (or continuous presenting all 13 attributes of 240 samples (starting from tool
variables) [47]. As the name of the classifier implies, the use of classi configuration A to F) was created.
fication and regression knowledge is used. The classification trees • Now, the extracted attributes were applied to the J48 decision tree to
identify and categorize the attributes class-wise using the ‘true/false’ identify the attributes which reflect the distinction between the tool
rule. Regression trees, in contrast, are used for attributes in ‘continuous’ configurations. The model of a tree was constructed on the basis of
which predicts the expected output. Considering both at the same time entropy reduction and the structure is depicted in Fig. 9. The attri
builds a CART classifier that uses a recurrent splitting scheme to butes exhibited in this structure (i.e. range, maximum, minimum,
envisage continuous depending attributes (regression) and split discrete skewness, kurtosis, and standard error) serves to be ‘most distinctive’
predictor attributes (classification). The CART is nothing but a binary amongst all; thereby recommended for further classification.
tree that is rooted with the help of a greedy algorithm to select splits. The • The tree has several leaves equal to ‘9′ and the size of the tree is ‘17′ .
algorithm is dependant irrespective of information in a specific kind of • When ‘kurtosis’ is greater than 1.294439, ‘maximum’ is less than or
spread hence is considered as nonparametric. The stopping rule is very equal to 1.551416, and skewness is less than or equal to − 0.1539993
reliable in the case of an overgrown tree as it can be trimmed back to then ‘34′ samples of tool configuration C were correctly classified. On
optimum magnitude. It includes a dual approach i.e. uses test data and k- the other hand, if skewness is greater than − 0.1539993 and range is
turn cross-validation both for evaluating the robustness. The ability of less than or equal to 2.29652 then 6 samples of tool configuration C
the classifier to utilize the same attributes many times in various sections were correctly classified. But when the range is greater than 2.29652
of the tree is favourable to find complicated interrelationship in sets of then 40 samples of tool configuration B were correctly classified.
attributes. CART is the most interpretable of all of the classifiers. It can When ‘kurtosis’ is greater than 1.294439 and ‘maximum’ is greater
work well with mixed-mode data; such as X: discrete or continuous & Y: than 1.551416, then ‘39′ samples of tool configuration A along with
discrete or continuous. The classification Tree can be used when Y is ‘2′ samples of configuration E i.e. total ‘41′ samples were classified
discrete & the Regression Tree can be used when Y is continuous and ‘1′ sample of tool configuration B was wrongly classified. Simi
[48,49]. larly, all other conditions are classified.
• Figs. 10a–10f describes variation in the features for various tool
7. Result and discussion configurations. It was observed that kurtosis, range, maximum,
minimum, skewness, standard error exhibits considerable variation
A face milling operation was carried out for vibration signal acqui for different tool configurations i.e. A to F, and thus serve to be the
sition considering various configurations of 4 insert tools followed by most significant features. The identical agreement with the result of
statistical features extraction, selection, and classification. The stepwise the J48 algorithm is perceived from these plots.
result is discussed here. • A combination of various features is studied for identifying which
amongst the sets reflect the maximum classification accuracy. The
• The extraction of descriptive statistics representing variation be sequence of features exhibited by the J48 algorithm was considered
tween the faulty and fault-free configuration of the tool the signal such that desired accuracy is secured.
was undertaken. • First of all, the feature exhibited at primary node i.e. Kurtosis (most
• Thirteen statistical attributes such as (1) Kurtosis, (2) Standard Error, significant feature) alone is used for classification, the accuracy was
(3) Maximum value, (4) Skewness, (5) Minimum value, (6) Range, 57.91%. When ‘Kurtosis and maximum’ were deployed, accuracy
(7) Count, (8) Summation, (9) Variance, (10) Standard Deviation, was 80%. As per Table 2, all other features were arranged one by one,
(11) Mode, (12) Median (13) Mean were extracted. and accuracy was estimated.
• In this study, a total of 240 samples describing all tool configurations • The maximum accuracy of classification for a combination of the
were analyzed. That means 40 samples belong to each tool config minimum number of features (kurtosis, standard error, maximum,
uration. Every single sample represents of 2048 data points of ac skewness, minimum, and range) was obtained as 96.25% and
celeration changing with respect to time. A training data sheet remained the same until the 11th configuration.
Fig. 9. Structure of the J48 decision tree.
10
Fig. 10a. Class wise variation of ‘Kurtosis’
Fig. 10b. Class wise variation of ‘Range’
• When all 13 features were considered, accuracy was 95.83% thus it is range) was obtained. Thus to secure maximum accuracy within
suggested to use only 6 features for further study in an attempt to lesser computation time, it is suggested to use only 6 features (Kur
secure maximum accuracy within minimal modeling interval. The tosis, Std. error, Max, Skewness, Min, Range) for further
trend of accuracy to the number of attributes is exhibited in Fig. 11. It classification.
shows a sudden rise i.e. from 57.91% to 80.00% for an instance • Further, tool configurations were classified using J48, Random For
where the number of features involved in computation changed from est (RFT), Best First (BFT), Logistic Model (LMT) Functional (FT),
1 to 2. Further, progressive change was observed from the 3rd and Simple Cart (SCT) based on cross-validation of 10-folds. The ML
configuration. The lesser variation was observed in the trend when software programmed in Waikato Environment i.e. ‘WEKA’ [57] was
the number of attributes is more than 3. The maximum accuracy of used for designing and training the ‘tree-based’ algorithms.
classification for the combination of the minimum number of fea • The misperception matrix was used to validate the classification re
tures (kurtosis, standard error, maximum, skewness, minimum, and sults. It represents the categorization of samples for actual and
11
Fig. 10c. Class wise variation of ‘Maximum value’
Fig. 10d. Class wise variation of ‘Minimum value’
predicted tool class. Detailed accuracy of classifiers is elaborated by an agreement that possibly be desired. If this value lies between 80
Precision, False Positive (FP), Receiver Operating Characteristics and 100%, then classification is considered to be best. The MAE
(ROC) area, True Positive (TP), Recall. (Mean Absolute Error) fundamentally estimates the average squared
• TP rate denotes the percentage of accurately identified samples and error of the predictors hence the greater the MAE, the poorer the
the FP rate denotes the percentage of wrongly identified samples. training of the model is. The 2nd root of MSE gives Root Mean
Ideally, TP must be 100% and FP must be 0%. The ratio of accurately Squared Error (RMSE) to make errors and targets mutually scalable.
classified samples to the total samples in the class is denoted by
‘Precision’. A recall is just equivalent to the TP rate. F-Measure gives
the combined calculation of precision and recall. ROC gives an idea 7.1. Classification using decision tree
of how the classifiers are performing in general. The Kappa evaluates
% of samples from the diagonal matrix and later tunes the values for The classification accuracy considering the J48 DT is 96.2%. Fig. 12
12
Fig. 10e. Class wise variation of ‘Skewness’
Fig. 10f. Class wise variation of ‘Standard Error’
represents the misperception matrix for the J48 DT. It is popularly called of all, a true positive rate of configuration ‘A’ is 0.975 which shows 39
a ‘confusion matrix’ a significant tool in examining the performance of a samples out of 40 were rightly categorized as the defect-free condition,
classifier. Diagonal elements in the matrix exhibit correct classification and the FP rate is 0.01 which is almost negligible. In this study, both true
while elements other than diagonal exhibit wrong classification. In positive and false positive rates are very close to the ideal value. Simi
Fig. 12 for tool configuration ‘A’ (4 inserts are defect-free), ‘39′ samples larly, all other parameters like ROC Area, F- Measure, Recall, precision
were correctly placed under ‘A’, and 1 sample was wrongly placed under are also close to one. Compared to the study provided by Patange et al
configuration ‘B’. Next to that, for configuration ‘B’ (3 are defect-free [49] improved classification accuracy (96.2%) is achieved with the J48
and 1 with flank wear), ‘39′ samples were correctly placed under DT.
configuration ‘B’, whereas ‘1′ sample was wrongly placed under
configuration ‘C’ (i.e. nose wear). The matrix is self-explanatory for
further classification. 7.2. Classification using random forest
Table 3 presents a detailed tool for configuration-wise accuracy. First
As mentioned earlier in Table 2, classification has been obtained by
13
Table 2 Table 3
Combination of various attributes and effect on accuracy. Descriptive accuracy: J48 classifier.
Attribute combination Accuracy TP Recall ROC Precision FP F- Tool
(%) Rate Area Rate Measure Condition
Kurtosis (1) 57.91 0.975 0.975 0.981 0.951 0.01 0.963 A

Kurtosis, Max (2) 80.00 0.975 0.975 0.985 0.975 0.005 0.975 B
Kurtosis, Max, Std error (3) 83.75 0.975 0.975 0.985 0.975 0.005 0.975 C
Kurtosis, Std error, Max, Skewness (4) 95.83 0.975 0.975 0.98 0.907 0.02 0.94 D
Kurtosis, Std error, Max, Skewness, Min (5) 96.00 0.95 0.95 0.984 1 0 0.974 E
Kurtosis, Std error, Max, Skewness, Min, Range (6) 96.25 0.925 0.925 0.983 0.974 0.005 0.949 F
Kurtosis, Std error, Max, Skewness, Min, Range, Count (7) 96.25 0.963 0.963 0.983 0.964 0.008 0.963 Weighted
Kurtosis, Std error, Max, Skewness, Min, Range, Count, Sum (8) 96.25 Avg.
Kurtosis, Std error, Max, Skewness, Min, Range, Count, Sum, 96.25
Variance (9)
Variance, SD (10)
Variance, SD, Mode (11)
Variance, SD, Mode, Median (12)
Variance, SD, Mode, Median, Mean (All 13)
Fig. 13. Misperception matrix: Random Forest classifier.
Table 4
Descriptive accuracy: Random Forest classifier.
Fig. 11. Trend of J48 classification accuracy pertaining to the contribution of TP Recall ROC Precision FP F- Tool
several features. Rate Area Rate Measure Condition
1 1 1 1 0 1 A
0.975 0.975 0.999 0.951 0.01 0.963 B
0.95 0.95 1 0.974 0.005 0.962 C
0.95 0.95 0.998 0.905 0.02 0.927 D
0.95 0.95 0.999 1 0 0.974 E
0.95 0.95 0.999 0.95 0.01 0.95 F
0.963 0.963 0.999 0.963 0.008 0.963 Weighted
Avg.
7.3. Classification using Logistic model tree
The classification accuracy considering the Logistic Model Trees

(LMT) Classifier is 96.2%. The self-explanatory misperception matrix is
shown in Fig. 14. The detailed tool condition-wise accuracy is presented
in Table 5.
7.4. Classification using best first tree
The classification accuracy considering Best First Tree (BFT) Classi

fier is 97%. The self-explanatory misperception matrix and the detailed
tool condition-wise accuracy are presented as Fig. 15 and Table 6
Fig. 12. Misperception matrix: J48 classifier.
respectively.
considering 6 features. For Random Forest (RF) classifier accuracy is

96.2%. The self-explanatory misperception matrix is shown in Fig. 13. 7.5. Classification using Functional tree
The detailed tool condition-wise accuracy is presented in Table 4.
The classification accuracy considering Functional Trees (FT) Clas
sifier is 95.4%. The self-explanatory misperception matrix and the
14
Table 6
Descriptive accuracy: Best First Tree Classifier.
TP Recall ROC Precision FP F- Tool
Rate Area Rate Measure Condition
1 1 1 1 0 1 A
0.975 0.975 0.982 0.975 0.005 0.975 B
0.975 0.975 0.984 0.975 0.005 0.975 C
0.975 0.975 0.991 0.907 0.02 0.94 D
0.975 0.975 0.986 1 0 0.987 E
0.925 0.925 0.984 0.974 0.005 0.949 F
0.971 0.971 0.988 0.972 0.006 0.971 Weighted
Avg.
Fig. 14. Misperception matrix: Logistic Model Trees.
Table 5
Descriptive accuracy: Logistic Model Trees Classifier.
0.975 0.975 1 0.975 0.005 0.975 A

0.975 0.975 1 1 0 0.987 B
0.975 0.975 1 0.975 0.005 0.975 C
0.95 0.95 0.993 0.884 0.025 0.916 D
1 1 1 1 0 1 E
0.9 0.9 0.996 0.947 0.01 0.923 F
0.963 0.963 0.998 0.964 0.007 0.963 Weighted Fig. 16. Misperception matrix: Functional Trees.
Avg.
Table 7
Descriptive accuracy: Functional Trees Classifier.
0.95 0.95 1 0.974 0.005 0.962 A

0.95 0.95 1 0.974 0.005 0.962 B
0.975 0.975 0.999 0.951 0.01 0.963 C
0.975 0.975 0.988 0.866 0.025 0.929 D
0.975 0.975 1 1 0 0.987 E
0.9 0.9 0.99 0.974 0.01 0.923 F
0.954 0.954 0.996 0.956 0.009 0.954 Weighted
Avg.
9. The Kappa for all classifiers is above 0.90 therefore considered to be

best. The mean absolute error does not exceed 0.025 value, RMS error
does not exceed 0.015 for all classifiers. The maximum value relative
absolute error is 8.75% for the Random Forest classifier. The maximum
root relative squared error 30.32% observed for the J48 algorithm.
7.7. Discussion on results

Fig. 15. Misperception matrix: Best First Tree.
Figs. 18(a) and 18(b) show a comparison of classification accuracy
detailed tool condition-wise accuracy are presented as Fig. 16 and and time consumed to construct models for different ML classifiers
Table 7 respectively. respectively. It is observed that the Best First Tree algorithm not only
yields the highest accuracy as 97% but also the time required to build
7.6. Classification using Simple CART classifier the model is lowest i.e. 0.2 s. The J48, LMT, RF, SC yields the same
accuracy as 96.2% whereas the FT offers the lowest accuracy as 95.4%
By considering this classifier, accuracy is 96.2%. The self- amongst all tree family classifiers. The time consumed in constructing
explanatory misperception matrix and the detailed tool condition-wise the model for LMT is 5.3 s which is the highest between all.
accuracy are presented as Fig. 17 and Table 8 respectively. Recently, research presented by Patange et al. [49] includes the ML
The comparative study for tree family classifiers is explained in table approach for milling insert condition monitoring through tree-family
15
Fig. 18a. Comparison of tree family classifiers performance based on

the accuracy.
Fig. 17. Misperception matrix: Simple Cart classifier.
Table 8
Descriptive accuracy: Simple Cart classifier.
1 1 1 1 0 1 A
0.975 0.975 0.982 0.975 0.005 0.975 B
0.975 0.975 0.984 0.975 0.005 0.975 C
0.925 0.925 0.987 0.902 0.02 0.914 D
0.975 0.975 0.986 1 0 0.987 E
0.925 0.925 0.978 0.925 0.015 0.925 F
0.963 0.963 0.986 0.963 0.008 0.963 Weighted
Avg.
Table 9
Comparison of parameters for different tree family classifiers.
Sr. Parameters/Tree J48 BF FT LMT RF SC
No. family classifiers Fig. 18b. Comparison of tree family classifiers performance based on the time
1 Properly 96.2 97.0 95.4 96.2 96.2 96.2 required to build a model.
classified samples
(%)
2 Kappa 0.955 0.965 0.945 0.955 0.955 0.955 Table 10
3 MAE 0.015 0.013 0.018 0.019 0.024 0.016 Classification Accuracy of different tree family classifiers using untrained test
4 RMSE 0.113 0.099 0.104 0.093 0.095 0.109 data set.
5 RAE (%) 5.53 4.84 6.77 6.83 8.75 5.97
6 RRSE (%) 30.32 26.59 27.92 24.96 25.70 29.42 Sr. Classifier Classification accuracy (%)
No. Training data(40 Test data 10 fold cross-validation
samples) (8 (40 samples)
classifiers with classification accuracy less than 90% and 8.6 s consumed samples)
for constructing the model is which is considerably higher than the best- 1 J48 99.1 93.2 96.2
first tree classifier presented in the current research. 2 BF 100 98.5 97
3 FT 99.2 96.3 95.4
The trained model is then tested with un-trained data. The acquired
4 LMT 99.5 95.3 96.2
data were categorized into 2 sections such as training and testing with 5 RF 100 95.2 96.2
the ratio of 80% (32 samples out of 40 in each condition) to 20% (8 6 SC 97 95.1 96.2
samples out of 40 in each condition) respectively. First, the classifier
model was trained followed by testing the model with the untrained
(test) data set. Table 10 shows the corresponding results. Compared to 8. Limitations and future scope
all other classifiers, the Best First Tree algorithm produced superior
classification accuracy in this investigation also. Thus the Best First Tree The vibration-based multipoint tool inserts health prediction on the
algorithm acts suitable for real-time implementation of class prediction VMC approach limits itself to the class prediction for explicit data ac
as it performs best in all tests. quired for fixed input parameters of face milling. For the faults gener
The current framework serves to the best in terms of classification ating from unknown moments beyond those which are presented in this
accuracy amongst other investigations published in recent years as study, the trained model can classify the fault corresponding to any of
depicted in table 11. the similar pre-defined classes however at actual it need not be true.
Thus further study considering various levels of tool faults within a class
is the necessity to address the prediction of faults generating from un
known moments in the future. Moreover, for in future on-board fault
16
Table 11 • The supervised-tree-based classification methodology shall drive

Classification accuracy of investigations published on tool fault monitoring in optimum and effective use of the cutting tool, avoid idle periods,
milling. ensure the integrity of a process, and many more.
Sr. Reference Nature of Type of Classifier Accuracy • On-board implementation of this framework shall propel the de
No. Signal features (%) mands of a smart tool insert health monitoring pursuing generaliz
1 Rui Liu Time series Statistical Calibration- 90.00 ability and repeatability.
2020 model • The time-domain vibration response evolved during the tool-
2 Kehua Guo Time series Voting, ANN <90.00 workpiece interface is observed as cyclic & periodic owing to
2020 averaging
inherent insert faults and six different configurations and has suffi
and
weighted- ciently been assisted for appropriate fault prediction considering the
averaging present approach.
3 Zhou, Yang Time- Holder SVM 90.80 • The J48 DT based on the logic of information gain has presented a
2020 frequency Exponents systematic selection of attributes within just 0.7 s over trial and error
4 Jiayu Ou Time- Statistical SVM 91.83
2019 domain
basis selection.
5 Zhou, Guo Time- Holder Gaussian SVM 86.20 • The comparative study reveals that the Best First Tree classifier has
2019 frequency Exponents presented 97% classification accuracy within computation time of
6 Zhengyou Time- Statistical Continuous 91.20 0.2 s thus endorsing the satisfactory training of the model.
Xie 2019 domain de- Hidden
• Moreover, the BF Tree has exhibited 98.5% accuracy for the test data
noised by Markov
wavelet Models that proves the robustness of the model regardless of variants in the
7 Fatemeh Time- Wavelet SVR 85.50 datasets and is thereby inclusive for inter-class fault diagnosis
Aghazadeh frequency packets adaptions.
2018 • Last but not least the trained model of classification shall act as a
8 Ravikumar. Time- Statistical Decision tree 82.41
predictor and can be applied for blind datasets for class prediction in
S 2018 domain de- J48
noised by the future and likely to be advocated its further employment for data
wavelet with fixed instants.
9 N. Time- Statistical & Multilayer 82.50
Gangadhar domain histogram Perceptron
2018
CRediT authorship contribution statement
10 Guofeng Time & Statistical SVM 90.00
Wang 2014 frequency Abhishek Dhananjay Patange: Development or design of method
domain ology, creation of models, implementation of the computer code and
11 M. Time- Statistical Decision tree 77.22
supporting algorithms, testing of existing code components, Methodol
Elangovan domain J48
2011 ogy, Writing - original draft, Writing - review & editing, Visualization.
12 M. Time- Statistical Nu-SVM 88.06 Jegadeeshwaran R.: Methodology, Ideas, formulation or evolution of
Elangovan domain overarching research goals and aims, Verification.
2011
13 Karali Patra Time- Statistical & Back- 89.60
2010 domain wavelet Propagation Declaration of Competing Interest
Packet Neural
Networks
14 M. Time- Statistical & Bayes Net 86.34 The authors declare that they have no known competing financial
Elangovan domain histogram interests or personal relationships that could have appeared to influence
2010 the work reported in this paper.
15 Proposed Time- Statistical Best first tree 97.00
model domain
Acknowledgement
diagnosis and to deal with the diversity between the data distributions, The authors would like to acknowledge ‘Axis Metal-cut Technologies
in advance training of corresponding testing datasets for target domains Pvt. Ltd.’ situated in Bhosari-Pune, India for permitting to carry out
is essential and currently beyond the scope of the present study as it experimentation.
would involve further trials for data expansion. To advocate this
requisite, a deep learning-based generic framework is recommended. References
Furthermore, the unknown faults beyond the generic model if arise then
must be monitored using human intervention as a ‘feedback’ necessary [1] L. N. Lopez de Lacalle, Francisco J. Campa and Aitzol Lamikiz (2011) Milling,
for Reinforcement Learning (RL). Modern Machining Technology: A Practical Guide by J Paulo Davim, Elsevier,
214–303.
[2] Y. Altintas, Manufacturing Automation: Metal Cutting Mechanics, Machine Tool
9. Conclusion Vibrations and CNC Design, Appl. Mech. Rev. 54 (4) (2001) B84.
[3] F.S. Wong, et al., Tool condition monitoring using laser scatter pattern, Journal of
Materials Processing Technology: Elsevier, Science 63 (1997) 205–210.
A vibration-based multipoint tool inserts health prediction scheme [4] Y. Zhou, W. Xue, Review of tool condition monitoring methods in milling
executed on Vertical Machining Centre (VMC) using a machine learning processes, Int. J. Adv. Manufacturing Technol. 96 (2018) 2509–2523.
approach has been well demonstrated in this paper. The results achieved [5] Chuang wen X et al. The relationships between cutting parameters, tool wear,
cutting force and vibration. Advances in Mechanical Engineering-SAGE, 10 (2018)
indicate the framework to be suitable & capable of tool insert health
1–14.
classification & prediction. [6] N. Seemuang, T. McLeay, Slatter T Using spindle noise to monitor tool wear in a
turning process, Int. J. Adv. Manufacturing Technol.-Springer, London 86 (2016)
2781–2790.
• This scheme can be employed on age-old machine tools from
[7] D.E. Dimla, P.M. Lister, On-line metal cutting tool condition monitoring: force and
medium-small scale workshops within fewer instrumental/software vibration analyses, Int. J. Machine Tools Manufacture-Elsevier Science Ltd. 40
resources. (2000) 739–768.
[8] Siddhpura M, Siddhpura A, Bhave S., Vibration as a parameter for monitoring the
health of precision machine tools, International conference on frontiers in design
and manufacturing engineering, Coimatore-India, (2008).
17
[9] D.E. Dimla, The correlation of vibration signal features to cutting tool wear in a [27] V. Muralidharan, V. Sugumaran, Feature extraction using wavelets and
metal turning operation, Int. J. Adv. Manufacturing Technology 19 (2002) classification through decision tree algorithm for fault diagnosis of mono-block
705–713. centrifugal pump, Measurement 46 (2013) 353–359.
[10] B. Prasad, M. Sarcar, B. Ben, Development of a system for monitoring tool [28] M. Amarnath, V. Sugumaran, Hemantha Kumar, Exploiting sound signals for fault
condition using acousto-optic emission signal in face turning–an experimental diagnosis of bearings using decision tree, Measurement 46 (2013) 1250–1256.
approach, Int. J. Adv. Manuf. Technol. 51 (2010) 57–67. [29] A. Sharma, V. Sugumaran, S. Babu Devasenapati, Misfire detection in an IC engine
[11] M. Elangovan, K.I. Ramachandran, V. Sugumaran, Studies on Bayes classifier for using vibration signal and decision tree algorithms, Measurement 50 (2014)
condition monitoring of single point carbide tipped tool based on statistical and 370–380.
histogram features, Expert Syst. Appl. 37 (2010) 2059–2065. [30] A. Joshuva, V. Sugumaran, A lazy learning approach for condition monitoring of
[12] V. Sugumaran, K.I. Ramachandran, Effect of number of features on classification of wind turbine blade using vibration signals and histogram features, Measurement
roller bearing faults using SVM and PSVM, Expert Syst. Appl. 38 (2011) 152 (2020) 107295.
4088–4096. [31] C. Kingsford, Salzberg S L What are decision trees? Nat. Biotechnol. 26 (2008)
[13] Cyril Drouillet, Jaydeep Karandikar, Chandra Nath, et al., Tool life predictions in 1011–1013.
milling using spindle power with the neural network technique, J. Manuf. [32] J.R. Quinlan, Decision trees and multi-valued attributes, in: J.E. Hayes, D. Michie
Processes 22 (2016) 161–168. (Eds.), Machine intelligence 11, Oxford University Press, 1985.
[14] Mark A. Rubeo, Tony L. Schmitz, Global stability predictions for flexible workpiece [33] J.R. Quinlan, Induction of decision trees, Machine Learning 1 (1) (1986) 81–106.
milling using time-domain simulation, Journal of Manufacturing Systems 2016; 40: [34] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers
8–14. (1993).
[15] Patange A D, Jegadeeshwaran R, Milling cutter condition monitoring using [35] J.R. Quinlan, Improved use of continuous attributes in c4.5, J. Artificial
machine learning approach, IOP Conference Series: Material Science and Intelligence Research 4 (1996) 77–90.
Engineering 624: 012030, doi:10.1088/1757-899X/624/1/012030. [36] S.B. Kotsiantis, Supervised Machine Learning: A Review of Classification
[16] Suhas S. Aralikatti, K.N. Ravikumar, et al., Comparative Study on Tool Fault Techniques, Informatica 31 (2007) (2007) 249–268.
Diagnosis Methods Using Vibration Signals and Cutting Force Signals by Machine [37] I. Colkesen, T. Kavzoglu, The use of logistic model tree (LMT) for pixel and object-
Learning Technique, Structural Durability Health Monitoring 14 (2) (2020) based classifications using high resolution World view-2 imagery, Geocarto
128–145. International 32 (2017).
[17] Alamelu Manghai T. M. and Jegadeeshwaran R., Vibration based brake health [38] Niels Landwehr, Mark Hall, Eibe Frank, Logistic Model Trees, Kluwer Academic
monitoring using wavelet features: A machine learning approach, Journal of Publishers 14 (2006) 21.
Vibration and Control 2019; 0(0):1–17. [39] L. Breiman, Z. Ghahramani, “Consistency for a simple model of random forests”.
[18] S. Painuli, M. Elangovan, V. Sugumaran, Tool condition monitoring using K-star Statistical Department, University of California at Berkeley, Technical Report (670)
algorithm, Expert Syst. Appl. 41 (2014) 2638–2643. (2004).
[19] M.A.F. Ahmada, et al., Development of tool wear machining monitoring using [40] Yi Lin, Yongho Jeon, Random forests and adaptive nearest neighbors, J. Am. Stat.
novel statistical analysis method I-kaz, Procedia Eng. 101 (2015) 355–362. Assoc. 101 (474) (2006) 578–590.
[20] C.K. Madhusudana, H. Kumar, S. Narendranath, Condition monitoring of face [41] T. Shi, S. Horvath, Unsupervised Learning with Random Forest Predictors,
milling tool using K-star algorithm and histogram features of vibration signal, J. Computational Graphical Statistics. 15 (1) (2006) 118–138.
Engineering Science and Technology, An International Journal-Elsevier 19 (2016) [42] A. Painsky, S. Rosset, Cross-Validated Variable Selection in Tree-Based Methods
1543–1551. Improves Predictive Performance, IEEE Trans. Pattern Anal. Mach. Intell. 39 (11)
[21] V. Sugumaran, V. Muralidharan, K.I. Ramachandran, Feature selection using (2017) 2142–2153.
decision tree and classification through proximal support vector machine for fault [43] Haijian Shi, Best-first Decision Tree Learning, Master’s thesis submitted to The
diagnostics of roller bearing, Mech. Syst. Sig. Process. 21 (2007) 930–942. University of Waikato, Hamilton, New Zealand 2006.
[22] D. Gangadhar, et al., Condition monitoring of single point cutting tools based on [44] Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach
machine learning approach, Int. J. Acoustics Vibration 23 (2018) 131–137. (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-790395-2. pp.
[23] M. Elangovan, et al., Evaluation of expert system for condition monitoring of a 94-95.
single point cutting tool using principle component analysis and decision tree [45] J. Gama, Functional Trees, Machine Learning 55 (2004) 219–250.
algorithm, Expert Syst. Appl. 38 (2011) 4450–4459. [46] Xia Zhao, Wei Chen, GIS-Based Evaluation of Landslide Susceptibility Models Using
[24] J. Karandikar, et al., Tool wear monitoring using naive bayes classifiers, Int. J. Adv. Certainty Factors and Functional Trees-Based Ensemble Techniques, Appl. Sci. 10
Manuf. Technol. 77 (2015) 1613–1626. (2020) 16.
[25] S. Binsaeid, et al., Machine ensemble approach for simultaneous detection of [47] V.T. Tran, B.S. Yang, A.C. Tan, A classification and regression trees (CART) model
transient and gradual abnormalities in end milling using multi-sensor fusion, of parallel structure and long-term prediction prognosis of machine condition,
J. Mater. Process. Technol. 209 (2009) 4728–4738. Structural Health Monitoring 9 (2009) 121–132.
[26] Navneet Bohara, R. Jegadeeshwaran, G. Sakthivel, Carbide Coated Insert Health [48] Gordon, L.. “Using Classification and Regression Trees (CART) in SAS® Enterprise
Monitoring Using Machine Learning Approach through Vibration Analysis, Int. J. Miner TM For Applications in Public Health.” (2013).
Prognostics Health Management 24 (2017) 1–14. [49] Jake Morgan, Classification and Regression Tree Analysis, Technical Report No.
PM931 Directed Study in Health Policy and Management, Boston University,
(2014).
18

Measurement: Abhishek Dhananjay Patange, Jegadeeshwaran R

Uploaded by

Copyright:

Available Formats

Measurement: Abhishek Dhananjay Patange, Jegadeeshwaran R

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Measurement: Abhishek Dhananjay Patange, Jegadeeshwaran R

Uploaded by

Copyright:

Available Formats

Measurement 173 (2021) 108649

Contents lists available at ScienceDirect

A machine learning approach for vibration-based multipoint tool insert

Fig. 1. Methodology for multipoint tool inserts health prediction.

Fig. 2. Experimental setup.

a. Defect free (new) b. Wear at flank

c. Wear at nose d. Notch wear

Similar to the normal decision tree, the terminal nodes t ∈ Tn exhibits

here I(x ∈ Ssb ) is ‘one’ if x ∈ Ssb & ‘zero’ otherwise.

Fig. 6. Generation of LMT.

Fig. 7. Generation of a binary tree.

6.6. Simple CART classifier

Classification & Regression Tree is a decision tree-based supervised

Fig. 9. Structure of the J48 decision tree.

Fig. 10a. Class wise variation of ‘Kurtosis’

Fig. 10b. Class wise variation of ‘Range’

Fig. 10c. Class wise variation of ‘Maximum value’

Fig. 10d. Class wise variation of ‘Minimum value’

Fig. 10e. Class wise variation of ‘Skewness’

Fig. 10f. Class wise variation of ‘Standard Error’

Kurtosis (1) 57.91 0.975 0.975 0.981 0.951 0.01 0.963 A

Fig. 13. Misperception matrix: Random Forest classifier.

7.3. Classification using Logistic model tree

The classification accuracy considering the Logistic Model Trees

7.4. Classification using best first tree

The classification accuracy considering Best First Tree (BFT) Classi­

considering 6 features. For Random Forest (RF) classifier accuracy is

Fig. 14. Misperception matrix: Logistic Model Trees.

0.975 0.975 1 0.975 0.005 0.975 A

0.95 0.95 1 0.974 0.005 0.962 A

9. The Kappa for all classifiers is above 0.90 therefore considered to be

7.7. Discussion on results

Fig. 18a. Comparison of tree family classifiers performance based on

Table 11 • The supervised-tree-based classification methodology shall drive

You might also like

The classification accuracy considering Best First Tree (BFT) Classi