VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
PROJECT REPORT
PROBABILITY AND STATISTICS (MT2013)
FLANK WEAR OF CNC LATHE TOOLS
Instructor: Ph.D. Phan Thi Huong
Group: CC03 – 03
Ho Chi Minh City, May 11, 2023
Members – Student ID – Contributions
Trần Khánh Lộc (Leader) – 2153548 – Theory, Comments
Nguyễn Khánh Duy – 2153251 – Data introduction, Conclusion, Report presentation
Ngô Bảo Đại – 2152496 – R code, Comments
Phạm Công Thiện – 2152996 – R code, Prediction
Nguyễn Trần Việt Hưng – 2152619 – Descriptive statistics
Comments Score
CONTENTS
1. DATA INTRODUCTION
3. THEORETICAL BASIS
4. TEST STATISTICS
5. DESCRIPTIVE STATISTICS
5.1. Import data
5.2. Data cleaning
5.3. Data visualization
5.4. Fitting Linear regression models to predict the Flank wear in the following ways
6. COMPARISON AND SUMMARY
7. CONCLUSION AND EXTENSIONS
8. SOURCE CODE R
REFERENCES
LIST OF FIGURES
Figure 1: Nomenclature of single point cutting tool (Kaggle.com)
Figure 2: Code R and the result when reading the file name and viewing the first 6 lines of the file.
Figure 3: Code R checking for missing value
Figure 4: Code R calculating descriptive statistic values
Figure 5: Code R and Histogram of Flankwear
Figure 6: Code R and Relationship between Flank wear and Feed rate
Figure 7: Code R and Relationship between Flank wear and Cutting depth
Figure 8: Code R and Relationship between Flank wear and Spindle speed
Figure 9: Code R and Predicting flank wear based on machining parameters Model 1
Figure 10: Code R and Predicting flank wear based on machining parameters Model 2
Figure 11: Code R and Relationship between Residual and Fitted values
Figure 12: Relationship between Standardized residuals and Normal Q-Q
Figure 13: Scale Location
Figure 14: Residual vs. Leverage
Figure 15: Code R ANOVA (Model 1, Model 2)
Figure 16: Code R of Prediction based on Model 2
1. DATA INTRODUCTION
Flank wear is an undesirable phenomenon in cutting tools, especially single-point
cutting tools. It is caused by adhesive and abrasive wear where the cutting tool
contacts the workpiece. Flank wear can be measured by comparing the geometry of
the rake face in images of a new and a worn cutter. We want to explore which
factors affect the flank wear of CNC lathe tools, and how.
Figure 1: Nomenclature of single point cutting tool (Kaggle.com)
The file “Flank-wear-of-CNC-lathe-tool-insert-dataset.csv” contains the machining
parameters and the flank wear (mm) of a CNC lathe tool, along with other properties
of the tool. The original dataset is available at:
https://www.kaggle.com/datasets/drganeshkumars/flank-wear-of-cnc-lathe-tool-insert-dataset
All variables of the dataset:
Variable | Type | Description
Feed rate (mm/min) | Independent variable | The distance the tool travels during a single spindle revolution. It is defined as the velocity at which the cutter is fed. It is expressed in units of distance per …
Cutting depth (mm) | Independent variable | The tertiary cutting motion that sets the depth of material to be removed by machining.
Spindle speed (rpm) | Independent variable | The number of revolutions the spindle makes per unit time.
Flank wear (mm) | Dependent variable | The damage to the cutting tool during use.
3. THEORETICAL BASIS
● A linear regression model describes the relationship between a dependent
variable, y, and one or more independent variables, X. The dependent variable
is also called the response variable. Independent variables are also called
explanatory or predictor variables. Continuous predictor variables are also
called covariates, and categorical predictor variables are also called factors.
The matrix X of observations on predictor variables is usually called the design
matrix.
● Multiple linear regression, also known simply as multiple regression, is a
statistical technique that uses several explanatory variables to predict the
outcome of a response variable. The goal of multiple linear regression is to
model the linear relationship between the explanatory (independent) variables
and response (dependent) variables. In essence, multiple regression is the
extension of ordinary least-squares (OLS) regression because it involves more
than one explanatory variable.
\[ y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i, \quad i = 1, \cdots, n, \]
Where
● n is the number of observations.
● yi is the ith response.
● βk is the kth coefficient, where β0 is the constant term in the model. Sometimes,
design matrices might include information about the constant term. However,
lm() in R includes a constant term in the model by default, so a column of 1s
must not be added to the design matrix X.
● Xij is the ith observation on the jth predictor variable, j = 1, ..., p.
● εi is the ith noise term, that is, random error.
● If a model includes only one predictor variable (p = 1), then the model is called
a simple linear regression model.
● ANOVA, or analysis of variance, is a statistical method that separates the observed
variance in the data into multiple components for use in further tests.
● For three or more sets of data, a one-way ANOVA is used to learn more about
the connection between the dependent and independent variables.
● There are two main types, one-way and two-way. Two-way tests can be performed
with or without replication:
❖ One-way ANOVA between groups: used to compare two or more groups to see
whether there is a difference between them.
❖ Two-way ANOVA without replication: used when you have one group and are
double-testing that same group.
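As a minimal sketch of the regression model described in this section, the R code below fits a multiple linear regression with lm() on synthetic data. The data frame toy, the predictors x1 and x2, and the true coefficients are made up for illustration and are not the report's dataset. It shows that lm() adds the intercept column to the design matrix by itself, and that its estimates agree with the least-squares normal equations.

```r
# Synthetic data (made up for illustration; not the report's dataset)
set.seed(1)
n  <- 50
x1 <- runif(n, 0, 2)
x2 <- runif(n, 0, 3)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.1)
toy <- data.frame(y, x1, x2)

# lm() adds the constant term beta_0 automatically
fit <- lm(y ~ x1 + x2, data = toy)

# Design matrix used by lm(): the first column is the intercept column of 1s
X <- model.matrix(fit)
head(X, 3)

# Least-squares estimates from the normal equations, for comparison
beta_hat <- solve(t(X) %*% X, t(X) %*% toy$y)
cbind(lm = coef(fit), normal_eq = as.vector(beta_hat))
```

The two columns of the final table should agree up to numerical precision, which is why no column of 1s needs to be supplied by hand.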
4. TEST STATISTICS
The test statistic is a random variable computed from the sample data. Here we are
interested in the ratio of the variability between groups to the variability within
groups. We refer to this ratio as F; a large value indicates a difference in means.
When the variability between groups exceeds the variability within groups, the
observed F ratio is high. A low F ratio indicates that within-group variability is
substantially greater than between-group variability.
The ANOVA formula:
\[ F = \frac{\mathrm{MSB}}{\mathrm{MSE}} \tag{1} \]
Where:
• F : ANOVA coefficient
• MSB : Mean sum of squares due to treatment
• MSE : Mean sum of squares due to error
The MSB formula:
\[ \mathrm{MSB} = \frac{\sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2}{k - 1} \tag{2} \]
Where \(\bar{X}_i\) is the mean of group i, \(\bar{X}\) is the overall mean, and k is the number of groups.
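The MSE formula:
\[ \mathrm{MSE} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2}{n_{total} - k} \tag{3} \]
Where \(X_{ij}\) is the jth observation in group i and \(n_{total} = n_1 + n_2 + \cdots + n_k\), with \(n_i\) being the sample size of group i.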
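The three formulas of this section can be checked numerically. The following R sketch computes MSB, MSE and F by hand for three made-up groups and compares the result with R's built-in aov(). The group labels, sizes and means are assumptions chosen only for illustration.

```r
# Made-up one-way layout: three groups with different sizes and means
set.seed(2)
groups <- factor(rep(c("A", "B", "C"), times = c(8, 10, 12)))
x <- rnorm(30, mean = c(A = 5, B = 6, C = 7)[as.character(groups)])

k       <- nlevels(groups)             # number of groups
n_i     <- tapply(x, groups, length)   # sample size of each group
xbar_i  <- tapply(x, groups, mean)     # mean of each group
xbar    <- mean(x)                     # overall mean
n_total <- sum(n_i)

# Formulas (2) and (3), then the F ratio of formula (1)
MSB <- sum(n_i * (xbar_i - xbar)^2) / (k - 1)
MSE <- sum((x - xbar_i[as.character(groups)])^2) / (n_total - k)
F_manual <- MSB / MSE

# The same F value taken from R's ANOVA table
F_aov <- summary(aov(x ~ groups))[[1]][["F value"]][1]
c(manual = F_manual, aov = F_aov)
```

Both entries of the final vector should match, confirming that the hand formulas and the ANOVA table describe the same statistic.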
5. DESCRIPTIVE STATISTICS
5.1. Import data
Use the read.csv() command to read the file.
Figure 2: Code R and the result when reading the file name and viewing the first 6 lines of the file.
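A minimal, self-contained sketch of this import step is shown below. Because the real CSV is not bundled here, the code writes a tiny stand-in file first; its header names and values are assumptions made up for illustration, not the contents of the Kaggle file.

```r
# Hypothetical miniature file standing in for the real CSV (values are made up)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Feed rate,Cutting depth,Spindle speed,Flank wear",
  "0.5,1.0,1500,0.31",
  "1.0,1.5,2000,1.12",
  "1.5,2.0,2500,1.95"
), tmp)

# read.csv() turns header names into valid R names, so spaces become dots
fwc_demo <- read.csv(tmp)
names(fwc_demo)
head(fwc_demo)
```

This name conversion is why columns are later addressed as Flank.wear or Feed.rate in the R code.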
5.2. Data cleaning
● Check for missing data in the file (command reference: is.na(), which(),
apply(), ...). If any data are missing, propose a method for handling them.
Figure 3: Code R checking for missing value
Comments: There is no missing data in the file.
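For comparison, the sketch below shows what the same check reports when values are missing. The small data frame demo is made up for illustration; dropping incomplete rows with na.omit() is one possible handling method among several.

```r
# Made-up data frame with two missing values
demo <- data.frame(
  Feed.rate     = c(0.5, NA, 1.5),
  Cutting.depth = c(1.0, 2.0, NA),
  Flank.wear    = c(0.3, 0.9, 1.6)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(demo))
na_per_column

# Indices of the rows that contain at least one missing value
rows_with_na <- which(!complete.cases(demo))
rows_with_na

# One simple handling method: drop the incomplete rows
demo_clean <- na.omit(demo)
nrow(demo_clean)
```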
5.3. Data visualization
● For continuous variables, calculate descriptive statistics including: mean,
standard deviation, maximum and minimum values. Export the results as a
table. (Hint function: mean(), sd(), min(), max() , apply()).
● For categorical variables, make a statistical table for each category (Hint
function: table()).
Figure 4: Code R calculating descriptive statistic values
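One compact way to build such a table is sketched below with sapply(), which applies the same summary function to every column. The data frame demo is synthetic and only illustrates the shape of the output.

```r
# Synthetic continuous variables (made up for illustration)
set.seed(3)
demo <- data.frame(
  Feed.rate     = runif(20, 0.5, 2.5),
  Cutting.depth = runif(20, 0.5, 2.5),
  Flank.wear    = runif(20, 0, 4)
)

# One row per statistic, one column per variable
stats <- sapply(demo, function(v) {
  c(mean = mean(v), sd = sd(v), min = min(v), max = max(v))
})
round(stats, 3)
```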
• Use the hist() function to plot the frequency distribution of Flank wear.
Figure 5: Code R and Histogram of Flankwear
Comments:
The distribution is not normal; it is right-skewed. Most Flank wear values lie
between 0 and 1.
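Skewness read off a histogram can be backed by numbers. The sketch below uses a made-up right-skewed sample (exponential draws, not the report's data) and shows two simple indicators: the mean exceeding the median, and a positive moment-based skewness coefficient.

```r
# Made-up right-skewed sample (not the report's data)
set.seed(5)
x <- rexp(1000, rate = 1)

m   <- mean(x)
med <- median(x)

# Moment-based skewness coefficient: positive for a right-skewed sample
skewness <- mean((x - m)^3) / sd(x)^3

# Histogram counts without drawing the plot
h <- hist(x, breaks = 20, plot = FALSE)

c(mean = m, median = med, skewness = skewness)
```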
• Use the pairs() command to plot Flank wear against Feed rate.
Figure 6: Code R and Relationship between Flank wear and Feed rate
Comments:
When Feed rate increases, Flank wear also increases. Flank wear and Feed rate
have a positive linear relationship.
• Use the pairs() command to plot Flank wear against Cutting depth.
Figure 7: Code R and Relationship between Flank wear and Cutting depth
Comments:
At the same Feed rate, when Cutting depth increases, Flank wear also increases.
Flank wear and Cutting depth have a positive linear relationship.
• Use the pairs() command to plot Flank wear against Spindle speed.
Figure 8: Code R and Relationship between Flank wear and Spindle speed
Comments:
Flank wear is scattered randomly across Spindle speed, so the two show no linear
relationship.
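The visual impressions from the scatter plots can be complemented by Pearson correlation coefficients. The sketch below uses synthetic data generated under an assumed linear model (coefficients roughly similar to those fitted later in the report, with Spindle speed given no effect); it is not the report's dataset.

```r
# Synthetic data under an assumed model in which Spindle speed has no effect
set.seed(6)
n <- 200
demo <- data.frame(
  Feed.rate     = runif(n, 0.5, 2.5),
  Cutting.depth = runif(n, 0.5, 2.5),
  Spindle.speed = runif(n, 1000, 3000)
)
demo$Flank.wear <- -0.7 + 1.4 * demo$Feed.rate +
  0.28 * demo$Cutting.depth + rnorm(n, sd = 0.05)

# Correlation of Flank wear with each machining parameter
r <- cor(demo)["Flank.wear", c("Feed.rate", "Cutting.depth", "Spindle.speed")]
round(r, 3)
```

A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0 is consistent with the random scatter seen for Spindle speed.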
5.4. Fitting Linear regression models to predict the Flank wear in the
following ways
Check the regression coefficients:
• H0: βi = 0
• H1: βi ≠ 0
Model 1: Flank wear as a function of Feed rate, Cutting depth and Spindle speed
In this section, we fit a multiple linear regression model involving all the
features, i.e. Feed rate (mm/min), Cutting depth (mm) and Spindle speed (rpm).
Flank wear = β0 + β1 * Feed rate + β2 * Cutting depth + β3 * Spindle speed + ε
Figure 9: Code R and Predicting flank wear based on machining parameters Model 1
Results:
Flank wear = (-7.045 * 10^-1) + 1.4 * Feed rate + (2.77 * 10^-1) * Cutting depth
+ (1.148 * 10^-6) * Spindle speed
The intercept is the predicted Flank wear when all three features are zero:
-7.045 * 10^-1.
The coefficients are interpreted as follows, holding the other features fixed: an
increase of 1 rpm in Spindle speed increases the Flank wear by 1.148 * 10^-6 (that
is, 1.148 * 10^-3 per 1000 rpm).
An increase of 1 mm/min in Feed rate increases the Flank wear by 1.4, and an
increase of 1 mm in Cutting depth increases the Flank wear by 2.77 * 10^-1.
Comments:
For Spindle speed, Pr = 0.685 > α (= 0.05), so we cannot reject H0: β3 = 0.
Spindle speed is therefore not statistically significant in the linear regression model.
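The coefficient test above can be read directly from the table returned by summary(). The sketch below does this on synthetic data generated under an assumed model with no Spindle speed effect; the data and the object names (demo, fit, coef_table) are made up for illustration.

```r
# Synthetic data under an assumed model in which Spindle speed has no effect
set.seed(7)
n <- 200
demo <- data.frame(
  Feed.rate     = runif(n, 0.5, 2.5),
  Cutting.depth = runif(n, 0.5, 2.5),
  Spindle.speed = runif(n, 1000, 3000)
)
demo$Flank.wear <- -0.7 + 1.4 * demo$Feed.rate +
  0.28 * demo$Cutting.depth + rnorm(n, sd = 0.05)

fit <- lm(Flank.wear ~ Feed.rate + Cutting.depth + Spindle.speed, data = demo)

# Coefficient table: estimate, standard error, t value and p-value per term
coef_table <- coef(summary(fit))

# Reject H0: beta_i = 0 when the p-value falls below alpha
alpha    <- 0.05
p_values <- coef_table[, "Pr(>|t|)"]
data.frame(p_value = p_values, reject_H0 = p_values < alpha)
```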
Model 2: Flank wear as a function of Feed rate and Cutting depth
Flank wear = β0 + β1 * Feed rate + β2 * Cutting depth + ε
Figure 10: Code R and Predicting flank wear based on machining parameters Model 2
Results:
Flank wear = -0.703447 + 1.400254 * Feed rate + 0.276985 * Cutting depth
The intercept is the predicted Flank wear when both features are zero: -0.703447.
The coefficients are interpreted as follows: an increase of 1 mm/min in Feed rate
increases the Flank wear by 1.400254, and an increase of 1 mm in Cutting depth
increases the Flank wear by 0.276985.
For both Feed rate and Cutting depth, Pr (< 2 * 10^-16) << α (= 0.05) => Reject H0
=> Feed rate and Cutting depth both have a highly significant effect on Flank wear.
Check the overall significance of the regression:
• H0: β1 = β2 = 0
• H1: ∃ βi ≠ 0
• F statistic = 1.807 * 10^5 > F(0.05; 2; 1997) (≈ 3.00) => Reject H0:
at least one βi ≠ 0, so the regression is significant.
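The critical value and the decision for this F-test can be reproduced with the F distribution functions in base R. The sketch below uses the F statistic and degrees of freedom quoted above; the variable names are chosen here for illustration.

```r
alpha <- 0.05
df1   <- 2        # number of predictors in Model 2
df2   <- 1997     # residual degrees of freedom
F_obs <- 1.807e5  # F statistic reported for Model 2

# Upper 5% critical value of the F distribution with (2, 1997) degrees of freedom
F_crit <- qf(1 - alpha, df1, df2)
F_crit

# p-value of the observed statistic (upper tail)
p_value <- pf(F_obs, df1, df2, lower.tail = FALSE)

# TRUE means H0 is rejected at level alpha
F_obs > F_crit
```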
Assumptions of the model:
• Linearity: Flank wear has a linear relationship with the independent variables.
• Errors are normally distributed with expectation 0 and constant variance.
• Errors are independent.
Diagnostic plots to check the assumptions:
Figure 11: Code R and Relationship between Residual and Fitted values
Comments:
• The red line (the estimated mean of the residuals) is almost straight => the
linearity assumption is satisfied.
• The red line runs along the zero-residual line, so the errors have
expectation 0.
• Some points are not scattered randomly around the red line => the variance
is not constant.
Figure 12: Relationship between Standardized residuals and Normal Q-Q
Comments:
• The standardized residuals deviate from the expected normal distribution line =>
the normality assumption for the errors is not satisfied.
Figure 13: Scale Location
Comments:
• The square roots of the standardized residuals are not scattered randomly along
the red line => the constant-variance assumption for the errors is not satisfied.
Figure 14: Residual vs. Leverage
Comments:
• This figure is used to identify potentially influential outliers.
• No point lies beyond the Cook’s distance line => there are no influential outliers.
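The quantities behind the four diagnostic plots can also be inspected numerically. The sketch below does so for a model fitted to synthetic data (made up for illustration, with well-behaved normal errors, so it will not reproduce the violations seen in the report's figures). The Shapiro-Wilk test is an additional check not used in the report.

```r
# Synthetic data with normal, constant-variance errors (made up for illustration)
set.seed(8)
n <- 200
demo <- data.frame(
  Feed.rate     = runif(n, 0.5, 2.5),
  Cutting.depth = runif(n, 0.5, 2.5)
)
demo$Flank.wear <- -0.7 + 1.4 * demo$Feed.rate +
  0.28 * demo$Cutting.depth + rnorm(n, sd = 0.05)
fit <- lm(Flank.wear ~ Feed.rate + Cutting.depth, data = demo)

# Standardized residuals (used in the Q-Q and Scale-Location plots)
std_res <- rstandard(fit)

# Shapiro-Wilk normality test on the standardized residuals
sw <- shapiro.test(std_res)
sw$p.value

# Leverage and Cook's distance (used in the Residuals vs. Leverage plot)
lev  <- hatvalues(fit)
cook <- cooks.distance(fit)

# Observations beyond the Cook's distance contour at 1, if any
which(cook > 1)
```

For the report's model, the plots themselves are drawn by plot(model_2), which by default shows these four diagnostics in turn.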
6. COMPARISON AND SUMMARY
Figure 15: Code R ANOVA (Model 1, Model 2)
● Flank wear is influenced most by Feed rate, followed by Cutting depth.
● Spindle speed has no statistically significant effect on Flank wear.
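The model comparison can be sketched as a partial F-test between two nested models. The example below uses synthetic data generated under an assumed model without a Spindle speed effect; the object names model_full and model_reduced are chosen here for illustration and stand in for Model 1 and Model 2.

```r
# Synthetic data under an assumed model in which Spindle speed has no effect
set.seed(9)
n <- 200
demo <- data.frame(
  Feed.rate     = runif(n, 0.5, 2.5),
  Cutting.depth = runif(n, 0.5, 2.5),
  Spindle.speed = runif(n, 1000, 3000)
)
demo$Flank.wear <- -0.7 + 1.4 * demo$Feed.rate +
  0.28 * demo$Cutting.depth + rnorm(n, sd = 0.05)

model_full    <- lm(Flank.wear ~ Feed.rate + Cutting.depth + Spindle.speed, data = demo)
model_reduced <- lm(Flank.wear ~ Feed.rate + Cutting.depth, data = demo)

# Partial F-test: does adding Spindle speed reduce the residual sum of squares
# by more than chance would explain?
cmp <- anova(model_reduced, model_full)
cmp

# p-value of the comparison; a large value favours the simpler model
p_cmp <- cmp[2, "Pr(>F)"]
p_cmp
```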
7. CONCLUSION AND EXTENSIONS
By building the regression models, we can see how the variables influence the
Flank wear of the CNC lathe tool:
• Flank wear is influenced most by Feed rate, followed by Cutting depth.
• Spindle speed has no statistically significant effect on Flank wear.
We find that Model 2 is the better model. Because both of its coefficients are
positive, it also shows how to reduce Flank wear: lower the Feed rate and the
Cutting depth.
• Predictions:
First, create the input X with the parameters Feed rate = 2.5 and Cutting
depth = 2.3. Then, make the prediction for X:
Figure 16: Code R of Prediction based on Model 2
Comment: The predicted mean Flank wear is 3.434253, and the 95% confidence interval
for the predicted mean Flank wear is (3.425143, 3.443363).
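As a quick arithmetic check, the point prediction can be recomputed by hand from the Model 2 coefficients reported above, using the inputs from the source code in Section 8 (Feed rate = 2.5, Cutting depth = 2.3). The variable names below are chosen here for illustration.

```r
# Model 2 coefficients as reported in the results
b0      <- -0.703447
b_feed  <- 1.400254
b_depth <- 0.276985

# Inputs used in the source code of Section 8
feed_rate     <- 2.5
cutting_depth <- 2.3

# Point prediction from the fitted regression equation
flank_wear_hat <- b0 + b_feed * feed_rate + b_depth * cutting_depth
flank_wear_hat
```

The result, about 3.4343, lies at the centre of the confidence interval (3.425143, 3.443363).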
8. SOURCE CODE R
# Read the data
fwc <- read.csv("D:/code/Rprojects/dataset/Final/Flank-wear-of-CNC-lathe-tool-insert-dataset.csv")
head(fwc)
# Clean the data: list the row indices of missing values per column, then drop them
apply(is.na(fwc), 2, which)
fwc <- na.omit(fwc)
# Descriptive statistics for columns 2 to 4
# (the *_v names avoid masking the base functions mean(), sd(), min() and max())
vars <- fwc[, 2:4]
mean_v <- apply(vars, 2, mean)
sd_v <- apply(vars, 2, sd)
Q1 <- apply(vars, 2, quantile, probs = 0.25)
Q2 <- apply(vars, 2, quantile, probs = 0.5)
Q3 <- apply(vars, 2, quantile, probs = 0.75)
min_v <- apply(vars, 2, min)
max_v <- apply(vars, 2, max)
t(data.frame(mean = mean_v, sd = sd_v, Q1, Q2, Q3, min = min_v, max = max_v))
# Histogram of Flank wear with bins of width 0.1 covering the whole data range
flank_wear <- fwc$Flank.wear
breaks <- (floor(min(flank_wear) * 10):ceiling(max(flank_wear) * 10)) / 10
hist(flank_wear, breaks, xlab = "Flank wear", ylab = "Frequency",
     main = "Histogram of Flank wear")
# Scatter plots of Flank wear against each machining parameter
pairs(Flank.wear ~ Feed.rate, data = fwc, main = "Plot of Flank wear for Feed rate")
pairs(Flank.wear ~ Cutting.depth, data = fwc, main = "Plot of Flank wear for Cutting depth")
pairs(Flank.wear ~ Spindle.speed, data = fwc, main = "Plot of Flank wear for Spindle speed")
# Model 1: all three machining parameters
model_1 <- lm(Flank.wear ~ Feed.rate + Cutting.depth + Spindle.speed, data = fwc)
summary(model_1)
# Model 2: Spindle speed dropped
model_2 <- lm(Flank.wear ~ Feed.rate + Cutting.depth, data = fwc)
summary(model_2)
# Compare the two nested models with an F-test
anova(model_1, model_2)
# Diagnostic plots for Model 2
plot(model_2)
# Predict the mean Flank wear with a confidence interval
X <- data.frame(Feed.rate = 2.5, Cutting.depth = 2.3)
predict_X <- predict(model_2, X, interval = "confidence")
predict_X
REFERENCES
[1] Hoang Van Ha. Bai giang xac suat thong ke.
[2] Nguyen Tien Dung (Ed.), Nguyen Dinh Huy. Xac suat Thong ke & Phan tich so lieu.
[3] Alain F. Zuur, Elena N. Ieno, Erik H. W. G. Meesters (2009). A Beginner’s Guide to R.
[4] Peter Dalgaard. Introductory Statistics with R, Second Edition.
[5] John Verzani. simpleR – Using R for Introductory Statistics.
[6] Definition of ANOVA: https://www.statisticshowto.com/probability-and-statistics/hypothesis-testing/anova/
[7] ANOVA example for Portland departing flights in 2014: https://ismayc.github.io/teaching/sample_problems/anova3.html
[8] The dataset: https://www.kaggle.com/datasets/drganeshkumars/flank-wear-of-cnc-lathe-tool-insert-dataset