Comparison of Machine Learning Approaches For Time-Series-Based Quality Monitoring of Resistance Spot Welding (RSW)
1 Introduction
Technological advances in computing, connectivity and sensing enable a
new wave of productivity improvement. We are currently in the era of Indus-
try 4.0, a term first coined by Kagermann et al (2011), which represents the
fourth technological revolution. In manufacturing, data mining, as one of the
key technologies of Industry 4.0 (Rüßmann et al, 2015), has created new
intelligent tools for automatic information extraction and knowledge discovery
(Wang, 2007; Nagorny et al, 2017). Automated analysis of sensor data from
industrial environments is important for practices such as identifying root causes
of exceptions or making correct decisions (Zhu et al, 2011).
Although enormous amounts of data are generated every day in automated
manufacturing, the acquired data are not always suitable for data analysis,
for example because labels are missing (Wuest et al, 2016). Resistance Spot
Welding (RSW) is a typical fully automated manufacturing process, widely applied
in the automobile industry, with 3000 to 6000 welding spots per car chassis. In
RSW, the two electrode caps of the welding gun (see Figure 1) press two or three
worksheets between the electrodes with force. An electric current then flows
from one electrode, through the worksheets, to the other electrode, generating a
substantial amount of heat as a result of electric resistance. The materials in a
small area between the two worksheets, known as the welding spot, will melt,
and form a weld nugget connecting the worksheets. The electrode caps directly
touching the worksheets wear out easily due to high thermo-mechanical loads
and oxidation, and need to be changed on a regular basis.
The quality of the welding spots is typically quantified by the weld nugget
diameter as defined in the international standards (ISO, 2004) and German
standard (DVS, 2016). The welding quality is then determined as accepted
or rejected according to specific tolerance bands. This work strives to predict
the spot diameter as a numeric value rather than a classification as accepted
or rejected, because the exact diameter values are of great interest for process
experts, who can potentially gain better insights by analyzing which input factors
would influence the diameter values to what extent. After predicting the numeric
values of spot diameters, a classification can still be made according to different
welding conditions and user-defined tolerance bands.
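Such a post-hoc classification can be sketched as follows; the 4.0 mm minimum diameter is a hypothetical tolerance value for illustration, not one taken from the cited standards:

```python
# Minimal sketch: turning predicted numeric diameters into accept/reject
# decisions. The tolerance band (4.0 mm minimum) is a hypothetical example
# value, not taken from the ISO/DVS standards cited above.
def classify_spots(predicted_diameters, d_min=4.0):
    """Classify each predicted nugget diameter against a tolerance band."""
    return ["accepted" if d >= d_min else "rejected" for d in predicted_diameters]

print(classify_spots([3.8, 4.2, 5.1]))  # → ['rejected', 'accepted', 'accepted']
```

Because the classification is derived afterwards, the same predicted diameters can be re-evaluated against different user-defined tolerance bands without retraining.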
However, the nugget diameter is difficult to measure, so the volume of data
with measured nugget diameters (labeled data, in data science terms) is low. The common
practice is to tear the welded worksheets apart and measure the nugget directly,
for example using a micrometer (see Figure 2). Performing quality control in
the automobile industry usually requires the destruction of several car chassis,
which is extremely expensive.
Many previous studies have adopted the approach of data-driven models.
Leveraging the power of data science, data-driven models can assess the nugget
diameters based on the recorded process data. These models are beneficial as
they can potentially perform quality control for every welding spot reliably,
ensuring process capability and reducing costs for quality control. Many authors
treated the problem as classification (Martín et al, 2007; Sun et al, 2017),
while others evaluated the predicted diameters as a regression problem (Ruisz
et al, 2007; El Ouafi et al, 2010; Wan et al, 2016). Their dataset size ranges
from approximately 10 (Cho and Rhee, 2004) to around 3000 labeled welding
spots (Boersch et al, 2016). Various methods were explored, including Random
Forests (Sumesh et al, 2015), Neural Networks (Afshari et al, 2014), etc. Most of
them used process curves, such as electrode displacement (Li et al, 2012), while
others also used scalar process parameters, such as coating (Yu, 2015).
Most of the previous studies have mentioned that their data is collected
from laboratory experiments. It is questionable whether models developed from
laboratory data are applicable in real production, as the welding conditions (such
as cooling time and wear) are usually different. Apart from the labeled data
amount problem, other difficulties exist in building successful data-driven models.
This paper summarizes these difficulties systematically as three challenges for
data analysis in RSW and in automatic manufacturing:
• Challenge 1 is the limit on labeled data amount, which results from the
difficulty in collecting labeled data in production or even in the laboratory.
• Challenge 2 is the limit on features. Additional sensors, like electrode
displacement, or extra measurements, such as actual worksheet thickness,
may be necessary for reliable quality prediction in manufacturing pro-
cesses. However, more sensors mean higher costs, and an increased risk of
expensive machine stops due to sensor failures. When deciding whether
to install an extra sensor, its benefit needs to outweigh the disadvantages.
It is, therefore, important to understand which sensors or measurements
(or features in data science) are necessary and to quantify their benefit to
justify the higher costs.
• Challenge 3 is the limit on coverage of relevant situations. Quality failures
are very unusual in manufacturing data, because the welding quality
is good under normal conditions, which is the most frequent case in
production. Moreover, quality failures may be of different types, making it
even more difficult to collect sufficient data for reliable quality prediction.
This work will address the first two challenges. They are interpreted as three
questions in collecting costly labeled data:
1. How much labeled data to collect at the start?
2. Which features are important?
3. Which precision level should the sensors have?
The authors of this paper suggest the use of physics-based simulation data
generated with a verified Finite Element Method (FEM) model (Schif, 2017)
to help overcome these three challenges. An established simulation model has
four advantages:
1. Each simulation run produces one labeled data point. Running many
simulations can thus provide a large amount of labeled data.
2. There is almost no limit on obtaining features that would be costly or
difficult to realize through sensor measurements in the laboratory or in production.
3. Using astutely designed scenarios, rare situations in production can be
studied in detail.
4. There is no measurement sensor precision issue in data generated from a
simulation model.
This work gives a short introduction to the FEM simulation to cover some points
that may interest the reader. The FEM simulation models mechanical
effects (elastic-plastic part deformation and thermal expansion), thermal effects
(temperatures and heat transfer) and electrical effects (electric current density
and electric potential field), taking into account non-linear changes of material
and contact properties (such as electrical and thermal contact conductivity).
Strong interactions between all three fields result in a very dynamic process
behavior and require a multi-field coupled simulation.
The model has been verified using measurements from lab tests under
controlled conditions. Compared quantities were curves of electrical voltage
drops, temperatures and electrode displacements measured during welding as
well as spot diameters and electrode imprints determined by destructive or optical
methods after welding. The main effort of the FEM simulation lies in:
2 Data Description
The most common conditions in production, i.e. only random variation without
spatter or other disturbances, have been selected as the simulation scenario for
the data studied in this paper. This simple scenario consists of one welding
machine, one type of worksheet pair with identical nominal sheet thickness and
material, and three welding programs for three different target spot diameters
(see Figure 3). A total of 13,952 welding spots with diameter measurements
were simulated. Two types of data exist for each welding spot.
• Time series: Process curves are series of values with time stamps,
referred to as time series in data science. The 20 extracted time
series comprise process input curves, such as electric current 𝐼, voltage 𝑈
and resistance 𝑅, and process feedback curves, such as electrode force 𝐹,
electrode displacement 𝑠, temperature 𝑇 at certain measurement
positions, etc.
• Single features: Numeric values or text strings that remain constant for a
welding process. These are nominal and measured geometry or material
properties of the caps and worksheets, the number of spots welded,
positions of the welding spots, welding programs, etc. The simulation
dataset can contain up to 235 single features.
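As an illustration, one welding-spot record combining both data types could be represented as below; all names and values are hypothetical placeholders, not taken from the actual simulation dataset:

```python
# Hypothetical sketch of how one welding-spot record from the simulation
# dataset could be represented: named time series (lists of sampled values)
# plus scalar single features and the diameter label. All names and values
# are illustrative only.
spot_record = {
    "time_series": {
        "current_I": [0.0, 5.2, 5.3, 5.3, 0.0],          # kA, one value per time stamp
        "voltage_U": [0.0, 1.1, 1.0, 1.0, 0.0],          # V
        "electrode_force_F": [2.5, 2.5, 2.6, 2.6, 2.5],  # kN
    },
    "single_features": {
        "prog_no": 2,            # welding program number
        "spots_welded": 137,     # spots welded with the current electrode cap
        "sheet_thickness": 1.5,  # mm, nominal
    },
    "label": {"diameter": 5.4},  # mm, nugget diameter from the simulation
}
```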
Figure 3: (a) Process curves for the three welding programs (ProgNo 1, 2, 3). ProgNo indicates the
program numbers. The welding programs prescribe the way the welding process is performed, by
specifying the process curves and some additional welding parameters. (b) Boxplot of the normalized
diameters for the three welding programs.
• Data splitting into training and test sets in this paper does not follow
conventional random splitting. In manufacturing, the historical data collected for
training may have slightly different statistical properties than the data in
the later application phase. As mentioned in Section 1, the electrode caps
are regularly changed in RSW. The caps in the collected training dataset
will therefore always be different from the caps in the test set. In this
paper, 14 caps are simulated in total. Data generated with 9 caps (7973
data points) is used for training, while data from the remaining 5 caps
(5979 data points) is used for testing.
• Data is split into subsets with different numbers of training data points for the
purpose of answering question 1 (“How much labeled data to collect at
the start?”). A series of training subsets is built by randomly selecting a
different number of data points from the training dataset. To allow direct
comparison of the testing results, the test dataset always contains the
same 5979 data points. Machine learning methods are applied to build
models using these subsets with different sizes of training data, and tested
on the same test set (see Table 1).
• Data is split into subsets of different features for the purpose of answering
question 2 (“Which features are important?”). All available features, i.e.
time series features as well as single features, are divided into four subsets
as in the following enumeration. By comparing the performance of
machine learning models trained using the four different feature subsets,
it is possible to estimate the importance of these features.
› Production Set: Features that are always available in production, such as current,
voltage, resistance (15 single features and 4 time series).
› LabLowCost Set: Production Set + features available in laboratory with relatively
low cost (16 single features and 10 time series).
› LabHighCost Set: LabLowCost Set + features available in laboratory with higher
cost (29 single features and 14 time series).
› Complete Set: LabHighCost Set + other features that are difficult to realize or
extremely costly (235 single features and 20 time series).
• Extra datasets with manually generated noise serve the purpose
of answering question 3 (“Which precision level should the sensors
have?”). Adding noise to the aforementioned 28 non-noisy subsets yields
28 corresponding noisy subsets. The noise levels are
a best engineering guess derived from discussions with process and
measurement experts.
› Sensor data (time series plus single features): Gaussian noise with 2 % standard
deviation (in the time series the noise is added for every single sample point).
› Spot diameter measurement: Gaussian noise with 0.1 mm standard deviation.
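The cap-wise splitting and the noise injection described above can be sketched as follows; the cap counts per split (9 training caps, 5 test caps) and the 2 % / 0.1 mm noise levels follow the text, while the data itself consists of synthetic placeholder values:

```python
import random

# Sketch of the cap-wise train/test split and the noise injection described
# above. Cap counts (9 training caps, 5 test caps) and the 2 % / 0.1 mm noise
# levels follow the text; the data itself is synthetic placeholder values.
random.seed(0)

# Each data point carries the ID of the electrode cap it was welded with.
data = [{"cap_id": cap, "diameter": 5.0 + random.random()}
        for cap in range(1, 15) for _ in range(10)]  # 14 caps, 10 spots each

train_caps = set(range(1, 10))          # caps 1..9 for training
train = [d for d in data if d["cap_id"] in train_caps]
test = [d for d in data if d["cap_id"] not in train_caps]  # remaining 5 caps

def add_noise(value, rel_std=None, abs_std=None):
    """Gaussian noise: relative std for sensor signals, absolute for labels."""
    std = value * rel_std if rel_std is not None else abs_std
    return value + random.gauss(0.0, std)

noisy_label = add_noise(train[0]["diameter"], abs_std=0.1)  # 0.1 mm std
noisy_sensor = add_noise(230.0, rel_std=0.02)               # 2 % std
```

Splitting by cap rather than by random shuffling mimics the application phase, in which the model always encounters caps it has never seen during training.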
First, some simple features are extracted from the time series: minimum,
maximum, minimum position, maximum position, mean, median, standard
deviation and length. Only these simple features are extracted in order to
keep the feature set interpretable. These extracted features combined
with all other single features are then evaluated and selected using step-forward
selection, starting with the evaluation of each single feature, and incrementally
adding more features. Features that result in models with the best regression
accuracy (RMSE) for the prediction of spot diameters are selected. After feature
selection, three machine learning methods (polynomial regression, neural
networks and k-nearest neighbors) are applied to build different data-driven
models. These models are implemented using the MATLAB toolbox SciXMiner
developed by Mikut et al (2017).
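The time-series feature extraction described above can be sketched in a few lines; this is a plain-Python illustration, not the SciXMiner implementation used in the paper:

```python
import statistics

# Sketch of the simple time-series feature extraction described above:
# minimum, maximum, their positions, mean, median, standard deviation, length.
def extract_features(ts):
    return {
        "min": min(ts),
        "max": max(ts),
        "min_pos": ts.index(min(ts)),
        "max_pos": ts.index(max(ts)),
        "mean": statistics.mean(ts),
        "median": statistics.median(ts),
        "std": statistics.stdev(ts),
        "length": len(ts),
    }

feats = extract_features([0.0, 2.0, 4.0, 2.0])
# e.g. feats["max"] == 4.0, feats["max_pos"] == 2, feats["length"] == 4
```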
The feature extraction and machine learning models are intentionally kept
simple, for better generalization and automation in other similar applications
in automatic manufacturing. The characteristics of the three machine learning
methods and the reasons for choosing them are explained below.
The hyper-parameters of these models are chosen with 5-fold cross validation
with exemplary training datasets. For all machine learning models trained on
subsets without noise, the hyper-parameters are chosen with the non-noisy
subset of all training data (with 7973 training data points) and the Production
Feature Set. For all machine learning models trained with subsets with noise,
this is performed again with the noisy subset of all training data (with 7973
training data points) and the Production Feature Set.
The hyper-parameter selection results in a first order polynomial regression
model without interaction terms (namely a linear regression model, referred to
as Polynomial1), a multi-layer perceptron with one hidden layer of 16 neurons
(referred to as MLP16), and a k-nearest neighbors model (referred to as KNN3).
Details are listed in Table 2.
Table 2: Machine learning models and hyper-parameters selected through 5-fold cross validation.
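Hyper-parameter selection by 5-fold cross validation can be sketched as below; this pure-Python example tunes the k of a k-nearest-neighbors regressor on illustrative data, standing in for the MATLAB SciXMiner toolbox actually used:

```python
import math, random

# Sketch (pure Python, standing in for the SciXMiner toolbox used in the
# paper) of choosing k for a k-nearest-neighbors regressor by 5-fold cross
# validation, scored by RMSE. Data and candidate values are illustrative.
random.seed(0)
X = [[random.uniform(-1, 1)] for _ in range(100)]
y = [x[0] ** 2 + random.gauss(0, 0.02) for x in X]  # noisy quadratic target

def knn_predict(train_X, train_y, query, k):
    """Average target of the k nearest training points."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: abs(train_X[i][0] - query[0]))
    return sum(train_y[i] for i in nearest[:k]) / k

def cv_rmse(k, folds=5):
    """5-fold cross-validated RMSE for a given k."""
    n, errs = len(X), []
    for f in range(folds):
        test_idx = set(range(f, n, folds))
        tr_X = [X[i] for i in range(n) if i not in test_idx]
        tr_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            errs.append((knn_predict(tr_X, tr_y, X[i], k) - y[i]) ** 2)
    return math.sqrt(sum(errs) / len(errs))

best_k = min([1, 3, 5, 9], key=cv_rmse)  # candidate grid is illustrative
```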
The experiments show that many features are highly correlated, which causes
problems in feature selection due to random effects. For example, features that
are selected for their better performance in the feature selection stage could show
better or worse performance in a later stage of performance comparison, or even in
another repetition of training and testing. To circumvent this problem, when training
machine learning models on subsets with more features, the features already selected
from the smaller feature sets are always retained, and the models may additionally
select further features. This issue of highly correlated features is worth noting,
as the phenomenon may be prevalent in manufacturing data analysis.
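The nested selection strategy, retaining features from smaller sets while greedily adding more, can be sketched as follows; the scoring model (leave-one-out 1-nearest-neighbor RMSE), the feature names and the data are illustrative assumptions:

```python
import math, random

# Sketch of greedy step-forward feature selection, seeded with features
# already selected from a smaller feature set. The scoring model
# (leave-one-out 1-NN RMSE), feature names and data are illustrative.
random.seed(1)
n = 80
features = {name: [random.uniform(0, 1) for _ in range(n)]
            for name in ["f1", "f2", "f3", "f4"]}
y = [features["f1"][i] + 0.5 * features["f3"][i] for i in range(n)]

def rmse_1nn(selected):
    """Leave-one-out 1-NN RMSE using only the selected features."""
    errs = []
    for i in range(n):
        j = min((k for k in range(n) if k != i),
                key=lambda k: sum((features[f][k] - features[f][i]) ** 2
                                  for f in selected))
        errs.append((y[j] - y[i]) ** 2)
    return math.sqrt(sum(errs) / n)

def forward_select(seed, candidates, steps=1):
    selected = list(seed)  # features from the smaller set are always kept
    for _ in range(steps):
        best = min(candidates - set(selected),
                   key=lambda f: rmse_1nn(selected + [f]))
        selected.append(best)
    return selected

result = forward_select(seed=["f1"], candidates={"f2", "f3", "f4"})
```

Seeding with the previously selected features keeps the results for the four feature subsets comparable, even when correlated features would otherwise swap in and out between repetitions.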
4 Results
To answer the three questions proposed in Section 1, it is necessary to compare
the performance of the various models built with different methods and trained
with different subsets. In order to compare the performance, an appropriate
measure is important. Various performance measures have been proposed in the
literature. This section first discusses these measures, then
uses the selected measure to compare the models and provides answers to the
proposed questions. Diameter prediction is treated as a regression problem in
this work, as process experts can obtain more information from a predicted
diameter than a mere good/bad classification.
[Figure 5: Testing results on non-noisy datasets over training data number (100 to 7973) for the
Production, LabLowCost, LabHighCost and All feature sets: (a) RMSE in mm, (b) correlation
coefficient, (c) and (d) fraction of predictions with error within 5 and 10 percent.]
RMSE (or MSE; see Figure 5 a) stands for the root mean squared error (or mean
squared error) between the reference diameters and the predicted diameters. It
is a classic performance measure for regression tasks and has been used in many
previous studies.
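The measures discussed here can be stated compactly in code; this is a generic sketch with illustrative diameter values in mm:

```python
import math

# Sketch of the performance measures discussed in this section: RMSE and the
# fraction of predictions whose relative error stays within a given percentage.
def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted diameters."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def error_within(y_true, y_pred, percent):
    """Fraction of predictions with relative error at most `percent` %."""
    hits = sum(abs(t - p) / abs(t) <= percent / 100 for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

y_true = [5.0, 5.2, 4.8, 5.1]  # illustrative reference diameters in mm
y_pred = [5.1, 5.0, 4.9, 5.6]  # illustrative predicted diameters in mm
# rmse(y_true, y_pred) ≈ 0.28 mm; error_within(y_true, y_pred, 5) == 0.75
```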
Figure 6: Testing results of MLP16 on non-noisy datasets (over training data number for the
Production, LabLowCost, LabHighCost and All feature sets).
Figure 7: Testing results of KNN3 on non-noisy datasets (over training data number for the same
feature sets).
[Figure: fraction of predictions with error within 5 percent over training data number, comparing
Polynomial1, MLP16 and KNN3.]
A comparison of models trained with non-noisy and noisy data can be made by
contrasting Figure 5 c and Figure 9. The comparison suggests that performance
deteriorates strongly with noisy data. Generally speaking, the differences between
models trained on the three feature subsets Production, LabLowCost and
LabHighCost are insignificant.
data from production or laboratory. Reviewing the starting point of the three
questions, the following conclusions are made based on the datasets generated
by a simulation model in the selected simple scenario (see Section 2), serving
as guidance for a first plan of data collection:
This paper uses data analysis as guidance for data collection. The iterative
process data collection → data analysis → data-analysis-guided data collection
→ data analysis is an attempt to combine data collection and analysis. A similar
practice would be particularly necessary in fields where limits on the amount of
labeled data, on features, and on the coverage of relevant situations are a
problem of interest.
Acknowledgements The authors would like to thank Simon Waczowicz for his help using the
SciXMiner toolbox, Alexander Schif for providing the data, Friedhelm Günter, Florian Schmid, Martin
Dieterle for their valuable input on improving the quality of the paper, and Katherine Quinlan-Flatter
for proofreading this article.
References
Afshari D, Sedighi M, Karimi MR, Barsoum Z (2014) Prediction of the Nugget Size in
Resistance Spot Welding with a Combination of a Finite-element Analysis and an
Artificial Neural Network. Materiali in tehnologije 48(1):33–38.
DOI: 10.26634/jms.3.1.3366.