Adaptive Laser Welding Control: A Reinforcement Learning Approach
ABSTRACT Despite extensive research efforts in the field of laser welding, the imperfect repeatability of
the weld quality still represents an open topic. Indeed, the inherent complexity of the underlying physical
phenomena prevents the implementation of an effective controller using conventional regulators. To close
this gap, we propose the application of Reinforcement Learning for closed-loop adaptive control of welding
processes. The presented system is able to autonomously learn a control law that achieves a predefined weld
quality independently from the starting conditions and without prior knowledge of the process dynamics.
Specifically, our control unit influences the welding process by modulating the laser power and uses optical
and acoustic emission signals as sensory input. The algorithm consists of three elements: a smart agent
interacting with the process, a feedback network for quality monitoring, and an encoder that retains only
the quality-critical events from the sensory input. Based on the data representation provided by the encoder,
the smart agent decides the output laser power accordingly. The corresponding input signals are then analyzed
by the feedback network to determine the resulting process quality. Depending on the distance to the targeted
quality, a reward is given to the agent. The latter is designed to learn from its experience by taking the actions
that maximize not just its immediate reward, but the sum of all the rewards that it will receive from that
moment on. Two learning schemes were tested for the agent, namely Q-Learning and Policy Gradient. The
required training time to reach the targeted quality was 20 min for the former technique and 33 min for the
latter.
INDEX TERMS Laser welding, laser material processing, reinforcement learning, policy gradient,
Q-learning, closed-loop control.
A less common approach, but one that is worth investigating, is based on more sophisticated regulators that rely on differential models of the process [5], [6]. But in the case of LW, a reliable model can be complicated to obtain, as it has to take into account many factors that can drastically vary the process, such as the heating and melting dynamics [5]. Nevertheless, a preliminary attempt can be found in Na et al. [7], where the authors presented an algorithm that automatically builds a model during the operation using the Hammerstein identification technique.

An example of the actual use of a model-based controller for laser processes was proposed by Song and Mazumder [6], where an experimentally identified model was involved for predictive control of laser cladding — a process that is closely related to LW. This technique heavily relies on its model for the choice of the actions to take according to their impact on the environment evaluated with the model itself. To be specific, a closed-loop process was used to steer the melt pool temperature to a reference temperature profile. In a real-life scenario, unfortunately, this approach has two major drawbacks. First, the temperature of the melt pool is not uniformly distributed over its surface [8]. Second, the optimal temperature profile can vary during the process, as it strictly depends on the geometry, e.g., on the proximity to the edges or the boundaries of the workpiece. Thus, the tracking of a single fixed target has a direct impact on the system performance and thus on the desired result.

Similarly, Bollig et al. [9] showed promising results by modeling the non-linear process with an Artificial Neural Network and controlling the laser power with a linear model predictive algorithm based on the instantaneous linearization of the neural network itself. In this case, the regulator aimed to track a reference penetration depth detected from the intensity of the plasma's optical emission. However, the experimental calibration curve used to map the measured intensity to the penetration depth may diverge from its real-life values, limiting the application of the same methodology in broader scenarios.

In this context, there is a clear need for a widely applicable, robust, and cost-effective process control system that ensures high-quality standards. In particular, we focus on deep keyhole welding, where the process complexity is even higher compared to other welding regimes, such as conduction welding.

This welding regime is indeed characterized by the co-existence — within a limited volume — of vapor, melt, and plasma phases of the processed material [10]. Moreover, it possesses an extremely complex energy-coupling mechanism that includes Fresnel absorption (due to multiple reflections inside the vapor channel) [11]. These complex phenomena generate many process instabilities, making keyhole welding prone to defects even under constant laser irradiation [10]. Specifically, one of the most critical defects is porosity. Pores are problematic since they are located inside the material and may substantially weaken the mechanical strength of the welding joint [12].

The design of a keyhole LW control system is made all the more challenging by the partial observability of the laser process. In fact, in-depth information on the PZ can only be obtained indirectly, either by acoustic emission (AE) sensors or by surface measurements using optical emission (OE) sensors [12]. Consequently, it is difficult to provide effective feedback from the process to the control system, since it requires the correlation of the surface measurements with the sub-surface events (e.g., pore formation), which is not a trivial task [12]. Nevertheless, some pilot works in LW monitoring report successes in identifying quality-critical momentary events from the corresponding AE and OE signals from the processed zone [13], [14].

The present study starts from the aforementioned preliminary results of process monitoring and focuses on the use of Reinforcement Learning (RL) towards keyhole LW closed-loop control.

RL appears to be an attractive approach since it enables a model-free learning scheme that is capable of solving complex problems and provides high adaptability to specific conditions through active interaction with a given process [15]. Moreover, we take advantage of recent advances in Deep Convolutional Neural Network (DCNN) developments [16], [17] to derive efficient representations of the laser process from the high-dimensional sensory input — the AE and OE signals from the PZ — and use them to generalize previous experiences to new situations [18]. In our case, indeed, the input data from the sensors do not contain an explicit representation of the physical state of the system, as they are limited to the optical and acoustic emission. As shown by Mnih et al. [18], DCNNs can overcome — and even take advantage of — this condition, allowing the system to learn the meaningful position and scale of irregular structures in the data.

Concerning the recent advances of RL, its application towards LW was discussed in Günther et al. [19], where a dynamic model substituted the real laser process, and a camera-based system and photodiodes were used for process monitoring. RL was able to efficiently search for strategies for modulating the laser irradiation to compensate for the mentioned process instabilities.

Despite the successes of this work, the efficiency of RL in more complex LW processes remains an open question. To close this gap, we inspected the performance of our methodology in the case of keyhole LW and evaluated its outcomes in terms of the evolution of the weld quality over time during training. Firstly, the AE and OE signatures of the desired weld quality were given to the algorithm, as well as several signatures of undesirable qualities, without any other prior information about the process dynamics. The search for the optimal process control strategy was then carried out in a completely autonomous way. Two RL techniques were investigated in this contribution, namely Q-Learning [20] and Policy Gradient [21], in order to analyze their strengths and weaknesses in this particular application.
FIGURE 1. (a) Scheme of the experimental setup and (b) its picture. The labels of the individual components in (a) and
(b) correspond to each other.
FIGURE 2. Structure of the complete control unit, made up of three main building blocks: an encoder that processes the data from the sensory input to retain only the quality-critical events, a smart agent interacting with the welding process, and a feedback network based on a convolutional neural network for quality monitoring.
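To make the interplay of the three blocks concrete, the following sketch outlines a single control step in Python. The component interfaces (encoder, agent, feedback network, laser and sensor handles) and the reward shaping are illustrative assumptions, not the authors' implementation; the actual reward values of Table 1 are not reproduced here.

# Sketch of one closed-loop control step (cf. Fig. 2). All component
# interfaces below are hypothetical placeholders, not the authors' code.
def control_step(sensors, encoder, agent, feedback_net, laser, target_class):
    raw = sensors.read_window()                # AE + OE signals for one time window
    state = encoder.encode(raw)                # compressed representation of quality-critical events
    power = agent.select_action(state)         # laser power chosen by the smart agent
    laser.set_power(power)                     # act on the welding process
    next_raw = sensors.read_window()           # observe the effect of the action
    next_state = encoder.encode(next_raw)
    quality = feedback_net.classify(next_raw)  # weld-quality category from the feedback network
    # Stand-in reward: higher when the detected quality is closer to the target category.
    reward = -abs(quality - target_class)
    return state, power, next_state, reward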
(i) using some policy, collect a dataset of transitions:

    \{(s_t, a_t, s_{t+1}, r_t)\}_{t=1,2,\dots}    (2)

(ii) for every transition, compute:

    y_t = r_t + \gamma \max_a Q^{\pi}_{\theta}(s_{t+1}, a)    (3)

(iii) update the parameters θ:

    \theta \leftarrow \arg\min_{\theta} \sum_t \big\| Q^{\pi}_{\theta}(s_t, a_t) - y_t \big\|^2,    (4)

where Q^π_θ denotes the functional approximator of the function Q^π given by a parametric function with parameters θ. In this contribution, θ represents the weights and biases of a DCNN that takes as input the ordered pair (s_t, a_t) and outputs an estimate of Q^π(s_t, a_t). γ is a discount factor ∈ (0, 1) that weighs future rewards less than immediate ones, r_t is the reward collected at time t, and y_t is a momentary target for the computation of the so-called Bellman update in (4) [37].

The minimization problem in (4) can be solved using gradient descent methods. Therefore, it can be addressed using the techniques for loss minimization that are common in DL frameworks [42], [43].

In order to promote the exploration of the state space at the beginning of the training, we used the so-called epsilon-greedy technique for step (i) of FQI [15]. This strategy consists in the use of the following policy for the collection of the transitions:

    \pi(a_t \mid s_t) = \begin{cases} 1 - \varepsilon, & \text{if } a_t = \arg\max_a Q^{\pi}_{\theta}(s_t, a) \\ \dfrac{\varepsilon}{|A| - 1}, & \text{otherwise,} \end{cases}    (5)

where |A| is the cardinality of the set A and ε ∈ (0, 1). Following (5), at each timestamp the algorithm chooses either a random action with probability ε, or the best action according to the current Q^π estimate with probability 1 − ε. As the training progresses, ε is progressively reduced. This procedure encourages the exploration of the environment at the very beginning of the training and the exploitation of the acquired knowledge at the end.

To reduce oscillations or divergence of the policy, the momentary target y_t and the Q-value Q^π_θ(s_t, a_t) were estimated using two separate networks, known as the target network (Q^π_{θ_t}) and the Q-network (Q^π_θ), respectively [18]. During the interaction with the environment, the parameters of the target network are cyclically updated with the parameters of the Q-network. Additionally, in our study, the Double Q-Learning technique was used [44]. It consists in using the Q-network to evaluate the action to take — using Q^π_θ in (5) — and the target network to evaluate the momentary target y_t — using Q^π_{θ_t} instead of Q^π_θ in (3). The reason is an efficient decorrelation between the noise in the action selection and the noise in the Q-value estimation, which is a common problem for standard Q-Learning realizations [44].

Moreover, to avoid bad local minima and to reduce the correlation between observations, a replay buffer B was introduced, as in Mnih et al. [18]. In particular, during step (i) of FQI, the collected transitions are added to B. During step (ii), we randomly sampled a batch of the accumulated transitions from B and used those to compute the targets y_t through the target network (see (3)). Finally, the updates of the parameters θ of the Q-network were carried out using (4).

Here one of the key advantages of the introduction of the encoder manifests itself. Indeed, it allows a dimensionality reduction of the input — the reduction factor was 300 in our setup — allowing us to use a bigger buffer B while avoiding GPU memory saturation.

The advantages and disadvantages of Q-Learning can be explained by the way the targets are computed in FQI. As can be seen in (3), the observed reward of just one transition is used to calculate the targets y_t. In addition, the first term r_t in (3) is significant when the estimation of Q^π_θ is inaccurate, as it is a real reward and not an estimate. In contrast, the second term γ max_a Q^π_θ(s_{t+1}, a) in (3) is relevant only when the estimation of Q^π_θ is reliable, as it is an estimate of the total future reward, which is supposed to be higher than the current one.

Consequently, during the Bellman updates (see (4)), the algorithm relies more and more on its own estimate of the Q-value as soon as that estimate becomes sufficiently large. In Q-Learning, as a result, the strategy of sharply reducing the variance of the estimates (the Q-values) is adopted, at the price of a high bias.
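A compact sketch of one FQI iteration, combining the epsilon-greedy policy of (5), the replay buffer, and the Double Q-Learning target of (3), is given below in PyTorch-style Python. For brevity the network here maps a state to one Q-value per discrete action instead of scoring (s_t, a_t) pairs, and all names (q_net, target_net, buffer) are assumptions rather than the authors' code.

import random
import torch
import torch.nn.functional as F

def select_action(q_net, state, epsilon, n_actions):
    """Epsilon-greedy policy of (5); epsilon is decayed externally as training progresses."""
    if random.random() < epsilon:
        return random.randrange(n_actions)                    # explore
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1))   # exploit the current Q estimate

def fqi_update(q_net, target_net, optimizer, buffer, batch_size=64, gamma=0.99):
    """One Bellman update, cf. (3)-(4), with a replay buffer and the Double Q-Learning target."""
    states, actions, next_states, rewards = buffer.sample(batch_size)
    with torch.no_grad():
        # Q-network selects the next action; target network evaluates it (Double Q-Learning).
        next_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        y = rewards + gamma * target_net(next_states).gather(1, next_actions).squeeze(1)
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_sa, y)                                 # squared Bellman error of (4)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Periodically: target_net.load_state_dict(q_net.state_dict())  # cyclic target update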
G. POLICY GRADIENT
As mentioned above, the main limitation of Q-Learning is the high bias in the estimation of the Q-values. This bias originates from the single-step reward estimator for the targets y_t. The Policy Gradient (PG) approach [15], [45], [46] aims to overcome these limits by evaluating the total reward over an entire episode. Similarly to other RL algorithms, the objective of PG is to find the policy that maximizes the expected total reward in one episode of T steps. But contrary to Q-Learning, PG does not try to estimate the optimal Q-values, but rather the parameters of the policy approximating the optimal policy π*:

    \theta^{*} = \arg\max_{\theta} J(\theta),    (6)

where

    J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=1}^{T} r(s_t, a_t) \right],    (7)

and θ stands for the policy parameters. In our case, θ represents the weights and the biases of a DCNN that takes as input the current sensory representation provided by the encoder (see Section III-A) and outputs the action to be taken (e.g., the power of laser irradiation).
In PG, the functional J(θ) is estimated as:

    J(\theta) \approx \hat{J}(\theta) = \sum_{t=1}^{T} r(s_t, a_t).    (8)

The optimization of the objective J(θ) is carried out by directly differentiating its estimate Ĵ(θ) and using gradient ascent to update the parameters as:

    \theta \leftarrow \theta + \alpha \nabla_{\theta} \hat{J}(\theta).    (9)

In particular, the gradient of the objective in (8) is computed as [45], [46]:

    \nabla_{\theta} \hat{J}(\theta) = \left[ \sum_{t=1}^{T} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t) \right] \left[ \sum_{t=1}^{T} r(s_t, a_t) \right].    (10)

Clearly, the entire approach relies on a single-sample estimate of the full expectation (cf. (8)) that, even if unbiased, has a very high variance. For this reason, even though this method is potentially able to provide better results than Q-Learning in terms of the learned policy, it surely requires more learning time.

The implementation of PG was carried out by first randomly initializing the parameters of the policy π_θ and then sampling a trajectory (i.e., collecting all the transitions (s_t, a_t, s_{t+1}, r_t) within a single episode). The logarithm of the action probabilities, as well as the rewards collected along the trajectory, were accumulated and used to calculate the policy's gradient according to (10). Finally, the parameters were updated following the direction of improvement indicated by the gradient (cf. (9)).
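The episode-wise update of (8)–(10) corresponds to the classic REINFORCE estimator; a minimal PyTorch-style sketch is shown below. The environment and network interfaces are placeholders assumed for illustration, not the authors' implementation.

import torch

def pg_episode_update(policy_net, optimizer, env):
    """One Policy Gradient episode: sample a trajectory, then apply (9)-(10)."""
    log_probs, rewards = [], []
    state, done = env.reset(), False
    while not done:
        logits = policy_net(state.unsqueeze(0))                # scores over the discrete power levels
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()                                 # stochastic policy pi_theta
        log_probs.append(dist.log_prob(action))
        state, reward, done = env.step(int(action))            # placeholder environment interface
        rewards.append(reward)

    episode_return = sum(rewards)                              # single-sample estimate of J(theta), cf. (8)
    # REINFORCE gradient of (10): sum of log-probabilities scaled by the episode return;
    # gradient ascent on J is implemented as gradient descent on the negative objective.
    loss = -episode_return * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return episode_return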
IV. RESULTS AND DISCUSSION
A. RESULTS
Prior to starting the interaction with the environment, the preparation of the algorithm included two stages, namely: i) collection of the signal database for training the classifier and the encoder, and ii) definition of a reward function.

The first step is motivated by the fact that the classifier and the encoder — to fulfill their role of guiding the smart agent during its learning process — have to learn to recognize not just the reference quality, but also several other counter-examples. For this reason, we collected the acoustic and optical signals from multiple weld experiments at various laser powers (20, 40, 60, 80, and 120 W).

It must be emphasized that the weld quality theoretically depends not only on the laser power but also on the workpiece velocity and its physical properties, such as the optical and thermal ones [10]. But in this work, since the latter factors were invariable, the former is used to define the weld quality.

The sensors' signals were acquired during three weld experiments at each laser power, then partitioned into samples of 20 ms (see Section III-D for details), and finally grouped into 5 categories according to the weld quality in terms of penetration depth, identified via optical inspection of both the surface and the cross-section of the workpieces.

Based on the optical inspection, the categories were defined as insignificant penetration (achieved with a laser power of 20 W), poor penetration (40 W), medium penetration (60 W), highest penetration without pores (80 W), and porosity (120 W). In total, each category consisted of 150 samples.
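As an illustration of this data preparation, the snippet below splits one recorded sensor trace into 20 ms windows and labels each window with the quality category of its weld. The sampling rate and array layout are assumptions made for the example.

import numpy as np

FS_HZ = 100_000       # assumed sampling rate for the example (samples per second)
WINDOW_S = 0.020      # 20 ms windows, as used for the classifier/encoder database

CATEGORIES = {        # laser power (W) -> quality-category index
    20: 0,            # insignificant penetration
    40: 1,            # poor penetration
    60: 2,            # medium penetration
    80: 3,            # highest penetration without pores (reference)
    120: 4,           # porosity
}

def window_and_label(signal: np.ndarray, laser_power_w: int):
    """Split one AE/OE trace into fixed 20 ms samples labelled by its weld category."""
    win = int(FS_HZ * WINDOW_S)
    n_windows = len(signal) // win
    windows = signal[: n_windows * win].reshape(n_windows, win)
    labels = np.full(n_windows, CATEGORIES[laser_power_w], dtype=np.int64)
    return windows, labels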
The second stage concerns the definition of the reward function, which determines the reward assignment from the feedback network to the smart agent. Considering that the agent is designed to act so as to maximize the collected rewards in the long run, the engineering of the reward is crucial, since it influences the learning process. The reward assigned for every weld quality detected by the classifier used in our experiments is reported in Table 1.

TABLE 1. Rewards assigned for every category detected by the classifier.

After the preparation, we let the algorithm interact with the environment in a completely autonomous way without any further interventions. The performance of both Q-Learning and Policy Gradient is shown in Fig. 3, where the red line represents the average value of the rewards obtained in every episode, whereas the shaded area denotes the standard deviation.

FIGURE 3. Performance in terms of average reward per episode over time for Q-Learning and Policy Gradient. The red line represents the average reward over an episode, whereas the shaded area indicates the standard deviation. An episode corresponds to the weld of a line of 10 mm and has a duration of 1 s. Between one line and the next, we wait for 10 s to permit the agent to update its parameters and to allow the stage to move to a new unprocessed position.

The average reward of Q-Learning reached a plateau after approximately 110 episodes, i.e., after performing 110 line welds of 10 mm. Taking into consideration the fact that we wait for 10 s after each line — to permit the agent to update its parameters and to allow the stage to move to a new unprocessed position — this learning period corresponds to about 20 minutes. In contrast, PG reached a plateau only after 180 episodes (33 minutes). In both cases, additional learning time had little effect in terms of quality increment, and it only increased the cost in terms of wasted materials and time.

The dynamics of the agent's adaptation to the given process can be vividly seen in the evolution of the welds using optical inspections of the surfaces and cross-sections of the processed material. Fig. 4 presents the optical images of the welds corresponding to the first, the 40th, the 80th, and the 110th episode of the Q-Learning training process. To be specific, Fig. 4 (a) shows the light microscope images of the top views of different episodes, whereas Fig. 4 (b) shows the corresponding cross-sections.

It has to be noted that the results in Fig. 4 show an evolution of the weld quality that is consistent with the increment of the reward observed in Fig. 3. Indeed, in Fig. 4 (a), episode 1 — i.e., the beginning of the training — signs of unstable controlled
laser power can be seen on the weld surface. The black marks on the weld correspond to oxidation, which is also an indication of local overheating due to inaccurate laser control, leading to a poor weld quality in terms of mechanical properties [12].

This aspect is even more evident from the cross-sections (Fig. 4 (b), episode 1), which are characterized by rapid variations of the weld penetration depth along the line. In this specific case, the local overheating of the material was taking place due to the application of too high a level of laser power, generating a highly unstable keyhole that led to the trapping of pores inside the material during the keyhole collapse [10]. The red arrows highlight the pore locations in the magnification in Fig. 4 (b).

After 40 trials, i.e., about 7 min from the beginning of the training (Fig. 4, episode 40), the welds started to be characterized by smoother changes in surface texture and penetration depth.

Confirming the positive trend, significant signs of progress are obtained after performing 40 more welds (Fig. 4 (a), episode 80, about 15 min from the beginning), when the texture of the weld surface started to present no perceivable non-uniformities. Nevertheless, some fluctuations in the penetration depth can still be observed (Fig. 4 (b), episode 80).

Finally, a weld comparable to the reference one was only achieved after the completion of another 30 episodes — see Fig. 4 (a), episode 110 (about 20 min from the start), when the welds began to be characterized by a uniform surface texture and a constant penetration depth. Fig. 4 (c) also shows the light microscope images of the cross-sections for the trained controlled and reference welds, respectively. As described in Section II-D, the latter was realized after an exhaustive search of the laser parameters and achieved a weld depth of 150 µm, as shown in Fig. 4 (c), top image. As can be noticed, no measurable differences between the trained controlled weld and the reference one can be found.

Similarly, PG showed identical results apart from a different convergence rate. Indeed, the convergence took about 1.6 times more time compared to Q-Learning (see Fig. 3).

B. DISCUSSION
While the classifier is of unquestionable fundamental importance, as it allows the monitoring of the process, the use of the encoder, on the other side, is debatable. The encoder has indeed some pros and cons that were not obvious before the experiments. As stated in Section III-A, its advantages consist of an effective reduction of the state-space dimensionality that potentially simplifies the search for the optimal parameters of the smart agent, by capturing a proper parametrization of the signal that can focus only on quality-critical events.

In contrast, its drawbacks derive from its output representation, which may not be entirely suited for deriving the dynamics of the system, as its temporal resolution is non-uniform [47]. As a result, the sensitivity of the algorithm to some actions could be reduced, potentially leading to poor process control.

To verify the effectiveness of the encoder, we also tried to exclude it from the processing pipeline and directly provide the high-dimensional raw signals from the sensors as input to the agent. This resulted in a marginally slower convergence rate in terms of the number of episodes (in the order of tens of episodes), but the two strategies were able to achieve the same results.

We believe that this behavior can be explained by the very first convolutional layer of the agent (see Fig. 2) that, if provided with raw signals, can take over the encoder's duty of delivering a good signal representation to the following layers. However, when excluding the encoder, the computations were slowed down due to the larger input quantities, and we had to increase the time between each episode.

It also has to be mentioned that the present work was realized in a well-controlled laboratory environment and with reliable custom equipment. These controlled conditions provided a more reproducible laser-material interaction during the welds, as they included the processing of always the same material with consistent material properties, as well as flat surfaces with identical surface roughness.
FIGURE 4. Training dynamics of the Q-Learning algorithm in terms of welding quality. (a) light microscope pictures of the top
view of the welded surface at discrete time points of the algorithm’s training; (b) corresponding light microscope pictures of
the cross-section of the welds from (a). The magnification for the first episode is shown on the right. The red arrows indicate
the pores inside the material; (c) reference weld and controlled weld after the completion of the training procedure. The
numbering of the episodes started from the beginning of the training procedure and is indicated on the vertical axis. The arrow
at the bottom shows the direction of the laser scan. The white borders denote the boundary of the weld. The deep weld
penetration at the beginning of each line constitutes the initial condition from which the algorithm needs to regulate the
power.
The well-controlled environment could also be the reason for the small size of the database needed to train the encoder and the classifier, and this detail may be significantly different in industrial conditions.

V. CONCLUSIONS
This work presents the first results of a study on adaptive closed-loop control of laser welding based on RL applied to a real-life setup.

The developed system includes an encoder that derives efficient representations from the sensory input for the active unit, a feedback network, and a smart agent — the active unit itself — that can influence the laser process. The principle of operation is the following: based on the current sensory input provided by the encoder, the agent chooses an action, which leads to a change of its sensory input, and receives a reward — an indirect quality measure of the state the agent ends up in. From this experience — made up of the past sensory input, the executed action, the current input, and the received reward — the agent tries to optimize the outcomes of its actions over time.

In standard RL approaches, the reward signal is provided by the environment and is straightforward to derive. In laser welding, conversely, effective feedback is challenging to provide, as the process is only partially observable, since in-depth information on the PZ can be obtained only indirectly from conventional sensors. This reason motivates the introduction of the feedback network: a complete monitoring system based on a DCNN classifier capable of tracking the weld quality in real time.

In the present work, the control unit was implemented to regulate the output laser power while using the acoustic and optical emissions as sensory input. The potential of the system was demonstrated by its capability — without prior knowledge of the process dynamics — to reach a reference weld quality autonomously. The latter was chosen to be represented by the weld with the highest depth achievable without porosity in a Ti grade 5 workpiece, to meet the industrial demand for high-quality keyhole welding. This reference weld was determined experimentally and attained a weld depth of 150 µm without porosity at a laser power of 80 W.

To guide the smart agent, the feedback network and the encoder were trained to recognize not just the reference quality, but also several other counter-examples. For this reason, we collected the acoustic and optical signals from 15 weld experiments at various laser powers, namely 20, 40, 60, 80, and 120 W.

The signals were then grouped into 5 categories according to the corresponding weld quality in terms of penetration depth, which were identified via optical inspection of both the surfaces and the cross-sections of the workpieces, and further partitioned into samples of 20 ms. This time span was chosen by taking into consideration the requirement of very high classification accuracy and a computation time within the range of 1–5 ms.

After the DCNN classifier and the encoder were trained, the smart agent started its interaction with the laser process by performing line welds with the output laser power being controlled autonomously.

We tested two learning schemes — Q-Learning and Policy Gradient — and evaluated their performance both in terms of the evolution of rewards over time and of the resulting weld quality.

The training time needed for the two algorithms to reach the reference quality was 20 minutes and 33 minutes, respectively. After that time, there was no additional observable increment of weld quality and rewards.

The present results demonstrate the ability of RL to learn a control law for laser welding processes autonomously.
This prospect is highly appealing for the industrial sector, as the unit can deal with complex processes without costly simulation and computational tools. Furthermore, the sensor technologies exploited in the present work are commercially available and ready for industrial implementation. It must be emphasized that the proposed framework can also operate with other feedback sensor signals — pyrometers, microphones, or additional photodiodes — making it a rather versatile tool. Further experiments are planned to explore the potential of this approach in more complex conditions, e.g., with surface irregularities or at the interface between two different materials. Additionally, we will increase the number of control variables, including the workpiece velocity and its distance from the laser source. Finally, the RL algorithms will be further enriched with techniques for faster convergence, higher operating frequency, better adaptation under changing materials, and varying noise levels.

REFERENCES
[1] D. Bäuerle, Laser Processing and Chemistry. Berlin, Germany: Springer, 1996.
[2] J. R. Berretta and W. Rossi, "Laser welding," in Encyclopedia of Tribology. Boston, MA, USA: Springer, 2013, pp. 1969–1981.
[3] A. R. Konuk, R. G. K. M. Aarts, A. J. H. I. Veld, T. Sibillano, D. Rizzi, and A. Ancona, "Process control of stainless steel laser welding using an optical spectroscopic sensor," Phys. Procedia, vol. 12, pp. 744–751, Jan. 2011.
[4] S. Postma, "Weld pool control in Nd:YAG laser welding," Ph.D. dissertation, Dept. Eng. Technol., Univ. Twente, Amsterdam, The Netherlands, 2003. [Online]. Available: https://research.utwente.nl/en/publications/weld-pool-control-in-nd-yag-laser-welding
[5] A. Papacharalampopoulos, P. Stavropoulos, and J. Stavridis, "Adaptive control of thermal processes: Laser welding and additive manufacturing paradigms," Procedia CIRP, vol. 67, pp. 233–237, Jan. 2018.
[6] L. Song and J. Mazumder, "Feedback control of melt pool temperature during laser cladding process," IEEE Trans. Control Syst. Technol., vol. 19, no. 6, pp. 1349–1356, Nov. 2011.
[7] X. Na, Y. Zhang, Y. Liu, and B. Walcott, "Nonlinear identification of laser welding process," IEEE Trans. Control Syst. Technol., vol. 18, no. 4, pp. 927–934, Jul. 2010.
[8] P. A. Hooper, "Melt pool temperature and cooling rates in laser powder bed fusion," Additive Manuf., vol. 22, pp. 548–559, Aug. 2018.
[9] A. Bollig, D. Abel, C. Kratzsch, and S. Kaierle, "Identification and predictive control of laser beam welding using neural networks," in Proc. Eur. Control Conf. (ECC), Sep. 2003, pp. 2457–2462.
[10] M. Courtois, M. Carin, P. Le Masson, S. Gaied, and M. Balabane, "A complete model of keyhole and melt pool dynamics to analyze instabilities and collapse during laser welding," J. Laser Appl., vol. 26, no. 4, Nov. 2014, Art. no. 042001.
[11] X. Jin, L. Li, and Y. Zhang, "A study on Fresnel absorption and reflections in the keyhole in deep penetration laser welding," J. Phys. D, Appl. Phys., vol. 35, p. 2304, Sep. 2002.
[12] J. Stavridis, A. Papacharalampopoulos, and P. Stavropoulos, "Quality assessment in laser welding: A critical review," Int. J. Adv. Manuf. Technol., vol. 94, pp. 1825–1847, Feb. 2018.
[13] S. Shevchik, T. Le-Quang, B. Meylan, F. V. Farahani, M. P. Olbinado, A. Rack, G. Masinelli, C. Leinenbach, and K. Wasmer, "Supervised deep learning for real-time quality monitoring of laser welding with X-ray radiographic guidance," Sci. Rep., vol. 10, no. 1, p. 3389, Dec. 2020.
[14] S. A. Shevchik, T. Le-Quang, F. V. Farahani, N. Faivre, B. Meylan, S. Zanoli, and K. Wasmer, "Laser welding quality monitoring via graph support vector machine with data adaptive kernel," IEEE Access, vol. 7, pp. 93108–93122, 2019.
[15] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: A Bradford Book, 2018.
[16] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Neural Inf. Process. Syst., vol. 25, 2012, pp. 1097–1105.
[17] Y. Bengio, "Learning deep architectures for AI," Found. Trends Mach. Learn., vol. 2, pp. 1–27, Jan. 2009.
[18] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
[19] J. Günther, P. M. Pilarski, G. Helfrich, H. Shen, and K. Diepold, "Intelligent laser welding through representation, prediction, and control learning: An architecture with deep neural networks and reinforcement learning," Mechatronics, vol. 34, pp. 1–11, Mar. 2016.
[20] C. J. Watkins and P. Dayan, "Technical note: Q-learning," Mach. Learn., vol. 8, nos. 3–4, pp. 279–292, May 1992.
[21] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Proc. 12th Int. Conf. Neural Inf. Process. Syst. Cambridge, MA, USA: MIT Press, 1999, pp. 1057–1063.
[22] L. Bassi, "Industry 4.0: Hope, hype or revolution?" in Proc. IEEE 3rd Int. Forum Res. Technol. Soc. Ind. (RTSI), Sep. 2017, pp. 1–6.
[23] T. Le-Quang, S. A. Shevchik, B. Meylan, F. Vakili-Farahani, M. P. Olbinado, A. Rack, and K. Wasmer, "Why is in situ quality control of laser keyhole welding a real challenge?" Procedia CIRP, vol. 74, pp. 649–653, Jan. 2018.
[24] F. Vakili-Farahani, J. Lungershausen, and K. Wasmer, "Process parameter optimization for wobbling laser spot welding of Ti6Al4V alloy," Phys. Procedia, vol. 83, pp. 483–493, Jan. 2016.
[25] S. Shevchik, T. Le Quang, B. Meylan, and K. Wasmer, "Acoustic emission for in situ monitoring of laser processing," in Proc. 33rd Eur. Conf. Acoustic Emission Test. (EWGAE), 2018, pp. 1–10.
[26] F. Vakili-Farahani, J. Lungershausen, and K. Wasmer, "Wavelet analysis of light emission signals in laser beam welding," J. Laser Appl., vol. 29, no. 2, May 2017, Art. no. 022424.
[27] J. Yang, S. Sun, M. Brandt, and W. Yan, "Experimental investigation and 3D finite element prediction of the heat affected zone during laser assisted machining of Ti6Al4V alloy," J. Mater. Process. Technol., vol. 210, no. 15, pp. 2215–2222, Nov. 2010.
[28] J. Willems, E. Kikken, and B. Depraetere, "Low-dimensional learning control using generic signal parametrizations," IFAC-PapersOnLine, vol. 52, no. 29, pp. 280–285, 2019.
[29] G. E. Hinton and R. S. Zemel, "Autoencoders, minimum description length and Helmholtz free energy," in Proc. 6th Int. Conf. Neural Inf. Process. Syst. San Francisco, CA, USA: Morgan Kaufmann, 1993, pp. 3–10.
[30] J. C. Ye and W. K. Sung, "Understanding geometry of encoder-decoder CNNs," in Proc. 36th Int. Conf. Mach. Learn. (ICML), Jun. 2019, pp. 12245–12254.
[31] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," in Proc. 4th Int. Conf. Learn. Represent. (ICLR), 2016. [Online]. Available: https://arxiv.org/abs/1511.06434
[32] P. Baldi, "Autoencoders, unsupervised learning, and deep architectures," in Proc. ICML Workshop Unsupervised Transf. Learn., vol. 27, I. Guyon, G. Dror, V. Lemaire, G. Taylor, and D. Silver, Eds., Bellevue, WA, USA, Jul. 2012, pp. 37–49.
[33] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. 32nd Int. Conf. Mach. Learn. (ICML), Lille, France, vol. 1, Jul. 2015, pp. 448–456.
[34] H. Ide and T. Kurita, "Improvement of learning for CNN with ReLU activation by sparse regularization," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), May 2017, pp. 2684–2691.
[35] R. Ayachi, M. Afif, Y. Said, and M. Atri, "Strided convolution instead of max pooling for memory efficiency of convolutional neural networks," in Proc. Int. Conf. Sci. Electron., Technol. Inf. Telecommun. (Smart Innovation, Systems and Technologies, vol. 146). Berlin, Germany: Springer, 2020, pp. 234–243.
[36] E. O. Neftci and B. B. Averbeck, "Reinforcement learning in artificial and biological systems," Nature Mach. Intell., vol. 1, no. 3, pp. 133–143, Mar. 2019.
[37] R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press, 2010.
[38] S. S. Mousavi, M. Schukat, and E. Howley, "Deep reinforcement learning: An overview," in Proc. SAI Intell. Syst. Conf. (IntelliSys), vol. 16, 2018, pp. 426–440.
[39] M. Telgarsky, "Benefits of depth in neural networks," J. Mach. Learn. Res., vol. 49, pp. 1517–1539, Feb. 2016.
[40] P. Petersen and F. Voigtlaender, "Equivalence of approximation by convolutional neural networks and fully-connected networks," Proc. Amer. Math. Soc., vol. 148, no. 4, pp. 1567–1581, Dec. 2019.
[41] S. Lange, T. Gabel, and M. Riedmiller, "Batch reinforcement learning," in Reinforcement Learning (Adaptation, Learning, and Optimization), vol. 12. Berlin, Germany: Springer, 2012, pp. 45–73. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-642-27645-3_2#citeas
[42] P. Mishra and P. Mishra, "Introduction to PyTorch, tensors, and tensor operations," in PyTorch Recipes. New York, NY, USA: Apress, 2019, pp. 1–27.
[43] S. Abrahams, D. Hafner, E. Erwitt, and A. Scarpinelli, TensorFlow for Machine Intelligence: A Hands-On Introduction to Learning Algorithms. Santa Rosa, CA, USA: Bleeding Edge Press, 2016. [Online]. Available: https://dl.acm.org/doi/book/10.5555/3125813
[44] H. V. Hasselt, "Double Q-learning," in Proc. 23rd Int. Conf. Neural Inf. Process. Syst., vol. 2. Red Hook, NY, USA: Curran Associates, 2010, pp. 2613–2621.
[45] J. Peters and S. Schaal, "Reinforcement learning of motor skills with policy gradients," Neural Netw., vol. 21, no. 4, pp. 682–697, May 2008.
[46] S. Levine and V. Koltun, "Guided policy search," in Proc. 30th Int. Conf. Mach. Learn., vol. 28, S. Dasgupta and D. McAllester, Eds. Atlanta, GA, USA: PMLR, Jun. 2013, pp. 1–9. [Online]. Available: http://proceedings.mlr.press/v28/levine13.html
[47] G. Arvanitidis, L. K. Hansen, and S. Hauberg, "Latent space oddity: On the curvature of deep generative models," in Proc. 6th Int. Conf. Learn. Represent. (ICLR), 2017, pp. 1–16.

GIULIO MASINELLI (Member, IEEE) received the B.Sc. degree in electrical engineering from the University of Bologna, Italy, in 2017, and the M.Sc. degree in electrical engineering (with data science specialization) from the Swiss Federal Institute of Technology in Lausanne (EPFL), Lausanne, Switzerland, in 2019. He is currently pursuing the Ph.D. degree with the Swiss Federal Laboratories for Materials Science and Technology (EMPA) and EPFL, mainly developing machine learning algorithms for data analysis and industrial automation. His research interests include signal processing and machine learning, with emphasis on deep learning.

TRI LE-QUANG received the B.S. degree in applied physics from Vietnam National University, Ho Chi Minh City, Vietnam, in 2007, the M.Sc. degree in optics from the Friedrich-Schiller-Universität Jena, Germany, in 2013, and the Ph.D. degree in material engineering from the Instituto Superior Tecnico Lisboa, Portugal, in 2017. Since 2017, he has been working as a Postdoctoral Researcher with EMPA, Swiss Federal Laboratories for Materials Science and Technology, Laboratory of Advanced Materials Processing. His research interests include laser material processing, laser technology, and in-situ monitoring.

SILVIO ZANOLI received the B.Sc. degree in electrical engineering from the University of Bologna, Italy, in 2017, and the M.Sc. degree in electrical engineering (with data science and IoT specialization) from the Swiss Federal Institute of Technology in Lausanne (EPFL), Lausanne, Switzerland, in 2019, where he is currently pursuing the Ph.D. degree in electrical engineering (with data science specialization). His research interests are in signal processing, machine learning, and the IoT, with particular attention to low-energy solutions.

KILIAN WASMER (Member, IEEE) received the B.S. degree in mechanical engineering from Applied University, Sion, Switzerland, and Applied University, Paderborn, Germany, in 1999, and the Ph.D. degree in mechanical engineering from Imperial College London, Great Britain, in 2003. He joined the Swiss Federal Laboratories for Materials Science and Technology (EMPA), Thun, Switzerland, in 2004, to work on the control of crack propagation in semiconductors. He currently leads the Group of Dynamical Processes, Laboratory for Advanced Materials Processing (LAMP). His research interests include materials deformation and wear, crack propagation prediction, and material-tool interaction. In the last years, he has focused his work on in situ and real-time observation of complex processes using acoustic and optical sensors in various fields, such as tribology, fracture mechanics, and laser processing. He is in the director committee for additive manufacturing of Swiss Engineering. He is also a member of Swiss Tribology, the European Working Group of Acoustic Emission (EWGAE), and Swissphotonics.

SERGEY A. SHEVCHIK received the M.Sc. degree in control from the Moscow Engineering Physics Institute, Russia, in 2003, and the Ph.D. degree in biophotonics from the General Physics Institute, Russia, in 2005. He stayed until 2009 as a Postdoctoral Researcher at the General Physics Institute. From 2009 to 2012, he was with the Kurchatov Institute, Russia, developing human–machine interfaces. In 2012 and 2014, he was with the University of Bern, investigating multi-view geometry. Since 2014, he has been with the Swiss Federal Laboratories for Materials Science and Technology (EMPA), working on industrial automation. His current interest is in signal processing.