


Received April 16, 2020, accepted May 8, 2020, date of publication May 27, 2020, date of current version June 15, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2998052

Adaptive Laser Welding Control: A Reinforcement Learning Approach

GIULIO MASINELLI 1,2 (Member, IEEE), TRI LE-QUANG 1, SILVIO ZANOLI 2, KILIAN WASMER 1 (Member, IEEE), AND SERGEY A. SHEVCHIK 1

1 Laboratory for Advanced Materials Processing, Swiss Federal Laboratories for Materials Science and Technology (EMPA), 3602 Thun, Switzerland
2 Embedded Systems Laboratory, Swiss Federal Institute of Technology in Lausanne (EPFL), 1015 Lausanne, Switzerland
Corresponding author: Kilian Wasmer ([email protected])
This work was supported by the Swiss Federal Laboratories for Materials Science and Technology (EMPA).

ABSTRACT Despite extensive research efforts in the field of laser welding, the imperfect repeatability of
the weld quality still represents an open topic. Indeed, the inherent complexity of the underlying physical
phenomena prevents the implementation of an effective controller using conventional regulators. To close
this gap, we propose the application of Reinforcement Learning for closed-loop adaptive control of welding
processes. The presented system is able to autonomously learn a control law that achieves a predefined weld
quality independently from the starting conditions and without prior knowledge of the process dynamics.
Specifically, our control unit influences the welding process by modulating the laser power and uses optical
and acoustic emission signals as sensory input. The algorithm consists of three elements: a smart agent
interacting with the process, a feedback network for quality monitoring, and an encoder that retains only
the quality-critical events from the sensory input. Based on the data representation provided by the encoder,
the smart agent decides the output laser power accordingly. The corresponding input signals are then analyzed
by the feedback network to determine the resulting process quality. Depending on the distance to the targeted
quality, a reward is given to the agent. The latter is designed to learn from its experience by taking the actions
that maximize not just its immediate reward, but the sum of all the rewards that it will receive from that
moment on. Two learning schemes were tested for the agent, namely Q-Learning and Policy Gradient. The
required training time to reach the targeted quality was 20 min for the former technique and 33 min for the
latter.

INDEX TERMS Laser welding, laser material processing, reinforcement learning, policy gradient,
Q-learning, closed-loop control.

I. INTRODUCTION

Laser welding (LW) is a crucial technology for many industrial sectors, including automotive production, maritime, medical, aerospace, and micromechanics [1]. On the one hand, its advantages are in non-contact processing — avoiding tool wear, ability to process refractory materials, and higher processing rate and joint quality compared to traditional welding processes [2]. On the other hand, LW's main disadvantages derive from the highly complex underlying physical phenomena involved in the process. Thus, despite many developments of this technology, LW still suffers from imperfect quality repeatability, limiting its applications in industrial production requiring high-quality standards.

In the literature, the most commonly reported approach to increase the repeatability of the weld quality is the application of traditional regulators, such as proportional-integral (PI) or proportional-integral-derivative (PID) controllers [3], [4]. These methods allow tracking the desired weld quality using measurements of the surface temperature or the surface shape of the process zone (PZ) as feedback. Unfortunately, since they are based on the linearization of the non-linear welding dynamics, they can only operate in a narrow range of the process parameters. This operating range, moreover, has to be established during a preliminary exhaustive experimental search, which is very time- and material-consuming, making the entire methodology undesirable in an industrial environment.

The associate editor coordinating the review of this manuscript and approving it for publication was Jianyong Yao.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A less common approach, but one that is worth investigating, is based on more sophisticated regulators that rely on differential models of the process [5], [6]. But in the case of LW, a reliable model can be complicated to obtain, as it has to take into account many factors that can drastically vary the process, such as the heating and melting dynamics [5]. Nevertheless, a preliminary attempt can be found in Na et al. [7], where the authors presented an algorithm that automatically builds a model during the operation using the Hammerstein identification technique.

An example of the actual use of a model-based controller for laser processes was proposed by Song and Mazumder [6], where an experimentally identified model was involved for predictive control of laser cladding — a process that is closely related to LW. This technique heavily relies on its model for the choice of the actions to take according to their impact on the environment evaluated with the model itself. To be specific, a closed-loop process was used to steer the melt pool temperature to a reference temperature profile. In a real-life scenario, unfortunately, this approach has two major drawbacks. First, the temperature of the melt pool is not uniformly distributed over its surface [8]. Second, the optimal temperature profile can vary during the process, as it strictly depends on the geometry, e.g., on the proximity to the edges or the boundaries of the workpiece. Thus, the tracking of a single fixed target has a direct impact on the system performance and so on the desired result.

Similarly, Bollig et al. [9] showed promising results by modeling the non-linear process with an Artificial Neural Network and controlling the laser power with a linear model predictive algorithm based on the instantaneous linearization of the neural network itself. In this case, the regulator aimed to track a reference penetration depth detected from the intensity of the plasma's optical emission. However, the experimental calibration curve used to map the measured intensity to the penetration depth may diverge from its real-life values, limiting the application of the same methodology in broader scenarios.

In this context, there is a clear need for a widely applicable, robust, and cost-effective process control system that ensures high-quality standards. In particular, we focus on deep keyhole welding, where the process complexity is even higher compared to other welding regimes, such as conduction welding. This welding regime is indeed characterized by the co-existence — within a limited volume — of vapor, melt, and plasma phases of the processed material [10]. Moreover, it possesses an extremely complex energy-coupling mechanism that includes Fresnel absorption (due to multiple reflections inside the vapor channel) [11]. These complex phenomena generate many process instabilities, making keyhole welding prone to defects even under constant laser irradiation [10]. Specifically, one of the most critical defects is porosity. Pores are problematic since they are located inside the material and may substantially weaken the mechanical strength of the welding joint [12].

The design of a keyhole LW control system is made all the more challenging by the partial observability of the laser process. In fact, in-depth information of the PZ can only be indirectly obtained either by acoustic emission (AE) sensors or by surface measurements using optical emission (OE) sensors [12]. Consequently, it is difficult to provide effective feedback from the process to the control system, since it requires the correlation of the surface measurements with the sub-surface events (e.g., pore formation), which is not a trivial task [12]. Nevertheless, some pilot works in LW monitoring report successes in identifying quality-critical momentary events from the corresponding AE and OE signals from the processed zone [13], [14].

The present study starts from the aforementioned preliminary results of process monitoring and focuses on the use of Reinforcement Learning (RL) towards keyhole LW closed-loop control. RL appears to be an attractive approach since it enables a model-free learning scheme that is capable of solving complex problems and provides high adaptability to specific conditions through active interaction with a given process [15]. Moreover, we take advantage of recent advances in Deep Convolutional Neural Network (DCNN) developments [16], [17] to derive efficient representations of the laser process from the high-dimensional sensory input — the AE and OE signals from the PZ — and use them to generalize previous experiences to new situations [18]. In our case, indeed, the input data from the sensors do not contain an explicit representation of the physical state of the system, as they are just limited to the optical and acoustic emission. As shown by Mnih et al. [18], DCNNs can overcome — and even take advantage of — this condition, allowing the system to learn the meaningful position and scale of irregular structures in the data.

Concerning the recent advances of RL, its application towards LW was discussed in Günther et al. [19], where a dynamic model substituted the real laser process, and a camera-based system and photodiodes were used for process monitoring. RL was able to efficiently search for strategies for modulating the laser irradiation to compensate for the mentioned process instabilities. Despite the successes of this work, the efficiency of RL in more complex LW processes remains an open question. To close this gap, we inspected the performance of our methodology in the case of keyhole LW and evaluated its outcomes in terms of the evolution of the weld quality over time during training. Firstly, the AE and OE signatures of the desired weld quality were given to the algorithm, as well as several signatures of undesirable qualities, without any other prior information about the process dynamics. Further search for the optimal process control strategy was carried out in a completely autonomous way. Two RL techniques were investigated in this contribution: Q-Learning [20] and Policy Gradient [21], in order to analyze their strengths and weaknesses in this particular application.


This paper is divided into five sections. Section II describes the experimental setup and the hardware of the control system. Section III describes the developed algorithms, including details on signal dimensionality reduction and the feedback network used for process monitoring. Section IV presents and discusses the results. Finally, Section V concludes this work and gives the perspective of its further developments that would allow LW to operate autonomously and, thus, bring it closer to intelligent manufacturing within the Industry 4.0 framework [22].

II. EXPERIMENTAL PROCEDURE, MATERIALS, ACQUISITION, AND CONTROL

The experimental setup was similar to the one used in a previous work [14], and therefore just a summary is given in this contribution.

A. EXPERIMENTAL SETUP
A schematic representation of the setup is presented in Fig. 1, along with its picture. The main components were: a laser source, an optical laser head, a workpiece holder — mounted on a moving stage — and an AE sensor. The laser source was a fiber laser system StarFiber150P (Coherent Switzerland AG, Switzerland), with a maximum output power of 250 W, a wavelength of 1070 nm, and a diameter of the laser spot of 30 µm (within 2w0) at the workpiece surface. The source was operated in continuous-wave (CW) mode with the possibility to modulate the output laser power using an external voltage source within a voltage range of 0–5 V. More details are given in Le-Quang et al. [23].

The laser experiments were performed in air at atmospheric pressure. To prevent the potential oxidation of the weld, an adequate Ar flow was directed to the PZ via a nozzle. The flow was kept constant at a pressure of 1.5 atm during all experiments. In order to realize line welds, a workpiece was mounted on a linear stage M-663.5U (Physik Instrumente GmbH, Germany), and moved at a constant velocity of 10 mm/s during the process. The movement of the workpiece was synchronized with the laser source so that the irradiation started only when the stage had already reached the set velocity.

The aforementioned setup provided the realization of different LW regimes leading to various welding qualities [14], [24]–[26], including no illumination (laser power P = 0 W), conduction welding, keyhole without porosity, and keyhole with porosity. It must be emphasized that, in terms of process parameters, the weld quality also depends on the velocity of the workpiece. This work, however, was focused on the control of the laser power that, in our setup, can be dynamically modulated via the external voltage generator, as described. Consequently, this process parameter was considered as the sole control variable.

B. SENSORS
The laser head was equipped with a customized optical system that allowed delivering the back-reflected radiation from the PZ to three photodiodes. These sensors are based on Silicon (Si), Germanium (Ge), and InGaAs and are sensitive within the ranges of 450–850 nm, 1000–1200 nm, and 1250–1700 nm, respectively. The Ge sensor was equipped with a narrow bandpass optical filter (FB1070-10, Thorlabs Inc., USA) with a center wavelength of 1070 ± 2 nm to only sense the back-reflected laser radiation from the PZ. In addition to the optical sensors, an AE sensor PICO (Physical Acoustics, USA) was placed in tight contact with the workpiece, as shown in Fig. 1 (a). The sensor was sensitive within the range 500–1850 kHz. Its purpose is to detect the AE shockwaves generated inside the workpiece during welding.

C. MATERIAL
The workpieces were 2 mm thick plates of titanium alloy (Ti6Al4V, grade 5) with a melting temperature of 1,650 °C. This material was chosen due to its extensive industrial usage, including the medical sector. Additionally, its Heat-Affected Zone (HAZ) can be easily recognized in cross-sections due to the remarkable textural changes [27].

D. REFERENCE QUALITY DEFINITION
To meet the industrial demand for high-quality keyhole welding [12], we defined our reference weld as the one with the highest achievable penetration depth without the presence of pores. In addition to previous experiences [14], [24], several experiments were carried out, taking advantage of the well-controlled welding conditions of our setup that allowed us to reproduce different penetration depths precisely. Each experimental weld was verified by analyzing the cross-sections of the processed workpieces. Finally, the investigations led to a reference weld characterized by a laser power of 80 W and a resulting penetration depth of 150 µm. Every increment in laser power resulted in the introduction of porosity, whereas every decrement corresponded to shallower welds.

E. DATA ACQUISITION AND COMPUTATIONS
In order for the control system to reach a real-time response given the high-dimensional input from the sensors, a combination of specialized hardware and software was used. The hardware included a PC equipped with an Intel i7-8750H processor (Intel, USA) that operated at a frequency up to 4.1 GHz, and a Graphics Processing Unit NVidia GTX 2080 Ti (Nvidia, USA). The signals from all four sensors described in Section II-B were acquired with a high-speed DAQ card Advantech 1840 (Advantech, Taiwan) with four independent input ports for data digitalization. All signals were digitized with a sampling rate of 1 MHz, and their acquisition was triggered when the intensity of the back-reflected laser light detected by the Ge photodiode exceeded a fixed threshold (0.1 V).
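To make the interface operations described above concrete, the following is a minimal sketch (not the authors' code) of the two steps at the boundary of the setup: gating the acquisition on the Ge photodiode threshold and converting a laser-power command into the 0–5 V modulation voltage. The linear power-to-voltage mapping and all function names are illustrative assumptions.

# Sketch of the acquisition trigger and the laser-power command path.
# Assumptions (not from the paper): a linear 0-5 V <-> 0-250 W mapping; helper names are illustrative.
MAX_VOLTAGE_V = 5.0      # external modulation input range of the laser source
MAX_POWER_W = 250.0      # maximum output power of the StarFiber150P source
TRIGGER_LEVEL_V = 0.1    # Ge photodiode threshold that starts the acquisition

def acquisition_triggered(ge_photodiode_sample_v: float) -> bool:
    """Return True once the back-reflected laser light exceeds the trigger threshold."""
    return ge_photodiode_sample_v > TRIGGER_LEVEL_V

def power_to_control_voltage(power_w: float) -> float:
    """Map a requested laser power to the 0-5 V modulation voltage (assumed linear)."""
    power_w = min(max(power_w, 0.0), MAX_POWER_W)
    return MAX_VOLTAGE_V * power_w / MAX_POWER_W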


FIGURE 1. (a) Scheme of the experimental setup and (b) its picture. The labels of the individual components in (a) and
(b) correspond to each other.

The choice of the Ge sensor as a trigger for the acquisition is based upon the very high intensity of the back-reflected laser radiation at the beginning of the process, when the reflectivity of the workpiece is the highest [23]. To dynamically modulate the laser power, the control signal provided by the RL algorithm was transmitted to the laser source via an external USB unit Advantech 4751L (Advantech, Taiwan). The latter converted the digital values calculated by the RL models into a direct voltage value, which was then delivered to the laser source via a cable connection (see Fig. 1 for details). The time delay between the output from the USB unit and the laser response was experimentally measured to be 0.57 ± 0.25 ms.

The real-time acquisition routine of the input signals using the DAQ board, the data processing in the GPU, and the transmission of the computed control signal to the laser source were carried out with in-house custom-made software. In particular, the data acquisition program was coded in C# in Visual Studio 2017, Community edition. Conversely, the high-level data processing was realized in Python 3.7. Finally, the Deep Learning (DL) library involved was PyTorch (www.pytorch.org), version 1.1.0.

III. DATA PROCESSING

The structure of the developed data processing is schematically presented in Fig. 2. The entire control unit consists of three main building blocks: an encoder that processes the data from the measurements to retain only the quality-critical events, a smart agent interacting with the welding process, and a feedback network based on a DCNN for quality monitoring. Before even starting the interaction with the environment, the encoder and the feedback network were trained using a database consisting of 750 signals acquired from previous experiments covering the whole operating range of the laser process. The signals were divided into 5 categories according to the corresponding penetration depth identified with optical inspection of the cross-section of the processed material (more details in Section IV).

A. ENCODER
The encoder was used to reduce the dimensionality of the sensory input of the agent, preserving, at the same time, the structure of the original data while minimizing the computational time. The introduction of this unit was motivated by a resulting simplification of the search for the optimal control law for the smart agent. Indeed, the projection of the high-dimensional input data into a low-dimensional latent space allows capturing a "good" parametrization of the signal that focuses only on quality-critical events that the user can settle by carefully choosing the training data [28]. To be specific, we based our encoder on a DCNN due to the proven abilities of convolutional networks to explicitly model signals by finding their meaningful degrees of freedom [29], [30]. Indeed, DCNNs also exhibit excellent generative properties [31], which motivates their use as encoders.

Following traditional architectures [30], [32], our DCNN encoder included four convolution layers. Moreover, each convolution was enforced with a batch normalization layer to speed up the training [33]. The activation consisted of a rectified linear unit (ReLU) that is more efficient in multi-layer architectures, as it diminishes the gradient vanishing problem [34]. The summarization of the input information is achieved gradually through the convolutional layers by adopting strided convolutions [35].

As stated, the training of the encoder was carried out separately, prior to the interaction with the environment. During training, a decoder with a symmetrical structure was added to process the encoder output. Specifically, in the decoder, the convolutions of the encoder were replaced with their reciprocal transposed convolutions. The two models were then trained end-to-end to minimize the mean square error between the training input signals and the output of the decoder [30], [32]. After training, the decoder was removed, thus keeping the encoder standalone to provide a low-dimensional signal representation as input for the smart agent.
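As an illustration of this architecture, the following is a minimal PyTorch sketch of a convolutional encoder-decoder trained on the reconstruction loss, as described above. The exact kernel sizes, strides, channel counts, latent dimension, and window length used in the paper are not fully reported here, so the values below are assumptions chosen only to reproduce the structure (four strided convolutions with batch normalization and ReLU, a mirrored transposed-convolution decoder, and an MSE objective).

import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Four strided 1D convolutions with batch normalization and ReLU (channel counts assumed)."""
    def __init__(self, in_channels=4, latent_channels=16):
        super().__init__()
        chs = [in_channels, 16, 32, 64, latent_channels]
        layers = []
        for c_in, c_out in zip(chs[:-1], chs[1:]):
            layers += [nn.Conv1d(c_in, c_out, kernel_size=4, stride=4),
                       nn.BatchNorm1d(c_out),
                       nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # x: (batch, sensor channels, samples)
        return self.net(x)

class ConvDecoder(nn.Module):
    """Mirror of the encoder: transposed convolutions back to the original resolution."""
    def __init__(self, out_channels=4, latent_channels=16):
        super().__init__()
        chs = [latent_channels, 64, 32, 16]
        layers = []
        for c_in, c_out in zip(chs[:-1], chs[1:]):
            layers += [nn.ConvTranspose1d(c_in, c_out, kernel_size=4, stride=4),
                       nn.BatchNorm1d(c_out),
                       nn.ReLU()]
        layers.append(nn.ConvTranspose1d(chs[-1], out_channels, kernel_size=4, stride=4))
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

encoder, decoder = ConvEncoder(), ConvDecoder()
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

# One training step on a stand-in batch: 4 sensor channels, window length padded to a
# stride-compatible size (the real 20 ms window holds 20,000 samples per sensor at 1 MHz).
x = torch.randn(8, 4, 20480)
reconstruction = decoder(encoder(x))
loss = nn.functional.mse_loss(reconstruction, x)
optimizer.zero_grad()
loss.backward()
optimizer.step()
# After training, the decoder is discarded and encoder(x) serves as the agent's input.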


FIGURE 2. Structure of the complete control unit made up of three main building blocks: an encoder that processes the data from the sensory input to retain only the quality-critical events, a smart agent interacting with the welding process, and a feedback network based on a convolutional neural network for quality monitoring.

B. FEEDBACK NETWORK
As seen in the introduction, RL is a learning paradigm leading to the design of algorithms that directly interact with an environment and learn via trial and error. Nevertheless, learning by doing is only effective if we can define a notion of reward, something that motivates the intelligent system to behave appropriately. For this reason, the full setup depicted in Fig. 2 included a feedback network based on a DCNN classifier and a summation unit.

This unit is based on our previous work [13], where the AE and OE signals from the PZ were used to identify quality-critical momentary events. In this contribution, the output of the classifier is made up of labels that correspond to predefined welding qualities in terms of penetration depth and pore content. The DCNN classifier shares the initial two convolution layers with the encoder, as shown in Fig. 2. This detail allows the classifier to reuse the good feature representation learned by the encoder. The final decision on the quality is taken in two fully connected layers that are closed by a softmax layer. In analogy to the encoder, the training of the classifier was carried out prior to the operation of the entire system with the preliminarily collected signal database.

To provide the reward signal, the output of the classifier (i.e., the label of the current momentary quality) was compared with the label of the reference signal in the summation unit (see Fig. 2). In case of significant differences, the smart agent is granted negative rewards; otherwise, positive rewards are assigned (more details in Section IV).

C. SMART AGENT
The final building block is constituted by the smart agent, whose purpose is to interact with the environment — in this case, the laser process — by taking actions, i.e., modulating the laser power. Practically, the agent communicates with the output board that, in turn, delivers the control signal to the laser source (see Section II-E for more details).

The principle of operation is the following: based on the representation of the current sensory input provided by the encoder, the agent chooses an action, which leads to a change in the sensory input, and receives a reward from the feedback network. From this experience — made up of the past sensory input, the executed action, the current input, and the received reward — the agent tries to optimize the outcomes of its actions over time, i.e., to maximize the reward over a defined time horizon. In our case, the considered time horizon corresponds to the time required to perform a single line weld of 10 mm (1 s in this work, see Section II-A).

In the remainder of the article, we refer to this 10 mm line weld as an episode. Operating in an episodic fashion — i.e., by individually welding lines of 10 mm — permits the algorithm to update its parameters between one line and the next, and allows the stage to move to a new unprocessed position to be able to start over. For training the agent, two RL techniques were tested in this study, and their descriptions are given in the next subsections.
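To make the interplay of the three blocks in Fig. 2 concrete, the following is a minimal sketch of one episode of the loop described above. It is not the authors' implementation: the reward values are illustrative stand-ins for Table 1, and the helper objects (daq, laser, encoder, agent, classifier) are assumptions used only to show the data flow.

import torch

# Illustrative per-label rewards; labels far from the reference quality receive negative rewards.
REWARD_BY_LABEL = {0: -1.0, 1: -0.5, 2: -0.25, 3: 1.0, 4: -1.0}

def run_episode(daq, laser, encoder, agent, classifier, power_levels, steps=50):
    """One 10 mm line weld (50 windows of 20 ms): encode -> act -> acquire -> reward."""
    transitions = []
    with torch.no_grad():
        window = daq.read_window()                     # first 20 ms of AE/OE signals
        state = encoder(window)
        for _ in range(steps):
            action = agent.select_action(state)        # index into the allowed laser-power levels
            laser.set_power(power_levels[action])      # command sent through the output board
            window = daq.read_window()                 # signals produced under the new power
            next_state = encoder(window)
            label = int(classifier(window).argmax())   # momentary weld quality (feedback network)
            reward = REWARD_BY_LABEL.get(label, -1.0)  # summation unit vs. the reference label
            transitions.append((state, action, reward, next_state))
            state = next_state
    return transitions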


D. PARAMETER TUNING
Assuming the use of a conventional RL learning scheme, we can output a single action after the defined sensory input is available, i.e., after a predetermined number of data points is acquired from the AE and OE sensors. Hence, the length of the input window determines the operational frequency, that is, the rate at which the control unit can modify the laser power. A small window increases the system readiness to adapt to new welding conditions, but, unfortunately, it also raises the sensitivity to noise of the feedback network [14]. In contrast, a large window increases the monitoring accuracy and eases the internal timing constraints, but reduces the number of actions per unit time. In this sense, the window length is crucial, as it is a trade-off between system readiness to react to different stimuli and monitoring accuracy. A good compromise was found by fixing the window length to 20 ms, thus setting the operating frequency to 50 Hz.

The entire system was also sensitive to multiple other parameters, including the size of the convolutional kernels used in the DCNN and the dimensionality of the encoder output. The adjustments of these parameters were carried out through an exhaustive search, and the final set of parameters was established as follows. The optimal size of the convolutional kernel used in the very first layer of the feedback network (see Fig. 2) was found to be 5 ms. Taking into account the given stage velocity and the acquisition rate, the time span of this kernel corresponded to 50 µm in length of the weld joint, or, equivalently, to a signal sample of 5,000 sampling points obtained from each sensor.

Following the scheme in Fig. 2, the unification of all signals from the sensors in a time interval of 20 ms determines the dimension of the algorithm's input space, which amounts to 80,000 data points. As seen before, the agent does not receive this high-dimensional input, but its condensed representation from the encoder. The maximum possible dimensionality reduction achievable in our setup led to low-dimensional signals made up of 64 data points for every sensor (from the original 20,000). In our work, it was experimentally established that any further reduction harmed the algorithm's accuracy, provoking higher error rates for the autonomous learning controller.

E. REINFORCEMENT LEARNING
RL is inspired by human and animal behaviors, where experience and knowledge are acquired through active interaction with the environment by trying to maximize the rewards received [15], [18], [36]. Specifically, RL is the branch of Machine Learning (ML) that aims at designing agents capable of taking, in every moment, the action that maximizes not just the immediate reward, but the sum of all the rewards that will be received thenceforth. The agent chooses actions based on its sensory input that provides a momentary representation of the environment — the so-called states — and tries to optimize the outcomes of these actions over time in terms of reward.

In RL, this concept is formalized through a Markov Decision Process (MDP). An MDP is described by a quadruple {S, A, p, r}, where S and A are the state and action spaces and p(s_{t+1}|s_t, a_t) is the probability of the transition from state s_t ∈ S to state s_{t+1} ∈ S taking the action a_t ∈ A. Each change of state is rewarded according to r(s_t, a_t). The strategy of choosing an action a_t given the state s_t is known as the policy, and it is indicated by π(a_t|s_t) — denoting the probability of selecting the action a_t in state s_t.

The correctness of the choice of the actions is evaluated in terms of the rewards subsequently collected. Concretely, the quality of taking an action a_t given the actual state s_t, with the further choice of all remaining actions according to the policy π, can be quantified with the action-value function Q^π(s_t, a_t). Given an episode that includes T steps, it is defined as [15]:

Q^{\pi}(s_t, a_t) = \mathbb{E}_{\pi}\left[ \sum_{t'=t}^{T} r(s_{t'}, a_{t'}) \,\middle|\, s_t, a_t \right],   (1)

that is, the expected total reward from taking the action a_t in state s_t and then following the policy π.

The goal of RL is to approximate the optimal policy π* that returns, for every state, the best action to take in terms of total reward from that moment on. One approach consists of estimating the action-value function for π*. Indeed, in that case, the optimal action a to be taken in state s is the one that maximizes Q^{π*} for the given state [15]. The different RL algorithms differ in the way Q^π(s, a) or, alternatively, the policy parameters are iteratively updated. In this study, we have tested two of the most successful realizations of RL, namely Q-Learning and Policy Gradient. Both methods have pros and cons, which are discussed in the next two subsections.
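As a small numerical illustration of (1) — not taken from the paper — the action-value of a state-action pair under a fixed policy can be estimated by averaging the remaining episode reward over rollouts that start from that pair:

def episode_return(rewards, t):
    """Sum of rewards from step t to the end of the episode (the inner sum in (1))."""
    return sum(rewards[t:])

def monte_carlo_q_estimate(rollouts, t):
    """Average the tail return over several episodes that share (s_t, a_t)."""
    return sum(episode_return(r, t) for r in rollouts) / len(rollouts)

# Example: three hypothetical 5-step episodes of rewards collected under the same policy.
rollouts = [[1.0, -0.25, 1.0, 1.0, 1.0],
            [1.0, 1.0, -0.5, 1.0, 1.0],
            [1.0, 1.0, 1.0, 1.0, -1.0]]
print(monte_carlo_q_estimate(rollouts, t=2))   # expected remaining reward from step 2 onward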


F. DEEP Q-LEARNING
Q-Learning is one of the most popular RL algorithms and aims at estimating the Q^{π*} values for every state — hence the name of the technique. In the case of a high-dimensional state space (e.g., in laser welding), the traditional update methods for the Q^π values become inapplicable as they suffer from the curse of dimensionality [37]. Indeed, those methods require representing the Q^π values in tabular form — a table having as many entries as the ordered pairs (s, a) ∈ S × A [15], which is only feasible if the cardinalities of both S and A are small. The concept of DL allows overcoming those limits by using DCNNs to estimate the action-value function [38], exploiting the recent advances in ML where DCNNs proved to be excellent complex function approximators [39], [40].

In our work, the Fitted Q Iteration algorithm (FQI) was used as a basic learning scheme [41], and included the following steps:

(i) using some policy, collect a dataset of transitions:

\{(s_t, a_t, s_{t+1}, r_t)\}_{t=1,2,\dots}   (2)

(ii) for every transition, compute:

y_t = r_t + \gamma \max_{a} Q^{\pi}_{\theta}(s_{t+1}, a)   (3)

(iii) update the parameters θ:

\theta \leftarrow \arg\min_{\theta} \sum_{t} \left\| Q^{\pi}_{\theta}(s_t, a_t) - y_t \right\|^{2}   (4)

where Q^π_θ denotes the functional approximator of the function Q^π given by a parametric function with parameters θ. In this contribution, θ represents the weights and biases of a DCNN that takes as input the ordered pair (s_t, a_t) and outputs an estimate of Q^π(s_t, a_t). γ ∈ (0, 1) is a discount factor used to weigh future rewards less and immediate ones more, r_t is the reward collected at time t, and y_t is a momentary target for the computation of the so-called Bellman update in (4) [37]. The minimization problem in (4) can be solved using gradient descent methods. Therefore, it can be addressed using the techniques for loss minimization that are common in DL frameworks [42], [43].

In order to promote the exploration of the state space at the beginning of the training, we have used the so-called epsilon-greedy technique for step (i) of the FQI [15]. This strategy consists in the use of the following policy for the collection of the transitions:

\pi(a_t|s_t) = \begin{cases} 1 - \varepsilon, & \text{if } a_t = \arg\max_{a} Q^{\pi}_{\theta}(s_t, a) \\ \varepsilon / (|A| - 1), & \text{otherwise} \end{cases}   (5)

where |A| is the cardinality of the set A and ε ∈ (0, 1). Following (5), at each timestamp, the algorithm chooses either a random action with probability ε, or the best action according to the actual Q^π estimate with probability 1 − ε. As the training progresses, ε is progressively reduced. This procedure encourages the exploration of the environment at the very beginning of the training and the exploitation of the acquired knowledge at the end.

To reduce the oscillations or divergence of the policy, the momentary target y_t and the Q-value Q^π_θ(s_t, a_t) were estimated using two separate networks that are known as the target network (Q^π_{θ_t}) and the Q-network (Q^π_θ), respectively [18]. During the interaction with the environment, the parameters of the target network are cyclically updated with the parameters of the Q-network. Additionally, in our study, the Double Q-Learning technique was used [44]. It consists in using the Q-network to evaluate the action to take — using Q^π_θ in (5) — and the target network to evaluate the momentary target y_t — using Q^π_{θ_t} instead of Q^π_θ in (3). The reason was an efficient decorrelation between the noise in the action selection and the noise in the Q-values estimation, which is a common problem for standard Q-Learning realizations [44].

Moreover, to avoid bad local minima and to reduce the correlation between observations, a replay buffer B was introduced, as in Mnih et al. [18]. In particular, during step (i) in FQI, the collected transitions are added to B. During step (ii), we randomly sampled a batch of the accumulated transitions from B and used those to compute the targets y_t through the target network (see (3)). Finally, the updates of the parameters θ in the Q-network were carried out using (4). Here one of the key advantages of the introduction of the encoder manifests itself. Indeed, it allows a dimensionality reduction of the input — the reduction factor was 300 in our setup — allowing us to use a bigger buffer B while avoiding GPU memory saturation.

The advantages and disadvantages of Q-Learning can be explained by the way the targets are computed in FQI. As can be seen in (3), the observed reward in just one transition is used to calculate the targets y_t. In addition, the first term r_t in (3) is significant when the estimation of Q^π_θ is inaccurate, as it is a real reward and not an estimation. In contrast, the second term γ max_a Q^π_θ(s_{t+1}, a) in (3) is relevant only when the estimation of Q^π_θ is reliable, as it is an estimation of the total future reward that is supposed to be higher than the current one. Consequently, during the Bellman updates (see (4)), the algorithm relies more and more on the actual estimate of the Q-value as soon as it becomes sufficiently large. In Q-Learning, as a result, the strategy of sharply reducing the variance of the estimates (the Q-values) is adopted, to the detriment of high bias.
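For concreteness, the following is a minimal PyTorch sketch of the scheme described in this subsection: epsilon-greedy collection (5), a replay buffer, a periodically synchronized target network, and the Bellman update (3)-(4) with Double Q-Learning action selection. It is a sketch under assumptions, not the authors' implementation: the network sizes, hyperparameters, and the discrete set of laser-power levels are illustrative, and the Q-network here outputs one value per discrete action (a common variant of the Q-approximator described in the text).

import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 256, 5, 0.99             # assumed encoded-state size, power levels, discount

def make_q_net():
    # Small fully connected stand-in for the DCNN Q-approximator used in the paper.
    return nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))

q_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(q_net.state_dict())          # target network starts as a copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay_buffer = deque(maxlen=10_000)                     # B: stores (s, a, r, s') transitions

def select_action(state, epsilon):
    """Epsilon-greedy collection policy of (5) over the discrete power levels."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def bellman_update(batch_size=32):
    """One update of (3)-(4) on a random batch from B, with Double Q-Learning targets."""
    if len(replay_buffer) < batch_size:
        return
    batch = random.sample(replay_buffer, batch_size)
    s = torch.stack([t[0] for t in batch])
    a = torch.tensor([t[1] for t in batch])
    r = torch.tensor([t[2] for t in batch], dtype=torch.float32)
    s_next = torch.stack([t[3] for t in batch])
    with torch.no_grad():
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)               # action picked by the Q-network
        y = r + GAMMA * target_net(s_next).gather(1, best_a).squeeze(1)  # evaluated by the target network
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

During training, transitions collected with select_action are appended to replay_buffer, epsilon is decayed between episodes, and target_net.load_state_dict(q_net.state_dict()) is repeated periodically to refresh the target network.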


G. POLICY GRADIENT
As mentioned above, the main limitation of Q-Learning is the high bias in the estimation of the Q-values. This bias originates from the single-step reward estimator for the targets y_t. The Policy Gradient (PG) approach [15], [45], [46] aims to overcome those limits by evaluating the total reward on an entire episode. Similarly to other RL algorithms, the objective of PG is to find the policy that maximizes the expected total reward in one episode that includes T steps. But contrary to Q-Learning, PG does not try to estimate the optimal Q-values, but the parameters of the policy approximating the optimal policy π*:

\theta^{*} = \arg\max_{\theta} J(\theta),   (6)

where

J(\theta) = \mathbb{E}_{\pi_{\theta}}\left[ \sum_{t=1}^{T} r(s_t, a_t) \right],   (7)

and θ stands for the policy parameters. In our case, θ represents the weights and the biases of a DCNN that takes as input the current sensory representation provided by the encoder (see Section III-A) and outputs the action to be taken (e.g., the power of laser irradiation).

In PG, the functional J(θ) is estimated as:

J(\theta) \approx \hat{J}(\theta) = \sum_{t=1}^{T} r(s_t, a_t).   (8)

The optimization of the objective J(θ) is carried out by directly differentiating its estimate Ĵ(θ) and using gradient ascent to update the parameters as:

\theta \leftarrow \theta + \alpha \nabla_{\theta} \hat{J}(\theta).   (9)

In particular, the gradient of the objective in (8) is computed as [45], [46]:

\nabla_{\theta} \hat{J}(\theta) = \left( \sum_{t=1}^{T} \nabla_{\theta} \log \pi_{\theta}(a_t|s_t) \right) \left( \sum_{t=1}^{T} r(s_t, a_t) \right).   (10)

Clearly, the entire approach relies on a single-sample estimate of the full expectation (cf. (8)) that, even if unbiased, has a very high variance. For this reason, even though this method is potentially able to provide better results compared to Q-Learning in terms of the learned policy, it surely requires more learning time.

The implementation of PG was carried out by firstly randomly initializing the parameters of the policy π_θ and then sampling a trajectory (i.e., collecting all the transitions (s_t, a_t, s_{t+1}, r_t) within a single episode). The logarithm of the action probabilities, as well as the rewards collected along the trajectory, were accumulated and used to calculate the policy's gradient according to (10). Finally, the parameters were updated following the direction of improvement indicated by the gradient (cf. (9)), as shown in the sketch below.
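A minimal PyTorch sketch of this episodic update, assuming a discrete set of laser-power levels and a small fully connected policy head on top of the encoder output (sizes and the learning rate are illustrative, not the values used in this work):

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 256, 5                           # assumed encoded-state size and power levels
policy = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                       nn.Linear(128, N_ACTIONS), nn.LogSoftmax(dim=-1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def select_action(state):
    """Sample a laser-power index from the current stochastic policy pi_theta."""
    with torch.no_grad():
        probabilities = policy(state).exp()
    return int(torch.multinomial(probabilities, 1))

def policy_gradient_update(states, actions, rewards):
    """One update per episode: gradient ascent on (9)-(10) via a surrogate loss."""
    log_probs = policy(torch.stack(states))                            # log pi_theta(.|s_t), shape (T, |A|)
    taken = log_probs.gather(1, torch.tensor(actions).unsqueeze(1)).squeeze(1)
    episode_return = float(sum(rewards))                               # total reward of the trajectory, (8)
    loss = -(taken.sum() * episode_return)                             # descending this loss ascends J(theta)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

During welding, select_action samples the laser-power index at each 20 ms step; after each 10 mm line, the stored states, actions, and rewards of that episode are passed to policy_gradient_update.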


FIGURE 3. Performance in terms of average reward per episode over time for Q-Learning and Policy Gradient. The red line represents the average reward over an episode, whereas the shaded area indicates the standard deviation. An episode corresponds to the weld of a 10 mm line and has a duration of 1 s. Between one line and the next, we wait for 10 s to permit the agent to update its parameters and to allow the stage to move to a new unprocessed position.

IV. RESULTS AND DISCUSSION

A. RESULTS
Prior to starting the interaction with the environment, the preparation of the algorithm included two stages, namely: i) collection of the signal database for training the classifier and the encoder, and ii) definition of a reward function.

The first step is motivated by the fact that the classifier and the encoder — to fulfill the role of guiding the smart agent during its learning process — have to learn to recognize not just the reference quality, but also several other counter-examples. For this reason, we collected the acoustic and optical signals from multiple weld experiments at various laser powers (20, 40, 60, 80, and 120 W). It must be emphasized that, theoretically, the weld quality depends not only on the laser power but also on the workpiece velocity and its physical properties, such as the optical and thermal ones [10]. But in this work, since the latter factors were invariable, the former one is used to define the weld quality. The sensors' signals were acquired during three weld experiments at each laser power, then partitioned in samples of 20 ms (see Section III-D for details), and finally grouped in 5 categories according to the weld quality in terms of penetration depth identified via optical inspection of both the surface and the cross-section of the workpieces. Based on the optical inspection, the categories were defined as insignificant penetration (achieved with a laser power of 20 W), poor penetration (40 W), medium penetration (60 W), highest penetration without pores (80 W), and porosity (120 W). In total, each category consisted of 150 samples.

The second stage concerns the definition of the reward function that determines the reward assignment from the feedback network to the smart agent. Considering that the agent is designed to act so as to maximize the collected rewards in the long run, the engineering of the reward is crucial since it influences the learning process. The reward assigned for every weld quality detected by the classifier used in our experiments is reported in Table 1.

TABLE 1. Rewards assigned for every category detected by the classifier.

After the preparation, we let the algorithm interact with the environment in a completely autonomous way without any further interventions. The performance for both Q-Learning and Policy Gradient is shown in Fig. 3, where the red line represents the average values of the rewards obtained in every episode, whereas the shaded area denotes the standard deviation. The average reward of Q-Learning reached a plateau after approximately 110 episodes, i.e., after performing 110 line welds of 10 mm. Taking into consideration the fact that we wait for 10 s after each line — to permit the agent to update its parameters and to allow the stage to move to a new unprocessed position — this learning period corresponds to about 20 minutes. In contrast, PG reached a plateau only after 180 episodes (33 minutes). In both cases, additional learning time had little effect in terms of increment of the quality, and it only increased the cost in terms of wasted materials and time.

The dynamics of the agent's adaptation to the given process can be vividly seen in the evolution of the welds using optical inspections of the surfaces and cross-sections of the processed material. Fig. 4 presents the optical images of the welds corresponding to the first, the 40th, the 80th, and the 110th episode of the Q-Learning training process. To be specific, Fig. 4 (a) shows the light microscope images of the top views of different episodes, whereas Fig. 4 (b) shows the corresponding cross-sections.

It has to be noted that the results in Fig. 4 show an evolution of the weld quality that is consistent with the increment of the reward observed in Fig. 3. Indeed, in Fig. 4 (a), episode 1 — i.e., the beginning of the training — signs of unstable controlled laser power can be seen on the weld surface. The black marks on the weld correspond to oxidation, which is also an indication of local overheating due to inaccurate laser control, leading to a poor weld quality in terms of mechanical properties [12]. This aspect is even more evident from the cross-sections (Fig. 4 (b), episode 1), which are characterized by rapid variations of the weld penetration depth along the line. In this specific case, the local overheating of the material was taking place due to the application of a too high level of laser power, generating a highly unstable keyhole that led to the trapping of pores inside the material during the keyhole collapse [10]. The red arrows highlight the pore locations in the magnification in Fig. 4 (b).

After 40 trials, i.e., about 7 min from the beginning of the training (Fig. 4, episode 40), the welds started to be characterized by smoother changes in surface textures and penetration depth. Confirming the positive trend, significant signs of progress are obtained after performing 40 more welds (Fig. 4 (a), episode 80, about 15 min from the beginning), when the texture of the weld surface started to present no perceivable non-uniformities. Nevertheless, some fluctuations in the penetration depth can still be observed (Fig. 4 (b), episode 80). Finally, a weld comparable to the reference one was only achieved after the completion of 30 more episodes — see Fig. 4 (a), episode 110 (about 20 min from the start), when the welds began to be characterized by uniform surface texture and constant penetration depth. Fig. 4 (c) also shows the light microscope images of the cross-sections for the trained controlled and reference welds, respectively. As described in Section II-D, the latter was realized after an exhaustive search of the laser parameters and achieved a weld depth of 150 µm, as shown in Fig. 4 (c), top image. As can be noticed, no measurable differences between the trained controlled weld and the reference one can be found. Similarly, PG showed identical results apart from a different convergence rate. Indeed, the convergence took about 1.6 times more time compared to Q-Learning (see Fig. 3).

B. DISCUSSION
While the classifier is of unquestionable fundamental importance, as it allows the monitoring of the process, the use of the encoder, on the other side, is debatable. The encoder has indeed some pros and cons that were not obvious before the experiments. As stated in Section III-A, its advantages consist of an effective reduction of the state space dimensionality that potentially simplifies the search for the optimal parameters of the smart agent by capturing a proper parametrization of the signal that can focus only on quality-critical events. In contrast, its drawbacks derive from its output representation, which may not be entirely suited for deriving the dynamics of the system, as its temporal resolution is non-uniform [47]. As a result, the sensitivity of the algorithm to some actions could be reduced, potentially leading to poor process control.

For the sake of verifying the effectiveness of the encoder, we have also tried to exclude it from the processing pipeline and directly provide the high-dimensional raw signals from the sensors as input to the agent. This resulted in a marginally slower convergence rate in terms of the number of episodes (in the order of tens of episodes), but the two strategies were able to achieve the same results. We believe that this behavior can be explained by the very first convolutional layer of the agent (see Fig. 2) that, if provided with raw signals, can take over the encoder's duty to deliver a good signal representation to the following layers. However, when excluding the encoder, the computations were slowed down due to the larger input quantities, and we had to increase the time between each episode.

It also has to be mentioned that the present work was realized using a well-controlled laboratory environment and with reliable custom equipment. These controlled conditions provided a more reproducible laser-material interaction during the welds, as they included the processing of always the same material with consistent material properties as well as flat surfaces with identical surface roughness.


FIGURE 4. Training dynamics of the Q-Learning algorithm in terms of welding quality. (a) light microscope pictures of the top
view of the welded surface at discrete time points of the algorithm’s training; (b) corresponding light microscope pictures of
the cross-section of the welds from (a). The magnification for the first episode is shown on the right. The red arrows indicate
the pores inside the material; (c) reference weld and controlled weld after the completion of the training procedure. The
numbering of the episodes started from the beginning of the training procedure and is indicated on the vertical axis. The arrow
at the bottom shows the direction of the laser scan. The white borders denote the boundary of the weld. The deep weld
penetration at the beginning of each line constitutes the initial condition from which the algorithm needs to regulate the
power.

The well-controlled environment could also be the reason for the small size of the database needed to train the encoder and classifier, and this detail may be significantly different in industrial conditions.

V. CONCLUSIONS

This work presents the first results of a study for adaptive closed-loop control of laser welding based on RL applied on a real-life setup. The developed system includes an encoder that derives efficient representations from the sensory input for the active unit, a feedback network, and a smart agent, which is the active unit itself, that can influence the laser process. The principle of operation is the following: based on the current sensory input provided by the encoder, the agent chooses an action, which leads to a change of its sensory input, and receives a reward — an indirect quality measure of the state the agent ends up in. From this experience — made up of the past sensory input, the executed action, the current input, and the received reward — the agent tries to optimize the outcomes of its actions over time.

In standard RL approaches, the reward signal is provided by the environment and is straightforward to derive. In laser welding, conversely, effective feedback is challenging to provide, as the process is only partially observable, since in-depth information of the PZ can be obtained only indirectly from conventional sensors. This reason motivates the introduction of the feedback network: a complete monitoring system based on a DCNN classifier capable of tracking the weld quality in real-time.

In the present work, the control unit was implemented to regulate the output laser power while using the acoustic and optical emission as sensory input. The potential of the system was demonstrated by its capability — without prior knowledge of the process dynamics — to reach a reference weld quality autonomously. The latter was chosen to be represented by the weld with the highest depth achievable without porosity in a Ti grade 5 workpiece, to meet the industrial demand for high-quality keyhole welding. This reference weld was determined experimentally and attained a weld depth of 150 µm without porosity with a laser power of 80 W.

To guide the smart agent, the feedback network and the encoder were trained to recognize not just the reference quality, but also several other counter-examples. For this reason, we collected the acoustic and optical signals from 15 weld experiments at various laser powers, namely 20, 40, 60, 80, and 120 W. The signals were then grouped in 5 categories according to the corresponding weld quality in terms of penetration depth, which were identified via optical inspection of both the surfaces and the cross-sections of the workpieces, and further partitioned in samples of 20 ms. This time span was chosen by taking into consideration the requirement of very high classification accuracy and a computation time within the range of 1–5 ms.

After the DCNN classifier and the encoder were trained, the smart agent started its interaction with the laser process by performing line welds with the output laser power being controlled autonomously. We tested two learning schemes — Q-Learning and Policy Gradient — and evaluated their performance both in terms of the evolution of rewards over time and of the resulting weld quality. The training time needed for the two algorithms to reach the reference quality was 20 minutes and 33 minutes, respectively. After that time, there was no additional observable increment of weld quality and rewards.

The present results demonstrate the ability of RL to learn a control law for laser welding processes autonomously.


This prospect is highly appealing for the industrial sector [17] Y. Bengio, ‘‘Learning deep architectures for AI,’’ Found. Trends Mach.
as the unit can deal with complex processes without costly Learn., vol. 2, pp. 1–27, Jan. 2009.
[18] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness,
simulation and computational tools. Furthermore, the sensor M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski,
technologies exploited in the present work are commercially S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran,
available and ready for industrial implementation. It must be D. Wierstra, S. Legg, and D. Hassabis, ‘‘Human-level control through
deep reinforcement learning,’’ Nature, vol. 518, no. 7540, pp. 529–533,
emphasized that the proposed framework can also operate Feb. 2015.
with other feedback sensor signals — pyrometer, micro- [19] J. Günther, P. M. Pilarski, G. Helfrich, H. Shen, and K. Diepold, ‘‘Intelli-
phones, or additional photodiodes — making it a rather gent laser welding through representation, prediction, and control learning:
An architecture with deep neural networks and reinforcement learning,’’
versatile tool. Further experiments are planned to explore the Mechatronics, vol. 34, pp. 1–11, Mar. 2016.
potential of this approach on more complex conditions, e.g., [20] C. J. Watkins and P. Dayan, ‘‘Technical note: Q-learning,’’ Mach. Learn.,
with surface irregularities or at the interface between two vol. 8, nos. 3–4, pp. 279–292, May 1992.
different materials. Additionally, we will increase the number [21] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, ‘‘Policy gradient
methods for reinforcement learning with function approximation,’’ in Proc.
of control variables, including the workpiece velocity and its 12th Int. Conf. Neural Inf. Process. Syst. Cambridge, MA, USA: MIT
distance from the laser source. Finally, the RL algorithms will Press, 1999, pp. 1057–1063.
be further enriched with techniques for faster convergence, [22] L. Bassi, ‘‘Industry 4.0: hope, hype or revolution?’’ in Proc. IEEE 3rd Int.
Forum Res. Technol. Soc. Ind. (RTSI), Sep. 2017, pp. 1–6.
higher operating frequency, better adaptation under changing [23] T. Le-Quang, S. A. Shevchik, B. Meylan, F. Vakili-Farahani,
materials, and varying noise levels. M. P. Olbinado, A. Rack, and K. Wasmer, ‘‘Why is in situ quality


GIULIO MASINELLI (Member, IEEE) received the B.Sc. degree in electrical engineering from the University of Bologna, Italy, in 2017, and the M.Sc. degree in electrical engineering (with data science specialization) from the Swiss Federal Institute of Technology in Lausanne (EPFL), Lausanne, Switzerland, in 2019. He is currently pursuing the Ph.D. degree with the Swiss Federal Laboratories for Materials Science and Technology (EMPA) and EPFL, mainly developing machine learning algorithms for data analysis and industrial automation. His research interests include signal processing and machine learning, with emphasis on deep learning.

TRI LE-QUANG received the B.S. degree in applied physics from Vietnam National University, Ho Chi Minh City, Vietnam, in 2007, the M.Sc. degree in optics from the Friedrich-Schiller-Universität Jena, Germany, in 2013, and the Ph.D. degree in materials engineering from the Instituto Superior Tecnico Lisboa, Portugal, in 2017. Since 2017, he has been working as a Postdoctoral Researcher with EMPA, Swiss Federal Laboratories for Materials Science and Technology, Laboratory of Advanced Materials Processing. His research interests include laser material processing, laser technology, and in situ monitoring.

SILVIO ZANOLI received the B.Sc. degree in electrical engineering from the University of Bologna, Italy, in 2017, and the M.Sc. degree in electrical engineering (with data science and IoT specialization) from the Swiss Federal Institute of Technology in Lausanne (EPFL), Lausanne, Switzerland, in 2019, where he is currently pursuing the Ph.D. degree in electrical engineering (with data science specialization). His research interests are in signal processing, machine learning, and the IoT, with particular attention to low-energy solutions.

KILIAN WASMER (Member, IEEE) received the B.S. degree in mechanical engineering from Applied University, Sion, Switzerland, and Applied University, Paderborn, Germany, in 1999, and the Ph.D. degree in mechanical engineering from Imperial College London, Great Britain, in 2003. He joined the Swiss Federal Laboratories for Materials Science and Technology (EMPA), Thun, Switzerland, in 2004, to work on the control of crack propagation in semiconductors. He currently leads the Group of Dynamical Processes, Laboratory for Advanced Materials Processing (LAMP). His research interests include materials deformation and wear, crack propagation prediction, and material-tool interaction. In recent years, he has focused his work on the in situ and real-time observation of complex processes using acoustic and optical sensors in fields such as tribology, fracture mechanics, and laser processing. He serves on the director committee for additive manufacturing of Swiss Engineering. He is also a member of Swiss Tribology, the European Working Group of Acoustic Emission (EWGAE), and Swissphotonics.

SERGEY A. SHEVCHIK received the M.Sc. degree in control from the Moscow Engineering Physics Institute, Russia, in 2003, and the Ph.D. degree in biophotonics from the General Physics Institute, Russia, in 2005, where he stayed as a Postdoctoral Researcher until 2009. From 2009 to 2012, he was with the Kurchatov Institute, Russia, developing human–machine interfaces. In 2012 and 2014, he was with the University of Bern, investigating multi-view geometry. Since 2014, he has been with the Swiss Federal Laboratories for Materials Science and Technology (EMPA), working on industrial automation. His current interest is in signal processing.
