Reinforcement Learning for Statistical Process Control in Manufacturing
Zs. J. Viharos, R. Jakab
Measurement 182 (2021) 109616, https://doi.org/10.1016/j.measurement.2021.109616
A R T I C L E  I N F O

Keywords: Statistical Process Control, Optimal Control, Reinforcement Learning, Production Trends, Disturbance Handling

A B S T R A C T

The main concept of the authors is to place Reinforcement Learning (RL) into various fields of manufacturing. As one of the first implementations, RL for Statistical Process Control (SPC) in production is introduced in the paper; it is a promising approach owing to its adaptability and its ability to perform continuously. The widely used Q-Table method was applied to obtain stable, predictable and easy-to-overview results; therefore, quantization of the values of the time series into stripes inside the control chart was introduced. Detailed elements of the production environment simulation are described and its interaction with the reinforcement learning agent is detailed. Beyond the working concept for adapting RL into SPC in manufacturing, some novel RL extensions are also described, like the epsilon self-control of the exploration–exploitation ratio, the Reusing Window (RW) and the Measurement Window (MW). In the production related transformation, the main aim of the agent is to optimize the production cost while keeping the ratio of good products at a high level as well. Finally, industrial testing and validation is described that proved the applicability of the proposed concept.
1. Introduction

Artificial Intelligence (AI) and Machine Learning (ML) approaches are spreading across all areas of our lives, and this is also valid for technical fields, e.g., for the manufacturing sector. Nowadays the speed of this expansion is still increasing; consequently, the intensity of changes and novel challenges requires more and more attention together with exhaustive research & development activities. Moreover, the frequently arising, novel AI and ML techniques have to be continuously adapted to the given domain to achieve the best match. This mission is valid also for manufacturing, where the well-known Industry 4.0 global initiative (also called Industrial Internet or Cyber Physical Production Systems (CPPS)) supports, facilitates and, moreover, incorporates these directions; consequently, the actual situation in this sector is quite promising.

There are various areas of the AI discipline (e.g., machine learning, search techniques, multicriteria optimization, inference and expert systems, graph modelling and traversal…); nowadays the so-called Deep Learning (DL) receives the highest level of attention, making it the most fashionable solution, while sometimes the other very important areas of AI are forgotten. In general, ML is one of the key, basic foundations of AI; originally this branch started with the two directions of supervised and unsupervised learning. However, in the 80s, the early, pioneering results of Sutton [1] with his professors and colleagues extended this range to Reinforcement Learning (RL). Currently there are also further combinations of such techniques, like semi-supervised learning.

The spread of various artificial intelligence and machine learning techniques in manufacturing is valid for reinforcement learning as well. However, as described in the next paragraph, the review of the scientific literature mirrors that the domain specific adaptation of reinforcement learning to various production fields concentrates mainly on production scheduling and robotics. This state-of-the-art status provoked the motivation to extend and adapt reinforcement learning to further potential fields of manufacturing. So, the current paper introduces a novel RL based Statistical Process Control (SPC) in manufacturing, with various additional novel components:

• The main contribution of the paper is the adaptation concept of reinforcement learning to process control in manufacturing.
• A novel, general (manufacturing independent), dynamic Q table handling for RL is described, even if it was motivated by the production adaptation challenges.
• The specialties of industrial process control led to the introduction of the so-called Reusing Window (RW) in RL based SPC in manufacturing.
• To compare the efficiencies of various RL solutions in production SPC, the Measurement Window (MW) is introduced.
• The features and behaviour of the related, SPC in production specific simulation are described.
• A novel, dynamic (manufacturing independent) self-adaptation method is presented so that the agent itself controls its own exploration–exploitation rate.
• Industrial testing and validation is described that proves the applicability of the method proposed.

The paper is organized as follows. After the current introduction, the actual status of reinforcement learning in the production field is summarized. The third paragraph reviews the current SPC solutions in manufacturing, followed by the novel approach to introduce RL in production, especially for SPC assignments. Its specialized extensions are introduced in the next paragraph; after the introduction of the Reusing Window (RW) and the Measurement Window (MW), the proving results of an industrial test and validation are presented. Conclusions with outlook, acknowledgements and references close the paper.

2. Reinforcement learning in production

Despite the given, highly potential situation, the state-of-the-art literature mirrors that RL applications in manufacturing concentrate mainly on two fields: production scheduling and robotics.

In production scheduling, the state-of-the-art for dynamic scheduling shows a growing increase in the use of RL algorithms. Several papers use Q-learning [2,3,4], deep Q-learning [5] and adapted versions of Q-learning [6,7]. Most cases focus on value-based algorithms [2,3,4,5,8], however a few papers like [4,7] are policy-based. Some researchers used the epsilon-greedy method [3,4,5], whereas Bouazza et al. [2] used it in addition to the machine selection rule. While Kuhnle et al. [7,8] considered the architecture of an RL algorithm framework, Qu et al. [10] analysed the optimal assignment of multi-skilled workers. In papers [2,5,10] multi-agent architectures were realized. In general, researchers applied a simulation tool to test and validate their approach. E.g., Kardos et al. introduced a Q-learning based RL architecture into the scheduling/dispatching decisions of production systems and proved on a simulation basis that their solution significantly reduces the average lead time of production orders in a dynamic environment [9]. Moreover, it was shown that as the complexity of the production environment increases, the application of RL for dynamic scheduling becomes more and more beneficial, which makes future production systems more flexible and adaptive.

In the field of robotics applications of RL, Nair et al. presented a system with an RL component for solving multi-step tasks [11]. The report by Plappert et al. [12] introduced a suite of challenging continuous control tasks and also a set of concrete research ideas for improving RL algorithms. Zhu et al. combined reinforcement and imitation learning for solving dexterous manipulation tasks from pixels [13]. Kahn et al. presented a high-performing RL algorithm for learning robot navigation policies [14]. Long et al. optimized a decentralized sensor level collision avoidance policy with RL [15] and Johannink et al. studied its combination with conventional feedback control methods [16].

There are some first trials for applying RL also in the field of process control in manufacturing; however, such papers are particular and much rarer than those for scheduling and robotics, so these results have to be introduced in more detail.

Large variety is one of the challenges in brine injection into bacon food products, requiring an adaptive model for control. Injection pressure and injection time can be controlled by an adaptable Deep Deterministic Policy Gradient (DDPG) reinforcement learning model presented by Andersen et al. [43]. The DDPG could adapt a model to a given simulated environment that is based on 64 conducted real experiments. With a target setpoint of mass increase of 15%, it was capable of producing a mean of 14.9% mass increase with a standard deviation of 2.6%.

The main contribution of Beruvides et al. was the design and implementation of a reinforcement learning architecture for default pattern identification in multi-stage assembly processes with non-ideal sheet-metal parts [44], in which the failure patterns are generated by simulation. The architecture was composed of three modes (knowledge, cognitive and executive modes), combining an artificial intelligence model library with a Q-learning algorithm. The results presented by three methods (MLP, SOM and MLP + GAs) achieved a high precision level regarding the different measurement parameters generated especially for this training and validation. The architecture extension by a reinforcement learning algorithm (Q-learning in this case) resulted in further benefits, because it helps to determine parameters for the models that enabled better adjustment to the different changes of the experimental data and to the different work regimes.

The paper of Guo et al. [45] introduced a professional, hybrid framework for optimizing and controlling an injection moulding process for lens production. The model is pre-trained by a simulation framework establishing the initial knowledge that can be promptly applied for on-line optimization and control. Finally, the complete process was controlled staying inside the prescribed quality control chart. This paper is excellent, and the approach is robust. In comparison to genetic algorithm and fuzzy inference based optimization and control methods, the proposed reinforcement learning based solution outperformed them significantly.

Khader and Yoon [3] proposed a methodology for the stencil process parameter estimation in the surface mount technology (SMT) of circuit boards. The aim was to build an optimal adaptive controller to enhance the solder paste volume transfer during the process. Printing speed, printing pressure and separation speed in a discretized coding formed the state space of reinforcement learning, and two prediction models are built to estimate the reward functions, which are the average (avg.) and standard deviation (std.) of the solder paste volume transfer efficiency (TE). To estimate the immediate reward, after predicting the avg. and std. of volume TE, the capability indices Cpk and Cpkm were calculated for each component; in this aspect the results are really harmonized with the SPC techniques. If these values are above a threshold, the reward was 1, if not, the reward was −1. The simulation-based testing results showed that the agent successfully arrived at the terminal state by taking only a few actions.

Li et al. proposed a novel neural network [46], called a Reinforcement Learning Unit Matching Recurrent Neural Network (RLUMRNN), with the aim of resolving the problem that the generalization performance and nonlinear approximation ability of typical neural networks are not controllable, which is caused by the experience-based selection of the hidden layer number and hidden layer node number. This was the main contribution of the paper, and so, the simulation of the training data is easy. On the application side, a discriminator was constructed for dividing the whole state degradation trend of rolling bearings into three kinds of monotonic trend units: ascending unit, descending unit and stationary unit. By virtue of reinforcement learning for the recurrent neural network, its hidden layer number and hidden layer node numbers were fitted to the corresponding monotone trend unit that was selected to enhance its generalization performance and nonlinear approximation ability. By taking advantage of the concept, a new state trend prediction method for rolling bearings was proposed. In this prediction method, the moving average of the singular spectral entropy was used first as the state degradation description feature, and then this feature was inputted into the model to accomplish the state trend prediction for rolling bearings. The same concept was copied and re-published by Wang et al. [47]; the only differences were that it optimizes the structure of a deep neural network and the test environment is a test-rig and a locomotive bearing.

In the paper of Ramanathan et al. [48] a reinforcement learning based smart controller was proposed, and its performance was demonstrated by using it to control the level of liquid in a non-linear conical tank system. The advantage is that a standalone controller is designed on its own without prior knowledge of the environment or the system. Hardware implementation of the designed unit showed that it controlled the level of fluid in the conical tank efficiently and rejected random disturbances.
quality production by eliminating key problems: undesirable tolerance limits, poor surface finish or circularity of spheroidal cast iron parts during machining [27].

Huybrechts et al. applied standardization, trend modelling and an autoregressive moving average (ARMA) model to determine short-term correlation between subsequent measurements [28]. The out-of-control observations can be determined precisely with the Dijkstra model on the cumulative sum chart of the corrected residuals between the measured and predicted values. Milk yield data from two automatic milking system farms and one farm with a conventional milking system were used for the case study.

Viharos and Monostori presented an approach, already in 1997, for the optimization of process chains by artificial neural networks and genetic algorithms using quality control charts [29]. It was shown that the control of "internal" parameters (temporal parameters along the production chain) is a necessity; in this way, early decisions can be made whether to continue the production of a given part or not. Continuous optimization of the production system is possible using the proposed solution as well. A survey on neuro-fuzzy systems and their applications in technical diagnostics and measurement was presented by Viharos and Kis [30]; further machine learning techniques supported the supervision of the milling tool life of cutting ceramics in the selection of appropriate features of the vibration signals [31].

Concerning the applied techniques, the most prevalent approaches are based on statistical methods, such as autoregression, moving average and their combinations: the autoregressive integrated moving average model (ARIMA) [32] with use of linear regression analysis, the quasi-linear autoregressive model [33] or Markov chain models (MCM) [34]. These methods are based on historical production or time series data for modelling and prediction.

Another approach has appeared with the evolution of artificial intelligence, such as modelling with artificial neural networks (ANN), support vector machines (SVM) or nearest neighbour approaches based on pattern sequence similarity [35]. There are several curve-fitting methods in this field for small sample data, such as genetic algorithms [36]. Using artificial neural networks combined with statistical methods to compensate the drawbacks of the separate approaches in trend forecasting leads to better classification and approximation results.

A mixed, physical model integrating real process measurements was presented by R. Paggi et al. for computing process uncertainties beyond their prognosis values [37]. Various physical modelling techniques, like finite element methods and analytical equations, can represent the known dependencies. Francesco et al. [38] used effective measurements derived from the conformity tests to improve the accuracy of the Remaining Useful Life (RUL) evaluation.

The importance of applying appropriate SPC control charts is mirrored through the results of Wu et al. [49]. A new method combining the ensemble-integrated empirical mode decomposition (EEMD) algorithm and the EWMA control chart was proposed to identify and alert global navigation satellite system (GNSS) deformation information. The experimental results show that the recognition accuracy of the EWMA control chart method and the modified EWMA method is higher than that of the cumulative sum control chart method. The use of a modified EWMA control chart improves the accuracy of deformation identification and early warning, which reduces the false alarm rate and the missing alarm rate.

An extended SPC control chart was introduced by Aslam et al. and it has been compared with the existing plan using simulated data generated from a neutrosophic COM-Poisson distribution [50]. The practical implementation of the suggested chart has also been expounded using the data from the manufacturing of electric circuit boards. Overall, the results demonstrate that the suggested chart will be a proficient addition to the control chart literature.

Research on various types of control charts' applications is widespread. Xbar-R charts for Material Removal Rate (MRR) and Ra control were applied by Kavimani et al. to evaluate the measurement uncertainty of Wire Electric Discharge Machining (WEDM) [52]. To investigate the wind speed range by state and by month, the R chart was adopted by Aquila et al. [53]. Control limits were obtained by the specific equation related to the examined wind energy field. Through the R graphic, it was possible to observe that the higher amplitudes, regarding wind average speeds measured for each month over four years, occurred in the state of Bahia during summer months. A Bayesian Control Chart is a graphical monitoring tool that shows measurements of process samples over time by using Bayes' theorem to update information on a process state. It was applied by Mba et al. [54] in the proposed, novel fault detection and classification system based on the integration of stochastic resonance and hidden Markov modelling (HMM) of vibration data, tested by simulated and real-life gearbox applications. The measurement error results of diameter and thickness of O-rings are presented on a quality control chart by Peng et al. when improving the human inspection-based quality control by vision systems [55]. The proposed method by Wu et al. can adjust the train rail defect detection (control chart) thresholds according to the design law to improve the detection accuracy. In addition, a data pre-screening perceptron classification and learning model is proposed for defect features of end face, weld reinforcement and lower crack of screw hole, which can transform the defect detection problem into a contour classification problem, and the detection performance can be improved by learning sample images [56]. Gopal and Prakash applied quality control charts for the fabrication of a magnesium hybrid metal matrix composite by reinforcing silica rich E-waste CRT panel glass and BN particles through a powder metallurgy route [57]. A novel Al/Rock dust composite was fabricated successfully by Prakash et al. through a stir casting technique for varying reinforcement parameters, i.e., Rock dust particle size and weight percent [58]. Alam et al. describe a lossy compression technique with a quality control mechanism for PPG monitoring applications [59].

The review of the literature and the related applications mirrors that there are various methods for statistical process control in manufacturing, including also many machine learning techniques; however, the advances of reinforcement learning are not yet exploited in this area of production. This status served as the scientific motivation to adopt RL for SPC in production, as introduced in the next paragraphs.

3.1. RL for SPC in manufacturing

In a general reinforcement learning approach the central component is an agent (or a set of agents) that senses its environment and acts through actions on its environment; moreover, it receives rewards from the environment evaluating externally and independently the actions taken. Series of such interactions serve with continuous information to the agent and it can learn in an uninterrupted manner from its environment; moreover, because it also takes actions in parallel, it performs valuable tasks at the same time as well. This is a very important advantage of reinforcement learning over supervised or unsupervised techniques, because the learning component can be continuously applied to perform the given assignment in parallel to its training, which is especially important in manufacturing.

As a rough description, in the proposed framework for applying RL for SPC in manufacturing the agent walks through the sensed time series as a moving window. Each moving window will be quantized and becomes a state. The agent considers only the actual moving window from the past as an information source when deciding which action to select. The generated actions act on the production environment, with which it can influence the trend (the Control Chart Patterns) inside (or sometimes unfortunately outside) the prescribed manufacturing tolerance range. The agent receives the related reward from the environment, according to the taken (production influencing) action. The proposed external reward system is defined so that every action has a (real) cost, and the reward is inversely proportional to the cost. In addition, an extra penalty is incurred if the trend (the produced products) goes out into the out-of-control range.
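As an illustration of the reward logic just described, the following minimal Python sketch realizes a cost-based reward with an extra out-of-control penalty. The action names, cost values and penalty size are illustrative assumptions (the paper does not publish its concrete figures), and "inversely proportional to the cost" is realized here simply as the negative of the cost.

```python
# Hypothetical action costs and penalty - illustrative values only.
ACTION_COSTS = {
    "no_action": 0.0,
    "measurement": 1.0,
    "set_up": 5.0,
    "maintenance": 20.0,
}
OUT_OF_CONTROL_PENALTY = 100.0   # extra penalty when a product leaves the tolerance range


def reward(action: str, produced_value: float, ltl: float, utl: float) -> float:
    """Every action has a (real) cost; the reward decreases with that cost and an
    extra penalty is added if the produced value falls into the out-of-control range."""
    r = -ACTION_COSTS[action]
    if not (ltl <= produced_value <= utl):
        r -= OUT_OF_CONTROL_PENALTY
    return r
```

With such a definition the frequent "No action" choice stays cost-free as long as the process remains inside the tolerance range, which matches the behaviour reported later in the validation experiments.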
To introduce the proposed RL for SPC in manufacturing concept more precisely, the various individual components of RL have to be defined exactly, as described in the next sections about the state, actions, learning method, reward, events in the environment and knowledge representation.

3.2. Temporal difference learning

There are various kinds of learning strategies in reinforcement learning; Temporal Difference (TD) learning is one of the most popular and effective methods, so it was selected for the proposed solution, however, other learning types can be applied here as well. In the TD case, after the agent receives the reward from the environment, the chosen action's (Q) value will be updated using the Temporal Difference learning equation (eq. 1), where Q(s,a) is the value of action a in row s (state) of the Q-Table (detailed later). 0 ≤ α ≤ 1 is a constant step-size parameter which influences the speed of learning, therefore it is called the learning rate. γ is a parameter, 0 ≤ γ ≤ 1, called the discount rate/factor. The discount factor determines the present value of future rewards: a reward received k time steps in the future is worth only γ^(k−1) times what it would be worth if it were received immediately [39]. In this case it is applied to estimate the value of the next state where the agent lands after taking action a in state s. R marks the related received reward.

Q_t(s, a) = Q_{t-1}(s, a) + α [ R(s, a) + γ max_{a'} Q(s', a') − Q_{t-1}(s, a) ]    (1)

This update rule is an example of a so-called temporal-difference learning method, because its changes are based on the difference Q_t(s, a) − Q_{t-1}(s, a), i.e., on Q values at two successive steps/states.

The values of α and γ have recently been treated as hyperparameters of the optimisation, e.g., by OpenAI and Xu et al. [40,41]. After numerous tests by the authors of this paper, it was found that the step-size parameter should be around 0.3 and the discount rate should be around 0.75 in the analysed manufacturing environment.

Optimal action selection is a widely researched area in RL; it is well known as the "Exploration-Exploitation Trade-off" [41]. In RL cases where a complete model of the dynamics of the problem is not available, it becomes necessary to interact with the environment to learn by trial-and-error the optimal policy that determines the action selection. The agent has to explore the environment by performing actions and perceiving their consequences. At some point in time, it has a policy with a particular performance. In order to see whether there are possible improvements to this policy, sometimes the agent has to try out various, in some cases not optimal, actions to see their results. This might result in worse performance, because the actions might (probably) also be less good than the current policy. However, without trying them, it might never find possible improvements. In addition, if the world is not stationary, the agent has to do exploration in order to keep its policy up to date. So, in order to learn, it has to explore, but in order to perform well, it should exploit what it already knows. Balancing these two things is called the exploration–exploitation problem.

series itself in its next points. In addition, as in every production environment, independent events (many times called "changes and disturbances") can occur with certain probabilities that affect the evolution of the time series as well. The time series' data points are generated step-by-step; at first, they are generated noiselessly, namely they are formed by individual linear movements/trends. So, the starting position inside the control chart, the steepness of the linear line and its length simulate the production trend evolution. Due to the complexity and natural noisiness of the real time series, noise is added to the data point after it is generated, so the new data point will be sampled from a Gaussian distribution, where its mean is the value of the original noiseless (linear trend) point and the measure of the noise is its standard deviation. In many efficient production environments, the size of the noise is less than or between 5% and 10% of the interval formed by the two out-of-control boundaries of SPC (LTL, UTL).

As Fig. 3. shows, there are two time series: one is the original trend, the other is the final time series having additional noise. The learning uses only the noisy time series, but if required it is simple and easy to generate various noise levels based on the original time series as well.

A special "trick" is the length of the generated trend. In reality, each trend has only a one-step length, because the RL agent selects an action at each product produced, so, at each time series point the selected action redefines the trend and the noise level as well; these components of the trends are thus continuously redefined at each point. However, there is a special type of action, the so-called "No action", when no changes are implemented in the actual behaviour of the actual trend, noise, etc. In such a case the size of the original trend becomes much longer (than one). Fortunately, after some learning steps, and also in reality, the "No action" is by far the most frequent one, so much longer trends are also emerging.

It has to be emphasized that in the given case all of the elements of the simulator are inherited and validated through a real manufacturing environment, similarly to some of the solutions introduced before in the review of the state-of-the-art. This is valid for the possible events and actions, their effects and the various noise levels, as well as for the frequencies of events, the effects of actions, etc., even if the figures in the paper are distorted.
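The trend-plus-noise generation described in this section can be summarized by a short sketch; the control limits, trend parameters and the concrete noise ratio below are placeholders, not values taken from the industrial case.

```python
import random

LTL, UTL = 0.0, 10.0   # assumed out-of-control boundaries of the control chart


def generate_trend(start: float, steepness: float, length: int,
                   noise_ratio: float = 0.05) -> list[float]:
    """First a noiseless linear trend is formed, then each point is replaced by a
    sample from a Gaussian whose mean is the noiseless value and whose standard
    deviation is a fraction (about 5-10%) of the UTL-LTL interval."""
    sigma = noise_ratio * (UTL - LTL)
    return [random.gauss(start + steepness * i, sigma) for i in range(length)]


# e.g. a slowly drifting trend starting in the middle of the chart
noisy_series = generate_trend(start=5.0, steepness=0.02, length=200)
```

In the proposed set-up such a "trend" normally lasts a single step, since the agent re-selects an action after every produced part; longer segments only emerge while "No action" is chosen repeatedly.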
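Referring back to eq. (1) in Section 3.2, the following sketch shows the temporal-difference update with the step size and discount rate reported by the authors (α ≈ 0.3, γ ≈ 0.75). The dictionary-of-dictionaries Q-table layout is an assumption that anticipates the dynamic Q-table described later.

```python
ALPHA = 0.3    # step-size (learning rate) found suitable in the analysed environment
GAMMA = 0.75   # discount rate found suitable in the analysed environment


def td_update(q_table: dict, state, action, reward_value: float, next_state) -> None:
    """Q_t(s,a) = Q_{t-1}(s,a) + alpha * (R(s,a) + gamma * max_a' Q(s',a') - Q_{t-1}(s,a)).
    Assumes the row for `state` already exists (see the dynamic Q-table sketch later)."""
    best_next = max(q_table[next_state].values()) if next_state in q_table else 0.0
    q_sa = q_table[state][action]
    q_table[state][action] = q_sa + ALPHA * (reward_value + GAMMA * best_next - q_sa)
```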
3.4. Events

In manufacturing, unexpected events happen, like tool failure or equipment failure, etc.; consequently, the simulation has to be able to emulate this behaviour as well. For that reason, it is necessary to specify the events' frequencies and their impacts, too. The impacts can be divided into three different parts. First, events have an impact on the mean of the trend, so on where the next time series point starts on the y axis (see Fig. 5.), inside (or sometimes outside) the control chart. This mean-changing (trend shift in Fig. 1.) information is regularly given as distributions, which are usually not uniform. So, it is necessary to provide the intervals for the mean changes and each of the subintervals' probabilities. Second, the events may affect the time series' trend (steepness), meaning that they can start, e.g., a long-term, low-intensity trend which slowly brings the mean outward (or inward). Outward means the trend is slowly going towards the out-of-control stripes, inward means the trend is slowly going towards the central, normal stripe (stripes are discretised intervals of the control chart as described in detail later and shown in Fig. 4.). Finally, the events affect the time series' noise, with which the individual data points are emulated. If multiple events occur at the same time, their effects are cumulated; more precisely, the mean/starting value of the next starting trend is the average of the starting points of the individual events, the trend steepnesses are summed, however, their effect on the noise is selected otherwise: the new noise will be equal to the largest noise among all the actual events. The same structure is applied to describe the effects of the actions as well.

Fig. 4. Time series with stripes. Green in the middle is the normal range (optimal), yellows are the control ranges (warning) and the reds are the out-of-control ranges (resulting in product failure).
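A compact sketch of the combination rule for simultaneous events described above; the Event fields and the example numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start_value: float   # where the next trend starts on the y axis
    steepness: float     # long-term drift added to the trend
    noise_std: float     # noise level implied by the event


def combine_events(events: list[Event]) -> Event:
    """Starting value: averaged; steepness: summed; noise: the largest one wins."""
    return Event(
        start_value=sum(e.start_value for e in events) / len(events),
        steepness=sum(e.steepness for e in events),
        noise_std=max(e.noise_std for e in events),
    )


# e.g. a tool failure and a warm-up drift occurring at the same time
combined = combine_events([Event(6.0, 0.05, 0.4), Event(4.0, -0.01, 0.8)])
```

According to the paper, the very same structure (start value, steepness, noise) is reused to describe the effects of the agent's actions.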
3.7. Actions for SPC in manufacturing

In manufacturing, different actions are distinguishable, such as Maintenance actions, Measurement actions, Set-up actions and No action as well. There are actions whose impacts on the trend are outstanding and need to be considered, and there are actions whose impacts are negligible. No action stands out from the rest, because taking it means the time series is controlled and the near future of the products looks promising, so no intervention is needed. In general, an action can have the same effect on the production trend as an event: it can influence the starting position of a trend, the steepness and the noise level after it is taken.

There are two strategies for action selection which need to be balanced: exploration and exploitation. When the agent does not have sufficient information about its environment, it is advisable to choose the exploration strategy, where the agent chooses rarely chosen actions with the intent to explore its environment and so the state-space. In contrast, when the agent does have enough knowledge about the environment, it is mostly acceptable to choose the action with the highest goodness.

3.8. Knowledge representation: Q table

The Q table naturally incorporates also the possible actions (A1, …, AN) of the RL agent, as shown in Fig. 7. The values V(i,j) under them are the estimated goodness (values/expected reward) of the actions [1] in respect to the actual state (row); they are the so-called Q values (as referred to later on).

Fig. 6. A part of the typical Q table and its content in the RL for SPC in manufacturing approach. States for the RL agent are represented as BW-long series of stripe codes according to the allocations of the production data in the different stripes in the recent BW number of points.

The table in Fig. 7. is structured as follows: the states are chosen for the rows, and the actions are chosen for the columns. The states consist of quantized values; their length depends on how many previous values are taken into account in the action selection (BW). Concerning the actions, they are the possible activities performed by the operators, experts or a control system to control the given process, so, in the RL, when the best action is searched for at a state, it means in reality that the best production intervention is searched for.

According to the applied, well-known Q table representation, the production related knowledge is stored in the V(i,j) values. Later, it will be substituted by one (or more) regression techniques, like deep neural networks.

Fig. 7. A part of the typical Q table and its content in the RL for SPC in manufacturing approach. Rows are the different, already visited states and the possible, selectable actions are shown as A1, …, AN, while the Q table values (Vi,j) store the knowledge of the RL agent.

A major problem with the Q-Table is its memory requirement. For example, for a table with A columns where each column takes B different values, the size of the table could reach B^A rows. With large A and B, the generation, storage and handling of the table could cause problems; moreover, it is unnecessary to allocate the memory for the empty table before it is used. Therefore, a technique was introduced, called the dynamic Q-Table, where only as much memory is allocated as required and only when it is required. In the proposed concept, at the beginning the Q table is empty. When the algorithm reaches a new state, it adds it to a list which represents the rows of the Q-Table, and its initial actions' values are selected randomly. (In future research the order of the Q table rows will be further optimized for exploiting the characteristics of production SPC, e.g., normal states are much more frequent.) If the actual state is already in the Q table, then the chosen action will be updated as detailed in the next chapter. As a research outlook, this concept could be transferred to an Artificial Neural Network (ANN) based solution, so, at the beginning the ANN could have really limited knowledge and later on it could continuously learn new states; however, this is a challenge for future research.

At the beginning, when the agent mostly explores its environment, it often meets states in which it has not been before, so they are added to (the end of) the Q table. As it explores, its knowledge about the environment grows, therefore it meets more and more already visited states, where it only updates the relevant action values of the state, so in most of such cases no new row is needed. As a result, the length of the table grows logarithmically, as shown in the upper part of Fig. 8. In the given, presented case in the bottom part of Fig. 8., it may be rational to stop the learning after ~2000 learning steps, because even though the table is still growing, it does not significantly increase the recognition rate for production trend forecast considered in the given, particular case. It is a very important result, proving that there exists a rational limit for the Q table; beyond it, the size, calculation time and other performance requirements grow significantly, but it does not bring valuable additional knowledge to the given SPC assignment. Consequently, with the proposed dynamic Q table solution the IT background requirements can be limited and kept under control.

4.2. State extension with past actions

In manufacturing environments, it is crucial to consider what the last actions taken to control the given production process were, because it is not worth doing the same (e.g. expensive) action repeatedly within a short time. To handle this requirement as well, the state space was extended with the last 'n' actions from the past (in the current analysis n = 3); this extension can be seen in Fig. 9.
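The dynamic Q-table and the action-extended state can be sketched as follows; the action set, the BW length and the random initialisation range are assumptions for illustration, while the "last n = 3 actions" part follows the paper.

```python
import random

ACTIONS = ["no_action", "measurement", "set_up", "maintenance"]   # illustrative action set


class DynamicQTable:
    """Rows are allocated only when a state is visited for the first time, so the
    table grows with the number of distinct visited states instead of B**A."""

    def __init__(self):
        self.rows = {}                       # state tuple -> {action: Q value}

    def row(self, state: tuple) -> dict:
        if state not in self.rows:           # new, never visited state
            self.rows[state] = {a: random.random() for a in ACTIONS}
        return self.rows[state]


def make_state(stripe_codes: list, last_actions: list, bw: int = 8, n: int = 3) -> tuple:
    """State = the last BW stripe codes of the quantized time series, extended with
    the last n actions taken (n = 3 in the paper's analysis); bw here is a placeholder."""
    return tuple(stripe_codes[-bw:]) + tuple(last_actions[-n:])
```

Looking up rows lazily keeps the memory proportional to the visited part of the state space, which is why the table length in Fig. 8 grows only logarithmically and the learning can be truncated after a few thousand steps.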
In the learning, the most popular ε-greedy algorithm is applied, in which ε controls the ratio of exploration vs. exploitation. Its zero value means that there is exploitation only and the value one means full exploration [1]. There are various strategies for how to adapt the value of ε during learning; typically it starts with a high value for wide exploration and it is decreased over time to enforce exploitation while the learnt knowledge of the agent is continuously increasing. This is a really valuable feature of RL, but on the other hand it is an additional parameter that has to be defined and controlled.

Fig. 9. Extension of the state vector with the last 'n' actions taken ('n' = 3 in this particular case), as shown in the middle of the table by the columns T-3A, T-2A, T-1A. Consequently, not only the results of the earlier actions are considered through the past stripes in the actual state vector, but the recently taken 'n' actions themselves as well.

4.5. Initial exploration level

During the interaction with the environment, beyond the selection of a production action (as introduced before), the agent chooses ε values for itself as well (in Fig. 10.: Epsilon Actions) for its next activities. For the production action the actual ε controls the exploration–exploitation ratio; however, for the action of selecting the future ε value, only exploitation was defined (this solution performed best).
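A minimal sketch of this double selection mechanism: ε-greedy choice of the production action, and a purely greedy ("exploitation only") choice of the next ε value from the discrete levels listed in the following paragraphs. The per-state storage of ε values mirrors the description of the Epsilon Actions; the rest of the implementation is an assumption.

```python
import random

EPSILON_LEVELS = [0.0, 0.05, 0.15, 0.25, 0.5, 1.0]   # discretised ε values used in the paper


def select_production_action(q_row: dict, epsilon: float):
    """Classic ε-greedy: explore with probability ε, otherwise exploit."""
    if random.random() < epsilon:
        return random.choice(list(q_row))
    return max(q_row, key=q_row.get)


def select_next_epsilon(epsilon_q_row: dict) -> float:
    """For the 'epsilon action' only exploitation is used: the ε level with the
    highest learnt value for the current state is taken."""
    return max(epsilon_q_row, key=epsilon_q_row.get)
```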
There is, however, a small exception: when the agent meets a new, previously not visited state, it has to select the initial, first value of ε for this new state. This choice determines the starting exploration–exploitation ratio. In the given, proposed concept and in the related manufacturing experiments, only fixed, concrete, discretized values for ε are applied (0.: 0.0 – exploitation only, 1.: 0.05, 2.: 0.15, 3.: 0.25, 4.: 0.5 and 5.: 1.0 – full exploration); consequently, the selection of one of these initial values determines the starting exploration–exploitation ratio in the new state. So, in this initial stage there is a small level of exploration also for the exploration itself (as an exception). The optimal selection for the first value is unknown, but it was tested as described afterwards. One can recognise that the possible values for ε are not distributed equally, because the experiences mirrored that the well-working ratio of exploitation is much higher than that of exploration; consequently, smaller ε values are needed more than larger ones.

A comprehensive experiment was performed for selecting the best/optimal exploration control rule and the related optimal/best initial exploration level. A complete, and so long, RL training was performed at all combinations of these factors (at each exploration control rule with initial exploration levels of 0.: 0.0, 3.: 0.25 and 5.: 1.0). The final cumulated costs, the ratio of good products, the unit price of a product, the self-controlled final exploration–exploitation ratio and the size of the Q table were measured at each combination for finding the optimal set-up. Fig. 11. shows the experimental results for these indicators, resulting in the optimal selection of the exploration control rule being the second option (the action is selected randomly but in proportion to the current Q values of the possible actions in the current state) and the initial exploration level being 5.: ε = 1.0, i.e., full exploration.

Fig. 11. Performance parameter values (unit cost, cumulative cost and rate of good products) of the proposed solution after long runs. The two horizontal axes show the possible Options (0 – equal chance for action selection, 1 – action selection proportionally to the actual own estimated reward, 2 – action selection proportionally to the known external reward) and the initial exploration level applied for new states (0 – exploitation only, 3 – balanced exploitation and exploration, 5 – exploration only), while the vertical axes represent the various evaluation measures. As a result, Option 1 (the action is selected randomly but in proportion to the current Q values of the possible actions in the current state) with the initial exploration level of 5.: ε = 1.0 (full exploration) was identified as optimal.

4.7. Reusage Window (RW)

As opposed to the original concept, where the agent walks only once through the time series, with RW this is repeated RW times, as follows: at first, an interval whose length is RW is selected from the time series and the agent goes through this selected interval once, sampling and quantizing states from it. This is one learning iteration. Then, the RW moves one step (one product or one measurement) ahead on the time series, and it starts again, until the RW meets the actual end of the time series. It means that one data point is (re)used as many times as the length of the RW. As Fig. 12. represents, the RW moving window goes step by step through the whole time series.
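A sketch of the Reusing Window training loop described above; the learn_from callback stands for one learning iteration (quantization of the interval into states plus the TD updates), and the concrete window length in the usage comment is only an example.

```python
from typing import Callable, Sequence


def reusing_window_training(series: Sequence[float], rw: int,
                            learn_from: Callable[[Sequence[float]], None]) -> None:
    """Each learning iteration processes one RW-long interval of the time series;
    the window then moves ahead by one product/measurement, so every data point
    is reused roughly RW times before the window reaches the end of the series."""
    for start in range(0, len(series) - rw + 1):
        learn_from(series[start:start + rw])   # one learning iteration on the interval


# usage sketch (rw = 30 is only an example value):
# reusing_window_training(noisy_series, rw=30, learn_from=my_learning_iteration)
```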
Fig. 13. Comparison among various production trend forecast recognition rates with different RWs (noise = 10, MW = 150) [51]. It is mirrored that the RW has an optimal level, because above that the production trend forecast recognition rate does not increase but the calculation need grows significantly (above 30 in this case).

Fig. 14. Comparison of different recognition rates of production trend forecast with different RWs (noise = 0, MW = 150) [51]. The figure shows that the MW of 150 is applicable because this length of performance measurement serves with the same data independently of the data reusage amount (RW).
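The Measurement Window itself can be summarized by a small evaluation helper: the performance indicators are always computed over fixed-length, equally sampled windows (MW = 150 in the experiments above), so that trainings with different RW settings remain comparable. The indicator computed here (rate of good products) is one of the paper's KPIs; the implementation details are assumptions.

```python
from typing import Sequence


def rate_of_good_products(values: Sequence[float], ltl: float, utl: float) -> float:
    """Share of products inside the tolerance range."""
    return sum(ltl <= v <= utl for v in values) / len(values)


def measurement_window_kpis(series: Sequence[float], mw: int,
                            ltl: float, utl: float) -> list[float]:
    """Evaluate the KPI over consecutive, fixed-length Measurement Windows so that
    the evaluation frequency is independent of the Reusing Window setting."""
    return [rate_of_good_products(series[i:i + mw], ltl, utl)
            for i in range(0, len(series) - mw + 1, mw)]
```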
Fig. 16. Key Performance Indicators (KPIs) of the RL for SPC in manufacturing agent: cumulated production cost (green), the rate of good products (blue), unit cost (black) and exploration ratio (red) along the production cycles (BW: length for considering the past, RW: reusing window, option 1.: action selection proportionally to the actual own estimated reward, init = 5: maximal exploration in new states). The figure on the top represents a first stage of the training process, mirroring the learning when the KPIs' values change quickly. The next three figures show the same, but significantly longer, training (and performing) process, where the left half of the figures presents the starting period of the agent-event interaction and the right side presents the "end" of the training, when the unit cost, the ratio of good products and the exploitation ratio are stable, almost constant (between them a long period was cut out). Consequently, these right half parts show the final performance of the trained agent. The figure below the top one shows the KPIs of a full training; the third one is the same but zoomed into the top (vertical axis) region to show the final ratio of good products (~99.6% in this example, in blue), and the last figure on the bottom zooms to the constant, final unit cost (0.0075 in this example, in black) and the stable but fortunately non-zero ε (average of ~0.005 in this example, in red) values. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
defined for enabling the fair comparison of learnings with different RWs by sampling them with the same evaluation frequencies. This extension of the traditional RL is necessary in the given manufacturing SPC environment, considering the cost of a measurement value and the precise evaluation requirement about the performance of the production system.

Beyond the working concept for adapting RL into SPC in production, some novel RL extensions are described, like the epsilon self-control of exploration and exploitation, and the optimisation of some meta-data of the training.

The manufacturing related performance comparison of the continuously training and performing agent was realized by analysing the distribution of selected action frequencies and additionally the unit price together with the rate of good products; the latter measures are the two most important Key Performance Indicators (KPIs). In the validation experiment the production process is considered close to its optimum, with constant and stabilized unit price and level of good products. Having achieved this performance, the frequencies of the selected production actions were compared to their frequencies in the real manufacturing environment to measure the performance of the proposed concept numerically. The most important KPI was the ratio of the selected "No action" type action. In the best set-ups this ratio was higher than in practice, namely, the agent proposed around 10–30% fewer production intervention actions than happens on the manufacturing shopfloor. Finally, industrial testing and validation proved the applicability of the proposed method.

As a next step, future research has to answer numerous open challenges, like a more efficient state coding of the past production history, involving the real-time evaluation of the Cp and Cpk values of the analysed production process. Additionally, the related simulation could be extended to generate new states to be explored more frequently, in order to increase the speed of learning.
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

First of all, special thanks have to be expressed to the Opel Szentgotthárd Kft., Hungary, working in the automotive sector; especially to the experts Jenő Csanaki, Csaba Péntek and László Cserpnyák for supporting the research with continuous consultations and validations as well as with relevant data and bi-directional knowledge exchange.

The research in this paper was partly supported by the European Commission through the H2020 project EPIC (https://www.centre-epic.eu/) under grant No. 739592, by the Hungarian ED_18-2-2018-0006 grant on a "Research on prime exploitation of the potential provided by the industrial digitalisation" and by the Ministry of Innovation and Technology NRDI Office within the framework of the Artificial Intelligence National Laboratory Program.

References

[1] A.G. Barto, R.S. Sutton, P. Brouwer, Associative search network: A reinforcement learning associative memory, Biological Cybernetics 40 (1981) 201–211.
[2] W. Bouazza, Y. Sallez, B. Beldjilali, A distributed approach solving partially flexible job-shop scheduling problem with a Q-learning effect, IFAC PapersOnLine 50-1 (2017) 15890–15895.
[3] N. Khader, S.W. Yoon, Online control of stencil printing parameters using reinforcement learning approach, Procedia Manufacturing 17 (2018) 94–101.
[4] Y.-C. Wang, J.M. Usher, Application of reinforcement learning for agent-based production scheduling, Engineering Applications of Artificial Intelligence 18 (2005) 73–82.
[5] B. Waschneck, A. Reichstaller, L. Belzner, T. Altenmüller, T. Bauernhansl, A. Knapp, A. Kyek, Optimization of global production scheduling with deep reinforcement learning, Procedia CIRP 72 (2018) 1264–1269.
[6] M. Schneckenreither, S. Haeussler, Reinforcement Learning Methods for Operations Research Applications: The Order Release Problem, in: G. Nicosia, P. Pardalos, G. Giuffrida, R. Umeton, V. Sciacca (Eds.), Machine Learning, Optimization, and Data Science, Lecture Notes in Computer Science, vol. 11331, 2019, pp. 545–559.
[7] A. Kuhnle, L. Schäfer, N. Stricker, G. Lanza, Design, Implementation and Evaluation of Reinforcement Learning for an Adaptive Order Dispatching in Job Shop Manufacturing Systems, Procedia CIRP 81 (2019) 234–239.
[8] A. Kuhnle, N. Röhrig, G. Lanza, Autonomous order dispatching in the semiconductor industry using reinforcement learning, Procedia CIRP 79 (2018) 391–396.
[9] Cs. Kardos, C. Laflamme, V. Gallina, W. Sihn, Dynamic scheduling in a job-shop production system with reinforcement learning, Procedia CIRP, 8th CIRP Conference of Assembly Technology and Systems, 29 Sept. – 1 Oct., Athens, Greece, 2020, in print.
[10] S. Qu, J. Wang, S. Govil, J.O. Leckie, Optimized Adaptive Scheduling of a Manufacturing Process System with Multi-Skill Workforce and Multiple Machine Types: An Ontology-Based, Multi-Agent Reinforcement Learning Approach, Procedia CIRP 57 (2016) 55–60.
[11] A. Nair, B. McGrew, M. Andrychowicz, W. Zaremba, P. Abbeel, Overcoming Exploration in Reinforcement Learning with Demonstrations, in: 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 6292–6299.
[12] M. Plappert, M. Andrychowicz, A. Ray, B. McGrew, B. Baker, G. Powell, J. Schneider, J. Tobin, M. Chociej, P. Welinder, V. Kumar, W. Zaremba, Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research, arXiv (2018), abs/1802.09464.
[13] Y. Zhu, Z. Wang, J. Merel, A. Rusu, T. Erez, S. Cabi, S. Tunyasuvunakool, J. Kramár, R. Hadsell, N. Freitas, N. Heess, Reinforcement and Imitation Learning for Diverse Visuomotor Skills, Proceedings of Robotics: Science and Systems, Pittsburgh, Pennsylvania, 2018, p. 10.
[14] G. Kahn, A. Villaflor, B. Ding, P. Abbeel, S. Levine, Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation, in: 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 5129–5136.
[15] P. Long, T. Fan, X. Liao, W. Liu, H. Zhang, J. Pan, Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning, in: 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 6252–6259.
[16] T. Johannink, S. Bahl, A. Nair, J. Luo, A. Kumar, M. Loskyll, J.A. Ojea, E. Solowjow, S. Levine, Residual Reinforcement Learning for Robot Control, in: 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 2019, pp. 6023–6029.
[19] V. Ranaee, A. Ebrahimzadeh, Control chart pattern recognition using a novel hybrid intelligent method, Applied Soft Computing 11 (2011) 2676–2686.
[20] K. Lavangnananda, S. Khamchai, Capability of Control Chart Patterns Classifiers on Various Noise Levels, Procedia Computer Science 69 (2015) 26–35.
[21] Zs.J. Viharos, J. Csanaki, J. Nacsa, M. Edelényi, Cs. Péntek, K.B. Kis, Á. Fodor, J. Csempesz, Production trend identification and forecast for shop-floor business intelligence, ACTA IMEKO, The Open Access e-Journal of the International Measurement Confederation (IMEKO) 5 (4) (2016) 49–55, ISSN: 2221-870X.
[22] G. Köksal, I. Batmaz, M.C. Testik, A review of data mining applications for quality improvement in manufacturing industry, Expert Systems with Applications 38 (2011) 13448–13467.
[23] T.T. El-Midany, M.A. El-Baz, M.S. Abd-Elwahed, A proposed framework for control chart pattern recognition in multivariate process using artificial neural networks, Expert Systems with Applications 37 (2010) 1035–1042.
[24] G.D. Pelegrina, L.T. Duarte, C. Jutten, Blind source separation and feature extraction in concurrent control charts pattern recognition: Novel analyses and a comparison of different methods, Computers & Industrial Engineering 92 (2015) 105–114.
[25] H. De la T. Gutierrez, D.T. Pham, Estimation and generation of training patterns for control chart pattern recognition, Computers & Industrial Engineering 95 (2016) 72–82.
[26] W.-A. Yang, W. Zhou, W. Liao, Y. Guo, Identification and quantification of concurrent control chart patterns using extreme-point symmetric mode decomposition and extreme learning machines, Neurocomputing 147 (2015) 260–270.
[27] A.R. Motorcu, A. Güllü, Statistical process control in machining, a case study for machine tool capability and process capability, Materials and Design 27 (2006) 364–372.
[28] T. Huybrechts, K. Mertens, J. De Baerdemaeker, B. De Ketelaere, W. Saeys, Early warnings from automatic milk yield monitoring with online synergistic control, American Dairy Science Association 97 (2014) 3371–3381.
[29] Zs.J. Viharos, L. Monostori, Optimization of process chains by artificial neural networks and genetic algorithms using quality control charts, Proceedings of Danube-Adria Association for Automation and Metrology, Dubrovnik, 1997, pp. 353–354.
[30] Zs.J. Viharos, K.B. Kis, Survey on Neuro-Fuzzy Systems and their Applications in Technical Diagnostics and Measurement, Measurement 67 (2015) 126–136.
[31] L. Móricz, Zs.J. Viharos, A. Németh, A. Szépligeti, M. Büki, Off-line geometrical and microscopic & on-line vibration based cutting tool wear analysis for micro-milling of ceramics, Measurement 163 (2020), online available.
[32] J. Xie, Y. Wang, X. Zheng, Q. Yang, T. Wang, Y. Zou, J. Xing, Y. Dong, Modeling and forecasting Acinetobacter baumannii resistance to set appropriate use of cefoperazone-sulbactam: Results from trend analysis of antimicrobial consumption and development of resistance in a tertiary care hospital, American Journal of Infection Control 43 (2015) 861–864.
[33] M. Gan, Y. Cheng, K. Liu, G. Zhang, Seasonal and trend time series forecasting based on a quasi-linear autoregressive model, Applied Soft Computing 24 (2014) 13–18.
[34] C.R. Clarkson, J.D. Williams-Kovacs, F. Qanbari, H. Behmanesh, M.H. Sureshjani, History-matching and forecasting tight/shale gas condensate wells using combined analytical, semi-analytical, and empirical methods, Journal of Natural Gas Science and Engineering 26 (2015) 1620–1647.
[35] I. Koprinska, M. Rana, A.T. Lora, F. Martínez-Álvarez, Combining pattern sequence similarity with neural networks for forecasting electricity demand time series, The 2013 International Joint Conference on Neural Networks (IJCNN) (2013) 1–8.
[36] V.K. Semenychev, E.I. Kurkin, E.V. Semenychev, Modelling and forecasting the trends of life cycle curves in the production of non-renewable resources, Energy 75 (2014) 244–251.
[37] R. Paggi, G.L. Mariotti, A. Paggi, A. Calogero, F. Leccese, Prognostics via Physics-Based Probabilistic Simulation Approaches, Proc. of Metrology for Aerospace, 3rd IEEE International Workshop, 21–23 June 2016, pp. 130–135.
[38] E. De Francesco, Ett. De Francesco, R. De Francesco, F. Leccese, M. Cagnetti, Improving Autonomic Logistic analysis by including the production compliancy status as initial degradation state, Proc. of Metrology for Aerospace, 3rd IEEE International Workshop, Firenze, Italy, June 21–23, 2016, pp. 371–375.
[39] R. Sutton, A.G. Barto, Reinforcement Learning: An Introduction, The MIT Press, 2018.
[40] OpenAI, OpenAI Five, https://blog.openai.com/openai-five/, 2018.
[41] Z. Xu, H. van Hasselt, D. Silver, Meta-gradient reinforcement learning, arXiv preprint arXiv:1805.09801, 2018.
[43] R.E. Andersen, S. Madsen, A.B.K. Barlo, S.B. Johansen, M. Nør, R.S. Andersen, S. Bøgh, Self-learning Processes in Smart Factories: Deep Reinforcement Learning for Process Control of Robot Brine Injection, Procedia Manufacturing 38 (2019) 171–177, https://doi.org/10.1016/j.promfg.2020.01.023.
[44] G. Beruvides, A. Villalonga, P. Franciosa, D. Ceglarek, R.E. Haber, Fault pattern identification in multi-stage assembly processes with non-ideal sheet-metal parts based on reinforcement learning architecture, Procedia CIRP 67 (2018) 601–606, https://doi.org/10.1016/j.procir.2017.12.268.
[45] F. Guo, X. Zhou, J. Liu, Y. Zhang, D. Li, H. Zhou, A reinforcement learning decision model for online process parameters optimization from offline data in injection molding, Applied Soft Computing 85 (2019) 105828, https://doi.org/10.1016/j.asoc.2019.105828.
[46] F. Li, Y. Chen, J. Wang, X. Zhou, B. Tang, A reinforcement learning unit matching recurrent neural network for the state trend prediction of rolling bearings, Measurement 145 (2019) 191–203, https://doi.org/10.1016/j.measurement.2019.05.093.
[47] R. Wang, H. Jiang, X. Li, S. Liu, A reinforcement neural architecture search method for rolling bearing fault diagnosis, Measurement 154 (2020) 107417, https://doi.org/10.1016/j.measurement.2019.107417.
[48] P. Ramanathan, K.K. Mangla, S. Satpathy, Smart controller for conical tank system using reinforcement learning algorithm, Measurement 116 (2018) 422–428, https://doi.org/10.1016/j.measurement.2017.11.007.
[49] H. Wu, Y. Dai, C. Wang, X. Xu, X. Jiang, Identification and forewarning of GNSS deformation information based on a modified EWMA control chart, Measurement 160 (2020) 107854, https://doi.org/10.1016/j.measurement.2020.107854.
[50] M. Aslam, G. Srinivasa Rao, A. Shafqat, L. Ahmad, R.A.K. Sherwani, Monitoring circuit boards products in the presence of indeterminacy, Measurement 168 (2021) 108404, https://doi.org/10.1016/j.measurement.2020.108404.
[51] Zs.J. Viharos, R.B. Jakab, Reinforcement Learning for Statistical Process Control in Manufacturing, 17th IMEKO TC 10 and EUROLAB Virtual Conference: "Global Trends in Testing, Diagnostics & Inspection for 2030", October 20–22, 2020, ISBN: 978-92-990084-6-1, pp. 225–234.
[52] V. Kavimani, K.S. Prakash, T. Thankachan, Multi-objective optimization in WEDM process of graphene – SiC-magnesium composite through hybrid techniques, Measurement 145 (2019) 335–349, https://doi.org/10.1016/j.measurement.2019.04.076.
[53] G. Aquila, R.S. Peruchi, P. Rotela Junior, L.C.S. Rocha, A.R. de Queiroz, E. de O. Pamplona, P.P. Balestrassi, Analysis of the wind average speed in different Brazilian states using the nested GR&R measurement system, Measurement 115 (2018) 217–222, https://doi.org/10.1016/j.measurement.2017.10.048.
[54] C.U. Mba, V. Makis, S. Marchesiello, A. Fasana, L. Garibaldi, Condition monitoring and state classification of gearboxes using stochastic resonance and hidden Markov models, Measurement 126 (2018) 76–95, https://doi.org/10.1016/j.measurement.2018.05.038.
[55] G. Peng, Z. Zhang, W. Li, Computer vision algorithm for measurement and inspection of O-rings, Measurement 94 (2016) 828–836, https://doi.org/10.1016/j.measurement.2016.09.012.
[56] F. Wu, Q. Li, S. Li, T. Wu, Train rail defect classification detection and its parameters learning method, Measurement 151 (2020) 107246, https://doi.org/10.1016/j.measurement.2019.107246.
[57] P.M. Gopal, K.S. Prakash, Minimization of cutting force, temperature and surface roughness through GRA, TOPSIS and Taguchi techniques in end milling of Mg hybrid MMC, Measurement 116 (2018) 178–192, https://doi.org/10.1016/j.measurement.2017.11.011.
[58] K.S. Prakash, P.M. Gopal, S. Karthik, Multi-objective optimization using Taguchi based grey relational analysis in turning of Rock dust reinforced Aluminum MMC, Measurement 157 (2020) 107664, https://doi.org/10.1016/j.measurement.2020.107664.
[59] S. Alam, R. Gupta, J. Bera, Quality controlled compression technique for Photoplethysmogram monitoring applications, Measurement 130 (2018) 236–245, https://doi.org/10.1016/j.measurement.2018.07.091.