ABSTRACT
TAMPERE UNIVERSITY OF TECHNOLOGY
Master of Science Degree Programme in Machine Automation
KHAJEHZADEH, NAVID: Data and Process Mining Applications on a Multi-Cell
Factory Automation Testbed
Master of Science Thesis, 66 pages, 3 Appendix pages
December 2012
Major subject: Factory Automation
Examiner: Professor Jose L. Martinez Lastra
Keywords: energy awareness, behavioural patterns, monitoring, energy
consumption, factory automation, fault detection, process mining, model
discovery, event logs, information system
This thesis presents applications of both data mining and process mining on a factory automation testbed. It concentrates mainly on the Manufacturing Execution System (MES) level of the production hierarchy.
Unexpected failures can lead to vast investment losses or irrecoverable damages. Predictive maintenance techniques, active or passive, have shown high potential for preventing such detriments. Condition monitoring of target pieces of equipment, together with defined thresholds, forms the basis of the prediction. However, the monitored parameters must be independent of environmental changes; e.g. the vibration of transportation equipment such as conveyor systems varies with workload. This work aims to propose and demonstrate an approach to identify incipient faults of transportation systems in discrete manufacturing settings. The method correlates the energy consumption of the described devices with their workloads. At runtime, machine learning is used to classify the input energy data into two pattern descriptions. Consecutive mismatches between the output of the classifier and the workloads observed in real time indicate the possibility of an incipient failure at device level.
Currently, as a result of the high interaction between information systems and operational processes, and due to the increasing number of embedded heterogeneous resources, information systems generate massive amounts of unstructured events. Organizations struggle to deal with such unstructured and huge amounts of data. Process mining, as a new research area, has shown strong capabilities to overcome such problems. It applies both process modelling and data mining techniques to extract knowledge from data by discovering models from event logs. Although process mining is recognised mostly as a business-oriented technique, complementary to Business Process Management (BPM) systems, in this thesis the capabilities of process mining are exploited on a factory automation testbed. Multiple perspectives of process mining are employed on event logs produced by deploying a Service Oriented Architecture through Web Services in a real multi-robot factory automation industrial testbed, originally used for the assembly of mobile phones.
PREFACE
This thesis work was accomplished at the Factory Automation Systems and Technologies Lab (FAST), Department of Production Engineering at Tampere University of Technology, under the direction of Professor Jose L. Martinez Lastra and Research Fellow Corina Postelnicu.
Funding for this work came from the ARTEMIS Second Call 2009 Programme under Agreement Number 100223, corresponding to the project eSONIA: Embedded Service Oriented Monitoring, Diagnostics and Control: Towards the Asset-Aware and Self-Recovery Factory.
I would like to express my sincere appreciation to Prof. Jose L. Martinez Lastra for giving me the opportunity to study at TUT and to work in the FAST laboratory, which I will always remember as a nice, well-equipped and friendly multinational place.
I would like to extend my appreciation to Dr. Andrei Lobov for preparing and delivering such high-level education, as well as for lots of joy and happiness.
My heartfelt gratitude goes to my supervisor, Dr. Corina Postelnicu, for her infinite support, help and professional guidance, reviews and comments. Without her constructive supervision it would have been very difficult for me to finish this thesis.
I would like to thank all my friends and colleagues at TUT and the FAST laboratory, especially Tomas, Luis, Hector and Bin, for encouraging me whenever I needed motivation and for making it an enjoyable place to study and work together.
A world of thanks to my parents for their endless support and unconditional love and
help. Thanks to my brothers and sister for always persuading me to aim for the highest
in my life and to never surrender to the problems.
ACRONYMS
IS        Information System
BPM       Business Process Management
CRM       Customer Relationship Management
ICT       Information and Communication Technology
PAIS      Process-Aware Information Systems
WFM       Work-Flow Management
BPMN      Business Process Model and Notation
EPC       Event Driven Process Chains
UML       Unified Modelling Language
PN        Petri Net
WFN       Work Flow Net
YAWL      Yet Another Workflow Language
IT        Information Technology
C-net     Causal net
FSM       Finite State Machine
GA        Genetic Algorithm
HM        Heuristic Miner
BI        Business Intelligence
BAM       Business Activity Monitoring
CEP       Complex Event Processing
CPM       Corporate Performance Management
CPI       Continuous Process Improvement
BPI       Business Process Improvement
TQM       Total Quality Management
KPI       Key Performance Indicators
SEMMA     Sample, Explore, Modify, Model and Assess
LQN       Layered Queuing Networks
PCA       Principal Component Analysis
PLS       Partial Least Squares
SVM       Support Vector Machines
LS-SVM    Least Squares Support Vector Machines
QP        Quadratic Programming
SAW       Simulator of Assembly Workshops
LTL       Linear Temporal Logic
MXML      Magic eXtensible Markup Language
WS        Web Services
DPWS      Device Profile for Web Services
RTU       Remote Terminal Unit
MES       Manufacturing Execution Systems
DSS       Decision Support System
SOA       Service Oriented Architecture
CRISP-DM  Cross Industry Standard Process for Data Mining
LIST OF FIGURES
Figure 1. Application possibilities of process mining in factory automation ...........................................13
Figure 2. A general structure for producing event logs ..........................................................................15
Figure 3. A sample discovered model for a Lasagna process ..................................................................18
Figure 4. A sample discovered model for a Spaghetti process ...............................................................18
Figure 5. Process mining overview [11] ................................................................................................21
Figure 6. Concrete activities in process mining life cycle [53]................................................................22
Figure 7. BPM life-cycle .......................................................................................................................24
Figure 8. Classification of process history-based methods......................................................................25
Figure 9. Mapping non-linearly separable data into a higher dimensional space .....................................27
Figure 10. A model representing steps to produce a work-flow model by alpha-algorithm .......................28
Figure 11. Production automaton system structure .................................................................................33
Figure 12. FASTory Line ......................................................................................................................38
Figure 13. A detailed view of one cell of FASTory line .........................................................................38
Figure 14. Layout of FASTory line from top view .................................................................................39
Figure 15. Conveyor system analyzed. Main conveyor hosting 2 pallets. Bypass hosting 1 pallet............40
Figure 16. The previous system architecture for FASTORY line ............................................................41
Figure 17. Architecture of retrofitted FASTORY line ............................................................................42
Figure 18. Setting used to monitor energy consumption .........................................................................44
Figure 19. Cell 5 bypass conveyor engine power consumption. ..............................................................46
Figure 20. Energy awareness for detection of gradual conveyor misalignment ......................47
Figure 21. Cell 5 conveyor system engine (main and bypass conveyor) power consumption ...................47
Figure 22. Classes generated by rule based engine and correlated to each sampled data ..........................48
Figure 23. Classified cross validation data generated by LS_SVM .........................................................48
Figure 24. ISA-95 architecture focusing on level 3 ...............................................................................50
Figure 25. Applied method for storing the data including necessary parameters .....................................50
Figure 26. Fragment of generated event log by XESame tool .................................................................53
Figure 27. Alpha algorithm applied to Fastory event log. .......................................................................54
Figure 28. Feasible message sequence patterns between consecutive cells in Fastory..............................55
Figure 29. ProM's Heuristic Miner applied to the event log of Fastory (Partial result). ..........................56
Figure 30. The Heuristic model of Figure 24, converted to Petri Nets.....................................................56
Figure 31. Pie chart, basic performance analysis ....................................................................................57
Figure 32. Basic performance analysis (x axis - performers; y axis total cell working time, seconds ) ..57
Figure 33. KPIs including IPC-2541 states overview .............................................................................58
Figure 34. Pallets performance. Dotted chart analysis ............................................................................60
Figure 35. The effect of improper operation of pallet 33 on other pallets ................................................60
Figure 36. a : The Conformance Checker b: zoom view with descriptions ..............................................62
Figure 37 . log diagnostic perspective....................................................................................................62
Figure 38. Noncompliance between the model and the log (highlighted in orange) .................................63
Figure 39. log view, visualizes each process instance.............................................................................64
Figure 40. Unusual behaviour observed in the generated model from the log...........................65
Figure 41. Incorrect transition between cell 3 and cell 4 of trace number 100 .........................................66
Figure 42. Dotted Chart implementation for cellNumber 3 messages .....................................................66
Figure 43. MXML log format................................................................................................................74
Figure 44 . XES log format ..................................................................................................................75
LIST OF TABLES
Table 1. Popular Process Modelling Paradigms ....................................................................................17
Table 2. Process mining tools, short view ..............................................................................................35
Table 3. ProM plugins applicable to process discover ............................................................................36
Table 4. ProM plugins applicable to check conformance........................................................................37
Table 5. Testbed generated messages ....................................................................................................42
Table 6. Possible control scenarios considered for FASTory line ...........................................................43
Table 7. Defined rules of event processing ............................................................................................51
Table 8. Raw data associated with producing one product stored in MySQL database ............................52
Table 9. Applied XES Standard extensions............................................................................................52
Table 10. Comparison of online KPIs with KPIs applied offline for basic performance analysis .............58
Table 11. Pallet performance (Basic Performance Analysis plugin, text view) ........................59
TABLE OF CONTENTS
Abstract ........................................................................................................................ 1
PREFACE .................................................................................................................... 2
ACRONYMS ............................................................................................................... 3
LIST OF FIGURES ...................................................................................................... 5
LIST OF TABLES ........................................................................................................ 6
TABLE OF CONTENTS .............................................................................................. 7
1. INTRODUCTION ................................................................................ 9
1.1. Problem Definition ..................................................................... 9
1.1.1. Problem statement ......................................................... 9
1.1.2. Justification of the work .............................................. 10
1.1.2.1 Fault detection and diagnosis .......................................... 10
1.1.2.2
1.2.
2. TESTBED........................................................................................... 38
4.
References .................................................................................................................. 68
APPENDIX: MXML versus XES format of event logs ............................................... 73
1. INTRODUCTION
1.1. Problem Definition
1.1.1. Problem statement
Some facts about the systems of interest are not immediately obvious. Hidden patterns are discoverable in the immense amount of data such systems breathe in and out. Inferences are drawn from historical process information. This might include time, energy, workload, social network, dependency of activities, resources, states of components, embedded rules, flow of activities, etc.
Based on the discovered information, process models for such systems can be improved in real time. Comparison of real data against initial process models may lead to conclusions regarding possible faults and failures in the systems analysed, and in some cases leads to the enactment of further processes.
1.1.2. Justification of the work
1.1.2.1 Fault detection and diagnosis
Small failures can lead to significant financial losses or hazardous situations. Faults have been responsible for, e.g., a 3% to 8% decrease in oil production, causing up to $20 billion in losses in the US economy [82]. Early detection of faults is therefore critical to prevent serious disruptions to the production process. In addition to the vibration, temperature, pressure and humidity data traditionally used in predictive maintenance [83], the energy consumption signatures of pieces of equipment are a promising way to detect faults that occur gradually. An example of such faults is the misalignment of conveyor segments that generally occurs over time in discrete manufacturing execution systems due to e.g. friction.
In the context of predictive maintenance, failure thresholds are defined by repeatedly experiencing equipment failures. This is hazardous and expensive. The parameters widely applied for maintenance are prone to be influenced by environmental changes, especially for transportation equipment. For instance, the workload of a conveyor system directly affects its vibration. Therefore a smart method which associates environmental alterations with the measured parameters is needed. In this work we present an approach which links the workload of a conveyor to its engine power consumption and employs this link for early fault detection.
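The consecutive-mismatch rule described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: `classify()` stands in for the trained energy classifier, and the power values and the three-mismatch threshold are invented for the example.

```python
def classify(power_w):
    """Toy stand-in classifier: predict workload class from power draw."""
    return "loaded" if power_w > 50.0 else "empty"

def detect_incipient_fault(samples, threshold=3):
    """Flag a possible incipient fault after `threshold` consecutive
    mismatches between predicted and observed workload."""
    streak = 0
    for power_w, observed_workload in samples:
        if classify(power_w) != observed_workload:
            streak += 1
            if streak >= threshold:
                return True
        else:
            streak = 0
    return False

# A misaligned conveyor draws high power even when running empty:
healthy = [(40, "empty"), (60, "loaded"), (42, "empty")]
drifting = [(58, "empty"), (61, "empty"), (63, "empty")]
```

A single mismatch is treated as noise; only a sustained run of disagreements between the classifier and the observed workload raises the flag.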
1.1.2.2
In multi-agent production lines consisting of numerous pallets, tracking all pallets individually in order to discover malfunctions (e.g. bearing damage, poor lubrication) is expensive, time consuming, and prone to errors. Process modelling together with mining techniques may provide valuable insights into the performance of processes.
1.2.1. Objectives
1.2.2. Methodology
In the data mining and fault detection part of this work, power consumption values coming in real time from the line are labelled with the workload present on the conveyor system at the same time. The behaviour of testbed pieces of equipment from the energy consumption point of view is characterised using a supervised machine learning classification algorithm (SVM). A rule-based engine is defined offline and integrated into the proposed model, and finally an approach able to detect malfunctioning of transportation pieces of equipment based on their energy consumption signatures is proposed.
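The labelling step can be sketched as follows: each power sample is paired with the workload observed at the same instant to form a supervised training set. For brevity a nearest-centroid rule stands in for the (LS-)SVM classifier used in the thesis; all values are hypothetical.

```python
def build_training_set(power_samples, workloads):
    """Label each power sample with the concurrent workload class."""
    return list(zip(power_samples, workloads))

def train_centroids(training_set):
    """Mean power per workload class (stand-in for SVM training)."""
    sums, counts = {}, {}
    for power, label in training_set:
        sums[label] = sums.get(label, 0.0) + power
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, power):
    """Assign the class whose centroid is nearest to the sample."""
    return min(centroids, key=lambda label: abs(centroids[label] - power))

power = [38.0, 41.0, 62.0, 65.0]          # watts (hypothetical)
load = ["one_pallet", "one_pallet", "two_pallets", "two_pallets"]
centroids = train_centroids(build_training_set(power, load))
```

The essential point is the supervision signal: because each sample carries the workload that produced it, the learned decision boundary is workload-aware rather than a fixed absolute threshold.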
In the process mining part of this thesis work, event logs are produced from the data coming from the controllers embedded in a production line; they are converted to the required XML formats. The state of the art of multiple tools with process mining capabilities is reviewed. Worthwhile insights are extracted from the event logs and the performance of the production line is evaluated. Having assessed multiple discovery algorithms, process modelling is performed using a heuristic approach and the models are represented as C-nets. Using conformance checking techniques, the real behaviour of the system is compared with the expected behaviour; finally, a number of faults in the communication part of the line are discovered.
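The conformance idea can be illustrated with a toy check that replays each trace against the expected cell-to-cell transitions and reports deviating traces. The cell names and the allowed-transition map are invented for this sketch; real conformance checking (e.g. in ProM) is considerably more sophisticated.

```python
# Allowed successor cells for each cell (hypothetical reference model).
ALLOWED = {"cell1": {"cell2"}, "cell2": {"cell3"}, "cell3": {"cell4"}}

def nonconforming(traces):
    """Return indices of traces containing a transition not in ALLOWED."""
    bad = []
    for i, trace in enumerate(traces):
        for src, dst in zip(trace, trace[1:]):
            if dst not in ALLOWED.get(src, set()):
                bad.append(i)
                break
    return bad

logs = [
    ["cell1", "cell2", "cell3", "cell4"],   # expected behaviour
    ["cell1", "cell3", "cell4"],            # skips cell2: deviation
]
```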
2. BACKGROUND
This chapter starts by explaining steps required in order to employ process mining
techniques in factory automation.
An information system provides event logs over data captured from the process layer. In the context of process mining, Complex Event Processing (CEP) is a typical technique applied by the IS in order to provide the necessary parameters. Event logs are stored in a database, in various formats (MXML, XES, etc.) compatible with the desired target process mining tools (e.g. the ProM Framework [1]). Process mining techniques for workflow discovery are applied to the recorded data to output a model of the transition of activities between resources. Inspection of the resulting model and conformance checking of a reference model against the event log lead to e.g. improvements to the reference model, failure detection and diagnostics in pieces of equipment, or conclusions about the performance of the system. Figure 1 illustrates the application possibilities of process mining.
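As a rough sketch of what such a stored event log looks like, the snippet below serializes a captured trace into a minimal XES-like XML fragment using the concept and time extension keys; the event names and timestamps are invented, and a real XES file carries additional extension declarations omitted here.

```python
import xml.etree.ElementTree as ET

def to_xes(trace_id, events):
    """Serialize one trace of (activity, timestamp) pairs as XES-like XML."""
    log = ET.Element("log")
    trace = ET.SubElement(log, "trace")
    ET.SubElement(trace, "string", key="concept:name", value=trace_id)
    for name, ts in events:
        ev = ET.SubElement(trace, "event")
        ET.SubElement(ev, "string", key="concept:name", value=name)
        ET.SubElement(ev, "date", key="time:timestamp", value=ts)
    return ET.tostring(log, encoding="unicode")

xml_text = to_xes("pallet-1", [("cell1_start", "2012-05-01T10:00:00"),
                               ("cell1_end", "2012-05-01T10:00:07")])
```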
Process mining aims at producing a model extracted from the event logs, analyzing the events in order to explain the behaviour of the system. Purposes of the generated process models include [11]:
Insight: models are helpful for discussing requirements and design decisions and for confirming assumptions. Models lead the modeller to survey the model from different aspects, provide valuable insights and make components clear.
Discussion: models are unquestionably used as the basis of discussions.
Documentation: models are extremely applicable from an educational point of view.
Verification: applying models, possible errors inside the system can be perceived.
Performance analysis: different techniques are used to obtain a deep insight into the process. For instance, simulation reveals the causes of bottlenecks.
Animation: each process is controlled by a scenario. Animation is extremely useful for designers to get feedback on their control scenario design.
Specification: models are extremely advantageous for explaining PAISs in advance.
Configuration: models can be used to configure a system.
Models can be classified according to several criteria:
First, depending on how formal the model is. With an informal model it is not possible to make a certain decision about the feasibility of a trace of activities; yet such models are utilized for structuring decisions or filing. A formal model, in contrast, provides sufficient support to decide about the possibility of a set of activities performed in sequence; at higher levels of BPM, organizations are keen on such models for analysis purposes and to enact operational processes via play-out engines (which permit only those activities that are allowed by the model [99]). Semi-formal models are initially designed as informal, but during the process of implementation and interpretation subsets of them are formalized and supported by formal and standard semantics. Examples include: Business Process Model and Notation (BPMN) [19] - a graphical representation applied for characterizing processes in a model; UML activity diagrams [20] - a standardized general-purpose modelling language including a set of graphic notation techniques to create visual models of object-oriented software-intensive systems; and Event Driven Process Chains (EPCs) [21] - flowcharts for business process modelling that are generally used for configuring an enterprise resource planning (ERP) implementation and for business process improvement.
Second, depending on how the model is constructed [22]. Models may be created at the design stage, or may be derived from the system by reverse engineering or from event logs.
Table 1 lists modelling paradigms associated to processes for mining purposes, i.e.
to retrieve hidden information about the behaviour emerging at runtime.
Table 1. Popular Process Modelling Paradigms
Transition systems
Petri nets
Work flow nets (WF-nets)
YAWL (Yet Another Workflow Language)
2.3.2.
From the process mining point of view, processes are categorized according to their (un)structured nature:
In a structured process, the inputs and outputs are specified and clear, and all activities can be automated. Lasagna processes (Figure 3) are structured. Van der Aalst [11] gives an informal definition of a Lasagna process: a process is a Lasagna process if, with limited effort, it is possible to create an agreed-upon process model that has a fitness of at least 0.8 [11]. Almost all process mining algorithms can be applied to a Lasagna process.
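The 0.8 criterion can be illustrated with a crude trace-level fitness: the fraction of logged traces the model can replay exactly. Real fitness measures (e.g. token-based replay) give partial credit within a trace; this all-or-nothing version is only a sketch over invented traces.

```python
def trace_fitness(model_traces, log):
    """Fraction of logged traces the model can replay exactly."""
    allowed = {tuple(t) for t in model_traces}
    replayable = sum(1 for trace in log if tuple(trace) in allowed)
    return replayable / len(log)

model = [["a", "b", "c"], ["a", "c", "b"]]
log = [["a", "b", "c"]] * 8 + [["a", "x", "c"]] * 2  # 2 deviating traces

is_lasagna = trace_fitness(model, log) >= 0.8
```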
2.4.1. Data mining
2) Association analysis: describing attributes or values that frequently occur together in a dataset, with correlation among them.
3) Cluster analysis: dividing data objects into groups such that the objects of each group have similar characteristics. Clustering is considered the main task of explorative data mining. In clustering, data are not labelled at training time; they are clustered so that similarity within a group is maximized.
4) Outlier analysis: outliers are objects whose behaviour is far from the mainstream of the data.
5) Evolution analysis: explains and models objects whose behaviour drifts over time. Time series, pattern matching and similarity are the main analytical techniques in this field.
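As a minimal illustration of outlier analysis, the sketch below flags values lying further than k standard deviations from the mean; the data and the threshold are illustrative only.

```python
def outliers(values, k=2.0):
    """Return values further than k standard deviations from the mean."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    return [v for v in values if abs(v - mean) > k * std]

data = [10, 11, 9, 10, 12, 11, 10, 50]  # 50 is far from the mainstream
```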
Several different life-cycles and operation steps have been proposed for data mining:
The Cross Industry Standard Process for Data Mining (CRISP-DM) [50], [54] defines a six-step methodology for mining data and processes (understanding the business, understanding the data, preparing the data, modelling, evaluation and deployment).
The Sample, Explore, Modify, Model and Assess (SEMMA) methodology [51] was proposed by the SAS Institute to provide a list of activities for implementing data mining. SEMMA has been defined to focus on the modelling part of data mining projects.
Sample: preparing the data set for the purpose of modelling through data sampling.
Explore: evaluation of the data prepared in the previous step, looking for correlations and relations between variables to find patterns and normal and abnormal behaviours in the data sets; this step is usually performed via visualization.
Modify: this step is associated with the transformation, creation and modification of variables in order to prepare for modelling.
Model: applies data mining and data modelling techniques to the prepared variables to produce models.
Assess: inspecting the accuracy and reliability of the produced model.
Data mining techniques have demonstrated limited capability in real projects involving multiple types of processes.
Data mining utilizes techniques to mine data, to discover and extract abstract patterns from data sets, and to show them in different formats such as rules or decision trees. Those patterns are applied for extracting knowledge such as data groups, abnormal records and dependencies. Data mining does not deal with process details and does not give deep insights into the processes. Interpretation and result reporting are out of the scope of data mining.
2.4.2. Process mining
Process mining combines data mining and process modelling to generate models from IT log data. The produced models can be updated fast, so they mirror real-life situations better than data mining models, and hence can be used for many additional purposes. Examples include discovery of bottlenecks, analysis of social networks, work balancing, and detection and prediction of (dis)similarities and/or flaws. Nowadays a large number of factories are implementing process mining techniques to cope with problems that are not recognisable by common data mining techniques or by tools used for online monitoring of the systems.
The starting point for process mining is an event log. The fundamental idea is to extract knowledge from event logs recorded by an information system. Process mining aims at discovering, controlling and improving processes by generating models explaining the behaviour of systems and analyzing them [28], [29]. There are three types of process mining projects [11]:
Data-driven (also referred to as curiosity-driven), i.e. powered by the availability of event data. There is no concrete question or goal, but in contrast with data mining, data-driven process mining projects are more process-centric than data-centric. Process mining looks at the data from the process aspect, so that each process consists of a set of events and each event represents one executed activity. Process mining usually focuses on those parts or processes which are of concern.
Question-driven, aiming to answer specific questions, e.g., "Why do cases handled by team X take longer than cases handled by team Y?" or "Why are there more deviations in weekends?".
Goal-driven, i.e. aspiring to improve a process with respect to particular Key Performance Indicators (KPIs) such as cost or response times.
Process mining techniques can be categorized as follows:
First, the process (control-flow) perspective is responsible for activity flow control, i.e., the order of activities. In this perspective the main goal is to discover and characterize all possible paths and exhibit them in process models such as Petri nets, EPCs or BPMN. Examples include, but are not limited to, exhibiting feasible paths on Petri nets or event-driven process chains (EPCs).
Second, the organizational perspective presents information about the resources concealed in the log, i.e., performers or actors such as people, systems, roles and departments, and their relations (e.g. a social network).
Third, the case perspective evaluates specific cases based on their properties. Each case can be characterized according to its path in the process model or based on the actors performing on the case. For instance, if a client shows high interest in a product, it is interesting to recognise the supplier and the number of products in each order.
Fourth, the time perspective takes into consideration the execution time and frequency of events. In this perspective it is necessary for events to contain timestamps. Bottleneck discovery, operation of resources and prediction of the remaining time of processes are the main concerns of this perspective.
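The control-flow perspective typically starts from the "directly follows" relation: activity b directly follows activity a if some case executes b immediately after a. Discovery algorithms such as the alpha algorithm and the Heuristic Miner build their models from counts of this relation; the traces below are invented for illustration.

```python
from collections import Counter

def directly_follows(log):
    """Count how often each activity pair occurs consecutively in a trace."""
    df = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df

log = [["a", "b", "d"], ["a", "c", "d"], ["a", "b", "d"]]
df = directly_follows(log)
```

From such counts the Heuristic Miner, for example, derives dependency measures that filter out infrequent (noisy) relations before constructing the model.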
Based on the availability of a prior model, process mining techniques are divided into
three different classes (Figure 5):
to see which information is typically available the moment the choice is made. Then classical data mining techniques are used to see which data elements influence the choice. As a result, a decision tree is generated for each choice in the process.
3. Process mining for extension, i.e. repairing the model for the purpose of better reflecting reality, or adding new properties to the model by evaluating the correspondence between model and logs; e.g., bottlenecks are shown by colouring parts of the process model.
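The decision-tree generation described above can be sketched with a one-level decision stump standing in for the decision tree: for a choice point in the process, find the case attribute whose value best predicts the branch taken. The attribute and branch names are invented for this sketch.

```python
def majority(items):
    """Most frequent element of a list."""
    return max(set(items), key=items.count)

def best_split(cases):
    """Attribute whose values best separate the chosen branches."""
    attrs = cases[0][0].keys()
    def hits(attr):
        groups = {}
        for feats, branch in cases:
            groups.setdefault(feats[attr], []).append(branch)
        return sum(g.count(majority(g)) for g in groups.values())
    return max(attrs, key=hits)

# Hypothetical cases: attributes of the case, and the branch it took.
cases = [
    ({"amount": "high", "customer": "new"}, "manual_check"),
    ({"amount": "high", "customer": "old"}, "manual_check"),
    ({"amount": "low", "customer": "new"}, "auto_approve"),
    ({"amount": "low", "customer": "old"}, "auto_approve"),
]
```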
Possible steps in a process mining implementation (Figure 6) include [53]:
model tightly linked with the event log must be interpreted before any redesign, intervention or adjustment (see Figure 6).
Stage 3: Perspectives are added to the model to make it useful for many different goals. An approach integrating the model with the time, case and organizational perspectives is presented by Van der Aalst [11].
Stage 4: Operational support (including prediction, detection and recommendation) based on current data. Interpretation of results is no longer needed at this stage: for instance, automatic emails concerning abnormal behaviours can be sent to the responsible personnel. Preconditions for this stage include a structured (Lasagna-like) input process and high-quality event logs [53], [29].
Many different technologies support business intelligence. Business Activity Monitoring (BAM) is a technology that supports real-time monitoring of business processes. Complex Event Processing (CEP) is the technology of processing large amounts of events in order to monitor, direct and optimize the business in real time. Corporate Performance Management (CPM) analyzes the performance of the processes. Managerial approaches such as Continuous Process Improvement (CPI), Business Process Improvement (BPI), Total Quality Management (TQM) and Six Sigma analyze the processes deeply for the purpose of discovering room for improvement. Almost all of the mentioned business intelligence tools, management techniques and technologies can be supported by the capabilities of process mining. Great interest in process mining is observed from the industrial side [49].
Some analysts limit the capabilities of process mining to only some specific data mining techniques.
Figure 7 shows the life cycle of Business Process Management (BPM). A model is designed, or a predefined model is redesigned. In the configuration/implementation phase the model is converted into processes executing within the systems. After that, the processes are executed and monitored, and based on the knowledge taken from monitoring, the process is adjusted. At the diagnosis stage, the process is evaluated and, based on the demands and environmental effects, some changes might be made to the model or a new model might be designed.
Design and configuration are linked to the models, while monitoring and diagnostics are connected to the data. Advances in process mining have made it possible to cover the entire BPM life-cycle. In addition to diagnostics, process mining now supports some operational aspects of the execution side, and the preparation of recommendations / predictions based on models extracted from historical information.
2.4.3.
methods of non-statistical quantitative feature extraction; nowadays Support Vector Machines (SVMs) have shown high capabilities and are widely applied instead of NNs.
NNs were developed along a heuristic path with extensive experimentation, whereas SVMs were developed from theory first, followed by implementation and experimentation. One of the most significant disadvantages of neural networks (failing to find the global optimum) appears when there are numerous local optima. SVMs always have a global and unique optimal solution. Another advantage of SVMs as opposed to NNs is sparseness, i.e. SVM classifiers are produced only from the support vectors and do not deal with the whole data set. Last but not least, SVMs can deal with data sets including a large number of features. Numerous features cause data sets to have high dimension. Since SVMs can apply the kernel trick, they are a significant and smart technique for high-dimensional data sets. ANNs tend to overfit when applied for regression or prediction purposes [102].
2.5.1.
Vapnik (1995) introduced Support Vector Machines (SVM) as a data classifier and
a nonlinear function estimation tool.
SVMs are inherently two-class (binary) classifiers [67]. Multi-class classification is
accomplished by combining binary classifications with a decision-making procedure.
To classify a data set with multiple labels (multi-class classification), the methods most
used in practice are one-versus-all and one-versus-one. One-versus-all trains an SVM
for each class, distinguishing that class from all remaining classes, and applies a
winner-takes-all strategy to decide the fitting class. In the one-versus-one approach,
classes are evaluated in pairs, and the decision procedure is based on a max-wins voting
strategy: a discriminant function computes the SVM output for each pair of classes
(c1, c2); a positive value gives a vote to class c1 and a negative value gives a vote to
class c2. Finally the class with the highest number of votes is assigned to the test
pattern.
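The max-wins voting procedure described above can be sketched as follows. This is a minimal illustration: the pairwise decision values are hypothetical stand-ins for the outputs of trained binary SVMs.

```python
from itertools import combinations

def max_wins_vote(classes, pairwise_decision):
    """One-versus-one multi-class decision via max-wins voting.

    pairwise_decision(c1, c2) is the signed output of the binary SVM
    trained on classes c1 vs c2: positive votes for c1, negative for c2.
    """
    votes = {c: 0 for c in classes}
    for c1, c2 in combinations(classes, 2):
        if pairwise_decision(c1, c2) > 0:
            votes[c1] += 1
        else:
            votes[c2] += 1
    # the class with the highest number of votes is assigned to the pattern
    return max(votes, key=votes.get)

# Hypothetical pairwise SVM outputs for a 3-class problem:
decisions = {("A", "B"): +0.7, ("A", "C"): -0.2, ("B", "C"): -0.9}
winner = max_wins_vote(["A", "B", "C"], lambda c1, c2: decisions[(c1, c2)])
# "C" wins: it beats both "A" and "B" in their pairwise comparisons
```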
In binary classification, an SVM classifies a data set into two classes, e.g. {+1, -1}.
The purpose of the SVM is to provide a hyperplane as a boundary between the two
classes of data. One of the features of SVMs is their capability to classify nonlinearly
separable data: using kernel functions, the SVM maps the input data to a
higher-dimensional feature space in which the training set is linearly separable. Figure 9
illustrates how projecting data into a higher-dimensional space makes them linearly
separable [67].
2.5.2.
One of the most interesting properties of SVM is sparseness, i.e. a large number of
elements in the convex quadratic programming (QP) problem are zero. However, for
large data sets, SVM optimization has proven time and memory consuming. This
problem is addressed by LS-SVM [68], which solves a set of linear equations instead of
a QP problem. Although LS-SVM results in easier-to-solve equations, it suffers from a
lack of sparseness. This problem is in turn overcome in [69], where a simple pruning
method is defined in the context of LS-SVM. Overall, LS-SVM is preferable in
large-scale problems: with pruning applied, sparseness is restored and the performance
is similar to SVM.
The major difference between SVM and LS-SVM is that SVM solves a burdensome
quadratic programming problem for training, while LS-SVM avoids this by solving a
set of linear equations [69].
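The linear-system formulation can be sketched in a few lines of numpy. This is a minimal illustration of the standard LS-SVM dual system, not the exact implementation used in this thesis; the kernel width and regularization values are illustrative.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    """RBF kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / sigma^2)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / sigma**2)

def lssvm_train(X, y, gamma=10.0):
    """Train an LS-SVM classifier by solving one linear (KKT) system
    instead of a quadratic program."""
    n = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf(X, X)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y                    # constraint row: sum_i alpha_i y_i = 0
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, bias

def lssvm_predict(X, y, alpha, b, Xnew):
    return np.sign(rbf(Xnew, X) @ (alpha * y) + b)

# Toy separable data: two clusters labelled -1 and +1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
alpha, b = lssvm_train(X, y)
pred = lssvm_predict(X, y, alpha, b, X)   # recovers the training labels
```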
2.6.1.1
The alpha algorithm is a simple algorithm whose principles have been embedded in
other, more powerful algorithms. Figure 8 illustrates the basic steps required by the
alpha algorithm to produce a Petri net model from an event log.
2.6.1.2
The α-algorithm has shown weaknesses with respect to the four validation criteria
identified for process discovery (fitness, precision, generalization and simplicity, see
Section 2.4.2). It is a rather primitive discovery method and has shown problems in
practice, e.g. with noise (less frequent behaviours), incompleteness (i.e. failing to
include all relevant traces in the log) and complex routing constructs.
The output of the heuristic mining approach is a model similar to a C-net. The
algorithm is robust because of the representational capabilities of causal nets.
Specifically, C-nets take into account the number of times each activity occurs in a log
(its frequency) and associate it with log-based ordering relations.
For instance, |a > b| is the number of times activity a is directly followed by activity b
in the log (Figure 8). The notation a ⇒ b refers to the dependency relation between a
and b:

a ⇒ b = ( |a > b| − |b > a| ) / ( |a > b| + |b > a| + 1 )    (1)

In case a ≠ b, the dependency relation yields a number between 1 (i.e. a is often
followed by b) and −1 (i.e. b is often followed by a). A dependency value close to zero
indicates that a and b follow each other about equally often.
In case of loops, i.e. when an activity directly follows itself (a = b), the formula for
measuring dependency is modified to the second condition of (2):
a ⇒ a = |a > a| / (|a > a| + 1). After computing the dependency between all activities of
an event log, a matrix of all dependency values is produced. With the dependency and
frequency metrics, a dependency graph can be generated. Activities appear in the graph
according to the thresholds defined for dependency and frequency. For instance, a
frequency threshold of 2 causes activities that occurred fewer than two times not to
appear in the graph, and likewise for the dependency threshold.
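The succession counts and dependency values of equation (1) can be computed directly from a list of traces. This is a small sketch of that computation, not the Heuristic Miner implementation itself:

```python
from collections import Counter

def dependency_matrix(traces):
    """Direct-succession counts |a > b| and heuristic dependency values
    a => b = (|a > b| - |b > a|) / (|a > b| + |b > a| + 1), with the loop
    case a => a = |a > a| / (|a > a| + 1)."""
    follows = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] += 1
    activities = {a for t in traces for a in t}
    dep = {}
    for a in activities:
        for b in activities:
            if a == b:
                n = follows[(a, a)]
                dep[(a, b)] = n / (n + 1)
            else:
                ab, ba = follows[(a, b)], follows[(b, a)]
                dep[(a, b)] = (ab - ba) / (ab + ba + 1)
    return follows, dep

# Five identical traces a -> b -> c:
follows, dep = dependency_matrix([["a", "b", "c"]] * 5)
# follows[("a", "b")] == 5 and dep[("a", "b")] == 5/6, so the arc
# a -> b survives a frequency threshold of 2
```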
As mentioned before, the output of the heuristic approach is a C-net model,
C = (A, a_i, a_o, D, I, O). The nodes of the dependency graph stand for the set of
activities A, and its arcs stand for the dependency relation D of the C-net model. The
C-net considers a unique start (a_i) and end (a_o) activity for all traces, so traces
without these unique start and end events are not considered. The dependency graph
explained above is therefore the core of the C-net; defining the functions I and O
completes it. For instance, if ao = {b, c, d, e}, meaning that the outputs of activity a are
the activities b, c, d, e, then |O(a)| = 2^4 − 1 = 15, i.e. there are 15 possible output
bindings for activity a. If oa = {f, g}, meaning that f and g are inputs of activity a, then
|I(a)| = 2^2 − 1 = 3, i.e. 3 possible input bindings exist for a. If an activity contains only
one potential binding element, the C-net considers that activity in the model. Since not
all potential bindings occur in the event log, the occurrences of input and output
bindings are recognised by replaying the event log on the dependency graph.
Thresholds can then be defined to remove the less frequent bindings. Finally, the
functions I and O are obtained and the C-net is modelled from the dependency graph.
The strong point of the Heuristic Miner is its robustness to noise. The possibility to
define thresholds for dependency and frequency makes it possible to extract only the
useful core of the data instead of modelling every detail.
2.6.2.
product or services after delivery). As performance analysis is firmly connected with
KPIs, different performance indicators are defined for each dimension.
The time dimension, for instance, may be assigned one or all of the following
indicators:

Lead time: the overall time for a case to be completely performed. In this
dimension, service level is defined as the percentage of cases whose overall
performance time is lower than a predefined threshold. This KPI measures the
average time throughout the case, accounting for variance.

Service time: the total time taken by one case or by one activity of a case;
service time is in fact a portion of lead time.

Waiting time: the time a case or activity waits for a service. Here, service level
is defined as the percentage of activities waiting for a service within a time
defined by the averages of the waiting intervals.

Synchronization time: the time an activity waits for another transition to be
triggered.
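Given timestamped lifecycle events, these time KPIs reduce to simple timestamp arithmetic. A sketch with hypothetical events for a single case (the activity name and timestamps are invented for illustration):

```python
from datetime import datetime

# Hypothetical lifecycle events for one case: (activity, transition, timestamp)
events = [
    ("draw_frame", "assign",   "2012-08-17 12:11:22"),
    ("draw_frame", "start",    "2012-08-17 12:11:25"),
    ("draw_frame", "complete", "2012-08-17 12:12:09"),
]
ts = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for _, _, t in events]

lead_time = (ts[-1] - ts[0]).total_seconds()     # whole case: 47 s
waiting_time = (ts[1] - ts[0]).total_seconds()   # assign -> start: 3 s
service_time = (ts[2] - ts[1]).total_seconds()   # start -> complete: 44 s
```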
Process mining and performance analysis
Some of the common questions raised at stage 1 of process mining are: how does
the real process work? Where are the bottlenecks? How, and by whom, are the tasks
performed? How do components communicate? Looking at the event logs, is the system
working as the organization expects?
Process mining generates a broad overview of many different aspects of the same
process: model representations of throughput times, the control flow of the
process [57], [58], [59], process conformance [60], and an overview of the
communications (social networks) [61], [62].
Most process mining work has been done by researchers with sufficient prior
knowledge about the system. Complex information systems, however, make it difficult
for organizations with less prior knowledge to interpret the results. On the factory floor,
in addition to an overview of the whole performance of the process, process mining is
usually carried out to generate diagnostics about malfunctioning parts, parts that are not
recognisable by common data mining techniques. Process mining produces numerous
insights relying solely on event logs.
According to [63], the performance of a system is defined by its response time to
requests; failure or success of a process is evaluated by the number of requests the
system can process within a defined period of time. Given the bottleneck definition
discussed before, as the crucial part of a system from a performance-enhancement point
of view, bottlenecks are the main obstacles to performance. Discovering the location of
a bottleneck while testing hardware under load is not hard, because hardware is usually
easy to monitor while operating. In contrast, discovering bottlenecks on the software
side, in logical resources such as task instances or buffers, is not easy, because these
logical parts are not usually monitored, and tracking them requires coping with a
substantial stream of data. In such complex cases a process model can ease the problem.
With a model, evaluating both the hardware and the logical parts of the process is less
complicated; using a model that shows the performance of the system, there is no need
to deal with a large number of parameters to discover bottlenecks, and analysts can
focus only on the tests related to the target bottlenecks. Although performance models
have many advantages, generating a model that reflects the system's behaviour with
acceptable accuracy and conformance requires considerable skill.
2.6.3.
2.6.4.
Available toolkits
Table 2
Process mining tools, short overview

- QPR ProcessAnalyzer [86]
- Disco [87]
- Discovery Analyst (StereoLOGIC) [91]
- Flow (Fourspark) [92]
- Reflect|one (Pallas Athena)
- Reflect (Futura Process Intelligence)
- Interstage BPME
2.6.4.1
Currently ProM version 6 includes about 280 plug-ins targeting various application
domains. Tables 3 and 4 list a selection of ProM plug-ins relevant for this thesis,
applicable for discovery and/or conformance checking.
Table 3
ProM plugins applicable to process discovery

Plugins: Alpha algorithm; Alpha++ algorithm; Parikh Language-based Region miner;
Tsinghua-alpha algorithm plugin; Petrify miner; Conversion plug-in; Genetic algorithm
plugin; Duplicate Tasks GA plugin; Heuristic miner; Fuzzy miner; Frequency
Abstraction Miner; FSM miner; Multi-phase Macro plugin; DWS mining plugin;
WorkFlow Pattern miner; Dotted chart analysis; Performance analysis with Petri net;
Log summary tool; Basic performance analysis.

Output process models: Petri net; EPC; Fuzzy model; C-net; Finite state machine /
Transition system; Event-driven Process Chain; other.
Table 4
ProM plugins applicable to check conformance

Conformance Checker: checks conformance by replaying the log on the reference or
discovered model, to evaluate how closely the real process complies with the plans.
Provides model and log views.

LTL Checker: also applied for conformance checking. The difference is that the LTL
Checker reports discrepancies based on user-defined rules rather than on a reference
model.
There are other tools that cannot cover the whole area of process mining, but in
conjunction with ProM they may be considered fully capable products in the context of
process mining. For instance, Genet, Petrify, Rbminer, and Dbminer [97] support only
process discovery and rely on ProM for conformance checking techniques.
3.
TESTBED
The testbed used for this research work (henceforth denoted as FASTory) was
previously used in a real factory for the assembly of mobile phone components.
Figure 12 illustrates the layout of the line.
Figure 15. Conveyor system analyzed. Main conveyor hosting 2 pallets. Bypass hosting
1 pallet.
The initial version of the FASTory line consisted of five robotic modular
workstations previously used in a factory for assembling mobile phone components.
The line was capable of producing one mobile phone type at a time. All sensors and
actuators were connected via DeviceNet nodes to an OMRON PLC, which controlled
the workstation processes in a centralized mode. An Ethernet network over coaxial
cable, following the OMRON FINS protocol [78], provided the communication
between workstation controllers. Each pallet has an RFID tag storing information about
the operations completed on the product. On top of each stopper an RFID reader is
installed; the readers read the information stored in the RFID tags and send it to the
controller. For instance, using the information provided by the RFID readers, the
controllers identify the position of each pallet in the line. The RFID readers are
connected over a DeviceNet interface, which provides a network between the PLC
controllers and the I/O and RFID nodes. According to Figure 16, the protocols applied
in the older version of the FASTory line are DeviceNet (getting and setting the status of
sensors and actuators), RS232 serial communication (direct communication between
the PLC controller and the robot controller) and Ethernet (providing a network between
the workstation controllers and the manufacturing execution systems located at the
higher level).
At the higher layer, the communication between controllers and Manufacturing
Execution Systems (MES) was based on Ethernet, another type of network protocol.
The notification messages are the following:

EquipmentChangeState: cell ID, recipe number, device type, pallet ID, the current
state, the previous robot state, time stamp.

QualityInspection: quality information including pallet ID, the quality of frame, screen
and keyboard, the quality of the inspection result and a time stamp.

EnergyMeter: robot/conveyor/controller energy consumption per working cell,
published at a time interval of five seconds.
Every time a pallet arrives at the entry point of a workstation, the following events
happen sequentially. The NFC tag of the pallet is read, and the conveyor invokes a
routing decision in the DSS. The DSS checks the request and decides whether the pallet
will be processed or bypassed, depending on the pallet, its status, and the workstation
capabilities. The DSS then responds with the routing action; in case the pallet is to be
processed in the workstation, it also sends the operation parameters (drawing
component and colour). After this, the conveyor is responsible for routing the pallet to
the next workstation.
The devices also communicate peer to peer. Once the routing decision has been
issued, the conveyor has control over the pallet. If the pallet is to be processed in the
workstation, there is peer-to-peer communication between the conveyor and the robot:
the conveyor requests the Drawing service from the robot, and when the robot finishes
its task it informs the conveyor, after which the pallet continues its flow.
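The DSS routing check can be sketched as a simple capability lookup. This is a hypothetical simplification: the function name, the status dictionary and the decision rule are illustrative, and the real DSS also considers pallet status details and cell occupancy.

```python
def routing_decision(pallet_status, station_capabilities):
    """Hypothetical DSS routing check: process the pallet here if one of
    its pending operations can be served by this workstation, otherwise
    bypass it."""
    pending = [op for op, done in pallet_status.items() if not done]
    for op in pending:
        if op in station_capabilities:
            return ("process", op)   # would be sent with operation parameters
    return ("bypass", None)

# Pallet still needs its screen drawn; this station can draw screens.
action = routing_decision({"frame": True, "screen": False},
                          {"screen", "keyboard"})
# action == ("process", "screen")
```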
The product outputs of the testbed are drawn mobile phone components. Each
component (frame, keyboard and screen) has three different types, and each type can be
drawn in three different colours; consequently there are 9*9*9 = 729 possible products.
Given the multiple product types, different control scenarios are considered for the line.
Table 6 explains the scenarios:
Table 6
Possible control scenarios considered for FASTory line

1. If the pallet bears raw paper and the cell is empty, the pallet is transferred into the
cell and the robot performs all the operations.
2. The cells are divided so that some of them draw only keyboards, some only frames
and some only screens. Each cell is able to draw with every colour.
3. Similar to the previous scenario, but each cell is configured to draw only one colour.
4. Each cell can draw only one shape (e.g. keyboard or screen), but including all 9
possible items for that shape (colours and different forms).
All FASTory cells are equipped with energy meters integrated into S1000 processing
units (smart Remote Terminal Units). Each energy meter is an E10 Energy Analyzer
expansion module providing 3-phase electrical power consumption monitoring
(Figure 18). Phase A is assigned to the robot, phase B to the cabinet, I/Os and the
controller, and phase C to the conveyor system, including the main and bypass
conveyors. Power measurement is achieved by sampling current and voltage: as
Figure 18 depicts, the current is sampled by current transformers (CT) connected to the
+Ia-, +Ib- and +Ic- terminals, and the voltage is measured by direct connection of the 3
phases and neutral to the Vn, Va, Vb and Vc terminals of the E10 expansion module.
Equipment workload refers to the number of pallets occupying the conveyor at one
time. To monitor this, inductive sensors are mounted at the entrance and exit points of
each cell; when a pallet arrives or leaves, CAMX TransferIn/TransferOut notification
messages are sent by the controllers and counted by a counter on the server side.
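The server-side counter amounts to incrementing and decrementing on the two message types. A minimal sketch (the class and method names are illustrative, not the actual server code):

```python
class WorkloadCounter:
    """Server-side counter tracking conveyor workload (number of pallets)
    from TransferIn/TransferOut notification messages."""

    def __init__(self):
        self.pallets = 0

    def on_message(self, message_type):
        if message_type == "TransferIn":
            self.pallets += 1
        elif message_type == "TransferOut":
            self.pallets = max(0, self.pallets - 1)  # guard against lost messages
        return self.pallets

counter = WorkloadCounter()
for msg in ["TransferIn", "TransferIn", "TransferOut"]:
    workload = counter.on_message(msg)
# workload == 1: two pallets entered, one left
```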
4.
DATA AND PROCESS MINING IN PRACTICE:
RESULTS
misalignment. This observed difference between the normal condition and the tightened
state of the conveyor belt forms the core of the approach presented in this work.
It is worth mentioning that normal data sampled by the E10 Energy Analyzer
expansion module embedded in the S1000 controller do not show such a significant
(stepwise) difference between the power consumption values associated with different
numbers of pallets occupying the conveyor. To make the shifts in the power data more
visible, the signal was amplified by winding the current wire three more times around
the CT.
Figure 19. Cell 5 bypass conveyor engine power consumption. Pallet traffic of 0 to 5. In
red: data obtained with the belt tightened
4.1.2.
behaviour in the monitored piece of equipment. The observed energy values no longer
correlate as expected with the statically defined semantics.
4.1.3.
A classifier was built to generate a model estimating the class of new energy values
based on previous observations. The power consumption (watts) of the conveyor
system was measured at a sampling rate of 1 second. Figure 21 shows the data
monitored (2500 power consumption samples) during the operation of the conveyor
system (including the main and bypass conveyors). The most significant power
consumption change is observed when the number of pallets on the conveyor system
changes from one to two. Consequently, the first class corresponds to power
consumption values estimated to be representative of zero or one pallet, and the second
class to values representative of two or more pallets.
Figure 21. Cell 5 conveyor system engine (main and bypass conveyor) power
consumption
Figure 22 illustrates the two classes identified by the rule-based engine on the
sampled data of Figure 21. The data categories are separated based on the observed
power consumption values and the number of pallets associated with each sampled
power value.
Figure 22. Classes generated by rule based engine and correlated to each sampled data
A binary classifier (LS-SVM) with the radial basis kernel function (2) was applied to
the data shown in Figure 22:

K(x, y) = exp( −‖x − y‖² / σ² )    (2)
1800 data samples were used for training and 700 for validation. The accuracy of
the classifier is evaluated by computing an error term defined as the fraction of
cross-validation examples that were classified incorrectly, i.e. the proportion of
estimated classes that disagree with the classes given by the rule-based engine. In our
experiment the computed error is 5.56%. Figure 23 shows the cross-validation data
classified into two classes by the LS-SVM.
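The error term described above is simply a mismatch fraction; a minimal sketch with hypothetical class labels:

```python
def classification_error(predicted, reference):
    """Error term: fraction of cross-validation examples whose predicted
    class differs from the class given by the rule-based engine."""
    mismatches = sum(p != r for p, r in zip(predicted, reference))
    return mismatches / len(reference)

# One mismatch out of four hypothetical validation samples:
error = classification_error([1, 1, 2, 2], [1, 2, 2, 2])
# error == 0.25
```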
The method applied for calculating the classifier error on the cross-validation data is
applied for fault detection as well. In the presented scenario, such deterioration would
translate to a misalignment of conveyor segments.
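The fault-detection idea of flagging consecutive mismatches between the rule-based engine and the classifier can be sketched as follows; the threshold value is illustrative, not a parameter taken from the thesis:

```python
def detect_incipient_fault(rule_classes, svm_classes, threshold=5):
    """Flag possible conveyor deterioration when the rule-based engine
    and the classifier disagree for `threshold` consecutive samples
    (the threshold value here is illustrative)."""
    run = 0
    for r, s in zip(rule_classes, svm_classes):
        run = run + 1 if r != s else 0
        if run >= threshold:
            return True
    return False

# Seven consecutive mismatches at the end of the stream trigger the alarm:
alarm = detect_incipient_fault([1] * 10, [1] * 3 + [2] * 7)
# alarm is True
```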
An important issue with this approach is to avoid creating an illusion of causality
with an improperly tuned classifier. An inappropriately large classifier error would
immediately produce mismatches between the output of the rule-based engine and the
output of the LS-SVM, mismatches caused not by conveyor deterioration but by the
classifier's inability to associate the correct categories with the data it is presented with.
Contrary to expectations, conveyor power consumption does not change instantly
when the conveyor workload is modified, but after a short delay. This delay is
responsible for the few outliers visible in Figure 22. For instance, when the number of
pallets is 2 and one pallet leaves the cell, the counter shows 1, but it takes some time for
the conveyor engine's power consumption to match the new number of pallets. Such
delays may influence the value of the calculated error. One way to reduce the number of
outliers is to increase the sampling time: this allows for the delay and increases the
probability of sampling the power consumption after it has adjusted to the workload
alteration. However, increasing the sampling time means losing more data.
Future research will focus on bringing more parameters into the analysis, in addition
to power consumption, to increase the dimensionality of the available data sets.
Vibration and temperature sensors are available in the testbed and can be used where
applicable (e.g. for the robots). As SVMs successfully support regression and
classification over high-dimensional input spaces, they were used here to provide an
implementation backbone for future work; moreover, the classifier performance is
expected to be more accurate with higher-dimensional data sets.
golden runs: discovering the exceptional conditions under which excellent results
were achieved, in order to reproduce the same conditions. Performance analysis is
one of the typical capabilities of process mining.

Production data collection: the most important activity at the manufacturing and
operation level. The production data collection system is responsible for collecting,
storing and reporting information about the execution of processes on the factory
floor. In our case, data collection is performed in conjunction with Complex Event
Processing in order to store the data in the desired format, including the required
attributes. Historical data are stored as raw events in a remote MySQL database.
4.2.1.
Figure 25. Applied method for storing the data including necessary parameters
The Engine is a Java class defining the Event Processing Language (EPL) rules
responsible for receiving the real-time data of interest from FASTory.
Table 7
Defined rules of event processing

Rule b: robot notification that the IPC2541 state of Cell 1 has been found to be
'READY-PROCESSING-EXECUTING'. This is the first notification message coming
from Cell 1.

Rule a: a = EquipmentChangeState(palletId = b.palletId) — robot notification message
containing the same palletID as rule b.

Rule d: d = ConveyorNotification(palletId = b.palletId) — conveyor notification
message containing the same palletID as rule b.

Rule c: c = EquipmentChangeState(currentState = 'READY-PROCESSING-EXECUTING',
cellId = b.cellId, palletId = b.palletId) — notification that the robot IPC2541 state has
been found, if the cellID and palletID numbers are the same as in the Cell 1 notification
message that satisfied rule b.
Table 8
Raw data associated with producing one product, stored in the MySQL database

orderID | cellID | PalletState | timeStamp           | recipe | palletID | device
1       | 2      | assign      | 17/08/2012 12:11:22 | 0      | 1        | conveyor
1       | 2      | start       | 17/08/2012 12:11:25 | 2      | 1        | robot
1       | 2      | complete    | 17/08/2012 12:12:09 | 2      | 1        | robot
1       | 3      | autoskip    | 17/08/2012 12:12:16 | 0      | 1        | conveyor
1       | 4      | autoskip    | 17/08/2012 12:12:26 | 0      | 1        | conveyor
1       | 5      | assign      | 17/08/2012 12:12:42 | 0      | 1        | conveyor
1       | 5      | start       | 17/08/2012 12:12:48 | 4      | 1        | robot
1       | 5      | complete    | 17/08/2012 12:13:32 | 4      | 1        | robot
1       | 6      | autoskip    | 17/08/2012 12:13:38 | 0      | 1        | conveyor
1       | 8      | assign      | 17/08/2012 12:13:54 | 0      | 1        | conveyor
1       | 8      | start       | 17/08/2012 12:13:57 | 7      | 1        | robot
1       | 8      | complete    | 17/08/2012 12:14:18 | 7      | 1        | robot
1       | 9      | autoskip    | 17/08/2012 12:14:27 | 0      | 1        | conveyor
1       | 10     | autoskip    | 17/08/2012 12:14:37 | 0      | 1        | conveyor
1       | 11     | autoskip    | 17/08/2012 12:14:47 | 0      | 1        | conveyor
1       | 12     | autoskip    | 17/08/2012 12:14:58 | 0      | 1        | conveyor
The XESame tool [100], [18] was used to generate XES-format event logs from such
raw data. Figure 26 partially illustrates the resulting event log. A number of extensions
(Table 9) were applied in the process.
Table 9

Type of extension | Key        | Value    | Source
Lifecycle         | transition | assign   | conveyor message
Lifecycle         | transition | autoskip | conveyor message
Lifecycle         | transition | start    | robot message
Lifecycle         | transition | complete | robot message
Organizational    | group      | recipe   | robot & conveyor
Organizational    | resource   | palletID | robot & conveyor
Concept           | name       | cellID   | robot & conveyor
Concept           | instance   | device   | robot & conveyor
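The mapping from raw database rows to XES attributes can be sketched with the standard library XML tools. This is a hypothetical simplification of what XESame does in practice: the row contents and the case name "case-1" are invented, while the attribute keys follow the XES extensions listed in Table 9.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw row; attribute keys follow the XES extensions of Table 9.
rows = [
    {"cellID": "2", "PalletState": "start",
     "timeStamp": "2012-08-17T12:11:25", "palletID": "1"},
]

log = ET.Element("log")
trace = ET.SubElement(log, "trace")
ET.SubElement(trace, "string", key="concept:name", value="case-1")
for row in rows:
    event = ET.SubElement(trace, "event")
    ET.SubElement(event, "string", key="concept:name", value=row["cellID"])
    ET.SubElement(event, "string", key="lifecycle:transition",
                  value=row["PalletState"])
    ET.SubElement(event, "date", key="time:timestamp", value=row["timeStamp"])
    ET.SubElement(event, "string", key="org:resource", value=row["palletID"])
xml_text = ET.tostring(log, encoding="unicode")
```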
A number of ProM's discovery plug-ins (e.g. the alpha miner, the heuristic miner,
the genetic miner, the fuzzy miner, the transition system miner, etc.) are applicable to
the input event logs from FASTory.
Examples of questions that may be answered based on the input event log include:
How are the cases actually executed?
How many components exist in each case?
What kinds of patterns are in the log?
What are the dependencies between different activities?
How do different sections of the process communicate?
How many executions happened between different activities?
What are the similarities between cases, i.e. which parts work similarly to each other?
If rules exist between events or activities, are they satisfied?
4.2.2.1
ProM's Alpha Miner plug-in produces a Petri net (PN) model from a given event
log. The control flow model obtained from the FASTory event log (Figure 27) shows
the event notifications of the cells as transitions; events belonging to the same cell are
grouped by blue rectangles.
Figure 27. Alpha algorithm applied to the FASTory event log: output PN model. Cell 3
(right side) exhibits 4 types of messages: autoskip, assign, start and complete.
The PN model generated by the alpha algorithm exhibits one unexpected transition
(highlighted in red in the figure). This transition suggests that a pallet could move back
to Cell 2 after bypassing Cell 4, or go directly to the complete state of Cell 5. This
disagrees with the real-life situation (the actual control embedded in the line), depicted
in Figure 28: pallet flow in FASTory is unidirectional (Cell 1 → Cell 2 → Cell 3 →
Cell 4 → Cell 5 → Cell 1). Pallets at the entry point of a cell either go to the main
conveyor for processing by the robot (corresponding to the message sequence assign,
start, complete) or to the bypass conveyor of that cell (i.e. one autoskip message).
Figure 28. Feasible message sequence patterns between consecutive cells in Fastory
ProM's Conformance Checker plug-in is built to evaluate the fitness of the generated
model and its conformance with the event log. Three different types of analysis are
embedded:

Fitness is calculated by replaying the log on the Petri net form of the model. It
evaluates whether the observed process complies with the control flow specified by
the process model.

Precision/behavioural appropriateness evaluates how precisely the model describes
the observed process.

Structural fitness evaluates whether the model describes the process in a structurally
suitable way.
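The replay idea behind the fitness metric can be illustrated with a toy sketch. This is a deliberate simplification: real token replay on a Petri net also accounts for missing and remaining tokens, whereas this sketch only measures how many moves of a trace follow an arc of a sequential model.

```python
def trace_fitness(model_edges, trace):
    """Highly simplified replay-based fitness: the fraction of moves in a
    trace that follow an arc of a (sequential) model."""
    moves = list(zip(trace, trace[1:]))
    ok = sum(pair in model_edges for pair in moves)
    return ok / max(1, len(moves))

model = {("assign", "start"), ("start", "complete")}
fit = trace_fitness(model, ["assign", "start", "complete", "autoskip"])
# fit == 2/3: the final move complete -> autoskip is not in the model
```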
The fitness of the model shown in Figure 27 was evaluated at approximately 58%. This
result agrees with findings reporting the unreliability of the α-algorithm in real-life
projects [29], [53]. The Alpha Miner is not suitable for FASTory's event log.
4.2.2.2
The high fitness and precision of the models generated by the heuristic approach
have attracted great interest from analysts. The strongest point of the Heuristic Miner
(Section 2.6.1.2) is its robustness to noise. The Heuristic Miner operates based on
causal dependencies between events.
Figure 29 shows the control flow model produced by the Heuristic Miner for the
FASTory event log. All traces start and end with the START/END activities added
manually at the log cleaning stage.
Figure 29. ProM's Heuristic Miner applied to the event log of FASTory (partial result).
Zoom-in on the right side.
Tasks correspond to the boxes of the model, with connecting arcs showing the
dependencies between tasks. The number associated with each box indicates how many
times the associated task was performed (e.g. according to the model shown, the robot
of Cell 2 was assigned to operate on a pallet 70 times). The numbers associated with the
arcs indicate the arc frequency (the number of times the tasks connected by the arc were
performed sequentially; bottom) and the dependency (i.e. how certain we are that the
two activities depend on each other; top).
To evaluate model fitness and conformance via the Conformance Checker, the input
must first be converted to a PN (Figure 30), or to EPC / Fuzzy models.
Figure 30. The Heuristic model of Figure 24, converted to Petri Nets
The fitness value computed for the obtained heuristic net model is 98%, confirming
that the model accurately reflects the real testbed scenarios.
4.2.3.
4.2.3.1 Basic performance analysis
Several KPIs were designed and implemented on the FASTory line in order to
supervise the achievement of goals. These KPIs are measured frequently to provide
knowledge for fast decisions. To handle the high volume of real-time data coming from
the lower levels, proper and powerful information technology must be applied at the
higher levels. Service-Oriented Architecture (SOA) and Complex Event Processing
(CEP) are the two IT approaches applied here: SOA provides a fast and reliable way to
collect information on the status of manufacturing processes, and CEP is applied for
tracking and analyzing the resulting stream of information in order to store events in
the desired format in databases. The KPIs implemented for FASTory are classified into
efficiency, energy, quality and reliability, plus some other KPIs reflecting the overall
status of the entire production line.
Table 10
Comparison of online KPIs with KPIs applied offline for basic performance analysis

Robot | Robot working time (sec), online KPIs | Robot working time (sec), process mining basic performance analysis
2     | 3171 | 2708
3     | 1038 | 960
4     | 3345 | 3146
5     | 3105 | 2724
6     | 2537 | 2263
8     | 1711 | 1566
9     | 1213 | 1105
10    | 465  | 384
11    | 336  | 275
12    | 152  | 66
Table 10 shows the results of the online KPIs (pie charts) reporting the IPC-2541
status of the robots, alongside the KPIs applied offline for performance analysis of
FASTory. Comparing them demonstrates that both give similar results, with robots 2, 4,
5 and 6 outperforming the other robots. Some trivial dissimilarities in the robot working
times are visible between the two columns. These differences are due to the fact that the
rules applied in the CEP of the online and offline KPIs are slightly different. As
explained earlier, in order to provide event logs for process mining, events are stored by
the listener only when they satisfy rules c and d; otherwise, all messages related to the
corresponding productID are not recorded. In other words, the notification messages of
each productID are stored as a chain only if the pallet starts from cell number one and
ends at cell number one. In contrast, the data sets applied for the online KPIs include all
the data, because there is no need for the productID parameter (the online KPIs were
not developed from a process analysis point of view). Consequently, although the
online KPIs report slightly higher robot working times than the offline KPIs, both
demonstrate similar results regarding the performance and load balance of FASTory.
4.2.3.2 Pallet performance
After evaluating the performance of the robots and the balance of the line (4.2.3.1),
pallet performance is another aspect worth analysing. In the case of mass production, a
large number of pallets are used, and tracking everything online is both time consuming
and expensive. It is usually not possible to detect malfunctioning pallets (i.e. uneven
movement or an off-balance pallet structure) by monitoring, due to pallet traffic or
delay problems. Malfunctioning pallets lead to various problems, such as decreased line
efficiency, increased traffic and bottlenecks, damage to other pallets, and increased
maintenance costs.
An analysis example is illustrated in Table 11. The interval for a pallet moving
between two sequential FASTory cells is usually less than 20 seconds. The longest path
between two sequential cells (25 seconds) is between cells number 6 and 8, because cell
number 7 of the FASTory line is under development and bypassed by a conveyor.
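The thresholds above (20 seconds between sequential cells, 25 seconds across the Cell 6 to Cell 8 bypass) suggest a simple delay-flagging pass over the event log. A hypothetical sketch follows; the (pallet, cell, timestamp) log format is an assumption made for the example:

```python
def flag_slow_transits(log, threshold=20.0, bypass=(6, 8), bypass_threshold=25.0):
    """Flag pallet moves between consecutive cells whose transit time
    exceeds the expected interval. The log is assumed to be a list of
    (pallet_id, cell, timestamp_seconds) tuples sorted by timestamp."""
    last_seen = {}   # pallet -> (cell, timestamp of last sighting)
    alerts = []
    for pallet, cell, ts in log:
        if pallet in last_seen:
            prev_cell, prev_ts = last_seen[pallet]
            # the Cell 6 -> Cell 8 path tolerates a longer interval
            limit = bypass_threshold if (prev_cell, cell) == bypass else threshold
            if ts - prev_ts > limit:
                alerts.append((pallet, prev_cell, cell, ts - prev_ts))
        last_seen[pallet] = (cell, ts)
    return alerts

log = [(33, 5, 0.0), (33, 6, 18.0), (33, 8, 50.0), (14, 9, 5.0), (14, 10, 19.0)]
print(flag_slow_transits(log))  # → [(33, 6, 8, 32.0)]
```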
Table 11
Pallet performance (Basic Performance Analysis plugin, text view)
The oval shapes correspond to the moments when pallets experienced a significant
delay while moving between two cells. It is visible that the time interval for pallet 33
while moving between the cells (on the intermediate bypass conveyors) exceeds the
expected time in several instances. The improper operation of some pallets affects the
performance of other pallets. The red arrow (Figure 34) highlights the improper
operation of pallet 14 (a significant delay, of approximately 1 minute 30 seconds, until
the following message is received from pallet 14). Cell number 10 (red dot) contains
both pallets 33 and 14 at the same time on the bypass conveyor (autoskip). It can be
concluded that pallet 33 is influencing the performance of pallet 14.
4.2.4.
4.2.4.1 Conformance Checker
This section documents the results obtained when checking a reference process model
against the event log of the line.
Figure 36(a) illustrates the model of Figure 30 after the log entries are replayed on it
in ProM's Conformance Checker. The information obtained includes the number of
missing / remaining tokens, the tasks that could not be enabled in replay mode, the tasks
enabled but unfinished, and the path coverage (i.e. all tasks and arcs involved in
replaying the event log).
Some of the questions of interest here include:
Does this model properly represent the actual process?
How much of the observed behaviour matches the reference model?
If deviations exist between the observed model and the reference one, where are
they located in the model?
The orange highlighted boxes of Figure 36(a) illustrate those parts of the model that
do not comply with the log or the reference model. The red highlighted circles show the
remaining / missing tokens left when replaying the log.
Given Figure 36(b), the textbox associated with cellNumber=9 (autoskip),
containing the #tokens and #instances information, details the fact that for 3 out of 144
product instances (process instances / traces), after the log is replayed on the reference
model, an inconsistency is found between the updated model and the event log: the cell
number 9 (autoskip) event is triggered in the event log even though this is impossible in
the model, because the transition mirroring the firing of this event lacks input tokens.
The number of missing tokens associated with the deficient input place(s) is specified in
the #tokens column, meaning the transition was fired in replay mode despite the lack of
tokens enabling it.
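The replay mechanism described above can be sketched for a toy net: transitions fire even when their input tokens are absent, and the artificially added tokens are counted as missing. The net structure and event names below are illustrative, not the actual FASTory model:

```python
def replay_trace(transitions, trace):
    """Replay one trace on a Petri net given as
    {transition: (input_places, output_places)}. Transitions fire even
    when input tokens are missing; the artificially supplied tokens
    are counted as 'missing', and leftover tokens as 'remaining'."""
    marking = {"start": 1}
    missing = 0
    for event in trace:
        ins, outs = transitions[event]
        for p in ins:
            if marking.get(p, 0) > 0:
                marking[p] -= 1
            else:
                missing += 1   # token added artificially to enable firing
        for p in outs:
            marking[p] = marking.get(p, 0) + 1
    remaining = sum(n for p, n in marking.items() if p != "end")
    return missing, remaining

# Toy sequential net: start -> cell8 -> p1 -> cell9_autoskip -> end
net = {"cell8": (["start"], ["p1"]), "cell9_autoskip": (["p1"], ["end"])}
print(replay_trace(net, ["cell9_autoskip"]))  # → (1, 1): fired without its input token
print(replay_trace(net, ["cell8", "cell9_autoskip"]))  # → (0, 0): conforming trace
```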
The log perspective is more suitable for highlighting discrepancies. Figure 37 shows
an example, depicting the sequence of events in each trace (process instance) from
beginning to end. The orderIDs associated with the traces are shown on the left side.
Figure 36. (a) Model perspective after log replay; (b) detail showing the 3 missing
tokens.
replayed on the model. Inspection of the contents of orderID 141 led to the conclusion
that no message from cell number 5 was ever received for this trace / process instance.
The cause of the orange event for orderID 140 is not immediately obvious (the
sequence of messages complies with the rules designed for FASTory, therefore the
event log is most probably correct). To investigate this discrepancy between the model
and the real log, the transition from cellNumber3 (autoskip) to cellNumber4 (assign) is
inspected in the model perspective (Figure 38). The findings show that the model
includes only one transition from the autoskip event (bypass conveyor) of Cell 3, to the
autoskip event of Cell 4. Therefore the model is not complete enough to replay all the
events of the log.
Figure 38. Noncompliance between the model and the log (highlighted in orange)
Figure 39 shows that most non-compliance cases between the log and the model are
associated with Cell 4. According to stage 3 of the process mining structure (Figure 6),
by applying conformance checking or by comparing the predefined model with the
discovered model, the model is verified, or both models are combined to obtain a more
precise reflection of the process activities. In our case, by adding a transition from the
cellNumber3 (autoskip) event to the cellNumber4 (assign) event, it should be possible
to replay all problematic messages, resulting in a model that represents the behaviour of
the processes for producing 146 products, based on the event logs. This step is
considered model enhancement.
Other possible sources of non-compliance between the model and the log are lost
messages and/or communication problems leading to wrongly formatted notification
messages. Such causes do not arise frequently, and are considered a negligible source of
errors for the test case in focus.
Note: Model soundness [81], [30] must be ensured prior to replaying the log on the model and checking the conformance of the
model against the event log. Three criteria are defined to evaluate the soundness of a WF-net:
Option to complete: for each case, it is always possible to reach the state defined as END.
Proper completion: when the state is marked as END, no other place of the case is marked.
No dead transitions: there should not be any deadlock in the case; there should always be an auxiliary path in case a
transition is impossible to fire.
A WF-net is sound if and only if the short-circuited net (i.e. the END and START places connected to each other) is both live and
bounded [31]. Model soundness is ensured by artificially adding START/END events at the beginning/end of all traces via the
Conformance Checker.
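For small nets, the three criteria can be checked by brute-force exploration of the reachability graph. The sketch below assumes the net is bounded and small enough to enumerate exhaustively (real tools such as ProM use far more refined analyses), with the net encoded as a dictionary of transitions:

```python
from collections import deque

def fire(transitions, marking, t):
    """Fire transition t on a marking (frozenset of (place, count));
    return the new marking, or None if t is not enabled."""
    m = dict(marking)
    ins, outs = transitions[t]
    for p in ins:
        if m.get(p, 0) == 0:
            return None
        m[p] -= 1
    for p in outs:
        m[p] = m.get(p, 0) + 1
    return frozenset((p, n) for p, n in m.items() if n)

def reachable(transitions, marking):
    """All markings reachable from the given one (breadth-first search)."""
    seen, queue = {marking}, deque([marking])
    while queue:
        m = queue.popleft()
        for t in transitions:
            nxt = fire(transitions, m, t)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_sound(transitions, start="start", end="end"):
    """Check option to complete, proper completion and absence of dead
    transitions for a small, bounded WF-net given as
    {transition: (input_places, output_places)}."""
    init = frozenset({(start, 1)})
    final = frozenset({(end, 1)})
    states = reachable(transitions, init)
    option_to_complete = all(final in reachable(transitions, m) for m in states)
    proper_completion = all(m == final for m in states if dict(m).get(end))
    no_dead_transitions = all(
        any(fire(transitions, m, t) is not None for m in states)
        for t in transitions)
    return option_to_complete and proper_completion and no_dead_transitions

# A sequential two-task WF-net: start -> t1 -> p -> t2 -> end
net = {"t1": (["start"], ["p"]), "t2": (["p"], ["end"])}
print(is_sound(net))  # → True
```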
Figure 39. Log view, visualizing each process instance. Incompatible messages related
to cell number 4 are highlighted with red circles.
4.2.4.2 LTL Checker
ProM's LTL Checker plug-in verifies the conformity of the input event log against a
set of rules / requirements, defined at the design stage, that must be fulfilled. Figure 28
shows examples of informal descriptions of system correctness.
Figure 40 shows a part of the heuristic model obtained for FASTory. The red
highlighted part of the model indicates behaviour considered abnormal after comparison
against the rules defining the expected system behaviour. Specifically, Cell 2 is
bypassed (autoskip), and then cell number 3 is assigned to work on the product (assign).
In FASTory, it is expected that when the assign notification message for a cell is
received, the start and complete notification messages follow immediately.
Additionally, it is expected to observe only one output arc from the task cellNumber=3
(assign). However, this part of the model displays three output arcs from the task
cellNumber3 (assign), while just one of them (assign, start, complete) is acceptable.
Such a problem can be caused by:
1. A need to modify the model, as it does not project the real behaviour of the
processes. This is improbable, because the calculated fitness value of the model
is high.
2. A need to redefine the rules (defined by CEP) for collecting and aggregating the
information (the information system). This is improbable, because the problem
analysed occurs only in one part of the model.
3. A need to address communication problems or controller failure in FASTory.
This is the most likely situation. Improper communication between some parts
of the line and the server may be caused by e.g. high message traffic or the
breakdown of some components.
Figure 40. Unusual behaviour observed in the model generated from the log
ProM's LTL Checker is used to define the rules with which the FASTory line is
expected to comply from a behavioural viewpoint. An example of such a rule is
For_all_activities_always_event_E_implies_eventually_event_F. Event E is in this
case the assign event and event F is the start event.
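On a finite trace, this rule reduces to checking that every occurrence of the assign event is eventually followed by a start event. A minimal sketch over traces represented as lists of activity names (a simplification of the actual log entries):

```python
def always_implies_eventually(trace, e="assign", f="start"):
    """Check the LTL property []( E -> <> F ) on one finite trace:
    every occurrence of event E must be followed, somewhere in the
    remaining suffix of the trace, by an occurrence of event F."""
    for i, ev in enumerate(trace):
        if ev == e and f not in trace[i:]:
            return False
    return True

ok  = ["assign", "start", "complete"]
bad = ["assign", "autoskip"]          # assigned but never started
print(always_implies_eventually(ok), always_implies_eventually(bad))  # → True False
```

Running this check over all 146 traces and counting the failures would reproduce the kind of per-instance verdict the LTL Checker reports.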
The rule is not satisfied for 45 out of 146 process instances. Figure 41 shows a
counterexample for the rule: an incorrect transition between cellNumber3 and
cellNumber4 in the process instance of orderId=100. Two cases are feasible here:
1) Cell 3 performed no operation on the pallet, and the pallet was bypassed
even though the notification message assign had been received.
2) The robot of cellNumber 3 worked on the pallet, but the robot-related
messages (start, complete) were not submitted by the controller.
It was discovered that the same wrong transition exists in all the other 44 traces.
Figure 41. Incorrect transition between cell 3 and cell 4 of trace number 100
Messages may be lost because of malfunctioning controllers or heavy traffic (many
controllers sending messages at the same time to the server side). The ProM Dotted
Chart plug-in can be helpful to discover and troubleshoot such problems. The plug-in
operates only on event logs (not process models), allowing the monitoring of the
number of messages received from different controllers at the same time on the server
side / the load of messages at each time instant.
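The essence of that view, counting messages per controller per time interval, can be sketched as follows; the (timestamp, controller) pair format is an assumption made for the example:

```python
from collections import Counter

def message_load(events, bucket=60):
    """Count messages per (controller, time bucket) from
    (timestamp_seconds, controller) pairs -- a textual stand-in for
    the Dotted Chart view of server-side message load."""
    return Counter((ctrl, int(ts // bucket)) for ts, ctrl in events)

events = [(5, "cell3"), (30, "cell3"), (70, "cell3"), (40, "cell4")]
load = message_load(events)
print(load[("cell3", 0)])  # → 2: two cell3 messages in the first minute
```

A gap in the buckets of one controller, like the silence of Cell 3 after 14:29:03 discussed below, then shows up as a missing key.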
Figure 42 shows the Dotted Chart Analysis tool in action: for Cell 3, after the time
instant 14:29:03, none of the robot-related messages (start and complete) was sent by
the robot's controller.
CONCLUSIONS
The results reported in this thesis include:
Data mining approaches employed for the diagnosis of faults:
Data mining approaches are employed to characterize the behaviour of pieces of
equipment placed in a production line testbed from the viewpoint of energy
consumption. During the training phase, the energy signature of the system
components is associated with semantics concerning the workload of the conveyor
belts. During the validation phase, real-time data coming from the line is input to the
classifier, and the output obtained is compared against the output of a rule-based
engine defined offline. Chains of consecutive mismatches point to a possible gradual
deterioration of the expected behaviour. In the presented scenario, such deterioration
would translate to a misalignment of conveyor segments.
Evaluation of the usability of process mining applications for:
Discovery of underlying models and performance-related issues.
Conformance checking of a reference model against the original event log. This
resulted in highlighting significant information concerning communication
failures between pieces of equipment.
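The mismatch-detection scheme summarized in the first item can be sketched as a simple run-length check over paired classifier outputs and observed workloads; the label names and threshold are illustrative, not the values used in the thesis:

```python
def incipient_fault(classifier_out, observed_workload, n_consecutive=3):
    """Flag a possible incipient fault when the energy-pattern
    classifier disagrees with the observed workload for
    n_consecutive samples in a row."""
    run = 0
    for predicted, actual in zip(classifier_out, observed_workload):
        run = run + 1 if predicted != actual else 0
        if run >= n_consecutive:
            return True   # sustained mismatch -> possible deterioration
    return False

pred = ["loaded", "loaded", "empty", "empty", "empty", "empty"]
real = ["loaded", "loaded", "loaded", "loaded", "loaded", "empty"]
print(incipient_fault(pred, real))  # → True: three mismatches in a row
```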
REFERENCES
[1] H.M.W. (Eric) Verbeek, ProM 6 Tutorial, August 2010.
[2] W.M.P. van der Aalst, A.J.M.M. Weijters, L. Maruster, Workflow Mining: Discovering Process Models from Event Logs, IEEE Trans. Knowl. Data Eng., 16(9):1128-1142, 2004.
[3]
[4]
[5] Robert J. Ellison, Andrew P. Moore, Trustworthy Refinement Through Intrusion-Aware Design (TRIAD), Networked Systems Survivability, October 2002, Pittsburgh, PA 15213-3890.
[6]
[7]
[8]
[9]
[10]
[11] Wil M.P. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes, Springer-Verlag, Berlin Heidelberg, 2011.
[12]
[13] N. Fenton, M. Neil, A Critique of Software Defect Prediction Models, IEEE Transactions on Software Engineering, 25:675-689, 1999.
[14] P.L. Li, J. Herbsleb, M. Shaw, Forecasting Field Defect Rates Using a Combined Time-Based and Metrics-Based Approach: A Case Study of OpenBSD, in: 16th IEEE International Symposium on Software Reliability Engineering, IEEE Computer Society, 2005.
[15] A. Mockus, D. Weiss, P. Zhang, Understanding and Predicting Effort in Software Projects, in: International Conference on Software Engineering (ICSE), 2003.
[16] W.D. Sunindyo, T. Moser, D. Winkler, S. Biffl, Process Analysis and Organizational Mining in Production Automation Systems Engineering, Vienna University of Technology, Austria.
[17] M. Song, W.M.P. van der Aalst, Supporting Process Mining by Showing Events at a Glance, Eindhoven University of Technology.
[18] J.C.A.M. Buijs, Mapping Data Sources to XES in a Generic Way, Master Thesis, Eindhoven, March 2010.
[19] Business Process Modeling Notation (BPMN).
[20] 2003.
[21]
[22] W.M.P. van der Aalst, Process-Aware Information Systems: Lessons to be Learned from Process Mining, Department of Mathematics and Computer Science, Eindhoven University of Technology.
[23] Y. Zhang, J. Zhang, J. Ma, Z. Wang, Fault Detection Based on Data Mining Theory, in: Intelligent Systems and Applications (ISA 2009), IEEE, pp. 1-4, 2009.
[24] J.W. Han, M. Kamber, Data Mining: Concepts and Techniques, Second Edition, Morgan Kaufmann, Elsevier, San Francisco, 2006.
[25]
[26] Y.J. Kwon, O.A. Omitaomu, G.N. Wang, Data Mining Approaches for Modelling Complex Electronic Circuit Design Activities, Computers & Industrial Engineering, 54(2):229-241, March 2008.
[27] C. Cortes, V.N. Vapnik, Support-Vector Networks, Machine Learning, 20:273-297, 1995.
[28]
[29]
[30] W.M.P. van der Aalst, The Application of Petri Nets to Workflow Management, The Journal of Circuits, Systems and Computers, 8(1):21-66, 1998.
[31] W.M.P. van der Aalst, K.M. van Hee, A.H.M. ter Hofstede, N. Sidorova, H.M.W. Verbeek, M. Voorhoeve, M.T. Wynn, Soundness of Workflow Nets: Classification, Decidability, and Analysis, Formal Aspects of Computing, 2011.
[32]
[33] A.W. Scheer, Business Process Engineering: Reference Models for Industrial Enterprises, Springer, Berlin, 1994.
[34] W.M.P. van der Aalst, A.J.M.M. Weijters, L. Maruster, Workflow Mining: Discovering Process Models from Event Logs, IEEE Transactions on Knowledge and Data Engineering, 16(9):1128-1142, 2004.
[35] A.K.A. de Medeiros, W.M.P. van der Aalst, A.J.M.M. Weijters, Workflow Mining: Current Status and Future Directions, in: R. Meersman, Z. Tari, D.C. Schmidt (eds.), On the Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE, LNCS 2888, pp. 389-406, Springer, Berlin, 2003.
[36] L. Wen, W.M.P. van der Aalst, J. Wang, J. Sun, Mining Process Models with Non-Free-Choice Constructs, Data Min. Knowl. Discov., 15(2):145-180, 2007.
[37] J. Carmona, J. Cortadella, M. Kishinevsky, A Region-Based Algorithm for Discovering Petri Nets from Event Logs, LNCS 5240, pp. 358-373.
[38] L. Wen, J. Wang, J. Sun, Detecting Implicit Dependencies Between Tasks from Event Logs, LNCS 3841, pp. 591-603.
[39] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, A. Yakovlev, Petrify: A Tool for Manipulating Concurrent Specifications and Synthesis of Asynchronous Controllers, IEICE Transactions on Information and Systems, E80-D:315-325, 1997.
[40] A.K.A. de Medeiros, A.J.M.M. Weijters, W.M.P. van der Aalst, Genetic Process Mining: An Experimental Evaluation, Data Mining and Knowledge Discovery, 14(2):245-304, 2007.
[41] A.J.M.M. Weijters, J.T.S. Ribeiro, Flexible Heuristics Miner (FHM), BETA Working Paper Series, WP 334, Eindhoven University of Technology, Eindhoven, 2010.
[42] A.J.M.M. Weijters, W.M.P. van der Aalst, Rediscovering Workflow Models from Event-Based Data Using Little Thumb, Integrated Computer-Aided Engineering, 10(2):151-162, 2003.
[43] C.W. Günther, W.M.P. van der Aalst, Fuzzy Mining: Adaptive Process Simplification Based on Multi-Perspective Metrics, in: G. Alonso, P. Dadam, M. Rosemann (eds.), International Conference on Business Process Management (BPM 2007), LNCS 4714, pp. 328-343, Springer, Berlin, 2007.
[44] A.W. Biermann, J.A. Feldman, On the Synthesis of Finite-State Machines from Samples of Their Behavior, IEEE Transactions on Computers, 21:592-597, 1972.
[45] B.F. van Dongen, W.M.P. van der Aalst, Multi-Phase Process Mining: Building Instance Graphs, in: P. Atzeni, W. Chu, H. Lu, S. Zhou, T.W. Ling (eds.), International Conference on Conceptual Modeling (ER 2004), LNCS 3288, pp. 362-376, Springer, Berlin, 2004.
[46] W.M.P. van der Aalst, A.H.M. ter Hofstede, B. Kiepuszewski, A.P. Barros, Workflow Patterns, Distributed and Parallel Databases, 14(1):5-51, 2003.
[47] W.M.P. van der Aalst, A.J.M.M. Weijters, L. Maruster, Workflow Mining: Discovering Process Models from Event Logs, IEEE Transactions on Knowledge and Data Engineering, 16(9):1128-1142, 2004.
[48] A. Rozinat, W.M.P. van der Aalst, Conformance Testing: Measuring the Fit and Appropriateness of Event Logs and Process Models, in: C. Bussler et al. (eds.), BPM 2005 Workshops (Workshop on Business Process Intelligence), LNCS 3812, pp. 163-176, Springer-Verlag, Berlin, 2006.
[49] W. van der Aalst et al., Process Mining Manifesto, Business Process Management Workshops 2011, Lecture Notes in Business Information Processing, Vol. 99, Springer-Verlag, 2011.
[50]
[51]
[52] W. van der Aalst, A. Medeiros, A. Weijters, Genetic Process Mining, in: ICATPN 2005, LNCS 3536, pp. 48-69, 2005.
[53] W.M.P. van der Aalst, Process Mining: Discovering and Improving Spaghetti and Lasagna Processes, in: Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium, pp. 1-7, 2011.
[54]
[55]
[56] B.D. Clinton, A. van der Merwe, Management Accounting: Approaches, Techniques, and Management Processes, Cost Management, 20(3):14-22, 2006.
[57] W. van der Aalst, A. Medeiros, A. Weijters, Genetic Process Mining, in: ICATPN 2005, LNCS 3536, pp. 48-69, 2005.
[58] C. Günther, W. van der Aalst, Fuzzy Mining: Adaptive Process Simplification Based on Multi-Perspective Metrics, in: BPM 2007, LNCS 4714, pp. 328-343, 2007.
[59] J.M. van der Werf, B. van Dongen, C. Hurkens, A. Serebrenik, Process Discovery Using Integer Linear Programming, in: ATPN, LNCS 5062, pp. 368-387, 2008.
[60] A. Rozinat, W. van der Aalst, Conformance Testing: Measuring the Fit and Appropriateness of Event Logs and Process Models, in: BPM 2005 Workshops, LNCS 3812, pp. 163-176, 2005.
[61] W. van der Aalst, H. Reijers, M. Song, Discovering Social Networks from Event Logs, Computer Supported Cooperative Work, 14(6):549-593, 2005.
[62] M. Song, W. van der Aalst, Towards Comprehensive Support for Organizational Mining, Technische Universiteit Eindhoven, Tech. Rep., 2007.
[63] A. Mizan, G. Franks, Automated Performance Model Construction Through Event Log Analysis, in: Software Testing, Verification and Validation (ICST), 2012 IEEE, pp. 636-641.
[64]
[65] Y. Zhang, J. Zhang, J. Ma, Z. Wang, Fault Detection Based on Data Mining Theory, in: Intelligent Systems and Applications (ISA 2009), IEEE, pp. 1-4, 2009.
[66]
[67]
[68]
[69] W. Haifeng, Comparison of SVM and LS-SVM for Regression, in: IEEE International Conference on Neural Networks and Brain, vol. 1, pp. 279-283, 2005.
[70]
[71] Process Mining of Test Processes: A Case Study (Process Mining Applied to the Test Process of Wafer Steppers in ASML), IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, pp. 474-479, 2009.
[72]
[73] B.F. van Dongen, W.M.P. van der Aalst, A Meta Model for Process Mining Data, in: Conference on Advanced Information Systems Engineering, volume 161, Porto, Portugal, 2005.
[74]
[75]
[76] W.M.P. van der Aalst, V. Rubin, H.M.W. Verbeek, B.F. van Dongen, E. Kindler, C.W. Günther, Process Mining: A Two-Step Approach to Balance Between Underfitting and Overfitting, 25 November 2008, open access at Springerlink.com.
[77] W.M.P. van der Aalst, B.F. van Dongen, J. Herbst, L. Maruster, G. Schimm, A.J.M.M. Weijters, Workflow Mining: A Survey of Issues and Approaches, Data and Knowledge Engineering, 47(2):237-267, 2003.
[78]
[79] M.J.A.G. Izaguirre, A. Lobov, J.L.M. Lastra, OPC-UA and DPWS Interoperability for Factory Floor Monitoring Using Complex Event Processing, in: Industrial Informatics (INDIN), 2011 9th IEEE International Conference, pp. 205-211, 2011.
[80] S1000 User Manual, Inico Technologies Ltd., available online: http://www.inicotech.com/doc/S1000%20User%20Manual.pdf
[81] W.M.P. van der Aalst, Verification of Workflow Nets, in: P. Azéma, G. Balbo (eds.), Application and Theory of Petri Nets 1997, LNCS 1248, pp. 407-426, Springer-Verlag, Berlin, 1997.
[82] H. Wang, T.Y. Chai, J.L. Ding, M. Brown, Data Driven Fault Diagnosis and Fault Tolerant Control: Some Advances and Possible New Directions, Acta Automatica Sinica, 35(6):739-747, June 2009.
[83]
[84] Mobley, An Introduction to Predictive Maintenance, 2nd edition, pp. 99-113, 2002.
[85] http://www.softwareag.com/corporate/products/aris_platform/aris_controlling/aris_process_performance/overview/default.asp, accessed September 26, 2012.
[86]
[87]
[88]
[89]
[90]
[91]
[92]
[93]
[94]
[95]
[96]
[97]
[98]
[99] D. Harel, R. Marelly, Come, Let's Play: Scenario-Based Programming Using LSCs and the Play-Engine, Springer, Berlin, 2003.
[100] H.M.W. Verbeek, J.C.A.M. Buijs, B.F. van Dongen, W.M.P. van der Aalst, XES, XESame, and ProM 6, available online: http://wwwis.win.tue.nl/~wvdaalst/publications/p566.pdf
[101] L.E. Gonzalez Moctezuma, J. Jokinen, C. Postelnicu, J.L. Martinez Lastra, Retrofitting a Factory Automation System to Address Market Needs and Societal Changes, in: Industrial Informatics (INDIN), 2012 10th IEEE International Conference, pp. 413-418, 2012.
[102] L.B. Jack, A.K. Nandi, Comparison of Neural Networks and Support Vector Machines in Condition Monitoring Application, in: Proceedings of COMADEM 2000, Houston, TX, USA, 2000, pp. 721-730.
The Originator attribute shows the performer of the activity. As mentioned before,
process instances may involve more data values generating more information. In this
example, group, lifeCycle, timeStamp and name are additional attributes for
ProcessInstance.