ZED LEE
© 2020 Zed Lee
Abstract
Sequences of event intervals occur in several application domains, while their
inherent complexity hinders scalable solutions to tasks such as clustering
and classification. In this thesis, we propose a novel spectral embedding
representation of event interval sequences that relies on bipartite graphs. More
concretely, each event interval sequence is represented by a bipartite graph by
following three main steps: (1) creating a hash table that can quickly convert a
collection of event interval sequences into a bipartite graph representation, (2)
creating and regularizing a bi-adjacency matrix corresponding to the bipartite
graph, (3) defining a spectral embedding mapping on the bi-adjacency matrix.
In addition, we show that substantial improvements can be achieved with
regard to classification performance through pruning parameters that capture
the nature of the relations formed by the event intervals. We demonstrate
through extensive experimental evaluation on five real-world datasets that
our approach can obtain runtime speedups of up to two orders of magnitude
compared to other state-of-the-art methods and similar or better clustering and
classification performance.
Keywords
event intervals, bipartite graph, spectral embedding, clustering, classification.
Sammanfattning
Sequences of event intervals occur in several application domains, while their inherent complexity hinders scalable solutions to tasks such as clustering and classification. In this thesis, we propose a novel spectral embedding representation of event interval sequences that relies on bipartite graphs. More concretely, each event interval sequence is represented by a bipartite graph by following three main steps: (1) creating a hash table that can quickly convert a collection of event interval sequences into a bipartite graph representation, (2) creating and regularizing a bi-adjacency matrix corresponding to the bipartite graph, (3) defining a spectral embedding on the bi-adjacency matrix. In addition, we show that substantial improvements can be achieved with regard to classification performance through pruning parameters that capture the nature of the relations formed by the event intervals. We demonstrate, through extensive experimental evaluation on five real-world datasets, that our approach can obtain runtime speedups of up to two orders of magnitude compared to other state-of-the-art methods, with similar or better clustering and classification performance.
Keywords
event intervals, bipartite graph, spectral embedding, clustering, classification.
Contents
1 Introduction
  1.1 Background
  1.2 Problem
  1.3 Purpose
  1.4 Goals
  1.5 Benefits, ethics and sustainability
  1.6 Research methodology
  1.7 Delimitations
  1.8 Structure of the thesis
2 Extended background
  2.1 Graph theory
    2.1.1 Graphs
    2.1.2 Graph matrices
    2.1.3 Spectral clustering
    2.1.4 PageRank
  2.2 Event sequences
    2.2.1 Elements
    2.2.2 Properties
  2.3 Clustering methods
    2.3.1 K-means algorithm
    2.3.2 K-means++ algorithm
    2.3.3 K-medoids algorithm
  2.4 Classification methods
    2.4.1 K-nearest neighbors
    2.4.2 Random forest
    2.4.3 Support vector machines
  2.5 Performance metrics
    2.5.1 Clustering purity
    2.5.2 Accuracy
  2.6 Summary
3 Research methods
  3.1 Choice of research methods
  3.2 Research process
4 Suggested algorithm
  4.1 Research paradigm
  4.2 Model design
    4.2.1 Construction of hash table
    4.2.2 Conversion to a bipartite graph
    4.2.3 Application of graph-based algorithms
5 Empirical experiments
  5.1 Data collection
  5.2 Data properties
  5.3 Experimental design
    5.3.1 Experiment 1: clustering
    5.3.2 Experiment 2: classification
    5.3.3 Runtime efficiency
    5.3.4 Software tools
  5.4 Experiment 1: clustering
  5.5 Experiment 2: classification
References
List of Tables
A.1 The results of grid search for three constraints on the suggested algorithm for classification. The constraint values were increased by 0.1 for each parameter. In the case of a tie, the values are sorted in ascending order by minSup, maxSup, and gap (Ranks 1-5).
A.2 The results of grid search for three constraints on the suggested algorithm for classification. The constraint values were increased by 0.1 for each parameter. In the case of a tie, the values are sorted in ascending order by minSup, maxSup, and gap (Ranks 6-10).
List of Algorithms
1 K-means
2 K-means++
3 K-medoids
List of acronyms and abbreviations
NN Nearest Neighbors
RF Random Forest
TN True Negative
TP True Positive
Chapter 1
Introduction
1.1 Background
Sequential pattern mining has been applied in various application areas. Its principal objective is to extract frequent patterns from sequences of events [1, 2] or to create clusters from data records distinguished by similar patterns [3]. However, these methods rest on the strong assumption that events occur instantaneously, without duration. This assumption is the main weakness of sequential pattern mining methods, since they cannot capture the temporal relations between events that have a duration. Extensive research has therefore been conducted on extending the concept of events to the concept of event intervals. The advantage of this representation is that it can model the duration of events. We can then analyze the relationships between different event intervals according to their relative time positions, providing additional information about the nature of the underlying events. Analysis of event intervals is widely used in various areas, including sign language [4], music informatics [5], cognitive science [6], and linguistics [7].

Figure 1.1: Example of a sequence of eight event intervals over three labels (A, B, and C). The total time duration of the sequence is 22 time points.
Example. Fig. 1.1 shows a sequence composed of eight event intervals defined over three labels. Each label is represented by a letter, such as A, B, or C. Each event interval is characterized by a particular label, as well as its start and end time points. The same event label can occur multiple times, and various types of temporal relations can hold between each combination of event labels.
A set of chronologically arranged event intervals forms an event interval sequence. Relative relations between the many events in a sequence can lead to various forms of temporal configurations; a typical example is the set of temporal relations between events defined by Allen's temporal logic [8]. One challenging problem is finding common elements in such complex sequences and clustering them, or classifying new sequences into specific sets of similar elements. How quickly and accurately such clustering and classification can be performed is an important question.
1.2 Problem
Early studies on classification and clustering of sequences of event intervals concentrated on establishing pairwise distance measures between the sequences. One such distance function is Artemis [9, 10], which measures the pairwise distance by computing the ratio of temporal relations that both sequences have in common. Artemis ignores the absolute time span of each sequence, making it insensitive to the range of the sequences by comparing only relative positions. However, this method requires substantial computation time for checking all pairwise relations, taking cubic time in the worst case. Nonetheless, excellent predictive performance is achieved when Artemis is used together with the k-NN classification algorithm.
Interval-Based Sequential Matching (IBSM) [11], another distance measure, computes the pairwise distance of sequences by converting each sequence into a 0-1 matrix that records the active event labels at each time point. This method does not explicitly extract temporal relations (e.g., Allen's) from the event pairs. In the matrix, columns correspond to time points and rows to event labels; a cell is set to 1 if its label is active at that time point, and 0 otherwise. This measure differs from Artemis in that each sequence is represented as a matrix data point. The speed of IBSM mainly depends on the maximum time duration of the sequences in the database. Furthermore, since the absolute time points of sequences differ, IBSM needs additional processing time to interpolate the shorter sequences to match the longest one. As a result, computation can be substantially slower when the database includes sequence pairs with hugely disproportionate numbers of event labels and time durations, and worse still when the longest sequence is far longer than the others. Artemis, in contrast, only considers temporal relations and is thus affected mostly by the number of different pairwise relations in the database. Therefore, there is no clear winner between these two algorithms; depending on the nature of the data, one is slower than the other.
Recently, Sequences of Temporal Intervals Feature Extraction (STIFE) [12] has been introduced for the classification of event interval sequences, utilizing a mixture of static and temporal features extracted from the sequences. The temporal features include pairs of event intervals that achieve high class separation, selected by information gain. Nevertheless, STIFE's feature extraction can be even slower than IBSM and Artemis. Also, as it returns non-numeric feature vectors, distance-based algorithms, such as clustering, cannot be applied directly to the features.
Lastly, [13] presents the relationship between graphs and sequences of event intervals, showing a method to transform dynamic graphs into event intervals. In this thesis, we go in the opposite direction: we show that a bipartite graph representation can be used to define a feature space at a substantially lower computational cost by using spectral embeddings, a common embedding technique for graph clustering that captures community structure in graphs [14, 15, 16]. For bipartite graphs, bi-spectral clustering has been introduced to speed up the process by using the bi-adjacency matrix, which excludes from the adjacency matrix the space for edges between vertices in the same set [17]. Variants of spectral clustering have been introduced for the stochastic block model [18]. Recently, the technique of regularizing the adjacency matrix has been studied and shown to work well [19, 20], with explanations in terms of eigenvector perturbation [21], conductance [22], and sensitivity to outliers [23]. The embedding space of an affinity matrix can also be used as a feature space for classification, showing better performance than former pairwise distance measures [24].
A major problem with the earlier algorithms discussed above is that they do not achieve practically applicable runtime speeds for clustering or classification of event interval sequences. For example, the latest clustering algorithms have cubic runtime in the worst case, depending on the length of the event interval sequences or the number of event intervals.
1.3 Purpose
To the best of our knowledge, this thesis is the first attempt to solve the problem of clustering and classifying event sequence datasets by combining them with graph theory. Previous work on classifying and clustering event sequence datasets raises a problem of practical applicability: finishing the algorithm in a reasonable time while achieving high values of performance measures such as clustering purity or classification accuracy. Graph theory can be expected to improve this applicability, since event sequence datasets naturally form bipartite graphs without further processing if we regard the event sequences and their temporal relations as nodes, like terms and documents in a text corpus, or customers and purchased items in market baskets [25]. We can then exploit bipartite graph properties that can make our problems more scalable and more accurate than event sequence-specific approaches. To solve the problems mentioned above, we formulate a research question that considers the bipartite graph as our suggested representation.
The research question includes both the efficiency and effectiveness of the newly proposed model. Efficiency can be interpreted as the total running time to run the model and obtain the results. Effectiveness can be assessed by calculating performance indicators such as clustering purity and classification accuracy.
The main purpose of this thesis is thus to combine graph theory with event interval sequences, a special form of data.
1.4 Goals
The main goals of this thesis, set to answer our research question, are summarized as follows:
1.5 Benefits, ethics and sustainability
Converting time series into event intervals can address the privacy problem without sharing actual data information. Therefore, the method proposed in this thesis can be an alternative to conventional time series analysis techniques. Temporal abstraction to discretize time series data is already widely known [27], and after applying it, we can proceed with time series analysis in a privacy-preserving form using the techniques in this thesis. Besides, when converting to interval data, we can rename the labels arbitrarily and use random numeric labels; for example, we can use the labels '23' and '445' rather than meaningful labels such as 'high temperature' or 'low temperature' of a patient. This helps maintain an even higher level of data privacy.
Finally, in clustering and classification, the causes of a decision can be more easily interpreted. For the state-of-the-art algorithms used for comparison in this thesis, it is not feasible to interpret the results of classification or clustering due to the characteristics of the algorithms. However, the bipartite graph structure used in this thesis is easy to interpret on its own. For a specific event sequence classified into a specific class, the cause can be inferred through the edges connected to features, i.e., temporal relations. This can mitigate the recently highlighted dangers of machine learning as a black box. Machine learning models whose decisions are hard to trace are not sustainable, because errors in them are challenging to identify, and therefore they cannot be relied upon in critical cases [28]. The model proposed in this thesis makes it easy to identify causes and, in the end, will help create a sustainable model.
1.7 Delimitations
Here we describe the parts of our research question that we will not study in this thesis. The delimitations defined here mainly limit the form of the event sequence datasets.
• Some approaches allow uncertainty in the time points of intervals for more accurate performance. The uncertainty of intervals is not included in our scope of research; we assume that the start and end times of intervals do not change.
• There are also data formats in which one interval has several statuses, but this thesis only deals with cases where one interval has exactly one event label. If two intervals have different event labels, they are treated as different intervals even if their start and end times are the same.
• Event interval sequences can have multiple classes (e.g., one patient can have multiple diseases in a medical dataset), but in this thesis we assume that each sequence has exactly one class.
This is the first study to perform graph clustering and classification by converting event interval sequences into graphs. Therefore, we do not attempt any graph forms other than the bipartite graph. Moreover, although there are various types of clustering and classification on graphs, we focus on demonstrating feasibility using basic spectral clustering and PageRank-based techniques. Finally, this thesis demonstrates the performance of the algorithm itself and does not perform statistical verification on the empirical datasets.
Chapter 2
Extended background
Bipartite graph
A bipartite graph GB = {U, V, E} is a special form of a graph whose vertices are divided into two disjoint sets U and V, meaning that U ∩ V = ∅, and a set of edges E in which every edge connects a vertex in U to a vertex in V.
Figure 2.1: Example of a weighted graph (left) and a bipartite graph (right), with labeled nodes, edges, and edge weights.
Bi-adjacency matrix
A bi-adjacency matrix B ∈ R^{|U|×|V|} is a two-dimensional matrix representing a bipartite graph GB = {U, V, E}, with each axis representing one set of vertices; the edges between the sets U and V are the elements of the matrix. Moreover, Bij > 0 if and only if there is at least one edge between (i, j) in the graph. A bi-adjacency matrix can only be defined for a bipartite graph, because the matrix has no elements for pairs of vertices in the same vertex set.
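To make the definition concrete, here is a minimal Python/NumPy sketch (all names and the toy edge list are our own, not from the thesis) that builds a bi-adjacency matrix and shows how it relates to the full block-structured adjacency matrix:

```python
import numpy as np

# Hypothetical bipartite graph with |U| = 3 and |V| = 4, given as (u, v, weight) edges.
edges = [(0, 1, 2.0), (0, 3, 1.0), (1, 0, 5.0), (2, 2, 3.0)]

B = np.zeros((3, 4))              # bi-adjacency matrix, shape |U| x |V|
for u, v, w in edges:
    B[u, v] = w                   # only U-V edges exist; no U-U or V-V entries

# The full adjacency matrix of the same graph has the block structure
# A = [[0, B], [B^T, 0]], which is larger and mostly empty.
A = np.block([[np.zeros((3, 3)), B], [B.T, np.zeros((4, 4))]])
print(A.shape)   # (7, 7)
```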
Laplacian matrix
The Laplacian matrix of a graph G is defined as L = D − A, where D ∈ R^{|U|×|U|} is the diagonal degree matrix with $D_{ii} = \sum_{j=1}^{|U|} A_{ij}$, and i, j ∈ [1, |U|].
Unlike for the standard adjacency matrix, we cannot obtain a bi-adjacency matrix carrying the same information as L, because the Laplacian of a bipartite graph does not have the off-diagonal block structure of the adjacency matrix. With $D_1 = \mathrm{diag}(B\,1_{|V|}) \in \mathbb{R}^{|U|\times|U|}$ and $D_2 = \mathrm{diag}(B^\top 1_{|U|}) \in \mathbb{R}^{|V|\times|V|}$ [30]:

$$L(G_B) = \begin{bmatrix} D_1 & -B \\ -B^\top & D_2 \end{bmatrix} \qquad (2.2)$$
$$w(A, B) = \sum_{i \in A,\, j \in B} w_{ij}$$

$$\mathrm{normalized\_cut}(A, B) = \frac{w(A, B)}{w(A, V)} + \frac{w(A, B)}{w(B, V)}$$
2.1.4 PageRank
PageRank is a method of weighting documents having a hyperlink structure
such as the World Wide Web according to their relative importance [35].
Hyperlinks can be expressed as out-links from one document to other ones, and
simultaneously, as in-links coming to one document from other documents. A
citation graph can be an example with the papers as the nodes, and the citations
in the paper as the edges. The PageRank algorithm can be applied to all types
of graphs of the same shape.
The PageRank algorithm assumes that out-links from the same document
have the same importance. Moreover, the importance of the out-link is
determined proportionally to the importance of the document. Likewise, the
node’s importance is expressed recursively as the sum of the importance of the
in-links coming into the node.
Example. Let the importance of nodes A, B, and C be i_A, i_B, and i_C, respectively. In the PageRank algorithm, the importance of the link from node B to A and the importance of the link from node B to C are both i_B/2, and the importance of each node can be expressed as follows:
$$i_A = \frac{i_A}{2} + \frac{i_B}{2}, \qquad i_B = \frac{i_A}{2} + i_C, \qquad i_C = \frac{i_B}{2}$$
The above expression can be expressed using a matrix and a vector as follows:
$$\begin{pmatrix} i_A \\ i_B \\ i_C \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1 \\ 0 & 1/2 & 0 \end{pmatrix} \begin{pmatrix} i_A \\ i_B \\ i_C \end{pmatrix}$$
If we set all the importance values at the very beginning to 1/N and repeat the matrix multiplication, the vector converges to the actual importance values. This method is called power iteration. However, the PageRank algorithm described above does not work well for all graphs. It is known to fail especially in two cases: 1) when a node with no out-links exists (dead ends), and 2) when all out-links of a group smaller than the entire graph stay within the group (spider traps). Teleport has been proposed as a way to solve these problems.
Personalized PageRank
Personalized PageRank is a special form of PageRank that restricts the set of nodes to which teleports can jump [36]. Unlike normal PageRank with teleport, which can jump to any node in the graph with equal probability, personalized PageRank teleports only to the given set of nodes, or more generally, with different probability values per node. This can be represented as a unique power iteration solution as follows:

$$r = d \cdot M r + (1 - d)\, u,$$

where M is the column-stochastic transition matrix of the graph, u is the personalization (teleport) vector, and d is the damping factor.
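As an illustration only, here is a minimal Python sketch of power iteration with a personalization vector, applied to the three-node example above; the function name, tolerance, and iteration cap are our own assumptions, not part of the thesis:

```python
import numpy as np

def personalized_pagerank(A, u, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for (personalized) PageRank.
    A: adjacency matrix with A[i, j] > 0 for an out-link i -> j;
    u: teleport distribution (a uniform u recovers ordinary PageRank)."""
    out_deg = A.sum(axis=1)
    # Column-stochastic transition matrix: out-links share a node's importance
    # equally; rows with no out-links (dead ends) simply become zero columns here.
    M = np.divide(A, out_deg[:, None],
                  out=np.zeros_like(A, dtype=float),
                  where=out_deg[:, None] > 0).T
    r = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(max_iter):
        r_next = d * M @ r + (1 - d) * u   # follow links, or teleport into u
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r_next

# The three-node example above: A links to A and B, B to A and C, C to B.
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
u = np.array([1.0, 0.0, 0.0])   # teleport only to node A
print(personalized_pagerank(A, u))
```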
Event interval
An event interval s = (e, ts, te) is a triple consisting of an event label and two time elements: the event label s.e ∈ Σ, and s.ts and s.te, the start and end times of the interval. An event interval can be formed when s.ts ≤ s.te, and for a continuous event interval, s.ts < s.te always holds. In the special case where s.ts = s.te, the event interval is instantaneous.
Event sequence
An event sequence S = {s1, . . . , sn} is a group of event intervals belonging to the same entity. Event intervals in an event sequence are sorted chronologically and can contain the same event label multiple times. They are ordered first by ascending start time; if the start times are tied, the end times are arranged in ascending order; and if the end times are also the same, they follow the lexicographic order of the event labels.
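As a small illustration (the tuple type and data are our own, not from the thesis), the three-key sort order can be expressed directly in Python:

```python
from typing import NamedTuple

class EventInterval(NamedTuple):
    e: str    # event label
    ts: int   # start time
    te: int   # end time

# Hypothetical e-sequence; the three-key order defined above:
# start time, then end time, then lexicographic label.
S = [EventInterval("B", 1, 5), EventInterval("A", 1, 5), EventInterval("A", 1, 3)]
S.sort(key=lambda s: (s.ts, s.te, s.e))
print(S)   # [(A, 1, 3), (A, 1, 5), (B, 1, 5)]
```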
Temporal relation

Figure 2.3: Seven temporal pairwise relations created by the relative positions of two event intervals: follows, meets, overlaps, matches, contains, left-matches, and right-matches.
In this thesis, we consider Allen's seven temporal relations [8] (Figure 2.3), defined in the following set: I = {follows, meets, overlaps, matches, contains, left-matches, right-matches}.
2.2.2 Properties
Vertical support
Given an event sequence S = {s1, . . . , sn} and a temporal relation R = <ei, ej, r> defined by event labels ei, ej ∈ Σ, with r ∈ I, the vertical support of R is defined as the number of event sequences in which the event labels ei, ej occur with relation r.
While there can be multiple occurrences of R in the same event sequence, the relation is counted only once.
Let the function occV(·) indicate a single occurrence of a temporal relation in an event sequence, such that occV(R, S) = 1 if R occurs in S, and 0 otherwise. We define a frequency function F, mapping each relation to a value in [0, 1], that computes the relative vertical support of a temporal relation R in an event sequence database D as follows:
$$F(R) = \frac{1}{|D|} \sum_{S_i \in D} occ_V(R, S_i).$$
Horizontal support
Given a single event sequence S and a specific temporal relation R, the horizontal support is the number of occurrences of R in S, denoted occH(R, S). This function counts the occurrences of R in a single event sequence, not in the whole database.
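The following hypothetical Python snippet illustrates the two support notions on a toy database of relation lists (the data and variable names are invented for illustration):

```python
from collections import Counter

# Hypothetical database: each sequence is the list of temporal relations
# R = (e_i, e_j, r) occurring in it (with repetitions).
D = [
    [("A", "B", "overlaps"), ("A", "B", "overlaps"), ("A", "C", "follows")],
    [("A", "B", "overlaps")],
    [("B", "C", "meets")],
]

R = ("A", "B", "overlaps")
horizontal = [Counter(S)[R] for S in D]   # occ_H per sequence: [2, 1, 0]
vertical = sum(1 for S in D if R in S)    # sequences containing R: 2
relative_vertical = vertical / len(D)     # F(R) = 2/3
print(horizontal, vertical, relative_vertical)
```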
K-means++ and k-medoids address problems that can occur in k-means. However, the overall execution process is similar for all three algorithms.
Algorithm 1: K-means
Data: D: set containing n data objects
k: number of clusters
Result: k clusters
1 Randomly extract k data objects from the data object set D and set them as the centroids of the clusters.
2 For each data object in D, compute the distances to the k cluster centroids and find the centroid closest to it. Assign each data object to the cluster of that centroid.
3 Recalculate the centroid of each cluster based on the clusters reassigned in step 2.
4 Repeat steps 2 and 3 until the cluster to which each data object belongs no longer changes.
5 return k clusters
Given clusters S = {S1, . . . , Sk}, where µi is the center point of the set Si, the algorithm tries to achieve the following objective:

$$\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$
Algorithm 2: K-means++
Data: D: set containing n data objects
k: number of clusters
Result: k clusters
1 Select a random data point from the data set and set it as the first center c1.
2 For each data point di in the dataset D, calculate the distance dist(di, C) between the point and the closest of the already-selected centers C.
3 Select a data point using a biased probability distribution in which each point's probability is proportional to dist(di, C)^2, and set it as the next center.
4 Repeat steps 2 and 3 until k centers are selected.
5 Perform k-means clustering using the selected k centers as initial values.
6 return k clusters
K-means clustering depends significantly on how the initial values are selected. k-means++ was proposed to reduce the damage caused by this property [42]. The k-means++ algorithm is a variant that selects the initial values for the k-means clustering algorithm (Algorithm 2). It requires additional time to set the initial values, but the selected initial values then guarantee that k-means finds a solution that is O(log k)-competitive with the optimal k-means solution in expectation. In this thesis, we used k-means++ instead of k-means.
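For illustration, scikit-learn's KMeans combines exactly these two algorithms when init="k-means++"; a minimal usage sketch on synthetic data of our own (assuming scikit-learn is available):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# init="k-means++" performs the seeding of Algorithm 2, after which the
# standard k-means iterations of Algorithm 1 run from those centers.
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
```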
Algorithm 3: K-medoids
Data: D: set containing n data objects
k: number of clusters
Result: k clusters
1 Select any k data points from the data set and set them as medoids.
2 Assign each unselected data point to the nearest medoid. Not only the Euclidean distance but also other distance functions can be used here.
3 For each unselected data point, calculate the distance cost under the assumption that this point is the new medoid of the set containing it.
4 Compare the distance cost of the existing medoids with that of the newly proposed medoids, and if the cost is lowered, replace the medoid.
5 Repeat steps 2, 3, and 4 until the selected medoids no longer change.
6 return k clusters
The k-medoids algorithm uses a medoid, an actual data point, instead of the average of the data points as the center of a cluster [43]. Unlike the k-means algorithm, it works not only with the Euclidean distance but with arbitrary distance functions; thus, it does not need the actual data points but only a pairwise distance function receiving two data points' ids (Algorithm 3). In this algorithm, most of the time is spent calculating distances between the data. Therefore, the distance values can be calculated and stored in advance, or heuristic techniques can be used to speed up the algorithm.
A k-NN classifier assigns a label based on the k nearest neighbors according to a distance measure. k-NN is a lazy model (or instance-based learning), which means it does not build a model in a separate training process. This contrasts with model-based learning, which first creates a model from the data and then performs the task. The purpose is to perform tasks such as classification or regression using only the observations themselves, without a separate model generation process.
k-NN has two hyperparameters: the number of neighbors to search (k) and the distance measure. If k is small, the model overfits the local characteristics of the data; conversely, if it is large, the model tends to underfit. The results of the k-NN algorithm also vary greatly depending on the distance measure. Most commonly, the Euclidean distance is used.
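A minimal scikit-learn sketch (with toy data of our own) showing how the choice of k changes the predictions:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-d data: class 0 near the origin, class 1 near 10.
X_train = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y_train = [0, 0, 0, 1, 1, 1]

# Small k follows local structure (risk of overfitting);
# large k smooths the decision (risk of underfitting).
for k in (1, 3, 5):
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean").fit(X_train, y_train)
    print(k, knn.predict([[1.5], [9.0]]))
```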
3. Combine the basic classifiers (trees) into one classifier (RF), using averaging or majority vote.
Because an individual tree has small bias and large variance, a very deeply grown tree becomes overfitted. The bootstrap process improves the forest's performance because it preserves the trees' bias while reducing the variance. That is, one decision tree is susceptible to noise in the training data, but if the trees are not correlated with each other, the average of several trees is robust to noise. If all the trees that make up the forest were trained on the same dataset, their correlation would be huge. Therefore, bagging is a process that decorrelates the trees by training them on different datasets.
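A short scikit-learn sketch (synthetic data, names our own) of the bagging-based random forest just described:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# bootstrap=True trains each tree on a resampled dataset, decorrelating the
# trees; the forest then combines their votes into one prediction.
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0).fit(X, y)
print(rf.score(X, y))
```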
If k(x, y) gets smaller as x and y get farther apart, each summand represents the degree of proximity between the test point x and a data point xi. In this way, the sum of such kernel terms can be used to measure the relative proximity of the test point to the data points in the sets we want to distinguish, even when a point x in a non-convex set in the original space is mapped to a higher dimension.
Classifying data is a common task in machine learning. Assuming that each given data point belongs to one of two classes, the goal is to determine which class a new data point will belong to. In a support vector machine, a data point is viewed as a p-dimensional vector (a list of p numbers), and we check whether we can separate the classes with a (p − 1)-dimensional hyperplane. This is called linear classification. Many hyperplanes may separate the data; one logical way to select among them is to choose the hyperplane with the largest margin between the two classes. We therefore choose the hyperplane that maximizes the distance to the data points of each class closest to it. If such a hyperplane exists, it is called the maximum-margin hyperplane, and the linear classifier it defines is called a maximum-margin classifier.
The optimization problem of the SVM can be expressed as follows: minimize $\frac{1}{2}\lVert w \rVert^2$ subject to $y_i(w \cdot x_i + b) \geq 1$ for all training points $(x_i, y_i)$. Using the Lagrange multiplier method, the above problem can be expressed as the dual problem of maximizing $\sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$ subject to $\alpha_i \geq 0$ and $\sum_i \alpha_i y_i = 0$.
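A minimal scikit-learn sketch of a linear maximum-margin classifier on separable toy data of our own (a large C approximates the hard margin):

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [4, 4], [5, 5]]
y = [0, 0, 1, 1]

# A linear kernel with a large C approximates the hard maximum-margin classifier.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.support_vectors_)       # the points closest to the separating hyperplane
print(clf.coef_, clf.intercept_)  # w and b of the hyperplane w.x + b = 0
```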
Figure 2.6: Example of the clustering purity with three clusters and three
classes.
Purity is one of the criteria for assessing the quality of a clustering and is classified as an external evaluation [52], meaning that we need to know the ground-truth class distribution of our dataset [53]. It is calculated as the percentage of points in each cluster that belong to the cluster's majority class. When calculating purity, the absolute size of each class is not important; it is always computed from the class that makes up the largest share of each cluster. Expressed as a formula:
$$\mathrm{purity} = \frac{1}{|D|} \sum_{m \in M} \max_{c \in C} |m \cap c|$$
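A small Python sketch (our own helper, not from the thesis) computing this purity formula:

```python
import numpy as np

def purity(clusters, classes):
    """Fraction of points whose cluster's majority class matches their own class."""
    clusters, classes = np.asarray(clusters), np.asarray(classes)
    total = 0
    for m in np.unique(clusters):
        members = classes[clusters == m]
        total += np.bincount(members).max()  # size of the largest class in cluster m
    return total / len(classes)

print(purity([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2]))  # 5/6 = 0.833...
```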
2.5.2 Accuracy
Accuracy is calculated as the number of True Positive (TP) and True Negative (TN) predictions over the size of the entire target dataset, as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
2.6 Summary
This chapter briefly introduced the various algorithms used in this thesis. The graph algorithms are used to transform our dataset, and clustering and classification are applied on top of the resulting graph structure. Event sequences are the default format of our data and are what we transform into the graph structure.
Chapter 3
Research methods
This chapter aims to provide an overview of the research method used in this
thesis to answer the following research questions:
• Representation and applicability: Will the graph structure be a proper representation for clustering and classification problems, one that can generate numerical feature vectors?
• Clustering purity: Can clustering of event intervals with a graph
structure achieve purer clusters from datasets compared to previous
methods?
• Classification accuracy: Can the classification of event sequences with
a graph structure achieve higher accuracy from datasets than previous
methods?
• Runtime efficiency: Will the total runtime of creating the feature
vectors and performing classification and clustering on them be faster
than state-of-the-art algorithms?
Section 3.1 explains the choice of research methods in the thesis to answer
the research question. Section 3.2 describes the research process used in the
thesis based on the CRoss-Industry Standard Process for Data Mining (CRISP-
DM) process to derive the proper algorithm that can solve the proposed
research question.
3.1 Choice of research methods
We found that the algorithms previously designed specifically for event sequence datasets did not show a practical level of efficiency and effectiveness. Also, the number of algorithms restricted to event sequence datasets is quite small compared to those based on general data types. Based on these facts, we concluded that changing the structure of the event sequence datasets to a more general form could help enhance efficiency and effectiveness.
From previous work converting dynamic graph structures into sets of event intervals [13], we learned that the two structures are compatible and can be converted into each other. Furthermore, from our comprehensive background analysis, we gathered that the graph structure can support various machine learning algorithms, including graph-specific ones, traditional classification and clustering algorithms such as k-NN and SVM, and dimensionality reduction techniques like spectral embedding, which increases the possibility of improving both efficiency and effectiveness.
We introduce a novel method to convert event sequence datasets into a graph structure in order to answer our research question, based on our extensive study of existing research areas. After introducing the method, we answer each of the decomposed research questions through empirical experiments on five real-world datasets. We validate that the proposed method can reliably answer our research questions through graph-based clustering and classification, and through traditional machine learning techniques applied after a graph-based dimensionality reduction.
Chapter 4
Suggested algorithm
13 for Rk ∈ HT do
14     if F(Rk) < constraints.minSup ∨ F(Rk) > constraints.maxSup then
15         remove HT[Rk]
16 // Step 2: Conversion to a bipartite graph
17 B = 0^{|D|×|HT|}
18 for Rk ∈ HT do
19     for Si.id ∈ HT[Rk] do
20         B[Si.id][hash(Rk, |HT|)] = HT[Rk][Si.id]
21 // Step 3: Application of graph-based algorithms
22 if mode = "spectral" then
23     // Spectral embedding of the bipartite graph
       features = spectralEmbedding(B, d)
24 else if mode = "pagerank" then
25     // Similarity acquisition by personalized PageRank
       features = pageRank(B, d)
26 return features
[Figure: Example of the three steps of the suggested algorithm. Step 1 (construction) builds the temporal relation hash table HT from the e-sequence database via getRelation(); Step 2 (conversion) produces the bi-adjacency matrix over the surviving relations R1-R4, regularized by adding 0.01 to every cell; Step 3 applies spectral embedding or personalized PageRank. The example e-sequence database is:]

id  Event intervals
1   (A, 1, 3), (B, 1, 3), (A, 14, 16)
2   (A, 1, 6), (B, 6, 8), (A, 10, 12), (C, 13, 17)
3   (A, 4, 7), (B, 11, 12)
4   (B, 1, 5), (A, 6, 14), (B, 6, 14), (A, 17, 18)

The temporal relations found include <A, A, follows>, <A, B, matches>, <A, B, meets>, <A, B, follows>, <A, C, follows>, <B, A, follows>, <B, B, follows>, and <B, C, follows>.
Construction step
First, we traverse all event intervals in the event sequence database in the pre-defined chronological order. In every iteration, we have a target event interval sa. We then pair it with every event interval sb that occurs after it (or at the same time, but with a lexicographically later event label). Thereafter, we check the temporal relation between the two event intervals sa and sb (lines 1-5, Algorithm 4). The temporal relation between them, Rk = <sa.e, sb.e, r> with r ∈ I, is formed and stored as a key in the first hash table HT, which we call the temporal relation hash table (lines 6-9). Whenever the algorithm finds a temporal relation Rk, it identifies the event sequence id containing it and uses it as a key in the second hash table HE, called the event sequence hash table (lines 10-11). For clarity, since each record HT[Rk] ∈ HT is mapped to its respective event sequence hash table, we call it HE^k. The keys of the event sequence hash table are the event sequence ids where Rk occurs, while the values are the horizontal supports that quantify the occurrences of Rk in the event sequence, which will eventually be converted into edge weights of the bipartite graph. When we first create a specific key, we set the value in HE^k to one, as it has only one horizontal support at first. If the same temporal relation Rk occurs more than once in the same event sequence Si, we add to the count HE^k[i] to update its horizontal support (line 12).
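A minimal Python sketch of this construction step under our own simplifying assumptions: it reuses the EventInterval tuple sketched in Section 2.2, substitutes a toy stand-in for the full seven-relation getRelation() check, and omits gap pruning; it is not the thesis implementation:

```python
from collections import defaultdict
from itertools import combinations

def get_relation(a, b):
    """Toy stand-in for the seven Allen-style relations used in the thesis."""
    if a.te < b.ts:
        return "follows"
    if a.te == b.ts:
        return "meets"
    if (a.ts, a.te) == (b.ts, b.te):
        return "matches"
    if a.ts <= b.ts and b.te <= a.te:
        return "contains"
    return "overlaps"   # left-/right-matches are folded in here for brevity

def build_hash_table(database):
    """HT[Rk] is the event sequence hash table HE^k: sequence id -> horizontal support."""
    HT = defaultdict(lambda: defaultdict(int))
    for seq_id, seq in database.items():
        # seq is sorted chronologically, so sb always starts at or after sa
        for sa, sb in combinations(seq, 2):
            Rk = (sa.e, sb.e, get_relation(sa, sb))
            HT[Rk][seq_id] += 1     # accumulate horizontal support
    return HT
```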
Pruning step
The pruning step helps limit the unnecessary creation of trivial temporal relations and keeps the graph holding only the information necessary for the learning problem. The pruning step consists of two sub-steps, which do not occur sequentially but at different times during the construction step:
1. Gap pruning: A gap constraint limits the maximum distance of follows relations between intervals. follows relations are easy to form, as they occur between almost every pair of intervals: a follows relation simply means that the two intervals do not overlap at all. This pruning drops unnecessary follows relations that arise just because two intervals are chronologically far apart rather than meaningfully related. The algorithm checks the gap when examining the temporal relation while scanning the database (line 5, Algorithm 4). For this, the algorithm receives a gap constraint with a value in the range [0, 1], interpreted as a ratio of the average time duration of the event sequences in the database. The algorithm prunes follows relations whose distance is above that ratio.
2. Frequency pruning: Frequency pruning eliminates the temporal relations Rk whose relative vertical supports F(Rk) are below or above pre-defined criteria after the multi-layer hash table is entirely created (lines 13-15). To do this, we impose two support constraints, minSup and maxSup: relations with F(Rk) < minSup or F(Rk) > maxSup are removed.
In the example figure, after creating the multi-layer hash table, the algorithm performs frequency pruning by applying the two support constraints {minSup, maxSup}. Since minSup = 0.5 (or a support count of 2), the temporal relations with vertical support equal to 1 are subsequently excluded from the first layer of the table.
Time complexity
The benefit of this multi-layer hash table is that we can easily consider two
types of frequencies and apply pruning techniques by scanning the event
sequence database only once. Moreover, we can instantly transform the table
to its corresponding bi-adjacency matrix weighted by the support of the event
sequences’ temporal relations. All that is required is to scan the database once,
scan the first layer of the table for applying the pruning technique, and scan
the first and second layers of the hash table to convert it into the bi-adjacency
matrix.
With these benefits, we can explicitly calculate the time complexity of the
first step of the suggested algorithm. Given an event sequence database D =
{S1 , . . . , S|D| }, the set of possible relations I, and the alphabet of event labels
Σ, the time complexity for creating the bi-adjacency matrix is quadratic in the
worst case as follows:
$$\sum_{S_i \in D} \left( |S_i|^2 \times |I| \right) + \left( |\Sigma|^2 \times |I| \right) + \left( |\Sigma|^2 \times |I| \times |D| \right).$$
The resulting bipartite graph GB = {U, V, E} is then given by
U = {i | Si ∈ D, i ∈ [1, |D|]},
V = {HT.keys | ∀R ∈ HT : minSup < F(R) < maxSup}, and
E = {e_{i,j} = HE^j[i] | i ∈ [1, |D|], j ∈ V, i ∈ HE^j}.
Algorithm 5: spectralEmbedding
Data: B: a bi-adjacency matrix of intervals, where B ∈ R^{|U|×|V|}
d: dimension factor
Result: U: embedding of the rows
1 B_R = B + α · 1^{|U|×|V|}
2 NB_R = D1^{-1/2} B_R D2^{-1/2}
3 calculate the singular vectors NB_R = M Σ W^T
4 pick the leading d singular values and the corresponding d columns of M
5 return M[: |U|, : d]
The matrix computed on line 2 is the bi-adjacency part of the normalized Laplacian N. We can use only this part, since N can be expressed through its bi-adjacency block as follows:

$$N = \begin{bmatrix} I_{|U|} & -D_U^{-1/2} B D_V^{-1/2} \\ -D_V^{-1/2} B^\top D_U^{-1/2} & I_{|V|} \end{bmatrix} \qquad (4.1)$$
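A minimal NumPy sketch of Algorithm 5 under our own assumptions (α = 0.01 as in the running example; the function name and the toy matrix are hypothetical):

```python
import numpy as np

def spectral_embedding(B, d, alpha=0.01):
    """Minimal sketch of Algorithm 5: regularize, degree-normalize,
    and keep the d leading left singular vectors as row features."""
    BR = B + alpha * np.ones_like(B)          # line 1: B_R = B + alpha * 1
    D1 = BR.sum(axis=1)                       # row (U-side) degrees
    D2 = BR.sum(axis=0)                       # column (V-side) degrees
    NB = BR / np.sqrt(D1)[:, None] / np.sqrt(D2)[None, :]   # line 2
    M, _, _ = np.linalg.svd(NB, full_matrices=False)        # line 3
    return M[:, :d]                           # lines 4-5: leading d columns of M

# Hypothetical bi-adjacency matrix of four e-sequences and four relations;
# the all-zero third row is exactly the case the regularization makes safe.
B = np.array([[1., 0., 0., 0.],
              [1., 0., 2., 1.],
              [0., 0., 0., 0.],
              [0., 1., 1., 2.]])
features = spectral_embedding(B, d=2)   # one 2-d feature vector per e-sequence
```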
Algorithm 6: personalizedPageRankClassification
Data: B: a bi-adjacency matrix of intervals, where B ∈ R^{|U|×|V|}
l: vector of training true labels
d: damping factor
Result: result: classification result (label)
1 U ← unknown nodes from the graph B
2 for c ∈ unique(l) do
3     u_i ← 1 for every training node i with label c
4     normalize u such that ||u||_1 = 1
5     R_c ← PersonalizedPageRank(B, u, d)
6 for i ∈ U do
7     X_i ← argmax_c(R_c[i])
8 return X[U]
Using these scores, we can get the estimated label for a test instance (node) by looking at the similarity scores for each class and taking the class with the maximum score for the target node. This procedure is described in Algorithm 6. One difference from classification using spectral embedding is that this approach uses the graph structure directly and applies no further transformation to the bi-adjacency matrix. Instead, it simply follows the edges to obtain, for each unknown node, the scores of the most similar labeled nodes.
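A minimal Python sketch of Algorithm 6 under our own assumptions; it reuses the personalized_pagerank sketch from Section 2.1.4 and encodes unknown test nodes with the label -1 (all names and the toy graph are hypothetical):

```python
import numpy as np

def ppr_classify(B, labels, d=0.85):
    """Sketch of Algorithm 6: label test sequences via class-wise personalized PageRank.
    labels[i] is the class of training sequence i, or -1 for unknown (test) sequences."""
    nU, nV = B.shape
    labels = np.asarray(labels)
    # Full adjacency of the bipartite graph; nodes 0..nU-1 are sequences.
    A = np.block([[np.zeros((nU, nU)), B], [B.T, np.zeros((nV, nV))]])
    scores = {}
    for c in set(labels.tolist()) - {-1}:
        u = np.zeros(nU + nV)
        u[:nU][labels == c] = 1.0      # teleport only to training nodes of class c
        u /= u.sum()
        scores[c] = personalized_pagerank(A, u, d)  # sketch from Section 2.1.4
    unknown = np.flatnonzero(labels == -1)
    return {int(i): max(scores, key=lambda c: scores[c][i]) for i in unknown}

# Hypothetical toy graph: two training sequences (classes 0 and 1) and one test sequence.
B = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
print(ppr_classify(B, [0, 1, -1]))   # the test sequence leans toward class 1
```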
Chapter 5
Empirical experiments
This chapter presents our empirical experiments. Section 5.1 describes the datasets we collected to perform our experiments. Section 5.2 describes the properties of the selected datasets. Section 5.3 describes the experimental design used to evaluate the suggested method, together with the software tools, mainly libraries and languages, used in our experiments. Sections 5.4 and 5.5 give the results of our empirical experiments for clustering and classification on five real-world event sequence datasets.
• SKATING: The event labels show the muscle activity and leg position
of six professional inline speed skaters while performing a control test
at seven speeds on the treadmill. The event sequence represents one
element of the complete movement cycle.
• CONTEXT: The event labels are from the categorical and numerical
data that describe the situation of mobile devices that people carry in
various situations. The event sequence represents a part of a possible
scenario, such as being in the street or having a meeting.
Our datasets range from simple to complex. Firstly, the BLOCKS dataset is the lightest dataset we collected in terms of relation size: since there are only eight event labels, the maximum number of relations the dataset can form is just 8 × 8 = 64. On the other hand, the PIONEER, CONTEXT, and SKATING datasets have similar medium-sized characteristics, but each highlights different aspects. The PIONEER dataset has even fewer sequences than BLOCKS, but it has the largest number of event labels (92), so its maximum number of possible temporal relations is also the highest (Table 3.2). On the other hand, CONTEXT and SKATING have fewer possible relations than PIONEER, but their numbers of event sequences and intervals increase the total searchable space. The most significant difference between the CONTEXT and SKATING data is that the event sequence time span of SKATING is about ten times larger than that of CONTEXT (Table 3.1), which makes algorithms based on time points, such as IBSM, very slow. Finally, HEPATITIS is a dataset with both many event labels and many sequences; all algorithms take the most time on it. The HEPATITIS data can thus show how our algorithm improves on complex data.
[Figure: the common process — construction of the hash table, spectral embedding of the bipartite graph, and application of clustering/classification methods.]
Common process
1. Construction of hash table: Form a hash table by traversing a given event sequence dataset and converting it into graph form.
Datasets
We used all five datasets we prepared (BLOCKS, PIONEER, CONTEXT,
SKATING, and HEPATITIS).
Competitor methods
We compared our model to two alternative state-of-the-art distance functions, Artemis and IBSM. We implemented these algorithms in the same programming language for a fair comparison and ran them on the same datasets in the same working environment.
Evaluation metrics
We demonstrate the runtime efficiency of our suggested model and its applicability to clustering tasks by reporting clustering purity. For our proposed algorithm as well as the competitors, clustering purity values are derived by running the methods 100 times independently and averaging the values. Runtime values are averaged in the same manner.
we can find and apply optimal parameters for the datasets. A detailed process
is described as follows:
Common process
The tasks described below are run separately after finishing the common
process.
Datasets
For this experiment, we used all five datasets we prepared.
Competitor methods
We again compared our model to the two alternative state-of-the-art distance functions, Artemis and IBSM. For classification, for completeness, we also compared against the state-of-the-art algorithm STIFE, an RF feature-based classifier for event sequences, which can only handle classification problems. We implemented all three algorithms in the same programming language for a fair comparison and ran them on the same datasets in the same working environment.
Evaluation metrics
We demonstrate the runtime efficiency of our suggested model as well as its applicability to classification tasks by reporting classification accuracy. For our proposed algorithm and the competitors, classification accuracy values are derived by performing 10-fold cross-validation and averaging the values. Runtime values are averaged in the same manner.
Table 5.3: Clustering results for all competitors in terms of clustering purity (%) and runtime (seconds).

            Artemis K-medoids   IBSM K-medoids     IBSM K-means
Dataset     Purity    Time      Purity    Time     Purity    Time
BLOCKS      85.62     1.20      95.30     0.71     99.09     10.57
PIONEER     66.13     15.64     63.94     4.41     64.09     74.13
CONTEXT     65.13     122.23    75.22     5.19     82.66     204.82
SKATING     36.52     180.48    70.21     286.10   -         >1h
HEPATITIS   -         >1h       67.91     444.77   -         >1h
The spectral embedding feature vectors provided a compressed version of the original space by almost 99% for all datasets, which contributed to the high computation speedups obtained. The graph-based methods were not as fast as the spectral embedding-based process, but they were still much faster than the state-of-the-art algorithms. One more benefit of graph-based clustering and classification is that we can visually see how the instances are grouped in the graph structure, since it does not transform the bi-adjacency matrix generated by our hash table process. Using our algorithm with various classifiers, we achieved speedups of up to a factor of 292. This is even an underestimate: for competitors that did not finish within the one-hour execution time limit, our approach is at least 300 times faster.
Table 5.4: Clustering results for our algorithm in terms of clustering purity (%) and runtime (seconds).

            K-medoids           K-means             PageRank
Dataset     Purity    Time      Purity    Time      Purity    Time
BLOCKS      93.81     0.02      99.82     0.04      96.19     0.41
PIONEER     74.75     0.89      83.12     0.91      95.63     2.13
CONTEXT     77.54     1.99      82.36     2.02      61.03     3.59
SKATING     62.45     1.48      74.40     1.52      64.01     3.98
HEPATITIS   71.70     9.60      70.08     9.63      60.01     14.30
Runtime comparison
In terms of runtime, our two approaches (spectral embedding and personalized PageRank) were faster than our two competitors (Artemis and IBSM). In particular, Artemis could not even complete the computation within an hour on the HEPATITIS dataset. IBSM showed poor runtime performance on the datasets with long event sequences, such as SKATING and HEPATITIS. Specifically, when k-medoids was used on SKATING, IBSM was even slower than Artemis. Moreover, IBSM with k-means could not complete within an hour on SKATING and HEPATITIS, while our algorithm with k-means completed in 1.52 seconds on SKATING and in 9.63 seconds on HEPATITIS.
Purity comparison
In terms of purity, our algorithm also showed remarkable results. In the k-medoids trials, Artemis had the lowest purity values for all datasets except PIONEER, while IBSM showed the highest purity only on SKATING for k-medoids, but there it was about 193 times slower than our algorithm. For the rest of the datasets, our algorithm showed the fastest runtime and achieved the highest purity. In the k-means experiments, our algorithm showed the highest purity values except on CONTEXT, where IBSM led by a slight margin of about 0.3 percent but was also about 101 times slower than our algorithm. The personalized PageRank algorithm showed similar results but was not consistently better than the competitors; it even showed worse purity on most datasets. However, it achieved the highest purity value on the PIONEER dataset, suggesting that there may be cases where the graph structure is more effective than spectral embedding. We leave a closer look at this question as future work.
Chapter 6
Conclusions and future work
6.1 Conclusions
We proposed a novel representation of event sequences using a bipartite graph for efficient and effective clustering and classification, and benchmarked it on five real-world datasets against several competitors. Our experimental benchmarks showed that both the proposed spectral embedding representation and the graph-based analysis achieve substantially lower runtimes than the competitors, and even higher purity (for clustering) and classification accuracy (for classification) than some of them.
We close by revisiting the research question of this thesis. We worked on this thesis to address one research question, whose key measures were efficiency and effectiveness, and we performed extensive experiments to derive both measures thoroughly. First, we checked efficiency through the total runtime until the algorithm terminates, running the whole algorithm end to end as mentioned earlier, and we confirmed through experiments that the algorithm runs faster than all the existing algorithms on every dataset we prepared. Second, effectiveness was measured through purity for clustering and accuracy for classification. However, the effectiveness of our algorithm showed slightly different aspects for clustering and classification.
6.2 Limitations
Although our algorithm brings many benefits to the event sequence field, this attempt has clear limitations. The most significant limitation of our algorithm is that it requires three parameters; to obtain the best effect the algorithm can achieve, these added parameters must be tuned.
References
[20] T. Qin and K. Rohe, “Regularized spectral clustering under the degree-
corrected stochastic blockmodel,” in Advances in Neural Information
Processing Systems, 2013, pp. 3120–3128.
[24] M. Schmidt, G. Palm, and F. Schwenker, “Spectral graph features for the
classification of graphs and graph sequences,” Computational Statistics,
vol. 29, no. 1-2, pp. 65–80, 2014.
[30] N.-D. Ho and P. Van Dooren, “On the pseudo-inverse of the Laplacian of a bipartite graph,” Applied Mathematics Letters, vol. 18, no. 8, pp. 917–922, 2005.
[31] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
[33] L. Hagen and A. B. Kahng, “New spectral methods for ratio cut partitioning and clustering,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 11, no. 9, pp. 1074–1085, 1992.
[41] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques. Elsevier, 2011.
[43] H.-S. Park and C.-H. Jun, “A simple and fast algorithm for k-medoids clustering,” Expert Systems with Applications, vol. 36, no. 2, pp. 3336–3341, 2009.
[45] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[47] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[55] C. Shearer, “The CRISP-DM model: the new blueprint for data mining,” Journal of Data Warehousing, vol. 5, no. 4, pp. 13–22, 2000.
[56] G. Harper and S. D. Pickett, “Methods for mining HTS data,” Drug Discovery Today, vol. 11, no. 15-16, pp. 694–699, 2006.
Appendix A
These two tables (Tables A.1 and A.2) include the results of the grid search for three constraints on the suggested algorithm for classification. The constraint values were increased by 0.1 within the range of 0.0 and 1.0. In the case of a tie, the values are sorted in ascending order by minSup, maxSup, and gap.
Table A.1: The results of the grid search for three constraints on the suggested algorithm for classification. The constraint values were increased by 0.1 within the range of 0.0 and 1.0 for each parameter. In the case of a tie, the values are sorted in ascending order by minSup, maxSup, and gap (Ranks 1-5).
Dataset | Rank | minSup | maxSup | gap | 1-NN accuracy
1 0.0 0.4 0.0 100.00
2 0.0 0.4 0.6 100.00
BLOCKS 3 0.0 0.4 0.7 100.00
4 0.0 0.4 0.8 100.00
5 0.0 0.4 0.9 100.00
1 0.0 0.7 0.1 100.00
2 0.0 0.7 0.3 100.00
PIONEER 3 0.0 0.7 0.5 100.00
4 0.0 0.8 0.6 100.00
5 0.0 0.8 0.7 100.00
1 0.4 0.5 0.2 95.00
2 0.4 0.5 0.3 95.00
CONTEXT 3 0.4 0.7 0.2 93.75
4 0.4 0.7 0.3 93.33
5 0.1 1.0 0.6 92.92
1 0.5 0.6 0.1 91.32
2 0.5 0.9 0.3 90.75
SKATING 3 0.5 0.9 0.1 90.19
4 0.0 0.7 0.1 90.19
5 0.0 0.9 0.1 90.00
1 0.0 1.0 0.1 76.30
2 0.0 0.9 0.4 76.29
HEPATITIS 3 0.0 0.7 0.7 76.09
4 0.1 1.0 0.4 75.93
5 0.1 0.7 0.1 75.92
Table A.2: The results of the grid search for three constraints on the suggested algorithm for classification. The constraint values were increased by 0.1 within the range of 0.0 and 1.0 for each parameter. In the case of a tie, the values are sorted in ascending order by minSup, maxSup, and gap (Ranks 6-10).
Dataset | Rank | minSup | maxSup | gap | 1-NN accuracy
6 0.0 0.4 1.0 100.00
7 0.0 0.5 0.0 100.00
BLOCKS 8 0.0 0.5 0.5 100.00
9 0.0 0.5 0.6 100.00
10 0.0 0.5 0.7 100.00
6 0.0 0.8 0.9 100.00
7 0.0 0.9 0.3 100.00
PIONEER 8 0.0 0.9 1.0 100.00
9 0.0 0.1 0.0 99.36
10 0.0 0.1 0.8 99.36
6 0.4 0.8 0.2 92.92
7 0.4 0.8 0.3 92.92
CONTEXT 8 0.4 0.6 0.3 92.50
9 0.1 0.9 0.7 91.25
10 0.1 0.9 0.9 91.25
6 0.5 0.7 0.2 89.81
7 0.0 0.9 0.3 89.62
SKATING 8 0.5 1.0 0.4 89.62
9 0.0 1.0 0.1 89.72
10 0.6 0.9 0.2 89.43
6 0.0 0.8 0.9 75.91
7 0.0 0.6 1.0 75.71
HEPATITIS 8 0.1 0.5 0.4 75.71
9 0.1 0.7 0.6 75.71
10 0.1 0.9 0.7 75.71
TRITA-EECS-EX-2020:679
www.kth.se