
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2019.2906173, IEEE Transactions on Knowledge and Data Engineering

IEEE TRANS. KNOWLEDGE AND DATA ENGINEERING, VOL. , NO. , 2015

Active Online Learning for Social Media Analysis to Support Crisis Management

Daniela Pohl, Abdelhamid Bouchachia SMIEEE, and Hermann Hellwagner SMIEEE

Abstract—People use social media (SM) to describe and discuss the different situations they are involved in, such as crises. It is therefore worthwhile to exploit SM content to support crisis management, in particular by revealing useful and unknown information about a crisis in real time. Hence, we propose a novel active online multiple-prototype classifier, called AOMPC, that identifies data relevant to a crisis. AOMPC is an online learning algorithm that operates on data streams and is equipped with active learning mechanisms to actively query the labels of ambiguous unlabeled data. The number of queries is controlled by a fixed budget strategy. Typically, AOMPC accommodates partly labeled data streams. AOMPC was evaluated using two types of data: (1) synthetic data and (2) SM data from Twitter related to two crises, the Colorado Floods and the Australia Bushfires. To provide a thorough evaluation, a whole set of known metrics was used to study the quality of the results. Moreover, a sensitivity analysis was conducted to show the effect of AOMPC's parameters on the accuracy of the results, and a comparative study of AOMPC against other available online learning algorithms was performed. The experiments showed the very good behavior of AOMPC in dealing with evolving, partly labeled data streams.

Index Terms—Online Learning, Multiple Prototype Classification, Active Learning, Social Media, Crisis Management

1 INTRODUCTION

The primary task of crisis management is to identify specific actions that need to be carried out before (prevention, preparedness), during (response), and after (recovery and mitigation) a crisis has occurred [28]. To execute these tasks efficiently, it is helpful to use data from various sources, including the public as witnesses of emergency events. Such data would enable emergency operations centers to act and to organize rescue and response. In recent years, a number of research studies [49] have investigated the use of social media as a source of information for efficient crisis management. A selection of such studies, among others, encompasses the Norway Attacks [47], the Minneapolis Bridge Collapse [35], the California Wildfire [64], the Colorado Floods [18], and the Australia Bushfires [23], [22]. The extensive use of SM forces a rethinking of public engagement in crisis management with regard to the newly available technologies and the resulting opportunities [13].

Our previous work on SM in emergency response focused on offline and online clustering of SM messages. The offline clustering approach [50] was applied to identify sub-events (specific hotspots) from the SM data of a crisis for after-the-fact analysis. Online clustering [48] was used to identify sub-events that evolve over time in a dynamic way. In particular, online feature selection mechanisms were devised as well, so that SM data streams can be accommodated continuously and incrementally.

It is interesting to note that people from emergency departments (e.g., police forces) already use SM to gather, monitor, and disseminate information to inform the public [21]. Hence, we propose a learning algorithm, AOMPC, that relies on active learning to accommodate the user's feedback upon querying the item being processed. Since AOMPC is a classifier, the query is related to labeling that item.

The primary goal in using user-generated SM content is to discriminate valuable information from irrelevant information. We propose classification as the discrimination method. The classifier plays the role of a filtering machinery. With the help of the user, it recognizes the important SM items (e.g., tweets) that are related to the event of interest. The selected items are used as cues to identify sub-events. Note that an event is the crisis as such, while sub-events are the topics commonly discussed during a crisis (i.e., hotspots like flooding, collapsing bridges, etc. in a specific area of a city). These sub-events can be identified by aggregating the messages posted on SM networks that describe the same specific topic [48], [51].

We propose a Learning Vector Quantization (LVQ)-like approach based on multiple prototype classification. The classifier operates online to deal with the evolving stream of data. The algorithm, named active online multiple prototype classifier (AOMPC), uses unlabeled and labeled data which are tagged through active learning. Data items that fall into ambiguous regions are selected for labeling by the user.

• Daniela Pohl and Hermann Hellwagner are with the Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Universitätsstr. 65-67, 9020 Klagenfurt, Austria (phone: +43 463 2700 3688; fax: +43 463 2700 993688; e-mail: {daniela,hellwagn}@itec.aau.at).
• Abdelhamid Bouchachia is with the Smart Technology Research Centre, Bournemouth University, Poole, BH12 5BB, UK (phone: +44 (0) 1202 962401; fax: +44 (0) 1202 965314; e-mail: [email protected]).
1041-4347 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

The number of queries is controlled by a budget. The requested items help to direct the AOMPC classifier towards a better discriminatory capability. While AOMPC can be applied to any streaming data, here we consider in particular SM data.

The contributions of this paper are as follows:
• An original online learning algorithm, AOMPC, is proposed to handle data streams in an efficient way. It is a multi-prototype LVQ-like algorithm inspired by our previous work [9], [8].
• As part of AOMPC, an active learning strategy is introduced to guide AOMPC towards accurate classification, and in this paper towards sub-event detection. Such a strategy makes use of budget and uncertainty notions to decide when and what to label.
• AOMPC is evaluated on different data: synthetic datasets (synthetic numerical data and generated geo-tagged microblogs) and real-world datasets collected from Twitter related to two crises, the Colorado Floods in 2013 and the Australia Bushfires in 2013. The choice of these datasets was motivated by their diversity; their different characteristics allow a thorough evaluation of AOMPC.
• A sensitivity analysis based on the different AOMPC parameters and datasets is carried out.
• A comparison of AOMPC against well-known online algorithms is conducted and discussed.

The paper has the following structure. Section 2 presents the related work covering streaming and SM analysis. Section 3 introduces the classification algorithm and describes the processing steps, including the active learning facets. Section 4 discusses the empirical evaluation of AOMPC after describing the datasets used. Section 5 concludes the paper.

2 RELATED WORK

The problem addressed in this paper is related to several topics: multiple prototype and Learning Vector Quantization (LVQ) classification, online learning for classification, active learning with budget planning, and social media analysis (i.e., natural language processing). A short overview of these topics is presented in the following.

2.1 Multiple Prototype Classification and LVQ Classification

A prototype-based classification approach operates on data items mapped to a vector representation (e.g., the vector space model for text data). Data points are classified via prototypes using similarity measures. Prototypes are adapted based on the items related/similar to them.

A Rocchio classifier [37] is an example of a single-prototype classifier. It distinguishes between two classes, e.g., "relevant" and "irrelevant". In real-world scenarios, due to the nature of the data, it is often not possible to describe the data with a single prototype per class; multiple prototype classifiers (i.e., several prototypes) are needed.

Self-organizing maps (SOM), introduced by Kohonen [32], are an unsupervised counterpart of prototype-based classification related to LVQ. In this case, prototypes are initialized (e.g., randomly) and then adapted. SOM was also used for SM analysis in the context of crisis management to identify important hotspots [50].

LVQ has been applied to several areas, e.g., robotics, pattern recognition, image processing, text classification, etc. [20], [32], [62]. LVQ in the context of similarity representation, rather than vector-based representation, is analyzed by Hammer et al. [25]. Mokbel et al. [40] describe an approach to learn metrics for different LVQ classification tasks. They suggest a metric adaptation strategy to automatically adapt metric parameters.

Bezdek et al. [6] review several offline multiple prototype classifiers, e.g., LVQ, fuzzy LVQ, and the deterministic Dog-Rabbit (DR) model. The latter limits the movement of prototypes and is similar to our approach. However, in contrast to our approach, DR uses offline adaptation of the learning rate. The time-based learning rate of our algorithm considers concept drift (i.e., changes in the incoming data) directly during the update of the prototypes.

In contrast to the previous approaches, Bouchachia [8] proposes an incremental supervised LVQ-like competitive algorithm that operates online. It consists of two stages. In the first stage (learning stage), the notions of winner reinforcement and rival repulsion are applied to update the weights of the prototypes. In the second stage (control stage), two mechanisms, staleness and dispersion, are used to get rid of dead and redundant prototypes.

A summary of different prototype-based learning approaches can be found in Biehl et al. [7]. In this study, we deal with online real-time classification and we propose a multi-prototype quantization algorithm, where the winning prototype is adapted based on the input. In particular, the algorithm relies on online learning and active learning.

2.2 Online Learning and Active Learning (with Budget Planning)

Online learning receives data items in a continuous sequence and processes them once to classify them accordingly [66]. Bouchachia and Vanaret [10], [11] use Growing Gaussian Mixture Models for online classification. Compared to the algorithm proposed in this work, there is a difference in adapting the learning rate and in representing the prototypes. Reuter et al. [54] use multiple prototypes to represent an event.

New incoming items are assigned to the most similar events (by using an offline-trained SVM); otherwise, new events are created.

Another important topic in streaming analysis is active learning, which improves classification results with a certain amount of labeled data actively requested by the system [57]. Ienco et al. [29] use a pre-clustering step to identify relevant items to be labeled by the user. In Smailović et al. [59], active learning is used to improve the sentiment analysis of incoming tweets as an indicator for stock movements. Hao et al. [27] design two active learning algorithms (Active Exponentially Weighted Average Forecaster and Active Greedy Forecaster) which include the feedback of experts for labeling. The approach considers the confidence of labels from the classifier compared to a set of experts. Hao et al. [26] also introduce online active learning considering second-order information, e.g., based on the covariance matrix. Ma et al. [36] combine decision trees with active learning; this approach improves the learning step for decision trees. Bouguelia et al. [12] use instance weighting for active online learning. They consider the weight that must be changed to make the classifier change its prediction: if only a small change in weight alters the original classification, then the classifier is highly uncertain about the item. Mohamad et al. [39] introduce an active learning algorithm for data streams with concept evolution. In addition, they suggest a bi-criteria active learning algorithm that includes both label uncertainty and the density of the underlying distribution [38].

Monzafari et al. [41] study different batch-based active learning approaches and define two uncertainty strategies to query labels from crowdsourcing platforms. In addition, the authors define a budget or goal constraint to limit labeling. Žliobaitė et al. [65] use active learning combined with streaming data. They suggest several processing mechanisms to identify uncertainty regions, especially for handling data drifts. It is also important to minimize the number of queries asking an expert for labels. Žliobaitė et al. [65] include a moving average over the incoming items and the amount of already labeled items to estimate the budget. We adopted this mechanism together with the uncertainty strategies.

Based on the categorization of active learning approaches by Settles [57], our implementation is classified as a stream-based selective sampling approach, considering different strategies to request instances for labeling. In addition, we use an online feature selection approach described later.

2.3 Social Media Analysis for Crisis Management

Recent research studies SM from several technical perspectives. Due to space limitations, we describe existing SM analysis frameworks mostly in the context of crisis management, although there are several frameworks in other contexts, e.g., Twitterbeat [58] and HarVis [2]. Backfried et al. [3] describe an analysis approach based on visual analytics for combining information from different sources, with a specific focus on multilingual issues. Vieweg and Hodges [30], [63] describe the Artificial Intelligence for Disaster Response (AIDR) platform, where persons annotate incoming tweets (similar to Amazon Mechanical Turk). The tweets are then used to train classifiers to identify more relevant tweets. AIDR allows classifying incoming tweets based on different information categories, e.g., damage reports, casualties, advice, etc. Chen et al. [15] analyse flu-related tweets to identify topics for predicting the flu peak. Neppalli et al. [42] perform sentiment analysis on social media related to Hurricane Sandy; the work shows that the sentiment of users is related to their distance from the hurricane. Twitcident, described by Abel et al. [1], is a framework to search and filter Twitter messages through specific profiles (e.g., keywords). Terpstra et al. [61] show the usage of Twitcident in crisis management. Tweak-the-Tweet, introduced by Starbird et al. [60], defines a grammar which can be easily integrated in tweets and therefore automatically parsed. Also, TEDAS, described by Li et al. [34], is a system to detect high-level events (e.g., all car accidents in a certain time period) using spatial and temporal information. Yin et al. [68], [67] design a situational awareness platform for SM; tweets are analyzed based on bursty keywords to identify emergent incidents. Ragini et al. [52] combine several techniques to identify people in danger. They examined rule-based classification and several machine learning approaches, like SVM, for hybrid classification.

Additional information on social media analysis in different crises can be found in Reuter and Kaufhold [53]. Due to the importance of SM, it is our aim to support emergency management using the content of SM platforms. Currently, there are systems with crowd-sourcing platform characteristics, but no procedure (like active learning) is available to directly involve emergency management personnel in filtering relevant information.

3 ACTIVE ONLINE MULTIPLE PROTOTYPE CLASSIFIER (AOMPC)

Since SM data is noisy, it is important to identify the SM items relevant to the crisis situation at hand. The idea is to find an algorithm that performs this classification and also handles ambiguous items in a reasonable way. Ambiguous denotes items for which a clear classification is not possible based on the current knowledge of the classifier. This knowledge should be gained by asking an expert for feedback. The algorithm should be highly self-dependent, asking the expert for labels for only a limited number of items. Therefore, we propose an original approach that combines different aspects, such as online learning and active learning, to build a hybrid classifier, AOMPC.

TABLE 1
List of symbols used

x: Input (one item) received from the data stream X with batches bt_CT
V: Set of currently known prototypes
α: Parameter used in Alg. 1 to compute the staleness of a prototype, given as α = e^(−log 2 / β), where β is the half-life span, denoted hereafter as (1/2)-life-span, described in [31]; it refers to the amount of time required for a quantity to fall to half its value as measured at the beginning of the time period
I: Set of indices i indicating the prototypes v_i
dist: Appropriate distance measure; see Algorithm 2
UT: Threshold used to identify uncertainty
CT: Current time
LTU: Last time the prototype was updated (i.e., was the winner)
S: List of the nearest prototypes, in ascending order of distance to the current input x
label: Labels are: relevant, irrelevant, and unknown

Algorithm 1: Steps of AOMPC
Input: Data stream X
Output: List of prototypes V
1: CT = 1; LTU = CT
2: Let CT and LTU indicate the current time and the last time a prototype was updated, respectively
3: for each batch bt_CT of X do
4:   for each incoming input x of bt_CT do
5:     Compute the distance ϕ_i between x and all prototypes v_i, i = 1 … |V| = I, as follows:
         if inaction(v_i) > 0 then ϕ_i = inaction(v_i) · dist(v_i, x) else ϕ_i = dist(v_i, x) end if   (1)
         where inaction(v_i) = 1 − α^(CT − v_i.LTU)
6:     Compute the list of nearest prototypes S based on the sorted index I such that S = createSortedList(I, (x, y)) : (ϕ_x ≤ ϕ_y)
7:     check = uncertainty(x) and within_budget()
8:     if check = true then
9:       Query the label of x
10:    else
11:      x.label = unknown
12:    end if
13:    if S ≠ {} then
14:      Let j be the index of the closest prototype: j = S(1)
15:      if x.label = unknown then
16:        Assign the data item to v_j
17:      else
18:        if x.label = v_j.label then
19:          Reinforce v_j with x using only the common features: v_j = v_j + α^(CT−LTU) (x − v_j)
20:          Add the non-common features of x to v_j: v_j.feature = α^(CT−LTU) (x.feature)
21:        else
22:          Go to line 26
23:        end if
24:      end if
25:    else
26:      Initialize a new prototype: v_new = x
27:      v_new.label = x.label; v_new.LTU = CT
28:      V = V ∪ {v_new}
29:    end if
30:  end for
31:  Update the winning clusters in bt_CT with LTU = CT
32:  CT = CT + 1
33: end for

AOMPC learns from both labeled and unlabeled data, in a continuous and evolving way. In this context, AOMPC is designed to distinguish between relevant and irrelevant SM data related to a crisis situation in order to identify the needs of the individuals affected by the crisis. AOMPC relies on active learning. It implies the intervention of a user in some situations to enhance its effectiveness in identifying the relevant data and the related event in the SM data stream (see Fig. 1). The user is asked to label an item if there is high uncertainty about whether it is relevant or irrelevant. The classifier then assigns the item (be it actively labeled or unlabeled) to the closest cluster or uses it to create a new cluster. A cluster, in this case, represents either relevant information (i.e., specific information about the crisis of interest) or irrelevant information (i.e., not related to the crisis). The process flow and the steps of AOMPC are shown in Fig. 1. AOMPC is described in Algorithm 1; the symbols used are defined in Tab. 1.

CT and LTU are updated in batch mode due to the feature selection method used (see Section 3.3 for details); the algorithm could also be used in item-wise mode. The general idea of this algorithm is that the longer a prototype is stale (not updated), the slower it should move to a new position. The learning rate α is a function of the last time the prototype was a winner (i.e., α can be seen as a forgetting factor). The winning prototype is computed based on the learning rate (steps 5-6). If an uncertainty is detected (see Section 3.2) and enough budget is available (see Section 3.1), the label is queried (steps 7-11). Otherwise (e.g., not enough budget), the winning prototype defines the label (step 16). When a prototype wins the competition among all other neighboring prototypes based on the queried label, it is updated to move in the direction of the new incoming item (steps 17-20). In case the new input comes with new features, the prototype's feature vector is extended to cover those new textual features (see step 20). In general, AOMPC is capable of accommodating new features; in the case of textual input, as in this study, the evolution of the vocabulary over time is captured. When no prototype is sufficiently close to the new item (step 22), a new prototype is created to accommodate that item (steps 26-28).

Algorithm 1 relies on the computation of the distance between the input and the existing prototypes (e.g., the Euclidean distance in Algorithm 2). Because SM items usually consist of a textual description (cf. tweets), we apply the Jaccard coefficient [37] as a text-based distance.
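To make the staleness mechanism concrete, the inaction-weighted distance of Eq. 1 and the reinforcement of step 19 can be sketched in Python. This is a minimal sketch, not the authors' implementation: the class and function names are ours, and LTU is updated immediately here, whereas Algorithm 1 updates it batch-wise in step 31.

```python
import math

class Prototype:
    def __init__(self, vec, label, ct):
        self.vec = list(vec)
        self.label = label
        self.ltu = ct            # LTU: last time this prototype won/was updated

def alpha(half_life):
    # alpha = e^(-log 2 / beta), where beta is the (1/2)-life-span
    return math.exp(-math.log(2) / half_life)

def euclidean(v, x):
    return math.sqrt(sum((vi - xi) ** 2 for vi, xi in zip(v, x)))

def staleness_weighted_distance(proto, x, ct, half_life):
    # Eq. 1: inaction(v) = 1 - alpha^(CT - v.LTU); a prototype that has not
    # won for a while gets a nonzero inaction weight on its distance
    inaction = 1 - alpha(half_life) ** (ct - proto.ltu)
    d = euclidean(proto.vec, x)
    return inaction * d if inaction > 0 else d

def reinforce(proto, x, ct, half_life):
    # Step 19: v_j = v_j + alpha^(CT - LTU) * (x - v_j); stale prototypes
    # (large CT - LTU) move more slowly because the rate decays
    rate = alpha(half_life) ** (ct - proto.ltu)
    proto.vec = [v + rate * (xi - v) for v, xi in zip(proto.vec, x)]
    proto.ltu = ct  # simplification: the paper defers this to step 31
```

A freshly updated prototype (CT equal to LTU) has inaction 0 and competes with its raw distance, matching the `if` branch of Eq. 1.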

[Fig. 1. Processing steps: social media items pass through text pre-processing and dynamic feature selection to form a dynamic representation ⟨geo-locations, keyword-features⟩; AOMPC classifies each item as relevant or irrelevant, querying the user via active learning (uncertainty region, budget strategies); relevant information feeds dynamic sub-event detection, summarization and labeling, further processing, and visualization.]

Algorithm 2: dist(v, x)
Input: Prototype v, input x
Output: Distance between v and x
1: if the input is a social media item then
2:   Compute the textual distance (Jaccard) as follows: dist_text = 1 − jaccard, where jaccard = |A ∩ B| / |A ∪ B|
3:   distance = dist_text
4:   if the input is a composed social media item then
5:     Compute the geo-location distance as follows:
        dist_geo = 1 − H(v.geo_co, x.geo_co)/π
        where H(x1, x2) = 2 · atan2(√φ, √(1 − φ))
        φ = sin²(Δlat/2) + cos(x1.lat) · cos(x2.lat) · sin²(Δlon/2)
        Δlat = x2.lat − x1.lat, Δlon = x2.lon − x1.lon
6:     distance = (dist_geo + dist_text)/2
7:   end if
8: else
9:   (the input is not a social media item)
10:  Compute the Euclidean distance as follows: dist_Euclidean(v, x) = √(Σ_{i=1}^{M} (v_i − x_i)²)   (2)
11: end if

The text-based distance (dist_text) is computed in Algorithm 2, steps 2-3. If the social media items consist of two parts, the body of the message and the geo-location that indicates, in terms of coordinates, where the message was issued, then we apply a combined distance measure, (dist_text + dist_geo)/2. Specifically, dist_text refers to the Jaccard coefficient, while dist_geo is based on the Haversine distance [55], [5] described in Algorithm 2, steps 4-7. The coordinates are expressed in terms of latitude and longitude.

Moreover, steps 4-12 of Algorithm 1 are related to the active learning part. The algorithm starts by checking whether the new input item lies in the uncertainty region between the relevant and irrelevant prototypes and whether there is enough budget for labeling this item. More details follow in the next sections.

3.1 Definition of Budget

The idea of active learning is to ask for user feedback instead of labeling the incoming data item automatically. To limit the number of interventions of the user, a so-called budget is defined. The budget can be understood as the maximum number of queries to the user. We adapt the method presented in [65] to implement active learning in the context of online multiple prototype classification. In step 7 of Algorithm 1, the method within_budget() checks whether enough budget is available for querying the user. The budget consumed after k items, b_k, is defined in [65] as follows:

u_k = u_{k−1} · λ + labeling_k;   λ = (w − 1)/w;   b_k = u_k / w   (3)

where u_k estimates the number of labels queried by the system in the last w steps. The window w acts as memory [65] (e.g., the last 100 item steps).
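The combined text/geo distance of Algorithm 2 can be sketched as follows. The function names are ours, coordinates are assumed to be in radians, and dist_geo reproduces the formula exactly as printed in Algorithm 2 (note that, as printed, 1 − H/π yields 1 for coincident points).

```python
import math

def dist_text(tokens_a, tokens_b):
    # Steps 2-3: dist_text = 1 - |A ∩ B| / |A ∪ B| over the two token sets
    a, b = set(tokens_a), set(tokens_b)
    return 1.0 - len(a & b) / len(a | b) if (a | b) else 0.0

def dist_geo(lat1, lon1, lat2, lon2):
    # Step 5: Haversine central angle H, then 1 - H/pi as printed;
    # latitudes/longitudes are assumed to be in radians
    dlat, dlon = lat2 - lat1, lon2 - lon1
    phi = (math.sin(dlat / 2) ** 2
           + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    h = 2 * math.atan2(math.sqrt(phi), math.sqrt(1 - phi))
    return 1.0 - h / math.pi

def dist_combined(tokens_a, geo_a, tokens_b, geo_b):
    # Step 6: average of textual and geo distance for "composed" items
    return (dist_text(tokens_a, tokens_b) + dist_geo(*geo_a, *geo_b)) / 2.0
```

For purely numerical inputs, Algorithm 2 falls back to the Euclidean distance of Eq. 2 instead.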
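The bookkeeping of Eq. 3 amounts to an exponentially fading counter of recent queries. A minimal sketch follows; the class and method names are ours, chosen to mirror within_budget() in Algorithm 1.

```python
# Sketch of the budget mechanism of Eq. 3, adapted from Zliobaite et al. [65].

class Budget:
    def __init__(self, window=100, bound=0.2):
        self.w = window    # memory window w
        self.B = bound     # upper bound B: fraction of w that may be labeled
        self.u = 0.0       # u_k: estimated labels queried in the last w steps

    def within_budget(self):
        # b_k = u_k / w must stay below B for a new query to be allowed
        return self.u / self.w < self.B

    def record(self, queried):
        # u_k = u_{k-1} * lambda + labeling_k, with lambda = (w - 1) / w;
        # labeling_k is 1 if a label was requested for item k, else 0
        lam = (self.w - 1) / self.w
        self.u = self.u * lam + (1.0 if queried else 0.0)
```

With w = 100 and B = 0.2, querying every item quickly drives b_k above the bound, after which within_budget() suppresses further queries until the fading counter decays.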

described by λ. Hence, λ describes the fraction of where step is set to 0.01 as suggested in [65]. We name
including value uk−1 . labelingk updates uk based on this variant dynamic conflicting neighborhood (DCN).
the requested label (i.e., labelingk = 0 if no label In the given equation it is combined with the SCN
was queried and labelingk = 1 if there was a label strategy. Additionally, we combined it with the CVCN
requested) for the current item k. strategy given above.
An upper bound B is defined describing the maxi- As a baseline for comparison, we implement a
mum number of requested labels. B is the fraction of random version (see Eq. 7). We name this variant
data from window w that can be labeled (i.e., B = 0.2 random conflicting neighborhood (RCN).
are 20%). At each step, one input is processed. The 
within budget() procedure in Algorithm 1 checks if 
 1 if (|S| < 2) or

enough budget is available (i.e., bk < B). If so, the  (|ϕi − ϕj | < r




algorithm queries the label of the ambiguous input.  and vi .label 6= vj .label
3.2 Which Data Items to Query?

In active learning, one has to decide which data points to query before asking for their labels. Obviously, one has to find those points for which the classifier is not confident about the assignment decision (see Algorithm 1, step 7). In this paper, we use a simple mechanism based on the proximity and the labels of the neighboring prototypes. An input x is queried if its two closest prototypes, vi and vj with distances ϕi and ϕj respectively, where i = S(1) and j = S(2), have different labels. Eq. 4 formalizes this test, which is called simple conflicting neighborhood (SCN) hereafter.

uncertainty(x) = 1 if (|S| < 2) or (|ϕi − ϕj| < UT and vi.label ≠ vj.label); 0 otherwise    (4)

However, to make the selection more constrained, a second variant is introduced. In fact, it is worthwhile to look at the border area of the inter-class uncertainty regions, where labels are particularly useful. This border area could be used to track concept drift. Eq. 5 expresses the constraint by multiplying the threshold UT by a random number m that is uniformly distributed in the unit interval [0,1] (m ∼ U(0,1)) [65]. This variant is called controlled variable conflicting neighborhood (CVCN).

uncertainty(x) = 1 if (|S| < 2) or (|ϕi − ϕj| < (UT · m) and vi.label ≠ vj.label, where m ∼ U(0,1)); 0 otherwise    (5)

Moreover, the threshold UT can be continuously updated, as proposed in [65], according to the following rule, which is used by the "with DCN" variants of SCN and CVCN:

uncertainty(x) = 1 if (|S| < 2) or (|ϕi − ϕj| < UT and vi.label ≠ vj.label); 0 otherwise    (6)

UT = UT + (−1)^uncertainty · step

The variant RCN replaces the fixed threshold UT by a random threshold r:

uncertainty(x) = 1 if (|S| < 2) or (|ϕi − ϕj| < r and vi.label ≠ vj.label), where r ∼ U(0,1) is a random variable; 0 otherwise    (7)

We also implemented another version, called Random (R), that assumes a fixed uncertainty given by UT, as shown in Eq. 8.

uncertainty(x) = 1 if (|S| < 2) or (r < UT), where r ∼ U(0,1) is a random variable; 0 otherwise    (8)

We ignore an absolutely pure random version, r < B, because it would increase the number of queries drastically compared to the other uncertainty variants.

3.3 Dynamic Representation of Social Media Stream

The SM items considered in our work are textual documents; their representation therefore relies on the standard tf-idf [48], [37]. The pre-processing step pointed out in Fig. 1, as part of the workflow, makes use of feature extraction, which is discussed in detail in our previous work [48]. This step also includes the identification of word synonymy using WordNet [48]: similar words (e.g., "car" and "automobile") are reduced to one root word. A document is then represented as a bag-of-words. However, because social media documents arrive online and are processed in batches, tf-idf has to be adapted to meet the streaming requirement [48]. Basically, the importance of a word is measured based on the number of incoming documents containing that word. Thus, the evolution of a term's importance should be reflected in the formulation of tf-idf. Here, we use a factor that scales tf-idf so that the importance increases and decreases according to the term's presence in the incoming batches (see Eq. 9).

scaled_tf_idf(t,d) = importance(t,τ) · tf(t,d) · idf(t)    (9)

The importance factor importance(t,τ) of term t is calculated over batches (windows) marked by time τ.
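Read together with Eqs. 10 and 11 below, this batch-wise scaling can be sketched as follows. This is an illustrative reading, not the authors' implementation; the class name, the parameter values, and the per-interval counting of u are our own assumptions.

```python
class TermImportance:
    """Sketch of the streaming term-importance scaling of Eqs. 9-11."""

    def __init__(self, gamma=0.2, delta=0.8):
        # gamma < delta, as the paper suggests: terms gain importance
        # faster than they lose it.
        self.gamma, self.delta = gamma, delta
        self.g = {}      # g_{t,tau}: current weight of each term
        self.g_max = {}  # g_max_t: maximum weight a term has reached

    def update(self, term, u):
        """Eq. 11; u = number of incoming items containing `term`
        in the current sampling interval (our reading of u_{t,tau})."""
        g_prev = self.g.get(term, 0.0)
        if u > g_prev:   # term is trending: learn quickly
            g = (1 - self.gamma) * u + self.gamma * g_prev
        else:            # term is fading: forget slowly
            g = (1 - self.delta) * u + self.delta * g_prev
        self.g[term] = g
        self.g_max[term] = max(self.g_max.get(term, 0.0), g)

    def importance(self, term):
        """Eq. 10: importance_{t,tau} = g_{t,tau} / g_max_t."""
        gm = self.g_max.get(term, 0.0)
        return self.g[term] / gm if gm > 0 else 0.0

    def scaled_tf_idf(self, term, tf, idf):
        """Eq. 9: importance-scaled tf-idf."""
        return self.importance(term) * tf * idf
```

With γ < δ, a term's weight rises quickly when it bursts in a sub-batch and decays slowly once it disappears, so the scaled tf-idf tracks the term's presence in the stream.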
1041-4347 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more
information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TKDE.2019.2906173, IEEE Transactions on Knowledge and Data Engineering

IEEE TRANS. KNOWLEDGE AND DATA ENGINEERING, VOL. , NO. , —- 2015 7

The length of a batch is defined by the user (e.g., 30 minutes) and depends on the nature of the crisis: slow evolution of the crisis may require longer windows, while fast evolution requires short windows. Terms with a low importance value are removed from the index. For instance, if importance < 0.2, then 80% of the term's importance is lost. The importance of a term is computed as follows:

importance(t,τ) = g(t,τ) / g_max(t)    (10)

where g(t,τ) is the weight of term t obtained at time τ. The weight g(t,τ) is refreshed at intermediate sampling intervals (i.e., sub-batches, e.g., every 10 minutes), and g_max(t) is the maximum weight that term t has reached. g(t,τ) is expressed as follows:

g(t,τ) = (1 − γ) · u(t,τ) + γ · g(t,τ−1)   if u(t,τ) > g(t,τ−1)
g(t,τ) = (1 − δ) · u(t,τ) + δ · g(t,τ−1)   otherwise    (11)

where u(t,τ) describes the incoming SM items containing t until time τ, and g(t,τ−1) is the weight of term t in the previous sampling interval τ−1. Case 1 of Eq. 11 determines how fast terms are learned (a smaller γ corresponds to a faster increase of importance). Case 2 of Eq. 11 determines how fast terms are forgotten (a higher δ corresponds to slower forgetting, i.e., a slower decrease of importance). The values γ and δ are set empirically by the user. We suggest γ < δ, so that terms are learned faster than they are forgotten.

4 EVALUATION

In the following, we present the experimental setting, including the datasets and the metrics we used. We then describe the experiments and their outcomes.

4.1 Synthetic Datasets

To evaluate AOMPC, we use two synthetic datasets. The first is a 2-dimensional numerical dataset and the second is a collection of SM messages generated artificially by a tool. These datasets allow us to observe the behavior of the algorithm, especially because they simulate data drift. The artificial SM data is used to evaluate the online classifier on geo-tagged textual data that is close to real-world data.

The simple 2-dimensional synthetic dataset is based on Gaussian data (GD). GD consists of 4 batches (see Fig. 2), which are presented sequentially to AOMPC. Each batch consists of 200 points generated by two Gaussians, which represent two clusters. The upper clusters (100 points each), denoted as 'x', are assumed "irrelevant", while the lower clusters, denoted as 'o', are assumed "relevant". Batch-4 in Fig. 2 contains a virtual or temporary drift caused by abrupt changes of the feature values [24].

Fig. 2. GD dataset to simulate the stream appearing in the order batch-1, batch-2, batch-3, batch-4

The geo-tagged text collection, the synthetic social media dataset (SSMD), was generated using a tool(1) we originally developed for integrating SM into emergency exercises (i.e., the training of first responders). We generated microblogs using this data generation tool, which is based on a set of predefined text snippets that describe sub-events such as "vehicles and garbage dumps on fire", "police attacked by rioters", and "shop on fire nearby" (see Fig. 3(a)). The randomly generated data follows the timeline of the UK riots (see [4]), described as an XML file (see Fig. 3(b)). This way, we generate data which describes incidents close to what happened in reality. The XML file covers the different phases and particularly the sub-events of the UK riots, which are marked as relevant or irrelevant using a tag (relevant) to provide the ground truth for the experiments. Irrelevant sub-events in the data are represented by real-world tweets collected from Twitter in relation to a given location (e.g., London), while relevant sub-events are based on the text snippets. In addition, data in the form of textual annotations was collected from Flickr and YouTube and labeled based on the real-world sub-events of the riots (see [50]). In total, we used a collection of 1227 messages, mostly covering London districts. The data, collected over 28 hours ('2011-08-06 19:44:00' to '2011-08-07 23:44:00'), covers several calm periods during the riots. The data is split into 30-minute batches to observe the behavior of AOMPC. The number of messages relevant to the riots is 312, with 116 distinct text messages. Furthermore, there are 915 irrelevant messages, of which 789 are distinct. In all, the dataset contains approximately 322 repetitions of text messages; repetition refers to messages that are very similar and correspond to retweets.

(1) http://www.bridgeproject.eu/content/bridge information intelligence flyer.pdf [Accessed: August 2014]

4.2 Real-World Datasets

The CrisisLexT26 collection [43] was recently made available to the community. It consists of Twitter data

Fig. 3. Data Generation Tool: (a) Data Generation Tool GUI; (b) UK riots stream in XML format

related to 26 crises around the world. Each crisis is described by 1,000 items which were randomly selected and labeled through a crowdsourcing platform. The class label of each item was assigned by the majority of three crowdsourcing workers. Four categories are available: related to the crisis and informative, related to the crisis - but not informative, not related, and not applicable. In our case, we consider items relevant only when they are labeled as related to the crisis and informative; otherwise, they are considered irrelevant.

We selected two datasets from the CrisisLexT26 collection: Colorado Floods (CF) and Australia Bushfires (AB), which are dated but not geo-tagged. CF data is from the period '2013-09-12 07:00:00' - '2013-09-29 10:00:00'. The data is somewhat imbalanced: the number of relevant items is larger than that of the irrelevant ones. CF consists of 751 relevant items, 224 irrelevant items, and approximately 189 repetitions. Considering the number of relevant and irrelevant items of SSMD, CF has an opposite, but very similar, distribution. AB data is from the period '2013-10-17 05:00:00' - '2013-10-29 12:30:00'. It consists of 645 relevant items, 408 irrelevant items, and approximately 385 retweets.

4.3 Evaluation Measures

Because AOMPC combines clustering and classification, we developed a combined performance measure, called the combined quality measure (CQM), to evaluate the algorithms. It is defined as follows:

CQM = 0.3 · (Σ_{i=1}^{|Bt|} vm_i) / |Bt| + 0.5 · (Σ_{i=1}^{|Bt|} (1 − er_i/100)) / |Bt| + 0.2 · (1 − Q/#items)    (12)

It refers to two other known measures, namely the validity measure (VM) and the error rate (ER) (see Appendix A for details). CQM contains VM as a cluster evaluation measure and ER as a classification-specific measure. A high VM value indicates a good clustering, whereas a high value of (1−ER) indicates satisfactory labeling. The technical details of VM and ER are given in Appendix A. In terms of the active learning budget B, the number of queries (Q) is taken into account. In Eq. 12, Bt is the set of batches (Bt = {bt_1, ..., bt_|Bt|}), and vm_i and er_i are the values of VM and ER for batch bt_i, respectively; #items is the number of items. As shown in Eq. 12, the measures are weighted based on their importance: ER is weighted with a factor of 0.5 due to its high importance, followed by VM with weight 0.3; finally, the number of queries is weighted with 0.2. In conclusion, high values of CQM indicate a high quality of clustering and classification.

4.4 Experiments and Results

We conducted an extensive analysis. In particular, we performed a sensitivity analysis to observe the effect of the algorithm's parameters: α, β, the threshold UT (see Alg. 1 and Tab. 1), and the budget B (see Sec. 3.1). In this section, we describe the outcome of the experiments on the datasets using the different settings shown in Tab. 2. We focus on the performance of the different uncertainty strategies using CQM. The α-setting represents the fixed and variable α settings.

Gaussian Dataset (GD). Considering the most sensitive parameters, namely B and α (see Appendix B), the effect of the active learning methods is illustrated in Fig. 4; the other parameters, β and UT, are discussed in Appendix B. In general, it can be seen that the uncertainty strategy R yields the lowest CQM value and that RCN tends to query more often, since the pure random threshold r varies between 0 and 1 (see Sec. 3.2). For example, SCN has a query ratio of 0.14 and RCN a ratio of 0.2 to achieve a similar ER value (SCN with ER=1.250 and RCN with ER=1.370). On average, the SCN variants show the most stable results,

(a) SCN (b) SCN with DCN (c) CVCN (d) CVCN with DCN (e) R (f) RCN
Fig. 4. Results of the different active learning methods using the Gaussian data (GD) and the CQM measure.

TABLE 2
Evaluation Parameters

Parameter                Values/Instances
B                        0.1, 0.2, ..., 0.5 with w = 100
UT                       0.1, 0.2, 0.3
β                        1, 2, 3, 4
fixed α                  0.01 and 0.03
variable α               α = e^(−log(3)/β) as (1/3)-life-span
                         α = e^(−log(2)/β) as (1/2)-life-span
                         α = e^(log(2/3)/β) as (2/3)-life-span
                         α = e^(log(7/8)/β) as (7/8)-life-span
Active learning method   SCN, CVCN, SCN with DCN, CVCN with DCN, R, RCN
α-setting #1             0.01 (fixed α)
α-setting #2             0.03 (fixed α)
α-setting #3             (1/3)-life-span (variable α)
α-setting #4             (1/2)-life-span (variable α)
α-setting #5             (2/3)-life-span (variable α)
α-setting #6             (7/8)-life-span (variable α)

while the CVCN variants slightly increase CQM for small values of B (i.e., B ≤ 0.2), because they focus on concept drift near the uncertainty boundary.

Synthetic Social Media Dataset (SSMD). The active learning strategies (SCN, CVCN, SCN with DCN, and CVCN with DCN) shown in Fig. 5 outperform the random method R. Again, RCN shows good performance due to the higher variability of its threshold. For CVCN with DCN, 0.22 queries, and for RCN, 0.24 queries out of B = 0.3 are requested, reaching an ER of 7.3225 and 7.4984, respectively. A high value of B increases the overall quality of the results independently of the method (i.e., more labeled data is available to build the classification model). The CVCN options perform best for high values of B across the different α-settings. In general, the active learning options SCN with DCN and CVCN with DCN perform best. This might indicate that concept drift appears along the uncertainty region border, as those "with DCN" methods vary the border by changing UT. This behavior is expected, since the data varies in a small range, i.e., geo-data within the London area with similar incidents (damage caused by riots).

Colorado Floods (CF). Fig. 6 illustrates the outcome of AOMPC on the CF data for the different active learning strategies. The results on CF indicate good performance for the fixed α values, especially for a low budget B. The results corresponding to variable α are better than those obtained with fixed α. Note that a higher α leads to a faster update of the AOMPC prototypes and that variable α requires fewer queries (see Tab. 5). Based on the Levenshtein distance (ldis) [33], used for calculating the similarity between character strings, there exist 105 items with similar text (i.e., ldis ≤ 0.2) in CF, which is quite a small number. This also indicates that the repeating text fragments are very short (105 vs. 189 repetitions of text). Therefore, the small number of similar items over this long period of the crisis, and the good performance of the variable α with its fast adaptation, are an indication that there are drifts in CF that are not near the inter-class border as defined by UT.

Australian Bushfires (AB). AOMPC's results on AB are illustrated in Fig. 7. The variable α shows nearly the same performance, but this time it is worse compared to the values obtained on CF. The AB dataset has a high number of similar items, namely 582 (items with ldis ≤ 0.2). This high number of similar items is an indicator that changes in the data are more common around the boundary, because similar vocabulary is used within the items. AOMPC shows the best performance with a fixed α value for all budget settings. Due to the high similarity between items combined with conflicting labels, it is more difficult to distinguish between relevant and irrelevant

(a) SCN (b) SCN with DCN (c) CVCN (d) CVCN with DCN (e) R (f) RCN
Fig. 5. Results of the different active learning methods using the synthetic social media dataset (SSMD) and the CQM measure

(a) SCN (b) SCN with DCN (c) CVCN (d) CVCN with DCN (e) R (f) RCN
Fig. 6. Results of the different active learning methods using the Colorado Floods dataset (CF) and the CQM measure

items. Consider the following example, which shows the same tweet labeled differently [43] (Related and informative vs. Not related):

• Wed Oct 16 17:12:46 +0000 2013: "RT @Xxxxx: A dog has risked its life to save a litter of newborn kittens from a house fire in Melbourne, Australia http://t.co/Gz..", Eyewitness, Affected individuals, Related and informative
• Wed Oct 16 17:13:57 +0000 2013: "RT @Xxxxx: A dog has risked its life to save a litter of newborn kittens from a house fire in Melbourne, Australia http://t.co/Gz...", Not labeled, Not labeled, Not related

AB is an interesting dataset for testing the algorithms under various conditions. Fixed α provides much better quality on AB compared to the other α-settings, as shown in Fig. 7. Considering Figs. 7 and 6, we can conclude that a fixed learning rate α and the "with DCN" active learning strategies produce good performance for both CF and AB, especially for low values of B.

4.5 Comparative Studies: AOMPC vs. Others

Besides the experiments with different datasets and parameters, we compare AOMPC against the unsupervised k-means algorithm, which operates without labels, and against a set of supervised online algorithms that require full labeling. This choice should help assess AOMPC against the extreme ends of the labeling spectrum:

• k-means: Given the online setting, the algorithm is run on batches of the data, setting the number

(a) SCN (b) SCN with DCN (c) CVCN (d) CVCN with DCN (e) R (f) RCN
Fig. 7. Results of the different active learning methods using the Australia Bushfires dataset (AB) and the CQM measure

of clusters to 10. For the real-world datasets (CF and AB), k-means is initialized with 5 clusters, because there are fewer items per batch compared to the other datasets. For each batch bt_i ∈ Bt of the data stream, the final centers obtained from the previous batch serve to initialize the centers of the current batch.

• Discriminative Online (Good?) Matlab Algorithms (DOGMA) [44]: The following algorithms are considered: PA-I [17], RBP and Perceptron [14], Projectron [46], Projectron++ [46], Forgetron (a kernel-based perceptron) [19], and Online Independent Support Vector Machines (OISVM) [45]. Because these algorithms are fully supervised, they are trained on all labeled data that is allowed by the budget B.

TABLE 3
K-means: Avg. results for GD, SSMD, CF, and AB

       Q   VM      ER      CQM
GD     0   0.8270  2.8750  0.9337
SSMD   0   0.8143  4.7216  0.9207
CF     0   0.9608  0.9235  0.9836
AB     0   0.9477  1.3056  0.9778

Running k-means on the different datasets produces the results shown in Tab. 3. CQM is calculated considering that k-means requires no queries (Q = 0). The items of a cluster are assigned the label of the majority. This assignment is performed after each batch and is the basis for computing the quality measures. It can be seen that for SSMD, k-means produces a lower CQM compared to that of GD; this is also true in the case of AOMPC. Considering Fig. 4 and Fig. 5, it can be seen that AOMPC performs well. Comparing the results of k-means in Tab. 3 with the results of AOMPC in Tab. 5, the AOMPC values represent good performance: AOMPC processes each data point only once and then discards it, whereas k-means uses all data points for computation. Clearly, the CQM values in Tab. 3 for CF and AB are very high, caused by low values of ER. For CF and AB, we used the same batch size (i.e., every 30 minutes) as for the generated SSMD dataset. Often, only a handful of items are contained in the individual batches. Due to the small number of items per batch, it is not possible that relevant and irrelevant items are highly mixed within the created clusters of each batch; hence, the assignments are clear/unambiguous.

The results of the DOGMA algorithms on the datasets are displayed in Tab. 4 for the best and worst cases; details on the remaining algorithms can be found in Appendix C. Note that the DOGMA algorithms operate with the maximum number of labels given by the budget. Hence, the training data is as large as the maximum number of items allowed by the budget, and the CQM value is calculated such that Q = B · #items. The evaluation measures are computed per batch for comparison. The DOGMA algorithms are trained on items randomly selected from the dataset in advance. To ensure a fair comparison of the DOGMA algorithms against AOMPC, we applied a 10-fold cross-validation strategy. The results in Tab. 4 show that, in the case of GD, most of the DOGMA algorithms produce a lower CQM compared to the AOMPC results illustrated in Fig. 4. This is an indication that the DOGMA algorithms are inefficient when dealing with changes in the data, like the one artificially introduced in batch-4 of GD (see Fig. 2 of Sec. 4.1). In the case of SSMD, the CQM values obtained by most of the DOGMA algorithms (see Tab. 4) are similar to those of the best active learning method of AOMPC (see Fig. 5, "with DCN" active learning methods). OISVM

and PA-I produce the best performance on SSMD. In all, AOMPC performs well for on-the-fly querying. The DOGMA results for CF and AB are also given in Tab. 4. Considering CQM as the representative measure, DOGMA produced results similar to those produced by AOMPC, shown in Figs. 6 and 7.

In a nutshell, AOMPC shows good performance compared to DOGMA, although the selection of items to query is performed on the fly. In addition, the DOGMA algorithms use fully labeled data, while AOMPC uses only a subset of labeled data whose size is upper bounded by the budget.

TABLE 4
Best and worst CQM of DOGMA algorithms (GD, SSMD, CF, AB)

       Algorithm     Q    B    VM      ER       CQM
GD     Forgetron     80   0.1  0.3029  32.5500  0.6081
       OISVM         80   0.1  0.8084  3.2625   0.9062
       RBP           160  0.2  0.3188  31.9500  0.5959
       OISVM         160  0.2  0.8217  2.9000   0.8920
       Forgetron     240  0.3  0.4100  25.3625  0.6362
       OISVM         240  0.3  0.8153  3.0250   0.8695
       RBP           320  0.4  0.2099  38.6750  0.4896
       OISVM         320  0.4  0.8180  2.9750   0.8505
       RBP           400  0.5  0.4811  20.9000  0.6398
       OISVM         400  0.5  0.8157  3.0250   0.8296
SSMD   PA-I          123  0.1  0.7228  5.4406   0.8696
       Projectron++  123  0.1  0.4202  11.5303  0.7484
       Projectron++  246  0.2  0.4105  10.5367  0.7305
       OISVM         246  0.2  0.8427  10.1921  0.8619
       PA-I          369  0.3  0.7636  2.2302   0.8579
       Forgetron     369  0.3  0.5593  9.7172   0.7592
       RBP           492  0.4  0.5025  9.0046   0.7257
       OISVM         492  0.4  0.8834  5.0767   0.8596
       PA-I          615  0.5  0.8647  1.2505   0.8532
       RBP           615  0.5  0.6244  5.3916   0.7604
CF     PA-I          98   0.1  0.7631  17.5100  0.8214
       Projectron++  98   0.1  0.7137  28.4213  0.7520
       PA-I          196  0.2  0.7728  15.9354  0.8122
       RBP           196  0.2  0.7141  23.7132  0.7557
       PA-I          294  0.3  0.8039  13.8672  0.8118
       Forgetron     294  0.3  0.7180  29.8722  0.7060
       PA-I          392  0.4  0.8222  12.7396  0.8030
       Forgetron     392  0.4  0.7117  28.5864  0.6906
       PA-I          490  0.5  0.8405  11.3371  0.7955
       Forgetron     490  0.5  0.7353  24.1613  0.6998
AB     PA-I          106  0.1  0.6791  22.9801  0.7688
       Projectron++  106  0.1  0.6440  32.6142  0.7101
       PA-I          212  0.2  0.7094  20.9924  0.7678
       Forgetron     212  0.2  0.6643  29.6821  0.7109
       PA-I          318  0.3  0.7428  17.6217  0.7747
       RBP           318  0.3  0.6707  27.3168  0.7046
       PA-I          424  0.4  0.7751  16.0927  0.7721
       Forgetron     424  0.4  0.6870  24.4803  0.7037
       Forgetron     530  0.5  0.7086  22.5930  0.6996
       OISVM         530  0.5  0.8087  13.6702  0.7743

4.6 Discussion and Future Work

The advantage of AOMPC compared to the other algorithms is the continuous processing of data streams and the incremental update of knowledge, where the existing prototypes act as memory for the future. Here, the forgetting of outdated knowledge is controlled by α, which also depends on the budget. Learning serves to adapt and/or create clusters in a continuous way. The algorithm queries labels on the fly to continuously update the classification model. In summary, it can be said that the budget B and the threshold UT are related to each other: increasing their values increases the quality of the algorithm. B also has an influence on the number of clusters that are created (i.e., the more often the user is asked, the more hints for new clusters are given).

The advantage of our algorithm compared to the others is the knowledge transferred from one batch to the next, creating a continuous view of the arriving data. The already known prototypes act as memory (i.e., forgetting is based on α and learning is based on the creation of new clusters; see Algorithm 1).

TABLE 5
Best results of AOMPC based on budget B

      B    Query strategy   α (β for var. α)  Q (Q/#items)  VM      ER       CQM
GD    0.1  SCN              0.03              79.0 (0.10)   0.8460  2.3750   0.9222
      0.2  SCN              1/2 (4)           113.0 (0.14)  0.9180  1.2500   0.9409
      0.3  SCN              1/2 (4)           114.0 (0.14)  0.9180  1.2500   0.9406
      0.4  SCN              1/2 (4)           114.0 (0.14)  0.9180  1.2500   0.9406
      0.5  SCN              1/2 (4)           114.0 (0.14)  0.9180  1.2500   0.9406
SSMD  0.1  CVCN with DCN    0.03              113.0 (0.09)  0.7080  12.2120  0.8329
      0.2  SCN              1/3 (1)           140.0 (0.11)  0.8440  12.2762  0.8690
      0.3  SCN              0.03              300.0 (0.24)  0.9161  8.8391   0.8817
      0.4  CVCN with DCN    0.01              256.0 (0.21)  0.8640  5.8791   0.8881
      0.5  CVCN with DCN    0.03              238.0 (0.19)  0.8876  9.4269   0.8804
CF    0.1  SCN              1/2 (2)           27.0 (0.03)   0.7451  18.0411  0.8278
      0.2  CVCN             1/2 (2)           32.0 (0.03)   0.7463  18.0141  0.8273
      0.3  RCN              2/3 (2)           223.0 (0.23)  0.8050  13.4949  0.8283
      0.4  SCN              0.03              297.0 (0.30)  0.8261  11.6488  0.8287
      0.5  SCN              0.03              297.0 (0.30)  0.8261  11.6488  0.8287
AB    0.1  CVCN with DCN    0.01              117.0 (0.11)  0.6669  31.4934  0.7204
      0.2  CVCN with DCN    0.03              215.0 (0.20)  0.7325  27.7243  0.7403
      0.3  SCN              0.01              304.0 (0.29)  0.7383  22.7398  0.7501
      0.4  CVCN with DCN    0.01              343.0 (0.33)  0.7607  18.8053  0.7690
      0.5  CVCN             0.03              380.0 (0.36)  0.7728  17.4619  0.7723

In terms of performance, Tab. 5 shows the best results of AOMPC for different budget values using the CQM measure. For GD, the variable learning rate α shows good performance, as does the fixed α rate in the case of SSMD. For CF, the variable learning rate seems more suitable considering the number of queries. AOMPC produces good results on AB using a fixed learning rate; the reason is that the data items are very similar and changes within the textual data happen slowly and near the boundary. Finally, comparing the active learning strategies (the "DCN" options), we notice that very good performance is achieved especially for SSMD and CF: the quality of the clustering increases even for low values of B.

Overall, AOMPC shows quite good performance (see Tables 3, 4 and 5), despite the fact that it operates online and handles labeling just-in-time. Moreover, AOMPC was run on batches just for the sake of feature selection (see Sec. 3.3); AOMPC can run in a purely point-based online mode (i.e., item-by-item) as well. In the future, we plan to extend this algorithm by deleting clusters when they lose their importance. This could also be done for features in order to obtain an evolving feature space. We also plan to implement a variable budget strategy so that, for instance, the number of queries (i.e., the budget) is larger for the cold start and is reduced afterward, depending on the uncertainty and the performance of the algorithm. Finally, it would be interesting to identify drift without defining a threshold, by considering the general case where classes are non-contiguous.

5 CONCLUSION

This paper presents a streaming analysis framework for distinguishing between relevant and irrelevant data items. It integrates the user into the learning process through an active learning mechanism. We evaluated the framework on different datasets, with different parameters and active learning strategies. We considered synthetic datasets to understand

the behavior of the algorithm and real-world social media
datasets related to crises. We compared the proposed
algorithm, AOMPC, against many existing algorithms to
illustrate its good performance under different parameter
settings. As explained in Sec. 4.6, the algorithm can be
extended to overcome several remaining issues, for instance
by considering a dynamic budget, dynamic deletion of stale
clusters, and a generalization to handle non-contiguous
class distributions.
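The core loop summarized in this conclusion, nearest-prototype matching on a stream, label queries spent only on ambiguous items and capped by a budget, and the proposed deletion of stale clusters, can be sketched as follows. This is a simplified illustration rather than AOMPC itself: the Euclidean distance, the vigilance and margin thresholds, the centroid update, and the forget() decay are assumptions chosen for brevity.

```python
import math

class OnlinePrototypeClassifier:
    """Budget-bounded active online learning over prototypes (sketch)."""

    def __init__(self, budget=10, lr=0.2, vigilance=1.0, margin=0.3):
        self.protos = []            # each prototype: [center, label_or_None, weight]
        self.budget = budget        # upper bound on the number of label queries
        self.queries = 0            # label queries spent so far
        self.lr = lr                # learning rate for the centroid update
        self.vigilance = vigilance  # distance beyond which a new prototype is created
        self.margin = margin        # ambiguity threshold (hypothetical criterion)

    def _nearest(self, x):
        return min(self.protos, key=lambda p: math.dist(p[0], x))

    def predict(self, x):
        """Classify with the nearest labeled prototype (None if no labels yet)."""
        labeled = [p for p in self.protos if p[1] is not None]
        if not labeled:
            return None
        return min(labeled, key=lambda p: math.dist(p[0], x))[1]

    def _ambiguous(self, x, winner):
        # Query when the winner carries no label, or when the two closest
        # labeled prototypes disagree and are almost equally far from x.
        if winner[1] is None:
            return True
        labeled = sorted((p for p in self.protos if p[1] is not None),
                         key=lambda p: math.dist(p[0], x))
        if len(labeled) >= 2 and labeled[0][1] != labeled[1][1]:
            return math.dist(labeled[1][0], x) - math.dist(labeled[0][0], x) < self.margin
        return False

    def partial_fit(self, x, oracle):
        """Process one stream item; `oracle` is consulted only within budget."""
        if not self.protos or math.dist(self._nearest(x)[0], x) > self.vigilance:
            self.protos.append([list(x), None, 0.0])
        winner = self._nearest(x)
        # Online centroid update: move the winning prototype toward the item.
        winner[0] = [c + self.lr * (xi - c) for c, xi in zip(winner[0], x)]
        winner[2] += 1.0
        if self.queries < self.budget and self._ambiguous(x, winner):
            winner[1] = oracle(x)   # the labeled subset stays bounded by the budget
            self.queries += 1

    def forget(self, decay=0.5, min_weight=0.1):
        """Future-work sketch: decay weights and drop stale prototypes."""
        for p in self.protos:
            p[2] *= decay
        self.protos = [p for p in self.protos if p[2] >= min_weight]
```

Feeding the stream alternately from two well-separated clusters, the classifier creates one prototype per cluster, spends only a few label queries on the first ambiguous items, and answers subsequent items from the prototypes alone; forget() then discards any prototype whose weight decays below the threshold.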
ACKNOWLEDGMENTS
The research leading to these results has received funding
from the European Union Seventh Framework Programme
(FP7/2007-2013) under grant agreement no. 261817 and was
partly performed in the Lakeside Labs research cluster at
Alpen-Adria-Universität Klagenfurt.
Daniela Pohl received her Dipl.-Ing. (Master's degree) in
Computer Science in 2008 at the Alpen-Adria-Universität
Klagenfurt, Austria. She worked as a research assistant in
the scope of the EU-funded FP7 project BRIDGE
(www.bridgeproject.eu) to develop technical solutions to
improve crisis management. She received her doctoral degree
in 2015 at the Alpen-Adria-Universität Klagenfurt. Her
research interests include information retrieval and
machine learning.
Emergency Management,” Neurocomputing, vol. 172, pp. 168 – Abdelhamid Bouchachia is Professor at
179, 2016. Bournemouth University, Department of
[52] J. R. Ragini, P. R. Anand, and V. Bhaskar, “Mining Crisis Infor- Computing, UK. His major research interests
mation: A Strategic Approach for Detection of People at Risk include Machine Learning and computational
through Social Media Analysis,” International Journal of Disaster intelligence with a particular focus on
Risk Reduction, vol. 27, pp. 556 – 566, 2018. scalable online/incremental learning, semi-
[53] C. Reuter and M. Kaufhold, “Fifteen Years of Social Media in
supervised and active learning, prediction
Emergencies: A Retrospective Review and Future Directions for
Crisis Informatics,” Journal of Contingencies and Crisis Manage- systems, and uncertainty modelling. He
ment, vol. 26, no. 1, pp. 41–57, 2018. published numerous papers in international
[54] T. Reuter and P. Cimiano, “Event-based Classification of Social journals and conferences and edited several
Media Streams,” in Proc. of the 2nd ACM Int’l Conf. on Multimedia special issues and volumes. He founded
Retrieval, 2012, pp. 22:1–22:8. and served as the general chair of the International Conference
[55] T. Reuter, P. Cimiano, L. Drumond, K. Buza, and L. Schmidt- on Adaptive and Intelligent Systems (ICAIS) for many years.
Thieme, “Scalable Event-Based Clustering of Social Media via He currently serves as program committee member for many
Record Linkage Techniques,” in The 5th Int’l Conf. on Weblogs and conferences and is acting as Associate Editor of Evolving Systems
Social Media, 2011, pp. 313–320. as well as member of Evolving Intelligent Systems (EIS) Technical
[56] A. Rosenberg and J. Hirschberg, “V-Measure: A Conditio- Committee (TC) of the IEEE Systems, Man and Cybernetics Society
nal Entropy-Based External Cluster Evaluation Measure,” in and member of the IEEE Task-Force for Adaptive and Evolving
EMNLP-CoNLL, vol. 7, 2007, pp. 410–420. Fuzzy Systems and the IEEE Computational Intelligence Society.
Hermann Hellwagner is a full professor of Informatics in
the Institute of Information Technology (ITEC), Klagenfurt
University, Austria, leading the Multimedia Communications
group. His current research areas are distributed
multimedia systems, multimedia communications, and quality
of service. He has received many research grants from
national (Austria, Germany) and European funding agencies
as well as from industry, is the editor of several books,
and has published more than 250 scientific papers on
parallel computer architecture, parallel programming, and
multimedia communications and adaptation. He is a senior
member of the IEEE and a member of the ACM and the OCG
(Austrian Computer Society); he was Vice President of the
Austrian Science Fund (FWF).