
2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation

Road Traffic Event Detection Using Twitter Data, Machine Learning, and Apache Spark

Ebtesam Alomari¹, Rashid Mehmood², and Iyad Katib¹
¹Faculty of Computing and Information Technology, ²High Performance Computing Center
King AbdulAziz University
Jeddah, Saudi Arabia
[email protected], {RMehmood, IAKatib}@kau.edu.sa

Abstract—Road transportation is the backbone of modern societies, yet annually it costs over a million lives and trillions of dollars to the global economy. Social media such as Twitter have increasingly become an important source of information in many dimensions of smart societies. Automatic detection of road traffic events using Twitter data mining is one such area, with a great many applications and enormous potential, albeit facing major challenges concerning the management and analysis of big data (volume, velocity, variety, and veracity). Various approaches on the subject have been proposed in recent years, but the methods and outcomes are in their infancy. This paper proposes a method for automatic detection of road traffic related events from tweets in the Saudi dialect using machine learning and big data technologies. Firstly, we build and train a classifier using three machine learning algorithms, Naïve Bayes, Support Vector Machine, and logistic regression, to filter tweets into relevant and irrelevant. Subsequently, we train other classifiers to detect multiple types of events including accident, roadwork, road closure, road damage, traffic condition, fire, weather, and social events. The results from the analysis of one million tweets show that our method is able to detect road traffic events, as well as their location and time, automatically, without any prior knowledge of the events. To the best of our knowledge, this is the first work on traffic event detection from Arabic tweets using machine learning and the Apache Spark big data platform.

Keywords—Twitter data analysis, Smart transportation, Event detection, Smart cities, Machine learning, Text mining, Big data analytics, Apache Spark, MongoDB, Naïve Bayes, Support Vector Machine (SVM), Logistic Regression

I. INTRODUCTION
Road transportation is the backbone of modern cities and societies, yet every year it causes 1.25 million deaths and 20-50 million injuries across the globe [1]. Moreover, road traffic congestion is one of the most significant problems in modern cities. The annual cost of congestion to the US economy alone exceeds $305 billion [2]. The increasing number of vehicles, social events, lane closures, roadworks, adverse weather, and other unexpected incidents have a negative impact on traffic flow and cause traffic congestion. Therefore, those causes, namely events (incidents), should be detected in an efficient and timely manner in order to support decision making and set management strategies to reduce or eliminate congestion.

Smart cities provide “state-of-the-art approaches for urbanization, having evolved from … knowledge-based economy … digital economy and intelligent economy. The notion of smart cities can be extended to smart societies … digitally enabled, knowledge-based societies, aware of and working towards social, environmental, and economic sustainability” [3]. In smart cities and societies, a large amount of diverse information is produced daily by heterogeneous sources including GPS, cameras, and smartphones, as well as user-generated content from social media. Such data offers the potential for developing novel solutions that will support decision making for smart transportation. In recent years, several approaches related to transportation in smart cities have been proposed, e.g., autonomic transportation systems [4] and intelligent disaster management [5].

Social media such as Twitter and Facebook are a relatively inexpensive and conveniently available source of information compared to physical sensors, which are costly to install at a large scale to monitor the traffic flow. Twitter is one of the most popular microblogging media used for communication and sharing personal status, events, news, etc. Twitter allows users to post short text messages called tweets. A massive amount of real-time data is posted by millions of users on various topics including transportation and real-time road traffic.

Moreover, Twitter has been adopted as a powerful data source in smart transportation. In recent years, there has been an increasing amount of literature on the use of Twitter as a sensor for traffic monitoring [6], flow forecasting [7], congestion estimation [8], and event detection [9]. These approaches show great potential for this area, albeit facing major challenges. From the data mining perspective, event detection from unstructured, rapidly evolving tweets is a challenging task. Twitter data has all the characteristics of big data, i.e., volume, velocity, variety, and veracity. Therefore, the management and analysis of Twitter data for event detection purposes is a major challenge. Advanced techniques and efficient approaches for data mining are required to extract useful information, monitor the changes, and predict future observations [10].

Another dimension of the automatic event detection domain is the language of the tweets. Many researchers have attempted to use social media information to monitor road traffic in different countries by analyzing text in different languages such as Japanese [11], Italian [12], and Chinese [13]. Our interest lies in detecting events from tweets in Saudi Arabia, which has its own challenges due to the dialectal Arabic that is mostly used in everyday tweeting, as opposed to the formal Modern Standard Arabic (MSA).

Another research gap is the limited use of big data technologies in the automatic detection of road traffic events from Twitter data in the Arabic language. In particular, to the best of our knowledge, no work exists that uses big data technologies for automatic detection of road traffic events from tweets in the Arabic language.

978-1-7281-4034-6/19/$31.00 ©2019 IEEE 1888

DOI 10.1109/SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00332
Authorized licensed use limited to: University of Botswana. Downloaded on October 18,2024 at 10:42:51 UTC from IEEE Xplore. Restrictions apply.
To sum up, several approaches to automatic event detection from Twitter data have been proposed in recent years, but the methods and outcomes are in their infancy. This work aims at detecting traffic-related events to enable smarter transportation. To this end, we propose a method for automatic detection of road traffic related events from tweets in Saudi dialect using machine learning and big data technologies. Firstly, we build and train a classifier using three machine learning (ML) algorithms, Naïve Bayes, Support Vector Machine (SVM), and logistic regression, to filter tweets into relevant and irrelevant. Subsequently, other classifiers are trained to detect multiple types of events including accident, roadwork, road closure, road damage, traffic condition, fire, weather, and social events. The results show that our method is able to detect road traffic events, as well as their location and time, automatically, without any prior knowledge of the events. Subsequently, the detected events are validated by searching in official sources such as newspaper websites. The classification performance is also evaluated using four widely used metrics: precision, accuracy, recall, and F-score.

The big data platform that we have used is Apache Spark. It is a powerful in-memory distributed computing platform that enables batch and streaming processing with extensive support for many machine learning algorithms for text mining and other applications [14].

The paper is organized as follows. Section 2 reviews the related work. Section 3 describes the methodology. Section 4 provides results. Section 5 concludes and gives future directions.

II. LITERATURE REVIEW
In this section, we review some notable literature related to social media based event detection. First, we review the works on traffic event detection in languages other than Arabic. Subsequently, we discuss the existing works about detection of various events (not necessarily traffic events) from Arabic social data. These two sections do not include any works that use big data. Finally, we review the works on traffic event detection that use big data technologies.

A. Traffic event detection using social media
In recent years, researchers have proposed many different approaches to online event detection from social media. Agarwal et al. [9] focused on identifying complaints reported in tweets about road irregularities and bad road conditions. After extracting the important information, such as the problem and the location, they applied a rule-based classifier and categorized the reports into useful, nearly-useful, and irrelevant complaint reports. Sakaki et al. [11] mainly focused on detecting heavy-traffic information and weather information. They classified Japanese tweets into positive (event-related) and negative (not related to events) classes using a Support Vector Machine (SVM) classifier.

Furthermore, Klaithin and Haruechaiyasak [15] extracted information related to traffic using lexicon-based and rule-based techniques. They then applied a machine learning classifier based on the Naive Bayes model to classify Thai tweets about traffic into six categories: accident, announcement, question, orientation, request, and sentiment. They trained the model using 4,637 tweets. Kumar et al. [16] detected road hazards by applying a trained language model to classify the tweets as having negative or non-negative sentiment. All tweets that expressed negative sentiment are considered as tweets with road hazard information. Additionally, they used three machine learning methods: Naïve Bayes, K-nearest-neighbor, and the Dynamic Language Model (DLM). Moreover, D’Andrea et al. [12] collected real-time Italian tweets and classified them after applying text mining techniques. The tweets are classified into three classes, namely traffic due to an external event, traffic congestion or crash, and non-traffic. Other works on traffic event detection using social media include [17], [18], [19], [20], [21] and [22].

B. General event detection from Arabic Tweets
The amount of research on analyzing Arabic social information for event detection is considerably limited compared to what has been done in other languages. AL-Smadi and Qawasmeh [23] used an unsupervised rule-based technique to extract events about technology, sports, and politics from Arabic tweets. Furthermore, Alsaedi and Pete [24] proposed a framework for detecting disruptive events from Arabic tweets. The tweets are classified into event and non-event tweets using a Naïve Bayes model. They also applied an online clustering algorithm to identify the topic of an event. Moreover, they extended their work and used the clustering algorithm to detect riot events [25].

Other researchers [26] trained classification algorithms using a training matrix that contains the selected terms and their corresponding TF-IDF (Term Frequency-Inverse Document Frequency) weights. They tested several algorithms. The results show that SVM was promising in terms of accuracy. However, the model was trained on a small dataset of about 3700 Arabic tweets to detect a single type of event, namely high-risk floods. Moreover, none of the above-discussed approaches for event detection from Arabic text used big data platforms. Furthermore, their main focus was not on traffic events such as traffic jams.

C. Traffic Events detection using big data technologies
Alomari and Mehmood [27] used SAP HANA, which is an in-memory processing platform, to analyze Arabic tweets related to traffic congestion in Jeddah city. In addition, they extracted the top causes of traffic congestion. Furthermore, they extended the work and proposed a sentiment analysis approach for traffic events [28]. However, their approach was dictionary-based; they did not use machine learning techniques. Salas et al. [29] proposed a framework for the real-time detection of traffic events from tweets in the English language using Apache Spark and Python machine learning algorithms. They used the SVM classification algorithm to classify the tweets into traffic-related and non-traffic-related tweets.

Suma et al. [30] analyzed tweets to detect events related to road traffic. They built a classification model to classify the tweets into traffic-related and non-traffic-related by using logistic regression with stochastic gradient descent. To detect events, they identified the most frequent terms among the traffic-related tweets. They improved the methodological and event detection aspects of their work in [31].

All of these approaches ([29], [30], [31]) used supervised classification algorithms on the Apache Spark platform. However, they analyzed tweets in the English language. Lau [13] used the Latent Dirichlet Allocation (LDA) topic modeling module for unsupervised topic mining. After labeling the messages using a list of common traffic event keywords, they implemented an ML classifier using the Spark Machine Learning (MLlib) library.

However, they focused on three traffic events (traffic jams, poor road conditions, and traffic restrictions) and analyzed Chinese language data. Therefore, there is a need for an efficient and scalable approach designed mainly for the Arabic language to address the challenges arising from Arabic big social data.

III. METHODOLOGY
Fig. 1 illustrates the proposed architecture for automatic traffic event detection from Arabic tweets using supervised ML algorithms and Apache Spark. It comprises six main components: (1) Data collection and storage component, (2) Data pre-processing component, (3) Feature extractor component, (4) Tweet filtering component, (5) Event detection component, and (6) Validation and results visualization component.

Fig. 1. Architecture of the proposed event detection system using Twitter data, machine learning, and Apache Spark

First, the data are collected using the Twitter API, and the fetched JSON objects are stored in MongoDB. After removing the duplicates, we split the tweets into a labeled and an unlabeled dataset. The authors manually tag each tweet in the labeled set with an appropriate label (1 for relevant, 0 for irrelevant). Second, we apply pre-processing steps to remove noise and prepare the data for classification. The output of this component is a list of normalized and cleaned tokens. Third, we extract the features and use TF-IDF as a feature vectorization method to reflect the importance of a term to a document (tweet) in the whole collection (tweets list). Fourth, the labeled tweets are used to build and train a classifier to filter tweets into relevant and irrelevant. Three models are built using three different supervised classification algorithms. Then, four widely used evaluation metrics (precision, accuracy, recall, and F-score) are used to evaluate the models and select the best algorithm. After that, we use the trained model that achieves the highest performance to filter out the irrelevant tweets. Fifth, part of the relevant tweets are manually labeled and used to build and train other classifiers to classify events. The trained classifiers are evaluated and then used for event detection. Finally, we visualize the results and validate the effectiveness of the classifiers by searching in official sources such as newspaper websites.

Moreover, we use the Apache Spark platform, which is a distributed in-memory computing platform, to handle the huge volume of unstructured Twitter data for event detection. We also use the Spark machine learning (Spark ML) package in Python, which provides high-level machine learning APIs built on top of Spark DataFrames. A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database. DataFrames can be used with Spark SQL. Additionally, they can be constructed from different sources such as Hive tables, structured data files, external databases, or existing RDDs.

A. Data collection
Tweets are collected via the Twitter REST API using geolocation filtering to obtain tweets posted in Saudi Arabia.

In addition, we collected tweets from hashtags that are usually used to post about events in cities, such as '#جده_الان' (#Jeddah_now) and '#الرياض_الان' (#Riyadh_now). We collected all Arabic tweets in the period between 23 September and 1 October 2018.

Since our data required scalable storage with flexible schemas, we selected a NoSQL database instead of a relational database. The collected tweets are stored in MongoDB, which is a document-oriented database suitable for storing and managing big-data-sized collections of documents such as text. The JSON objects fetched from the Twitter API are inserted into the database. The tweet object contains several attributes including (i) 'created_at', which represents the time when the tweet was posted, and (ii) 'full_text', which contains the message content. After that, we checked for redundancy and removed duplicate tweets (retweets). The total number of tweets after removing the duplicates is about 1 million.

B. Pre-processing
Pre-processing the text is an essential task since Arabic morphology is rich and Arabic dialectal text usually has typos or grammatical mistakes. It is also a critical step for reducing the amount of noise before classification, because performing analysis directly on dialectal text may lead to poor results.

Algorithm 1 (shown in Fig. 2) summarizes the main pre-processing steps. First, sparkConnector is used to connect to MongoDB. Then, the tweets are loaded and saved in a Spark DataFrame. The next step is iterating over the tweets to remove all numbers, English letters, and punctuation marks such as commas (,), periods (.), semi-colons (;), colons (:), question marks (?), and so forth. Likewise, we strip the Arabic question mark (؟) and the Arabic semi-colon (؛). Removing punctuation helps to reduce the size of the feature set. Since users rarely use formal language, most of the punctuation marks are not used properly, and keeping them will not give any valuable information.

Further, instead of removing the entire hashtag, we strip only the hash (#) and underscore (_) symbols and keep the keywords, because they almost always include useful information such as the place or event name. Moreover, we remove all Arabic diacritics and vowel marks such as Shaddah, which is a diacritic shaped like a small written "w." After that, the text is divided into words (tokens). The tokens are normalized to replace letters that have different forms with their basic shape. For instance, the letter Alif (ا) has three other forms (أ, إ, آ), which are normalized to bare Alif (ا). Also, the letter Yaa (ي) is normalized to dotless Yaa (ى) and Taa marbutah (ـة) to (ـه). Finally, the stop words are filtered using the Arabic stop words list in the Natural Language Toolkit (NLTK) [32]. We modify the list to add missing words and normalize the words before using them.

Furthermore, we check the result of the pre-processing phase before starting the classification. If the remaining number of tokens is equal to zero, the tweet is excluded from the analysis. Fig. 3 shows the steps applied to a sample tweet. The English translation of the tweet is: "#Riyadh_now abnormal congestion at the intersection of prince Fahad St. and University St.!!! Morning @Ruh_Rd". The removed punctuation, diacritics, and English words are highlighted in red. The word 'صباحاً' ends with Fatha Tanween (ـً), which is one of the Nunation diacritics. The normalized tokens are highlighted in green while the removed stop words are highlighted in red. All the pre-processing steps discussed in this subsection are specific to the Arabic language except tokenization, since it is based on splitting a string by white space regardless of the language.

C. Feature Extraction
We use the feature extractor algorithms provided in the Spark ML package. We apply TF-IDF (Term Frequency-Inverse Document Frequency), which is a measure of how important a word is to a document (tweet). The TF-IDF is simply the product of TF and IDF. The TF(t, d) is the frequency of the appearance of term t in document d, while the IDF is a numerical measure of how much information a term provides. The IDF is calculated using the following equation:

$$\mathrm{IDF}(t, D) = \log \frac{|D| + 1}{\mathrm{DF}(t, D) + 1} \tag{1}$$

where |D| is the total number of documents in the collection D. Document Frequency DF(t, D) is the number of documents where the term t appears.

$$\mathrm{TFIDF}(t, d, D) = \mathrm{TF}(t, d) \cdot \mathrm{IDF}(t, D) \tag{2}$$

To generate the term frequency (TF) vectors, we used the CountVectorizer algorithm. The algorithm takes the list of tokens in the 'Tokens' column as input and converts them into vectors of token counts. Then, the resultant term frequency vectors are passed to the IDF algorithm. After that, the IDFModel rescales the feature vectors, and the output is stored in a new column named 'Features'. This column is passed as input to the classification algorithms.
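To make the weighting concrete, the following is a minimal pure-Python sketch of the TF-IDF computation described in this subsection. It is not the Spark ML code used in the paper: the function name `tf_idf` and the toy tokens are illustrative, and the smoothed form log((|D| + 1) / (DF + 1)) is assumed here because it is the convention documented for Spark ML's IDF.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (tweet)."""
    n_docs = len(docs)
    # DF(t, D): number of documents in which term t appears at least once
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothed IDF: log((|D| + 1) / (DF(t, D) + 1))
    idf = {t: math.log((n_docs + 1) / (count + 1)) for t, count in df.items()}
    weights = []
    for doc in docs:
        tf = Counter(doc)  # TF(t, d): raw count of term t in document d
        weights.append({t: tf[t] * idf[t] for t in tf})
    return weights

# Toy token lists (English glosses stand in for normalized Arabic tokens)
tweets = [
    ["congestion", "road", "jeddah"],
    ["accident", "road", "riyadh"],
    ["rain", "jeddah"],
]
features = tf_idf(tweets)
print(features[0])
```

Terms that occur in fewer tweets, such as "congestion" in this toy collection, receive a larger weight than terms shared by several tweets, such as "road".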
D. Classification (Tweet Filtering)
Since not all the collected tweets are relevant to traffic, we filter the tweets before detecting events. So, we build a classifier to filter out the tweets that are irrelevant to traffic. We used machine learning algorithms in the Spark ML package.

Fig. 2. Pre-processing algorithm
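The pre-processing steps of Section III-B (the algorithm captioned in Fig. 2) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' Spark implementation: the function names are hypothetical, the three-word stop list stands in for the extended NLTK list, and normalizing only a word-final Yaa is an assumption, since the paper does not say which positions are normalized.

```python
import re

# Tiny illustrative stop-word list (assumption); the paper uses an extended NLTK list.
STOP_WORDS = {"\u0641\u064A", "\u0645\u0646", "\u0639\u0644\u0649"}  # fi, min, ala

DIACRITICS = re.compile(r"[\u064B-\u0652]")      # Fathatan .. Sukun, includes Shaddah
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")   # digits, Latin letters, punctuation

def normalize(token):
    token = re.sub(r"[\u0622\u0623\u0625]", "\u0627", token)  # Alif variants -> bare Alif
    token = re.sub(r"\u064A$", "\u0649", token)               # final Yaa -> dotless Yaa (assumed)
    return token.replace("\u0629", "\u0647")                  # Taa marbutah -> Haa

def preprocess(text):
    """Clean one tweet and return its list of normalized tokens."""
    text = text.replace("#", " ").replace("_", " ")  # keep the hashtag keywords
    text = DIACRITICS.sub("", text)                   # drop diacritics without splitting words
    text = NON_ARABIC.sub(" ", text)                  # drop everything that is not an Arabic letter
    tokens = [normalize(t) for t in text.split()]     # whitespace tokenization
    stop = {normalize(w) for w in STOP_WORDS}         # stop words are normalized too
    return [t for t in tokens if t not in stop]

print(preprocess("#الرياض_الان زحمة!!! صباحاً @Ruh_Rd"))
```

A tweet for which `preprocess` returns an empty list would be excluded from the analysis, matching the zero-token check described in the text.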

Fig. 3. Steps of pre-processing applied to a sample tweet

We split the manually labeled data into a training set (80%) and a testing set (20%). After that, we build and train models using the Naïve Bayes, SVM, and logistic regression (LR) algorithms. The models are trained on the training set. To find the best algorithm, we evaluate them on the testing set. Common statistical metrics, namely precision, accuracy, recall, and F-score, are used to evaluate the trained classifiers. To clarify the meaning of these metrics, we refer to traffic-related tweets as the positive class and non-related tweets as the negative class. The following four outcome classes are used in these metrics: (i) True Positive (TP) for the positive tweets that are correctly predicted as positive, (ii) True Negative (TN) for the negative tweets that are correctly predicted as negative, (iii) False Positive (FP) for the tweets that are labeled as negative but predicted as positive, and (iv) False Negative (FN) for the tweets that are labeled as positive but predicted as negative. The corresponding equation for each metric is listed below. The accuracy is calculated by Eq. (3), Precision (Positive Predictive Value) by Eq. (4), Recall (True Positive Rate) by Eq. (5), and F-Score by Eq. (6).

$$\mathrm{acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{4}$$

$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{5}$$

$$F(\beta) = (1 + \beta^2) \cdot \frac{\mathrm{PPV} \cdot \mathrm{TPR}}{\beta^2 \cdot \mathrm{PPV} + \mathrm{TPR}} \tag{6}$$

E. Event Detection
For event detection, we build and train classifiers using the Naïve Bayes, SVM, and logistic regression algorithms. To train the event classifiers, the authors manually label part of the filtered data from the previous step into eight event categories, which are Fire, Weather, Social Events, Traffic Condition, Roadwork, Road Damage, Accident, and Road Closures. The Traffic Condition category includes negative and positive tweets about the traffic condition. For Fire events, all tweets about fires are included under this category, even if the fire is not a vehicle fire, because it may negatively affect the traffic and cause congestion. Furthermore, for social events, we focus only on the events that could affect the traffic (e.g., carnival, national day).

During our analysis, we noticed that some event types have a large number of tweets compared to the others. So, we divided them into small-scale and large-scale events based on the number of tweets. The small-scale events are Traffic Condition, Roadwork, Road Damage, Accident, and Road Closures. The number of tweets for these events is small compared to Fire, Weather, and Social Events, so we consider the latter as large-scale events.

Furthermore, we have a multi-label classification problem, since the classes (event types) are not mutually exclusive and the same tweet can belong to more than one class. For example, a tweet can be about Traffic Condition and Accident at the same time. To address this problem, we treat each label as a separate binary classification problem. Thus, we trained eight binary classifiers. For each event type, we consider the tweets about the event as positive and all the remaining tweets, which are about the other types of events, as negative. However, this leads to imbalanced sampling, where the number of negative samples is larger than the number of positive ones. To adjust the class distribution and eliminate the effect on evaluation results, we perform undersampling of the negative (majority) class using the random undersampling method to make the data set balanced before evaluation. We prefer undersampling, which removes samples from the majority class, over oversampling, which takes repeated samples from the minority class, because the number of negative labels is very large compared to the positive ones: the negative class contains all the tweets about the other event types. Even though undersampling leads to loss of information, in our case, correctly classifying the negative labels is less important than correctly classifying the positive labels. Moreover, after detecting the events, we extract the time of occurrence using the time and date information from the 'created_at' attribute of the tweet object. Furthermore, we extract information about each event, including location information, using the top frequent terms, since people usually refer to the event place using a hashtag. For model evaluation, we use the same evaluation method explained in section 4.C. To validate the effectiveness of our event detection approach, we extract the top vocabularies from the tweets of each detected event. Then, we use these vocabularies to search the official news and newspaper websites to confirm the occurrence of the events. After that, we compare the information extracted by our method, including time and location, with the real information in the official sources.

IV. RESULTS AND DISCUSSION
A. Results for Tweets filtering
The performance of the three classification algorithms (Naïve Bayes, SVM, and Logistic Regression) for tweet filtering is measured using the evaluation metrics explained in Eq. (3-6). Fig. 4 shows that SVM is better than the Naïve Bayes and Logistic Regression algorithms in terms of accuracy, F-score, and precision. Furthermore, both SVM and Logistic Regression achieved a recall of 90%.

Fig. 4. Evaluation results for tweets filtering
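As a worked illustration of the four evaluation metrics (accuracy, precision, recall, and F-score), the sketch below computes them from true and predicted binary labels using their standard definitions. The function name `evaluate`, the toy labels, and the choice of β = 1 are assumptions made for the example; the paper does not state the value of β it used.

```python
def evaluate(y_true, y_pred, beta=1.0):
    """Compute accuracy, precision (PPV), recall (TPR) and F-score for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, predicted relevant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # irrelevant, predicted irrelevant
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # irrelevant, predicted relevant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, predicted irrelevant
    acc = (tp + tn) / len(pairs)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * ppv + tpr
    f_score = (1 + beta ** 2) * ppv * tpr / denom if denom else 0.0
    return {"accuracy": acc, "precision": ppv, "recall": tpr, "f_score": f_score}

# Toy labels: 1 = traffic-related, 0 = not related
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(evaluate(y_true, y_pred))
```

In this toy case there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics evaluate to 0.75.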

Fig. 6. Sample of the detected large-scale events per day (Year: 2018)
Fig. 7. Sample of the detected small-scale events per day (Year: 2018)
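The per-event binary classifiers evaluated in this section are trained on one-vs-rest data that is balanced by random undersampling of the negative class (Section III-E). A minimal sketch of building such a dataset is given below; the function name, the (tokens, label set) layout, and the fixed seed are illustrative assumptions rather than details from the paper.

```python
import random

def balanced_binary_dataset(labeled_tweets, event, seed=0):
    """Build a balanced one-vs-rest dataset for a single event type.

    labeled_tweets: list of (tokens, set_of_event_labels) pairs (assumed layout).
    Returns a shuffled list of (tokens, 0 or 1) pairs.
    """
    positives = [(x, 1) for x, labels in labeled_tweets if event in labels]
    negatives = [(x, 0) for x, labels in labeled_tweets if event not in labels]
    rng = random.Random(seed)
    # Random undersampling: keep only as many negatives as there are positives
    if len(negatives) > len(positives):
        negatives = rng.sample(negatives, len(positives))
    data = positives + negatives
    rng.shuffle(data)
    return data

labeled = [
    (["a"], {"Fire"}),
    (["b"], {"Accident"}),
    (["c"], {"Accident", "Traffic Condition"}),  # multi-label tweet
    (["d"], {"Weather"}),
    (["e"], {"Weather"}),
    (["f"], {"Fire"}),
]
accident_data = balanced_binary_dataset(labeled, "Accident")
print(accident_data)
```

Calling the function once per event type yields the eight binary training sets; a multi-label tweet such as the third one counts as positive for each of its labels.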

B. Results for event detection
Fig. 5 illustrates the evaluation results for the binary classification of events. The figure shows the four metrics: Accuracy, Precision, Recall, and F-score, respectively. We compared the results to select the algorithm that achieves the highest results for the four metrics. We found that for Road Closures, Accident, and Traffic Condition events, SVM worked better than the other algorithms. On the other hand, the logistic regression algorithm achieved higher results for Social Event, Roadwork, and Road Damage. Besides, the SVM and logistic regression algorithms gave similar results for Fire and Weather events.

Moreover, we noticed that the results for Weather and Fire are higher than for the other events. We assume that the reason is that our dataset contains only one big fire event (as explained later in this section), and thus we expected that most tweets about it contain similar vocabularies, which makes the classification easier. Similarly, most of the tweets related to the weather condition are about rain.

Furthermore, we created charts to show the detected events per day. We divided them into two categories, large-scale and small-scale, based on the number of tweets. Fig. 6 shows the large-scale events: Social Events, Weather, and Fire. On the other hand, the small-scale events, including Traffic Condition, Accident, Road Damage, Roadwork, and Road Closure, are shown in Fig. 7. Moreover, we validated our event detection approach by searching in the official sources. From the tweets of each detected event, we extracted the top vocabularies. Then, we searched the official news websites and local newspaper websites such as Okaz and Sabq. After that, we extracted the time information from the tweet object and drew charts to show the number of tweets per hour for each day.

Fig. 8 shows the hourly number of tweets related to the Social Event category. From the tweets about the Social Event on the 23rd of September, we listed the top vocabularies: وطنى (national), يوم (day), احتفال (celebration), احتفالات (celebrations), سعودي (Saudi). The vocabularies illustrate that the detected event is the Saudi national day celebration, where many activities were organized by municipalities in different cities.

The second large-scale detected event is about the weather condition. Fig. 9 shows the number of tweets per hour for each day. The highest number of tweets about the weather was on the 27th of September. The top extracted vocabularies about this event are الان (now), مطر (rain), طايف (Taif), امطار (rains), مكه (Makkah). The news reports indicated that there was rain in the Makkah region, including Makkah and Taif cities, on the same date.
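The hourly counts plotted in the per-event charts can be derived from the 'created_at' attribute of each tweet. The sketch below assumes the classic Twitter timestamp layout (for example "Mon Oct 01 12:05:11 +0000 2018") and converts UTC to Saudi local time (UTC+3); the paper does not state whether such a conversion was applied, so the offset, like the function name, is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"  # assumed 'created_at' layout
LOCAL_TZ = timezone(timedelta(hours=3))          # Saudi Arabia, UTC+3 (assumption)

def tweets_per_hour(created_at_values):
    """Count tweets per (local date, local hour) bucket."""
    counts = Counter()
    for value in created_at_values:
        ts = datetime.strptime(value, TWITTER_TIME_FORMAT).astimezone(LOCAL_TZ)
        counts[(ts.date().isoformat(), ts.hour)] += 1
    return counts

sample = [
    "Mon Oct 01 11:59:00 +0000 2018",
    "Mon Oct 01 12:05:11 +0000 2018",
    "Mon Oct 01 12:40:00 +0000 2018",
]
hourly = tweets_per_hour(sample)
for (day, hour), n in sorted(hourly.items()):
    print(day, f"{hour:02d}:00", n)
```

Grouping the tweets of one detected event this way gives the series needed for a tweets-per-hour chart, and a sharp rise in one bucket can then be compared with the event time reported in official sources.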

Fig. 8. The number of tweets per hour for the top 'Social Event'
Fig. 9. The number of tweets per hour for the top 'Weather' events
Fig. 10. The number of tweets per hour for the 'Fire' events
Fig. 11. The number of tweets per hour for the top 'Traffic Condition' event
Fig. 12. The number of tweets per hour for the top 'Road Closure' events

Moreover, we detected a Fire event on the 1st of October. The top vocabularies, which include حريق (fire), رياض (Riyadh), شمال (north), كهرباء (electricity), فيديو (video), and محطه (station), indicate that the fire was in Riyadh city. Newspapers reported that a huge fire broke out at a power plant in Riyadh. According to newspaper websites, the Saudi Civil Defense received notification of the fire at 15:00. Fig. 10 shows that the number of tweets about Fire increased sharply at 15:00, which matches the event start time reported in the newspaper articles.

For small-scale events, we chose the top detected event, which is about the traffic condition on the 25th of September. We listed the top vocabularies extracted from the tweets about this event, which include حرمين (Haramain), طريق (Road), قطار (Train), شارع (Streets), زحمه (Congestion), السريع (Highway), and جده (Jeddah). We then searched newspaper websites using these vocabularies and found that the Al-Haramain high-speed railway was inaugurated on the same date. The inauguration ceremony was held at the main station in Jeddah and started in the afternoon, during rush hour, when students and employees drive home from school or work. Additionally, the closure of the main roads near the station worsened the traffic jam. As shown in Fig. 11 and Fig. 12, the number of tweets on the 25th of September increased after 12:00 pm. These results verify the ability of our proposed approach to automatically detect large-scale and small-scale events, as well as their location and time, without prior knowledge about the event.

V. CONCLUSION

In this paper, we focused on detecting road traffic related events to enable smarter transportation. We proposed a method for automatic detection of traffic events from tweets in Saudi dialect using machine learning algorithms and the Apache Spark platform. Since raw text is not suitable as direct input to classification, the text was divided into tokens and normalized after removing numbers, punctuation, diacritics, and non-Arabic words. TF-IDF was selected as the weighting scheme, and the tokens were converted into a vector of terms.

Furthermore, we trained a classifier to filter tweets into relevant (to traffic) and irrelevant. We used three machine learning algorithms: Naïve Bayes, SVM, and logistic regression. Subsequently, we trained additional classifiers to detect the occurrence of multiple traffic-related events in Saudi Arabia. We extracted information about each event, including location information, using the top frequent terms. Then, we searched official sources such as newspaper websites to validate our approach. The results showed that our method is able to detect traffic-related events, as well as their location and time, automatically, without any prior knowledge of the events.
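
The preprocessing and weighting steps summarized in the conclusion (tokenization, removal of numbers, punctuation, diacritics, and non-Arabic words, then TF-IDF weighting) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' Spark pipeline: the function names `preprocess` and `tf_idf`, the Unicode ranges used for Arabic letters and diacritics, and the smoothed IDF form log((|D| + 1) / (DF + 1)) (the form used by Spark MLlib) are choices made for this illustration.

```python
import math
import re
from collections import Counter

# Arabic diacritics (tashkeel) plus tatweel; an assumed set, not the authors' exact list.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Runs of Arabic letters only; this drops digits, punctuation, and non-Arabic words.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def preprocess(text):
    """Strip diacritics, then return the Arabic-letter tokens of the text."""
    return ARABIC_WORD.findall(DIACRITICS.sub("", text))


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    TF is the raw term count; IDF is log((|D| + 1) / (DF + 1)) (assumed form).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((n_docs + 1) / (d + 1)) for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


# Made-up tweets: one with digits and Latin text, one with a diacritic.
tweets = ["زحمه 123 في طريق Jeddah!", "م\u064Eطر في طايف"]
docs = [preprocess(t) for t in tweets]
vectors = tf_idf(docs)
print(docs)
print(vectors)
```

A term that appears in every document, such as في here, receives a weight of zero under this IDF, while terms specific to one tweet keep a positive weight. The resulting vectors are what a classifier such as Naïve Bayes, SVM, or logistic regression would consume.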

In the future, we will improve the location detection approach to extract the exact location of the event, especially if it is not mentioned in the text. We will also develop a sentiment classifier to identify positive and negative tweets. For instance, traffic condition events will be classified into positive (no traffic jam) and negative (traffic jam). In addition, we will improve the design, analysis, and data variety aspects of our work. Finally, the proposed methodology can be applied to event types other than transportation, and to other areas, because we collect all tweets (without any filtering) and then build and train a classifier to filter out irrelevant tweets.

ACKNOWLEDGMENT

The work carried out in this paper is supported by the HPC Center at King Abdulaziz University.

