Road Traffic Event Detection Using Twitter Data, Machine Learning, and Apache Spark
Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation
Abstract—Road transportation is the backbone of modern societies, yet it annually causes over a million deaths and costs the global economy trillions of dollars. Social media such as Twitter have increasingly become an important source of information in many dimensions of smart societies. Automatic detection of road traffic events using Twitter data mining is one such area, with a great many applications and enormous potential, albeit facing major challenges concerning the management and analysis of big data (volume, velocity, variety, and veracity). Various approaches on the subject have been proposed in recent years, but the methods and outcomes are still in their infancy. This paper proposes a method for automatic detection of road traffic related events from tweets in the Saudi dialect using machine learning and big data technologies. First, we build and train a classifier using three machine learning algorithms, Naïve Bayes, Support Vector Machine, and logistic regression, to filter tweets into relevant and irrelevant. Subsequently, we train other classifiers to detect multiple types of events, including accident, roadwork, road closure, road damage, traffic condition, fire, weather, and social events. The results from the analysis of one million tweets show that our method is able to detect road traffic events, as well as their location and time, automatically, without any prior knowledge of the events. To the best of our knowledge, this is the first work on traffic event detection from Arabic tweets using machine learning and the Apache Spark big data platform.

Keywords—Twitter data analysis, Smart transportation, Event detection, Smart cities, Machine learning, Text mining, Big data analytics, Apache Spark, MongoDB, Naïve Bayes, Support Vector Machine (SVM), Logistic Regression

I. INTRODUCTION

Road transportation is the backbone of modern cities and societies, yet it annually causes 1.25 million deaths and 20 to 50 million injuries across the globe [1]. Moreover, road traffic congestion is one of the most significant problems in modern cities. The annual cost of congestion to the US economy alone exceeds $305 billion [2]. The increasing number of vehicles, social events, lane closures, roadworks, adverse weather, and other unexpected incidents have a negative impact on traffic flow and cause congestion. Therefore, those causes, namely events (incidents), should be detected in an efficient and timely manner in order to support decision making and to set management strategies that reduce or eliminate congestion.

Smart cities provide "state-of-the-art approaches for urbanization, having evolved from … knowledge-based economy … digital economy and intelligent economy. The notion of smart cities can be extended to smart societies … digitally enabled, knowledge-based societies, aware of and working towards social, environmental, and economic sustainability" [3]. In smart cities and societies, a large amount of diverse information is produced daily by heterogeneous sources, including GPS, cameras, and smartphones, as well as user-generated content from social media. Such data offers the potential for developing novel solutions that will support decision making for smart transportation. In recent years, several approaches related to transportation in smart cities have been proposed, e.g., autonomic transportation systems [4] and intelligent disaster management [5].

Social media such as Twitter and Facebook are a relatively inexpensive and conveniently available source of information compared to physical sensors, which are costly to install at a large scale to monitor traffic flow. Twitter is one of the most popular microblogging media used for communication and for sharing personal status, events, news, etc. Twitter allows users to post short text messages called tweets. A massive amount of real-time data is posted by millions of users on various topics, including transportation and real-time road traffic.

Moreover, Twitter has been adopted as a powerful data source in smart transportation. In recent years, there has been an increasing amount of literature on the use of Twitter as a sensor for traffic monitoring [6], flow forecasting [7], congestion estimation [8], and event detection [9]. These approaches show great potential for this area, albeit facing major challenges. From the data mining perspective, event detection from unstructured, rapidly evolving tweets is a challenging task. Twitter data has all the characteristics of big data, i.e., volume, velocity, variety, and veracity. Therefore, the management and analysis of Twitter data for event detection purposes is a major challenge. Advanced techniques and efficient approaches for data mining are required to extract useful information, monitor changes, and predict future observations [10].

Another dimension of the automatic event detection domain is the language of the tweets. Many researchers have attempted to use social media information to monitor road traffic in different countries by analyzing text in different languages such as Japanese [11], Italian [12], and Chinese [13]. Our interest lies in detecting events from tweets in Saudi Arabia, which poses its own challenges because dialectal Arabic, rather than the formal Modern Standard Arabic (MSA), is used in most everyday tweeting.

Another research gap is the limited use of big data technologies in the automatic detection of road traffic events from Twitter data in the Arabic language. In particular, to the best of our knowledge, no existing work uses big data technologies for the automatic detection of road traffic events from tweets in the Arabic language.
Authorized licensed use limited to: University of Botswana. Downloaded on October 18,2024 at 10:42:51 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Architecture of the proposed event detection system using Twitter data, machine learning, and Apache Spark
However, they focused on three traffic events, namely traffic jams, poor road conditions, and traffic restrictions, and analyzed Chinese language data. Therefore, there is a need for an efficient and scalable approach designed specifically for the Arabic language to address the challenges arising from Arabic big social data.

III. METHODOLOGY

Fig. 1 illustrates the proposed architecture for automatic traffic event detection from Arabic tweets using supervised ML algorithms and Apache Spark. It comprises six main components: (1) Data collection and storage component, (2) Data pre-processing component, (3) Feature extractor component, (4) Tweet filtering component, (5) Event detection component, and (6) Validation and results visualization component.

First, the data are collected using the Twitter API, and the fetched JSON objects are stored in MongoDB. After removing the duplicates, we split the tweets into a labeled and an unlabeled dataset. The authors manually tag each tweet in the labeled set with an appropriate label (1 for relevant, 0 for irrelevant). Second, we apply pre-processing steps to remove noise and prepare the data for classification. The output of this component is a list of normalized and cleaned tokens. Third, we extract the features and use TF-IDF as a feature vectorization method to reflect the importance of a term to a document (tweet) in the whole collection (tweets list). Fourth, the labeled tweets are used to build and train a classifier that filters tweets into relevant and irrelevant. Three models are built using three different supervised classification algorithms. Then, four widely used evaluation metrics, namely precision, accuracy, recall, and F-score, are used to evaluate the models and select the best algorithm. After that, we use the trained model that achieves the highest performance to filter out the irrelevant tweets. Fifth, a subset of the relevant tweets is manually labeled and used to build and train further classifiers that classify events. The trained classifiers are evaluated and then used for event detection. Finally, we visualize the results and validate the effectiveness of the classifiers by searching official sources such as newspaper websites.

Moreover, we use the Apache Spark platform, a distributed in-memory computing platform, to handle the huge volume of unstructured Twitter data for event detection. In addition, we use the Spark Machine Learning (Spark ML) package through its Python API, which provides high-level machine learning APIs built on top of Spark DataFrames. A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database. DataFrames can be queried with Spark SQL and can be constructed from different sources such as Hive tables, structured data files, external databases, or existing RDDs.

A. Data collection

Tweets are collected via the Twitter REST API using geolocation filtering to obtain tweets posted in Saudi Arabia.
In addition, we collected tweets from hashtags that are commonly used to post about events in cities, such as '#جده_الان' (#Jeddah_now) and '#الرياض_الان' (#Riyadh_now). We collected all Arabic tweets in the period between 23 September and 1 October 2018.

Since our data requires scalable storage with a flexible schema, we selected a NoSQL database instead of a relational database. The collected tweets are stored in MongoDB, a document-oriented database suitable for storing and managing Big Data-sized collections of documents such as text. The JSON objects fetched from the Twitter API are inserted into the database. Each tweet object contains several attributes, including (i) 'created_at', which represents the time when the tweet was posted, and (ii) 'full_text', which contains the message content. After that, we checked for redundancy and removed duplicate tweets (retweets). The total number of tweets after removing the duplicates is about 1 million.

B. Pre-processing

Pre-processing the text is an essential task since Arabic morphology is rich and Arabic dialectal text usually contains typos or grammatical mistakes. It is also a critical step to reduce the amount of noise before classification, because performing analysis directly on dialectal text may lead to poor results.

Algorithm 1 summarizes the main pre-processing steps. First, sparkConnector is used to connect to MongoDB. Then, the tweets are loaded and saved in a Spark DataFrame. The next step is iterating over the tweets to remove all numbers, English letters, and punctuation marks such as commas (,), periods (.), semicolons (;), colons (:), question marks (?), and so forth. Likewise, we strip the Arabic question mark (؟) and the Arabic semicolon (؛). Removing punctuation helps to reduce the size of the feature set. Since users rarely use formal language, most punctuation marks are not used properly, and keeping them does not provide any valuable information.

Further, instead of removing the entire hashtag, we strip only the hash (#) and underscore (_) symbols and keep the keywords, because they usually include useful information such as the place or event name. Moreover, we remove all Arabic diacritics and vowel marks such as Shaddah, a diacritic shaped like a small written "w." After that, the text is divided into words (tokens). The tokens are normalized by replacing letters that have several forms with their basic shape. For instance, the letter (ا), pronounced Alif, has three other forms (أ, إ, آ), which are normalized to bare Alif (ا). Also, the letter (ي), pronounced Yaa, is normalized to dotless Yaa (ى), and Taa marbutah (ـة) is normalized to (ـه). Finally, stop words are filtered using the Arabic stop word list in the Natural Language Toolkit (NLTK) [32]. We modified the list to add missing words and normalized the words before using them.

Furthermore, we check the result of the pre-processing phase before starting the classification. If no tokens remain, the tweet is excluded from the analysis. Fig. 3 shows the steps applied to a sample tweet. The English translation of the tweet is: "#Riyadh_now abnormal congestion at the intersection of Prince Fahad St. and University St.!!! Morning @Ruh_Rd". The removed punctuation, diacritics, and English words are highlighted in red. The word 'صباحاً' ends with Fatha Tanween (ً), which is one of the Nunation diacritics. The normalized tokens are highlighted in green, while the removed stop words are highlighted in red. All the pre-processing steps discussed in this subsection are specific to the Arabic language except tokenization, which splits a string by white space regardless of the language.

C. Feature Extraction

We use the feature extraction algorithms provided in the Spark ML package. We apply TF-IDF (Term Frequency-Inverse Document Frequency), which measures how important a word is to a document (tweet). TF-IDF is simply the product of TF and IDF. TF(t, d) is the frequency of term t in document d, while IDF is a numerical measure of how much information a term provides. The IDF is calculated using the following equation:

$$\mathrm{IDF}(t, D) = \log \frac{|D| + 1}{\mathrm{DF}(t, D) + 1} \quad (1)$$

where |D| is the total number of documents (tweets) in the collection D and DF(t, D) is the number of documents that contain term t.
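As an illustration of the IDF equation above, the following minimal sketch computes TF-IDF weights in plain Python so that the arithmetic is easy to check. It assumes raw term counts for TF and the smoothed IDF, log((|D| + 1) / (DF(t, D) + 1)), which is the form documented for Spark ML's `IDF` estimator. In the actual pipeline the TF vectors would come from a Spark ML extractor such as `HashingTF` or `CountVectorizer` (the text does not say which one is used). The function names and the toy tweets are hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed IDF: log((|D| + 1) / (DF(t, D) + 1)) for every term in the corpus."""
    n_docs = len(docs)
    # DF(t, D): number of documents (tweets) that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}


def tf_idf(docs):
    """TF-IDF(t, d, D) = TF(t, d) * IDF(t, D), with TF as the raw count of t in d."""
    idf = idf_weights(docs)
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


# Toy pre-processed tweets (English glosses used instead of Arabic tokens).
tweets = [
    ["congestion", "riyadh"],
    ["rain", "riyadh"],
    ["congestion", "congestion", "road"],
]
vectors = tf_idf(tweets)
```

A term that occurs in every tweet gets weight log(1) = 0 and therefore contributes nothing, while rarer terms such as "rain" in the toy data receive larger weights than common ones such as "congestion".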
Fig. 3. Steps of pre-processing applied to a sample tweet
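The Arabic-specific pre-processing steps described in Section III.B and illustrated in Fig. 3 can be sketched in plain Python as follows. This is a simplified, hypothetical stand-in for Algorithm 1, not the authors' code: it works on a single string rather than a Spark DataFrame, and it takes the stop word list as a parameter (the paper uses a modified, normalized version of NLTK's Arabic list [32]). It also assumes that the Yaa and Taa marbutah substitutions apply only to the last letter of a token, since the text does not specify the position.

```python
import re

# Arabic tanween, harakat, shadda, and sukun (U+064B to U+0652), e.g. the
# Fatha Tanween at the end of the word meaning "morning" in Fig. 3.
DIACRITICS = re.compile("[\u064B-\u0652]")
# Latin letters and digits (ASCII, Arabic-Indic, Extended Arabic-Indic).
LATIN_AND_DIGITS = re.compile("[A-Za-z0-9\u0660-\u0669\u06F0-\u06F9]+")
# Any remaining symbol that is neither a word character nor whitespace;
# this also covers the Arabic question mark and the Arabic semicolon.
PUNCTUATION = re.compile(r"[^\w\s]")
# Alif with hamza above, hamza below, or madda -> bare Alif.
ALIF_FORMS = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})


def normalize_token(token):
    """Map letters that have several written forms to one basic shape."""
    token = token.translate(ALIF_FORMS)
    if token.endswith("\u064A"):  # final Yaa -> dotless Yaa (assumed word-final only)
        token = token[:-1] + "\u0649"
    if token.endswith("\u0629"):  # Taa marbutah -> Haa
        token = token[:-1] + "\u0647"
    return token


def preprocess(text, stop_words):
    """Return cleaned, normalized tokens, or None if no token remains."""
    text = text.replace("#", " ").replace("_", " ")  # keep the hashtag keywords
    text = DIACRITICS.sub("", text)
    text = LATIN_AND_DIGITS.sub(" ", text)  # numbers and English words
    text = PUNCTUATION.sub(" ", text)
    stops = {normalize_token(word) for word in stop_words}
    tokens = [normalize_token(tok) for tok in text.split()]  # whitespace tokenization
    tokens = [tok for tok in tokens if tok not in stops]
    return tokens or None  # None means the tweet is excluded from the analysis


# Hypothetical example modeled on the Fig. 3 tweet, with a one-word stop list.
tokens = preprocess("#الرياض_الآن زحمة غير طبيعية!!! صباحاً @Ruh_Rd 2018", {"غير"})
```

Stop words pass through the same `normalize_token` function as the tweet tokens, so a stop word written with a different Alif form or with a final Yaa still matches after normalization.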
We split the manually labeled data into a training set (80%) and a testing set (20%). After that, we build and train models using the Naïve Bayes, SVM, and logistic regression (LR) algorithms. The models are trained on the training set. To find the best algorithm, we evaluate them on the testing set. Common statistical metrics, namely precision, accuracy, recall, and F-score, are used to evaluate the trained classifiers. To clarify the meaning of these metrics, we refer to traffic-related tweets as the positive class and non-related tweets as the negative class. The following four outcomes are used in these metrics: (i) True Positive (TP): positive tweets that are correctly predicted as positive; (ii) True Negative (TN): negative tweets that are correctly predicted as negative; (iii) False Positive (FP): tweets that are labeled as negative but predicted as positive; and (iv) False Negative (FN): tweets that are labeled as positive but predicted as negative. The corresponding equation for each metric is listed below. Accuracy is calculated by Eq. (3), Precision (Positive Predictive Value) by Eq. (4), Recall (True Positive Rate) by Eq. (5), and F-score by Eq. (6).

$$acc = \frac{TP + TN}{TP + TN + FP + FN} \quad (3)$$

$$PPV = \frac{TP}{TP + FP} \quad (4)$$

$$TPR = \frac{TP}{TP + FN} \quad (5)$$

$$F(\beta) = (1 + \beta^2) \cdot \frac{PPV \cdot TPR}{\beta^2 \cdot PPV + TPR} \quad (6)$$

E. Event Detection

For event detection, we build and train classifiers using the Naïve Bayes, SVM, and logistic regression algorithms. To train the event classifiers, the authors manually label part of the data filtered in the previous step into eight event categories: Fire, Weather, Social Events, Traffic Condition, Roadwork, Road Damage, Accident, and Road Closures. The Traffic Condition category includes both negative and positive tweets about the traffic condition. For Fire events, all tweets about fires are included under this category even when the fire does not involve a vehicle, because it may negatively affect traffic and cause congestion. Furthermore, for Social Events, we focus only on events that could affect traffic (e.g., carnival, national day).

During our analysis, we noticed that some event types have a much larger number of tweets than others, so we divided them into small-scale and large-scale events based on the number of tweets. The small-scale events are Traffic Condition, Roadwork, Road Damage, Accident, and Road Closures. The number of tweets for these events is small compared to Fire, Weather, and Social Events, so we consider the latter three as large-scale events.

Furthermore, we have a multi-label classification problem, since the classes (event types) are not mutually exclusive and the same tweet can belong to more than one class. For example, a tweet can be about Traffic Condition and Accident at the same time. To address this problem, we treat each label as a separate binary classification problem. Thus, we trained eight binary classifiers. For each event type, we consider the tweets about that event as positive and all the remaining tweets about the other event types as negative. However, this leads to class imbalance, where the number of negative samples is larger than the number of positive samples. To adjust the class distribution and eliminate its effect on the evaluation results, we perform random undersampling of the negative (majority) class to make the data set balanced before evaluation. We prefer undersampling, which removes samples from the majority class, over oversampling, which takes repeated samples from the minority class, because the number of negative labels is very large compared to the positive labels, as the negative class contains all the tweets about the other event types. Even though undersampling leads to a loss of information, in our case correctly classifying the negative labels is less important than correctly classifying the positive labels. Moreover, after detecting the events, we extract the time of occurrence using the time and date information from the 'created_at' attribute in the tweet object. Furthermore, we extract information about each event, including location information, using the most frequent terms, since people usually refer to the event place using a hashtag. For model evaluation, we use the same evaluation method described above for tweet filtering. To validate the effectiveness of our event detection approach, we extract the top vocabularies from the tweets of each detected event. Then, we use these vocabularies to search the official news and newspaper websites to confirm the occurrence of the events. After that, we compare the information extracted by our method, including time and location, with the real information in the official sources.

IV. RESULTS AND DISCUSSION

A. Results for Tweets filtering

The performance of the three classification algorithms (Naïve Bayes, SVM, and Logistic Regression) for tweet filtering is measured using the evaluation metrics explained in Eqs. (3)-(6). Fig. 4 shows that SVM is better than the Naïve Bayes and Logistic Regression algorithms in terms of accuracy, F-score, and precision. Furthermore, both SVM and Logistic Regression achieved a recall of 90%.

Fig. 4. Evaluation results for tweets filtering
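The metrics of Eqs. (3)-(6), which are used to compare the three algorithms in Fig. 4, can be computed from confusion-matrix counts as in the sketch below. The function names are hypothetical, the counts in the usage line are made-up numbers rather than results from the paper, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, tn, fp, fn


def evaluate(tp, tn, fp, fn, beta=1.0):
    """Accuracy, precision (PPV), recall (TPR), and F(beta) as in Eqs. (3)-(6)."""
    acc = (tp + tn) / (tp + tn + fp + fn)                     # Eq. (3)
    ppv = tp / (tp + fp) if tp + fp else 0.0                 # Eq. (4)
    tpr = tp / (tp + fn) if tp + fn else 0.0                 # Eq. (5)
    b2 = beta ** 2
    denom = b2 * ppv + tpr
    f_beta = (1 + b2) * ppv * tpr / denom if denom else 0.0  # Eq. (6)
    return {"accuracy": acc, "precision": ppv, "recall": tpr, "f_score": f_beta}


# Made-up counts for illustration only (not results from the paper).
scores = evaluate(tp=90, tn=80, fp=20, fn=10)
```

With beta = 1, F(β) reduces to the harmonic mean of precision and recall (the F1-score); for the made-up counts it equals 2 × 90 / (2 × 90 + 20 + 10) = 6/7 ≈ 0.857.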
Fig. 5. Evaluation results for events classification (a) Accuracy, (b) Precision, (c) Recall and (d) F-score
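The one-vs-rest relabeling and random undersampling described in Section III.E can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: each labeled tweet is represented as a (text, set of event types) pair, the helper names and toy data are hypothetical, and negatives are sampled down to exactly the number of positives. On a Spark DataFrame, `DataFrame.sampleBy` provides stratified sampling by class, although its per-class fractions yield approximately rather than exactly balanced counts.

```python
import random

# The eight event categories listed in Section III.E.
EVENT_TYPES = ["Fire", "Weather", "Social Events", "Traffic Condition",
               "Roadwork", "Road Damage", "Accident", "Road Closures"]


def one_vs_rest(labeled, event):
    """Relabel multi-label tweets for one event type: 1 if the tweet carries
    `event`, else 0 (i.e., it is about some other event type)."""
    return [(text, 1 if event in labels else 0) for text, labels in labeled]


def undersample(rows, seed=0):
    """Random undersampling: keep every positive row and a random subset of
    negative rows of the same size (when negatives outnumber positives)."""
    positives = [row for row in rows if row[1] == 1]
    negatives = [row for row in rows if row[1] == 0]
    if len(negatives) > len(positives):
        negatives = random.Random(seed).sample(negatives, len(positives))
    return positives + negatives


def build_binary_datasets(labeled, seed=0):
    """One balanced binary dataset per event type (eight classifiers in total)."""
    return {event: undersample(one_vs_rest(labeled, event), seed)
            for event in EVENT_TYPES}


# Toy multi-label examples (hypothetical, not from the paper's dataset).
labeled = [
    ("tweet 1", {"Accident", "Traffic Condition"}),
    ("tweet 2", {"Fire"}),
    ("tweet 3", {"Weather"}),
    ("tweet 4", {"Roadwork"}),
]
datasets = build_binary_datasets(labeled)
```

Because "tweet 1" carries two labels, it is a positive example in both the Accident and the Traffic Condition datasets, which is how the binary decomposition handles the multi-label case.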
Fig. 6. Sample of the detected large-scale events per day (Year: 2018)
Fig. 7. Sample of the detected small-scale events per day (Year: 2018)
B. Results for event detection

Fig. 5 illustrates the evaluation results for the binary classification of events. The figure shows the four metrics, Accuracy, Precision, Recall, and F-score, respectively. We compared the results to select the algorithm that achieves the highest results for the four metrics. We found that for Road Closures, Accident, and Traffic Condition events, SVM worked better than the other algorithms. On the other hand, the logistic regression algorithm achieved higher results for Social Event, Roadwork, and Road Damage. In addition, the SVM and logistic regression algorithms gave similar results for Fire and Weather events.

Moreover, we noticed that the results for Weather and Fire are higher than those for the other events. We assume that the reason is that our dataset contains only one big fire event (as explained later in this section), so most tweets about it contain similar vocabularies, which makes the classification easier. Similarly, most of the tweets related to the weather condition are about rain.

Furthermore, we created charts to show the detected events per day. We divided them into two categories, large-scale and small-scale, based on the number of tweets. Fig. 6 shows the large-scale events: Social Events, Weather, and Fire. The small-scale events, including Traffic Condition, Accident, Road Damage, Roadwork, and Road Closure, are shown in Fig. 7. Moreover, we validated our event detection approach by searching official sources. From the tweets of each detected event, we extracted the top vocabularies. Then, we searched the official news websites and local newspaper websites such as Okaz and Sabq. After that, we extracted the time information from the tweet object and drew charts showing the number of tweets per hour for each day.

Fig. 8 shows the hourly number of tweets related to Social Events. From the tweets about the Social Event on the 23rd of September, we listed the top vocabularies: وطنى (national), يوم (day), احتفال (celebration), احتفالات (celebrations), and سعودي (Saudi). These vocabularies indicate that the detected event is the Saudi National Day celebration, for which many activities were organized by municipalities in different cities.

The second large-scale detected event concerns the weather condition. Fig. 9 shows the number of tweets per hour by day. The highest number of tweets about the weather was on the 27th of September. The top extracted vocabularies for this event are الان (now), مطر (rain), طايف (Taif), امطار (rains), and مكه (Makkah). News reports indicated that there was rain in the Makkah region, including the cities of Makkah and Taif, on the same date.
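The per-event charts and the validation step rely on two simple aggregations: counting tweets per hour from the 'created_at' attribute and listing the most frequent tokens of each detected event. A minimal sketch follows. It assumes the Twitter API v1.1 'created_at' string format and converts timestamps to Saudi local time (UTC+3), which the paper does not state explicitly; the function names and sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Twitter API v1.1 'created_at' format, e.g. "Thu Sep 27 14:05:00 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"
# Assumption: report local Saudi time (UTC+3, no daylight saving time).
SAUDI_TIME = timezone(timedelta(hours=3))


def tweets_per_hour(created_at_values):
    """Count tweets per (local date, hour) for one detected event."""
    counts = Counter()
    for value in created_at_values:
        local = datetime.strptime(value, CREATED_AT_FORMAT).astimezone(SAUDI_TIME)
        counts[(local.date().isoformat(), local.hour)] += 1
    return counts


def top_vocabulary(token_lists, k=5):
    """Most frequent tokens across the tweets of one detected event."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    return [token for token, _ in counts.most_common(k)]


# Hypothetical tweets of one event: two posted at 17:00 local time on
# 27 September and one just after midnight on 28 September.
hours = tweets_per_hour([
    "Thu Sep 27 14:05:00 +0000 2018",
    "Thu Sep 27 14:40:00 +0000 2018",
    "Thu Sep 27 21:10:00 +0000 2018",
])
top = top_vocabulary([["rain", "now"], ["rain", "taif"], ["rain", "now", "makkah"]], k=2)
```

The resulting top terms are what would be searched on the newspaper websites, and the hourly counts are the values plotted in charts such as Figs. 8 and 9.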
Fig. 8. The number of tweets per hour for the top 'Social Event'
Fig. 9. The number of tweets per hour for the top 'Weather' events
Fig. 10. The number of tweets per hour for the 'Fire' events
Fig. 11. The number of tweets per hour for the top 'Traffic Condition' event
In the future, we will improve the location detection approach to extract the exact location of an event, especially when it is not mentioned in the text. In addition, we will develop a sentiment classifier to identify positive and negative tweets. For instance, traffic condition events will be classified into positive (no traffic jam) and negative (traffic jam). We will also improve the design, analysis, and data variety aspects of our work. Finally, the proposed methodology can be applied to event types other than transportation, and to other areas, because we collect all the tweets (without any filtering) and then build and train a classifier to filter out irrelevant tweets.

ACKNOWLEDGMENT

The work carried out in this paper is supported by the HPC Center at King Abdulaziz University.

REFERENCES

[1] G. Cookson, "World Health Organization: Road traffic injuries." [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries. [Accessed: 18-Feb-2019].
[2] "INRIX Global Traffic Scorecard." [Online]. Available: http://inrix.com/scorecard/. [Accessed: 18-Feb-2019].
[3] R. Mehmood, B. Bhaduri, I. Katib, and I. Chlamtac, Eds., Smart Societies, Infrastructure, Technologies and Applications, Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST), vol. 224. Cham: Springer International Publishing, 2018.
[4] J. Schlingensiepen, F. Nemtanu, R. Mehmood, and L. McCluskey, "Autonomic Transport Management Systems—Enabler for Smart Cities, Personalized Medicine, Participation and Industry Grid/Industry 4.0," in Intelligent Transportation Systems – Problems and Perspectives, Volume 32 of the series Studies in Systems, Decision and Control, Springer International Publishing, 2016, pp. 3–35.
[5] Z. Alazawi, O. Alani, M. B. Abdljabar, S. Altowaijri, and R. Mehmood, "A Smart Disaster Management System for Future Cities," WiMobCity '14. Int. Work. Wirel. Mob. Technol. Smart Cities, pp. 1–10, 2014.
[6] D. Wang, A. Al-Rubaie, J. Davies, and S. S. Clarke, "Real time road traffic monitoring alert based on incremental learning from tweets," in 2014 IEEE Symposium on Evolving and Autonomous Learning Systems (EALS), 2014, pp. 50–57.
[7] M. Ni, Q. He, and J. Gao, "Forecasting the Subway Passenger Flow under Event Occurrences with Social Media," IEEE Trans. Intell. Transp. Syst., vol. 18, no. 6, pp. 1623–1632, 2017.
[8] S. Wang, L. He, L. Stenneth, P. S. Yu, and Z. Li, "Citywide traffic congestion estimation with social media," in Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS '15, 2015, pp. 1–10.
[9] S. Agarwal, N. Mittal, and A. Sureka, "Potholes and Bad Road Conditions - Mining Twitter to Extract Information on Killer Roads," ACM India Jt. Int. Conf. Data Sci. Manag. Data CoDS-COMAD 2018, 2018.
[10] A. Oussous, F.-Z. Benjelloun, A. A. Lahcen, and S. Belfkih, "Big Data technologies: A survey," J. King Saud Univ. - Comput. Inf. Sci., 2017.
[11] T. Sakaki, Y. Matsuo, T. Yanagihara, N. P. Chandrasiri, and K. Nawa, "Real-time event extraction for driving information from social sensors," in Proceedings - 2012 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems, CYBER 2012, 2012, pp. 221–226.
[12] E. D'Andrea, P. Ducange, B. Lazzerini, and F. Marcelloni, "Real-Time Detection of Traffic from Twitter Stream Analysis," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 4, pp. 2269–2283, 2015.
[13] R. Y. K. Lau, "Toward a social sensor based framework for intelligent transportation," in 2017 IEEE 18th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2017, pp. 1–6.
[14] N. Pavlopoulou, A. Abushwashi, F. Stahl, and V. Scibetta, "A text mining framework for Big Data," Expert Updat., vol. 17, no. 1, 2017.
[15] S. Klaithin and C. Haruechaiyasak, "Traffic Information Extraction and Classification from Thai Twitter," Comput. Sci. Softw. Eng. (JCSSE), 2016 13th Int. Jt. Conf., pp. 1–6, 2016.
[16] A. Kumar, M. Jiang, and Y. Fang, "Where not to go?: detecting road hazards using twitter," in Proceedings of the 37th international ACM …, 2014, vol. 2609550, pp. 1223–1226.
[17] D. A. Kurniawan, S. Wibirama, and N. A. Setiawan, "Real-time Traffic Classification with Twitter Data Mining," in 2016 8th International Conference on Information Technology and Electrical Engineering (ICITEE), 2016, pp. 1–5.
[18] D. Semwal, S. Patil, S. Galhotra, A. Arora, and N. Unny, "STAR: Real-time Spatio-Temporal Analysis and Prediction of Traffic Insights using Social Media," in Proceedings of the 2nd IKDD Conference on Data Sciences, 2015, p. 7.
[19] M. R. Alifi and S. H. Supangkat, "Information Extraction for Traffic Congestion in Social Network," in International Conference on ICT For Smart Society, 2016, no. July, pp. 20–21.
[20] R. Hanifah, S. H. Supangkat, and A. Purwarianti, "Twitter information extraction for smart city," Proc. - 2014 Int. Conf. ICT Smart Soc. "Smart Syst. Platf. Dev. City Soc. GoeSmart 2014", ICISS 2014, pp. 295–299, 2014.
[21] P. Tejaswin, R. Kumar, and S. Gupta, "Tweeting Traffic: Analyzing Twitter for generating real-time city traffic insights and predictions," Proc. 2nd IKDD Conf. Data Sci. - CODS-IKDD '15, pp. 1–4, 2015.
[22] N. Dhavase and A. M. Bagade, "Location identification for crime & disaster events by geoparsing Twitter," 2014 Int. Conf. Converg. Technol. I2CT 2014, pp. 2–4, 2014.
[23] M. AL-Smadi and O. Qawasmeh, "Knowledge-based Approach for Event Extraction from Arabic Tweets," Int. J. Adv. Comput. Sci. Appl., vol. 7, no. 6, 2016.
[24] N. Alsaedi and P. Burnap, "Arabic Event Detection in Social Media," in LNCS, vol. 9041, 2015, pp. 384–401.
[25] N. Alsaedi, P. Burnap, and O. Rana, "Can We Predict a Riot? Disruptive Event Detection Using Twitter," ACM Trans. Internet Technol., vol. 17, no. 2, p. 18, 2017.
[26] W. Alabbas, M. Haider, A. Mansour, G. Epiphaniou, and I. Frommholz, "Classification of Colloquial Arabic Tweets in real-time to detect high-risk floods," Soc. Media, Wearable Web Anal. (Social Media), 2017 Int. Conf. IEEE., 2017.
[27] E. Alomari and R. Mehmood, "Analysis of tweets in Arabic language for detection of road traffic conditions," in Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST, vol. 224, 2018, pp. 98–110.
[28] E. Alomari, R. Mehmood, and I. Katib, "Sentiment Analysis of Arabic Tweets for Road Traffic Congestion and Event Detection," in R. Mehmood, S. See, I. Katib, and I. Chlamtac, Eds., Smart Infrastructure and Applications: Foundations for Smarter Cities and Societies, Springer, 2019, to appear.
[29] A. Salas, P. Georgakis, C. Nwagboso, A. Ammari, and I. Petalas, "Traffic Event Detection Framework Using Social Media," in IEEE International Conference on Smart Grid and Smart Cities, 2017, no. July, p. 5.
[30] S. Suma, R. Mehmood, N. Albugami, I. Katib, and A. Albeshri, "Enabling Next Generation Logistics and Planning for Smarter Societies," Procedia Comput. Sci., pp. 1–6, 2017.
[31] S. Suma, R. Mehmood, and A. Albeshri, "Automatic Event Detection in Smart Cities Using Big Data Analytics," in International Conference on Smart Cities, Infrastructure, Technologies and Applications, 2017, pp. 111–122.
[32] E. Loper and S. Bird, "NLTK: The Natural Language Toolkit," arXiv preprint cs/0205028, 2002.