A Framework For Social Media Data Analytics Using Elasticsearch and Kibana
https://doi.org/10.1007/s11276-018-01896-2
Abstract
Real-time online data processing is quickly becoming an essential tool in the analysis of social media for political trends, advertising, public health awareness programs and policy making. Traditionally, processes associated with offline analysis are productive and efficient only when the data collection is a one-time process. Currently, cutting-edge research requires real-time data analysis that comes with a set of challenges, particularly the efficiency of continuous data fetching within the context of present NoSQL and relational databases. In this paper, we demonstrate a solution to effectively address the challenges of real-time analysis using a configurable Elasticsearch search engine. We use a distributed database architecture, pre-built indexing and a standardized Elasticsearch framework for large-scale text mining. The results from the query engine are visualized in almost real-time.
1180 Wireless Networks (2022) 28:1179–1187
Tools that support the management of large data sets and real-time data fetching include relational (MySQL, Oracle Database, SQLite), graph (Neo4j, Oracle Spatial) and NoSQL (MongoDB, IBM Domino, Apache CouchDB) databases. A limiting factor common to all of these database types is the lack of support for full-text searches in real-time. While NoSQL is functional for full-text searching, it lacks reliability when compared to relational database models [3]. Traditional databases require that the data is first uploaded and that the administrator then actively decides which data should be indexed, which adds one more layer of processing, making them infeasible for real-time analysis. Elasticsearch provides a solution to these limiting factors [3] by providing a highly efficient data fetching and real-time analysis system that:

• Performs pre-indexing before storing the data to avoid the need to fetch and query specific data in real-time;
• Requires limited resources and computing power in relation to traditional solutions; and
• Provides a system that is distributed and easy to scale.

The capacity for Elasticsearch to contribute to high-efficiency, real-time data analysis is enhanced through a standardized configuration process, shard size management and standardization of the data before upload into Elasticsearch, and is demonstrated through a discussion of the working architecture as well as a real-time visualization of social media data collected between December 2017 and May 2018, a repository of over 1 billion twitter data points.

1.1 Key contributions

• Optimizing and standardizing twitter data for Elasticsearch
• Creating a configuration file and choosing the optimal shard size
• Demonstrating the real-time visualization of a very large scale social media data set

2 Architecture for real-time analysis and storage

2.1 Elasticsearch

Elasticsearch was started in the year 2004 as an open source project called Compass, which was based on Apache Lucene [11]. Elasticsearch is a distributed and scalable full-text search engine written in Java that is stable and platform independent. These features, combined with requirement-specific flexibility and easy expansion options, are helpful for real-time big data analysis [12]. We will discuss some of the general functions of Elasticsearch to provide context for the Elasticsearch configuration, data standardization and shard management procedure resulting from this research.

2.2 Abstract view

Figure 1 illustrates the framework for real-time analysis of very large scale data based on Elasticsearch and Kibana [13]. In the first step, the Twitter API is used for scraping twitter data (approximately 1400 tweets per minute), which is stored in a MongoDB database installed on a Network Attached Storage (NAS) with a capacity of 16 TB. The twitter data is transferred to preprocessing units, which handle the data and transfer it to High Performance Computing (HPC) infrastructure in almost real-time. As traditional databases, including MongoDB, are not efficient enough to handle real-time queries, we transfer the processing and analysis of data to Elasticsearch, which is implemented via HPC lab resources. Before uploading the data, we standardize the twitter object for Elasticsearch and use multithreading to upload the data for better real-time performance and to shorten the gap between receiving and processing data. When a user needs any data, a query is sent to Elasticsearch using the Kibana front-end. Elasticsearch processes that query and sends the query result object (JSON format) to Kibana, which shows the result to the user.

Within the general functioning of the search engine, Elasticsearch uses a running instance called a node, which can take on one or more roles, including master or data node (see Sect. 2.1, Fig. 2). Dataset clusters within Elasticsearch require at least one master and one data node; however, it is possible for a cluster to consist of a single node, since a node may take on multiple roles. The only data storage format compatible with Elasticsearch is JSON, which therefore requires data mapping to produce functional analysis and visualizations, due to the unstructured format of the twitter data. We observed that reliance on the JSON format makes the system more flexible than MySQL and other RDBMS, but less flexible than MongoDB. While a traditional database such as an RDBMS uses tables to store the data, MongoDB uses the BSON (JSON-like) format, and Elasticsearch uses an inverted index via the Apache Lucene architecture to store the data [11]. A typical index in Elasticsearch is a collection of documents with different properties that have been organized through user-defined mapping that outlines document types and fields for different data sources, similar to a table in an SQL database. The index is then split into shards housed in multiple nodes, where a shard is a part of an index distributed on different nodes. Within the Elasticsearch framework, the inverted index allows a more categorical storage of big data sets
within nodes and shards so that real-time search queries are more efficient. Elasticsearch uses a RESTful API to communicate with users; see Table 1 for a basic architecture comparison. Additionally, there are different libraries, such as the Elasticsearch clients in Python [14] and Java [15], for better integration.

Table 1 Comparison between Elasticsearch and RDBMS basic architecture

Elasticsearch    RDBMS
Index            Database
Mapping          Table
Document         Tuple

2.2.1 Backbone

While Elasticsearch is a powerful tool, a model is required to optimize functionality for the purpose of real-time big data analysis specific to social media. The purpose of this research is to provide (1) a specific configuration file to optimize the organization of the data set, (2) an optimized shard size for maximum efficiency in storage and processing, and (3) a standardized structure for data fields present within Twitter to eliminate over-processing of irrelevant information. When the data is stored in Elasticsearch, it stores the data in an index first, and then the index data is stored as an inverted index using an automatic tokenizer. When we search in Elasticsearch, we get a 'snapshot' of the data, which means that Elasticsearch does not require the hosting of actual content but instead links to documents stored within a node to provide a result through
the inverted index. These results are not real data but a representation of the query's linkages to all associated documents stored in each node. As a component of this project, the following configuration file was developed and can be replicated in Elasticsearch on any HPC by editing the config files as per the number of nodes and the capacity of the server. Table 2 describes the basic configuration file for Elasticsearch.

Here, the name of the cluster is dslab, and a cluster name is necessary even if only a single node is present. As Elasticsearch is a distributed database, where one or more nodes work as masters and the others as data nodes, this parameter is used to interconnect all the nodes in the cluster. We can create numerous clusters on the same hardware using different instances of Elasticsearch and different configuration files.

Table 3 is an example of the configuration file features for any Elasticsearch node. In every node of the distributed Elasticsearch we have to configure the same file in each and every instance. When the data is stored, we use the index to store a specific type of data, similar to a dataset in MySQL. The performance of Elasticsearch is based on the mapping of the index and how we size the shards of the data set. The formula to decide the number of shards is given in Eq. 1.

Number of shards = (Size of index in GB) / 50    (1)

The reason behind using 50 GB as the shard size is the architecture of Elasticsearch. The architecture supports a 32 GB index size and 32 GB of cache memory, so ideally the shard's memory should be less than 64 GB, and through experimentation we observed that the best results are achieved at a shard size of 50 GB.

2.3 Kibana: visualization

In addition to Elasticsearch being efficient for real-time analysis, extended plugins such as Kibana [13] and Logstash [16] make it convenient to produce functional representations of big data in real-time. Kibana is part of the Elastic Stack and is freely available under an open source license. It has multiple standard visualizations available by default and simplifies the process of developing visualizations for end users with a drag-and-drop feature. As Kibana is backed by the Elasticsearch architecture, it functions quickly and is efficient enough for real-time analysis. Finally, it provides the opportunity for graphical interaction in the process of building and handling queries, with an accessible visualization of the cluster health and properties within the database.

3 Social media data analysis

3.1 Configuration of the Elasticsearch

Live social media streaming data is stored in elastic clusters. Each elastic cluster contains 6 nodes, with each node having 2 threads and 12 GB of memory. Within these 6 nodes, one node works as a master and the remaining 5 work as data nodes. The architecture of the elastic cluster is shown in Fig. 2.

3.2 Social media dataset

We used Elasticsearch to analyze 250+ million out of 1 billion tweets scraped between December 2017 and May 2018 using the Twitter API. Since the Twitter API response is in JSON format and contains unstructured and inconsistent data, the sequential collection of all data fields within the tweet JSON object is not guaranteed. Standardization of the data and conversion into a structured format is therefore necessary for Elasticsearch mapping, so that each field of data is present when loaded into the index. To optimize Elasticsearch we changed the storage format of the tweet so that all the data is required to
be at depth level one in JSON format. Table 4 depicts a basic example of restructured data in Elasticsearch.

Table 4 Difference between the normal and updated structure

Original tweet structure:

{
  "Tweet": {
    "User": {
      "Id": ...,
      "Name": ...
    }
  },
  ...
}

Updated structure:

{
  "Id": ...,
  "Name": ...,
  ...
}

As we mentioned previously, the data is stored as an inverted index that is optimized for text searches and is therefore very efficient. For example, if we search for the keyword "pizza" within the context of all tweets (250+ million) in Elasticsearch, it takes 4060 ms (4.06 s) to find a total of 192,118 tweets where the "pizza" keyword is present in the tweet text. Table 5 shows an example of the "pizza" text search query response from Elasticsearch.

Table 5 Search query result of the "pizza" keyword across all tweets in the database

{
  "took": 4060,
  "timed_out": false,
  "_shards": {
    "total": 106,
    "successful": 106,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 192118,
    "max_score": 15.110959,
    "hits": [...]
  }
}

Figure 3a shows a pie chart mapping the geographical distribution by nation of "pizza" tweets, where the United States alone is responsible for 47% of total tweets and countries outside the top five account for 30%, together making up 77% of total tweets. Additionally, the visualization shows that the time taken to perform the query is 13 ms (0.013 s). Figure 3b shows the five most used languages in tweet text related to "pizza": English is used in more than 77% of tweets, Spanish in 12%, Portuguese in third place with 6%, French at 3% and Japanese at 2%. In this instance Elasticsearch took 17 ms for query processing. Figure 3c shows the devices used to tweet, with 38% of tweets coming from the iPhone twitter app; the Android twitter app was used for 29%, twitter web clients for only 11%, and Twitter Lite and TweetDeck combined for around 7%. Other sources account for the remaining 15% of tweets. This query took 11 ms to execute, which is quite reasonable given the structure and amount of data.

The above results demonstrate the efficiency of this data analysis system: all three tasks (fetching the data, performing descriptive analysis and creating graphs) were accomplished in less than 15 s on a database of 250+ million tweets. Clearly, this framework has proven to be efficient for real-time analysis at this scale.
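The depth-one restructuring in Table 4 can be sketched as a generic flattening pass, and the "pizza" search in Table 5 as a standard query DSL body. This is a minimal illustration, not the authors' actual code: the underscore-joined key convention and the `Text` field name are our assumptions, since the paper only shows that nested fields such as the user's Id and Name are promoted to the top level.

```python
def flatten(obj, parent_key="", sep="_"):
    """Recursively flatten a nested tweet object so that every field sits
    at depth level one, as Table 4 requires. Nested keys are joined with
    `sep` to keep the flattened names unique (our convention; the paper
    does not specify how name collisions are avoided)."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

# A hypothetical body for the "pizza" keyword search of Table 5, using the
# standard match query of the Elasticsearch query DSL; the field name
# "Text" is an assumption about the flattened schema.
pizza_query = {"query": {"match": {"Text": "pizza"}}}
# Against a live cluster this would be sent with the Python client [14], e.g.:
#   es.search(index="tweets", body=pizza_query)
```

For the nested example in Table 4, `flatten({"Tweet": {"User": {"Id": 1}}})` yields `{"Tweet_User_Id": 1}`, which can then be bulk-uploaded under a flat mapping.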
123
1184 Wireless Networks (2022) 28:1179–1187
3.4 Limitation
4 Related work
Fig. 4 Partial view of the Kibana dashboard for the twitter analysis
Currently there are very few research studies on frameworks for big data analysis in real-time, although several discuss the application of such practices in manufacturing [20] and gene coding [21]. Some researchers have used an Elasticsearch cluster via a Logstash plugin and MySQL databases for a heterogeneous accounting information system [22]. The data is monitored using a MySQL server before being inserted into Elasticsearch. The researchers observed that there might be an issue of duplication of data and storage space, but the architecture ensures flexibility and modularity for monitoring the system. They chose Elasticsearch as a real-time text search engine, which allows them to search historical data. The Mayo Clinic healthcare system developed a big data hybrid system using Hadoop and Elasticsearch technology. In healthcare, real-time results are essential for effective decision making. Previously, they used a traditional RDBMS to store and process data, but it lacked integration between different platforms and the ability to query and ingest healthcare data in real-time or near real-time. In the Mayo Clinic system, Hadoop is used as a distributed file system and, on top of it, Elasticsearch works as a real-time text search engine. When there is a need for raw data Hadoop is used, and for real-time analysis Elasticsearch is used. Their experimentation showed very promising results: for example, searching 25.2 million HL7 records took just 0.21 s [23].

The DesignSafe web portal by Natural Hazards Engineering Research (NHER) analyzes and shares experimental data in real-time with researchers across the world. The users of their system send large amounts of data, which are stored in a distributed NFS. During the preprocessing of the data, which includes string analysis and basic cleaning, they index the data and make it compatible with Elasticsearch. This model allows users in different locations to query, in real-time, the same experimental data computed in different parts of the world. All of these environments need to be correctly configured as per the data and the requirements [24].

5 Conclusion

Elasticsearch provides a functional system to store, pre-index, search and query very large scale data in real-time. In particular, the capability of expanding the cluster size without stopping service, as per the user's requirements, makes it suitable for this application. This research provides insights on how to standardize and configure the processes of Elasticsearch, which results in increased analysis efficiency. To demonstrate the functionality and interactivity for users, the Kibana plugin was used as an interface. In conclusion, a proper configuration of Elasticsearch and Kibana makes real-time analysis of large scale data efficient and can help policy makers see the results instantaneously, in an accessible format that allows for decision making.

Acknowledgements This research is funded by the NSERC Discovery Grant; computing resources are provided by the High Performance Computing (HPC) Lab and Department of Computer Science
at Lakehead University, Canada. The authors are grateful to Gaurav Sharma for initially setting up the data collection stream, Salimur Choudhury for providing insight on the data analysis, and Andrew Heppner for reviewing and editing drafts.

References

1. Cervellini, P., Menezes, A. G., & Mago, V. K. (2016). Finding trendsetters on yelp dataset. In 2016 IEEE symposium series on computational intelligence (SSCI) (pp. 1–7). IEEE.
2. Belyi, E., Giabbanelli, P. J., Patel, I., Balabhadrapathruni, N. H., Abdallah, A. B., Hameed, W., et al. (2016). Combining association rule mining and network analysis for pharmacosurveillance. The Journal of Supercomputing, 72(5), 2014–2034.
3. Kononenko, O., Baysal, O., Holmes, R., & Godfrey, M. W. (2014). Mining modern repositories with Elasticsearch. In Proceedings of the 11th working conference on mining software repositories (pp. 328–331). ACM.
4. Liu, Q., Kumar, S., & Mago, V. (2017). Safernet: Safe transportation routing in the era of internet of vehicles and mobile crowd sensing. In 2017 14th IEEE annual consumer communications and networking conference (CCNC) (pp. 299–304). IEEE.
5. Kim, M. G., & Koh, J. H. (2016). Recent research trends for geospatial information explored by twitter data. Spatial Information Research, 24(2), 65–73.
6. Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A., & Buyya, R. (2015). Big data computing and clouds: Trends and future directions. Journal of Parallel and Distributed Computing, 79, 3–15.
7. Bösch, C., Hartel, P., Jonker, W., & Peter, A. (2014). A survey of provably secure searchable encryption. ACM Computing Surveys, 47(2), 18:1–18:51. https://doi.org/10.1145/2636328.
8. Kumar, P., Kumar, P., Zaidi, N., & Rathore, V. S. (2018). Analysis and comparative exploration of elastic search, Mongodb and Hadoop big data processing. In Soft computing: Theories and applications (pp. 605–615). New York: Springer.
9. Cea, D., Nin, J., Tous, R., Torres, J., & Ayguadé, E. (2014). Towards the cloudification of the social networks analytics. In Modeling decisions for artificial intelligence (pp. 192–203). New York: Springer.
10. Bai, J. (2013). Feasibility analysis of big log data real time search based on hbase and elasticsearch. In 2013 ninth international conference on natural computation (ICNC) (pp. 1166–1170). IEEE.
11. Elasticsearch-elastic.co. Retrieved April 30, 2018, from https://www.elastic.co/guide/en/elasticsearch/reference/6.2/index.html.
12. Gormley, C., & Tong, Z. (2015). Elasticsearch: The definitive guide: A distributed real-time search and analytics engine. Sebastopol: O'Reilly Media, Inc.
13. Your Window into the Elastic Stack. Retrieved April 30, 2018, from https://www.elastic.co/products/kibana.
14. Python Elasticsearch Client. Retrieved April 30, 2018, from https://elasticsearch-py.readthedocs.io/en/master/.
15. Java Elasticsearch library-Elastic. Retrieved April 30, 2018, from https://www.elastic.co/guide/en/Elasticsearch/client/java-api/6.2/index.html.
16. Getting Started with Logstash. Retrieved April 30, 2018, from https://www.elastic.co/guide/en/logstash/current/getting-started-with-logstash.html.
17. Yang, F., Tschetter, E., Léauté, X., Ray, N., Merlino, G., & Ganguli, D. (2014). Druid: A real-time analytical data store. In Proceedings of the 2014 ACM SIGMOD international conference on management of data (pp. 157–168). ACM.
18. Burkitt, K. J., Dowling, E. G., & Branon, T. R. (2014). System and method for real-time processing, storage, indexing, and delivery of segmented video. US Patent 8,769,576.
19. Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The rise of big data on cloud computing: Review and open research issues. Information Systems, 47, 98–115.
20. Yang, H., Park, M., Cho, M., Song, M., & Kim, S. (2014). A system architecture for manufacturing process analysis based on big data and process mining techniques. In 2014 IEEE international conference on big data (pp. 1024–1029). IEEE.
21. Stelzer, G., Plaschkes, I., Oz-Levi, D., Alkelai, A., Olender, T., Zimmerman, S., et al. (2016). Varelect: The phenotype-based variation prioritizer of the genecards suite. BMC Genomics, 17(2), 444.
22. Bagnasco, S., Berzano, D., Guarise, A., Lusso, S., Masera, M., & Vallero, S. (2015). Monitoring of IAAS and scientific applications on the cloud using the elasticsearch ecosystem. In Journal of physics: Conference series (Vol. 608, p. 012016). Bristol: IOP Publishing.
23. Chen, D., Chen, Y., Brownlow, B. N., Kanjamala, P. P., Arredondo, C. A. G., Radspinner, B. L., et al. (2017). Real-time or near real-time persisting daily healthcare data into hdfs and elasticsearch index inside a big data platform. IEEE Transactions on Industrial Informatics, 13(2), 595–606.
24. Coronel, J. B., & Mock, S. (2017). Designsafe: Using elasticsearch to share and search data on a science web portal. In Proceedings of the practice and experience in advanced research computing 2017 on sustainability, success and impact (p. 25). ACM.

Neel Shah is a graduate student at Lakehead University, Canada. Currently, he is working on analyzing social media data to gain insight into Canadian healthy behaviours. He is an active open source coder and maintains two open-source Python libraries. His core areas of interest are deep learning and data science.

Darryl Willick received the B.Sc. (1988) and M.Sc. (1990) degrees in Computational Science from the University of Saskatchewan, Canada. Throughout his career he has worked in the areas of High Performance Computing, Visualization, System Administration, and Cyber Security. Currently he is a Technology Security Specialist/HPCC Analyst at Lakehead University, Canada.

Vijay Mago is an Associate Professor in the Department of Computer Science at Lakehead University in Ontario, Canada, where he teaches and conducts research in areas including big data analytics, machine learning, natural language processing, artificial intelligence, medical decision making and Bayesian intelligence. He received his Ph.D. in Computer Science from Panjab University, India in 2010. In 2011 he joined the Modelling of Complex Social Systems program at the IRMACS Centre of Simon Fraser University. He has served on the program committees of many international conferences and workshops. In 2017, he joined the Technical Investment Strategy Advisory Committee for Compute Ontario. He has published extensively (more than 50 peer-reviewed articles) on new methodologies based on soft computing and artificial intelligence techniques to tackle complex systemic problems such as homelessness, obesity, and crime. He currently serves as an associate editor for IEEE Access and BMC Medical Informatics and Decision Making and as co-editor for the Journal of Intelligent Systems.