Modelling and Simulation of ElasticSearch Using CloudSim Final

This document summarizes a research paper that models and simulates the ElasticSearch distributed search engine using the CloudSim simulation framework. The authors extend CloudSim to model the key components of a search engine - crawling, indexing, and query processing. They develop a simulation based on a real ElasticSearch deployment at Linknovate.com. An experimental evaluation compares the simulated and actual query response times, precision, and resource utilization, finding the simulation can accurately predict performance at different scales in a precise and efficient manner. The results can help ElasticSearch users manage scalability and infrastructure requirements.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/337928637

Modelling and Simulation of ElasticSearch using CloudSim

Conference Paper · December 2019


DOI: 10.1109/DS-RT47707.2019.8958653


All content following this page was uploaded by Malika Bendechache on 14 December 2019.



Modelling and Simulation of ElasticSearch using CloudSim

Malika Bendechache∗, Sergej Svorobej∗, Patricia Takako Endo∗‡, Manuel Noya Mario†, M. Eduardo Ares†, James Byrne∗, Theo Lynn∗

∗ Dublin City University (DCU), Dublin, Ireland
Email: {malika.bendechache, sergej.svorobej, theo.lynn}@dcu.ie
† Linknovate, Santiago de Compostela, Spain
Email: [email protected]
‡ Universidade de Pernambuco, Recife, Brazil
Email: [email protected]

Abstract—Simulation can be a powerful technique for evaluating the performance of large-scale cloud computing services in a relatively low cost, low risk and time-sensitive manner. Large-scale data indexing, distribution and management is complex to analyse in a timely manner. In this paper, we extend the CloudSim cloud simulation framework to model and simulate a distributed search engine architecture and its workload characteristics. To test the simulation framework, we develop a model based on a real-world ElasticSearch deployment on Linknovate.com. An experimental evaluation of the framework, comparing simulated and actual query response time, precision and resource utilisation, suggests that the proposed framework is capable of predicting performance at different scales in a precise, accurate and efficient manner. The results can assist ElasticSearch users to manage their scalability and infrastructure requirements.

Keywords—ElasticSearch, CloudSim, Simulation, Cloud, Workload, Query, Modelling, search engine.

I. INTRODUCTION

Search engines are a complex two-sided network connecting billions of queries with billions of pages. Search engines are the most common method for consumers to source information on the Internet; in the UK, search engines are used by 94% of adult Internet users, by far the most popular source for information search [1]. In January 2019, nearly 10 billion search queries were processed by Google in the US alone [2]. Search result delays can lead to user frustration and result in loss of revenue [3]. The ubiquity and ease of use of search engines belie a deep layer of computational complexity to return relevant search results in fractions of a second. Search engine providers rely on the efficient provision, scaling, and optimisation of distributed compute infrastructure at hyper-scale to meet increasingly complex search functionality within tightening service constraints [4].

ElasticSearch (ES) [5] is a popular open source search engine designed to be distributed, scalable, and capable of near real-time information retrieval [6]. With large and hyper-scale systems, it is not always feasible to emulate real production environments due to the high cost of accessing large clusters of computers, the downside risk of interfering with system performance, and logistical issues related to testing new algorithms with no access to actual online query traffic. Simulation can be a powerful technique for evaluating the performance of large-scale cloud computing services in a relatively low cost, low risk and time-sensitive manner [7], [8].

Search engines comprise three major architectural components: web crawling, indexing, and query processing, all of which contribute to the scalability and efficiency of online search engines [4]. To accurately depict realistic search engine system behaviour, a simulation model must, therefore, consider (a) a realistic system workload in the form of queries submitted by users to the search engine, (b) a virtual resource provisioning allocation agnostic of the complexities of the underlying data centre hardware, and (c) data flow logic similar to that of the actual system implementation under examination.

In this paper, we model and simulate a search engine using Discrete Event Simulation (DES). To do so, we extend CloudSim [9], [10] with our simulation model and compare it with KPI (Key Performance Indicator) traces collected from a live ElasticSearch cluster deployed in a public cloud infrastructure by Linknovate.com. Our simulation framework supports a number of features that can help in search engine based system deployment and provisioning decisions:

• Modelling and simulation of a distributed data flow with a hierarchical architecture;
• Custom policy implementation for distributing workload in the hierarchical architecture;
• Synchronous communication between search engine components for data aggregation; and
• Flexible modelling that can be easily adapted to integrate with other CloudSim extensions.

The remainder of this paper is organised as follows. Section II introduces discrete event simulation, the CloudSim architecture, and the ES Search Engine. Section III summarises our modelling approach for a search engine. Section IV presents our use case based on Linknovate.com’s deployment of ES. Section V outlines our methodology and Section VI presents and discusses the simulation results. Section VII briefly summarises selected related work. The paper concludes with a summary of the paper and a discussion of future work in Section VIII.
II. BACKGROUND

A. Discrete Event Simulation: CloudSim

Discrete Event Simulation (DES) is a system modelling concept wherein the operation of a system is modelled as a chronological sequence of events [11]. DES-based decision support processes can be divided into three main phases: modelling, simulation, and finally results analysis. During the modelling phase, a simulated system is defined by grouping interacting entities that serve a particular purpose together into a model. Once the representative system models are created, the simulation engine orchestrates a time-based event queue, where each event is admitted to the defined system model in sequence. An event represents actions happening in the system during operation time. Depending on the event type, the system reaction is simulated, and associated metrics captured. These metrics are collected at the end of the simulation for results analysis. Therefore, the system behaviour can be examined under different conditions. Using DES is beneficial in a complex, large scale, non-deterministic system environment where the system definition using mathematical equations may no longer be a feasible option [12].

There are many different DES frameworks that have been developed specifically for cloud computing and that provide a range of useful modelling features [13]. For this study we have chosen the CloudSim modelling framework, the most popular cloud computing simulation framework, due to its proven capability in simulating different cloud topologies and application architectures and its appropriateness for the scale envisioned in the use case [14]. CloudSim has two key features relevant to our paper. Firstly, it has a virtualisation engine that aids in the creation and management of multiple, independent, and co-hosted virtualised services on data centre nodes. Secondly, it is flexible and allows one to switch between space-shared and time-shared allocation of processing cores to virtualised services. These features speed up the development of new application provisioning algorithms for cloud computing [15].

CloudSim comprises two layers: (i) the Simulation layer provides support for modelling and simulation of virtualised Cloud-based data centre environments [15], and (ii) the User Code layer exposes basic entities for hosts, applications, VMs, number of users, application types, and broker scheduling policies [15]. Given its popularity, CloudSim has been extended significantly since the first version (e.g., [16]–[19]). In particular, CloudSimPlus [20] improved several engineering aspects, such as code maintainability, reusability and extensibility, thereby enabling greater accuracy, usage simplicity, and extension facility. As such, we make use of CloudSimPlus in this paper.

B. ElasticSearch

ElasticSearch (ES) is an open source search engine which can provide distributed and real time searching capabilities. It is known as a document database for implementing Apache Lucene [21] as a back-end for document parsing and structuring [22]. ES has the following features:

• Distributed: Indices can be divided into shards (a chunk of data), with each shard capable of having any number of replicas. Routing and re-balancing operations are done automatically when new documents are added.
• High Availability: ES can form a cluster containing multiple copies of distributed shards providing error-resilient data storage. If any error is detected, it will automatically remove the failed nodes and re-organise itself to make sure data is safe and accessible.
• Full Text Search: It provides full query-based search capabilities using the Lucene information retrieval library.
• Document Oriented: ES uses a NoSQL database to store data or documents as objects in JSON format. All documents are indexed by default, thus providing results at very fast speeds.
• Schema-free: ES automatically detects the data structure and data types, and indexes the data accordingly. Users can also define their own mapping and change it if required. ES provides automatic conflict resolution by versioning any changes within stored documents.
• Scalability: The ES server can start with a single node and can be scaled horizontally depending upon concrete requirements. More nodes can be added to the cluster dynamically if more capacity is needed.

III. SIMULATING A SEARCH ENGINE IN A PUBLIC CLOUD

In order to simulate a workload in the ES search engine, we model it using the CloudSimPlus DES framework. Any task or event occurring in CloudSim is defined by a cloudlet which represents a submitted job. Therefore, in order to simulate the ES workload, we model each query as a set of cloudlets flowing through the nodes in the system (see Figure 2). Our simulation sets a parameter (# DN/Q) that represents the number of data nodes used to serve a query.

Providing that a node has the capacity to run a cloudlet, the execution time of the cloudlet is based on: (i) the total computational budget required by the cloudlet, (ii) the CPU of the VM, (iii) the number of cores the cloudlet is able to use in parallel, and (iv) the amount of CPU and RAM instructions the cloudlet is able to use at any given time.

The ES search engine has a distributed architecture with specific characteristics of parallel request processing and aggregation behaviour. Existing CloudSim models for load distribution only support sending cloudlets from one sender to one destination VM at pre-defined times, creating a one-to-one mapping between a cloudlet and a processing VM. To simulate the search engine behaviour, we extended the CloudSim framework to take into account the distributed system behaviour of both the ES architecture and the workload. The ES workload is distributed based on different criteria (e.g., the data shard distribution, the frequency of access to a particular type of data that resides in a particular node in the system). This leads to different workloads on the data nodes. In our simulation, we present the workload as a probability distribution of the number of cloudlets running on every data node.
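The probabilistic workload distribution described above can be illustrated with a minimal Python sketch. It assumes that the data nodes serving a query are drawn without replacement, each draw weighted by the node's share of the workload. The function names `pick_data_nodes` and `count_cloudlets` and the sequential-draw procedure are illustrative assumptions, not the authors' CloudSim extension; the example weights are the per-node probabilities reported for the Linknovate.com use case in Section IV.

```python
import random

def pick_data_nodes(weights, k, rng=random):
    """Draw k distinct data-node indices, each draw weighted by `weights`.

    Hypothetical helper: sequential weighted sampling without replacement
    is an assumption made for illustration only.
    """
    if not 0 < k <= len(weights):
        raise ValueError("k must be between 1 and the number of data nodes")
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        # Weighted draw among the nodes not yet selected for this query.
        node = rng.choices(remaining, weights=[weights[i] for i in remaining], k=1)[0]
        chosen.append(node)
        remaining.remove(node)
    return chosen

# Per-node workload probabilities reported in Section IV (six data nodes).
NODE_WEIGHTS = [0.13, 0.14, 0.16, 0.16, 0.18, 0.23]

def count_cloudlets(num_queries, nodes_per_query, seed=0):
    """Count how many data-node cloudlets land on each node."""
    rng = random.Random(seed)
    counts = [0] * len(NODE_WEIGHTS)
    for _ in range(num_queries):
        for node in pick_data_nodes(NODE_WEIGHTS, nodes_per_query, rng):
            counts[node] += 1
    return counts
```

Calling `count_cloudlets(1000, 2)` corresponds to 1000 queries with the parameter # DN/Q set to 2; nodes with larger weights should receive more cloudlets on average.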
The following modelling functionalities were added to CloudSim:

• A single user query consists of multiple cloudlets which can be processed in parallel or sequentially by available VMs.
• The ability to send cloudlets from one sender to several receiving VMs at the desired time (one-to-many).
• A synchronisation point in the model, where a defined search engine application component (parent) generates multiple cloudlets (children) and waits for their execution to finish before continuing to process a user query.
• Allowing simulation users to design their own rules for workload distribution within cloud deployed distributed data systems. In this work, we simulate the workload distribution of an ES node as per the use case studied in Section IV.

The CPU and the RAM usages are calculated automatically by the CloudSim simulator. However, the default CloudSim only logs/collects the evolution of the CPU usage of the VMs. CloudSim takes as parameters how much CPU and RAM are consumed by a cloudlet and, based on these parameters, calculates how long the cloudlet runs. If many cloudlets are running at the same time on a VM, CloudSim sums their CPU and RAM consumption for this VM. Every time there is a change in CPU or RAM consumption, CloudSim will create an event and save it in a log. CloudSim has different configurations for the amount of CPU and RAM a cloudlet is able to use at a time (i.e. full, absolute or relative amount). Full corresponds to a full CPU utilisation by the cloudlet; absolute corresponds to a fixed CPU or RAM amount defined by the user; and relative refers to a percentage usage of the CPU or RAM by the cloudlet. In our simulation, the amount of CPU and RAM instructions used by a cloudlet is set to absolute.

IV. USE CASE: LINKNOVATE.COM

In order to show the results of our search engine simulation, we took a real use case of the ES search engine deployment at Linknovate.com. Founded in 2012 in Spain, Linknovate.com provides a business intelligence service to its clients. Linknovate.com clients primarily access competitive intelligence through a discovery engine deployed on the Microsoft Azure cloud, as opposed to a classic search engine. Advanced data processing at large scale is one of Linknovate.com’s core activities. By deploying ES, Linknovate.com harvests metadata (e.g., author names, affiliation, abstract etc.) from multiple sources, not just publications (Elsevier's Scopus) or patent analysis (Thomson Reuters) but also more up-to-date sources like conference proceedings, presentations, grants (e.g. CORDIS), specialised blogs (e.g. Clean Technica) and specialised outlets (e.g. MIT Tech Review).

Linknovate.com manages vast amounts of information throughout different offline and online layers. The offline layer, Data Acquisition, comprises several pre-processing components working in parallel over raw data to homogenise structure and identify entities and semantic relations. The online layer, Processing and Indexing, is done over a virtual cluster of search nodes based on ES. Finally, the Web and Search layer is where user queries execute several internal queries over Linknovate.com indices, retrieving the data to be shown in the User Interface (UI). User queries are received by the virtual nginx web server that also renders the results pages (see Figure 1). In this paper, we are focusing on simulating the online virtual layers (web/search and ES cluster) of the Linknovate.com search engine. Figure 1 represents an overview of the virtualised layer of the Linknovate.com architecture.

Fig. 1. An overview of the Linknovate.com online architecture.

The deployed search service stack of Linknovate.com consists of a web server where the users input their queries and an ES cluster which is responsible for the search and returning the response to the user query. The ES cluster consists of an ES node and data nodes as shown in Figure 1.

The ES node is responsible for: (i) passing and distributing the queries among the data nodes; (ii) coordinating and aggregating the search results of different data nodes; (iii) returning the query result to the web server which in turn will return it to the user. The data nodes are responsible for storing and processing old and fresh data.

Our simulation model reflects the behaviour of the real ES-based system deployed in a public cloud. As we can see from Figure 2, when a query is launched, a set of cloudlets is generated and executed in a sequential manner; the first cloudlet is executed at the web server, then the second cloudlet is executed at the ES node. From the ES node, a set of cloudlets (whose size is less than or equal to the number of data nodes) is distributed and executed at the data nodes. Afterwards, another cloudlet is executed at the ES node to merge the partial results coming back from the data nodes. Finally, a last cloudlet goes from the ES node to the web server as a response to the user query. Therefore, the total number of cloudlets that our simulation generates in order to model a
query load is $\mathit{Cloudlet}_{no} = 4 + n$, where $n$ is the number of data nodes queried by ES.

Fig. 2. Modelling the workload in ES using CloudSim

In terms of number of messages exchanged between nodes in the ES architecture, the total number of messages is represented by the total arrows in Figure 2. We have in total $\mathit{Messages}_{no} = 4 + 2n$ messages, where $n$ is the total number of data nodes queried by ES. The number of messages depends on the synchronisation at the ES node, where the ES node waits to receive all the messages from all data nodes in order to proceed with the aggregation (see Figure 2).

Query response time is calculated as follows:

$$\mathit{ResponseTime} = \mathit{Time}(\mathit{EndOfLastCloudlet}) - \mathit{Time}(\mathit{QueryArrival}) \tag{1}$$

where $\mathit{EndOfLastCloudlet}$ corresponds to the final cloudlet at the web server that returns the query result to the user, and $\mathit{Time}(\mathit{QueryArrival})$ corresponds to the time the query arrives at the system (web server).

We can also calculate the Effective Time, which is the time spent doing computations (sum of cloudlet execution times), without considering the wasted time (networking and waiting times). Note that the sum of cloudlet processing times is given by summing all the times for: cloudlets of the Web Server (WS) (two cloudlets), and ES node cloudlets (two cloudlets). Given that the data node cloudlets are running in parallel, we only sum the processing time of the longest (latest) data node (DN) cloudlet (see Figure 2). Therefore, as shown in Equation 2, we have a total sum of five cloudlet processing times.

$$\begin{aligned}
\mathit{EffectiveTime} = {} & \mathit{Time}(WS_{\mathit{QueryCloudlet}}) + \mathit{Time}(WS_{\mathit{ResultCloudlet}}) \\
& + \mathit{Time}(ES_{\mathit{QueryCloudlet}}) + \mathit{Time}(ES_{\mathit{ResultCloudlet}}) \\
& + \max_{i=1}^{n} \mathit{Time}(DN_{\mathit{Cloudlet}_i})
\end{aligned} \tag{2}$$

In CloudSim, every cloudlet $c$ is set to run for a predefined number of CPU units $U(c)$ (to be defined by the simulation operator) that reflects the weight of the task it is meant to represent. On the other hand, every CPU core is only capable of executing a maximum number of CPU units per second relative to its frequency and other performance characteristics. We assume in our simulation that machines are equipped with 3000 MHz CPU cores and we consider in our scenario that these CPU cores are able to run up to 3000 workload CPU units per second (i.e., one workload CPU unit per 1 MHz). If we set the CPU consumption of every cloudlet to 10% of the VM CPU capability (i.e., 300 CPU units per second), a VM would be able to fully execute a cloudlet within $U(c)/300$ seconds (assuming no preemption in the scheduling).

While we define the number of CPU units for all the cloudlets running in the WS and ES as 30, we assign the number of CPU units for the cloudlets in the data nodes dynamically based on the real workload that we have obtained from Linknovate.com. For every cloudlet $c$ that runs on a data node and belongs to a query $q$ with a query response time $RT(q)$, we define the number of CPU units for $c$ as shown in Equation 3. After computing the total CPU units required for the query $q$ over all the cloudlets, we take out the CPU units for the cloudlets on WS and ES, leaving only the CPU units for the data nodes. Note that after filtering, the Linknovate.com workload does not contain any query $q$ with a response time lower than 0.4 s.

$$U(c) = 300 \times RT(q) - 4 \times 30 \tag{3}$$

In addition to the CPU consumption, CloudSim requires the amount of RAM that will be utilised by every cloudlet throughout its execution to be provided. In our experiments, we allocate 200 MB to all the cloudlets.

As we can see in Figure 1, Linknovate.com consists of eight VM nodes in total: one web server VM, one ES VM, and six data node VMs.

TABLE I
LINKNOVATE.COM VM CHARACTERISTICS

Node        VM-ID  CPU (Cores)  RAM (GB)  STORAGE (GB)
Web-Server  VM0    8            28        1081
ES-Client   VM1    16           112       1081
DataNode1   VM2    8            28        1081
DataNode2   VM3    8            28        1081
DataNode3   VM4    8            28        1081
DataNode4   VM5    8            28        1081
DataNode5   VM6    8            28        1081
DataNode6   VM7    8            28        1081

Table I summarises the characteristics of the different nodes (VMs) forming the Linknovate.com topology.

In terms of workload distribution among the data nodes, our analysis of the six data nodes in the Linknovate.com architecture showed that they have different workload distributions (number of cloudlets run on each data node over a period of time) that follow these probabilities respectively: 0.13, 0.14, 0.16, 0.16, 0.18, and 0.23. These probabilities were calculated based on a real data set provided by Linknovate.com. See Section V for more details about the data set.
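The counting rules and Equations 1 to 3 of this section can be checked with a short Python sketch. The function names and any timings passed to them are hypothetical; only the formulas and the constants (30 CPU units for each WS and ES cloudlet, 300 CPU units per second) are taken from the text.

```python
WS_ES_UNITS = 30        # CPU units of each WS/ES cloudlet (from the text)
UNITS_PER_SECOND = 300  # 10% of a 3000 MHz core (from the text)

def cloudlet_count(n):
    """Cloudlets generated for one query served by n data nodes."""
    return 4 + n

def message_count(n):
    """Messages exchanged for one query served by n data nodes."""
    return 4 + 2 * n

def response_time(query_arrival, end_of_last_cloudlet):
    """Equation 1: end of the final WS cloudlet minus query arrival."""
    return end_of_last_cloudlet - query_arrival

def effective_time(ws_query, ws_result, es_query, es_result, dn_times):
    """Equation 2: four sequential cloudlets plus the slowest parallel DN cloudlet."""
    return ws_query + ws_result + es_query + es_result + max(dn_times)

def data_node_units(rt_seconds):
    """Equation 3: CPU units of a data-node cloudlet for a query with response time RT(q)."""
    return UNITS_PER_SECOND * rt_seconds - 4 * WS_ES_UNITS
```

For a query served by five data nodes, `cloudlet_count(5)` gives 9 cloudlets and `message_count(5)` gives 14 messages. Equation 3 yields zero CPU units at RT(q) = 0.4 s, since 300 × 0.4 = 4 × 30 = 120, so the 0.4 s lower bound on response times noted above keeps U(c) non-negative.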
V. METHODOLOGY

The goals of our simulation are: (a) to evaluate the query response time and resource consumption (CPU and memory) under different scenarios, and (b) to analyse the scalability of the simulator.

Linknovate.com deploys its infrastructure in a public cloud environment (i.e. Microsoft Azure). Therefore, in this simulation, we focus neither on the physical machines nor on the physical network topology.

To run our experiments, we used a real query data set provided by Linknovate.com. This data set is composed of 1185 queries submitted to the Linknovate.com search engine between 11:23:00 and 13:23:00 on June 07, 2018. We first pre-processed the query log file by excluding errors or invalid queries and kept only the valid search queries (OK queries with HTTP Status=200).

In order to analyse the CPU and RAM consumption of the Linknovate.com system, we explore four scenarios as described in Table II.

TABLE II
SCENARIOS STUDIED BASED ON NUMBER OF DATA NODES SERVING A QUERY.

Scenario  Node Distribution
I         2 data nodes/q
II        3 data nodes/q
III       4 data nodes/q
IV        5 data nodes/q

The scenarios are defined based on the number of data nodes assigned to serve incoming queries. Given a query q, the set of nodes used to process it is randomly chosen. Therefore we ran each scenario 30 times and calculated the average consumption across the 30 iterations for each VM. We also calculated the average of the average consumption of all the VMs in the system to reflect the CPU and RAM consumption of the whole system. The standard deviation of both CPU and RAM usage is also calculated.

VI. SIMULATION RESULTS

We modelled and simulated the Linknovate.com architecture as per Section III.

A. Response Time Results

We compare the simulated response time of a query against its actual time as collected from real system traces. We extracted a subset of 100 valid queries from the data set used. Figure 3 shows the comparison of actual and simulation times across the 100 queries. As one can see, the actual time and the simulation time are very close and they are highly positively correlated across all the 100 queries tested.

We evaluate the accuracy of our simulation model by computing the Pearson correlation [23] and the relative error ($e_r$) [24] between the simulation and the real system traces. The accuracy of the query response time metric achieved by our simulation model reported a Pearson correlation of 0.9996, a positive correlation between the values reported by the simulation model and the real system traces. We obtained a small relative error of 0.0354; this indicates how close the query response time computed by the simulation model ($\mathit{SimTime}$) is relative to the actual query response time ($\mathit{ActualTime}$). The relative error is computed as:

$$e_r = \frac{m}{\bar{x}}, \qquad m = \sqrt{\frac{\sum_{i=1}^{n} (\mathit{SimTime}_i - \mathit{ActualTime}_i)^2}{n}}$$

where $n$ is the number of queries (size of the sample) and $\bar{x}$ is the average of the actual time.

B. Response Time vs Query Traffic

We analyse the performance of the Linknovate.com system by running the simulation with different workloads (query traffic) to see how much traffic the Linknovate.com system can handle. We monitor the query response time while varying the number of queries per second received by the system.

Figure 4 represents a box plot (min, max, lower quartile, upper quartile) that shows the query response time based on the number of queries per second the system receives.

Figure 4 shows that with query traffic of up to 80 q/s, the query response time for all the queries is the same and it is equal to having one query per second. That means the system is capable of handling 80 q/s with no waiting time.

Then, between 80 and 120 q/s, we notice a slight increase in the response time. However, this increase affects all the queries in the same way (i.e. no difference in response time between the queries).

As we increase the query traffic beyond 120 q/s, we start to notice a divergence in query response times. Between 130 and 170 q/s, we see that the system manages to execute several queries within a short time by delaying the excess queries. However, with the increase in query traffic past 170 q/s, the system fails to execute even a single query in a short time.

C. CPU and RAM utilisation

We analyse the CPU and memory consumption of the Linknovate.com system based on the defined scenarios in Section V. As previously mentioned, the CPU units of both the ES and web server are set to a constant value equal to 300, whereas the CPU units of the data nodes are assigned dynamically based on the Linknovate.com workload.

As configured, the CPU and RAM consumption of the web server (VM0) and ES node (VM1) are constant across all the scenarios, while the CPU and RAM consumption of the data nodes vary based on the number of nodes used (Figures 5 and 6). As expected, the more data nodes used by a query, the more CPU and RAM the system consumes. For example, Scenario I shows the least consumption for all the VMs in the system. The probability distributions simulated based on the Linknovate.com workload traces are clearly shown in both figures.
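The averaging procedure from Section V that underlies these per-VM figures can be sketched in Python as follows. The input layout (one dictionary of VM id to utilisation per run), the function name `aggregate_runs` and the sample values are hypothetical, and taking the standard deviation across runs for each VM is an assumption, since the text does not state how it is computed.

```python
from statistics import mean, pstdev

def aggregate_runs(runs):
    """Aggregate utilisation over repeated runs of one scenario.

    `runs` is a list with one dict per run, mapping a VM id to its
    utilisation in that run (hypothetical layout).
    """
    vm_ids = sorted(runs[0])
    # Average consumption of each VM across all runs of the scenario.
    per_vm_mean = {vm: mean(run[vm] for run in runs) for vm in vm_ids}
    # Spread of each VM's consumption across runs (assumed definition).
    per_vm_std = {vm: pstdev([run[vm] for run in runs]) for vm in vm_ids}
    # "Average of the averages": one figure for the whole system.
    system_mean = mean(per_vm_mean.values())
    return per_vm_mean, per_vm_std, system_mean

# Made-up utilisation values for three runs and two VMs.
sample_runs = [
    {"VM2": 10.0, "VM7": 20.0},
    {"VM2": 12.0, "VM7": 22.0},
    {"VM2": 14.0, "VM7": 24.0},
]
per_vm_mean, per_vm_std, system_mean = aggregate_runs(sample_runs)
```

In the experiments described in Section V each scenario is run 30 times, so `runs` would hold 30 entries, one per iteration, with an entry for each of the eight VMs.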
VM7 always has the highest consumption of CPU and RAM across all the scenarios because it has the highest probability (0.23), followed by VM6 with the second highest probability (0.18). VM4 and VM5 have the same probability. Therefore they tend to have the same consumption. Finally, VM2 and VM3 report less consumption due to their low probabilities (0.13 and 0.14, respectively).

Fig. 3. Actual query time vs simulation query time.

Fig. 4. Query response time achieved with different query traffic volumes.

D. Simulation Framework Scalability

In order to show the capability of our simulator in handling large datasets, we run the simulator with different log files that have different volumes of queries. We conducted our experiments by running our extended CloudSim simulator on a DELL-XPS machine with 8 GB of RAM and a 2.70 GHz quad-core Intel Core i5-6400 processor. The user query log file is collected from the Linknovate.com system for a period of two days: June 07 and 08, 2018. As one can see from Figure 7, the simulator takes only about 30 minutes to simulate a log file that contains up to 100k queries. In fact, up to 80k queries, the simulation time increases gradually with the number of queries. After that, the simulation time starts to increase faster. This shows that the simulator can still comfortably handle medium to large files. However, for a faster simulation of very large files, a distributed version of the CloudSim simulator like Cloud2Sim [25] is needed for faster results. Note that CloudSim can be easily substituted by Cloud2Sim in our work for a faster simulation.

In summary, based on the results above, we conclude that our simulation results are very close to the real system measurements in terms of query response time (service time). The analysis of CPU and memory metrics across different scenarios shows how the system consumption responds to changes in the number of data nodes serving a query. As a result, this can help the company to manage their cost in terms of both CPU and memory consumption. Furthermore, the results also serve as feedback to Linknovate.com in terms of how much query traffic their system can accommodate at the same time. The company can consider increasing their system nodes’ capacities to handle their desired traffic.

VII. RELATED WORK

Web search engines are complex systems. Constructing a testbed for such complex systems with a high degree of verisimilitude is a complex, costly, resource and time-intensive task. To overcome these issues, simulation has been introduced. Simulation frameworks provide a relatively low cost means to model, understand and evaluate a real system [26].

Although simulation is widely used for cloud computing research, there are few research articles that use simulation to model web search engine systems. Some are based on mathematical models and as such, they are complex. For example, in [27], the author modelled and simulated a search engine by developing two mathematical approaches: adaptive and selective approaches. These approaches seek to express the characteristics of the search engine based on the constraints in the information space.

Other research focused on using DES to model and simulate search engines. For instance, in [24], the authors modelled and simulated a web search engine using the Discrete Event System Specification (DEVS) formalism. The validation of the proposed model was done by comparing it against an actual MPI implementation of the WSE and a process-oriented simulation. DEVS was also used in [28] along with a discrete-event realisation of timed coloured Petri nets (CPN), and
Fig. 5. Average CPU utilisation by the Linknovate.com system Fig. 6. Average RAM utilisation by the Linknovate.com System

CloudSim as a cloud simulator and ES as a powerful search


engine to extend the CloudSim simulator framework in order
to simulate the performance of ES search engine on a public
cloud and thus inform better decision making on ES cloud
deployments.

VIII. C ONCLUSION AND F UTURE W ORK

We have described, modelled and simulated the Elastic-


Search architecture. We proposed a new simulation model
based on extending the CloudSim simulator. We have added to
CloudSim two main features that characterise any distributed
architecture: (i) modelling one-to-many cloudlets, and (ii)
Fig. 7. Simulator scalability adding a synchronisation barrier at the ES node to allow it to
wait for all the data nodes cloudlets to finish their execution
before proceeding with aggregation of the data node results.
The simulation helps us understand how the ElasticSearch
process-oriented simulation (POS) to simulate user behaviour search engine works. The Linknovate.com search engine is
in search engines. They used a circulating tokens approach used as a use case study to validate our modelling and
to represent sequences of operations that compete for search simulation. We compared the results of our simulation in
engine resources and benchmark programs to measure the cost terms of query response time against the actual query response
of relevant operations. However, the authors of [28] only time collected from actual Linknovate.com system traces. The
focused on simulating the computational cost of the search evaluation of the accuracy of the simulation results shows no
operations. significant statistical differences between the simulated results
CloudSim is one of the most popular DES simulation and the real data. We also looked at the CPU and RAM
frameworks, it has been used extensively to model and sim- consumption of the Linknovate.com system by taking different
ulate cloud computing systems and application provisioning scenarios where we evaluate how the CPU and RAM usage
environments [29], [30]. The CloudSim toolkit supports both vary based on the number of data nodes responding to a
system and behaviour modelling of cloud system components query. This can help the company to manage their costs. We
such as data centres, VMs and resource provisioning policies. also evaluated the query traffic volumes that Linknovate.com
In fact, work in [17] looks at modelling parallel applications system can handle at a given time. This can help the company
in the cloud using NetworkCloudSim, a CloudSim extension. understand their system architecture and how they can im-
However, the focus of their implementation lies on modelling prove/scale it in order to support their desirable query traffic.
network interplay between switches and routers traffic flow For future research, we will use this simulation framework
which is the area of infrastructure management useful more to examine key areas for optimisation of ElasticSearch in the
for the infrastructure provider. While the current study looks at cloud including efficient query balancing on the search servers,
a cloud application simulation, it is agnostic to the underlying replica and shard allocations, and balancing CPU and memory
data centre network. usage of the different virtual nodes. We also plan to look
Despite the popularity of ES search engine, no articles could at how the simulator can be used to inform auto-scaling to
be identified that simulated ES performance in the cloud. cope with changing system demands including deployment
In this paper, we took advantage of the popularity of both strategies and other balancing approaches.
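
To make the two CloudSim extensions named in the conclusion more concrete, the following is a minimal Java sketch of a one-to-many fan-out followed by a synchronisation barrier. It is not the authors' CloudSim code: it uses only the standard java.util.concurrent library, and the class FanOutBarrierSketch, the methods searchOnDataNode and coordinate, and the hit-count formula are hypothetical names and values chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class FanOutBarrierSketch {

    /** Hypothetical stand-in for the work a single data node does for a query. */
    public static int searchOnDataNode(int nodeId, String query) {
        // Illustrative only: pretend each node returns a hit count.
        return nodeId * 10 + query.length();
    }

    /** Sends one query to every data node, waits for all of them, then aggregates. */
    public static int coordinate(String query, int dataNodes) {
        // One-to-many: a single incoming query spawns one task per data node.
        List<CompletableFuture<Integer>> pending = new ArrayList<>();
        for (int node = 1; node <= dataNodes; node++) {
            final int id = node;
            pending.add(CompletableFuture.supplyAsync(() -> searchOnDataNode(id, query)));
        }

        // Synchronisation barrier: block until every data node task has finished.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();

        // Aggregation: combine the partial results only after the barrier.
        int total = 0;
        for (CompletableFuture<Integer> result : pending) {
            total += result.join();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(coordinate("cloud", 3));
    }
}
```

With three data nodes and the query "cloud", the partial results are 15, 25 and 35, so main prints 75. In this sketch the barrier blocks a thread; in an event-driven simulator the same effect would more likely be obtained by counting completed cloudlets before scheduling the aggregation step, which is an assumption about the design rather than something the paper states.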
ACKNOWLEDGEMENT

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 732667 (RECAP).

REFERENCES

[1] “Adults: Media use and attitudes report 2019,” https://www.ofcom.org.uk, accessed: 2019-06-07.
[2] “Number of explicit core search queries powered by search engines in the united states as of january 2019 (in billions),” https://www.statista.com/statistics/265796/us-search-engines-ranked-by-number-of-core-searches, accessed: 2019-06-07.
[3] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel, “The cost of a cloud: Research problems in data center networks,” SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 68–73, Dec. 2008. [Online]. Available: http://doi.acm.org/10.1145/1496091.1496103
[4] B. B. Cambazoglu and R. Baeza-Yates, “Scalability and efficiency challenges in large-scale web search engines,” in Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’16. New York, NY, USA: ACM, 2016, pp. 1223–1226. [Online]. Available: http://doi.acm.org.dcu.idm.oclc.org/10.1145/2911451.2914808
[5] Elasticsearch B.V, “Open Source Search Analytics - ElasticSearch,” 2019. [Online]. Available: https://www.elastic.co/
[6] O. Kononenko, O. Baysal, R. Holmes, and M. W. Godfrey, “Mining modern repositories with elasticsearch,” in Proceedings of the 11th Working Conference on Mining Software Repositories, ser. MSR 2014. New York, NY, USA: ACM, 2014, pp. 328–331. [Online]. Available: http://doi.acm.org.dcu.idm.oclc.org/10.1145/2597073.2597091
[7] R. Buyya, R. Ranjan, and R. N. Calheiros, “Modeling and simulation of scalable cloud computing environments and the cloudsim toolkit: Challenges and opportunities,” in 2009 international conference on high performance computing & simulation. IEEE, 2009, pp. 1–11.
[8] S. Svorobej, P. Takako Endo, M. Bendechache, C. Filelis-Papadopoulos, K. M. Giannoutakis, G. A. Gravvanis, D. Tzovaras, J. Byrne, and T. Lynn, “Simulating fog and edge computing scenarios: An overview and research challenges,” Future Internet, vol. 11, no. 3, p. 55, 2019.
[9] S. Mehmi, H. K. Verma, and A. Sangal, “Simulation modeling of cloud computing for smart grid using cloudsim,” Journal of Electrical Systems and Information Technology, vol. 4, no. 1, pp. 159–172, 2017.
[10] G. T. Hicham and E. A. Chaker, “Cloud computing cpu allocation and scheduling algorithms using cloudsim simulator,” International Journal of Electrical & Computer Engineering (2088-8708), vol. 6, no. 4, 2016.
[11] A. M. Law and W. D. Kelton, Simulation modeling and analysis. McGraw-Hill New York, 2000, vol. 3.
[12] J. Idziorek, “Discrete event simulation model for analysis of horizontal scaling in the cloud computing model,” in Proceedings of the 2010 Winter Simulation Conference. IEEE, 2010, pp. 3004–3014.
[13] J. Byrne, S. Svorobej, K. M. Giannoutakis, D. Tzovaras, P. J. Byrne, P. Östberg, A. Gourinovitch, and T. Lynn, “A review of cloud computing simulation platforms and related environments,” in Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, INSTICC. SciTePress, 2017, pp. 679–691.
[14] T. Lynn, A. Gourinovitch, J. Byrne, P. J. Byrne, S. Svorobej, K. Giannoutakis, D. Kenny, and J. Morrison, “A preliminary systematic review of computer science literature on cloud computing research using open source simulation platforms,” in Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, INSTICC. SciTePress, 2017, pp. 565–573.
[15] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. De Rose, and R. Buyya, “Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms,” Software: Practice and experience, vol. 41, no. 1, pp. 23–50, 2011.
[16] B. Wickremasinghe, R. N. Calheiros, and R. Buyya, “Cloudanalyst: A cloudsim-based visual modeller for analysing cloud computing environments and applications,” in 2010 24th IEEE international conference on advanced information networking and applications. IEEE, 2010, pp. 446–452.
[17] S. K. Garg and R. Buyya, “Networkcloudsim: Modelling parallel applications in cloud simulations,” in 2011 Fourth IEEE International Conference on Utility and Cloud Computing. IEEE, 2011, pp. 105–113.
[18] M. Barika, S. Garg, A. Chan, R. N. Calheiros, and R. Ranjan, “Iotsim-stream: Modelling stream graph application in cloud simulation,” Future Generation Computer Systems, vol. 99, pp. 86–105, 2019.
[19] A. Siavashi and M. Momtazpour, “Gpucloudsim: an extension of cloudsim for modeling and simulation of gpus in cloud data centers,” The Journal of Supercomputing, vol. 75, no. 5, pp. 2535–2561, 2019.
[20] M. C. Silva Filho, R. L. Oliveira, C. C. Monteiro, P. R. Inácio, and M. M. Freire, “Cloudsim plus: a cloud computing simulation framework pursuing software engineering principles for improved modularity, extensibility and correctness,” in 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM). IEEE, 2017, pp. 400–406.
[21] M. McCandless, E. Hatcher, and O. Gospodnetic, Lucene in action: covers Apache Lucene 3.0. Manning Publications Co., 2010.
[22] R. Kuc and M. Rogozinski, Elasticsearch server. Packt Publishing Ltd, 2013.
[23] J. Benesty, J. Chen, Y. Huang, and I. Cohen, “Pearson correlation coefficient,” in Noise reduction in speech processing. Springer, 2009, pp. 1–4.
[24] A. Inostrosa-Psijas, G. Wainer, V. Gil-Costa, and M. Marin, “Devs modeling of large scale web search engines,” in Proceedings of the Winter Simulation Conference 2014. IEEE, 2014, pp. 3060–3071.
[25] P. Kathiravelu and L. Veiga, “Concurrent and distributed cloudsim simulations,” in 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems. IEEE, 2014, pp. 490–493.
[26] V. Moysiadis, P. Sarigiannidis, and I. Moscholios, “Towards distributed data management in fog computing,” Wireless Communications and Mobile Computing, vol. 2018, 2018.
[27] M. K. Nasution, “Modelling and simulation of search engine,” in Journal of Physics: Conference Series, vol. 801, no. 1. IOP Publishing, 2017, p. 012078.
[28] M. Marin, V. Gil-Costa, C. Bonacic, and A. Inostrosa, “Simulating search engines,” Computing in Science & Engineering, vol. 19, no. 1, p. 62, 2017.
[29] R. Kumar and G. Sahoo, “Cloud computing simulation using cloudsim,” arXiv preprint arXiv:1403.3253, 2014.
[30] W. Long, L. Yuqing, and X. Qingxin, “Using cloudsim to model and simulate cloud computing environment,” in 2013 Ninth International Conference on Computational Intelligence and Security. IEEE, 2013, pp. 323–328.
