Performance Analysis of Microservices Design Patterns
JOURNAL OF IEEE INTERNET COMPUTING
Fig. 1. Performance Results of Case Study I: a) Mean and 95th Percentile of the Response Time vs. Number of Threads, b) Efficiency Curves Using the Mean and the 95th Percentile of the Response Time, and c) Average Service Times for Different Numbers of Instances
used to purchase a ticket, and the Reservations service allows a customer to reserve a seat for a show time of a movie. All content and methods provided by the online theater system application are accessed from two points, a store front and an API gateway. Each microservice executes a lightweight REST mechanism in order to provide the necessary output to the store front and the API Gateway. The API Gateway is used by entities that do not need to access the visual elements. For instance, a self-service kiosk ticketing device in a cinema accesses the services via the API Gateway.

Figure 1.a gives the mean response time as a function of the number of threads. The confidence intervals are also plotted as vertical lines. The 95th percentile and the confidence interval of the observed response times for each number of threads are also given in this figure. The confidence intervals were computed using the batch means method, with a batch size equal to 1000 observations.

The 95th percentile gives us an idea of the tail of the response time distribution. The further away it is from the mean, the longer the tail is. For instance, for a single thread, the average response time is 3.82 ms, whereas the corresponding 95th percentile is 10.48 ms. Similarly, for 100 and 200 threads, we have mean response times of 399.89 ms and 1295 ms respectively, with corresponding 95th percentiles of 2544.32 ms and 5077.85 ms. These numbers indicate a long tail in the probability distribution of the response time, which is due to the fact that we submit all the requests from all the threads as soon as possible, creating a huge backlog that in turn causes the response times of the requests towards the end of the queue to increase quite dramatically. We note that the percentile of a performance metric is more meaningful in SLAs than its corresponding mean.

Using the mean response times for the different numbers of threads, we calculated the following efficiency metric. Let Ri be the mean response time when there are i threads. Then, we define the efficiency metric Ei as follows:

Ei = ((R1 × i) / Ri) × 100.    (1)

Also, using the 95th percentile values R0.95,i, we calculated a similar efficiency metric using the expression:

E0.95,i = ((R0.95,1 × i) / R0.95,i) × 100.    (2)

Figure 1.b gives the two efficiency plots as a function of the number of threads. We observe that the efficiency based on the mean response time, Ei, increases from one thread to 18 threads, and then from 19 to 94 threads it continuously decreases until, at 94 threads, it becomes equal to E1. Using this plot, we classify the microservice implementation as high, low, or not efficient, as follows. High efficiency is achieved for the range of threads from 1 to 94, since the efficiency is over 100%. Low efficiency is achieved from 95 to 141 threads, since the efficiency is between 100% and 75%. In this case, the service time is slightly longer, but it is still acceptable. Finally, beyond 141 threads, the efficiency is less than 75%, and the service is classified as not efficient.

A slightly different picture emerges when we use the efficiency plot based on the 95th percentile of the response time. High efficiency is achieved from 1 to 83 threads, and low efficiency is achieved from 84 to 141 threads. The high-efficiency range is shorter than the one based on the mean response time, since we use the 95th percentile of the response time, which is an upper bound. Both efficiency plots have the same cut-off of 141 threads, which corresponds to 75% efficiency, considered the lowest acceptable level for service requirements.

Obviously, if we want to shrink the non-efficient zone, we need to use more than one instantiation of the microservice when the number of threads exceeds 141. To that effect, Figure 1.c gives the mean response time of the microservice implementation for k instantiations, where k = 1, 2, 3, 4, 5. Horizontal scaling allows the average service time to decrease or remain constant as the number of threads increases. A similar set of curves can be obtained using the 95th percentile of the response time. In a monolithic implementation, scaling cannot be carried further after a certain level, even if the resources are enhanced. Conversely, microservice architectures offer scalability at a very high level by replicating microservices.

We note that the efficiency of a microservice implementation depends on the number of co-processors. Multi-core processors are appealing to microservice architectures, but they incur a higher hosting cost. Another important factor is RAM misconfiguration, which can cause excessive memory swapping and, in turn, longer response times.
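The efficiency classification of Case Study 1 can be sketched in a few lines of Python. This is an illustrative computation only: the response-time values below are the measured means quoted in the text, and the 100% and 75% thresholds are the zone boundaries defined for Eqs. (1) and (2).

```python
# Sketch of the efficiency metric (Eq. 1) and the high/low/not-efficient
# classification described in the text. Measurements are in milliseconds.

def efficiency(mean_rt, i):
    """Ei = ((R1 * i) / Ri) * 100, per Eq. (1)."""
    return mean_rt[1] * i / mean_rt[i] * 100.0

def classify(e):
    """Map an efficiency value to the zones used in the text."""
    if e >= 100.0:
        return "high"
    if e >= 75.0:
        return "low"
    return "not efficient"

# Mean response times quoted in the case study (ms).
mean_rt = {1: 3.82, 100: 399.89, 200: 1295.0}
for i in (100, 200):
    e = efficiency(mean_rt, i)
    print(i, round(e, 1), classify(e))  # → 100 95.5 low, then 200 59.0 not efficient
```

The same `efficiency` function applied to the 95th-percentile values R0.95,i reproduces Eq. (2).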
The case study data reveal that the architecture best suited to an equally balanced scenario is the API Gateway design pattern. It also provides flexibility when we want to manage requests from multiple channels. It facilitates the execution logic of synchronous communications between equally balanced services and provides scale-up artifacts. Separation of concerns, along with distributing the load over multiple service instances, provides the ultimate black-boxed service experience in a polyglot configuration. Polyglot programming benefits from services developed with different programming languages over various stacks. The API Gateway design is typically used as a point-of-contact layer among other architectures. In fact, a microservices ecosystem without an API Gateway is considered a bad practice, or an antipattern. A similar study to this scenario [9] focused on what-if analysis and capacity planning of microservices. It is also possible to predict the user load [10] using these analyses and monitoring metrics. Finally, version control can be performed with this approach [11].

B. Case Study 2 - The Chain of Responsibility Design Pattern

In the second case study, we implemented an image editing application using the Chain of Responsibility pattern. This design pattern consists of a collection of sub-services designed to work together in order to process a request. The sub-services are linked together sequentially, so that one sub-service's output becomes the input to the next sub-service. In this scenario, users submit a colored image as input and request to have it converted to black and white. The application is triggered by first delivering requests from the web channel or the API channel to the Adjuster microservice. This microservice, which is the first node of the chain, checks the size of the submitted image and resizes it if it is bigger than 1920 x 1080. The modified image is transferred to the Converter microservice in byte-array form for the next operation. This second node is responsible for converting the image into grayscale form.

The Converter microservice passes this image to the Labeler node, which applies a watermark onto the image. Finally, the resized, grayscaled, and watermarked image is returned to the thread that initiated the request. All the services in this scenario were implemented using the Python OpenCV image editing libraries. A diagrammatic representation of the testbed is shown in Figure 2.a.

TABLE I: Docker Container Performance Evaluation on Composite versus Chain of Responsibility Pattern

In this case study, we compared the CPU and RAM requirements of the microservice implementation against a composite implementation for three different sets of threads. The results obtained are given in Table I. The column labeled "Chain of Responsibility" gives the RAM and CPU requirements per microservice, and also the total for all three microservices, for 10, 20, and 50 threads. The column labeled "Composite" gives the RAM and CPU requirements for the composite implementation for the same numbers of threads. The CPU and RAM for the composite implementation were obtained experimentally by varying them until the throughput of the composite implementation became equal to that of the microservice implementation. The third column, labeled "Hosting Diff", gives the percentage by which the hosting cost would increase if we used the composite implementation, based on hosting prices from Amazon Web Services [12]. In all considered scenarios, the CPU utilization was 10-40%.

The same application may require different RAM and CPU for different design pattern implementations. Therefore, the design pattern is an important decision that has a direct impact on hosting costs. Table I shows that the cost difference for different numbers of users varies between 21.25% and 33.18%. For enterprise-level applications, this difference can amount to thousands of dollars. Even if a private cloud is used for hosting microservices, choosing an appropriate design pattern for an application contributes to green computing.
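The sequential hand-off of Case Study 2 can be sketched as a plain-Python Chain of Responsibility. The real nodes are separate services built on OpenCV; here they are stand-in functions operating on a small `Image` record, and only the node names (Adjuster, Converter, Labeler) and the 1920 x 1080 limit come from the text.

```python
# Minimal sketch of the Chain of Responsibility pattern: each handler's
# output becomes the next handler's input. The Image record and the
# watermark flag are illustrative stand-ins for real image data.
from dataclasses import dataclass

@dataclass
class Image:
    width: int
    height: int
    mode: str = "color"
    watermarked: bool = False

MAX_W, MAX_H = 1920, 1080  # size limit from the case study

def adjuster(img: Image) -> Image:
    # First node: resize if bigger than 1920 x 1080, keeping aspect ratio.
    if img.width > MAX_W or img.height > MAX_H:
        scale = min(MAX_W / img.width, MAX_H / img.height)
        img.width = int(img.width * scale)
        img.height = int(img.height * scale)
    return img

def converter(img: Image) -> Image:
    # Second node: convert to grayscale.
    img.mode = "grayscale"
    return img

def labeler(img: Image) -> Image:
    # Third node: apply a watermark.
    img.watermarked = True
    return img

CHAIN = (adjuster, converter, labeler)

def process(img: Image) -> Image:
    for handler in CHAIN:  # sequential: output feeds the next sub-service
        img = handler(img)
    return img

print(process(Image(3840, 2160)))
# → Image(width=1920, height=1080, mode='grayscale', watermarked=True)
```

Because each node only depends on the shape of its input, any node can be scaled or replaced independently, which is the property the case study measures.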
A megaservice [13] is a service that provides many functionalities. It is an antipattern, and it should be decomposed into multiple separate microservices. When application logic for multiple functions is developed in a single code base and scaled up, all parts are scaled up, even those that do not require additional resources. However, if each piece of application logic is implemented as a separate microservice, scale-ups are made only for the entities that need them, so that fewer resources are consumed. The ideal microservice design pattern for functions that consecutively process shared data is the Chain of Responsibility design pattern. In Case Study 2, we observed that this architecture allows us to scale back-end services independently, with a gain in hardware usage of around 30% compared to the equivalent megaservice structure. The most important issue to guard against in this architecture is the possibility of excessively long chains that may lead to a spaghetti structure.

C. Case Study 3 - The Asynchronous Messaging Design Pattern

The synchronous communication provided by the REST mechanism is simple, well-known, easy to test, and firewall-friendly. However, it is not ideal for some scenarios because it blocks the clients. In view of this, asynchronous messaging and event-driven communications can be implemented to propagate changes between microservices. The asynchronous messaging design pattern is preferred when there is a large volume of data that needs to be processed and also when no immediate response is expected. In this case study, we retained the Reservations microservice from the first case study and added a Builder node that generates PDF documents. The Reservations microservice transfers user information, such as name, surname, movie name, show time, and seat number, to the Builder, which generates a ticket in PDF format. The case study was implemented using the Asynchronous Messaging design pattern, as shown in Figure 2.b. A queue structure is introduced between the Reservations and Builder microservices using RabbitMQ, since the service time of a request in the Builder is much longer than that in Reservations. A queue in RabbitMQ is an ordered collection of messages which are enqueued and consumed in a FIFO manner. The scenario ends with the transfer of the PDF-formatted ticket to the client. The reason we chose RabbitMQ over other message broker products is that it maintains the state of the messages (consumed, rejected, acknowledged, etc.) in the ecosystem. Most message brokers are stateless and assume that the consumer keeps track of what has been consumed. Furthermore, RabbitMQ supports several protocols for processing messages, such as MQTT and STOMP, in addition to AMQP.

The goal of Case Study 3 is to demonstrate the usefulness of the Asynchronous Messaging design pattern for transaction-based scenarios involving processes with disparate service times. The capacity of the queue implemented using RabbitMQ is limited, since it is allocated a fixed amount of memory. Consequently, messages sent from the Reservations sub-process to the Builder sub-process while the queue is full are lost, so the queue size has to be fixed so as to minimize queue overflow. Inspecting the results for the Asynchronous Messaging implementation under two different configurations, we observed that packet losses occur with the lower resource configuration. In the first configuration, we allocated 8.192 GB vRAM and 4 cores of 3.5 GHz vCPUs and measured the packet loss in the case where a single thread generated 1000 requests back-to-back. We note that the packet loss was 26% of all packets sent to the queue by the Reservations sub-process. Subsequently, we slowly increased the vRAM and vCPU allocation until no packet loss was observed. This occurred when we had allocated 122.288 GB vRAM and six 4 GHz vCPUs.

We compared Case Study 3 to an equivalent composite implementation of the Reservations and Builder microservices. Like the composite structure in Case Study 2, the megaservice that houses both the Reservations and Builder microservices' application logic has to be scaled up according to the needs of the Builder part, since it performs more complex computations. This causes additional, unwanted resource allocation to the Reservations part as well. The input buffer of each sub-process is also limited, due to the finite amount of memory allocated at configuration time. For this implementation, we kept increasing both the vRAM and vCPU allocation until no packets were lost at the Builder. The resulting configuration is 131.073 GB vRAM and ten 4.0 GHz vCPUs. We observe that the Asynchronous Messaging design pattern requires less memory and CPU in order to achieve zero loss.

An important constraint of data centers is energy consumption. An improper software architecture may increase CPU utilization, thus increasing energy consumption. Barroso and Hölzle [14] observed in 2007 that processors in data centers operate mostly within a utilization range of 10% to 50%. Today, for competitive data centers, this figure is up to 60%. The use of Dynamic Voltage and Frequency Scaling (DVFS) power management mechanisms leads to significant energy reductions (up to 40%) and power savings (up to 20%) [15] for the same utilization levels.
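The overflow behaviour measured in Case Study 3 can be illustrated with a bounded in-memory queue. This is a toy stand-in, not the RabbitMQ setup itself: the capacity of 10 is made up, the burst of 1000 back-to-back requests matches the experiment above, and the point is simply that messages arriving at a full, fixed-memory queue are lost when no consumer keeps up.

```python
# A fixed-capacity queue standing in for the RabbitMQ queue between the
# Reservations and Builder sub-processes. Capacity 10 is arbitrary.
import queue

q: queue.Queue = queue.Queue(maxsize=10)

lost = 0
for ticket_id in range(1000):      # single thread, back-to-back requests
    try:
        q.put_nowait(ticket_id)    # Reservations enqueues a ticket request
    except queue.Full:
        lost += 1                  # queue full: the message is dropped

# With no Builder consuming, only the first 10 requests fit.
print(f"queued={q.qsize()} lost={lost}")  # → queued=10 lost=990
```

In the real system the Builder drains the queue, so loss appears only when the arrival rate exceeds the drain rate for long enough to exhaust the queue's fixed memory, which is why increasing the vRAM allocation eliminated the loss.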
In our experiments, we observed that the presence of a messaging system can help to maintain an optimal CPU utilization. In Figure 3 we give the CPU utilization with and without the messaging queue for the first 100 seconds. In the graph on the left, we observe 100% utilization for the case when there is no messaging queue in front of the Builder. In the graph on the right, we see that the CPU settles down to around 50%. This was achieved by appropriately configuring the egress rate of RabbitMQ via its management plugin. Operating at 100% utilization causes unwanted energy consumption.

Fig. 3. Builder Microservice Utilization for the API Gateway and Asynchronous Messaging Patterns: a) Utilization without Queue, b) Utilization with Queue

Apart from orchestrating an efficient operation of consumer services, the messaging system can handle requests from various channels with different operating logic. Similar to the FIFO approach, priority queues can be defined, which allow different priorities for different user types. In practice, RabbitMQ implements a queue over a single processor; this is why real-life scenarios are deployed in multiprocessor environments with multiple instances of consumers and queues. It should be noted that, in general, having a single queue in the ecosystem is considered an antipattern and has a negative effect on resource utilization. If the queue service is insufficient and cannot be mirrored, it may be possible to respond fairly to requests from all the channels using a round-robin approach. The benefits of using RabbitMQ are also documented by Hong et al. [16]. In Case Study 3, we observed that, with the use of a message queue like RabbitMQ, the most important gain is redundancy via persistence. Such messaging systems allow requests to be stored in the system until they are processed by the appropriate microservices. They are also useful in the case where microservices do not have access to adequate CPU and RAM resources. This architecture allows a more energy-efficient ecosystem for hosting non-time-critical services. Also, the use of a queue allows non-real-time operations to be batched and executed together. For example, it is more efficient to batch 100 database commands and execute them all together than to execute them one at a time. In addition, different queueing models can be used so as to meet business needs.

III. LESSONS LEARNED

The impact of adopting a microservices architecture cannot be seen immediately. To carry out a complete evolution, practitioners should take the following steps. Unit testing is the first step for verifying that the services work as intended. This API functionality test covers the overall application functionality in the service and completely examines the API with multiple test cases. Using automated testing frameworks such as NUnit or JUnit, we can increase the depth and scope of the tests and also inspect the different functional pieces of the application. In addition, load tests should be performed periodically to assess the application topology. Load tests are useful to find the bottlenecks in the ecosystem and help to configure the correct number of running instances of the entities. It is important to use SLAs and also to have some knowledge of the user behavior, so as to better characterize the traffic volumes and usage patterns, which enables us to perform more realistic tests. However, almost all load testing products (e.g., JMeter) may produce false negatives due to caching and session management configurations. Software architects can also benefit from simulators like Hoverfly or Vagrant to evaluate different configurations without stopping or interfering with the ecosystem's traffic. Running such simulations is necessary in order to expose the nonscalable modules of the ecosystem and prevent meltdowns associated with high volumes of user traffic in live usage. Finally, testing the resiliency of the microservices uncovers potential infrastructure failures. Netflix's open-source application Chaos Monkey is a good alternative for observing the destructive behavior of underlying resources. No matter which architecture is preferred for the application, the entire ecosystem should be orchestrated with a system like Kubernetes, Mesosphere, or Docker Swarm, along with an infrastructure-level monitoring system. Scalability and performance are two different metrics, but they are intimately entwined. Therefore, monitoring the system will give us valuable and critical insights. Nevertheless, favorable scalability and performance are not sufficient criteria for adopting a microservices architecture unless we gain agility in the development team and the deployment infrastructure can become fully automated.

IV. CONCLUSIONS

Successful microservice implementations in enterprises such as Netflix and Spotify provide a motivation for other organizations to adopt this technology. However, the needs of the organization should be taken into consideration when choosing a microservices architecture. In general, there is no single microservices pattern that is better than the others. Rather, each design pattern performs better in different scenarios. Complex architectures come with long-term development cycles and additional license expenses for third-party applications. In addition, the employment of more qualified development and test personnel in the team is another factor that increases the total cost. However, it should not be forgotten that these architectures increase productivity and drive down costs, because they are energy efficient in the long term. The easiest way to evaluate the success of microservices is to ensure that they meet or exceed monolithic pre-migration performance. Microservices architectures are still immature, and therefore best practices for their use are critical to their successful adoption and incorporation in the future of SOA.

REFERENCES

[1] O. Zimmermann, "Microservices tenets," Computer Science - Research and Development, vol. 32, no. 3-4, pp. 301–310, 2017.
[2] I. MuleSoft, "Whitepaper: The top six microservices patterns," https://www.mulesoft.com/lp/whitepaper/api/top-microservices-patterns, Oct. 2018.