
SCALABILITY

REPORT
December 2019
 
 

Introduction 
 
The following scalability assessment was done for the Bakery App Starter 
Flow application. For that purpose, we set up a production-like 
scalability testing environment with Amazon EC2 instances. A local 
machine was used to execute Gatling tests against the environment. 
 
The purpose of this document is to show how well Bakery Flow 
scales up and to describe how the testing was done. We also provide 
some recommendations for optimizing the application and setup. 
 
First, we will go through the user journeys, then the test setup. Next, 
basic profiling is described to show possible configuration 
problems and the most prominent scalability issues. After that, we 
discuss the memory usage and general CPU usage of 
Bakery Flow with our user journey. Finally, at the end of the 
document, we give a summary, some remarks, and a short 
guide on how to continue testing by yourself in the future. 
 
 
 
 
 
User journeys 
 
It is easy to identify at least two separate user journeys: Barista and 
Admin. It is expected that the Barista user journey is at least 10 times 
more common than the Admin user journey. Therefore, we will do 
the scalability test only for the Barista journey. 
 
Barista 
● Log in (view Storefront) 
● Click New Order 
● Fill in customer name and phone number 
● Add an item to the order (first from the combobox) 
● Add another item to the order (second from the combobox) 
● Save order 
● Go back to storefront 
 
Admin 
● Log in (view Storefront) 
● Navigate to Dashboard 
● Navigate to Users 
● Navigate to Products 
● Click Strawberry Bun 
● Increase the price by $0.01 
● Click Save. 

Test setup 
 
The application server and the database were run on an Amazon EC2 (EU 
Frankfurt) m5.2xlarge (8 virtual CPU cores and 32 GB RAM) cluster 
consisting of two identical nodes. On the first node, we used the 
embedded Spring Boot Apache Tomcat 8.5.15 server with a HikariCP 
connection pool. The Tomcat server's maximum thread count was 
reduced from the default 200 to 100 to reduce thread-switching 
overhead. A good guideline for sizing the connection pool is the 
formula below: 
 
connections = ((core_count * 2) + 1) 
 
(https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing) 
 
The PostgreSQL 9.6 database server was run on the second node of the 
cluster. The Hikari connection pool's size was increased to 17 connections. 
The postgresql.conf file is shown in the appendix. Both the Tomcat 
server and the Postgres server were set up with the help of Docker. The 
Dockerfiles used are attached in the appendix. 
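 
As an illustration only (these settings are not shown in the report's own configuration files, which rely on Docker and command-line flags), the same two limits could be expressed in a Spring Boot application.properties file; the property names below assume a Spring Boot 2.x application: 
 
# Cap Tomcat's worker thread pool at 100 instead of the default 200
server.tomcat.max-threads=100
# Size the HikariCP pool with (core_count * 2) + 1 = 17 for the 8-core node
spring.datasource.hikari.maximum-pool-size=17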
 
Gatling was run on a separate 6 core (12 thread) i7 machine. This 
machine was located in Turku, Finland. Our previous tests indicated 
that we were able to run Gatling with at least 10,000 concurrent 
users without significant CPU or network bandwidth limitations. To 
be on the safe side, we still recommend using two separate 
machines (possibly located in different networks) if going above 
10,000 users. The terminal commands used to run the Gatling tests are also 
presented in the appendix. 
 
Tests were run in production mode, and the loading of static resources 
such as HTML and JS files was not part of the test (simulating a 
separate CDN server). 
 
The size of a session 


 
Bakery Flow consumes about 175 MB of memory when idle. The size 
of a session is about 385 kB. The session size can be used to 
estimate the memory needed for a production server. For example, if 
you want to serve 2,000 concurrent users per server, you should 
reserve at minimum 4.8 GB of memory for it, assuming a 30-minute 
session timeout and an average visit time of 6 minutes (these two 
figures determine how many sessions are held in memory at any given time). 
 
However, to be on the safe side and to enable optimal performance 
(without frequent garbage collection), we recommend reserving at 
least 3.7 GB of memory for every 1,000 concurrent users. In addition, 
you should measure the session size every now and then when 
further developing the web application. The goal is to make sure that 
it does not grow too much; otherwise you might need to increase the 
memory reserved for your production servers. 
 
The figure below shows how the memory usage increases 
depending on the number of concurrent users. Please note the 
logarithmic y-axis. The figure is constructed based on the following 
assumptions: 
 
1. Average session size is 385 KB 
2. Session timeout is 30 min 
3. Average visit time is 6 min (after which the session is idle) 
4. Idle memory usage of the application is 175 MB 
5. Recommended memory is 1.5 * minimum memory 
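 
Putting these assumptions together gives a simple estimate: a session stays in memory for the visit time plus the timeout, so the number of live sessions is the concurrent user count multiplied by (visit time + timeout) / visit time. The following sketch (ours, not part of the report) reproduces the figures above: 
 
// Estimate of the server memory needed, based on the five assumptions above.
static double minimumMemoryMb(int concurrentUsers) {
    double sessionSizeMb = 0.385;  // assumption 1: 385 kB per session
    double timeoutMin = 30;        // assumption 2: session timeout
    double visitMin = 6;           // assumption 3: average visit time
    double idleAppMb = 175;        // assumption 4: idle memory usage
    // Each session lives for visit + timeout minutes, so the number of
    // sessions held in memory is concurrentUsers * (visit + timeout) / visit.
    double liveSessions = concurrentUsers * (visitMin + timeoutMin) / visitMin;
    return idleAppMb + liveSessions * sessionSizeMb;
}

// minimumMemoryMb(2000) ≈ 4795 MB, i.e. the ~4.8 GB quoted above, and
// 1.5 * minimumMemoryMb(1000) ≈ 3.7 GB, matching the per-1,000-user guideline.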
 
Profiling 
 
The profiling was done in the local development environment using 
Gatling, a local Postgres database, and the JProfiler tool. The profiling 
was done with 1,000 concurrent users. Tomcat's maxThreads 
parameter was set to 10, and the max heap size was 7 GB. The purpose of 
profiling before the actual scalability tests is to verify, and possibly fix (at 
least the low-hanging-fruit issues in), the application and the 
environment so that they are as scalable as possible. 
 
The picture below shows basic telemetry of the test when it was run 
for a couple of minutes with a constant 1,000 users. The most 
interesting observations are as follows: 
 
● GC activity is modest, since the application doesn't use 
all of its 7 GB heap limit. 
● The CPU load is not very high. 
● Perhaps most importantly, the time threads spend on network I/O 
(light blue) is very high compared to the time spent 
on normal work (green). There are also a lot of idle threads 
(yellow). 
 
A more detailed view of the live threads is shown below. It looks like 
heavy database queries are the probable reason for the high network I/O. 
 

 
The view of JPA hotspots (below) reveals that over 70% of database 
time is spent fetching previous orders from the database. These 
orders are shown in the Grid of the Storefront view. 
 
Below are the first general CPU hotspots graph and the CPU call 
graph, opened at the most important parts. 
 

 
 
This clearly indicates that the biggest part of the time (~40%) is 
spent waiting (I/O) for the Storefront view's database query to 
complete. After that, the next most significant hotspot is the actual CPU 
time (approximately ~60%) spent on the same query. One can easily 
identify the following remedies for these hotspots: 
 
1. Make the first query (loading the content of the Storefront's Grid) 
asynchronous. 
2. Cache (e.g. with a Guava LoadingCache) the content of the main 
view's database query. 
3. Reduce the size of the query (i.e. the number of order rows 
fetched in the beginning). 
 
The reasoning behind the first remedy is to reduce the synchronous 
I/O wait by freeing the business logic to continue while data is fetched 
from the database. The second one is quite obvious: since the 
content of the Storefront view's Grid is typically the same for every 
user, there is no reason to always fetch it from the database. Finally, 
for the third one, at the moment the Storefront view's Grid is loaded with 50 
rows, even though typically only about 10 of those are visible. From 
an implementation point of view, the first one is the most 
complicated, so we chose the second and third ones for the 
optimizations before continuing testing in the AWS environment. Below, 
we show how these optimizations can be applied to Bakery. 
 
First, caching the heaviest query. Our analysis above showed that the 
heaviest query is OrderService.findAnyMatchingAfterDueDate. Thus, 
we applied a Guava LoadingCache to that query by adding the 
following code snippet to the OrderService class: 
 
// Requires com.google.common.cache.CacheBuilder, CacheLoader and LoadingCache (Guava).
LoadingCache<Pair<Pageable, LocalDate>, Page<Order>> pageCache = CacheBuilder.newBuilder()
        .maximumSize(1000)
        .expireAfterWrite(5, TimeUnit.SECONDS)
        .build(new CacheLoader<Pair<Pageable, LocalDate>, Page<Order>>() {
            public Page<Order> load(Pair<Pageable, LocalDate> keys) {
                // keys.getKey() is the Pageable, keys.getValue() the due-date filter.
                return orderRepository.findByDueDateAfter(keys.getValue(), keys.getKey());
            }
        });

and modified the findAnyMatchingAfterDueDate method as shown below: 

public Page<Order> findAnyMatchingAfterDueDate(Optional<String> optionalFilter,
        Optional<LocalDate> optionalFilterDate, Pageable pageable) {
    if (optionalFilter.isPresent() && !optionalFilter.get().isEmpty()) {
        if (optionalFilterDate.isPresent()) {
            return orderRepository.findByCustomerFullNameContainingIgnoreCaseAndDueDateAfter(
                    optionalFilter.get(), optionalFilterDate.get(), pageable);
        } else {
            return orderRepository.findByCustomerFullNameContainingIgnoreCase(optionalFilter.get(),
                    pageable);
        }
    } else {
        if (optionalFilterDate.isPresent()) {
            try {
                // The unfiltered Storefront query is served from the cache.
                return pageCache.get(new Pair<>(pageable, optionalFilterDate.get()));
            } catch (ExecutionException e) {
                // If the cache loader fails, fall back to a direct query.
                return orderRepository.findByDueDateAfter(optionalFilterDate.get(), pageable);
            }
        } else {
            return orderRepository.findAll(pageable);
        }
    }
}

To make this more reliable, one should fully or partially invalidate the 
cache after known changes in the database, such as new order entities. 
Another alternative is to keep the cache lifetime short 
enough. Here we decided to keep the cache lifetime at 5 
seconds, so the cache expires every 5 seconds. Even with such a short 
lifetime, the number of database queries was reduced to less than 2% 
of the original amount. We added a similar cache for the two other 
most used queries: OrderService.countAnyMatchingAfterDueDate 
and ProductService.find. 
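 
For the invalidation-based approach, a minimal sketch is shown below. It is our own illustration, not part of the Bakery code: it assumes OrderService has a write path (here a hypothetical saveOrder method) from which the Guava cache can be invalidated whenever an order changes. 
 
// Hypothetical write path: evict all cached pages whenever an order is written,
// so the Storefront Grid never shows data older than the latest change.
public Order saveOrder(Order order) {
    Order saved = orderRepository.save(order);
    pageCache.invalidateAll();   // Guava Cache API: drops every cached page
    return saved;
}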
 
The third remedy is very straightforward to apply. Just add the 
following method call to the StorefrontView class where the Grid 
containing the orders is initialized: 
 
grid​.setPageSize(​25​);

This halves the size of the query, since the default 
page size is 50. 
 
To verify the effect of these changes, we profiled the application 
again with JProfiler. Below is first the same basic telemetry of the 
test when it was run for a couple of minutes with a constant 1,000 
users. The proportion of I/O wait time was significantly reduced, from 
~40% to <10%. The next figure shows the CPU hotspots after the 
optimizations. We can observe that the CPU time spent on DB operations is 
less than one fifth of the original value. 
 
Now the biggest part of the time is spent waiting for login to complete. 
Our assumption is that this is mostly general client-server 
communication and sending HTML, JS, and CSS files to the client side, 
which could explain the remaining ~10% of I/O wait. One possible 
way to optimize this would be to serve static files from a separate 
HTTP server such as Nginx. 
 

 
 
We made these modifications and built a new version. The 
following tests were run on this optimized version of the 
application. 
 
General scalability on the CPU level 
 
We measured the CPU usage and the requests' response times by 
running the recorded Gatling test (Barista.scala) on the Gatling 
machine (see the Test setup chapter). Performance was measured by 
running an increasing number (500, 1,500 and 2,500) of concurrent 
virtual users with the Gatling tool. The CPU usage was observed 
with the Amazon CloudWatch dashboard. 
 
We can observe that 1,500 virtual users produced about 46% CPU usage 
on the Tomcat server. Adding 1,000 more users raised the CPU usage to 
about 76%. As a rough estimate, we suggest having at least 1 
logical CPU core (for the application server) per every 400 concurrent 
users. Another alternative to adding more CPU cores is of course 
distributing the load by clustering (see Summary). The maximum 
CPU usages are presented in the table below for both the Tomcat 
server and the Postgres server; the required-core estimates are the 
measured CPU share multiplied by the 8 available cores. One can 
observe that 2,500 users consume about two thirds of the available 
CPU capacity, which should not yet have a big effect on the response 
times (see the following chapter). It is noteworthy that our 
optimizations basically minimize the CPU consumption of the DB. 
 
Concurrent users | Max Avg (10s) CPU, Tomcat | Required CPU cores (estimate) | Max Avg CPU, Postgres
500              | 15%                       | 1.2                           | <1%
1,500            | 46%                       | 3.7                           | 2%
2,500            | 76%                       | 6.1                           | 5%
 
Below, we have estimated how many peak concurrent users you 
should be able to serve with various single-node AWS Tomcat server 
configurations before consuming all the available CPU (following the 
guideline of 400 concurrent users per core above). 
 

Server     | CPU cores | RAM (GB) | Peak concurrent users
m5.large   | 2         | 8        | 800
m5.xlarge  | 4         | 16       | 1,600
m5.2xlarge | 8         | 32       | 3,200
m5.4xlarge | 16        | 64       | 6,400
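 
A quick way to apply the same rule of thumb when sizing for a different load (our own sketch, not from the report): 
 
// Rough sizing from the 400-concurrent-users-per-core guideline above.
static int requiredCpuCores(int peakConcurrentUsers) {
    return (int) Math.ceil(peakConcurrentUsers / 400.0);
}

// requiredCpuCores(2500) == 7, which fits the 8-core m5.2xlarge used in these tests.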
 
User journey 1: Barista 


 
Below is the CPU load graph of the Tomcat server (orange line) for 
500 (red marker), 1,500 (orange marker) and 2,500 (green marker) 
virtual users. The blue line represents the CPU load of the Postgres 
server. The chart's x-axis represents the elapsed time from the beginning 
of the test and the y-axis represents the CPU usage. A CPU percentage 
of 100% means that all available CPU cores are fully utilised. 
 

 
 
In addition to the CPU usage, we monitored the network usage of the 
Tomcat server (blue line: Tomcat network out, orange line: Tomcat 
network in) and the Postgres server (red line: in, green line: out). The 
graph is presented below. Our previous tests with the same server 
have indicated that it is capable of serving several times more network 
traffic than is consumed here with 2,500 users (~100 Mb/s). 
 

 
 
The figure below shows the response time percentiles over the 
duration of the tests for 500, 1,500 and 2,500 virtual users. With all 
the tested user counts, the response times of the requests were low 
(<500 ms) for most of the requests (90%). A very small percentage of 
requests (~1%) showed increased response times (~700 ms) with 
1,500 users. With 2,500 users, the percentage of slower 
requests (>700 ms) was slightly bigger, around 5% of the total 
number of requests. 
 
To summarize, we can deduce that the maximum sustained concurrent 
user count for our server setup is around 2,500 users, since there was 
a small but not yet significant increase in the response times 
compared to the case of 1,500 users. We also ran ad-hoc tests with 
3,000 users, but in that case the increase in the response times was 
so large that a constant 3,000 users would significantly affect the 
end-user experience of the application. 
 
In the table below, the data shown above is averaged over the test to 
show the expected response times (in milliseconds) for different user 
counts. It also shows the server's throughput as a requests/second 
value. For example, the 95% column in the 1,500-user row means that 
95% of all requests completed within 109 ms. 
 
Users | Min | Avg | 50% | 75% | 95% | 99% | Max  | req/s
500   | 25  | 34  | 27  | 35  | 58  | 111 | 579  | 118
1,500 | 25  | 40  | 27  | 36  | 109 | 262 | 1602 | 353
2,500 | 25  | 49  | 29  | 38  | 129 | 421 | 2391 | 587

Summary 
 
With a cluster of two m5.2xlarge nodes for your Bakery App Starter 
Flow application, you should be able to serve about 2,500 
concurrent users constantly and still survive without major problems 
if the concurrent user count occasionally jumps to 3,000 users. Since our 
optimizations reduced the CPU load of the DB server, you could use 
a lower-end server for it, e.g. m5.large (2 CPU cores and 8 GB RAM). 
It may even be possible to run the DB on the same server used for Bakery. 
 
To translate this into hosting costs, let's assume we would host a 
business system with resource use similar to Bakery for 10,000 users. 
Let's further assume that a globally distributed workforce uses the 
system, so that there are 1,000 - 2,500 concurrent users depending on 
the time of the day. Taking into account the optimizations presented 
and using a single m5.2xlarge AWS reserved instance (standard 1-year 
term) to host both the application and the DB, hosting costs about 
$0.2 per user per year (roughly $2,000 per year for the instance). 
 
Some additional improvements might be achievable by using 
optimised garbage collection, native application server libraries, and 
by serving static resources from a basic HTTP server (e.g. Nginx or 
Apache2) instead of an application server such as Tomcat. 
 
You can refer to the server sizing table in the General scalability on 
the CPU level chapter if you are considering scaling the application 
server vertically (i.e. using a faster server) for a bigger number of 
concurrent users. On the other hand, we recommend clustering your 
application server (with sticky sessions) at least if you expect to have 
constantly more than 5,000 concurrent users on it. By clustering, you 
should be able to scale Bakery Flow out to theoretically any number 
of concurrent users. Clustering at an early phase also adds capacity 
for possible rush usage even when the expected average load is lower. 
 
Appendix 
 
postgresql.conf generated with https://pgtune.leopard.in.ua/ 
 
max_connections = 200
shared_buffers = 7680MB
effective_cache_size = 23040MB
maintenance_work_mem = 1920MB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 10485kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 16
max_parallel_workers_per_gather = 4

Dockerfiles 
Tomcat server: 
FROM anapsix/alpine-java:8_server-jre
ADD bakery-app.war .

ARG dbip
ARG dbname=flow_bakery
ARG dbuser=flow_bakery_user
ARG dbpw=flow_bakery_user_pw

ENV dbip=$dbip
ENV dbname=$dbname
ENV dbuser=$dbuser
ENV dbpw=$dbpw

ENTRYPOINT java -jar \
    -Xmx24g -Xms24g \
    -Dvaadin.productionMode \
    -Dspring.datasource.url=jdbc:postgresql://$dbip:5432/$dbname \
    -Dspring.datasource.username=$dbuser \
    -Dspring.datasource.password=$dbpw \
    bakery-app.war

Postgres server: 
FROM ​library​/​postgres:​9.6.9

RUN printf "max_connections = 200\nshared_buffers = 7680MB\neffective_cache_size = 23040MB\nmaintenance_work_mem = 1920MB\ncheckpoint_completion_target = 0.7\nwal_buffers = 16MB\ndefault_statistics_target = 100\nrandom_page_cost = 1.1\neffective_io_concurrency = 200\nwork_mem = 393kB\nmin_wal_size = 1GB\nmax_wal_size = 2GB\nmax_worker_processes = 16\nmax_parallel_workers_per_gather = 4\n" >> /usr/share/postgresql/postgresql.conf.sample

ENV POSTGRES_USER flow_bakery_user
ENV POSTGRES_PASSWORD flow_bakery_user_pw
ENV POSTGRES_DB flow_bakery
EXPOSE 5432

Gatling commands 
Gatling tests were run with the Gatling Maven plugin under the project 
with the following parameters, where the IP address (marked as 
xx.xxx.xxx.xx) was the application server's IP. 
 
500 users: 
mvn -Pscalability gatling:test -Dgatling.sessionCount=500 -Dgatling.sessionStartInterval=140
-Dgatling.sessionRepeats=4 -Dgatling.baseUrl=http://xx.xxx.xxx.xx:8080

1,500 users: 
mvn -Pscalability gatling:test -Dgatling.sessionCount=1500 -Dgatling.sessionStartInterval=140
-Dgatling.sessionRepeats=4 -Dgatling.baseUrl=http://xx.xxx.xxx.xx:8080
2,500 users: 
mvn -Pscalability gatling:test -Dgatling.sessionCount=2500 -Dgatling.sessionStartInterval=140
-Dgatling.sessionRepeats=4 -Dgatling.baseUrl=http://xx.xxx.xxx.xx:8080
Want to discover the scalability 
potential of your business web app 
built with Vaadin? 
 
 