SCALABILITY
REPORT
December 2019
Introduction
The following scalability assessment was done for the Bakery App
Starter Flow application. For that purpose, we set up a production-like
scalability testing environment with Amazon EC2 instances. A local
machine was used to execute Gatling tests against the environment.
The purpose of this document is to show how well Bakery Flow
scales up and to describe how the testing was done. We also provide
some recommendations for optimizing the application and setup.
First, we will go through the user journeys, then the test setup. Next,
basic profiling will be described to show possible configuration
problems and the most prominent scalability issues. After that, we
will discuss the memory usage and general CPU usage of
Bakery Flow with our user journey. Finally, we will close with a
summary, some remarks, and a short guide on how to continue
testing by yourself in the future.
User journeys
It is easy to identify at least two separate user journeys: Barista and
Admin. It is expected that the Barista user journey is at least 10 times
more common than the Admin user journey. Therefore, we will do
the scalability test only for the Barista journey.
Barista
● Log in (view Storefront)
● Click New Order
● Fill in customer name and phone number
● Add an item to the order (first from the combobox)
● Add another item to the order (second from the combobox)
● Save order
● Go back to storefront
Admin
● Log in (view Storefront)
● Navigate to Dashboard
● Navigate to Users
● Navigate to Products
● Click Strawberry Bun
● Increase the price by $0.01
● Click Save.
Test setup
The application server and database were run on an Amazon EC2 (EU
Frankfurt) m5.2xlarge (8 virtual CPU cores and 32 GB RAM) cluster
consisting of two identical nodes. On the first node, we used the
embedded Spring Boot Apache Tomcat 8.5.15 server with the Hikari
connection pool. The Tomcat server's maximum thread count was
reduced from the default 200 to 100 to reduce thread-switching
overhead. For the database connection pool, a good sizing guideline
is the formula below:
connections = ((core_count * 2) + 1)
(https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing)
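Applied to the 8-core nodes used here, the formula gives (8 * 2) + 1 = 17
connections, which matches the pool size used in this setup. As an
illustration only, a minimal sketch of sizing a Hikari pool
programmatically is shown below; the class name, JDBC URL, and
credentials are placeholders (the connection details are borrowed from
the appendix Dockerfiles) and are not the configuration mechanism
used in the tested deployment.

import com.zaxxer.hikari.HikariConfig;

public class PoolSizingExample {

    public static void main(String[] args) {
        // (core_count * 2) + 1 -> 17 connections on an 8-vCPU m5.2xlarge node.
        int coreCount = Runtime.getRuntime().availableProcessors();
        int poolSize = coreCount * 2 + 1;

        HikariConfig config = new HikariConfig();
        // Placeholder connection details, mirroring the appendix Dockerfiles.
        config.setJdbcUrl("jdbc:postgresql://db-host:5432/flow_bakery");
        config.setUsername("flow_bakery_user");
        config.setPassword("flow_bakery_user_pw");
        config.setMaximumPoolSize(poolSize);

        System.out.println("Hikari pool sized to " + config.getMaximumPoolSize() + " connections");
        // Pass 'config' to new HikariDataSource(config) when the database is reachable.
    }
}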
The Postgres 9.6 database server was run on the second node of the
cluster. The Hikari connection pool's size was increased to 17
connections. The postgresql.conf file is shown in the appendix. Both
the Tomcat server and the Postgres server were set up with the help
of Docker; the Dockerfiles used are included in the appendix.
Gatling was run on a separate 6 core (12 thread) i7 machine. This
machine was located in Turku, Finland. Our previous tests indicated
that we were able to run Gatling with at least 10,000 concurrent
users without significant CPU or network bandwidth limitations. To
be on the safe side, we still recommend using two separate
machines (possibly located in different networks) if going above
10,000 users. The terminal commands used to run the Gatling tests
are also presented in the appendix.
Tests were run in production mode, and the loading of static
resources such as HTML and JS files was not part of the test
(simulating a separate CDN server).
The size of a session
Bakery Flow consumes about 175 MB of memory when idle, and the
size of a session is about 385 kB. The session size can be used to
estimate the memory needed for a production server. For example, if
you want to serve 2,000 concurrent sessions per server, you should
reserve at minimum about 4.8 GB of memory for it, assuming a
30 min session timeout and an average visit time of 6 min (idle
sessions are kept in memory until they time out).
However, to be on the safe side and to enable optimal performance
(without frequent garbage collection), we recommend reserving at
least 3.7 GB of memory for every 1,000 concurrent users. In addition,
you should measure the session size every now and then when
further developing the web application, to make sure that it does not
grow too much; otherwise you may need to increase the memory
reserved for your production servers.
The figure below shows how the memory usage increases
depending on the number of concurrent users. Please note the
logarithmic y-axis. The figure is constructed based on the following
assumptions:
1. Average session size is 385 KB
2. Session timeout is 30 min
3. Average visit time is 6 min (after which the session is idle)
4. Idle memory usage of the application is 175 MB
5. Recommended memory is 1.5 * minimum memory
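For illustration, below is a minimal sketch of one way to reproduce
the figures above from these assumptions; it assumes that a session
stays in memory for the average visit time plus the session timeout
before it is released.

public class MemoryEstimate {

    public static void main(String[] args) {
        double sessionSizeMb = 0.385;   // assumption 1: ~385 kB per session
        double timeoutMin = 30;         // assumption 2: session timeout
        double visitMin = 6;            // assumption 3: average visit time
        double idleMb = 175;            // assumption 4: idle memory of the application
        int concurrentUsers = 2000;

        // With 'concurrentUsers' active on average, new sessions start at this rate,
        // and each one stays in memory for the visit time plus the idle timeout.
        double sessionsPerMinute = concurrentUsers / visitMin;
        double sessionsInMemory = sessionsPerMinute * (visitMin + timeoutMin);

        double minimumMb = idleMb + sessionsInMemory * sessionSizeMb;
        double recommendedMb = 1.5 * minimumMb;  // assumption 5

        System.out.printf("Sessions held in memory: %.0f%n", sessionsInMemory);   // 12,000
        System.out.printf("Minimum memory: %.1f GB%n", minimumMb / 1000);         // ~4.8 GB
        System.out.printf("Recommended memory: %.1f GB%n", recommendedMb / 1000); // ~7.2 GB
    }
}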
Profiling
The profiling was done in the local development environment with
1,000 concurrent users, using Gatling, a local Postgres database, and
the JProfiler tool. Tomcat's maxThreads parameter was set to 10, and
the max heap size was 7 GB. The purpose of profiling before the
actual scalability tests is to verify, and possibly fix (at least the
low-hanging-fruit type issues), the application and the environment
so that it is as scalable as possible.
The picture below shows the basic telemetry of the test when it was
run for a couple of minutes with a constant 1,000 users. The most
interesting observations are as follows:
● GC activity is modest, since the application does not use all of
its 7 GB heap limit.
● The CPU load is not very high.
● Most importantly, the time threads spend on net I/O (light blue)
is very high compared to the time spent on normal work
(green). There are also a lot of idle threads (yellow).
A more detailed view of live threads is shown below. It looks like
heavy database queries are the probable reason for the high net I/O.
The view of JPA hotspots (below) reveals that over 70% of database
time is spent fetching previous orders from the database. These
orders are shown in the Grid of the Storefront view.
Below is the first general CPU hotspots graph, with the CPU call
graph opened for the most important parts.
It clearly indicates that the biggest part of the time (~40%) is
spent waiting (I/O) for the Storefront view's database query to
complete. The next most significant hotspot is the actual CPU
time (approximately 60%) spent on the same query. One can easily
identify the following remedies for these hotspots:
1. Make the first query (loading the content of Storefront's Grid)
asynchronous.
2. Cache (e.g. with a Guava LoadingCache) the content of the main
view's database query.
3. Reduce the size of the query (i.e. the number of order rows
fetched in the beginning).
The reasoning behind the first remedy is to reduce the synchronous
I/O wait by letting the business logic continue while data is fetched
from the database. The second is quite obvious: since the content of
the Storefront view's Grid is typically the same for every user, there is
no reason to always fetch it from the database. As for the third, at
the moment the Storefront view's Grid is loaded with 50 rows, even
though typically only about 10 of them are visible. From an
implementation point of view, the first remedy is the most
complicated, so we chose the second and third optimizations before
continuing testing in the AWS environment. Below, we show how
these optimizations can be applied to Bakery.
First, caching the heaviest query. Our analysis above showed that the
heaviest query is OrderService.findAnyMatchingAfterDueDate. Thus,
we applied a Guava LoadingCache to that query by adding the
following code snippet to the OrderService class:
// Requires the Guava cache imports (com.google.common.cache.CacheBuilder,
// CacheLoader, LoadingCache). Cached pages expire five seconds after being written.
LoadingCache<Pair<Pageable, LocalDate>, Page<Order>> pageCache = CacheBuilder.newBuilder()
        .maximumSize(1000)
        .expireAfterWrite(5, TimeUnit.SECONDS)
        .build(new CacheLoader<Pair<Pageable, LocalDate>, Page<Order>>() {
            @Override
            public Page<Order> load(Pair<Pageable, LocalDate> keys) {
                return orderRepository.findByDueDateAfter(keys.getValue(), keys.getKey());
            }
        });
and modified the findAnyMatchingAfterDueDate method as shown below:
public Page<Order> findAnyMatchingAfterDueDate(Optional<String> optionalFilter,
        Optional<LocalDate> optionalFilterDate, Pageable pageable) {
    if (optionalFilter.isPresent() && !optionalFilter.get().isEmpty()) {
        if (optionalFilterDate.isPresent()) {
            return orderRepository.findByCustomerFullNameContainingIgnoreCaseAndDueDateAfter(
                    optionalFilter.get(), optionalFilterDate.get(), pageable);
        } else {
            return orderRepository.findByCustomerFullNameContainingIgnoreCase(optionalFilter.get(),
                    pageable);
        }
    } else {
        if (optionalFilterDate.isPresent()) {
            try {
                // Serve the unfiltered Storefront query from the cache when possible.
                return pageCache.get(new Pair<>(pageable, optionalFilterDate.get()));
            } catch (ExecutionException e) {
                // Fall back to a direct repository call if the cache lookup fails.
                return orderRepository.findByDueDateAfter(optionalFilterDate.get(), pageable);
            }
        } else {
            return orderRepository.findAll(pageable);
        }
    }
}
To make this more reliable, one should fully or partially invalidate the
cache whenever there are known changes in the database, such as
new order entities. Another alternative is to keep the cache lifetime
short enough. Here we decided to keep the cache lifetime at 5
seconds, so the cache expires every 5 seconds; even with such a short
lifetime, the number of database queries was reduced to less than 2%
of the original amount. We added a similar cache for the two other
most used queries: OrderService.countAnyMatchingAfterDueDate
and ProductService.find.
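For illustration, a minimal sketch of the invalidation alternative is
shown below. It assumes OrderService has a save method roughly
like this; the method name, annotation, and surrounding code are
assumptions for the sketch, not taken from the Bakery sources.

// Hypothetical sketch: evict cached Storefront pages whenever an order is written,
// so the next Grid query reloads fresh data. The saveOrder signature is an assumption.
@Transactional
public Order saveOrder(Order order) {
    Order saved = orderRepository.save(order);
    // Drop every cached page; a more fine-grained approach could invalidate only
    // the pages whose due-date range covers the saved order.
    pageCache.invalidateAll();
    return saved;
}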
The third remedy is very straightforward to apply. Just add the
following method call to the StorefrontView class where the Grid
containing the orders is initialized:
grid.setPageSize(25);
This halves the size of the query, since the default page size is 50.
To verify the effect of these changes, we profiled the application
again with JProfiler. The first figure below shows the same basic
telemetry of the test when it was run for a couple of minutes with a
constant 1,000 users. The proportion of I/O wait time has been
reduced significantly, from ~40% to <10%. The next figure shows the
CPU hotspots after the optimizations. We can observe that the CPU
time spent on DB operations is less than one fifth of the original value.
Now the biggest part of the time is spent waiting for the login to
complete. Our assumption is that this is mostly general client-server
communication and sending HTML, JS, and CSS files to the client
side, which could explain the remaining ~10% of I/O wait. One
possible way to optimize this would be to serve static files from a
separate HTTP server such as Nginx.
We made these modifications and built a new version. The
following tests were run on this optimized version of the
application.
General scalability on
the CPU level
We measured the CPU usage and the requests' response times by
running the recorded Gatling test (Barista.scala) on the Gatling
machine (see the Test setup chapter). Performance was measured by
running an increasing number (500, 1,500, and 2,500) of virtual
concurrent users with the Gatling tool. The CPU usage was observed
with the Amazon CloudWatch dashboard.
We can observe that 1,500 virtual users produced about 46% CPU
usage on the Tomcat server, and adding 1,000 more users raised the
usage to about 76%. As a rough estimate, we suggest having at least 1
logical CPU core (for the application server) for every 400 concurrent
users. An alternative to adding more CPU cores is of course
distributing the load by clustering (see Summary). The maximum
CPU usages for both the Tomcat server and the Postgres server are
presented in the table below. One can observe that 2,500 users
consume about two thirds of the available CPU capacity, which
should not yet have a big effect on the response times (see the
following chapter). It is noteworthy that our optimizations practically
minimized the CPU consumption of the DB.
Concurrent users | Max avg (10 s) CPU, Tomcat | Required CPU cores (estimate) | Max avg CPU, Postgres
500   | 15% | 1.2 | <1%
1,500 | 46% | 3.7 | 2%
2,500 | 76% | 6.1 | 5%
Below, we have estimated how many peak concurrent users you
should be able to serve with various single-node AWS Tomcat server
configurations before consuming all the available CPU.
Server | CPU cores | RAM (GB) | Peak concurrent users
m5.large   | 2  | 8  | 800
m5.xlarge  | 4  | 16 | 1,600
m5.2xlarge | 8  | 32 | 3,200
m5.4xlarge | 16 | 64 | 6,400
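The table above follows directly from the rule of thumb of roughly
400 concurrent users per logical CPU core; the short sketch below
shows the calculation (the class name is illustrative, the instance
names and core counts are those listed above).

public class PeakUserEstimate {

    public static void main(String[] args) {
        String[] servers = {"m5.large", "m5.xlarge", "m5.2xlarge", "m5.4xlarge"};
        int[] cores = {2, 4, 8, 16};
        for (int i = 0; i < servers.length; i++) {
            // ~400 concurrent users per logical core, as estimated above.
            System.out.printf("%-11s %2d cores -> %,d peak concurrent users%n",
                    servers[i], cores[i], cores[i] * 400);
        }
    }
}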
User journey 1: Barista
Below is the CPU load graph of the Tomcat server (orange line) for
500 (red marker), 1,500 (orange marker) and 2,500 (green marker)
virtual users. The blue line represents the CPU load for the Postgres
server. The chart’s x-axis represents elapsed time from the beginning
of the test and the y-axis represents CPU usage. A CPU value of 100%
means that all available CPU cores are fully utilised.
In addition to the CPU usage, we monitored the network usage of
the Tomcat server (blue line: network out, orange line: network in)
and the Postgres server (red line: in, green line: out). The graph is
presented below. Our previous tests with the same server indicated
that it is capable of serving several times more network traffic than
what is now consumed with 2,500 users (~100 Mb/s).
The figure below shows the response time percentiles over the
duration of the tests for 500, 1,500 and 2,500 virtual users. With all
tested user counts, the response times were low (<500 ms) for most
of the requests (90%). With 1,500 users, a very small percentage of
requests (~1%) showed increased response times (~700 ms). With
2,500 users the percentage of slower requests (>700 ms) was slightly
bigger, around 5% of the total amount of requests.
To summarize, we can deduce that the maximum sustained
concurrent user count for our server setup is around 2,500 users,
since there was a small but not yet significant increase in the
response times compared to the case of 1,500 users. We also did
ad-hoc tests with 3,000 users, but in that case the increase in
response times was so large that a constant load of 3,000 users
would significantly affect the end-user experience of the application.
In the table below, the data shown above is averaged over the test to
show expected response times (in milliseconds) for different user
counts. It also shows the server's throughput in requests per second.
For example, the 95% column for the 1,500-user row means that 95%
of all requests completed in 109 ms or less.
Users | Min | Avg | 50% | 75% | 95% | 99% | Max | Req/s
500   | 25 | 34 | 27 | 35 | 58  | 111 | 579   | 118
1,500 | 25 | 40 | 27 | 36 | 109 | 262 | 1,602 | 353
2,500 | 25 | 49 | 29 | 38 | 129 | 421 | 2,391 | 587
Summary
With a cluster of two m5.2xlarge nodes for your Bakery App Starter
Flow application, you should be able to serve a constant load of about
2,500 concurrent users and still survive without major problems if the
concurrent user count occasionally jumps to 3,000. Since our
optimizations reduced the CPU load of the DB server, you could use
a lower-end server for it, e.g. m5.large (2 CPU cores and 8 GB RAM).
It should even be possible to run the DB on the same server as
Bakery.
To translate this into hosting costs, let's assume we would host a
business system with resource use similar to Bakery for 10,000 users.
Let's further assume that a globally distributed workforce uses the
system so that there are 1,000 - 2,500 concurrent users depending on
the time of day. Taking into account the optimizations presented,
and using a single m5.2xlarge AWS reserved instance (standard
1-year term) to host both the application and the DB, hosting costs
about $0.2 per user per year.
Some additional improvements might be achievable by using
optimised garbage collection, native application server libraries, and
by serving static resources from a basic HTTP server (e.g. Nginx or
Apache2) instead of from an application server such as Tomcat.
You can refer to the instance sizing table earlier in this report if you
are considering scaling the application server vertically (i.e. using a
faster server) for a bigger number of concurrent users. On the other
hand, we recommend clustering your application server (with sticky
sessions) at least if you are expecting constantly more than 5,000
concurrent users. By clustering, you should be able to scale out
Bakery Flow to theoretically any number of concurrent users.
Clustering in an early phase also adds capacity for possible rush
usage even when the expected average load is lower.
Appendix
postgresql.conf generated with https://pgtune.leopard.in.ua/
max_connections = 200
shared_buffers = 7680MB
effective_cache_size = 23040MB
maintenance_work_mem = 1920MB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 10485kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 16
max_parallel_workers_per_gather = 4
Dockerfiles
Tomcat server:
FROM anapsix/alpine-java:8_server-jre
ADD bakery-app.war .
ARG dbip
ARG dbname=flow_bakery
ARG dbuser=flow_bakery_user
ARG dbpw=flow_bakery_user_pw
ENV dbip=$dbip
ENV dbname=$dbname
ENV dbuser=$dbuser
ENV dbpw=$dbpw
ENTRYPOINT java -jar \
  -Xmx24g -Xms24g \
  -Dvaadin.productionMode \
  -Dspring.datasource.url=jdbc:postgresql://$dbip:5432/$dbname \
  -Dspring.datasource.username=$dbuser \
  -Dspring.datasource.password=$dbpw \
  bakery-app.war
Postgres server:
FROM library/postgres:9.6.9
RUN printf "max_connections = 200\nshared_buffers = 7680MB\neffective_cache_size = 23040MB\nmaintenance_work_mem = 1920MB\ncheckpoint_completion_target = 0.7\nwal_buffers = 16MB\ndefault_statistics_target = 100\nrandom_page_cost = 1.1\neffective_io_concurrency = 200\nwork_mem = 393kB\nmin_wal_size = 1GB\nmax_wal_size = 2GB\nmax_worker_processes = 16\nmax_parallel_workers_per_gather = 4\n" >> /usr/share/postgresql/postgresql.conf.sample
ENV POSTGRES_USER flow_bakery_user
ENV POSTGRES_PASSWORD flow_bakery_user_pw
ENV POSTGRES_DB flow_bakery
EXPOSE 5432
Gatling commands
Gatling tests were run with the Gatling Maven plugin under the
project with the following parameters, where the IP address (marked
as xx.xxx.xxx.xx) was the application server's IP.
500 users:
mvn -Pscalability gatling:test -Dgatling.sessionCount=500 -Dgatling.sessionStartInterval=140
-Dgatling.sessionRepeats=4 -Dgatling.baseUrl=http://xx.xxx.xxx.xx:8080
1,500 users:
mvn -Pscalability gatling:test -Dgatling.sessionCount=1500 -Dgatling.sessionStartInterval=140
-Dgatling.sessionRepeats=4 -Dgatling.baseUrl=http://xx.xxx.xxx.xx:8080
2,500 users:
mvn -Pscalability gatling:test -Dgatling.sessionCount=2500 -Dgatling.sessionStartInterval=140
-Dgatling.sessionRepeats=4 -Dgatling.baseUrl=http://xx.xxx.xxx.xx:8080
Want to discover the scalability
potential of your business web app
built with Vaadin?