
An Analysis of the Performance of Websockets in

Various Programming Languages and Libraries


Matt Tomasetti
Computer Science
Ramapo College of New Jersey
Mahwah, United States of America
[email protected]

Abstract— As the demand for real-time data increases, so does the use of websockets. It is crucial to consider the speed and reliability of a language and websocket library before implementing it in an application. This study benchmarks various websocket servers in order to determine which one offers the fastest round-trip time for a request, as well as how reliable each websocket server remains under load.

Keywords— websocket, server, client, benchmark, echo, c, libwebsocket, c++, cpp, uwebsockets, c#, cs, fleck, go, gorilla, java, java-websocket, node, nodejs, php, ratchet, python, websockets, rust, rust-websocket, connection, request, round trip, time, docker, container, virtual machine, KVM

I. PURPOSE

The purpose of this benchmarking exercise is to determine which programming language/library will offer the best performance in a websocket application. The websocket servers being tested in this benchmark are purposely simple, without extensive configuration or multithreading, in order to showcase which server performs best right out of the box. For this reason, each server is kept as close to a generic "echo server" as possible. The data collected will provide insight into the performance of the websocket servers. The specific metrics of interest in this assessment are the round-trip time (the time from when a client sends a request to the server to the time the client receives a response) and the success rate of requests (the percentage of requests which successfully receive a response).

It is important to note that a websocket which is critical to an application should be set up in a more optimal environment than the one demonstrated in this testing. Nevertheless, accounting for the worst-case scenario is always imperative. A websocket that needs to be dependable may use multithreading in order to achieve better performance, and/or a load balancer to distribute connected clients. Even so, perhaps one has a websocket with a defect, whether due to an undetected bug or a flaw in the initial design. Contemplate what may happen to the flawed websocket application in a moment of high demand. Even if load balanced, assess the events that would follow if one of the instances of the websocket goes down. Will the remaining instances be able to shoulder the influx of incoming connections as the abandoned clients try to reconnect? Or will another instance go down under the increased demand, causing a domino effect? Which programming language/library offers the most reliability in handling the waiting clients until one's infrastructure can spool up another instance of the downed websocket server?

Originally, the intent of this project was to determine which websocket server is best suited for some of my own personal projects, based on speed, reliability, and code complexity. However, this information should be useful to other developers as well, so I have decided to publish my findings.

II. METHOD

As mentioned, the websocket servers are kept as close to generic echo servers as possible, with one small difference. In a traditional echo server, the client sends a string to the server, and the server responds with that exact same string. The benchmark servers, on the other hand, wait for the client to send a JSON string containing a message count signified as "c", i.e. { "c": 1 }. The server then responds with a JSON string containing "c" along with an additional property "ts", the Unix timestamp of when the server received the message, i.e. { "c": 1, "ts": 780283800 }.

The original intention of the "ts" property was for the client to be able to calculate the time it takes the server to receive the client's message, as well as the time it takes the client to receive the server's response. However, this calculation turns out to be unreliable, because the system clocks of the client machine and the server machine are not perfectly synced. Because of this, only the total round-trip time is calculated for each request.

That being said, the "ts" property remains in the response in order to give the websocket a task to complete before responding. This small operation gives the websocket an intermediate step, rather than letting it behave as a basic echo server. As this is not a performance test of the languages themselves, it felt improper to replace this step with some arbitrary pattern matching or matrix multiplication. It was therefore decided to keep the process of decoding an incoming JSON string, adding a new property, and encoding a JSON string, despite the fact that the "ts" property goes unused by the client. This felt like a more natural task for the application, as manipulating JSON objects is extremely common for websockets.
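To make the request/response exchange concrete, the following is a minimal TypeScript sketch of a server that implements it. The sketch is illustrative only: it uses the widely available "ws" package and an assumed port of 8080, not the uWebSockets.js server that is actually benchmarked for NodeJS.

import { WebSocketServer } from "ws";

// Accept websocket connections on port 8080 (assumed for illustration).
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.on("message", (data) => {
    // The client sends a JSON string such as { "c": 1 }.
    const request = JSON.parse(data.toString());
    // Respond with the same count plus the Unix timestamp at which the
    // server received the message, e.g. { "c": 1, "ts": 780283800 }.
    const response = { c: request.c, ts: Math.floor(Date.now() / 1000) };
    socket.send(JSON.stringify(response));
  });
});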


In order to measure changes in the websocket servers' response times and reliability, the benchmarking client puts each server under an increasing amount of stress. This is done by performing multiple rounds of testing. Each round consists of a collection of websocket clients sending requests to the server, with the number of clients increasing by a fixed amount each round. The number of requests per client per round is also fixed. Therefore, as the total number of clients increases each round, so does the total number of requests to the server. More specifically, 100 new clients are added each round, with each client sending 100 requests. This increases the total number of requests sent each round by 10,000. There are a total of 100 rounds, with a cumulative 5.5 million requests. This allows us to see how each server behaves under an increasingly demanding load.

Determining the appropriate number of clients and requests took some trial and error. With more than 14,000 clients, the operating system throws an "Open File Limit" error. This limit can be increased, but doing so is unnecessary for our testing. The most important factor turned out to be time, which dictated the final configuration the most. That configuration ended up being 100 rounds of testing, adding 100 clients each round, for a total of 10,000 clients, with each client sending 100 requests per round. A total of 9 websocket servers are tested in this experiment. The languages and libraries are as follows:
• C / Libwebsockets
• C++ / uWebSockets
• C# / Fleck
• Go / Gorilla
• Java / Java-WebSocket
• NodeJS / uWebSockets
• PHP / Ratchet
• Python / websockets
• Rust / rust-websocket
For the sake of readability, the websocket servers will hereafter be referred to solely by the language in which they are written. It should be noted that the results for a given language are not a representation of the language as a whole, and alternate libraries for the same language may yield different results.

The benchmarking client is written in NodeJS, which was chosen specifically for its non-blocking nature. This is a desirable trait, as all of the websocket clients should make their requests to the server at the same time, rather than one connection having to finish sending all of its requests before the next connection can start.
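The sketch below illustrates this round-based client in TypeScript. It is not the author's benchmarking client; the "ws" package and the server address are assumptions, and error handling is omitted. Each connection issues its requests one after another while all connections run concurrently, and the round-trip time of every request is recorded on the client side.

import WebSocket from "ws";

const SERVER_URL = "ws://localhost:8080"; // assumed address
const ROUNDS = 100;
const CLIENTS_PER_ROUND = 100;
const REQUESTS_PER_CLIENT = 100;

// Open a connection and resolve once the handshake completes.
function connect(url: string): Promise<WebSocket> {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(url);
    ws.once("open", () => resolve(ws));
    ws.once("error", reject);
  });
}

// Send one { "c": n } request and resolve with its round-trip time in ms.
function roundTrip(ws: WebSocket, count: number): Promise<number> {
  return new Promise((resolve) => {
    const started = Date.now();
    ws.once("message", () => resolve(Date.now() - started));
    ws.send(JSON.stringify({ c: count }));
  });
}

// One client sends its requests sequentially, recording each round trip.
async function runClient(ws: WebSocket): Promise<number[]> {
  const times: number[] = [];
  for (let c = 0; c < REQUESTS_PER_CLIENT; c++) {
    times.push(await roundTrip(ws, c));
  }
  return times;
}

async function run(): Promise<void> {
  const clients: WebSocket[] = [];
  for (let round = 1; round <= ROUNDS; round++) {
    // Add a fixed number of new clients each round.
    for (let i = 0; i < CLIENTS_PER_ROUND; i++) {
      clients.push(await connect(SERVER_URL));
    }
    // Every connected client makes its requests at the same time.
    const perClient = await Promise.all(clients.map((ws) => runClient(ws)));
    const times = perClient.flat();
    const slowest = times.reduce((a, b) => Math.max(a, b), 0);
    console.log(`round ${round}: ${times.length} requests, slowest ${slowest} ms`);
  }
  for (const ws of clients) ws.close();
}

run().catch(console.error);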

III. PREDICTION

Prior to running the benchmarks, my predictions are as follows: the websocket written in C will have the fastest round-trip times. Furthermore, the websocket written in Python will offer the best performance once code complexity is taken into account. Finally, PHP will have the slowest round-trip times, in addition to being the least reliable.

IV. SET-UP

A. Server Machine

The machine running the websocket servers is a Dell PowerEdge R410. It is outfitted with dual 8-core Xeon L5630s clocked at 2.13 GHz and has 16 GB of DDR3 RAM. For the operating system, the machine runs Ubuntu 20. In order to isolate the benchmarking process from any other programs that may be running in the background, Docker has been installed on a virtual machine maintained by KVM. The benchmarking VM has 2 cores and 4 GB of RAM allocated to it, again running Ubuntu. The websocket servers run one at a time, each inside its own Docker container; the images for each are available on my Docker Hub.

B. Client Machine

The client machine runs an Intel i5-8600 with 6 cores, clocked at 3.10 GHz with a boost of up to 4.30 GHz. It has 8 GB of DDR4 RAM in a dual-channel configuration and uses Windows 10 as its operating system.

C. Connection

There is a direct Cat 5e Ethernet link running from the client machine to the server machine. This is done specifically so as not to DDOS my home router. However, the direct link also has the additional benefit of reducing latency between the two machines, which further increases the accuracy of the results.

D. Reasoning

This set-up is deliberately configured so that the client machine is more powerful than the server machine. This ensures that any bottleneck in the benchmarking process is due to the benchmarked servers rather than a lack of requests coming from the client.

Furthermore, a CPU with a lower clock speed is used in order to exacerbate any performance differences between the websocket servers. Two Raspberry Pi 4Bs were originally considered for the roles of both the client and server machines. However, that idea was scrapped, as an early test showed that these devices are too slow for the job. The clock speed of the PowerEdge R410 turns out to be well suited to highlighting any performance difference between the websocket servers, yet fast enough not to be unbearable to work on and to perform the benchmark test in a reasonable amount of time.
V. RESULTS

Each test was run 3 times per websocket server, with the results of the 3 tests averaged together. During the benchmarking process, the client machine sends upwards of 22.5 Mb/s to the server, with the more performant websocket servers responding with up to 29.0 Mb/s, although this varies based on the performance of the websocket. The CPU of the client machine does have to boost from its base clock, sustaining 4.09 GHz and peaking at 4.13 GHz. The CPU never fully hits 100% utilization, despite staying in the high 90s for the later rounds of tests. The server VM maxes out its CPU utilization on all tests and uses roughly 400 MB of RAM, give or take 100 MB.

A. Overall

It is unexpected to see that C and Python are unable to complete the benchmark test. Python consistently makes it to round 32 and then drops all of the websocket connections. The benchmarking client eventually throws a "heap out of memory" error from repeatedly trying to reconnect to the server. It is also extremely concerning that Python's elapsed time for each round seems to increase exponentially while the number of connections/requests increases only linearly.

C, on the other hand, is a bit more unpredictable, making it to somewhere between rounds 70 and 80. Again, it drops the connections until the client runs out of heap space. Multithreading these websockets, or running them on a more powerful machine, would potentially improve their results. However, that point is moot, since doing so would improve the results for all of the websocket servers.

Another surprise is that, out of all the websockets that are able to complete the benchmark, Go performs the worst. Go's performance does not just lag behind by a little; it takes over twice as long to complete the benchmark as the next slowest websocket, which is Rust.

Something that does not come as too much of a surprise is the fastest websocket. Although my initial prediction was that C would perform the best, it is not unexpected that NodeJS takes the crown. Node's asynchronous nature allows for greater throughput of requests coming into the server. Java and C# follow closely behind NodeJS, with PHP, C++, and Rust completing the benchmark a little more slowly.

[Figure 1: line chart, "Request Time Elapse", round time in milliseconds versus number of connections for C, C++, C#, Go, Java, NodeJS, PHP, Python, and Rust]
Fig. 1 Illustrates the time it takes each websocket server to respond to all requests from the benchmarking client for a given number of connections.

B. Success Rate

All of the websockets remain 100% reliable throughout the benchmark, with the exception of Python and C, which are the only two websockets that drop messages. Due to their unreliability and inability to complete the benchmark, the results for C and Python are excluded from the rest of this section.

C. Connection Time

Despite being the slowest when it comes to responding to requests, Go displays the best connection times, connecting the cumulative 10,000 clients in 8.8 seconds. Java, which is one of the best when it comes to request times, performs the connections the slowest, connecting the 10,000 clients in 205 seconds. C++ takes 60 seconds, with the other servers performing the connections in 10-20 seconds.

[Figure 2: bar chart, "Total Connection Time", total connection time in milliseconds for C++, C#, Go, Java, NodeJS, PHP, and Rust]
Fig. 2 Visually displays the time it takes each server to connect to the cumulative 10,000 websocket clients.

[Figure 3: bar chart, "Total Request Time", total request time in milliseconds for C++, C#, Go, Java, NodeJS, PHP, and Rust]
Fig. 3 Visually displays the time it takes each server to respond to the cumulative 5.5 million requests.



[Figure 4: bar chart, "Total Benchmark Time (Connections + Requests)", total time in milliseconds for C++, C#, Go, Java, NodeJS, PHP, and Rust]
Fig. 4 Visually displays the total time each server takes to complete the entire benchmark test.

D. Request Time

The amount of time it takes the websocket servers to respond to requests increases linearly as the total number of requests increases linearly. Go is the biggest loser in this category, taking 100 minutes to complete the cumulative 5.5 million requests. NodeJS performs the best, taking under 12 minutes to complete all requests. Rust takes 42 minutes, C++ takes 37 minutes, PHP takes 32 minutes, C# 20 minutes, and Java 16 minutes.

E. Longest Round Trip

Another metric to consider is the longest round-trip time, this being the longest amount of time it takes for a single round trip to complete. It is useful for seeing how long a websocket server could potentially leave a client waiting for a response. Predictably, Go has the longest round-trip times. PHP and C++ tie for third, Java and C# tie for second, and NodeJS emerges victorious with the fastest of the longest round trips.

[Figure 5: line chart, "Longest Round-Trip Time", longest single-request round-trip time in milliseconds versus number of connections for C, C++, C#, Go, Java, NodeJS, PHP, Python, and Rust]
Fig. 5 Illustrates the longest amount of time it takes each websocket server to respond to a single request for a given number of connections.

VI. ANALYSIS

While the results of the benchmark tests are not as expected, they do make sense once further understood. Let us start by looking at C, or more specifically LWS (Libwebsockets). As it turns out, LWS is an extremely inefficient websocket library when it comes to performance. The entire websocket server runs in a blocking event loop on a single thread. To put it simply, the server operates something like this:

1. Accept incoming message
2. Read incoming message
3. Generate response
4. Send response
5. Repeat

It does this for each incoming request, one request at a time, all in a single thread. In fact, it was proposed earlier that multithreading the application could improve performance, only to find out that this is highly discouraged. Libwebsockets' own website [2] states, "Directly performing websocket actions from other threads is not allowed. Aside from the internal data being inconsistent in forked() processes, the scope of a wsi (struct websocket) can end at any time during service with the socket closing and the wsi freed." The wsi variable referenced in that quote is the pointer to the client connection. Plainly speaking, if one tries to multithread LWS, one could end up with incorrect data in the forked thread, or lose the connection to the client altogether. Therefore, the poor performance experienced is a design decision of the LWS library. This suggests that while our tests show poor performance for the C websocket server, there may be other C libraries which would offer better speed and reliability.

This single-threaded design also explains Go's poor performance. Go does not meet the expected performance either; however, the explanation is quite simple. Go is designed to take advantage of concurrent processing. In other words, Go achieves its renowned performance by completing tasks in parallel. Therefore, running everything from a single goroutine (a single thread) substantially hampers the performance of the websocket, as Go was never designed to be utilized in such a bare-bones set-up. The good news is that, unlike the C websocket, the Go server is able to use multiple goroutines, and can therefore be multithreaded for better performance.

So how did C++, PHP, and Rust achieve better performance than their C and Go counterparts? To put it simply, while the C and Go servers are subject to blocking code, the C++, PHP, and Rust servers are not. In other words, C and Go complete each task one at a time, in order, one after another. Meanwhile, C++, PHP, and Rust can complete their tasks asynchronously, out of order, in whatever sequence will get the job done the fastest. The advantage here is that these asynchronous servers can work on other tasks while waiting for entirely different tasks to complete. This leads to huge performance improvements, as seen in the results of this benchmark.
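The contrast can be illustrated with a short TypeScript fragment (an illustration only, not code from any of the benchmarked servers). The simulated work is identical in both functions; only the scheduling differs, and the concurrent version finishes in roughly one wait rather than the sum of all waits.

// Stand-in for handling one request when part of the work is waiting
// (network I/O, timers, and so on).
async function handle(request: string): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated wait
  return request.toUpperCase();
}

// Blocking style: each request is fully finished before the next one is
// even looked at, so the total time is the sum of every wait.
async function serveSequentially(requests: string[]): Promise<string[]> {
  const responses: string[] = [];
  for (const request of requests) {
    responses.push(await handle(request));
  }
  return responses;
}

// Asynchronous style: every request is started immediately and the waits
// overlap, so the total time is roughly one wait rather than the sum.
function serveConcurrently(requests: string[]): Promise<string[]> {
  return Promise.all(requests.map((request) => handle(request)));
}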



So what tricks do Java, C#, and NodeJS use to improve performance even further? Not only are these servers performing their tasks asynchronously, they also automatically spawn dozens of threads to perform those tasks in parallel. This gives Java, C#, and NodeJS an even greater advantage over the aforementioned servers, which are limited to a single thread. It should be noted that, if one desires to do so, C++, PHP, and Rust can also be multithreaded to achieve similar performance improvements; this just has to be implemented manually by the developer.
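As a sketch of what that manual step can look like, the fragment below spreads the earlier echo-server sketch across one worker process per CPU core using Node's built-in cluster module. This is an illustration of the idea under assumptions (the "ws" package, port 8080), not the configuration that was benchmarked.

import cluster from "node:cluster";
import { cpus } from "node:os";
import { createServer } from "node:http";
import { WebSocketServer } from "ws";

if (cluster.isPrimary) {
  // The primary process only forks workers; incoming connections are
  // distributed among them.
  for (let i = 0; i < cpus().length; i++) {
    cluster.fork();
  }
} else {
  const httpServer = createServer();
  const wss = new WebSocketServer({ server: httpServer });

  wss.on("connection", (socket) => {
    socket.on("message", (data) => {
      const { c } = JSON.parse(data.toString());
      socket.send(JSON.stringify({ c, ts: Math.floor(Date.now() / 1000) }));
    });
  });

  // Every worker listens on the same port; the cluster module shares it.
  httpServer.listen(8080);
}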
Lastly, that brings us to Python. Why does Python perform so poorly? Is it also a victim of blocking code? Actually, no. While the Python server does run on a single thread, the code is written to be asynchronous. In part, the reason Python performs so terribly is that the websocket library being used is horribly unoptimized. While reputable websocket libraries are used for the other websocket servers, like the established Ratchet websocket for PHP and uWebSockets for NodeJS, Python is different. For the Python websocket, a generic module is used which is simply named "websockets". The documentation for this module is limited, and custom configuration is non-existent. It is most likely a module that offers only the simplest websocket functionality. Now, it was mentioned that this only partly explains the poor performance. While writing this report, it seemed unjust not to give Python a fighting chance. So, the websocket server was rebuilt with the more trusted Autobahn library, and the benchmark test was rerun. This new server does lead to better results, though I use the word "better" loosely. With the Autobahn server, the time to complete each round increases linearly rather than exponentially, which is a promising sign. Even so, the performance is still worse than Go's websocket server. Additionally, it is still unable to finish the benchmark test, even with this more optimized websocket library. It gets to round 98, and then the server drops the websocket connections, with many dropped messages throughout the benchmarking process. Nevertheless, the Python server was rebuilt one more time, this time with a library by the name of "aiohttp". At last, all 100 rounds of the benchmark are able to be completed, though not very well. Aiohttp still takes longer than Go, and becomes substantially unreliable after round 50, dropping anywhere from 30-50% of the messages. It can only be concluded that the reason for this dreadful performance is Python itself. Python, which is interpreted at run time rather than compiled, suffers from slow execution time. Even when compared to other interpreted languages, like PHP, Python's performance still lags behind [1].

VII. HOW TO REPLICATE

The source code for each websocket server, as well as the client, can be found on GitHub at https://github.com/matttomasetti. In addition, ready-made Docker images can be found on Docker Hub at https://hub.docker.com/u/mtomasetti. The images automatically start the websocket server listening on port 8080. The image for the websocket client automatically starts the benchmarking client, looking for a server on localhost. However, the default settings can be altered through environment variables. Further details are listed in the README files in each repository. The results of each benchmark test can be viewed in greater detail at https://matttomasetti.com.

VIII. WHAT CAN BE IMPROVED

If one chooses to revisit this experiment, it would be beneficial to perform it with a greater number of websocket variations. These variations would include websockets written in more languages, as well as multiple libraries per language. This would not only highlight which library is the best, but also, potentially, which language offers the best performance for websocket applications. It would also be a good idea to increase the number of clients and rounds of the benchmark test in order to see if there is a breaking point for websocket servers other than just the C and Python implementations. Lastly, it could be interesting to multithread the websocket servers that support it, so as to give a level playing field between the servers that do so automatically and the ones that do not.

If one wants to go above and beyond, one may not only multithread the servers, but also stand up multiple instances of the websockets with a load balancer in front of them. This would be useful to see just how much performance could be achieved out of each websocket when it is given a more optimal environment, rather than being thrown into a worst-case scenario.

IX. CONCLUSION

As we can see, all of my predictions turned out to be entirely incorrect, but this is not a bad thing. Ending up with data that goes against one's initial expectations proves that there is knowledge to be gained. From the information that has been uncovered in this report, I propose the following 4 guidelines when selecting a websocket library:

1. Ensure the websocket library is asynchronous. This may also be expressed as "non-blocking".
2. Ensure the library allows the websocket to be multithreaded, either automatically or with additional configuration by the developer.
3. Greater performance may be achieved by using a compiled language over an interpreted one.
4. Do not use Python.

All in all, the winner here is clearly NodeJS. From its amazing performance to its code complexity (or rather, lack of complexity), NodeJS is an optimal choice for a websocket project. I thought a similar compliment would be applicable to Python. Unfortunately, after this test, it is clear that Python-based websockets should be avoided at all costs. For a more business-oriented application, one cannot go wrong with the enterprise favorites Java or C#. That being said, if one is to take a single piece of knowledge away from this study, it would be to always use an asynchronous websocket.

REFERENCES

[1] A. Burets, "Python vs PHP: Main Differences and Comparison: SCAND Blog," SCAND, 28-Jun-2019. [Online]. Available: https://scand.com/company/blog/python-vs-php-for-web-development/. [Accessed: 26-Jan-2021].

[2] "Notes about coding with lws," libwebsockets. [Online]. Available: https://libwebsockets.org/lws-api-doc-master/html/md_README_8coding.html. [Accessed: 26-Jan-2021].



ADDITIONAL GRAPHS

[Figure 6: line chart, "Average Round-Trip Time", average single-request round-trip time in milliseconds versus number of connections for C, C++, C#, Go, Java, NodeJS, PHP, Python, and Rust]
Fig. 6 Illustrates the average amount of time it takes each websocket server to respond to a single request for a given number of connections.

[Figure: line chart, "Python Total Request Time Elapse", total request time in milliseconds versus number of connections for the Python servers built with websockets, Autobahn, and aiohttp, with Go for comparison]
Fig. 5 Illustrates the total amount of time it takes each websocket server to complete all requests for a given number of connections.

[Figure 7: line chart, "Connection Time Elapse", time in milliseconds to add 100 new connections versus number of already existing connections for C, C++, C#, Go, Java, NodeJS, PHP, Python, and Rust]
Fig. 7 Illustrates the amount of time it takes for each websocket server to add 100 new connections with a given number of already existing connections.

[Figure 8: line chart, "Connection Time Elapse (without C and Python)", time in milliseconds to add 100 new connections versus number of already existing connections for C++, C#, Go, Java, NodeJS, PHP, and Rust]
Fig. 8 Illustrates the amount of time it takes for each websocket server to add 100 new connections with a given number of already existing connections (same as Figure 7, just without C and Python).
