An Analysis of the Performance of Websockets in Various Programming Languages and Libraries
Abstract — As the demand for real-time data increases, so does the use of websockets. It is crucial to consider the speed, along with the reliability, of a language and websocket library before implementing it in an application. This study benchmarks various websocket servers in order to determine which one offers the fastest round trip time for a request, as well as the reliability of each websocket server under load.

Keywords — websocket, server, client, benchmark, echo, c, libwebsocket, c++, cpp, uwebsockets, c#, cs, fleck, go, gorilla, java, java-websocket, node, nodejs, php, ratchet, python, websockets, rust, rust-websocket, connection, request, round trip, time, docker, container, virtual machine, KVM

I. PURPOSE

The purpose of this benchmarking exercise is to determine which programming language/library will offer the best performance in a websocket application. The websocket servers being tested in this benchmark are purposely simple, without extensive configuration or multithreading, in order to showcase which server performs the best right out of the box. For this reason, each server is kept as close to a generic "echo server" as possible. The data being collected will provide insights into the performance of the websocket servers. The specific metrics of interest in this assessment are the round trip time (the time from when a client sends a request to the server to the time the client receives a response) and the success rate of requests (the percentage of requests which successfully receive a response).

It is important to note that a websocket which is critical to an application should be set up in a more optimal environment than the one demonstrated in this testing. Nevertheless, accounting for the worst case scenario is always imperative. A websocket that needs to be dependable may use multithreading to achieve better performance, and/or a load balancer to distribute connected clients. That being said, perhaps one has a websocket with a defect, whether due to an undetected bug or a flaw in the initial design. Contemplate what may happen to the flawed websocket application in a moment of high demand. Even if load balanced, assess the events that would follow if one of the instances of the websocket goes down. Will the remaining instances be able to shoulder the influx of incoming connections as the abandoned clients try to reconnect? Or will another instance go down under the increased demand, causing a domino effect? Which programming language/library offers the most reliability for handling the waiting clients until one's infrastructure can spool up another instance of the downed websocket server?

Originally, the intent of this project was to determine which websocket server is best suited for some of my own personal projects, based on speed, reliability, and, furthermore, code complexity. However, this information should be useful to other developers as well; therefore, I have decided to publish my findings.

II. METHOD

As mentioned, the websocket servers are kept as close to generic echo servers as possible, with one small difference. In a traditional echo server, the client sends a string to the server, and the server responds with that exact same string. The benchmark servers, on the other hand, wait for the client to send a JSON string containing the message count, signified as "c", i.e. { "c": 1 }. The server then responds with a JSON string containing "c" along with an additional property "ts", the Unix timestamp of when the server received the message, i.e. { "c": 1, "ts": 780283800 }.

The original intention of the "ts" property was for the client to be able to calculate the time it takes the server to receive the client's message, as well as the time it takes the client to receive the server's response. However, this calculation turns out to be more complex, because the system times of the client machine and the server machine are not perfectly synced. Because of this, only the total round trip time is calculated for each request.

That being said, the "ts" property remains in the response in order to give the websocket a task to complete before responding. This small operation gives the websocket an intermediate step, rather than behaving as a basic echo server. As this is not a performance test of the languages themselves, it felt improper to replace this step with some arbitrary pattern matching or matrix multiplication. Because of this, it has been decided to keep the process of decoding an incoming JSON string, adding a new property, and encoding a JSON string.
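To make this request/response cycle concrete, the following is a minimal sketch of such a benchmark server written in Go with the gorilla/websocket library (one of the libraries named in the keywords). The handler name, struct names, and port are illustrative assumptions and are not taken from the actual benchmark code.

    package main

    import (
        "log"
        "net/http"
        "time"

        "github.com/gorilla/websocket"
    )

    // request mirrors the JSON the client sends, e.g. { "c": 1 }.
    type request struct {
        C int `json:"c"`
    }

    // response echoes "c" and adds "ts", the Unix timestamp taken when
    // the server receives the message, e.g. { "c": 1, "ts": 780283800 }.
    type response struct {
        C  int   `json:"c"`
        Ts int64 `json:"ts"`
    }

    var upgrader = websocket.Upgrader{}

    func echo(w http.ResponseWriter, r *http.Request) {
        conn, err := upgrader.Upgrade(w, r, nil)
        if err != nil {
            log.Println("upgrade:", err)
            return
        }
        defer conn.Close()

        for {
            var req request
            // Decode the incoming JSON string.
            if err := conn.ReadJSON(&req); err != nil {
                return // client disconnected or sent invalid JSON
            }
            // Add the "ts" property and encode the outgoing JSON string.
            if err := conn.WriteJSON(response{C: req.C, Ts: time.Now().Unix()}); err != nil {
                return
            }
        }
    }

    func main() {
        http.HandleFunc("/", echo)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

Whatever the language, every server in the benchmark performs the same three steps per message: decode the JSON, add "ts", and encode the JSON response.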
Prior to running the benchmarks, my predictions are as follows: the websocket written in C will have the fastest round trip times. Furthermore, the websocket written in Python will

Each test was run 3 times per websocket server, with the results of the 3 tests being averaged together. During the benchmarking process, the client machine sends upwards of 22.5 Mb/s to the server, with the more performant websocket
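On the client side, the round trip time described in the Purpose section can be measured along the following lines. This is again a hedged Go sketch using gorilla/websocket; the server address, request count, and single-connection loop are illustrative assumptions, not details of the real benchmarking client.

    package main

    import (
        "fmt"
        "log"
        "time"

        "github.com/gorilla/websocket"
    )

    // message covers both directions: the client sends only "c", and the
    // server replies with "c" and "ts".
    type message struct {
        C  int   `json:"c"`
        Ts int64 `json:"ts,omitempty"`
    }

    func main() {
        // Hypothetical address of a benchmark server.
        conn, _, err := websocket.DefaultDialer.Dial("ws://localhost:8080/", nil)
        if err != nil {
            log.Fatal("dial:", err)
        }
        defer conn.Close()

        const requests = 100
        var total time.Duration

        for c := 1; c <= requests; c++ {
            start := time.Now()
            if err := conn.WriteJSON(message{C: c}); err != nil {
                log.Fatal("write:", err)
            }
            var resp message
            if err := conn.ReadJSON(&resp); err != nil {
                log.Fatal("read:", err)
            }
            // Only the total round trip time is recorded; the server's "ts"
            // is not compared against the client clock, since the two system
            // clocks are not perfectly synced.
            total += time.Since(start)
        }
        fmt.Printf("average round trip: %v\n", total/requests)
    }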
Go's performance does not just lag behind by a little; rather, it takes over twice as long to complete the benchmark as the next slowest websocket, which is Rust.

Something that does not come as too much of a surprise is the fastest websocket. Although my initial prediction was that C would perform the best, it is not unexpected that NodeJS takes the crown. Node's asynchronous nature allows for greater throughput of requests coming into the server.

Fig. 1 Illustrates the time it takes each websocket server to respond to all requests from the benchmarking client for a given number of connections. (Chart title: Request Time Elapse; x-axis: Connections; y-axis: Milliseconds; series: C, C++, C#, Go, Java, NodeJS, PHP, Python, Rust.)

Fig. 2 Visually displays the time it takes each server to connect to the cumulative 10,000 websocket clients. (Servers shown: C++, C#, Go, Java, NodeJS, PHP, Rust.)

Fig. 3 Visually displays the time it takes each server to respond to the cumulative 5.5 million requests. (Servers shown: C++, C#, Go, Java, NodeJS, PHP, Rust.)

Fig. 4 Visually displays the total time each server takes to complete the entire benchmark test. (y-axis: Milliseconds; servers shown: C++, C#, Go, Java, NodeJS, PHP, Rust.)
Fig. 5 Illustrates the longest amount of time it takes each websocket server to respond to a single request for a given number of connections. (x-axis: Connections; y-axis: Milliseconds; series: C, C++, C#, Go, Java, NodeJS, PHP, Python, Rust.)

D. Request Time

The amount of time it takes for the websocket servers to respond to requests increases linearly as the total number of requests increases. Go is the biggest loser in this category, taking 100 minutes to complete the cumulative 5.5 million requests. NodeJS performs the best, taking under 12 minutes to complete all requests. Rust takes 42 minutes, C++ takes 37 minutes, PHP takes 32 minutes, C# 20 minutes, and Java 16 minutes.

E. Longest Round Trip

Another metric to be considered is the longest round trip time, this being the longest amount of time for a single round trip to complete. This is useful for seeing how long a websocket server could potentially leave a client waiting for a response. Unsurprisingly, Go has the longest round trip times. PHP and C++ tie for third, Java and C# tie for second, and NodeJS emerges victorious with the fastest longest round trips.

VI. ANALYSIS

While the results of the benchmark tests are not as expected, they do make sense once further understood. Let us start by looking at C, or more specifically LWS (Libwebsockets). As it turns out, LWS is an extremely inefficient websocket library when it comes to performance. The entire websocket server runs in a blocking event loop on a single thread. To put it simply, the server operates something like this:

1. Accept incoming message
2. Read incoming message
3. Generate response
4. Send response
5. Repeat

It does this for each incoming request, one request at a time, and all in a single thread. In fact, it was proposed earlier that multithreading the application could improve performance, only to find out that this is highly discouraged. Libwebsocket's own website [2] states, "Directly performing websocket actions from other threads is not allowed. Aside from the internal data being inconsistent in forked() processes, the scope of a wsi (struct websocket) can end at any time during service with the socket closing and the wsi freed." The wsi variable referenced in that quote refers to the pointer to the client connection. Plainly speaking, if one tries to multithread LWS, one could end up with incorrect data in the forked thread, or lose the connection to the client altogether. Therefore, the poor performance experienced is a design decision of the LWS library. This suggests that while our tests show poor performance for the C websocket server, there may be other C libraries which would offer better speed and reliability.

The idea of this single-threaded design also explains Go's poor performance. Go does not meet the performance expected either; however, the explanation is quite simple. Go is designed to take advantage of concurrent processing. In other words, Go achieves its renowned performance by completing tasks in parallel. Therefore, running everything from a single goroutine (a single thread) substantially hampers the performance of the websocket, as Go was never designed to be utilized in such a bare-bones set-up. The good news is that, unlike the C websocket, the Go server is able to use multiple goroutines, and therefore be multithreaded for better performance.

So how did C++, PHP, and Rust achieve better performance than their C and Go counterparts? To put it simply, while the C and Go servers are subject to blocking code, the C++, PHP, and Rust servers are not. In other words, C and Go complete each task one at a time, in order, one after another. Meanwhile, C++, PHP, and Rust can complete their tasks asynchronously, out of order, in whatever sequence will get the job done.
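To illustrate the multi-goroutine direction suggested above for Go, the sketch below splits the work for each connection across two goroutines: one reads and decodes requests while the other encodes and writes responses, so the steps no longer run strictly one after another. It is only one possible arrangement, again using gorilla/websocket with illustrative names, and is not the configuration used in the benchmark.

    package main

    import (
        "log"
        "net/http"
        "time"

        "github.com/gorilla/websocket"
    )

    type request struct {
        C int `json:"c"`
    }

    type response struct {
        C  int   `json:"c"`
        Ts int64 `json:"ts"`
    }

    var upgrader = websocket.Upgrader{}

    // echoConcurrent handles one client connection with two goroutines:
    // the current goroutine reads and decodes requests, while a separate
    // writer goroutine encodes and sends responses.
    func echoConcurrent(w http.ResponseWriter, r *http.Request) {
        conn, err := upgrader.Upgrade(w, r, nil)
        if err != nil {
            return
        }
        defer conn.Close()

        responses := make(chan response, 64) // buffered hand-off between the stages
        defer close(responses)

        // Writer stage.
        go func() {
            for resp := range responses {
                if err := conn.WriteJSON(resp); err != nil {
                    // A failed write means the socket is gone; closing it makes
                    // ReadJSON fail so the reader stage stops. Keep draining the
                    // channel so the reader never blocks on a full buffer.
                    conn.Close()
                }
            }
        }()

        // Reader stage.
        for {
            var req request
            if err := conn.ReadJSON(&req); err != nil {
                return // the deferred close(responses) ends the writer stage
            }
            responses <- response{C: req.C, Ts: time.Now().Unix()}
        }
    }

    func main() {
        http.HandleFunc("/", echoConcurrent)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

gorilla/websocket permits one concurrent reader and one concurrent writer per connection, which is why the hand-off goes through a channel rather than writing from many goroutines at once.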
REFERENCES
[1] A. Burets, "Python vs PHP: Main Differences and Comparison: SCAND Blog," SCAND, 28-Jun-2019. [Online]. Available: https://scand.com/company/blog/python-vs-php-for-web-development/. [Accessed: 26-Jan-2021].
Fig. 6 Illustrates the average amount of time it takes each websocket server to respond to a single request for a given number of connections. (x-axis: Connections; y-axis: Milliseconds.)

Fig. 5 Illustrates the total amount of time it takes each websocket server to complete all requests for a given number of connections. (x-axis: Connections; y-axis: Milliseconds.)