The WebSocket Handbook
Learn about the technology underpinning
the realtime web and build your first web
app powered by WebSockets
By Alex Diaconu
Version 1, October 2021
Special Thanks
In no particular order: Jo Franchetti (for contributing Chapter 4 and building the demo
app), Ramiro Nuñez Dosio (for encouraging me to write the book in the first place,
giving valuable advice, and removing blockers), Jonathan Mercier-Ganady (for the
technical review), Jo Stichbury (for the editorial review), Leonie Wharton, Chris Hipson,
Jamie Watson (for all the design work involved).
Preface
Who this book is for
What this book covers
Final thoughts
About Ably
Preface
Our everyday digital experiences are in the midst of a
realtime revolution. Whether we’re talking about virtual
events, EdTech, news and financial information, IoT
devices, asset tracking and logistics, live score updates, or
gaming, consumers increasingly expect realtime digital
experiences as standard. And what better to power these
realtime interactions than WebSockets?
Until the emergence of WebSockets, the “realtime” web was difficult to achieve and
slower than we’re used to nowadays; it was delivered by hacking existing HTTP-based
technologies that were not designed and optimized for realtime applications.
Familiarity with HTML, JavaScript (and Node.js), HTTP, web APIs, and web
development is required to get the most out of this book.
Chapter 2: The WebSocket Protocol covers key considerations related to the WebSocket
protocol. You’ll find out how to establish a WebSocket connection and exchange
messages, what kind of data can be sent over WebSockets, and what types of extensions
and subprotocols you can use to augment WebSockets.
Chapter 3: The WebSocket API provides details about the constituent components of the
WebSocket API — its events, methods, and properties, alongside usage examples for
each of them.
Resources — a collection of articles, videos, and WebSocket solutions you might want to
explore.
The reader should bear in mind that this is the first version of this book; therefore, it does
not intend to be exhaustive. In future versions, we plan to:
• Add more details to the existing chapters.
• Provide more examples and walkthroughs for building apps with WebSockets.
• Cover additional topics that are currently out of scope, such as engineering
challenges (for example, scaling and making WebSockets reliable), and alternatives
to WebSockets.
The first realtime web apps started to appear in the 2000s, attempting to deliver
responsive, dynamic, and interactive end-user experiences. However, at that time, the
realtime web was difficult to achieve and slower than we’re used to nowadays; it was
delivered by hacking existing HTTP-based technologies that were not designed and
optimized for realtime applications. It quickly became obvious that a better alternative
was needed.
In this first chapter, we’ll look at how web technologies evolved, culminating with the
emergence of WebSockets, a vast improvement on HTTP for building realtime
web apps.
This initial version of HTTP [1] (commonly known as HTTP/0.9) that Berners-Lee developed
was incredibly basic. Requests consisted of a single line and started with the only
supported method, GET, followed by the path to the resource:
GET /mypage.html
The response was equally simple and consisted of nothing but the file itself:
<HTML>
My HTML page
</HTML>
There were no HTTP headers, status codes, URLs, or versioning, and the connection was
terminated immediately after receiving the response.
Since interest in the web was skyrocketing, and with HTTP/0.9 being severely limited,
both browsers and servers quickly made the protocol more versatile by adding new
capabilities. Some key changes:
• Header fields including rich metadata about the request and response (HTTP version
number, status code, content type).
• Two new methods — HEAD and POST.
• Additional content types (e.g., scripts, stylesheets, or media), so that the response was
no longer restricted to hypertext.
[1] The Original HTTP as defined in 1991
[2] The IETF HTTP Working Group
[3] RFC 1945: Hypertext Transfer Protocol - HTTP/1.0
In 1995, Netscape hired Brendan Eich with the goal of embedding scripting capabilities
into their Netscape Navigator browser. Thus, JavaScript was born. The first version of the
language was simple, and you could only use it for a few things, such as basic validation
of input fields before submitting an HTML form to the server. Limited as it was back
then, JavaScript brought dynamic experiences to a web that had been fully static until
that point. Progressively, JavaScript was enhanced, standardized, and adopted by all
browsers, becoming one of the core technologies of the web as we know it today.
[4] RFC 2068: Hypertext Transfer Protocol - HTTP/1.1
[5] IETF HTTP Working Group, HTTP Documentation, Core Specifications
We will now look at the main HTTP-centric design models that emerged for developing
realtime apps: AJAX and Comet.
AJAX
AJAX (short for Asynchronous JavaScript and XML) is a method of asynchronously
exchanging data with a server in the background and updating parts of a web page —
without the need for an entire page refresh (postback).
Publicly used as a term for the first time in 2005 [6], AJAX encompasses several technologies:
• HTML (or XHTML) and CSS for presentation.
• Document Object Model (DOM) for dynamic display and interaction.
• XML or JSON for data interchange, and XSLT for XML manipulation.
• XMLHttpRequest [7] (XHR) object for asynchronous communication.
• JavaScript to bind everything together.
It’s worth emphasizing the importance of XMLHttpRequest, a built-in browser object that
allows you to make HTTP requests in JavaScript. The concept behind XHR was initially
created at Microsoft and included in Internet Explorer 5, in 1999. In just a few years,
XMLHttpRequest would benefit from widespread adoption, being implemented by Mozilla
Firefox, Safari, Opera, and other browsers.
Let’s now look at how AJAX works, by comparing it to the classic model of building a web
app.
[6] Jesse James Garrett, Ajax: A New Approach to Web Applications
[7] XMLHttpRequest Living Standard
In a classic model, most user actions in the UI trigger an HTTP request sent to the server.
The server processes the request and returns the entire HTML page to the client.
In comparison, AJAX introduces an intermediary (an AJAX engine) between the user and
the server. Although it might seem counterintuitive, the intermediary significantly improves
responsiveness. Instead of loading the webpage, at the start of the session, the client
loads the AJAX engine, which is responsible for:
• Regularly polling the server on the client’s behalf.
• Rendering the interface the user sees, and updating it with data retrieved from the
server.
AJAX (and XMLHttpRequest in particular) can be considered a black swan event
for the web. It opened up the potential for web developers to start building truly dynamic,
asynchronous, realtime-like web applications that could communicate with the server
silently in the background, without interrupting the user’s browsing experience. Google
was among the first to adopt the AJAX model in the mid-2000s, initially using it for Google
Suggest, and its Gmail and Google Maps products. This sparked widespread interest in
AJAX, which quickly became popular and heavily used.
The Comet model was made famous by organizations such as Google and Meebo. The
former initially used Comet to add web-based chat to Gmail, while Meebo used it for their
web-based chat app that enabled users to connect to AOL, Yahoo, and Microsoft chat
platforms through the browser. In a short time, Comet became a default standard for
building responsive, interactive web apps.
Several different techniques can be used to deliver the Comet model, the most well-known
being long polling [9] and HTTP streaming. Let’s now quickly review how these two work.
Long polling
Essentially a more efficient form of polling, long polling is a technique where the server
elects to hold a client’s connection open for as long as possible, delivering a response
only after data becomes available or a timeout threshold is reached. Upon receipt of the
server response, the client usually issues another request immediately. Long polling is
often implemented on the back of XMLHttpRequest, the same object that plays a key role
in the AJAX model.
Figure 1.2: High-level overview of long polling
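The long polling loop can be sketched on the client side as follows. This is a minimal illustration rather than a production implementation: the longPoll name, its options, and the convention that a 200 status carries new data are assumptions, and fetch stands in for XMLHttpRequest to keep the example short.

```javascript
// Hypothetical long polling loop: send a request, wait for the server to
// answer (with data or after its timeout), then immediately ask again.
async function longPoll(url, onMessage, options = {}) {
  const fetchImpl = options.fetchImpl || globalThis.fetch;
  const keepGoing = options.keepGoing || (() => true);
  const retryDelayMs = options.retryDelayMs ?? 1000;

  while (keepGoing()) {
    try {
      const response = await fetchImpl(url);
      if (response.status === 200) {
        onMessage(await response.text());
      }
      // Any other status (assumed here to mean "timed out, no data")
      // falls through, and the loop simply polls again.
    } catch (err) {
      // Network failure: wait briefly before reconnecting.
      await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    }
  }
}
```

In a browser this might be started with longPoll('/updates', render), where both the path and the render callback are placeholders.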
HTTP streaming
Also known as HTTP server push, HTTP streaming is a data transfer technique that allows
a web server to continuously send data to a client over a single HTTP connection that
remains open indefinitely. Whenever there’s an update available, the server sends a
response, and only closes the connection when explicitly told to do so.
HTTP streaming can be achieved by using the chunked transfer encoding mechanism
available in HTTP/1.1. With this approach, the server can send response data in chunks of
newline-delimited strings, which are processed on the fly by the client.
[8] Alex Russell, Comet: Low Latency Data for the Browser
[9] Long Polling - Concepts and Considerations
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
7\r\n
Chunked\r\n
8\r\n
Response\r\n
7\r\n
Example\r\n
0\r\n
\r\n
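To make the chunk format concrete, here is a small sketch of how a client could decode a body like the one above. It assumes ASCII content (so byte counts and character counts match) and ignores chunk extensions and trailers; the decodeChunkedBody name is illustrative.

```javascript
// Decode an HTTP/1.1 chunked body: each chunk is "<hex size>\r\n<data>\r\n",
// and a zero-sized chunk marks the end of the body.
function decodeChunkedBody(body) {
  const chunks = [];
  let offset = 0;
  while (offset < body.length) {
    const lineEnd = body.indexOf('\r\n', offset);
    if (lineEnd === -1) throw new Error('Malformed chunk header');
    const size = parseInt(body.slice(offset, lineEnd), 16);
    if (Number.isNaN(size)) throw new Error('Invalid chunk size');
    if (size === 0) break; // terminating chunk
    const dataStart = lineEnd + 2;
    chunks.push(body.slice(dataStart, dataStart + size));
    offset = dataStart + size + 2; // skip the data and its trailing CRLF
  }
  return chunks;
}

const chunkedBody = '7\r\nChunked\r\n8\r\nResponse\r\n7\r\nExample\r\n0\r\n\r\n';
console.log(decodeChunkedBody(chunkedBody)); // [ 'Chunked', 'Response', 'Example' ]
```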
Server-Sent Events [10] (SSE) is another option you can leverage to implement HTTP
streaming. SSE is a server push technology commonly used to send message updates or
continuous data streams to a browser client. SSE aims to enhance native, cross-browser
server-to-client streaming through a JavaScript API called EventSource, standardized [11] as
part of HTML5 by the World Wide Web Consortium (W3C).
[10] Server-Sent Events (SSE): A Conceptual Deep Dive
[11] Server-sent events, HTML Living Standard
Most of their limitations stem from using HTTP as the underlying transport protocol. The
problem is that HTTP was initially designed to serve hypermedia resources in a request-
response fashion. It hadn’t been optimized to power realtime apps that usually involve
high-frequency or ongoing client-server communication, and the ability to react instantly
to changes.
Hacking HTTP-based technologies to emulate the realtime web was bound to lead to all
sorts of drawbacks. We will now cover the main ones (without being exhaustive).
Limited scalability
HTTP polling, for example, involves sending requests to the server at fixed intervals to see
if there’s any new update to retrieve. High polling frequencies result in increased network
traffic and server demands; this doesn’t scale well, especially as the number of concurrent
users rises. Low polling frequencies will be less taxing on the server, but they may result in
delivery of stale information that has lost (part of) its value.
Although an improvement on regular polling, long polling is also intensive on the server,
and handling thousands of simultaneous long polling requests requires huge amounts of
resources.
Another problem is that a server may send a response, but network or browser issues
may prevent the message from being successfully received. Unless some sort of message
receipt confirmation process is implemented, a subsequent call to the server may result in
missed messages.
Although HTTP streaming techniques are better for lower latencies than (long) polling,
they are limited themselves (just like any other HTTP-based mechanism) by HTTP headers,
which increase message size and cause unnecessary delays. Often, the HTTP headers in
the response outweigh the core data being delivered [12].
No bi-directional streaming
A request/response protocol by design, HTTP doesn’t support bidirectional, always-
on, realtime communication between client and server over the same connection. You
can create the illusion of bidirectional realtime communication by using two HTTP
connections. However, the maintenance of these two connections introduces significant
overhead on the server, because it takes double the resources to serve a single client.
With the web continuously evolving, and user expectations of rich, realtime web-based
experiences growing, it was becoming increasingly obvious that an alternative to HTTP
was needed.
[12] Matthew O’Riordan, Google - polling like it’s the 90s
[13] IRC logs, 18.06.2008
[14] W3C mailing lists, TCPConnection feedback
WEBSOCKETS HTTP/1.1
Communication
Full-duplex Half-duplex
Bi-directional Request-response
Server push
Overhead
State
Stateful Stateless
HTTP and WebSockets are designed for different use cases. For example, HTTP is a good
choice if your app relies heavily on CRUD operations, and there’s no need for the user
to react to changes quickly. On the other hand, when it comes to scalable, low-latency
realtime applications, WebSockets are the way to go. More about this in the next section.
Adoption
Initially called TCPConnection, the WebSocket interface made its way into the HTML5
specification [15], which was first released as a draft in January 2008. The WebSocket
protocol was standardized in 2011 via RFC 6455; more about this in Chapter 2: The
WebSocket Protocol.
In December 2009, Google Chrome 4 was the first browser to ship full support for
WebSockets. Other browser vendors started to follow suit over the next few years; today,
all major browsers have full support for WebSockets. Going beyond web browsers,
WebSockets can be used to power realtime communication across various types of user
agents — for example, mobile apps.
Nowadays, WebSockets are a key technology for building scalable realtime web apps.
The WebSocket API and protocol have a thriving community, which is reflected by a
variety of client and server options (both open-source and commercial), developer
ecosystems, and myriad real-life implementations.
[15] Web sockets, HTML Living Standard
This chapter covers key considerations related to the WebSocket protocol, as described
in RFC 6455. You’ll find out how to establish a WebSocket connection and exchange
messages, what kind of data can be sent over WebSockets, and what types of extensions
and subprotocols you can use to augment WebSockets.
Protocol overview
The WebSocket protocol enables ongoing, full-duplex, bidirectional communication
between web servers and web clients over an underlying TCP connection.
[16] RFC 6455: The WebSocket Protocol
[17] IANA WebSocket Protocol Registries
The WebSocket URI schemes are analogous to the HTTP ones; the wss scheme
uses the same security mechanism as https to secure connections, while ws
corresponds to http.
The rest of the WebSocket URI follows a generic syntax, similar to HTTP. It consists of
several components: host, port, path, and query, as highlighted in the example below.
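As a quick illustration of these components, the standard URL class can split a WebSocket URI into its parts. The address used here is made up for the example.

```javascript
// The address is a placeholder chosen for illustration.
const uri = new URL('wss://example.com:8443/chat?room=42');

console.log(uri.protocol); // wss:
console.log(uri.hostname); // example.com
console.log(uri.port);     // 8443
console.log(uri.pathname); // /chat
console.log(uri.search);   // ?room=42
```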
Client request
Here’s a basic example of a GET request made by the client to initiate the opening
handshake:
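As a rough sketch, such a request could be assembled in Node.js as shown below. The host, path, and the buildHandshakeRequest helper are placeholders chosen for illustration; the header names and the version value 13 come from RFC 6455.

```javascript
const { randomBytes } = require('crypto');

// Build the text of an opening handshake request.
function buildHandshakeRequest(host, path) {
  // Sec-WebSocket-Key is a random 16-byte nonce, base64-encoded.
  const key = randomBytes(16).toString('base64');
  const lines = [
    `GET ${path} HTTP/1.1`,
    `Host: ${host}`,
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Key: ${key}`,
    'Sec-WebSocket-Version: 13',
  ];
  // Header lines end with CRLF; an empty line terminates the request.
  return lines.join('\r\n') + '\r\n\r\n';
}

console.log(buildHandshakeRequest('example.com', '/chat'));
```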
In addition to the required headers, the request may also contain optional ones. See
the Opening handshake headers section later in this chapter for more information on
headers.
[18] RFC 8441: Bootstrapping WebSockets with HTTP/2
Server response
The server must return an HTTP 101 Switching Protocols response code for the
WebSocket connection to be successfully established:
The response must contain several headers: Connection, Upgrade, and
Sec-WebSocket-Accept. Other optional headers may be included, such as
Sec-WebSocket-Extensions or Sec-WebSocket-Protocol (provided they were passed in the
client request). See the Opening handshake headers section in this chapter for
additional details.
If the status code returned by the server is anything but HTTP 101 Switching
Protocols, the handshake will fail, and the WebSocket connection will not be
established.
Host (required): The host name and optionally the port number of the server to which
the request is being sent. If no port number is included, a default value is implied
(80 for ws, or 443 for wss).
Sec-WebSocket-Version (required): The only accepted value is 13. Any other version
passed in this header is invalid.
Some common, optional headers like User-Agent, Referer, or Cookie may also be used in
the opening handshake. However, we have omitted them from the table above, as they
don’t directly pertain to WebSockets.
First, we have Sec-WebSocket-Key, which is passed by the client to the server, and contains
a 16-byte, base64-encoded one-time random value (nonce). Its purpose is to help ensure
that the server does not accept connections from non-WebSocket clients (e.g., HTTP
clients) that are being abused (or misconfigured) to send data to unsuspecting WebSocket
servers. Here’s an example of Sec-WebSocket-Key:
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
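The Sec-WebSocket-Accept value shown above is what the server returns for that key. Per RFC 6455, it is obtained by appending a fixed GUID to the client's key, hashing the result with SHA-1, and base64-encoding the digest. A minimal Node.js sketch (the function name is illustrative):

```javascript
const { createHash } = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

function computeAcceptValue(secWebSocketKey) {
  return createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

console.log(computeAcceptValue('dGhlIHNhbXBsZSBub25jZQ=='));
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client performs the same computation and fails the handshake if the server's value doesn't match.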
The WebSocket frame has a binary syntax and contains several pieces of information, as
shown in the following figure:
We will now take a more detailed look at all these constituent parts of a WebSocket frame.
Per RFC 6455 [19], another use case for fragmentation is represented by multiplexing, where
“[...] it is not desirable for a large message on one logical channel to monopolize the
output channel, so the multiplexing needs to be free to split the message into smaller
fragments to better share the output channel.”
All data frames that comprise a WebSocket message must be of the same type
(text or binary); you can’t have a fragmented message that consists of both text
and binary frames. However, a fragmented WebSocket message may include
control frames. See the Opcodes section later in this chapter for more details
about frame types.
Let’s now look at some quick examples to illustrate fragmentation. Here’s what a single-
frame message might look like:
In comparison, with fragmentation, the same message would look like this:
The WebSocket protocol makes fragmentation possible via the first bit of the WebSocket
frame — the FIN bit, which indicates whether the frame is the final fragment in a
message. If it is, the FIN bit must be set to 1. Any other frame must have the FIN bit clear.
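The FIN bit and opcode pattern can be illustrated with a small sketch. Real frames are binary; plain objects are used here purely to show which values each fragment carries, and the fragmentTextMessage helper is hypothetical. Only the first fragment carries the text opcode (0x1); later fragments use the continuation opcode (0x0).

```javascript
// Split a text message into frame descriptors of at most maxChunk characters.
function fragmentTextMessage(text, maxChunk) {
  if (!Number.isInteger(maxChunk) || maxChunk < 1) {
    throw new RangeError('maxChunk must be a positive integer');
  }
  const frames = [];
  let offset = 0;
  do {
    const end = offset + maxChunk;
    frames.push({
      fin: end >= text.length ? 1 : 0,  // 1 only on the final fragment
      opcode: offset === 0 ? 0x1 : 0x0, // text first, then continuation
      payload: text.slice(offset, end),
    });
    offset = end;
  } while (offset < text.length);
  return frames;
}

console.log(fragmentTextMessage('Hello', 5));       // a single frame with FIN=1
console.log(fragmentTextMessage('Hello World', 4)); // three fragments
```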
[19] RFC 6455: The WebSocket Protocol
Opcodes
Every frame has an opcode that determines how to interpret that frame’s payload data.
The standard opcodes currently in use are defined by RFC 6455 and maintained by
IANA [20].
OPCODE DESCRIPTION
8 (Close), 9 (Ping), and 10 (Pong) are known as control frames, and they are used
to communicate state about the WebSocket connection.
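For reference, the opcode values defined by RFC 6455 can be captured in a small lookup, as sketched below. The constant and function names are illustrative.

```javascript
// Opcode values defined by RFC 6455 (remaining values are reserved).
const OPCODES = {
  CONTINUATION: 0x0,
  TEXT: 0x1,
  BINARY: 0x2,
  CLOSE: 0x8,
  PING: 0x9,
  PONG: 0xa,
};

// Control frames are the opcodes whose most significant bit is set (0x8 to 0xF).
function isControlFrame(opcode) {
  return (opcode & 0x8) !== 0;
}

// In the first header byte, FIN is the top bit and the opcode is the low 4 bits.
function parseFirstByte(byte) {
  return { fin: (byte & 0x80) !== 0, opcode: byte & 0x0f };
}

console.log(parseFirstByte(0x81)); // { fin: true, opcode: 1 }
```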
[20] IANA WebSocket Opcode Registry
A masking bit set to 1 indicates that the respective frame is masked (and
therefore contains a masking-key). The server will close the WebSocket
connection if it receives an unmasked frame.
On the server-side, frames received from the client must be unmasked before further
processing. Here’s an example of how you can do that:
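A minimal sketch of unmasking, assuming the payload and the 4-byte masking key have already been extracted from the frame as byte arrays. The sample bytes are the masked "Hello" example from RFC 6455.

```javascript
// XOR every payload byte with the masking key byte at position i mod 4.
function unmask(maskedPayload, maskingKey) {
  const result = new Uint8Array(maskedPayload.length);
  for (let i = 0; i < maskedPayload.length; i++) {
    result[i] = maskedPayload[i] ^ maskingKey[i % 4];
  }
  return result;
}

const maskingKey = Uint8Array.of(0x37, 0xfa, 0x21, 0x3d);
const masked = Uint8Array.of(0x7f, 0x9f, 0x4d, 0x51, 0x58);
console.log(String.fromCharCode(...unmask(masked, maskingKey))); // Hello
```

Because XOR is its own inverse, the same function can also be used by a client to mask a payload.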
Payload length
The WebSocket protocol encodes the length of the payload data using a variable number
of bytes:
• For payloads <126 bytes, the length fits in the 7-bit length field of the second
header byte, so no extra bytes are needed.
• If the 7-bit length field is set to 126, two extra header bytes carry the actual
length (payloads of 126 to 65,535 bytes).
• If the 7-bit length field is set to 127, eight additional header bytes carry the
actual length (payloads larger than 65,535 bytes).
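The length encoding can be sketched as follows. In the second header byte, the 7-bit length field either holds the length itself (0 to 125) or one of two marker values: 126 means a 16-bit length follows, and 127 means a 64-bit length follows. The sketch assumes an unmasked frame (mask bit 0), and the function name is illustrative.

```javascript
// Return the header bytes that encode the payload length,
// starting with the second byte of the frame header.
function encodePayloadLength(length) {
  if (length < 126) {
    return Uint8Array.of(length);
  }
  if (length <= 0xffff) {
    const header = new Uint8Array(3);
    header[0] = 126;
    new DataView(header.buffer).setUint16(1, length); // big-endian by default
    return header;
  }
  const header = new Uint8Array(9);
  header[0] = 127;
  new DataView(header.buffer).setBigUint64(1, BigInt(length));
  return header;
}

console.log(encodePayloadLength(125));   // 1 byte
console.log(encodePayloadLength(126));   // 3 bytes: 126, then a 16-bit length
console.log(encodePayloadLength(65536)); // 9 bytes: 127, then a 64-bit length
```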
Each frame’s payload type is indicated via a 4-bit opcode (1 for text or 2 for binary).
Closing handshake
Compared to the opening handshake, the closing handshake is a much simpler process.
You initiate it by sending a close frame with an opcode of 8. In addition to the opcode,
the close frame may contain a body that indicates the reason for closing. This body
consists of a status code (integer) and a UTF-8 encoded string (the reason).
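A short Node.js sketch of how the close frame body is laid out: a 2-byte unsigned status code in network byte order, followed by the UTF-8 reason. The helper names are illustrative.

```javascript
// Build the body of a close frame: 2-byte status code + UTF-8 reason.
function buildClosePayload(code, reason = '') {
  const reasonBytes = Buffer.from(reason, 'utf8');
  const payload = Buffer.alloc(2 + reasonBytes.length);
  payload.writeUInt16BE(code, 0);
  reasonBytes.copy(payload, 2);
  return payload;
}

// Read the status code and reason back out of a close frame body.
function parseClosePayload(payload) {
  if (payload.length < 2) return { code: undefined, reason: '' };
  return { code: payload.readUInt16BE(0), reason: payload.toString('utf8', 2) };
}

console.log(parseClosePayload(buildClosePayload(1000, 'done')));
// { code: 1000, reason: 'done' }
```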
The standard status codes that can be used during the closing handshake are defined by
RFC 6455; additional, custom close codes can be registered with IANA [21].
0-999 (N/A): Codes below 1000 are invalid and cannot be used.
1000 (Normal closure): Indicates a normal closure, meaning that the purpose for which
the WebSocket connection was established has been fulfilled.
1001 (Going away): Should be used when closing the connection and there is no
expectation that a follow-up connection will be attempted (e.g., server shutting down,
or browser navigating away from the page).
[21] IANA WebSocket Close Code Number Registry
1005 (No status received): Used by apps and the WebSocket API to indicate that no
status code was received, although one was expected.
1006 (Abnormal closure): Used by apps and the WebSocket API to indicate that a
connection was closed abnormally (e.g., without sending or receiving a close frame).
1009 (Message too big): The endpoint is terminating the connection due to receiving a
data frame that is too large to process.
1013 (Try again later): The server is terminating the connection due to a temporary
condition, e.g., it is overloaded.
1014 (Bad gateway): The server was acting as a gateway or proxy and received an
invalid response from the upstream server. Similar to 502 Bad Gateway HTTP status code.
1015 (TLS handshake): Reserved. Indicates that the connection was closed due to a
failure to perform a TLS handshake (e.g., the server certificate can’t be verified).
Both the client and the server can initiate the closing handshake. Upon receiving a close
frame, an endpoint (client or server) has to send a close frame as a response (echoing the
status code received).
Once a close frame has been sent, no more data frames can pass over the
WebSocket connection.
After an endpoint has both sent and received a close frame, the closing handshake is
complete, and the WebSocket connection is considered closed.
Subprotocols are negotiated during the opening handshake. The client uses the Sec-
WebSocket-Protocol header to pass along one or more comma-separated subprotocols,
as shown in this example:
Provided it understands the subprotocols passed in the client request, the server must pick
one (and only one) and return it alongside the Sec-WebSocket-Protocol header. From this
point onwards, the client and server can communicate over the negotiated subprotocol.
If the server doesn’t agree with any of the subprotocols suggested, the Sec-
WebSocket-Protocol header won’t be included in the response.
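The server-side selection step can be sketched as below. The selectSubprotocol helper and the subprotocol names are placeholders for illustration; the only behavior taken from the text is that the server picks a single subprotocol from the client's comma-separated list, or none at all.

```javascript
// Pick the first client-offered subprotocol that the server supports.
// Returns null when there is no match, in which case the server would omit
// the Sec-WebSocket-Protocol header from its response.
function selectSubprotocol(headerValue, supported) {
  const offered = headerValue
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean);
  return offered.find((name) => supported.includes(name)) || null;
}

console.log(selectSubprotocol('chat.v2, chat.v1', ['chat.v1'])); // chat.v1
console.log(selectSubprotocol('chat.v2', ['chat.v1']));          // null
```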
[22] IANA WebSocket Subprotocol Name Registry
[23] Kayla Matthews, MQTT: A Conceptual Deep-Dive
[24] The Simple Text Oriented Messaging Protocol (STOMP)
At the time of writing, there are only a couple of extensions registered with IANA [25],
such as permessage-deflate, which compresses the payload data portion of WebSocket
frames. If you’re interested in developing your own extension, you can use an open-source
framework like websocket-extensions [26].
Extensions are negotiated during the opening handshake. The client uses the
Sec-WebSocket-Extensions header to pass along the extensions it wishes to use, as shown
in this example:
Provided it supports the extensions sent in the client request, the server must include
them in the response, alongside the Sec-WebSocket-Extensions header. From this point
onwards, the client and server can communicate over WebSockets using the extensions
they’ve negotiated.
Security
In this section, we will cover some of the mechanisms you can use to secure WebSocket
connections, and communication done over the WebSocket protocol. This is by no means
an exhaustive section; it only aims to provide a high-level overview of several security-
related considerations. More about the complex topic of WebSockets security will be
treated in future versions of this book.
Let’s start with the Origin header, which is sent by all browser clients (optional for non-
browsers) to the server during the opening handshake. The Origin header is essential
for securing cross-domain communication. Specifically, if the Origin indicated is
unacceptable, the server can fail the handshake (usually by returning an HTTP 403
Forbidden status code). This ability can be extremely helpful in mitigating denial of service
(DoS) attacks.
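A minimal sketch of such an Origin check, assuming the server keeps an allowlist of trusted origins. The origin values and the function name are made up for illustration; a server would typically respond with 403 Forbidden when the check fails.

```javascript
// Origins the server is willing to accept connections from (placeholder value).
const ALLOWED_ORIGINS = new Set(['https://app.example.com']);

// Returns true only when the Origin header is present and on the allowlist.
function isOriginAllowed(originHeader) {
  return typeof originHeader === 'string' && ALLOWED_ORIGINS.has(originHeader);
}

console.log(isOriginAllowed('https://app.example.com')); // true
console.log(isOriginAllowed('https://evil.example'));    // false
console.log(isOriginAllowed(undefined));                 // false
```

Rejecting a missing Origin is a design choice in this sketch; a server that also serves non-browser clients may want to treat that case differently.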
[25] IANA WebSocket Extension Name Registry
[26] The websocket-extensions framework
The WebSocket protocol doesn’t prescribe any particular way that servers can
authenticate clients. For example, you can handle authentication during the
opening handshake, by using cookie headers. Another option is to manage
authentication (and authorization) at the application level, by using techniques
such as JSON Web Tokens [27].
So far, we’ve covered security mechanisms that are used during connection establishment.
Now, let’s look at some aspects that impact security during data exchange between
the client and the server. First of all, to reduce the chance of man-in-the-middle
and eavesdropping attacks (especially when exchanging critical, sensitive data), it’s
recommended to use the wss URI scheme — which uses TLS to encrypt the connection, just
like https.
We’ve talked about message frames earlier in this chapter, and mentioned that frames
sent by the client to the server need to be masked with the help of a random masking-
key (32-bit value). This key is contained within the frame, and it’s used to obfuscate the
payload data. Frames need to be unmasked by the server before further processing.
Masking makes WebSocket traffic look different from HTTP traffic, which is especially useful
when proxy servers are involved. That’s because some proxy servers may not “understand”
the WebSocket protocol, and, were it not for the mask, they might mistake it for regular
HTTP traffic; this could lead to all sorts of problems, such as cache poisoning.
[27] RFC 7519: JSON Web Token (JWT)
Overview
Defined in the HTML Living Standard [28], the WebSocket API is a technology that makes
it possible to open a persistent two-way, full-duplex communication channel between
a web client and a web server. The WebSocket interface enables you to send messages
asynchronously to a server and receive event-driven responses without having to poll for
updates.
Almost all modern browsers support the WebSocket API [29]. Additionally, there are plenty
of frameworks and libraries — both open-source and commercial solutions — that
implement WebSocket APIs. See the Resources section in this book for more details.
For the rest of this chapter, we will cover the core capabilities of the WebSocket API. As you
will see, it’s intuitive, designed with simplicity in mind, and trivial to use.
[28] Web sockets, HTML Living Standard
[29] Can I use WebSockets?
See Chapter 4: Building a Web App with WebSockets to learn how to create your own
WebSocket server in Node.js.
The WebSocket constructor contains a required parameter — the url to the WebSocket
server. Additionally, the optional protocols parameter may also be included, to indicate
one or more WebSocket subprotocols (application-level protocols) that can be used
during the client-server communication:
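A small sketch of the constructor call. The URL and subprotocol names are placeholders, and the WebSocketImpl parameter is an addition made here so that the snippet can also run outside a browser; in a browser you would simply call new WebSocket(url, protocols).

```javascript
// Create a WebSocket connection to `url`, optionally offering subprotocols.
// WebSocketImpl defaults to the global WebSocket provided by browsers.
function connect(url, protocols = [], WebSocketImpl = globalThis.WebSocket) {
  return new WebSocketImpl(url, protocols);
}

// In a browser:
// const socket = connect('wss://example.org', ['protocol1', 'protocol2']);
```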
Once the WebSocket object is created and the connection is established, the client can
start exchanging data with the server.
[30] Berkeley sockets
Open
The open event is raised when a WebSocket connection is established. It indicates that the
opening handshake between the client and the server was successful, and the WebSocket
connection can now be used to send and receive data. Here’s a usage example:
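// Connection opened
socket.onopen = function(e) {
  console.log('Connection open!');
};
Message
The message event is fired when data is received through the WebSocket connection.
Messages can carry text or binary data, so check the type before processing them:
// Binary data arrives as an ArrayBuffer only when socket.binaryType is 'arraybuffer'
socket.onmessage = function(msg) {
  if (msg.data instanceof ArrayBuffer) {
    processArrayBuffer(msg.data);
  } else {
    processText(msg.data);
  }
};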
Error
The error event is fired in response to unexpected failures or issues (for example, some
data couldn’t be sent). Here’s how you listen for error events:
socket.onerror = function(e) {
  console.log('WebSocket failure', e);
  handleErrors(e);
};
Close
The close event is fired when the WebSocket connection closes. Here’s how you listen for it:
socket.onclose = function(e) {
  console.log('Connection closed', e);
};
You can trigger the close event manually by calling the close() method.
Methods
The WebSocket API supports two methods: send() and close().
send()
Once the connection has been established, you’re almost ready to start sending and
receiving messages to and from the WebSocket server. But before doing that, you first
have to ensure that the connection is open and ready to receive messages. You can
achieve this in two main ways.
The first option is to trigger the send() method from within the onopen event handler, as
demonstrated in the following example:
socket.onopen = function(e) {
  socket.send(JSON.stringify({'msg': 'payload'}));
};
The second option is to check the readyState property and send data only when the
connection is open:
function processEvent(e) {
  if (socket.readyState === WebSocket.OPEN) {
    // Socket open, send!
    socket.send(e);
  } else {
    // Show an error, queue it for sending later, etc
  }
}
The two code snippets above show how to send text (string) messages. However, in
addition to strings, you can also send binary data (Blob or ArrayBuffer), as shown in this
example:
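A minimal sketch of sending binary data, assuming socket is an open WebSocket. The sendPosition helper and the idea of packing two 16-bit coordinates are illustrative choices, not something prescribed by the API.

```javascript
// Pack two 16-bit unsigned integers into a 4-byte ArrayBuffer and send it.
function sendPosition(socket, x, y) {
  const buffer = new ArrayBuffer(4);
  const view = new DataView(buffer);
  view.setUint16(0, x); // bytes 0-1, big-endian by default
  view.setUint16(2, y); // bytes 2-3
  socket.send(buffer);
}

// Usage with an open connection:
// sendPosition(socket, 120, 300);
```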
After sending one or more messages, you can leave the WebSocket connection open for
further data exchanges, or call the close() method to terminate it.
close()
The close() method is used to close the WebSocket connection (or connection attempt).
It’s essentially the equivalent of the closing handshake we covered previously, in Chapter
2. After this method is called, no more data can be sent or received over the WebSocket
connection.
If the connection is already closed, calling the close() method does nothing.
socket.close();
Here’s an example of calling the close() method with the two optional parameters:
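The call itself is socket.close(code, reason). The sketch below wraps it in a hypothetical helper that checks the two constraints the WebSocket API places on these parameters: the code must be 1000 or in the 3000-4999 range, and the reason must not exceed 123 bytes when encoded as UTF-8.

```javascript
// Close the connection with a status code and a human-readable reason.
function closeWithReason(socket, code, reason) {
  const validCode = code === 1000 || (code >= 3000 && code <= 4999);
  const reasonBytes = new TextEncoder().encode(reason).length;
  if (!validCode || reasonBytes > 123) {
    throw new RangeError('Invalid close code, or reason longer than 123 bytes');
  }
  socket.close(code, reason);
}

// Usage with an open connection:
// closeWithReason(socket, 1000, 'Work complete');
```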
Properties
The WebSocket object exposes several properties containing details about the WebSocket
connection.
binaryType
The binaryType property controls the type of binary data being received over the
WebSocket connection. The default value is blob; additionally, WebSockets also support
arraybuffer.
bufferedAmount
Read-only property that returns the number of bytes of data queued for transmission but
not yet sent. The value of bufferedAmount resets to zero once all queued data has been
sent.
bufferedAmount is particularly useful when the client application transports large
amounts of data to the server. Even though calling send() is instant, actually transmitting
that data over the Internet is not. Browsers will buffer outgoing data on behalf of your
client application. The bufferedAmount property is useful for ensuring that all data is sent
before closing a connection, or performing your own throttling on the client-side.
Below is an example of how to use bufferedAmount to send updates every second, and
adjust accordingly if the network cannot handle the rate:
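One way to sketch this, assuming socket is an open WebSocket: send an update only when the previous data has been fully flushed, and skip the tick otherwise. The sendIfDrained and getUpdateData names are placeholders.

```javascript
// Send the update only if everything queued earlier has left the buffer.
function sendIfDrained(socket, data) {
  if (socket.bufferedAmount === 0) {
    socket.send(data);
    return true;
  }
  return false; // the network is lagging behind, so skip this update
}

// Usage in a browser, once per second:
// setInterval(() => sendIfDrained(socket, getUpdateData()), 1000);
```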
extensions
Read-only property that returns the names of the WebSocket extensions that were
negotiated between client and server during the opening handshake.
“onevent” properties
These properties are called to run associated handler code whenever a WebSocket event
is fired. There are four types of “onevent” properties, one for each type of event:
PROPERTY DESCRIPTION
onerror Gets called when an error event occurs, impacting the WebSocket connection.
onclose Called with a close event when the WebSocket’s connection readyState
property changes to 3; this indicates that the connection is closed.
Subprotocols are specified via the protocols parameter when creating the
WebSocket object (see The WebSocket constructor section earlier in this chapter
for details). If no protocol is specified during connection establishment, the
protocol property will return an empty string.
readyState
Read-only property that returns the current state of the WebSocket connection. The table
below shows the values you can see reflected by this property, and their meaning:
0 CONNECTING Socket has been created, but the connection is not yet
open.
The value of readyState will change over time. It’s recommended to check
it periodically to understand the lifespan and life cycle of the WebSocket
connection.
url
Read-only property that returns the absolute URL of the WebSocket, as resolved by the
constructor.
Using WebSockets in the frontend is fairly straightforward, via the WebSocket API built
into all modern browsers (we’ll use this API on the client-side in the first part of the demo,
alongside ws on the server-side). Additionally, there are plenty of libraries and solutions
implementing the WebSocket technology on both the client-side and the server-side. This
includes SockJS, which we will cover in the second part of the demo.
For more details about WebSocket client and server implementations, see the Resources
section.
31
Jo Franchetti, WebSockets and Node.js — testing WS and SockJS by building a web app
In order to demonstrate how to set up WebSockets with Node.js and ws, we will build
a demo app that shares users’ cursor positions in realtime. We walk through building it
below.
For brevity’s sake, we call it wss in our code. Any resemblance to secure
WebSockets (often referred to as wss) is a coincidence.
Next, create a Map to store a client’s metadata (any data we wish to associate with a
WebSocket client):
32
ws: a Node.js WebSocket library
clients.set(ws, metadata);
Every time a client connects, we generate a new unique ID, which is used to identify them.
Clients are also assigned a cursor color: we use Math.random() to generate a number
between 0 and 360, which corresponds to the hue value of an HSL color. The ID and
cursor color are then added to an object that we’ll call metadata, and we’re using the Map
to associate them with our ws WebSocket instance.
The Map is a dictionary — we can retrieve this metadata by calling get and providing a
WebSocket instance later on.
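The idea of keying a Map by the socket object itself can be sketched as follows. This is an illustration under assumptions: registerClient and the counter-based ID are hypothetical stand-ins, and any object can play the role of the socket.

```javascript
// Map from socket object to metadata; objects work as Map keys by identity.
const clients = new Map();
let nextId = 0; // hypothetical stand-in for the unique ID generator

function registerClient(socket) {
  const metadata = {
    id: `client-${nextId++}`,
    color: Math.floor(Math.random() * 360), // hue in degrees
  };
  clients.set(socket, metadata);
  return metadata;
}
```

Because the key is the socket object, no separate lookup table from connection to ID is needed: clients.get(socket) returns the metadata directly.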
Using the newly connected WebSocket instance, we subscribe to that instance’s message
event, and provide a callback function that will be triggered whenever this specific client
sends a message to the server.
This event is on the WebSocket instance (ws) itself, and not on the
WebSocket.Server instance (wss).
Whenever our server receives a message, we use JSON.parse to get the message contents,
and load our client metadata for this socket from our Map using clients.get(ws).
We’re going to add our two metadata properties to the message as sender and color:
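message.sender = metadata.id;
message.color = metadata.color;
// Serialize the enriched message once, then send it to every connected client.
const outbound = JSON.stringify(message);
[...clients.keys()].forEach((client) => {
  client.send(outbound);
});
});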
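The message handling described above can also be sketched as a standalone function. relayMessage is a hypothetical name, and the sockets are assumed to expose only a send method; the demo itself performs these steps inline in the message callback.

```javascript
// Hypothetical helper mirroring the steps above: parse the incoming text,
// attach the sender's metadata, and send the result to every client.
function relayMessage(clients, senderSocket, rawText) {
  const message = JSON.parse(rawText);
  const metadata = clients.get(senderSocket);
  message.sender = metadata.id;
  message.color = metadata.color;
  const outbound = JSON.stringify(message);
  for (const client of clients.keys()) {
    client.send(outbound);
  }
  return outbound;
}
```

Note that the sender receives its own message back as well, since it is one of the keys in the Map.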
Finally, when a client closes its connection, we remove its metadata from our Map:
ws.on("close", () => {
clients.delete(ws);
});
});
function uuidv4() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
    var r = Math.random() * 16 | 0, v = c == 'x' ? r : (r & 0x3 | 0x8);
    return v.toString(16);
  });
}
console.log('wss up');
This server implementation broadcasts, sending any message it has received to all
connected clients.
We now need to write some client-side code to connect to the WebSocket server, and
transmit the user’s cursor position as it moves.
<!DOCTYPE html>
<html lang='en'>
<head>
<meta charset='UTF-8'>
<meta http-equiv='X-UA-Compatible' content='IE=edge'>
<meta name='viewport' content='width=device-width, initial-scale=1.0'>
<title>Document</title>
The body contains a single HTML template which contains an SVG image of a pointer.
We’re going to use JavaScript to clone this template whenever a new user connects to our
server.
<body id='box'>
<template id='cursor'>
<svg viewBox='0 0 16.3 24.7' class='cursor'>
<path stroke='#000' stroke-linecap='round' stroke-linejoin='round'
stroke-miterlimit='10' d='M15.6 15.6L.6.6v20.5l4.6-4.5 3.2 7.5 3.4-1.3-3-7.2z' />
</svg>
</template>
</body>
</html>
(async function() {
const ws = await connectToServer();
...
We stringify this object, and send it via our now connected ws WebSocket instance as
the message text:
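A minimal sketch of this step follows, assuming the object holds the pointer's x and y coordinates taken from the mouse event. buildCursorMessage is a hypothetical helper name; clientX and clientY are standard MouseEvent properties.

```javascript
// Hypothetical helper: turn a mouse event into the text sent over the socket.
function buildCursorMessage(mouseEvent) {
  const messageBody = { x: mouseEvent.clientX, y: mouseEvent.clientY };
  return JSON.stringify(messageBody);
}

// In the browser this could be wired up as (ws is the connected socket):
// document.body.onmousemove = (evt) => ws.send(buildCursorMessage(evt));
```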
You might notice that the syntax here differs slightly from the server-side WebSocket code.
That’s because we’re using the browser’s native WebSocket class, rather than the ws library.
When we receive a message over the WebSocket, we parse the data property of the
message, which contains the stringified data that the onmousemove handler sent to the
WebSocket server, along with the additional sender and color properties that the server-
side code adds to the message.
We then use the x and y values from the messageBody to adjust the cursor position using a
CSS transform.
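The receiving side can be sketched in the same spirit. cursorTransformFor is a hypothetical helper, and the use of translate with pixel units is an assumption about how the transform is expressed.

```javascript
// Hypothetical helper: parse a received message and compute the CSS transform
// that moves a cursor element to the sender's position.
function cursorTransformFor(event) {
  const messageBody = JSON.parse(event.data);
  return {
    sender: messageBody.sender,
    transform: `translate(${messageBody.x}px, ${messageBody.y}px)`,
  };
}

// In the browser, the result would be applied to the sender's cursor element:
// cursorElement.style.transform = cursorTransformFor(event).transform;
```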
Our code relies on two utility functions. The first is connectToServer, which opens a
connection to our WebSocket server, and then returns a Promise that resolves when the
WebSocket readyState property is 1 (OPEN).
This means that we can just await this function, and we’ll know that we have a connected
and working WebSocket connection.
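One way to build such a Promise is to poll readyState on a timer, sketched here under assumptions. waitForOpen is a hypothetical helper that takes an existing socket, and the timeout is an addition that the text does not describe; listening for the open event would be an alternative to polling.

```javascript
// Hypothetical helper: resolve with the socket once readyState is 1 (OPEN),
// checking every `intervalMs`; reject if `timeoutMs` passes first.
function waitForOpen(socket, intervalMs = 10, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const startedAt = Date.now();
    const timer = setInterval(() => {
      if (socket.readyState === 1) {
        clearInterval(timer);
        resolve(socket);
      } else if (Date.now() - startedAt >= timeoutMs) {
        clearInterval(timer);
        reject(new Error('WebSocket did not open in time'));
      }
    }, intervalMs);
  });
}

// In the browser (the URL is an assumption based on the port and prefix used later):
// const ws = await waitForOpen(new WebSocket('ws://localhost:7071/ws'));
```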
function getOrCreateCursorFor(messageBody) {
const sender = messageBody.sender;
const existing = document.querySelector(`[data-sender='${sender}']`);
if (existing) {
return existing;
}
If we can’t find an existing element, we clone our HTML template, add the data attribute
with the current sender ID to it, and append it to the document.body before returning it:
cursor.setAttribute('data-sender', sender);
svgPath.setAttribute('fill', `hsl(${messageBody.color}, 50%, 50%)`);
document.body.appendChild(cursor);
return cursor;
}
}) ();
Now when you run the web application, each user viewing the page will have a cursor
that appears on everyone’s screens because we are sending the data to all the clients
using WebSockets.
If not, you can clone a working version of the demo from:
https://github.com/ably-labs/websockets-cursor-sharing.
This demo includes two applications: a web app that we serve through Snowpack33, and a
Node.js web server. The NPM start task spins up both the API and the web server.
33
Snowpack
Figure 4.2: Error message returned by the browser when a WebSocket connection can’t be established
This is because the ws library offers no fallback transfer protocols if WebSockets are
unavailable. If this is a requirement for your project, or you want to have a higher level of
reliability of delivery for your messages, then you will need a library that offers multiple
transfer protocols, such as SockJS.
Using SockJS in the client is similar to the native WebSocket API, with a few small
differences. We can swap out ws in the demo built previously and use SockJS instead to
include fallback support.
<script src='https://cdn.jsdelivr.net/npm/sockjs-client@1/dist/sockjs.min.js' defer></script>
Note the defer keyword — it ensures that the SockJS library is loaded before index.js
runs.
In the app/script.js file, we then update the JavaScript to use SockJS. Instead of the
WebSocket object, we’ll now use a SockJS object. Inside the connectToServer function, we’ll
establish the connection with the SockJS server:
SockJS requires a prefix path on the server URL. The rest of the app/script.js
file requires no change.
34
SockJS-client
35
SockJS-node
Then we need to require the sockjs module and the built-in HTTP module from Node.
Delete the line that requires ws and replace it with the following:
At the very bottom of the API/index.js file we’ll create the HTTP server and add the
SockJS HTTP handlers:
We map the handlers to a prefix supplied in a configuration object ('/ws'). We tell the
HTTP server to listen on port 7071 (arbitrarily chosen) on all the network interfaces on the
machine.
The final job is to update the event names to work with SockJS:
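The renaming can be sketched as follows, assuming the connection interface documented by sockjs-node: a data event for incoming messages, a close event, and a write method for sending. handleSockJsConnection is a hypothetical helper, and the metadata is passed in rather than generated.

```javascript
// Hypothetical helper showing the renamed events. With ws the handlers were
// ws.on('message', ...) and client.send(...); a sockjs-node connection
// emits 'data' and is written to with write().
function handleSockJsConnection(clients, conn, metadata) {
  clients.set(conn, metadata);

  conn.on('data', (rawText) => {
    const message = JSON.parse(rawText);
    message.sender = metadata.id;
    message.color = metadata.color;
    const outbound = JSON.stringify(message);
    for (const client of clients.keys()) {
      client.write(outbound);
    }
  });

  conn.on('close', () => {
    clients.delete(conn);
  });
}
```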
And that’s it, the demo will now run with WebSockets where they are supported; and
where they aren’t, it will use Comet long polling. This latter fallback option will show a
slightly less smooth cursor movement, but it is more functional than no connection at all!
If not, you can clone a working version of the demo from:
https://github.com/ably-labs/websockets-cursor-sharing/tree/sockjs.
This demo includes two applications: a web app that we serve through Snowpack36, and a
Node.js web server. The NPM start task spins up both the API and the web server.
Figure 4.3: Realtime cursor movement powered by the SockJS WebSockets library
36
Snowpack
The number of active users you can support is thus directly related to how much hardware
your server has. Node.js is pretty good at managing concurrency, but once you reach
a few hundred to a few thousand users, you’re going to need to scale your hardware
vertically to keep all the users in sync.
Scaling vertically is often an expensive proposition, and you’ll always be faced with
a performance ceiling of the most powerful piece of hardware you can procure.
Additionally, vertical scaling is not elastic, so you have to do it ahead of time.
Once you’ve run out of vertical scaling options, you’ll have to consider horizontal scaling
— which is better in the long run, but also significantly more difficult.
There are multiple ways to solve this: either by using some form of direct connection
between the cluster nodes that are handling the traffic, or by using an external pub/sub38
mechanism. This is sometimes called “adding a backplane” to your infrastructure, and is
yet another moving part that makes scaling WebSockets difficult.
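To make the backplane idea concrete, here is a toy sketch in which an in-memory object stands in for the external pub/sub service. Both class names are hypothetical, and a real deployment would replace Backplane with a client for a networked service.

```javascript
// In-memory stand-in for an external pub/sub service (hypothetical class).
class Backplane {
  constructor() {
    this.subscribers = new Set();
  }
  subscribe(handler) {
    this.subscribers.add(handler);
  }
  publish(message) {
    for (const handler of this.subscribers) {
      handler(message);
    }
  }
}

// One WebSocket server process in the cluster (hypothetical class).
class ServerNode {
  constructor(backplane) {
    this.localClients = new Set();
    this.backplane = backplane;
    // Every node delivers whatever arrives on the backplane to its own clients.
    backplane.subscribe((message) => {
      for (const client of this.localClients) {
        client.send(message);
      }
    });
  }
  addClient(client) {
    this.localClients.add(client);
  }
  // A message from a local client goes to the backplane, not straight to clients.
  onClientMessage(message) {
    this.backplane.publish(message);
  }
}
```

Each node only ever writes to its own connections; the backplane is what lets a message that arrived on one node reach clients connected to another.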
We’ll dive deeper into the many challenges of scaling WebSockets in future versions of this
book.
37
Redis
38
Everything You Need To Know About Publish/Subscribe
Videos
• A Beginner’s Guide to WebSockets
• The Complete Guide to WebSockets
• WebSockets Crash Course - Handshake, Use-cases, Pros & Cons and more
Further reading
• WebSockets Security: Main Attacks and Risks
• WebSocket Security - Cross-Site Hijacking (CSWSH)
• The Future of Web Software Is HTML-over-WebSockets
• Implementing a WebSocket server with Node.js
• Migrating Millions of Concurrent Websockets to Envoy (Slack Engineering)
• The Periodic Table of Realtime
We treasure any feedback from our readers. If you’ve spotted a mistake, if you have any
suggestions for what we should include in future versions of the ebook, or if you simply
want to chat about WebSockets, reach out to us!
Contact us
Ably provides a suite of APIs to build, extend, and deliver powerful digital experiences
in realtime – primarily over WebSockets – for more than 250 million devices across 80
countries each month. Organizations like Bloomberg, HubSpot, Verizon, and Hopin
depend on Ably’s platform to offload the growing complexity of business-critical realtime
data synchronization at global scale.