Node.js Design Patterns

Coding with Streams

We briefly touched on streams in Chapter 3, Callbacks and Events, and Chapter 5, Asynchronous Control Flow Patterns with Promises and Async/Await, as an option to make some of our code a bit more robust. Now, it’s finally time to dive in! We are here to talk about streams: one of the most important components and patterns of Node.js. There is a motto in the community that goes, “stream all the things!”, and this alone should be enough to describe the role of streams in Node.js. Dominic Tarr, an early contributor to the Node.js community, defined streams as “Node’s best and most misunderstood idea.” There are different reasons that make Node.js streams so attractive: it’s not just their technical properties, such as performance or efficiency, but more their elegance and the way they fit perfectly into the Node.js philosophy. Yet, despite their potential, streams remain underutilized in the broader developer community. Many find them intimidating and choose to avoid them altogether. This chapter is here to change that. We’ll explore streams in depth, highlight their advantages, and present them in a clear and approachable way, making their power accessible to all developers.

But before we dive in, let’s take a short break for an author’s note (Luciano here). Streams are one of my favourite topics in Node.js, and I can’t help but share a story from my career where streams truly saved the day.

I was working for a network security company on a team developing a cloud application. The application’s purpose was to collect network metadata from physical devices monitoring traffic in corporate environments. Imagine recording all the connections between hosts in the network, which protocols they’re using, and how much data they’re transferring. This data could help spot the movement of an attacker in the network or uncover attempts at data exfiltration. The idea was simple yet powerful: in the event of a security incident, our customers could log into our platform, browse through the recorded metadata, and figure out exactly what happened, enabling them to take action quickly.

As you might imagine, this required continuously streaming a significant amount of data from devices at customer sites to our cloud-based web server. In the spirit of keeping things simple and shipping fast, our initial implementation of the data collector (the HTTP server receiving and storing metadata) used a buffered approach.

Devices would send network metadata in frames every minute, each containing all the observations from the previous 60 seconds.

Here’s how it worked: we’d load the entire frame into memory as it arrived, and only after receiving the complete frame would we write it to persistent storage. This worked well in the beginning because we were only serving small customers who generated relatively modest amounts of metadata, even during peak traffic.

But when we rolled out the solution to a larger customer, things started to break down. We noticed occasional failures in the collector and, worse, gaps in the stored data. After digging into the issue, we discovered that the collector was crashing due to excessive memory usage. If a customer generated a particularly large frame, the system couldn’t handle it, leading to data loss.

This was a serious problem. Our entire value proposition depended on being able to reliably store and retrieve network metadata for forensic analysis. If customers couldn’t trust us to preserve their data, the platform was effectively useless.

We needed a fix, and fast. The root of the problem was clear: buffering entire frames in memory was a rookie mistake. The solution? Keep the memory footprint low by processing data in smaller chunks and writing them to storage incrementally.

Enter Node.js streams. With streams, we could process data piece by piece as it arrived, rather than waiting for the entire frame. After refactoring our code to use streams, we were able to handle terabytes of data daily without breaking a sweat. The system’s latency improved dramatically: customers could see their data in the cloud in under two minutes. We also cut costs by using smaller machines with less memory, and the new implementation was far more elegant and maintainable, thanks to the composable nature of the Node.js streams API.

While this might sound like a specific use case, the lessons here apply broadly. Any time you’re moving data from A to B, especially when dealing with unpredictable volumes or when early results are valuable, Node.js streams are an invaluable tool.

I promise you that once you learn the fundamentals of streams, you’ll appreciate their power and see many opportunities to leverage them in your applications!

This chapter aims to provide a complete understanding of Node.js streams. The first half of this chapter serves as an introduction to the main ideas, the terminology, and the libraries behind Node.js streams. In the second half, we will cover more advanced topics and, most importantly, we will explore useful streaming patterns that can make your code more elegant and effective in many circumstances.

In this chapter, you will learn about the following topics:

  • Why streams are so important in Node.js
  • Understanding, using, and creating streams
  • Streams as a programming paradigm: leveraging their power in many different contexts and not just for I/O
  • Streaming patterns and connecting streams together in different configurations

Without further ado, let’s discover together why streams are one of the cornerstones of Node.js.

Discovering the importance of streams

In an event-based platform such as Node.js, the most efficient way to handle I/O is in real time, consuming the input as soon as it is available and sending the output as soon as the application produces it.

In this section, we will give you an initial introduction to Node.js streams and their strengths. Please bear in mind that this is only an overview, as a more detailed analysis on how to use and compose streams will follow later in this chapter.

Buffering versus streaming

Almost all the asynchronous APIs that we’ve seen so far in this book work using buffer mode. For an input operation, buffer mode causes all the data coming from a resource to be collected into a buffer until the operation is completed; it is then passed back to the caller as one single blob of data. The following diagram shows a visual example of this paradigm:

Figure 6.1: Buffering

In Figure 6.1, we aim to transfer data containing the string “Hello Node.js” from a resource to a consumer. This process illustrates the concept of buffer mode, where all data is accumulated in a buffer before being consumed. At time t1, the first chunk of data, “Hello N,” is received from the resource and stored in the buffer. At t2, the second chunk, “ode.js,” arrives, completing the read operation. With the entire string now fully accumulated in the buffer, it is sent to the consumer at t3.

Streams provide a different approach, allowing data to be processed incrementally as it arrives from the resource. This is shown in the following diagram:

Figure 6.2: Streaming

This time, Figure 6.2 shows that, as soon as each new chunk of data is received from the resource, it is immediately passed to the consumer, who now has the chance to process it straight away, without waiting for all the data to be collected in the buffer.

But what are the differences between these two approaches? Purely from an efficiency perspective, streams are generally more efficient in terms of space (memory usage) and sometimes even in terms of computation clock time. However, Node.js streams have another important advantage: composability. Let’s now see what impact these properties have on the way we design and write our applications.

Spatial efficiency

First of all, streams allow us to do things that would not be possible by buffering data and processing it all at once. For example, consider the case in which we have to read a very big file, let’s say, in the order of hundreds of megabytes or even gigabytes. Clearly, using an API that returns a big buffer when the file is completely read is not a good idea. Imagine reading a few of these big files concurrently; our application would easily run out of memory. Besides that, buffers in V8 are limited in size. You cannot allocate more than a few gigabytes of data, so we may hit a wall way before running out of physical memory.

The actual maximum size of a buffer changes across platforms and versions of Node.js. If you are curious to find out what the limit in bytes is on a given platform, you can run this code:

import buffer from 'node:buffer'
console.log(buffer.constants.MAX_LENGTH)

Gzipping using a buffered API

To give a concrete example, let’s consider a simple command-line application that compresses a file using the GZIP format. Using a buffered API, such an application will look like the following in Node.js (error handling is omitted for brevity):

// gzip-buffer.js
import { readFile, writeFile } from 'node:fs/promises'
import { gzip } from 'node:zlib'
import { promisify } from 'node:util'
const gzipPromise = promisify(gzip) // note: gzip is a callback-based function
const filename = process.argv[2]
const data = await readFile(filename)
const gzippedData = await gzipPromise(data)
await writeFile(`${filename}.gz`, gzippedData)
console.log('File successfully compressed')

Now, we can try to run it with the following command:

node gzip-buffer.js <path to file>

If we choose a file that is big enough (for instance, 8 GB or more), we will most likely receive an error message saying that the file we are trying to read is bigger than the maximum allowed buffer size:

RangeError [ERR_FS_FILE_TOO_LARGE]: File size is greater than possible Buffer

That’s exactly what we expected, and it’s a symptom of the fact that we are using the wrong approach.

Note that the error happens when we execute readFile(). This is where we are taking the entire content of the file and loading it into a buffer in memory. Node.js will check the file size before starting to load its content. If the file is too big to fit in a buffer, then we will be presented with the ERR_FS_FILE_TOO_LARGE error.
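
If we wanted the program to fail more gracefully, we could catch the error and inspect its code. Here is a minimal sketch (reusing the readFile import and the filename variable from gzip-buffer.js, and assuming the error exposes the ERR_FS_FILE_TOO_LARGE code shown above):

try {
  const data = await readFile(filename)
  // ... gzip and write the data as before ...
} catch (err) {
  if (err.code === 'ERR_FS_FILE_TOO_LARGE') {
    console.error('The file is too big to be buffered in memory')
  } else {
    throw err
  }
}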

Gzipping using streams

The simplest way we have to fix our Gzip application and make it work with big files is to use a streaming API. Let’s see how this can be achieved. Let’s write a new module with the following code:

// gzip-stream.js
import { createReadStream, createWriteStream } from 'node:fs'
import { createGzip } from 'node:zlib'
const filename = process.argv[2]
createReadStream(filename)
  .pipe(createGzip())
  .pipe(createWriteStream(`${filename}.gz`))
  .on('finish', () => console.log('File successfully compressed'))

“Is that it?” you may ask. Yes! As we said, streams are amazing because of their interface and composability, thus allowing clean, elegant, and concise code. We will see this in a while in more detail, but for now, the important thing to realize is that the program will run smoothly against files of any size and with constant memory utilization. Try it yourself (but consider that compressing a big file may take a while).

Note that, in the previous example, we omitted error handling for brevity. We will discuss the nuances of proper error handling with streams later in this chapter. Until then, be aware that most examples will be lacking proper error handling.
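
As a preview of that discussion, here is a minimal sketch (not the book’s canonical example) of how the same compression pipeline could be written with the pipeline() helper from node:stream/promises, which propagates errors from any stage and destroys all the streams on failure:

// gzip-stream-pipeline.js (hypothetical filename)
import { createReadStream, createWriteStream } from 'node:fs'
import { createGzip } from 'node:zlib'
import { pipeline } from 'node:stream/promises'
const filename = process.argv[2]
try {
  await pipeline(
    createReadStream(filename),
    createGzip(),
    createWriteStream(`${filename}.gz`)
  )
  console.log('File successfully compressed')
} catch (err) {
  console.error('Compression failed:', err)
  process.exit(1)
}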

Time efficiency

We could talk about the time efficiency of streams in abstract terms, but it’s probably much easier to understand why streams are so advantageous by seeing them in action. Let’s work on something practical to appreciate how streams save both time and resources in real-world scenarios.

Let’s build a new client-server application! Our goal is to create a client that reads a file from the file system, compresses it, and sends it to a server over HTTP. The server will then receive the file, decompress it, and save it to a local folder. This way, we’re creating our very own homemade file transfer utility!

To achieve this, we have two options: we can use a buffer-based API or leverage streams. If we don’t expect to transfer large files, both approaches will get the job done, but they differ significantly in how the data is processed and transferred.

If we were to use a buffered API for this, the client would first need to load the entire file into memory as a buffer. Once the file is fully loaded, it will compress the data, creating a second buffer containing the compressed version. Only after these steps can the client send the compressed data to the server.

On the server side, a buffered approach would involve accumulating all the incoming data from the HTTP request into a buffer. Once all the data has been received, the server would decompress it into another buffer containing the uncompressed data, which would then be saved to disk.

While this works, a better approach uses streams. With streams, the client can start compressing and sending chunks of data as soon as they are read from the file system. Similarly, the server can decompress each chunk of data as soon as it arrives, eliminating the need to wait for the entire file. As a bonus, we have already seen how streams give us the ability to handle arbitrarily large files.

Let’s dive into how we can build a simple version of this stream-based approach, starting with the server:

// gzip-receive.js
import { createServer } from 'node:http'
import { createWriteStream } from 'node:fs'
import { createGunzip } from 'node:zlib'
import { basename, join } from 'node:path'
const server = createServer((req, res) => {
  const filename = basename(req.headers['x-filename'])
  const destFilename = join(import.meta.dirname, 'received_files',
    filename)
  console.log(`File request received: ${filename}`)
  req
    .pipe(createGunzip())
    .pipe(createWriteStream(destFilename))
    .on('finish', () => {
      res.writeHead(201, { 'content-type': 'text/plain' })
      res.end('OK\n')
      console.log(`File saved: ${destFilename}`)
    })
})
server.listen(3000, () => console.log('Listening on http://localhost:3000'))

In the preceding example, we are setting up an HTTP server that listens for incoming file uploads, decompresses them, and saves them to disk. The key part of this server is the handler function (the one passed to the createServer() function), where two important objects, req (the request) and res (the response), come into play. These objects are both streams:

  • req represents the incoming request from the client to the server. In this case, it carries the compressed file data being sent by the client.
  • res represents the outgoing response from the server back to the client.

The focus here is on req, which acts as the source stream. The code processes req by:

  • Decompressing it using createGunzip().
  • Saving it to disk with createWriteStream() in a directory named received_files (in the same folder as this code example).

The pipe() calls link these steps together, creating a smooth flow of data from the incoming request, through decompression, to the file on disk. Don’t worry too much about the pipe() syntax for now—we’ll cover it in more detail later in the chapter.

When all the data has been written to disk, the finish event is triggered. At this point, the server responds to the client with a status code of 201 (Created) and a simple "OK" message, indicating that the file has been successfully received and saved.

Finally, the server listens for connections on port 3000, and a message is logged to confirm it’s running.

In our server application, we use basename() to remove any path from the name of a received file (e.g., basename("/path/to/file") would give us "file"). This is an important security measure to ensure that files are saved within our received_files folder. Without basename(), a malicious user could create a request that escapes the application’s folder, leading to potentially serious consequences like being able to overwrite system files and inject malicious code. For example, imagine if the provided filename was something like ../../../usr/bin/node. An attacker could eventually guess a relative path to overwrite /usr/bin/node, replacing the Node.js interpreter with any executable file they want. Scary, right? This type of attack is called a path traversal attack (or directory traversal). You can read more about it here: nodejsdp.link/path-traversal.
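
As a quick illustration (the filename here is purely hypothetical), basename() neutralizes such an attempt:

import { basename, join } from 'node:path'
const malicious = '../../../usr/bin/node' // hypothetical attacker-controlled value
console.log(basename(malicious)) // 'node'
console.log(join('received_files', basename(malicious))) // stays inside received_files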

Note that here we are not following the most conventional way to perform file uploads over HTTP. In fact, generally, this feature is implemented using a slightly more advanced and standard protocol that requires encoding the source data using the multipart/form-data specification (nodejsdp.link/multipart). This specification allows you to send one or more files and their respective file names using fields encoded in the body. In our simpler implementation, the body of the request contains no metadata, but only the gzipped bytes of the original file; therefore, we must specify the filename somewhere else. That’s why we provide a custom header called x-filename.

Now that we are done with the server, let’s write the corresponding client code:

// gzip-send.js
import { request } from 'node:http'
import { createGzip } from 'node:zlib'
import { createReadStream } from 'node:fs'
import { basename } from 'node:path'
const filename = process.argv[2]
const serverHost = process.argv[3]
const httpRequestOptions = {
  hostname: serverHost,
  port: 3000,
  path: '/',
  method: 'POST',
  headers: {
    'content-type': 'application/octet-stream',
    'content-encoding': 'gzip',
    'x-filename': basename(filename),
  },
}
const req = request(httpRequestOptions, res => {
  console.log(`Server response: ${res.statusCode}`)
})
createReadStream(filename)
  .pipe(createGzip())
  .pipe(req)
  .on('finish', () => {
    console.log('File successfully sent')
  })

In the preceding code, we implement the client side of our file transfer system. Its goal is to read a file from the local file system, compress it, and send it to the server using an HTTP POST request. Here’s how it works:

The client reads the filename (to be sent) and the server’s hostname (serverHost) from the command-line arguments. These values are then used to configure the httpRequestOptions object, which defines the details of the HTTP request, including:

  • The server hostname and port
  • The request path and method
  • The headers, including information about the file name (x-filename), content type, and the fact that the content is gzip-compressed.

The actual HTTP request (req) is then created using the request() function. This object is a stream that represents an HTTP request going from the client to the server.

The source file is read using createReadStream(), compressed with createGzip(), and then sent to the server by piping the resulting stream into req. This creates a continuous flow of data from the file on disk, through compression, and finally to the server.

When all the data has been sent, the finish event is triggered on the request stream. At this point, a confirmation message (“File successfully sent”) is logged.

Meanwhile, the server’s response is handled in the callback provided to request(). Once the server responds, its status code is logged to the console, allowing the client to confirm that the operation was completed successfully.

Now, to try out the application, let’s first start the server using the following command:

node gzip-receive.js

Then, we can launch the client by specifying the file to send and the address of the server (for example, localhost):

node gzip-send.js <path to file> localhost

If we choose a sufficiently large file, we can observe how the data flows from the client to the server. The target file will appear in the received_files folder before the “File successfully sent” message is displayed on the client. This is because, as the compressed file is being sent over HTTP, the server is already decompressing it and saving it on the disk.

However, we still haven’t addressed why this paradigm, with its continuous data flow, is more efficient than using a buffered API. Figure 6.3 should make this concept easier to grasp:

Figure 6.3: Buffering and streaming compared

When a file is processed, it goes through a number of sequential steps:

  1. [Client] Read from the filesystem
  2. [Client] Compress the data
  3. [Client] Send it to the server
  4. [Server] Receive from the client
  5. [Server] Decompress the data
  6. [Server] Write the data to disk

To complete the processing, we have to go through each stage like in an assembly line, in sequence, until the end. In Figure 6.3, we can see that, using a buffered API, the process is entirely sequential. To compress the data, we first must wait for the entire file to be read, then, to send the data, we have to wait for the entire file to be both read and compressed, and so on.

Using streams, the assembly line is kicked off as soon as we receive the first chunk of data, without waiting for the entire file to be read. But more amazingly, when the next chunk of data is available, there is no need to wait for the previous set of tasks to be completed; instead, another assembly line is launched concurrently. This works perfectly because each task that we execute is asynchronous, so it can be executed concurrently by Node.js. The only constraint is that the order in which the chunks arrive at each stage must be preserved. The internal implementation of Node.js streams takes care of maintaining the order for us.

As we can see from Figure 6.3, the result of using streams is that the entire process takes less time, because we waste no time waiting for all the data to be read and processed all at once.

This section might make it seem like streams are always faster than using a buffered approach. While that’s often true (as in the example we just covered), it’s not guaranteed. Streams are designed for memory efficiency, not necessarily speed. The abstraction they provide can add overhead, which might slow things down. If all the data you need fits in memory, is already loaded, and doesn’t need to be transferred between processes or systems, processing it directly without streams is likely to give you faster results.

Composability

The code we’ve seen so far demonstrates how streams can be composed using the pipe() method. This method allows us to connect different processing units, each responsible for a single functionality, in true Node.js style. Streams can do this because they share a consistent interface, making them compatible with one another at the API level. The only requirement is that the next stream in the pipeline must support the data type produced by the previous stream (binary data or objects, as we’ll explore later in this chapter).

To further demonstrate the composability of Node.js streams, let’s try to add an encryption layer to the gzip-send/gzip-receive application we built earlier. This will require just a few small changes to both the client and the server.

Adding client-side encryption

Let’s start with the client:

// crypto-gzip-send.js
// ...
import { createCipheriv, randomBytes } from 'node:crypto' // 1
// ...
const secret = Buffer.from(process.argv[4], 'hex') // 2
const iv = randomBytes(16) // 3
// ...

Let’s review what we changed here:

  1. First of all, we import the createCipheriv() function (which creates a Transform stream) and the randomBytes() function from the node:crypto module.
  2. We get the server’s encryption secret from the command line. We expect this value to be passed as a hexadecimal string, so we read it and load it into memory as a Buffer, using the hex encoding.
  3. Finally, we generate a random sequence of bytes that we will be using as an initialization vector (nodejsdp.link/iv) for the file encryption.

An Initialization Vector (IV) is a bit like giving a deck of cards a different shuffle before dealing them, even if you’re always using the same deck. By starting each round with a different shuffle, it becomes much harder for someone watching your hands closely to predict the cards you’re holding. In cryptography, the IV sets the initial state for encryption. It’s usually random or unique, ensuring that encrypting the same message twice with the same key produces different results. This helps prevent attackers from identifying patterns. Note that the IV is required for later decryption. The message recipient must know both the key and the IV to decrypt the message, and only the key must remain secret (generally, the IV is transferred together with the encrypted message, while the key is exchanged in some other secure way). The card-shuffling analogy isn’t perfect, but it helps illustrate how starting with a different configuration each time can significantly increase security.

Now, we can update the piece of code responsible for creating the HTTP request:

const httpRequestOptions = {
  hostname: serverHost,
  port: 3000,
  path: '/',
  method: 'POST',
  headers: {
    'content-type': 'application/octet-stream',
    'content-encoding': 'gzip',
    'x-filename': basename(filename),
    'x-initialization-vector': iv.toString('hex'), // 1
  },
}
// ...
const req = request(httpRequestOptions, (res) => {
  console.log(`Server response: ${res.statusCode}`)
})
createReadStream(filename)
  .pipe(createGzip())
  .pipe(createCipheriv('aes192', secret, iv)) // 2
  .pipe(req)
// ...

The main changes here are:

  1. We pass the initialization vector to the server as an HTTP header.
  2. We encrypt the data, just after the Gzip phase.

That’s all for the client side.

Adding server-side decryption

Let’s now refactor the server. The first thing that we need to do is import some utility functions from the core node:crypto module, which we can use to generate a random encryption key (the secret):

// crypto-gzip-receive.js
// ...
import { createDecipheriv, randomBytes } from 'node:crypto'
const secret = randomBytes(24)
console.log(`Generated secret: ${secret.toString('hex')}`)

The generated secret is printed to the console as a hex string so that we can share that with our clients.

Now, we need to update the file reception logic:

const server = createServer((req, res) => {
  const filename = basename(req.headers['x-filename'])
  const iv = Buffer.from(req.headers['x-initialization-vector'], 'hex') // 1
  const destFilename = join('received_files', filename)
  console.log(`File request received: ${filename}`)
  req
    .pipe(createDecipheriv('aes192', secret, iv)) // 2
    .pipe(createGunzip())
    .pipe(createWriteStream(destFilename))
    // ...

Here, we are applying two changes:

  1. We have to read the encryption initialization vector sent by the client.
  2. The first step of our streaming pipeline is now responsible for decrypting the incoming data using the createDecipheriv Transform stream from the crypto module.

With very little effort (just a few lines of code), we added an encryption layer to our application; we simply had to use some already available Transform streams (createCipheriv and createDecipheriv) and included them in the stream processing pipelines for the client and the server. In a similar way, we can add and combine other streams, as if we were playing with LEGO bricks.

The main advantage of this approach is reusability, but as we can see from the code so far, streams also enable cleaner and more modular code. For these reasons, streams are often used not just to deal with pure I/O, but also to simplify and modularize code.

Now that you have had an appetizer of what using streams tastes like, we are ready to explore, in a more structured way, the different types of streams available in Node.js.

In this implementation, we used encryption as an example to demonstrate the composability of streams. Since our client-server communication relies on the HTTP protocol, a more standard and possibly simpler approach would have been to use HTTPS by simply switching from the node:http module to the node:https module. Regardless of which implementation you decide to use, make sure that, if you are transferring data over a network, you use some strong form of encryption. Never transfer unencrypted data over a network!

Getting started with streams

In the previous section, we learned why streams are so powerful, but also that they are everywhere in Node.js, starting from its core modules. For example, we have seen that the fs module has createReadStream() for reading from a file and createWriteStream() for writing to a file, the HTTP request and response objects are essentially streams, the zlib module allows us to compress and decompress data using a streaming interface, and, finally, even the crypto module exposes some useful streaming primitives like createCipheriv and createDecipheriv.

Now that we know why streams are so important, let’s take a step back and start to explore them in more detail.

Anatomy of streams

Every stream in Node.js is an implementation of one of the four base abstract classes available in the stream core module:

  • Readable
  • Writable
  • Duplex
  • Transform

Every stream is also an instance of EventEmitter. Streams, in fact, can emit several types of events, such as end when a Readable stream has finished reading, finish when a Writable stream has completed writing (we have already seen this one in some of the examples before), or error when something goes wrong.
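
As a minimal illustration (the filename is just a placeholder), this is how those events can be observed on a file-based Readable stream:

import { createReadStream } from 'node:fs'
createReadStream('data.txt') // placeholder filename
  .on('data', chunk => console.log(`Read ${chunk.length} bytes`))
  .on('end', () => console.log('No more data'))
  .on('error', err => console.error('Something went wrong:', err))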

One reason why streams are so flexible is the fact that they can handle not just binary data, but almost any JavaScript value. In fact, they support two operating modes:

  • Binary mode: To stream data in the form of chunks, such as buffers or strings
  • Object mode: To stream data as a sequence of discrete objects (allowing us to use almost any JavaScript value)

These two operating modes allow us to use streams not just for I/O, but also as a tool to elegantly compose processing units in a functional fashion, as we will see later in this chapter.

Let’s start our deep dive into Node.js streams by introducing the class of Readable streams.

Readable streams

A Readable stream represents a source of data. In Node.js, it’s implemented using the Readable abstract class, which is available in the stream module.

Reading from a stream

There are two approaches to receive the data from a Readable stream: non-flowing (or paused) and flowing. Let’s analyze these modes in more detail.

The non-flowing mode

The non-flowing or paused mode is the default pattern for reading from a Readable stream. It involves attaching a listener to the stream for the readable event, which signals the availability of new data to read. Then, in a loop, we read the data continuously until the internal buffer is emptied. This can be done using the read() method, which synchronously reads from the internal buffer and returns a Buffer object representing the chunk of data. The read() method has the following signature:

readable.read([size])

Using this approach, the data is pulled from the stream on demand.

To show how this works, let’s create a new module named read-stdin.js, which implements a simple program that reads from the standard input (which is also a Readable stream) and echoes everything back to the standard output:

process.stdin
  .on('readable', () => {
    let chunk
    console.log('New data available')
    while ((chunk = process.stdin.read()) !== null) {
      console.log(
        `Chunk read (${chunk.length} bytes): "${chunk.toString()}"`
      )
    }
  })
  .on('end', () => console.log('End of stream'))

The read() method is a synchronous operation that pulls a data chunk from the internal buffers of the Readable stream. The returned chunk is, by default, a Buffer object if the stream is working in binary mode.

In a Readable stream working in binary mode, we can read strings instead of buffers by calling setEncoding(encoding) on the stream, and providing a valid encoding format (for example, utf8). This approach is recommended when streaming UTF-8 text data, as the stream will properly handle multibyte characters, doing the necessary buffering to make sure that no character ends up being split into separate chunks. In other words, every chunk produced by the stream will be a valid UTF-8 sequence of bytes.

Note that you can call setEncoding() as many times as you want on a Readable stream, even after you’ve started consuming the data from the stream. The encoding will be switched dynamically on the next available chunk. Streams are inherently binary; encoding is just a view over the binary data that is emitted by the stream.
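
For example, a minimal variation of our read-stdin.js program that consumes strings instead of buffers could look like this sketch:

process.stdin.setEncoding('utf8')
process.stdin
  .on('readable', () => {
    let chunk
    while ((chunk = process.stdin.read()) !== null) {
      // chunk is now a string made of complete UTF-8 characters
      console.log(`String read (${chunk.length} chars): "${chunk}"`)
    }
  })
  .on('end', () => console.log('End of stream'))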

The data is read solely from within the readable event listener, which is invoked as soon as new data is available. The read() method returns null when there is no more data available in the internal buffers; in such a case, we have to wait for another readable event to be fired, telling us that we can read again, or for the end event that signals the end of the stream. When a stream is working in binary mode, we can also specify that we are interested in reading a specific amount of data by passing a size value to the read() method. This is particularly useful when implementing network protocols or when parsing specific data formats.
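
As a sketch of this idea (the framing is hypothetical, not a real protocol), a TCP server could wait for a fixed-size length prefix and then read exactly that many payload bytes:

import { createServer } from 'node:net'
const server = createServer(socket => {
  socket.on('readable', () => {
    let header
    // read a 4-byte length prefix, then the payload it announces
    while ((header = socket.read(4)) !== null) {
      const payloadLength = header.readUInt32BE(0)
      const payload = socket.read(payloadLength)
      if (payload === null) {
        // not enough data buffered yet: put the header back and wait
        socket.unshift(header)
        break
      }
      console.log(`Received a ${payloadLength}-byte message`)
    }
  })
})
server.listen(4000)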

Now, we are ready to run the read-stdin.js module and experiment with it. Let’s type some characters into the console and then press Enter to see the data echoed back into the standard output. To terminate the stream and hence generate a graceful end event, we need to insert an EOF (end-of-file) character (using Ctrl + Z on Windows or Ctrl + D on Linux and macOS).

We can also try to connect our program with other processes. This is possible using the pipe operator (|), which redirects the standard output of a program to the standard input of another. For example, we can run a command such as the following:

cat <path to a file> | node read-stdin.js

This is an amazing demonstration of how the streaming paradigm is a universal interface that enables our programs to communicate, regardless of the language they are written in.

Flowing mode

Another way to read from a stream is by attaching a listener to the data event. This switches the stream into flowing mode, where the data is not pulled using read(), but instead is pushed to the data listener as soon as it arrives. For example, the read-stdin.js application that we created earlier will look like this when using flowing mode:

process.stdin
  .on('data', (chunk) => {
    console.log('New data available')
    console.log(
      `Chunk read (${chunk.length} bytes): "${chunk.toString()}"`
    )
  })
  .on('end', () => console.log('End of stream'))

Flowing mode offers less flexibility to control the flow of data compared to non-flowing mode. The default operating mode for streams is non-flowing, so to enable flowing mode, it’s necessary to attach a listener to the data event or explicitly invoke the resume() method. To temporarily stop the stream from emitting data events, we can invoke the pause() method, causing any incoming data to be cached in the internal buffer. Calling pause() will switch the stream back to non-flowing mode.
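
For example, the following sketch simulates a slow consumer by pausing the stream for a second after every chunk:

process.stdin
  .on('data', chunk => {
    console.log(`Chunk read (${chunk.length} bytes)`)
    process.stdin.pause() // back to non-flowing mode; new data is buffered
    setTimeout(() => process.stdin.resume(), 1000) // flowing mode again
  })
  .on('end', () => console.log('End of stream'))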

Async iterators

Readable streams are also async iterators; therefore, we could rewrite our read-stdin.js example as follows:

for await (const chunk of process.stdin) {
  console.log('New data available')
  console.log(`Chunk read (${chunk.length} bytes): "${chunk.toString()}"`)
}
console.log('End of stream')

We will discuss async iterators in greater detail in Chapter 9, Behavioral Design Patterns, so don’t worry too much about the syntax in the preceding example for now. What’s important to know is that you can also consume data from a Readable stream using this convenient for await ... of syntax.

Implementing Readable streams

Now that we know how to read from a stream, the next step is to learn how to implement a new custom Readable stream. To do this, it’s necessary to create a new class by inheriting the prototype Readable from the stream module. The concrete stream must provide an implementation of the _read() method, which has the following signature:

readable._read(size)

The internals of the Readable class will call the _read() method, which, in turn, will start to fill the internal buffer using push():

readable.push(chunk)

Please note that read() is a method called by the stream consumers, while _read() is a method to be implemented by a stream subclass and should never be called directly; the underscore prefix indicates that the method is not part of the public interface.

To demonstrate how to implement new Readable streams, we can try to implement a stream that generates random strings. Let’s create a new module that contains the code of our random string generator:

// random-stream.js
import { Readable } from 'node:stream'
import Chance from 'chance' // v1.1.12
const chance = new Chance()
export class RandomStream extends Readable {
  constructor(options) {
    super(options)
    this.emittedBytes = 0
  }
  _read(size) {
    const chunk = chance.string({ length: size }) // 1
    this.push(chunk, 'utf8') // 2
    this.emittedBytes += chunk.length
    if (chance.bool({ likelihood: 5 })) { // 3
      this.push(null)
    }
  }
}

For this example, we are using a third-party module from npm called chance (nodejsdp.link/chance), which is a library for generating all sorts of random values, from numbers to strings to entire sentences.

Note that chance is not cryptographically secure, which means it can be used for tests or simulations, but not to generate tokens, passwords, or other security-sensitive values.

We start by defining a new class called RandomStream, which specifies Readable as its parent. In the constructor of this class, we have to invoke super(options), which will call the constructor of the parent class (Readable), initializing the stream’s internal state.

If you have a constructor that only invokes super(options), you can remove it, since the class will inherit the parent constructor by default. Just remember to call super(options) whenever you do write a custom constructor.

The possible parameters that can be passed through the options object include the following:

  • The encoding argument, which is used to convert buffers into strings (defaults to null)
  • A flag to enable object mode (objectMode, defaults to false)
  • The upper limit of the data stored in the internal buffer, after which no more reading from the source should be done (highWaterMark, defaults to 16KB)

Inside the constructor, we initialized an instance variable: emittedBytes. We will be using this variable to keep track of how many bytes have been emitted so far from the stream. This is going to be useful for debugging, but it’s not a requirement when creating Readable streams.

Okay, now let’s discuss the implementation of the _read() method:

  1. The method generates a random string of length equal to size using chance.
  2. It pushes the string into the internal buffer. Note that since we are pushing strings, we also need to specify the encoding, utf8 (this is not necessary if the chunk is simply a binary Buffer).
  3. It terminates the stream randomly, with a likelihood of 5 percent, by pushing null into the internal buffer to indicate an EOF situation or, in other words, the end of the stream. This is just an implementation detail that we are adopting to force the stream to eventually terminate. Without this condition, our stream would be producing random data indefinitely.

    Finite vs. Infinite Readable Streams

    It’s up to us to determine whether a Readable stream should terminate. You can signal the end of a stream by invoking this.push(null) in the _read() method. Some streams are naturally finite. For example, when reading from a file, the stream will end once all the bytes have been read because the file has a defined size. In other cases, we might create streams that provide data indefinitely. For instance, a readable stream could deliver continuous temperature readings from a sensor or a live video feed from a security camera. These streams will keep producing data for as long as the source remains active, and no communication errors occur.

Note that the size argument in the _read() function is an advisory parameter. It’s good to honor it and push only the amount of data that was requested by the caller, even though it is not mandatory to do so.

When we invoke push(), we should check whether it returns false. When that happens, it means that the internal buffer of the stream has reached the highWaterMark limit and we should stop adding more data to it. This is called backpressure, and we will be discussing it in more detail in the next section of this chapter. For now, just be aware that this implementation is not perfect because it does not handle backpressure.
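
As a preview (this is only a sketch, not necessarily the approach discussed later in the book), a backpressure-aware version of our _read() method could keep pushing until push() returns false and then simply return, waiting for the next _read() invocation:

// inside the RandomStream class
_read(size) {
  while (true) {
    const chunk = chance.string({ length: size })
    const canPushMore = this.push(chunk, 'utf8')
    this.emittedBytes += chunk.length
    if (chance.bool({ likelihood: 5 })) {
      this.push(null) // signal the end of the stream
      return
    }
    if (!canPushMore) {
      // the internal buffer reached highWaterMark: stop pushing;
      // _read() will be invoked again when the consumer catches up
      return
    }
  }
}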

That’s it for RandomStream; we are now ready to use it. Let’s see how to instantiate a RandomStream object and pull some data from it (using flowing mode):

// index.js
import { RandomStream } from './random-stream.js'
const randomStream = new RandomStream()
randomStream
  .on('data', chunk => {
    console.log(`Chunk received (${chunk.length} bytes): ${chunk.toString()}`)
  })
  .on('end', () => {
    console.log(`Produced ${randomStream.emittedBytes} bytes of random data`)
  })

Now, everything is ready for us to try our new custom stream. Simply execute the index.js module as usual and watch a random set of strings flow on the screen.

Simplified construction

For simple custom streams, we can avoid creating a custom class by using the Readable stream’s simplified construction approach. With this approach, we only need to invoke new Readable(options) and pass a method named read() in the set of options. The read() method here has exactly the same semantics as the _read() method that we saw in the class extension approach. Let’s rewrite RandomStream using the simplified constructor approach:

// simplified-construction.js
import { Readable } from 'node:stream'
import Chance from 'chance' // v1.1.12
const chance = new Chance()
let emittedBytes = 0
const randomStream = new Readable({
  read(size) {
    const chunk = chance.string({ length: size })
    this.push(chunk, 'utf8')
    emittedBytes += chunk.length
    if (chance.bool({ likelihood: 5 })) {
      this.push(null)
    }
  },
})
// now you can read data from the randomStream instance directly ...

This approach can be particularly useful when you don’t need to manage a complicated state, and allows you to take advantage of a more succinct syntax. In the previous example, we created a single instance of our custom stream. If we want to adopt the simplified constructor approach, but we need to create multiple instances of the custom stream, we can wrap the initialization logic in a factory function that we can invoke multiple times to create those instances.

We will discuss the Factory pattern and other creational design patterns in Chapter 7, Creational Design Patterns.
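
For example, a factory wrapping the simplified construction could look like the following sketch (the function name is purely illustrative):

// random-stream-factory.js (hypothetical module)
import { Readable } from 'node:stream'
import Chance from 'chance' // v1.1.12
function createRandomStream(options) {
  const chance = new Chance()
  let emittedBytes = 0
  return new Readable({
    ...options,
    read(size) {
      const chunk = chance.string({ length: size })
      this.push(chunk, 'utf8')
      emittedBytes += chunk.length
      if (chance.bool({ likelihood: 5 })) {
        this.push(null)
      }
    },
  })
}
// every invocation returns an independent stream instance
const streamA = createRandomStream()
const streamB = createRandomStream()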

Readable streams from iterables

You can easily create Readable stream instances from arrays or other iterable objects (that is, generators, iterators, and async iterators) using the Readable.from() helper.

In order to get accustomed to this helper, let’s look at a simple example where we convert data from an array into a Readable stream:

import { Readable } from 'node:stream'
const mountains = [
  { name: 'Everest', height: 8848 },
  { name: 'K2', height: 8611 },
  { name: 'Kangchenjunga', height: 8586 },
  { name: 'Lhotse', height: 8516 },
  { name: 'Makalu', height: 8481 }
]
const mountainsStream = Readable.from(mountains)
mountainsStream.on('data', (mountain) => {
  console.log(`${mountain.name.padStart(14)}\t${mountain.height}m`)
})

As we can see from this code, the Readable.from() method is quite simple to use: the first argument is an iterable instance (in our case, the mountains array). Readable.from() accepts an additional optional argument that can be used to specify stream options like objectMode.

Note that we didn’t have to explicitly set objectMode to true. By default, Readable.from() will set objectMode to true, unless this is explicitly opted out by setting it to false. Stream options can be passed as a second argument to the function.

Running the previous code will produce the following output:

       Everest    8848m
            K2    8611m
 Kangchenjunga    8586m
        Lhotse    8516m
        Makalu    8481m

You should avoid instantiating large arrays in memory. Imagine if, in the previous example, we wanted to list all the mountains in the world. There are about 1 million mountains, so if we were to load all of them in an array upfront, we would allocate a quite significant amount of memory. Even if we then consume the data in the array through a Readable stream, all the data has already been preloaded, so we are effectively voiding the memory efficiency of streams. It’s always preferable to load and consume the data in chunks, and you could do so by using native streams such as fs.createReadStream, by building a custom stream, or by using Readable.from with lazy iterables such as generators, iterators, or async iterators. We will see some examples of the latter approach in Chapter 9, Behavioral Design Patterns.
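
For instance, here is a minimal sketch (with made-up data) that feeds Readable.from() with a lazy generator instead of a preloaded array:

import { Readable } from 'node:stream'
function* generateMountains(count) {
  for (let i = 1; i <= count; i++) {
    // each value is produced on demand, so nothing is preloaded in memory
    yield { name: `Mountain ${i}`, height: 1000 + (i % 8000) }
  }
}
const mountainsStream = Readable.from(generateMountains(1_000_000))
mountainsStream.on('data', mountain => {
  console.log(`${mountain.name}\t${mountain.height}m`)
})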

Writable streams

A Writable stream represents a data destination. Imagine, for instance, a file on the filesystem, a database table, a socket, the standard output, or the standard error interface. In Node.js, these kinds of abstractions can be implemented using the Writable abstract class, which is available in the stream module.

Writing to a stream

Pushing some data down a Writable stream is a straightforward business; all we have to do is use the write() method, which has the following signature:

writable.write(chunk, [encoding], [callback])

The encoding argument is optional and can be specified if chunk is a string (it defaults to utf8, and it’s ignored if chunk is a buffer). The callback function, on the other hand, is called when the chunk is flushed into the underlying resource and is optional as well.

To signal that no more data will be written to the stream, we have to use the end() method:

writable.end([chunk], [encoding], [callback])

We can provide a final chunk of data through the end() method; in this case, the callback function is equivalent to registering a listener to the finish event, which is fired when all the data written in the stream has been flushed into the underlying resource.

Now, let’s show how this works by creating a small HTTP server that outputs a random sequence of strings:

// entropy-server.js
import { createServer } from 'node:http'
import Chance from 'chance' // v1.1.12
const chance = new Chance()
const server = createServer((_req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' }) // 1
  do { // 2
    res.write(`${chance.string()}\n`)
  } while (chance.bool({ likelihood: 95 }))
  res.end('\n\n') // 3
  res.on('finish', () => console.log('All data sent')) // 4
})
server.listen(3000, () => {
  console.log('listening on http://localhost:3000')
})

The HTTP server that we created writes into the res object, which is an instance of http.ServerResponse and also a Writable stream. What happens is explained as follows:

  1. We first write the head of the HTTP response. Note that writeHead() is not a part of the Writable interface; in fact, it’s an auxiliary method exposed by the http.ServerResponse class and is specific to the HTTP protocol. This method writes into the stream a properly formatted HTTP header, which will contain the status code 200 and a header specifying the content type of the response body that we are about to stream.
  2. We start a loop that terminates with a likelihood of 5% (we instruct chance.bool() to return true 95% of the time). Inside the loop, we write a random string into the stream. Note that we use a do ... while loop here because we want to make sure to produce at least one random string.
  3. Once we are out of the loop, we call end() on the stream, indicating that no more data will be written. Also, we provide a final string containing two new line characters to be written into the stream before ending it.
  4. Finally, we register a listener for the finish event, which will be fired when all the data has been flushed into the underlying socket.

To test the server, we can open a browser at the address http://localhost:3000 or use curl from the terminal as follows:

curl -i --raw localhost:3000

At this point, the server should start sending random strings to the HTTP client that you chose. If you are using a web browser, bear in mind that modern hardware can process things very quickly and that some browsers might buffer the data, so the streaming behavior might not be apparent.

By using the -i --raw flags in the curl command, we can have a peek at many details of the HTTP protocol. Specifically, -i includes response headers in the output. This allows us to see the exact data transferred in the header part of the response when we invoke res.writeHead(). The node:http module simplifies working with the HTTP protocol by automatically formatting response headers and applies sensible defaults such as adding standard headers like Connection: keep-alive and Transfer-Encoding: chunked. This last header is particularly interesting. It informs the client how to interpret the body of the incoming response. Chunked encoding is especially useful when sending large amounts of data whose total size isn’t known until the request has been fully processed. This makes it a perfect fit for writable Node.js streams. With chunked encoding, the Content-Length header is omitted. Instead, each chunk begins with its length in hexadecimal format, followed by \r\n, the chunk’s data, and another \r\n. The stream ends with a terminating chunk, which is identical to a regular chunk except that its length is zero. In our code, we don’t need to handle these details manually. The ServerResponse writable stream provided by the node:http module takes care of encoding chunks correctly for us. We simply provide chunks by calling write() or end() on the response stream, and the stream handles the rest. This is one of the strengths of Node.js streams: they abstract away complex implementation details, making them easy to work with. If you want to learn more about chunked encoding, check out: nodejsdp.link/transfer-encoding. By using the --raw option, which disables all internal HTTP decoding of content or transfer encodings, we can see that these chunk terminators (\r\n) are present in the data received from the server.

Backpressure

Node.js data streams, like liquids in a piping system, can suffer from bottlenecks when data is written faster than the stream can handle. To manage this, incoming data is buffered in memory. However, without feedback from the stream to the writer, the buffer could keep growing, potentially leading to excessive memory usage.

Node.js streams are designed to maintain steady and predictable memory usage, even during large data transfers. Writable streams include a built-in signaling mechanism to alert the application when the internal buffer has accumulated too much data. This signal indicates that it’s better to pause and wait for the buffered data to be flushed to the stream’s destination before sending more data. The writable.write() method returns false once the buffer size exceeds the highWaterMark limit.

In Writable streams, the highWaterMark value sets the maximum buffer size in bytes. When this limit is exceeded, write() returning false signals the application to stop writing. Once the buffer is cleared, a drain event is emitted, indicating it’s safe to resume writing. This process is known as backpressure.

Backpressure is an advisory mechanism. Even if write() returns false, we could ignore this signal and continue writing, making the buffer grow indefinitely. The stream won’t be blocked automatically when the highWaterMark threshold is reached; therefore, it is recommended to always be mindful and respect the backpressure.

The mechanism described in this section is similarly applicable to Readable streams. In fact, backpressure exists in Readable streams too, and it’s triggered when the push() method, which is invoked inside _read(), returns false. However, that’s a problem specific to stream implementers, so we usually have to deal with it less frequently.

We can quickly demonstrate how to take into account the backpressure of a Writable stream by modifying the entropy-server.js module that we created previously:

// ...
const CHUNK_SIZE = 16 * 1024 - 1
const chance = new Chance()
const server = createServer((_req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' })
  let backPressureCount = 0
  let bytesSent = 0
  function generateMore() { // 1
    do {
      const randomChunk = chance.string({ length: CHUNK_SIZE }) // 2
      const shouldContinue = res.write(`${randomChunk}\n`) // 3
      bytesSent += CHUNK_SIZE
      if (!shouldContinue) { // 4
        console.warn(`back-pressure x${++backPressureCount}`)
        return res.once('drain', generateMore)
      }
    } while (chance.bool({ likelihood: 95 }))
    res.end('\n\n')
  }
  generateMore()
  res.on('finish', () => console.log(`Sent ${bytesSent} bytes`))
})
// ...

The most important steps of the previous code can be summarized as follows:

  1. We wrapped the main data generation logic in a function called generateMore().
  2. To increase the chances of receiving some backpressure, we increased the size of the data chunk to 16 KB minus 1 byte, which is very close to the default highWaterMark limit.
  3. After writing a chunk of data, we check the return value of res.write(). If we receive false, it means that the internal buffer is full, and we should stop sending more data.
  4. When this happens, we exit the function and register another cycle of writes using generateMore() for when the drain event is emitted.

If we now try to run the server again, and then generate a client request with curl (or with a browser), there is a high probability that there will be some backpressure, as the server produces data at a very high rate, faster than the underlying socket can handle. This example also prints how many backpressure events happen and how much data is being transferred for every request. You are encouraged to try different requests, check the logs, and try to make sense of what’s happening under the hood.

Implementing Writable streams

We can implement a new Writable stream by inheriting the class Writable and providing an implementation for the _write() method. Let’s try to do it immediately while discussing the details along the way.

Let’s build a Writable stream that receives objects in the following format:

{
  path: <path to a file>
  content: <string or buffer>
}

For each one of these objects, our stream will save the content property into a file created at the given path. We can immediately see that the inputs of our stream are objects, and not strings or buffers. This means that our stream must work in object mode, which gives us a great excuse to also explore object mode with this example:

// to-file-stream.js
import { Writable } from 'node:stream'
import { promises as fs } from 'node:fs'
import { dirname } from 'node:path'
import { mkdirp } from 'mkdirp' // v3.0.1
export class ToFileStream extends Writable {
  constructor(options) {
    super({ ...options, objectMode: true })
  }
  _write(chunk, _encoding, cb) {
    mkdirp(dirname(chunk.path))
      .then(() => fs.writeFile(chunk.path, chunk.content))
      .then(() => cb())
      .catch(cb)
  }
}

We created a new class for our new stream, which extends Writable from the stream module.

We had to invoke the parent constructor to initialize its internal state; we also needed to make sure that the options object specifies that the stream works in object mode (objectMode: true). Other options accepted by Writable are as follows:

  • highWaterMark (the default is 16 KB): This controls the backpressure limit.
  • decodeStrings (defaults to true): This enables the automatic decoding of strings into binary buffers before passing them to the _write() method. This option is ignored in object mode.

Finally, we provided an implementation for the _write() method. As you can see, the method accepts a data chunk and an encoding (which makes sense only if we are in binary mode and the stream option decodeStrings is set to false). Also, the method accepts a callback function (cb), which needs to be invoked when the operation completes; it’s not necessary to pass the result of the operation but, if needed, we can still pass an error that will cause the stream to emit an error event.

Now, to try the stream that we just built, we can create a new module and perform some write operations against the stream:

// index.js
import { join } from 'node:path'
import { ToFileStream } from './to-file-stream.js'
const tfs = new ToFileStream()
const outDir = join(import.meta.dirname, 'files')
tfs.write({ path: join(outDir, 'file1.txt'), content: 'Hello' })
tfs.write({ path: join(outDir, 'file2.txt'), content: 'Node.js' })
tfs.write({ path: join(outDir, 'file3.txt'), content: 'streams' })
tfs.end(() => console.log('All files created'))

Here, we created and used our first custom Writable stream. Run the new module as usual and check its output. You will see that after the execution, three new files will be created within a new folder called files.

Simplified construction

As we saw for Readable streams, Writable streams also offer a simplified construction mechanism. If we were to rewrite ToFileStream using the simplified construction for Writable streams, it would look like this:

// ...
const tfs = new Writable({
  objectMode: true,
  write(chunk, _encoding, cb) {
    mkdirp(dirname(chunk.path))
      .then(() => fs.writeFile(chunk.path, chunk.content))
      .then(() => cb())
      .catch(cb)
  },
})
// ...

With this approach, we are simply using the Writable constructor and passing a write() function that implements the custom logic of our Writable instance. Note that with this approach, the write() function doesn’t have an underscore in the name. We can also pass other construction options like objectMode.

Duplex streams

A Duplex stream is a stream that is both Readable and Writable. It is useful when we want to describe an entity that is both a data source and a data destination, such as network sockets, for example. Duplex streams inherit the methods of both stream.Readable and stream.Writable, so this is nothing new to us. This means that we can read() or write() data, or listen for both readable and drain events.

To create a custom Duplex stream, we have to provide an implementation for both _read() and _write(). The options object passed to the Duplex() constructor is internally forwarded to both the Readable and Writable constructors. The options are the same as those we already discussed in the previous sections, with the addition of a new one called allowHalfOpen (defaults to true) that, if set to false, will cause both parts (Readable and Writable) of the stream to end if only one of them does.

If we need to have a Duplex stream working in object mode on one side and binary mode on the other, we can use the options readableObjectMode and writableObjectMode independently.
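
As a quick, self-contained sketch (a toy example, not part of the chapter’s official code), here is a Duplex stream defined with the simplified construction style, where the two sides are completely independent: the Readable side produces random numbers, while the Writable side simply logs whatever it receives:

import { Duplex } from 'node:stream'
const randomAndLog = new Duplex({
  read() {
    // Readable side: produce one chunk per _read() invocation
    this.push(`${Math.random()}\n`)
    if (Math.random() > 0.9) {
      this.push(null) // eventually stop producing data
    }
  },
  write(chunk, _encoding, cb) {
    // Writable side: consume incoming data, independently of the Readable side
    console.log(`Received: ${chunk.toString()}`)
    cb()
  },
})
randomAndLog.pipe(process.stdout) // consumes the Readable side
randomAndLog.end('hello Duplex') // feeds the Writable side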

Transform streams

Transform streams are a special kind of Duplex stream that are specifically designed to handle data transformations. Just to give you a few concrete examples, the functions zlib.createGzip() and crypto.createCipheriv() that we discussed at the beginning of this chapter create Transform streams for compression and encryption, respectively.

In a simple Duplex stream, there is no immediate relationship between the data read from the stream and the data written into it (at least, the stream is agnostic to such a relationship). Think about a TCP socket, which just sends and receives data to and from the remote peer; the socket is not aware of any relationship between the input and output. Figure 6.4 illustrates the data flow in a Duplex stream:

Figure 6.4: Duplex stream schematic representation

Figure 6.4: Duplex stream schematic representation

On the other hand, Transform streams apply some kind of transformation to each chunk of data that they receive from their Writable side, and then make the transformed data available on their Readable side. Figure 6.5 shows how the data flows in a Transform stream:

Figure 6.5: Transform stream schematic representation

Figure 6.5: Transform stream schematic representation

Returning to our compression example, a transform stream can be visualized as follows: when uncompressed data is written to the stream, its internal implementation compresses the data and stores it in an internal buffer. When the data is read from the other end, the compressed version of the data is retrieved. This is how transformation happens on the fly: data comes in, gets transformed, and then goes out.

From a user perspective, the programmatic interface of a Transform stream is exactly like that of a Duplex stream. However, when we want to implement a new Duplex stream, we have to provide both the _read() and _write() methods, while for implementing a new Transform stream, we have to fill in another pair of methods: _transform() and _flush().

Let’s see how to create a new Transform stream with an example.

Implementing Transform streams

Let’s implement a Transform stream that replaces all the occurrences of a given string:

// replace-stream.js
import { Transform } from 'node:stream'
export class ReplaceStream extends Transform {
  constructor(searchStr, replaceStr, options) {
    super({ ...options })
    this.searchStr = searchStr
    this.replaceStr = replaceStr
    this.tail = ''
  }
  _transform(chunk, _encoding, cb) {
    const pieces = (this.tail + chunk).split(this.searchStr) // 1
    const lastPiece = pieces[pieces.length - 1] // 2
    const tailLen = this.searchStr.length - 1
    this.tail = lastPiece.slice(-tailLen)
    pieces[pieces.length - 1] = lastPiece.slice(0, -tailLen)
    this.push(pieces.join(this.replaceStr)) // 3
    cb()
  }
  _flush(cb) {
    this.push(this.tail)
    cb()
  }
}

In this example, we created a new class extending the Transform base class. The constructor of the class accepts three arguments: searchStr, replaceStr, and options. As you can imagine, they allow us to define the text to match and the string to use as a replacement, plus an options object for advanced configuration of the underlying Transform stream. We also initialize an internal tail variable, which will be used later by the _transform() method.

Now, let’s analyze the _transform() method, which is the core of our new class. The _transform() method has practically the same signature as the _write() method of the Writable stream, but instead of writing data into an underlying resource, it pushes it into the internal read buffer using this.push(), exactly as we would do in the _read() method of a Readable stream. This shows how the two sides of a Transform stream are connected.

The _transform() method of ReplaceStream implements the core of our algorithm. To search for and replace a string in a buffer is an easy task; however, it’s a totally different story when the data is streaming, and possible matches might be distributed across multiple chunks. The procedure followed by the code can be explained as follows:

  1. Our algorithm splits the data in memory (tail data and the current chunk) using searchStr as a separator.
  2. Then, it takes the last item of the array generated by the operation and extracts the last searchStr.length - 1 characters. The result is saved in the tail variable and will be prepended to the next chunk of data.
  3. Finally, all the pieces resulting from split() are joined together using replaceStr as a separator and pushed into the internal buffer.

When the stream ends, we might still have some content in the tail variable not pushed into the internal buffer. That’s exactly what the _flush() method is for; it is invoked just before the stream is ended, and this is where we have one final chance to finalize the stream or push any remaining data before completely ending the stream.

The _flush() method only takes in a callback, which we have to make sure to invoke when all the operations are complete, causing the stream to be terminated. With this, we have completed our ReplaceStream class.

Why is the tail variable necessary?

Streams process data in chunks, and these chunks don’t always align with the boundaries of the target search string. For example, if the string we are trying to match is split across two chunks, the split() operation on a chunk alone won’t detect it, potentially leaving part of the match unnoticed. The tail variable ensures that the last portion of a chunk—potentially part of a match—is preserved and concatenated with the next chunk. This way, the stream can properly handle matches that span chunk boundaries, avoiding incorrect replacements or missing matches entirely.

In Transform streams, it’s not uncommon for the logic to involve buffering data from multiple chunks before there’s enough information to perform the transformation. For example, cryptography often works on fixed-size blocks of data. If a chunk doesn’t provide enough data to form a complete block, the Transform stream accumulates multiple chunks until it has enough to process the transformation. This buffering behavior ensures transformations are accurate and consistent, even when input data arrives in unpredictable sizes.

This should also clarify why the _flush() method exists. It’s provided to handle any remaining data still buffered in the Transform stream when the writer has finished writing. This ensures that leftover data—such as the tail in this example—is processed and emitted, preventing incomplete or lost output.
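
To illustrate this buffering behavior, here is a minimal sketch (a hypothetical example, unrelated to ReplaceStream) of a Transform stream that only emits fixed-size blocks, keeping any leftover bytes around until the next chunk arrives or the stream is flushed:

import { Transform } from 'node:stream'
const BLOCK_SIZE = 16 // a hypothetical block size in bytes
export class FixedSizeBlocks extends Transform {
  constructor(options) {
    super(options)
    this.pending = Buffer.alloc(0)
  }
  _transform(chunk, _encoding, cb) {
    // accumulate the incoming data with what is left from previous chunks
    this.pending = Buffer.concat([this.pending, chunk])
    // emit as many complete blocks as possible
    while (this.pending.length >= BLOCK_SIZE) {
      this.push(this.pending.subarray(0, BLOCK_SIZE))
      this.pending = this.pending.subarray(BLOCK_SIZE)
    }
    cb()
  }
  _flush(cb) {
    // emit whatever is left (a possibly incomplete final block)
    if (this.pending.length > 0) {
      this.push(this.pending)
    }
    cb()
  }
}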

Now, it’s time to try the new stream. Let’s create a script that writes some data into the stream and then reads the transformed result:

// index.js
import { ReplaceStream } from './replace-stream.js'
const replaceStream = new ReplaceStream('World', 'Node.js')
replaceStream.on('data', chunk => process.stdout.write(chunk.toString()))
replaceStream.write('Hello W')
replaceStream.write('orld!')
replaceStream.end('\n')

To make life a little bit harder for our stream, we spread the search term (which is World) across two different chunks. Then, using flowing mode, we read from the same stream, logging each transformed chunk. Running the preceding program should produce the following output:

Hello Node.js!

Simplified construction

Unsurprisingly, even Transform streams support simplified construction. At this point, we should have developed an instinct for how this API might look, so let’s get our hands dirty straight away and rewrite the previous example using this approach:

// simplified-construction.js
// ...
const searchStr = 'World'
const replaceStr = 'Node.js'
let tail = ''
const replaceStream = new Transform({
  defaultEncoding: 'utf8',
  transform(chunk, _encoding, cb) {
    const pieces = (tail + chunk).split(searchStr)
    const lastPiece = pieces[pieces.length - 1]
    const tailLen = searchStr.length - 1
    tail = lastPiece.slice(-tailLen)
    pieces[pieces.length - 1] = lastPiece.slice(0, -tailLen)
    this.push(pieces.join(replaceStr))
    cb()
  },
  flush(cb) {
    this.push(tail)
    cb()
  },
})
// now write to replaceStream ...

As expected, simplified construction works by directly instantiating a new Transform object and passing our specific transformation logic through the transform() and flush() functions directly through the options object. Note that transform() and flush() don’t have a prepended underscore here.

Filtering and aggregating data with Transform streams

As we discussed earlier, Transform streams are a great tool for building data transformation pipelines. In a previous example, we showed how to use a Transform stream to replace words in a text stream. We also mentioned other use cases, like compression and encryption. But Transform streams aren’t limited to those examples. They’re often used for tasks like filtering and aggregating data.

To make this more concrete, imagine a Fortune 500 company asks us to analyze a large file containing all their sales data for 2024. The file, data.csv, is a sales report in CSV format, and they want us to calculate the total profit for sales made in Italy. Sure, we could use a spreadsheet application to do this, but where’s the fun in that?

Instead, let’s use Node.js streams. Streams are well-suited for this kind of task because they can process large datasets incrementally, without loading everything into memory. This makes them efficient and scalable. Plus, building a solution with streams sets the stage for automation; perfect if you need to generate similar reports regularly or process other large datasets in the future.

Let’s imagine the sales data that is stored in the CSV file contains three fields per line: item type, country of sale, and profit. So, such a file could look like this:

type,country,profit
Household,Namibia,597290.92
Baby Food,Iceland,808579.10
Meat,Russia,277305.60
Meat,Italy,413270.00
Cereal,Malta,174965.25
Meat,Indonesia,145402.40
Household,Italy,728880.54
[... many more lines]

Now, it’s clear that we must find all the records that have “Italy” in the country field and, in the process, sum up the profit value of the matching lines into a single number.

In order to process a CSV file in a streaming fashion, we can use the excellent third-party module csv-parse (nodejsdp.link/csv-parse).

If we assume for a moment that we have already implemented our custom streams to filter and aggregate the data, a possible solution to this task might look like this:

// index.js
import { createReadStream } from 'node:fs'
import { Parser } from 'csv-parse' // v5.6.0
import { FilterByCountry } from './filter-by-country.js'
import { SumProfit } from './sum-profit.js'
const csvParser = new Parser({ columns: true })
createReadStream('data.csv') // 1
  .pipe(csvParser) // 2
  .pipe(new FilterByCountry('Italy')) // 3
  .pipe(new SumProfit()) // 4
  .pipe(process.stdout) // 5

The streaming pipeline proposed here consists of five steps:

  1. We read the source CSV file as a stream.
  2. We use the csv-parse library to parse every line of the document as a CSV record. For every line, this stream will emit an object containing the properties type, country, and profit. With the option columns: true, the library will read the names of the available columns from the first row of the CSV file.
  3. We filter all the records by country, retaining only the records that match the country “Italy.” All the records that don’t match “Italy” are dropped, which means that they will not be passed to the other steps in the pipeline. Note that this is one of the custom Transform streams that we have to implement.
  4. For every record, we accumulate the profit. This stream will eventually emit a single string, which represents the value of the total profit for products sold in Italy. This value will be emitted by the stream only when all the data from the original file has been completely processed. Note that this is the second custom Transform stream that we have to implement to complete this project.
  5. Finally, the data emitted from the previous step is displayed in the standard output.

Now, let’s implement the FilterByCountry stream:

// filter-by-country.js
import { Transform } from 'node:stream'
export class FilterByCountry extends Transform {
  constructor(country, options = {}) {
    options.objectMode = true
    super(options)
    this.country = country
  }
  _transform(record, _enc, cb) {
    if (record.country === this.country) {
      this.push(record)
    }
    cb()
  }
}

FilterByCountry is a custom Transform stream. We can see that the constructor accepts an argument called country, which allows us to specify the country name we want to match on. In the constructor, we also set the stream to run in objectMode because we know it will be used to process objects (records coming from the CSV file).

In the _transform method, we check if the country of the current record matches the country specified at construction time. If it’s a match, then we pass the record on to the next stage of the pipeline by calling this.push(). Regardless of whether the record matches or not, we need to invoke cb() to indicate that the current record has been successfully processed and that the stream is ready to receive another record.

Pattern: Transform filter

Invoke this.push() in a conditional way to allow only some data to reach the next stage of the pipeline.

Finally, let’s implement the SumProfit filter:

// sum-profit.js
import { Transform } from 'node:stream'
export class SumProfit extends Transform {
  constructor(options = {}) {
    options.objectMode = true
    super(options)
    this.total = 0
  }
  _transform(record, _enc, cb) {
    this.total += Number.parseFloat(record.profit)
    cb()
  }
  _flush(cb) {
    this.push(this.total.toString())
    cb()
  }
}

This stream needs to run in objectMode as well, because it will receive objects representing records from the CSV file. Note that, in the constructor, we also initialize an instance variable called total and we set its value to 0.

In the _transform() method, we process every record and use the current profit value to increase the total. It’s important to note that this time, we are not calling this.push(). This means that no value is emitted while the data is flowing through the stream. We still need to call cb(), though, to indicate that the current record has been processed and the stream is ready to receive another one.

In order to emit the final result when all the data has been processed, we have to define a custom flush behavior using the _flush() method. Here, we finally call this.push() to emit a string representation of the resulting total value. Remember that _flush() is automatically invoked before the stream is closed.

Pattern: Streaming aggregation

Use _transform() to process the data and accumulate the partial result, then call this.push() only in the _flush() method to emit the result when all the data has been processed.

This completes our example. Now, you can grab the CSV file from the code repository and execute this program to see what the total profit for Italy is. No surprise it’s going to be a lot of money since we are talking about the profit of a Fortune 500 company!

You could combine filtering and aggregation into a single Transform stream. While this approach might be less reusable, it can offer a slight performance boost since less data gets passed between steps in the stream pipeline. If you’re up for the challenge, try implementing this as an exercise!

The Node.js streams library includes a set of Readable stream helper methods (experimental at the time of writing). Among these are Readable.map() and Readable.reduce(), which could solve the problem we just explored in a more concise and streamlined way. We’ll dive into Readable stream helpers later in this chapter.
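
Just as a sneak peek, a sketch of the same calculation using these experimental helpers might look like the following (the exact API could still change, since these methods are experimental at the time of writing):

import { createReadStream } from 'node:fs'
import { Parser } from 'csv-parse' // v5.6.0
const csvParser = new Parser({ columns: true })
// filter() and reduce() are experimental helpers available on Readable streams
const totalProfit = await createReadStream('data.csv')
  .pipe(csvParser)
  .filter(record => record.country === 'Italy')
  .reduce((total, record) => total + Number.parseFloat(record.profit), 0)
console.log(totalProfit)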

PassThrough streams

There is a fifth type of stream that is worth mentioning: PassThrough. It is a special kind of Transform stream that outputs every data chunk without applying any transformation.

PassThrough is possibly the most underrated type of stream, but there are several circumstances in which it can be a very valuable tool in our toolbelt. For instance, PassThrough streams can be useful for observability or to implement late piping and lazy stream patterns.

Observability

If we want to observe how much data is flowing through one or more streams, we could do so by attaching a data event listener to a PassThrough instance and then piping this instance at a given point in a stream pipeline. Let’s see a simplified example to be able to appreciate this concept:

import { PassThrough } from 'node:stream'
let bytesWritten = 0
const monitor = new PassThrough()
monitor.on('data', chunk => {
  bytesWritten += chunk.length
})
monitor.on('finish', () => {
  console.log(`${bytesWritten} bytes written`)
})
monitor.write('Hello!')
monitor.end()

In this example, we are creating a new instance of PassThrough and using the data event to count how many bytes are flowing through the stream. We also use the finish event to dump the total amount to the console. Finally, we write some data directly into the stream using write() and end(). This is just an illustrative example; in a more realistic scenario, we would be piping our monitor instance at a given point in a stream pipeline. For instance, if we wanted to monitor how many bytes are written to disk in our first file compression example of this chapter, we could easily achieve that by doing something like this:

createReadStream(filename)
  .pipe(createGzip())
  .pipe(monitor)
  .pipe(createWriteStream(`${filename}.gz`))

The beauty of this approach is that we didn’t have to touch any of the other existing streams in the pipeline, so if we need to observe other parts of the pipeline (for instance, imagine we want to know the number of bytes of the uncompressed data), we can move monitor around with very little effort. We could even have multiple PassThrough streams to monitor different parts of a pipeline at the same time.

Note that you could implement an alternative version of the monitor stream by using a custom transform stream instead. In such a case, you would have to make sure that the received chunks are pushed without any modification or delay, which is something that a PassThrough stream would do automatically for you. Both approaches are equally valid, so pick the approach that feels more natural to you.
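
For reference, a Transform-based version of the monitor could be sketched as follows (functionally equivalent to the PassThrough version above):

import { Transform } from 'node:stream'
let bytesWritten = 0
const monitor = new Transform({
  transform(chunk, _encoding, cb) {
    bytesWritten += chunk.length
    cb(null, chunk) // push the chunk through untouched
  },
})
monitor.on('finish', () => {
  console.log(`${bytesWritten} bytes written`)
})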

Late piping

In some circumstances, we might have to use APIs that accept a stream as an input parameter. This is generally not a big deal because we already know how to create and use streams. However, it may get a little bit more complicated if the data we want to read or write through the stream is only available after we’ve called the given API.

To visualize this scenario in more concrete terms, let’s imagine that we have to use an API that gives us the following function to upload a file to a data storage service:

function upload (filename, contentStream) {
  // ...
}

This function is effectively a simplified version of what is commonly available in the SDK of file storage services like Amazon Simple Storage Service (S3) or Azure Blob Storage service. Often, those libraries will provide the user with a more flexible function that can receive the content data in different formats (for example, a string, a buffer, or a Readable stream).

Now, if we want to upload a file from the filesystem, this is a trivial operation, and we can do something like this:

import { createReadStream } from 'node:fs'
upload('a-picture.jpg', createReadStream('/path/to/a-picture.jpg'))

But what if we want to do some processing on the file stream before the upload? For instance, let’s say we want to compress or encrypt the data. Also, what if we need to perform this transformation asynchronously after the upload function has been called?

In such cases, we can provide a PassThrough stream to the upload() function, which will effectively act as a placeholder. The internal implementation of upload() will immediately try to consume data from it, but there will be no data available in the stream until we actually write to it. Also, the stream won’t be considered complete until we close it, so the upload() function will send data as it flows through the PassThrough instance and will only finalize the upload once we end the stream.

Let’s see a possible command-line script that uses this approach to upload a file from the filesystem and compresses it using Brotli compression. We are going to assume that the third-party upload() function is provided in a file called upload.js.

// upload-cli.js
import { createReadStream } from 'node:fs'
import { createBrotliCompress } from 'node:zlib'
import { PassThrough } from 'node:stream'
import { basename } from 'node:path'
import { upload } from './upload.js'
const filepath = process.argv[2] // 1
const filename = basename(filepath)
const contentStream = new PassThrough() // 2
upload(`${filename}.br`, contentStream) // 3
  .then(response => {
    console.log(`Server response: ${response.data}`)
  })
  .catch(err => {
    console.error(err)
    process.exit(1)
  })
createReadStream(filepath) // 4
  .pipe(createBrotliCompress())
  .pipe(contentStream)

In this book’s repository, you will find a complete implementation of this example that allows you to upload files to an HTTP server that you can run locally.

Let’s review what’s happening in the previous example:

  1. We get the path to the file we want to upload from the first command-line argument and use basename to extract the filename from the given path.
  2. We create a placeholder for our content stream as a PassThrough instance.
  3. Now, we invoke the upload function by passing our filename (with the added .br suffix, indicating that it is using Brotli compression) and the placeholder content stream.
  4. Finally, we create a pipeline by chaining a filesystem Readable stream, a Brotli compression Transform stream, and finally our content stream as the destination.

When this code is executed, the upload will start as soon as we invoke the upload() function (possibly establishing a connection to the remote server), but the data will start to flow only later, when our pipeline is initialized. Note that our pipeline will also close the contentStream when the processing completes, which will indicate to the upload() function that all the content has been fully consumed.

Pattern

Use a PassThrough stream when you need to provide a placeholder for data that will be read or written in the future.

We can also use this pattern to transform the interface of the upload() function. Instead of accepting a Readable stream as input, we can make it return a Writable stream, which can then be used to provide the data we want to upload:

function createUploadStream (filename) {
  // ...
  // returns a writable stream that can be used to upload data
}

If we were tasked to implement this function, we could achieve that in a very elegant way by using a PassThrough instance, as in the following example implementation:

function createUploadStream (filename) {
  const connector = new PassThrough()
  upload(filename, connector)
  return connector
}

In the preceding code, we are using a PassThrough stream as a connector. This stream becomes a perfect abstraction for a case where the consumer of the library can write data at any later stage.

The createUploadStream() function can then be used as follows:

const upload = createUploadStream('a-file.txt')
upload.write('Hello World')
upload.end()

This book’s repository also contains an HTTP upload example that adopts this alternative pattern.

Lazy streams

Sometimes, we need to create a large number of streams at the same time, for example, to pass them to a function for further processing. A typical example is when using archiver (nodejsdp.link/archiver), a package for creating archives such as TAR and ZIP. The archiver package allows you to create an archive from a set of streams, representing the files to add. The problem is that if we want to pass a large number of streams, such as from files on the filesystem, we would likely get an EMFILE, too many open files error. This is because functions like createReadStream() from the fs module will actually open a file descriptor every time a new stream is created, even before you start to read from those streams.

In more generic terms, creating a stream instance might initialize expensive operations straight away (for example, open a file or a socket, initialize a connection to a database, and so on), even before we start to use such a stream. This might not be desirable if you are creating a large number of stream instances for later consumption.

In these cases, you might want to delay the expensive initialization until you need to consume data from the stream.

It is possible to achieve this by using a library like lazystream (nodejsdp.link/lazystream). This library allows you to create proxies for actual stream instances, where the creation of the stream instance is deferred until some piece of code starts to consume data from the proxy.

In the following example, lazystream allows us to create a lazy Readable stream for the special Unix file /dev/urandom:

import fs from 'node:fs'
import lazystream from 'lazystream'
const lazyURandom = new lazystream.Readable(function (options) {
  return fs.createReadStream('/dev/urandom')
})

The function we pass as a parameter to new lazystream.Readable() is effectively a factory function that generates the proxied stream when necessary.

Behind the scenes, lazystream is implemented using a PassThrough stream that, only when its _read() method is invoked for the first time, creates the proxied instance by invoking the factory function, and pipes the generated stream into the PassThrough itself. The code that consumes the stream is totally agnostic of the proxying that is happening here, and it will consume the data as if it was flowing directly from the PassThrough stream. lazystream implements a similar utility to create a lazy Writable stream as well.

Creating lazy Readable and Writable streams from scratch could be an interesting exercise that is left to you. If you get stuck, have a look at the source code of lazystream for inspiration on how to implement this pattern.

In the next section, we will discuss the .pipe() method in greater detail and also see other ways to connect different streams to form a processing pipeline.

Connecting streams using pipes

The concept of Unix pipes was invented by Douglas McIlroy. This enabled the output of a program to be connected to the input of the next. Take a look at the following command:

echo Hello World! | sed s/World/Node.js/g

In the preceding command, echo will write Hello World! to its standard output, which is then redirected to the standard input of the sed command (thanks to the pipe | operator). Then, sed replaces any occurrence of World with Node.js and prints the result to its standard output (which, this time, is the console).

In a similar way, Node.js streams can be connected using the pipe() method of the Readable stream object, which has the following interface:

readable.pipe(writable, [options])

We have already used the pipe() method in a few examples, but let’s finally dive into what it does for us under the hood.

Very intuitively, the pipe() method takes the data that is emitted from the readable stream and pumps it into the provided writable stream. Also, the writable stream is ended automatically when the readable stream emits an end event (unless we pass {end: false} in the options). The pipe() method returns the writable stream passed as the first argument, allowing us to create chained invocations if such a stream is also Readable (such as a Duplex or Transform stream).

Piping two streams together will create suction, which allows the data to flow automatically to the writable stream, so there is no need to call read() or write(), but most importantly, there is no need to control the backpressure anymore, because it’s automatically taken care of.

To provide a quick example, we can create a new module that takes a text stream from the standard input, applies the replace transformation discussed earlier when we built our custom ReplaceStream, and then pushes the data back to the standard output:

// replace.js
import { ReplaceStream } from './replace-stream.js'
process.stdin
  .pipe(new ReplaceStream(process.argv[2], process.argv[3]))
  .pipe(process.stdout)

The preceding program pipes the data that comes from the standard input into an instance of ReplaceStream and then back to the standard output. Now, to try this small application, we can leverage a Unix pipe to redirect some data into its standard input, as shown in the following example:

echo Hello World! | node replace.js World Node.js

This should produce the following output:

Hello Node.js!

This simple example demonstrates that streams (and in particular, text streams) are a universal interface and that pipes are the way to compose and interconnect all these interfaces almost magically.

Pipes and error handling

The pipe() method is very powerful, but there’s one important problem: error events are not propagated automatically through the pipeline when using pipe(). Take, for example, this code fragment:

stream1
  .pipe(stream2)
  .on('error', () => {})

In the preceding pipeline, we will catch only the errors coming from stream2, which is the stream that we attached the listener to. This means that, if we want to catch any error generated from stream1, we have to attach another error listener directly to it, which will make our example look like this:

stream1
  .on('error', () => {})
  .pipe(stream2)
  .on('error', () => {})

This is clearly not ideal, especially when dealing with pipelines with a significant number of steps. To make this matter worse, in the event of an error, the failing stream is only unpiped from the pipeline. The failing stream is not properly destroyed, which might leave dangling resources (for example, file descriptors, connections, and so on) and leak memory. A more robust (yet inelegant) implementation of the preceding snippet might look like this:

function handleError (err) {
  console.error(err)
  stream1.destroy()
  stream2.destroy()
}
stream1
  .on('error', handleError)
  .pipe(stream2)
  .on('error', handleError)

In this example, we registered a handler for the error event for both stream1 and stream2. When an error happens, our handleError() function is invoked, and we can log the error and destroy every stream in the pipeline. This allows us to ensure that all the allocated resources are properly released, and the error is handled gracefully.

Better error handling with pipeline()

Handling errors manually in a pipeline is not just cumbersome, but also error-prone—something we should avoid if we can!

Luckily, the core node:stream package offers us an excellent utility function that can make building pipelines a much safer and more enjoyable practice, which is the pipeline() helper function.

In a nutshell, you can use the pipeline() function as follows:

pipeline(stream1, stream2, stream3, ... , cb)

The last argument is an optional callback that will be called when the stream finishes. If it finishes because of an error, the callback will be invoked with the given error as the first argument.

If you prefer to avoid callbacks and rather use a Promise, there’s a Promise-based alternative in the node:stream/promises package:

pipeline(stream1, stream2, stream3, ...) // returns a promise

This alternative returns a Promise that will resolve when the pipeline completes or rejects in case of an error.

Both of these helpers pipe every stream passed in the arguments list to the next one. For each stream, they will also register the appropriate error and close listeners. This way, all the streams are properly destroyed when the pipeline completes successfully or when it’s interrupted by an error.

To get some practice with these helpers, let’s write a simple command-line script that implements the following pipeline:

  • Reads a Gzip data stream from the standard input
  • Decompresses the data
  • Makes all the text uppercase
  • Gzips the resulting data
  • Sends the data back to the standard output

// uppercasify-gzipped.js
import { createGzip, createGunzip } from 'node:zlib' // 1
import { Transform } from 'node:stream'
import { pipeline } from 'node:stream/promises'
const uppercasify = new Transform({ // 2
  transform(chunk, _enc, cb) {
    this.push(chunk.toString().toUpperCase())
    cb()
  },
})
await pipeline( // 3
  process.stdin,
  createGunzip(),
  uppercasify,
  createGzip(),
  process.stdout
)

In this example:

  1. We are importing the necessary dependencies from zlib, stream, and the stream/promises modules.
  2. We create a simple Transform stream that makes every chunk uppercase.
  3. We define our pipeline, where we list all the stream instances in order. Note that we use await to wait for the pipeline to complete. In this example, this is not mandatory because we don’t do anything after the pipeline is completed, but it’s a good practice to have this since we might decide to evolve our script in the future, or we might want to add a try catch around this expression to handle potential errors.

The pipeline will start automatically by consuming data from the standard input and producing data for the standard output.

We could test our script with the following command:

echo 'Hello World!' | gzip | node uppercasify-gzipped.js | gunzip

This should produce the following output:

HELLO WORLD!

If we try to remove the gzip step from the preceding sequence of commands, our script will fail with an uncaught error. This error is raised by the stream created with the createGunzip() function, which is responsible for decompressing the data. If the data is not actually gzipped, the decompression algorithm won’t be able to process the data and it will fail. In such a case, pipeline() will take care of cleaning up after the error and destroy all the streams in the pipeline.
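
For example, if we preferred to report the failure with a friendlier message instead of letting it surface as an uncaught error, we could wrap the pipeline in a try...catch block, along these lines:

// ...
try {
  await pipeline( // same pipeline as before
    process.stdin,
    createGunzip(),
    uppercasify,
    createGzip(),
    process.stdout
  )
} catch (err) {
  console.error('Pipeline failed:', err.message)
  process.exit(1)
}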

Now that we have built a solid understanding of Node.js streams, we are ready to move into some more involved stream patterns like control flow and advanced piping patterns.

Asynchronous control flow patterns with streams

Going through the examples that we have presented so far, it should be clear that streams can be useful not only to handle I/O, but also as an elegant programming pattern that can be used to process any kind of data. But the advantages do not end there: streams can also be leveraged to turn “asynchronous control flow” into “flow control,” as we will see in this section.

Sequential execution

By default, streams will handle data in sequence. For example, the _transform() function of a Transform stream will never be invoked with the next chunk of data until the previous invocation completes by calling callback(). This is an important property of streams, crucial for processing each chunk in the right order, but it can also be exploited to turn streams into an elegant alternative to the traditional control flow patterns.

Let’s look at some code to clarify what we mean. We will be working on an example to demonstrate how we can use streams to execute asynchronous tasks in sequence. Let’s create a function that concatenates a set of files received as input, making sure to honor the order in which they are provided. Let’s create a new module called concat-files.js and define its contents as follows:

import { createReadStream, createWriteStream } from 'node:fs'
import { Readable, Transform } from 'node:stream'
export function concatFiles(dest, files) {
  return new Promise((resolve, reject) => {
    const destStream = createWriteStream(dest)
    Readable.from(files) // 1
      .pipe(
        new Transform({ // 2
          objectMode: true,
          transform(filename, _enc, done) {
            const src = createReadStream(filename)
            src.pipe(destStream, { end: false })
            // same as ((err) => done(err))
            // propagates the error
            src.on('error', done)
            // same as (() => done())
            // propagates correct completion
            src.on('end', done) // 3
          },
        })
      )
      .on('error', err => {
        destStream.end()
        reject(err)
      })
      .on('finish', () => { // 4
        destStream.end()
        resolve()
      })
  })
}

The preceding function implements a sequential iteration over the files array by transforming it into a stream. The algorithm can be explained as follows:

  1. First, we use Readable.from() to create a Readable stream from the files array. This stream operates in object mode (the default setting for streams created with Readable.from()) and it will emit filenames: every chunk is a string indicating the path to a file. The order of the chunks respects the order of the files in the files array.
  2. Next, we create a custom Transform stream to handle each file in the sequence. Since we are receiving strings, we set the option objectMode to true. In our transformation logic, for each file we create a Readable stream to read the file content and pipe it into destStream (a Writable stream for the destination file). We make sure not to close destStream after the source file finishes reading by specifying { end: false } in the pipe() options.
  3. When all the contents of the source file have been piped into destStream, we invoke the done function to communicate the completion of the current processing, which is necessary to trigger the processing of the next file.
  4. When all the files have been processed, the finish event is fired; we can finally end destStream and resolve the Promise returned by concatFiles(), which signals the completion of the whole operation.

We can now try to use the little module we just created:

// concat.js
import { concatFiles } from './concat-files.js'
try {
  await concatFiles(process.argv[2], process.argv.slice(3))
} catch (err) {
  console.error(err)
  process.exit(1)
}
console.log('All files concatenated successfully')

We can now run the preceding program by passing the destination file as the first command-line argument, followed by a list of files to concatenate; for example:

node concat.js all-together.txt file1.txt file2.txt

This should create a new file called all-together.txt containing, in order, the contents of file1.txt and file2.txt.

With the concatFiles() function, we were able to obtain an asynchronous sequential iteration using only streams. This is an elegant and compact solution that enriches our toolbelt, along with the techniques we already explored in Chapter 4, Asynchronous Control Flow Patterns with Callbacks, and Chapter 5, Asynchronous Control Flow Patterns with Promises and Async/Await.

Pattern

Use a stream, or combination of streams, to easily iterate over a set of asynchronous tasks in sequence.

In the next section, we will discover how to use Node.js streams to implement unordered concurrent task execution.

Unordered concurrent execution

We just saw that streams process data chunks in sequence, but sometimes, this can be a bottleneck as we would not make the most of the concurrency of Node.js. If we have to execute a slow asynchronous operation for every data chunk, it can be advantageous to make the execution concurrent and speed up the overall process. Of course, this pattern can only be applied if there is no relationship between each chunk of data, which might happen frequently for object streams, but very rarely for binary streams.

Caution

Unordered concurrent streams cannot be used when the order in which the data is processed is important.

To make the execution of a Transform stream concurrent, we can apply the same patterns that we learned about in Chapter 4, Asynchronous Control Flow Patterns with Callbacks, but with some adaptations to get them working with streams. Let’s see how this works.

Implementing an unordered concurrent stream

Let’s immediately demonstrate how to implement an unordered concurrent stream with an example. Let’s create a module called concurrent-stream.js and define a generic Transform stream that executes a given transform function concurrently:

import { Transform } from 'node:stream'
export class ConcurrentStream extends Transform {
  constructor(userTransform, opts) { // 1
    super({ objectMode: true, ...opts })
    this.userTransform = userTransform
    this.running = 0
    this.terminateCb = null
  }
  _transform(chunk, enc, done) { // 2
    this.running++
    this.userTransform(
      chunk,
      enc,
      this.push.bind(this),
      this._onComplete.bind(this)
    )
    done()
  }
  _flush(done) { // 3
    if (this.running > 0) {
      this.terminateCb = done
    } else {
      done()
    }
  }
  _onComplete(err) { // 4
    this.running--
    if (err) {
      return this.emit('error', err)
    }
    if (this.running === 0) {
      this.terminateCb?.()
    }
  }
}

Let’s analyze this new class step by step:

  1. As you can see, the constructor accepts a userTransform() function, which is then saved as an instance variable. This function will implement the transformation logic that should be executed for every object flowing through the stream. In this constructor, we invoke the parent constructor to initialize the internal state of the stream, and we enable the object mode by default.
  2. Next comes the _transform() method. In this method, we increment the count of running tasks and then execute the userTransform() function. Finally, we notify the Transform stream that the current transformation step is complete by invoking done(). The trick for triggering the processing of another item concurrently is exactly this: we are not waiting for the userTransform() function to complete before invoking done(); instead, we do it immediately. On the other hand, we provide a special callback to userTransform(), which is the this._onComplete() method. This allows us to get notified when the execution of userTransform() completes.
  3. The _flush() method is invoked just before the stream terminates, so if there are still tasks running, we can put the release of the finish event on hold by not invoking the done() callback immediately. Instead, we assign it to the this.terminateCb variable.
  4. To understand how the stream is then properly terminated, we have to look into the _onComplete() method. This last method is invoked every time an asynchronous task completes. It checks whether there are any more tasks running and, if there are none, it invokes the this.terminateCb() function, which will cause the stream to end, releasing the finish event that was put on hold in the _flush() method. Note that _onComplete() is a method that we introduced for convenience as part of the implementation of our ConcurrentStream; it is not a method we are overriding from the base Transform stream class.

The ConcurrentStream class we just built allows us to easily create a Transform stream that executes its tasks concurrently, but there is a caveat: it does not preserve the order of the items as they are received. In fact, while it starts every task in order, asynchronous operations can complete and push data at any time, regardless of when they are started. This property does not play well with binary streams where the order of data usually matters, but it can surely be useful with some types of object streams.

Implementing a URL status monitoring application

Now, let’s apply our ConcurrentStream to a concrete example. Let’s imagine that we want to build a simple service to monitor the status of a big list of URLs. Let’s imagine all these URLs are contained in a single file and are newline-separated.

Streams can offer a very efficient and elegant solution to this problem, especially if we use our ConcurrentStream class to check the URLs in a concurrent fashion.

// check-urls.js
import { createInterface } from 'node:readline'
import { createReadStream, createWriteStream } from 'node:fs'
import { pipeline } from 'node:stream/promises'
import { ConcurrentStream } from './concurrent-stream.js'
const inputFile = createReadStream(process.argv[2]) // 1
const fileLines = createInterface({ // 2
  input: inputFile,
})
const checkUrls = new ConcurrentStream( // 3
  async (url, _enc, push, done) => {
    if (!url) {
      return done()
    }
    try {
      await fetch(url, {
        method: 'HEAD',
        timeout: 5000,
        signal: AbortSignal.timeout(5000),
      })
      push(`${url} is up\n`)
    } catch (err) {
      push(`${url} is down: ${err}\n`)
    }
    done()
  }
)
const outputFile = createWriteStream('results.txt') // 4
await pipeline(fileLines, checkUrls, outputFile) // 5
console.log('All urls have been checked')

As we can see, with streams, our code looks very elegant and straightforward: we initialize the various components of our streaming pipeline and then we combine them together. But let’s discuss some important details:

  1. First, we create a Readable stream from the file given as input.
  2. We leverage the createInterface() function from the node:readline module to create a stream that wraps the input stream and provides the content of the original file line by line. This is a convenient helper that is very flexible and allows us to read lines from various sources.
  3. At this point, we create our ConcurrentStream instance. In our custom transformation logic, we expect to receive one URL at a time. If the URL is empty (e.g., if there’s an empty line in the source file), we just ignore the current entry. Otherwise, we make a HEAD request to the given URL with a timeout of 5 seconds. If the request is successful, the stream emits a string that describes the positive outcome; otherwise, it emits a string that describes an error. Either way, we call the done() callback, which tells the ConcurrentStream that we have completed processing the current task. Note that, since we are handling failure gracefully, the stream can continue processing tasks even if one of them fails. Also, note that we are using both timeout and an AbortSignal because AbortSignal ensures that the request will fail if it takes longer than 5 seconds, regardless of whether data is actively being transferred. Some bot prevention tools deliberately keep connections open by sending responses at very slow rates, effectively causing bots to hang indefinitely. By implementing this mechanism, we ensure that requests are treated as failed if they exceed 5 seconds for any reason.
  4. The last stream that we need to create is our output stream: a file called results.txt.
  5. Finally, we have all the pieces together! We just need to combine the streams into a pipeline to let the data flow between them. And, once the pipeline completes, we print a success message.

Now, we can run the check-urls.js module with a command such as this:

node check-urls.js urls.txt

Here, the file urls.txt contains a list of URLs (one per line); for example:

https://mario.fyi
https://loige.co
http://thiswillbedownforsure.com

When the command finishes running, we will see that a file, results.txt, was created. This contains the results of the operation; for example:

http://thiswillbedownforsure.com is down
https://mario.fyi is up
https://loige.co is up

There is a good probability that the order in which the results are written is different from the order in which the URLs were specified in the input file. This is clear evidence that our stream executes its tasks concurrently, and it does not enforce any order between the various data chunks in the stream.

For the sake of curiosity, we might want to try replacing ConcurrentStream with a normal Transform stream and compare the behavior and performance of the two (you might want to do this as an exercise). Using Transform directly is way slower, because each URL is checked in sequence, but on the other hand, the order of the results in the file results.txt is preserved.
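
If you want to try that comparison, a sequential version of checkUrls could be sketched as follows (an illustrative variant, not code from the book’s repository); since the callback is invoked only after the fetch() settles, the next URL is not processed until the current check completes:

import { Transform } from 'node:stream'
const checkUrlsSequential = new Transform({
  objectMode: true,
  transform(url, _enc, cb) {
    if (!url) {
      return cb()
    }
    fetch(url, { method: 'HEAD', signal: AbortSignal.timeout(5000) })
      .then(() => this.push(`${url} is up\n`))
      .catch(err => this.push(`${url} is down: ${err}\n`))
      .then(() => cb()) // invoked only when the current check settles
  },
})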

In the next section, we will see how to extend this pattern to limit the number of concurrent tasks running at a given time.

Unordered limited concurrent execution

If we try to run the check-urls.js application against a file that contains thousands or millions of URLs, we will surely run into issues. Our application will create an uncontrolled number of connections all at once, sending a considerable amount of data concurrently, and potentially undermining the stability of the application and the availability of the entire system. As we already know, the solution to keep the load and resource usage under control is to limit the number of concurrent tasks running at any given time.

Let’s see how this works with streams by creating a limited-concurrent-stream.js module, which is an adaptation of concurrent-stream.js, which we created in the previous section.

Let’s see what it looks like, starting from its constructor (we will highlight the changed parts):

export class LimitedConcurrentStream extends Transform {
  constructor (concurrency, userTransform, opts) {
    super({ ...opts, objectMode: true })
    this.concurrency = concurrency
    this.userTransform = userTransform
    this.running = 0
    this.continueCb = null
    this.terminateCb = null
  }
// ...

We need a concurrency limit to be taken as input, and this time, we are going to save two callbacks, one for any pending _transform method (continueCb—more on this next) and another one for the callback of the _flush method (terminateCb).

Next is the _transform() method:

  _transform (chunk, enc, done) {
    this.running++
    this.userTransform(
      chunk,
      enc,
      this.push.bind(this),
      this._onComplete.bind(this)
    )
    if (this.running < this.concurrency) {
      done()
    } else {
      this.continueCb = done
    }
  }

This time, in the _transform() method, we must check whether we have any free execution slots before we can invoke done() and trigger the processing of the next item. If we have already reached the maximum number of concurrently running tasks, we save the done() callback in the continueCb variable so that it can be invoked as soon as a task finishes.

The _flush() method remains exactly the same as in the ConcurrentStream class, so let’s move directly to implementing the _onComplete() method:

  _onComplete (err) {
    this.running--
    if (err) {
      return this.emit('error', err)
    }
    const tmpCb = this.continueCb
    this.continueCb = null
    tmpCb?.()
    if (this.running === 0) {
      this.terminateCb?.()
    }
  }

Every time a task completes, we invoke any saved continueCb() that will cause the stream to unblock, triggering the processing of the next item.

That’s it for the LimitedConcurrentStream class. We can now use it in the check-urls.js module in place of ConcurrentStream and have the concurrency of our tasks limited to the value that we set (check the code in the book’s repository for a complete example).
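
For example, the checkUrls stream in check-urls.js could be replaced with something like the following sketch, here with a hypothetical concurrency limit of 4 (the complete version lives in the book’s repository):

// ...
import { LimitedConcurrentStream } from './limited-concurrent-stream.js'
const checkUrls = new LimitedConcurrentStream(
  4, // at most 4 URLs are checked at the same time
  async (url, _enc, push, done) => {
    if (!url) {
      return done()
    }
    try {
      await fetch(url, { method: 'HEAD', signal: AbortSignal.timeout(5000) })
      push(`${url} is up\n`)
    } catch (err) {
      push(`${url} is down: ${err}\n`)
    }
    done()
  }
)
// the rest of the pipeline stays the same:
// await pipeline(fileLines, checkUrls, outputFile)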

Ordered concurrent execution

The concurrent streams that we created previously may shuffle the order of the emitted data, but there are situations where this is not acceptable. Sometimes, in fact, it is necessary to have each chunk emitted in the same order in which it was received. However, not all hope is lost. We can still run the transform function concurrently; all we must do is sort the data emitted by each task so that it follows the same order in which the data was received. It’s important here to clearly distinguish between the internal processing logic applied to each received chunk, which can safely occur concurrently and therefore in any arbitrary order, and how the processed data is ultimately emitted by the transform stream, which might need to preserve the original order of chunks.

The technique we are going to use involves the use of a buffer to reorder the chunks while they are emitted by each running task. For brevity, we are not going to provide an implementation of such a stream, as it’s quite verbose for the scope of this book. What we are going to do instead is reuse one of the available packages on npm built for this specific purpose, that is, parallel-transform (nodejsdp.link/parallel-transform).

We can quickly check the behavior of an ordered concurrent execution by modifying our existing check-urls module. Let’s say that we want our results to be written in the same order as the URLs in the input file, while executing our checks concurrently. We can do this using parallel-transform:

//...
import parallelTransform from 'parallel-transform' // v1.2.0
const inputFile = createReadStream(process.argv[2])
const fileLines = createInterface({
  input: inputFile,
})
const checkUrls = parallelTransform(8, async function (url, done) {
  if (!url) {
    return done()
  }
  try {
    await fetch(url, { method: 'HEAD', signal: AbortSignal.timeout(5 * 1000) })
    this.push(`${url} is up\n`)
  } catch (err) {
    this.push(`${url} is down: ${err}\n`)
  }
  done()
})
const outputFile = createWriteStream('results.txt')
await pipeline(fileLines, checkUrls, outputFile)
console.log('All urls have been checked')

In the example here, parallelTransform() creates a Transform stream in object mode that executes our transformation logic with a maximum concurrency of 8. If we try to run this new version of check-urls.js, we will now see that the results.txt file lists the results in the same order as the URLs appear in the input file. It is important to note that, even though the order of the output is the same as the input, the asynchronous tasks still run concurrently and may complete in any order.

When using the ordered concurrent execution pattern, we need to be aware of slow items either blocking the pipeline or causing memory to grow indefinitely. In fact, if there is an item that requires a very long time to complete, depending on the implementation of the pattern, it will either cause the buffer containing the pending ordered results to grow indefinitely or the entire processing to block until the slow item completes. With the first strategy, we are optimizing for performance, while with the second, we get predictable memory usage. The parallel-transform implementation opts for predictable memory utilization and maintains an internal buffer that will not grow more than the specified maximum concurrency.

With this, we conclude our analysis of the asynchronous control flow patterns with streams. Next, we are going to focus on some piping patterns.

Piping patterns

As in real-life plumbing, Node.js streams can also be piped together by following different patterns. We can, in fact, merge the flow of two different streams into one, split the flow of one stream into two or more pipes, or redirect the flow based on a condition. In this section, we are going to explore the most important plumbing patterns that can be applied to Node.js streams.

Combining streams

In this chapter, we have stressed the fact that streams provide a simple infrastructure to modularize and reuse our code, but there is one last piece missing from this puzzle: what if we want to modularize and reuse an entire pipeline? What if we want to combine multiple streams so that they look like one from the outside? The following figure shows what this means:

Figure 6.6: Combining streams

From Figure 6.6, we should already get a hint of how this works:

  • When we write into the combined stream, we are writing into the first stream of the pipeline.
  • When we read from the combined stream, we are reading from the last stream of the pipeline.

A combined stream is usually a Duplex stream, which is built by connecting the first stream to its Writable side and the last stream to its Readable side.

To create a Duplex stream out of two different streams, one Writable and one Readable, we can use an npm module such as duplexer3 (nodejsdp.link/duplexer3) or duplexify (nodejsdp.link/duplexify).

But that’s not enough. In fact, another important characteristic of a combined stream is that it must capture and propagate all the errors that are emitted from any stream inside the pipeline. As we already mentioned, any error event is not automatically propagated down the pipeline when we use pipe(), and we should explicitly attach an error listener to each stream. We saw that we could use the pipeline() helper function to overcome the limitations of pipe() with error management, but the issue with both pipe() and the pipeline() helper is that the two functions return only the last stream of the pipeline, so we only get the (last) Readable component and not the (first) Writable component.

We can verify this very easily with the following snippet of code:

import { createReadStream, createWriteStream } from 'node:fs'
import { Transform, pipeline } from 'node:stream'
import assert from 'node:assert/strict'
const streamA = createReadStream('package.json')
const streamB = new Transform({
  transform(chunk, _enc, done) {
    this.push(chunk.toString().toUpperCase())
    done()
  },
})
const streamC = createWriteStream('package-uppercase.json')
const pipelineReturn = pipeline(streamA, streamB, streamC, () => {
  // handle errors here
})
assert.equal(streamC, pipelineReturn) // valid
const pipeReturn = streamA.pipe(streamB).pipe(streamC)
assert.equal(streamC, pipeReturn) // valid

From the preceding code, it should be clear that with just pipe() or pipeline(), it would not be trivial to construct a combined stream.

To recap, a combined stream has two major advantages:

  • We can redistribute it as a black box by hiding its internal pipeline.
  • We have simplified error management, as we don’t have to attach an error listener to each stream in the pipeline, but just to the combined stream itself.

Combining streams is common in Node.js, and node:stream exposes compose() to make it clean. It merges two or more streams into a single Duplex: writes you perform on the composite enter the first stream in the chain, reads come from the last. Backpressure is preserved end to end, and if any inner stream errors, the composite emits error and the whole chain is destroyed.

import { compose } from 'node:stream'
// ... define streamA, streamB, streamC
const combinedStream = compose(streamA, streamB, streamC)

When we do something like this, compose will create a pipeline out of our streams, return a new combined stream that abstracts away the complexity of our pipeline, and provide the advantages discussed previously.

Unlike .pipe() or pipeline(), compose() is lazy: it just builds the chain and does not start any data flow, so you still need to pipe the returned Duplex to a source and/or destination to move data. Use it when you want to package a reusable processing pipeline as one stream; use pipeline() when you want to wire a source to a destination and wait for completion.

Implementing a combined stream

To illustrate a simple example of combining streams, let’s consider the case of the following two Transform streams:

  • One that both compresses and encrypts the data
  • One that both decrypts and decompresses the data

Using compose, we can easily build these streams (in a file called combined-streams.js) by combining some of the streams that we already have available from the core libraries:

import { createGzip, createGunzip } from 'node:zlib'
import { createCipheriv, createDecipheriv, scryptSync } from 'node:crypto'
import { compose } from 'node:stream'
function createKey(password) {
  return scryptSync(password, 'salt', 24)
}
export function createCompressAndEncrypt(password, iv) {
  const key = createKey(password)
  const combinedStream = compose(
    createGzip(),
    createCipheriv('aes192', key, iv)
  )
  combinedStream.iv = iv
  return combinedStream
}
export function createDecryptAndDecompress(password, iv) {
  const key = createKey(password)
  return compose(createDecipheriv('aes192', key, iv), createGunzip())
}

We can now use these combined streams as if they were black boxes, for example, to create a small application that archives a file by compressing and encrypting it. Let’s do that in a new module named archive.js:

import { createReadStream, createWriteStream } from 'node:fs'
import { pipeline } from 'node:stream'
import { randomBytes } from 'node:crypto'
import { createCompressAndEncrypt } from './combined-streams.js'
const [, , password, source] = process.argv
const iv = randomBytes(16)
const destination = `${source}.gz.enc`
pipeline(
  createReadStream(source),
  createCompressAndEncrypt(password, iv),
  createWriteStream(destination),
  err => {
    if (err) {
      console.error(err)
      process.exit(1)
    }
    console.log(`${destination} created with iv: ${iv.toString('hex')}`)
  }
)

Note how we don’t have to worry about how many steps there are within createCompressAndEncrypt(). In fact, we just treat it as a single stream within the pipeline of archive.js. This makes our combined stream easily reusable in other contexts.

Now, to run the archive module, simply specify a password and a file in the command-line arguments:

node archive.js mypassword /path/to/a/file.txt

This command will create a file called /path/to/a/file.txt.gz.enc, and it will print the generated initialization vector to the console.

Now, as an exercise, you could use the createDecryptAndDecompress() function to create a similar script that takes a password, an initialization vector, and an archived file, and unarchives it. Don’t worry if you get stuck: you will find a solution implemented in this book’s code repository in the file unarchive.js.

In real-life applications, it is generally preferable to include the initialization vector as part of the encrypted data, rather than requiring the user to pass it around. One way to implement this is by having the first 16 bytes emitted by the archive stream represent the initialization vector. The unarchive utility would need to be updated accordingly to consume the first 16 bytes before starting to process the data in a streaming fashion. This approach would add some additional complexity, which is outside the scope of this example; therefore, we opted for a simpler solution. Once you feel comfortable with streams, we encourage you to try to implement, as an exercise, a solution where the initialization vector doesn’t have to be passed around by the user.

With this example, we have clearly demonstrated how important it is to combine streams. On one side, it allows us to create reusable compositions of streams, and on the other, it simplifies the error management of a pipeline.

Forking streams

We can perform a fork of a stream by piping a single Readable stream into multiple Writable streams. This is useful when we want to send the same data to different destinations; for example, two different sockets or two different files. It can also be used when we want to perform different transformations on the same data, or when we want to split the data based on some criteria. If you are familiar with the Unix command tee (nodejsdp.link/tee), this is exactly the same concept applied to Node.js streams!

Figure 6.7 gives us a graphical representation of this pattern:

Figure 6.7: Forking a stream

Forking a stream in Node.js is quite easy, but there are a few caveats to keep in mind. Let’s start by discussing this pattern with an example. It will be easier to appreciate the caveats of this pattern once we have an example at hand.

Implementing a multiple checksum generator

Let’s create a small utility that outputs both the sha1 and md5 hashes of a given file. Let’s call this new module generate-hashes.js:

import { createReadStream, createWriteStream } from 'node:fs'
import { createHash } from 'node:crypto'
const filename = process.argv[2]
const sha1Stream = createHash('sha1').setEncoding('hex')
const md5Stream = createHash('md5').setEncoding('hex')
const inputStream = createReadStream(filename)
inputStream.pipe(sha1Stream).pipe(createWriteStream(`${filename}.sha1`))
inputStream.pipe(md5Stream).pipe(createWriteStream(`${filename}.md5`))

Very simple, right? The inputStream variable is piped into sha1Stream on one side and md5Stream on the other. There are a few things to note that happen behind the scenes:

  • Both md5Stream and sha1Stream will be ended automatically when inputStream ends, unless we specify { end: false } as an option when invoking pipe().
  • The two forks of the stream will receive a reference to the same data chunks, so we must be very careful when performing side-effect operations on the data, as that would affect every stream that we are sending data to.
  • Backpressure will work out of the box; the flow coming from inputStream will go as fast as the slowest branch of the fork. In other words, if one destination pauses the source stream to handle backpressure for a long time, all the other destinations will be waiting as well. Also, one destination blocking indefinitely will block the entire pipeline!
  • If we pipe to an additional stream after we’ve started consuming the data at source (async piping), the new stream will only receive new chunks of data. In those cases, we can use a PassThrough instance as a placeholder to collect all the data from the moment we start consuming the stream. Then, the PassThrough stream can be read at any future time without the risk of losing any data. Just be aware that this approach might generate backpressure and block the entire pipeline, as discussed in the previous point. A minimal sketch of this technique is shown right after this list.
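
Here is a minimal sketch of the PassThrough placeholder technique, with made-up file names and an arbitrary delay to simulate a consumer that attaches late:

import { createReadStream, createWriteStream } from 'node:fs'
import { PassThrough } from 'node:stream'
const inputStream = createReadStream('data.bin')
// Attach the placeholder immediately, so it collects every chunk from now on.
const placeholder = new PassThrough()
inputStream.pipe(placeholder)
// Start consuming the source right away through another branch.
inputStream.pipe(createWriteStream('copy-now.bin'))
// A late consumer can still read all the data that flowed through the
// placeholder. Until it attaches, the placeholder's internal buffer fills up
// and backpressure may pause the whole fork, as discussed previously.
setTimeout(() => {
  placeholder.pipe(createWriteStream('copy-later.bin'))
}, 1000)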

Merging streams

Merging is the opposite operation to forking and involves piping a set of Readable streams into a single Writable stream, as shown in Figure 6.8:

Figure 6.8: Merging streams

Merging multiple streams into one is, in general, a simple operation; however, we have to pay attention to the way we handle the end event, as piping with the default options (which include { end: true }) causes the destination stream to end as soon as one of the sources ends. This can often lead to an error, as the other active sources continue to write to an already terminated stream.

The solution to this problem is to use the option { end: false } when piping multiple sources to a single destination and then invoke end() on the destination only when all the sources have completed reading.

Merging text files

To make a simple example, let’s implement a small program that takes an output path and an arbitrary number of text files, and then merges the lines of every file into the destination file. Our new module is going to be called merge-lines.js. Let’s define its contents, starting from some initialization steps:

import { createReadStream, createWriteStream } from 'node:fs'
import { Readable, Transform } from 'node:stream'
import { createInterface } from 'node:readline'
const [, , dest, ...sources] = process.argv

In the preceding code, we are just loading all the dependencies and initializing the variables that contain the name of the destination (dest) file and all the source files (sources).

Next, we will create the destination stream:

const destStream = createWriteStream(dest)

Now, it’s time to initialize the source streams:

let endCount = 0
for (const source of sources) {
  const sourceStream = createReadStream(source, { highWaterMark: 16 })
  const linesStream = Readable.from(createInterface({ input: sourceStream }))
  const addLineEnd = new Transform({
    transform(chunk, _encoding, cb) {
      cb(null, `${chunk}\n`)
    },
  })
  sourceStream.on('end', () => {
    if (++endCount === sources.length) {
      destStream.end()
      console.log(`${dest} created`)
    }
  })
  linesStream
    .pipe(addLineEnd)
    .pipe(destStream, { end: false })
}

In this code, we initialize a source stream for each file in the sources array. Each source is read using createReadStream().

The createInterface() function from the node:readline module is used to process each source file line by line, producing a linesStream that emits individual lines of the source file.

To ensure each emitted line ends with a newline character, we use a simple Transform stream (addLineEnd). This transform appends \n to each chunk of data.

We also attach an end event listener to each source stream. This listener increments a counter (endCount) each time a source stream finishes. When all source streams have been read, it ensures the destination stream (destStream) is closed, signaling the completion of the streaming pipeline.

Finally, each linesStream is piped through the addLineEnd transform and into the destination stream. During this last step, we use the { end: false } option to keep the destination stream open even when one of the sources ends. The destination stream is only closed when all source streams have finished, ensuring no data is lost during the merge. This last step is where the merge happens, because we are effectively piping multiple independent streams into the same destination stream.

We can now execute this code with the following command:

node merge-lines.js <destination> <source1> <source2> <source3> ...

If you run this code with large enough files, you will notice that the destination file will contain lines that are randomly intermingled from all the source files (a low highWaterMark of 16 bytes makes this property even more apparent). This kind of behavior can be acceptable in some types of object streams and some text streams split by line (as in our current example), but it is often undesirable when dealing with most binary streams.

There is one variation of the pattern that allows us to merge streams in order; it consists of consuming the source streams one after the other. When the previous one ends, the next one starts emitting chunks (it is like concatenating the output of all the sources). As always, on npm, we can find some packages that also deal with this situation. One of them is multistream (https://npmjs.org/package/multistream).
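
For reference, here is a minimal sketch of this sequential variant implemented by hand, without external packages: each source is fully consumed before the next one starts, and the destination is ended manually only at the very end:

import { once } from 'node:events'
import { createReadStream, createWriteStream } from 'node:fs'
const [, , dest, ...sources] = process.argv
const destStream = createWriteStream(dest)
for (const source of sources) {
  const sourceStream = createReadStream(source)
  // Keep the destination open across sources; we call end() manually below.
  sourceStream.pipe(destStream, { end: false })
  // Wait for the current source to finish before starting the next one.
  await once(sourceStream, 'end')
}
destStream.end()
console.log(`${dest} created`)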

Multiplexing and demultiplexing

There is a particular variation of the merge stream pattern in which we don’t really want to just join multiple streams together, but instead, use a shared channel to deliver the data of a set of streams. This is a conceptually different operation because the source streams remain logically separated inside the shared channel, which allows us to split the stream again once the data reaches the other end of the shared channel. Figure 6.9 clarifies this situation:

Figure 6.9: Multiplexing and demultiplexing streams

The operation of combining multiple streams (in this case, also known as channels) to allow transmission over a single stream is called multiplexing, while the opposite operation, namely reconstructing the original streams from the data received from a shared stream, is called demultiplexing. The devices that perform these operations are called multiplexer (or mux) and demultiplexer (or demux), respectively. This is a widely studied area in computer science and telecommunications in general, as it is one of the foundations of almost any type of communication media, such as telephony, radio, TV, and, of course, the Internet itself. For the scope of this book, we will not go too far with the explanations, as this is a vast topic.

What we want to demonstrate in this section is how it’s possible to use a shared Node.js stream to transmit multiple logically separated streams that are then separated again at the other end of the shared stream.

Building a remote logger

Let’s use an example to drive our discussion. We want a small program that starts a child process and redirects both its standard output and standard error to a remote server, which, in turn, saves the two streams in two separate files. So, in this case, the shared medium is a TCP connection, while the two channels to be multiplexed are the stdout and stderr of a child process. We will leverage a technique called packet switching, the same technique that is used by protocols such as IP, TCP, and UDP. Packet switching involves wrapping the data into packets, allowing us to specify various meta information that’s useful for multiplexing, routing, controlling the flow, checking for corrupted data, and so on. The protocol that we are implementing in our example is very minimalist. We wrap our data into simple packets, as illustrated in Figure 6.10:

Figure 6.10: Byte structure of the data packet for our remote logger

As shown in Figure 6.10, the packet contains the actual data, but also a header (Channel ID + Data length), which will make it possible to differentiate the data of each stream and enable the demultiplexer to route the packet to the right channel.

Client side – multiplexing

Let’s start to build our application from the client side. With a lot of creativity, we will call the module client.js. This represents the part of the application that is responsible for starting a child process and multiplexing its streams.

So, let’s start by defining the module. First, we need some dependencies:

import { fork } from 'node:child_process'
import { connect } from 'node:net'

Now, let’s implement a function that performs the multiplexing of a list of sources:

function multiplexChannels(sources, destination) {
  let openChannels = sources.length
  for (let i = 0; i < sources.length; i++) {
    sources[i]
      .on('readable', function () { // 1
        let chunk
        while ((chunk = this.read()) !== null) {
          const outBuff = Buffer.alloc(1 + 4 + chunk.length) // 2
          outBuff.writeUInt8(i, 0)
          outBuff.writeUInt32BE(chunk.length, 1)
          chunk.copy(outBuff, 5)
          console.log(`Sending packet to channel: ${i}`)
          destination.write(outBuff) // 3
        }
      })
      .on('end', () => { // 4
        if (--openChannels === 0) {
          destination.end()
        }
      })
  }
}

The multiplexChannels() function accepts the source streams to be multiplexed and the destination channel as input, and then it performs the following steps:

  1. For each source stream, it registers a listener for the readable event, where we read the data from the stream using the non-flowing mode (the use of the non-flowing mode will give us more flexibility on reading a specific number of bytes, as we get to write the demultiplexing code).
  2. When a chunk is read, we wrap it into a packet called outBuff that contains, in order, 1 byte (UInt8) for the channel ID (offset 0), 4 bytes (UInt32BE) for the packet size (offset 1), and then the actual data (offset 5).
  3. When the packet is ready, we write it into the destination stream.
  4. Finally, we register a listener for the end event so that we can terminate the destination stream when all the source streams have ended.

Our protocol is capable of multiplexing up to 256 different source streams because we have 1 byte to identify the channel. This is probably enough for most use cases, but if you need more, you can use more bytes to identify the channel.
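
For example, a hypothetical variation that uses 2 bytes (UInt16BE) for the channel ID would raise the limit to 65,536 channels. Only the packet-building part changes (the demultiplexer would have to read 2 bytes for the channel ID accordingly):

// 2 bytes for the channel ID, 4 bytes for the length, then the data
const outBuff = Buffer.alloc(2 + 4 + chunk.length)
outBuff.writeUInt16BE(i, 0)
outBuff.writeUInt32BE(chunk.length, 2)
chunk.copy(outBuff, 6)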

Now, the last part of our client becomes very easy:

const socket = connect(3000, () => { // 1
  const child = fork( // 2
    process.argv[2],
    process.argv.slice(3),
    { silent: true }
  )
  multiplexChannels([child.stdout, child.stderr], socket) // 3
})

In this last code fragment, we perform the following operations:

  1. We create a new TCP client connection to the address localhost:3000.
  2. We start the child process by using the first command-line argument as the path, while we provide the rest of the process.argv array as arguments for the child process. We specify the option {silent: true} so that the child process does not inherit stdout and stderr of the parent.
  3. Finally, we take stdout and stderr of the child process and we multiplex them into the socket’s Writable stream using the multiplexChannels() function.

Server side – demultiplexing

Now, we can take care of creating the server side of the application (server.js), where we demultiplex the streams from the remote connection and pipe them into two different files.

Let’s start by creating a function called demultiplexChannel():

import { createWriteStream } from 'node:fs'
import { createServer } from 'node:net'
function demultiplexChannel(source, destinations) {
  let currentChannel = null
  let currentLength = null
  source
    .on('readable', () => { // 1
      let chunk
      if (currentChannel === null) { // 2
        chunk = source.read(1)
        currentChannel = chunk?.readUInt8(0) ?? null
      }
      if (currentLength === null) { // 3
        chunk = source.read(4)
        currentLength = chunk?.readUInt32BE(0) ?? null
        if (currentLength === null) {
          return null
        }
      }
      chunk = source.read(currentLength) // 4
      if (chunk === null) {
        return null
      }
      console.log(`Received packet from: ${currentChannel}`)
      destinations[currentChannel].write(chunk) // 5
      currentChannel = null
      currentLength = null
    })
    .on('end', () => { // 6
      for (const destination of destinations) {
        destination.end()
      }
      console.log('Source channel closed')
    })
}

The preceding code might look complicated, but it is not. Thanks to the features of Node.js Readable streams, we can easily implement the demultiplexing of our little protocol as follows:

  1. We start reading from the stream using the non-flowing mode (as you can see, now we can easily read as many bytes as we need for every part of the received message).
  2. First, if we have not read the channel ID yet, we try to read 1 byte from the stream and then transform it into a number.
  3. The next step is to read the length of the data. We need 4 bytes for that, so it’s possible (even if unlikely) that we don’t have enough data in the internal buffer, which will cause the source.read() invocation to return null. In such a case, we simply interrupt the parsing and retry at the next readable event.
  4. When we can finally also read the data size, we know how much data to pull from the internal buffer, so we try to read it all. Again, if this operation returns null, we don’t yet have all the data in the buffer, so we return null and retry on the next readable event.
  5. When we read all the data, we can write it to the right destination channel, making sure that we reset the currentChannel and currentLength variables (these will be used to parse the next packet).
  6. Lastly, we make sure to end all the destination channels when the source channel ends.

Now that we can demultiplex the source stream, let’s put our new function to work:

const server = createServer(socket => {
  const stdoutStream = createWriteStream('stdout.log')
  const stderrStream = createWriteStream('stderr.log')
  demultiplexChannel(socket, [stdoutStream, stderrStream])
})
server.listen(3000, () => console.log('Server started'))

In the preceding code, we first start a TCP server on port 3000; then, for each connection that we receive, we create two Writable streams pointing to two different files: one for the standard output and the other for the standard error. These are our destination channels. Finally, we use demultiplexChannel() to demultiplex the socket stream into stdoutStream and stderrStream.

Running the mux/demux application

Now, we are ready to try our new mux/demux application, but first, let’s create a small Node.js program to produce some sample output:

// generate-data.js
console.log('out1')
console.log('out2')
console.error('err1')
console.log('out3')
console.error('err2')

Okay, now we are ready to try our remote logging application. First, let’s start the server:

node server.js

Then, we’ll start the client by providing the file that we want to start as a child process:

node client.js generate-data.js

The client will run almost immediately, but at the end of the process, the standard output and standard error of the generate-data.js application will have traveled through one single TCP connection and been demultiplexed on the server into two separate files.

Please make a note that, as we are using child_process.fork() (nodejsdp.link/fork), our client will only be able to launch other Node.js modules.

Multiplexing and demultiplexing object streams

The example that we have just shown demonstrates how to multiplex and demultiplex a binary/text stream, but it’s worth mentioning that the same rules apply to object streams. The biggest difference is that when using objects, we already have a way to transmit the data using atomic messages (the objects), so multiplexing would be as easy as setting a channelID property in each object. Demultiplexing would simply involve reading the channelID property and routing each object toward the right destination stream.
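
To make this more concrete, here is a minimal sketch (with hypothetical helper names) of what multiplexing and demultiplexing object streams could look like:

import { Transform } from 'node:stream'
// Multiplexing: tag every object with the ID of its source channel.
// Each source would be piped through its own tagger and then into the shared
// channel, for example using the merge pattern seen earlier with { end: false }.
function createChannelTagger(channelID) {
  return new Transform({
    objectMode: true,
    transform(obj, _enc, done) {
      done(null, { channelID, payload: obj })
    },
  })
}
// Demultiplexing: route every tagged object to the right destination stream.
// For simplicity, this sketch ignores backpressure on the destinations.
function demultiplexObjects(source, destinations) {
  source
    .on('data', ({ channelID, payload }) => {
      destinations[channelID].write(payload)
    })
    .on('end', () => {
      for (const destination of destinations) {
        destination.end()
      }
    })
}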

Another pattern involving only demultiplexing is routing the data coming from a source depending on some condition. With this pattern, we can implement complex flows, such as the one shown in Figure 6.11:

Figure 6.11: Demultiplexing an object stream

The demultiplexer used in the system in Figure 6.11 takes a stream of objects representing animals and distributes each of them to the right destination stream based on the class of the animal: reptiles, amphibians, or mammals.

Using the same principle, we can also implement an if...else statement for streams. For some inspiration, take a look at the ternary-stream package (nodejsdp.link/ternary-stream), which allows us to do exactly that.

Readable stream utilities

In this chapter, we’ve explored how Node.js streams work, how to create custom streams, and how to compose them into efficient, elegant data processing pipelines. To complete the picture, let’s look at some utilities provided by the node:stream module that simplify working with Readable streams. These utilities are designed to streamline data processing in a streaming fashion and bring a functional programming flavor to stream operations.

All these utilities are methods available for any Readable stream, including Duplex, PassThrough, and Transform streams. Since most of these methods return a new Readable stream, they can be chained together to create expressive, pipeline-like code. Unsurprisingly, many of these methods mirror common operations available in the Array prototype, but they are optimized for handling streaming data.

Here’s a summary of the key methods:

Mapping and transformation

  • readable.map(fn): Applies a transformation function (fn) to each chunk in the stream, returning a new stream with the transformed data. If fn returns a Promise, the result is awaited before being passed to the output stream.
  • readable.flatMap(fn): Similar to map, but allows fn to return streams, iterables, or async iterables, which are then flattened and merged into the output stream.

Filtering and iteration

  • readable.filter(fn): Filters the stream by applying fn to each chunk. Only chunks for which fn returns a truthy value are included in the output stream. Supports async fn functions.
  • readable.forEach(fn): Invokes fn for each chunk in the stream. This is typically used for side effects rather than producing a new stream. If fn returns a Promise, it will be awaited before processing the next chunk.

Searching and evaluation

  • readable.some(fn): Checks if at least one chunk satisfies the condition in fn. Once a truthy value is found, the stream is destroyed, and the returned Promise resolves to true. If no chunk satisfies the condition, it resolves to false.
  • readable.every(fn): Verifies if all chunks satisfy the condition in fn. If any chunk fails the condition, the stream is destroyed, and the returned Promise resolves to false. Otherwise, it resolves to true when the stream ends.
  • readable.find(fn): It returns a Promise that will resolve to the value of the first chunk that satisfies the condition in fn. If no chunk meets the condition, the returned Promise will resolve to undefined once the stream ends.

Limiting and reducing

  • readable.drop(n): Skips the first n chunks in the stream, returning a new stream that starts from the (n+1)th chunk.
  • readable.take(n): Returns a new stream that includes, at most, the first n chunks. Once n chunks are reached, the stream is terminated.
  • readable.reduce(fn, initialValue): Reduces the stream by applying fn to each chunk, accumulating a result that is returned as a Promise. If no initialValue is provided, the first chunk is used as the initial value.

The official documentation has lots of examples for all these methods, and there are other, less common methods that we haven’t explored for brevity. We recommend you check out the docs (nodejsdp.link/stream-iterators) if any of these still feel confusing and you are unsure about when to use them.
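
As a quick, made-up illustration of how these helpers chain together, the following snippet keeps the first three values of a stream and then finds the first even one:

import { Readable } from 'node:stream'
const numbers = Readable.from([17, 23, 42, 64, 80])
// take() returns a new Readable, while find() returns a Promise.
const firstEven = await numbers
  .take(3)
  .find(n => n % 2 === 0)
console.log(firstEven) // 42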

Just to give you a more practical overview, let’s re-implement the processing pipeline we illustrated before to explain filtering and reducing with a custom Transform stream, but this time we are going to use only Readable stream utilities. As a reminder, in this example, we are parsing a CSV file that contains sales data. We want to calculate the total amount of profit made from sales in Italy. Every line of the CSV file has 3 fields: type, country, and profit. The first line contains the CSV headers.

import { createReadStream } from 'node:fs'
import { createInterface } from 'node:readline'
import { Readable, compose } from 'node:stream'
import { createGunzip } from 'node:zlib'
const uncompressedData = compose( // 1
  createReadStream('data.csv.gz'),
  createGunzip()
)
const byLine = Readable.from( // 2
  createInterface({ input: uncompressedData })
)
const totalProfit = await byLine // 3
  .drop(1) // 4
  .map(chunk => { // 5
    const [type, country, profit] = chunk.toString().split(',')
    return { type, country, profit: Number.parseFloat(profit) }
  })
  .filter(record => record.country === 'Italy') // 6
  .reduce((acc, record) => acc + record.profit, 0) // 7
console.log(totalProfit)

Here’s a step-by-step breakdown of what the preceding code does:

  1. The data comes from a gzipped CSV file, so we initially compose a file read stream and a decompression stream to create a source stream that gives uncompressed CSV data.
  2. We want to read the data line by line, so we use the createInterface() utility from the node:readline module to wrap our source stream and give us a new Readable stream (byLine) that produces lines from the original stream.
  3. Here’s where we start to use some of the helpers we discussed in this section. Since the last helper is .reduce(), which returns a Promise, we use await here to wait for the returned Promise to resolve and to capture the final result in the totalProfit variable.
  4. The first helper we use is .drop(1), which allows us to skip the first line of the uncompressed source data. This line will contain the CSV header (“type,country,profit”) and no useful data, so it makes sense to skip it. This operation returns a new Readable stream, so we can chain other helper methods.
  5. The next helper we use in the chain is .map(). In the mapping function, we provide all the necessary logic to parse a line from the original CSV file and convert it into a record object containing the fields type, country, and profit. This operation returns another Readable stream, so we can keep chaining more helper functions to continue building our processing logic.
  6. The next step is .filter(), which we use to retain only records that represent profit associated with the country Italy. Once again, this operation gives us a new Readable stream.
  7. The last step of the processing pipeline is .reduce(). We use this helper to aggregate all the filtered records by summing their profit. As we mentioned before, this operation will give us a Promise that will resolve to the total profit once the stream completes.

This example shows how to create stream processing pipelines using a more direct approach. In this approach, we chain helper methods, and we have all the transformation logic clearly visible in the same context (assuming we define all the transformation functions in line). This approach can be particularly convenient in situations where the transformation logic is very simple, and you don’t need to build highly specialized and reusable custom Transform streams.

Note that, in this example, we created our own basic way of parsing records out of CSV lines rather than using a dedicated library for it. We did this just to have an excuse to showcase how to use the .drop() and .map() methods. Our implementation is very rudimentary, and it doesn’t handle all the possible edge cases. This is fine because we know there aren’t edge cases (e.g., quoted fields) in our input data, but in real-world projects, we would recommend using a reliable CSV parsing library instead.

Web Streams

The WHATWG Streams Standard (nodejsdp.link/web-streams) provides a standardized API for working with streaming data, known as “Web Streams.” While inspired by Node.js streams, it has its own distinct implementation and is designed to be a universal standard for the broader JavaScript ecosystem, including browsers.

About a decade after the initial development of Node.js streams, Web Streams emerged to address the lack of a native streaming API in browser environments, something that made it difficult to efficiently work with large datasets on the frontend.

Today, most modern browsers support the Web Streams standard natively, making it an ideal choice for building streaming pipelines within the browser. In contrast, Node.js streams are not natively available in browsers. You could bring Node.js streams to the browser by installing them as a library in your project, but their utility is limited since native APIs like fetch use Web Streams to send requests or read responses incrementally. Given this, using Web Streams in the browser is the recommended choice.

Web Streams have also been implemented in Node.js, effectively giving us two competing APIs to deal with streaming data. However, at the time of writing, Web Streams is still relatively new and hasn’t yet reached the same level of adoption as native Node.js streams within the large Node.js ecosystem. That’s why this chapter focused mainly on Node.js streams, but understanding Web Streams is still an important piece of knowledge, and we expect it to become more relevant in the coming years.

Fortunately, getting started with Web Streams should be easy if you have been following this chapter. Most of the concepts are aligned, and the primary differences lie in function names and arguments, which is something that can be easily learned by checking the Web Streams API documentation.
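
Just to give you a taste of the API, here is a minimal Web Streams pipeline that you can run in Node.js (in a browser, you would use the globally available classes instead of importing them from node:stream/web, and a different sink than process.stdout):

import {
  ReadableStream,
  TransformStream,
  WritableStream,
} from 'node:stream/web'
const source = new ReadableStream({
  start(controller) {
    controller.enqueue('hello ')
    controller.enqueue('web streams')
    controller.close()
  },
})
const toUpperCase = new TransformStream({
  transform(chunk, controller) {
    controller.enqueue(chunk.toUpperCase())
  },
})
const sink = new WritableStream({
  write(chunk) {
    process.stdout.write(chunk)
  },
})
// pipeThrough() returns the readable side of the transform, while
// pipeTo() returns a Promise that settles when the pipeline completes.
await source.pipeThrough(toUpperCase).pipeTo(sink)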

One aspect that is worth exploring here is the interoperability between Node.js and Web Streams. Fortunately, it’s possible to convert Node.js stream objects to equivalent Web Stream objects and vice versa. This makes it easy to transition or work with third-party libraries that use Web Streams in the context of Node.js.

Let’s briefly discuss how this interoperability works.

In the Web Streams standard, we have 3 primary types of objects:

  • ReadableStream: Source of streaming data and pretty much equivalent to a Readable Node.js stream.
  • WritableStream: Destination for streaming data; equivalent to a Node.js Writable stream.
  • TransformStream: Allows you to transform streaming data in a streaming pipeline. Equivalent to a Node.js Transform stream.

Note how these concepts match almost perfectly. Also note how, thanks to the Stream suffix of the Web Streams classes, we don’t have naming conflicts between equivalent streaming abstractions.

Converting Node.js streams to Web Streams

You can easily convert Node.js streams to equivalent Web Streams objects by using the static .toWeb(sourceNodejsStream) method exposed by the Readable, Writable, and Transform classes, respectively.

Let’s see what the syntax looks like:

import { Readable, Writable, Transform } from 'node:stream'
const nodeReadable = new Readable({/*...*/}) // Readable
const webReadable = Readable.toWeb(nodeReadable) // ReadableStream
const nodeWritable = new Writable({/*...*/}) // Writable
const webWritable = Writable.toWeb(nodeWritable) // WritableStream
const nodeTransform = new Transform({/*...*/}) // Transform
const webTransform = Transform.toWeb(nodeTransform) // TransformStream

Converting Web Streams to Node.js streams

The Readable, Writable, and Transform classes also expose methods to convert a Web Stream to an equivalent Node.js stream. These methods, unsurprisingly, have the following signature: .fromWeb(sourceWebStream).

Let’s see a quick example to clarify the syntax:

import { Readable, Writable, Transform } from 'node:stream'
import {
  ReadableStream,
  WritableStream,
  TransformStream,
} from 'node:stream/web'
const webReadable = new ReadableStream({/*...*/}) // ReadableStream
const nodeReadable = Readable.fromWeb(webReadable) // Readable
const webWritable = new WritableStream({/*...*/}) // WritableStream
const nodeWritable = Writable.fromWeb(webWritable) // Writable
const webTransform = new TransformStream({/*...*/}) // TransformStream
const nodeTransform = Transform.fromWeb(webTransform) // Transform

The last two snippets illustrate how easy it is to convert stream types between Node.js streams and Web Streams.

One important detail to keep in mind is that these conversions don’t destroy the source stream but rather wrap it in a new object that is compliant with the target API. For example, when we convert a Node.js Readable stream to a web ReadableStream, we can still read from the source stream while also reading from the new Web Stream. The following example should help to clarify this idea:

import { Readable, Writable } from 'node:stream'
const nodeReadable = new Readable({
  read() {
    this.push('Hello, ')
    this.push('world!')
    this.push(null)
  },
})
const webReadable = Readable.toWeb(nodeReadable)
nodeReadable.pipe(process.stdout)
webReadable.pipeTo(Writable.toWeb(process.stdout))

In the preceding example, we are defining a Node.js stream that emits the string “Hello, world!” in 2 chunks before completing. We convert this stream into an equivalent Web Stream, then we pipe both the source Node.js stream and the newly created Web Stream to standard output.

This code will produce the following output:

Hello, Hello, world!world!

This is because, every time that the source Node.js stream emits a chunk, the same chunk is also emitted by the associated Web Stream.

The .fromWeb() and .toWeb() methods are implementations of the Adapter pattern that we will discuss in more detail in Chapter 8, Structural Design Patterns.

Stream consumer utilities

As we’ve repeated countless times throughout this chapter, streams are designed to transfer and process large amounts of data in small chunks. However, there are situations where you need to consume the entire content of a stream and accumulate it in memory. This is more common than it might seem, largely because many abstractions in the Node.js ecosystem use streams as the fundamental building block for data transfer. This design provides a great deal of flexibility, but it also means that sometimes you need to handle chunk-by-chunk data manually. In such cases, it’s important to understand how to convert a stream of discrete chunks into a single, buffered piece of data that can be processed as a whole.

A good example of this is the low-level node:http module, which allows you to make HTTP requests. When handling an HTTP response, Node.js represents the response body as a Readable stream. This means you’re expected to process the response data incrementally, as chunks arrive.

But what if you know in advance that the response body contains a JSON-serialized object? In that case, you can’t process the chunks independently; you need to wait until the entire response has been received so you can parse it as a complete string using JSON.parse().

A simple implementation of this pattern might look like the following code:

import { request } from 'node:http'
const req = request('http://example.com/somefile.json', res => { // 1
  let buffer = '' // 2
  res.on('data', chunk => {
    buffer += chunk
  })
  res.on('end', () => { // 3
    console.log(JSON.parse(buffer))
  })
})
req.end() // 4

To better understand this example, let’s discuss its main points:

  1. Here, a request is being made to http://example.com/somefile.json. The second argument is a callback that receives the response (res) object, which is a Readable stream. This stream emits chunks of data as they arrive over the network.
  2. Inside the response callback, we initialize an empty string called buffer. As each chunk of data arrives (via the 'data' event), we concatenate it to the buffer string. This effectively buffers the entire response body in memory. This approach is necessary when you need to handle the whole response as a complete unit – for example, when parsing JSON, since JSON.parse() only works on complete strings.
  3. Once the entire response has been received and no more data will arrive ('end' event), we use JSON.parse() to deserialize the accumulated string into a JavaScript object. The resulting object is then logged to the console.
  4. Finally, req.end() is called to signal that no request body will be sent (our request is complete and can be forwarded). Since this is a GET request with no body, it’s necessary to explicitly finalize the request.

A final point worth noting is that this code doesn’t require async/await because it relies entirely on event-based callbacks, which is the traditional way of handling asynchronous operations in Node.js streams.

This solution works, but it’s a bit boilerplate-heavy. Thankfully, there’s a better solution, thanks to the node:stream/consumers module.

This built-in library was introduced in Node.js version 16 to expose various utilities that make it easy to consume the entire content from a Node.js Readable instance or a Web Streams ReadableStream instance.

This module exposes the consumers object, which implements the following static methods:

  • consumers.arrayBuffer(stream)
  • consumers.blob(stream)
  • consumers.buffer(stream)
  • consumers.text(stream)
  • consumers.json(stream)

Each one of these methods consumes the given stream and returns a Promise that resolves only when the stream has been fully consumed.

It’s easy to guess that each method accumulates the data into a different kind of object. arrayBuffer(), blob(), and buffer() will accumulate chunks as binary data in an ArrayBuffer, a Blob, or a Buffer instance, respectively. text() accumulates data in a string object, while json() accumulates data in a string object and will also try to deserialize the data using JSON.parse() before resolving the corresponding Promise.

This means that we can rewrite the previous example as follows:

import { request } from 'node:http'
import consumers from 'node:stream/consumers'
const req = request(
  'http://example.com/somefile.json',
  async res => {
    const buffer = await consumers.json(res)
    console.log(buffer)
  }
)
req.end()

Much more concise and elegant, isn’t it?

If you use fetch to make HTTP(s) requests, the response object provided by the fetch API has various consumers built in. You could rewrite the previous example as follows:

const res = await fetch('http://example.com/somefile.json')
const buffer = await res.json()
console.log(buffer)

The response object (res) also exposes .blob(), .arrayBuffer(), and .text() if you want to accumulate the response data as a binary buffer or as text. Note that the .buffer() method is missing, though. This is because the Buffer class is not part of the Web standard, but it exists only in Node.js.
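
If you do need a Node.js Buffer out of a fetch response, a simple workaround is to wrap the accumulated ArrayBuffer yourself (the URL here is just a placeholder):

const res = await fetch('http://example.com/somefile.bin')
const buf = Buffer.from(await res.arrayBuffer())
console.log(buf.length)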

Summary

In this chapter, we shed some light on Node.js streams and some of their most common use cases. We learned why streams are so acclaimed by the Node.js community and we mastered their basic functionality, enabling us to discover more and navigate comfortably in this new world. We analyzed some advanced patterns and started to understand how to connect streams in different configurations, grasping the importance of interoperability, which is what makes streams so versatile and powerful.

If we can’t do something with one stream, we can probably do it by connecting other streams together, and this works great with the one thing per module philosophy. At this point, it should be clear that streams are not just a good-to-know feature of Node.js; they are an essential part – a crucial pattern to handle binary data, strings, and objects. It’s not by chance that we dedicated an entire chapter to them.

In the next few chapters, we will focus on the traditional object-oriented design patterns. But don’t be fooled; even though JavaScript is, to some extent, an object-oriented language, in Node.js, the functional or hybrid approach is often preferred. Get rid of every prejudice before reading the next chapters.

Exercises

  • 6.1 Data compression efficiency: Write a command-line script that takes a file as input and compresses it using the different algorithms available in the zlib module (Brotli, Deflate, Gzip). You want to produce a summary table that compares the algorithm’s compression time and compression efficiency on the given file. Hint: This could be a good use case for the fork pattern, but remember that we made some important performance considerations when we discussed it earlier in this chapter.
  • 6.2 Stream data processing: On Kaggle, you can find a lot of interesting datasets, such as London Crime Data (nodejsdp.link/london-crime). You can download the data in CSV format and build a stream processing script that analyzes the data and tries to answer the following questions:
    • Did the number of crimes go up or down over the years?
    • What are the most dangerous areas of London?
    • What is the most common crime per area?
    • What is the least common crime?

Hint: You can use a combination of Transform streams and PassThrough streams to parse and observe the data as it is flowing. Then, you can build in-memory aggregations for the data, which can help you answer the preceding questions. Also, you don’t need to do everything in one pipeline; you could build very specialized pipelines (for example, one per question) and use the fork pattern to distribute the parsed data across them.

  • 6.3 File share over TCP: Build a client and a server to transfer files over TCP. Extra points if you add a layer of encryption on top of that and if you can transfer multiple files at once. Once you have your implementation ready, give the client code and your IP address to a friend or a colleague, then ask them to send you some files! Hint: You could use mux/demux to receive multiple files at once.
  • 6.4 Animations with Readable streams: Did you know you can create amazing terminal animations with just Readable streams? Well, to understand what we are talking about here, try to run curl parrot.live in your terminal and see what happens! If you think that this is cool, why don’t you try to create something similar? Hint: If you need some help with figuring out how to implement this, you can check out the actual source code of parrot.live by simply accessing its URL through your browser.
Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Gain a deep understanding of the Node.js philosophy, its core components, and the solutions in its ecosystem
  • Avoid common pitfalls in applying proven patterns to create robust, maintainable Node.js applications
  • Enhance your development skills through a wealth of real-world examples and case studies

Description

Node.js underpins much of modern web development, reliably powering APIs and full-stack apps across all industries. Authors Luciano Mammino and Mario Casciaro offer a practical guide that unpacks the JavaScript runtime so you can write reliable, high-performance Node.js apps. Building on the highly rated third edition, this new edition adds fresh case studies and the latest Node.js developments: newer APIs and libraries, ESM improvements, practical security and production tips, and guidance on using Node.js with TypeScript. It also introduces a new chapter on testing that gives you a full introduction to testing philosophy and practical guidance on writing unit, integration, and end-to-end tests, giving you the confidence to write functional, stable, and reliable code. Real-world, end-to-end examples throughout the book show how to build microservices and distributed systems with Node.js, integrating production-proven technologies such as Redis, RabbitMQ, LevelDB, and ZeroMQ, the same components you’ll find in scalable deployments at companies of all sizes. End-of-chapter exercises consolidate your understanding. By the end of this Node.js book, you’ll have the design patterns, mindset, and hands-on skills every serious Node.js professional needs to confidently architect robust, efficient, and maintainable applications.

Who is this book for?

This book is for you if you’re a developer or software architect with basic knowledge of JavaScript and Node.js and want to get the most out of these technologies to maximize productivity, design quality, and scalability. It’ll help you level up from junior to senior roles. This book is a tried-and-tested reference guide for readers at all levels. Even those with more experience will find value in the more advanced patterns and techniques presented. You’re expected to have an intermediate understanding of web application development, databases, and software design principles.

What you will learn

  • Understand Node.js basics and its async event-driven architecture
  • Write correct async code using callbacks, promises, and async/await
  • Harness Node.js streams to create data-driven processing pipelines
  • Implement trusted software design patterns for production-grade applications
  • Write testable code and automated tests (unit, integration, E2E)
  • Use advanced recipes: caching, batching, async init, offload CPU-bound work
  • Build and scale microservices and distributed systems powered by Node.js

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Sep 25, 2025
Length: 732 pages
Edition : 4th
Language : English
ISBN-13 : 9781803235431
Languages :
Tools :

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Billing Address

Product Details

Publication date : Sep 25, 2025
Length: 732 pages
Edition : 4th
Language : English
ISBN-13 : 9781803235431
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Table of Contents

15 Chapters
  • The Node.js Platform
  • The Module System
  • Callbacks and Events
  • Asynchronous Control Flow Patterns with Callbacks
  • Asynchronous Control Flow Patterns with Promises and Async/Await
  • Coding with Streams
  • Creational Design Patterns
  • Structural Design Patterns
  • Behavioral Design Patterns
  • Testing: Patterns and Best Practices
  • Advanced Recipes
  • Scalability and Architectural Patterns
  • Messaging and Integration Patterns
  • Other Books You May Enjoy
  • Index

Customer reviews

Rating distribution
5.0 out of 5 (1 rating)
  • 5 star: 100%
  • 4 star: 0%
  • 3 star: 0%
  • 2 star: 0%
  • 1 star: 0%

ZhiChao, Apr 20, 2025 (5 out of 5, Packt subscriber review):
This book is very good and has benefited me a lot. It is also the best and most high-quality Node.js book I have seen so far. I am very grateful to the author for his hard work. A small suggestion: adding a chapter specifically describing RxJS and comparing it to Promise would make the book much better and more powerful!

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, clicking on the link will download and open the PDF file directly. If you don't, save the PDF file on your machine and then download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing: When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect the rights of us as publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website?

If you want to purchase a video course, eBook, or bundle (print + eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment can be made using Credit Card, Debit Card, or PayPal).
Where can I access support around an eBook?
  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book, go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in the Adobe eBook Reader format, which has greater security restrictions.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower in price than print editions
  • They save resources and space
What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.
