Node.js at Scale II - Node.js Under the Hood
From the Engineers of RisingStack
CHAPTER ONE: THE EVENT LOOP
The first chapter helps you to understand how the Node.js event
loop works, and how you can leverage it to build fast applications.
We’ll also discuss the most common problems you might encounter,
and the solutions for them.
THE PROBLEM
For network communication it can be even worse. Just try to ping google.com:
$ ping google.com
64 bytes from 172.217.16.174: icmp_seq=0 ttl=52 time=33.017 ms
64 bytes from 172.217.16.174: icmp_seq=1 ttl=52 time=83.376 ms
64 bytes from 172.217.16.174: icmp_seq=2 ttl=52 time=26.552 ms
64 bytes from 172.217.16.174: icmp_seq=3 ttl=52 time=40.153 ms
64 bytes from 172.217.16.174: icmp_seq=4 ttl=52 time=37.291 ms
64 bytes from 172.217.16.174: icmp_seq=5 ttl=52 time=58.692 ms
64 bytes from 172.217.16.174: icmp_seq=6 ttl=52 time=45.245 ms
64 bytes from 172.217.16.174: icmp_seq=7 ttl=52 time=27.846 ms
THE SOLUTION
It is tedious and complicated, but it gets the job done. But what about
Node? Well, we are surely facing some problems, as Node.js - or
rather V8 - is single-threaded: our code can only run in one thread.
SIDE NOTE: This is not entirely true. Both Java and Python have async
interfaces, but using them is definitely more difficult than in Node.js.
First of all, let’s take a look at the call stack, or simply, “stack”. I am
going to make things simple, as we only need to understand the very
basics of the call stack. If you are already familiar with how it works, feel
free to jump to the next section.
THE STACK
The call stack is a simple data structure that keeps track of where we are
in the program: when we step into a function, a frame for it is pushed onto
the top of the stack, and when the function returns, its frame is popped off.
For the sake of simplicity I will say that 'a function is pushed' to the
top of the stack from now on, even though it is not exactly correct.
Let’s take a look!
function main () {
const hypotenuse = getLengthOfHypotenuse(3, 4)
console.log(hypotenuse)
}
function getLengthOfHypotenuse(a, b) {
const squareA = square(a)
const squareB = square(b)
const sumOfSquares = squareA + squareB
return Math.sqrt(sumOfSquares)
}
function square(number) {
return number * number
}
main()
afterwards, square is pushed to the stack with the value of a
when square returns, it is popped from the stack, and its return value
is assigned to squareA. squareA is added to the stack frame of
getLengthOfHypotenuse
in the next line the expression squareA + squareB is evaluated
the returned value gets assigned to hypotenuse in main
finally, main returns without any value and gets popped from the stack,
leaving it empty
SIDE NOTE: You saw that local variables are popped from the stack
when the function's execution finishes. That happens only when you work
with simple values such as numbers, strings and booleans. Values
of objects, arrays and such are stored in the heap, and your variable
is merely a pointer to them. If you pass this variable on, you only
pass said pointer, making these values mutable in different stack
frames. When the function is popped from the stack, only the pointer
to the object gets popped, leaving the actual value in the heap.
The garbage collector is the guy who takes care of freeing up space
once the objects have outlived their usefulness.
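A tiny sketch to make this concrete - the number is copied into the new
stack frame, while the object travels as a pointer, so a mutation made
inside the function is visible to the caller:

function addOne (num, obj) {
  num += 1        // works on a copy, the caller's number is untouched
  obj.count += 1  // follows the pointer and mutates the object on the heap
}

let counter = 1
const box = { count: 1 }

addOne(counter, box)
console.log(counter)   // 1
console.log(box.count) // 2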
'use strict'
const express = require('express')
const superagent = require('superagent')
const app = express()

app.get('/', sendWeatherOfRandomCity)

function sendWeatherOfRandomCity (request, response) {
  getWeatherOfRandomCity(request, response)
  sayHi()
}

const CITIES = [
  'london',
  'newyork',
  'paris',
  'budapest',
  'warsaw',
  'rome',
  'madrid',
  'moscow',
  'beijing',
  'capetown'
]

function getWeatherOfRandomCity (request, response) {
  const city = CITIES[Math.floor(Math.random() * CITIES.length)]
  console.log('Fetching the weather, please be patient')
  // the weather API URL below is only illustrative
  superagent.get(`https://weather-api.example.com/${city}`)
    .end((err, weatherResponse) => {
      console.log('Got the weather')
      response.send(err ? 'Could not fetch the weather' : weatherResponse.text)
    })
}

function sayHi () {
  console.log('Hi')
}

app.listen(3000)
What will be printed out aside from getting the weather when a
request is sent to localhost:3000?
Fetching the weather, please be patient
Hi
Got the weather
To peek under the hood, we need to introduce two new concepts: the
event loop and the task queue.
TASK QUEUE
In case there is another request being served when a file read finishes,
the read's callback will need to wait for the stack to become empty. The
limbo where callbacks wait for their turn to be executed is
called the task queue (or event queue, or message queue). Callbacks
are called in an infinite loop whenever the main thread has
finished its previous task, hence the name 'event loop'.
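As a minimal sketch of this mechanism (the file name is only illustrative),
the callback of readFile is placed on the task queue and only runs after the
currently executing script has finished and the stack is empty:

const fs = require('fs')

fs.readFile('file.md', 'utf-8', (err, content) => {
  // waits in the task queue until the stack is empty
  console.log('file read')
})

console.log('end of script')

// prints:
// end of script
// file read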
So now we can understand why the previously mentioned
setTimeout hack works. Even though we set the delay to zero,
it defers the execution until the current stack and the task queue are
empty, allowing the browser to redraw the UI, or Node to serve other
requests.
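A sketch of the hack itself - processData here is a made-up, CPU-heavy
function:

function processData () {
  // ...some long-running, CPU-heavy work...
}

// deferring the work instead of calling processData() directly,
// so callbacks already waiting in the task queue can run first
setTimeout(processData, 0)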
If this wasn't enough, we actually have more than one task queue:
one for microtasks and another for macrotasks.
examples of microtasks:
• process.nextTick
• promises
• Object.observe
examples of macrotasks:
• setTimeout
• setInterval
• setImmediate
• I/O
Let's take a look at the following code:
console.log('script start')

const interval = setInterval(() => {
  console.log('setInterval')
}, 0)

setTimeout(() => {
  console.log('setTimeout 1')
  Promise.resolve().then(() => {
    console.log('promise 3')
  }).then(() => {
    console.log('promise 4')
  }).then(() => {
    setTimeout(() => {
      console.log('setTimeout 2')
      Promise.resolve().then(() => {
        console.log('promise 5')
      }).then(() => {
        console.log('promise 6')
      }).then(() => {
        clearInterval(interval)
      })
    }, 0)
  })
}, 0)

Promise.resolve().then(() => {
  console.log('promise 1')
}).then(() => {
  console.log('promise 2')
})
script start
promise 1
promise 2
setInterval
setTimeout 1
promise 3
promise 4
setInterval
setTimeout 2
setInterval
promise 5
promise 6
According to the WHATWG specification, exactly one (macro)task
should get processed from the macrotask queue in one cycle of the
event loop. After said macrotask has finished, all of the available
microtasks will be processed within the same cycle. While these
microtasks are being processed, they can queue more microtasks,
which will all be run one by one, until the microtask queue is
exhausted.
We can achieve the same behavior in Node with process.nextTick and
some mind-boggling callback hell.
console.log('script start')

const interval = setInterval(() => {
  console.log('setInterval')
}, 0)

setTimeout(() => {
  console.log('setTimeout 1')
  process.nextTick(() => {
    console.log('nextTick 3')
    process.nextTick(() => {
      console.log('nextTick 4')
      setTimeout(() => {
        console.log('setTimeout 2')
        process.nextTick(() => {
          console.log('nextTick 5')
          process.nextTick(() => {
            console.log('nextTick 6')
            clearInterval(interval)
          })
        })
      }, 0)
    })
  })
}, 0)

process.nextTick(() => {
  console.log('nextTick 1')
  process.nextTick(() => {
    console.log('nextTick 2')
  })
})
This is the exact same logic as our beloved promises use, only a little
bit more hideous. At least it gets the job done the way we expected.
The event loop might be a slippery concept to grasp at first, but once
you get the hang of it, you won't be able to imagine life without it.
The continuation-passing style that can lead to callback hell might
look ugly, but we have Promises, and soon we will have async-await
in our hands... and while we are (a)waiting, you can simulate
async-await using co and/or koa.
Happy coding!
CHAPTER TWO: GARBAGE COLLECTION
MEMORY MANAGEMENT IN NODE.JS APPLICATIONS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
  char name[20];
  char *description;

  strcpy(name, "RisingStack");

  // memory allocation
  description = malloc(30 * sizeof(char));

  // release memory
  free(description);
}
• Memory leaks happen when the used memory space is never freed up.
• Wild/dangling pointers appear when an object is deleted, but the
pointer is still used. Serious security issues can be introduced when
other data structures are overwritten or sensitive information is read.
Luckily for you, Node.js comes with a garbage collector, and you
don’t need to manually manage memory allocation.
The very first garbage collector appeared in LISP in 1959, invented by
John McCarthy. The way the GC knows that objects are no longer in use is
that no other object holds references to them.
The following diagram shows what the memory can look like if you
have objects with references to each other, along with some objects
that cannot be reached from anywhere. These are the objects that can
be collected by a garbage collector run.
Once the garbage collector runs, the unreachable objects
get deleted, and the memory space is freed up.
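A quick sketch of what 'unreachable' means in code:

let user = { name: 'RisingStack' }

// the object is reachable through the user variable, so it cannot be collected
user = null

// now nothing references the original object anymore,
// so the next garbage collector run is free to reclaim its memory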
THE ADVANTAGES OF
USING A GARBAGE COLLECTOR
The Stack

The stack holds primitive values and references pointing to objects on the
heap, as well as the frames of ongoing function calls - for example the
arguments and local variables of an add(4, 5) call.

The Heap

The heap is where reference types - objects, strings, closures - are stored.
The Car object created in the following snippet is placed on the
heap.
Let's add more cars, and see what our memory would look like!
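A minimal sketch of what these snippets could look like - the car names
other than Mater are only illustrative:

function Car (opts) {
  this.name = opts.name
}

let mater = new Car({ name: 'Mater' })
let lightning = new Car({ name: 'Lightning McQueen' })
let sally = new Car({ name: 'Sally' })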
If the GC ran now, nothing could be freed up, as the root has
a reference to every object. Let's make it a little bit more interesting,
and add some parts to our cars!
function Engine (power) {
this.power = power
}
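The Car constructor can then be extended so that every car gets its own
Engine, and the cars are recreated with the new constructor - roughly like
this sketch (the power value is illustrative):

function Car (opts) {
  this.name = opts.name
  this.engine = new Engine(opts.power)
}

// recreating Mater binds the variable to a brand new Car object,
// so nothing references the car object it pointed to before
mater = new Car({ name: 'Mater', power: 100 })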
As a result, the original Mater object cannot be reached from the root
object, so on the next garbage collector run it will be freed up.
The heap has two main segments, the New Space and the Old Space.
The New Space is where new allocations happen; it is fast to
collect garbage here, and it has a size of ~1-8 MB. Objects living in the
New Space are called the Young Generation.
The Old Space is where the objects that survived the collector in the
New Space are promoted to - they are called the Old Generation.
Allocation in the Old Space is fast, however collection is expensive,
so it is performed infrequently.
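If you want to peek at these spaces in a running process, the built-in
v8 module can report on them, for example:

const v8 = require('v8')

// lists the heap spaces (new_space, old_space, code_space, ...)
// together with their current size and usage
v8.getHeapSpaceStatistics().forEach((space) => {
  console.log(space.space_name, space.space_used_size)
})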
Young Generation
Collecting the Old Space relies on different garbage collection algorithms.
Well, the typical way that closures are implemented is that every function
object has a link to a dictionary-style object representing its lexical scope.
If both functions defined inside replaceThing actually used originalThing,
it would be important that they both get the same object, even if
originalThing gets assigned to over and over, so both functions share
the same lexical environment. Now, Chrome’s V8 JavaScript engine is
apparently smart enough to keep variables out of the lexical environment
if they aren’t used by any closures - from the Meteor blog.
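The code the quote is talking about looks roughly like this condensed
sketch: each call to replaceThing captures the previous theThing through
originalThing, so the big strings form an ever-growing chain that the
garbage collector can never free.

let theThing = null

function replaceThing () {
  const originalThing = theThing

  function unused () {
    // never called, but it closes over originalThing
    if (originalThing) {
      console.log('hi')
    }
  }

  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      // shares its lexical environment with unused,
      // which keeps originalThing (and the previous longStr) alive
      console.log('someMessage')
    }
  }
}

setInterval(replaceThing, 1000)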
Further reading:
• Finding a memory leak in Node.js
• JavaScript Garbage Collection Improvements - Orinoco
• memorymanagement.org
CHAPTER THREE: NATIVE MODULES
Just take a look at the list of a few popular modules using native
extensions. You’re using at least one of them, right?
• https://github.com/wadey/node-microtime
• https://github.com/node-inspector
• https://github.com/node-inspector/v8-profiler
• http://www.nodegit.org/
There are a few reasons why one would consider writing native
Node.js modules; these include, but are not limited to, performance-critical
applications, hooking into lower-level APIs, and creating a bridge between
C or C++ libraries and Node.js.
This means that (if done right) the quirks of C/C++ can be hidden
from the module’s consumer. What they will see instead is that
your module is a Node.js module - just like if you had written it in
JavaScript.
Also, in the previous chapter we learnt about the cost of the
Node.js Garbage Collector. Although Garbage Collection can be
completely avoided if you decide to manage memory yourself
(because C/C++ has no GC concept), doing so makes it much easier
to create memory issues.
• Libuv
• V8
• Node.js internals
PREREQUISITES
Linux:
• python, make and a C/C++ compiler toolchain (like GCC) installed
Mac:
• Xcode installed: make sure you not only install it, but also start it at
least once and accept its terms and conditions - otherwise it won't
work!
Windows:
• Install the windows-build-tools npm package
(npm install --global --production windows-build-tools),
OR
• Install Visual Studio (it has all the C/C++ build tools preconfigured),
OR
• Use the Linux subsystem provided by the latest Windows build. With
that, follow the LINUX instructions above.
Let’s create our first file for the native extension. We can either use
the .cc extension that means it’s C with classes, or the .cpp
extension which is the default for C++. The Google Style Guide
recommends .cc, so I’m going to stick with it.
First, let's see the file as a whole; after that, I'm going to explain it
to you line by line!
#include <node.h>

int numberOfCalls = 0;

// WhoAmI (shown below) is also part of this file
void Increment(const v8::FunctionCallbackInfo<v8::Value>& args) {
  v8::Isolate* isolate = args.GetIsolate();
  if (!args[0]->IsNumber()) {
    isolate->ThrowException(v8::Exception::TypeError(
      v8::String::NewFromUtf8(isolate, "Argument must be a number")));
    return;
  }
  double argsValue = args[0]->NumberValue();
  numberOfCalls += argsValue;
  auto currentNumberOfCalls =
    v8::Number::New(isolate, static_cast<double>(numberOfCalls));
  args.GetReturnValue().Set(currentNumberOfCalls);
}

void Initialize(v8::Local<v8::Object> exports) {
  NODE_SET_METHOD(exports, "whoami", WhoAmI);
  NODE_SET_METHOD(exports, "increment", Increment);
}

NODE_MODULE(module_name, Initialize)
#include <node.h>
void WhoAmI(const v8::FunctionCallbackInfo<v8::Value>& args) {
  v8::Isolate* isolate = args.GetIsolate();
  auto message = v8::String::NewFromUtf8(isolate, "I'm a Node Hero!");
  args.GetReturnValue().Set(message);
}
function () {
var a = 1;
} // SCOPE
C++ has built-in types for storing integers and strings, but JavaScript
only understands its own v8:: type objects. As long as we are in
the scope of the C++ world, we are free to use the ones built into C++,
but when we’re dealing with JavaScript objects and interoperability
with JavaScript code, we have to transform C++ types into ones
that are understood by the JavaScript context. These are the types
that are exposed in the v8:: namespace like v8::String or
v8::Object.
void WhoAmI(const v8::FunctionCallbackInfo<v8::Value>& args) {
  v8::Isolate* isolate = args.GetIsolate();
  auto message = v8::String::NewFromUtf8(isolate, "I'm a Node Hero!");
  args.GetReturnValue().Set(message);
}
if (!args[0]->IsNumber()) {
  isolate->ThrowException(v8::Exception::TypeError(
    v8::String::NewFromUtf8(isolate, "Argument must be a number")));
  return;
}
In C++, with the V8 API, it looks like this:

isolate->ThrowException(v8::Exception::Error(
  v8::String::NewFromUtf8(isolate, "Counter went through the roof!")));

where the isolate is the current scope, whose reference we first have to
grab via v8::Isolate* isolate = args.GetIsolate();.
auto currentNumberOfCalls =
v8::Number::New(isolate, static_cast<double>(numberOfCalls));
exports.whoami = WhoAmI
All C++ modules have to register themselves into the node module
system. Without these lines, you won’t be able to access your
module from JavaScript. If you accidentally forget to register your
module, it will still compile, but when you’re trying to access it from
JavaScript you’ll get the following exception:
module.js:597
  return process.dlopen(module, path._makeLong(filename));
                 ^
Error: Module did not self-register.
From now on when you see this error you’ll know what to do.
{
  "targets": [
    {
      "target_name": "addon",
      "sources": [ "example.cc" ]
    }
  ]
}
npm install will take care of the rest. You can also use
node-gyp on its own by installing it globally on your system with
npm install node-gyp -g.
Now that we have the C++ part ready, the only thing remaining is to
get it working from within our Node.js code. Calling these addons
is seamless thanks to node-gyp.
It's just a require away.
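Assuming the target_name addon from the binding.gyp above, node-gyp's
default output directory, and that both whoami and increment were
registered in Initialize, loading the addon could look like this sketch:

// build/Release is where node-gyp puts the compiled addon by default
const addon = require('./build/Release/addon')

console.log(addon.whoami())      // I'm a Node Hero!
console.log(addon.increment(10)) // 10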
This approach works, but it can get a little bit tedious to specify
paths every time, and we all know that relative paths are just hard to
work with. There is a module to help us deal with this problem.
We can require the bindings module, and it will expose all the
.node native extensions that we've specified in the binding.gyp
file's target_name.
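With the bindings package installed (npm install bindings --save), the
require simplifies to something like:

// bindings locates the compiled addon wherever node-gyp placed it
const addon = require('bindings')('addon')

console.log(addon.whoami()) // I'm a Node Hero!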
To use NaN, we have to rewrite parts of our application. First, let's
install it with npm install nan --save. Then we have to add the
following lines to the targets field in our binding.gyp. This will
make it possible to include the NaN header file in our program and use
NaN's functions.
{
  "targets": [
    {
      "include_dirs": [
        "<!(node -e \"require('nan')\")"
      ],
      "target_name": "addon",
      "sources": [ "example.cc" ]
    }
  ]
}
We can replace some of the v8 types with NaN's abstractions
in our sample application. It provides helper methods for the
call arguments and makes working with v8 types a much better
experience.
The first thing you'll probably notice is that we no longer need
explicit access to the JavaScript scope via
v8::Isolate* isolate = args.GetIsolate(); - NaN handles
that automatically for us. Its types hide the bindings to the current
scope, so we don't have to bother with them.
#include <nan.h>

void Increment(const Nan::FunctionCallbackInfo<v8::Value>& args) {
  if (!args[0]->IsNumber()) {
    Nan::ThrowError("Argument must be a number");
    return;
  }
  double argsValue = args[0]->NumberValue();
  numberOfCalls += argsValue;
  auto currentNumberOfCalls = Nan::New<v8::Number>(numberOfCalls);
  args.GetReturnValue().Set(currentNumberOfCalls);
}

NODE_MODULE(addon, Initialize)
There is one more small tweak we can make, and that is to use the
provided macros of NaN.
Macros are snippets of code that the compiler expands
when compiling the code. More on macros can be found in this
documentation. We have already been using one of these macros,
NODE_MODULE, but NaN has a few others that we can include as well.
These macros will save us a bit of time when creating our native
extensions.
#include <nan.h>

int numberOfCalls = 0;

NAN_METHOD(WhoAmI) {
  auto message = Nan::New<v8::String>("I'm a Node Hero!").ToLocalChecked();
  info.GetReturnValue().Set(message);
}

NAN_METHOD(Increment) {
  if (!info[0]->IsNumber()) {
    Nan::ThrowError("Argument must be a number");
    return;
  }
  double infoValue = info[0]->NumberValue();
  numberOfCalls += infoValue;
  auto currentNumberOfCalls = Nan::New<v8::Number>(numberOfCalls);
  info.GetReturnValue().Set(currentNumberOfCalls);
}

NAN_MODULE_INIT(Initialize) {
  NAN_EXPORT(target, WhoAmI);
  NAN_EXPORT(target, Increment);
}

NODE_MODULE(addon, Initialize)
The first macro, NAN_METHOD, saves us the burden of typing the long
method signature; the compiler includes it for us when it expands the
macro. Take note that if you use macros, you'll have to use the naming
provided by the macro itself - so now, instead of args, the arguments
object will be called info, and we have to change that everywhere.
The last macro is NAN_EXPORT, which sets our module's interface.
Note that we cannot specify the objects' keys in this macro; it
assigns them their respective names.
module.exports = {
Increment,
WhoAmI
}
If you'd like to use this with our previous example, make sure you
change the function names to uppercase, like this:

'use strict'

// loading the compiled addon via the bindings helper
// (an assumption about how the addon was built)
const addon = require('bindings')('addon')

console.log(addon.WhoAmI())      // I'm a Node Hero!
console.log(addon.Increment(10)) // 10
Example Repository
We've created a repository with all the code included in this post.
The repository is under Git version control, and available on GitHub.
Each of the steps has its own branch: master is the first example,
nan is the second one, and the final step's branch is called macros.
Conclusion
I hope you had as much fun following along as we had writing this
book. We highly recommend getting into at least a bit of C/C++ to
understand the lower levels of the platform itself. You'll surely find
something of interest. :)