NodeJS Book
2. What is Node.js?
In this chapter, I:
- describe the Node.js event loop and the premise behind asynchronous I/O
- go through an example of how context switches are made between V8 and Node

Node - or Node.js, as it is called to distinguish it from other "nodes" - is an event-driven I/O framework for the V8 JavaScript engine. Node.js allows Javascript to be executed on the server side, and it uses the wicked fast V8 Javascript engine which was developed by Google for the Chrome browser. The basic philosophy of Node.js is:
- Non-blocking I/O - every I/O call must take a callback, whether it is to retrieve information from disk, network or another process.
- Built-in support for the most important protocols (HTTP, DNS, TLS).
- Low-level. Do not remove functionality present at the POSIX layer. For example, support half-closed TCP connections.
- Stream everything; never force the buffering of data.

Node.js is different from client-side Javascript in that it removes certain things, like DOM manipulation, and adds support for evented I/O, processes, streams, HTTP, SSL, DNS, string and buffer processing and C/C++ addons. Let's skip the boring general buzzword bingo introduction and get to the meat of the matter - how does Node run your code?
The Event Loop - understanding how Node executes Javascript code
The event loop is a mechanism which allows you to specify what happens when a particular event occurs. This might be familiar to you from writing client-side Javascript, where a button might have an onClick event. When the button is clicked, the code associated with the onClick event is run. Node simply extends this idea to I/O operations: when you start an operation like reading a file, you can pass control back to Node and have your code run when the data has been read. For example:

// read the file /etc/passwd, and call console.log on the returned data
fs.readFile('/etc/passwd', function(err, data){
  console.log(data);
});

You can think of the event loop as a simple list of tasks (code) bound to events. When an event happens, the code/task associated with that event is executed. Remember that all of your code in Node is running in a single process. There is no parallel execution of Javascript code that you write - you can only be running a single piece of code at any time. Consider the following code, in which:
1. We set a function to be called after 1000 milliseconds using setTimeout() and then
2. start a loop that blocks for 4 seconds.

What will happen?
// set function to be called after 1 second
setTimeout(function() {
  console.log('Timeout ran at ' + new Date().toTimeString());
}, 1000);

// store the start time
var start = new Date();
console.log('Enter loop at: '+start.toTimeString());

// run a loop for 4 seconds
var i = 0;
// increment i while (current time < start time + 4000 ms)
while(new Date().getTime() < start.getTime() + 4000) {
  i++;
}
console.log('Exit loop at: '+new Date().toTimeString()+'. Ran '+i+' iterations.');

Because your code executes in a single process, the output looks like this:

Enter loop at: 20:04:50 GMT+0300 (EEST)
Exit loop at: 20:04:54 GMT+0300 (EEST). Ran 3622837 iterations.
Timeout ran at 20:04:54 GMT+0300 (EEST)

Notice how the setTimeout function is only triggered after four seconds. This is because Node cannot and will not interrupt the while loop. The event loop is only used to determine what to do next when the execution of your code finishes, which in this case is after four seconds of forced waiting. If you had a CPU-intensive task that takes four seconds to complete, then a Node server would not be able to respond to other requests during those four seconds, since the event loop is only checked for new tasks once your code finishes.

Some people have criticized Node's single process model because it is possible to block the current thread of execution as shown above. However, the alternative - using threads and coordinating their execution - requires somewhat intricate coding to work and is only useful if CPU cycles are the main bottleneck. In my view, Node is about taking a simple idea (single-process event loops), and seeing how far one can go with it. Even with a single process model, you can move CPU-intensive work to other background processes, for example by setting up a queue which is processed by a pool of workers, or by load balancing over multiple processes. If you are performing CPU-bound work, then the only real solutions are to either figure out a better algorithm (to use less CPU) or to scale to multiple cores and multiple machines (to get more CPUs working on the problem).

The premise of Node is that I/O is the main bottleneck of many (if not most) tasks. A single I/O operation can take millions of CPU cycles, and in traditional, non-event-loop-based frameworks the execution is blocked for that time. In Node, I/O operations such as reading a file are performed asynchronously. This is simply a fancy way of saying that you can pass control back to the event loop when you are performing I/O, like reading a file, and specify the code you want to run when the data is available using a callback function. For example:

setTimeout(function() {
  console.log('setTimeout at '+new Date().toTimeString());
}, 1000);

require('fs').readFile('/etc/passwd', function(err, result) {
  console.log(result);
});

Here, we are reading a file using an asynchronous function, fs.readFile(), which takes as arguments the name of the file and a callback function. When Node executes this code, it starts the I/O operation in the background. Once the execution has passed over fs.readFile(), control is returned back to Node, and the event loop gets to run. When the I/O operation is complete, the callback function is executed, passing the data from the file as the second argument. If reading the file takes longer than 1 second, then the function we set using setTimeout will be run after 1 second - before the file reading is completed.
In Node.js, you aren't supposed to worry about what happens in the backend: just use callbacks when you are doing I/O, and you are guaranteed that your code is never interrupted and that doing I/O will not block other requests.
Having asynchronous I/O is good, because I/O is more expensive than most code and we should be doing something better than just waiting for I/O. The event loop is simply a way of coordinating what code should be run during I/O, which executes whenever your code finishes executing. More formally, an event loop is an entity that handles and processes external events and converts them into callback invocations. By making calls to the asynchronous functions in Node's core libraries, you specify what code should be run once the I/O operation is complete. You can think of I/O calls as the points at which Node.js can switch from executing one request to another. At an I/O call, your code saves the callback and returns control to the Node runtime environment. The callback will be called later when the data actually is available.

Of course, on the backend - invisible to you as a Node developer - there may be thread pools and separate processes doing work. However, these are not explicitly exposed to your code, so you don't need to worry about them beyond knowing that I/O interactions - e.g. with the database, or with other processes - will be asynchronous from the perspective of each request, since the results from those threads are returned via the event loop to your code. Compared to the non-evented multithreaded approach (which is used by servers like Apache and most common scripting languages), there are a lot fewer threads and less thread overhead, since threads aren't needed for each connection; just when you absolutely positively must have something else running in parallel, and even then the management is handled by Node.js.

Other than I/O calls, Node.js expects that all requests return quickly; e.g. CPU-intensive work should be split off to another process with which you can interact as with events, or by using an abstraction such as WebWorkers (which will be supported in the future). This (obviously) means that you can't parallelize your code without another process in the background with which you interact asynchronously. Node provides the tools to do this, but more importantly makes working in an evented, asynchronous manner easy.
When you navigate your browser to http://localhost:8080/, the Node.js runtime receives an event which indicates that a new client has connected to the server. It searches the internal list of callbacks to find the callback function we have set previously to respond to new client requests, and executes it using V8 in the execution context of server.js.

[V8 engine running the callback in the server.js context] [Node.js runtime]

When the callback is run, it receives two parameters which represent the client request (the first parameter, request), and the response (the second parameter). The callback calls response.end(), passing the variable content and instructing the response to be closed after sending that data back. Calling response.end() causes some core library code to be run which writes the data back to the client. Finally, when the callback finishes, the control is returned back to the Node.js runtime:

[Node.js runtime (waiting for client request to run callback)]

As you can see, whenever Node.js is not executing code, the runtime checks for events (more accurately, it uses platform-native APIs which allow it to be activated when events occur). Whenever control is passed to the Node.js runtime, another event can be processed. The event could be from an HTTP client connection, or perhaps from a file read. Since there is only one process, there is no parallel execution of Javascript code. Even though you may have several evented I/O operations with different callbacks ongoing, only one of them will have its Node/Javascript code run at a time (the rest will be activated whenever they are ready and no other JS code is running).

The client (your web browser) will receive the data and interpret it as HTML. The alert() call in the Javascript tag in the returned data will be run in your web browser, and the HTML containing Hello World will be displayed. It is important to realize that just because both the server and the client are running Javascript, there is no special connection - each has its own Javascript variables, functions and context. The data returned from the client request callback is just data and there is no automatic sharing or built-in ability to call functions on the server without issuing a server request.

However, because both the server and the client are written in Javascript, you can share code. And even better, since the server is a persistent program, you can build programs that have long-term state - unlike in scripting languages like PHP, where the script is run once and then exits, Node has its own internal HTTP server which is capable of saving the state of the program and resuming it quickly when a new request is made.
The difference between request-response (simple polling), long polling and sockets

To implement long polling, we need two things:
1. Some sort of data payload. In our case, this will be a chat message.
2. Some way of knowing which messages are new to our client. In our case, we will use a simple counter to know which messages are new.

The client will be a simple HTML page which uses jQuery to perform the long polling calls, while the server will be a Node.js server. There are three cases we need to handle:
1. Case 1: New messages are available when the client polls. The server should check its message list against the counter received from the client. If the server has messages that are newer than the counter, the server should return those messages up to the current state as well as the current count.
2. Case 2: No new messages are available when the client polls. The server should store the client request into the list of pending requests, and not respond until a new message arrives.
3. Case 3: A client sends a new message. The server should parse the message, add it to the message list and release all pending requests, sending the message to them.

These are illustrated below:
var http = require('http'),
    url = require('url'),
    fs = require('fs');

In addition, we need storage for the messages as well as pending clients:

var messages = ["testing"];
var clients = [];

We can create a server using http.createServer(). This function takes a callback function as an argument, and calls it on each request with two parameters: the first parameter is the request, while the second parameter is the response. Refer to nodejs.org for more information on the http API. We will get into more detail in the later chapters.

Let's create a simple server which returns Hello World:

http.createServer(function (req, res) {
  res.end("Hello world");
}).listen(8080, 'localhost');
console.log('Server running.');

If you run the code above using node server.js, and make a request by pointing your browser to http://localhost:8080/, you will get a page containing "Hello World". This is not particularly interesting; however, we have now created our first server.

Let's make the server return a file which will contain our client code. The main reason for doing this is that browsers enforce a same-origin policy for security reasons, which makes long polling complicated unless the client comes from the same URL as we will be using for the long polling. This can be done using the FS API:

http.createServer(function (req, res) {
  fs.readFile('./index.html', function(err, data) {
    res.end(data);
  });
}).listen(8080, 'localhost');
console.log('Server running.');

We read the file using the asynchronous function fs.readFile. When it completes, it runs the inner function, which calls res.end() with the content of the file. This allows us to send back the content of the index.html file in the same directory as server.js.
We use jQuery's getJSON() function, which makes an HTTP GET call and parses the resulting data from the JSON format. The first argument is the URL to get, and the second parameter is the function which handles the returned response.

// Client code
var counter = 0;
var poll = function() {
  $.getJSON('/poll/'+counter, function(response) {
    counter = response.count;
    var elem = $('#output');
    elem.text(elem.text() + response.append);
    poll();
  });
}
poll();

We maintain a global counter, which starts at zero and is passed in the URL to the server. The first request will be to /poll/0, with subsequent requests incrementing that counter to keep track of which messages we have already received. Once the message is received, we update the counter on the client side, append the message text to the textarea with the ID #output, and finally initiate a new long polling request by calling poll() again. To start the polling for the first time, we call poll() at the end of the code.
On the server side, we check whether the number of messages is greater than the counter value, and if it is, we will immediately return by using res.end(). Because we are sending data as JSON, we create an object with the "count" and "append" properties and encode it into a string using JSON.stringify. This JSON message contains the current count on the server side (which is the same as messages.length) and all the messages starting from count (using the slice function) joined together (with newlines separating the messages).

If the count is equal to or greater than the current number of messages, then we do not do anything. The client request will remain pending, and we will store the response object into the clients array using push(). Once this is done, our server goes back to waiting for a new message to arrive, while the client request remains open.
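A minimal sketch of that /poll/ branch of the request handler, consistent with the description above (the exact parsing of the counter from the URL is an assumption made for illustration):

} else if(url_parts.pathname.substr(0, 6) == '/poll/') {
  // parse the counter from the URL, e.g. /poll/0 (assumed parsing, for illustration)
  var count = parseInt(url_parts.pathname.substr(6), 10) || 0;
  if(messages.length > count) {
    // case 1: new messages are available - return them immediately as JSON
    res.end(JSON.stringify({
      count: messages.length,
      append: messages.slice(count).join("\n") + "\n"
    }));
  } else {
    // case 2: nothing new - keep the request pending until a message arrives
    clients.push(res);
  }
}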
3.4 Implementing message receiving and broadcasting on the server side
Finally, let's implement the message receiving functionality on the server side. Messages are received via HTTP GET requests to the /msg/ path, for example: /msg/Hello%20World. This allows us to skip writing more client code for making these requests (easy, but unnecessary).

} else if(url_parts.pathname.substr(0, 5) == '/msg/') {
  // message receiving
  var msg = unescape(url_parts.pathname.substr(5));
  messages.push(msg);
  while(clients.length > 0) {
    var client = clients.pop();
    client.end(JSON.stringify( { count: messages.length, append: msg+"\n" }));
  }
  res.end();
}

We decode the url-encoded message using unescape(), then we push the message to the messages array. After this, we notify all pending clients by continuously pop()ing the clients array until it is empty. Each pending client request receives the current message. Finally, the pending request is terminated.
We will cover more advanced ways of structuring your code that help you in writing more complex applications.
Using apply: foo.apply(context, [myArgs]); - this is set to context.
Constructor with new: var newFoo = new Foo(); - this is set to the new instance (e.g. newFoo).
(Where none of these apply: this keeps the value of this in the parent context.)
var obj = { id: "An object", f1: function() { console.log(this); } }; obj.f1(); As you can see, this refers to the current object, as you might expect.
Context changes
As I noted earlier, the value of this is not fixed - it is determined by how the function is called. In other words, the
value of this is determined at the time the function is called, rather than being fixed to some particular value. This causes problems (pun intended) when we want to defer calling a function. For example, the following won't work:

var obj = {
  id: "xyz",
  printId: function() {
    console.log('The id is '+ this.id + ' '+ this.toString());
  }
};
setTimeout(obj.printId, 100);

Why doesn't this work? Well, for the same reason this does not work:

var obj = {
  id: "xyz",
  printId: function() {
    console.log('The id is '+ this.id + ' '+ this.toString());
  }
};
var callback = obj.printId;
callback();

Since the value of this is determined at call time - and we are not calling the function using the "object.method" notation - "this" refers to the global object, which is not what we want. In "setTimeout(obj.printId, 100);", we are passing the value of obj.printId, which is a function. When that function later gets called, it is called as a standalone function - not as a method of an object. To get around this, we can create a function which maintains a reference to obj, which makes sure that this is bound correctly:

var obj = {
  id: "xyz",
  printId: function() {
    console.log('The id is '+ this.id + ' '+ this.toString());
  }
};
setTimeout(function() { obj.printId() }, 100);
var callback = function() { obj.printId() };
callback();

A pattern that you will see used frequently is to store the value of this at the beginning of a function to a variable called self, and then using self in the callback in place of this:

var obj = {
  items: ["a", "b", "c"],
  process: function() {
    var self = this; // assign this to self
    this.items.forEach(function(item) {
      // here, use the original value of this!
      self.print(item);
    });
  },
  print: function(item) {
    console.log('*' + item + '*');
  }
};
obj.process();

Because self is an ordinary variable, it will contain the value of this when the first function was called - no matter how or when the callback function passed to forEach() gets called. If we had used "this" instead of "self" in the
callback function, it would have referred to the wrong object and the call to print() would have failed.
var a = "foo"; function parent() { var b = "bar"; function nested() { console.log(a); console.log(b); } nested(); } parent(); 2. non- nested functions can only access the topmost, global variables: var a = "foo"; function parent() { var b = "bar"; } function nested() { console.log(a); console.log(b); } parent(); nested(); 2. Defining functions creates new scopes: 1. and the default behavior is to access previous scope: var a = "foo"; function grandparent() { var b = "bar"; function parent() { function nested() { console.log(a); console.log(b); } nested(); } parent(); } grandparent(); 2. but inner function scopes can prevent access to a previous scope by defining a variable with the same name: var a = "foo"; function grandparent() { var b = "bar"; function parent() { var b = "b redened!"; function nested() { console.log(a); console.log(b); } nested(); } parent(); } grandparent(); 3. Some functions are executed later, rather than immediately. You can emulate this yourself by storing but not executing functions, see example #3. What we would expect, based on experience in other languages, is that in the for loop, calling a the function would
result in a call-by-value (since we are referencing a primitive - an integer) and that function calls would run using a copy of that value at the time when the part of the code was passed over (e.g. when the surrounding code was executed). That's not what happens, because we are using a closure/nested anonymous function: a variable referenced in a nested function/closure is not a copy of the value of the variable - it is a live reference to the variable itself and can access it at a much later stage. So while the reference to i is valid in both examples 2 and 3, they refer to the value of i at the time of their execution - which is on the next event loop, after the loop has run - which is why they get the value 5.

Functions can create new scopes but they do not have to. The default behavior allows us to refer back to the previous scope (all the way up to the global scope); this is why code executing at a later stage can still access i. Because no variable i exists in the current scope, the i from the parent scope is used; because the parent has already executed, the value of i is 5.

Hence, we can fix the problem by explicitly establishing a new scope every time the loop is executed, then referring back to that new inner scope later. The only way to do this is to use an (anonymous) function plus explicitly defining a variable in that scope. We can pass the value of i from the previous scope to the anonymous nested function, but then explicitly establish a new variable j in the new scope to hold that value for future execution of nested functions:
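A minimal sketch of that fix (the loop bounds and setTimeout delay are illustrative):

for(var i = 0; i < 5; i++) {
  // an anonymous function creates a new scope on each iteration;
  // j holds the value that i had when this iteration ran
  (function(j) {
    setTimeout(function() {
      console.log(j); // prints 0, 1, 2, 3, 4 instead of 5 five times
    }, 100);
  })(i);
}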
When you are iterating through the contents of an array, you should use Array.forEach(), as it passes values as function arguments, avoiding this problem. However, in some cases you will still need to use the "create an anonymous function" technique to explicitly establish new scopes.
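For example, a small sketch of deferring work for each array element without needing an extra anonymous-function scope (the values and delay are illustrative):

[0, 1, 2, 3, 4].forEach(function(value) {
  // each callback invocation receives its own value argument
  setTimeout(function() {
    console.log(value);
  }, 100);
});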
Array.isArray(obj): Returns true if a variable is an array, false if it is not.
indexOf(searchElement[, fromIndex]): Returns the first (least) index of an element within the array equal to the specified value, or -1 if none is found. The search can optionally begin at fromIndex.
lastIndexOf(searchElement[, fromIndex]): Returns the last (greatest) index of an element within the array equal to the specified value, or -1 if none is found. The array is searched backwards, starting at fromIndex.

The indexOf() and lastIndexOf() functions are very useful for searching an array for a particular value, if necessary. For example, to check whether a particular value is present in an array:

function process(argv) {
  if(argv.indexOf('help') > -1) {
    console.log('This is the help text.');
  }
}
process(['foo', 'bar', 'help']);

However, be aware that indexOf() uses the strict comparison operator (===), so the following will not work:

var arr = ["1", "2", "3"];
// Search the array of keys
console.log(arr.indexOf(2)); // returns -1

This is because we defined an array of Strings, not Integers. The strict equality operator used by indexOf takes the type into account, like this:

console.log(2 == "2"); // true
console.log(2 === "2"); // false
var arr = ["1", "2", "3"];
// Search the array of keys
console.log(arr.indexOf(2)); // returns -1
console.log(arr.indexOf("2")); // returns 1

Notably, you might run into this problem when you use indexOf() on the return value of Object.keys(), since keys are always strings:

var lookup = { 12: { foo: 'b'}, 13: { foo: 'a' }, 14: { foo: 'c' }};
console.log(Object.keys(lookup).indexOf(12) > -1); // false
console.log(Object.keys(lookup).indexOf(''+12) > -1); // true
forEach(callback[, thisObject]): Calls a function for each element in the array.
map(callback[, thisObject]): Creates a new array with the results of calling a provided function on every element in this array.
filter(), map() and forEach() all call a callback with every value of the array. This can be useful for performing various operations on the array. Again, the callback is invoked with three arguments: the value of the element, the index of the element, and the Array object being traversed. For example, you might apply a callback to all items in the array:
var names = ['a', 'b', 'c'];
names.forEach(function(value) {
  console.log(value);
});
// prints a b c

or you might filter based on a criterion:

var items = [ { id: 1 }, { id: 2 }, { id: 3 }, { id: 4 } ];
// only include items with even id's
items = items.filter(function(item){
  return (item.id % 2 == 0);
});
console.log(items);
// prints [ { id: 2 }, { id: 4 } ]

If you want to accumulate a particular value - like the sum of elements in an array - you can use the reduce() functions:

reduce(callback[, initialValue]): Apply a function simultaneously against two values of the array (from left-to-right) so as to reduce it to a single value. (IE9+)
reduceRight(callback[, initialValue]): Apply a function simultaneously against two values of the array (from right-to-left) so as to reduce it to a single value. (IE9+)
reduce() and reduceRight() apply a function against an accumulator and each value of the array. The callback receives four arguments: the initial value (or value from the previous callback call), the value of the current element, the current index, and the array over which iteration is occurring (e.g. arr.reduce(function(previousValue, currentValue, index, array){ ... })).
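For instance, a short sketch of summing an array with reduce():

var numbers = [1, 2, 3, 4, 5];
var sum = numbers.reduce(function(previousValue, currentValue) {
  return previousValue + currentValue;
}, 0); // 0 is the initial value of the accumulator
console.log(sum); // prints 15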
some() and every() allow for a condition to be specified which is then tested against all the values in the array. The callback is invoked with three arguments: the value of the element, the index of the element, and the Array object being traversed. For example, to check whether a particular string contains at least one of the tokens in an array, use some(): var types = ['text/html', 'text/css', 'text/javascript']; var string = 'text/javascript; encoding=utf-8'; if (types.some(function(value) { return string.indexOf(value) > -1; })) { console.log('The string contains one of the content types.'); }
splice(index, howMany[, element1[, ..., elementN]]): Adds and/or removes elements from an array.
reverse(): Reverses the order of the elements of an array - the first becomes the last, and the last becomes the first.
These functions are part of ECMAScript 3, so they are available on all modern browsers. var a = [ 'a', 'b', 'c' ]; var b = [ 1, 2, 3 ]; console.log( a.concat(['d', 'e', 'f'], b) ); console.log( a.join('! ') ); console.log( a.slice(1, 3) ); console.log( a.reverse() ); console.log( ' --- '); var c = a.splice(0, 2); console.log( a, c ); var d = b.splice(1, 1, 'foo', 'bar'); console.log( b, d );
hasOwnProperty(prop) Returns a boolean indicating whether the object has the specified property. This method can be used to determine whether an object has the specified property as a direct property of that object; unlike the in operator, this method does not check down the object's prototype chain. prop in objectName The in operator returns true if the specified property is in the specified object. It is useful for checking for properties which have been set to undefined, as it will return true for those as well.
You can use this to count the number of properties in an object which you are using as a hash table: // returns array of keys var keys = Object.keys({ a: 'foo', b: 'bar'}); // keys.length is 2 console.log(keys, keys.length);
To iterate over the keys in sorted order, use sort():

var obj = { x: '1', a: '2', b: '3'};
var items = Object.keys(obj);
items.sort(); // sort the array of keys
items.forEach(function(item) {
  console.log(item + '=' + obj[item]);
});
var obj = { a: "value", b: false }; // dierent results when the property is from an object higher up in the prototype chain console.log( !!obj.toString ); console.log( 'toString' in obj ); console.log( obj.hasOwnProperty('toString') ); (Note: All objects have a toString method, derived from Object).
Function.call
Calls a function with a given this value and arguments provided individually.
Function.apply: Applies the method of another object in the context of a different object (the calling object); arguments can be passed as an Array object.

As you can see, both call() and apply() allow us to specify what the value of this should be. The difference between the two is how they pass on additional arguments:

function f1(a, b) {
  console.log(this, a, b);
}
var obj1 = { id: "Foo"};
f1.call(obj1, 'A', 'B'); // The value of this is changed to obj1
var obj2 = { id: "Bar"};
f1.apply(obj2, [ 'A', 'B' ]); // The value of this is changed to obj2

The syntax of call() is identical to that of apply(). The difference is that call() uses the actual arguments passed to it (after the first argument), while apply() takes just two arguments: thisArg and an array of arguments.
JSON.parse() can be used to convert JSON data to a Javascript Object or Array:

// returns an Object with two properties
var obj = JSON.parse('{"hello": "world", "data": [ 1, 2, 3 ] }');
console.log(obj.data);

JSON.stringify() does the opposite:

var obj = { hello: 'world', data: [ 1, 2, 3 ] };
console.log(JSON.stringify(obj));

The optional space parameter in JSON.stringify is particularly useful in producing readable output from complex objects. The reviver and replacer parameters are rarely used. They expect a function which takes the key and value of each value as an argument, and that function is applied to the JSON input before returning it.
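For example, passing a space argument of 2 indents the output (a small sketch):

var obj = { hello: 'world', data: [ 1, 2, 3 ] };
// the third argument controls indentation, making the output human-readable
console.log(JSON.stringify(obj, null, 2));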
run time lookups from the prototype property rather than statically defined class constructs. The prototype chain lookup mechanism is the essence of prototypal inheritance. There are further nuances to the system. Here are my recommendations on what to read: Let's look at some applied patterns next:
Class pattern
// Constructor
function Foo(bar) {
  // always initialize all instance properties
  this.bar = bar;
  this.baz = 'baz'; // default value
}
// class methods
Foo.prototype.fooBar = function() { };
// export the class
module.exports = Foo;

Instantiating a class is simple:

// constructor call
var object = new Foo('Hello');

Note that I recommend using function Foo() { ... } for constructors instead of var Foo = function() { ... }. The main benefit is that you get better stack traces from Node when you use a named function. Generating a stack trace from an object with an unnamed constructor function:

var Foo = function() { };
Foo.prototype.bar = function() { console.trace(); };
var f = new Foo();
f.bar();

... produces something like this:

Trace:
    at [object Object].bar (/home/m/mnt/book/code/06_oop/constructors.js:3:11)
    at Object.<anonymous> (/home/m/mnt/book/code/06_oop/constructors.js:7:3)
    at Module._compile (module.js:432:26)
    at Object..js (module.js:450:10)
    at Module.load (module.js:351:31)
    at Function._load (module.js:310:12)
    at Array.0 (module.js:470:10)
    at EventEmitter._tickCallback (node.js:192:40)

... while using a named function

function Baz() { };
Baz.prototype.bar = function() { console.trace(); };
var b = new Baz();
b.bar();

... produces a stack trace with the name of the class:
Trace:
    at Baz.bar (/home/m/mnt/book/code/06_oop/constructors.js:11:11)
    at Object.<anonymous> (/home/m/mnt/book/code/06_oop/constructors.js:15:3)
    at Module._compile (module.js:432:26)
    at Object..js (module.js:450:10)
    at Module.load (module.js:351:31)
    at Function._load (module.js:310:12)
    at Array.0 (module.js:470:10)
    at EventEmitter._tickCallback (node.js:192:40)

To add private shared (among all instances of the class) variables, add them to the top level of the module:

// Private variable
var total = 0;
// Constructor
function Foo() {
  // access private shared variable
  total++;
};
// Expose a getter (could also expose a setter to make it a public variable)
Foo.prototype.getTotalObjects = function(){
  return total;
};
If you must implement inheritance, at least avoid using yet another nonstandard implementation / magic function. Here is how you can implement a reasonable facsimile of inheritance in pure ES3 (as long as you follow the rule of never defining properties on prototypes): function Animal(name) { this.name = name; }; Animal.prototype.move = function(meters) { console.log(this.name+" moved "+meters+"m."); }; function Snake() { Animal.apply(this, Array.prototype.slice.call(arguments)); }; Snake.prototype = new Animal(); Snake.prototype.move = function() { console.log("Slithering..."); Animal.prototype.move.call(this, 5); }; var sam = new Snake("Sammy the Python"); sam.move(); This is not the same thing as classical inheritance - but it is standard, understandable Javascript and has the functionality that people mostly seek: chainable constructors and the ability to call methods of the superclass. Or use util.inherits() (from the Node.js core). Here is the full implementation: var inherits = function (ctor, superCtor) { ctor.super_ = superCtor; ctor.prototype = Object.create(superCtor.prototype, { constructor: { value: ctor, enumerable: false } }); }; And a usage example: var util = require('util'); function Foo() { } util.inherits(Foo, EventEmitter); The only real benefit to util.inherits is that you don't need to use the actual ancestor name in the Child constructor. Note that if you define variables as properties of a prototype, you will experience unexpected behavior (e.g. since variables defined on the prototype of the superclass will be accessible in subclasses but will also be shared among all instances of the subclass). As I pointed out with the class pattern, always define all instance variables in the constructor. This forces the properties to exist on the object itself and avoids lookups on the prototype chain for these variables. Otherwise, you might accidentally define/access a variable property defined in a prototype. Since the prototype is shared among all instances, this will lead to the unexpected behavior if the variable is not a primitive (e.g. is an Object or an Array). See the earlier example under "Avoid setting variables as properties of prototypes".
Use mixins
A mixin is a function that adds new functions to the prototype of an object. I prefer to expose an explicit mixin() function to indicate that the class is designed to be mixed into another one:
function Foo() { } Foo.prototype.bar = function() { }; Foo.prototype.baz = function() { }; // mixin - augment the target object with the Foo functions Foo.mixin = function(destObject){ ['bar', 'baz'].forEach(function(property) { destObject.prototype[property] = Foo.prototype[property]; }); }; module.exports = Foo; Extending the Bar prototype with Foo: var Foo = require('./foo.js'); function Bar() {} Bar.prototype.qwerty = function() {}; // mixin Foo Foo.mixin(Bar);
Avoid currying
Currying is a shorthand notation for creating an anonymous function with a new scope that calls another function. In other words, anything you can do using currying can be done using a simple anonymous function and a few variables local to that function. Function.prototype.curry = function() { var fn = this; var args = Array.prototype.slice.call(arguments); return function() { return fn.apply(this, args.concat(Array.prototype.slice.call(arguments, 0))); }; } Currying is intriguing, but I haven't seen a practical use case for it outside of subverting how the this argument works in Javascript. Don't use currying to change the context of a call/the this argument. Use the "self" variable accessed through an anonymous function, since it achieves the same thing but is more obvious. Instead of using currying: function foo(a, b, c) { console.log(a, b, c); } var bar = foo.curry('Hello'); bar('World', '!'); I think that writing: function foo(a, b, c) { console.log(a, b, c); } function bar(b, c) { foo('Hello', b, c); } bar('World', '!'); is more clear.
7. Control flow
In this chapter, I: discuss nested callbacks and control flow in Node introduce three essential async control flow patterns: Series - for running async tasks one at a time
Fully parallel - for running async tasks all at the same time
Limitedly parallel - for running a limited number of async tasks at the same time
walk you through a simple implementation of these control flow patterns and convert the simple implementation into a control flow library that takes callback arguments

When you start coding with Node.js, it's a bit like learning programming the first time. Since you want everything to be asynchronous, you use a lot of callbacks without really thinking about how you should structure your code. It's a bit like being overexcited about the if statement, and using it and only it to write complex programs. One of my first programs in primary school was a text-based adventure where you would be presented with a scenario and a choice. I wrote code until I reached the maximum level of nesting supported by the compiler, which probably was 63 nested if statements.

Learning how to code with callbacks is similar in many ways. If that is the only tool you use, you will create a mess. Enlightenment comes when you realize that this:

async1(function(input, result1) {
  async2(function(result2) {
    async3(function(result3) {
      async4(function(result4) {
        async5(function(output) {
          // do something with output
        });
      });
    });
  });
})

ought to be written as:

myLibrary.doStuff(input, function(output){
  // do something with output
});

In other words, you can and are supposed to think in terms of higher level abstractions. Refactor, and extract functionality into its own module. There can be any number of callbacks between the input that matters and the output that matters; just make sure that you split the functionality into meaningful modules rather than dumping it all into one long chain. Yes, there will still be some nested callbacks. However, more than a couple of levels of nesting should be a code smell - time to think what you can abstract out into separate, small modules. This has the added benefit of making testing easier, because you end up having smaller, hopefully meaningful code modules that provide a single capability.

Unlike in traditional scripting languages based on blocking I/O, managing the control flow of applications with callbacks can warrant specialized modules which coordinate particular work flows: for example, by dealing with the level of concurrency of execution. Blocking on I/O provides just one way to perform I/O tasks: sequentially (well, at least without threads). With Node's "everything can be done asynchronously" approach, we get more options and can choose when to block, when to limit concurrency and when to just launch a bunch of tasks at the same time. Let's look at the most common control flow patterns, and see how we can take something as abstract as control flow and turn it into a small, single purpose module to take advantage of callbacks-as-input.
As you already know, there are two types of API functions in Node.js:
1. asynchronous, non-blocking functions - for example: fs.readFile(filename, [encoding], [callback])
2. synchronous, blocking functions - for example: fs.readFileSync(filename, [encoding])

Synchronous functions return a result:

var data = fs.readFileSync('/etc/passwd');

While asynchronous functions receive the result via a callback (after passing control to the event loop):

fs.readFile('/etc/passwd', function(err, data) { });

Writing synchronous code is not problematic: we can draw on our experience in other languages to structure it appropriately using keywords like if, else, for, while and switch. It's the way we should structure asynchronous calls which is most problematic, because established practices do not help here. For example, we'd like to read a thousand text files. Take the following naive code:

for(var i = 1; i <= 1000; i++) {
  fs.readFile('./'+i+'.txt', function() {
    // do something with the file
  });
}
do_next_part();

This code would start 1000 simultaneous asynchronous file reads, and run the do_next_part() function immediately. This has several problems: first, we'd like to wait until all the file reads are done before going further. Second, launching a thousand file reads simultaneously will quickly exhaust the number of available file handles (a limited resource needed to read files). Third, we do not have a way to accumulate the result for do_next_part().

We need:
- a way to control the order in which the file reads are done
- some way to collect the result data for processing
- some way to restrict the concurrency of the file read operations to conserve limited system resources
- a way to determine when all the reads necessary for do_next_part() are completed

Control flow functions enable us to do this in Node.js. A control flow function is a lightweight, generic piece of code which runs in between several asynchronous function calls and which takes care of the necessary housekeeping to:
1. control the order of execution,
2. collect data,
3. limit concurrency and
4. call the next step in the program.

There are three basic patterns for this.
// Async task (same in all examples in this chapter)
function async(arg, callback) {
  console.log('do something with \''+arg+'\', return 1 sec later');
  setTimeout(function() { callback(arg * 2); }, 1000);
}
// Final task (same in all the examples)
function final() { console.log('Done', results); }

// A simple async series:
var items = [ 1, 2, 3, 4, 5, 6 ];
var results = [];

function series(item) {
  if(item) {
    async( item, function(result) {
      results.push(result);
      return series(items.shift());
    });
  } else {
    return final();
  }
}
series(items.shift());

Basically, we take a set of items and call the series control flow function with the first item. The series function launches one async() operation, and passes a callback to it. The callback pushes the result into the results array and then calls series with the next item in the items array. When the items array is empty, we call the final() function. This results in serial execution of the asynchronous function calls. Control is passed back to the Node event loop after each async operation is started, then returned to our code when the operation is completed.

Characteristics:
- Runs a number of operations sequentially
- Only starts one async operation at a time (no concurrency)
- Ensures that the async functions complete in order

Variations:
- The way in which the result is collected (manual or via a stashing callback)
- How error handling is done (manually in each subfunction, or via a dedicated, additional function)
- Since execution is sequential, there is no need for a final callback

Tags: sequential, no-concurrency, no-concurrency-control
We take every item in the items array and start async operations for each of the items immediately. The async() function is passed a function that stores the result and then checks whether the number of results is equal to the number of items to process. If it is, then we call the final() function. Since this means that all the I/O operations are started in parallel immediately, we need to be careful not to exhaust the available resources. For example, you probably don't want to start thousands of I/O operations, since there are operating system limitations for the number of open file handles. You need to consider whether launching parallel tasks is OK on a case-by-case basis.

Characteristics:
- Runs a number of operations in parallel
- Starts all async operations in parallel (full concurrency)
- No guarantee of order, only that all the operations have been completed

Variations:
- The way in which the result is collected (manual or via a stashing callback)
- How error handling is done (via the first argument of the final function, manually in each subfunction, or via a dedicated, additional function)
- Whether a final callback can be specified

Tags: parallel, full-concurrency, no-concurrency-control
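A minimal sketch of the fully parallel pattern described above, reusing the async() and final() helpers from the series example (this reconstruction is illustrative, not necessarily the original listing):

function async(arg, callback) {
  console.log('do something with \''+arg+'\', return 1 sec later');
  setTimeout(function() { callback(arg * 2); }, 1000);
}
function final() { console.log('Done', results); }

var items = [ 1, 2, 3, 4, 5, 6 ];
var results = [];

items.forEach(function(item) {
  async(item, function(result) {
    results.push(result);
    // once we have as many results as items, all operations have completed
    if(results.length == items.length) {
      final();
    }
  });
});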
7.2.3 Control flow pattern #3: Limited parallel - an asynchronous, parallel, concurrency limited for loop
In this case, we want to perform some operations in parallel, but keep the number of running I/O operations under a set limit:

function async(arg, callback) {
  console.log('do something with \''+arg+'\', return 1 sec later');
  setTimeout(function() { callback(arg * 2); }, 1000);
}
function final() { console.log('Done', results); }

var items = [ 1, 2, 3, 4, 5, 6 ];
var results = [];
var running = 0;
var limit = 2;

function launcher() {
  while(running < limit && items.length > 0) {
    var item = items.shift();
    async(item, function(result) {
      results.push(result);
      running--;
      if(items.length > 0) {
        launcher();
      } else if(running == 0) {
        final();
      }
    });
    running++;
  }
}
launcher();

We start new async() operations until we reach the limit (2). Each async() operation gets a callback which stores the result, decrements the number of running operations, and then checks whether there are items left to process. If yes, then launcher() is run again. If there are no items to process and the current operation was the last running operation, then final() is called. Of course, the criteria for whether or not we should launch another task could be based on some other logic. For example, we might keep a pool of database connections, and check whether "spare" connections are available -
or check server load - or make the decision based on some more complicated criteria.

Characteristics:
- Runs a number of operations in parallel
- Starts a limited number of operations in parallel (partial concurrency, full concurrency control)
- No guarantee of order, only that all the operations have been completed
E.g. an array of callback functions and a final() function. The callback functions get a next() function as their first parameter which they should call when they have completed their async operations. This allows us to use any async function as part of the control flow. The final function is called with a single parameter: an array of arrays with the results from each async call. Each element in the array corresponds to the values passed back from the async function to next(). Unlike the examples in the previous section, these functions store all the results from the callback, not just the first argument - so you can call next(1, 2, 3, 4) and all the arguments are stored in the results array.
Series
This conversion is pretty straightforward. We pass an anonymous function which pushes to results and calls next() again: this is so that we can push the results passed from the callback via arguments immediately, rather than passing them directly to next() and handling them in next().
function series(callbacks, last) {
  var results = [];
  function next() {
    var callback = callbacks.shift();
    if(callback) {
      callback(function() {
        results.push(Array.prototype.slice.call(arguments));
        next();
      });
    } else {
      last(results);
    }
  }
  next();
}
// Example task
function async(arg, callback) {
  var delay = Math.floor(Math.random() * 5 + 1) * 100; // random ms
  console.log('async with \''+arg+'\', return in '+delay+' ms');
  setTimeout(function() { callback(arg * 2); }, delay);
}
function final(results) { console.log('Done', results); }

series([
  function(next) { async(1, next); },
  function(next) { async(2, next); },
  function(next) { async(3, next); },
  function(next) { async(4, next); },
  function(next) { async(5, next); },
  function(next) { async(6, next); }
], final);
Full parallel
Unlike in a series, we cannot assume that the results are returned in any particular order. Because of this we use callbacks.forEach, which passes the index of the callback to the iterator function - and store the result at the same index in the results array. Since the last callback could complete and return its result first, we cannot use results.length, since the length of an array always returns the largest index in the array + 1. So we use an explicit result_count to track how many results we've gotten back.
function fullParallel(callbacks, last) {
  var results = [];
  var result_count = 0;
  callbacks.forEach(function(callback, index) {
    callback( function() {
      results[index] = Array.prototype.slice.call(arguments);
      result_count++;
      if(result_count == callbacks.length) {
        last(results);
      }
    });
  });
}
// Example task
function async(arg, callback) {
  var delay = Math.floor(Math.random() * 5 + 1) * 100; // random ms
  console.log('async with \''+arg+'\', return in '+delay+' ms');
  setTimeout(function() { callback(arg * 2); }, delay);
}
function final(results) { console.log('Done', results); }

fullParallel([
  function(next) { async(1, next); },
  function(next) { async(2, next); },
  function(next) { async(3, next); },
  function(next) { async(4, next); },
  function(next) { async(5, next); },
  function(next) { async(6, next); }
], final);
Limited parallel
This is a bit more complicated, because we need to launch async tasks once other tasks finish, and need to store the result from those tasks back into the correct position in the results array. Details further below.
function limited(limit, callbacks, last) {
  var results = [];
  var running = 1;
  var task = 0;
  function next(){
    running--;
    if(task == callbacks.length && running == 0) {
      last(results);
    }
    while(running < limit && callbacks[task]) {
      var callback = callbacks[task];
      (function(index) {
        callback(function() {
          results[index] = Array.prototype.slice.call(arguments);
          next();
        });
      })(task);
      task++;
      running++;
    }
  }
  next();
}
// Example task
function async(arg, callback) {
  var delay = Math.floor(Math.random() * 5 + 1) * 1000; // random ms
  console.log('async with \''+arg+'\', return in '+delay+' ms');
  setTimeout(function() {
    var result = arg * 2;
    console.log('Return with \''+arg+'\', result '+result);
    callback(result);
  }, delay);
}
function final(results) { console.log('Done', results); }

limited(3, [
  function(next) { async(1, next); },
  function(next) { async(2, next); },
  function(next) { async(3, next); },
  function(next) { async(4, next); },
  function(next) { async(5, next); },
  function(next) { async(6, next); }
], final);

We need to keep two counter values here: one for the next task, and another for the callback function. In the fully parallel control flow we took advantage of [].forEach(), which provides the index of the currently running task in its own scope. Since we cannot rely on forEach() as tasks are launched in small groups, we need to use an anonymous function to get a new scope to hold the original index. This index is used to store the return value from the callback.

To illustrate the problem, I added a longer delay to the return from async() and an additional line of logging which shows when the result from async is returned. At that moment, we need to store the return value to the right index. The anonymous function (function(index) { ... })(task) is needed because if we didn't create a new scope using an anonymous function, we would store the result in the wrong place in the results array (since the value of task might have changed between calling the callback and returning back from the callback). See the chapter on Javascript gotchas for more information on scope rules in JS.
I'll go through the parts of the Node API that you'll use the most when writing web applications. The rest of the API is best looked up from nodejs.org/api/.

Fundamentals: The current chapter and Chapter 9.
Process I/O and V8 VM: Covered in Chapter TODO.
Network I/O: HTTP and HTTPS are covered in Chapter 10.
Terminal/console: REPL is discussed in Chapter TODO.
File system I/O: The file system module is covered in Chapter 11.
Testing and debugging: Coverage TODO.
Any properties assigned to the exports object will be accessible from the return value of the require() function:

var hello = require('./hello.js');
console.log(hello.funcname()); // Print "Hello World"

You can also use module.exports instead of exports:

function funcname() { return 'Hello World'; }
module.exports = { funcname: funcname };

This alternative syntax makes it possible to assign a single object to exports (such as a class). We've previously discussed how you can build classes using prototypal inheritance. By making your classes separate modules, you can easily include them in your application:

// in class.js:
var Class = function() { };
Class.prototype.funcname = function() {...};
module.exports = Class;

Then you can include your file using require() and make a new instance of your class:

// in another file:
var Class = require('./class.js');
var object = new Class(); // create new instance
process.argv: An array containing the command line arguments. The first element will be 'node', the second element will be the name of the JavaScript file. The next elements will be any additional command line arguments.
process.stdin, process.stdout, process.stderr: Streams which correspond to the standard input, standard output and standard error output for the current process.
process.env: An object containing the user environment of the current process.
require.main: When a file is run directly from Node, require.main is set to its module.
The code below will print the values for the current script:

console.log('__filename', __filename);
console.log('__dirname', __dirname);
console.log('process.argv', process.argv);
console.log('process.env', process.env);
if(module === require.main) {
  console.log('This is the main module being run.');
}

require.main can be used to detect whether the module being currently run is the main module. This is useful when you want to do something else when a module is run standalone. For example, I make my test files runnable via node filename.js by including something like this:

// if this module is the script being run, then run the tests:
if (module === require.main) {
  var nodeunit_runner = require('nodeunit-runner');
  nodeunit_runner.run(__filename);
}

process.stdin, process.stdout and process.stderr are briefly discussed in the next chapter, where we discuss readable and writable streams.
Directories as modules
You can organize your modules into directories, as long as you provide a point of entry for Node. The easiest way to do this is to create the directory ./node_modules/mymodulename/, and put an index.js file in that directory. The index.js file will be loaded by default.
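As a small illustration (the module name comes from the text above; the exported greet() function is hypothetical):

// ./node_modules/mymodulename/index.js
module.exports = {
  greet: function() { return 'Hello from mymodulename'; }
};

// elsewhere in your application
var mymodule = require('mymodulename');
console.log(mymodule.greet());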
Alternatively, you can put a package.json file in the mymodulename folder, specifying the name and main file of the module: { "name": "mymodulename", "main": "./lib/foo.js" } This would cause the file ./node_modules/mymodulename/lib/foo.js to be returned from require('mymodulename') . Generally, you want to keep a single ./node_modules folder in the base directory of your app. You can install new modules by adding files or directories to ./node_modules. The best way to manage these modules is to use npm, which is covered briefly in the next section.
8.2 npm
npm is the package manager used to distribute Node modules. I won't cover it in detail here, because the Internet does that already. npm is awesome, and you should use it. Below are a couple of use cases.
One of my favorite features is the ability to use git+ssh URLs to fetch remote git repositories. By specifying a URL like git+ssh://github.com:mixu/nwm.git#master, you can install a dependency directly from a remote git repository. The part after the hash refers to a tag or branch on the repository. To list the installed dependencies, use: npm ls
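As a hedged example, such a URL can be listed as a dependency in package.json (the application name is illustrative; the repository URL is the one mentioned above):

{
  "name": "myapp",
  "dependencies": {
    "nwm": "git+ssh://github.com:mixu/nwm.git#master"
  }
}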
9.1 Timers
The timers library consists of four global functions:

setTimeout(callback, delay, [arg], [...]): Schedule the execution of the given callback after delay milliseconds. Returns a timeoutId for possible use with clearTimeout(). Optionally, you can also pass arguments to the callback.
setInterval(callback, delay, [arg], [...]): Schedule the repeated execution of callback every delay milliseconds. Returns an intervalId for possible use with clearInterval(). Optionally, you can also pass arguments to the callback.
clearTimeout(timeoutId): Prevents a timeout from triggering.
clearInterval(intervalId): Stops an interval from triggering.
These functions can be used to schedule callbacks for execution. The setTimeout function is useful for performing housekeeping tasks, such as saving the state of the program to disk after a particular interval. The same functions are available in all major browsers:
// setting a timeout
setTimeout(function() {
  console.log('Foo');
}, 1000);

// Setting and clearing an interval
var counter = 0;
var interval = setInterval( function() {
  console.log('Bar', counter);
  counter++;
  if (counter >= 3) {
    clearInterval(interval);
  }
}, 1000);

While you can set a timeout or interval using a string argument (e.g. setTimeout('longRepeatedTask', 5000)), this is a bad practice since the string has to be dynamically evaluated (like using the eval() function, which is not recommended). Instead, use a variable or a named function instead of a string.

Remember that timeouts and intervals are only executed when the execution is passed back to the Node event loop, so timings are not necessarily accurate if you have a long-running blocking task. So a long, CPU-intensive task which takes longer than the timeout/interval time will prevent those tasks from being run at their scheduled times.
Adding listeners
EventEmitters allow you to add listeners - callbacks - to any arbitrarily named event (except newListener, which is special in EventEmitter). You can attach multiple callbacks to a single event, providing for flexibility. To add a listener, use EventEmitter.on(event, listener) or EventEmitter.addListener(event, listener) - they both do the same thing:

var obj = new MyClass();
obj.on('someevent', function(arg1) { });

You can use EventEmitter.once(event, listener) to add a callback which will only be triggered once, rather than every time the event occurs. This is a good practice, since you should keep the number of listeners to a minimum (in fact, if you have over 10 listeners, EventEmitter will warn you that you need to call emitter.setMaxListeners).
Triggering events
To trigger an event from your class, use EventEmitter.emit(event, [arg1], [arg2], [...]):
MyClass.prototype.whatever = function() {
  this.emit('someevent', 'Hello', 'World');
};

The emit function takes an unlimited number of arguments, and passes those on to the callback(s) associated with the event. You can remove event listeners using EventEmitter.removeListener(event, listener) or EventEmitter.removeAllListeners(event), which remove either one listener, or all the listeners associated with a particular event.
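Putting the two halves together, here is a minimal self-contained sketch; the class name and event name follow the snippets above, while the greeting strings and listener names are made up:

var util = require('util');
var EventEmitter = require('events').EventEmitter;

// a class that inherits from EventEmitter
function MyClass() {
  EventEmitter.call(this);
}
util.inherits(MyClass, EventEmitter);

// emit an event from a method
MyClass.prototype.whatever = function() {
  this.emit('someevent', 'Hello', 'World');
};

var obj = new MyClass();
// the listener receives the arguments passed to emit()
obj.on('someevent', function(greeting, target) {
  console.log(greeting + ', ' + target + '!');
});
obj.whatever(); // prints: Hello, World!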
9.3 Streams
We've discussed the three main alternatives when it comes to controlling execution: Sequential, Full Parallel and Parallel. Streams are an alternative way of accessing data from various sources such as the network (TCP/UDP), files, child processes and user input. In doing I/O, Node offers us multiple options for accessing the data:

Synchronous, fully buffered: readFileSync()
Asynchronous, fully buffered: readFile()
Asynchronous, partially buffered (streaming): read(), createReadStream()

The difference between these is how the data is exposed, and the amount of memory used to store the data.
Streams
However, in most cases we only want to read/write through the data once, and in one direction (forward). Streams are an abstraction over partially buffered data access that simplifies this kind of data processing. Streams return smaller parts of the data (using a Buffer), and trigger a callback when new data is available for processing. Streams are EventEmitters. If our 1 GB file needed, for example, to be processed in some way once, we could use a stream and process the data as soon as it is read. This is useful, since we do not need to hold all of the data in memory in some buffer: after processing, we no longer need to keep the data around for this kind of application. The Node stream interface consists of two parts: Readable streams and Writable streams. Some streams are both readable and writable.
Readable streams
The following Node core objects are Readable streams:

Files: fs.createReadStream(path, [options]) - Returns a new ReadStream object (see Readable Stream).
HTTP (Server): http.ServerRequest - The request object passed when processing the request/response callback for HTTP servers.
HTTP (Client): http.ClientResponse - The response object passed when processing the response from an HTTP client request.
TCP: net.Socket - Construct a new socket object.
Child process: child.stdout - The stdout pipe for child processes launched from Node.js.
Child process: child.stderr - The stderr pipe for child processes launched from Node.js.
Process: process.stdin - A Readable Stream for stdin. The stdin stream is paused by default, so one must call process.stdin.resume() to read from it.
Readable streams emit the following events:

Event: 'data' - Emits either a Buffer (by default) or a string if setEncoding() was used.
Event: 'end' - Emitted when the stream has received an EOF (FIN in TCP terminology). Indicates that no more 'data' events will happen.
Event: 'error' - Emitted if there was an error receiving data.
To bind a callback to an event, use stream.on(eventname, callback). For example, to read data from a file, you could do the following:

var fs = require('fs');
var file = fs.createReadStream('./test.txt');
file.on('error', function(err) {
  console.log('Error ' + err);
  throw err;
});
file.on('data', function(data) {
  console.log('Data ' + data);
});
file.on('end', function() {
  console.log('Finished reading all of the data');
});

Readable streams have the following functions:

pause() - Pauses the incoming 'data' events.
resume() - Resumes the incoming 'data' events after a pause().
destroy() - Closes the underlying file descriptor. The stream will not emit any more events.
Writable streams
The following Node core objects are Writable streams:

Files: fs.createWriteStream(path, [options]) - Returns a new WriteStream object (see Writable Stream).
HTTP (Server): http.ServerResponse
HTTP (Client): http.ClientRequest
TCP: net.Socket
Child process: child.stdin
Process: process.stdout - A Writable Stream to stdout.
Process: process.stderr - A writable stream to stderr. Writes on this stream are blocking.

Writable streams emit the following events:
Event: 'drain' - After a write() method returned false, this event is emitted to indicate that it is safe to write again.
Event: 'error' - Emitted on error with the exception.
Writable streams have the following functions:

write(string, encoding='utf8') - Writes string with the given encoding to the stream.
end() - Terminates the stream with EOF or FIN. This call will allow queued write data to be sent before closing the stream.
destroy() - Closes the underlying file descriptor. The stream will not emit any more events. Any queued write data will not be sent.
Let's read from stdin and write to a file:

var fs = require('fs');
var file = fs.createWriteStream('./out.txt');
process.stdin.on('data', function(data) {
  file.write(data);
});
process.stdin.on('end', function() {
  file.end();
});
process.stdin.resume(); // stdin is paused by default

Running the code above will write everything you type in from stdin to the file out.txt, until you hit Ctrl+d (i.e. the end of file indicator in Linux). You can also pipe readable and writable streams using readableStream.pipe(destination, [options]). This causes the content from the read stream to be sent to the write stream, so the program above could have been written as:

var fs = require('fs');
process.stdin.pipe(fs.createWriteStream('./out.txt'));
process.stdin.resume();
// Create a Buffer of 10 bytes
var buffer = new Buffer(10);
// Modify a value
buffer[0] = 255;
// Log the buffer
console.log(buffer);
// outputs: <Buffer ff 00 00 00 00 00 4a 7b 08 3f>

Note how the buffer has its own representation, in which each byte is shown as a hexadecimal number. For example, ff in hex equals 255, the value we just wrote in index 0. Since Buffers are raw allocations of memory, their content is whatever happened to be in memory; this is why there are a number of different values in the newly created buffer in the example.

Buffers do not have many predefined functions and certainly lack many of the features of strings. For example, strings are not fixed size, and have convenient functions such as String.replace(). Buffers are fixed size, and only offer the very basics. Buffers can be created: 1) with a fixed size, 2) from an existing string and 3) from an array of octets:

new Buffer(size)
new Buffer(str, encoding='utf8')
new Buffer(array)

buffer.write(string, offset=0, encoding='utf8') - Write a string to the buffer at [offset] using the given encoding.
buffer.isBuffer(obj) - Tests if obj is a Buffer.
buffer.byteLength(string, encoding='utf8') - Gives the actual byte length of a string. This is not the same as String.prototype.length, since that returns the number of characters in a string.
buffer.length - The size of the buffer in bytes.
buffer.copy(targetBuffer, targetStart=0, sourceStart=0, sourceEnd=buffer.length) - Does a memcpy() between buffers.
buffer.slice(start, end=buffer.length) - Returns a new buffer which references the same memory as the old, but offset and cropped by the start and end indexes. Modifying the new buffer slice will modify memory in the original buffer!
buffer.toString(encoding, start=0, end=buffer.length) - Decodes and returns a string from buffer data encoded with encoding, beginning at start and ending at end.
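Because slice() shares memory with the original buffer, writes through a slice are visible in the parent. A small sketch (the example string and byte value are made up):

var buf = new Buffer('abcdef');
var slice = buf.slice(1, 4);   // refers to bytes 1..3 ('bcd') of the same memory
slice[0] = 0x58;               // write the byte for 'X' through the slice
console.log(buf.toString());   // prints 'aXcdef' - the original buffer changed too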
However, if you need to use the string functions on buffers, you can convert them to strings using buffer.toString(), and you can also convert strings to buffers using new Buffer(str). Note that Buffers offer access to the raw bytes in a string, while Strings allow you to operate on characters (which may consist of one or more bytes). For example:

// create a buffer and a string containing "Good day!" in Finnish
var buffer = new Buffer('Hyvää päivää!');
var str = 'Hyvää päivää!';
// log the contents and lengths to console
console.log(buffer);
console.log('Buffer length:', buffer.length);
console.log(str);
console.log('String length:', str.length);

If you run this example, you will get the following output:

<Buffer 48 79 76 c3 a4 c3 a4 20 70 c3 a4 69 76 c3 a4 c3 a4 21>
Buffer length: 18
Hyvää päivää!
String length: 13

Note how buffer.length is 18, while string.length is 13 for the same content. This is because in the default UTF-8 encoding, the ä character is represented internally by two bytes (c3 a4 in hexadecimal). The Buffer allows us to access the data in its internal representation and returns the actual number of bytes used, while String takes into account the encoding and returns the number of characters used. When working with binary data, we frequently need to access data that has no encoding - and using Strings we could not get the correct length in bytes. More realistic examples could be, for example, reading an image file from a TCP stream, or reading a compressed file, or some other case where binary data will be accessed.
Creating an HTTP server is simple: after requiring the http module, you call createServer, then instruct the server to listen on a particular port:

var http = require('http');
var server = http.createServer(function(request, response) {
  // Read the request, and write back to the response
});
server.listen(8080, 'localhost');

The callback function you pass to http.createServer is called every time a client makes a request to the server. The callback should take two parameters - a request and a response - and send back HTML or some other output to the client. The request is used to determine what should be done (e.g. what path on the server was requested, what GET and POST parameters were sent). The response allows us to write output back to the client. The other API functions related to starting the HTTP server, receiving requests and closing the server are:

http.createServer(requestListener) - Returns a new web server object. The requestListener is a function which is automatically added to the 'request' event.
server.listen(port, [hostname], [callback]) - Begin accepting connections on the specified port and hostname. If the hostname is omitted, the server will accept connections directed to any IPv4 address (INADDR_ANY). To listen to a unix socket, supply a filename instead of port and hostname. This function is asynchronous. The last parameter, callback, will be called when the server has been bound to the port.
server.on(eventname, callback) - Allows you to bind callbacks to events such as "request", "upgrade" and "close".
server.close() - Stops the server from accepting new connections.
The server object returned by http.createServer() is an EventEmitter (see the previous chapter) - so you can also bind new request handlers using server.on():
// create a server with no callback bound to 'request'
var server = http.createServer().listen(8080, 'localhost');
// bind a listener to the 'request' event
server.on('request', function(req, res) {
  // do something with the request
});

The other events that the HTTP server emits are not particularly interesting for daily use, so let's look at the Request and Response objects.
Here is what the request object for a simple GET request to the server root looks like when logged to the console:

{ socket: { },
  connection: { },
  httpVersion: '1.1',
  complete: false,
  headers:
   { host: 'localhost:8080',
     connection: 'keep-alive',
     'cache-control': 'max-age=0',
     'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) ...',
     accept: 'application/xml,application/xhtml+xml ...',
     'accept-encoding': 'gzip,deflate,sdch',
     'accept-language': 'en-US,en;q=0.8',
     'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3' },
  trailers: {},
  readable: true,
  url: '/',
  method: 'GET',
  statusCode: null,
  client: { },
  httpVersionMajor: 1,
  httpVersionMinor: 1,
  upgrade: false }
{ href: 'http://user:[email protected]:8080/p/a/t/h?query=string#hash', protocol: 'http:', host: 'user:[email protected]:8080', auth: 'user:pass', hostname: 'host.com', port: '8080', pathname: '/p/a/t/h', search: '?query=string', query: { query: 'string' }, hash: '#hash', slashes: true } Of these result values, there are three are most relevant for data prosessing in the controller: pathname (the URL path), query (the query string) and hash (the hash fragment).
application/x-www-form-urlencoded
name=John+Doe&gender=male&family=5&city=kent&city=miami&other=abc%0D%0Adef&nickname=J%26D

application/x-www-form-urlencoded data is encoded like a GET request. It's the default encoding for forms and is used for most textual data. The QueryString module provides two functions:

querystring.parse(str, sep='&', eq='='): Parses a GET query string and returns an object that contains the parameters as properties with values. Example: qs.parse('a=b&c=d') would return {a: 'b', c: 'd'}.
querystring.stringify(obj, sep='&', eq='='): Does the reverse of querystring.parse(); takes an object with properties and values and returns a string. Example: qs.stringify({a: 'b'}) would return 'a=b'.

You can use querystring.parse to convert POST data into an object:

var qs = require('querystring');
var data = '';
req.on('data', function(chunk) {
  data += chunk;
});
req.on('end', function() {
  var post = qs.parse(data);
  console.log(post);
});
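A quick sketch of the round trip (the example keys and values are made up):

var qs = require('querystring');

console.log(qs.parse('name=John%20Doe&gender=male'));
// { name: 'John Doe', gender: 'male' }

console.log(qs.stringify({ q: 'hello world', page: '2' }));
// 'q=hello%20world&page=2'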
multipart/form-data
Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="submit-name"

Larry
--AaB03x
Content-Disposition: form-data; name="files"; filename="file1.txt"
Content-Type: text/plain

... contents of file1.txt ...
--AaB03x--

multipart/form-data is used for binary files. This encoding is somewhat complicated to decode, so I won't provide a snippet. Instead, have a look at how it's done in felixge's node-formidable and visionmedia's connect-form.
Implicitly: the first time response.write() is called, the currently set implicit headers are sent. The API has the details:

response.writeHead(statusCode, [reasonPhrase], [headers]) - Sends a response header to the request. The status code is a 3-digit HTTP status code, like 404. The last argument, headers, are the response headers. Optionally one can give a human-readable reasonPhrase as the second argument.
response.statusCode - When using implicit headers (not calling response.writeHead() explicitly), this property controls the status code that will be sent to the client when the headers get flushed.
response.setHeader(name, value) - Sets a single header value for implicit headers. If this header already exists in the to-be-sent headers, its value will be replaced. Use an array of strings here if you need to send multiple headers with the same name.
response.getHeader(name) - Reads out a header that's already been queued but not sent to the client. Note that the name is case insensitive. This can only be called before headers get implicitly flushed.
response.removeHeader(name) - Removes a header that's queued for implicit sending.
Generally, using implicit headers is simpler since you can change the individual headers up until the point when the first call to response.write() is made.
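A small sketch of the implicit-header style; the status code, header and body text are made up:

var http = require('http');
http.createServer(function(req, res) {
  // set implicit headers; they are flushed on the first call to write()
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.write('Hello');   // headers are sent here
  res.end(' world\n');
}).listen(8080, 'localhost');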
HTTP client
The HTTP client API consists of:

Methods: http.request(options, callback) and http.get(options, callback)
Client request: methods and events
Client response: methods, events and properties
var http = require('http');

var options = { host: 'www.google.com', port: 80, path: '/' };

var req = http.get(options, function(response) {
  // handle the response
  var res_data = '';
  response.on('data', function(chunk) {
    res_data += chunk;
  });
  response.on('end', function() {
    console.log(res_data);
  });
});
req.on('error', function(e) {
  console.log("Got error: " + e.message);
});

To add GET query parameters from an object, use the querystring module:

var qs = require('querystring');
var options = {
  host: 'www.google.com',
  port: 80,
  path: '/' + '?' + qs.stringify({q: 'hello world'})
};
// .. as in previous example

As you can see above, GET parameters are sent as a part of the request path.
// JSON encoding
opts.headers['Content-Type'] = 'application/json';
req.data = JSON.stringify(req.data);
opts.headers['Content-Length'] = req.data.length;

Making the request is very similar to making a GET request:

var req = http.request(opts, function(response) {
  response.on('data', function(chunk) {
    res_data += chunk;
  });
  response.on('end', function() {
    callback(res_data);
  });
});
req.on('error', function(e) {
  console.log("Got error: " + e.message);
});
// write the data
if (opts.method != 'GET') {
  req.write(req.data);
}
req.end();

Note, however, that you need to call req.end() after http.request(). This is because http.ClientRequest supports sending a request body (with POST or other data), and if you do not call req.end(), the request remains "pending" and will most likely not return any data before you end it explicitly.
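For a self-contained sketch, here is a minimal JSON POST put together from the pieces above; the target host, port, path and payload are assumptions:

var http = require('http');

var payload = JSON.stringify({ hello: 'world' });

var opts = {
  host: 'localhost',
  port: 8080,
  path: '/post',
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': payload.length
  }
};

var req = http.request(opts, function(response) {
  var res_data = '';
  response.on('data', function(chunk) { res_data += chunk; });
  response.on('end', function() { console.log(res_data); });
});
req.on('error', function(e) { console.log('Got error: ' + e.message); });
req.write(payload);  // send the request body
req.end();           // finish the request so it is actually sent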
fs.open(path, flags, [mode], [callback])
fs.read(fd, buffer, offset, length, position, [callback])
fs.write(fd, buffer, offset, length, position, [callback])
fs.fsync(fd, callback)
fs.truncate(fd, len, [callback])
fs.close(fd, [callback])
Files: info

fs.stat(path, [callback])
fs.lstat(path, [callback])
fs.fstat(fd, [callback])

Files: rename, watch changes & change timestamps

fs.rename(path1, path2, [callback])
fs.watchFile(filename, [options], listener)
fs.unwatchFile(filename)
fs.watch(filename, [options], listener)
fs.utimes(path, atime, mtime, callback)
fs.futimes(fd, atime, mtime, callback)
Files: Owner and permissions

fs.chown(path, uid, gid, [callback])
fs.fchown(fd, uid, gid, [callback])
fs.lchown(path, uid, gid, [callback])
fs.chmod(path, mode, [callback])
fs.fchmod(fd, mode, [callback])
fs.lchmod(path, mode, [callback])
You should use the asynchronous version in most cases, but in rare cases (e.g. reading configuration files when starting a server) the synchronous version is more appropriate. Note that the asynchronous versions require a bit more thought, since the operations are started immediately and may finish in any order:

fs.readFile('./file.html', function (err, data) {
  // ...
});
fs.readFile('./other.html', function (err, data) {
  // ...
});

These file reads might complete in any order depending on how long it takes to read each file. The simplest solution is to chain the callbacks:

fs.readFile('./file.html', function (err, data) {
  // ...
  fs.readFile('./other.html', function (err, data) {
    // ...
  });
});

However, we can do better by using the control flow patterns discussed in the chapter on control flow.
Recipe: Opening, seeking to a position, reading from a file and closing it (in parts)
fs.open('./data/index.html', 'r', function(err, fd) {
  if(err) throw err;
  var buf = new Buffer(3);
  fs.read(fd, buf, 0, buf.length, null, function(err, bytesRead, buffer) {
    if(err) throw err;
    console.log(err, bytesRead, buffer);
    fs.close(fd, function() {
      console.log('Done');
    });
  });
});
var path = './data/';
fs.readdir(path, function (err, files) {
  if(err) throw err;
  files.forEach(function(file) {
    console.log(path + file);
    fs.stat(path + file, function(err, stats) {
      console.log(stats);
    });
  });
});

fs.stat() gives us more information about each item. The object returned from fs.stat looks like this:

{ dev: 2114,
  ino: 48064969,
  mode: 33188,
  nlink: 1,
  uid: 85,
  gid: 100,
  rdev: 0,
  size: 527,
  blksize: 4096,
  blocks: 8,
  atime: Mon, 10 Oct 2011 23:24:11 GMT,
  mtime: Mon, 10 Oct 2011 23:24:11 GMT,
  ctime: Mon, 10 Oct 2011 23:24:11 GMT }

atime, mtime and ctime are Date instances. The stat object also has the following functions:

stats.isFile()
stats.isDirectory()
stats.isBlockDevice()
stats.isCharacterDevice()
stats.isSymbolicLink() (only valid with fs.lstat())
stats.isFIFO()
stats.isSocket()

The Path module has a set of additional functions for working with paths, such as:

path.normalize(p) - Normalize a string path, taking care of '..' and '.' parts.
path.join([path1], [path2], [...]) - Join all arguments together and normalize the resulting path.
path.resolve([from ...], to) - Resolves to to an absolute path. If to isn't already absolute, the from arguments are prepended in right to left order, until an absolute path is found. If after using all from paths still no absolute path is found, the current working directory is used as well. The resulting path is normalized, and trailing slashes are removed unless the path gets resolved to the root directory. Resolves both absolute (/path/file) and relative paths (../../file) and returns the absolute path to the file.
path.dirname(p) - Return the directory name of a path. Similar to the Unix dirname command.
path.basename(p, [ext]) - Return the last portion of a path. Similar to the Unix basename command.
path.extname(p) - Return the extension of the path. Everything after the last '.' in the last portion of the path. If there is no '.' in the last portion of the path or the only '.' is the first character, then it returns an empty string.
path.exists(p, [callback]) - Test whether or not the given path exists. Then, call the callback argument with either true or false.
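A quick sketch of these helpers in action (the example paths are made up):

var path = require('path');

console.log(path.join('/foo', 'bar', 'baz/asdf', '..'));   // '/foo/bar/baz'
console.log(path.dirname('/foo/bar/index.html'));          // '/foo/bar'
console.log(path.basename('/foo/bar/index.html'));         // 'index.html'
console.log(path.extname('/foo/bar/index.html'));          // '.html'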
The function takes three arguments: the path to search, the name of the file we are looking for, and a callback which is called when the file is found. Here is the naive version: a bunch of nested callbacks, no thought needed:

var fs = require('fs');

function findFile(path, searchFile, callback) {
  fs.readdir(path, function (err, files) {
    if(err) { return callback(err); }
    files.forEach(function(file) {
      fs.stat(path + '/' + file, function(err, stats) {
        if(err) { return callback(err); }
        if(stats.isFile() && file == searchFile) {
          callback(undefined, path + '/' + file);
        } else if(stats.isDirectory()) {
          findFile(path + '/' + file, searchFile, callback);
        }
      });
    });
  });
}

findFile('./test', 'needle.txt', function(err, path) {
  if(err) { throw err; }
  console.log('Found file at: ' + path);
});

Splitting the function into smaller functions makes it somewhat easier to understand:

var fs = require('fs');

function findFile(path, searchFile, callback) {
  // check for a match, given a stat
  function isMatch(err, stats) {
    if(err) { return callback(err); }
    if(stats.isFile() && file == searchFile) {
      callback(undefined, path + '/' + file);
    } else if(stats.isDirectory()) {
      statDirectory(path + '/' + file, isMatch);
    }
  }
  // launch the search
  statDirectory(path, isMatch);
}

// Read and stat a directory
function statDirectory(path, callback) {
  fs.readdir(path, function (err, files) {
    if(err) { return callback(err); }
    files.forEach(function(file) {
      fs.stat(path + '/' + file, callback);
    });
  });
}

findFile('./test', 'needle.txt', function(err, path) {
  if(err) { throw err; }
  console.log('Found file at: ' + path);
});

The function is split into smaller parts:

findFile: This code starts the whole process, taking the main input arguments as well as the callback to call with the results.
isMatch: This hidden helper function takes the results from stat() and applies the "is a match" logic necessary to implement findFile().
statDirectory: This function simply reads a path, and calls the callback for each file.

I admit this is fairly verbose.
By keeping the traversal logic in the same module, you have a fixed interface which you can write your path traversing operations against.
The problem with periodic polling is that: 1) it tends to generate a lot of requests and 2) it's not instant - if messages arrive while the client is waiting, those will only be received later.

Long polling. This is similar to periodic polling, except that the server does not return the response immediately. Instead, the response is kept in a pending state until either new data arrives, or the request times out in the browser. Compared to periodic polling, the advantage here is that clients need to make fewer requests (requests are only made again if there is data) and that there is no "idle" timeout between making requests: a new request is made immediately after receiving data.

Client: Are we there yet?
Server: [Wait for ~30 seconds]
Server: No
Client: Are we there yet?
Server: Yes. Here is a message for you.

This approach is slightly better than periodic polling, since messages can be delivered immediately as long as a pending request exists. The server holds on to the request until the timeout triggers or a new message is available, so there will be fewer requests. However, if you need to send a message to the server from the client while a long polling request is ongoing, a second request has to be made back to the server since the data cannot be sent via the existing (HTTP) request.

Sockets / long-lived connections. WebSockets (and other transports with socket semantics) improve on this further. The client connects once, and then a permanent TCP connection is maintained. Messages can be passed in both directions through this single connection. As a conversation:

Client: Are we there yet?
Server: [Wait until we're there]
Server: Yes. Here is a message for you.

If the client needs to send a message to the server, it can send it through the existing connection rather than through a separate request. This is efficient and fast, but WebSockets are only available in newer, better browsers.
Socket.io
As you can see above, there are several different ways to implement Comet. Socket.io offers several different transports:

Long polling: XHR-polling (using XMLHttpRequest), JSONP polling (using JSON with padding), HTMLFile (forever Iframe for IE)
Sockets / long-lived connections: Flash sockets (WebSockets over plain TCP sockets using Flash) and WebSockets

Ideally, we would like to use the most efficient transport (WebSockets) - but fall back to other transports on older browsers. This is what Socket.io does.
{ "name": "siosample", "description": "Simple Socket.io app", "version": "0.0.1", "main": "server.js", "dependencies": { "socket.io": "0.8.x" }, "private": "true" } This allows us to install the app with all the dependencies using npm install. In server.js: var fs = require('fs'), http = require('http'), sio = require('socket.io'); var server = http.createServer(function(req, res) { res.writeHead(200, { 'Content-type': 'text/html'}); res.end(fs.readFileSync('./index.html')); }); server.listen(8000, function() { console.log('Server listening at http://localhost:8000/'); }); // Attach the socket.io server io = sio.listen(server); // store messages var messages = []; // Dene a message handler io.sockets.on('connection', function (socket) { socket.on('message', function (msg) { console.log('Received: ', msg); messages.push(msg); socket.broadcast.emit('message', msg); }); // send messages to new clients messages.forEach(function(msg) { socket.send(msg); }) }); First we start a regular HTTP server that always respondes with the content of "./index.html". Then the Socket.io server is attached to that server, allowing Socket.io to respond to requests directed towards it on port 8000. Socket.io follows the basic EventEmitter pattern: messages and connection state changes become events on socket. On "connection", we add a message handler that echoes the message back and broadcasts it to all other connected clients. Additionally, messages are stored in memory in an array, which is sent back to new clients so that the can see previous messages. Next, let's write the client page (index.html):
<html>
<head>
  <style type="text/css">
    #messages { padding: 0px; list-style-type: none;}
    #messages li { padding: 2px 0px; border-bottom: 1px solid #ccc; }
  </style>
  <script src="http://code.jquery.com/jquery-1.7.1.min.js"></script>
  <script src="/socket.io/socket.io.js"></script>
  <script>
  $(function(){
    var socket = io.connect();
    socket.on('connect', function () {
      socket.on('message', function(message) {
        $('#messages').append($('<li></li>').text(message));
      });
      socket.on('disconnect', function() {
        $('#messages').append('<li>Disconnected</li>');
      });
    });

    var el = $('#chatmsg');
    $('#chatmsg').keypress(function(e) {
      if(e.which == 13) {
        e.preventDefault();
        socket.send(el.val());
        $('#messages').append($('<li></li>').text(el.val()));
        el.val('');
      }
    });
  });
  </script>
</head>
<body>
  <ul id="messages"></ul>
  <hr>
  <input type="text" id="chatmsg">
</body>
</html>

BTW, "/socket.io/socket.io.js" is served by Socket.io, so you don't need to have a file placed there. To start the server, run node server.js and point your browser to http://localhost:8000/. To chat between two users, open a second tab to the same address.
Additional f eatures
There are two more advanced examples on Github. I'm going to focus on deployment, which has not been covered in depth.
Because of the same origin policy, requests to a different host, port or protocol cannot normally be made from Javascript. JSONP, or JSON with padding, is an alternative technique which relies on the fact that the <script> tag is not subject to the same origin policy to receive fragments of information (as JSON in Javascript). Socket.io supports these techniques, but you should try to set up your application in such a way that the HTML page using Socket.io is served from the same host, port and protocol. Socket.io can work even when the pages are different, but it is subject to more browser restrictions, because dealing with the same origin policy requires additional steps in each browser.

There are two important things to know: First, you cannot perform requests from a local file to external resources in most browsers. You have to serve the page you use Socket.io on via HTTP. Second, IE 8 will not work with requests that 1) violate the same origin policy (host/port) and 2) also use a different protocol. If you serve your page via HTTP (http://example.com/index.html) and attempt to connect to HTTPS (https://example.com:8000), you will see an error and it will not work.

My recommendation would be to use either HTTPS or HTTP everywhere, and to try to make it so that all the requests (serving the page and making requests to the Socket.io backend) appear to the browser as coming from the same host, port and protocol. I will discuss some example setups further below.
The benefit is simplicity, but of course you are now tasking your Node server with a lot of work that it wouldn't need to do, such as serving static files and (optionally) SSL termination. The first step in scaling this setup up is to use more CPU cores on the same machine. There are two ways to do this: use a load balancer, or use node cluster.
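As a minimal sketch of the cluster approach (the port and response text are made up; note that spreading connections across workers has the same stickiness caveats for Socket.io discussed further below):

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // fork one worker per CPU core
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // each worker runs its own server on the same port
  http.createServer(function(req, res) {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(8000);
}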
If you want to use SSL, you need to perform SSL termination before routing the request. There are two options:

Use Node to terminate SSL requests (e.g. start a HTTPS server).
Use a separate SSL terminator, such as stunnel, stud or specialized hardware.

Using Node is a neat solution, however, this will also increase the overhead per connection in Node (SSL termination takes memory and CPU time from Socket.io) and will require additional coding. I would prefer not to have to maintain the code for handling the request routing in the Node app - and hence recommend using HAProxy. Here we will use stunnel (alternative: stud) to offload this work. Nginx will proxy to Ruby only and Socket.io is only accessible behind SSL.

[Nginx at :80] --> [Ruby at :3000]
[Stunnel at :443] --> [HAProxy at :4000] --> [Socket.io at :8000]
                                         --> [Nginx at :80] --> [Ruby at :3000]

Traffic comes in SSL-encrypted to port 443, where Stunnel removes the encryption and then forwards the traffic to HAProxy. HAProxy then looks at the destination and routes requests to /socket.io/ to Node at port 8000, and all other requests to Ruby/Nginx at port 3000. To run Stunnel, use stunnel path/to/stunnel.conf. The associated HAProxy and Stunnel configuration files can be found here for your cloning and forking convenience. To make connections to port 443 over SSL, run the connection tests for the testing tool using node client.js https. If your HAProxy + Stunnel setup works correctly, you will get a "It worked" message from the client.
HAProxy is configured with the same (URL-based) routing as in the previous example, but the traffic is balanced over several servers. Note that in the configuration file, two different load balancing strategies are used. For the second (non-Socket.io) stack, we are using round robin load balancing. This assumes that any server in the pool can handle any request.

With Socket.io, there are two options for scaling up to multiple machines: First, you can use source IP based sticky load balancing. Source IP based stickiness is needed because of the way Socket.io handles handshakes: unknown clients (e.g. clients that were handshaken on a different server) are rejected by the current (0.8.7) version of Socket.io. This means that:

1. in the event of a server failure, all client sessions must be re-established, since even if the load balancer is smart enough to direct the requests to a new Socket.io server, that server will reject those requests as not handshaken.
2. load balancing must be sticky, because for example round robin would result in every connection attempt being rejected as "not handshaken" - since handshakes are mandatory but not synchronized across servers.
3. doing a server deploy will require all clients to go through a new handshake, meaning that deploys are intrusive to the end users.

Example with four backend servers behind a load balancer doing round robin:

[client] -> /handshake -> [load balancer] -> [server #1] Your new Session id is 1
[client] -> /POST data (sess id = 1) -> [load balancer] -> [server #2] Unknown session id, please reconnect
[client] -> /handshake -> [load balancer] -> [server #3] Your new Session id is 2
[client] -> /POST data (sess id = 2) -> [load balancer] -> [server #4] Unknown session id, please reconnect
This means that you have to use sticky load balancing with Socket.io. The second alternative is to use the Stores mechanism in Socket.io. There is a Redis store which synchronizes in-memory information across multiple servers via Redis. Unfortunately, the stores in Socket.io are only a partial solution, since stores rely on a pub/sub API arrangement where all Socket.io servers in the pool receive all messages and maintain the state of all connected clients in memory. This is not desirable in larger deployments, because the memory usage now grows across all servers independently of whether a client is connected to a particular server (related issue on GitHub). Hopefully, in the future, Socket.io (or Engine.io) will offer the ability to write a different kind of system for synchronizing the state of clients across multiple machines. In fact, this is being actively worked on in Engine.io - which will form the basis for the next release of Socket.io. Until then, you have to choose between these two approaches to scale over multiple machines.