NodeJS - From “Yea Right” to “Hell Ya”

Yea Right
If someone had told me two years ago that I would be implementing server-side logic in Javascript, I would have laughed him out of the room. It was about a year ago that I came across an article on a framework called node.js. At the time, the idea of building networked servers and services in Javascript seemed very foreign to me, but I remember the excitement I felt at the thought of it. When I tried to explain to my colleagues what node was and its potential to solve some of our specific problems, they gave me a dumbass look followed by a resounding ‘Yea right’.

What follows is a very high-level view of a few pieces of the node puzzle, but it should be enough to inspire you to craft some server-side Javascript.
Non-blocking I/O
Node is a set of asynchronous libraries built on top of Google’s high-performance V8 Javascript engine. Its asynchronous nature comes from the fact that it takes a non-blocking approach to all I/O. In traditional runtime environments, threads are used as a means to support concurrency and scale. In a threaded model, I/O is a blocking operation: any operation that accesses an external resource, a database request for instance, leaves the thread waiting until the I/O completes. In my experience, waiting on I/O often accounts for the majority of the time spent processing a client request.
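In node terms, the waiting described above is what fs.readFileSync does, while fs.readFile hands the read off and returns immediately. The minimal sketch below shows both; the temp file name is made up for the example. Note that the line after the readFile call runs before its callback does.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, created so the sketch runs anywhere.
const file = path.join(os.tmpdir(), 'node-io-demo.txt');
fs.writeFileSync(file, 'hello from disk');

const events = [];

// Blocking: nothing else happens on this thread until the read finishes.
const text = fs.readFileSync(file, 'utf8');
events.push('sync read: ' + text);

// Non-blocking: readFile returns at once; the callback runs on a later
// turn of the event loop, after the I/O has completed.
fs.readFile(file, 'utf8', function (err, data) {
  if (err) throw err;
  events.push('async read: ' + data);
  console.log(events);
});

// This line runs before the readFile callback above.
events.push('after readFile call');
```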
Node works quite differently, using a single-threaded, event-loop design. “In node everything runs in parallel except your code.” Yep, that’s right. The best way to explain this is with an analogy. Let’s say your code is the manager of a courier service and node is the manager’s fleet of couriers. When the day starts, the manager assigns jobs to the couriers, and the couriers go to work delivering packages throughout the city. Once all the work has been assigned, the manager decides it’s time for a little nap. After a short snooze, the manager is awakened by a knock at the door. His couriers have begun to return after completing their tasks and are lining up outside. The manager lets each courier in one at a time and gathers the paperwork associated with that courier’s job. As each courier leaves, the manager closes the door and tries to get back to his nap. Unfortunately for him, as long as there are couriers lined up outside his door, the knocking will continue. This flow repeats throughout the day until there is no more work left and all the couriers have reported back with the paperwork from their jobs.
While this is a contrived example, it should give you a general sense of how things work in node. Your code issues async function calls to the node API, and node calls back (knocks) as results become available. There are a couple of interesting things to note about the node runtime environment. First, while the application code is running, it is the only thing running on the event loop (it’s single-threaded, remember), and it will run until there is no more logic to execute. Only then does the event loop check for completed async operations and call back into the application code for each result that is available. Second, the order in which the callbacks reach the application code may or may not be the order in which the async operations completed, and it is certainly not guaranteed to match the order in which the calls were issued. In practice this should not have any observable side effects, as long as your application code runs blazingly fast (so stay away from CPU-intensive operations in your script).
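“Everything runs in parallel except your code” can be observed directly with a timer. In this small sketch, a callback scheduled for 10 ms cannot fire until a deliberate 100 ms busy loop (a stand-in for CPU-intensive work) hands the thread back to the event loop; the timings are arbitrary choices for illustration.

```javascript
const start = Date.now();
let firedAfterMs = null;

// Ask for a callback in 10 ms.
setTimeout(function () {
  firedAfterMs = Date.now() - start;
  console.log('timer fired after ' + firedAfterMs + ' ms');
}, 10);

// Busy-wait for ~100 ms on the single JavaScript thread. The timer is due
// long before this ends, but its callback has to wait its turn.
while (Date.now() - start < 100) {
  // spin
}
console.log('busy loop done, timer still pending: ' + (firedAfterMs === null));
```

Every millisecond spent in your own code is a millisecond in which no callbacks, and therefore no I/O results, get processed.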
Intermission break => roflscale
CommonJS & Node Package Manager (NPM)
The node environment includes a module system based on CommonJS. The CommonJS initiative was started to address many of the needs that arise when using Javascript as a language outside of the browser: platform-level features such as a module system, file system access, binary I/O, process management, sockets, etc. The important piece in this context is the module system, which lets us load other modules via the require() function that node makes available inside every module. This will become more apparent in the code examples later.
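A small sketch of the CommonJS pattern: a module attaches what it wants to share to exports (or module.exports), and require() returns that object. To keep the example in one runnable file, it first writes a hypothetical greeter.js module to the temp directory; normally that would just be a file next to your script, loaded with require('./greeter').

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Contents of a hypothetical module file, greeter.js.
const source = [
  '// Anything attached to exports becomes visible to require() callers.',
  'exports.greet = function (name) {',
  "  return 'Hello, ' + name + '!';",
  '};',
].join('\n');

const modulePath = path.join(os.tmpdir(), 'greeter.js');
fs.writeFileSync(modulePath, source);

// require() loads the file once, then serves later calls from its cache.
const greeter = require(modulePath);
console.log(greeter.greet('node')); // Hello, node!
```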
NPM is to node what RubyGems is to Ruby or Maven is to Java (yuck, btw). You can use it to publish your node modules for others to consume, but you will primarily use it to fetch third-party packages for use in your scripts.
Conventions & Patterns
The programming model can be summed up as ‘callback driven’. This shouldn’t be a big surprise; it’s just Javascript. The large majority of your code will live in callback functions.
A comparison of the sync and async programming paradigms:
Sync
function doSync(something) {
  if (!something) throw new Error('nothing to do!');
  return 'done';
}
try {
  var result = doSync('work');
  console.log(result);
} catch (err) {
  // errors thrown synchronously are caught right here
  console.error(err.message);
}
Async
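function doAsync(something, callback) {
  // defer the work so the callback always runs on a later turn of the event loop
  setImmediate(function () {
    if (!something) return callback(new Error('nothing to do!'));
    callback(null, 'done');
  });
}
doAsync('work', function (err, result) {
  if (err) throw err;
  console.log(result);
});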
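The practical consequence of this comparison is that try/catch only sees errors thrown while it is on the stack. An async callback runs on a later turn of the event loop, after the try block has already exited, which is why async errors are handed to the callback instead of thrown. A sketch, using a hypothetical failLater function:

```javascript
// Hypothetical async function that always fails on a later turn of the loop.
function failLater(callback) {
  setImmediate(function () {
    callback(new Error('nothing to do!'));
  });
}

let caughtByTryCatch = false;
let receivedByCallback = null;

try {
  failLater(function (err) {
    // By now the try block has exited; throwing err here would escape it
    // and crash the process, so the error is handled where it arrives.
    receivedByCallback = err;
    console.log('callback got: ' + err.message);
  });
} catch (e) {
  // Never reached: failLater returned normally before the error existed.
  caughtByTryCatch = true;
}
console.log('caught by try/catch: ' + caughtByTryCatch);
```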
Patterns
Callback parameter
The convention in node is to pass the callback function as the last parameter to an async function.
function doAsync(something, callback) { // callback is the last parameter
  setImmediate(function () {
    callback(null, 'done: ' + something);
  });
}
doAsync('work', function (err, result) {
  if (err) throw err;
  console.log(result);
});
Callback result
The convention in node is to pass the result to the callback as the second argument, after the error slot.
function doAsync(something, callback) {
  // ...
  callback(null, result); // null in the error slot, the result second
}
Error Propagation aka Errorback
The convention in node is to pass an error to the callback as the first argument, or null if no error was encountered.
function doAsync(something, callback) {
  // return, so the callback is never invoked a second time
  if (!something) return callback(new Error('nothing to do!'));
  // ...
}
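“Propagation” here means that an async function which calls another async function forwards any error to its own caller rather than throwing it. A minimal sketch built on fs.readFile; countLines is a hypothetical helper, not a node API.

```javascript
const fs = require('fs');

// Hypothetical helper: count the lines in a text file, errback style.
function countLines(file, callback) {
  fs.readFile(file, 'utf8', function (err, data) {
    // Pass the error up to our caller; `return` stops us from also
    // calling back with a result.
    if (err) return callback(err);
    callback(null, data.split('\n').length);
  });
}

countLines('/no/such/file.txt', function (err, lines) {
  if (err) {
    console.error('could not count lines: ' + err.code); // e.g. ENOENT
    return;
  }
  console.log(lines);
});
```

Each layer only checks `err`, returns early, and hands it on, so the error surfaces at whichever caller is able to deal with it.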
Code using node (taken from the node docs)
Filesystem
var fs = require('fs');
fs.readFile('/etc/passwd', function (err, data) {
  if (err) throw err;
  console.log(data); // a raw Buffer, since no encoding was passed
});
HTTP
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, "127.0.0.1");
console.log('Server running at http://127.0.0.1:1337/');
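To watch the server answer without leaving node, a small sketch can start the same handler and request it with the built-in http.get, both running on one event loop. Two assumptions differ from the docs example: it listens on port 0 so the OS picks a free port, and it closes the server after one response so the script exits.

```javascript
const http = require('http');

const server = http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
});

let result = null;

// Port 0: let the OS choose any free port.
server.listen(0, '127.0.0.1', function () {
  const port = server.address().port;

  // agent: false opens a one-off connection instead of a pooled one.
  http.get({ host: '127.0.0.1', port: port, path: '/', agent: false }, function (res) {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      result = { status: res.statusCode, body: body };
      console.log(result.status, JSON.stringify(result.body)); // 200 "Hello World\n"
      server.close(); // stop listening so the process can exit
    });
  });
});
```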
Sweet Spot
The ecosystem around node is pretty incredible, and there is an amazing number of third-party modules in the NPM registry. It seems that just about everything under the sun is being developed for or ported to the node platform. I cut my teeth writing a JSONP middleware module for the Connect framework, which is to node what Rack is to Ruby. Connect is the foundation of the popular Express web application framework. While Express is pretty solid, I’m not convinced that I would write web applications on top of node; I don’t see the advantage over mature web frameworks.
Another area where I wouldn’t use node is CPU-intensive logic. It goes against what node is trying to achieve: efficiency through offloading latency-intensive operations outside of the process and its event loop. If each iteration of the event loop spends a considerable amount of time executing application logic, the benefit of non-blocking I/O is negated.
I have found node to be very well suited to I/O-intensive batch processing. In particular, I’m using node as a lightweight, scalable, high-performance way to shell out to CPU-hungry Unix command-line tools, while node itself manages the intensive I/O. I built Mimeograph as a simple OCR/text extraction batching solution that orchestrates tools such as Ghostscript, ImageMagick and Tesseract. Mimeograph uses a simple utility called RedisFS to move the files to be processed in and out of Redis and to and from a Unix filesystem. For batching, it uses coffee-resque, a node port of the popular Resque framework. With this architecture, I can scale Mimeograph processes out across many virtual machines to handle the large volume of OCR/text extraction the business requires, both to support upcoming legal proceedings and to work through several years’ worth of indexing backlog. I’ll post a focused discussion of Mimeograph in the near future, covering how best to use coffee-resque to distribute CPU-intensive parallel problems.
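The shell-out pattern itself needs only node’s built-in child_process module. The sketch below is not Mimeograph’s code, and it does not use RedisFS or coffee-resque; it simply launches several external processes at once and collects their output in errback style. To stay runnable anywhere, node itself (process.execPath) stands in for CPU-hungry tools like Tesseract, and runTool and runAll are hypothetical names.

```javascript
const { execFile } = require('child_process');

// Run one external command. Its CPU work happens in a separate process,
// so the event loop stays free while it runs.
function runTool(args, callback) {
  // process.execPath (node itself) stands in for a real CLI tool here.
  execFile(process.execPath, args, function (err, stdout) {
    if (err) return callback(err);
    callback(null, stdout.trim());
  });
}

// Launch all jobs at once and call back when every one has finished.
function runAll(jobs, callback) {
  const results = new Array(jobs.length);
  let pending = jobs.length;
  let failed = false;
  if (pending === 0) return callback(null, results);

  jobs.forEach(function (args, idx) {
    runTool(args, function (err, out) {
      if (failed) return;
      if (err) { failed = true; return callback(err); }
      results[idx] = out; // store by index: jobs may finish in any order
      if (--pending === 0) callback(null, results);
    });
  });
}

runAll([
  ['-e', 'console.log(6 * 7)'],
  ['-e', 'console.log("page-2")'],
], function (err, results) {
  if (err) throw err;
  console.log(results);
});
```

Storing each result by its job index, rather than pushing as callbacks arrive, keeps the output stable regardless of which child process finishes first.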
I recently watched a video in which Isaac Z. Schlueter, of NPM fame, explains why node is well suited to Data-Intensive Real-Time (DIRT) applications. Pretty good stuff; take a look.
Hell Ya
Javascript is definitely not just for the browser anymore, thanks to node. I’ve been working with node for the last year and have had great success with it in production. Mimeograph impressed my client so much that we are seeking out more opportunities within the organization where node is a good fit. Things have changed quite a bit over the last year: the dumbass looks are gone, and the ‘yea right’ responses are now a resounding ‘hell ya!’.
Happy Noding.