Node.js Takes a Bite Out of the Back-end
Tempted to consolidate to a single stack, one of Lullabot’s largest digital publishing clients, a TV network, has begun to phase out Drupal in favor of microservices written in Node.js. They still need Drupal to maintain metadata and collections, but they’re talking about moving all of their DRM (digital rights management) content to a new microservice. This also provoked my curiosity.
Is the rapidly changing Node.js ecosystem ready for the enterprise? Who is guaranteeing your stack stays secure? The Node.js Foundation? Each maintainer? Drupal has a seasoned and dedicated security team that keeps core and contrib safe. Node.js is changing fast, and the small applications written in Node.js are changing faster than that. This is a rate of change typically anathema to enterprise software.
And it’s a significant shift for the enterprise in other ways as well. While Drupal sites are built with many small modules that together create a unique application, they all live within one codebase. Most of the Node.js community prefers to approach the same problem with a microservices architecture: small applications that communicate with each other over a network. Does that work in the context of a major enterprise publisher? Is it maintainable? Airbnb, PayPal, and Netflix have proven that it can be, but I wonder how well the experience of these technology companies applies to many of our clients in the digital publishing industry. Arguably, Amazon pioneered the modern decouple-all-the-things, service-oriented architecture, and, well, you, dear client, are not Amazon.
In this article, I’ll explore this question through examples and examine how JS technologies are changing the traditional CMS stack architecture within some of the client organizations we work with. I should disclose my own limitations as I tackle an ambitious topic. I’ve approached this exploration as a journalist and a business person, not as an engineer. I’ve tried to interview many sources and reflect the nuances of the technical distinctions they’ve made, but I do so as a technical layperson, so any inaccuracies are my own.
The Cathedral and the Bazaar
Drupal still benefits from a thriving open-source community. According to Drupal.org, Drupal has more than 111,000 users actively contributing. In perhaps the most essential way, that means it still benefits from Linus’s Law, named for Linux creator Linus Torvalds: “Given enough eyeballs, all bugs are shallow.” Moreover, the Drupal community remains the envy of the free software movement, according to Google’s Steve Francia, who also guided the Docker and MongoDB communities following Drupal’s lead.
Concurrency and non-blocking IO
I asked some of our clients who are gradually increasing the amount of JS in their stack and reducing the role of Drupal why they’ve chosen to do so. (I only found one who is seeking to eliminate it altogether, and they haven’t been able to do so yet.) Surprisingly, it had nothing to do with Hannah’s observation that hiring developers for and maintaining a single stack would pay dividends in a large organization. This was a secondary benefit, not a primary motivation. The short answer in each case was speed: in one case speed in the request-response sense, and in the other, speed in the go-to-market sense.
Let’s look at the first example. We work with a large entertainment media company that provides digital services for a major sports league. Their primary site and responsive mobile experience are driven by Drupal 8’s presentation layer (though there’s also the usual caching and CDN magic to make it fast). Beyond the website, this publisher needs to feed data to 17 different app experiences including things like iOS, tvOS, various Android devices, Chromecast, Roku, Samsung TV, etc. Using a homegrown system (JSON API module wasn’t finished when this site was migrated to D8), this client pushes all of their content into an Elasticsearch datastore where it is indexed and available for the downstream app consumers. They built a Node.js-based API to provide the middleware between these consumers and Elasticsearch. According to a stakeholder, the group achieved “single-digit millisecond responses to any API call, making it the easiest thing in the whole stack to scale.”
This is likely in part due to one of the chief virtues of Node.js. According to Node’s about page:
Multi-core CPUs can run multiple threads in parallel, in proportion to the number of cores. The OS manages these threads and can switch between them as it sees fit. Whereas a PHP request moves through a set of instructions top to bottom with a single pointer for where it’s at in those instructions, Node.js runs an event loop that keeps track of many pending operations at once and picks each one up again when it is ready. To roughly characterize this difference between the Node.js asynchronous event loop and the approach taken by other languages, let me offer a metaphor. Pretend for a moment that threads are cooks in the kitchen awaiting instructions. Our PHP chef needs to proceed through the recipe a step at a time: chopping vegetables, and then boiling water, and then putting on a frying pan to sauté those veggies. While the water is boiling, our PHP chef waits, or the OS makes a decision and moves on to something else. Our Node.js chef, on the other hand, can multitask to a degree, starting the water to boil, leaving a pointer there, and then moving on to the next thing.
However, Node.js can only do this for input and output, like reading a database or fetching data over HTTP. This is what is referred to as “non-blocking IO.” And it’s why the Node.js community can say things like, “projects that need big concurrency will choose Node (and put up with its warts) because it’s the best way to get their project done.” Asynchronous, event-driven programs are tricky, though. Concurrency is a hard problem in computer science, and it can produce unexpected results. Imagine our cook accidentally putting the onion on the stove to fry before the pan is there or the burner is lit. This is akin to trying to read the results from a database before those results are available. Other languages can do this too, but Node’s real innovation is in taking one of the easier-to-solve problems of concurrent programming (IO), designing it directly into the system so it’s easier to use by default, and marketing those benefits to developers who may not have been familiar with similar solutions in more heavyweight languages like Java.
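To make the kitchen metaphor a little more concrete, here is a minimal sketch in JavaScript. It assumes that timers stand in for slow IO such as a database read. The names `sleep`, `boilWater`, `chopVegetables`, and `cook` are hypothetical, borrowed from the metaphor rather than from any library.

```javascript
// Timers stand in for slow IO (a database read, an HTTP call).
// Every name here is illustrative, borrowed from the kitchen metaphor.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function boilWater(log) {
  log.push('water: on the stove');
  await sleep(50); // the single thread is free to do other work while we wait
  log.push('water: boiling');
  return 'boiling water';
}

async function chopVegetables(log) {
  log.push('veggies: chopping');
  await sleep(10);
  log.push('veggies: chopped');
  return 'chopped veggies';
}

async function cook() {
  const log = [];

  // Start both tasks before waiting on either one.
  const water = boilWater(log);
  const veggies = chopVegetables(log);

  // The "onion before the pan" mistake: at this point `water` is still a
  // Promise object, not the finished result, so using it as data is a bug.
  const tooEarly = water instanceof Promise;

  // Correct: wait until both results are actually available.
  const results = await Promise.all([water, veggies]);
  return { log, tooEarly, results };
}

cook().then(({ log }) => console.log(log.join('\n')));
```

Run with Node.js, this should log both tasks starting before either finishes, with the vegetables done before the water boils, even though only one thread is executing the JavaScript.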
Even though Node.js does well as a listener for web requests, the non-blocking IO model bogs down if you’re performing CPU-intensive computation. You wouldn’t want “to build a Fibonacci computation server in Node.js. In general, any CPU intensive operation annuls all the throughput benefits Node offers with its event-driven, non-blocking I/O model because any incoming requests will be blocked while the thread is occupied with your number-crunching,” writes Tomislav Capan in “Why the Hell Would You Use Node.js.” That is because Node.js runs your JavaScript on a single thread. By default, a Node.js application running on a CPU with 8 cores uses just one of them for that code, whereas other server-side languages could make use of all of them. So Node.js is designed for lots of little concurrent tasks, like real-time updates to requests or user interactions, but it is poorly suited to computationally intensive ones. As one might expect, it’s not great for image processing, for instance. But it’s great for making seemingly real-time, responsive user interfaces.
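The blocking effect Capan describes can be seen in a small sketch. It assumes a deliberately naive recursive `fib` function (hypothetical, for illustration only) standing in for number-crunching, and a zero-delay timer standing in for an incoming request.

```javascript
// A deliberately slow, CPU-bound function (illustrative only).
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

const events = [];

// Stand-in for an incoming request: due to run as soon as the thread is free.
const requested = Date.now();
setTimeout(() => {
  events.push('request handled');
  console.log(`request waited ${Date.now() - requested} ms`);
  console.log(events);
}, 0);

// Synchronous number-crunching occupies the only JavaScript thread,
// so the timer callback above cannot run until this call returns.
const answer = fib(32);
events.push(`fib(32) = ${answer}`);
```

The "request" is always handled second, after however long the computation takes. Node.js does ship `worker_threads` and `cluster` modules for moving such work off the main thread or across cores, though that is beyond this sketch.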
After hacking together that first site, they found React and began to take full advantage of stateful components that provide real-time, interactive UX, like AJAX but better. Data refreshes instantly, and that refresh can be caused by a user’s actions on the front-end or initiated by the server. To hear this client tell it, discovering these technologies and the virtues of Node.js led them to change the role of Drupal. “With Drupal, we were fighting scale in terms of usage, and that led us to commit to a new, microservices-oriented stack with Drupal playing a more limited role in a much larger data pipeline that utilizes a number of smaller networked programs.” These included a NoSQL data-as-a-service provider called MarkLogic, and a search service called Algolia, among others.
As Acquia CTO Dries Buytaert wrote in The Future of Decoupled Drupal,
Before decoupling, you need to ask yourself if you're ready to do without functionality usually provided for free by the CMS, such as layout and display management, content previews, user interface (UI) localization, form display, accessibility, authentication, crucial security features such as XSS (cross-site scripting) and CSRF (cross-site request forgery) protection, and last but not least, performance. Many of these have to be rewritten from scratch, or can't be implemented at all, on the client-side. For many projects, building a decoupled application or site on top of a CMS will result in a crippling loss of critical functionality or skyrocketing costs to rebuild missing features.
These things all have to be reinvented in a decoupled site. This is prohibitively expensive for small to medium-sized businesses, but for large enterprises with the resources and a predilection for lean, specific architectures, it’s a reasonable trade-off to harness the power of something like the React library fully.
A sophisticated front-end such as a single-page application, a PWA, or a React application, let’s say, still needs a data source to feed it content. And while it’s possible to make use of different services to furnish this data pipeline, editors still need a place to edit content, govern content, and manage metadata and the relationships between different pieces of content. It’s a task to which the monolithic PHP CMS platforms are uniquely suited.
“I’m not convinced from my explorations of the JS ecosystem that NPM packages and Node.js are mature enough to build something to compete with Drupal,” says senior architect Andrew Berry. “Drupal is still relevant because it’s predicated on libraries that have 5-10 years of development, whereas in the Node world everything is thrown out every 6 months. In Drupal, we can’t always get clients to do major releases, can you imagine if we had to throw it out or change it every 6 months?” This was echoed by other experts that I spoke with.
To get involved with the admin-ui-js initiative, start here.
To get involved with the API-first initiative, start here.
Special thanks to Lullabots Andrew Berry, Ben Chavet, John Hannah, Mateu Aguiló Bosch, Mike Herchel, and Sally Young for helping me take on a topic that was beyond my technical comfort zone.