
The Trouble With Nodes

Warning! Super-geeky post ahead!

Drupal's node system is extremely powerful, and has made Drupal one of the most flexible OSS CMSs around. However, when it comes time to actually build and render nodes for display, our APIs are showing their age. *Many* modules alter the contents of a node before it's output, themers often need much finer control over the data when it comes time to output it, and often HTML is insufficient. Output as JSON data, PDF, XML, and so on is essential for many complex sites. What problems do we currently face, and what are some of the possible solutions?

This post is a stab at capturing some of the diverse discussions going on around Drupal-space and describing some of the solutions that are being proposed.
Feedback is welcome. I'd love to see some of these things make it into 6 before the freeze, but I'd love even more for the community to iron out a really solid path forward.

The problems

  • Namespace collisions
    The existing $node object is a dumping ground for properties: every module gets a chance to add properties to it and very few use defined namespaces like $node->modulename. Common nouns like 'state' or 'vote', and ambiguous ids like 'pid', 'vid', and 'sid' can easily collide. There is no way to attach metadata to these properties (what module they belong to, for example) without excessively verbose naming schemes. See http://drupal.org/node/148420
  • Rigid theming and rendering
    After the node is loaded as an object, its display content is built as a structured array. However, only some chunks of a node's content are included: taxonomy terms, links, submission information, etc. are handled separately and cannot benefit from rendering system features.
    After this content is built, we collapse that array to HTML and use it to replace the existing $node->body variable before passing the node to the theming layer. Themers cannot control how this content is rendered: by the time the node reaches them, $node->body and $node->teaser have already been overwritten.
    Themers who want to make simple changes -- altering the order of fields, showing the first field in a list but linking to additional fields, etc. -- have no recourse other than writing a module from scratch to alter the node.
    See http://www.drupal.org/node/134478
  • Only two display modes
    Nodes are displayed in a wide variety of contexts and for a variety of purposes, but there's only one way to control what contents are assembled: the TRUE or FALSE $teaser variable. $page and $links also exist, but these are tacked-on special case booleans. Piling on additional boolean params for every special case is obviously unacceptable (witness the refactoring of the l() function).
    See http://www.drupal.org/node/144608
  • Single output format
    Nodes are built and rendered for display in HTML. While this is often sufficient, other cases (pdf rendering, XML and JSON output, separation of attachments for RSS and mail, search indexing, etc.) all have additional requirements that can't be captured using the normal node rendering system.
    See http://www.drupal.org/node/145551
  • Custom hacks become necessary
    Due to the limitations listed above, there are numerous special case functions in core that wrap node_view() in hard coded workflows, or re-implement it with minor variations. For example:
    node_show(), which forces comment listings to display after node bodies and marks the node as being read by the current user.
    Book module implements a custom version of node_view() when creating print-friendly pages, simply to give modules a chance to modify nodes for print.
    RSS feed generation and search indexing also re-implement node_view(), in order to extract field information before the node is collapsed to HTML.
    Contrib modules are in the same position: many bypass node_view(), to avoid its hard-coded limitations.
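
The namespace-collision problem above can be illustrated with a small sketch. It is written in Python rather than Drupal's PHP, plain dicts stand in for the $node object, and the helper functions and the "examplevocab" module name are hypothetical. It only shows why a flat property bag silently loses data while per-module namespaces do not.

```python
# Illustrative only: dicts stand in for $node, and all names are made up.

def attach_flat(node, properties):
    """Dump a module's properties directly onto the node.

    Returns the keys that were already present and got overwritten.
    """
    clobbered = [key for key in properties if key in node]
    node.update(properties)
    return clobbered


def attach_namespaced(node, module, properties):
    """Store a module's properties under its own key, like $node->modulename."""
    node.setdefault(module, {}).update(properties)


# A node whose 'vid' is its revision id.
flat_node = {"nid": 12, "vid": 34}
# A second (hypothetical) module also calls its id 'vid'.
clobbered = attach_flat(flat_node, {"vid": 7})

safe_node = {"nid": 12, "vid": 34}
attach_namespaced(safe_node, "examplevocab", {"vid": 7})
```

In the flat case the revision id is gone and nothing records which module overwrote it; in the namespaced case both values survive and the owning module is visible from the key.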

Proposed solutions
These proposed solutions are not necessarily dependent on each other. They are definitely complementary, as they collectively solve quite a few problems, but they can be implemented one by one as we're able to bring development resources to bear. Some -- like the ability to build nodes in a structured form -- are certainly central to the plan.

  • Build the node as a structured array
    Rather than dumping properties onto the node object, we should build the node as a structured array from the start. We already have a proven format for storing structured data -- the FAPI-style #prefixed array. This allows us to append descriptive metadata onto each piece of data as necessary (source module, access control information). In the future, it's also possible to map schema data directly to the node using metadata in our SchemaAPI.
  • Pass structured data to the rendering layer
    Because we're building structured FAPI style arrays on node_load, only a few additional properties are necessary (#type and/or #theme) to make the entire node object renderable to HTML by default. While some modules will obviously need to add data at render time, or modify the default node structures for display, this is less work than rebuilding everything from scratch.
    Passing this structure to the display/theming layer -- rather than a node object with rendered HTML embedded in it -- means more flexibility at render time and less doubling of work when themers need to override the default node layout.
  • Support 'viewing styles' for nodes
    The true/false $teaser variable and the true/false $page variable are insufficient for representing the range of settings nodes are displayed in. Instead, we should support a more flexible $style parameter, with 'teaser,' 'full,' and 'feed' being the core-provided defaults.
    Rather than tagging additional parameters like $page and $links onto the build params, a second $options parameter should contain a keyed array of various optional flags to control rendering. Other modules could respond to the contents of this options array as overrides to their normal rendering behavior.
    Contrib modules should be able to expose their own styles beyond teaser, full, and feed. Modules that currently provide configuration options for the explicit teaser/full modes should instead provide configuration options for each of the currently available render modes, in the same way that per-content-type options are often provided.
  • Support output formats for nodes (and other data)
    Rendering to HTML is the default output format for Drupal, naturally. We special-case certain output paths to generate RSS formatted XML, but this solution is less than ideal. The 'options' array mentioned above (to control node rendering) should support the ability to control the desired structure (HTML, JavaScript, XML, PDF, PHP serialized data, etc.) and format (ATOM, RSS, JSON, PHP Array, etc.).
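
Taken together, the four proposals might look something like the sketch below. It is a Python illustration, not Drupal code: the function names, the '#module' and '#weight' keys, and the 'links' and 'format' option names are all assumptions made for the example. The point is only that one structured build step, driven by a $style-like parameter and an options array, can feed more than one renderer.

```python
import json
from html import escape


def build_node(raw, style="full", options=None):
    """Build a FAPI-style structure: '#'-prefixed keys are metadata,
    every other key is a child element."""
    options = options or {}
    node = {
        "#type": "node",
        "#style": style,
        "title": {"#value": raw["title"], "#module": "node", "#weight": 0},
    }
    body = raw["teaser"] if style == "teaser" else raw["body"]
    node["body"] = {"#value": body, "#module": "node", "#weight": 1}
    # An explicit option overrides the style's default (links on 'full' only).
    if options.get("links", style == "full"):
        node["links"] = {"#value": "Read more", "#module": "node", "#weight": 2}
    return node


def children(element):
    """Return (name, child) pairs sorted by #weight, skipping metadata keys."""
    kids = [(k, v) for k, v in element.items() if not k.startswith("#")]
    return sorted(kids, key=lambda pair: pair[1].get("#weight", 0))


def render_html(node):
    parts = [
        '<div class="%s">%s</div>' % (name, escape(child["#value"]))
        for name, child in children(node)
    ]
    return '<div class="node node-%s">%s</div>' % (node["#style"], "".join(parts))


def render_json(node):
    return json.dumps({name: child["#value"] for name, child in children(node)})


# Hypothetical registry of output formats; other formats would register here.
RENDERERS = {"html": render_html, "json": render_json}


def view_node(raw, style="full", options=None):
    """Build once, then hand the structure to the requested renderer."""
    options = options or {}
    node = build_node(raw, style, options)
    return RENDERERS[options.get("format", "html")](node)


raw = {"title": "Hello", "teaser": "Short", "body": "Short & long"}
teaser_html = view_node(raw, "teaser")
full_json = view_node(raw, "full", {"format": "json", "links": False})
```

Because the renderers receive the structure rather than pre-rendered HTML, a themer could reorder or drop children before output without rebuilding the node.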

So. Yeah.

Feedback is welcome. I've got two fairly heavy-duty patches in the queue for Drupal 6 that address the rendering side of things, but the architectural questions are bigger than just rendering -- it's an opportunity for us to carve out a new, more robust system that will help Drupal grow into a more connected, more flexible system.

Thoughts? Talk, fellow Drupal hackers!

Comments on this post will automatically be closed three months from the original post date.

Comments

That the node object

That the node object contains pre-rendered HTML has always struck me as one of the most insane and frustrating aspects of Drupal. The first two proposed changes here would certainly be welcome structural changes.

Support for multiple output formats needs more consideration and articulation, however. As someone who spends a lot of time doing theming on multiple platforms (Django, Drupal, and until recently, some custom data-driven systems), my biggest concern with the proposal is the notion that the 'options' array should be in charge of controlling the desired output format. I hope it is not a misreading or semantic quibble to suggest this use of the options array might work better as a form of hinting.

It seems likely that a module developer or even someone simply cooking up their own nodetype with CCK will not be able to anticipate all of the possible and useful output paths for their own data. Hints could suggest things like how to handle content filtering and desired field ordering, but the theme layer should reserve the right to trample any and all of those hints in the service of creating the final output. In the absence of an override, rendering could fall back to the hints, and then to defaults.
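
The fallback order described here (theme override first, then module hints, then defaults) can be sketched as follows. This is a Python illustration with hypothetical setting names, not an existing Drupal API.

```python
# Hypothetical defaults; 'field_order' and 'filter' are made-up setting names.
DEFAULTS = {"field_order": ["title", "body"], "filter": "strip_tags"}


def resolve_settings(theme_overrides, module_hints, defaults=DEFAULTS):
    """Theme overrides win, then module hints, then built-in defaults."""
    settings = dict(defaults)
    settings.update(module_hints)
    settings.update(theme_overrides)
    return settings


# A module hints at a field order it considers sensible.
hints = {"field_order": ["title", "image", "body"]}
# A PDF theme tramples the hint and changes filtering as well.
pdf_theme = {"field_order": ["title", "body"], "filter": "plain_text"}

html_settings = resolve_settings({}, hints)
pdf_settings = resolve_settings(pdf_theme, hints)
```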

Another output path, one that might be valuable in commercial and government settings, would be OpenDocument.

Finally, and maybe this is a bit crackpot (it is past my bed time)... perhaps the theme system itself could be overhauled to handle multiple output paths by simple convention. A themer could stuff all their HTML-theming related code and templates into theme/html, PDF theming into theme/pdf and so forth, to allow fine-grained control over output of the same data in different formats. Nodes, views, and the like could be accessed with a ?format= querystring appended to the path, and each output format's template.php file could tell Drupal how to handle things like http headers.
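
As a rough sketch of that convention (in Python, with a hypothetical format table and template file name, and no claim that Drupal's theme system works this way), the ?format= value could select both the template directory and the HTTP headers:

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping of supported output formats to Content-Type headers.
FORMATS = {
    "html": "text/html; charset=utf-8",
    "pdf": "application/pdf",
    "xml": "application/xml",
}


def resolve_output(url, theme_dir="theme", default="html"):
    """Map a URL to (canonical path, template file, HTTP headers).

    Unknown or missing ?format= values fall back to the default format.
    """
    parts = urlsplit(url)
    requested = parse_qs(parts.query).get("format", [default])[0]
    fmt = requested if requested in FORMATS else default
    template = posixpath.join(theme_dir, fmt, "node.tpl.php")
    return parts.path, template, {"Content-Type": FORMATS[fmt]}
```

The path stays the canonical address of the content, and the query string only changes which directory and headers are used to present it.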

Multiple output formats and the dangers of 'options'

You raise an interesting question, and the idea of 'hinting' information is interesting. The idea of the options array was originally to replace the possible explosion of "Oh, and parameter 43 is the 'should we show comments below links' boolean..." scenarios. Currently, we use these parameters as indications of what sort of context the node is being built in, and/or what pieces of information it should include.

The 'node styles' patch that was in the queue for D6 did some of that -- it allowed each node style (like teaser and full and feed) to specify a set of default options. Should links be included? Should viewing the node in this style automatically flip the 'this node has been viewed' flag? etc. I think the biggest problem with that patch is that it muddies the distinction between contextual differences and structural differences. 'Output for an RSS feed' and 'Output in teaser mode' sound similar but really they're very different. 'Teaser' is a given view of a node, while 'RSS feed' is an explicit output format.

Complicating matters, of course, is the fact that many users want to have a special 'rss feed' version of the node to go with that specific output format. ;-)

The querystring solution sounds relatively elegant -- we tend to favor the clean URL concept but querystrings seem ideal for specifying nonstandard render output and formats...

Re: options

The idea of the options array was originally to replace the possible explosion of "Oh, and parameter 43 is the 'should we show comments below links' boolean..." scenarios.

That would be pretty dang nice -- it certainly would make theming and debugging a lot nicer. My thought is simply that if the node data is being passed to the theme as proposed, can't many of the things handled in options array simply be handled by the theme at that point? If the theme has the data and the context, I would argue a sensible option would be to allow the theme itself to make decisions such as the one you mention.

The tradeoffs, I suspect, would be pushing too much logic into the theme layer (though if that logic lives in template.php and not in .tpl files, that doesn't seem so bad) and making more work for themers. More work is tricky too. I can imagine on my recent project (big site, exacting client) it would mean less work and cognitive dissonance, but it could (at least without sane defaults) mean more work for smaller scale projects.

The 'node styles' patch that was in the queue for D6 did some of that.

As much as I'm working with Drupal these days, I need to follow the community more and be less of a consumer. Is the patch still in the queue or are you trying to get more of these ideas integrated prior to the freeze?

The querystring solution sounds relatively elegant -- we tend to favor the clean URL concept but querystrings seem ideal for specifying nonstandard render output and formats...

The clean URL format would be cool as well. The motivation for using query strings is the notion that the clean URL is basically the canonical, default view of the data, and a query string could make sense conceptually as an output modifier on the atomic unit of content, rather than appending to the URL path.

But I was just making that part up as I wrote it last night, based on a quite successful pattern I used for a small internal Django project where we split up the template files on the filesystem in a similar way.

That would be pretty dang

That would be pretty dang nice -- it certainly would make theming and debugging a lot nicer. My thought is simply that if the node data is being passed to the theme as proposed, can't many of the things handled in options array simply be handled by the theme at that point? If the theme has the data and the context, I would argue a sensible option would be to allow the theme itself to make decisions such as the one you mention.

Kind of. The trick is that those parameters are often used to determine what should be built in the actual content of the node. Pushing that work to the theme also means that we have to build every possible permutation of the node (each size of image thumbnails, each size of JS widget, etc.) every time it's rendered, just so that the theme can selectively output the right parts.

The tradeoffs, I suspect, would be pushing too much logic into the theme layer (though if that logic lives in template.php and not in .tpl files, that doesn't seem so bad) and making more work for themers. More work is tricky too. I can imagine on my recent project (big site, exacting client) it would mean less work and cognitive dissonance, but it could (at least without sane defaults) mean more work for smaller scale projects.

Yeah. One of my own goals for any refactored node system is simplicity: although it should be POSSIBLE to rearrange and reshuffle stuff at the theme layer, it should be just as easy to simply say, 'Here, print the node I was handed' and have things display properly for the current context...

The clean URL format would be cool as well. The motivation for using query strings is the notion that the clean URL is basically the canonical, default view of the data, and a query string could make sense conceptually as an output modifier on the atomic unit of content, rather than appending to the URL path.

I think this is an excellent approach. While the 'clean URL, append .xml to get XML or .atom to get Atom, etc.' approach sounds good, I'm concerned it would cause a LOT of trickiness regarding path and URL mapping...

Kind of. The trick is that

Kind of. The trick is that those parameters are often used to determine what should be built in the actual content of the node. Pushing that work to the theme also means that we have to build every possible permutation of the node (each size of image thumbnails, each size of JS widget, etc.) every time it's rendered, just so that the theme can selectively output the right parts.

That makes sense, as long as it doesn't forcibly dictate what ultimately happens in rendering more than is necessary, which is clearly the goal. I certainly hope you post updates on this blog about the further adventures of node API improvements in D6 and beyond.

Hooray viewing styles

One of the criticisms I often hear about Drupal is that so many of the sites look the same. I think a big piece of that is the limited teaser/page/feed scope themers are forced to work within if they don't want to (or don't have the programming skill to) develop their own module. Giving developers a way to write predefined viewing styles would help those people out and prevent a lot of unnecessary repetition of effort.

It would be awesome, for example, to have iTunes and MediaRSS viewing styles. And then to make those available to the Views module.

Go, Jeff, go!

Yes, yes, yes, and yes. I don't think I can be any more clear than that. Your proposed solutions look excellent.

One question, from almost total ignorance, is about what this better division of structure and rendering would mean for non-node content types. Would it put more pressure on users and comments to be nodes, or be more like nodes?

I'm not sure Jeff's ideas

I'm not sure Jeff's ideas here take any step in this regard (users / nodes, comments / nodes).
Although I guess the underlying trend could be to move towards a generic 'drupal object' API, with nodes, users, (comments?) being 'drupal objects' - I think I grabbed the end of an IRC conversation about that the other day between Jeff (eaton) and, was it fago? Hem, CCK fields for users?

Anyway, as I already voiced in the 'rendering styles' thread, huge +1 on these concepts. Advanced theming of CCK and Views are currently quite tedious. Display styles and flexible rendering would bring much power to these two modules.

I really hope we can start moving in that direction before code freeze :-)

Er, um...

Isn't a structured array an awful lot like a DOM tree?

I applaud the separation of form and content that this thread advocates (and that D6 and FAPI go a long way toward implementing)...

But at the end of the day, will navigating these structured arrays simply reinvent XPath? (and theming, XSLT?)

Just wondering....