Drupal, duplicate content, and you
Does Google's "duplicate content penalty" harm Drupal sites? No! Here's why.
For years, Drupal has enjoyed a solid reputation as a search-engine-friendly CMS. It generates relatively clean, standards-compliant HTML out of the box; syncs up the important TITLE tag with semantically useful H1 and H2 tags in the body of each page; and provides short, human-readable URLs with plentiful options for customization. (Anecdotal evidence: several years back, I wrote a post on my Drupal-powered blog that mentioned the name of the company I worked for. Within two weeks, my blog post ranked higher than the company's own web site on Google.)
Recently, I've witnessed a number of discussions where people expressed concern about the way Drupal generates the human-readable URLs that help make it Google-friendly. In particular, they were worried about Google's dreaded Duplicate Content Penalty, a system designed to keep spammers from flooding Google with the same content at dozens (or hundreds!) of URLs. There's a lot of confusion floating around, so for the geeks in the crowd (and the not-so-geeky interested in learning how things work behind the scenes), I thought it would be useful to give a guided tour of how Drupal manages and generates URLs.
"Classic" Web Problems, Solved
A lot of energy in the Drupal world goes towards solving complex problems: giving administrators ways to build publishing workflows without writing code, integrating with cool new APIs, automatically translating site content into Klingon... You know. The usual.
With all of that energy focused on complex architectural problems, it's easy to lose sight of the simple solutions that Drupal provides for really common "classic" web problems. This really hit home the other week as I sifted through an old Zip disk with archives of sites I'd built for clients in the heady days of the late 90s. One by one, I started ticking off requests my clients had made that today's site-builders can solve in minutes with Drupal modules -- no wacky configuration, no complicated recipes. Just a simple, "Yes!" when a client says, "Can you...?"
Beginning with Drupal 6 and PostgreSQL on OS X 10.5 Leopard
PostgreSQL (often called Postgres) is the Other Major Database for Drupal. It shares one strange characteristic with Drupal: its adherents swear that it is the best thing since sliced bread. In this article, we'll examine how to get PostgreSQL installed, then get a Drupal 6 installation running on top of it.
Assumptions: the Xcode Tools from your Mac OS X DVD (also available from Apple Developer Connection) are installed, and your copy of PHP has support for Postgres (I used MAMP, and it worked out of the box).
Drupal Community Philosophies
In presentations I usually point out that Drupal is three things:
- A content management system: The forms you fill out, the buttons you click, and the content you work with. The stuff you interact with every day while you manage your site.
- A content management framework: The low-level APIs that let you extend and modify Drupal to make it do everything from ratings to image galleries to your dishes.
- A community: The thousands of documentation writers, developers, testers, support providers, designers, and evangelists from all over the world, working together to make Drupal a better platform every single day.
It's this third point I'd like to talk a little more about today, by providing some insights into the general Drupal community philosophies that guide our interactions with one another, and as a result, the growth and direction of the Drupal project as a whole.
The Drupal community has several core philosophies, some of which are documented, others of which you just kind of pick up from spending enough time watching various interactions on the forums, on issue queues, and on the mailing lists. Here's a brief guide, for the uninitiated.
Avoiding the Template.php of Doom (or, Overriding Theme Functions in Modules)
Drupal's theming system offers developers and designers a flexible way to override default HTML output when specific portions of the page are rendered. Everything from the name of the currently logged-in user to the HTML markup of the entire page can be customized by a plugin "theme".
Unfortunately, this system can be its own worst enemy. Themes are very powerful, but in many cases they're the only place where specific output can be changed without hacking core. Because of this, themes on highly customized production sites can easily turn into code-monsters, carrying the weight of making 'Drupal' look like 'My Awesome Site.'
This can make maintenance difficult, and it also makes sharing these tweaks with other Drupal developers tricky. In fact, some downloadable modules even come with instructions on how to modify a theme to 'complete' the module's work. Wouldn't it be great if certain reusable theme overrides could be packaged up and distributed as part of any Drupal module? As it turns out, that is possible. In this article, we'll explore two ways to do it: a tweaky, hacky approach for Drupal 5, and a clean, elegant approach that's only possible in Drupal 6.
The Open Security Model, Drupal and ExpressionEngine on Security
I recently evaluated ExpressionEngine as a possible CMS replacement for Drupal. ExpressionEngine (EE) is a commercial CMS built on PHP and MySQL by EllisLab. A client of ours was attracted to its clean interface, built-in features, and extensibility.
I was impressed by the well-designed UI and the flexibility of EE's templating engine. There are definitely lessons the Drupal community can learn from their attention to detail. In fact, as I explored the EE support forums, I discovered a great deal of antagonism towards Drupal -- to my surprise, it wasn't based on features or learning curve, but on the idea that Drupal is insecure.
In the forums, comments like this one were common whenever Drupal or other CMSes were brought up for comparison:
Some reasons why one might not want to use Drupal?
[2/5] Drupal Acidfree Module “node titles” SQL Injection Vulnerability
[2/5] Drupal Unspecified Spoofing Weakness and Cross-Site Scripting
[2/5] Drupal Project Issue Tracking Module Multiple Vulnerabilities
[2/5] Drupal Project Module Script Insertion Vulnerability
[4/5] Drupal Comment Preview Arbitrary Code Execution
Module-In-A-Box: We Built Admin Tools So You Don't Have To
Building a Drupal module from scratch can be remarkably simple -- just create an .info file, create a .module file, then implement a few functions like hook_menu and hook_nodeapi. In no time, you've got a module up and running, leveraging Drupal's APIs and adding functionality to your site.
The problem
Unfortunately, things can get a bit more complicated if your module needs to store and maintain its own collection of data. The Custom Links module, for example, allows users to add clickable links to the bottom of each node on a Drupal site. While the code to actually add the links to each node is only a few dozen lines of PHP, it takes a few hundred lines of code to store and manage the information about those links. The module needs to create a database table to store its records, provide management pages so an admin can add new links, manage permissions, handle adding and editing records, request confirmation when administrators delete a record, and so on.
Is that site running Drupal?
Various methods of "fingerprinting" a Drupal site have been tried in the past, none of them completely foolproof.
These range from *super* easy stuff, like checking for CHANGELOG.txt or searching the page source for a reference to "drupal.css" (Drupal 4.7), to checking for common paths like taxonomy/term/1 and /user (which might be aliased to something else by a module such as Pathauto or Path Redirect), and so on.
However, since Drupal 4.6, there's a super geeky trick you can use to fingerprint a Drupal site that works 90% of the time.
- Get Firefox.
- Get the Live HTTP Headers extension.
- After restarting Firefox, click Tools > Live HTTP Headers. This'll pop up a little window to the side.
- Visit a website you suspect of being Drupalish. You know, like http://drupal.org/ (except, you know, I bet they're running WordPress...).
- Highlight the Live HTTP headers window and type "exp", looking for the following in the output:
Expires: Sun, 19 Nov 1978 05:00:00 GMT
You know, like so!
Update 2008-May-31: Or! For you command-line junkies out there, check out TBarregren's helpful bash script, which lets you simply run ./is-drupal www.lullabot.com. Awesome!
By the way, this date has special significance in the Drupal community. Anyone know why? ;)
Hat tip to chx for this trick. :)
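For readers who would rather script the header check than eyeball it in a browser window, here is a minimal Python sketch of the same idea. It is not TBarregren's script; it simply fetches a page with the standard library's urllib.request and compares the Expires header against the 1978 date shown above. The function names (looks_like_drupal, fetch_headers) are hypothetical, and like the Live HTTP Headers trick, it will miss sites where a proxy or custom configuration strips or rewrites that header.

```python
import sys
import urllib.request

# The Expires value the trick above looks for on Drupal sites.
DRUPAL_EXPIRES = "Sun, 19 Nov 1978 05:00:00 GMT"


def looks_like_drupal(headers):
    """Return True if a mapping of response headers carries Drupal's Expires date.

    Header names are compared case-insensitively; surrounding whitespace in the
    value is ignored.
    """
    for name, value in headers.items():
        if name.lower() == "expires" and value.strip() == DRUPAL_EXPIRES:
            return True
    return False


def fetch_headers(url, timeout=10):
    """Fetch a page and return its response headers as a plain dict.

    urllib raises an exception on network failures and HTTP error statuses;
    this sketch does not try to handle them.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "is-drupal-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


if __name__ == "__main__":
    # Accept bare host names (www.example.com) as well as full URLs.
    for site in sys.argv[1:]:
        url = site if "://" in site else "http://" + site
        if looks_like_drupal(fetch_headers(url)):
            print(f"{site}: looks like Drupal")
        else:
            print(f"{site}: no Drupal Expires header")
```

Saved under a name of your choosing (for example, the hypothetical is_drupal.py), it runs as `python3 is_drupal.py www.lullabot.com`. Keeping the header test in its own function means it can be checked without touching the network, and reused if you want to combine it with the other fingerprinting signals mentioned earlier.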
Theming Best Practices (Garland Gets a Cleanup)
Yesterday Garland got a long-overdue update: its page.tpl.php file was rewritten to follow best practices. Now we can finally open up Garland in a workshop and no longer have to use it as an example of bad practices within a .tpl.php file. This article applies to Drupal 6 and higher, though the theming principles apply to all versions of Drupal.
What's this about best practices? Let's compare the before and after of a few of the improvements. Each of the items below is an extremely common technique you can use to keep your .tpl.php files clean.
Modifying Forms in Drupal 5 and 6
Drupal has a lot of forms to fill out, and not all of them may be just the way you want or need them to be. Modifying forms is a topic that is often met with groans, but once you understand the two methods for accomplishing the task and the basic underlying concepts, it really isn't that hard to do at all. You'll be a form-modifying, input-customizing whiz in no time. This article will briefly discuss what's going on and then focus mainly on working examples of both methods in Drupal 5 and 6. You should be comfortable creating a new function and reading arrays, and at least a passing understanding of the Forms API is really handy. Also note that the examples below are wrapped in PHP tags only so they display clearly in the article; you should not include those tags if you copy and paste the code.
Deciding to make the change in the theme or a module
So there are two methods for altering form output in Drupal: one at the theme layer and the other through a custom module. Changes to the HTML can be made with either method, so most people use whichever they are already more comfortable with; themers use the theme and developers use a module. There are two situations, however, where you will want to use a module rather than a theme:
- Changing functionality of a form (e.g. adding new validation rules or submission actions) can only happen in a module.





