by Jeff Eaton

Module Monday: Link Checker

In honor of Module Monday's post-holiday return, we're taking a look at a problem that plagues many sites: dead links. If you maintain content that contains links to other sites, it's inevitable that some of them will eventually go bad. Domains expire, sites go down, articles are unpublished, blogs migrate to a new CMS and change their URL patterns... and eventually you're left with a dusting of broken URLs in your otherwise pristine content. That's where LinkChecker comes in. It's a module for Drupal 6 and 7 that scans your content for busted links, tells you which nodes need fixing, and -- optionally -- tidies up the ones it can fix automatically.

Screenshot of administration screen

LinkChecker runs at cron time on your Drupal site, churning through a bite-size batch of nodes on each run and scanning them for URLs. It pings each URL to make sure a working web page is still there. If one is, the module moves on. If not, it logs the specific HTTP error for later reference. It's handy, but the devil is always in the details -- and LinkChecker is designed to handle all of them with aplomb. Do you need to exclude certain content types from scanning? No problem. Need to make sure that dummy URLs like "" don't get checked and generate false positives? It handles that, too. Need to hunt for URLs in dedicated CCK or Field API Link fields, in addition to text fields and node bodies? No problem. Want to check image links that reside on remote servers, or URLs that are generated by Drupal input filters even though they don't appear in the "raw" text of the node? LinkChecker lets a site administrator toggle all of those options and more.
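
To make the cron-batch idea concrete, here is a minimal Python sketch of the same loop: take a slice of nodes, skip excluded content types, pull URLs out of the text, skip ignore-listed dummy URLs, and record any URL that doesn't come back with a working status. It is not LinkChecker's code, which is PHP inside Drupal. `Node`, `scan_batch`, `batch_size`, the ignore list and the URL regex are all hypothetical stand-ins. The demo passes in a fake status table so it runs without touching the network.

```python
import re
import urllib.error
import urllib.request
from dataclasses import dataclass

# Hypothetical sketch; LinkChecker itself is PHP, and every name here is illustrative.
# Simple URL pattern: stops at whitespace, quotes and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


@dataclass
class Node:
    nid: int
    content_type: str
    body: str


def extract_urls(text, ignore):
    """Return unique URLs in text, in order, skipping ignore-listed ones."""
    found = []
    for url in URL_PATTERN.findall(text):
        if url not in ignore and url not in found:
            found.append(url)
    return found


def http_status(url, timeout=10):
    """HEAD the URL (urllib follows redirects); 0 means no HTTP response at all."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:
        return 0  # DNS failure, refused connection, timeout, etc.


def is_broken(status):
    return not 200 <= status < 400


def scan_batch(nodes, start, batch_size, allowed_types, ignore, fetch_status=http_status):
    """Check one cron-sized slice of nodes; return (broken links by nid, next offset)."""
    broken = {}
    batch = nodes[start:start + batch_size]
    for node in batch:
        if node.content_type not in allowed_types:
            continue  # content type excluded from scanning
        for url in extract_urls(node.body, ignore):
            status = fetch_status(url)
            if is_broken(status):
                broken.setdefault(node.nid, []).append((url, status))
    return broken, start + len(batch)


# Demo: two "cron runs" of two nodes each, using a fake status table.
fake_status = {"https://example.org/ok": 200, "https://example.org/gone": 404}
nodes = [
    Node(1, "article", '<a href="https://example.org/ok">ok</a> https://example.org/gone'),
    Node(2, "page", "https://example.org/gone"),
    Node(3, "article", "https://example.invalid/placeholder"),
]
ignore = {"https://example.invalid/placeholder"}
first, offset = scan_batch(nodes, 0, 2, {"article"}, ignore, fake_status.get)
second, offset = scan_batch(nodes, offset, 2, {"article"}, ignore, fake_status.get)
print(first)   # {1: [('https://example.org/gone', 404)]}
print(second)  # {}
```

Carrying an offset between runs keeps each cron pass short, which is the point of working in bite-size batches. A real crawler would also want rate limiting and a smarter URL parser than this regex. For example, the regex keeps a trailing period at the end of a sentence as part of the URL.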

The module can correct some kinds of errors automatically (301 redirects, for example), but it's up to the site's administrator to check the report it generates for news about other broken links. There, each node with busted links can be reviewed and edited.
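
Why are 301s the safe case to fix automatically? A 301 means "moved permanently," so the server is itself naming the replacement URL. The Python sketch below shows one way that could work. It asks for a URL without following redirects, reads the `Location` header, and rewrites only whole-URL matches in the text. This illustrates the general idea and is not LinkChecker's implementation. `probe` and `apply_permanent_redirects` are hypothetical names. The demo uses canned results so it needs no network access.

```python
import re
import urllib.error
import urllib.request
from urllib.parse import urljoin

# Hypothetical sketch of "fix 301s automatically"; not LinkChecker's actual code.


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx itself is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def probe(url, timeout=10):
    """Return (status, absolute Location or None) without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        return err.code, (urljoin(url, location) if location else None)
    except OSError:
        return 0, None  # no HTTP response at all


def apply_permanent_redirects(body, probe_results):
    """Swap each URL that answered 301 for its new location; return (body, changes)."""
    changes = []
    for old, (status, new) in probe_results.items():
        if status != 301 or not new:
            continue  # only permanent moves are rewritten
        # Match the whole URL only, so ".../post" does not touch ".../postscript".
        pattern = re.compile(re.escape(old) + r"(?=[\s\"'<>]|$)")
        body, count = pattern.subn(lambda _m: new, body)
        if count:
            changes.append((old, new))
    return body, changes


# Demo with canned probe results instead of live requests.
body = '<a href="https://old.example/post">x</a> https://old.example/postscript'
results = {
    "https://old.example/post": (301, "https://new.example/post"),
    "https://gone.example/": (404, None),
}
fixed, changes = apply_permanent_redirects(body, results)
print(fixed)  # <a href="https://new.example/post">x</a> https://old.example/postscript
```

A 404 or a temporary 302 is left untouched here. Those are the cases that, per the article, land in the report for a human to review.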

Screenshot of resulting change to site

LinkChecker is a tremendously useful tool, and its smorgasbord of configuration options means that it can deal with lots of oddball edge cases. One missing option that would still be welcome? An easy way to export the "busted links" report to a text file for review. For extremely large sites, dedicated third-party web crawlers with link-checking features may be a more robust solution, but for Drupal admins who need a hand keeping their sites tidy, LinkChecker is a lifesaver.