A beginner's guide to caching data

Article by Jeff Eaton, May 18, 2007 - 7:57pm

Building complicated, dynamic content in Drupal is easy, but it can come at a price. A lot of the stuff that makes a Web 2.0 site so cool can spell 'performance nightmare' under heavy load, thrashing the database to perform complex queries and expensive calculations every time a user looks at a node or loads a particular page.

One solution is to turn on page caching on Drupal's performance options administration page. That speeds things up for anonymous users by caching the output of each page, greatly reducing the number of DB queries needed when they hit the site. That doesn't help with logged in users, however: because page level caching is an all-or-nothing affair, it only works for the standardized, always-the-same view that anonymous users see when they arrive.

Eventually there comes a time when you have to dig in to your code, identify the database access hot spots, and add caching yourself. Fortunately, Drupal's built-in caching APIs and some simple guidelines can make that task easy.

The basics

The first rule of optimization and caching is this: never do something time consuming twice if you can hold onto the results and re-use them. Let's look at a simple example of that principle in action:

<?php
function my_module_function($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    // Do your expensive calculations here, and populate $my_data
    // with the correct stuff..
  }
  return $my_data;
}
?>

The important part to look at in this function is the static variable named $my_data. Static variables start out empty the first time a function is called, but they keep the data they're populated with even when the function is called again. That means that we can check if the variable is already populated, and if so return it immediately without doing any more work.

This pattern appears all over the place in Drupal -- including key functions like node_load(). Calling node_load() for a particular node ID requires database hits the first time, but the resulting information is kept in a static variable for the duration of the page load. That way, displaying a node once in a list, a second time in a block, and a third time in a list of related links (for example) doesn't require three full trips to the database.

Another important feature is the use of the $reset variable. Caching is good, but occasionally you want to be sure you're getting the absolute freshest data available. Using a 'reset' variable in your function, and always performing the 'expensive' version of the function if it's set to TRUE, lets you bypass caching when you really need to.
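To watch the static variable and the $reset flag at work outside of Drupal, here is a minimal, self-contained sketch. The function name example_expensive_lookup() and the call counter are hypothetical, made up for this illustration, and the "expensive" work is only a placeholder.

```php
<?php
// Hypothetical counter so we can observe how often the slow path runs.
$GLOBALS['example_expensive_calls'] = 0;

function example_expensive_lookup($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    // Stand-in for the expensive calculation.
    $GLOBALS['example_expensive_calls']++;
    $my_data = array('generated' => $GLOBALS['example_expensive_calls']);
  }
  return $my_data;
}

example_expensive_lookup();      // Slow path: populates the static variable.
example_expensive_lookup();      // Fast path: returns the stored copy.
example_expensive_lookup(TRUE);  // Slow path again: $reset forces a rebuild.

print $GLOBALS['example_expensive_calls'] . "\n";  // Prints 2.
```

Three calls, but the slow path only runs twice: once to fill the static variable and once because the caller explicitly asked for fresh data.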

Drupal's cache functions

You might notice that the static variable technique only stores data for the duration of a single page load. For even better performance, it's often possible to cache data in a more permanent fashion...

<?php
function my_module_function($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    if (!$reset && ($cache = cache_get('my_module_data')) && !empty($cache->data)) {
      $my_data = unserialize($cache->data);
    }
    else {
      // Do your expensive calculations here, and populate $my_data
      // with the correct stuff..
      cache_set('my_module_data', 'cache', serialize($my_data));
    }
  }
  return $my_data;
}
?>

This version of the function still uses the static variable, but it adds another layer: database caching. Drupal's APIs provide three key functions you'll need to be familiar with: cache_get(), cache_set(), and cache_clear_all(). Let's look at how they're used.

After the initial check of the static variable, this function checks Drupal's cache for data stored with a particular key. If it finds it, and the $cache->data element isn't empty, it unserializes the stored data and sticks it into the $my_data variable.

If no cached version is found (or if we called the function using the $reset parameter), the function does the actual work of generating the data. Then it serializes the data and saves it to the cache so future requests will find it. The key that you pass in as the first parameter can be anything you choose, though it's important to avoid colliding with any other modules' keys. Starting the key with the name of your module is always a good idea.

The end result? A slick little function that saves time whenever it can -- first checking for an in-memory copy of the data, then checking the cache, and finally calculating it from scratch if necessary. You'll see this pattern a lot if you dig into the guts of data-intensive Drupal modules.
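The three layers can be tried out without a Drupal install. The sketch below follows the same flow, but swaps Drupal's cache functions for hypothetical array-backed stand-ins (example_cache_get() and example_cache_set()); they only mimic the shape used in this article and are not the real API.

```php
<?php
// Hypothetical stand-ins for a persistent cache, backed by a plain array.
// They are NOT Drupal's cache_get()/cache_set(); they only mimic the shape
// used in the article (an object with a ->data property, or FALSE on a miss).
$GLOBALS['example_store'] = array();
$GLOBALS['example_rebuilds'] = 0;

function example_cache_get($key) {
  if (isset($GLOBALS['example_store'][$key])) {
    return (object) array('data' => $GLOBALS['example_store'][$key]);
  }
  return FALSE;
}

function example_cache_set($key, $data) {
  $GLOBALS['example_store'][$key] = $data;
}

function example_layered_lookup($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    if (!$reset && ($cache = example_cache_get('my_module_data')) && !empty($cache->data)) {
      // Layer 2: the persistent cache has a copy.
      $my_data = unserialize($cache->data);
    }
    else {
      // Layer 3: rebuild from scratch, then store for later requests.
      $GLOBALS['example_rebuilds']++;
      $my_data = array('rebuild' => $GLOBALS['example_rebuilds']);
      example_cache_set('my_module_data', serialize($my_data));
    }
  }
  return $my_data;
}

$first = example_layered_lookup();      // Nothing cached yet: rebuilds.
$second = example_layered_lookup();     // Served from the static variable.
$third = example_layered_lookup(TRUE);  // $reset skips both cache layers.
```

Note that the rebuilt value is assigned to $my_data before it is stored, so the function both caches the data and returns it to the caller.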

Keeping up to date

What happens, though, if the data that you've cached becomes outdated and needs to be recalculated? By default, cached information stays around until some module explicitly calls the cache_clear_all() function, emptying out your record. If your data is updated sporadically, you might consider simply calling cache_clear_all('my_module_data', 'cache') each time you save the changes to it. If you're caching quite a few pieces of data (perhaps versions of a particular block for each role on the site), there's a third 'wildcard' parameter:

<?php
cache_clear_all('my_module', 'cache', TRUE);
?>

This clears out all the cache values whose keys start with 'my_module'.
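To make the "starts with" behaviour concrete, here is a rough sketch that applies the same kind of prefix match to a plain array. The helper example_clear_prefix() is hypothetical and is not how Drupal implements the wildcard internally.

```php
<?php
// Hypothetical helper that mimics a "starts with" wildcard clear on a plain
// array of cache entries. It is not Drupal's implementation.
function example_clear_prefix(array $store, $prefix) {
  foreach (array_keys($store) as $key) {
    if (strpos($key, $prefix) === 0) {
      unset($store[$key]);
    }
  }
  return $store;
}

$store = array(
  'my_module_data' => 'a',
  'my_module_block_admin' => 'b',
  'other_module_data' => 'c',
);

$remaining = example_clear_prefix($store, 'my_module');
print implode(', ', array_keys($remaining)) . "\n";  // Prints other_module_data.
```

A plain prefix match also catches keys from any other module whose name happens to begin with the same characters, so a distinctive prefix is worth the extra keystrokes.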

If you don't need your cached data to be perfectly up-to-the-second, but you want to keep it reasonably fresh, you can also pass in an expiration date to the cache_set() function. For example:

<?php
cache_set('my_module_data', 'cache', serialize($my_data), time() + 360);
?>

The final parameter is a unix timestamp value representing the 'expiration date' of the cache data. The easiest way to calculate it is to use the time() function, and add the data's desired lifetime in seconds. Expired entries will be automatically discarded as they pass that date.
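The arithmetic is simple enough to check by hand. The sketch below uses two hypothetical helpers (not part of Drupal) and a fixed "now" so the result is repeatable; treating an entry as expired at the exact second of its timestamp is an assumption of this example.

```php
<?php
// Hypothetical helpers for illustration; the names are not part of Drupal.

// Build an expiration timestamp from a lifetime in seconds.
function example_expiration($lifetime_seconds, $now = NULL) {
  $now = isset($now) ? $now : time();
  return $now + $lifetime_seconds;
}

// Assumption: an entry counts as expired once the clock reaches its timestamp.
function example_is_expired($expire, $now = NULL) {
  $now = isset($now) ? $now : time();
  return $now >= $expire;
}

$created = 1179518220;                        // Arbitrary fixed "now" for the demo.
$expire = example_expiration(360, $created);  // 360 seconds = 6 minutes.

var_dump(example_is_expired($expire, $created + 60));   // bool(false): still fresh.
var_dump(example_is_expired($expire, $created + 360));  // bool(true): lifetime used up.
```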

Advanced caching

You might have noticed that cache_set()'s second parameter is 'cache' -- the name of the table that stores the default cache data. If you're storing large amounts of data in the cache, you can set up your own dedicated cache table and pass its name into the function. That will help keep your cache lookups speedy no matter what other modules are sticking into their own tables. The Views module uses that technique to maintain full control over when its cache data is cleared.

If you're really hoping to squeeze the most out of your server, Drupal also supports the use of alternative caching systems. By changing a single line in your site's settings.php file, you can point it to different implementations of the standard cache_set(), cache_get(), and cache_clear_all() functions. File-based caching, integration with the open source memcached project, and other approaches are all possible. As long as you've used the standard Drupal caching functions, your module's code won't have to be altered.

A few caveats

Like all good things, it's possible to overdo it with caching. Sometimes, it just doesn't make sense -- if you're looking up a single record from a table, saving the result to a database cache is silly. Using the Devel module is a good way to spot the functions where caching will pay off: it can log the queries that are used on your site and highlight the ones that are slow, or the ones that are repeated numerous times on each page.

Other times, the data you're using will just be a bad fit for the standard caching system. If you need to join cached data in SQL queries, for example, cache_set()'s practice of storing data as a serialized string will be a problem. In those cases, you'll need to come up with a solution that's specific to your module. VotingAPI maintains one table full of individual votes and another table full of calculated results (averages, sums, etc.) for quick joining when sorting and filtering nodes.

Finally, it's important to remember that the cache is not long term storage! Since other modules can call cache_clear_all() and wipe it out, you should never put something into it if you can't recalculate it again using the original source data.

Go west, young Drupaler!

Congratulations: you now have a powerful set of tools to speed up your code! Go forth, and optimize.

Comments

RobRoy (not verified) on May 18, 2007 - 9:34pm

The colonoscopy

One minor thing I do, is when setting wildcard TRUE to make cache_clear_all() a right-hand match (in the case where you have lots of my_module:foo, my_module:bar, etc.), it's probably better to include the colon too just in the case there is a module storing data in 'my_modulename'.

This won't clear 'my_modulename' data by accident:

<?php
cache_clear_all('my_module:', 'cache', TRUE);
?>

Farsheed (not verified) on May 19, 2007 - 1:36am

Great post. The details of

Great post. The details of caching beyond static variables have been a mystery to me for a while now. At least now I know it's not some voodoo magic. :)

Robert Douglass on May 19, 2007 - 4:09am

Drupal 6 is slightly different (better)

Look at this line from Jeff's code:

<?php
cache_set('my_module_data', 'cache', serialize($my_data), time() + 360);
?>

Since $my_data is a complex data type (array or object), he needs to serialize it before sending it into the cache. Likewise, when getting that data from the cache, the result has to be unserialized before it is useful. This is a facet of Drupal 5. Drupal 6 hides this implementation detail inside of cache_get and cache_set so that you can just send $my_data in there and serialization will be done if needed. So the appropriate Drupal 6 code would be:

<?php
cache_set('my_module_data', 'cache', $my_data, time() + 360);
?>

frando (not verified) on May 19, 2007 - 8:43am

... and Drupal 6 is even better ;)

... and Drupal 6 is even better ;)
We changed the argument ordering for cache_set, so that the optional table argument comes after the data.

So the real Drupal 6 code would be:

<?php
cache_set('my_module_data', $my_data, 'cache', time() + 360);
?>

Thanks to that, for simple caching that doesn't need special expire settings, you can just do:

<?php
cache_set('my_module_data', $my_data);
?>

yaph (not verified) on May 22, 2007 - 8:12am

Thanks Jeff for this

Thanks Jeff for this enlightening article. Now I am ready to speed up some modules.

Tomasz Gorski (not verified) on May 31, 2007 - 11:23am

article

Thank You for another very interesting article Jeff. It's really good written and I fully agree with You on main issue, btw. I must say that I really enjoyed reading all of Your posts. It’s interesting to read ideas, and observations from someone else’s point of view… it makes you think more. So please try to keep up the great work all the time. Greetings

Jonny (not verified) on August 12, 2007 - 4:05pm

Excellent

This is great, thanks Jeff - I implemented this in a piece of new code today and was amazed at how easy it is to get up and running. I've got lots of ideas now of how I can speed up other parts of the site for logged in users - cheers!

szachy z metalu (not verified) on April 25, 2008 - 11:21am

Searching through websites I

Searching through websites I came across this article. I think it is fairly interesting considering the spam cruising about the net. Good Job.

Manuel (not verified) on August 18, 2007 - 4:13am

Not bad, I will implement

Not bad, I will implement this soon, since my site has problems with too many (db) visitors lately.

Anonymous (not verified) on November 5, 2007 - 9:01pm

Block Cache Module

For other people may interest with another module:

http://drupal.org/project/blockcache

Richie (not verified) on November 6, 2007 - 11:12am

Really Useful Info

I'm glad I stumbled upon this article. It has demystified the black magic that is Drupal's caching system. I will attempt to use some of this on the next site I build.

Dla Dziewczyn (not verified) on April 8, 2008 - 12:07pm

Thank you for your amazing

Thank you for your amazing site.

Chris (not verified) on November 25, 2007 - 11:08am

Cache

Hi,

Since we installed the boost module the sites do not appear to be updating.

For example if you are logged in as a user you can see the new posts.

If you are not logged in you see the posts from the previous day. It appears that the site is refreshing every 24 hours.

We have set the cache setting to zero to work with the boost module, but this appears to cause problems.

The site is www.frostfirepulse.com

regards
Chris

????????? (not verified) on December 18, 2007 - 2:41pm

"For other people may

"For other people may interest with another module:

http://drupal.org/project/blockcache"

Thanx!

United States freebies (not verified) on March 17, 2008 - 2:54am

thank you wery mach

thank you wery mach

Mike Cantelon (not verified) on March 19, 2008 - 1:48pm

Caching Paging Queries

One thing which tripped me up for awhile... if you're caching the result from a pager query, note that you should also cache the resulting contents of the pager_page_array, pager_total, and pager_total_items global variables as well. Otherwise when you attempt to show paging links via theme_pager you'll get nothing.

Anonymous (not verified) on April 8, 2008 - 7:37am

Riddle me this if you will

Riddle me this if you will ;-)

- From the following, we generate a unique id, and use it as cache key.

Without relying on metadata outputted with the data... how would we keep the uid on the second hit?

Every refresh will regenerate another id, would it not?

How could the first part of the condition know about the second part?

if(!isset($fx_data) || $reset){
   if (!$reset && ($cache = cache_get($_UID)) && !empty($cache->data)) {
      $fx_data = unserialize($cache->data);
      $_RAW = $fx_data;
   }
   else{
      $_UID = uniqid();
      cache_set($_UID, serialize($MyValue), 'cache');
      $_RAW = unserialize(cache_get($_UID)->data);
   }
}

Jeff Eaton on April 8, 2008 - 8:22pm

No good solution

That's a somewhat awkward approach to the problem, actually -- since I don't know the context in which the code is being used I can't really say how to solve it more effectively. Do you really need a *random* key for the $_UID variable? In most situations you'll be retrieving information based on some criteria -- the time, the current path, the logged in user, etc -- and that information can be used to create a unique but predictable key to store the data.
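As a rough illustration of a "unique but predictable" key, the sketch below builds one from whatever criteria determine the data. The helper example_cache_key() and the criteria names are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: builds a predictable cache key from the criteria that
// determine the data, instead of a random uniqid().
function example_cache_key($module, array $criteria) {
  // Sort by name so the same criteria always produce the same key,
  // regardless of the order they were passed in.
  ksort($criteria);
  $parts = array();
  foreach ($criteria as $name => $value) {
    $parts[] = $name . '=' . $value;
  }
  return $module . ':' . implode(':', $parts);
}

$key = example_cache_key('my_module', array('uid' => 42, 'path' => 'node/7'));
print $key . "\n";  // Prints my_module:path=node/7:uid=42
```

Because the key is derived from the request rather than generated at random, a second page load with the same user and path arrives at the same key and can find the stored entry.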

Richard Burford (not verified) on May 14, 2008 - 9:18am

Excellent article

I'm a little late to this party but I'd like to say I found your article to be concise and well written. This is a great introduction to caching and the technique you describe seems a perfect fit for my api_helper module!

Thanks ;)

prezenty (not verified) on May 28, 2008 - 12:33pm

This is interesting article,

This is interesting article, I did not it think that it yes. Interesting it knew persons about this how much. Sorry if I wrote bad there now my English is novice and I do not it write yet good.

Anthony Cartmell (not verified) on June 3, 2008 - 9:28am

Turning off page caching for anon forms

I had a problem with the page cache, when mixed with (multi-step) forms for anon users that had the user's chosen values stored in the session and subsequently used as input defaults.

Since the initial form display is done with a GET, the form, including whatever default values were present at the time, was being saved into the page cache. This meant that on returning to the form with a GET, anon users would see the old default values, and not those in their current session.

So I needed to switch off the page cache just for these form pages. Thanks to a helpful comment in the issues list of the Protected Node module http://drupal.org/node/233979, I too found that setting $GLOBALS['conf']['cache'] was the answer. This is the variable that variable_get('cache',0) uses, but I modify it directly as calling variable_set('cache',false) would set it off for all pages.

Since I had several forms to prevent caching for, I added a module form_alter() function to set $GLOBALS['conf']['cache'] to false if the $form_id was found in an array of form IDs.

So now I have page caching set to On, but I can turn off caching for individual pages. Much neater than caching the page and then deleting it from the cache later.
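A minimal sketch of that check, runnable outside Drupal, might look like the following. The helper name and the form IDs are hypothetical; in a real module the logic would sit inside the module's form_alter() implementation as described above, and only $GLOBALS['conf']['cache'] comes from the comment itself.

```php
<?php
// Hypothetical helper; in a real module this logic would live inside the
// module's form_alter() implementation, as the comment describes.
function example_disable_page_cache_for_form($form_id, array $uncached_form_ids) {
  if (in_array($form_id, $uncached_form_ids, TRUE)) {
    // Overrides the in-memory setting for this request only; nothing is
    // written back to the stored configuration.
    $GLOBALS['conf']['cache'] = FALSE;
    return TRUE;
  }
  return FALSE;
}

// Simulated site configuration with page caching switched on.
$GLOBALS['conf'] = array('cache' => 1);
$uncached = array('example_wizard_step_one', 'example_wizard_step_two');

example_disable_page_cache_for_form('example_search_form', $uncached);
$still_on = $GLOBALS['conf']['cache'];   // Unlisted form: setting untouched.

example_disable_page_cache_for_form('example_wizard_step_one', $uncached);
$now_off = $GLOBALS['conf']['cache'];    // Listed form: caching off for this request.
```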

Golebie (not verified) on June 9, 2008 - 8:38am

Hello. I recently saw an

Hello.
I recently saw an article written by a professor, which confirms what is written here. I agree with that.
:)

Rollo (not verified) on June 12, 2008 - 8:06am

tada

Very useful advices. I think I will search more about this on the Net, Guys. Many thanks

Przepisy kulinarne (not verified) on July 22, 2008 - 1:55pm

Dobra kuchnia

Thank you for this article, I appreciate it even more because it is not so common to find those kind of things on the net. Thnx!

Prezenty (not verified) on July 24, 2008 - 5:34am

Hello I'm glad I stumbled

Hello

I'm glad I stumbled upon this article. It has demystified the black magic that is Drupal's caching system. I will attempt to use some of this on the next site I build.

Amnon Levav (not verified) on July 24, 2008 - 7:14pm

Manual cache expiry

The default arg for drupal's cache_set function is CACHE_PERMANENT.
(ref: http://api.drupal.org/api/function/cache_set/5 )

So how does one set a manual expiry timeout for all content?

Or should I run cache_clear_all on cron?

Ari (not verified) on July 29, 2008 - 4:01pm

can you give a real life

can you give a real life example of when to use my_module_function(), with reset too?

Is this a good example? Suppose your module provides a page, /mystats/nodes, that keeps track of node statistics. For every 50 new nodes inserted (tracked in a variable in the variables table), you call my_module_function(TRUE); if 50 new nodes haven't been inserted yet, you call it with FALSE to get the data for your function to format and pass to the template. Then my_module_function() either gets the data from the database in the unserialize() line....
or
calculates new stats, stores the new data in the cache (in the cache_set() line), and sets that variable back to 0...

Just wondering, shouldn't $my_data be set in the latter part as well? It will be stored in the cache, but the data still needs to be returned to the module for formatting, etc.

mjwest10 (not verified) on August 3, 2008 - 5:12pm

This is good to know

This is good to know. I was wondering how the related links module handles this caching? For example I have a Visual Basic tutorial site that uses related links within the actual tutorial pages. To see an example you can check out this: Visual Basic Oracle tutorial. At the bottom I have related links to other database guides and tutorials on my site. I am using the module to do this. If I have caching turned on does it make hits for all those nodes? How would I check this out?

jayjaytheluffy (not verified) on September 9, 2008 - 4:12am

Nice article but I have a

Nice article but I have a question:
How does drupal make sure or check if the data being cached is not corrupted?
My boss just assigned me this task after I successfully cached the tagadelic tag cloud on my page. I'm really confused and don't know the answer.
Please help me.

Anonymous (not verified) on September 11, 2008 - 2:55am

szafy bhp

Thanks for the great advice about blog posting services..it is so hard to get through all the false info out there!

Odlewnia (not verified) on September 24, 2008 - 4:07am

This is interesting article,

This is interesting article, I did not it think that it yes.

Angarsk (not verified) on October 11, 2008 - 10:59am

I wonder

What is faster for caching? variable_get updated on cron or cache_get?

Bilrace (not verified) on October 11, 2008 - 4:35pm

nice articel

Sweet. Looks like its not a hard thing to speed up some database questions with cached info. Thanx mate

caine (not verified) on October 19, 2008 - 1:18pm

No one can drive us crazy

No one can drive us crazy unless we give them the keys. Doug Horton

Anonymous (not verified) on October 22, 2008 - 6:27pm

How can I use cache function

How can I use cache function in Drupal 6 for "funny story of the day"? I am obsessed with this question for long time. Please keep me informed by TONYLIUH AT GMAIL.COM.
Thank you in advance.

Chris Cohen (not verified) on October 28, 2008 - 10:35am

Thanks

Thanks. This was a very useful explanation of a powerful Drupal feature.

przepisy kulinarne (not verified) on April 24, 2009 - 2:13pm

Very useful advices. I think

Very useful advices. I think I will search more about this on the Net, Guys. Many thanks

aac (not verified) on July 2, 2009 - 11:02am

A very nice article about

A very nice article about drupal caching system!

Miro Dietiker (not verified) on July 28, 2009 - 5:35pm

Creating cache DB Tables & Cache introduction into comment_cck

This article was a great introduction.
For introduction of a former missing cache in modules, you need to add schema information and update hooks.

A real caching patch (hopefully) containing everything needed for a real module could be found at:
http://drupal.org/node/533218
.install update hook, schema providing, cache get/set, cache clearing

borgo (not verified) on August 25, 2009 - 7:19am

further reading

Great intro, thanks! I can't find any further reading (web / book) on this though. Thanks.

vacilando (not verified) on October 20, 2009 - 3:17pm

On logical operator precedence

It took me ages to figure out why my code did not fetch cache.

Similar to this article, I had used this condition:
if ($cache = cache_get($mycacheid) && !empty($cache->data)) {

What I eventually found was that although both parts of the condition were true, the whole was returning false!

So I replaced '&&' with 'and' and it works just perfectly.
if ($cache = cache_get($mycacheid) and !empty($cache->data)) {

I am aware of the precedence of the first logical operator (see http://be2.php.net/manual/en/language.operators.logical.php). But why would it matter here?
Also, the above code must have worked on other systems. Is it a question of PHP version? Or why would it not work on my server?

FYI, I am on PHP 5.2.6 and Drupal 6.14.

vacilando (not verified) on October 20, 2009 - 3:34pm

Solution...

Shoot, it was so simple... '&&' has higher precedence even than '='. So I should have enclosed the $cache = cache_get($mycacheid) in brackets!
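The difference is easy to reproduce in plain PHP. In this small sketch, example_fetch() is a hypothetical stand-in for a function that returns a cache-like object.

```php
<?php
// Hypothetical stand-in that returns a cache-like object.
function example_fetch() {
  return (object) array('data' => 'cached value');
}

// Without parentheses: && binds tighter than =, so $a receives the boolean
// result of (example_fetch() && TRUE), not the object.
$a = example_fetch() && TRUE;

// With parentheses: the assignment happens first, so $b holds the object.
($b = example_fetch()) && TRUE;

// 'and' binds looser than =, so $c also holds the object.
$c = example_fetch() and TRUE;

var_dump(is_bool($a), is_object($b), is_object($c));  // All three are true.
```

In the unparenthesized condition from the comment above, the right-hand side of && is evaluated before $cache has been assigned, so !empty($cache->data) sees nothing and the whole condition comes out false.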

Redrigo (not verified) on November 21, 2009 - 4:22pm

Thanks

Thanks For informations in your site...is very useful for me...

About this 'bot

Jeff Eaton

Jeff Eaton is a long-time web developer. He's been designing, administering, and implementing web projects since he pieced together his first HTML file in 1996. He's built ecommerce sites for florists, helped implement enterprise web systems for multinational corporations, and juggled web architectures from legacy Perl to ASP.Net. (With some Windows desktop development thrown in for good measure...) Yes...
