by Jeff Eaton on May 18, 2007

A beginner's guide to caching data

Building complicated, dynamic content in Drupal is easy, but it can come at a price. A lot of the stuff that makes a Web 2.0 site so cool can spell 'performance nightmare' under heavy load, thrashing the database to perform complex queries and expensive calculations every time a user looks at a node or loads a particular page.

One solution is to turn on page caching on Drupal's performance options administration page. That speeds things up for anonymous users by caching the output of each page, greatly reducing the number of DB queries needed when they hit the site. That doesn't help with logged-in users, however: because page-level caching is an all-or-nothing affair, it only works for the standardized, always-the-same view that anonymous users see when they arrive.

Eventually there comes a time when you have to dig in to your code, identify the database access hot spots, and add caching yourself. Fortunately, Drupal's built-in caching APIs and some simple guidelines can make that task easy.

The basics

The first rule of optimization and caching is this: never do something time-consuming twice if you can hold onto the results and re-use them. Let's look at a simple example of that principle in action:

<?php
function my_module_function($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    // Do your expensive calculations here, and populate $my_data
    // with the correct stuff..
  }
  return $my_data;
}
?>

The important part to look at in this function is the static variable named $my_data. Static variables start out empty the first time a function is called, but they keep the data they're populated with even when the function is called again. That means that we can check if the variable is already populated, and if so return it immediately without doing any more work.

This pattern appears all over the place in Drupal -- including key functions like node_load(). Calling node_load() for a particular node ID requires database hits the first time, but the resulting information is kept in a static variable for the duration of the page load. That way, displaying a node once in a list, a second time in a block, and a third time in a list of related links (for example) doesn't require three full trips to the database.

Another important feature is the use of the $reset variable. Caching is good, but occasionally you want to be sure you're getting the absolute freshest data available. Using a 'reset' variable in your function, and always performing the 'expensive' version of the function if it's set to TRUE, lets you bypass caching when you really need to.
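To see the effect, it helps to count how often the slow path actually runs. The sketch below is a self-contained variation on the function above. The counter helper my_module_run_counter() and the squares calculation are made up for illustration and simply stand in for real expensive work.

```php
<?php
// Hypothetical helper: remembers how many times the slow path has run.
function my_module_run_counter($increment = FALSE) {
  static $runs = 0;
  if ($increment) {
    $runs++;
  }
  return $runs;
}

function my_module_squares($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    // Stand-in for the expensive calculation.
    my_module_run_counter(TRUE);
    $my_data = array();
    foreach (range(1, 5) as $n) {
      $my_data[$n] = $n * $n;
    }
  }
  return $my_data;
}

my_module_squares();      // Slow path: the static variable is still empty.
my_module_squares();      // Fast path: the stored copy is returned.
my_module_squares(TRUE);  // Slow path again, because $reset is TRUE.
print my_module_run_counter();  // Prints 2: one initial run plus one reset.
```

Three calls, but only two runs of the calculation: the second call never touches the slow path.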

Drupal's cache functions

You might notice that the static variable technique only stores data for the duration of a single page load. For even better performance, it's often possible to cache data in a more permanent fashion...

<?php
function my_module_function($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    if (!$reset && ($cache = cache_get('my_module_data')) && !empty($cache->data)) {
      $my_data = unserialize($cache->data);
    }
    else {
      // Do your expensive calculations here, and populate $my_data
      // with the correct stuff..
      cache_set('my_module_data', 'cache', serialize($my_data));
    }
  }
  return $my_data;
}
?>

This version of the function still uses the static variable, but it adds another layer: database caching. Drupal's APIs provide three key functions you'll need to be familiar with: cache_get(), cache_set(), and cache_clear_all(). Let's look at how they're used.

After the initial check of the static variable, this function checks Drupal's cache for data stored with a particular key. If it finds it, and the $cache->data element isn't empty, it unserializes the stored data and sticks it into the $my_data variable.

If no cached version is found (or if we called the function using the $reset parameter), the function does the actual work of generating the data. Then it serializes the data and saves it to the cache so future requests will find it. The key that you pass in as the first parameter can be anything you choose, though it's important to avoid colliding with any other modules' keys. Starting the key with the name of your module is always a good idea.

The end result? A slick little function that saves time whenever it can -- first checking for an in-memory copy of the data, then checking the cache, and finally calculating it from scratch if necessary. You'll see this pattern a lot if you dig into the guts of data-intensive Drupal modules.

Keeping up to date

What happens, though, if the data that you've cached becomes outdated and needs to be recalculated? By default, cached information stays around until some module explicitly calls the cache_clear_all() function, emptying out your record. If your data is updated sporadically, you might consider simply calling cache_clear_all('my_module_data', 'cache') each time you save the changes to it. If you're caching quite a few pieces of data (perhaps versions of a particular block for each role on the site), there's a third 'wildcard' parameter:

<?php
cache_clear_all('my_module', 'cache', TRUE);
?>

This clears out all the cache values whose keys start with 'my_module'.
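Because the match is on the start of the key, it's worth thinking about which keys a given prefix will actually catch. The following sketch doesn't call Drupal at all: my_module_keys_matching() is a hypothetical stand-in for the prefix match, and the keys are invented examples.

```php
<?php
// Hypothetical stand-in for the wildcard match: returns the keys that
// start with the given prefix.
function my_module_keys_matching($prefix, $keys) {
  $matches = array();
  foreach ($keys as $key) {
    if (strpos($key, $prefix) === 0) {
      $matches[] = $key;
    }
  }
  return $matches;
}

// Invented example keys, including one that belongs to another module.
$keys = array(
  'my_module:block:1',
  'my_module:block:2',
  'my_modulename:settings',
  'other_module:data',
);

// 'my_module' also matches the other module's 'my_modulename:settings'.
$loose = my_module_keys_matching('my_module', $keys);

// Ending the prefix with a separator keeps the match inside your own keys.
$strict = my_module_keys_matching('my_module:', $keys);
```

If your keys use a separator after the module name, including it in the prefix is a cheap way to avoid wiping out a similarly named module's data.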

If you don't need your cached data to be perfectly up-to-the-second, but you want to keep it reasonably fresh, you can also pass in an expiration date to the cache_set() function. For example:

<?php
cache_set('my_module_data', 'cache', serialize($my_data), time() + 360);
?>

The final parameter is a Unix timestamp representing the 'expiration date' of the cache data. The easiest way to calculate it is to call the time() function and add the data's desired lifetime in seconds; the example above keeps the data for 360 seconds, or six minutes. Expired entries are discarded automatically the next time Drupal cleans out its cache after that date has passed.
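Since the lifetime is given in seconds, it can help to spell the arithmetic out. The helper below, my_module_cache_expiry(), is hypothetical and not part of Drupal; it only illustrates how a lifetime turns into the timestamp passed as the expiration argument.

```php
<?php
// Hypothetical helper, not part of Drupal: turns a lifetime in minutes into
// a Unix timestamp suitable for cache_set()'s expiration argument.
function my_module_cache_expiry($minutes, $now = NULL) {
  if (!isset($now)) {
    $now = time();
  }
  return $now + $minutes * 60;
}

// A fixed starting timestamp so the results are reproducible.
$now = 1179446400;
$six_minutes = my_module_cache_expiry(6, $now);    // Same offset as time() + 360.
$one_day = my_module_cache_expiry(24 * 60, $now);  // 86400 seconds later.
```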

Advanced caching

You might have noticed that cache_set()'s second parameter is 'cache' -- the name of the table that stores the default cache data. If you're storing large amounts of data in the cache, you can set up your own dedicated cache table and pass its name into the function. That will help keep your cache lookups speedy no matter what other modules are sticking into their own tables. The Views module uses that technique to maintain full control over when its cache data is cleared.

If you're really hoping to squeeze the most out of your server, Drupal also supports the use of alternative caching systems. By changing a single line in your site's settings.php file, you can point it to different implementations of the standard cache_set(), cache_get(), and cache_clear_all() functions. File-based caching, integration with the open source memcached project, and other approaches are all possible. As long as you've used the standard Drupal caching functions, your module's code won't have to be altered.
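As a rough sketch of what that single line can look like: in Drupal 5 and 6 the setting is the cache_inc variable, which names the include file that provides the cache functions. The path below is an assumption for illustration; use whatever path your chosen cache implementation documents.

```php
<?php
// In settings.php. The path is an example only: point it at the file that
// provides the replacement cache_get(), cache_set(), and cache_clear_all().
$conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';
```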

A few caveats

Like all good things, it's possible to overdo it with caching. Sometimes, it just doesn't make sense -- if you're looking up a single record from a table, saving the result to a database cache is silly. Using the Devel module is a good way to spot the functions where caching will pay off: it can log the queries that are used on your site and highlight the ones that are slow, or the ones that are repeated numerous times on each page.

Other times, the data you're using will just be a bad fit for the standard caching system. If you need to join cached data in SQL queries, for example, cache_set()'s practice of storing data as a serialized string will be a problem. In those cases, you'll need to come up with a solution that's specific to your module. VotingAPI maintains one table full of individual votes and another table full of calculated results (averages, sums, etc.) for quick joining when sorting and filtering nodes.

Finally, it's important to remember that the cache is not long term storage! Since other modules can call cache_clear_all() and wipe it out, you should never put something into it if you can't recalculate it again using the original source data.

Go west, young Drupaler!

Congratulations: you now have a powerful set of tools to speed up your code! Go forth, and optimize.

Jeff Eaton

Senior Drupal Architect


Comments

RobRoy

The colonoscopy

One minor thing I do when setting the wildcard to TRUE to make cache_clear_all() a right-hand match (in the case where you have lots of my_module:foo, my_module:bar, etc.) is to include the colon too, just in case another module is storing data under 'my_modulename'.

This won't clear 'my_modulename' data by accident:

<?php
cache_clear_all('my_module:', 'cache', TRUE);
?>

Farsheed

Great post. The details of

Great post. The details of caching beyond static variables have been a mystery to me for a while now. At least now I know it's not some voodoo magic. :)


robert

Drupal 6 is slightly different (better)

Look at this line from Jeff's code:

<?php
cache_set('my_module_data', 'cache', serialize($my_data), time() + 360);
?>

Since $my_data is a complex data type (array or object), he needs to serialize it before sending it into the cache. Likewise, when getting that data from the cache, the result has to be unserialized before it is useful. This is a facet of Drupal 5. Drupal 6 hides this implementation detail inside of cache_get and cache_set so that you can just send $my_data in there and serialization will be done if needed. So the appropriate Drupal 6 code would be:

<?php
cache_set('my_module_data', 'cache', $my_data, time() + 360);
?>

frando

... and Drupal 6 is even better ;)

... and Drupal 6 is even better ;)
We changed the argument ordering for cache_set, so that the optional table argument comes after the data.

So the real Drupal 6 code would be:

<?php
cache_set('my_module_data', $my_data, 'cache', time() + 360);
?>

Thanks to that, for simple caching that doesn't need special expire settings, you can just do:

<?php
cache_set('my_module_data', $my_data);
?>

Tomasz Gorski

article

Thank you for another very interesting article, Jeff. It's really well written and I fully agree with you on the main issue. I must say that I really enjoyed reading all of your posts. It's interesting to read ideas and observations from someone else's point of view; it makes you think more. So please keep up the great work. Greetings


Jonny

Excellent

This is great, thanks Jeff - I implemented this in a piece of new code today and was amazed at how easy it is to get up and running. I've got lots of ideas now of how I can speed up other parts of the site for logged in users - cheers!


Richie

Really Useful Info

I'm glad I stumbled upon this article. It has demystified the black magic that is Drupal's caching system. I will attempt to use some of this on the next site I build.


Chris

Cache

Hi,

Since we installed the boost module the sites do not appear to be updating.

For example if you are logged in as a user you can see the new posts.

If you are not logged in you see the posts from the previous day. It appears that the site is refreshing every 24 hours.

We have set the cache setting to zero to work with the boost module, but this appears to cause problems.

The site is www.frostfirepulse.com

regards
Chris


Mike Cantelon

Caching Paging Queries

One thing which tripped me up for awhile... if you're caching the result from a pager query, note that you should also cache the resulting contents of the pager_page_array, pager_total, and pager_total_items global variables as well. Otherwise when you attempt to show paging links via theme_pager you'll get nothing.


Anonymous

Riddle me this if you will

Riddle me this if you will ;-)

- From the following, we generate a unique id, and use it as cache key.

Without relying on metadata outputted with the data... how would we keep the uid on the second hit?

Every refresh will regenerate another id, would it not?

How could the first part of the condition know about the second part?

if(!isset($fx_data) || $reset){
   if (!$reset && ($cache = cache_get($_UID)) && !empty($cache->data)) {
      $fx_data = unserialize($cache->data);
      $_RAW = $fx_data;
   }
   else{
      $_UID = uniqid();
      cache_set($_UID, serialize($MyValue), 'cache');
      $_RAW = unserialize(cache_get($_UID)->data);
   }
}

eaton

No good solution

That's a somewhat awkward approach to the problem, actually -- since I don't know the context in which the code is being used I can't really say how to solve it more effectively. Do you really need a *random* key for the $_UID variable? In most situations you'll be retrieving information based on some criteria -- the time, the current path, the logged in user, etc -- and that information can be used to create a unique but predictable key to store the data.
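As a minimal sketch of that idea, assuming the data depends on the logged-in user and the current path: the helper my_module_cache_key() is hypothetical and not part of Drupal, and the user IDs and path are invented. The point is only that the same inputs always produce the same key, so a second request can find what the first one stored.

```php
<?php
// Hypothetical helper, not part of Drupal: builds a cache key from values
// that are the same on every request for the same user and page.
function my_module_cache_key($uid, $path) {
  return 'my_module:' . $uid . ':' . $path;
}

// Invented example values for two requests by one user and one by another.
$first_hit = my_module_cache_key(42, 'node/17');
$second_hit = my_module_cache_key(42, 'node/17');
$other_user = my_module_cache_key(7, 'node/17');
```

Unlike a key from uniqid(), $first_hit and $second_hit are identical, while $other_user gets its own entry.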


Richard Burford

Excellent article

I'm a little late to this party but I'd like to say I found your article to be concise and well written. This is a great introduction to caching and the technique you describe seems a perfect fit for my api_helper module!

Thanks ;)


prezenty

This is interesting article,

This is interesting article, I did not it think that it yes. Interesting it knew persons about this how much. Sorry if I wrote bad there now my English is novice and I do not it write yet good.


Anthony Cartmell

Turning off page caching for anon forms

I had a problem with the page cache, when mixed with (multi-step) forms for anon users that had the user's chosen values stored in the session and subsequently used as input defaults.

Since the initial form display is done with a GET, the form, including whatever default values were present at the time, was being saved into the page cache. This meant that on returning to the form with a GET, anon users would see the old default values, and not those in their current session.

So I needed to switch off the page cache just for these form pages. Thanks to a helpful comment in the issues list of the Protected Node module http://drupal.org/node/233979, I too found that setting $GLOBALS['conf']['cache'] was the answer. This is the variable that variable_get('cache',0) uses, but I modify it directly as calling variable_set('cache',false) would set it off for all pages.

Since I had several forms to prevent caching for, I added a module form_alter() function to set $GLOBALS['conf']['cache'] to false if the $form_id was found in an array of form IDs.

So now I have page caching set to On, but I can turn off caching for individual pages. Much neater than caching the page and then deleting it from the cache later.


Rollo

tada

Very useful advice. I think I will search more about this on the Net, guys. Many thanks


Przepisy kulinarne

Dobra kuchnia

Thank you for this article. I appreciate it even more because it is not so common to find this kind of thing on the net. Thnx!


Ari

can you give a real life

Can you give a real-life example of when to use my_module_function(), with reset too?

Is this a good example? Say your module provides a page, /mystats/nodes, that keeps track of node statistics. For every 50 new nodes that are inserted (tracked in a variable in the variables table), you call my_module_function(TRUE); if 50 new nodes haven't been inserted yet, you call it with FALSE to get the data for our function to format and pass to the template. Then my_module_function() either gets the data from the database in the unserialize() line...
or
calculates new stats, stores the new data in the cache (in the cache_set() line), and sets that counter variable back to 0...

Just wondering, shouldn't $my_data be set in the latter part as well? It will be stored in the cache, but the data still needs to be returned to the module for formatting, etc.


mjwest10

This is good to know

This is good to know. I was wondering how the related links module handles this caching. For example, I have a Visual Basic tutorial site that uses related links within the actual tutorial pages. To see an example you can check out this: Visual Basic Oracle tutorial. At the bottom I have related links to other database guides and tutorials on my site. I am using the module to do this. If I have caching turned on, does it make hits for all those nodes? How would I check this out?


jayjaytheluffy

Nice article but I have a

Nice article but I have a question:
How does drupal make sure or check if the data being cached is not corrupted?
My boss just assigned me this task after I successfully cached the tagadelic tag cloud on my page. I'm really confused and don't know the answer.
Please help me.


Anonymous

szafy bhp

Thanks for the great advice about blog posting services..it is so hard to get through all the false info out there!


Angarsk

I wonder

What is faster for caching? variable_get updated on cron or cache_get?


Bilrace

nice article

Sweet. Looks like it's not a hard thing to speed up some database queries with cached info. Thanx mate


Anonymous

How can I use cache function

How can I use the cache functions in Drupal 6 for a "funny story of the day"? I have been obsessed with this question for a long time. Please keep me informed at TONYLIUH AT GMAIL.COM.
Thank you in advance.


Chris Cohen

Thanks

Thanks. This was a very useful explanation of a powerful Drupal feature.


borgo

further reading

Great intro, thanks! I can't find any further reading (web / book) on this though. Thanks.


vacilando

On logical operator precedence

It took me ages to figure out why my code did not fetch cache.

Similar to this article, I had used this condition:
if ($cache = cache_get($mycacheid) && !empty($cache->data)) {

What I eventually found was that although both parts of the condition were true, the whole was returning false!

So I replaced '&&' with 'and' and it works just perfectly.
if ($cache = cache_get($mycacheid) and !empty($cache->data)) {

I am aware of the precedence of the first logical operator (see http://be2.php.net/manual/en/language.operators.logical.php). But why would it matter here?
Also, the above code must have worked on other systems. Is it a question of PHP version? Or why would it not work on my server?

FYI, I am on PHP 5.2.6 and Drupal 6.14.


vacilando

Solution...

Shoot, it was so simple... '&&' has higher precedence even than '='. So I should have enclosed the $cache = cache_get($mycacheid) in brackets!
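A small self-contained sketch shows the difference. demo_cache_get() is a made-up stand-in for cache_get() so the example runs without Drupal; everything else is plain PHP.

```php
<?php
// Made-up stand-in for cache_get(): always returns an object with data.
function demo_cache_get() {
  return (object) array('data' => 'cached value');
}

// Without parentheses, && binds first, so this is parsed as
// $cache_a = (demo_cache_get() && !empty($cache_a->data)).
// $cache_a is not set yet when the right side runs, so the result is FALSE.
$without = ($cache_a = demo_cache_get() && !empty($cache_a->data));

// With parentheses, the assignment happens first and $cache_b holds the
// returned object by the time its data is checked.
$with = (($cache_b = demo_cache_get()) && !empty($cache_b->data));
```

In the first form $cache_a ends up holding a boolean rather than the cache object, which is why the condition fails even though a cached value exists.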


Redrigo

Thanks

Thanks for the information on your site... it is very useful for me...
