A beginner's guide to caching data
Building complicated, dynamic content in Drupal is easy, but it can come at a price. A lot of the stuff that makes a Web 2.0 site so cool can spell 'performance nightmare' under heavy load, thrashing the database to perform complex queries and expensive calculations every time a user looks at a node or loads a particular page.
One solution is to turn on page caching on Drupal's performance options administration page. That speeds things up for anonymous users by caching the output of each page, greatly reducing the number of DB queries needed when they hit the site. That doesn't help logged-in users, however: because page-level caching is an all-or-nothing affair, it only works for the standardized, always-the-same view that anonymous users see when they arrive.
Eventually there comes a time when you have to dig into your code, identify the database access hot spots, and add caching yourself. Fortunately, Drupal's built-in caching APIs and some simple guidelines can make that task easy.
The basics
The first rule of optimization and caching is this: never do something time-consuming twice if you can hold onto the results and reuse them. Let's look at a simple example of that principle in action:
<?php
function my_module_function($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    // Do your expensive calculations here, and populate $my_data
    // with the correct stuff.
  }
  return $my_data;
}
?>
The important part to look at in this function is the static variable named $my_data. Static variables start out empty the first time a function is called, but they keep the data they're populated with even when the function is called again. That means that we can check if the variable is already populated, and if so return it immediately without doing any more work.
This pattern appears all over the place in Drupal -- including key functions like node_load(). Calling node_load() for a particular node ID requires database hits the first time, but the resulting information is kept in a static variable for the duration of the page load. That way, displaying a node once in a list, a second time in a block, and a third time in a list of related links (for example) doesn't require three full trips to the database.
Another important feature is the use of the $reset variable. Caching is good, but occasionally you want to be sure you're getting the absolute freshest data available. Using a 'reset' variable in your function, and always performing the 'expensive' version of the function if it's set to TRUE, lets you bypass caching when you really need to.
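A minimal, standalone sketch of the static-variable pattern with a $reset switch. The function name comes from the example above; the call counter and the fake result are hypothetical, added only so you can see when the "expensive" branch actually runs.

```php
<?php
// Hypothetical counter, used only to show how often the expensive branch runs.
$GLOBALS['my_module_calculations'] = 0;

function my_module_function($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    // Stand-in for an expensive calculation or a batch of database queries.
    $GLOBALS['my_module_calculations']++;
    $my_data = array('generated' => $GLOBALS['my_module_calculations']);
  }
  return $my_data;
}

$first = my_module_function();      // Calculates and stores the result.
$second = my_module_function();     // Returns the static copy; no recalculation.
$fresh = my_module_function(TRUE);  // $reset forces a recalculation.
```

Only the first and third calls touch the expensive branch; the second is answered from memory.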
Drupal's cache functions
You might notice that the static variable technique only stores data for the duration of a single page load. For even better performance, it's often possible to cache data in a more permanent fashion...
<?php
function my_module_function($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    if (!$reset && ($cache = cache_get('my_module_data')) && !empty($cache->data)) {
      $my_data = unserialize($cache->data);
    }
    else {
      // Do your expensive calculations here, and populate $my_data
      // with the correct stuff.
      cache_set('my_module_data', 'cache', serialize($my_data));
    }
  }
  return $my_data;
}
?>
This version of the function still uses the static variable, but it adds another layer: database caching. Drupal's APIs provide three key functions you'll need to be familiar with: cache_get(), cache_set(), and cache_clear_all(). Let's look at how they're used.
After the initial check of the static variable, this function checks Drupal's cache for data stored with a particular key. If it finds it, and the $cache->data element isn't empty, it unserializes the stored data and sticks it into the $my_data variable.
If no cached version is found (or if we called the function with the $reset parameter), the function does the actual work of generating the data. Then it serializes the data and saves it to the cache so future requests will find it. The key that you pass in as the first parameter can be anything you choose, though it's important to avoid colliding with any other modules' keys. Starting the key with the name of your module is always a good idea.
The end result? A slick little function that saves time whenever it can -- first checking for an in-memory copy of the data, then checking the cache, and finally calculating it from scratch if necessary. You'll see this pattern a lot if you dig into the guts of data-intensive Drupal modules.
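Here is a standalone sketch of that three-step lookup that runs outside Drupal. Because cache_get() and cache_set() need a Drupal database, the sketch swaps in two hypothetical stand-ins, demo_cache_get() and demo_cache_set(), backed by a plain array. Like the Drupal 5 code above, it stores serialized strings and treats a miss as a falsy return value. The pre-filled entry plays the role of data saved by an earlier page load.

```php
<?php
// Hypothetical in-memory stand-in for the {cache} table. Real code would call
// cache_get()/cache_set(), which persist across page loads in the database.
$GLOBALS['demo_cache_table'] = array();
$GLOBALS['demo_calculations'] = 0;

function demo_cache_get($cid) {
  if (isset($GLOBALS['demo_cache_table'][$cid])) {
    // Return an object whose ->data holds the stored (serialized) string.
    return (object) array('data' => $GLOBALS['demo_cache_table'][$cid]);
  }
  return 0;
}

function demo_cache_set($cid, $data) {
  $GLOBALS['demo_cache_table'][$cid] = $data;
}

function my_module_layered_function($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    if (!$reset && ($cache = demo_cache_get('my_module_data')) && !empty($cache->data)) {
      // Second layer: rebuild the data from the persistent cache.
      $my_data = unserialize($cache->data);
    }
    else {
      // Last resort: do the expensive work, then save it for future page loads.
      $GLOBALS['demo_calculations']++;
      $my_data = array('total' => 42, 'run' => $GLOBALS['demo_calculations']);
      demo_cache_set('my_module_data', serialize($my_data));
    }
  }
  return $my_data;
}

// Pretend an earlier page load already stored a result in the cache table.
demo_cache_set('my_module_data', serialize(array('total' => 42, 'run' => 0)));

$from_cache = my_module_layered_function();        // Read from the cache table.
$from_static = my_module_layered_function();       // Served from the static variable.
$recalculated = my_module_layered_function(TRUE);  // Bypasses both layers.
```

Note that the forced recalculation also overwrites the stored copy, so the next page load picks up the fresh data.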
Keeping up to date
What happens, though, if the data that you've cached becomes outdated and needs to be recalculated? By default, cached information stays around until some module explicitly calls the cache_clear_all() function, emptying out your record. If your data is updated sporadically, you might consider simply calling cache_clear_all('my_module_data', 'cache') each time you save the changes to it. If you're caching quite a few pieces of data (perhaps versions of a particular block for each role on the site), there's a third 'wildcard' parameter:
<?php
cache_clear_all('my_module', 'cache', TRUE);
?>
This clears out all the cache values whose keys start with 'my_module'.
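To make the wildcard behaviour concrete, here is a standalone sketch that mimics it on a plain PHP array: an entry is removed when its key starts with the string you pass. The key names (including the per-role block keys) and the demo_clear_prefix() helper are hypothetical; in a real module you would simply call cache_clear_all() with the wildcard parameter set to TRUE.

```php
<?php
// Hypothetical stand-in for the {cache} table: cache ID => stored data.
$cache_table = array(
  'my_module_block_role_1' => 'a',
  'my_module_block_role_2' => 'b',
  'my_module_data' => 'c',
  'other_module_data' => 'd',
);

// Mimics a wildcard clear: drop every entry whose key starts with $prefix.
function demo_clear_prefix(array $table, $prefix) {
  foreach (array_keys($table) as $cid) {
    if (strncmp($cid, $prefix, strlen($prefix)) === 0) {
      unset($table[$cid]);
    }
  }
  return $table;
}

// Clear only the per-role block variants...
$after_blocks = demo_clear_prefix($cache_table, 'my_module_block_');
// ...or everything this module has stored.
$after_module = demo_clear_prefix($cache_table, 'my_module');
```

A more specific prefix lets you throw away one family of entries while keeping the rest of your module's cached data.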
If you don't need your cached data to be perfectly up-to-the-second, but you want to keep it reasonably fresh, you can also pass in an expiration date to the cache_set() function. For example:
<?php
cache_set('my_module_data', 'cache', serialize($my_data), time() + 360);
?>
The final parameter is a Unix timestamp representing the 'expiration date' of the cached data. The easiest way to calculate it is to call time() and add the data's desired lifetime in seconds (360 seconds, or six minutes, in this example). Expired entries aren't necessarily removed the instant they pass that date, though: Drupal deletes them the next time it flushes expired cache data, for example through a call to cache_clear_all() without a specific key.
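A sketch of the arithmetic behind the expiration parameter, using a fixed "now" instead of time() so the result is predictable. The helper names are hypothetical, and the DEMO_CACHE_PERMANENT constant assumes Drupal's convention that CACHE_PERMANENT is 0. Comparing an entry's expire value against the current time is how you would tell whether it has passed its expiration date.

```php
<?php
// Assumption: mirrors Drupal's convention, where CACHE_PERMANENT is 0.
define('DEMO_CACHE_PERMANENT', 0);

// Build an expiration timestamp from a lifetime in seconds.
function demo_cache_expire($lifetime, $now) {
  return $now + $lifetime;
}

// An entry is usable if it is permanent or its expiration is still ahead.
function demo_cache_is_fresh($expire, $now) {
  return $expire == DEMO_CACHE_PERMANENT || $expire > $now;
}

$now = 1000000;                          // A fixed "current time" for the demo.
$expire = demo_cache_expire(360, $now);  // Six minutes from now.
```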
Advanced caching
You might have noticed that cache_set()'s second parameter is 'cache', the name of the table that stores the default cache data. If you're storing large amounts of data in the cache, you can set up your own dedicated cache table and pass its name into the function. That will help keep your cache lookups speedy no matter how much other modules are storing in the shared 'cache' table. The Views module uses that technique to maintain full control over when its cache data is cleared.
If you're really hoping to squeeze the most out of your server, Drupal also supports the use of alternative caching systems. By changing a single line in your site's settings.php file, you can point it to different implementations of the standard cache_set(), cache_get(), and cache_clear_all() functions. File-based caching, integration with the open source memcached project, and other approaches are all possible. As long as you've used the standard Drupal caching functions, your module's code won't have to be altered.
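In Drupal 5 and 6 that single line sets the cache_inc variable through the $conf array in settings.php. The path below is illustrative only, assuming a memcache integration module installed under sites/all/modules; check your caching module's documentation for the exact file it expects.

```php
<?php
// settings.php: load an alternative cache implementation instead of
// includes/cache.inc. The path is an example; use the file your caching
// module's documentation specifies.
$conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';
```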
A few caveats
Like all good things, it's possible to overdo it with caching. Sometimes, it just doesn't make sense -- if you're looking up a single record from a table, saving the result to a database cache is silly. Using the Devel module is a good way to spot the functions where caching will pay off: it can log the queries that are used on your site and highlight the ones that are slow, or the ones that are repeated numerous times on each page.
Other times, the data you're using will just be a bad fit for the standard caching system. If you need to join cached data in SQL queries, for example, cache_set()'s practice of storing data as a serialized string will be a problem. In those cases, you'll need to come up with a solution that's specific to your module. VotingAPI, for instance, maintains one table full of individual votes and another table full of calculated results (averages, sums, etc.) for quick joining when sorting and filtering nodes.
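A standalone sketch of that idea: recalculate summary values from the raw records and keep them as plain values per node rather than as one serialized blob. The vote data, field names, and demo_calculate_results() are hypothetical; this is not VotingAPI's actual code.

```php
<?php
// Hypothetical raw votes, one row per vote (node ID and value), as they might
// come out of a votes table.
$votes = array(
  array('nid' => 1, 'value' => 80),
  array('nid' => 1, 'value' => 100),
  array('nid' => 2, 'value' => 60),
);

// Calculate one results row per node. In a module, these rows would be written
// to a dedicated, indexed results table so SQL queries can JOIN and ORDER BY
// them, which is impractical with a serialized string.
function demo_calculate_results(array $votes) {
  $results = array();
  foreach ($votes as $vote) {
    $nid = $vote['nid'];
    if (!isset($results[$nid])) {
      $results[$nid] = array('count' => 0, 'sum' => 0);
    }
    $results[$nid]['count']++;
    $results[$nid]['sum'] += $vote['value'];
  }
  foreach ($results as $nid => $row) {
    $results[$nid]['average'] = $row['sum'] / $row['count'];
  }
  return $results;
}

$results = demo_calculate_results($votes);
```

Because the results are derived entirely from the votes, they can always be rebuilt if they are lost.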
Finally, it's important to remember that the cache is not long-term storage! Since other modules can call cache_clear_all() and wipe it out, you should never put something into it if you can't recalculate it from the original source data.
Go west, young Drupaler!
Congratulations: you now have a powerful set of tools to speed up your code! Go forth, and optimize.





Comments
The colonoscopy
One minor thing I do when setting wildcard to TRUE (which makes cache_clear_all() a prefix match, handy when you have lots of my_module:foo, my_module:bar, etc.) is to include the colon too, just in case there is a module storing data under 'my_modulename'.
This won't clear 'my_modulename' data by accident:
<?php
cache_clear_all('my_module:', 'cache', TRUE);
?>
Great post. The details of
Great post. The details of caching beyond static variables have been a mystery to me for a while now. At least now I know it's not some voodoo magic. :)
Drupal 6 is slightly different (better)
Look at this line from Jeff's code:
<?php
cache_set('my_module_data', 'cache', serialize($my_data), time() + 360);
?>
Since $my_data is a complex data type (array or object), he needs to serialize it before sending it into the cache. Likewise, when getting that data from the cache, the result has to be unserialized before it is useful. This is a facet of Drupal 5. Drupal 6 hides this implementation detail inside of cache_get and cache_set so that you can just send $my_data in there and serialization will be done if needed. So the appropriate Drupal 6 code would be:
<?php
cache_set('my_module_data', 'cache', $my_data, time() + 360);
?>
... and Drupal 6 is even better ;)
We changed the argument ordering for cache_set, so that the optional table argument comes after the data.
So the real Drupal 6 code would be:
<?php
cache_set('my_module_data', $my_data, 'cache', time() + 360);
?>
Thanks to that, for simple caching that doesn't need special expire settings, you can just do:
<?php
cache_set('my_module_data', $my_data);
?>
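To picture what "serialization will be done if needed" means, here is a hedged sketch of the general idea: serialize only non-string values and record a flag so the read side knows whether to unserialize. The demo_cache_pack() and demo_cache_unpack() helpers are hypothetical and are not Drupal 6's actual implementation.

```php
<?php
// Hypothetical helpers: serialize only non-string data and remember that you
// did, so reads can undo it.
function demo_cache_pack($data) {
  if (is_string($data)) {
    return array('data' => $data, 'serialized' => 0);
  }
  return array('data' => serialize($data), 'serialized' => 1);
}

function demo_cache_unpack(array $row) {
  return $row['serialized'] ? unserialize($row['data']) : $row['data'];
}

$row = demo_cache_pack(array('a' => 1, 'b' => array(2, 3)));
$plain = demo_cache_pack('<p>Already a string</p>');
```

Strings such as rendered HTML are stored untouched, which avoids paying for serialize()/unserialize() when they aren't needed.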
Thanks Jeff for this
Thanks, Jeff, for this enlightening article. Now I am ready to speed up some modules.
article
Thank you for another very interesting article, Jeff. It's really well written, and I fully agree with you on the main issue, by the way. I must say that I really enjoyed reading all of your posts. It’s interesting to read ideas and observations from someone else’s point of view… it makes you think more. So please try to keep up the great work. Greetings
Excellent
This is great, thanks Jeff - I implemented this in a piece of new code today and was amazed at how easy it is to get up and running. I've got lots of ideas now of how I can speed up other parts of the site for logged in users - cheers!
Searching through websites I
Searching through websites, I came across this article. I think it is fairly interesting considering the spam cruising around the net. Good job.
Not bad, I will implement
Not bad, I will implement this soon, since my site has had problems with too many (DB) visitors lately.
Block Cache Module
For other people who may be interested, there's another module:
http://drupal.org/project/blockcache
Really Useful Info
I'm glad I stumbled upon this article. It has demystified the black magic that is Drupal's caching system. I will attempt to use some of this on the next site I build.
Thank you for your amazing
Thank you for your amazing site.
Cache
Hi,
Since we installed the Boost module, the sites do not appear to be updating.
For example, if you are logged in as a user, you can see the new posts.
If you are not logged in, you see the posts from the previous day. It appears that the site is refreshing every 24 hours.
We have set the cache setting to zero to work with the Boost module, but this appears to cause problems.
The site is www.frostfirepulse.com
regards
Chris
"For other people may
"For other people who may be interested, there's another module:
http://drupal.org/project/blockcache"
Thanx!
Thank you very much
Caching Paging Queries
One thing that tripped me up for a while: if you're caching the result from a pager query, note that you should also cache the contents of the pager_page_array, pager_total, and pager_total_items global variables. Otherwise, when you attempt to show paging links via theme_pager, you'll get nothing.
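A minimal sketch of that advice, assuming Drupal 5/6, where pager_query() records its state in the $pager_page_array, $pager_total, and $pager_total_items globals (one entry per pager element). The two helper functions are hypothetical; the idea is simply to store the globals next to the rows and put them back on a cache hit. Since each page of results differs, the cache key would presumably also need to include the current page number.

```php
<?php
// Hypothetical pager state, as pager_query() might leave it in globals.
$GLOBALS['pager_page_array'] = array(0 => 2);
$GLOBALS['pager_total'] = array(0 => 5);
$GLOBALS['pager_total_items'] = array(0 => 48);

// Bundle the query results together with the pager globals into one
// cacheable value.
function demo_pack_pager_results(array $rows) {
  return array(
    'rows' => $rows,
    'pager_page_array' => $GLOBALS['pager_page_array'],
    'pager_total' => $GLOBALS['pager_total'],
    'pager_total_items' => $GLOBALS['pager_total_items'],
  );
}

// On a cache hit, restore the globals so the pager has something to render.
function demo_restore_pager_results(array $cached) {
  foreach (array('pager_page_array', 'pager_total', 'pager_total_items') as $name) {
    $GLOBALS[$name] = $cached[$name];
  }
  return $cached['rows'];
}

$cached = demo_pack_pager_results(array('row A', 'row B'));

// Simulate a later page load where the pager query never ran.
unset($GLOBALS['pager_page_array'], $GLOBALS['pager_total'], $GLOBALS['pager_total_items']);
$rows = demo_restore_pager_results($cached);
```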
Riddle me this if you will
Riddle me this if you will ;-)
- From the following, we generate a unique ID and use it as the cache key.
Without relying on metadata outputted with the data... how would we keep the uid on the second hit?
Every refresh will regenerate another id, would it not?
How could the first part of the condition know about the second part?
if (!isset($fx_data) || $reset) {
  if (!$reset && ($cache = cache_get($_UID)) && !empty($cache->data)) {
    $fx_data = unserialize($cache->data);
    $_RAW = $fx_data;
  }
  else {
    $_UID = uniqid();
    cache_set($_UID, serialize($MyValue), 'cache');
    $_RAW = unserialize(cache_get($_UID)->data);
  }
}
No good solution
That's a somewhat awkward approach to the problem, actually. Since I don't know the context in which the code is being used, I can't really say how to solve it more effectively. Do you really need a *random* key for the $_UID variable? In most situations you'll be retrieving information based on some criteria (the time, the current path, the logged-in user, etc.), and that information can be used to create a unique but predictable key to store the data.
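One way to follow that advice, sketched with a hypothetical my_module_cache_key() helper: derive the key from the inputs the data depends on (here an arbitrary type label, the current path, and the user ID), so the first and second hits compute the same key before the cache lookup. It also uses the colon-delimited 'my_module:' prefix suggested in an earlier comment, so a wildcard clear on 'my_module:' won't touch another module's keys.

```php
<?php
// Hypothetical helper: build a cache key from the criteria the data depends
// on, so every page load that needs the same data computes the same key.
function my_module_cache_key($type, $path, $uid) {
  // Hash the path so long or unusual URLs still produce a compact key.
  return 'my_module:' . $type . ':' . md5($path) . ':' . (int) $uid;
}

$key_first_hit = my_module_cache_key('fx', 'node/42', 7);
$key_second_hit = my_module_cache_key('fx', 'node/42', 7);
$key_other_user = my_module_cache_key('fx', 'node/42', 8);
```

Unlike uniqid(), the key is known before anything is cached, so the lookup and the save agree on it.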
Excellent article
I'm a little late to this party but I'd like to say I found your article to be concise and well written. This is a great introduction to caching and the technique you describe seems a perfect fit for my api_helper module!
Thanks ;)
This is interesting article,
This is an interesting article; I didn't think this was possible. I wonder how many people know about it. Sorry if I wrote badly; my English is still at a beginner level and I don't write it well yet.
Turning off page caching for anon forms
I had a problem with the page cache, when mixed with (multi-step) forms for anon users that had the user's chosen values stored in the session and subsequently used as input defaults.
Since the initial form display is done with a GET, the form, including whatever default values were present at the time, was being saved into the page cache. This meant that on returning to the form with a GET, anon users would see the old default values, and not those in their current session.
So I needed to switch off the page cache just for these form pages. Thanks to a helpful comment in the issues list of the Protected Node module http://drupal.org/node/233979, I too found that setting $GLOBALS['conf']['cache'] was the answer. This is the variable that variable_get('cache',0) uses, but I modify it directly as calling variable_set('cache',false) would set it off for all pages.
Since I had several forms to prevent caching for, I added a module form_alter() function to set $GLOBALS['conf']['cache'] to false if the $form_id was found in an array of form IDs.
So now I have page caching set to On, but I can turn off caching for individual pages. Much neater than caching the page and then deleting it from the cache later.
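A sketch of the approach described above, assuming Drupal 6's hook_form_alter(&$form, &$form_state, $form_id) signature (Drupal 5 uses hook_form_alter($form_id, &$form) instead). The module name and form IDs are hypothetical. It runs standalone because it only touches $GLOBALS['conf']; on a real site Drupal would call the hook while building the form.

```php
<?php
// Hypothetical list of form IDs whose pages must not be page-cached.
function my_module_uncached_forms() {
  return array('my_module_step_form', 'my_module_signup_form');
}

// Overriding 'cache' in $GLOBALS['conf'] affects only the current request;
// variable_set() would switch page caching off for every page.
function my_module_form_alter(&$form, &$form_state, $form_id) {
  if (in_array($form_id, my_module_uncached_forms(), TRUE)) {
    $GLOBALS['conf']['cache'] = FALSE;
  }
}

// Simulate page caching being enabled site-wide, then building one of the forms.
$GLOBALS['conf'] = array('cache' => 1);
$form = array();
$form_state = array();
my_module_form_alter($form, $form_state, 'my_module_step_form');
```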
Hello. I recently saw an
Hello.
I recently saw an article written by a professor that confirms what is written here. I agree with it.
:)
tada
Very useful advice. I think I will search more about this on the Net, guys. Many thanks
Dobra kuchnia
Thank you for this article, I appreciate it even more because it is not so common to find those kind of things on the net. Thnx!
Manual cache expiry
The default arg for drupal's cache_set function is CACHE_PERMANENT.
(ref: http://api.drupal.org/api/function/cache_set/5 )
So how does one set a manual expiry timeout for all content?
Or should I run cache_clear_all on cron?