Querying a Slave Database with Views

Scaling Views with Views 3 and Pressflow 6

If you're not already aware of it, there's this little fork of Drupal called Pressflow, maintained by those really smart people over at Four Kitchens. If you're at all serious about getting your Drupal site to succeed (as in lots of visitors), start using Pressflow now, before success comes knocking on your door. Pressflow is a collection of patches, or nips and tucks of Drupal, geared towards performance and scalability.

Database Replication in Pressflow 6

One of the things that Pressflow 6 supports that Drupal 6 does not is database replication. Database replication sounds all scary and impressive, but put simply, it just means you are creating another database to allow more traffic. The way to do that is to replicate, or copy, the database. You can then move those copies of your database to other servers, which will allow you to distribute the traffic you are receiving across more than just one server. It's kind of like adding another lane on a highway—you're not only allowing more cars to drive on the road (Scalability), you're also adding the space (Bandwidth) for those cars to drive a bit faster (Performance).

The original database that you have copied is usually called the Master database. This database ideally receives all the INSERT, UPDATE, and DELETE queries. The Master database then replicates out these changes to the copied databases, called Slaves. The Slaves ideally would receive all SELECT queries, or read only queries.

Rather than add in any sort of functionality to auto detect whether a particular query should go to the Master or the Slave, Pressflow takes the conservative route and only modifies one Drupal core function, and then adds a few more. The modified function is pager_query(), and the new functions are db_query_slave() and db_query_range_slave(). The pager_query() function is the function that is used in Drupal core to create paged lists of data, and for this function alone, Pressflow basically assumes that it's safe to route this query to the Slave database, if it exists.

So, what about all those other queries? What about the SELECTs that are occuring outside of pager_query()? Pressflow stays hands off. It doesn't want to make any assumptions about how you want to handle those queries. This means that it's up to you to start implementing those two other functions, db_query_slave() and db_query_range_slave(). Now before you start going through all the modules on your site and hacking them to use db_query_slave(), back up a bit. Most of the public listing of content on your site is likely being presented by the Views module. So for the greatest gains, we really only need to hack one module, right?

Getting Views to Query the Slave Database

Well, it turns out if you're using Views 3 (at the time of this writing, the current stable release of Views 3 is Views-6.x-3.0-alpha3), you don't have to hack anything, you only need to extend. Views 3 has what is called a pluggable query engine. This means that you can do a complete drop in replacement of the code that is driving how your database is queried. Now that sounds promising, doesn't it? It also sounds a bit scary. Let's walk through how to do this step by step.

Tell Views You Have Something to Say

The first step in doing pretty much any sort of extension of Views is to introduce yourself. The place you do this is in a module, and the way you do it is with hook_views_api(). If you already have a custom module for your site, great! If not, go ahead and create one. In this example, I'm going to call my module Views Query Slave, and give it a machine name of views_query_slave. So, inside of sites/all/modules I create a directory called views_query_slave and inside that directory I create a views_query_slave.module file and a views_query_slave.info file.

Let's pop open views_query_slave.module and see how to introduce ourselves to Views:

  
/**
 * Implementation of hook_views_api().
 */
function views_query_slave_views_api() {
  return array(
    'api' => 3, // We are implementing a Views 3 only feature.
  );
}
  

That's it! So the "hook" in hook_views_api() is the machine name of our module, views_query_slave. We tell Views that we are implementing features in the 3 branch of Views, just to be sure that in the off chance this module gets enabled on a site with Views 2 on it, it doesn't start barfing up PHP errors and such.

Now that we've introduced ourselves to Views, we need to actually tell Views what we want to do. For this we need to create a file called views_query_slave.views.inc, and put it in our module directory. This is the file that Views is going to look to for our customizations. Inside of that file, we're going to be implementing some more views hooks, namely hook_views_data_alter() and hook_views_plugins().

  
/**
 * We only want to modify the query plugin if db_query_slave() exists. This is
 * in effect saying, "Hey, are we on Pressflow 6?"
 */
if (function_exists('db_query_slave')) {
  /**
   * Implementation of hook_views_plugins
   */
  function views_query_slave_views_plugins() {
    $plugins = array(
      'query' => array(
        'views_query_slave' => array(
          'title' => t('SQL Query (slave)'),
          'help' => t('Query will be generated and run using the Pressflow Slave database API.'),
          'handler' => 'views_plugin_query_slave',
          'parent' => 'views_query',
        ),
      ),
    );

    return $plugins;
  }
 
  /**
   * Implementation of hook_views_data_alter().
   */
  function views_query_slave_views_data_alter(&$data) {
    foreach ($data as $table => &$table_data) {
      if (isset($table_data['table']['base'])) {
        // If query class is set and it contains views_query, we can swap it out.
        $is_views_query = isset($table_data['table']['base']['query class']) && ($table_data['table']['base']['query class'] == 'views_query');
        // If query class isn't set, we can assume that it's using views_query.
        if ($is_views_query || empty($table_data['table']['base']['query class'])) {
          $table_data['table']['base']['query class'] = 'views_query_slave';
        }
      }
    }
  }
}
  

As you can see above, all the code is wrapped in if (function_exists('db_query_slave')) {. This means if the function db_query_slave() doesn't exist (in other words, this module isn't running on Pressflow 6), our code won't get executed. Phew.

The first hook we're implementing is hook_views_plugins(). This is where we define our new Query Plugin, the one that will be querying the slave database if it exists. The important parts here are the key of the array, the handler and the parent. The key is essentially the machine readable name of our Query plugin (views_query_slave), the handler will be the name of the class as well as the name of the file that Views will look for, and the parent is what class our class will be inheriting from, or extending. In this case, we are extending the default Views query plugin, whose machine name happens to be views_query.

The second hook we're implementing is hook_views_data_alter(). This hook is altering any Views configuration previously declared by other modules. This configuration contains information about all the database tables that Views can query, along with the fields, arguments, relationships, sorters, filters, etc., that are associated with these tables. The main thing we are interested in here is a nested setting called 'query class'. Query class is the key of the Views plugin declared in hook_views_plugins(), not to be confused with the handler or actual PHP class that will get called.

So, if a module specifies a query plugin for a particular table, Views will use the handler associated with that query plugin for our queries. If a module doesn't specify a query class, Views just assumes it wants to use the internal default Query plugin we mentioned earlier, views_query. So, we want to hijack any table that says to use the default Query plugin, or any table that doesn't specify one at all, and instead inform Views to use our class instead. We do that by setting 'query class' to the name of our Query plugin, 'views_query_slave'.

Building the Query plugin

Now that we've introduced ourselves to Views, and told Views a bit about what we want to do, the next step is to actually write our Views Query plugin. Now this sounds a bit scary, but really all it entails is copying the parent class we are extending, and then removing the parts that we don't want to change. We mentioned earlier how the machine name and the handler are distinct from each other. The default Query plugin was named 'views_query', but the handler for it is called views_plugin_query_default. So the file we want to copy is inside of a directory called plugins within the Views module. If you want to follow along but don't have Views 3 downloaded, drupalcode.org is a good place to go.

The first thing we'll do is just copy this file completely into our custom module. Rename the file to the name of our handler (views_plugin_query_slave), and then update the class declaration to reflect the name of our handler and the parent class we are extending (views_plugin_query_default):

  
/**
 * Extension of views_plugin_query_default to query a slave db if it exists.
 */
class views_plugin_query_slave extends views_plugin_query_default {
  

Now it's time for cleanup. The only method we care about within the default class is execute(), since that is where the query actually gets executed. So, just delete all the other methods and junk out of our class—since we're extending views_plugin_query_default, it will just find all that goodness there.

Let's set up a new variable in this class first. We don't want to just assume that all views are safe to query against the Slave database, so we'll create a variable called $slave_safe within our class:

  
  /**
   * Whether or not this view is safe to be run against the Slave database.
   *
   * @var boolean
   */
  protected $slave_safe = FALSE;
  

Then, let's add some new methods in our class, ones we'll use to wrap around our query functions:

  
  /**
   * Wrapper method for db_query().
   */
  function db_query($query, $args = array()) {
    $fnc = $this->slave_safe ? 'db_query_slave' : 'db_query';
    return $fnc($query, $args);
  }

  /**
   * Wrapper method for db_query_range().
   */
  function db_query_range($query, $from, $count, $args = array()) {
    $fnc = $this->slave_safe ? 'db_query_range_slave' : 'db_query_range';
    return $fnc($query, $from, $count, $args);
  }
  

We check our new variable, $this->slave_safe in each query method. If the view is "slave safe", the functions we'll be calling are db_query_slave() and db_query_range_slave(). Otherwise, it just defaults to the normal Drupal core functions.

Modifying execute()

Now we need to modify execute() to set up our $this->slave_safe variable and use our new methods:

  
  function execute(&$view) {
    $cache_settings = $view->display_handler->get_option('cache');
    $this->slave_safe = $cache_settings['type'] != 'none';
  

In our example, we're only going to be saying a view is "slave safe" if it has any sort of caching on. When you create a view, you have an option to set up time based caching of the query results or the markup itself. It's pretty handy, so start using it! You can easily set up your own method for determining what qualifies a view as "slave safe".

Now that we've got our $this->slave_safe variable set, the next thing we need to do is to find all the instances of db_query() and db_query_range() within the execute method, and replace them with $this->db_query and $this->db_query_range, respectively. You should be seeing something like this:

  
        $result = $this->db_query_range($query, $args, $offset, $limit);
      }
      else {
        $result = $this->db_query($query, $args);
      }
  

Hey, I already see that in there!
If you're already seeing $this->db_query() and $this->db_query_range(), it could be that you are using a version of Views newer than 3.0-alpha3. This feature request in the Views issue queue has been committed to Views 3, but is only available in versions newer than Alpha 3. Check out the code listed at the bottom of this article for more details.

Cool, so now we've got Views using our custom query wrapper methods! We're not completely finished though. There's one more function call in here we need to modify, and that's the pager count query. For any pager views, Views executes two queries—one to get the number of total rows there are, and the other that actually returns the results. If we don't step in somewhere, the count query will actually get executed against the Master database, which could lead to some funkiness in our View results. Here's the code as you're probably seeing it now (unless you fall into the group above that were already seeing $this->db_query()):

  
      if ($this->pager->use_count_query() || !empty($view->get_total_rows)) {
        $this->pager->execute_count_query($count_query, $args);
      }
  

What we want to do is create our own method for the pager count query, and call that instead:

  
      if ($this->pager->use_count_query() || !empty($view->get_total_rows)) {
        $this->execute_count_query($count_query, $args);
      }
  

And here's the new method we're adding to this class:

  
  /**
   * Execute the count query, which will be done just prior to the query
   * itself being executed.
   *
   * @see views_plugin_pager::execute_count_query()
   */
  function execute_count_query(&$count_query, $args = array()) {
    $this->pager->total_items = db_result($this->db_query($count_query, $args));
    if (!empty($this->pager->options['offset'])) {
      $this->pager->total_items -= $this->pager->options['offset'];
    }

    $this->pager->update_page_info();
    return $this->pager->total_items;
  }
  

Voilà!

Great! We've got everything set up to have Views start querying the slave. Keep in mind, Pressflow is smart enough to know whether or not you even have a Slave database, and it won't cause any problems to use this module if you don't. It will function the same either way, so we can safely enable the module on any Pressflow site without worry.

If you're not using Pressflow already, it's definitely something to consider switching to. Even if you don't need to scale right now, you'll be better prepared if you do. Having a module like this example one is handy as well, allowing you to scale your website to multiple databases. Just keep picturing that additional lane on the highway and you'll see why it can benefit you to add replication, and to have your Views query the slave database.

A complete version of this example module can be found on GitHub. Please note, if you are using a version of Views 3 later than alpha3, a lot of the code above is not applicable. You should view this branch instead.

Get in touch with us

Tell us about your project or drop us a line. We'd love to hear from you!