Analyze This! Using the Google Analytics API

Karen Stevenson gives a detailed overview of the Google Analytics API

Google Analytics is a great way to monitor site usage and traffic. You add Google Analytics to your site using the Google Analytics module, which is super simple to set up. After it's in place, you can go to the Google Analytics site and dig into a ton of data, create custom reports, and so on. But you can also use the Google Analytics API to pull Google statistics into your own site and display them there. There is a Drupal module, Google Analytics API, that was created by Joel Kitching as a Google Summer of Code project. It provides a wrapper you can use to create tailored queries of your analytics data. You can turn on the included 'Google Analytics API Reports' module to display Google statistics in blocks or pages right on your site, write custom code to pull in specific statistical data and do any Drupally thing you like with the results, or both.

A screenshot of graphs representing page views and bounce rates of a webpage

Turning on the reports module gives you a taste of the things you can do. It requires that you install the Chart and Country API modules. Once turned on, you will see a tab on all your nodes called 'Statistics', which will display several charts of recent Google Analytics data for that node. You will also see a new Statistics block that you can add to your site, which will display charts of Google Analytics data for whatever page the block is displayed on.

So that much is cool, but you can do much more. To go further, you will need to write some code customized for the way your site is designed, and you will want to dive into the Google Analytics API documentation. One really great tool Google provides is a Data Feed Query Explorer, where you can sign into your own account and select metrics and filters to pull out any kind of custom data you like.

So, say that I want to create a block of the most popular links in the last 24 hours that I can feature on the front page of the site. I start by creating a simple request array that looks like this:

  
// Build the data request.
$request = array(
  '#dimensions' => array('pagePath'),
  '#metrics' => array('pageviews'),
  '#filter' => 'pagePath!=/',
  '#sort_metric' => array('-pageviews'),
  '#start_date' => date('Y-m-d', time() - 86400),
  '#max_results' => 10,
);
  

In this code I'm requesting the number of page views grouped by page path, filtering out the home page, sorted by page views (descending), starting 24 hours ago and limiting my results to the first 10 items that match my request. Once I construct my request, I pass it to the API, which will return me an array of result objects which I can manipulate to get the dimensions and metrics I requested.

  
$items = array();
$data = google_analytics_api_report_data($request);
foreach ($data as $page) {
  // getDimensions() and getMetrics() are methods, so they need parentheses.
  $dimensions = $page->getDimensions();
  $metrics = $page->getMetrics();
  $items[] = $dimensions['pagePath'] .' ('. $metrics['pageviews'] .')';
}
print theme('item_list', $items);
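
If the same kind of request is needed in more than one place, for example with different time windows or result counts, it could be wrapped in a small helper. This is only a minimal sketch: the function name and its parameters are hypothetical and not part of the Google Analytics API module. The array keys are the ones used in the request above.

```php
/**
 * Build a "most viewed pages" request array.
 *
 * Hypothetical helper: the name and parameters are illustrative only.
 */
function example_popular_pages_request($hours = 24, $limit = 10, $now = NULL) {
  // Allow the current time to be passed in so the helper is easy to test.
  $now = is_null($now) ? time() : $now;
  return array(
    '#dimensions' => array('pagePath'),
    '#metrics' => array('pageviews'),
    // Leave out the home page.
    '#filter' => 'pagePath!=/',
    // Most viewed pages first.
    '#sort_metric' => array('-pageviews'),
    '#start_date' => date('Y-m-d', $now - $hours * 3600),
    '#max_results' => $limit,
  );
}

// Top five pages, starting one week ago.
$request = example_popular_pages_request(7 * 24, 5);
```

The resulting array can then be passed to google_analytics_api_report_data() in the same way as the hand-built request.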
  

Obviously, if I want to make these into nice links, I need the page titles. I can use some Drupal functions to get more information about those paths. Let's say I only want to create links to these items if they are nodes, and in that case I need to get the page title for the link.

// Strip the leading slash or base_path so we have a
// normal-looking Drupal alias.
$alias = substr($dimensions['pagePath'], strlen(base_path()));
// Get the 'real' Drupal path for this item.
$path = drupal_lookup_path('source', $alias);
// If it's a node, get the title.
if (arg(0, $path) == 'node' && is_numeric(arg(1, $path))) {
  $id = arg(1, $path);
  $title = db_result(db_query("SELECT title FROM {node} WHERE nid = %d", $id));
  $items[] = l($title.' ('. $metrics['pageviews'] .')', $alias);
}

The API allows for simple regex filters, so I can search for statistics for only paths that start with /taxonomy/ (the tilde (~) means it is a regex):

  
// Build the data request.
$request = array(
  '#dimensions' => array('pagePath'),
  '#metrics' => array('pageviews'),
  '#filter' => 'pagePath=~^/taxonomy/',
  '#sort_metric' => array('-pageviews'),
  '#start_date' => date('Y-m-d', time() - 86400),
  '#max_results' => 10,
);
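
Before sending a regex filter like this to Google, it can help to try the pattern against a few known paths locally. The sketch below uses PHP's preg_match() as a stand-in. Google's regex dialect is not identical to PCRE, so treat this as a rough check only; the function name is hypothetical.

```php
/**
 * Return the paths that a regex filter pattern would match.
 *
 * Hypothetical helper: PCRE is used as an approximation of Google's matcher.
 */
function example_paths_matching($pattern, array $paths) {
  // Use '#' as the delimiter so slashes in the pattern need no escaping,
  // and escape any '#' that appears in the pattern itself.
  $regex = '#' . str_replace('#', '\\#', $pattern) . '#';
  $matches = array();
  foreach ($paths as $path) {
    if (preg_match($regex, $path)) {
      $matches[] = $path;
    }
  }
  return $matches;
}

$sample = array('/taxonomy/term/3', '/node/12', '/blog/taxonomy/intro');
// Only '/taxonomy/term/3' starts with /taxonomy/.
print_r(example_paths_matching('^/taxonomy/', $sample));
```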
  

Or I can find the top pages visited by people from the United States, limiting the results to those that had the substring 'American' in the title:

  
// Build the data request.
$request = array(
  '#dimensions' => array('pagePath'),
  '#metrics' => array('visits'),
  '#filter' => 'pageTitle=@American && country==United States',
  '#sort_metric' => array('-visits'),
  '#start_date' => date('Y-m-d', time() - 86400),
  '#max_results' => 10,
);
  

Once you get started you will find it helps to have an easy way to play with your queries to make sure you are getting the results you expect. This is where Google's Data Feed Query Explorer really helps. You can create a query in the explorer, and then use it to set up the right values in your request.
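
Filters copied from the explorer use Google's raw syntax: each data element carries a 'ga:' prefix, a semicolon means AND, and a comma means OR. The Drupal module expects no prefix and uses && and || instead. A minimal sketch of that translation follows. The function name is hypothetical, backslash-escaped separators are left alone, and the spaces around the operators are an assumption based on the example request above.

```php
/**
 * Convert a raw Google filter string to the Drupal module's syntax.
 *
 * Hypothetical helper, not part of the Google Analytics API module.
 */
function example_ga_filter_to_drupal($google_filter) {
  // Drop the 'ga:' prefix from each data element.
  $filter = preg_replace('/\bga:/', '', $google_filter);
  // An unescaped semicolon means AND in the raw API.
  $filter = preg_replace('/(?<!\\\\);/', ' && ', $filter);
  // An unescaped comma means OR in the raw API.
  $filter = preg_replace('/(?<!\\\\),/', ' || ', $filter);
  return $filter;
}

print example_ga_filter_to_drupal('ga:pageTitle=@American;ga:country==United States');
// Prints: pageTitle=@American && country==United States
```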

A screenshot of Google's Data Feed Query Explorer

Note! The Drupal module makes a few changes to the raw Google API that confused me for a while. The Google API prefixes 'ga:' to each data element. When using the Drupal module you leave that off; the module adds it to each element before sending the request to Google. The Google API uses a semicolon for AND and a comma for OR, while the Drupal module uses && for AND and || for OR. Once I figured that out, I was able to use the Google tool to model a custom query and then adapt the values to create a request in Drupal.

Also note that there are lots of ecommerce tools here. If you have an ecommerce site that uses Google's ecommerce tracking code, you have dimensions, filters, and metrics available for things like product SKUs and even revenue.

One thing that quickly became apparent is that it is really important to have meaningful information in your page titles and paths. Google has no information about the source of the data and only knows the pagePath and pageTitle. If you want to look for specific content types and there is nothing in either the path or title that tells you what kind of content it is, you will have no easy way to specify the right information in your query.

A perfect partner for Google Analytics API is the PathAuto module. With PathAuto, you can create automatic aliases for all your page paths. So you could change 'node/10' into something like '[type]/[title-raw]', which would give you a path that includes the content type. Then you could create a Google Analytics query that filters for paths that match your desired content type, and that will allow you to get aggregate data, like pageviews, by content type. If the page title is in your aliased path ([title-raw]), you can also easily reconstruct the title from the path when you create links without any need to do local queries to find the original item:

  
// Use only the last path segment so a '[type]/' prefix stays out of the title.
$title = ucfirst(str_replace('-', ' ', basename($alias)));
print l($title, $alias);
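
With aliases in the '[type]/[title-raw]' form, totals by content type can also be computed locally from a single result set by grouping on the first path segment. This is a minimal sketch. It assumes the results have already been reduced to an array of page path => page views, which is a hypothetical intermediate format and not something the module returns directly. It also assumes the site lives at the web root, so no base path needs stripping. The function name is illustrative.

```php
/**
 * Sum page views by the first path segment (the content type).
 *
 * Hypothetical helper, not part of the Google Analytics API module.
 */
function example_pageviews_by_type(array $pageviews_by_path) {
  $totals = array();
  foreach ($pageviews_by_path as $path => $pageviews) {
    // '/story/my-first-post' becomes array('story', 'my-first-post').
    $segments = explode('/', trim($path, '/'));
    $type = $segments[0];
    if ($type === '') {
      // Skip the home page, which has no content type segment.
      continue;
    }
    if (!isset($totals[$type])) {
      $totals[$type] = 0;
    }
    $totals[$type] += (int) $pageviews;
  }
  // Largest totals first, keeping the type names as keys.
  arsort($totals);
  return $totals;
}

$sample = array(
  '/story/my-first-post' => 40,
  '/page/about-us' => 15,
  '/story/another-post' => 25,
  '/' => 90,
);
print_r(example_pageviews_by_type($sample));
```

Grouping locally like this needs only one request; a separate filtered query per content type would give the same totals at the cost of more round trips to Google.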
  

There are a few caveats here. The Google Analytics API module is new and still in development, so you'll want to pick up the latest code and check the issue queue for possible patches. If you find this interesting (I sure do!) jump in and help polish this useful module.
