by Andrew Berry on July 13, 2011 // Short URL

Slow Queries? Check the Cardinality of Your MySQL Indexes

More indexes doesn't always mean better performance

This week I've been heavily involved with optimizing the performance of the social networking features of a client site. The site in question uses the User Relationships module, a module that allows site members to connect with each other and share content within their circle of friends. The User Relationships module installs two key tables for storing relationships:

  • The {user_relationship_types} table contains configuration data for each relationship type. Supporting multiple relationship types allows users to have different kinds of relationships, such as those that are one-way (like Twitter), or require approval (like Facebook) on the same site.
  • The {user_relationships} table contains the actual relationships between users on the site. This table can be quite large, as it contains one row for every user relationship on the site.

How do these tables and their indexes relate to performance? Cardinality.

In this context, cardinality refers to the number of unique values in a column. For example, primary keys such as a node ID or user ID would have very high cardinality. The mail column in the {users} table would also have high cardinality, but not as much as the user name or user ID as the mail column is not guaranteed to be unique at the database layer. The status column in the {users} table would have low cardinality, as there are only two possible values for it (zero for blocked and one for active).

What does it mean?

How does this relate to the User Relationships module? Let's take a look at the indexes it creates for the {user_relationship} table from it's hook_schema() implementation:

<?php
$schema
['user_relationships'] = array(
 
'fields' => array(
   
// snipped fields
 
),
 
'primary key' => array('requester_id', 'requestee_id', 'rtid'),
 
'indexes' => array(
   
'requester_id' => array('requester_id'),
   
'requestee_id' => array('requestee_id'),
   
'rtid' => array('rtid'),
   
'rid' => array('rid'),
  ),
);
?>

This code means that when the module is installed:

  1. A primary key is created of the composite of the user IDs of the users in the relationship along with the type of relationship they have.
  2. An index is created for the requester_id to allow searching for the outgoing relationships of a user.
  3. An index is created for the requestee_id to allow searching for the incoming relationships of a user.
  4. An index is created for the relationship type.
  5. Finally, an additional index is created for the relationship ID, which is an auto-incremented value but not the primary key.

Let's example the performance of a common query - finding the number of relationships a user has:

mysql> SET profiling = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1;

*************************** 1. row ***************************
count: 0
1 row in set (0.99 sec)

mysql> SHOW profiles;
*************************** 1. row ***************************
Query_ID: 1
Duration: 0.98712800
   Query: SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1
1 row in set (0.00 sec)

mysql> SET profiling = 0;
Query OK, 0 rows affected (0.00 sec)

Nearly a second! EXPLAIN says:

mysql> EXPLAIN SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: urt
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: ur
         type: ref
possible_keys: PRIMARY,rtid,requester_id,requestee_id
          key: rtid
      key_len: 4
          ref: const
         rows: 109924
        Extra: Using where

The good news is, neither temporary tables or filesorts are used for the query. As well, the second table is able to use a ref join type, which is reasonably fast. So why is the query taking so long to run?

The key is in the key rtid:

mysql> SELECT COUNT(1) FROM user_relationships;
*************************** 1. row ***************************
COUNT(1): 219406
1 row in set (0.15 sec)

mysql> SHOW INDEXES FROM user_relationships;
<snip>
*************************** 5. row ***************************
        Table: user_relationships
   Non_unique: 1
     Key_name: rtid
Seq_in_index: 1
  Column_name: rtid
    Collation: A
  Cardinality: 6
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
<snipped further output>
7 rows in set (0.02 sec)

mysql> SELECT DISTINCT rtid FROM user_relationships;
*************************** 1. row ***************************
rtid: 1
1 row in set (0.00 sec)

The index indicates are cardinality of 7 (which is an estimate based on the number of rows in the table) for the total of 219406 rows. Yet, rtid (the relationship type ID) is set to 1 for every single relationship. This means that the rtid index is not only useless, but actively slowing down every query on this table!

Time to bite the bullet and drop the index:

mysql> ALTER TABLE user_relationships DROP INDEX rtid;
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1;
*************************** 1. row ***************************
count: 0
1 row in set (0.48 sec)

mysql> SHOW profiles;
*************************** 1. row ***************************
Query_ID: 1
Duration: 0.49973000
   Query: SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1
1 row in set (0.01 sec)

Great! The query time has been cut nearly in half. It still seems a little slow. Perhaps it's because the primary key still contains rtid?

mysql> ALTER TABLE user_relationships DROP PRIMARY KEY;
Query OK, 219406 rows affected (15.23 sec)
Records: 219406  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE user_relationships ADD INDEX (requester_id, requestee_id);
Query OK, 0 rows affected (2.55 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1;
*************************** 1. row ***************************
count: 0
1 row in set (0.01 sec)

mysql> SHOW PROFILES;
*************************** 1. row ***************************
Query_ID: 1
Duration: 0.00850100
   Query: SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1
1 row in set (0.00 sec)

The takeaway

With some careful analysis of our indexes and data, we've sped up a server-killing query by nearly 100x. Looking for other ways to speed up your Drupal site? Check out other articles on speeding up Views with slave databases and absorbing heavy traffic with Varnish, watch Lullabot's Performance and Scalability DVD, or have us check out your slow site for hands-on optimization.