by Andrew BerryJuly 13, 2011

Slow Queries? Check the Cardinality of Your MySQL Indexes

This week I've been heavily involved with optimizing the performance of the social networking features of a client site. The site in question uses the User Relationships module, a module that allows site members to connect with each other and share content within their circle of friends. The User Relationships module installs two key tables for storing relationships:

  • The {user_relationship_types} table contains configuration data for each relationship type. Supporting multiple relationship types allows users to have different kinds of relationships, such as those that are one-way (like Twitter), or require approval (like Facebook) on the same site.
  • The {user_relationships} table contains the actual relationships between users on the site. This table can be quite large, as it contains one row for every user relationship on the site.

How do these tables and their indexes relate to performance? Cardinality.

In this context, cardinality refers to the number of unique values in a column. For example, primary keys such as a node ID or user ID would have very high cardinality. The mail column in the {users} table would also have high cardinality, but not as much as the user name or user ID as the mail column is not guaranteed to be unique at the database layer. The status column in the {users} table would have low cardinality, as there are only two possible values for it (zero for blocked and one for active).

What does it mean?

How does this relate to the User Relationships module? Let's take a look at the indexes it creates for the {user_relationship} table from it's hook_schema() implementation:

<pre> <code class="language-php"> $schema['user_relationships'] = array( 'fields' => array( // snipped fields ), 'primary key' => array('requester_id', 'requestee_id', 'rtid'), 'indexes' => array( 'requester_id' => array('requester_id'), 'requestee_id' => array('requestee_id'), 'rtid' => array('rtid'), 'rid' => array('rid'), ), ); ?>

This code means that when the module is installed:

  1. A primary key is created of the composite of the user IDs of the users in the relationship along with the type of relationship they have.
  2. An index is created for the requester_id to allow searching for the outgoing relationships of a user.
  3. An index is created for the requestee_id to allow searching for the incoming relationships of a user.
  4. An index is created for the relationship type.
  5. Finally, an additional index is created for the relationship ID, which is an auto-incremented value but not the primary key.

Let's example the performance of a common query - finding the number of relationships a user has:

mysql> SET profiling = 1; Query OK, 0 rows affected (0.00 sec) mysql> SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1; *************************** 1. row *************************** count: 0 1 row in set (0.99 sec) mysql> SHOW profiles; *************************** 1. row *************************** Query_ID: 1 Duration: 0.98712800 Query: SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1 1 row in set (0.00 sec) mysql> SET profiling = 0; Query OK, 0 rows affected (0.00 sec)

Nearly a second! EXPLAIN says:

mysql> EXPLAIN SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: urt type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: Using index *************************** 2. row *************************** id: 1 select_type: SIMPLE table: ur type: ref possible_keys: PRIMARY,rtid,requester_id,requestee_id key: rtid key_len: 4 ref: const rows: 109924 Extra: Using where

The good news is, neither temporary tables or filesorts are used for the query. As well, the second table is able to use a ref join type, which is reasonably fast. So why is the query taking so long to run?

The key is in the key rtid:

mysql> SELECT COUNT(1) FROM user_relationships; *************************** 1. row *************************** COUNT(1): 219406 1 row in set (0.15 sec) mysql> SHOW INDEXES FROM user_relationships; <snip> *************************** 5. row *************************** Table: user_relationships Non_unique: 1 Key_name: rtid Seq_in_index: 1 Column_name: rtid Collation: A Cardinality: 6 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: Index_comment: <snipped further output> 7 rows in set (0.02 sec) mysql> SELECT DISTINCT rtid FROM user_relationships; *************************** 1. row *************************** rtid: 1 1 row in set (0.00 sec)

The index indicates are cardinality of 7 (which is an estimate based on the number of rows in the table) for the total of 219406 rows. Yet, rtid (the relationship type ID) is set to 1 for every single relationship. This means that the rtid index is not only useless, but actively slowing down every query on this table!

Time to bite the bullet and drop the index:

mysql> ALTER TABLE user_relationships DROP INDEX rtid; Query OK, 0 rows affected (0.09 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1; *************************** 1. row *************************** count: 0 1 row in set (0.48 sec) mysql> SHOW profiles; *************************** 1. row *************************** Query_ID: 1 Duration: 0.49973000 Query: SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1 1 row in set (0.01 sec)

Great! The query time has been cut nearly in half. It still seems a little slow. Perhaps it's because the primary key still contains rtid?

mysql> ALTER TABLE user_relationships DROP PRIMARY KEY; Query OK, 219406 rows affected (15.23 sec) Records: 219406 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE user_relationships ADD INDEX (requester_id, requestee_id); Query OK, 0 rows affected (2.55 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1; *************************** 1. row *************************** count: 0 1 row in set (0.01 sec) mysql> SHOW PROFILES; *************************** 1. row *************************** Query_ID: 1 Duration: 0.00850100 Query: SELECT COUNT(DISTINCT rid) AS count FROM user_relationships ur INNER JOIN user_relationship_types urt USING ( rtid ) WHERE (ur.requester_id = 1 OR ((ur.approved <> 1 OR ur.rtid IN (1)) AND ur.requestee_id = 1)) AND ur.approved = 1 AND ur.rtid = 1 1 row in set (0.00 sec)

The takeaway

With some careful analysis of our indexes and data, we've sped up a server-killing query by nearly 100x. Looking for other ways to speed up your Drupal site? Check out other articles on speeding up Views with slave databases and absorbing heavy traffic with Varnish, watch Lullabot's Performance and Scalability DVD, or have us check out your slow site for hands-on optimization.

Join the conversation
Sign up for the Lullabot Newsletter and get our latest articles and more direct to your inbox

We can partner with you!

We’ve accomplished some amazing things for some really great clients. We’d love to talk about how we can help your next project!

Tell us about your project

You can be a ’bot!

We have an amazing team of friendly experts in a number of disciplines. We work hard, have lots of fun together, and take great care of our team.

See our open positions