by Andrew Berry

Squeeze! Save disk space with MySQL compression

After a Drupal site launches and starts gathering content, the database can be expected to grow. If a site has commenting or node revisions enabled, the database can grow very quickly. It's not uncommon to encounter site databases that are hundreds of megabytes large, if not multiple gigabytes. Developers may need to have multiple sites set up on their local machines for quick debugging and development. With the advent of reasonably priced, but small SSDs (these days, 120GB is the sweet spot for price and size), having 20 GB of MySQL databases isn't always doable, especially when site instances are rarely used.

Take a look at the largest database you have on your local machine. Odds are, most of the content is text that is easily compressed. Developers see this every time they gzip or bzip2 a database dump and a 1GB .sql file becomes 50M. With MySQL 5.5, we can take advantage of the fact that most Drupal tables compress with very high efficiency and save previous space on our hard drives.

MySQL 5.5 offers automatic compression for InnoDB tables using the Barracuda table format. There are two requirements to be able to use compressed tables:

  1. innodb_file_per_table must be enabled in my.cnf
  2. The Barracuda file format must be enabled when MySQL is built.

Enabling compression is a per-table setting that requires an ALTER TABLE statement like the following:

ALTER TABLE node ROW_FORMAT=compressed;

Disabling table compression is equally as simple:


Note that if the table format is not Barracuda, the ALTER will still succeed. If the table is exported and imported into a Barracuda table, it will be compressed during the import.

Running ALTER TABLE for each table in a database can be time consuming, especially since it would need to be run every time the database is re-imported from production. To simplify managing table compression, I've created bash script that:

  1. Can compress or decompress all tables in a given database.
  2. Checks to make sure that tables will actually be compressed.
  3. Shows the progress of individual queries, as compression and decompression can take significant time.

I'm getting in the range of 50% efficiency for larger databases where the content is mostly node revisions. Write speed for compressed tables is significantly slower than uncompressed tables, so consider leaving your most-used databases uncompressed, or only compress tables that are not related to current development.

Download (or fork!) on gist.