OneFreeVoice

December 1, 2008

Performance notes on INFORMATION_SCHEMA tables

Filed under: Code, MySQL — Gregory Haase @ 2:38 pm

I have written a bit in the past about using INFORMATION_SCHEMA tables, and I use them extensively in stored procedures for performing regular maintenance operations. There are a great many clever and useful things you can do with them. However, recently I have been noticing some performance issues using these tables that puts a damper on their usefulness.

For example, let’s look at the query I use in some of my stored procedures to manage my partitioning logic. I have multiple reporting tables that are partitioned the same way, and I am always adding new tables and removing old tables depending on current requirements, so I need to have a single abstract procedure to deal with all of them. I open the procedure with a cursor that gets the table names for every table that is partitioned on a particular column. INFORMATION_SCHEMA tables make this possible:

SELECT a.table_name
  FROM information_schema.tables a,
       information_schema.columns b
 WHERE a.table_name = b.table_name
   AND a.table_name like '%reporting'
   AND b.column_name = 'datestring'
   AND EXISTS (SELECT 1
                 FROM information_schema.partitions c
                WHERE a.table_name = c.table_name
                  AND partition_name IS NOT NULL );
......
13 rows in set (6 min 13.38 sec)

It turns out that although there have already been some efforts at information_schema optimization, the partitions table was not included. Indeed, if I take the EXISTS clause out of the picture, you see things are coming back much quicker (you can also see that I have a couple of reporting tables that are not partitioned):

15 rows in set (0.09 sec)

Another thing I noticed was that calling both tables and columns was redundant because table_name exists on columns. Indeed, if I factor this out, it reduces my query time by half:

SELECT distinct a.table_name
  FROM information_schema.columns a
 WHERE a.table_name like '%reporting'
   AND a.column_name = 'datestring'
   AND EXISTS (SELECT 1
                 FROM information_schema.partitions b
                WHERE a.table_name = b.table_name
                  AND partition_name IS NOT NULL );
.....
13 rows in set (3 min 30.05 sec)

Now, reducing the amount of time by half is great, but it’s not good enough here. There is also a table_name column on partitions. Let’s see what happens when we turn the query on it’s head:

SELECT distinct a.table_name
  FROM information_schema.partitions a
 WHERE a.table_name like '%reporting'
   AND a.partition_name IS NOT NULL
   AND EXISTS (SELECT 1
                 FROM information_schema.columns b
                WHERE a.table_name = b.table_name
                  AND column_name = 'datestring' );
.....
13 rows in set (28.23 sec)

Now given that this is a maintenance task this timing may be appropriate. The job is already set to run in the wee hours where load is lightest and the operation is unlikely to affect anything. Furthermore, no one is sitting on the other end of a browser or client impatiently waiting for the results.

Do I have a couple of morals or learnings from this? Not anything a good DBA shouldn’t already know:
1.) there is more than one way to write a query
2.) every query needs tuning, even if it’s embedded deep in your maintenance routines.

About these ads

3 Comments »

  1. Greg,
    did you try with a JOIN, instead of a subquery?

    SELECT DISTINCT a.table_name
    FROM
    information_schema.partitions a
    INNER JOIN information_schema.COLUMNS b
    ON a.table_name = b.table_name
    AND a.table_schema = b.table_schema
    WHERE
    a.partition_name IS NOT NULL
    AND a.table_name LIKE ‘%reporting’
    AND column_name = ‘datestring’ ;

    This should be faster.

    Comment by Giuseppe Maxia — December 1, 2008 @ 3:05 pm

  2. I guess that is learning number 3.) Don’t over-complicate things.

    As Giuseppe has pointed out, a simple join will suffice:

    SELECT DISTINCT a.table_name
      FROM information_schema.partitions a,
           information_schema.columns b
     WHERE a.table_name = b.table_name
       AND a.table_name like '%reporting'
       AND a.partition_name is not null
       AND b.column_name = 'datestring';
    .....
    13 rows in set (21.80 sec)
    

    Comment by Gregory Haase — December 1, 2008 @ 3:25 pm

  3. Keep in mind that the Perl motto is usable with MySQL Databases.

    Comment by Cyril Scetbon — December 1, 2008 @ 4:21 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Silver is the New Black Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: