OneFreeVoice

November 18, 2008

Interesting SQL Challenge

Filed under: Code, MySQL — Gregory Haase @ 3:45 pm

A former colleague recently posed a problem to me: “This guy has a table full of table names, and he wants to run a query that unions those tables.” I spent a couple of minutes telling him why it was a bad idea. It’s not a very robust solution, and an incorrect value in your table of tables could cause the query to fail, as could a missing, renamed, or altered column on one of the target tables. In the end, my former colleague was able to argue for an architecture change. However, my curiosity had been piqued.

I ask: “Can it be done in a stored proc?”

Answer: “No.”

Hmmm… now I feel a challenge! (more…)

October 6, 2008

Information_Schema.Partitions – table_rows

Filed under: MySQL — Tags: , , — Gregory Haase @ 4:00 pm

I’m in the process of converting some very large data tables to partitioned tables. By “In the process” I mean “scripts are running as we speak and I’m monitoring what’s going on.”

When I did this in our test environment (2 or 3 times to be sure), I got familiar with the information_schema.partitions table. There are multiple ways to check if your data is going into your partitions, and going into the correct partition. One easy thing you can do is use “explain partitions” and make up a select query that should be narrowed down to that partition:

explain partitions
select *
  from very_large_table
 where date_string between '20080514' and '20080524'

If set up correctly, you’ll see a result like:

+----+-------------+------------------------------------+------------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table            | partitions | type  | possible_keys | key     | key_len | ref  | rows    | Extra       |
+----+-------------+------------------------------------+------------+-------+---------------+---------+---------+------+---------+-------------+
|  1 | SIMPLE      | very_large_table | p200805    | range | PRIMARY       | PRIMARY | 4       | NULL | 9016317 | Using where |
+----+-------------+------------------------------------+------------+-------+---------------+---------+---------+------+---------+-------------+

You can also take a look at the filesystem real quick to check the volume of the partition files

ls -lh very_large_table*.ibd

-rw-rw---- 1 mysql mysql 1.3G Oct  6 10:40 very_large_table#P#p200804.ibd
-rw-rw---- 1 mysql mysql 1.2G Oct  6 10:45 very_large_table#P#p200805.ibd
-rw-rw---- 1 mysql mysql 2.0G Oct  6 10:51 very_large_table#P#p200806.ibd
-rw-rw---- 1 mysql mysql 2.8G Oct  6 11:02 very_large_table#P#p200807.ibd
-rw-rw---- 1 mysql mysql 2.4G Oct  6 11:12 very_large_table#P#p200808.ibd
-rw-rw---- 1 mysql mysql 3.2G Oct  6 11:32 very_large_table#P#p200809.ibd
-rw-rw---- 1 mysql mysql 628M Oct  6 11:35 very_large_table#P#p200810.ibd

Through the course of my work, I also noticed that there is a table_rows column on the information_schema.partitions table.

select partition_name,
       table_rows
  from information_schema.partitions
 where table_name = 'very_large_table';

+----------------+------------+
| partition_name | table_rows |
+----------------+------------+
| p200804        |   19609997 |
| p200805        |   18041881 |
| p200806        |   31124871 |
| p200807        |   43307939 |
| p200808        |   35578598 |
| p200809        |   48627782 |
| p200810        |    9298862 |
+----------------+------------+

Circle back to todays script. The first thing I do is open a screen session so that if I somehow get disconnected from the server, it’s not going to kill my session. I kick off my script and there are 12 huge tables that are going to get partitioned. I start working on some other things and periodically check the progress using “show processlist”.

After a while I began to get concerned that the largest table had been processing for a very long time. I didn’t want to interrupt anything, and I didn’t want to start running count queries, but I was getting concerned about the progress. So I opened another window and performed the same query:

select partition_name,
       table_rows
  from information_schema.partitions
 where table_name = 'very_large_table';

+----------------+------------+
| partition_name | table_rows |
+----------------+------------+
| p200804        |   19609997 |
| p200805        |   18041881 |
| p200806        |   31124871 |
| p200807        |   21405637 |
| p200808        |   0        |
| p200809        |   0        |
| p200810        |   0        |
+----------------+------------+

Oh good, I can see that at least I’m half way done.

These aren’t exact numbers, but if you just want to get a size estimate of a very large table they are close enough.

There is also table_rows column on information_schema.tables. With the amount of time it takes to count rows on an InnoDB table, if you just need rough numbers on how large your tables are, you can just pull the info from the information_schema.

select count(1)
  from 'very_large_table_two';
+-----------+
| count(1)  |
+-----------+
| 275689898 |
+-----------+
1 row in set (17 min 21.60 sec)

select table_rows
  from information_schema.tables
 where table_name = 'very_large_table_two';
+------------+
| table_rows |
+------------+
|  269991752 |
+------------+
1 row in set (0.49 sec)

The table_rows number is off by more than 2% here, but there are many scenarios when all you really need is a general idea of how big something is. I’ll take even a 5% margin of error in less than a second over waiting 15 minutes for an exact number. Especially when someone is standing over my shoulder asking questions about table sizes.

July 15, 2008

Creating an Intermediate Replication Layer

Filed under: Code, MySQL — Tags: , , — Gregory Haase @ 1:52 pm

A few weeks ago, I discussed how to keep binlogs in sync in a tree or pyramid replication scheme. That thread discussed how to re-distribute load in case of failure in one of the intermediate slaves. Today we will look at how to create those intermediate replication slaves. We start with a traditional flat replication scheme and promote several servers into the intermediate layer. (more…)

June 27, 2008

Lotame is looking for a Database Engineer

Filed under: MySQL — Tags: — Gregory Haase @ 9:19 am

Lotame Solutions, Inc. is always looking for talented individuals. We have many openings in our Technology group, but the one I’m most interested in filling is the newly opened Database Engineer position. The Database Engineer will work with me to maintain Lotame’s entire database infrastructure – including design, development, testing, implementation, monitoring and support.

Interested parties should submit their resume and contact information to:
SELECT CONCAT_WS('@','jobs','lotame.com');
Include “Database Engineer” in the subject line.

This position is in the Baltimore, MD area.

June 24, 2008

How to Keep Binlogs in Sync?

Filed under: MySQL — Gregory Haase @ 4:06 pm

When you are using pyramid replication scheme – it could be useful to have n+1 intermediate servers. The idea is to have extra bandwidth in case one of those intermediate slaves fails. For example, if you have 3 intermediate slaves named A,B and C and you have 18 end slaves named 1-18: (more…)

« Newer PostsOlder Posts »

Blog at WordPress.com.