I’m in the process of converting some very large data tables to partitioned tables. By “In the process” I mean “scripts are running as we speak and I’m monitoring what’s going on.”
When I did this in our test environment (2 or 3 times to be sure), I got familiar with the information_schema.partitions table. There are multiple ways to check if your data is going into your partitions, and going into the correct partition. One easy thing you can do is use “explain partitions” and make up a select query that should be narrowed down to that partition:
explain partitions select * from very_large_table where date_string between '20080514' and '20080524'
If set up correctly, you’ll see a result like:
+----+-------------+------------------------------------+------------+-------+---------------+---------+---------+------+---------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------------------------------+------------+-------+---------------+---------+---------+------+---------+-------------+ | 1 | SIMPLE | very_large_table | p200805 | range | PRIMARY | PRIMARY | 4 | NULL | 9016317 | Using where | +----+-------------+------------------------------------+------------+-------+---------------+---------+---------+------+---------+-------------+
You can also take a look at the filesystem real quick to check the volume of the partition files
ls -lh very_large_table*.ibd -rw-rw---- 1 mysql mysql 1.3G Oct 6 10:40 very_large_table#P#p200804.ibd -rw-rw---- 1 mysql mysql 1.2G Oct 6 10:45 very_large_table#P#p200805.ibd -rw-rw---- 1 mysql mysql 2.0G Oct 6 10:51 very_large_table#P#p200806.ibd -rw-rw---- 1 mysql mysql 2.8G Oct 6 11:02 very_large_table#P#p200807.ibd -rw-rw---- 1 mysql mysql 2.4G Oct 6 11:12 very_large_table#P#p200808.ibd -rw-rw---- 1 mysql mysql 3.2G Oct 6 11:32 very_large_table#P#p200809.ibd -rw-rw---- 1 mysql mysql 628M Oct 6 11:35 very_large_table#P#p200810.ibd
Through the course of my work, I also noticed that there is a table_rows column on the information_schema.partitions table.
select partition_name,
table_rows
from information_schema.partitions
where table_name = 'very_large_table';
+----------------+------------+
| partition_name | table_rows |
+----------------+------------+
| p200804 | 19609997 |
| p200805 | 18041881 |
| p200806 | 31124871 |
| p200807 | 43307939 |
| p200808 | 35578598 |
| p200809 | 48627782 |
| p200810 | 9298862 |
+----------------+------------+
Circle back to todays script. The first thing I do is open a screen session so that if I somehow get disconnected from the server, it’s not going to kill my session. I kick off my script and there are 12 huge tables that are going to get partitioned. I start working on some other things and periodically check the progress using “show processlist”.
After a while I began to get concerned that the largest table had been processing for a very long time. I didn’t want to interrupt anything, and I didn’t want to start running count queries, but I was getting concerned about the progress. So I opened another window and performed the same query:
select partition_name,
table_rows
from information_schema.partitions
where table_name = 'very_large_table';
+----------------+------------+
| partition_name | table_rows |
+----------------+------------+
| p200804 | 19609997 |
| p200805 | 18041881 |
| p200806 | 31124871 |
| p200807 | 21405637 |
| p200808 | 0 |
| p200809 | 0 |
| p200810 | 0 |
+----------------+------------+
Oh good, I can see that at least I’m half way done.
These aren’t exact numbers, but if you just want to get a size estimate of a very large table they are close enough.
There is also table_rows column on information_schema.tables. With the amount of time it takes to count rows on an InnoDB table, if you just need rough numbers on how large your tables are, you can just pull the info from the information_schema.
select count(1) from 'very_large_table_two'; +-----------+ | count(1) | +-----------+ | 275689898 | +-----------+ 1 row in set (17 min 21.60 sec) select table_rows from information_schema.tables where table_name = 'very_large_table_two'; +------------+ | table_rows | +------------+ | 269991752 | +------------+ 1 row in set (0.49 sec)
The table_rows number is off by more than 2% here, but there are many scenarios when all you really need is a general idea of how big something is. I’ll take even a 5% margin of error in less than a second over waiting 15 minutes for an exact number. Especially when someone is standing over my shoulder asking questions about table sizes.