How to Keep Binlogs in Sync?

When you are using pyramid replication scheme - it could be useful to have n+1 intermediate servers. The idea is to have extra bandwidth in case one of those intermediate slaves fails. For example, if you have 3 intermediate slaves named A,B and C and you have 18 end slaves named 1-18:
pyramid replication schema

The idea would be that if your slave B fails, you split up slaves 7-12 between slaves A and C. You can point 7-9 to slave A and 10-12 to slave C. Your replication scheme now looks more like this:

The question becomes: What is th easiest way to move these slaves to a new intermediate server?

The brute force method is not hard, it’s just tedious. You need to examine the bin logs on 7-12, look at the last statement that was run, then look at slave A and C and find the bin-log and position for the same statement. You’ll probably want to make sure you are matching the server_id from your intial master server, and also the SET TIMESTAMP line from the binlogs. The book High Performance MySQL, Second Edition covers this task well (see page 384 - Locating the desired log positions). If you are lucky, at least servers 7-12 are in the same place and the amount of work you need to do is minimized. Once you know the bin-log and position on intermediate slave, you can issue CHANGE MASTER commands on slaves 7-12 to move them over.

But what if your intermediate servers are always in sync? If they’re always recording updates from the master in the same bin-log position in the same bin-log file, this would become easy. Simply perform SHOW SLAVE STATUS from each of 7-12, note the RELAY_MASTER_LOG_FILE, EXEC_MASTER_LOG_POSITION and then incorporate them into a CHANGE MASTER command pointing to the new host.

Assuming your servers can actually stay in sync, we need to get them there in the first place. I’m still trying to figure out the best way to do this. This should be a process that can also be followed to get things back in sync if somehow they slip. So far I’ve come up with the following process:

  1. Issue a show master logs to find the current bin-log position on the master server
  2. Add one to the bin-log number from above - e.g. if the result of step 1 is bin-logs.00024 then your new number is bin-logs.00025
  3. On each of the intermediate servers, issue: STOP SLAVE; START SLAVE UNTIL MASTER_LOG_FILE = ‘bin-logs.00025′, MASTER_LOG_POS = 106; SELECT MASTER_POS_WAIT(’bin-logs.00025′,106); RESET MASTER;
  4. On the master, issue: FLUSH LOGS;
  5. Wait for all the intermediate slaves to catch up to the master and reset
  6. On each of the end slaves, issue: RESET SLAVE followed by CHANGE MASTER TO…*

I have a couple of unresolved questions/concerns with this plan.

Is there a chance that a statement could replicate between when MASTER_POS_WAIT returns and when RESET MASTER executes? If that happens, does it happen on all of the intermediate slaves? [UPDATE] Changing this to use the START SLAVE UNTIL syntax suggested by Mark Leith in the comments resolves this issue.

What’s the best way to script this so I’m not manually issuing RESET SLAVE and CHANGE MASTER on 18 different end slaves?

*Interestingly enough, if you have master_host, master_user, etc. set up in your my.cnf file, then you only need to do a reset slave - the reset slave command picks up this information from the configuration file and starts replication on that master server at bin-logs.00001, position 4. You don’t really want to do this though. The option is deprecated and will disappear in upcoming versions. Also, this will work against you if you eventually have to point your slave to a new master.

9 Responses to “How to Keep Binlogs in Sync?”

Mark Leith Says:

Yes there is a chance that there could be a statement issued from the slave SQL thread between the MASTER_POS_WAIT and the RESET MASTER.

You want to look in to STOP SLAVE ; START SLAVE UNTIL instead, this guarantees that no statements run after the point that you want to be at.

This was discussed a little before on Baron’s blog (and in the comments) here:

http://www.xaprb.com/blog/2007/01/20/how-to-make-mysql-replication-reliable/

Gregory Haase Says:

That’s a good point. I keep forgetting about START SLAVE UNTIL.

Xaprb Says:

MASTER_LOG_FILE, EXEC_MASTER_LOG_POSITION is quicksand for the unwary :-) Master_log_file is the I/O thread’s position, and you care about what updates have been applied, not which have been read from the master.

I’m still working on automating this with mk-slave-move from Maatkit. You can already do it *before* the server crashes, with the current code. But after it crashes — that’s harder. Even while the servers are all functioning normally there are a lot of tricky cases to cover.

Gregory Haase Says:

I revised the process outlined above to include Mark’s suggestion about START SLAVE UNTIL.

I also changed MASTER_LOG_FILE to RELAY_MASTER_LOG_FILE reflecting Baron’s comment.

RELAY_MASTER_LOG_FILE is the name of the master binary log file containing the most recent event executed by the SQL thread. It’s companion EXEC_MASTER_LOG_POSITION is the position of the last event executed by the SQL thread from the master’s binary log. See Documentation.

I would note that in the case of one of my intermediate slaves going down, the SQL thread on the end slaves is not going to stop. It’s going to process all the way through the relay logs. The I/O thread would keep trying to connect to the master every 60 seconds. If the slaves were guaranteed in sync, one could theoretically map the IP Address of the failed server to one of your working servers and never miss a beat.

One Free Voice » Blog Archive » Creating an Intermediate Replication Layer Says:

[...] few weeks ago, I discussed how to keep binlogs in sync in a tree or pyramid replication scheme. That thread discussed how to re-distribute load in case of [...]

Gregory Haase Says:

I’ve edited this procedure to reflect the fact that START SLAVE UNTIL is asynchronous (see comment from Mats Kindahl).

Sascha Curth Says:

How do you assure the synchronicity of the binlog positions? Some time ago I filed a bug at MySql regarding this problem, but unfortunally didn’t get a proper solution:

http://bugs.mysql.com/bug.php?id=36541

Gregory Haase Says:

I read through your bug report, and I understand your issue, but it doesn’t really apply to my scenario. I have 1 master which replicates to 3 slaves, which each replicate down. Although I am using a lot of transactions, and my 3 intermediate slaves would not have the same binlog name and position of the master, they are in sync with each other.

Even then, I don’t have a system for automatic failover of the intermediate nodes, and I’m not sure I would trust the binlog positions to be absolutely the same.

My current procedure in the event of a failure is to assign the standard replicated servers with one of the other intermediate servers as master. Before manually issuing a change master on each slave, I will compare the relay-log on the slave to the bin-log on the new master.

First, do a SHOW SLAVE STATUS on the slave you want to change master on. Make a note of RELAY_MASTER_LOG_FILE, EXEC_MASTER_LOG_POS, RELAY_LOG_FILE and RELAY_LOG_SPACE. On the slave, do mysqlbinlog RELAY_LOG_FILE and look at the timestamp and the command identified by RELAY_LOG_SPACE. On the new master, do mysqlbinlog RELAY_MASTER_LOG_FILE and look at the timestamp and the command identified by EXEC_MASTER_LOG_POS. If these commands are the same, it’s a safe bet I can just point the slave to it’s new master and use it’s current coordinates. If the commands are different, then we need to find the correct coordinate on the new master that reflect the next command that the slave needs to use before we can do a change master.

There was lots of talk in our shop about using a VIP in our intermediate layer, so that if one of those servers failed, we could just point the VIP to one of the still functioning intermediate servers. I put that conversation on hold for two reasons - I don’t trust the binlog positions to be in sync, and I only have n+1 intermediate servers, and if I assign all of my orphaned slaves to only one of those servers, it’s going to have double the load of all the others. There’s probably a way to carefully automate it all, but we have bigger fish to fry right now.

Gregory Haase Says:

Here’s a clarification: I just looked at my intermediate layers and compared a single command/timestamp between them. They all had a different binlog posititon:
server 1: 111949922
server 2: 111957284
server 3: 112010726

These servers have been running for 98 days. I think considering all the water that’s gone under the bridge, they are really close. To be honest, they are a lot closer than I personally thought they’d be.

Can I tell you why they are off - No. I do know of two separate occasions where the slave SQL thread stopped due to a bad query. In both cases there was some manual intervention. This could possible have had an affect on the binlogs.

I guess I’ll be doing a lot more investigation into this in the next couple of days. :-/

Leave a Reply

XHTML: Line and paragraph breaks happen automagically. You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="">