This post originated from an RSS feed registered with Java Buzz by dion.
Original Post: Migrating data centers with zero downtime
Feed Title: techno.blog(Dion)
Feed URL: http://feeds.feedburner.com/dion
Feed Description: blogging about life the universe and everything tech
Last week I had to move a live, heavily trafficked web application from a data center in South America to a top-class offering at Contegix.
I can't say enough good things about the guys at Contegix. Normally I never look forward to talking to "the hosting guys", but ever since we started working with Mathew Porter and his team, it has been a pleasure. They really go above and beyond.
Anyway, back to the move. Some of you will probably think this is a no-brainer, but a few too many people were talking about doing the DNS switch and then sitting there "waiting for people to migrate over", so I thought I would put up our process for moving. Note that "old server" means the soon-to-be-old server, and "new server" means the soon-to-be live and current server :)
High level process
Get the app running on the old and new boxes
Configure the old server for redirect mode
Configure the new server for live mode
Do the switch
Watch and see
Get it running
Of course, the first step is to migrate the application to the new machine and run your battery of tests to make sure all is well.
Part of this will involve a test plan of things to check off.
You will also probably want to set up your /etc/hosts file to point the real domain at the new machine while testing, so you catch any URL glitches ("it worked as foo.test.com but not when it was live at foo.com!").
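If the app answers on several hostnames, a small helper can generate the override lines so none get missed. This is a minimal sketch; the hostnames and the address 203.0.113.10 are placeholders, and it assumes a typical setup where the system resolver consults /etc/hosts before DNS. The resolves_to check lets you confirm from a script which box a name currently points at.

```python
import socket


def hosts_lines(new_ip, hostnames):
    """Build /etc/hosts lines pointing every virtual-host name at the new box."""
    # One line per name, de-duplicated and sorted, keeps the file easy to diff and revert.
    return [f"{new_ip}\t{name}" for name in sorted(set(hostnames))]


def resolves_to(hostname, expected_ip):
    """True if this machine currently resolves hostname to expected_ip."""
    try:
        return socket.gethostbyname(hostname) == expected_ip
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Placeholder values: substitute your own vhost names and the new box's IP.
    names = ["foo.com", "www.foo.com", "bar.com"]
    for line in hosts_lines("203.0.113.10", names):
        print(line)
```

Remember to remove these lines from your workstation once testing is done, or you will keep hitting the new box regardless of what DNS says.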
Redirect Mode
The magic is simple. At some point you need to tell DNS that your server is actually at a new location. We all know it can take up to 24 hours or so for this to propagate through the internet. There are tricks such as lowering the TTL ahead of time (and making sure that change itself gets through a full cycle of the old TTL), but full propagation will still take time.
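The "gets through a cycle" part is simple arithmetic: a resolver that fetched the record just before you lowered the TTL may keep it, with the old TTL, for that full old TTL. A rough sketch of the timing, with hypothetical times and TTLs; it assumes resolvers honor TTLs, which not all of them do, and that is exactly why the old box keeps proxying instead of relying on DNS alone.

```python
from datetime import datetime, timedelta


def earliest_switch(ttl_lowered_at, old_ttl_seconds):
    """Caches may hold the record, with its OLD TTL, until this moment.

    Only after it can you count on resolvers having picked up the short TTL.
    """
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def stale_until(switched_at, new_ttl_seconds):
    """Latest time a TTL-respecting resolver can still hand out the old IP."""
    return switched_at + timedelta(seconds=new_ttl_seconds)


if __name__ == "__main__":
    lowered = datetime(2006, 1, 6, 3, 0)           # hypothetical: when the TTL was lowered
    ready = earliest_switch(lowered, 86400)        # old TTL: 24 hours
    print("switch no earlier than", ready)         # 2006-01-07 03:00:00
    print("stale until", stale_until(ready, 300))  # new TTL 5 minutes: 2006-01-07 03:05:00
```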
You only want people going to ONE machine though, and as soon as you flip the switch, you want it to be the new machine.
For this to happen, you simply make sure that anyone finding their way to the old machine gets proxied through to the new one. You can't just tell them "hey, go over there": a redirect to foo.com would send them straight back to the old machine, since their DNS still points there. The only alternative is redirecting them to the new IP address, and that only works if your site is the only one on that IP and nothing depends on the domain name in the URL.
In our case, the app relies heavily on the incoming URL to decide what to do (different domains are answered by different applications).
Anyway, to do the proxying, we simply go through each virtual host in our httpd.conf, put an entry in the old machine's /etc/hosts pointing at the new machine, and proxy requests over with mod_rewrite's [P] flag, which hands them to mod_proxy:
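# On the old box, inside each virtual host (needs mod_rewrite, mod_proxy and mod_proxy_http loaded)
RewriteEngine on
# [P] proxies instead of redirecting; foo.com resolves to the new box via /etc/hosts
RewriteRule ^/(.*)$ http://foo.com/$1 [P]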
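We had to have foo.com in the old machine's /etc/hosts pointing at the new machine because, again, our app cares about the URL, so we had to proxy to "foo.com" rather than to a bare IP. Without that entry the old machine could resolve foo.com to itself and proxy requests right back to itself.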
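To make concrete what that RewriteRule does with each request, here is a rough Python model of it (an illustration, not Apache's code). In server context the pattern ^/(.*)$ is matched against the URL path only, $1 is substituted into the target, and because the target contains no "?", mod_rewrite passes the original query string through unchanged. With [P] the resulting URL is fetched by the old box itself instead of being sent back to the browser.

```python
import re
from urllib.parse import urlsplit

# Mirrors: RewriteRule ^/(.*)$ http://foo.com/$1 [P]
RULE = re.compile(r"^/(.*)$")
TARGET = "http://foo.com/{0}"


def proxy_target(request_uri):
    """Return the URL the old box would fetch on the client's behalf, or None."""
    parts = urlsplit(request_uri)
    match = RULE.match(parts.path)  # the pattern sees the path, not the query string
    if match is None:
        return None                 # rule does not apply
    url = TARGET.format(match.group(1))
    if parts.query:                 # query string is passed through unchanged
        url += "?" + parts.query
    return url


if __name__ == "__main__":
    print(proxy_target("/forum/thread?id=42"))  # http://foo.com/forum/thread?id=42
```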
This form of proxy means that when someone connects to the old machine, the old machine connects to the new one and relays the data back and forth. If you look at the logs on the new machine you will see these proxied requests coming from the old machine's IP address (so your web stats may be a bit weird for the day).
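One way to straighten out the stats afterwards is to recover the original client address. This is a minimal sketch, assuming the proxied requests carry an X-Forwarded-For header (mod_proxy_http adds one when acting as a reverse proxy) and that you have set the new box's LogFormat to record it, for example with %{X-Forwarded-For}i. OLD_SERVER_IP is a placeholder. The header is only trusted when the request actually came from the old box, and only its rightmost entry, since that is the one our own proxy appended.

```python
# Placeholder: the old box's public address (a documentation IP).
OLD_SERVER_IP = "198.51.100.7"


def real_client_ip(remote_addr, x_forwarded_for):
    """Return the address to count in stats for one logged request."""
    if remote_addr != OLD_SERVER_IP or not x_forwarded_for:
        return remote_addr
    # Apache logs "-" for a missing header; ignore it and empty entries.
    entries = [e.strip() for e in x_forwarded_for.split(",")
               if e.strip() not in ("", "-")]
    # Entries left of the last one were supplied by the client and may be forged.
    return entries[-1] if entries else remote_addr


if __name__ == "__main__":
    print(real_client_ip("198.51.100.7", "10.1.2.3, 192.0.2.9"))  # 192.0.2.9
```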
If you use mod_proxy to talk to a different domain, you will want to make sure the content coming back is rewritten so that it points back at you, the front end, rather than at the backend domain.
That way HTML containing http://bar.com/test.html can be rewritten as /test.html, and the link still works.
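In Apache, ProxyPassReverse only adjusts response headers such as Location; rewriting links inside the HTML body takes a content filter such as mod_proxy_html. The idea behind that rewrite can be sketched in a few lines of Python. This is an illustration, not a substitute for a real HTML-aware filter: it only handles quoted href and src attributes, and bar.com is the example backend domain from above.

```python
import re

BACKEND = "http://bar.com"  # example backend domain from the text

LINK = re.compile(
    r"""(\b(?:href|src)\s*=\s*["'])"""  # attribute name, '=', opening quote
    + re.escape(BACKEND)                # the absolute backend prefix
    + r"(?=/)",                         # only when a path follows, so bar.community is left alone
    re.IGNORECASE,
)


def rewrite_links(html):
    """Turn href="http://bar.com/x" into href="/x" so links stay on the proxy."""
    return LINK.sub(r"\1", html)


if __name__ == "__main__":
    print(rewrite_links('<a href="http://bar.com/test.html">test</a>'))
    # <a href="/test.html">test</a>
```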
Configure the new server for live mode
Live mode uses the same httpd.conf the old box had before the switch, although you will probably have to do a s/oldip/newip/ on the config file(s).
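A plain s/oldip/newip/ can bite when one address is a prefix of another: replacing 10.0.0.1 naively would also mangle 10.0.0.12. A slightly safer sketch in Python that only replaces the old address where it appears as a complete address; the addresses here are placeholders.

```python
import re


def swap_ip(config_text, old_ip, new_ip):
    """Replace old_ip with new_ip only where it appears as a complete address."""
    pattern = re.compile(
        r"(?<![\d.])"         # not preceded by a digit or dot (e.g. 110.0.0.1)
        + re.escape(old_ip)
        + r"(?!\.?\d)"        # not followed by more digits (e.g. 10.0.0.12)
    )
    # A function replacement avoids any backslash interpretation in new_ip.
    return pattern.sub(lambda _m: new_ip, config_text)


if __name__ == "__main__":
    conf = "NameVirtualHost 10.0.0.1:80\n<VirtualHost 10.0.0.1:80>\n# backup at 10.0.0.12\n"
    print(swap_ip(conf, "10.0.0.1", "203.0.113.10"))
```

Diffing the old and new files afterwards is still worthwhile, since a config can also reference the old box by hostname.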
Do the switch
When you are ready to do the switch (say, 3am on a Sunday?), you will want a process or script that dumps the latest data from the database on the old setup, copies it to the new one, loads it in, and then restarts httpd on the old box in redirect mode.
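Such a script can be sketched in Python as an ordered list of shell steps with a dry-run default, so it can be rehearsed before 3am. Everything concrete here is an assumption for illustration: a MySQL database named appdb, a new host reachable over ssh as newbox, and a prepared redirect-mode config file; substitute your own database tools, hosts and paths. apachectl graceful is used so in-flight requests on the old box can finish during the restart.

```python
import subprocess

# Hypothetical names: replace with your own database, host and paths.
DB = "appdb"
DUMP = "/tmp/appdb.sql"
NEW_HOST = "newbox"
HTTPD_CONF = "/etc/httpd/conf/httpd.conf"
REDIRECT_CONF = "/etc/httpd/conf/httpd.conf.redirect"

STEPS = [
    ("dump latest data", ["mysqldump", "--single-transaction", f"--result-file={DUMP}", DB]),
    ("copy dump to new box", ["scp", DUMP, f"{NEW_HOST}:{DUMP}"]),
    ("load dump on new box", ["ssh", NEW_HOST, f"mysql {DB} < {DUMP}"]),
    ("install redirect-mode config", ["cp", REDIRECT_CONF, HTTPD_CONF]),
    ("restart httpd in redirect mode", ["apachectl", "graceful"]),
]


def run_switch(dry_run=True):
    """Run each step in order, stopping at the first failure; returns step labels."""
    done = []
    for label, argv in STEPS:
        print(f"[{'dry-run' if dry_run else 'run'}] {label}: {' '.join(argv)}")
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
        done.append(label)
    return done


if __name__ == "__main__":
    run_switch(dry_run=True)
```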
You could bring the old system down while this happens, but then you have some downtime.
Several of our applications are very read-heavy (typical for web apps), so rather than take downtime you can choose to:
Put the app into a read-only mode
Keep it running, and afterwards migrate the few records that changed between the dump and the restart
Keep it running, and tough luck to the few people who added, modified, or deleted records in that short window
Depending on the app, even the last option can be valid. If this is a community site, maybe we don't care too much if Bob replied to a thread and has to do it again later? :)
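For the second option, if your tables carry a last-modified timestamp, catching up can be a query for rows changed since the dump started. A minimal sketch using Python's built-in sqlite3 so it runs anywhere; the posts table, its updated_at column and the cutoff times are hypothetical, and on MySQL the upsert would be REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE instead. It picks up inserts and updates but cannot see deletes.

```python
import sqlite3


def copy_changed_rows(src, dst, since):
    """Copy rows of posts modified at or after `since` from src to dst.

    Upserts by primary key, so rows already present from the dump are simply
    overwritten. Deleted rows are NOT detected; that needs a tombstone or a log.
    """
    rows = src.execute(
        "SELECT id, body, updated_at FROM posts WHERE updated_at >= ?", (since,)
    ).fetchall()
    dst.executemany(
        "INSERT OR REPLACE INTO posts (id, body, updated_at) VALUES (?, ?, ?)", rows
    )
    dst.commit()
    return len(rows)


if __name__ == "__main__":
    schema = "CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, updated_at TEXT)"
    old, new = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    for db in (old, new):
        db.execute(schema)
    # The dump (started 03:00) captured post 1; Bob's reply arrived during the restart.
    new.execute("INSERT INTO posts VALUES (1, 'hello', '2006-01-08 02:59')")
    old.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                    [(1, "hello", "2006-01-08 02:59"), (2, "Bob's reply", "2006-01-08 03:02")])
    print(copy_changed_rows(old, new, "2006-01-08 03:00"))  # 1
```

Using the time the dump started as the cutoff (and >= rather than >) errs on the side of copying a row twice, which the upsert makes harmless.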
Once things are running fine on the new machine, it is time to switch your DNS to point at the new boy.
Watch and see
With the app switched, keep a window open on each machine running tail -f on the access logs.
Over time you will see more and more distinct IP addresses on the new machine as the DNS change propagates, and less and less traffic on the old machine. Once traffic there has dried up, you can take it out of commission.
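Rather than eyeballing tail -f, you can put a number on it. A rough sketch that reads access-log lines in Apache's common or combined format (client address as the first field, assuming HostnameLookups is off) and reports what share of requests on the new box still arrive via the old box, plus how many distinct clients now arrive directly. OLD_SERVER_IP and the sample lines are placeholders; for a real log, pass an open file object instead of the list. As the share falls toward zero, the DNS change has mostly propagated.

```python
from collections import Counter

OLD_SERVER_IP = "198.51.100.7"  # placeholder: the old box's address


def proxied_share(log_lines, old_ip=OLD_SERVER_IP):
    """Return (share of requests from old_ip, number of distinct direct client IPs)."""
    clients = Counter()
    for line in log_lines:
        fields = line.split(None, 1)
        if fields:
            clients[fields[0]] += 1
    total = sum(clients.values())
    share = clients[old_ip] / total if total else 0.0
    direct = len([ip for ip in clients if ip != old_ip])
    return share, direct


if __name__ == "__main__":
    sample = [
        '198.51.100.7 - - [08/Jan/2006:03:10:00 -0600] "GET / HTTP/1.1" 200 512',
        '192.0.2.44 - - [08/Jan/2006:03:10:02 -0600] "GET /forum HTTP/1.1" 200 2048',
    ]
    share, direct = proxied_share(sample)
    print(f"{share:.0%} via old box, {direct} direct client IP(s)")  # 50% via old box, 1 direct client IP(s)
```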
If there are major problems, you can quickly revert to the old server :)