Thursday, November 12, 2009

Setting up Heartbeat for high availability

So you're interested in keeping your services highly available in the case of an outage? The easy way to do that is with a simple heartbeat setup.

The first thing I recommend for anyone attempting to set up heartbeat is to run ntpdate and turn on ntpd against a common local NTP server on every node. If the clocks drift seriously out of sync, heartbeat will sometimes misbehave.
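As a rough sketch of that time sync step, something like the following could be run on each node. The server name ntp.example.local is a placeholder I made up, and the RUN variable is just a dry-run convenience of this sketch, not anything heartbeat or ntpd provides. It also assumes /etc/ntp.conf already points ntpd at the same server.

```shell
# Hypothetical NTP server name; replace it with your own local server.
NTP_SERVER="${NTP_SERVER:-ntp.example.local}"
# Set RUN=echo to print the commands instead of executing them.
RUN="${RUN:-}"

sync_time() {
    # ntpdate cannot bind the NTP port while ntpd is running, so stop it first.
    $RUN service ntpd stop
    # One-shot clock step against the common server.
    $RUN ntpdate "$NTP_SERVER"
    # Keep the clock disciplined from now on, including after a reboot.
    $RUN service ntpd start
    $RUN chkconfig ntpd on
}
```

Running `RUN=echo` followed by `sync_time` first shows what would be executed; calling `sync_time` as root with RUN left empty does the real work.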

The actual installation of heartbeat is simple, and it is part of the standard distribution on RHEL 5 or CentOS 5. The way we are going to set this up is known as an active/passive configuration: each service is active on only one node of the cluster at a time, and the other node takes over that service only if the first node is unavailable.
yum install heartbeat

Next we need to create some basic files for heartbeat to use.

The first file we need to set up is ha.cf. It contains all the information on how the cluster should behave: what the nodes are, how to log, and how long to wait before a node is considered dead. There are additional options you can set here, but for simple testing and ease of use these are good starting values:
/etc/ha.d/ha.cf

logfile /var/log/ha-log
logfacility local0
udpport 694
keepalive 5
warntime 10
deadtime 30
initdead 60
bcast eth0
auto_failback on
node server1.domain.tld
node server2.domain.tld

The most significant item here is auto_failback. With it turned on, heartbeat restores each service to its preferred server as quickly as possible once that server is back and things are stable.
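The timing values in ha.cf only make sense in a certain order, so here is a minimal sanity check, assuming all four values are written as plain seconds (it does not handle the "ms" suffix). The function name check_ha_timing is my own, and the rule that initdead should be at least twice deadtime is the usual rule of thumb rather than a hard requirement.

```shell
# Check that keepalive < warntime < deadtime and initdead >= 2 * deadtime.
# Usage: check_ha_timing /etc/ha.d/ha.cf
check_ha_timing() {
    awk '
        # Remember the value of each timing directive (plain seconds assumed).
        $1 == "keepalive" || $1 == "warntime" || $1 == "deadtime" || $1 == "initdead" {
            v[$1] = $2 + 0
        }
        END {
            ok = 1
            if (!(v["keepalive"] < v["warntime"])) {
                print "warntime should be longer than keepalive"; ok = 0
            }
            if (!(v["warntime"] < v["deadtime"])) {
                print "deadtime should be longer than warntime"; ok = 0
            }
            if (!(v["initdead"] >= 2 * v["deadtime"])) {
                print "initdead is usually at least twice deadtime"; ok = 0
            }
            # Exit status 0 when every check passed, 1 otherwise.
            exit !ok
        }' "$1"
}
```

With the values shown above (5, 10, 30 and 60) the check passes quietly. A missing directive is treated as 0 and will be reported.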

The next thing that needs to be set up is which resources each node is primarily responsible for. If a node fails, the other node will take over its services if possible, which means the services need to be installed on a shared device, or at least use shared resources or synchronized configurations.
/etc/ha.d/haresources

server1.domain.tld 192.168.1.77 named
server2.domain.tld 192.168.1.78 httpd

It is critical that the name you use here is the same node name you gave in ha.cf. If they do not match, "Bad Things Will Happen"(tm). The format is simple: node name, IP address, then the service to run.
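Since a mismatched name is such an easy mistake to make, here is a small sketch that cross-checks the two files. The function name check_haresources_nodes is made up for this example, and it only looks at the first word of each non-comment line, so it ignores haresources lines continued with a backslash.

```shell
# Verify that every node named in haresources is declared in ha.cf.
# Usage: check_haresources_nodes /etc/ha.d
check_haresources_nodes() {
    ha_dir="$1"
    status=0
    # All names listed on "node" lines in ha.cf, one per line.
    nodes=$(awk '$1 == "node" { for (i = 2; i <= NF; i++) print $i }' "$ha_dir/ha.cf")
    # First field of every non-blank, non-comment haresources line.
    for name in $(awk 'NF && $1 !~ /^#/ { print $1 }' "$ha_dir/haresources"); do
        if ! printf '%s\n' "$nodes" | grep -qxF "$name"; then
            echo "haresources node '$name' is not declared in ha.cf" >&2
            status=1
        fi
    done
    return $status
}
```

Calling `check_haresources_nodes /etc/ha.d` returns non-zero and names the offending entry if anything is off. It is also worth comparing the names against the output of `uname -n` on each machine, since that is the name heartbeat identifies a node by.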

The nodes also need a pre-generated key that they share and that is exactly the same on both. In fact, all of these files should contain exactly the same data on every node, otherwise you are likely to have significant problems. One more note on haresources: the IP address there should NOT be the main IP address of the machine but an alias used by the machine for just this service.
/etc/ha.d/authkeys

auth 2
2 sha1 ThisIsAKeyThatMustMatch
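If you would rather not invent the key by hand, a sketch like this generates a random one and writes the file. It writes to a scratch path by default (my assumption, so it is safe to try); set AUTHKEYS=/etc/ha.d/authkeys to do it for real. Heartbeat expects authkeys to be readable only by root (mode 600) and complains about bad permissions otherwise.

```shell
# Scratch location by default; use /etc/ha.d/authkeys on a real node.
AUTHKEYS="${AUTHKEYS:-${TMPDIR:-/tmp}/authkeys.example}"

# Derive a 40-character hex secret from 64 random bytes.
KEY=$(head -c 64 /dev/urandom | sha1sum | cut -d' ' -f1)

# Write the file with owner-only permissions from the moment it is created.
( umask 077; printf 'auth 2\n2 sha1 %s\n' "$KEY" > "$AUTHKEYS" )
# Tighten the mode as well in case the file already existed.
chmod 600 "$AUTHKEYS"
```

Generate the key on one node only and copy the resulting file to the other, since both sides must hold the same secret.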

Copy each of these files into the /etc/ha.d directory on the other server. I also suggest you set up an /etc/hosts entry for each of your nodes in case DNS is unavailable. Some people keep the files in sync on both servers by mounting the /etc/ha.d directory from an NFS or Samba file share at boot; others rsync them on a regular basis or use CVS to keep things synced. There are a lot of ways to do it, but for now let's take the easiest way and just copy the files from server1.domain.tld to server2.domain.tld. You can accomplish this by executing the following command, provided you have ssh running on both machines.
scp /etc/ha.d/* root@server2.domain.tld:/etc/ha.d

On both servers you want heartbeat to start at boot, and you want to let it take care of starting the daemons it controls or needs to take control of, so those daemons should no longer start on their own.
chkconfig heartbeat on
chkconfig httpd off
chkconfig named off

Heartbeat itself will take care of starting each daemon if it's not already running. Now shut down both systems and watch as they come back up. If one boots significantly faster than the other, it may immediately acquire the resources of the other machine until that machine is fully online and its heartbeat has started to respond. If you have console access, you should be able to test functionality by downing one of the machine's Ethernet interfaces.
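To see which node currently holds one of the service addresses during such a test, a small helper like this can be fed the output of `ip -o -4 addr show`. The function name has_address is my own, and the sketch assumes heartbeat has brought the haresources address up as an additional address on the interface.

```shell
# Succeeds if the address given as $1 appears in `ip -o -4 addr show`
# output read from standard input.
has_address() {
    # Field 4 of each line is ADDRESS/PREFIX; strip the prefix and
    # look for an exact match.
    awk '{ print $4 }' | cut -d/ -f1 | grep -qxF "$1"
}
```

For example, `ip -o -4 addr show | has_address 192.168.1.77 && echo "named address is here"` on the surviving node should print the message once the takeover has happened. Watching /var/log/ha-log at the same time shows heartbeat's own view of the failover.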

If everything worked correctly you should now have a redundant pair of machines, each of which will take over the other's services in the event of a machine failure.
