DRBD and the Heartbeat software allow a pair of Linux machines to be set up as a high-availability pair. One thing that I hadn't consciously thought about is that administering an HA cluster is more difficult than administering two single machines. A client recently brought this up to me and my first thought was "Yeah", but then I realized that this isn't really documented well anywhere.
Why isn't it easier than maintaining two single machines? While DRBD does share a partition (or several) between the two machines, that partition usually doesn't contain the whole OS. You would normally configure it to share just the clustered applications' configuration files and data, leaving the software and things like "/etc" on each machine's main hard drive(s).
We have in the past successfully set up systems where the whole OS was on the DRBD shared partition. However, this setup can be incredibly complicated to get right, and it relies on a lot of tricks with the system boot process. When the system boot or shutdown process changes, these tricks tend to break, and the breakage can take many hours to track down.
So you end up having to install packages on both machines, make configuration changes on both machines, and so on.
What is it that makes it harder than maintaining a single system? Largely, it is the complexity of the HA system itself. You have to build and test scripts that manage the starting and stopping of all the applications you want to be clustered, make sure that the appropriate configuration and data are available on both machines at the right times, work around the fact that simple cron jobs don't work because of clustering, and so on. Or if you don't, someone else has to. We do this for our hosting clients as part of the clustered hosting setup, so they're shielded from much of this work.
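To make the cron problem concrete: a job installed on both machines will fire on both, even though only one of them holds the shared data at any given time. One common workaround is to wrap the job in a guard that checks whether this machine is currently the active node. The sketch below is only an illustration under some assumptions: it treats "the shared file system is mounted here" as the test for being active, and the mount point "/shared" and the function names are hypothetical, not something Heartbeat or DRBD provides.

```python
import os
import subprocess

# Hypothetical mount point for the DRBD-backed file system; adjust to your setup.
SHARED_MOUNT = "/shared"


def is_active_node(mount_point=SHARED_MOUNT, ismount=os.path.ismount):
    """Return True if the shared file system is mounted on this machine."""
    return ismount(mount_point)


def run_if_active(command, mount_point=SHARED_MOUNT,
                  ismount=os.path.ismount, runner=subprocess.call):
    """Run command (a list of arguments) only on the active node.

    Returns the command's exit status, or None if the job was skipped
    because this machine does not currently hold the shared partition.
    """
    if not is_active_node(mount_point, ismount):
        return None
    return runner(command)
```

The same crontab entry can then be installed on both machines, each calling a small script that passes the real job to run_if_active. Note that this does nothing about a failover that happens while the job is running; that case still has to be handled by the job itself.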
It also usually requires some understanding of how the clustering software and hardware work. At the very least, you can't just shut down Heartbeat to take down the services you are running for maintenance, because if you don't do it right they'll shift out from under you to the other machine. The list goes on.
However, suffice it to say that clusters can add a lot of complexity for the system administrator. As far as I know, there really isn't much documentation about what you're in for when you decide to cluster some systems.