Using the Linux-HA heartbeat software can lead to some interesting situations, even after you have a stable and reliable configuration. I recently ran into a situation where a human and the crm (Cluster Resource Manager) were butting heads over resource control.
The heartbeat package is a great set of tools for building and running a highly-available system. It gives you a daemon that watches your other daemons, making sure that they're doing what they're supposed to be doing. A side effect is that it becomes slightly more difficult to control these daemons using something like "service httpd stop".
Take, for instance, this simple heartbeat configuration, which we'll call our "v1" config. This is the /etc/ha.d/haresources file for this config:
ha1 IPaddr:a.b.c.d httpd proftpd
If you're running a simple version 1 configuration, meaning "crm no" in your /etc/ha.d/ha.cf, then doing something like "service httpd stop" normally wouldn't be a problem. Once you've made the leap to a version 2 config (v2), meaning "crm yes", and have converted your haresources file to a /var/lib/heartbeat/crm/cib.xml file, these resources are likely being monitored. This means that the crm is going to restart them if it sees them not running.
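If you're not sure which mode a node is running in, a quick look at ha.cf settles it. The following is a minimal sketch: the helper name crm_mode is made up for illustration, and it assumes the directive sits on a line of its own as "crm <value>" and that a missing directive means a v1 setup.

```shell
#!/bin/sh
# crm_mode FILE: print the value of the last uncommented "crm" directive in
# an ha.cf-style file, or "no" when the directive is absent (assumed here
# to mean a v1 configuration).
# crm_mode is an illustrative helper, not part of heartbeat.
crm_mode() {
    value=$(awk '$1 == "crm" { v = $2 } END { print v }' "$1")
    echo "${value:-no}"
}

# Example (on a cluster node): crm_mode /etc/ha.d/ha.cf
```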
Now I need to stop the httpd resource for a bit. Running "service httpd stop" is only going to cause me problems at this point. Within the default span of 120 seconds, the heartbeat crm is going to check on httpd, see that it's dead, and restart the service. My tasks on the system aren't yet complete, so httpd restarting at this point will be a problem, potentially causing trouble with heartbeat itself as we battle over the run state of this service.
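To see what the crm is actually monitoring, and how often, you can look for monitor operations in the live CIB. This is a rough sketch: it assumes cibadmin is available on the node and that monitor operations appear as op elements carrying name="monitor" on a single line. The filter name monitor_ops is made up for illustration.

```shell
#!/bin/sh
# monitor_ops: read CIB XML on stdin and print only the lines that define
# monitor operations, so their interval attributes are easy to spot.
# monitor_ops is an illustrative helper, not part of heartbeat.
monitor_ops() {
    grep 'name="monitor"'
}

# On a cluster node (not run here):
#   cibadmin -Q -o resources | monitor_ops
```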
The solution is the crm_resource command. We'll use this to query and control resources from the command line. First I'm going to check which services are managed and what they're called by the crm:
ha1# crm_resource --list
Resource Group: group_1
    IPaddr2_1 (ocf::heartbeat:IPaddr2)
    httpd_2 (lsb:httpd)
    proftpd_3 (lsb:proftpd)
If needed, we can also check which heartbeat cluster member the resource is running on:
ha1# crm_resource --locate --resource httpd_2
resource httpd_2 is running on: ha1
Stop the resource:
ha1# crm_resource --resource httpd_2 --set-parameter target_role --property-value stopped
Watching /var/log/messages in another window (with something like "tail -f") is something I usually do when working with heartbeat systems. Upon stopping a resource, heartbeat will log many messages. You can also use your favorite method to verify that httpd has stopped, maybe "ps axo comm,pid,pcpu,stat | grep httpd" or possibly "pgrep httpd" for some variety.
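If you'd rather not eyeball the process list, the same pgrep check can be wrapped in a small polling loop. This is only a sketch: the function name wait_until_stopped and the 30-second default timeout are my own choices for illustration.

```shell
#!/bin/sh
# wait_until_stopped NAME [TIMEOUT]: poll about once per second until no
# process named exactly NAME is left, giving up after TIMEOUT seconds
# (default 30). Returns 0 if the process is gone, 1 on timeout.
# wait_until_stopped is an illustrative helper, not part of heartbeat.
wait_until_stopped() {
    name=$1
    remaining=${2:-30}
    while pgrep -x "$name" >/dev/null 2>&1; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Example: wait_until_stopped httpd 60 && echo "httpd is down"
```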
Once you're satisfied that the service can be handed back over to heartbeat for management, start the resource back up:
ha1# crm_resource --resource httpd_2 --set-parameter target_role --property-value started
Now we're equipped with the ability to manually control services in a heartbeat-managed cluster without worrying that they will be stopped or started again by the system.
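Since it's easy to forget the final step and leave a resource stopped, the stop and start calls can be wrapped around the maintenance task. The sketch below is one way to do that. The helper names set_target_role and with_resource_stopped are hypothetical, and it makes no attempt to wait for the stop to finish before running the command.

```shell
#!/bin/sh
# Helper names below are illustrative only; the crm_resource invocation is
# the same one used in the article.

# set_target_role RESOURCE ROLE: ask the crm to move RESOURCE to ROLE.
set_target_role() {
    crm_resource --resource "$1" --set-parameter target_role --property-value "$2"
}

# with_resource_stopped RESOURCE COMMAND [ARGS...]:
# stop RESOURCE via the crm, run COMMAND, then hand the resource back to
# the crm whether or not COMMAND succeeded. Returns COMMAND's exit status.
with_resource_stopped() {
    resource=$1
    shift
    set_target_role "$resource" stopped || return 1
    status=0
    "$@" || status=$?
    set_target_role "$resource" started
    return "$status"
}
```

Usage would look something like "with_resource_stopped httpd_2 ./maintenance.sh", where maintenance.sh stands in for whatever work needs httpd out of the way.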
The systems and tools used in these examples are CentOS 5.5 and heartbeat-2.1.4. In the above examples, I've used the full-name options to help clarify what the options are doing. The following commands do the same as those above:
crm_resource -L
crm_resource -W -r httpd_2
crm_resource -r httpd_2 -p target_role -v stopped
crm_resource -r httpd_2 -p target_role -v started