I spent most of the day today trying to get IPMI STONITH working with Heartbeat. IPMI is a system management protocol, usually implemented via an auxiliary controller, for doing various management functions including getting sensor data (fan speed, temp) and turning a server on and off. The IPMI controller is on even if the system is otherwise powered off. However, the ipmilan STONITH plugin is in pretty rough shape.

If you've gotten here via Google and are hoping this will help you get IPMI set up on your cluster, let me cut to the chase: The ipmilan STONITH driver appears to be completely unusable. You will probably have to do what I'm doing and implement a STONITH external script that uses ipmitool to do the job.

The first problem I ran into was that when I tried to set up STONITH with ipmilan, according to the README, it would report:

CRITICAL **: Unable to setup connection: 16

Google wasn't very helpful. It just turned up someone else asking about this error 18 months ago, with no response…

I dug into the code and found that the “auth” and “priv” fields, which the documentation says accept values like “none”, “md5”, and “admin”, are passed through the “atoi()” C library call to convert them into integers. Since none of the documented values are actually integer strings, they all silently get converted to 0.
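To make the failure mode concrete, here is a minimal sketch of what happens when a name like “admin” goes through atoi(), next to a strtol()-based variant that at least notices the problem. The function names are hypothetical and are not taken from the ipmilan source; only the use of atoi() is.

```c
#include <stdlib.h>

/* Hypothetical stand-in for how the plugin reads a config field:
 * atoi() gives no error indication, so for a string with no leading
 * digits, such as "admin" or "md5", the result is simply 0. */
int parse_field_with_atoi(const char *value)
{
    return atoi(value);
}

/* Hypothetical checked variant: returns -1 unless the whole string
 * is a small non-negative decimal number. */
int parse_field_checked(const char *value)
{
    char *end;
    long n = strtol(value, &end, 10);

    /* end == value: no digits were consumed at all.
     * *end != '\0': trailing junk after the digits. */
    if (end == value || *end != '\0' || n < 0 || n > 255)
        return -1;
    return (int)n;
}
```

With these, parse_field_with_atoi("admin") and parse_field_with_atoi("md5") both come back as 0 with no complaint, while parse_field_checked("admin") returns -1 so the caller can report a bad config value.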

That is the core of the problem causing the error above. The “priv” field needs to be the integer 4 for “admin” in my case, but is instead 0. If you change the “priv” field to “4”, and the “auth” field to “2” for “md5” it stops reporting the above error.
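What the plugin presumably meant to do is map the documented names to their numeric codes. The following is only a sketch of that idea, not the plugin's code and not the patch I sent. The values 2 for “md5” and 4 for “admin” are the ones that worked for me; the other table entries follow the IPMI specification's numbering as I understand it, and the names “md2”, “straight”, “callback”, “user”, and “operator” are my assumptions, so check them against the OpenIPMI headers before relying on them.

```c
#include <stdlib.h>
#include <string.h>

struct name_value {
    const char *name;
    int value;
};

/* Auth types. "md5" = 2 is confirmed above; the rest are assumed
 * from the IPMI spec numbering. */
static const struct name_value auth_names[] = {
    { "none", 0 }, { "md2", 1 }, { "md5", 2 }, { "straight", 4 },
};

/* Privilege levels. "admin" = 4 is confirmed above; the rest are
 * assumed from the IPMI spec numbering. */
static const struct name_value priv_names[] = {
    { "callback", 1 }, { "user", 2 }, { "operator", 3 }, { "admin", 4 },
};

/* Look the string up by name; fall back to a plain decimal number so
 * that "4" keeps working. Returns -1 if it is neither. */
static int lookup(const struct name_value *table, size_t count,
                  const char *s)
{
    char *end;
    long n;
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(table[i].name, s) == 0)
            return table[i].value;
    }
    n = strtol(s, &end, 10);
    if (end == s || *end != '\0' || n < 0 || n > 255)
        return -1;
    return (int)n;
}

int parse_auth(const char *s)
{
    return lookup(auth_names, sizeof auth_names / sizeof auth_names[0], s);
}

int parse_priv(const char *s)
{
    return lookup(priv_names, sizeof priv_names / sizeof priv_names[0], s);
}
```

Accepting both the name and the bare number means existing configs that already use the “4” and “2” workaround would not break, and anything unrecognized gets -1 instead of quietly turning into 0.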

However, it then starts core dumping due to an invalid pointer dereference.

The IPMI library is incredibly poorly documented, and to make it worse the STONITH ipmilan plugin is using a deprecated function.

My opinion is that ipmilan needs to be scrapped and re-written, hopefully by someone who knows the OpenIPMI API or at least someone who can find some documentation on it.

I was able to patch ipmilan so that it reboots the remote machine (right before it segfaults), and to correct the argument-passing problems above. I've sent that patch to the Heartbeat maintainers, but I've also recommended that they either remove IPMI support completely or at least disable it in the default build.

I just wanted to get this up there where Google could find it so that other people could give up earlier than I did. :-(

