The Problem of Redundant Power Supplies
Redundant power supplies are good things. All power supplies fail eventually; in my book they rank right up there with hard drives. You have to plan for them to fail, which means that in critical servers you put two in! Think of it as RAID-1 for power supplies, if that works for you. The catch is that when one of the pair fails, the server keeps running on the other, so the failure is easy to miss.

The Solution: Monitoring
The solution here is to monitor your redundant power supplies. You cannot rely on someone hearing the beep. How do you do this? On Supermicro machines you can set up email alerts. That is somewhat useful, but email servers change, spam filters eat messages, packets get dropped... do you sleep well at night? No, the real solution is an active checking system: something that regularly confirms the power supply is present and working, and makes sure someone finds out when that check fails. A silently failing email alert is not good enough. At tummy.com we use the open source staple, Nagios.
IPMItool to the Rescue
IPMItool is an open source utility for working with the IPMI management cards in some servers. Depending on your Linux distribution, you can probably get it with "apt-get install ipmitool" or "yum install ipmitool". It is basically a command line tool that can be used instead of the IPMI web interface.
Get the Plugin
The plugin for checking Supermicro power supplies can be found on the tummy.com FTP site. It is written for X8-class motherboards and may need changes to its IPMI raw commands to work with other boards.
You can drop it into your Nagios plugins directory, usually /usr/lib/nagios/plugins. As with any script, I suggest at least reading through it to get an idea of how it works. Run with no arguments, it prints the command line format it expects:
# ./check_ipmi_powersupply
USAGE: -H host -U ipmi_username -P ipmi_password
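If you are curious what that argument handling might look like, here is a minimal Python sketch of the usual Nagios plugin shape. It is my own illustration, not the tummy.com plugin's code, and the function names are hypothetical: -H, -U and -P are parsed with getopt, and a missing or unknown option prints the usage line and returns exit status 3, which Nagios reports as UNKNOWN.

```python
import getopt

# Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

USAGE = "USAGE: -H host -U ipmi_username -P ipmi_password"


def parse_args(argv):
    """Return (host, user, password), or None if an option is missing or unknown."""
    try:
        opts, _extra = getopt.getopt(argv, "H:U:P:")
    except getopt.GetoptError:
        return None
    values = dict(opts)
    if not all(flag in values for flag in ("-H", "-U", "-P")):
        return None
    return values["-H"], values["-U"], values["-P"]


def main(argv):
    """Return a Nagios exit code; only the argument handling is sketched here."""
    args = parse_args(argv)
    if args is None:
        print(USAGE)
        return UNKNOWN
    host, user, _password = args
    # Hypothetical placeholder: a real plugin would run the IPMI query here
    # and return OK or CRITICAL based on the reply.
    print(f"UNKNOWN - power supply query not implemented in this sketch ({user}@{host})")
    return UNKNOWN
```

A wrapper script would end with sys.exit(main(sys.argv[1:])). Returning UNKNOWN for a bad command line means a typo in the Nagios configuration shows up as its own problem instead of passing as healthy power supplies.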
Not too complicated, and it looks like almost any other Nagios plugin. Here is some Nagios command glue to help you use it:
define command{
    command_name    check_ipmi_powersupply
    command_line    $USER1$/check_ipmi_powersupply -H $HOSTADDRESS$ -U ADMIN -P $ARG1$
}
And to use it as a service for some host:
define service{
    use                     generic-service
    host_name               My-Really-Important-Server
    service_description     POWERSUPPLY
    contact_groups          admin
    check_command           check_ipmi_powersupply!supersecretpassword
}
Passing the password as the first argument ($ARG1$) lets me use the same command definition on many different hosts. I found that the ADMIN account was the only account with the privilege to send the raw commands needed to check the power supplies this way.
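To make the $ARG1$ trick concrete, here is a simplified Python illustration of how Nagios fills in that command line: the service's check_command is split on "!", the pieces after the command name become $ARG1$, $ARG2$ and so on, and the remaining macros are substituted. This is not Nagios's actual macro engine, which handles many more macros and edge cases; the function name, the $USER1$ value and the 192.0.2.10 address are hypothetical.

```python
# The command_line from the "define command" block above.
COMMANDS = {
    "check_ipmi_powersupply":
        "$USER1$/check_ipmi_powersupply -H $HOSTADDRESS$ -U ADMIN -P $ARG1$",
}


def expand_check_command(check_command, commands, macros):
    """Simplified illustration of Nagios macro substitution, not Nagios's own code.

    check_command -- the service's value, e.g. "check_ipmi_powersupply!secret"
    commands      -- maps command_name to its command_line template
    macros        -- other macros, e.g. {"$HOSTADDRESS$": ..., "$USER1$": ...}
    """
    name, *args = check_command.split("!")
    line = commands[name]
    # Everything after the first "!" becomes $ARG1$, $ARG2$, ... in order.
    for number, value in enumerate(args, start=1):
        line = line.replace(f"$ARG{number}$", value)
    for macro, value in macros.items():
        line = line.replace(macro, value)
    return line


# expand_check_command("check_ipmi_powersupply!supersecretpassword", COMMANDS,
#                      {"$USER1$": "/usr/lib/nagios/plugins", "$HOSTADDRESS$": "192.0.2.10"})
# returns "/usr/lib/nagios/plugins/check_ipmi_powersupply -H 192.0.2.10 -U ADMIN -P supersecretpassword"
```

Each host gets its own check_command line with its own password, while the command definition stays shared. Because "!" is the argument separator, a password containing "!" needs to be escaped in the service definition.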
The IPMI Raw Command
So a Nagios plugin that checks power supplies, no big deal, right? Maybe, but if you want to get the job done right, you have to monitor the server completely, from the health of the power supply all the way up to the status code of the Apache page. The real magic in this plugin comes from the raw IPMI command that ipmitool sends, which does a very low-level query on the data bus the power supply is connected to. Here is the explanation from the Supermicro engineer I worked with to make this check:
# ipmitool -H host -U ipmi_username -P ipmi_password raw 0x06 0x52 0x07 0x78 0x01 0x78
>>
>> NetFn: 0x06
>> Cmd  : 0x52
>> Data : 0x07 // bus 3 for X8 motherboard
>>        0x78 // slave address of PS (it can be 0x78, 0x7a, 0x7c for 3 redundant PS)
>>        0x01 // read 1 byte
>>        0x78 // where 78 is offset of the PS, 0-bad, 1-good
>>
>> If the power supply is installed but failed, it will return value 0.
>> If the power supply totally loses power, it will reply with an error message.
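The engineer's notes line up with the generic Master Write-Read command (NetFn 0x06, command 0x52) from the IPMI v2.0 specification, in which the first data byte packs a channel number, a bus ID and a public/private bus flag. As a minimal Python sketch, assuming that field layout (worth checking against the spec yourself), this decodes the X8 request bytes; 0x07 comes out as private bus 3, which agrees with the "bus 3" note. The class and function names are my own.

```python
from typing import NamedTuple

NETFN_APP = 0x06              # IPMI "Application" network function
CMD_MASTER_WRITE_READ = 0x52  # Master Write-Read: access a device on a bus behind the BMC


class MasterWriteRead(NamedTuple):
    channel: int        # bits 7:4 of the bus byte
    bus_id: int         # bits 3:1 of the bus byte
    private_bus: bool   # bit 0 of the bus byte (1 = private bus)
    slave_address: int  # 8-bit form with bit 0 cleared, as the engineer wrote it
    read_count: int     # number of bytes to read back
    write_data: bytes   # bytes written before the read (here: the offset)


def decode_request(data: bytes) -> MasterWriteRead:
    """Split the raw request data bytes into Master Write-Read fields."""
    bus_byte, slave, count, *rest = data
    return MasterWriteRead(
        channel=bus_byte >> 4,
        bus_id=(bus_byte >> 1) & 0x07,
        private_bus=bool(bus_byte & 0x01),
        slave_address=slave & 0xFE,
        read_count=count,
        write_data=bytes(rest),
    )


# The X8 request from the engineer's notes: private bus 3, PSU at 0x78
# (0x3c as a 7-bit address), read 1 byte after writing offset 0x78.
X8_PSU_REQUEST = decode_request(bytes([0x07, 0x78, 0x01, 0x78]))
```

Only the second byte changes between supplies (0x78, 0x7a, 0x7c), which is why one raw command can be reused for each slot.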
This is the main reason for this blog post: to get this IPMI raw command out in the open. Special thanks go out to the Supermicro engineer who was able to pass down these commands from deep within the bowels of their documentation.
It is worth noting that this particular command will only work on X8-class motherboards; the equivalent command for other motherboard types will need to be looked up. If you are deploying this on a Supermicro 4-Node 6026TT, only the blade in the A slot has access to this data bus.
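Putting the pieces together, here is a minimal Python sketch of the check itself. It is not the tummy.com plugin. It assumes ipmitool is on the PATH, that ipmitool raw prints the reply byte as hex on stdout and exits non-zero when the BMC answers with an error, and that the supplies sit at the slave addresses passed in; the default of 0x78 and 0x7a for a two-supply chassis is my guess, not from the engineer's notes. The bus byte 0x07 and offset 0x78 are the X8 values from above, and the function names are hypothetical.

```python
import subprocess
from typing import List, Sequence, Tuple

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(returncode: int, stdout: str) -> str:
    """Interpret one raw reply per the engineer's notes: 1 = good, 0 = failed."""
    if returncode != 0:
        # Assumed: ipmitool exits non-zero when the BMC answers with an error,
        # which the notes say happens when a supply has lost power entirely.
        return "no response"
    try:
        value = int(stdout.split()[0], 16)  # ipmitool prints reply bytes as hex
    except (IndexError, ValueError):
        return "unparseable reply"
    return {1: "good", 0: "failed"}.get(value, f"unexpected value 0x{value:02x}")


def summarize(results: Sequence[Tuple[int, str]]) -> Tuple[int, str]:
    """Collapse (slave address, state) pairs into one Nagios status and message."""
    problems = [f"PS@0x{addr:02x} {state}" for addr, state in results if state != "good"]
    if problems:
        return CRITICAL, "CRITICAL - " + ", ".join(problems)
    return OK, f"OK - {len(results)} power supplies good"


def check_power_supplies(host: str, user: str, password: str,
                         addresses: Sequence[int] = (0x78, 0x7A)) -> Tuple[int, str]:
    """Query each PSU address with the X8 raw command and summarize the results.

    The default addresses assume a two-supply chassis; adjust for your hardware.
    """
    results: List[Tuple[int, str]] = []
    for addr in addresses:
        argv = ["ipmitool", "-H", host, "-U", user, "-P", password, "raw",
                "0x06", "0x52", "0x07", f"0x{addr:02x}", "0x01", "0x78"]
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((addr, classify(proc.returncode, proc.stdout)))
    return summarize(results)

# Example (not run here; needs ipmitool and a reachable BMC):
#   status, message = check_power_supplies("192.0.2.10", "ADMIN", "supersecretpassword")
#   print(message)
#   sys.exit(status)
```

Any state other than "good" is reported as CRITICAL, in keeping with the point of an active check: a supply that cannot be read at all is exactly the case you want to hear about. On a 6026TT, point it at the blade in the A slot.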