
Nagios plug-in: Solaris Fault Manager

With Nagios you can monitor almost anything, and the philosophy is simple.

Nagios runs plug-ins, typically Perl or shell scripts, checks each plug-in's return value, and determines the host or service state from it. Nagios neither knows nor cares what the plug-in is actually monitoring.
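As a rough illustration of that contract, here is a minimal sketch of a plug-in body: it prints one line of text and returns one of the four standard states. The function name check_threshold and its arguments are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Minimal sketch of the plug-in contract: one line of text plus a state code.
# check_threshold and its limits are hypothetical, for illustration only.

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# Usage: check_threshold VALUE WARN CRIT
# Prints a status line and returns the matching Nagios state.
check_threshold () {
        if [ $# -ne 3 ]
        then
                echo "UNKNOWN - usage: check_threshold value warn crit"
                return ${STATE_UNKNOWN}
        fi
        if [ "$1" -ge "$3" ]
        then
                echo "CRITICAL - value is $1"
                return ${STATE_CRITICAL}
        elif [ "$1" -ge "$2" ]
        then
                echo "WARNING - value is $1"
                return ${STATE_WARNING}
        fi
        echo "OK - value is $1"
        return ${STATE_OK}
}

# A real plug-in would end with something like:
#   check_threshold "$@"; exit $?
```

Nagios only ever sees the printed line and the exit status, so the same skeleton works whatever the value happens to measure.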

Here is a plug-in that reports whether the Solaris Fault Manager has found any faulty resources.

#!/usr/bin/sh
#set -x

# Nagios plugin return values
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

FMADM=/usr/sbin/fmadm
AWK=/usr/bin/awk
SUDO=/opt/csw/bin/sudo

# Function: end script with output
endscript () {
        echo ${RESULT}
        exit ${EXIT_STATUS}
}

# Check if fmadm exists
if [ ! -f ${FMADM} ]
then
        RESULT="Cannot find ${FMADM}"
        EXIT_STATUS=${STATE_WARNING}
        endscript
fi

# Check if service 'fmd' is online (quoted so an empty result does not break the test)
if [ "`svcs -H fmd | ${AWK} '{print $1}'`" != online ]
then
        RESULT="The fmd service is not online!"
        EXIT_STATUS=${STATE_WARNING}
        endscript
fi

# Run fmadm faulty
# -r = Show Fault Management Resource with their Identifier (FMRI) and state
UUID=`${SUDO} ${FMADM} faulty -r | ${AWK} '$0 !~ /TIME/ && $0 !~ /STATE/ && $0 !~ /^----/ {print $0}'`

if [ -n "${UUID}" ]
then
        RESULT="${UUID}"
        EXIT_STATUS=${STATE_CRITICAL}
else
        RESULT="The Fault Manager does not report any hardware problem."
        EXIT_STATUS=${STATE_OK}
fi

endscript

This executable shell script is located in the directory /opt/csw/libexec/nagios-plugins on the machine to be monitored.
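Before wiring the script into NRPE, it can be useful to run it by hand and look at both its output and its exit status. The sketch below is one way to do that; run_plugin and state_name are hypothetical helper names, not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helpers for testing a plug-in by hand.

# Translate a plug-in exit status into the Nagios state name.
state_name () {
        case "$1" in
                0) echo OK ;;
                1) echo WARNING ;;
                2) echo CRITICAL ;;
                3) echo UNKNOWN ;;
                *) echo "INVALID($1)" ;;
        esac
}

# Usage: run_plugin COMMAND [ARGS...]
# Runs the command and prints "<STATE>: <plug-in output>".
run_plugin () {
        status=0
        output=`"$@"` || status=$?
        echo "`state_name ${status}`: ${output}"
}

# Example (path taken from the article):
#   run_plugin /opt/csw/libexec/nagios-plugins/check_fmd_output.sh
```

Running it as the same user that NRPE runs under is the more telling test, because the script calls fmadm through sudo.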

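This article is not about NRPE, but one point has to be made: NRPE runs the plug-in on the monitored machine and hands its output and exit status back to the Nagios server.

Nagios then knows the state of the resource, such as OK or Critical, and it does not care what the resource is.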

That said, you need the following line in nrpe.cfg (the configuration file for the cswnrpe service) on the machine whose Fault Manager messages you want to monitor.

command[check_fmd_output]=/opt/csw/libexec/nagios-plugins/check_fmd_output.sh

On the Nagios server, define a service along these lines:

define service{
        use                             gen-service         ; Name of service template to use
        hostgroup_name                  SUN
        service_description             Solaris Fault Manager
        servicegroups                   Solaris_Fault_Manager
        check_command                   check-nrpe!check_fmd_output
        }
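
To confirm the whole chain works, the NRPE command can be queried by hand from the Nagios server with check_nrpe, using its -H (host) and -c (command) options. The sketch below wraps that call; the default path to check_nrpe and the function name query_fmd are assumptions, so adjust them to your installation.

```shell
#!/bin/sh
# Path to check_nrpe is an assumption; override it or edit to match your system.
CHECK_NRPE=${CHECK_NRPE:-/opt/csw/libexec/nagios-plugins/check_nrpe}

# Usage: query_fmd HOST
# Asks the NRPE daemon on HOST to run check_fmd_output, prints the plug-in
# output followed by the exit status, and returns that status.
query_fmd () {
        status=0
        "${CHECK_NRPE}" -H "$1" -c check_fmd_output || status=$?
        echo "exit status: ${status}"
        return ${status}
}

# Example (the host name is a placeholder):
#   query_fmd sunhost
```

An exit status of 0 or 2 with the script's own message means the request reached the plug-in; other results usually point at the NRPE setup rather than the Fault Manager.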