Majority people use one of many publicly available Nagios plugins for disk (or file system) space usage.
Me too, and if you use some graphic tool you have nice graph like this one, where you see total size of file system, size of data and free space.

But this is nice if you monitor, like in my case, a UFS file system. What if you monitor a ZFS that have snapshots?
Then your graph is little strange. The picture explains everything.

Okay not big deal, but I have developed the plugin that checks ZFS usage and produces more understandable graph.
The plugin uses some ZFS properties from ZPOOL version 13 and higher, so exits if this is not true.
Here is the plugin, place this on remote host (example: /opt/csw/libexec/nagios-plugins/check_zfs_usage.sh)
#!/bin/sh
#set -x
# script for checking disk usage on ZFS
# requires min zpool version 13 or zfs version 4
# example, it's posible to have zfs ver 1 on zpool ver 15 (script support this)
# ------------ Variables
PROGNAME=`/usr/bin/basename $0`
# ------ Nagios plugin return values
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4
# ------------ Subroutines
# Program usage
usage() {
echo " \
Usage
${PROGNAME} /zfs warn crit
Note:
1. ZFS filesystem must start with /
2. warn is warning free space in %
3. crit is critical free space in %
example: /var 20 10
"
}
# End script with output, with performance data for NagiosGraph
endscript () {
echo "${RESULT}"
exit ${EXIT_STATUS}
}
# ------------ check if there are 3 arguments
if [ $# != 3 ]; then
usage
exit 3
fi
# --------- check if warning is bigger than critical size
if [ $3 -ge $2 ]; then
echo "Warning[%] must be bigger than Critical[%]"
exit 3
fi
# ----------- check if first argument is a filesystem
FS=`df -n $1`
if [ $? != 0 ]; then
echo "The $1 is not valid filesystem"
exit 3
fi
# ----------- check if filesystem is ZFS
# /var : zfs
# /export/atlant-dbbackup: zfs
# note - comment out: ZFS=`echo ${FS} | awk '{print $3}'`
ZFS=`echo ${FS} | awk -F: '{print $2}'`
if [ ${ZFS} -ne zfs ]; then
echo "The $1 is not ZFS"
exit 3
fi
# -------- get dataset of filesystem
DATASET=`df -h $1 | grep -v Filesystem | awk '{print $1}'`
# ----------- check if ZFS is min required version 4 or ZPOOL min required ver 13
ZFSVER=`zfs get -H version ${DATASET} | awk '{print $3}'`
if [ $? -ne 0 ]; then
echo "The ZFS version can't be determined, it's probably less then 4"
exit 3
fi
if [ ${ZFSVER} -lt 4 ]; then
#echo "The $1 is indeed ZFS, but version ${ZFSVER} which is less than 4 and not supported by this script"
#exit 3
# ---------- check if ZPOOL is min required version 13, or higher
ZPOOLVER=`zpool upgrade | head -1 | awk '{print $NF}' | awk -F. '{print $1}'`
if [ ${ZPOOLVER} -lt 13 ]; then
echo "The script can't support zpool ver ${ZPOOLVER} (<13) and ZFS ver ${ZFSVER} (<4)"
exit 3
fi
fi
# size in bytes
QUOTA=`zfs get -Hp quota ${DATASET} | awk '{print $3}'`
# --- check if there is quota at all
if [ ${QUOTA} -eq 0 ]; then
echo "There is no quota on zfs dataset ${DATASET}"
exit 3
fi
# --- check if zfs properties can be determined
# --- sometimes even zfs ver =4 this is not posible
for i in usedbydataset usedbychildren usedbysnapshots
do
if [ "`zfs get -Hp ${i} ${DATASET} | awk '{print $3}'`" = "-" ]; then
echo "Somehow zfs property ${i} cannot be determined"
exit 3
fi
done
# --- check if usedbydataset is not 0
# --- can happens with export/import zpools
if [ `zfs get -Hp usedbydataset ${DATASET} | awk '{print $3}'` -eq 0 ]; then
echo "Somehow zfs property usedbydataset=0, probably zpool exported/imported and script can't support it"
exit 3
fi
CHILDRENUSE=`zfs get -Hp usedbychildren ${DATASET} | awk '{print $3}'`
DATA=`zfs get -Hp usedbydataset ${DATASET} | awk '{print $3}'`
SNAPSHOT=`zfs get -Hp usedbysnapshots ${DATASET} | awk '{print $3}'`
# size in Mbytes
QUOTA=`(echo "scale=2; ${QUOTA}/1024/1024" | bc -l)`
CHILDRENUSE=`(echo "scale=2; ${CHILDRENUSE}/1024/1024" | bc -l)`
DATA=`(echo "scale=2; ${DATA}/1024/1024" | bc -l)`
SNAPSHOT=`(echo "scale=2; ${SNAPSHOT}/1024/1024" | bc -l)`
# real quota is actually quota-usedbychildren
QUOTA=`(echo "scale=2; ${QUOTA}-${CHILDRENUSE}" | bc -l)`
FREE=`(echo "${QUOTA}-${DATA}-${SNAPSHOT}" | bc -l)`
FREEPERC=`bc -l << E
scale=2
${FREE}*100/${QUOTA}
E`
WARNING=$2
CRITICAL=$3
if [ ${FREEPERC} -gt ${WARNING} ]
then
RESULT="ZFS ver${ZFSVER} $1 OK Free space ${FREE}MB ${FREEPERC}% : ${QUOTA}, ${SNAPSHOT}, ${DATA}, ${FREE}"
EXIT_STATUS=${STATE_OK}
elif [ ${FREEPERC} -le ${WARNING} ] && [ ${FREEPERC} -gt ${CRITICAL} ]
then
RESULT="ZFS ver${ZFSVER} $1 WARNING Free space ${FREE}MB ${FREEPERC}% : ${QUOTA}, ${SNAPSHOT}, ${DATA}, ${FREE}"
EXIT_STATUS=${STATE_WARNING}
else
RESULT="ZFS ver${ZFSVER} $1 CRITICAL Free space ${FREE}MB ${FREEPERC}% : ${QUOTA}, ${SNAPSHOT}, ${DATA}, ${FREE}"
EXIT_STATUS=${STATE_CRITICAL}
fi
# ------- provide output and nagios return value
endscript
|
You need to define new Nagios command on your Nagios machine (example: in /etc/nagios/COMMON/commands.cfg)
# check usage of ZFS
# example: on local host add line to /opt/csw/etc/nrpe.cfg
# command[check_zfs_usage]=/opt/csw/libexec/nagios-plugins/check_zfs_usage.sh
define command{
command_name check-nrpe-zfs-usage
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_zfs_usage!$ARG1$!$ARG2$!$ARG3$
}
|
Also define new Nagios service on Nagios machine (example in /etc/nagios/UNIX/service-DISK-ZFS.cfg)
define service{
# check ZFS usage
use gen-service
host_name unixlab
service_description ZFS-/
check_command check-nrpe-zfs-usage!/!20!10
}
|
And configure NRPE (on remote host) to use plugin and check ZFS usage.
(example: /opt/csw/etc/nrpe.cfg)
# check usage of ZFS command[check_zfs_usage]=/opt/csw/libexec/nagios-plugins/check_zfs_usage.sh $ARG1$ $ARG2$ $ARG3$ |
I use NETWAYS Nagios Grapher v1.7.1 for having graphs, so here is graph configuration on Nagios machine.
(example /etc/nagios/ngraph.d/check_zfs_usage.ncfg)
# ---------- Help ------------------------------------
# service_name =
# regular expresion used to identify service
#
# graph_log_regex =
# regular expresion used to find searched value in performance data
# must be in round brackets ()
#
# graph_value = variable name in rrd database, no empty space
#
# graph_units = units on Y axis, X axis is time
#
# graph_legend = it contains key for variable, shows under graph
#
# page = optional
#
# rrd_plottype = LINE1 is simple line, AREA is filled out surface
#
# -----------------------------------------------
# example of plugin output
# ZFS ver4 / OK Free space 1911.93MB 23.33% : 8192.00, 2048.66, 4231.41, 1911.93
define ngraph{
service_name ZFS
graph_log_regex \d*\.\d*, \d*\.\d*, \d*\.\d*, (\d*\.\d*)
graph_value free
graph_units MB
#graph_legend Free space
rrd_plottype AREA
rrd_color 00FFFF # cyan
hide yes
graph_lower_limit 0
}
define ngraph{
service_name ZFS
type GPRINT
print_source free # source is graph_value previously defined
print_description Free disk space:
print_function LAST # returns most recent update of RRA (round robin archive)
print_format %11.2lf MB
print_eol left # start next GPRINT in new row
}
define ngraph{
service_name ZFS
graph_log_regex \d*\.\d*, \d*\.\d*, (\d*\.\d*), \d*\.\d*
graph_value data
graph_units MB
graph_legend ZFS data
graph_lower_limit 0
rrd_plottype AREA
rrd_color 008000 # green
}
define ngraph{
service_name ZFS
type GPRINT
print_source data
print_description Latest:
print_function LAST # returns most recent update of RRA (round robin archive)
print_format %2.2lf
}
define ngraph{
service_name ZFS
type GPRINT
print_source data
print_description Maximum:
print_function MAX # returns max value of RRA (round robin archive)
print_format %2.2lf
print_eol left # start next GPRINT in new row
}
define ngraph{
service_name ZFS
graph_log_regex \d*\.\d*, (\d*\.\d*), \d*\.\d*, \d*\.\d*
graph_value snapshot
graph_units MB
graph_legend ZFS snapshot
graph_lower_limit 0
rrd_plottype STACK # place new value (snapshot) on top of previous (data)
rrd_color C0C0C0 # silver
}
define ngraph{
service_name ZFS
type GPRINT
print_source snapshot
print_description Latest:
print_function LAST
print_format %2.2lf
}
define ngraph{
service_name ZFS
type GPRINT
print_source snapshot
print_description Maximum:
print_function MAX
print_format %2.2lf
print_eol left # start next GPRINT in new row
}
define ngraph{
service_name ZFS
graph_log_regex : (\d*\.\d*), \d*\.\d*, \d*\.\d*, \d*\.\d*
graph_value quota
graph_units MB
graph_legend ZFS quota
graph_lower_limit 0
rrd_plottype LINE1
rrd_color FF0000 # red
}
define ngraph{
service_name ZFS
type GPRINT
print_source quota
print_description Latest:
print_function LAST
print_format %2.2lf
}
define ngraph{
service_name ZFS
type GPRINT
print_source quota
print_description Maximum:
print_function MAX
print_format %2.2lf
print_eol left # start next GPRINT in new row
}
|
And finally here is nice graph.
