November 5, 2009

Monit for Easy Server Process Monitoring

Monit is a free open source utility for managing and monitoring processes, files, directories and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations.
http://mmonit.com/monit/

Everyone has those daemons that die under pressure, which leads to a late-night phone call, logging into the server, and restarting the process by hand. Monit is an easy-to-deploy process watchdog that will restart those annoying processes for you. It also has a whole slew of other monitoring and alert functions.

Here, I am yum install'ing from the Dag Wieers repo.
# yum install monit
# chkconfig monit on
# service monit start

Take a look at /etc/monit.conf for some config examples, then create your own included config file, one per service you want to monitor, in the /etc/monit.d/ directory. My problem daemon is a flexlm license manager.

/etc/monit.d/flexnet:
check process flexnet with pidfile /var/tmp/flexnet.pid
  start program = "/etc/init.d/flexnet start"
  stop program = "/etc/init.d/flexnet stop"
  if 3 restarts within 5 cycles then timeout

My quick hack to create a pid file for flexlm, using the flexnet init script /etc/init.d/flexnet:
#!/bin/sh

case "$1" in
  start)
    if [ -f /etc/lmboot_TMW ]; then
      # pid cleanup
      rm -f /var/tmp/.flexlm/lmgrd.* /var/tmp/flexnet.pid
      # start
      /etc/lmboot_TMW -u flexlm && echo 'MATLAB_lmgrd'
      # pid hack
      cat /var/tmp/.flexlm/lmgrd.* | grep PID | tail -n1 | cut -d"=" -f2 > /var/tmp/flexnet.pid
    fi
    ;;
  stop)
    if [ -f /etc/lmdown_TMW ]; then
      /etc/lmdown_TMW > /dev/null 2>&1
      # pid cleanup
      rm -f /var/tmp/.flexlm/lmgrd.* /var/tmp/flexnet.pid
    fi
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac

exit 0
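
The pid hack in the init script writes whatever the pipeline produces, so if lmgrd has not written its lock file yet, the pid file could end up empty. Here is a minimal sketch of a more defensive version. It assumes the lock files contain a line of the form PID=12345 (I have not confirmed the exact format), and the function names extract_pid and write_pidfile are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes the lock files contain a line like "PID=12345".
# extract_pid and write_pidfile are hypothetical helper names.

extract_pid() {
    # Same pipeline as the init script, plus whitespace stripping.
    # Arguments: one or more lock files to read.
    cat "$@" 2>/dev/null | grep PID | tail -n1 | cut -d"=" -f2 | tr -d '[:space:]'
}

write_pidfile() {
    # $1 = pid file to write; remaining arguments = lock files to read.
    pidfile=$1
    shift
    pid=$(extract_pid "$@")
    case "$pid" in
        ''|*[!0-9]*)
            # Empty or non-numeric: do not write a bogus pid file.
            echo "no numeric PID found" >&2
            return 1
            ;;
    esac
    echo "$pid" > "$pidfile"
}
```

In the init script this could replace the pid hack line with something like: write_pidfile /var/tmp/flexnet.pid /var/tmp/.flexlm/lmgrd.*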

Reload the service after adding your files:
# service monit reload
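
A typo in a file under /etc/monit.d/ can go unnoticed at reload time, so it may be worth checking the syntax first. Here is a small sketch that does that with the monit command line itself, using "monit -t" for the syntax check and "monit reload" in place of the service wrapper. The function name safe_reload and the MONIT variable are my own additions, not part of Monit.

```shell
#!/bin/sh
# Sketch only: safe_reload and MONIT are hypothetical names.
# MONIT can be overridden to point at a different monit binary.
MONIT="${MONIT:-monit}"

safe_reload() {
    # "monit -t" runs a syntax check on the control file
    # and exits non-zero if it finds an error.
    if "$MONIT" -t; then
        # Ask the running daemon to reread its configuration.
        "$MONIT" reload
    else
        echo "monit config has errors; not reloading" >&2
        return 1
    fi
}
```

After a successful reload, "monit summary" should list the new flexnet check along with its current state.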

This is just to get me started. I think I'll monitor some more services, including node uptime, and try the web interface.
