So you've got Ubuntu 11.04 and think you'll use Nagios and its fun plugin, NRPE, to monitor your cloud servers, right? Sure! There are some gotchas though. Read on.

Note: You may want to check out check_mk instead of NRPE, as it is reportedly sexier.

For readability I'll start with the remote server. The order doesn't matter much, except that the last command on the monitoring server, the one that connects to the remote server, obviously won't work until you've set up the remote server.
apt-get install nagios-nrpe-server
vim /etc/nagios/nrpe.cfg
Edit the allowed_hosts line as follows, replacing 1.2.3.4 with your monitoring server's IP. Be mindful of the lack of spaces:
allowed_hosts=127.0.0.1,1.2.3.4
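If you'd rather script that edit than do it in vim, here is a minimal sketch. The function name set_allowed_hosts is my own invention, and it assumes the file already contains a line starting with allowed_hosts= and that you pass a plain IP address (no slashes or ampersands).

```shell
#!/bin/sh
# Hypothetical helper (not part of NRPE): rewrite the allowed_hosts line.
#   $1 = path to nrpe.cfg
#   $2 = IP address of the monitoring server
set_allowed_hosts() {
    cfg=$1
    monitor_ip=$2
    # Keep a backup copy, then write the edited version back over the
    # original so the file keeps its owner and permissions.
    cp "$cfg" "$cfg.bak" &&
    sed "s/^allowed_hosts=.*/allowed_hosts=127.0.0.1,$monitor_ip/" \
        "$cfg.bak" > "$cfg"
}

# Example (run as root on the remote server):
#   set_allowed_hosts /etc/nagios/nrpe.cfg 1.2.3.4
```

The backup lands next to the original as nrpe.cfg.bak, so you can diff the two before restarting the service.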
Edit the command lines at the bottom:
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
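You may want to change both instances of hda1 above to sda1, depending on your config. Run the df command to see your partitions. Whatever name you put in the brackets is the name the monitoring server has to ask for; the service definition further down calls check_sda1.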
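To take the guesswork out of the device name, a sketch like the following prints a ready-made command line for whatever device holds the root filesystem. Both function names are hypothetical, the thresholds are simply the ones from the line above, and on some setups df reports something other than a plain /dev/sdXN device (LVM, containers), in which case you'll want to pick the device by hand.

```shell
#!/bin/sh
# Hypothetical helpers, not part of NRPE or the Nagios plugins.

# Print the device backing the root filesystem, using POSIX-format df
# output (-P keeps each filesystem on one line).
root_device() {
    df -P / | awk 'NR == 2 { print $1 }'
}

# Print an nrpe.cfg command line for the block device given as $1.
# The check name is derived from the last path component of the device.
nrpe_disk_command() {
    dev=$1
    name=${dev##*/}
    printf 'command[check_%s]=/usr/lib/nagios/plugins/check_disk -w 20%% -c 10%% -p %s\n' \
        "$name" "$dev"
}

# Example:
#   nrpe_disk_command "$(root_device)"
```

Paste the printed line into the bottom of /etc/nagios/nrpe.cfg in place of the hda1 one.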
service nagios-nrpe-server restart
That's it for the remote server. Now switch to the monitoring server:
apt-get install nagios3
It'll ask you to choose a password for the nagiosadmin web user and a mail configuration. I chose "Internet Site" for the mail configuration and a random password.
vim /var/www/index.html
# You may want to clear out this file or change it to a more suitable index. Default splash pages annoy me.
cd /etc/nagios3/conf.d/
cp localhost_nagios2.cfg REMOTESERVER_nagios2.cfg
Replace REMOTESERVER with the remote server's hostname.
vim REMOTESERVER_nagios2.cfg
define host{
        use                     generic-host
        host_name               REMOTESERVER.EXAMPLE.COM   ; Change to the remote server's hostname and domain name.
        alias                   REMOTESERVER               ; Change to the remote server's hostname
        address                 5.6.7.8                    ; Change to the remote server's IP address (WAN?)
        }

define service{
        use                     generic-service
        host_name               REMOTESERVER.EXAMPLE.COM   ; Change to the remote server's hostname and domain name.
        service_description     Disk Space 1
        check_command           check_nrpe!check_sda1
        }

define service{
        use                     generic-service
        host_name               REMOTESERVER.EXAMPLE.COM   ; Change to the remote server's hostname and domain name.
        service_description     Current Users
        check_command           check_nrpe!check_users
        }

define service{
        use                     generic-service
        host_name               REMOTESERVER.EXAMPLE.COM   ; Change to the remote server's hostname and domain name.
        service_description     Total Processes
        check_command           check_nrpe!check_total_procs
        }

define service{
        use                     generic-service
        host_name               REMOTESERVER.EXAMPLE.COM   ; Change to the remote server's hostname and domain name.
        service_description     Current Load
        check_command           check_nrpe!check_load
        }
#### End REMOTESERVER_nagios2.cfg
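The four service blocks in that file differ only in their description and the NRPE command they call, so if you have several remote servers you might prefer to generate them. This is just a sketch: nrpe_service is a made-up helper name, and it emits the same fields used in the file above and nothing more.

```shell
#!/bin/sh
# Hypothetical helper: print one Nagios service definition that runs an
# NRPE command on a remote host.
#   $1 = host_name, $2 = service_description, $3 = NRPE command name
nrpe_service() {
    printf 'define service{\n'
    printf '        use                     generic-service\n'
    printf '        host_name               %s\n' "$1"
    printf '        service_description     %s\n' "$2"
    printf '        check_command           check_nrpe!%s\n' "$3"
    printf '        }\n\n'
}

# Example: append the four services from this post to the host's file.
#   host=REMOTESERVER.EXAMPLE.COM
#   {
#       nrpe_service "$host" "Disk Space 1"    check_sda1
#       nrpe_service "$host" "Current Users"   check_users
#       nrpe_service "$host" "Total Processes" check_total_procs
#       nrpe_service "$host" "Current Load"    check_load
#   } >> /etc/nagios3/conf.d/REMOTESERVER_nagios2.cfg
```

You would still write the define host block by hand, since that one carries the address and alias.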
vim hostgroups_nagios2.cfg
You likely have ssh running on your remote server, so go ahead and add it to the ssh-servers hostgroup. You can add it to others too.
define hostgroup {
        hostgroup_name  ssh-servers
        alias           SSH servers
        members         localhost, REMOTESERVER.EXAMPLE.COM   ; Change to the remote server's hostname and domain name, being careful to keep the space after the comma
        }
#### End hostgroups_nagios2.cfg
service nagios3 restart
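A typo in any of those config files can stop Nagios from coming back up, so it can be worth running its built-in config check (the -v flag) before restarting. Here is a minimal sketch; verify_then_restart is my own name for the wrapper, and it assumes the stock Ubuntu config path /etc/nagios3/nagios.cfg.

```shell
#!/bin/sh
# Hypothetical wrapper: only restart Nagios if the config verifies.
verify_then_restart() {
    # "nagios3 -v <main config>" parses every config file and exits
    # non-zero if it finds errors.
    if nagios3 -v /etc/nagios3/nagios.cfg; then
        service nagios3 restart
    else
        echo "Config check failed; not restarting." >&2
        return 1
    fi
}

# Example (run as root on the monitoring server):
#   verify_then_restart
```

If the check fails, the output from nagios3 -v names the file and line it tripped over.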
/usr/lib/nagios/plugins/check_nrpe -H REMOTESERVER.EXAMPLE.COM
Ideally it'll respond with NRPE v2.12 or similar.
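Once the version check works, you can also ask for one of the commands defined in nrpe.cfg by adding -c, which is the same thing Nagios does behind the scenes. Plugins report their result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the little helper below, whose name nagios_state is hypothetical, just turns that number into a word.

```shell
#!/bin/sh
# Hypothetical helper: map a Nagios plugin exit code to its state name.
nagios_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Example (on the monitoring server):
#   /usr/lib/nagios/plugins/check_nrpe -H REMOTESERVER.EXAMPLE.COM -c check_load
#   nagios_state $?
```

If check_load comes back fine here but the service shows a problem in the web interface, the issue is more likely in the service definition than in NRPE itself.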
Open http://MONITORINGSERVER.EXAMPLE.COM/nagios3 and log in with the username nagiosadmin and the password you created when you installed Nagios.
There are a ton of customizations you can do, such as setting parents, customizing icons, installing pnp4nagios, and changing the notifications.