Setting up the Nagios server on a host - UCB Confluence


Setting up the Nagios server on a host

This document describes how we set up a Nagios server on a CentOS host in the cloud, for performing automated network monitoring of, and problem notification for, various UCB Research IT hosts.

The following examples are based on installing Nagios Core on a CentOS 6.5 32-bit system on a DigitalOcean host.  Some selection criteria for these three components of our current monitoring solution:

Nagios Core is a widely-used - and venerable - open source IT infrastructure / network monitoring tool.  The UCB IST Unix Team and the staff responsible for several high profile UC Berkeley enterprise systems use (some flavor of) Nagios.  (As well, unlike many of its competitors, Nagios runs on truly minimal hardware configurations, including very low cost virtual private servers (VPSes).)

CentOS was selected as the Linux distribution for UCB-RIT's initial network monitoring host, because it is:

A slow-changing, long-term support Linux distribution. Maintenance updates for CentOS 6 are pledged through 2020. (Note that starting with CentOS 7, only 64-bit hosts are supported. Thus, we'll need to migrate this monitoring host before CentOS 6 support ends on November 30, 2020, presumably to a 64-bit host running a recent version of CentOS - or Ubuntu or some other well-supported Linux distro offering long-term support.)

Resource-sparing, enabling it to be run on minimal hardware, such as the single CPU, 512 MB RAM VPS configuration we're renting from our hosting provider.

Based on and highly similar to the Red Hat Enterprise Linux (RHEL) systems on which UC Berkeley's deployments of CollectionSpace are running, in the campus Data Center.  (See the article, Red Hat reveals CentOS plans, for a description of the relationship between RHEL and CentOS, at least as of March 2014.  Interestingly, this article also states that Karsten Wade, Red Hat's CentOS Engineering Manager, believes that "all the people who use CentOS ... may be more than those who use RHEL and Fedora combined.")

DigitalOcean is a virtual private server (VPS) hosting service, with extensive venture capital backing, that offers some of the lowest cost cloud server hosting available, as of this writing.

Administrative tasks
    Set up a .berkeley.edu hostname
    Set the DigitalOcean hostname to that .berkeley.edu hostname
    Identify the security contact for this host

Set up security configuration
    Add an admin user and grant them sudo access
    Set your host's time zone
    Harden SSH access
    Set up fail2ban to further harden SSH access
    Set up the firewall
    Update OpenSSL
    Update everything
    Set up automated security updates

Set up system administration tools
    Install Webmin
    Set up HTTPS encryption for Webmin
    Set up Webmin users

Install and configure Nagios
    Add swap space
    Install Nagios
    Set up command aliases for Nagios
    Install the Git client
    Retrieve our customized copy of Nagios's configuration directory from version control (GitHub)
    Set up Nagios's private configuration
    Set up users for the Nagios web-based admin console
    Set the time zone for the Nagios web-based admin console
    Set up HTTPS encryption for the Nagios web-based admin console
    Set up Postfix for sending email notifications

See also

Administrative tasks

Set up a .berkeley.edu hostname

To set up a .berkeley.edu hostname for this host, fill in and submit the Off-Site Hosting Request form at https://offsitehosting.berkeley.edu/

This document has not yet been tested by anyone except its original author.  After testing (and any subsequent revisions), it would be a very good idea to set up Configuration Management scripts to perform this installation automatically.

Once IT Policy approves this hostname assignment, they'll forward the request to the UCB Hostmaster/Hostmistress.  It will then take up to 2 days to be entered and reflected in DNS.

For a copy of that request, see the attached PDF, Off-Site Hosting Request - ucb-rit-utils-1 - 2014-06-18.pdf (To view attachments, select Attachments from the 'other' ('...') menu at upper right, on this Confluence wiki page.)

(Not shown in that PDF is the full text entered into the Site Purpose field on that onscreen request form, below.)

Monitoring of host services and health metrics (free memory, disk space) on UC Berkeley Research IT-managed hosts.

All access to UC Berkeley-hosted services, such as CollectionSpace (museum collections management) implementations, PAHMA Delphi (collections browsing system), and Bamboo DiRT (digital humanities tools registry) performed by this monitoring host will either be performed as an unauthenticated public user, or as a user with read-only privileges for accessing demonstration data (not actual museum data). No data will be stored on this offsite host that falls into Data Class Protection Levels 1 or higher.

Set the DigitalOcean hostname to that .berkeley.edu hostname

After the berkeley.edu hostname has been assigned by the UC Berkeley hostmaster/hostmistress, in order for reverse DNS (PTR) lookup to work, we need to set the hostname of our DigitalOcean virtual private server host ("Droplet") to that hostname.

You do this by logging into the DigitalOcean console, and selecting Settings -> Rename.  Overtype the placeholder value New Hostname with your berkeley.edu hostname, and click the Rename button.

After making this change, it took about two hours before this change was reflected in reverse DNS lookups.

Identify the security contact for this host

Glen, as a security contact for our group, associated the host's IP address with the UCB RIT Museum Informatics security mailing list, istds-informatics-[email protected], via https://securitycontact.berkeley.edu/.  By doing this, all of the members of that mailing list will receive security-related notifications pertaining to this host.

Set up security configuration

Starting out having logged in as the root user ...

Add an admin user and grant them sudo access

Add a non-root, administrative user (here named myadmin, as a placeholder for an actual username) on the host myhostname:

[root@myhostname ~]# adduser myadmin
[root@myhostname ~]# passwd myadmin
Changing password for user myadmin.
New password:

(This process involves setting up a password for login.  For public key-based login, see https://cloud.digitalocean.com/ssh_keys)

[root@myhostname ~]# visudo

In that file, add a line to grant sudo privileges to that user:

## Next comes the main part: which users can run what software on
## which machines (the sudoers file can be shared between multiple
## systems).
## Syntax:
##
## user MACHINE=COMMANDS
##
## The COMMANDS section may have other options added to it.
##
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
# Add this line
myadmin ALL=(ALL) ALL

Log out as the root user.

Log back in as the myadmin user (or whatever your admin account is named).

Set your host's time zone

This example sets the host to the Pacific Time Zone in North America, by aliasing the localtime timezone to America/Los_Angeles:

cd /etc
ls -al localtime
sudo mv localtime localtime.bak
sudo ln -s /usr/share/zoneinfo/America/Los_Angeles localtime
ls -al localtime

Verify this change (look for "PDT" or "PST" in the output from this command):

date
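The verification step above can also be scripted. The following is a minimal sketch, assuming only that date +%Z prints the zone abbreviation; the function name is_pacific_abbrev is a hypothetical helper, not part of any package.

```shell
# Hypothetical helper: succeeds if the given zone abbreviation is Pacific time.
is_pacific_abbrev() {
    case "$1" in
        PST|PDT) return 0 ;;
        *)       return 1 ;;
    esac
}

# Compare against the abbreviation reported by the system clock.
if is_pacific_abbrev "$(date +%Z)"; then
    echo "Host clock is on Pacific time"
else
    echo "Host clock is NOT on Pacific time: $(date +%Z)"
fi
```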

Harden SSH access

Change a variety of SSH configuration settings as recommended by security best practices.  (See, for instance, http://www.faqs.org/docs/securing/chap15sec122.html)

Back up the current sshd configuration file:

cd /etc/ssh
sudo cp sshd_config sshd_config.orig

Edit the file:

sudo vi sshd_config

Uncomment a variety of settings that are commented out by default; e.g. among these (by no means a complete list; see the document linked above and similar lists):

KeyRegenerationInterval 1h
ServerKeyBits 1024
LoginGraceTime 2m
PermitRootLogin no
StrictModes yes
HostbasedAuthentication no
IgnoreUserKnownHosts yes
IgnoreRhosts yes
PermitEmptyPasswords no

(Going beyond these modest configuration changes, we should also enable public key access for all users.)

Add an AllowUsers directive, with a space-separated list of only those system users who should be given SSH access; e.g.

AllowUsers myadmin

Activate the revised settings by restarting sshd:

sudo service sshd restart
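Before or after restarting, it can be handy to confirm that the intended settings are really uncommented in the file. The sketch below is one hedged way to do that with grep; the function name sshd_setting_is is hypothetical, and the match is a simple case-sensitive text comparison, not a full parse of sshd's configuration syntax.

```shell
# Hypothetical helper: succeeds if FILE has an uncommented "KEYWORD VALUE" line.
sshd_setting_is() {
    # $1 = config file, $2 = keyword, $3 = expected value
    grep -Eq "^[[:space:]]*${2}[[:space:]]+${3}[[:space:]]*$" "$1"
}

# Report on a few of the settings listed above (skipped if the file is not readable).
config=/etc/ssh/sshd_config
if [ -r "$config" ]; then
    while read -r keyword expected; do
        if sshd_setting_is "$config" "$keyword" "$expected"; then
            echo "ok:      $keyword $expected"
        else
            echo "MISSING: $keyword $expected"
        fi
    done <<EOF
PermitRootLogin no
PermitEmptyPasswords no
StrictModes yes
EOF
fi
```

On most systems sshd_config is readable only by root, so this would typically be run from a root shell. Separately, sshd's own test mode (sudo sshd -t) checks the file for syntax errors.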

Set up fail2ban to further harden SSH access

Per the description in /etc/init.d/fail2ban, "fail2ban is a daemon to ban hosts that cause multiple authentication errors."

Set up fail2ban to further harden SSH access, by generally following the (perhaps now somewhat outdated?) instructions at https://www.digitalocean.com/community/tutorials/how-to-protect-ssh-with-fail2ban-on-ubuntu-12-04:

sudo yum install fail2ban
cd /etc/fail2ban/
sudo cp jail.conf jail.local
sudo vi jail.local

Any settings in the jail.local file override their counterpart settings in the default configuration file, jail.conf.  The contents of jail.local were obtained by removing most of the contents of jail.conf, leaving just the relevant parts that govern global default settings and settings to set up bans on SSH intrusion attempts, with the following modifications from the default:

Increasing the ban time to 3600 seconds (60 minutes) from 600 seconds (10 minutes).
Changing the backend method used to detect changes to logfiles, etc. to polling.  (This method has been claimed by some users to be more reliable than other available methods.)
Decreasing the number of allowable failures for SSH login attempts to 2 (from 3), before banning the relevant IP address from further connections.

Additional hardening steps (some still need to be performed as of 2014-06-17):

Replacing password-based authentication with public key authentication; and/or:
Adding tools to restrict brute-force attempts to remotely identify user credentials, such as any one of the following; and/or
    fail2ban (this tool is now being used - see below)
    SSHGuard
    DenyHosts

Restricting incoming SSH connections to allow connections only from addresses on selected UC Berkeley campus networks (see below).

And optionally:

Setting up firewall rules to restrict brute-force exploit attempts

# Fail2Ban jail base specification file

# 'local' jail
# See https://www.digitalocean.com/community/tutorials/how-to-protect-ssh-with-fail2ban-on-ubuntu-12-04

[DEFAULT]

# "ignoreip" can be an IP address, a CIDR mask or a DNS host. Fail2ban will not
# ban a host which matches an address in this list. Several addresses can be
# defined using space separator.
ignoreip = 127.0.0.1/8
# If we ever wish to prevent Fail2Ban from monitoring connection
# attempts coming from major UC Berkeley networks, add one or more
# of the address ranges below to the 'ignoreip' directive above,
# with addresses separated by space characters:
# 128.32.0.0/16 136.152.0.0/16 169.229.0.0/16 192.101.42.0/24

# "bantime" is the number of seconds that a host is banned.
# 3600 seconds = 60 minutes
bantime = 3600

# A host is banned if it has generated "maxretry" during the last "findtime"
# seconds.
findtime = 600

# "maxretry" is the number of failures before a host get banned.
maxretry = 3

# "backend" specifies the backend used to get files modification.
# Available options are "pyinotify", "gamin", "polling" and "auto".
# This option can be overridden in each jail as well.
#
# pyinotify: requires pyinotify (a file alteration monitor) to be installed.
#            If pyinotify is not installed, Fail2ban will use auto.
# gamin:     requires Gamin (a file alteration monitor) to be installed.
#            If Gamin is not installed, Fail2ban will use auto.
# polling:   uses a polling algorithm which does not require external libraries.
# auto:      will try to use the following backends, in order:
#            pyinotify, gamin, polling.
backend = polling

# "usedns" specifies if jails should trust hostnames in logs,
# warn when DNS lookups are performed, or ignore all hostnames in logs
#
# yes:  if a hostname is encountered, a DNS lookup will be performed.
# warn: if a hostname is encountered, a DNS lookup will be performed,
#       but it will be logged as a warning.
# no:   if a hostname is encountered, will not be used for banning,
#       but it will be logged as info.
usedns = yes

# This jail corresponds to the standard configuration in Fail2ban.
# The mail-whois action send a notification e-mail with a whois request
# in the body.

[ssh-iptables]

enabled = true
filter = sshd
action = iptables[name=SSH, port=ssh, protocol=tcp]
         sendmail-whois[name=SSH, dest=root, [email protected], sendername="Fail2Ban"]
logpath = /var/log/secure
maxretry = 2
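In the configuration above, the [ssh-iptables] jail sets its own maxretry (2), while values such as bantime come from [DEFAULT]. The following is a rough sketch of how that per-jail override could be inspected from the shell. The helpers ini_get and jail_setting are hypothetical names; they read only the one file passed in, and do not reproduce fail2ban's own merging of jail.conf with jail.local.

```shell
# Hypothetical helper: print the value of KEY within [SECTION] of an INI-style file.
ini_get() {
    # $1 = file, $2 = section name, $3 = key
    awk -v section="$2" -v key="$3" '
        /^[ \t]*\[/ {                          # a "[name]" section header
            name = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", name)
            in_section = (name == section)
            next
        }
        in_section && /^[ \t]*[^#; \t]/ {      # a non-comment line in our section
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) value = v
        }
        END { if (value != "") print value }
    ' "$1"
}

# Effective value for a jail: the jail's own setting, else the [DEFAULT] one.
jail_setting() {
    # $1 = file, $2 = jail name, $3 = key
    value=$(ini_get "$1" "$2" "$3")
    [ -n "$value" ] || value=$(ini_get "$1" DEFAULT "$3")
    printf '%s\n' "$value"
}

jail_local=/etc/fail2ban/jail.local
if [ -r "$jail_local" ]; then
    echo "maxretry for ssh-iptables: $(jail_setting "$jail_local" ssh-iptables maxretry)"
    echo "bantime for ssh-iptables:  $(jail_setting "$jail_local" ssh-iptables bantime)"
fi
```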

Start fail2ban:

sudo service fail2ban start

Verify that fail2ban-specific rules appear in the listing of iptables rules:

sudo iptables -L

Set up the firewall

(See http://wiki.centos.org/HowTos/Network/IPTables#head-2760341949881b50946167a95df2945b2f94434a)

Identify that the firewall (iptables / netfilter) is installed (via rpm -q), and that it is actively running as a loadable kernel module (via lsmod).  (Each command below should output at least one line containing the string iptables or ip_tables, respectively.)

rpm -q iptables
lsmod | grep ip_tables

List the firewall's current configuration:

sudo iptables -L

(Per https://www.digitalocean.com/community/articles/how-to-setup-a-basic-ip-tables-configuration-on-centos-6, "DigitalOcean VPSs usually come with the empty configuration: all traffic is allowed.")

Set up firewall rules, one rule per command, starting with these:

sudo iptables -F
sudo iptables -A INPUT -p tcp --tcp-flags ALL NONE -j DROP
sudo iptables -A INPUT -p tcp ! --syn -m state --state NEW -j DROP
sudo iptables -A INPUT -p tcp --tcp-flags ALL ALL -j DROP

Enable inbound connections from localhost (lo):

sudo iptables -A INPUT -i lo -j ACCEPT

Enable access to SSH from localhost (loopback address):

sudo iptables -I INPUT -p tcp --dport 22 -s 127.0.0.1/8 -j ACCEPT

Enable inbound access to SSH - only from major UC Berkeley network address ranges (obtained from http://net.berkeley.edu/access/ucb-nets.shtml) - at its well-known port:

sudo iptables -I INPUT -p tcp --dport 22 -s 128.32.0.0/16 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 22 -s 136.152.0.0/16 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 22 -s 169.229.0.0/16 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 22 -s 192.101.42.0/24 -j ACCEPT

Enable inbound access to Webmin - only from major UC Berkeley network address ranges - at its well-known port:

sudo iptables -I INPUT -p tcp --dport 10000 -s 128.32.0.0/16 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 10000 -s 136.152.0.0/16 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 10000 -s 169.229.0.0/16 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 10000 -s 192.101.42.0/24 -j ACCEPT
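Since the SSH and Webmin rules above differ only in port and source range, they could be generated rather than typed out. The following is a dry-run sketch: it only prints the commands for review and does not touch the firewall. The function name print_allow_rules and the UCB_NETS variable are hypothetical; the address ranges are the ones used in the rules above.

```shell
# UC Berkeley address ranges, as used in the rules above.
UCB_NETS="128.32.0.0/16 136.152.0.0/16 169.229.0.0/16 192.101.42.0/24"

# Hypothetical helper: print (not run) one ACCEPT rule per source range for a TCP port.
print_allow_rules() {
    # $1 = destination port; remaining arguments = source CIDR ranges
    port=$1
    shift
    for net in "$@"; do
        echo "iptables -I INPUT -p tcp --dport $port -s $net -j ACCEPT"
    done
}

# Dry run: show the SSH (22) and Webmin (10000) rules for review.
# $UCB_NETS is intentionally unquoted so that it splits into one argument per range.
print_allow_rules 22 $UCB_NETS
print_allow_rules 10000 $UCB_NETS
```

Once the printed rules look right, one option is to pipe them to a root shell, e.g. print_allow_rules 22 $UCB_NETS | sudo sh.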

Enable inbound access to HTTP at its well-known port:

sudo iptables -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT

Set the most general ("catch all") rules in policies at the end, to:

1. Allow established and related incoming connections.
2. Allow outgoing connections.
3. Block all other incoming connections, other than those explicitly allowed in earlier rules.

sudo iptables -I INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -P INPUT DROP

List current firewall rules:

sudo iptables -L -n

If the current set of rules looks acceptable, save them:

sudo iptables-save | sudo tee /etc/sysconfig/iptables

Restart the firewall to reflect the new rules:

sudo service iptables restart

Update OpenSSL

Update OpenSSL to help ensure we have a version that's been patched against the Heartbleed bug and subsequently-identified vulnerabilities.  (Invoking yum install will also update an existing package, if any update is available.)

sudo yum install openssl

Update everything

Use yum to update all repositories and upgrade all installed packages, to help pick up security patches to other packages, as well:

sudo yum update

Set up automated security updates

Use yum-cron to set up automatic updates, including security updates. (These updates will be checked for and installed daily.) Note: automatic updating was set up on ucb-rit-utils-1.berkeley.edu on 2015-01-29.

Install this package:

sudo yum -y install yum-cron

Run it to initiate automatic updates:

sudo service yum-cron start

(Note, regarding the firewall rules above: when we add HTTPS access, we'll need to open up port 443 as well.  HTTP port 80 should remain open, as long as http://hackthehearst.berkeley.edu is hosted on this same VPS.)

(Note, regarding the yum commands above: if you should encounter any issues when running them, see the discussion of changing repository priorities when installing Git, below.)

Ensure that it runs even on reboots:

sudo chkconfig yum-cron on

Check to make sure that yum-cron is enabled for (at least) runlevels 3 through 5:

chkconfig | grep yum-cron
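Reading the runlevel columns by eye is easy to get wrong, so the check can be automated. The sketch below assumes the usual chkconfig --list line format, e.g. "yum-cron 0:off 1:off 2:on 3:on 4:on 5:on 6:off". The function name enabled_in_3_to_5 is hypothetical.

```shell
# Hypothetical helper: succeeds if a "chkconfig --list" line shows runlevels 3, 4 and 5 on.
enabled_in_3_to_5() {
    # $1 = one line of chkconfig output for a single service
    for level in 3 4 5; do
        case "$1" in
            *"$level:on"*) ;;      # this runlevel is enabled; keep checking
            *) return 1 ;;         # missing or "off": report failure
        esac
    done
    return 0
}

# Only attempt the live check where chkconfig is available.
if command -v chkconfig >/dev/null 2>&1; then
    if enabled_in_3_to_5 "$(chkconfig --list yum-cron 2>/dev/null)"; then
        echo "yum-cron is enabled for runlevels 3 through 5"
    else
        echo "yum-cron is NOT enabled for all of runlevels 3 through 5"
    fi
fi
```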

 

Set up system administration tools

Install Webmin

"Webmin is a web-based interface for system administration for Unix."  We've routinely set this handy administration tool up on the UCB and (soon to be) LYRASIS CollectionSpace hosts, and we'll do so on the monitoring host as well, here.

First, install Webmin by following the instructions under "Using the Webmin YUM repository" at http://www.webmin.com/rpm.html.

Create/edit the file /etc/yum.repos.d/webmin.repo, to configure the repository needed to install and update the Webmin package via yum:

sudo vi /etc/yum.repos.d/webmin.repo

Paste the following content into that file and save the file:

[Webmin]
name=Webmin Distribution Neutral
#baseurl=http://download.webmin.com/download/yum
mirrorlist=http://download.webmin.com/download/yum/mirrorlist
enabled=1

Download and install the GNU Privacy Guard (GPG) key for Webmin's author, Jamie Cameron, with which the Webmin packages are signed:

wget http://www.webmin.com/jcameron-key.asc
sudo rpm --import jcameron-key.asc

Install webmin:

sudo yum install webmin

Set up HTTPS encryption for Webmin

Webmin comes with a minimal web server, MiniServ.  That server needs to be set up to use HTTPS (SSL/TLS) encryption, in order to encrypt user credentials during Webmin logins.  (See http://www.webmin.com/ssl.html)

First, verify or install needed dependencies for HTTPS encryption:

Verify that OpenSSL is installed.  (You installed or updated the openssl package in a Security step, above, so it should definitely be installed at this point.)

openssl version -a

Install the Net::SSLeay Perl module:

sudo yum install perl-Net-SSLeay

Verify that Perl's SSL support (via Net::SSLeay) is working. The following command, when run, should not output any messages:

perl -e 'use Net::SSLeay'

Second, use Webmin's own web-based console to enable - and force - encrypted HTTPS access:

In your web browser, log into Webmin (by default, via http://yourhostname:10000), as the user root, with the Linux system root user's password.  (You'll only need to enter these credentials once in cleartext.)

Once logged into the Webmin console:

Select Webmin -> Webmin Configuration from the left sidebar.

On the Webmin Configuration screen, click the SSL Encryption icon.  (You might need to scroll down in your browser window to see it.)

On the SSL Encryption screen, next to Enable SSL if available?, click the Yes button.

Click Save.

This will cause a self-signed server certificate to be created and for Webmin to now require HTTPS access.

Your current session will be interrupted, and you'll be presented with an error message in your browser window, noting that the connection is untrusted (e.g. in Firefox) or a similar message in other browsers.

You'll then need to log in again, this time using HTTPS; e.g. https://yourhostname:10000

Set up Webmin users

Log into Webmin (by default, via https://yourhostname:10000), as the user root, with the Unix root user's password.

Select Webmin -> Webmin Users from the left sidebar.

For each new Webmin user you want to create:

Click Create a new Webmin user.

In the Webmin User Access Rights section, enter text or select values from menus, as appropriate, in each of the following fields: Username, Password, Real Name, and the option to Force (password) change at next login.

If this user is intended to have the same username and password as a Linux system user, so that user can conveniently log into Webmin with the same credentials, enter that system user's username in the Username field, and select Unix authentication from the Password dropdown menu.

In the Available Webmin modules section, select the modules to which this Webmin user should be granted access.

Click Create to save your changes.

 

Install and configure Nagios

Add swap space

Nagios requires some reasonable amount of swap space (dedicated disk space that can be used to swap out in-memory data); typically a minimum of at least 1-2 GB.

(See the instructions on setting up swap space in https://www.digitalocean.com/community/articles/how-to-install-nagios-on-centos-6.)

Check to see if swap is already present:

[myadmin@myhostname ~]$ swapon -s
Filename Type Size Used Priority

Check on available disk space:

[myadmin@myhostname ~]$ df -H
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda         22G  972M   20G   5% /
tmpfs           262M     0  262M   0% /dev/shm

(Note, regarding Webmin users above: be sure to create at least one Webmin user account, and preferably more than one, that has the ability to set up users and their access rights. By doing so, you should no longer ever have to log into Webmin as the root user.)

Check on available memory, to verify that swap space is not already allocated:

[myadmin@myhostname ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:           498         55        443          0          5         30
-/+ buffers/cache:         20        478
Swap:            0          0          0

Create a 1 GB swap space, associated with the file /swapfile1.  (As an alternative, modify the first command below to set count=2097152 for 2 GB of space.)

[myadmin@myhostname ~]$ sudo dd if=/dev/zero of=/swapfile1 bs=1024 count=1048576
[myadmin@myhostname ~]$ sudo mkswap -c /swapfile1
[myadmin@myhostname ~]$ sudo swapon /swapfile1
[myadmin@myhostname ~]$ swapon -s
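The count values above follow from the block size: with bs=1024, one gigabyte is 1024 * 1024 blocks. A small sketch of that arithmetic, using a hypothetical helper named swap_block_count:

```shell
# Hypothetical helper: number of 1024-byte blocks for a swap file of N gigabytes.
swap_block_count() {
    # $1 = size in GB (1 GB = 1024 * 1024 blocks of 1024 bytes each)
    echo $(( $1 * 1024 * 1024 ))
}

swap_block_count 1    # prints 1048576, the count used above
swap_block_count 2    # prints 2097152, the 2 GB alternative
```

The result could then be substituted into the dd command, e.g. count="$(swap_block_count 2)".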

Make that swap space configuration come up after each reboot:

sudo vi /etc/fstab

And in that file, add the bottom line shown below, the one beginning with /swapfile1:

#
# /etc/fstab
# Created by anaconda on Tue Dec 17 14:38:10 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
LABEL=DOROOT / ext4 defaults 1 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/swapfile1 swap swap defaults 0 0

Set highly restrictive access permissions on the swapfile:

sudo chown root:root /swapfile1
sudo chmod 0600 /swapfile1

Verify that the swap space is now shown, when displaying memory:

$ free -m
             total       used       free     shared    buffers     cached
Mem:           498        415         83          0          2        385
-/+ buffers/cache:         28        470
Swap:         1023          0       1023

Set swappiness (bring it down to a fairly low value, if it isn't already), to reduce the likelihood that memory will be paged to disk.

Check the current value.  (If it's higher than 10, we'll next set it to 10.)

cat /proc/sys/vm/swappiness

Set the current, in-memory value:

sudo sysctl vm.swappiness=10

Make this change persistent across system restarts:

sudo vi /etc/sysctl.conf

Add the following line to the /etc/sysctl.conf file:

vm.swappiness = 10

Reload values from that file:

sudo sysctl -p
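If this step is ever scripted, appending the line blindly would create duplicates on repeated runs. The following sketch replaces any existing assignment of the key and writes exactly one new line, so it can be run more than once safely. The function name set_sysctl_line is hypothetical, and the demonstration works on a scratch file rather than on /etc/sysctl.conf itself.

```shell
# Hypothetical helper: set "KEY = VALUE" in a sysctl-style file, replacing any existing entry.
set_sysctl_line() {
    # $1 = file, $2 = key, $3 = value
    file=$1
    key=$2
    val=$3
    # Escape dots so the key is matched literally in the regular expression.
    key_re=$(printf '%s' "$key" | sed 's/[.]/\\./g')
    tmp=$(mktemp) || return 1
    # Copy every line except existing (uncommented) assignments of this key ...
    grep -Ev "^[[:space:]]*${key_re}[[:space:]]*=" "$file" > "$tmp" || true
    # ... then append the single desired assignment.
    printf '%s = %s\n' "$key" "$val" >> "$tmp"
    # Write back in place, preserving the original file's ownership and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Demonstration on a scratch copy.
scratch=$(mktemp)
printf '%s\n' '# Kernel settings' 'vm.swappiness = 60' > "$scratch"
set_sysctl_line "$scratch" vm.swappiness 10
cat "$scratch"
rm -f "$scratch"
```

To apply it to the real file, the function would need to run with root privileges, followed by sudo sysctl -p as shown above.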

Install Nagios

(See https://www.digitalocean.com/community/articles/how-to-install-nagios-on-centos-6)

(On our current CentOS system, following the installation steps below installs Nagios® Core™, Version 3.5.1, at least as of 2014-04-18.  The current version of Nagios is 4.x; you'll occasionally see reminder messages, in Nagios' Web-based admin console and possibly elsewhere, letting you know you have a non-current version. For a list of changes introduced in version 4.x, see What's New in Nagios Core 4.x)

Install the Extra Packages for Enterprise Linux (EPEL) repository:

sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

Install Nagios, its plug-in framework, a set of standard plugins, and the NRPE (agent) software.  Also install Apache 2 (httpd) and PHP, required for Nagios' web-based interface.

sudo yum -y install nagios nagios-plugins-all nagios-plugins-nrpe nrpe php httpd

Validate Nagios's configuration (and resolve any issues found):

sudo nagios -v /etc/nagios/nagios.cfg

Add Apache 2 (httpd) and Nagios to the set of applications that are automatically launched at startup, and can be started, stopped, and otherwise managed through initialization scripts:

sudo chkconfig httpd on
sudo chkconfig nagios on

Verify that both Apache 2 and Nagios are configured to automatically launch at startup.  (In the output from the following command, make sure that the entries for both httpd and nagios contain 3:on.  This indicates that both services will be enabled during runlevel 3: a standard, multiuser startup.)

sudo chkconfig --list

Start Apache 2 and Nagios:

sudo service httpd start
sudo service nagios start

Set up command aliases for Nagios

In a system-level file that can hold command aliases (e.g. in nagios.sh or any other file whose name ends in .sh, placed in /etc/profile.d), set up aliases for commonly-used commands related to Nagios.  For instance:

alias nagios-reload='sudo service nagios reload'
alias nagios-restart='sudo service nagios restart'
alias nagios-verify='sudo /usr/sbin/nagios -v /etc/nagios/nagios.cfg'
alias nagios-start='sudo service nagios start'
alias nagios-status='sudo service nagios status'
alias nagios-stop='sudo service nagios stop'

You can then source this file to make those aliases immediately available, allowing you to test them; e.g. if you created a nagios.sh file in /etc/profile.d:

source /etc/profile.d/nagios.sh
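Most of the aliases above follow one pattern (nagios-ACTION runs sudo service nagios ACTION), so that part of the file could be generated. This is only an illustrative sketch; the function name nagios_aliases is hypothetical, and the nagios-verify alias does not fit the pattern and would still be added by hand.

```shell
# Hypothetical helper: print one alias definition per "service nagios" action.
nagios_aliases() {
    for action in reload restart start status stop; do
        printf "alias nagios-%s='sudo service nagios %s'\n" "$action" "$action"
    done
}

# Print the definitions for review.
nagios_aliases
```

The output could then be redirected into the profile script, e.g. nagios_aliases | sudo tee /etc/profile.d/nagios.sh.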

Install the Git client

In the next step, below, we'll be pulling our customized UCB RIT configuration for Nagios from a Git repository.  To do this, we'll need to first install the Git client.

Install the Git client via:

sudo yum install git

Change repository priorities so that this EL5 repo is checked last for packages.  (See http://wiki.centos.org/PackageManagement/Yum/Priorities)

Install the yum-plugin-priorities package:

sudo yum install yum-plugin-priorities

Verify that the yum priorities system is enabled.  (Running the following command should return output containing enabled = 1.)

cat /etc/yum/pluginconf.d/priorities.conf

Here's what to do if you encounter this error:

$ sudo yum install git
...
--> Finished Dependency Resolution
Error: Package: git-1.7.12.4-1.el5.rf.i386 (rpmforge)
           Requires: libcurl.so.3

This looks like a frequently-encountered problem, with many solutions offered:

https://www.centos.org/forums/viewtopic.php?t=8226
http://unix.stackexchange.com/questions/91668/how-to-install-git-for-centos
http://serverfault.com/questions/391765/yum-trying-to-install-el5-when-i-am-on-el6
http://unix.stackexchange.com/questions/20044/package-git-1-7-6-1-1-el5-rf-i386-rpmforge-requires-libcurl-so-3 (flagged as containing some lower quality answers, so beware)
http://wiki.centos.org/AdditionalResources/Repositories/RPMForge

In our case, it looks like two of the repository configurations, both for RPMForge, are configured to install EL5 (RedHat Enterprise Linux 5, and hence CentOS 5.x) packages, rather than EL6 (hence CentOS 6.x) packages:

 

$ grep -l el5 /etc/yum.repos.d/*
/etc/yum.repos.d/mirrors-rpmforge
/etc/yum.repos.d/rpmforge.repo

Git is being offered from two repositories, and the repo that has priority is the one offering EL5 packages; its git package, in turn, relies on that older shared library:

$ yum list git
...
Available Packages
git.i686 1.7.1-3.el6_4.1 base
git.i386 1.7.12.4-1.el5.rf rpmforge

In one or more of the articles above or equivalent ones, I found cogent recommendations to remove the installed EL5 packages; update the repo configurations, to remove any referencing EL5 repos and replace them with their EL6 equivalents; run yum clean all; and then install each of those packages once again.

However, for now, I followed the advice in this answer, http://unix.stackexchange.com/a/55946, which offers an expedient method of getting Git installed: explicitly specifying the EL6 flavor of Git, by adding its version number to the package name in the yum install command.  This resulted in a successful installation:

sudo yum install git-1.7.1-3.el6_4.1
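The version-pinned package name used above can be derived from yum list output instead of being copied by hand. The sketch below assumes the three-column "name.arch version repo" layout seen in the listing above; el6_pkg_spec is a hypothetical helper name, and the sample lines are taken from that listing.

```shell
# Hypothetical helper: from "yum list" lines on stdin, print NAME-VERSION for each EL6 build.
el6_pkg_spec() {
    awk '$2 ~ /\.el6/ {
        name = $1
        sub(/\.[^.]*$/, "", name)     # drop the ".i686"-style architecture suffix
        print name "-" $2
    }'
}

# Sample lines from the "yum list git" output shown above.
printf '%s\n' \
    'git.i686 1.7.1-3.el6_4.1 base' \
    'git.i386 1.7.12.4-1.el5.rf rpmforge' | el6_pkg_spec
```

For the sample input this prints git-1.7.1-3.el6_4.1, matching the package name passed to yum install above.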

Note that this is a one-time fix that doesn't resolve the underlying problem; for instance, attempting to update every package via sudo yum update still yields this error:

Error: Package: git-1.7.12.4-1.el5.rf.i386 (rpmforge) Requires: libcurl.so.3

Edit individual repository configuration files to set their priorities, by adding priority=nn entries. (Priority 1 is highest, and 99 is lowest.)

Set priority=1 in the CentOS Base repo configuration file, CentOS-Base.repo; e.g.

[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
priority=1

#released updates
[updates]
name=CentOS-$releasever - Updates
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
#baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
priority=1

And set priority=99 in the problematic EL5-configured RPMforge repo configuration file, rpmforge.repo:

# Name: RPMforge RPM Repository for Red Hat Enterprise 5 - dag
# URL: http://rpmforge.net/
[rpmforge]
name = Red Hat Enterprise $releasever - RPMforge.net - dag
#baseurl = http://apt.sw.be/redhat/el5/en/$basearch/dag
mirrorlist = http://apt.sw.be/redhat/el5/en/mirrors-rpmforge
#mirrorlist = file:///etc/yum.repos.d/mirrors-rpmforge
enabled = 1
protect = 0
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag
gpgcheck = 1
priority=99

After doing so, running sudo yum update now completes successfully, without displaying an error message related to updating the git package.
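The priority edits described above could also be scripted. The following is a minimal sketch, not the method used on our host: set_repo_priority is a hypothetical helper that prints a copy of a .repo file with a single priority=N line placed directly under the named section header, dropping any priority line already present in that section. It assumes section headers sit alone on their own lines, and it writes to standard output, leaving the original file untouched.

```shell
# set_repo_priority FILE SECTION PRIORITY
# Hypothetical helper: prints FILE to stdout with "priority=PRIORITY" placed
# right after the "[SECTION]" header. Any existing priority line inside that
# section is dropped, so running it repeatedly gives the same result.
set_repo_priority() {
    file="$1" section="$2" prio="$3"
    awk -v sect="[$section]" -v prio="$prio" '
        # A new section header: note whether it is the one we want.
        /^\[/ {
            in_sect = ($0 == sect)
            print
            if (in_sect) print "priority=" prio
            next
        }
        # Inside the target section, drop any older priority setting.
        in_sect && /^[ \t]*priority[ \t]*=/ { next }
        { print }
    ' "$file"
}
```

One way to use it would be to run set_repo_priority /etc/yum.repos.d/rpmforge.repo rpmforge 99 > /tmp/rpmforge.repo, review the output, and only then copy it into place with sudo.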

 

Using the priorities mechanism, as described above, also means that system updates are disabled. (Update 2016-02-19: it appears that automatic system updates are nonetheless successfully being installed via yum-cron. - Aron) To install system updates, you'll need to do this five-step dance:

1. Disable the priorities plugin by editing /etc/yum/pluginconf.d/priorities.conf and setting enabled = 0
2. Temporarily uninstall Git via sudo yum remove git
3. Install system updates via sudo yum update
4. Re-enable the priorities plugin (required to re-install Git) by editing /etc/yum/pluginconf.d/priorities.conf and setting enabled = 1
5. Re-install Git via sudo yum install git

In order to avoid this in the future, one possible thing we might try is to completely remove the EL5 repos from the system, as described above. (I'm also hoping this will no longer be an issue, after a (future) upgrade to CentOS 7 ...)

Retrieve our customized copy of Nagios's configuration directory from version control (GitHub)

We're now going to pull our UCB RIT-customized copy of Nagios's configuration directory - the configuration we will maintain over time for our unit's monitoring host(s) - from version control in one of our GitHub repos.

First, rename the current Nagios configuration directory, thus making a backup copy of the default Nagios configuration:

cd /etc
sudo mv nagios nagios.bak

Check out our UCB-RIT customized copy of Nagios's configuration directory from version control:

sudo git clone https://github.com/ucb-rit/service-monitoring-config.git

This checkout results in the creation of a directory named service-monitoring-config. Alias that directory's nagios subdirectory to a top-level nagios directory (in /etc):

sudo ln -s service-monitoring-config/nagios nagios

Copy the passwd file from the previous (now backup) nagios configuration directory into the new one, and set its ownership to include the apache group. (This passwd file isn't included in the configuration checked into GitHub.)

sudo cp nagios.bak/passwd nagios/passwd
sudo chown root:apache nagios/passwd

Copy the private directory from the previous (now backup) nagios configuration directory into the new one, and set its ownership to include the nagios group. (This private directory isn't included in the configuration checked into GitHub; its contents are maintained in Box, as described in the next section below.)

sudo cp -R nagios.bak/private nagios/private
sudo chown root:nagios nagios/private

Set up Nagios's private configuration

Nagios allows you to store credentials and other private data in a $NAGIOS_HOME/private/resource.cfg file (where $NAGIOS_HOME is typically /etc/nagios, as it is on our host).

This file contains up to 32 key/value pairs, where the keys can be used as macros in Nagios configuration files, referencing their private values in that file. The keys all have names in the range of USER1 through USER32.

(The first two keys, USER1 and USER2, are typically not available for user purposes, but are instead set to paths to plugins and event handlers, respectively. Those paths aren't necessarily private, but may be host-specific; hence, being able to reference those paths via macros allows configuration files to be made uniform across hosts.)

This file is maintained in Box at https://berkeley.app.box.com/files/0/f/2371697691/For_etc-nagios-private_on_ucb-rit-utils-1. That Box folder is owned by Aron and has been shared (with Editor access rights) with a large number of RIT colleagues.

Below are the contents of a customized resource.cfg file, where the value of USER3 is set to the name of the CollectionSpace reader user in the core tenant, and the value of USER4 is set to the password for that user. Only the last two lines, which set those two values, have changed from the default.

You can edit the $NAGIOS_HOME/private/resource.cfg file and change just the relevant lines, as shown below:

###########################################################################
#
# RESOURCE.CFG - Sample Resource File for Nagios 3.5.1
#
# Last Modified: 09-10-2003
#
# You can define $USERx$ macros in this file, which can in turn be used
# in command definitions in your host config file(s). $USERx$ macros are
# useful for storing sensitive information such as usernames, passwords,
# etc. They are also handy for specifying the path to plugins and
# event handlers - if you decide to move the plugins or event handlers to
# a different directory in the future, you can just update one or two
# $USERx$ macros, instead of modifying a lot of command definitions.
#
# The CGIs will not attempt to read the contents of resource files, so
# you can set restrictive permissions (600 or 660) on them.
#
# Nagios supports up to 32 $USERx$ macros ($USER1$ through $USER32$)
#
# Resource files may also be used to store configuration directives for
# external data sources like MySQL...
#
###########################################################################

# Sets $USER1$ to be the path to the plugins
$USER1$=/usr/lib/nagios/plugins

# Sets $USER2$ to be the path to event handlers
#$USER2$=/usr/lib/nagios/plugins/eventhandlers

# Store some usernames and passwords (hidden from the CGIs)
[email protected]
$USER4$=reader
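When checking which macros a given resource.cfg actually defines, it can help to list the macro names without echoing the private values to the terminal. The sketch below is one way to do that; list_user_macros is a hypothetical helper name, and it assumes each definition sits on its own line in the $USERn$=value form shown above.

```shell
# list_user_macros FILE
# Hypothetical helper: prints the names (never the values) of the $USERn$
# macros that are set in a Nagios resource file. Commented-out definitions
# (lines starting with "#") are ignored.
list_user_macros() {
    awk -F= '
        /^[ \t]*\$USER[0-9]+\$[ \t]*=/ {
            gsub(/[ \t]/, "", $1)   # strip whitespace around the macro name
            print $1
        }
    ' "$1"
}
```

On our host this would be run against /etc/nagios/private/resource.cfg, most likely via sudo, since the file's permissions are meant to be restrictive.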

Verify Nagios' configuration, using the command alias set up in an earlier step, above:

nagios-verify

Restart Nagios, using the command alias set up in an earlier step, above, to reflect the new configuration:

nagios-restart

Here's what to do if you encounter a "Stopping nagios: No lock file found in ..." error when running the command aliases nagios-restart or nagios-stop. (See, for instance, http://serverfault.com/questions/146830/nagios-woudnt-start-now-wont-stop)

1. Verify that both /etc/init.d/nagios and /etc/nagios/nagios.cfg contain the same path to Nagios' PID file; e.g. /var/run/nagios.pid
2. Assuming they do ... if that file doesn't already exist, create it.
3. If the nagios group doesn't have write access to that file, change ownership and/or file mode to grant that access.

E.g.

cd /var/run
ls -l nagios.pid

If that file isn't already present:

sudo touch nagios.pid

And if that file isn't currently writeable by the nagios group:

sudo chown root:nagios nagios.pid
sudo chmod g+w nagios.pid

Set up users for the Nagios web-based admin console

(See https://www.digitalocean.com/community/articles/how-to-install-nagios-on-centos-6)

One-time initial setup

The command below uses htpasswd -c. Do not use -c in any other, subsequent commands; that truncates and rewrites Nagios's passwd file.

Set up the password for at least one administrative user (in this example, named nagiosadmin):

sudo htpasswd -c /etc/nagios/passwd nagiosadmin

Enter a password for that user when prompted.

Set up additional usernames and passwords, as needed. (Any usernames created must match at least one contact_name value in at least one contacts.cfg file, in the conf.d directory; and by default, that user will have access to services relevant to that contact.)

sudo htpasswd /etc/nagios/passwd somenagioscontactname

Enter a password for each subsequent user when prompted.

Then connect via a web browser and verify that you can log in successfully as the relevant user(s):

http://myhostname/nagios/

Set the time zone for the Nagios web-based admin console

To set the time zone that Nagios uses in its web-based administration console, it's necessary to add a SetEnv TZ "timezone name here" directive to the CGI directory configuration block in Nagios' default Apache configuration file, /etc/httpd/conf.d/nagios.conf. (See the last directive inside the <Directory> block, below.)

# SAMPLE CONFIG SNIPPETS FOR APACHE WEB SERVER
# Last Modified: 11-26-2005
#
# This file contains examples of entries that need
# to be incorporated into your Apache web server
# configuration file. Customize the paths, etc. as
# needed to fit your system.

ScriptAlias /nagios/cgi-bin/ "/usr/lib/nagios/cgi-bin/"

<Directory "/usr/lib/nagios/cgi-bin/">
#  SSLRequireSSL
   Options ExecCGI
...
   # Added to this <Directory> block per instructions in nagios.cnf
   SetEnv TZ "America/Los_Angeles"
</Directory>
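Before putting a time zone name into the SetEnv TZ directive, it may be worth confirming that the name exists in the host's time zone database, since a misspelled name is easy to miss. The sketch below assumes the database lives under /usr/share/zoneinfo, as is common on Linux; tz_exists and the ZONEINFO_DIR override are hypothetical names introduced only for this example.

```shell
# tz_exists NAME
# Hypothetical helper: succeeds if NAME (e.g. America/Los_Angeles) is a file
# in the time zone database directory. The directory defaults to
# /usr/share/zoneinfo and can be overridden via ZONEINFO_DIR (an assumption
# made for this sketch, not a standard variable).
tz_exists() {
    [ -n "$1" ] && [ -f "${ZONEINFO_DIR:-/usr/share/zoneinfo}/$1" ]
}
```

For instance, tz_exists America/Los_Angeles && echo "ok to use" would print its message only when the zone file is present.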

Set up HTTPS encryption for the Nagios web-based admin console

The following has not been done, as of 2014-06-19.

We'll next want to set up a server certificate for Apache 2, even if self-signed, to enable HTTPS access to the Nagios web-based admin console. The following is a guide that can be adapted for use here, discussing the use of a self-signed server certificate:

http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.3/bk_using_Ambari_book/content/Nagios_instructions.html

Ideally, rather than a self-signed cert, we should get a CalNet InCommon-Comodo certificate for this host:

How to get a cert: CalNet InCommon-Comodo Certificate Service
How to generate the private key and public Certificate Signing Request (CSR) for Apache 2: https://support.comodo.com/index.php?/Default/Knowledgebase/Article/View/1/19/csr-generation-using-openssl-apache-wmod_ssl-nginx-os-x

Finally, as noted above, in the Firewall configuration section, we'll need to open port 443, to permit HTTPS inbound access.

Set up Postfix for sending email notifications

By default, we've been using Sendmail for sending email notifications of outages or other issues with the services we're monitoring.

However, over the many months that we've been running the Nagios network monitoring service, we've encountered at least two instances where email notifications silently failed, leaving us without notification for weeks or even months. In both instances, it appeared that Sendmail's configuration might have been changed out from underneath us, possibly due to having enabled automatic system updates for CentOS via yum-cron, or to some other, unidentified cause.

In the first instance, the runlevels associated with the Sendmail service (which would cause it to be automatically launched on a startup or restart) were changed so that this automatic launching didn't occur, at around the time of a system update that also required an automatic restart. After that restart, Sendmail was no longer running. In the second instance, an apparent change to Sendmail's own configuration caused its ability to send mail via the host's Internet interface to be revoked; its network access was restricted to loopback addresses, so mail couldn't be sent, for instance, to .berkeley.edu addresses.

As a result, in August 2016, we switched to using Postfix as our mail transport agent, in hopes we could avoid a repeat of these types of issues. (Postfix is also widely regarded as having a more administrator-friendly configuration interface than Sendmail.)

To switch from Sendmail to Postfix, here's what we did:

1. Used Webmin to turn off the Sendmail service, via the appropriate button on Webmin's Servers -> Sendmail Mail Server page. (sudo service sendmail stop, from the command line, would also serve this purpose.)

2. Set all runlevels for Sendmail to off, via the command:

sudo chkconfig sendmail off

3. Restricted the Postfix server to only accept mail sending requests originating from processes running on our DigitalOcean host (even if their mail sending destinations are elsewhere on the Internet), by selecting Webmin's Servers -> Postfix Mail Server -> Edit Config Files option, selecting /etc/postfix/main.cf from the Edit config file: dropdown menu, adding the following line below the comment beginning with # Alternatively, you can specify the mynetworks list by hand ... and saving this change, as shown here:

mynetworks = 127.0.0.0/8, [::1]/128

4. Used Webmin to turn on the Postfix service, via the appropriate button on Webmin's Servers -> Postfix Mail Server page. (sudo service postfix start, from the command line, would also serve this purpose.)

5. Enabled runlevels 3 through 5 for Postfix, thus configuring Postfix to be automatically relaunched on server startups and restarts, via the command:

sudo chkconfig postfix on

6. Changed the Nagios command for sending email notifications to work with Postfix's 'sendmail-like' utility for sending email via the command line.
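To double-check the loopback restriction described in the steps above, a small filter such as the following could be run over Postfix's settings. This is a sketch under stated assumptions: mynetworks_is_loopback_only is a hypothetical helper, it reads name = value lines on standard input (the form used in main.cf and printed by postconf), and it handles only a mynetworks value that fits on a single line.

```shell
# mynetworks_is_loopback_only
# Hypothetical helper: reads "name = value" lines on stdin and succeeds only
# if a mynetworks setting is present and every entry in it is a loopback
# network (an IPv4 address starting with 127. or the IPv6 entry [::1]/128).
mynetworks_is_loopback_only() {
    awk -F= '
        # Matches "mynetworks =" but not "mynetworks_style =".
        /^[ \t]*mynetworks[ \t]*=/ {
            found = 1
            n = split($2, nets, /[ \t,]+/)
            for (i = 1; i <= n; i++) {
                if (nets[i] == "") continue
                if (nets[i] !~ /^127\./ && nets[i] != "[::1]/128") bad = 1
            }
        }
        END { exit ((found && !bad) ? 0 : 1) }
    '
}
```

It might be used as postconf -n | mynetworks_is_loopback_only && echo "loopback only", with the message appearing only when the restriction is in place.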

See also

Setting up the Nagios agent on a host