Nagios Monitoring System Deployment

Updated: 2024-01-19 11:04:01


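1. Nagios Monitoring Overview

1.1 About Nagios and its advantages

Nagios is an open-source, powerful and flexible tool for monitoring networks and services. It can monitor host status on Windows, Linux and UNIX systems, network devices such as switches and routers, and host ports and URL-based services. It sends alerts to administrators according to the severity of the fault, and sends a recovery message once the fault clears. The Nagios server runs on Linux and UNIX-like systems; it cannot currently run on Windows.

Official website: http://www.nagios.org/

Official quick-start guide: http://nagios.sourceforge.net/docs/3_0/quickstart-fedora.html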

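1.2 Nagios features

1). Monitors network services (SMTP, POP3, HTTP, TCP, PING and so on);
2). Monitors host resources (CPU, load, I/O, virtual and physical memory, disk usage and so on);
3). A simple plugin design lets users write their own checks for their own services;
4). Service checks run in parallel;
5). Network hierarchy can be described with "parent" host definitions, which lets Nagios distinguish hosts that are down from hosts that are merely unreachable;
6). Contacts are notified when a host or service problem occurs and when it is resolved;
7). Event handlers can be defined to run when host or service events occur, helping to locate or handle the problem;
8). Automatic log rotation;
9). Redundant (distributed) monitoring of hosts is supported;
10). An optional web interface shows current network status, notification and problem history, log files and so on.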

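A Nagios deployment usually consists of the main program (nagios), the plugin package (nagios-plugins) and several optional add-ons (NRPE, NSClient++, NSCA and NDOUtils). Nagios itself is only a monitoring framework; the actual checks are carried out by plugins. Nagios and nagios-plugins are therefore both required on the server, and nagios-plugins is usually installed on the monitored hosts as well. The add-ons are described below.

1. NRPE
Where it runs: on the monitored host, usually a Linux or UNIX system.
Purpose: executes plugin scripts (including self-written ones) on the remote Linux/UNIX host so that the host's local resources can be monitored.

2. NSClient++
Where it runs: on monitored Windows hosts.
Purpose: the agent installed on a Windows host in order to monitor it; the Windows counterpart of NRPE.

3. NDOUtils (not recommended)
Where it runs: on the server.
Purpose: stores Nagios configuration information and event data in a database so that the data can be queried and processed.

4. NSCA
Where it runs: NSCA has to be installed on both the server and the clients.
Purpose: lets remote Linux/UNIX hosts actively push their check results to the Nagios server (used in distributed monitoring setups).

NSCA in distributed monitoring: NSCA was developed so that remote hosts can submit passive check results. It has two parts. The first is the client program (send_nsca), which runs on the remote host and sends passive check results to the designated server. The second is the NSCA daemon (nsca), which can run as a standalone daemon or be registered with inetd to provide the listening connection. When the daemon receives a service check result from a client, it hands the result to Nagios on the central server by writing a PROCESS_SERVICE_CHECK_RESULT command, followed by the check result, into the external command file. The next time Nagios processes external commands, it finds the passive check result sent by the distributed server and processes it.

Related links:

http://library.nagios.com/library/products/nagioscore/manuals/
http://sourceforge.net/projects/nscplus/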
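The external command that the NSCA daemon writes can be sketched in shell. This is a minimal illustration, assuming the standard external-command line format `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`; the helper name `format_passive_result` is hypothetical, and the command-file path in the usage comment assumes the /usr/local/nagios prefix used in this guide.

```shell
#!/bin/sh
# Sketch: build one passive service check result in external-command format.
# format_passive_result is a hypothetical helper, not part of Nagios or NSCA.
format_passive_result() {
    # $1 = Unix timestamp
    # $2 = host name as defined in Nagios
    # $3 = service description as defined in Nagios
    # $4 = return code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN)
    # $5 = plugin output text
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example usage (assumed path; requires write access to the command file):
# format_passive_result "$(date +%s)" 2-LAMP-server "Current Load" 0 "OK - load average: 0.00" \
#     >> /usr/local/nagios/var/rw/nagios.cmd
```

The host name and service description must match objects that Nagios already knows, otherwise the submitted result is discarded.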

2. Nagios Server Installation

2.1 Preparation

2.1.1 Lab environment:

OS version: CentOS 6.5

2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

IP address     Role      Note
192.168.1.1    Nagios
192.168.1.2    LAMP      monitored host
192.168.1.3    LAMP      monitored host
192.168.1.4    Windows   monitored host

Configuring the YUM repositories: omitted.
Disabling the firewall and SELinux: omitted.

2.1.2 Work around the Perl build problem:

echo 'export LC_ALL=C' >> /etc/profile
tail -1 /etc/profile
source /etc/profile
echo $LC_ALL

2.1.3 Synchronize the system time:

ntpdate us.pool.ntp.org
echo '*/10 * * * * /usr/sbin/ntpdate us.pool.ntp.org >/dev/null 2>&1' >> /var/spool/cron/root
crontab -l

2.1.4 Install the base packages Nagios needs:

Build tools and the LAMP environment (installing with YUM is sufficient):

yum -y install gcc glibc glibc-common
yum -y install gd gd-devel        # needed later for PNP graphing
yum -y install mysql*             # optional; install MySQL first if databases will be monitored, otherwise the MySQL plugins are not built
yum -y install httpd php php-gd

cd /var/cache/yum/x86_64/6/base/packages/                 # path of the yum package cache
sed -i 's#keepcache=0#keepcache=1#g' /etc/yum.conf        # keep downloaded packages; by default they are deleted after installation

2.1.5 Add the Nagios user:

useradd nagios
groupadd nagcmd                   # recommended by the official guide
usermod -a -G nagcmd nagios
usermod -a -G nagcmd apache
id apache
id nagios

Nagios installation packages:

[root@localhost tools]# pwd
/usr/src/tools
[root@localhost tools]# ls
epel-release-6-8.noarch.rpm    nrpe-2.15.tar.gz
libart_lgpl-2.3.17.tar.gz      rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
nagios-4.0.7.tar.gz            rrdtool-1.4.8.tar.gz
nagios-plugins-2.0.3.tar.gz

Start Apache (httpd) and verify that it is listening on port 80:

service httpd start
lsof -i tcp:80

or, equivalently:

/etc/init.d/httpd start
lsof -i tcp:80

If httpd is shown listening on port 80, the LAMP environment is working.

2.2 Install Nagios

2.2.1 Build and install Nagios

tar zxf nagios-4.0.7.tar.gz -C /usr/src/
cd /usr/src/nagios-4.0.7/
./configure --prefix=/usr/local/nagios --with-command-group=nagcmd --enable-nanosleep --enable-broker

Note: if Apache was built from source, run ./configure --with-command-group=nagcmd --with-httpd-conf=/usr/local/apache/conf/extra instead.

make all
make install
make install-init
make install-commandmode
make install-config
make install-webconf

2.2.2 Set up the Nagios web interface and login authentication

Note: if Apache was built from source, add the --with-httpd-conf=/usr/local/apache/conf/extra option at configure time (./configure --with-command-group=nagcmd --with-httpd-conf=/usr/local/apache/conf/extra) so that the Nagios web configuration is generated in that directory. Then edit /usr/local/apache/conf/httpd.conf and add: Include conf/extra/nagios.conf

Create the Nagios login user and password:

[root@nagios-server nagios-4.0.7]# htpasswd -c /usr/local/nagios/etc/htpasswd.users yangsheng
New password:
Re-type new password:
Adding password for user yangsheng

Note: the password can also be given on the command line:
[root@nagios-server nagios-4.0.7]# htpasswd -cb /usr/local/nagios/etc/htpasswd.users yangsheng YOUR_PASSWORD

If Apache was built separately from source, use its own htpasswd binary:

/usr/local/apache/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users yangsheng

2.2.3 Set the Nagios alert email address (contacts configuration file)

vi /usr/local/nagios/etc/objects/contacts.cfg +35        # edit line 35
email 972711021@qq.com

2.2.4 Start the sendmail service:

yum -y install sendmail
service sendmail start
lsof -i :25
chkconfig sendmail on

Open Nagios in a browser: http://Server_IP/nagios

Tip: if the page shows an error at this point, disabling SELinux resolves it.

2.2.5 Install the Nagios plugins

tar zxf nagios-plugins-2.0.3.tar.gz -C /usr/src/
cd /usr/src/nagios-plugins-2.0.3/
./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules
make
make install
cd ..
ls /usr/local/nagios/libexec/ | wc -l        # the plugin count should be around 63

Tip: if the build fails with "make: *** [all] Error 2", add --with-mysql=/usr/local/mysql to the configure options.

2.2.6 Start Nagios at boot

chkconfig --level 3 nagios on
chkconfig --list nagios

Alternatively (recommended):

echo "/etc/init.d/nagios start" >> /etc/rc.local
tail -1 /etc/rc.local

The output should look like this:

tail: inotify cannot be used, reverting to polling
/etc/init.d/nagios start

2.2.7 Check the Nagios configuration syntax

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0        # 0 means the configuration is valid
Things look okay - No serious problems were detected during the pre-flight check

Or:

[root@nagios-server ~]# /etc/init.d/nagios checkconfig
Running configuration check... OK.

Or:

/etc/init.d/nagios configtest        # prints the same output as the first method

Start Nagios:

/etc/init.d/nagios start
/etc/init.d/nagios status

2.2.8 Install the NRPE plugin

Both the server and the clients need the NRPE package. The check_nrpe plugin is not shipped with the Nagios server packages, so it has to be built from the NRPE source:

tar zxf nrpe-2.15.tar.gz -C /usr/src/
cd /usr/src/nrpe-2.15/
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config

ls /usr/local/nagios/libexec/check_nrpe        # verify that check_nrpe exists
/usr/local/nagios/libexec/check_nrpe

[root@nagios-server nrpe-2.15]# ls /usr/local/nagios/libexec | wc -l
64

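3. Nagios Client Installation

3.1 Nagios client configuration

3.1.1 Preparation

Lab environment: the monitored hosts listed in 2.1.1.

Install the LAMP environment with YUM, then start the httpd and mysql services:

yum -y install httpd php php-gd
yum -y install mysql*

Disable the firewall and SELinux:

service iptables stop
chkconfig iptables off
chkconfig --list iptables
setenforce 0
# Edit /etc/selinux/config directly: /etc/sysconfig/selinux is a symlink to it, and sed -i would replace the symlink with a regular file.
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config

Work around the Perl build problem:

echo 'export LC_ALL=C' >> /etc/profile
tail -1 /etc/profile
source /etc/profile
echo $LC_ALL

Synchronize the system time:

ntpdate us.pool.ntp.org
echo '*/10 * * * * /usr/sbin/ntpdate us.pool.ntp.org >/dev/null 2>&1' >> /var/spool/cron/root
crontab -l

Upload the client software:

nagios-plugins-2.0.3.tar.gz
nrpe-2.15.tar.gz
rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm
check_memory.pl
check_iostat

Create the user on the client:

useradd nagios -M -s /sbin/nologin

3.1.2 Install the client plugins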

tar zxf nagios-plugins-2.0.3.tar.gz -C /usr/src/
cd /usr/src/nagios-plugins-2.0.3/
./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules
make
make install

Check the plugin count and confirm that the MySQL plugins were built:

ll /usr/local/nagios/libexec/ | wc -l
ll /usr/local/nagios/libexec/check_m*
check_mailq  check_mrtgtraf  check_mysql_query  check_mrtg  check_mysql

3.3 Install NRPE

tar zxf nrpe-2.15.tar.gz -C /usr/src/
cd /usr/src/nrpe-2.15/
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
cd

3.4 Install the modules required by the other plugins [iostat]

Download: https://metacpan.org/release/Params-Validate
tar zxf Nagios_client/Params-Validate-1.13.tar.gz -C /usr/src/
cd /usr/src/Params-Validate-1.13/
perl Build.PL
./Build
./Build install

-----------------------------------------------------------------------------------------------
Download: https://metacpan.org/pod/Class::Accessor
tar zxf Nagios_client/Class-Accessor-0.34.tar.gz -C /usr/src/
cd /usr/src/Class-Accessor-0.34/
perl Makefile.PL
make && make install

-----------------------------------------------------------------------------------------------
Download: https://metacpan.org/pod/Config::Tiny
tar zxf Nagios_client/Config-Tiny-2.14.tar.gz -C /usr/src/
cd /usr/src/Config-Tiny-2.14/
perl Makefile.PL
make && make install
cd

-----------------------------------------------------------------------------------------------
Download: http://search.cpan.org/dist/Math-Calc-Units/
tar zxf Nagios_client/Math-Calc-Units-1.07.tar.gz -C /usr/src/
cd /usr/src/Math-Calc-Units-1.07
perl Makefile.PL
make && make install
cd

-----------------------------------------------------------------------------------------------
Download: http://search.cpan.org/dist/Regexp-Common/
tar zxf Nagios_client/Regexp-Common-2013031301.tar.gz -C /usr/src/
cd /usr/src/Regexp-Common-2013031301/
perl Makefile.PL
make && make install
cd

-----------------------------------------------------------------------------------------------
All of the above can also be installed with YUM (recommended):

rpm -ivh rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm        # download: http://packages.sw.be/rpmforge-release/
yum install -y perl-Params-Validate perl-Math-Calc-Units perl-Regexp-Common-* perl-Config-Tiny perl-Nagios-Plugin.noarch

3.5 Deploy the custom plugins

Commands for batch deployment:

yum -y install dos2unix

Copy check_memory.pl:

cp check_memory.pl /usr/local/nagios/libexec/
chmod 755 /usr/local/nagios/libexec/check_memory.pl
dos2unix /usr/local/nagios/libexec/check_memory.pl
chown nagios:nagios /usr/local/nagios/libexec/check_memory.pl
ll /usr/local/nagios/libexec/check_memory.pl

Copy check_iostat:

cp check_iostat /usr/local/nagios/libexec/
chmod 755 /usr/local/nagios/libexec/check_iostat
dos2unix /usr/local/nagios/libexec/check_iostat
chown nagios:nagios /usr/local/nagios/libexec/check_iostat
ll /usr/local/nagios/libexec/check_iostat

3.6 Configure NRPE

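Automated deployment:

cp /usr/local/nagios/etc/nrpe.cfg /usr/local/nagios/etc/nrpe.cfg.bak
sed -i '219,223d' /usr/local/nagios/etc/nrpe.cfg
perl -pi -e 's/allowed_hosts=127.0.0.1/allowed_hosts=192.168.1.1/g' /usr/local/nagios/etc/nrpe.cfg
echo "command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20" >> /usr/local/nagios/etc/nrpe.cfg
echo "command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 10% -c 3%" >> /usr/local/nagios/etc/nrpe.cfg
echo "command[check_disk]=/usr/local/nagios/libexec/check_disk -w 15% -c 7% -p /" >> /usr/local/nagios/etc/nrpe.cfg
echo "command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%" >> /usr/local/nagios/etc/nrpe.cfg
echo "command[check_iostat]=/usr/local/nagios/libexec/check_iostat -d sda -w 1000 -c 2000" >> /usr/local/nagios/etc/nrpe.cfg

Manual deployment: edit the NRPE configuration file and comment out lines 219-223 (deleting them also works):

[root@test1 ~]# vi /usr/local/nagios/etc/nrpe.cfg
#command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
#command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
#command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
#command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
#command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

Add the following lines after the commented-out ones:

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 10% -c 3%
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 15% -c 7% -p /
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_iostat]=/usr/local/nagios/libexec/check_iostat -d sda -w 1000 -c 2000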

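The first command monitors the system load, which can be viewed with uptime. Its thresholds mean: raise a WARNING when the 1/5/15-minute load averages reach 15/10/5, and a CRITICAL when they reach 30/25/20.

[root@test1 ~]# uptime
16:49:01 up 4:33, 1 user, load average: 0.00, 0.00, 0.00

The three values are the load averages over the last 1, 5 and 15 minutes. As a rule of thumb, the value should not exceed 5 on a single-CPU host or 10 on a dual-CPU host.

The second command monitors memory: WARNING when 10% of memory remains free, CRITICAL when 3% remains.

The third command monitors disk space: WARNING when 15% of the disk remains free, CRITICAL when 7% remains. The -p option names the partition; repeat -p to check several partitions.

The fourth command monitors swap space: WARNING when 20% of swap remains free, CRITICAL when 10% remains.

The fifth command monitors disk I/O.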
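How a plugin turns the -w and -c thresholds into a result can be sketched as follows. This is an illustrative helper, not the code of check_load itself: the function name `check_threshold` is hypothetical, it handles a single value where higher is worse (as with load), and it follows the usual plugin convention of exit code 0 for OK, 1 for WARNING and 2 for CRITICAL.

```shell
#!/bin/sh
# Sketch: compare one measured value with warning/critical thresholds and
# report the result through the plugin exit-code convention.
# check_threshold is a hypothetical helper, not a Nagios plugin.
check_threshold() {
    # $1 = measured value, $2 = warning threshold, $3 = critical threshold
    # awk is used because plain sh cannot compare decimal numbers.
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v >= c) { print "CRITICAL - value " v; exit 2 }
        if (v >= w) { print "WARNING - value " v; exit 1 }
        print "OK - value " v
        exit 0
    }'
}

# Example usage with the 1-minute thresholds from check_load above:
# check_threshold 0.42 15 30; echo "exit code: $?"
```

For checks such as free disk space or free memory the comparison runs the other way round, because a lower value is worse.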

Note: allowed_hosts in nrpe.cfg must be set to the IP address of the Nagios server (192.168.1.1 in this lab).

Start the NRPE service:

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d        # -d runs NRPE as a background daemon

Add the start command to /etc/rc.local so that NRPE starts at boot, then verify:

echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.local
ps -ef | grep nagios
tail -2 /etc/rc.local

[root@test1 ~]# lsof -i tcp:5666
COMMAND  PID   USER    FD  TYPE  DEVICE  SIZE/OFF  NODE  NAME
nrpe     3102  nagios  4u  IPv4  18638   0t0       TCP   *:5666 (LISTEN)
nrpe     3102  nagios  5u  IPv6  18639   0t0       TCP   *:5666 (LISTEN)

To restart the NRPE service:

[root@test1 ~]# ps -ef | grep nrpe
nagios  3102     1  0 17:10 ?      00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
root    3149  1993  0 17:19 pts/0  00:00:00 grep nrpe
[root@test1 ~]# killall nrpe
[root@test1 ~]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
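Appending start commands to /etc/rc.local with echo adds a duplicate line every time the deployment is repeated. A minimal sketch of an idempotent alternative is shown here; the helper name `append_once` is hypothetical and not part of NRPE or Nagios.

```shell
#!/bin/sh
# Sketch: append a line to a file only if that exact line is not already there.
# append_once is a hypothetical helper for repeatable deployments.
append_once() {
    # $1 = exact line to add, $2 = target file
    line=$1
    file=$2
    # -x matches whole lines, -F treats the line as a fixed string, -q is quiet;
    # "--" protects lines that begin with a dash.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage (run as root):
# append_once "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" /etc/rc.local
```

Running the helper twice with the same arguments leaves a single copy of the line in the file.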

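4. Configuring the Nagios Server

4.1 Nagios layout

4.1.1 Directory layout of the Nagios server:

[root@nagios-server ~]# ls /usr/local/nagios/
bin etc include libexec perl sbin share var

The directories are used as follows:

Directory  Contents
bin        Nagios executables: nagios, nagiostats, nrpe
etc        Configuration files: cgi.cfg; htpasswd.users; nagios.cfg (main configuration file); nrpe.cfg (NRPE client configuration file); objects (a directory holding the configuration files of the monitored objects); resource.cfg
libexec    Plugin directory
sbin       CGI directory (rarely touched)
share      Program files of the Nagios web interface
var        Data and log directory

nagios.cfg can include a single cfg file or a whole directory; in the latter case every cfg file under that directory is included.

To keep the layout clear and to make batch deployment easier, the files included by the main configuration file are organized as follows:

File             Description
commands.cfg     Command definitions (a commands directory can be used instead)
services.cfg     Service definitions (with hundreds of hosts, use a services directory); does not exist by default
hosts.cfg        Host definitions (with hundreds of hosts, use a hosts directory); does not exist by default
contacts.cfg     Alert contact definitions
timeperiods.cfg  Time periods for checks and alerts
templates.cfg    Template definitions

4.1.2 Edit the main configuration file nagios.cfg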

Locate the cfg_file section in nagios.cfg and configure it as follows:

cp /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/nagios.cfg.bak        # backup
vi /usr/local/nagios/etc/nagios.cfg +33

Add these lines:

# Add configure file hosts or services-2014-07-30.
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_dir=/usr/local/nagios/etc/objects/services

Comment out this line:

# Definitions for monitoring the local (Linux) host
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

Save nagios.cfg, then create the services directory and set its ownership:

mkdir /usr/local/nagios/etc/objects/services
chown -R nagios:nagios /usr/local/nagios/etc/objects/services/

Generate hosts.cfg:

cd /usr/local/nagios/etc/objects/
head -51 localhost.cfg > hosts.cfg
chown -R nagios:nagios /usr/local/nagios/etc/objects/hosts.cfg

Generate services.cfg:

touch /usr/local/nagios/etc/objects/services/services.cfg
chown -R nagios:nagios /usr/local/nagios/etc/objects/services/services.cfg

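4.2 hosts.cfg: production configuration examples and syntax

4.2.1 Host definition parameters in hosts.cfg

hosts.cfg holds the host definitions. Host names follow a naming convention that combines the IP, the service, the group ID and the machine's ID within the group, which makes hosts easy to identify at a glance.

Parameters of a host definition in hosts.cfg:

define host {
    use                    linux-server       # template used by this host; see templates.cfg
    host_name              2-LAMP-server      # host name, chosen according to the host's function
    alias                  2-LAMP-server      # host alias, same convention
    address                192.168.1.2        # IP address of the monitored server
    check_command          check-host-alive   # command that checks whether the host is alive
    max_check_attempts     3                  # maximum number of check attempts after a failure
    normal_check_interval  2                  # interval between regular checks, in minutes by default
    retry_check_interval   2                  # interval between retries after a failure, in minutes by default
    check_period           24x7               # check period; see timeperiods.cfg
    notification_interval  300                # interval between repeated notifications while the problem persists, in minutes by default
    notification_period    24x7               # period during which notifications are sent, for example all day or half a day
    notification_options   d,u,r              # host states that trigger notifications: d = down, u = unreachable, r = recovery
    contact_groups         admins             # notify the admins contact group
}

Template-based host definitions for batch deployment:

A host definition can be reduced to its key directives, with the remaining directives taken from the linux-server template. Adjust the linux-server template first, then let every host inherit its defaults:

define host {
    use        linux-server
    host_name  2-LAMP-server
    alias      2-LAMP-server
    address    192.168.1.2
}

The other directives are omitted, which makes deployment faster and easier; they are defined in the linux-server template in templates.cfg.

4.2.2 services.cfg: production configuration examples and syntax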

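services.cfg defines the monitored services and is one of the most important Nagios configuration files. With a small number of servers (fewer than 50), most service definitions can go into this one file. The file does not exist by default and has to be created.

Parameters of a service definition in services.cfg:

define service {
    use                    generic-service        # template used by this service; see templates.cfg
    host_name              1-LAMP-server          # monitored host name, as defined in hosts.cfg
    service_description    Current Load           # service description; choose a meaningful name
    check_command          check_nrpe!check_load  # the check command; this is the key directive. In this guide every "passive" (agent-based) check is invoked through check_nrpe
    max_check_attempts     2                      # maximum number of check attempts
    normal_check_interval  4                      # interval between regular checks: every 4 minutes
    retry_check_interval   4                      # interval between retries, in minutes by default
    check_period           24x7                   # check period; 24x7 is the name of a time period defined in timeperiods.cfg
    notification_interval  1440                   # notification interval: repeat the notification every 1440 minutes
    notification_period    24x7                   # notification period, defined in timeperiods.cfg; for example, no SMS alerts at night
    notification_options   w,u,c,r                # service states that trigger notifications: w = warning, u = unknown, c = critical, r = recovery
    contact_groups         admins                 # contact group to notify, defined in contacts.cfg
    process_perf_data      1                      # record performance data for PNP graphs
}

The other directives are omitted, which makes deployment faster and easier; they are defined in the generic-service template in templates.cfg.

4.2.3 Disk partition monitoring [passive monitoring, via NRPE]

Define the disk monitoring template: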

define service {
    name                          generic-disk-service
    service_description           Disk Partition
    check_command                 check_nrpe!check_disk
    active_checks_enabled         1
    passive_checks_enabled        1
    parallelize_check             1
    obsess_over_service           1
    check_freshness               0
    notifications_enabled         1
    event_handler_enabled         1
    flap_detection_enabled        1
    process_perf_data             1
    retain_status_information     1
    retain_nonstatus_information  1
    is_volatile                   0
    max_check_attempts            5
    normal_check_interval         4
    retry_check_interval          4
    check_period                  24x7
    notification_interval         60
    notification_period           24x7
    notification_options          w,u,c,r
    contact_groups                admins
    register                      0
}

Tips:

1. The same kind of service can be defined more than once. Disk monitoring, for example, can be redefined per server requirement and reported to different people.
2. Alerts can go to several groups at once: contact_groups mailusers,shoujiusers,QQusers
3. The definition can be shortened, with most parameters kept in the template:

###add swap load disk ping mem!date 2014-07-31.
define service {
    use        generic-disk-service        # name of the template defined in templates.cfg
    host_name  1-LAMP-server
}

4.2.3 Swap monitoring [passive monitoring, via NRPE]

define service {
    name                          generic-swap-service
    service_description           Swap Useage
    check_command                 check_nrpe!check_swap
    active_checks_enabled         1
    passive_checks_enabled        1
    parallelize_check             1
    obsess_over_service           1
    check_freshness               0
    notifications_enabled         1
    event_handler_enabled         1
    flap_detection_enabled        1
    process_perf_data             1
    retain_status_information     1
    retain_nonstatus_information  1
    is_volatile                   0
    max_check_attempts            5
    normal_check_interval         4
    retry_check_interval          4
    check_period                  24x7
    notification_interval         60
    notification_period           24x7
    notification_options          w,u,c,r
    contact_groups                admins
    register                      0
}

Monitoring a host through the template:

define service {
    use        generic-swap-service        # name of the template defined in templates.cfg
    host_name  1-LAMP-server
}

4.2.4 Memory monitoring [passive monitoring, via NRPE]

define service {
    name                          generic-mem-service
    service_description           Mem Useage
    check_command                 check_nrpe!check_mem
    active_checks_enabled         1
    passive_checks_enabled        1
    parallelize_check             1
    obsess_over_service           1
    check_freshness               0
    notifications_enabled         1
    event_handler_enabled         1
    flap_detection_enabled        1
    process_perf_data             1
    retain_status_information     1
    retain_nonstatus_information  1
    is_volatile                   0
    max_check_attempts            5
    normal_check_interval         4
    retry_check_interval          4
    check_period                  24x7
    notification_interval         60
    notification_period           24x7
    notification_options          w,u,c,r
    contact_groups                admins
    register                      0
}

Monitoring a host through the template:

define service {
    use        generic-mem-service        # name of the template defined in templates.cfg
    host_name  1-LAMP-server
}

4.2.5 System load monitoring [passive monitoring, via NRPE]

define service {
    name                          generic-load-service
    service_description           Current Load
    check_command                 check_nrpe!check_load
    active_checks_enabled         1
    passive_checks_enabled        1
    parallelize_check             1
    obsess_over_service           1
    check_freshness               0
    notifications_enabled         1
    event_handler_enabled         1
    flap_detection_enabled        1
    process_perf_data             1
    retain_status_information     1
    retain_nonstatus_information  1
    is_volatile                   0
    max_check_attempts            5
    normal_check_interval         4
    retry_check_interval          4
    check_period                  24x7
    notification_interval         60
    notification_period           24x7
    notification_options          w,u,c,r
    contact_groups                admins
    register                      0
}

Monitoring a host through the template:

define service {
    use        generic-load-service        # name of the template defined in templates.cfg
    host_name  1-LAMP-server
}

4.2.6 I/O monitoring [passive monitoring, via NRPE]

define service {
    name                          generic-iostat-service
    service_description           Disk IOSTAT
    check_command                 check_nrpe!check_iostat!5!11
    active_checks_enabled         1
    passive_checks_enabled        1
    parallelize_check             1
    obsess_over_service           1
    check_freshness               0
    notifications_enabled         1
    event_handler_enabled         1
    flap_detection_enabled        1
    process_perf_data             1
    retain_status_information     1
    retain_nonstatus_information  1
    is_volatile                   0
    max_check_attempts            5
    normal_check_interval         4
    retry_check_interval          4
    check_period                  24x7
    notification_interval         60
    notification_period           24x7
    notification_options          w,u,c,r
    contact_groups                admins
    register                      0
}

Monitoring a host through the template:

define service {
    use        generic-iostat-service        # name of the template defined in templates.cfg
    host_name  1-LAMP-server
}

4.2.7 Ping monitoring [active monitoring, run from the server without NRPE]

define service {
    name                          generic-ping-service
    service_description           PING
    check_command                 check_ping!100.0,20%!500.0,60%
    active_checks_enabled         1
    passive_checks_enabled        1
    parallelize_check             1
    obsess_over_service           1
    check_freshness               0
    notifications_enabled         1
    event_handler_enabled         1
    flap_detection_enabled        1
    process_perf_data             1
    retain_status_information     1
    retain_nonstatus_information  1
    is_volatile                   0
    max_check_attempts            5
    normal_check_interval         4
    retry_check_interval          4
    check_period                  24x7
    notification_interval         60
    notification_period           24x7
    notification_options          w,u,c,r
    contact_groups                admins
    register                      0
}

Monitoring a host through the template:

define service {
    use        generic-ping-service        # name of the template defined in templates.cfg
    host_name  1-LAMP-server
}

4.2.8 Template-based service deployment for thousands of servers

Production services are deployed in bulk with scripts. Every service definition is kept short and the shared parameters live in the templates in templates.cfg, for example:

[root@localhost objects]# cat services/services.cfg
define service {
    use        generic-disk-service
    host_name  test_1
}

define service {
    use        generic-swap-service
    host_name  test_1
}

define service {
    use        generic-mem-service
    host_name  test_1
}

define service {
    use        generic-load-service
    host_name  test_1
}

define service {
    use        generic-iostat-service
    host_name  test_1
}

define service {
    use        generic-ping-service
    host_name  test_1
}
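The bulk deployment script itself is not shown in this guide. A minimal sketch of what such a generator could look like follows; the function name `gen_services` is hypothetical, and the template list simply repeats the six templates defined in this chapter.

```shell
#!/bin/sh
# Sketch: print short service definitions for every host name read from stdin.
# gen_services is a hypothetical helper; the template names are the ones
# defined earlier in this chapter.
gen_services() {
    templates="generic-disk-service generic-swap-service generic-mem-service generic-load-service generic-iostat-service generic-ping-service"
    while read -r host; do
        # Skip empty lines in the host list.
        [ -n "$host" ] || continue
        for tpl in $templates; do
            printf 'define service {\n    use        %s\n    host_name  %s\n}\n\n' "$tpl" "$host"
        done
    done
}

# Example usage (hosts.txt is an assumed file with one host name per line):
# gen_services < hosts.txt > /usr/local/nagios/etc/objects/services/services.cfg
```

After regenerating the file, the configuration still needs to be verified with nagios -v before Nagios is reloaded.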

4.2.9 URL monitoring (active monitoring; active checks do not go through NRPE)

Services of this kind are usually public-facing. They are normally monitored actively from the server. Writing custom scripts for this is also possible, but not recommended. To avoid frequent false alarms, production setups sometimes pin the IP address of the domain in /etc/hosts.

Define monitoring for the domain www.yangsheng.com:

define service {
    name                          generic-url-service
    service_description           web_url
    check_command                 check_http!-H www.yangsheng.com -w 10 -c 30
    active_checks_enabled         1
    passive_checks_enabled        1
    parallelize_check             1
    obsess_over_service           1
    check_freshness               0
    notifications_enabled         1
    event_handler_enabled         1
    flap_detection_enabled        1
    process_perf_data             1
    retain_status_information     1
    retain_nonstatus_information  1
    is_volatile                   0
    max_check_attempts            5
    normal_check_interval         4
    retry_check_interval          4
    check_period                  24x7
    notification_interval         60
    notification_period           24x7
    notification_options          w,u,c,r
    contact_groups                admins
    register                      0
}

URL monitoring is implemented with check_http. Its options can be listed as follows:

[root@nagios-server ~]# cd /usr/local/nagios/libexec/
[root@nagios-server libexec]# ./check_http --help

4.2.10 Service port monitoring

Note that host_name accepts several hosts at once, separated by commas.

define service {
    name                          generic-port-service
    service_description           img_8150
    check_command                 check_tcp!8150
    active_checks_enabled         1
    passive_checks_enabled        1
    parallelize_check             1
    obsess_over_service           1
    check_freshness               0
    notifications_enabled         1
    event_handler_enabled         1
    flap_detection_enabled        1
    process_perf_data             1
    retain_status_information     1
    retain_nonstatus_information  1
    is_volatile                   0
    max_check_attempts            5
    normal_check_interval         4
    retry_check_interval          4
    check_period                  24x7
    notification_interval         60
    notification_period           24x7
    notification_options          w,u,c,r
    contact_groups                admins
    register                      0
}

Monitoring several ports:

The approach given here is to append further ports to the command, for example: check_command check_tcp!8150!8080!80. This only works if the check_tcp command definition in commands.cfg is written to use the extra arguments; if it passes only $ARG1$ as the port, define one service per port instead.

4.2.11 Oracle monitoring in production

4.3 Nagios troubleshooting

Error 1:

[root@nagios-server services]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

(output omitted)

Checking objects...

Error: Service check command 'check_nrpe!check_load' specified in service 'Current Load' for host '2-LAMP-server' not defined anywhere!

Error: Service check command 'check_nrpe!check_iostat!!11' specified in service 'Disk IOSTAT' for host '2-LAMP-server' not defined anywhere!

Error: Service check command 'check_nrpe!check_disk' specified in service 'Disk Partition' for host '2-LAMP-server' not defined anywhere!

Error: Service check command 'check_nrpe!check_mem' specified in service 'Mem Useage' for host '2-LAMP-server' not defined anywhere!

Error: Service check command 'check_nrpe!check_swap' specified in service 'Swap Useage' for host '2-LAMP-server' not defined anywhere!

Error: Service check command 'check_nrpe!check_load' specified in service 'Current Load' for host '3-LAMP-server' not defined anywhere!

Error: Service check command 'check_nrpe!check_iostat!!11' specified in service 'Disk IOSTAT' for host '3-LAMP-server' not defined anywhere!

Error: Service check command 'check_nrpe!check_disk' specified in service 'Disk Partition' for host '3-LAMP-server' not defined anywhere!

Error: Service check command 'check_nrpe!check_mem' specified in service 'Mem Useage' for host '3-LAMP-server' not defined anywhere!

Error: Service check command 'check_nrpe!check_swap' specified in service 'Swap Useage' for host '3-LAMP-server' not defined anywhere!

(output omitted)

Total Warnings: 0
Total Errors: 10

(output omitted)

The messages show that the check_nrpe command is not defined in commands.cfg. To fix this, append the following definition to the file:

[root@nagios-server objects]# vi commands.cfg
define command {
    command_name  check_nrpe
    command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

Now run the check again:

[root@nagios-server objects]# service nagios checkconfig
Running configuration check... OK.

[root@nagios-server objects]# service nagios configtest

(output omitted)

Total Warnings: 0
Total Errors: 0

(output omitted)

[root@nagios-server objects]# service nagios reload
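The check-then-reload sequence above can be wrapped so that a reload never happens with a broken configuration. The following is a minimal sketch; the function name `reload_if_valid` is hypothetical, and the two commands are passed in as strings so the same wrapper works with either the init script or the nagios binary.

```shell
#!/bin/sh
# Sketch: reload Nagios only when the configuration check succeeds.
# reload_if_valid is a hypothetical helper, not part of Nagios.
reload_if_valid() {
    # $1 = command that verifies the configuration (exit code 0 means valid)
    # $2 = command that reloads Nagios
    if sh -c "$1" >/dev/null 2>&1; then
        sh -c "$2"
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}

# Example usage (paths as installed in this guide):
# reload_if_valid "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" \
#                 "service nagios reload"
```

When the check fails, run the verify command by hand to see the full error list, as in Error 1 above.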

Error 2:

Opening the service pages in the browser shows an English error message.

The error can be resolved by authorizing the login user in cgi.cfg:

[root@nagios-server etc]# pwd
/usr/local/nagios/etc
[root@nagios-server etc]# cp cgi.cfg cgi.cfg.bak
[root@nagios-server etc]# vi cgi.cfg +118
authorized_for_system_information=nagiosadmin,yangsheng
authorized_for_configuration_information=nagiosadmin,yangsheng
authorized_for_system_commands=nagiosadmin,yangsheng
authorized_for_all_services=nagiosadmin,yangsheng
authorized_for_all_hosts=nagiosadmin,yangsheng
authorized_for_all_service_commands=nagiosadmin,yangsheng
authorized_for_all_host_commands=nagiosadmin,yangsheng
authorized_for_read_only=user1,user2,yangsheng

The replacement can be done in vi command mode with: :%s/nagiosadmin/nagiosadmin,yangsheng/g

Compare the result with the backup:

[root@nagios-server etc]# diff cgi.cfg cgi.cfg.bak
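The same cgi.cfg change can be scripted instead of edited by hand. The sketch below is one possible approach, not the method used above: the function name `add_cgi_user` is hypothetical, it prints the modified file to standard output rather than editing in place, and it skips lines that already list the user.

```shell
#!/bin/sh
# Sketch: append a user to every authorized_for_* line of a cgi.cfg-style file.
# add_cgi_user is a hypothetical helper; output goes to stdout.
add_cgi_user() {
    # $1 = user name to authorize, $2 = path of the configuration file
    awk -v u="$1" -F= '
        /^authorized_for_/ {
            # Split the comma-separated user list and look for the user.
            n = split($2, names, ",")
            found = 0
            for (i = 1; i <= n; i++) if (names[i] == u) found = 1
            if (!found) $0 = $0 "," u
        }
        { print }
    ' "$2"
}

# Example usage (writes a new file and leaves the original untouched):
# add_cgi_user yangsheng /usr/local/nagios/etc/cgi.cfg > /tmp/cgi.cfg.new
```

Review the new file with diff before copying it over cgi.cfg.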

Error 3:

The Nagios page is reloaded and the services show errors.

Troubleshooting:

Problem 1: ERROR: Device incorrectly specified

1). Test on the server first:

[root@nagios-server libexec]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.2 -c check_iostat
ERROR: Device incorrectly specified
(output omitted)
/usr/local/nagios/libexec/check_iostat:
-d Device to be checked (without the full path, eg. sda)
-c ,, Sets the CRITICAL level for tps, KB_read/s and KB_written/s, respectively
-w ,, Sets the WARNING level for tps, KB_read/s and KB_written/s, respectively

2). Test on the client:

[root@test1 libexec]# /usr/local/nagios/libexec/check_iostat -w 6 -c 10
ERROR: Device incorrectly specified
(output omitted)
/usr/local/nagios/libexec/check_iostat:
-d Device to be checked (without the full path, eg. sda)
-c ,, Sets the CRITICAL level for tps, KB_read/s and KB_written/s, respectively
-w ,, Sets the WARNING level for tps, KB_read/s and KB_written/s, respectively

The usage text shows that the device option (-d) is missing. With the device specified, the check succeeds:

[root@test1 libexec]# /usr/local/nagios/libexec/check_iostat -d sda -w 1000 -c 2000
OK - I/O stats tps=4.16 KB_read/s=253.89 KB_written/s=38.91 | 'tps'=4.16; 'KB_read/s'=253.89; 'KB_written/s'=38.91;

The client's nrpe.cfg also has to be corrected:

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 10% -c 3%
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 15% -c 7% -p /
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_iostat]=/usr/local/nagios/libexec/check_iostat -w 6 -c 10

Change the last line to:

command[check_iostat]=/usr/local/nagios/libexec/check_iostat -d sda -w 1000 -c 2000

Then restart the NRPE service on the client.

Problem 2: NRPE: Unable to read output

1). Test on the server first:

[root@nagios-server libexec]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.2 -c check_mem
NRPE: Unable to read output

2). Test on the client:

[root@test1 ~]# /usr/local/nagios/libexec/check_memory.pl -H 192.168.1.2
Can't locate Nagios/Plugin.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/local/nagios/libexec/check_memory.pl line 26.
BEGIN failed--compilation aborted at /usr/local/nagios/libexec/check_memory.pl line 26.

The output shows that the problem is on the client: the Nagios::Plugin Perl module is missing. Install it together with its dependencies:

[root@test1 ~]# rpm -ivh rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm        # download: http://packages.sw.be/rpmforge-release/
yum check-update
yum install -y perl-Params-Validate perl-Math-Calc-Units perl-Regexp-Common-* perl-Config-Tiny perl-Nagios-Plugin.noarch

Then refresh the page.

Error 4:

1). Run the test command on the server:

[root@localhost etc]# /usr/local/nagios/libexec/check_nrpe -H 192.168.0.2 -c check_iostat
-bash: /usr/local/nagios/libexec/check_nrpe: No such file or directory

The check_nrpe command is not installed. Build and install the NRPE plugin to fix this.

Error 5:

Fixing the "Could not complete SSL handshake." error:

Set allowed_hosts in nrpe.cfg on the monitored host to the IP address of the Nagios server (192.168.1.1 in this lab), then restart NRPE.

5. Nagios Graphs: Display and Management [Server]

5.1 Installing PNP for performance graphs (server)

5.1.1 Install the base packages PNP needs with YUM

Official PNP site: http://docs.pnp4nagios.org/

Install the packages PNP depends on:

[root@nagios-server ~]# yum -y install cairo pango zlib zlib-devel freetype freetype-devel gd gd-devel

5.1.2 Install libart_lgpl (required by rrdtool)

tar zxf libart_lgpl-2.3.17.tar.gz -C /usr/src/
cd /usr/src/libart_lgpl-2.3.17/
./configure
make && make install
cp -rf /usr/local/include/libart-2.0 /usr/include/

5.1.3 Install rrdtool

tar zxf rrdtool-1.4.8.tar.gz -C /usr/src/
cd /usr/src/rrdtool-1.4.8/
./configure --prefix=/usr/local/rrdtool --disable-python --disable-tcl
make
make install
ls /usr/local/rrdtool/bin/
cd

5.1.4 Install PNP

tar zxf pnp4nagios-0.6.22.tar.gz -C /usr/src/
cd /usr/src/pnp4nagios-0.6.22/
./configure \
    --with-rrdtool=/usr/local/rrdtool/bin/rrdtool \
    --with-perfdata-dir=/usr/local/nagios/share/perfdata
make all
make install
make install-webconf
make install-config
make install-init

The configure step prints a summary like this:

------------------------------------------------------------
Nagios user/group:                 nagios nagios
Install directory:                 /usr/local/pnp4nagios
HTML Dir:                          /usr/local/pnp4nagios/share
Config Dir:                        /usr/local/pnp4nagios/etc
Location of rrdtool binary:        /usr/local/rrdtool/bin/rrdtool Version 1.4.8
RRDs Perl Modules:                 *** NOT FOUND ***
RRD Files stored in:               /usr/local/nagios/share/perfdata/
process_perfdata.pl Logfile:       /usr/local/pnp4nagios/var/perfdata.log
Perfdata files (NPCD) stored in:   /usr/local/pnp4nagios/var/spool

Web Interface Options:
-------------------------
HTML URL:                          http://localhost/pnp4nagios
Apache Config File:                /etc/httpd/conf.d/pnp4nagios.conf

PNP ships several Perl scripts, which can be listed as follows:

[root@nagios-server pnp4nagios-0.6.22]# ls /usr/local/pnp4nagios/libexec/
check_pnp_rrds.pl process_perfdata.pl rrd_convert.pl rrd_modify.pl

Rename install.php; skipping this step leads to an error in the web interface:

[root@nagios-server ~]# cd /usr/local/pnp4nagios/share/
[root@nagios-server share]# mv install.php install.php.bak

5.1.5 Basic Nagios configuration for graphing

Note: PNP4Nagios has five modes: 1. Synchronous mode; 2. Bulk mode; 3. Bulk mode with NPCD; 4. Bulk mode with NPCD and npcdmod; 5. Gearman mode.

This lab uses the third mode, Bulk mode with NPCD, because it is more compatible with Nagios 4.x and does not produce graphing errors there. For the other modes see: https://docs.pnp4nagios.org/pnp-0.6/config

1). Run vi /usr/local/nagios/etc/nagios.cfg +813 and edit the main configuration file. The leading numbers below are line numbers in nagios.cfg, not part of the settings:

814 process_performance_data=1
826 host_perfdata_command=process-host-perfdata
827 service_perfdata_command=process-service-perfdata
836 host_perfdata_file=/usr/local/nagios/var/host-perfdata
837 service_perfdata_file=/usr/local/nagios/var/service-perfdata
849 #host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
850 host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
851 #service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$
852 service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
862 host_perfdata_file_mode=a
863 service_perfdata_file_mode=a
873 host_perfdata_file_processing_interval=30
874 service_perfdata_file_processing_interval=30
883 host_perfdata_file_processing_command=process-host-perfdata-file
884 service_perfdata_file_processing_command=process-service-perfdata-file
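With the templates above, each record in the perfdata file is a single line of tab-separated KEY::VALUE fields. The following sketch shows how one field could be pulled out of such lines for debugging; the function name `perfdata_field` is hypothetical and not part of PNP4Nagios.

```shell
#!/bin/sh
# Sketch: print the value of one KEY::VALUE field from perfdata lines on stdin.
# perfdata_field is a hypothetical helper for inspecting spool files.
perfdata_field() {
    # $1 = field name, for example HOSTNAME or SERVICEPERFDATA
    awk -F'\t' -v key="$1" '{
        for (i = 1; i <= NF; i++) {
            # Each field has the form KEY::VALUE; split at the first "::".
            sep = index($i, "::")
            if (sep > 0 && substr($i, 1, sep - 1) == key) print substr($i, sep + 2)
        }
    }'
}

# Example usage (path taken from the nagios.cfg settings above):
# perfdata_field SERVICEPERFDATA < /usr/local/nagios/var/service-perfdata
```

An empty result for SERVICEPERFDATA usually means the plugin printed no performance data after the "|" in its output.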

2). Edit commands.cfg (roughly lines 227-243):

# 'process-host-perfdata' command definition
define command{
        command_name    process-host-perfdata
        command_line    /bin/mv /usr/local/nagios/var/host-perfdata /usr/local/pnp4nagios/var/spool/host-perfdata.$TIMET$
        }

# 'process-service-perfdata' command definition
define command{
        command_name    process-service-perfdata
        command_line    /bin/mv /usr/local/nagios/var/service-perfdata /usr/local/pnp4nagios/var/spool/service-perfdata.$TIMET$
        }

# 'check_nrpe' command definition
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }

# 'check_weburl' command definition
define command{
        command_name    check_weburl
        command_line    $USER1$/check_http $ARG1$ -w 10 -c 30
        }
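The two process-*-perfdata commands above simply move the accumulated perfdata file into the PNP4Nagios spool directory under a timestamped name. A minimal shell sketch of that rotation follows; the helper name spool_perfdata and its arguments are assumptions made for illustration and are not part of Nagios or PNP4Nagios:

```shell
#!/bin/sh
# Hypothetical helper: mimic the "/bin/mv <file> <spool>/<name>.$TIMET$" step.
# $1 = perfdata file, $2 = spool directory, $3 = timestamp (defaults to now).
spool_perfdata() {
    src=$1
    spool_dir=$2
    stamp=${3:-$(date +%s)}
    # Nothing to do when no check has written perfdata yet.
    [ -f "$src" ] || return 0
    mv "$src" "$spool_dir/$(basename "$src").$stamp"
}
```

Because every move uses a fresh timestamp suffix, Nagios can start a new perfdata file immediately while the spooled copy waits to be processed.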

3). Edit hosts.cfg and services.cfg to add the graph icon (action_url):

[root@nagios-server perfdata]# vi /usr/local/nagios/etc/objects/hosts.cfg
define host {
        use             linux-server
        host_name       2-LAMP-server
        alias           2-LAMP-server
        address         192.168.1.2
        action_url      /pnp4nagios/index.php?host=$HOSTNAME$
        }

[root@nagios-server perfdata]# vi /usr/local/nagios/etc/objects/services.cfg
define service {
        use                     generic-service
        host_name               2-LAMP-server
        service_description     Disk Partition
        check_command           check_nrpe!check_disk
        max_check_attempts      8
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   1440
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        process_perf_data       1
        action_url              /pnp4nagios/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
        }

4). Check the configuration syntax:

[root@nagios-server ~]# service nagios configtest
... (output omitted) ...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
Object precache file created:
/usr/local/nagios/var/objects.precache

5). Run service nagios reload to reload the Nagios configuration so that the changes take effect, then open http://192.168.1.1/nagios/ in a browser to check the result.

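Part 6  Alert Methods and Production Alerting Policy

6.1 Alert types and how to choose among them

6.1.1 Alert types

1). E-mail: use a company mailbox or a company mail group where possible; alerts sent to non-company mailboxes are liable to be filtered as spam.
2). Fetion: install the Fetion client on a Windows (win32) machine and add the recipient's mobile number as a friend.
3). Mail-to-SMS: mailboxes such as 139, 126 and 189 that forward incoming mail to the phone as SMS.
4). HTTP SMS gateway: specialised companies operate gateways that deliver messages straight to mobile phones; an alert is typically just a URL that carries the message.
5). SMS modem: a hardware client, similar to a mobile handset, that sends the alerts; a method chosen in earlier years.
6). Voice call: the person responsible is phoned directly when an alert fires.
7). MSN/QQ instant messaging: community-written programs speak the MSN or QQ protocol from the command line and send the message to MSN or QQ contacts.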

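6.1.2 Types of SMS alert

SMS is currently the most important channel for timely alerts.

1). Fetion: install a Fetion client and add the recipient's mobile number as a friend (the recipient must confirm); messages can then be sent to that phone.
2). Mail-to-SMS, for example 139, 163, 126 and 189 mailboxes.
3). HTTP SMS gateway (a per-message fee applies).
4). A purchased SMS modem.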

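6.1.3 Which alert method suits production?

In production, several alert methods are usually combined according to how urgent the business impact is. Issues that do not need immediate action, such as remaining memory or disk space, are normally sent by e-mail only. Important and urgent services use e-mail and SMS together: the e-mail records the fault in detail, while the SMS gives timely notice. SMS arrives promptly, whereas an e-mail alert goes unnoticed if nobody is at the computer.

The drawback of SMS is its limited length, so when a serious alert arrives at work we still open the mail system and read the details before starting emergency handling.

Advantages of an HTTP SMS gateway: (1) simple and easy to use; (2) stable and reliable; (3) reasonably priced.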

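6.1.4 Alert grading

Duty responsibilities of operations engineers:

Article 1. Classification of on-duty alerts (faults)
Class A: disk space, CPU, memory and similar alerts are ordinary alerts, handled within the operations team by the routine procedure.
Class B: a site domain that cannot be opened is a serious alert and requires coordinating the relevant engineering staff for joint diagnosis.

Article 2. The duty shift carries two mobile phones that receive the alerts
For a Class A alert there is in principle no time limit, but it must be handled promptly enough that service is not affected.
For a Class B alert SMS, the person on duty must notify all operations colleagues and the relevant engineers by e-mail within 10 minutes and resolve the problem.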

6.1.5 Alerting in practice

6.1.5.1 Basic e-mail alert configuration

First edit the contacts file with vi /usr/local/nagios/etc/objects/contacts.cfg +35 and change the e-mail address on line 35:

define contact{
        contact_name    nagiosadmin             ; Short name of user
        use             generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias           Nagios Admin            ; Full name of user
        email           972711021@qq.com        ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
        }
... (omitted) ...
define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 nagiosadmin
        }

# Every contact must belong to a contact group; here the contact nagiosadmin is a member of the admins group.

Then start the sendmail service:

[root@nagios-server ~]# /etc/init.d/sendmail start
Starting sendmail: [ OK ]
Starting sm-client: [ OK ]
[root@nagios-server ~]# lsof -i:25
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
sendmail 3402 root 4u IPv4 17047 0t0 TCP 127.0.0.1:smtp (LISTEN)

The default notification commands are defined in commands.cfg:

# 'notify-host-by-email' command definition
define command{
        command_name    notify-host-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
        }

# 'notify-service-by-email' command definition
define command{
        command_name    notify-service-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }

# These commands are already referenced by the generic-contact template, so they do not need to be added by hand.

The contact template for e-mail alerts in templates.cfg (configured by default; extra commands must be appended if Fetion, MSN and so on are added):

define contact{
        name                            generic-contact         ; The name of this contact template
        service_notification_period     24x7                    ; service notifications can be sent anytime
        host_notification_period        24x7                    ; host notifications can be sent anytime
        service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
        host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email ; send service notifications via email
        host_notification_commands      notify-host-by-email    ; send host notifications via email
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
        }

6.1.5.2 E-mail alerts in production (tuned command definitions)

Method 1:

# 'notify-host-by-email' command definition
define command{
        command_name    notify-host-by-email
#       command_line    /usr/bin/printf \$HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n\/bin/mail -s \
        command_line    /usr/bin/printf \$HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n\/bin/mail -s \
        }

# 'notify-service-by-email' command definition
define command{
        command_name    notify-service-by-email
#       command_line    /usr/bin/printf \Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\$HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **\
        command_line    /usr/bin/printf \Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\$SERVICESTATE$ \
        }

Method 2: send the alert e-mail through a PHP or Python program (examples are easy to find online).

6.1.5.3 Mail-to-SMS (omitted)

6.1.5.4 Tiered contacts for production alerting

# Mobile SMS users
define contact{
        contact_name    shouji_user1
        use             generic-contact
        alias           Nagios admin
        email           18301047710@139.com
        }

define contact{
        contact_name    shouji_user2
        use             generic-contact
        alias           Nagios admin
        email           13121191016@139.com
        }

# E-mail users with MSN accounts
define contact{
        contact_name    msn_user3
        use             generic-contact
        alias           Nagios Admin
        email           user3@qq.com
        address1        user3@163.com
        }

define contact{
        contact_name    msn_user4
        use             generic-contact
        alias           Nagios Admin
        email           user4@163.org
        address1        user4@qq.com
        }

# E-mail only users
define contact{
        contact_name    youjian_user5
        use             generic-contact
        alias           Nagios Admin
        email           user5@sina.com,user5@139.com
        }

define contact{
        contact_name    youjian_user6
        use             generic-contact
        alias           Nagios users
        email           user6@163.com
        }

... (omitted) ...

# Mobile group
define contactgroup{
        contactgroup_name       shoujiusers
        alias                   Nagios Administrators
        members                 shouji_user1,shouji_user2
        }

# E-mail and MSN account group
define contactgroup{
        contactgroup_name       msnusers
        alias                   Nagios Administrators
        members                 msn_user3,msn_user4
        }

# E-mail only group
define contactgroup{
        contactgroup_name       mailusers
        alias                   Nagios users
        members                 youjian_user5,youjian_user6
        }

6.2 Mobile SMS alerts

SMS alerts can be delivered through 139, 126 or 189 mailboxes (mail-to-SMS), Fetion, or an SMS gateway, and are generally reserved for urgent business alerts.

6.2.1 Configuring Fetion alerts

Fetion robot download: http://bbs.it-adv.net/viewthread.php?tid=1081&extra=page=1

6.2.1.1 Create the /usr/local/fetiion/lib directory, upload the fetion binary to /usr/local/fetiion with rz, and give it mode 755:

[root@nagios-server ~]# mkdir -p /usr/local/fetiion/lib
[root@nagios-server ~]# cd /usr/local/fetiion/
[root@nagios-server fetiion]# rz
[root@nagios-server fetiion]# chmod 755 fetion
[root@nagios-server fetiion]# cd

6.2.1.2 Upload linuxso_20101113.rar to /usr/local/fetiion/lib (it is unpacked in the next step):

[root@nagios-server ~]# cd /usr/local/fetiion/lib/
[root@nagios-server lib]# rz
[root@nagios-server lib]# ls
linuxso_20101113.rar
[root@nagios-server lib]# cd

6.2.1.3 Install the rar tools and unpack the archive:

[root@nagios-server ~]# wget http://www.rarsoft.com/rar/rarlinux-3.6.0.tar.gz
[root@nagios-server ~]# tar zxf rarlinux-3.6.0.tar.gz -C /usr/src/
[root@nagios-server ~]# cd /usr/src/rar/
[root@nagios-server rar]# ls
[root@nagios-server rar]# make
[root@nagios-server rar]# cd /usr/local/fetiion/lib/
[root@nagios-server lib]# unrar x linuxso_20101113.rar
-bash: /usr/local/bin/unrar: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

Fix:

[root@nagios-server lib]# yum -y install ld-linux.so.2
[root@nagios-server lib]# unrar x linuxso_20101113.rar
unrar: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory

Fix:

[root@nagios-server lib]# yum -y install libstdc++.so.6
[root@nagios-server lib]# unrar x linuxso_20101113.rar
[root@nagios-server lib]# ls
libACE-5.7.2.so libACE_SSL-5.7.2.so libcrypto.so.4 libssl.so.4 linuxso_20101113.rar
[root@nagios-server lib]# cd

6.2.1.4 Register the fetion libraries:

[root@nagios-server fetiion]# vi /etc/ld.so.conf.d/fetion.conf
/usr/local/fetiion/lib
[root@nagios-server ~]# ldconfig
[root@nagios-server fetiion]# ./fetion
./fetion: error while loading shared libraries: libgssapi_krb5.so.2: cannot open shared object file: No such file or directory

# One more library is missing; install it as the message indicates.

[root@nagios-server fetiion]# yum -y install libgssapi_krb5.so.2     ### this normally fails as shown below
Loaded plugins: fastestmirror, refresh-packagekit, security
Loading mirror speeds from cached hostfile
... (omitted) ...
Protected multilib versions: krb5-libs-1.10.3-15.el6_5.1.i686 != krb5-libs-1.10.3-10.el6_4.6.x86_64
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
[root@nagios-server fetiion]# cd

### yum cannot resolve this; install the package from the installation DVD instead.

[root@nagios-server ~]# mount /dev/cdrom /media/
[root@nagios-server ~]# rpm -ivh /media/Packages/krb5-libs-1.10.3-10.el6_4.6.i686.rpm     ### reports the dependencies krb5-libs needs
error: Failed dependencies:
        libcom_err.so.2 is needed by krb5-libs-1.10.3-10.el6_4.6.i686
        libkeyutils.so.1 is needed by krb5-libs-1.10.3-10.el6_4.6.i686
        libkeyutils.so.1(KEYUTILS_0.3) is needed by krb5-libs-1.10.3-10.el6_4.6.i686
        libselinux.so.1 is needed by krb5-libs-1.10.3-10.el6_4.6.i686

### Install the listed dependencies.

[root@nagios-server ~]# rpm -ivh /media/Packages/libcom_err-1.41.12-18.el6.i686.rpm
[root@nagios-server ~]# yum -y install libkeyutils.so.1
[root@nagios-server ~]# yum -y install libselinux.so.1

### The libgssapi_krb5.so.2 library can now be installed.

[root@nagios-server ~]# rpm -ivh /media/Packages/krb5-libs-1.10.3-10.el6_4.6.i686.rpm
[root@nagios-server ~]# /usr/local/fetiion/fetion
/usr/local/fetiion/fetion: error while loading shared libraries: libz.so.1: cannot open shared object file: No such file or directory

Another missing library; install it too:

[root@nagios-server ~]# yum -y install libz.so.1

Run /usr/local/fetiion/fetion again; if it starts without a library error, the installation succeeded.

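6.2.2 Manual test

Before testing, register Fetion accounts with the Fetion client on Windows and make the two accounts (mobile numbers) friends of each other. Then use one of the accounts as the sending robot:

/usr/local/fetiion/fetion --mobile=18301044710 --pwd=000000 --to=13121191016 --msg-type=1 --msg-utf8="test"

Note: the first send asks for a verification code. Download the captcha image from /usr/local/fetiion to Windows (sz -y <file name>), view it and enter the code. This is needed only once; later sends do not prompt again.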

6.2.3 Configuring Nagios to alert through Fetion

6.2.3.1 Edit commands.cfg

[root@nagios-server ~]# vi /usr/local/nagios/etc/objects/commands.cfg +37
define command{
        command_name    notify-host-by-fetion
        command_line    /usr/local/fetiion/fetion --mobile=18301047710 --pwd=000000 --to=13121191016 --msg-type=1 --msg-utf8=\
        }

define command{
        command_name    notify-service-by-fetion
        command_line    /usr/local/fetiion/fetion --mobile=18301047710 --pwd=000000 --to=13121191016 --msg-type=1 --msg-utf8=\$HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$ **\
        }

6.2.3.2 Edit templates.cfg

define contact{
        name                            generic-contact
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r,f,s
        host_notification_options       d,u,r,f,s
        service_notification_commands   notify-service-by-email,notify-service-by-fetion
        host_notification_commands      notify-host-by-email,notify-host-by-fetion
        register                        0
        }

6.2.3.3 Restart the nagios service

[root@nagios-server ~]# service nagios restart

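6.2.4 HTTP SMS gateway alerts

With an HTTP SMS gateway, an alert is typically a URL that carries the message. Advantages: alerts arrive promptly and the service is backed by the provider. After buying the SMS service you get, besides a web interface for sending messages (used mostly by marketing staff), a URL to which the account, password and alert text are appended. The URL looks like this:

http://s.ccme.cc/qxt/send.jsp?circle=yangsheng&pwd=000000&mobile=$CONTACT&service=abcd546-eee6-gg69-gg40-3gg0524c1f88d&msgid=23224&message=$TITLE[${alert_date} sa]

The fields in this URL are:

Field            Value
User name        yangsheng
Password         000000
Target mobile    $CONTACT (script variable)
Message text     $TITLE[${alert_date} sa]

Example of actually sending a message:

# curl
curl -d cdkey=5ADF-EFA-2356-DEHER -d password=yangsheng -d phone=$CONTACT -d message="$TITLE[${ALERT_DATE} sa]" http://s.ccme.cc/smsproxy/sendsms.action

# wget
wget --quiet "…dfgc1f4567&msgid=213478&message=$TITLE[${ALERT_DATE} sa]"

Note: the account and password shown here are not usable.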

6.2.4.1 SMS gateway alerting in detail

1). Write the alert script sms_send, place it under libexec and give it mode 755:

#!/bin/bash
# This is the SMS send script
ALERT_DATE=$(date +%y-%m-%d)
PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`

print_usage() {
    echo "Usage: $PROGNAME TITLE CONTACT"
    exit 1
}

if [ $# -ne 2 ]; then
    print_usage
fi

# FORMAT: TITLE CONTACT
TITLE=$1
CONTACT=$2

# send_message method 1
curl -d cdkey=5ADF-EFA-2356-DEHER -d password=yangsheng -d phone=$CONTACT -d message="$TITLE[${ALERT_DATE} sa]" http://s.ccme.cc/smsproxy/sendsms.action

## send_message method 2
#wget --quiet --spider "…dfgc1f4567&msgid=213478&message=$TITLE[${ALERT_DATE} sa]"

Upload sms_send:

[root@nagios-server ~]# cd /usr/local/nagios/libexec/
[root@nagios-server libexec]# pwd
/usr/local/nagios/libexec
[root@nagios-server libexec]# rz
rz waiting to receive.
Starting zmodem transfer. Press Ctrl+C to cancel.
Transferring sms_send...
100% 864 bytes 864 bytes/sec 00:00:01 0 Errors
[root@nagios-server libexec]# ll sms_send
-rw-r--r-- 1 root root 864 May 9 2012 sms_send
[root@nagios-server libexec]# chmod 755 sms_send

2). Define the alert commands in commands.cfg:

# 'notify-host-by-pager' command definition
define command{
        command_name    notify-host-by-pager
        command_line    $USER1$/sms_send \
        }

# 'notify-service-by-pager' command definition
define command{
        command_name    notify-service-by-pager
        command_line    $USER1$/sms_send \
        }

3). Add the following to the template in templates.cfg:

[root@nagios-server ~]# vi /usr/local/nagios/etc/objects/templates.cfg
define contact{
        name                            generic-contact
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r,f,s
        host_notification_options       d,u,r,f,s
        service_notification_commands   notify-service-by-email,notify-service-by-fetion,notify-service-by-pager
        host_notification_commands      notify-host-by-email,notify-host-by-fetion,notify-host-by-pager
        register                        0
        }

4). Define the contact in contacts.cfg:

[root@nagios-server ~]# vi /usr/local/nagios/etc/objects/contacts.cfg
define contact{
        contact_name    yangsheng-pager
        use             generic-contact
        alias           Nagios users
        pager           13121191016
        }

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 nagiosadmin,yangsheng-pager
        }

SMS gateway alerting is now in place.

Nagios supports other alert methods as well, such as sound and voice alerts; they are not listed here.
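The gateway message in sms_send has the form TITLE[DATE sa]. Because a title can contain spaces and punctuation, it is usually safer to let curl encode the form fields. The following is a minimal sketch under stated assumptions: the function names build_sms_message and send_sms are hypothetical, the gateway URL is supplied by the caller, and the field names phone and message merely mirror the example above and may differ for a real provider. --data-urlencode is a standard curl option.

```shell
#!/bin/sh
# Build the message body "TITLE[DATE sa]" used by the sms_send example.
# ALERT_DATE may be preset by the caller; otherwise today's date is used.
build_sms_message() {
    title=$1
    stamp=${ALERT_DATE:-$(date +%y-%m-%d)}
    printf '%s[%s sa]\n' "$title" "$stamp"
}

# Hypothetical sender: $1 = gateway URL, $2 = phone number, $3 = title.
# Not called here; it needs network access and a real gateway account.
send_sms() {
    curl --silent --show-error \
         --data-urlencode "phone=$2" \
         --data-urlencode "message=$(build_sms_message "$3")" \
         "$1"
}
```

Keeping the message builder separate from the sender makes it possible to check the text format without contacting the gateway.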

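Part 7  Nagios Plugin Development

7.1 What is a Nagios plugin?

When the Nagios service was deployed earlier we installed the Nagios-plugins package. It is a bundle of Nagios plugins, which can be listed with ls -l /usr/local/nagios/libexec/. Nagios itself is only a monitoring platform: to monitor the state and data of specific hosts and services it has to be configured to call plugins or other programs. Without plugins Nagios is an empty shell that can do nothing.

7.2 Why develop Nagios plugins?

If the plugin package is already installed, why write more plugins?

Most services commonly found in production can be monitored without writing anything: bundled plugins such as check_http, check_tcp and check_nrpe are already very capable. Some things we want to monitor, however, have no bundled plugin, for example the VIP on the lo interface of an LVS real server, the state of NFS, system metrics from iostat, mem and sar, or application-level data such as MQ queues. In those cases a plugin can be downloaded from the Internet or written in-house.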

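7.3 Rules for writing Nagios plugins

7.3.1 Overview

A Nagios plugin is a program component that extends Nagios. Plugins can be written in Java, C/C++, PHP and many other languages. Operations staff or system architects only need to adjust the Nagios configuration files and the relevant parameters to integrate a plugin and monitor the target system.

A plugin returns two things: its exit status code and the first line it prints to the console. The Nagios core uses the exit status to decide the state of the monitored service, and shows the first output line on the web interface as supplementary information about that state.

Each time Nagios checks the state of a service it spawns a child process and uses the output and exit code of that command to determine the state. The status codes recognised by the Nagios core are:

OK: exit code 0, the service is working normally;
WARNING: exit code 1, the service is in a warning state;
CRITICAL: exit code 2, the service is in a critical state;
UNKNOWN: exit code 3, the state of the service is unknown.

The last state usually means that the plugin could not determine the state of the service, for example because of an internal error. The codes can be seen in the following file:

[root@nagios-server libexec]# cat utils.sh
#! /bin/sh
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

Note: the last one (STATE_DEPENDENT) is rarely used.

7.3.2 How a Nagios plugin works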

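A plugin runs the sequence of operations required by the monitored service, analyses the result against predefined rules, determines the current state of the service, exits with the corresponding status code and prints a one-line description of that state to the console. The exit calls in different languages are:

Java      System.exit(int status)
PHP       exit(status)
Python    sys.exit(status)
C/C++     return status (from main)
Bash      exit status

Examples of console output calls:

Java      System.out.println(String msg)
PHP       echo msg
Python    print(msg)
C/C++     printf("%s", msg)
Bash      echo msg

7.4 Languages for plugin development

Nagios does not restrict the language a plugin is written in. Any program that Nagios can call to obtain the service data will do; in general, anything that produces its result on the command line is suitable. Commonly used languages include shell, Perl, Python, PHP and C.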
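The convention described in 7.3 (one status line on standard output plus an exit code of 0 to 3) can be sketched in shell as follows. The function check_threshold and its integer thresholds are hypothetical and serve only to illustrate the contract; they are not taken from any bundled plugin.

```shell
#!/bin/sh
# Exit codes understood by Nagios (same values as in utils.sh).
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# Hypothetical check: compare an integer reading against two thresholds.
# Prints one status line and returns the matching state code.
check_threshold() {
    value=$1
    warn=$2
    crit=$3
    case "$value" in
        ''|*[!0-9]*)
            echo "UNKNOWN - reading '$value' is not an integer"
            return $STATE_UNKNOWN ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - reading is $value"
        return $STATE_CRITICAL
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - reading is $value"
        return $STATE_WARNING
    fi
    echo "OK - reading is $value"
    return $STATE_OK
}
```

A real plugin script would obtain the reading itself and end with something like check_threshold "$reading" 80 90; exit $? so that the function's return value becomes the exit code of the process.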

7.5 Writing Nagios plugins in shell

7.5.1 A plugin that checks a web URL

7.5.1.1 Testing a few variables before writing the plugin

A. The test script

[root@nagios-server libexec]# vi test.sh
echo $0
PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
echo $PROGNAME
echo $PROGPATH

B. Run test.sh

[root@nagios-server libexec]# sh /usr/local/nagios/libexec/test.sh
/usr/local/nagios/libexec/test.sh
test.sh
/usr/local/nagios/libexec

7.5.1.2 The web URL plugin script

[root@nagios-server libexec]# vi check_url.sh
#!/bin/bash
#################################
# this script function is       #
# check_url                     #
# create by yangsheng 2014.8.19 #
#################################
PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
# utils.sh defines STATE_OK, STATE_CRITICAL and the other state codes
. $PROGPATH/utils.sh

print_usage() {
    echo "Usage:"
    echo "/bin/sh $0 url"
    echo "For example:"
    echo "/bin/sh $0 http://www.wozhongla.com"
    exit 1
}

if [ $# -ne 1 ]; then
    print_usage
fi

# --spider checks that the URL is reachable without downloading it
if wget -T 20 --spider "$1" >/dev/null 2>&1; then
    echo 'HTTP/1.1 Ok'
    exit $STATE_OK
else
    echo 'HTTP/1.1 Failed'
    exit $STATE_CRITICAL
fi

7.5.1.3 Testing the web URL plugin by hand

[root@nagios-server libexec]# sh /usr/local/nagios/libexec/check_url.sh
Usage:
/bin/sh /usr/local/nagios/libexec/check_url.sh url
For example:
/bin/sh /usr/local/nagios/libexec/check_url.sh http://www.wozhongla.com
[root@nagios-server libexec]# sh /usr/local/nagios/libexec/check_url.sh www.http://www.wodefanwen.com/
HTTP/1.1 Ok
[root@nagios-server libexec]# sh /usr/local/nagios/libexec/check_url.sh www.wozhongla.com
HTTP/1.1 Ok

7.5.2 Deploying the web URL plugin (active check)

7.5.2.1 Make check_url.sh executable

[root@nagios-server ~]# cd /usr/local/nagios/libexec/
[root@nagios-server libexec]# chmod +x check_url.sh

7.5.2.2 Edit the configuration on the Nagios server

1). Edit commands.cfg

[root@nagios-server ~]# vi /usr/local/nagios/etc/objects/commands.cfg
# 'check_url' command definition by yangsheng 2014.8.19
define command {
        command_name    check_url
        command_line    $USER1$/check_url.sh $ARG1$
        }

2). Edit services.cfg

[root@nagios-server ~]# vi /usr/local/nagios/etc/objects/services.cfg
define service {
        use                     generic-service
        host_name               2-LAMP-server
        service_description     web_url-001
        check_command           check_url!www.wozhongla.com
        max_check_attempts      8
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   1440
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        process_perf_data       1
        action_url              /pnp4nagios/index.php?host=$HOSTNAME$
        }

3). Reload the Nagios service

[root@nagios-server ~]# service nagios reload

4). Refresh the Nagios web page; the new service web_url-001 should now be listed.

7.5.3 Checking the MySQL service

7.5.3.1 Monitoring MySQL through NRPE with the check_mysql plugin (the "passive" method)

A. On the Nagios client (the monitored MySQL server)

1. Try out the check_mysql plugin

1.1 Log in to MySQL on the monitored server and create a test account for monitoring:

[root@test1 ~]# mysql -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant select on test.* to passviemonitor@'localhost' identified by 'www.http://m.wodefanwen.com/';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> select user,host from mysql.user;
+----------------+-----------+
| user           | host      |
+----------------+-----------+
| root           | 127.0.0.1 |
|                | localhost |
| passviemonitor | localhost |
| root           | localhost |
|                | test1     |
| root           | test1     |
+----------------+-----------+
6 rows in set (0.00 sec)

mysql> quit;
Bye

1.2 Change to the Nagios libexec plugin directory and test the check_mysql plugin:

[root@test1 ~]# cd /usr/local/nagios/libexec/
[root@test1 libexec]# /usr/local/nagios/libexec/check_mysql -upassviemonitor -p www.http://m.wodefanwen.com/ -s /var/lib/mysql/mysql.sock
Uptime: 1222 Threads: 1 Questions: 9 Slow queries: 0 Opens: 15 Flush tables: 1 Open tables: 8 Queries per second avg: 0.7|Connections=5c;;; Open_files=16;;; Open_tables=8;;; Qcache_free_memory=0;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=0c;;; Qcache_queries_in_cache=0;;; Queries=9c;;; Questions=9c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=1222c;;;

1.3 Edit nrpe.cfg and append the following at the end:

[root@test1 libexec]# vi /usr/local/nagios/etc/nrpe.cfg +229
command[check_mysql]=/usr/local/nagios/libexec/check_mysql -upassviemonitor -p www.http://m.wodefanwen.com/ -s /var/lib/mysql/mysql.sock -H localhost

1.4 Restart the nrpe service:

[root@test1 libexec]# pkill nrpe
[root@test1 libexec]# ps -ef | grep nrpe
root 2855 1677 0 14:51 pts/0 00:00:00 grep nrpe
[root@test1 libexec]# netstat -lnt | grep 5666
[root@test1 libexec]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@test1 libexec]# netstat -lnt | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
tcp 0 0 :::5666 :::* LISTEN

B. On the Nagios server

1.1 Edit services.cfg and add the MySQL service check.

Note: the MySQL server is a host that has already been added to hosts.cfg, so hosts.cfg needs no change; a new machine would also need an entry there.

[root@nagios-server ~]# vi /usr/local/nagios/etc/objects/services.cfg
define service {
        use                     generic-service
        host_name               2-LAMP-server
        service_description     mysql server
        check_command           check_nrpe!check_mysql
        max_check_attempts      8
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   1440
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        process_perf_data       1
        action_url              /pnp4nagios/index.php?host=$HOSTNAME$
        }

1.2 Reload the Nagios service; the new mysql server service should then appear in the web interface:

[root@nagios-server ~]# service nagios reload

7.5.3.2 Monitoring MySQL actively with the check_mysql plugin

Note: because this is an active check, everything except the grant in step 1 is done on the Nagios server.

1. Grant access on the MySQL server:

mysql> grant select on test.* to passviemonitor@'192.168.1.1' identified by 'www.http://m.wodefanwen.com/';

2. Try out the check_mysql command first:

[root@nagios-server ~]# cd /usr/local/nagios/libexec/
[root@nagios-server libexec]# ls check_mysql*
check_mysql check_mysql_query
[root@nagios-server libexec]# /usr/local/nagios/libexec/check_mysql -upassviemonitor -p www.http://m.wodefanwen.com/ -s /var/lib/mysql/mysql.sock -H 192.168.1.2
Uptime: 4230 Threads: 1 Questions: 33 Slow queries: 0 Opens: 15 Flush tables: 1 Open tables: 8 Queries per second avg: 0.7|Connections=16c;;; Open_files=16;;; Open_tables=8;;; Qcache_free_memory=0;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=0c;;; Qcache_queries_in_cache=0;;; Queries=33c;;; Questions=33c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=4230c;;;

Note: check_mysql -h prints the help text.

3. Once the command works, add it to commands.cfg:

define command {
        command_name    check_mysql
        command_line    $USER1$/check_mysql -u$ARG1$ -p $ARG2$ -s $ARG3$ -H $HOSTADDRESS$
        }

4. Edit services.cfg and add the following; the arguments after check_mysql are separated by ! and fill $ARG1$, $ARG2$ and $ARG3$ in order:

define service {
        use                     generic-service
        host_name               2-LAMP-server
        service_description     mysql-server
        check_command           check_mysql!passviemonitor!www.http://m.wodefanwen.com/!/var/lib/mysql/mysql.sock
        max_check_attempts      8
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   1440
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        process_perf_data       1
        action_url              /pnp4nagios/index.php?host=$HOSTNAME$
        }
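The check_command values used throughout this document follow the pattern name!arg1!arg2, where each !-separated piece is substituted for $ARG1$, $ARG2$ and so on in the command_line of the named command. The shell sketch below only illustrates that mapping; split_check_command is a hypothetical helper rather than a Nagios tool, and it deliberately ignores escaped separators.

```shell
#!/bin/sh
# Illustrative only: show how "name!arg1!arg2" maps to $ARG1$, $ARG2$, ...
# Escaped separators are not handled in this sketch.
split_check_command() {
    rest=$1
    name=${rest%%!*}
    echo "command_name=$name"
    case "$rest" in
        *!*) rest=${rest#*!} ;;
        *)   return 0 ;;
    esac
    i=1
    while :; do
        case "$rest" in
            *!*) arg=${rest%%!*}
                 echo "ARG$i=$arg"
                 rest=${rest#*!} ;;
            *)   echo "ARG$i=$rest"
                 break ;;
        esac
        i=$((i + 1))
    done
}
```

Calling split_check_command 'check_nrpe!check_mysql' would print the command name check_nrpe and a single ARG1 of check_mysql, which is why the NRPE variant needs only one argument while the direct check_mysql definition needs three.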

Part 8  Monitoring Windows Servers

8.1 Installing NSClient++

Download: http://nsclient.org/nscp/downloads

8.1.1 Install NSClient++ on the Windows client

Note: port 5666 is the NRPE listening port and port 12489 is the check_nt listening port.

8.2 Configuration on the Nagios server

8.2.1 Test the check_nt command on the Nagios server

Disk usage:

[root@nagios-server libexec]# ./check_nt -H 192.168.1.4 -v USEDDISKSPACE -p 12489 -s 123456 -w 80 -c 80 -l C
C:\ - total: 39.90 Gb - used: 8.00 Gb (20%) - free 31.90 Gb (80%) | 'C:\ Used Space'=8.00Gb;31.92;31.92;0.00;39.90

Memory usage:

[root@nagios-server libexec]# ./check_nt -H 192.168.1.4 -v MEMUSE -p 12489 -s 123456 -w 60 -c 80
Memory usage: total:1279.50 MB - used: 390.82 MB (31%) - free: 888.68 MB (69%) | 'Memory usage'=390.82MB;767.70;1023.60;0.00;1279.50
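In the check_nt output above, everything after the | character is performance data in the form label=value;warn;crit;min;max, which is what PNP4Nagios graphs. A minimal shell sketch for pulling those fields out of a single-metric line is shown below; the helper names perfdata_of and perfdata_field are hypothetical, and the sketch assumes the label itself contains no = character.

```shell
#!/bin/sh
# Print the performance-data part of a plugin output line (text after "|").
# Prints nothing when the line carries no performance data.
perfdata_of() {
    printf '%s\n' "$1" | sed -n 's/^[^|]*| *//p'
}

# Print one field of a single metric "label=value;warn;crit;min;max".
# $1 = metric, $2 = field number after the "=" (1=value, 2=warn, 3=crit, ...).
perfdata_field() {
    printf '%s\n' "${1#*=}" | cut -d';' -f"$2"
}
```

For the memory check above, fields 2 and 3 are 767.70 and 1023.60, that is, 60% and 80% of the 1279.50 MB total, matching the -w 60 -c 80 thresholds.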

8.2.2 Edit windows.cfg on the Nagios server

[root@nagios-server ~]# vi /usr/local/nagios/etc/objects/windows.cfg
define host{
        use             windows-server  ; Inherit default values from a template
        host_name       4-winserver     ; The name we're giving to this host
        alias           4-winserver     ; A longer name associated with the host
        address         192.168.1.4     ; IP address of the host
        }

define hostgroup{
        hostgroup_name  windows-servers ; The name of the hostgroup
        alias           4-winserver     ; Long name of the group
        }

define service{
        use                     generic-service
        host_name               4-winserver
        service_description     NSClient++ Version
        check_command           check_nt!CLIENTVERSION -s 123456
        }

define service{
        use                     generic-service
        host_name               4-winserver
        service_description     Uptime
        check_command           check_nt!UPTIME -s 123456
        }

define service{
        use                     generic-service
        host_name               4-winserver
        service_description     CPU Load
        check_command           check_nt!CPULOAD!-l 5,80,90 -s 123456
        }

define service{
        use                     generic-service
        host_name               4-winserver
        service_description     Memory Usage
        check_command           check_nt!MEMUSE!-w 80 -c 90 -s 123456
        }

define service{
        use                     generic-service
        host_name               4-winserver
        service_description     C:\ Drive Space
        check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90 -s 123456
        }

define service{
        use                     generic-service
        host_name               4-winserver
        service_description     W3SVC
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l W3SVC -s 123456
        }

define service{
        use                     generic-service
        host_name               4-winserver
        service_description     Explorer
        check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe -s 123456
        }

8.2.3 Edit commands.cfg on the Nagios server

define command {
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s PASSWORD -v $ARG1$ $ARG2$
        }

8.2.4 Edit nagios.cfg on the Nagios server

[root@nagios-server ~]# vi /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/windows.cfg

8.2.5 Reload the Nagios service

[root@nagios-server ~]# service nagios reload
Running configuration check...
... (omitted) ...
Website: http://www.nagios.org
Reading configuration data...
Read main config file okay...
Warning: Duplicate definition found for command 'check_nt' (config file '/usr/local/nagios/etc/objects/commands.cfg', starting on line 272)
Error: Could not add object property in file '/usr/local/nagios/etc/objects/commands.cfg' on line 273.
Error processing object config files!
... (omitted) ...

Fix: comment out the stock check_nt definition in commands.cfg so that only the definition added in 8.2.3 remains:

#define command{
#        command_name    check_nt
#        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
#        }

8.2.6 Reload the service again; the Windows host and its services should now appear in the web interface.

Note: the W3SVC service is not running on this host, so its state is shown as a warning. There are many other ways to monitor Windows hosts; they are not covered in detail here.

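Summary: Nagios is a powerful monitoring tool. Anything that can be obtained on the command line can be turned into a Nagios plugin with a script and monitored. For server hardware, each vendor has its own tooling, for example MegaCli for Dell servers, or ipmitool, which Taobao uses.

Links on hardware monitoring:

MegaCli: http://dreamway.blog.51cto.com/1281816/1045604
ipmitool: http://linux.chinaunix.net/techdoc/system/2008/02/05/978119.shtml

Many other tools for monitoring server hardware exist.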
