Information Systems:Zabbix NMS

Revision as of 14:55, 3 May 2021

Overview

Zabbix is an open source software package that is installed on a Linux operating system or used as a pre-built virtual machine appliance. Its purpose is to monitor and display statistics, primarily over SNMP, from network switches or any other network-attached hardware that speaks SNMP. At uniPHARM, we use Zabbix to gain visibility into what is happening on our network switch infrastructure. Zabbix can query each port on each network switch and read all of the available statistics, such as packets in/out, ping time, bandwidth usage and a hundred other metrics. Zabbix can display all those ports, functions and numbers as graphs so that humans can see if there is a bottleneck or some other problem. Zabbix also has some built-in logic for deciding what counts as a problem. For example, if a port on a switch has a large number of errors, a port's bandwidth is pegged at maximum, or a port is unplugged, it will generate an alert on the Zabbix dashboard and wait for somebody to act on it. Unfortunately, Zabbix is not able to make configuration changes on the switches directly, so keep in mind that Zabbix is a "network monitoring server" and not a "network management server".

  • http://zabbix.unipharm.local/zabbix
  • Username is Admin
  • Password is zabbix
  • As of July 19 2019, AD logins are working but the above credentials are superior to an AD login. The Linux appliance username is "appliance".

The design of the Zabbix web interface does take a little getting used to. When you are not making configuration changes to how Zabbix queries network-attached hardware, stick to the "Monitoring" tab in the top left corner of the site. Within Monitoring you will be able to see one or more dashboards and a dedicated page for any listed Problems. You can also go to the Graphs tab and look at all the possible graphs by changing the group, host and graph drop-down boxes. The Graphs tab is really the prime rib of the Zabbix application because the stat data can be presented in graphs that cover anything from the last 2 hours to the last 2 years, so trend lines can be visualised easily.

The "Screens" tab is also extremely useful because it contains pre-built ... screens that show groups of graphs that are relevant together instead of individually. For example, one of the screens shows network traffic in/out plus errors on all of the wireless access points at uniPHARM. This shows which APs have the most activity and what time of day that activity happens. The "Maps" tab shows a Visio-style graphical layout of the network. Obviously it's not as good as Visio, but for an OSS web-based mapper, it's pretty good. Try to keep these diagrams current because they will be helpful should I.S. staff get hit by a bus at any time.

The object organization in Zabbix is also a little tricky to understand. There is a hierarchy which starts with "Host Groups" in the "Configuration" tab. A Host Group can contain one or more Hosts plus one or more Templates. A template in Zabbix is ... a way of interpreting SNMP data. A template that correctly displays SNMP data from a Cisco switch is going to be different from a template that reads Juniper switch SNMP data. Luckily, most of the major hardware brands are included in Zabbix and there are also a bunch of generic templates that can be used. Once you have created a Host Group and assigned a Template, you can create a Host. This is the part where Zabbix fails to be convenient. Its automatic network discovery function doesn't work, so you have to create each network object manually. Yes, this is very time consuming. In the Host creation step you can use either SNMP or the Zabbix agent to query stats. The agent is a small program that can be installed on a physical server or a VM, but SNMP is best used for anything that's hardware. Zabbix will start to query the newly created host right away and, depending on what it is and what template was applied, there may be many items, graphs, apps and triggers created that all work together to present as much information as possible.
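
Because every host has to be created by hand, one possible shortcut is to script host creation through the Zabbix JSON-RPC API instead of the web form. The sketch below only builds the request body for the API's host.create method and mirrors the Host Group / Template / Host hierarchy described above. It is not something this page's author used. The host name, IP address, group ID, template ID and auth token are all hypothetical placeholders, and the "details" block for SNMP interfaces is an assumption that applies to Zabbix 5.0 and later, so check the API documentation for the installed version before relying on it.

```python
import json


def build_host_create_request(host_name, ip, group_id, template_id, auth_token):
    """Build a JSON-RPC 2.0 request body for the Zabbix API method host.create.

    All argument values are supplied by the caller; nothing is sent over the network.
    """
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": host_name,
            "interfaces": [
                {
                    "type": 2,        # 2 = SNMP interface (1 would be the Zabbix agent)
                    "main": 1,        # default interface of this type
                    "useip": 1,       # connect by IP rather than DNS name
                    "ip": ip,
                    "dns": "",
                    "port": "161",    # standard SNMP port
                    # Assumption: required for SNMP interfaces on Zabbix 5.0+
                    "details": {"version": 2, "community": "{$SNMP_COMMUNITY}"},
                }
            ],
            "groups": [{"groupid": group_id}],            # the Host Group
            "templates": [{"templateid": template_id}],   # the Template to link
        },
        "auth": auth_token,
        "id": 1,
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    request = build_host_create_request("switch-01", "192.0.2.10", "15", "10001", "<token>")
    print(json.dumps(request, indent=2))
```

The resulting JSON would be POSTed to the frontend's api_jsonrpc.php endpoint. Looping over a list of switches with a function like this trades the manual clicking for a one-time lookup of the group and template IDs.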

Zabbix has been configured to send out email alerts when certain conditions are present. For example, if a switch or access point is not pingable, then Zabbix will create a problem ticket and send an email. The built-in templates also allow much more fine-grained alerts, but I have tried to group things together logically. The alert configuration is in the Configuration/Actions tab. Here alerts can be turned off or on, or edited. It's important to understand that if, for example, a switch or server loses power and then comes back up, the problem ticket that Zabbix generates will be closed by the automated resolution ticket because the state of the hardware returned to its normal baseline. If Zabbix is alerting on something that is a false positive, like a wonky fan tachometer or a bad temperature sensor, then the "Trigger Item" in the Host configuration needs to be disabled. That will mute the alert even if it is included in an alert package.

As mentioned above, network discovery in Zabbix, at least for switches and printers, doesn't work in an obvious way - it might, but I couldn't figure it out. Discovery of the VMware infrastructure does indeed work, and in fact it works a little too well. The Host groups get a little untidy with how Zabbix displays virtual machines because it shows them by cluster and then by host, so individual guests show up twice. Anywho, as of July 2019, the config of how Zabbix displays and tracks statistics from the VMHosts and from the guests is correct. There are some yummy graphs in the Screens section and those screens should show anyone all the info they would ever need to see what's happening on the infrastructure.

Configuration

Linux and Windows Agent

The Zabbix agent can be installed on Linux servers for resource monitoring. Our current Linux server VMs are Debian-based, so the zabbix-agent package is available in the apt repository. The configuration file is located in /etc/zabbix and has to be changed to point to the Zabbix server.
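
As a rough illustration of the "point it to the Zabbix server" step, the sketch below rewrites the text of an agent configuration file so that its Server and ServerActive parameters name the Zabbix server. It assumes the file is /etc/zabbix/zabbix_agentd.conf (the page only gives the directory), that those two parameters are the ones that need changing, and that the server address is the host from the frontend URL above. The function name is made up for this example.

```python
import re


def point_agent_at_server(conf_text, server):
    """Return conf_text with Server and ServerActive set to `server`.

    Uncommented lines for each key are replaced in place; if a key has no
    uncommented line, a new one is appended. Commented examples are left alone.
    """
    for key in ("Server", "ServerActive"):
        # [ \t]* instead of \s* so the match never spans a line break.
        pattern = re.compile(rf"^[ \t]*{key}[ \t]*=.*$", re.MULTILINE)
        line = f"{key}={server}"
        if pattern.search(conf_text):
            conf_text = pattern.sub(lambda _match: line, conf_text)
        else:
            conf_text = conf_text.rstrip("\n") + "\n" + line + "\n"
    return conf_text


if __name__ == "__main__":
    # Assumed path; run with enough privileges to read the file.
    path = "/etc/zabbix/zabbix_agentd.conf"
    with open(path) as handle:
        print(point_agent_at_server(handle.read(), "zabbix.unipharm.local"))
```

The script only prints the proposed result, so the change can be reviewed before it is written back. The agent service would need a restart afterwards to pick up the new settings.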

Zabbix also offers an agent for Windows. The server-side template for it, however, is inferior: it is a community-shared template and some parts are not in English.

Additional Notes

Documentation to-do

  • Zabbix configuration - /etc/zabbix/, number of pollers, unreachable pollers, SNMP monitors
  • Templates - shared templates from community, difference between Zabbix templates and vendor MIBs
  • Triggers
  • Graphs
  • Screens

Zabbix Upgrade - March 2021

Zabbix was upgraded from 4.2 to 5.2 (the latest version at the time). This was its first ever upgrade - norwizzle. Some notes from the upgrade:

  • The docs provide a sufficient guide, although Linux experience is required since Zabbix is a Linux-based appliance.
  • The underlying OS of our appliance is Ubuntu. There are other Zabbix flavors.
  • Zabbix 5.2 required PHP 7.2. Ubuntu was at version 16.04, which was too old and therefore the repository did not have an upgrade to PHP 7.2 available.
  • Ubuntu had to be upgraded from 16.04 to 18.04. A general Ubuntu upgrade guide was followed. Caution should always be taken when performing a major OS upgrade. On the other hand, Zabbix was not being used much by anyone, so YOLO.
  • And then... Zabbix did not start after the post-upgrade system restart. It turned out the upgrade overwrote zabbix_server.conf (I was cavalier during the debconf/apt diff process) and the PHP conf in web/, and the DB password was now blank. The upgrade did ask for confirmation regarding this change, but looking at the diff, the removal of the DB password line seemed reasonable. It wasn't reasonable. A new password was set using mysql -u root then SET PASSWORD FOR 'zabbix'@'localhost' = '<hidden>' etc. Zabbix booted after that.
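
A small sanity check run before restarting could catch the blank DB password described in the last note. The sketch below is a minimal example, not part of the original upgrade: it assumes the server config lives at /etc/zabbix/zabbix_server.conf and that the relevant parameter is DBPassword. The function name is made up for this example.

```python
def db_password_is_set(conf_text):
    """Return True if an uncommented, non-empty DBPassword line is present."""
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip comments and anything that is not a key=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "DBPassword" and value.strip():
            return True
    return False


if __name__ == "__main__":
    # Assumed path; reading it usually needs root or zabbix group access.
    with open("/etc/zabbix/zabbix_server.conf") as handle:
        if not db_password_is_set(handle.read()):
            print("WARNING: DBPassword is missing or blank in zabbix_server.conf")
```

Running a check like this after accepting a package-maintainer config during an upgrade, and keeping a copy of the old file, would have pointed at the cause before the failed start.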

Original notes

Darren originally wrote this page. Some stuff has since been addressed, so his comments - as well as old sections - will be moved here for reference.
  • Updating Zabbix. I don't know how to do this, so either leave the Linux appliance as is, which is safe, or do some advanced command line magic to do in-place upgrades of the Zabbix ... files. Oh, and there is some sort of sharing page on the public Zabbix site where new or customized templates are available, which may be helpful in the future if we are using hardware that isn't included by default. A lot of hours were put into getting Zabbix usable, so please take care of it. At the very least, take a snapshot before doing anything crazy so that you can revert.

Things Still To Do

  • Zabbix does apparently speak WMI to Windows machines but I could not figure out how to do this because the documentation is not great. It would be useful for Zabbix to get WMI info.
  • Better map icons. The default icons that are available in Zabbix are very limited and don't make creating maps as easy as it could be. There is a way to import more icons/shapes but I wasn't able to get that far.
  • External monitoring - this one appears to be super complicated but there may be potential for Zabbix to replace the 24x7 Zoho service we use. Of course the downside of that is that because Zabbix is a VM it is internal and dependent on everything else working to be able to properly monitor external sites. The extended thought is that Zabbix as a "thing" won't be able to alert on anything if either the VM hangs or the host it is on hangs or is otherwise down itself. Funny how we didn't build in that redundancy to the VMware environment.