Information Systems:Zabbix NMS

From uniWIKI
[[Category: I.T. Projects and Ideas]]
 
[[Category: Power, Alarms, and Monitoring]]

Revision as of 11:44, 17 July 2019

Zabbix is an open source software package that is either installed on a Linux operating system or deployed as a pre-built virtual machine appliance. Its purpose is to collect and display monitoring statistics, primarily SNMP data from network switches and any other network-attached hardware that speaks SNMP. At uniPHARM, we use Zabbix to gain visibility into what is happening on our network switch infrastructure. Zabbix can query each port on each network switch and read all of the available statistics, such as packets in/out, ping time, bandwidth usage and a hundred other metrics. It can display all of those ports, functions and numbers as graphs so that a person can spot a bottleneck or some other problem. Zabbix also has built-in logic (triggers) that decides what it considers to be a problem. For example, if a port on a switch has a large number of errors, if a port's bandwidth is pegged at its maximum, or even if a port is unplugged, Zabbix will generate an alert on its dashboard and wait for somebody to act on it. Unfortunately, Zabbix is not able to make configuration changes on the switches directly, so keep in mind that Zabbix is a "network monitoring server" and not a "network management server".
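
In Zabbix itself, bandwidth graphs and problem alerts are configured through items, item preprocessing (for example "Change per second") and trigger expressions rather than written as code. The Python sketch below only illustrates the arithmetic behind the three example problems above: a port that is unplugged, a port accumulating errors, and a port whose bandwidth is pegged. It assumes the standard IF-MIB counters (ifHCInOctets, ifInErrors, ifOperStatus, ifHighSpeed); the `PortSample` structure, the function names and the 95% threshold are hypothetical.

```python
from dataclasses import dataclass

COUNTER32_MODULUS = 2**32  # e.g. IF-MIB ifInErrors (Counter32)
COUNTER64_MODULUS = 2**64  # e.g. IF-MIB ifHCInOctets (Counter64)


@dataclass
class PortSample:
    """One poll of a switch port (hypothetical structure for this sketch)."""
    timestamp: float   # seconds since epoch when the port was polled
    in_octets: int     # IF-MIB ifHCInOctets
    in_errors: int     # IF-MIB ifInErrors
    oper_status: int   # IF-MIB ifOperStatus: 1 = up, 2 = down
    speed_mbps: int    # IF-MIB ifHighSpeed, in Mbit/s


def counter_delta(old, new, modulus=COUNTER64_MODULUS):
    """Increase between two counter readings, tolerating a single wrap."""
    return (new - old) % modulus


def in_utilization_pct(prev, cur):
    """Inbound traffic between two polls as a percentage of link speed."""
    seconds = cur.timestamp - prev.timestamp
    bits_per_second = counter_delta(prev.in_octets, cur.in_octets) * 8 / seconds
    return 100.0 * bits_per_second / (cur.speed_mbps * 1_000_000)


def port_problems(prev, cur, max_util_pct=95.0, max_new_errors=0):
    """Trigger-like checks for the three example problems (thresholds made up)."""
    if cur.oper_status != 1:
        return ["port down"]  # other checks are meaningless on a down port
    problems = []
    if counter_delta(prev.in_errors, cur.in_errors, COUNTER32_MODULUS) > max_new_errors:
        problems.append("input errors")
    if in_utilization_pct(prev, cur) >= max_util_pct:
        problems.append("bandwidth near maximum")
    return problems


if __name__ == "__main__":
    # A 1 Gbit/s port polled 60 s apart: 7.2 billion octets in, 3 new errors.
    prev = PortSample(0, 0, 0, 1, 1000)
    cur = PortSample(60, 7_200_000_000, 3, 1, 1000)
    print(round(in_utilization_pct(prev, cur), 1))  # prints 96.0
    print(port_problems(prev, cur))  # prints ['input errors', 'bandwidth near maximum']
```

The wrap-tolerant `counter_delta` matters because SNMP counters only ever increase and roll over to zero; a 32-bit octet counter on a busy gigabit port wraps within about half a minute, which is why the 64-bit "HC" counters are preferred for bandwidth.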

The design of the Zabbix web interface takes a little getting used to. When you are not changing how Zabbix queries network-attached hardware, stick to the "Monitoring" tab in the top left corner of the site. Within Monitoring you will find one or more dashboards and a dedicated page listing current Problems. You can also open the Graphs tab and browse all of the available graphs by changing the Group, Host and Graph drop-down boxes. The Graphs tab is really the prime rib of the Zabbix application, because the statistics can be graphed over any period from the last 2 hours to the last 2 years, which makes trend lines easy to visualise.
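
Part of what makes two-year graphs practical is that Zabbix stores two kinds of data: raw history values, and hourly "trends" that keep only the minimum, average and maximum for each hour plus the number of values. Trends let a graph reach back further than the raw history is kept. The Python sketch below illustrates that hourly roll-up; it is not Zabbix's own code, and the `hourly_trends` function and the sample readings are hypothetical.

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600


def hourly_trends(samples):
    """Roll raw (unix_timestamp, value) samples up into hourly buckets.

    Each bucket keeps only min, avg, max and the number of values, which is
    the shape of a Zabbix trend record. Buckets are keyed by the Unix time
    at the start of the hour and returned in time order.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        hour_start = int(ts) - int(ts) % SECONDS_PER_HOUR
        buckets[hour_start].append(value)
    return {
        hour: {"min": min(vals), "avg": sum(vals) / len(vals),
               "max": max(vals), "num": len(vals)}
        for hour, vals in sorted(buckets.items())
    }


if __name__ == "__main__":
    # Five made-up bandwidth readings (Mbit/s) spread over two hours.
    raw = [(0, 10.0), (1800, 30.0), (3600, 5.0), (5400, 15.0), (7199, 25.0)]
    for hour, trend in hourly_trends(raw).items():
        print(hour, trend)
    # prints:
    # 0 {'min': 10.0, 'avg': 20.0, 'max': 30.0, 'num': 2}
    # 3600 {'min': 5.0, 'avg': 15.0, 'max': 25.0, 'num': 3}
```

Keeping min and max alongside the average means a short spike still shows up on a long-range graph even though the individual polls are gone.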

The "Screens" tab is also extremely useful because it contains pre-built ... screens that show groups of related graphs together instead of individually. For example, one of the screens shows network traffic in/out plus errors on all of the wireless access points at uniPHARM. This shows which APs have the most activity and what time of day that activity happens.

The "Maps" tab shows a Visio-style graphical layout of the network. It's obviously not as good as Visio, but for an open source web-based mapper it's pretty good. Try to keep these diagrams up to date, because they will be helpful should I.S. staff get hit by a bus at any time.

The object organization in Zabbix is also a little tricky to understand. There is a hierarchy that starts with "Host Groups" in the "Configuration" tab. A Host Group can contain one or more hosts plus one or more Templates. A template in Zabbix is ... a way of interpreting SNMP data. A template that correctly displays SNMP data from a Cisco switch is going to be different from a template that reads Juniper switch SNMP data. Luckily, templates for most of the major hardware brands are included with Zabbix, and there are also a number of generic templates that can be used.

Once you have created a Host Group and assigned a Template, you can create a Host. This is the part where Zabbix fails to be convenient: its automatic network discovery function does not work in our setup, so each network object has to be created manually. Yes, this is very time-consuming. In the Host creation step you can use either SNMP or the Zabbix agent to query stats. The agent is a small program that can be installed on a physical server or a VM, whereas SNMP is best used for anything that is hardware. Zabbix will start to query the newly created host right away and, depending on what the host is and which template was applied, it may create many items, graphs, applications and triggers that all work together to present as much information as possible.
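
Because every host has to be created by hand, it may be worth scripting that step against the Zabbix JSON-RPC API, which is served at `api_jsonrpc.php` on the Zabbix web front end. The Python sketch below only builds `host.create` request bodies for an SNMP or agent interface; it does not send them. The host names, IP addresses, group and template IDs and the `{$SNMP_COMMUNITY}` macro are placeholders, and the interface fields vary between Zabbix versions (the `details` block for SNMP interfaces is the Zabbix 5.0+ format, and newer releases pass the session token in an HTTP header instead of the `auth` field), so check the API reference for the version you run before relying on it.

```python
import json

# Zabbix host interface type codes: 1 = agent, 2 = SNMP, 3 = IPMI, 4 = JMX
AGENT_INTERFACE = 1
SNMP_INTERFACE = 2


def host_create_request(name, ip, group_id, template_id, auth_token,
                        request_id=1, use_snmp=True):
    """Build (but do not send) a JSON-RPC body for the host.create method."""
    if use_snmp:
        interface = {
            "type": SNMP_INTERFACE, "main": 1, "useip": 1,
            "ip": ip, "dns": "", "port": "161",
            # SNMP settings block as used by Zabbix 5.0+ (assumption: SNMPv2c)
            "details": {"version": 2, "bulk": 1,
                        "community": "{$SNMP_COMMUNITY}"},
        }
    else:
        interface = {
            "type": AGENT_INTERFACE, "main": 1, "useip": 1,
            "ip": ip, "dns": "", "port": "10050",  # default agent port
        }
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": name,
            "interfaces": [interface],
            "groups": [{"groupid": group_id}],
            "templates": [{"templateid": template_id}],
        },
        "auth": auth_token,  # token from user.login (older API style)
        "id": request_id,
    }


if __name__ == "__main__":
    # Placeholder inventory: names, IPs and IDs are made up for illustration.
    switches = [("sw-core-01", "192.0.2.2"), ("sw-floor2-01", "192.0.2.3")]
    for i, (name, ip) in enumerate(switches, start=1):
        body = host_create_request(name, ip, group_id="15",
                                   template_id="10204",
                                   auth_token="REPLACE_ME", request_id=i)
        print(json.dumps(body, indent=2))
```

Generating the request bodies from a simple list keeps the host group and template assignment consistent across dozens of switches, which is exactly where creating hosts one at a time in the web interface becomes error-prone.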

To be continued with alerting and problem resolution.