Network monitoring is important. When things break, they need to be fixed, so keeping a watchful eye on all network elements at all times is critical. But there’s a hidden issue in network monitoring that often gets missed, especially by the less experienced: monitoring a complex network can generate a large amount of information in a short time. How will you distribute that information, and how will you decide when your employees and vendors get pulled in?
- Which network events need to be reviewed and which can simply be archived for future reference if needed?
- Which network events can be emailed for review when convenient? Which demand 24×7 SMS/text message alerts and immediate action by live humans?
- Are you a 24×7 operation with zero tolerance for downtime? Can a dead web server on Sunday evening be handled on Monday morning?
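One way to make the answers to these questions concrete is to write them down as a routing rule. What follows is only a minimal sketch, not a recommendation for any particular monitoring product: the severity names, the business-hours schedule, and the `always_on` flag are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Action(Enum):
    ARCHIVE = "archive"  # keep for future reference, notify nobody
    EMAIL = "email"      # review when convenient
    SMS = "sms"          # immediate action by a live human


@dataclass
class Event:
    source: str          # e.g. "web-01" (hypothetical device name)
    severity: str        # assumed levels: "info", "warning", "critical"
    timestamp: datetime


def is_business_hours(ts: datetime) -> bool:
    # Assumed schedule: Monday to Friday, 08:00 to 18:00 local time.
    return ts.weekday() < 5 and 8 <= ts.hour < 18


def route(event: Event, always_on: bool) -> Action:
    """Decide how an event is distributed.

    always_on=True models a 24x7 operation with zero tolerance for downtime.
    """
    if event.severity == "info":
        return Action.ARCHIVE
    if event.severity == "warning":
        return Action.EMAIL
    # Anything else is treated as critical, including unknown severities.
    if always_on or is_business_hours(event.timestamp):
        return Action.SMS
    # Not a 24x7 shop: it can wait in the inbox until the next working day.
    return Action.EMAIL


# A dead web server on a Sunday evening (7 January 2024 was a Sunday).
dead_server = Event("web-01", "critical", datetime(2024, 1, 7, 20, 0))
print(route(dead_server, always_on=False))
print(route(dead_server, always_on=True))
```

The same event lands in the inbox for one organization and on somebody’s phone for another. The point is that the decision is made once, on purpose, instead of being left to whatever the monitoring tool does by default.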
Human beings are really good at adapting, and we’re also creatures of habit and routine. That means even the most dedicated employees can easily fall into habits that result in ignoring critical network alerts if the process isn’t managed properly. To avoid this, you’ll want to address two issues: false alarms and alarm overload.
False alarms can happen now and then, but if you’re regularly issuing alerts that can simply be ignored, you’re conditioning your team to ignore alerts. Obviously not a good thing. If you find a device or a link that continually triggers false alarms, either fix it or change the way you monitor that element so the false alarms stop.
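Finding the devices and links that keep crying wolf doesn’t have to be guesswork. Here is a rough sketch that assumes you can export your alert history as pairs of (source, was it a false alarm). The function name, the input shape, and both thresholds are hypothetical, so tune them to your own environment.

```python
from collections import Counter


def noisy_sources(alerts, min_alerts=10, max_false_ratio=0.5):
    """Return sources whose alerts are mostly false alarms.

    alerts: iterable of (source, was_false_alarm) pairs.
    min_alerts: ignore sources with too little history to judge.
    max_false_ratio: flag a source when its false-alarm share exceeds this.
    """
    total = Counter()
    false_alarms = Counter()
    for source, was_false_alarm in alerts:
        total[source] += 1
        if was_false_alarm:
            false_alarms[source] += 1
    return sorted(
        source
        for source in total
        if total[source] >= min_alerts
        and false_alarms[source] / total[source] > max_false_ratio
    )


history = (
    [("wan-link-a", True)] * 9      # nine false alarms...
    + [("wan-link-a", False)]       # ...and one real outage
    + [("web-01", False)] * 10      # every alert was real
    + [("printer-3", True)] * 3     # too little history to judge
)
print(noisy_sources(history))
```

Anything this turns up is a candidate for a fix or for a change in how it is monitored.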
Alarm overload is very common, especially among IT managers who have just discovered that they can micro-monitor every last nano-event on the network. Cool, but just because you can doesn’t always mean you should (which applies to many things in life). Filling mailboxes or blowing up mobile phones with network notifications means you’re burying the really important things under mountains of the mundane. Again, you’re conditioning your people to ignore alerts, and at some point this will bite you in an uncomfortable place.
We have direct experience with one customer where the network monitoring system watches every single element at all times. Servers and Internet links (important), but also workstations, laptops, and even printers and scanner/fax devices (arguably less critical). Let us know if Jenny’s workstation is about to run out of free disk space. Don’t tell us that she shut down and went home for the evening (unless you’re paying us to chain Jenny to her desk for the weekend). Log that info if you feel you must, but sending email and/or SMS alerts on that event is just a bad idea because it slowly erodes the importance of every alert you send. Network monitoring and associated alerts are a good example of when more isn’t necessarily better.
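The Jenny example boils down to a small policy table: what kind of device, what kind of event, and who (if anyone) hears about it. The sketch below is purely illustrative. The device classes, event names, and action labels are invented for this example and don’t correspond to any specific monitoring tool.

```python
# (device_class, event_type) -> what to do with the event.
# All names here are hypothetical.
POLICY = {
    ("workstation", "disk_space_low"): "email",  # tell us before Jenny runs out of space
    ("workstation", "offline"): "log",           # she shut down and went home
    ("printer", "offline"): "log",
    ("server", "offline"): "sms",
    ("internet_link", "offline"): "sms",
}


def action_for(device_class: str, event_type: str) -> str:
    # Anything not listed is logged only. This default is an assumption;
    # a more cautious shop might default to "email" instead.
    return POLICY.get((device_class, event_type), "log")


print(action_for("workstation", "disk_space_low"))
print(action_for("workstation", "offline"))
print(action_for("server", "offline"))
```

Everything still gets recorded. The table only controls which events are allowed to interrupt a person.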
By all means, gather as much information as you want to, but pay close attention to how you distribute that information. Don’t lead your team into the information overload quicksand where they can slowly sink into unintended complacency. Most network engineers will tell you, “If you want me to act on messages sent to my phone at 3 AM, make sure that only truly important messages go there.”