1.6.0 r18 On-Premises Connection Health

I have been noticing that Offline events are no longer being logged by the Dashboard. This may be a problem with PMP 450 15.1.3 (Build BETA-5) firmware. Scratch that, it can't be the firmware, as the data is obviously coming in to cnMaestro. DOH!!!

This is one way to minimize the Offline events, I guess (TIC). 

I know it's hard to tell from the way the Connection Health chart is scaled (it should be double-scaled, left and right), but there are 0 Offline events across the timeline.

Status Down alarms are now only raised if the device has been disconnected from cnMaestro for at least five minutes. This change was made to prevent false alarms caused by a shaky connection or by cnMaestro traffic being deprioritized on the device in favor of user traffic. In the event log you provided, the STATUS offline events were followed by STATUS online events within ten seconds. Because the outages were so short, no alarm was raised, which is reflected in the dashboard charts.

The Status Down/Up events are logged immediately regardless of the offline duration, which is why you see them in your event logs. The STA_REG and STA_DROP events are not tracked as alarms at all. The dashboard widget circled in the screenshot shows alarm counts only, not events.

Alarms represent an active error state in a device that can eventually be cleared and resolved.  Events log the history of a device.  Some events are also tracked as alarms, but not all.
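To make the threshold concrete, here is a rough sketch of the rule as described above. It is not cnMaestro's actual implementation; the function name, the event tuple layout, and the device names are all made up for illustration. The only thing taken from the explanation is the five-minute minimum before a Status Down alarm is raised.

```python
from datetime import datetime, timedelta

# Five-minute minimum taken from the explanation above.
ALARM_THRESHOLD = timedelta(minutes=5)

def status_down_alarms(events, now):
    """Return (device, offline_since) for every outage that lasted long
    enough to raise a Status Down alarm.

    `events` is a hypothetical list of (timestamp, device, status) tuples,
    where status is "offline" or "online". This layout is an assumption,
    not a cnMaestro export format.
    """
    offline_since = {}
    alarms = []
    for ts, device, status in sorted(events):
        if status == "offline":
            # Remember only the start of the current outage.
            offline_since.setdefault(device, ts)
        elif status == "online":
            start = offline_since.pop(device, None)
            if start is not None and ts - start >= ALARM_THRESHOLD:
                alarms.append((device, start))
    # Devices that are still offline when we look at the log.
    for device, start in offline_since.items():
        if now - start >= ALARM_THRESHOLD:
            alarms.append((device, start))
    return alarms

# Illustrative data: SM-1 blips for ten seconds, SM-2 is down for six minutes.
t0 = datetime(2017, 1, 1, 12, 0, 0)
events = [
    (t0, "SM-1", "offline"),
    (t0 + timedelta(seconds=10), "SM-1", "online"),
    (t0 + timedelta(minutes=1), "SM-2", "offline"),
    (t0 + timedelta(minutes=7), "SM-2", "online"),
]
alarms = status_down_alarms(events, now=t0 + timedelta(minutes=10))
```

With this data both devices produce offline and online events in the log, but only SM-2 ends up in `alarms`, which matches what the dashboard alarm widget would count.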

Does this really make sense?  I understand and appreciate the fact that "Offline" only events - cnMaestro comm problems - are not prioritized as they used to be, as they may not be customer affecting events.  However, an actual RF event - session affecting event - STA_DROP has to, in my opinion, be reported as an alarm, regardless of how quickly it is resolved.  This is an actual "Major", customer affecting, event and should be investigated.  I appreciate that it is logged in the reports, but it should also show up in the "Dashboard".  I continuously monitor the "Dashboard" and only deep dive the logs at the beginning of the day or if the "dashboard" gives me a reason to.  In my case, I could have an SM disconnecting "STA_DROP" every six minutes and would never know it until the next day when I check the logs.

I understand the "Active" alarm description, so I can see why there wouldn't be any "Top Alarms" display for an event which has been resolved. However, how a customer affecting issue like a "STA_DROP" isn't considered an "Alarm"-worthy event escapes me. It should definitely be a "Major" alarm and should be tracked in the "Alarms" widget.

If I understand you correctly, the "Connection Health" widget is only reporting cnMaestro comm loss with a 5 minute minimum down time, regardless of cause, right?

I don't remember the frequency of the ping-pong handshakes, about 30 to 40 seconds I think, so 5 minutes is OK.

Anyone else?

Are you saying that the SM's Status Offline alarm is insufficient for this case? When we get the STA_DROP event on the AP, we take the extra step to immediately mark the device as offline. If the SM remains offline for 5 minutes, an alarm is raised against it.

By "we take the extra step to immediately mark the device as offline" do you mean in the notification log? Because, as the screenshot shows, there's no display on the "Dashboard". I don't know what the refresh rate of the "Alarms" or "Devices" widget is, but there's no indication of the "STA_DROP" event anywhere on the "Dashboard". Now, if the condition does show and then clears when the event clears, how am I supposed to know it happened? As I mentioned, now that we are using the 5 minute offline threshold, I won't see any drop events that recover within that time frame. I could have recurring drops, on one or multiple SMs, and they would never be reported or displayed, as the "Offline" condition would not satisfy the continuous 5 minute threshold. If you don't consider the session drop event "Alarm"-worthy, then maybe have an "Event" widget with 24 hour retention. Either way it's an event of "Major" severity and as such should be displayed somewhere.

Maybe session drops are rare for us and not others.  I would rather have a tech go out if I can see there are recurring outages rather than waiting for a customer to call with problems.
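As a stopgap, something like the following could flag recurring drops from the event log without waiting on a dashboard change. This is only a sketch: the tuple layout, the function name, the device names, and the "3 or more drops" cutoff are my own assumptions, and it does not use any cnMaestro API. The events would have to be pulled out of the logs some other way.

```python
from collections import Counter
from datetime import datetime, timedelta

def recurring_drops(events, now, window=timedelta(hours=24), min_drops=3):
    """Count STA_DROP events per device inside the look-back window and
    return only the devices that dropped at least `min_drops` times.

    `events` is a hypothetical list of (timestamp, device, event_type)
    tuples; this is an assumed layout, not a cnMaestro export format.
    """
    cutoff = now - window
    counts = Counter(
        device
        for ts, device, kind in events
        if kind == "STA_DROP" and cutoff <= ts <= now
    )
    return {device: n for device, n in counts.items() if n >= min_drops}

# Illustrative data: SM-7 drops every six minutes, SM-9 drops once.
t0 = datetime(2017, 1, 1, 8, 0, 0)
events = [(t0 + timedelta(minutes=6 * i), "SM-7", "STA_DROP") for i in range(5)]
events.append((t0 + timedelta(minutes=20), "SM-9", "STA_DROP"))
events.append((t0 + timedelta(minutes=21), "SM-9", "STA_REG"))

flagged = recurring_drops(events, now=t0 + timedelta(hours=1))
```

Here `flagged` contains only SM-7, which is the kind of repeat offender that never stays down long enough to trip the 5 minute alarm.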

  

The "Dashboard" for me is a 24 hour "Status at a Glance" tool.

Thanks again