Radios Reverted to Old (Possibly Bank 2) Configuration After Losing Contact with cnMaestro?

I upgraded my cnMaestro server tonight, and while doing so I noticed that a handful of radios had reverted to a very old configuration. The radios never rebooted, and thus never booted from the old bank. When I checked the logs on the radios, I found the following:

Feb 11 23:54:08 03-08-AT5W DEVICE-AGENT[2913]: Not received PONG for the last ping (1)
Feb 11 23:54:29 03-08-AT5W DEVICE-AGENT[2913]: Not received PONG for the last ping (2)
Feb 11 23:54:52 03-08-AT5W DEVICE-AGENT[2913]: Not received PONG for the last ping (3)
Feb 11 23:55:09 03-08-AT5W DEVICE-AGENT[2913]: Not received PONG for the last ping (4)
Feb 11 23:55:35 03-08-AT5W DEVICE-AGENT[2913]: Not received PONG for the last ping (5)
Feb 11 23:55:35 03-08-AT5W DEVICE-AGENT[2913]: Missed 5 consecutive PONGS, disconnecting with server
Feb 11 23:56:06 03-08-AT5W DEVICE-AGENT[2913]: callback_websocket: LWS_CALLBACK_CLOSED
Feb 11 23:56:06 03-08-AT5W DEVICE-AGENT[2913]: Attempting (re)connection in 5 seconds
Feb 11 23:56:27 03-08-AT5W DEVICE-AGENT[2913]: Timeout in select() - Cancelling!
Feb 11 23:56:27 03-08-AT5W DEVICE-AGENT[2913]: OpenConnection to 10.101.0.5:443 failed 
Feb 11 23:56:27 03-08-AT5W DEVICE-AGENT[2913]: Unable to discover cnMaestro URL (re-discover in 357 seconds)
Feb 11 23:56:27 03-08-AT5W DEVICE-AGENT[2913]: Attempting (re)connection in 5 minutes
Feb 12 00:02:41 03-08-AT5W DEVICE-AGENT[2913]: Timeout in select() - Cancelling!
Feb 12 00:02:41 03-08-AT5W DEVICE-AGENT[2913]: OpenConnection to 10.101.0.5:443 failed 
Feb 12 00:02:41 03-08-AT5W DEVICE-AGENT[2913]: Unable to discover cnMaestro URL (re-discover in 307 seconds)
Feb 12 00:02:41 03-08-AT5W DEVICE-AGENT[2913]: Attempting (re)connection in 5 minutes
Feb 12 00:06:06 03-08-AT5W DEVICE-AGENT[2913]: cfg_restore_timer_cb: Reverting to last known good configuration
Feb 12 00:06:06 03-08-AT5W DEVICE-AGENT[2913]: Restoring last good configuration
Feb 12 00:06:06 03-08-AT5W DEVICE-AGENT[2913]: Config restore successful

Is it possible the radios pulled the configuration from the second bank, or do the radios keep copies of previous configurations? Why would they decide to restore a configuration? The timing correlates with when I shut down my cnMaestro server, so it seems to be causal, but it could be a coincidence. The APs are running 3.4.1 and 3.5.1.
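One quick way to test whether the revert is tied to losing contact with cnMaestro is to measure the gap between the websocket closing and the configuration restore in the log above. The sketch below does that for a three-line excerpt of the quoted log. Syslog timestamps carry no year, so `ASSUMED_YEAR` is an arbitrary assumption; it only has to be the same for every line so the Feb 11 to Feb 12 rollover is handled.

```python
from datetime import datetime

# Excerpt of the device log quoted above (only the timestamps matter here).
LOG = """\
Feb 11 23:55:35 03-08-AT5W DEVICE-AGENT[2913]: Missed 5 consecutive PONGS, disconnecting with server
Feb 11 23:56:06 03-08-AT5W DEVICE-AGENT[2913]: callback_websocket: LWS_CALLBACK_CLOSED
Feb 12 00:06:06 03-08-AT5W DEVICE-AGENT[2913]: cfg_restore_timer_cb: Reverting to last known good configuration
"""

# Assumed year: syslog omits it, and any fixed value gives correct differences.
ASSUMED_YEAR = 2019


def parse_time(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def first_time(log: str, marker: str) -> datetime:
    """Return the timestamp of the first log line containing `marker`."""
    for line in log.splitlines():
        if marker in line:
            return parse_time(line)
    raise ValueError(f"marker not found: {marker!r}")


def seconds_until_revert(log: str) -> int:
    """Seconds from the websocket close to the configuration revert."""
    closed = first_time(log, "LWS_CALLBACK_CLOSED")
    reverted = first_time(log, "Reverting to last known good configuration")
    return int((reverted - closed).total_seconds())


if __name__ == "__main__":
    print(seconds_until_revert(LOG))  # 600 (exactly 10 minutes)
```

Running the same check against the logs of the other reverted radios would show whether the interval is consistent, which would point to a timer rather than coincidence.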

Thanks,

Dan

After a configuration update via cnMaestro, if a device is unable to reconnect to cnMaestro within 15 minutes, it will roll back the latest configuration change. This is done to prevent a device from being isolated by a configuration change. Since this happened when your cnMaestro server was shut down, this may be the cause.
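The rollback behaviour described above can be pictured as a timer that is armed when cnMaestro pushes a change and disarmed when the device reconnects. The following is a minimal sketch of that idea only, not Cambium's implementation; the class and method names (`RollbackGuard`, `apply_remote_config`, `on_controller_connected`, `tick`) are hypothetical, and the 15-minute window is taken from the reply. Time is passed in explicitly as seconds so the logic can be exercised without a real clock.

```python
from dataclasses import dataclass
from typing import Optional

# Rollback window from the reply above; the value on a given firmware is assumed.
ROLLBACK_WINDOW_S = 15 * 60


@dataclass
class RollbackGuard:
    """Hypothetical model of a 'revert unless the controller comes back' timer."""
    running_config: str
    last_good_config: str
    deadline: Optional[float] = None  # None means no rollback is pending

    def apply_remote_config(self, new_config: str, now: float) -> None:
        """A controller-pushed change: remember the confirmed config, arm the timer."""
        if self.deadline is None:
            # Only a confirmed config counts as "last good", not an unconfirmed one.
            self.last_good_config = self.running_config
        self.running_config = new_config
        self.deadline = now + ROLLBACK_WINDOW_S

    def on_controller_connected(self) -> None:
        """Reconnecting confirms the new config: commit it and disarm the timer."""
        self.last_good_config = self.running_config
        self.deadline = None

    def tick(self, now: float) -> bool:
        """Revert if the deadline passed without a reconnect; True if reverted."""
        if self.deadline is not None and now >= self.deadline:
            self.running_config = self.last_good_config
            self.deadline = None
            return True
        return False


if __name__ == "__main__":
    guard = RollbackGuard(running_config="frame 5 ms", last_good_config="frame 5 ms")
    guard.apply_remote_config("frame 2.5 ms", now=0)
    print(guard.tick(now=10 * 60), guard.running_config)  # False frame 2.5 ms
    print(guard.tick(now=15 * 60), guard.running_config)  # True frame 5 ms
```

In this model a reconnect is the only thing that disarms the timer, so if that confirmation step were ever skipped, the pending rollback would stay armed indefinitely until the next loss of contact.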

Was any configuration applied to the devices in question before you shut down the cnMaestro server? You can check the configuration jobs page or the device details tab in cnMaestro to verify when configuration was applied.

The only config change made to the APs via cnMaestro was back in September 2018, and it has a status of "Success". This was a mass update to all of my APs to switch from a 5 ms to a 2.5 ms frame size. Configuration changes have also since been made to at least some of these APs directly via the web GUI on the radios themselves. It seems that all of the manual config changes made since September 2018, as well as the frame size change made by cnMaestro in September, were reverted. From what I can tell, some piece of code didn't run on the APs to tell them that everything was fine with the cnMaestro config update. Perhaps there should be some kind of watchdog-like code that checks that this flag isn't improperly set. I'd think the flag should also be cleared when a config update/save is done directly on the radio.
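Dan's two suggestions (a watchdog that catches a pending-rollback flag left set by mistake, and clearing that flag on a local save) could look roughly like the sketch below. It only illustrates the proposal; nothing here reflects how ePMP firmware actually stores such a flag, and all names (`PendingRollback`, `on_local_save`, `is_stale`, `watchdog`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingRollback:
    """Hypothetical 'remote change not yet confirmed' flag; times in seconds."""
    armed_at: Optional[float] = None  # when a remote change armed the flag


def on_local_save(flag: PendingRollback) -> None:
    """Proposal 1: a save made on the radio itself clears any pending rollback."""
    flag.armed_at = None


def is_stale(flag: PendingRollback, last_controller_contact: float) -> bool:
    """A flag that survives a later successful controller contact is stale.

    Contact after the flag was armed means the change was evidently accepted,
    so a flag still set at that point was left over by mistake.
    """
    return flag.armed_at is not None and last_controller_contact > flag.armed_at


def watchdog(flag: PendingRollback, last_controller_contact: float) -> bool:
    """Proposal 2: clear a stale flag; return True if one was cleared."""
    if is_stale(flag, last_controller_contact):
        flag.armed_at = None
        return True
    return False
```

With either check in place, a flag armed by the September 2018 job would have been cleared long before the server was shut down, so losing contact would not have triggered a revert.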

The good news is that I can easily find all of the APs that reverted by combing the network for a frame size of 5000.
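Once the frame-size setting has been collected from each AP, finding the reverted ones is a simple filter. In this sketch, the value 5000 is from the post above; how the values are gathered (SNMP poll, config export, or otherwise), the AP names, and the other sample values are all assumptions for illustration.

```python
from typing import Dict, List

# Frame-size value that marks an AP as having reverted (from the post above).
REVERTED_FRAME_SIZE = 5000


def reverted_aps(frame_sizes: Dict[str, int]) -> List[str]:
    """Return, sorted, the names of APs still reporting the old frame size."""
    return sorted(name for name, size in frame_sizes.items()
                  if size == REVERTED_FRAME_SIZE)


if __name__ == "__main__":
    # Hypothetical inventory of AP name -> reported frame-size value.
    inventory = {"ap-1": 5000, "ap-2": 2500, "ap-3": 5000}
    print(reverted_aps(inventory))  # ['ap-1', 'ap-3']
```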

Thanks,
Dan

The only other way cnMaestro would have pushed configuration to the device would be if the name or latitude/longitude were changed in the device-level Configuration tab, or if an AP was moved to another tower (which changes its latitude/longitude). In these cases, cnMaestro pushes the new values to the device outside of a configuration job. Do you know whether this might have been done shortly before the cnMaestro server was shut down?
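The condition described here amounts to comparing the device-level fields before and after an edit. The sketch below expresses that comparison; the `DeviceMeta` type and `triggers_direct_push` function are hypothetical, and the real check inside cnMaestro may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceMeta:
    """Device-level fields that, per this thread, cnMaestro pushes outside a job."""
    name: str
    latitude: float
    longitude: float


def triggers_direct_push(old: DeviceMeta, new: DeviceMeta) -> bool:
    """True if a name or location change would be pushed outside a config job.

    Moving an AP to another tower counts as well, since it changes the coordinates.
    """
    return (old.name != new.name
            or (old.latitude, old.longitude) != (new.latitude, new.longitude))
```

Checking the audit history for any such name or location edit around the time of the shutdown would confirm or rule out this path.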

I have sent these details on to the ePMP team.