We are attempting to use cnMaestro to update SMs/APs on several sites and are running into lots of problems. The upgrade success rate appears to be in the ~25% range across our network using cnMaestro.
Oftentimes, cnMaestro displays 'SW update file download failed.' The SM's log contains:
Nov 5 07:35:29 CUSTOMER DEVICE-AGENT: platform_sw_update: ENTRY
Nov 5 07:35:30 CUSTOMER DEVICE-AGENT: do_sw_update2: sw_udpate state 0 new_state 1,
Nov 5 07:35:31 CUSTOMER DEVICE-AGENT: SW_UPDATE Failed status=1 msg = [SW update file download failed], notify cnMaestro
Nov 5 07:35:32 CUSTOMER DEVICE-AGENT: do_sw_update2: sw_udpate state 1 new_state 0,
Nov 5 07:35:32 CUSTOMER DEVICE-AGENT: do_sw_update2: Exited while loop sw_upd state=0
Other times, cnMaestro reports 'Device timed out during update. Last known status was: rebooting device.' However, the SM was actually upgraded successfully and is back online when I browse to it.
We also see some devices that only have a red '<' displayed in their upgrade status.
Another error that we see:
Mar 27 10:14:26 CUSTOMER DEVICE-AGENT: SW_UPDATE Failed status=21 msg = [General error. Device has no free memory], notify cnMaestro
It looks like something in this process is leaking memory and causing the SMs to run out of memory. We have to manually reboot them to fix this.
In the past we have used CNUT to update ePMP and have never run into any issues. Based on our experience with using cnMaestro to upgrade ePMP devices so far, I do not think it's quite ready for production.
Please check reachability between the cnMaestro cloud management server and the ePMP radio.
There does not appear to be a network connectivity issue.
Furthermore, why would a network connectivity problem cause the SM to run out of memory?
I think the 'network connectivity' suggestion was regarding the first part of your post, with failed downloads and failed status refresh.
Regarding the memory situation, I've run into that on occasion (NOT using cnMaestro) and rebooting the SM resolves it. I assumed it was in-memory logging or something, since I only saw it on units that had been up and running for several weeks - but I never dug deeper. I usually update our ePMP SMs via a script, and I just added a reboot-and-wait before it tries to start the update process. (takes a lot longer - initially I just ran the script a second time and it picked up the problem units, since they got rebooted the first time through - but I usually fire the script off and walk away anyway so I don't really care if it takes a few extra hours to process everything)
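For anyone who wants to try the same reboot-and-wait approach, here is a rough sketch of the flow in Python. This is not my actual script. The `reboot`, `is_online` and `start_update` callables are hypothetical placeholders that you would implement yourself for your own environment (for example a ping check for `is_online`), and the timeout values are guesses rather than tested numbers.

```python
import time
from typing import Callable, Dict, Iterable


def reboot_wait_update(
    hosts: Iterable[str],
    reboot: Callable[[str], None],        # hypothetical: triggers a reboot of one SM
    is_online: Callable[[str], bool],     # hypothetical: e.g. a ping or HTTP check
    start_update: Callable[[str], bool],  # hypothetical: returns True on success
    wait_timeout: float = 300.0,          # assumed value, seconds
    poll_interval: float = 10.0,          # assumed value, seconds
    settle_delay: float = 30.0,           # assumed value, seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Reboot each SM, wait for it to come back, then start the update."""
    results: Dict[str, str] = {}
    for host in hosts:
        reboot(host)
        # Give the unit time to actually go down, so that a reply from the
        # still-running old session is not mistaken for "back online".
        sleep(settle_delay)

        online = False
        waited = 0.0
        while waited <= wait_timeout:
            if is_online(host):
                online = True
                break
            sleep(poll_interval)
            waited += poll_interval

        if not online:
            # Skip the update; the unit can be picked up on a later run.
            results[host] = "offline after reboot"
            continue

        results[host] = "updated" if start_update(host) else "update failed"
    return results
```

Units reported as "offline after reboot" or "update failed" can simply be fed back into a second run.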
Please note that the minimum recommended version for ePMP devices managed by cnMaestro is version 2.6.1. Compared to ePMP version 2.6, version 2.6.1 includes important updates to improve the link between the device and cnMaestro (i.e. timeout issue).
This is a reminder to please perform this upgrade soon if you haven't yet.
With that being said, if you are using cnMaestro to upgrade ePMP to version 2.6.1, allow the software upgrade job to complete (even if some devices in the job return an error message indicating a timeout). Once the job has completed, you can return to the SW upgrade page to re-select devices that still show the old software version. You will notice that many of the devices with the "timeout error" actually did upgrade to the new version.
If you have a set of SM devices with the memory issue described above, the work-around is to reboot the device first. This can be done by clicking on the device in the cnMaestro device tree, then clicking on the Monitor --> Tools page. There is a RED button next to the picture of each device that can be used to reboot the device. Re-running the software upgrade after the reboot will resolve this issue.
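If you keep your own device inventory outside cnMaestro, the re-selection step can be sketched as a simple filter on the reported software version rather than on the job's error status. The Python below is only an illustration: the device records, their field names and the sample data are made up, and nothing here calls cnMaestro itself.

```python
from typing import Dict, List


def devices_needing_retry(devices: List[Dict[str, str]], target_version: str) -> List[str]:
    """Return names of devices whose reported version is not the target.

    The job status is deliberately ignored: a device that reported a
    timeout may still have upgraded successfully.
    """
    return [d["name"] for d in devices if d["version"] != target_version]


# Hypothetical inventory collected after the upgrade job has completed.
inventory = [
    {"name": "SM-1", "version": "2.6.1", "job_status": "timeout"},
    {"name": "SM-2", "version": "2.6", "job_status": "download failed"},
    {"name": "SM-3", "version": "2.6.1", "job_status": "ok"},
]

print(devices_needing_retry(inventory, "2.6.1"))  # ['SM-2']
```

In this made-up example SM-1 reported a timeout but is already on 2.6.1, so only SM-2 needs to be selected again.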