Unable to pull configs from CLI or units that slowly become unresponsive

There is probably a better subject title, as this issue is not the problem itself but only a symptom we have had with ePMP on software versions 2.52 through 2.61.

We use Rancid to pull the configs from the ePMPs on our network. Over time, more and more ePMPs stop responding to the CLI command "config show dump". This is brought to my attention by the config diff email, which shows the entire config being deleted from CVS; a subsequent poll may retrieve the config again.
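One way to keep a failed pull from wiping the stored config is to sanity-check each new dump before it is committed. The following is only a minimal sketch and not part of Rancid: the function name looks_like_failed_pull and the 50% threshold are hypothetical choices for illustration.

```python
def looks_like_failed_pull(previous: str, current: str, min_ratio: float = 0.5) -> bool:
    """Return True when a new dump looks empty or truncated.

    `previous` is the last known-good config text and `current` is the text
    just retrieved. `min_ratio` is an assumed threshold: a dump with fewer
    than this fraction of the previous non-blank lines is treated as suspect.
    """
    prev_lines = [ln for ln in previous.splitlines() if ln.strip()]
    curr_lines = [ln for ln in current.splitlines() if ln.strip()]
    if not curr_lines:
        # Nothing came back at all: the radio did not answer the command.
        return True
    if not prev_lines:
        # No baseline yet, so any non-empty dump is accepted.
        return False
    return len(curr_lines) < min_ratio * len(prev_lines)
```

A wrapper around the collection job could skip the commit and raise an alert whenever this returns True, so the diff email flags an unresponsive radio instead of showing the whole config as deleted.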

If you SSH into the radio immediately after noticing it fail to supply the config data and reboot it, the unit will be fine for a while, anywhere from a week to a few weeks, and then the symptom returns.

If you ignore the symptom, eventually you will not be able to log in at all, and a power cycle is required to restore proper operation.

So far, I have not determined if this affects the wireless connectivity or data flows.

My thought is that something is corrupting memory or causing the SNMP interface to stop responding, eventually affecting the SSH service as well. As of this morning, I have ten devices that do not respond to SSH but are still pingable.
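Since the affected radios still answer ping, a ping check alone will not find them. A rough sketch of a check that looks for the SSH banner instead is shown here; the function name ssh_banner_ok and the timeout value are assumptions for illustration, and it only tests whether the SSH service answers, not whether login or "config show dump" works.

```python
import socket


def ssh_banner_ok(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if the host sends an SSH identification string.

    An SSH server announces itself with a line starting with "SSH-" right
    after the TCP connection opens. A refused connection, a timeout, or any
    other greeting is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            banner = sock.recv(64)
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
    return banner.startswith(b"SSH-")
```

Run against the list of radios, this would separate units that are pingable but no longer serving SSH from those that are healthy, which gives a list of candidates for a reboot.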

Also, it does not seem to be every device, but the same bunch over and over. This affects AP units and SM units, as well as a few set up as PTP links. As common as this is on our network, I have not seen or heard any talk of other operators with this issue.

I don't normally use CLI login on the radios, but I've seen a similar symptom - probably the same thing - where a unit that hasn't been rebooted for quite some time will fail to update firmware until I've rebooted.

I suspect a memory leak or something similar that slowly restricts resources on the radio until it reaches the point where it can't function properly. I've not seen any clear sign that it impairs customer traffic, just administrative services.

My answer has been to reboot radios BEFORE I push new firmware to them.  Obviously that's not a suitable workaround if you're doing it regularly, like daily scheduled backups...

j

Hi kpenland,

Thank you for your report. There is an issue if you pull the configuration with "config show dump" frequently.

We are going to fix this in an upcoming firmware release.

As a workaround, you can use SNMP to export the configuration file.

Best regards,

Dmitry

Thanks for confirming the issue. I was pulling configs once every 30 minutes, which is probably a little aggressive. I have changed to a once-daily scan, which should alleviate the situation.
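For scale, a pull every 30 minutes is 48 "config show dump" calls per radio per day, against one with a daily scan. If the collection job cannot easily be rescheduled, a guard like the sketch below could enforce a minimum gap per radio; the name is_due and the one-day default are hypothetical, not anything Rancid provides.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_due(last_pull: Optional[datetime], now: datetime,
           min_interval: timedelta = timedelta(days=1)) -> bool:
    """Return True when enough time has passed to pull this radio again.

    `last_pull` is the time of the last successful pull, or None if the
    radio has never been pulled. `min_interval` is an assumed minimum gap.
    """
    return last_pull is None or now - last_pull >= min_interval
```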