We’ve been working with Cambium support on this and we do have a clue: it may have something to do with SNMPd having issues. We have also only seen this issue with 802.11n clients. We were able to grab some logs from a wedged SM that was still connected to cnMaestro. Our current strategy is to use cnMaestro to reboot all of our ePMP SMs late at night on a semi-weekly basis. This issue doesn’t appear to happen until past 7 days of uptime, and it only happens to a small percentage of our SMs.
Here’s a dump of those logs:
Jan 3 20:06:24 Client CN7744 kernel: [2019276.450000] SM disassociated from AP[00:04:56:22:29:bf] F=5240 11naht20. Reason: 32 (NO ALLOCATION ON AP)
Jan 3 20:07:05 Client CN7744 kernel: [2019317.240000] SM associated with AP[00:04:56:22:29:bf]
Jan 3 20:07:07 Client CN7744 miniupnpd[3126]: ioctl(s, SIOCGIFADDR, …): Cannot assign requested address
Jan 3 20:07:07 Client CN7744 miniupnpd[3126]: Failed to get IP for interface ath0
Jan 12 14:11:15 Client CN7744 kernel: [2775567.930000] SM disassociated from AP[00:04:56:22:29:bf] F=5240 11naht20. Reason: 33 (GPFs MISS)
Jan 12 16:10:09 Client CN7744 kernel: [2782701.510000] SM associated with AP[00:04:56:22:29:bf]
Jan 12 16:10:11 Client CN7744 miniupnpd[3126]: ioctl(s, SIOCGIFADDR, …): Cannot assign requested address
Jan 12 16:10:11 Client CN7744 miniupnpd[3126]: Failed to get IP for interface ath0
Feb 7 12:32:56 Client CN7744 snmpd[7593]: DFS status: N/A
Feb 8 09:24:01 Client CN7744 snmpd: Watchdog:Abnormal SNMPd stop occured, restarting…
Feb 8 10:56:02 Client CN7744 snmpd: Watchdog:Abnormal SNMPd stop occured, restarting…
Feb 8 10:56:03 Client CN7744 snmpd[7661]: DFS status: N/A
Feb 8 11:08:01 Client CN7744 snmpd: Watchdog:Abnormal SNMPd stop occured, restarting…
Feb 8 11:08:02 Client CN7744 snmpd[9997]: DFS status: N/A
Feb 8 11:32:01 Client CN7744 snmpd: Watchdog:Abnormal SNMPd stop occured, restarting…
Feb 8 11:34:01 Client CN7744 snmpd: Watchdog:Abnormal SNMPd stop occured, restarting…
Feb 8 11:34:02 Client CN7744 snmpd[12503]: DFS status: N/A
Feb 8 12:08:01 Client CN7744 snmpd: Watchdog:Abnormal SNMPd stop occured, restarting…
Feb 8 12:08:03 Client CN7744 snmpd[16077]: DFS status: N/A
Feb 8 19:32:02 Client CN7744 snmpd: Watchdog:Abnormal SNMPd stop occured, restarting…
Feb 9 05:26:01 Client CN7744 snmpd: Watchdog:Abnormal SNMPd stop occured, restarting…
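For anyone comparing dumps like this across several SMs, here is a minimal Python sketch that counts the SNMPd watchdog restarts per day and converts the kernel timestamp into days of uptime. The sample lines are copied from the dump above; the function names and regular expressions are my own illustration and assume the log format shown here, not anything provided by Cambium.

```python
import re
from collections import Counter

# A few lines copied verbatim from the dump above (illustrative subset).
LOG = """\
Jan 3 20:06:24 Client CN7744 kernel: [2019276.450000] SM disassociated from AP[00:04:56:22:29:bf] F=5240 11naht20. Reason: 32 (NO ALLOCATION ON AP)
Feb 8 09:24:01 Client CN7744 snmpd: Watchdog:Abnormal SNMPd stop occured, restarting…
Feb 8 10:56:02 Client CN7744 snmpd: Watchdog:Abnormal SNMPd stop occured, restarting…
Feb 8 10:56:03 Client CN7744 snmpd[7661]: DFS status: N/A
Feb 9 05:26:01 Client CN7744 snmpd: Watchdog:Abnormal SNMPd stop occured, restarting…
"""

# Assumed patterns, based only on the lines shown in this thread.
WATCHDOG = re.compile(r"^(\w{3}\s+\d+) .*snmpd: Watchdog:Abnormal SNMPd stop")
KERNEL_TS = re.compile(r"kernel: \[(\d+\.\d+)\]")


def watchdog_restarts_per_day(text):
    """Count 'Abnormal SNMPd stop' watchdog lines, keyed by 'Mon D'."""
    counts = Counter()
    for line in text.splitlines():
        match = WATCHDOG.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def uptime_days(line):
    """Convert the kernel timestamp (seconds since boot) to days, or None."""
    match = KERNEL_TS.search(line)
    return float(match.group(1)) / 86400 if match else None


if __name__ == "__main__":
    for day, count in watchdog_restarts_per_day(LOG).items():
        print(day, count)
    print(round(uptime_days(LOG.splitlines()[0]), 2), "days of uptime")
```

On the first kernel line this works out to a little over 23 days of uptime at the time of the disassociation, which fits the observation that the problem only shows up past 7 days.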
Okay, Cambium replied to my ticket and said it is a ‘strange issue, and they have not come across any other account’ such as this. So they didn’t make it sound like they have any clue.
They also say they may need access to a device when the problem is present. I’m not sure how to accommodate that, since we can’t SSH or HTTP into them at that time. Plus, of course, the customer is without internet, so the priority is usually rebooting the SM and getting them back online. However, we may do a truck roll and see if we can access it from their Ethernet side of things.
For us, I’m pretty sure it has been a mix of ePMP 1000 / Force 200 as well as Force 300. I will triple check with the staff. Virtually all of our 1000/200 clients are 2.4 GHz and almost all of our 5 GHz is 3000/300, but there are a few straggling 1000/180/200s in 5 GHz out there, though not very many. I will triple check and make sure.
We mass-upgraded our network of about 1500 devices to 4.7.
There were no issues at the beginning, but then we noticed a lot of disconnected routers (the CPEs are in bridge mode).
They don’t get DHCP anymore.
The CPEs are not getting DHCP renewed, but they still answer on their old IP.
We connect to them via SSH and reboot them.
We downgraded to 4.6.1 and the issue disappeared.
We have noticed that some devices cannot downgrade below 4.6.4 … but we don’t have any 4.6.4 around, so they stay on 4.7.
We had turned off SNMP on all our SMs, but the problem has continued long past that, so that doesn’t point to an SNMP issue unless turning it off doesn’t really turn it off.
We just lost another SM. All it did was renew its management DHCP lease. That was enough to knock it offline and require a power cycle.
It is not a “small number” of devices. It is 100% of our network (around 300 devices). So far, we’ve lost around a dozen customers as a result of this. They got tired of their internet just stopping randomly and having to constantly reboot their gear. That did not instill confidence in them with their ISP and they switched.
So there must be something about your configuration that is triggering this issue more than ours. I’d encourage you to work with Cambium support to help them find the issue. I’m currently working with them and I have a debug firmware that they built (4.7.0.1-RC5) that I’ll be deploying tonight in an effort to try to catch the issue.
If you can provide remote access to a device when the issue is seen, that would help us a lot.
Unfortunately, we haven’t succeeded in replicating the issue in the lab so far and can only rely on customers’ help.
We do have private firmware with debug enabled that I can share with you, but even having access to a device with the original 4.7 firmware would be very useful.
We also downgraded all SMs and ePMP2000 APs to 4.6.2, only leaving 4.7.0 on ePMP3000s. Our technicians call 4.7 a total disaster.
We’ve lived quite happily for more than a year with 4.6.2 (a few long-standing issues aside). 4.7 broke lots of things… SMs got more expensive, software got more unstable and buggy. Such are the times, I suppose.
@Fedor As much as I’d love to help on this issue, we are moving as many radios to 4.6.1 as fast as possible. This has been such a disaster for us and has angered so many customers, we are moving everything we can as fast as we can. I do not want to leave any devices on 4.7.0 or any beta versions of 4.7 for any longer than absolutely necessary. I am happy to email you configs before we downgrade but even that makes me nervous. It seems every time we touch a radio in any way, it results in a truck roll.
Same here. I learned the hard way to roll out ePMP updates very, very, very slowly and cautiously. I had a small micropop and a small tower running 4.7 for a month or so before rolling them back to 4.6.1.
I’ve been upgrading and rolling back 4.7 betas on this one poor micropop for over a year now, I think. We’ve been on 4.6.1 for almost 2 years now (probably over 2 years if you count betas), and it looks like it’s going to be 4.6.1 for a long time to come.
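The slow, cautious rollout described in the posts above can be sketched as a simple wave plan: start with one sacrificial site, let it soak, and only then move on to progressively larger batches. This is a hypothetical Python illustration; the function name, the device names, and the growth factor are all assumptions, and the soak time between waves is left to the operator.

```python
def rollout_waves(devices, first_wave=1, growth=4):
    """Split a device list into upgrade waves of increasing size.

    With the defaults the wave sizes are 1, 4, 16, ... until the list
    is exhausted. The last wave simply takes whatever is left.
    """
    waves = []
    size = first_wave
    start = 0
    while start < len(devices):
        waves.append(devices[start:start + size])
        start += size
        size *= growth
    return waves


if __name__ == "__main__":
    # Hypothetical device names, ordered from least to most critical.
    devices = [f"sm-{n:03d}" for n in range(1, 11)]
    for number, wave in enumerate(rollout_waves(devices, growth=3), start=1):
        print(f"wave {number}: {wave}")
```

Ordering the list so the least critical radios go first means a bad firmware build is caught on a micropop rather than on a main tower.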
Well, 4.7.0 is the gift that keeps on giving. About 50% of our F300 client radios will not downgrade. I get an error stating “Downgrading below 4.6.4 is not possible”. These are on radios that have been running much earlier firmware than 4.6.4 in the past. Again, no rhyme or reason as to which SMs give this error and which ones take the downgrade without an issue.
Here is a radio that has 4.6.2 in the inactive bank but tells me it can’t go below 4.6.4. And, 4.6.4 is not even a firmware that exists. Unreal.
We’ve seen that on a few. We tried to reboot and then downgrade, but got the same error.
We downgraded to 4.7.0-RC7 first, and then it downgraded to 4.6.1 without a problem. It will probably work with other 4.7 RCs; I just went with the oldest one I had on hand and didn’t try anything else.
I got 4.7.0-RC64 from the Cambium site and it has the same problem. I don’t have any older RCs for 4.7 since we tend not to run beta and RC firmware.
My next issue will be that we run STA frequencies on many APs, so I’ll have to get an STA version of whatever RC works so we don’t have to re-program APs during this transition.
Are you having issues with 4.7 and 802.11ac radios? The only issues we’ve noticed seemed to be with 4.7 802.11n clients connected to AC APs, but we haven’t seen any issues with AC clients.
The only issue we are not seeing in the AC radios is SNMP going unresponsive when airtime utilization goes up. That is likely due to the much better CPU than the N radios. Other than that, every other issue we have is across the board. We have 4 towers with 100% AC radios on them and they have the same issues that the mixed and all-N radio towers have.
I’ve seen performance improvements with 4.7 mentioned in the forums. But it’s not clear to me whether those improvements are applicable to 2000 APs with F200 SMs.