100% CPU on 1000 APs getting worse

I have 3 or 4 of our 12 1000 APs out there that are pegged at 100% CPU almost all the time. While the AP continues to pass traffic, we lose SNMP polling, and logging in via the web interface is incredibly slow. Some are running 3.6.5 and some were just upgraded to 4.5. The problem is no different on either firmware.

Then we have other 1000 APs that never display this problem yet are configured nearly identically. We do not do QoS in the AP and the firewall is off. Each one is really set up just as a bridge. Our routers and Preseem do all the shaping and NATing. These are running on 40 MHz channels and have an average throughput of around 50 or 60 Mbps when the issue crops up.

What pushes these CPUs to the max? What can I adjust or look for to fix this? 

Are the ones showing high CPU usage the original 1000 radios (non-GPS)?

I know the CPU in the original 1000 was pretty weak. They upgraded the CPUs in the 1000 GPS and the e2k.

No, all of our 1000 APs are GPS versions. Some are the Lite version that were upgraded to full with a license, but that should be the same hardware / CPU.

Are you using cnMaestro on them?

Yes, cloud version.

Try disabling cnMaestro on the ones hitting 100% and see if it fixes it.

Done. We'll see if it helps.

Interestingly, the two worst APs I have are 1000 GPS APs running 4.4.3-DEV for the 5.9 STA. One of those is actually on a 5.9 channel and the other is not. I will move the one that is not to 4.5 and see if that helps the CPU utilization.

Turning off cnMaestro in the AP had no effect on CPU usage.

The only time we see this:

is on heavily loaded 1000 APs (GPS). We had one tower with an omni that was doing this. When the 1000 died we replaced it with a 2000. That didn't stop the sector from needing more capacity or from pegging the CPU, but it no longer has spells where you can't log into it, and the graphs, while still showing 100% CPU most of the time, no longer have breaks in them (SNMP doesn't stop responding).

brokengraph002.jpg

For me, when an AP starts doing this (dropping SNMP, not just pegging 100% CPU, because I have 1000 APs that will peg 100% CPU lying on the bench with nothing connected), the frame time usage and everything else is high as well. So I consider it a sign the sector has reached capacity and that I probably shouldn't have let it get this bad.
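To separate "CPU is pegged" from "SNMP actually stops answering", something like the rough Python sketch below could be run over exported poll data. Everything in it is an assumption for illustration: the function names, the 300 second polling interval, and the sample numbers are made up and do not come from any of the APs in this thread.

```python
from typing import List, Tuple


def find_poll_gaps(timestamps: List[float], interval: float,
                   tolerance: float = 1.5) -> List[Tuple[float, float]]:
    """Return (last_good, next_good) pairs where polls were missed.

    A gap is reported when two consecutive successful polls are more than
    tolerance * interval seconds apart.
    """
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > tolerance * interval:
            gaps.append((prev, cur))
    return gaps


def pegged_fraction(cpu_samples: List[float], threshold: float = 95.0) -> float:
    """Fraction of CPU samples at or above the threshold (percent)."""
    if not cpu_samples:
        return 0.0
    return sum(1 for c in cpu_samples if c >= threshold) / len(cpu_samples)


# Hypothetical data: polls every 300 s, nothing answered between 900 and 2100.
timestamps = [0, 300, 600, 900, 2100, 2400]
cpu = [100, 100, 98, 100, 100, 97]

print(find_poll_gaps(timestamps, interval=300))
print(pegged_fraction(cpu))
```

An AP that shows a high pegged fraction but no gaps is in the "bench AP" category, while one that shows gaps as well is the case described above.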

I've got 18 subs on a 40 MHz channel AP. Does not seem that busy. Bandwidth is mostly below the average for 1000 APs @ 40 MHz:

Here is the CPU for the same time frame. Frame utilization is full as well, but this really is not that busy of an AP.

 

Makes me wonder if your problem radios are due to something on the network between them and whatever is monitoring them, a switch or something dropping packets when the load reaches a certain level. I think we have had one site behave similarly when the backhaul was being saturated.

Well, both sites are fed by gigabit links and there are other APs on the same tower that don't have this same problem.

I'm seeing the same issue with one of our tower sites here. 2 Cambium 5 GHz 1000-GPS APs on a tower, both at 40 MHz with roughly the same number of clients. One runs at 100% CPU and the other runs at 40%. GPS sync is not enabled.

Clients behind the 100% CPU AP are all randomly disconnecting from their PPPoE sessions. On the other AP, clients have been connected for days. Disabling cnMaestro did not help, nor did disabling everything that gets stats via SNMP.

Hi,

I would cross-check whether the CPU usage is related to the number of retransmitted packets and interference.

Dmitry
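As a rough sketch of how that cross-check could be done once CPU and retransmission percentages have been exported for the same polling intervals, a plain Pearson correlation is one option. The function name and the sample numbers below are hypothetical, and a correlation near zero or near one would only be a hint, not proof of cause.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series (for example CPU stuck at 100% for the whole
        # window) carries no information, so report no correlation.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples taken at the same polling times.
cpu_percent = [40, 55, 70, 90, 100]
retransmit_percent = [2, 4, 6, 9, 12]

print(round(pearson(cpu_percent, retransmit_percent), 3))
```

Note that if the CPU sits at 100% for the entire window, the series has no variance and this kind of check cannot say anything either way.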

My very worst AP (in terms of CPU) is on a completely clean 5.9 GHz channel courtesy of our STA license. No interference and very low re-transmits. 

I can report the same on F200 PTP bridges.

Even during the night, when through-traffic is essentially zero.

TDD-PTP, 75/25, 2.5 ms in a 40 MHz channel. Firmware 4.5.
There is almost no interference (100 down/20 up on the channel).

Observed behaviors are:
Walks fail in the middle with [timeout], or never start [no response].
The web UI is unresponsive for minutes, then comes back.