100% CPU on 1000 APs getting worse

Au_Wireless · April 13, 2020, 5:29pm

I have 3 or 4 of our 12 1000 APs out there that are pegged at 100% CPU almost all the time. While the AP continues to pass traffic, we lose SNMP polling, and logging in via the web interface is incredibly slow. Some are running 3.6.5 and some were just upgraded to 4.5. The problem is no different on either firmware.

Then we have other 1000 APs that never display this problem yet are configured nearly identically. We do not do QoS in the AP and the firewall is off. Each one is really set up just as a bridge. Our routers and Preseem do all the shaping and NATing. These are running on 40 MHz channels and have an average throughput of around 50 or 60 Mbps when the issue crops up.

What pushes these CPUs to the max? What can I adjust or look for to fix this?

Eric_Ozrelic · April 13, 2020, 7:50pm

Are the ones showing high CPU usage the original 1000 radios (non-GPS)?

I know the CPU in the original 1000 was pretty weak. They upgraded the CPUs in the 1000 GPS and the e2k.

Au_Wireless · April 13, 2020, 8:05pm

No, all of our 1000 APs are GPS versions. Some are the Lite version that were upgraded to full with a license, but that should be the same hardware / CPU.

Eric_Ozrelic · April 13, 2020, 8:17pm

Are you using cnMaestro on them?

Au_Wireless · April 13, 2020, 8:28pm

Yes, cloud version.

Eric_Ozrelic · April 13, 2020, 8:36pm

Try disabling cnMaestro on the ones hitting 100% and see if it fixes it.

Au_Wireless · April 13, 2020, 10:35pm

Done. We'll see if it helps.

Interestingly, the two worst APs I have are 1000 GPS APs running 4.4.3-DEV for the 5.9 STA. One of those is actually on a 5.9 channel and the other is not. I will move the one that is not to 4.5 and see if that helps the CPU utilization.

Au_Wireless · April 14, 2020, 3:25pm

Turning off cnMaestro in the AP had no effect on CPU usage.

brubble1 · April 14, 2020, 5:03pm

The only time we see this is on heavily loaded 1000 APs (GPS). We have one tower with an omni that was doing this. When the 1000 died we replaced it with a 2000. That didn't stop the sector from needing more capacity or from pegging the CPU, but it no longer has spells where you can't log into it, and the graphs, while still showing 100% CPU most of the time, no longer have breaks in them (SNMP doesn't stop responding).

For me, when an AP starts doing this (dropping SNMP, not just pegging 100% CPU, because I have 1000 APs that will peg 100% CPU lying on the bench with nothing connected), the frame time usage and everything else is high as well, so I consider it a sign the sector has reached capacity and I probably shouldn't have let it get this bad.

Au_Wireless · April 14, 2020, 7:24pm

I've got 18 subs on a 40 MHz channel AP. Does not seem that busy. Bandwidth is mostly below the average for 1000 APs @ 40 MHz:

Here is the CPU for the same time frame. Frame utilization is full as well, but this really is not that busy of an AP.

brubble1 · April 15, 2020, 1:20am

Makes me wonder if your problem radios are due to something on the network between them and whatever is monitoring them, a switch or something dropping packets when the load reaches a certain level. I think we have had one site behave similarly when the backhaul was being saturated.

Au_Wireless · April 15, 2020, 1:44pm

Well, both sites are fed by gigabit links and there are other APs on the same tower that don't have this same problem.

supers · April 15, 2020, 2:23pm

I'm seeing the same issue at one of our tower sites here: two Cambium 5 GHz 1000-GPS APs on a tower, both at 40 MHz, with roughly the same number of clients. One runs at 100% CPU and the other runs at 40%. GPS sync is not enabled.

Clients behind the 100% CPU AP are all randomly disconnecting from their PPPoE sessions. On the other AP, clients have been connected for days. Disabling cnMaestro did not help, nor did disabling anything that gets stats via SNMP.

Dmitry_Moiseev · April 16, 2020, 5:39pm

Hi,

I would cross-check whether the CPU usage is related to the number of retransmitted packets and to interference.

Dmitry
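One way to do the cross-check suggested above is to export CPU and retransmission samples taken at the same timestamps and compute a correlation coefficient. This is only a minimal sketch: the sample values are invented for illustration, the variable names are hypothetical, and how you export the numbers from your monitoring system is left open.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sample lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A flat series (e.g. CPU pinned at 100% in every sample) carries
        # no information, so report "no correlation" instead of dividing by 0.
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical samples taken at the same polling times (not real AP data).
cpu_percent = [40, 55, 70, 85, 100]
retransmit_percent = [2, 4, 6, 8, 10]

r = pearson(cpu_percent, retransmit_percent)
print(f"CPU vs. retransmissions: r = {r:.2f}")
```

A value near 1 would suggest CPU load rises with retransmissions, while a value near 0 (as reported later for the clean 5.9 GHz channel) would point somewhere else. Note that an AP stuck at 100% in every sample gives a flat series, which this sketch reports as 0.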

Au_Wireless · April 16, 2020, 6:41pm

My very worst AP (in terms of CPU) is on a completely clean 5.9 GHz channel courtesy of our STA license. No interference and very low re-transmits.

Guilherme · April 18, 2020, 10:56am

I can report the same on F200 PTP bridges.

Even during the night, when through-traffic is essentially zero.

TDD-PTP, 75/25, 2.5 ms in a 40 MHz channel. Firmware 4.5.
There is almost no interference (100 down / 20 up on the channel).

Observed behaviors are:
SNMP walks fail in the middle with [timeout], or never start [no response].
WebUI is unresponsive for minutes, then comes back.
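
To put numbers on the SNMP dropouts described in this thread, one option is to log the time of each successful poll and look for stretches where polls went missing. A rough sketch follows, assuming you can export the timestamps of successful polls as seconds; the function name, the 300-second interval, and the sample timestamps are all hypothetical.

```python
def find_poll_gaps(timestamps, interval_s, tolerance=1.5):
    """Return (last_good, next_good) pairs where successful polls are
    further apart than tolerance * interval_s."""
    gaps = []
    ordered = sorted(timestamps)
    # Compare each successful poll with the one that follows it.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > interval_s * tolerance:
            gaps.append((prev, cur))
    return gaps

# Hypothetical successful-poll times in seconds, nominal 300 s interval.
polls = [0, 300, 600, 1500, 1800]

gaps = find_poll_gaps(polls, interval_s=300)
for start, end in gaps:
    print(f"no successful poll for {end - start} s (from t={start} to t={end})")
```

Lining up the reported gaps against CPU and frame utilization for the same period should show whether the dropouts track load or happen independently of it.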