I have noticed an issue affecting a majority of my SMs across several FW releases.
Whenever something on the AP side causes all the SMs to disconnect, the SMs' TX rates to the AP become unstable after they reconnect, and they stay that way until the SM is rebooted.
Changes on the AP such as turning MU-MIMO or beamforming on/off, and even an AP reboot, trigger this. A simple SM reboot fixes it. It does not happen to every SM, but it does happen to most of them.
Recently the AP lost its connection to AFC (my mistake), and I did not notice until the AP failed its AFC check, shut off the radio, and dropped all the clients. At first I thought it was an AP issue, so I rebooted the AP. Only after the reboot did I realize the SMs had dropped because of the missing AFC. I got AFC fixed and the SMs reconnected. This was Sunday, August 24th.
Today I noticed that since that AP reboot, 6 of the 9 SMs have had worse UL stability than before. I have the SMs scheduled to reboot tonight, but I have already rebooted a couple of them and, as I have seen on other APs, the SM reboot fixes the rate problem.
I have seen this on enough APs, across different towers and over a long enough period, that I believe it is a bug and not my environment.
At first I suspected AP-side interference, but I checked and there is no new noise. Then I remembered that I had rebooted the AP on the 24th, exactly when the rates got worse.
This AP is 80 MHz, TDD 75/25, long GI, with beamforming.
It is as if the SMs somehow get out of sync and start causing self-interference at the AP. At least, that is my guess.
Again, this only seems to happen when the AP drops all the SMs at once, either because of an AP reboot or a wireless setting change that forces a disconnect.
One odd thing is that FW updates do not seem to cause this issue, even though the SMs upgrade first and then the AP upgrades and reboots. Maybe the SMs have not reconnected to the AP by the time it reboots, and the trigger requires the SMs to already be registered?
Here are some posts I made about a similar issue with different triggers, some of which have since been fixed. So I wonder whether the underlying issue is still the same and Cambium has only fixed the triggers in some of the cases below.
Well, it seems rebooting an individual SM only fixes it for a while before it becomes unstable again. It was good for around 30 minutes, though still not as bad as before the reboot.
This could be more disruptive for your clients, but maybe you could reboot one SM per hour and see if there is a "magic" reboot that fixes one of them and, with it, all of them?
Correction to the above: so far, UL rates have improved after I rebooted 4 of the 9 SMs at different times over the last 5 hours. However, while improved, the rates are not back to where they were before the AP reboot on the 24th. It seems each SM reboot improves things a little for all SMs.
I am still seeing bad UL rates after any AP change that disconnects the SMs, such as switching from Beamforming to MU-MIMO. This is just like what I saw in the past when turning on Asymmetrical UL bandwidth, but that was fixed in a newer FW (5.10.0, I think?).
However, the issue still persists with the change from Beamforming to MU-MIMO, and it is easy to reproduce on my end. I can take any 4600 AP with SMs that is set to Beamforming only and change it to MU-MIMO. When the SMs reconnect a minute or two later, their UL MCS rates are lower and erratic. A reboot of all the SMs fixes the issue.
I did this last night on a 4600 AP with 26 SMs. The only change was Beamforming to MU-MIMO. The SMs dropped from the AP, they all reconnected, and the UL rates were bad. I scheduled a mass SM reboot via cnMaestro about 15 minutes later, and when all the SMs reconnected, their rates were the same as before the AP setting change.
It is as if the UL scheduler freaks out and the SMs start interfering with each other, and the mass SM reboot gets everything back in sync. The AP is 75/25, GPS sync, long GI.
I believe I have also seen this after just rebooting the AP. Once the SMs reconnect, UL rates are bad; once I mass reboot the SMs, everything is back to normal. However, I have not done this often enough to know whether it is as repeatable as the Beamforming to MU-MIMO change.
Here is a photo of the SM rates from the AP's perspective. The first gap in the charts is the change from Beamforming only to MU-MIMO on the AP. The second gap is the mass reboot of all the SMs. The bad UL rates are easy to see.
I am still seeing this issue at times. I had to hard reboot a 4600 AP while we upgraded/replaced the tower backup power systems. After the AP reboot, around a third of the SMs now have unstable UL rates, but this time neither an AP nor an SM reboot fixes it.
I tried SM reboots alone, and that did not work. I then upgraded the SMs and AP from 5.10.1 to 5.10.4, which made no difference.
I have triple-checked, and there were no changes to the AP settings, the AP's AFC-allowed TX power, or the SMs' AFC-allowed TX power levels.
The ePMP devs know about this problem and are working on a fix. They have been working directly with me, as I have been able to reproduce the issue on a fairly regular basis.
This issue with the uplink rates crashing, which in our case causes poor overall speeds along with extreme bouts of latency/jitter across the AP, can be temporarily resolved by rebooting the AP. The issue seems to be amplified by intermittent noise, wider channels, and backwards compatibility mode. We have a site with 4x 4500 integrated 8x8s using 80 MHz channels and backwards compatibility mode, and this issue usually happened every day, forcing us to reboot the APs.
You can see a visual representation of the latency/jitter here. Blue is high latency/jitter, while green is low latency/jitter with everything working fine. The red line down the middle is when the AP was rebooted.
While my UL rates show the issues in the charts above, so far I have not noticed any performance impact from it. These are 6 GHz devices, though, so there is much less noise than at 5 GHz.
I have checked ping latency averages and Preseem latency for the AP and clients both before and after the UL rate issue, and they look about the same. That said, my reporting/graphing system might not be as granular as yours, so I could be missing latency and jitter spikes that were not there before.
Usually either rebooting all the SMs or rebooting the AP fixes it, but for some reason that does not seem to be the case this time.
Here is what I am seeing on the SMs. For this example client, the UL MCS frequently jumps between DS7 and SS7, whereas before it was mostly stable at DS7.
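To illustrate why averaged latency graphs can hide jitter, here is a minimal sketch, assuming you have raw per-ping RTT samples in milliseconds (the sample values below are made up for illustration, not measurements from this AP). It uses a simple jitter estimate, the mean absolute difference between consecutive samples, rather than the exact smoothed RFC 3550 formula.

```python
from statistics import mean


def mean_latency(samples_ms):
    """Plain average RTT, which is what most coarse graphs show."""
    return mean(samples_ms)


def jitter(samples_ms):
    """Simple jitter estimate: mean absolute change between consecutive samples.

    This is an illustrative simplification, not the RFC 3550 smoothed estimator.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to estimate jitter")
    diffs = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return mean(diffs)


# Hypothetical samples: same 20 ms average, very different stability.
steady = [20, 20, 20, 20, 20, 20]
spiky = [10, 30, 10, 30, 10, 30]

for name, s in (("steady", steady), ("spiky", spiky)):
    print(name, "avg:", mean_latency(s), "ms  jitter:", jitter(s), "ms")
```

Both series average 20 ms, so an averaged graph would look identical before and after, while the jitter estimate is 0 ms versus 20 ms.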
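One way to put a number on "mostly stable DS7" versus "jumps around between DS7 and SS7" is sketched below. It assumes you already have a time-ordered list of UL MCS labels sampled from the SM (how you collect them, whether from the GUI, SNMP, or cnMaestro exports, is outside this sketch), and the sample lists are hypothetical. The helper name `mcs_stability` is made up for illustration.

```python
def mcs_stability(samples, expected="DS7"):
    """Return (fraction of samples at `expected`, number of MCS changes).

    `samples` is a hypothetical time-ordered list of UL MCS labels,
    e.g. ["DS7", "SS7", ...], polled at a fixed interval.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Share of polls where the link sat at the expected MCS
    at_expected = sum(1 for s in samples if s == expected) / len(samples)
    # Count every poll-to-poll change in MCS as one transition
    changes = sum(1 for a, b in zip(samples, samples[1:]) if a != b)
    return at_expected, changes


# Hypothetical data shaped like the description above
before = ["DS7"] * 8 + ["SS7", "DS7"]
after = ["DS7", "SS7", "DS7", "SS7", "SS7", "DS7", "SS7", "DS7", "DS7", "SS7"]

print("before:", mcs_stability(before))
print("after: ", mcs_stability(after))
```

Tracking both numbers per SM before and after an AP-side change (or a mass SM reboot) would give a simple, comparable measure of how much the UL MCS is flapping.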