Network has around 90 APs, all PMP450 or PMP450i. In the last couple of weeks, random APs will drop all SM sessions. No amount of trickery can make these SMs recover until, all of a sudden, they start coming back online. There's no pattern to the frequency, no signs of interference, sync is good across all sites, channel separation is in order, and firmware is current.
SAS shows everything correctly Authorized but the SMs simply won’t connect. De-registering and re-initializing doesn’t change anything. Event logs lead us nowhere.
Looking for any suggestions if anyone has seen any behavior like this before.
Sorry to hear you’re having issues like this Ryan… I would suggest opening a support ticket so the team can gather more information and help diagnose what’s going on.
In the meantime, I will highlight this internally and see if anyone has heard anything similar.
I’ve got a support ticket open already. Their only response is interference; however, the spectrum analysis doesn’t point in that direction unless it’s out-of-band interference. Ticket 449413.
Many of these affected sites are within a 5 mile radius; however, when these events happen on 3655, for instance, it doesn’t affect all 3655 radios. And when it happens, it’s got a “rolling” type effect, as if it’s moving through the area.
I’ve posed the question to support regarding sync timing issues within the CBRS band or outside the band. I’ve dealt with harmonic interference in the past but never in the 3 GHz band. That would date back to the old FSK 900 MHz days.
I’m hoping someone here would have some input for some other things to look for.
These graphs look like interference is causing the SMs to drop offline. Have you looked at the sessions list on the AP when this is happening? I suspect if you can, you will see modulation rates start to drop before the SMs go offline.
I have often seen something similar in the past when some SMs on a sector can see an interfering AP while others can’t. The SMs seeing the interfering AP will drop offline, often in sequence, and come back at different times. I’m not sure whether there is a timing component between the interfering AP and ours, but usually I’ve seen this where another WISP has a 450m AP at a similar azimuth from the SM as our 450i. It “seems” like the problem gets worse when the other WISP’s customers in our area are busy.
If you can move your AP to a frequency far away in the band you might see if the behavior follows. Obviously this can be difficult as there isn’t much spectrum available in this band.
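If it helps automate the sessions-list check above, here’s a rough Python sketch. It assumes you can periodically export the AP’s session list (via SNMP, the web UI, or cnMaestro) into per-interval snapshots of SM modulation; the snapshot format and the `MOD_ORDER` ranking below are my assumptions for illustration, not a Cambium API.

```python
# Sketch: flag SMs whose modulation rate declines in the samples
# leading up to a session drop. Snapshot collection is assumed;
# the data format here is hypothetical.
from collections import defaultdict

# Rank modulations from most robust to fastest.
MOD_ORDER = {"QPSK": 0, "16-QAM": 1, "64-QAM": 2, "256-QAM": 3}

def declining_before_drop(snapshots, window=3):
    """snapshots: list of dicts {sm_mac: modulation}, one per polling
    interval; an SM absent from a snapshot is considered dropped.
    Returns the MACs whose modulation fell across the `window`
    samples before they disappeared."""
    history = defaultdict(list)          # mac -> modulation ranks
    flagged = set()
    prev_macs = set()
    for snap in snapshots:
        for mac, mod in snap.items():
            history[mac].append(MOD_ORDER.get(mod, 0))
        for mac in prev_macs - set(snap):    # SM dropped this interval
            ranks = history[mac][-window:]
            if len(ranks) >= 2 and ranks[-1] < ranks[0]:
                flagged.add(mac)             # degraded, then dropped
            history[mac].clear()
        prev_macs = set(snap)
    return flagged
```

If the flagged SMs cluster by azimuth or frequency, that would support the interfering-AP theory.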
Not yet. Got a call with Cambium tomorrow where I’ll hopefully get some engineering type questions answered regarding that as well as some other timing related questions.
I spent the afternoon reading up on this, and my original hunch about timing lines up with it. I went ahead and implemented this across the board this evening and saw some improvement in SNR right away; more notably, these “events,” as we’ve been calling them, haven’t been present at all today. As seen on the graphs from the last 24 hours, there was that 6am drop followed by hours of quality signals. The LTE coexist changes weren’t made until around 6:30pm this evening, so the next 24-48 hours will be a true test of the results while we wait on the graphs to populate.
Yesterday, after some research, I found there were known issues with the 24.2.1 firmware relating to legacy P11 SMs. The majority of our SMs are older units that fall into this category, so I scheduled an overnight downgrade to 22.1.2. Oddly enough, Cambium Support recommended the same thing a few hours later. The firmware downgrade provided the most substantial improvement overall. I also made some adjustments to the LTE Frame settings and will continue to monitor throughout the day.
Attached are a few graphs showing the changes we have seen since the firmware downgrade took place around 2am.
LTE Colocation settings provided a major improvement in our SNR and RSSI imbalance so big thanks to Eric for pointing us in that direction. These issues appeared to start back in June and have steadily gotten worse over the last couple months.
We’ve had a few days now without any events, and that coincides with the replacement of a timing unit that was causing strange behavior on the Ethernet link to the radios on a single tower. The radios would occasionally drop back to 100Mb half duplex from 1000Mb FDX; however, the fallback never matched up with the events. The timing unit was a shelf spare that was new but about 6 years old, deployed as part of a complete tower site upgrade after lightning damage. It’s my belief that the timing unit was dropping or delaying sync to the radios, which was causing wide-reaching effects. Combine that with LTE interference and it makes for some difficult troubleshooting.
This is great information Ryan. Thanks for posting. Can you elaborate on the “timing unit” you describe above? What model was it, and were there any other indicators aside from flapping the port speed?
This is the kind of information that can help others to troubleshoot similar issues, thanks again.
It was an LMG Cyclone CTM2M. No other indicators were present: all APs showed 100% sync and no errors in the Ethernet stats on the APs themselves, only the port speed fallback in the switch, and this was on all brand-new equipment and new cable after lightning damage a few weeks prior. The only reason we changed the timing unit (keep in mind it was factory sealed when it was installed just 2 weeks before) was that the frequency of the issue behaved like radios in proximity to un-timed radios. We’ve seen failed GPS units in the past that caused timing issues. We have since gone to Autosync on the APs, where they stop transmitting when they lose sync. That was not the case here.
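For anyone wanting to catch that kind of silent fallback sooner, here’s a minimal sketch of the detection side. It assumes you’re already polling the switch for port speed and duplex (the standard IF-MIB `ifHighSpeed` and EtherLike-MIB `dot3StatsDuplexStatus` objects would work via snmpget); only the event logic is shown.

```python
# Sketch: flag transitions away from 1000/full on a switch port.
# Readings could come from snmpget of IF-MIB::ifHighSpeed and
# EtherLike-MIB::dot3StatsDuplexStatus; collection is assumed here.

def fallback_events(readings):
    """readings: list of (timestamp, speed_mbps, duplex) tuples,
    with duplex in {"full", "half"}. Returns the timestamps where
    the port degraded from 1000/full to anything less."""
    events = []
    prev_ok = None
    for ts, speed, duplex in readings:
        ok = (speed == 1000 and duplex == "full")
        if prev_ok and not ok:
            events.append(ts)
        prev_ok = ok
    return events
```

Alerting on these transitions, even when the radios themselves report clean sync and Ethernet stats, would have pointed us at that tower weeks earlier.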