Canopy 14.1.1 DFS problems :-/

I updated all our 450 APs to 14.1.1 last week.  On our 5GHz 450s we use the 5.4 band.  On all these APs the "Alternate Frequency Carrier 1" and "Alternate Frequency Carrier 2" settings were reset to "none".  When I set them back to what they should be, save, and reboot, they still come back as "none".
 
Because of this if we experience a DFS event we have no other frequency to hop to.
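
For anyone who wants to check a lot of APs at once, here is a minimal sketch of an audit, assuming you already have each AP's settings exported into a dictionary. The key names (`alternate_frequency_carrier_1`, `alternate_frequency_carrier_2`) and the AP names are hypothetical placeholders, not Cambium parameter names; adapt them to however you pull your configs.

```python
from typing import Dict, List

# Hypothetical key names; the real ones depend on how the config was exported.
ALT_KEYS = ("alternate_frequency_carrier_1", "alternate_frequency_carrier_2")


def aps_missing_alternates(configs: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the names of APs where either alternate carrier is unset or 'none'."""
    missing = []
    for ap_name, settings in sorted(configs.items()):
        for key in ALT_KEYS:
            # Treat a missing key the same as an explicit "none".
            value = settings.get(key, "none").strip().lower()
            if value in ("", "none"):
                missing.append(ap_name)
                break  # one missing alternate is enough to flag this AP
    return missing


# Illustrative data only; names and frequencies are made up.
sample = {
    "tower1-ap1": {"alternate_frequency_carrier_1": "5500.0",
                   "alternate_frequency_carrier_2": "5520.0"},
    "tower1-ap2": {"alternate_frequency_carrier_1": "None",
                   "alternate_frequency_carrier_2": "5520.0"},
    "tower1-ap3": {},
}
print(aps_missing_alternates(sample))
```

Running something like this after an upgrade would flag every AP that has no fallback channel before a DFS event finds it for you.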
 
Additionally, we have a couple of APs that are now seeing DFS events after the upgrade, when previously they experienced no events...ever.  We live in the middle of the Rocky Mountains, so there really shouldn't be any events because of all the granite clouds in the way ;-)
 
Anyway, DFS seems to be broken on 14.1.1 and I wanted to let everyone know before they apply the upgrade, in case you need DFS to work properly.

Alright, so after digging into this and making some adjustments I think I've got the DFS events settled down now.

 
1. Issue:  The 14.1.1 upgrade on PMP450 APs using the 5.4 band will delete "Alternate Frequency Carrier 1" and "Alternate Frequency Carrier 2".
    Fix: I had to log into each AP and manually add the 2 alternate channels back in.  This way when we have a DFS event the AP will move to another channel.  Since the upgrade had deleted the 2 alternates, the AP would shut down for 30 min.
 
2. Issue: The tower that was having the most DFS issues has 11 PMP450 APs, and it's a commercial tower with a lot of FM stations etc.  All APs are connected to a CTM2.  However, after the upgrade half the APs were getting sync from the CTM2 over the power port and half didn't see any sync on the power port.  
    Fix:  After reading other reports of the 450s having weird sync issues (causing other issues besides DFS events) I decided to force all 11 APs to use the onboard GPS, and I turned off the power port timing and timing port timing to achieve this.  So now all APs at this tower are receiving sync from the onboard GPS.
 
3.  Issue: Somewhere along the line (13.x) Cambium made it so that a 450 AP in the 5.4 band couldn't be set to more than 75% downlink.  We originally had them set for 85% downlink and they were timed with our 430 APs.  When Cambium forced the change without any notification, our 450 APs were suddenly out of time with our 430 APs.  In the area where this problem tower is located we had removed all the 430 APs and moved them to smaller towers at the edge of our network, so I didn't think we'd have any timing issues since there were mountains blocking the signals.  However, there was one tower with 430s still on it that was ~7 miles from this problem tower.
    Fix:  I changed the 430 timing to match the 450 timing at the problem tower.
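
To put a rough number on the 75% vs. 85% downlink mismatch in item 3, here is a back-of-the-envelope sketch. It assumes a 2.5 ms frame (an assumption, check your own frame period) and ignores max range, control slots, and turnaround gaps, all of which a real colocation calculation has to include. The function name is just for illustration.

```python
def downlink_overlap_us(frame_us: float, dl_a_pct: float, dl_b_pct: float) -> float:
    """Microseconds per frame in which one AP is still transmitting
    while the other has already switched to receive.

    Simplified model: both frames start together and the downlink
    portion is exactly dl_pct percent of the frame.
    """
    return frame_us * abs(dl_a_pct - dl_b_pct) / 100.0


# Assumed 2.5 ms frame; 430s at 85% downlink, 450s capped at 75%.
FRAME_US = 2500.0
print(downlink_overlap_us(FRAME_US, 85, 75))  # 250.0
```

Under those assumptions the mismatched AP transmits into the other AP's receive window for about a tenth of every frame, which is the kind of energy a DFS detector could plausibly mistake for something else.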
 
After running this weekend with the new settings I've only had 1 DFS event, whereas before these fixes there were a couple of APs having DFS events 2-5 times a day.
 
In conclusion it seems like 14.1.1 is WAY more sensitive with regard to timing sources.  Also 14.1.1 doesn't seem to reliably receive sync over power.  Again, when we were on 13.2.1 we didn't have any of these issues.  14.1.1 introduced some timing and DFS issues but they are somewhat manageable (I really hate turning off a sync source because it'd be nice to have it as a backup).  It seems like if a nearby AP is out of sync that will trigger a DFS event.
 
2 cents.  I'll report again in a few days to see if these fixes have eliminated the DFS events.

I guess I'll just keep replying to myself lol.

Anyway, after watching this for a couple more days it seems like what is happening is one AP on the tower will have a stack dump which causes other APs on the tower to register a DFS hit. 

Then sometimes the DFS hit causes a stack dump in that AP.
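
One way to check the "stack dump on one AP, DFS hit on the others" pattern is to line up event timestamps from the APs' logs. This is only a sketch: the `Event` record, the `kind` labels, and the 60-second window are assumptions for illustration, and you would have to parse your own event logs into this shape.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    ap: str          # AP name (hypothetical naming)
    kind: str        # "stack_dump" or "dfs" (labels assumed for this sketch)
    time: datetime


def dfs_hits_after_dumps(
    events: List[Event], window: timedelta = timedelta(seconds=60)
) -> List[Tuple[Event, Event]]:
    """Pair each stack dump with DFS hits on *other* APs that occur
    within `window` after the dump."""
    dumps = [e for e in events if e.kind == "stack_dump"]
    hits = [e for e in events if e.kind == "dfs"]
    pairs = []
    for dump in dumps:
        for hit in hits:
            delay = hit.time - dump.time
            if hit.ap != dump.ap and timedelta(0) <= delay <= window:
                pairs.append((dump, hit))
    return pairs


# Made-up example data.
t0 = datetime(2015, 1, 1, 12, 0, 0)
events = [
    Event("ap1", "stack_dump", t0),
    Event("ap2", "dfs", t0 + timedelta(seconds=30)),   # other AP, inside window
    Event("ap1", "dfs", t0 + timedelta(seconds=10)),   # same AP, ignored here
    Event("ap3", "dfs", t0 + timedelta(minutes=5)),    # outside window
]
for dump, hit in dfs_hits_after_dumps(events):
    print(dump.ap, "dump ->", hit.ap, "DFS hit")
```

If most DFS hits pair up with a dump on a neighboring AP, that supports the idea that the hits are false detections rather than real radar.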

14.1.1 has a lot of speed improvements but all the DFS hits that are now happening that didn't happen on 13.2.1 are causing issues.

-Sean

So I gave up.  Our network and clients can't put up with all the memory stack dumps, false DFS hits and registration failures that state "out of range".

 
I downgraded the APs to 13.2.1 and everything is stable once again.
 
I look forward to a stable release of 14.1.1 for all the speed improvements and the frame utilization stats etc. but I can't have our network and clients suffer through buggy software anymore.
 
2 cents

Also, for what it's worth, the APs that seem to have the most problems with the 14.1.1 software are the original APs from a few years ago with the FSK port on them.

 
The newer APs and particularly the ones with the GLONASS GPS chip had the least if any problems.
 

Hi Sean,

Apologies for the late reply; I've been trying to gather enough information to answer as many of your questions as possible.

Regarding the numbered issues you listed in one of your prior posts, I'll try to address them as listed:

1) Alternate Frequency Carrier issue:

In a prior post you mentioned having a problem where the Alternate Frequency boxes wouldn't save after a reboot. There was a known issue which caused this problem, but the fix for it is present in 14.1.1.  I wonder if you were running a pre-release Beta when you experienced that.  Nonetheless, it looks like you were able to successfully set the fields as of your most recent update.  However, it sounds like there may have been a problem during the upgrade where the Alternate Frequencies were wiped out, which we will look into.  I suspect, however, that may have been a result of an interim upgrade blowing them away, which you would not see outside of the Beta releases.

2)  Timing/Sync/DFS issues

You mention you had 11 APs on this tower.  Were those all upgraded simultaneously to 14.1.1, and are there any other Cambium radios on that tower which were NOT upgraded, and happened to be running older versions of software when you were experiencing issues?  There were some major fixes in 14.1.1 with respect to aligning frame starts for all legacy and new radios, but they also unearthed a bug in some legacy software versions which could cause interference between radios running old and new software.  In the 14.1.1 user guide, please reference the section about "Frame Alignment Legacy Mode" and see if it may apply to your tower's configuration.

With respect to some of the APs not seeing sync over power from CTM2, I have not heard of this being an issue on PMP450 in the past.  I would recommend trying to run "debouncepowerport" from the Telnet interface on any APs which are not seeing sync over power.  If the signal is weak or intermittent, the debounce may be filtering it out as invalid.  This may help, but be aware it may also cause the radio to go in and out of sync.

Finally, with respect to CTM2, there should be an update which provides a CMM Compatibility mode option.  I would recommend updating to a revision which has this setting and enabling it, as it corrects for a discrepancy in the frame start delay between a CTM2 and a Cambium CMM device, which could otherwise cause some interference.  All Cambium radios have had their frame starts aligned such that running off of either iGPS, uGPS or CMM sync over power should align correctly.  A CTM2 without CMM Compatibility mode enabled may keep radios using an alternate sync source from aligning properly, causing them to interfere with each other.

3) 75% downlink

I spoke with our in-house regulatory guru and he informed me that our software is adjusted to support the maximum Uplink/Downlink ratio that is able to pass FCC testing for DFS.  It is possible that at some point in the past a new feature or new channel bandwidth addition led to a reduction in the ratio due to increased overhead causing difficulty in maintaining FCC compliance.  I don't have an exact software release to reference at this time, but I'll see if I can dig something up for you with respect to that change.

Regarding DFS sensitivity:  There have been some further tweaks and fixes to DFS in current in-house builds, which will hopefully be included in a future point release, 14.1.2 and/or 14.2.  We hope to address any issues like the ones you describe as soon as possible.

I hope this answers some of your questions, and we are working hard to address any issues that come up.  Thanks!


Looks like we were posting at the same time!  I just caught up on your most recent posts.

I'm sorry you're experiencing so many issues, but I hope my last post will be of some assistance.

Regarding the stack dumps you mentioned, have you opened a support ticket regarding those?  It would be extremely helpful to see them if you would be willing and able to share those with us.  Thanks in advance!


Here are the answers to your questions:

1.  All 450s on this tower and all our towers were upgraded to 14.1.1 at the same time from 13.2.1 and all SMs are now on 14.1.1.  All our 5GHz 450 APs use the 5.4 DFS band.  On all our towers we also have PMP100 FSK in the 5.8 band.  At first I had the frame alignment mode set to off.  I then changed it to mode 1.  Neither helped and we have the same symptoms.  

2.  I haven't talked to Last Mile Gear about an update to their CTM2.  Sync works fine on 13.2.1.  Sync is all messed up on 14.1.1.  I've now downgraded this tower and I'm not going to mess with it again for a while because we just tortured ourselves with a week of buggy software.  I will hold tight until I know for certain you have a 14.x software release that's stable.

3.  I've changed all the timing of our 430s to match the 450s now, so this is no longer an issue.

4.  Lastly, 14.1.1 seems to be running fine on our smaller towers that only have a couple of 450 APs.   It just seems to be this one tower that has a mix of older and newer 450s, doesn't like sync, gets DFS hits often, has stack dumps regularly, and has "out of range" registration failures.

-sean

After looking at more APs, they are all having stack dumps, losing sync over power, and having DFS hits regularly.

I will be downgrading all our 450s to 13.2.1

14.1.1 is NOT ready for prime time...it is extremely buggy and crash prone :-(

Hey Sean,

Once again, I apologize for the problems you've been experiencing.  I'm about to reach out to you via email (my manager was notified about your issues via email through the Animal Farm mailing list).  I'm pretty busy today, with meetings and work, but expect to see something from me in the next few hours.

We'll see if we can't get things squared away for you.

-Al

Just so others reading this thread can get some more information, I'm including part of my email response to Sean, here in the thread:

Sean said:

All 450s on this tower and all our towers were upgraded to 14.1.1 at the same time from 13.2.1 and all SMs are now on 14.1.1.  All our 5GHz 450 APs use the 5.4 DFS band.  On all our towers we also have PMP100 FSK in the 5.8 band.  At first I had the frame alignment mode set to off.  I then changed it to mode 1.  Neither helped and we have the same symptoms.

My response:

Since your FSK radios are not operating in the same frequency band as your 450s, you should not need to worry about Frame Alignment Legacy mode, unless your 450s are running different software versions (and you indicated you had them all on 14.1.1, so you’re in the clear.)  However, if there are any other nearby APs on a different tower which are within range of this tower, you would have to make sure they’re also not potentially interfering on the same freq band from a distance.  Same rules about Frame Alignment Legacy mode apply in that case as well.

-----------------------------

Sean said:

I haven't talked to Last Mile Gear about an update to their CTM2.  Sync works fine on 13.2.1.  Sync is all messed up on 14.1.1.

My response:

While I understand you say the sync works fine with 13.2.1, I would still recommend upgrading to the latest CTM2 firmware and enabling CMM Compatibility Mode to ensure the best chance of interoperability with CMM and aux port sync devices.  CTM2 is known to have a longer delay than CMM between input and output sync, so for proper synchronization between those devices, it’s best to have compatibility mode enabled.  If you exclusively use CTM2 everywhere, you may not run into any issues, but if you mix in CMM hardware anywhere nearby, you may run into interference.

 

Secondly, I spoke to our FPGA team regarding the sync issues you were seeing between 13.2.1 and 14.1.1.  I was informed that due to hardware changes for the 450i platform, the FPGA’s sync filtering was modified slightly to properly handle both 450 and 450i hardware, with respect to sync.  It is possible that this has introduced some sensitivity on 450 hardware when running 14.1.1.  We are working to reproduce this issue in-house, and hope to test a solution which would involve using different sync filtering for 450 and 450i, instead of an all-in-one solution.  If that shows promise, I suspect a future build, like 14.1.2 or 14.2 will incorporate that change, and hopefully fix the sync issues on 450 hardware.  If you are willing, we may ask you to give that update a spin when it’s available, since you appear to have a setup with reproducible sync issues.

-----------------------------

Sean said:

lastly, 14.1.1 seems to be running fine on our smaller towers that only have a couple 450APs.   It just seems to be this one tower that has a mix of older and newer 450s, doesn't like sync, gets DFS hits often, has stack dumps regularly, and has "out of range" registration failures.

My response:

I’m glad you’re not having all-bad experiences with 14.1.1.  There are indeed a lot of fixes and improvements in there.  Regarding the tower in question, I wonder if I might not have a potential explanation for the issue you’re seeing (outside of the stack dumps).  Are the APs in question set up for Autosync or Autosync + Free Run?  While Free Run can be helpful if you drop sync occasionally, especially in sparse sectors, an AP that is in Free Run mode can very quickly drift out of frame sync with the other APs in the sector leading to self-interference and DFS falsing and, thereby, SM disconnects.  I would recommend trying to use Autosync without Free Run and seeing if that helps in that regard.

 

Obviously, if sync is intermittent, disabling Free Run may cause individual APs to drop session, but it will at least prevent the AP from interfering with the entire sector, limiting downtime.  I know it’s not a perfect solution, but generally speaking, it’s better not to use Free Run in a dense sector where interference is likely.   Also, I understand that the sync issues you’re experiencing exacerbate the problem, due to APs dropping sync more frequently, but we’re working on remediating that ASAP.
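
To illustrate why Free Run can go bad quickly, here is a rough sketch of how long a free-running AP stays within a given timing tolerance. The tolerance and oscillator drift values below are placeholders chosen for the arithmetic, not Cambium specifications, and the model assumes a constant drift rate.

```python
def seconds_until_out_of_sync(tolerance_us: float, drift_ppm: float) -> float:
    """Seconds of free run before accumulated clock drift exceeds the tolerance.

    A drift of N ppm means the clock gains or loses N microseconds
    per second, so the time to exceed the tolerance is tolerance / N.
    """
    if drift_ppm <= 0:
        raise ValueError("drift_ppm must be positive")
    return tolerance_us / drift_ppm


# Placeholder numbers: 10 us of allowed frame-start error, 0.5 ppm oscillator drift.
print(seconds_until_out_of_sync(10.0, 0.5))  # 20.0
```

With numbers in that range, an AP that loses sync would be transmitting out of step with its neighbors within seconds, not hours, which fits the recommendation to avoid Free Run on a dense tower.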