Lost access to 2000 AP during upgrade to 3.5


@Au Wireless wrote:

Drove to the base of the mountain where I was able to connect an SM to the bad AP. No problem getting access to it from the wireless side. The bad AP reported a 1 Gbps connection on the Ethernet port but it was unable to connect out or ping anything. Matches the behavior from the other side of that port.

I downgraded from 3.5 to 3.2.2.  AP came back up after a reboot and was working as normal.

I then upgraded from 3.2.2 to 3.4.1. No problems after reboot. All working normal.

I then upgraded from 3.4.1 to 3.5. After the reboot, the AP would not connect to anything on its Ethernet port. It was essentially dead to the world again.

Downgraded back to 3.4.1 and everything is fine again.

So, as far as I can tell, there is an issue with the 2000 hardware and 3.5.  I have 2 other 2000 APs but I am not really willing to test this theory on them since it's pretty disruptive to my customers.

The system log was downloaded during the 3.5 upgrade when it would not connect via Ethernet but there is nothing of interest in there other than cnmaestro trying to connect.

**UPDATE** For kicks, I uploaded 3.5RC7 to the AP. Same behavior as 3.5: no communication out the Ethernet port.

For now, running 3.4.1 with no issues.  I am happy to send the json config file to Cambium for lab testing to reproduce this.


We will test this scenario with the provided configuration files ASAP and get back to you.

Thank you.

Hi,

I tested on a spare ePMP 2000 that is not currently in use. I updated it to version 3.3 and later to version 3.5, and everything went normally without major problems.

I have one on the test bench that is working well with 3.5, but I have held firmware roll out on our network until it is clear what is going on.

A bit disappointing if this is an issue, as stable firmware was one of the advantages Cambium has over Ubiquiti. After the 3.3 reboot issue :(

Could Cambium please enlighten us on what quality testing these firmware releases go through prior to being released?

Example: How many units do you test a firmware release on prior to release? What sort of test bench do you have, etc.?

Would love to see a write up on this to explain the procedure as I am sure a lot of us are curious how a company could handle this.

I am sure we all understand that you can only test so much in house and that once out in the wild new bugs may be discovered due to the different setups used.

Thanks.

P.S. Here's hoping I do not have to clear the cookies every time I change a password in the next firmware.

I will tell you this...  Within minutes of me posting the problem on this forum, Cambium tech support emailed me asking for the config files for that AP so they could complete internal testing. I sent them the test files and am sure they are trying to duplicate this. I could have bad hardware. There could be something in the way I am set up in software that is causing this. I am happy to hear it is not widespread.

However, I know Cambium is listening and taking it seriously. I have worked with their tech support in the past (as well as UBNT, Mimosa and others) and found Cambium support to be very responsive. Yes, it sucks when firmware comes out with issues. It's how you deal with that that counts. Let's see what happens in the next few days.  I upgraded 120 SMs with no issues and a dozen 1000 GPS APs with no issues. I can see the throughput gains on our 40 MHz APs. It's not made up. But, my first 2000 AP failed and I can reproduce the problem over and over. I have an issue. Maybe it's just me, maybe not. I think we'll hear from Cambium Monday with internal testing results.



Chris, 

Without revealing too many details, we have more than a dozen different test setups that go through testing each release. The test setups range from PTP to small number (4-6) to medium (25-40) to 120 SMs. We test all types (ePMP 1000, F180, F190, F200, ePMP 2000) of radios and bands. There are also different modes and configurations that are tested using automation and traffic generators. Then there are upgrade/downgrade tests using GUI/SNMP/CNUT/CNSS/cnMaestro etc. We also have our field test setup where we run configurations which we can legally run in an outdoor system.  Once this is all done, we proceed to open beta where we spend at least two weeks having beta customers (can't thank these customers enough for helping out!) try out the new release and getting feedback on field configuration which we possibly cannot simulate in a lab or simply did not think of. 

That said, with the number of configuration options and modes and bands and radio types, it is nearly impossible to cover all permutations and combinations. We work to make continuous improvements to our test and quality process to prevent escaped defects, but it's an endless, ongoing process as we rapidly add more capability and complexity to the product line.

If you are ever on this side of the Pacific and visit Chicago, I'd be more than happy to take you on a tour of our test labs. It is something we're proud of and a legacy we carried over from Motorola. 

Thanks,

Sriram



Having seen their testing lab myself, it is pretty cool.


Thanks for the write-up, Sriram. All sounds good.

Look forward to future updates. Any more word on 3.5? Have you managed to replicate this issue others have experienced?

 



Hi Chris, 

We have been unable to reproduce the issue that AU Wireless ran into with his ePMP 2000 with the configuration he sent us. We'll continue to work with AU and provide updates as we progress further. 

Thanks,

Sriram


Any update on this?

We cannot reproduce the issue in our lab with the configuration provided by Chadwick.

All possible scenarios were tested.

Thank you.

I think I have figured out a relevant point.  Are all the devices that have shown this behavior connected to a Netonix switch?  I don't believe this is restricted to just 2000 series, but 1000 series devices as well.

Rolling out 3.5 on our network has been very smooth, with the exception of devices that are connected to a Netonix switch and whose only remote access is through that switch (an AP downstream of a BH connected through a Netonix). Any device connected to a standalone power supply, CMM3, or CMM4 has upgraded just fine and remains accessible.

In almost all cases, I've not been able to access the affected APs via an SM (but access via wireless is not enabled). I've been able to plug the affected device into a standalone power supply and access it via the secondary IP of 169.254.1.1, but not on the normal management IP. If I make any change in the config (in my case I enabled management access via wireless) and save the changes, then the device is accessible via the management IP. When I plug it back into the Netonix switch, everything has returned to normal.

I will note that management of our devices is on a separate VLAN than customer data and all affected devices have been those with GPS sync.
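For anyone who wants to check for the same state on a bench, here is a rough sketch of the comparison described above: probe the normal management IP and the 169.254.1.1 fallback and see which one answers. The TCP port (80, for the web UI) and the function names are my own assumptions for illustration, not anything from Cambium. The machine running it also needs an address in 169.254.0.0/16 on the same segment to reach the fallback at all.

```python
import socket

FALLBACK_IP = "169.254.1.1"  # secondary IP mentioned in the post above


def tcp_reachable(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def classify(mgmt_ok: bool, fallback_ok: bool) -> str:
    """Map the two probe results onto the states described in this thread."""
    if mgmt_ok:
        return "management IP reachable"
    if fallback_ok:
        return "fallback only: matches the symptom described above"
    return "unreachable on both addresses: check link, power, or cabling"


def diagnose(mgmt_ip: str, port: int = 80) -> str:
    """Probe both addresses (hypothetical helper; port 80 is an assumption)."""
    return classify(tcp_reachable(mgmt_ip, port), tcp_reachable(FALLBACK_IP, port))
```

Calling `diagnose("192.0.2.10")` with your own management IP in place of that placeholder address returns one of the three strings. A "fallback only" result would line up with what I saw before saving a config change.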

I have 10 APs all connected to and powered by Netonix switches. 3 of those are 2000’s and 7 are 1000’s. All have GPS. None of the 1000’s had any upgrade issues. Two of the 2000’s failed (same Netonix). The 3rd 2000 I have not tried to upgrade.

The Netonix connected to the 2000’s is a WS-8-150-DC. All the rest are WS-6-Mini switches.
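To make the pattern in those numbers easier to see, here is a small tally of failures by AP model and by switch model. It is only a sketch of the counts given above. It assumes that all seven 1000s sit on WS-6-Mini switches and all three 2000s on the WS-8-150-DC, which is my reading of the post and not something confirmed by anyone else.

```python
from collections import Counter

# (AP model, switch model, outcome) as reported above. The pairing of
# AP model to switch model is an assumption, see the note before this block.
reports = (
    [("1000", "WS-6-Mini", "ok")] * 7
    + [("2000", "WS-8-150-DC", "failed")] * 2
    + [("2000", "WS-8-150-DC", "not attempted")]
)


def failure_counts(rows, key_index):
    """Return {key: (failed, attempted)} grouped by the chosen column."""
    attempted = Counter()
    failed = Counter()
    for row in rows:
        outcome = row[2]
        if outcome == "not attempted":
            continue  # skip units that were never upgraded
        attempted[row[key_index]] += 1
        if outcome == "failed":
            failed[row[key_index]] += 1
    return {key: (failed[key], attempted[key]) for key in attempted}


by_model = failure_counts(reports, 0)   # {'1000': (0, 7), '2000': (2, 2)}
by_switch = failure_counts(reports, 1)  # {'WS-6-Mini': (0, 7), 'WS-8-150-DC': (2, 2)}
```

Both groupings give the same split, so on this data alone the AP model and the switch model cannot be separated as the cause.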


Adam Bates wrote: If I make any change in config, in my case I enabled management access via wireless, save changes, then the device is accessible via the management IP.  When I plug it back in to the Netonix switch, everything has returned to normal.

Hi. Once the 2000 AP has had a config change made and saved, and once it's working properly with v3.5 - can that AP be downgraded and upgraded smoothly after that?

To be honest, I haven't tried yet.  I will try this on Monday when we are back to full staff and post an update.

More to add.  The problems I've had seem to be related to upgrades completed by cnMaestro on-premise.  The condition does not appear when updating via the web UI...at least in the 2 that I've tried that way.

Hi Adam,

Thank you for your input.
We will check the issue with upgrades through cnMaestro On-Premises.
Thank you.

I have updated about a dozen or so APs to 3.5.

I use CNUT to update the APs, then I use cnMaestro to update the clients.

No issues so far.

I am rebooting the APs prior to upgrade.

My upgrades have all been via web interface with a reboot just prior to running the upgrade. This resulted in success on the 1000’s and failure on the 2000’s.

So is it an issue with upgrading through cnMaestro or with the Netonix switch?

I was getting ready to upgrade a 2000 on a Netonix switch this past week (wsip switch 250dc, I think?). It's not on any separate VLANs; the only VLANs are there to separate backhaul from customer-facing APs.

Is it possible 3.5 is not keeping your VLAN settings when upgrading, thus losing the connection? What if you make the port on the Netonix just a basic bridge port tied to your other customer-facing ports? Does it still happen?

No, the VLAN settings remain in the AP, at least all the settings appear correct when accessing by the fallback IP.  What is weird is that the AP MAC does not appear in the bridge/arp tables of the Netonix when the issue occurs.  It does after I pull the device off the switch, power it up, save changes, and then put it back on the switch.

I don't know which device is the culprit at this point, the Cambium device or Netonix.
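If anyone wants to repeat the MAC table check, here is a rough sketch that looks for an AP's MAC in a pasted bridge or ARP table dump, whatever separator style the switch prints. The sample table layout and the MAC addresses are made up for illustration (they come from the documentation range) and are not real Netonix output.

```python
import re

# Matches aa:bb:cc:dd:ee:ff, aa-bb-cc-dd-ee-ff, and aabb.ccdd.eeff styles.
MAC_PATTERN = re.compile(
    r"(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}"
    r"|(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4}"
)


def normalize_mac(mac: str) -> str:
    """Strip separators and lowercase so different styles compare equal."""
    return re.sub(r"[^0-9A-Fa-f]", "", mac).lower()


def macs_in_table(table_text: str) -> set:
    """Collect every MAC-looking token from a pasted table dump."""
    return {normalize_mac(m) for m in MAC_PATTERN.findall(table_text)}


def mac_present(ap_mac: str, table_text: str) -> bool:
    """True if the AP's MAC appears anywhere in the dump."""
    return normalize_mac(ap_mac) in macs_in_table(table_text)


# Hypothetical dump; the column layout is an assumption.
SAMPLE_TABLE = """\
Port  VLAN  MAC
1     10    00:00:5E:00:53:01
3     10    00-00-5e-00-53-02
"""
```

Running `mac_present` against a dump taken while the AP is inaccessible, and again after the save-and-replug step, would show whether the switch ever learned the address.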

I just started a new batch of upgrades from cnMaestro.  Is there anything else anyone would like me to check while I have an AP in an inaccessible status?

In response to a previous note, I have found that once a device is upgraded to 3.5, it can safely be rolled back to 3.4.1 and then back to 3.5 without the issue occurring again, even from cnMaestro.