All ePMP3000/3000L self reboot/crash eventually?

From my first ePMP3000 installation at the beginning of 2019 until today, every ePMP3000 and 3000L has eventually crashed/self rebooted. I have never seen a 3000 with an uptime of over 100 days. Have you guys?

I have seen a 3000 in the 90’s but never triple digits. Don’t get me wrong, I love these two radios very much, but recently one 3000L has been self rebooting weekly, and two of my 3000’s both crashed/self rebooted a few weeks ago, prompting me to look into this more.

It is not the switches at fault, and in the switch logs I just see reports of the eth ports going down then coming back up again.

I do run most of my radios on 80 MHz channels as there’s no noise where my WISP is, and performance is amazing. I want this to stop happening though, as I’m outside warranty now. I just assumed firmware would eventually fix it: back in 2019 self reboots/crashes were very common, so as they became less frequent with firmware updates, I assumed they would go away altogether, but they have not.

Can we see some screenshots of status pages with an uptime of more than 100 days on an ePMP3000 running an 80 MHz channel? Even a 40 MHz channel would be good to see.

Cheers

This is a tough one, as I have worked with many operators having great success with ePMP. On the flip side, there are those few that seem to be plagued with issues, bad luck, cursed if you will (we are getting close to Halloween). Sometimes it seems like it’s a perfect storm of faulty equipment, challenging environmental conditions, power or grounding issues, and network and configuration issues that all result in a seemingly never-ending poor experience. I’ve been using Cambium equipment for a very long time now and on the whole, it’s been very reliable, but I’m pretty strict when it comes to how it’s deployed and its configuration. My core philosophy is to have the AP do as little as possible outside of pushing RF packets around.

All that being said, I can’t give you a great report as to AP uptime, because a large part of my success has been keeping extremely up to date with firmware. I checked and the highest uptime for an e3k AP is 68 days, but that is only because we applied 4.7.1-RC13 recently. We have some e3kL AP’s, but again, we just recently hung them and then applied 4.7.1-RC13, so those are at 50+ days. We do have some F300 AP’s or PtP’s running 4.7.0.1 that have been up for 150+ days. While our uptimes are not high, I can say with confidence that we rarely if ever have any random reboot issues across 50 e1k and e3k AP’s and nearly 500 clients.

Back to my philosophy of having the AP do as little as possible and only RF.

  • All the AP’s use private IP’s and are fully firewalled from the internet. Each site is L3 routed. We do not use VLAN’s on the AP or SM’s. It’s extremely important to minimize or filter any strange traffic going over your BH’s and/or into your AP’s.
  • We do not use PPPoE
  • We do not use IPv6
  • We use our own in house NTP and caching DNS servers
  • We enable and use SNMP for monitoring
  • We use cnMaestro cloud
  • We do use QoS and MIR on the AP’s, but we use Cambium QoE for MIR control
  • All other services are turned off or disabled on the AP
  • We use Cambium sync via ethernet wherever possible
  • We use sync and at most sites we use a TDD fixed ratio of 75/25
  • At many sites we use super high end pre-terminated, certified cat 6 cables and ends
  • We use gig-e capable Transtector inline ethernet surge protectors
  • All SM’s are typically using the most current stable firmware revision, in this case, 4.7.0.1

The goal is to create an environment with as little spurious traffic, electrical or sync issues as possible. Making things as clean and simple as possible is my key to success.

That’s great Eric. Really just want to see screenshots of status pages showing uptimes of more than 100 days.

Cheers

Does this count :laughing:

I also often see spontaneous reboots on 3k gear, but it will stay up past 100 days.

Thanks Jacob. Is that running an 80 MHz channel?

Wow, I figured it wasn’t just me. I have a 3000L on over 100 days running an 80 MHz channel, but all my 3000’s crash eventually on 80 MHz.

Does anyone run 80 MHz on 3k other than me? I know I’m lucky not to have noise, but to be honest, the one AP that does face noise runs mint on 80 MHz. You just have to tune each SM’s max MCS in both directions to keep retransmits at zero.

Anyone with a 3k running for more than 100 days on 80 MHz?

I don’t have any e3k running 80 MHz. All are either 20 MHz or 40 MHz. Both of the screenshots I posted are radios running 40 MHz.

Thanks Jacob. Regarding the 3000 with 124 days - what firmware version is it running and how many subscribers does it have?

Cheers

16 subs and 4.6.2. It’s been updated to 4.7.1-RC13 now though. I have a few radios running 4.7.1-RC13 with 35~40 subs that have been up 70+ days at this point without any issue.

image
Generally been OK for us. Firmware is 4.6.1 RC27. I believe in updating only if there are clear benefits or security issues AND if others are happy with stability. Our customers will not put up with outages; every time we have one on an AP, we lose at least 1 customer. Not that we are unreliable, but even a couple of hours once a year is perceived badly. Never mind that if they have fibre (fiber), that often takes days to repair!

That’s good to know. I’m testing 4.7.1-RC13 on a few ePTP links and they have not dropped yet. I’ve not tried this firmware on any 3000’s, as the GUI is all messed up for me and won’t load at all on Chrome. Have you had any GUI issues on 4.7.1-RC13? Some menus don’t show at all, like the max MCS setting for uplink on the SM.

I feel like the ePMP firmware and the F400ax firmware are about to become properly stable at 80 MHz.

I hear you, EI3HG. Luckily these APs don’t crash/self reboot very often, but they always end up doing so, even if it’s after a few months.

I don’t suppose that AP was running at 80 MHz? I know it’s not common, but it allows big bandwidth.

This got me checking mine. My longest 3K uptime is 26 days. My 2K and 1K APs have all been up since we last downgraded firmware (to 4.6.1) over 200 days ago.

So, yes, apparently our 6 3K APs all reboot very frequently and my monitoring software is not catching all of those since they seem to be pretty quick.
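For anyone whose monitoring misses these quick reboots, here is a rough sketch of one way to catch them, assuming your poller already records the standard SNMP sysUpTime value (hundredths of a second) on every poll. A reboot shows up as uptime going backwards, even if the radio never missed a ping. The function names, the sample format and the wrap-around margin are my own assumptions for illustration, not anything from Cambium or cnMaestro.

```python
from typing import Iterable, List, Tuple

TICKS_PER_SECOND = 100   # sysUpTime is reported in hundredths of a second
COUNTER_WRAP = 2 ** 32   # TimeTicks is 32-bit, so it rolls over after ~497 days


def find_reboots(
    samples: Iterable[Tuple[float, int]],
    wrap_margin_ticks: int = 24 * 3600 * TICKS_PER_SECOND,  # assumed: 1 day
) -> List[float]:
    """Return the poll timestamps at which a reboot was detected.

    samples: (poll_time_seconds, sysuptime_ticks) pairs in chronological order.
    A reboot is assumed whenever uptime drops between two polls, unless the
    previous value was within wrap_margin_ticks of the 32-bit rollover.
    """
    reboots = []
    prev_ticks = None
    for poll_time, ticks in samples:
        if prev_ticks is not None and ticks < prev_ticks:
            near_wrap = prev_ticks >= COUNTER_WRAP - wrap_margin_ticks
            if not near_wrap:
                reboots.append(poll_time)
        prev_ticks = ticks
    return reboots


def uptime_days(ticks: int) -> float:
    """Convert sysUpTime ticks to days."""
    return ticks / TICKS_PER_SECOND / 86400


# Polls every 300 s; uptime falls between the 2nd and 3rd poll.
samples = [(0, 500_000), (300, 530_000), (600, 12_000), (900, 42_000)]
print(find_reboots(samples))
```

Because this compares uptime values rather than waiting for a missed poll, it does not matter how fast the radio comes back, as long as it is polled at least once after the reboot.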

We do not have the AP doing anything. It is a transparent bridge. Our routers handle all shaping and public IPs are handed off to customers via a data VLAN.

Every AP and SM in our system is at 4.6.1 firmware and will stay there until we replace with 6 GHz gear.

Thanks Au Wireless

… and this is what I’m talking about. I’m also using a transparent bridge, with no shaping done at the radio, and using the data VLAN feature. I wonder if it’s the data VLAN feature that leads to the crashes on the 3K. I have a 3KL with 122 days uptime, but it’s the 3K that’s the most crashy.

I’ve not tried 4.7.0.1 yet, but 4.7.1-RC13 is looking awesome for ePTP links. The GUI is messed up though, but the command line always works. I’d choose stability over GUI any day.

I also don’t see any alerts, as the reboot is quick enough not to trigger cnMaestro emails.

Cheers

The AP shouldn’t care about data VLANs. It doesn’t even know about them; that is a CPE issue. Our APs have nothing turned on other than Option 82. Everything else is basically default.

We tried 4.7 and it was a giant disaster for us. We lost a dozen customers as a result of the issues in 4.7.0, had APs lock up, SMs factory reset - dozens of truck rolls all due to firmware. We decided to roll back and stay there. Fool me once… I don’t see enough improvements to even think about trying that again. Customer complaints stopped the moment we downgraded.

Ouch man, that is rough. I’m sorry to hear you went through that. I remember being stressed out of my mind in 2019 due to early AC Cambium firmware. I was terrified of getting a poor reputation due to link drops. Luckily I got through it by scheduling reboots every morning at 4am, as the drops usually happened after a few days, so by doing this I beat them to it and managed to keep my reputation. I have lost one customer in 4 years of operation, and they went to Starlink as they got it for their campervan.

I still run twice weekly reboots of my [last standing] ptp550 and the new F400AX gear for this very reason. Choosing to reboot at 4am is far superior to having a crash at 6pm. I have to say though that 5.6 looks to have sorted the AX gear.
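As a minimal sketch of the scheduling side of this, the helper below works out when the next 4am reboot window falls for a twice-weekly schedule. The choice of Monday and Thursday and the function name are assumptions for illustration only; how the reboot is actually triggered on the radio is deliberately left out.

```python
from datetime import datetime, timedelta
from typing import Tuple


def next_reboot(now: datetime,
                weekdays: Tuple[int, ...] = (0, 3),  # assumed: Mon=0, Thu=3
                hour: int = 4) -> datetime:
    """Return the next datetime strictly after `now` that falls at
    `hour`:00 on one of `weekdays` (Monday=0 ... Sunday=6)."""
    for offset in range(8):  # today plus the next 7 days covers a full week
        candidate = (now + timedelta(days=offset)).replace(
            hour=hour, minute=0, second=0, microsecond=0)
        if candidate > now and candidate.weekday() in weekdays:
            return candidate
    raise ValueError("weekdays must contain at least one value in 0..6")


# Wednesday evening: the next window is Thursday at 04:00.
print(next_reboot(datetime(2023, 10, 18, 18, 0)))
```

This uses naive local datetimes, so daylight saving changes are not handled.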

I only had problems across all ePMP with 4.6.1 software. Have been running 4.6.2 for over a year now without issue. I’m currently trying 4.7.0.1 for new installs to get a feel for its stability before going across the board with the update.

Thanks for the screenshot, Sirgin. I have some 3000L with good uptimes too. It’s the 3000 that seems to crash/reboot a lot. Do you have any of those with uptimes over 100 days?

Cheers

In my opinion, I would skip this and go directly to 4.7.1 - RC18. I know I normally wouldn’t put an RC in production, but 4.7.0.1 has a number of bugs that 4.7.1 resolved. So I would suggest either stay with 4.6.2, or run 4.7.1 IMHO. (of course YMMV)
