Tips on how to increase ePMP reliability

It seems like lately some operators have expressed frustration over frequent issues with ePMP. I thought I’d share some of the tips and tricks I’ve learned over the years since first deploying ePMP back when Cambium first released it. We have a mixture of e1k, e2k, e3k… over 30 AP’s and over 300 clients. My operating philosophy is to have the radios do as little as possible: disable anything that isn’t specifically being used, and let them focus on just shuffling packets back and forth across interfaces.

Here are my practices that keep ePMP running smoothly:

1. Use the newest firmware on everything, do not revert. I religiously update to every new release. ATM we use 4.5.6 on everything (e1k, e2k, e3k). The reason why you want to use the newest firmware is that if you do have issues, it needs to be fixed and patched going forward. Only roll back to an older firmware if advised to by Cambium and you have an open ticket.

2. For sites where you need GPS synchronization, have a backup, like Packetflux. Almost all of our sites have some form of Packetflux sync injection. We also use external puck antennas on all of our ePMP radios. With this combo you will never lose sync again.

3. We try to keep things as simple as possible for the radios. This means, no VLAN’s, no PPPoE, no RADIUS, no LLDP, no multicast, no firewall or filtering rules on the AP or SM’s, no broadcast/multicast shaping, no other services that aren’t specifically needed.

On the AP we enable QoS and define some MIR’s, and enable cnMaestro, SM isolation, NTP, syslog, SNMP, and WPA2 security. We set ethernet MTU to 1700 and disable system and agent logs.

On the SM we enable NAT, QoS, Traffic and VoIP priority, SNMP, UPnP & NAT-PMP, and of course cnMaestro. We set ethernet MTU to 1700, disable system and agent logs, and disable all the “Advanced” options EXCEPT for NAT helper for SIP (enabled).

Disable any ports not in use, like unused SFP, or AUX ports on all radios.
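
One way to keep radios on a baseline like the one above is to diff each radio's exported settings against a reference. This is only a minimal sketch in Python: it assumes the settings have already been flattened into a dict, and the key names (`ethernet_mtu`, `lldp_enabled`, and so on) are hypothetical, not real ePMP configuration fields.

```python
# Hypothetical SM baseline. The key names are made up for illustration
# and are NOT real ePMP configuration field names.
SM_BASELINE = {
    "nat_enabled": True,
    "qos_enabled": True,
    "voip_priority": True,
    "snmp_enabled": True,
    "ethernet_mtu": 1700,
    "vlan_enabled": False,
    "pppoe_enabled": False,
    "lldp_enabled": False,
}


def audit(settings, baseline):
    """Return {key: (expected, actual)} for every setting that deviates."""
    drift = {}
    for key, expected in baseline.items():
        actual = settings.get(key)  # None means the key is missing entirely
        if actual != expected:
            drift[key] = (expected, actual)
    return drift


if __name__ == "__main__":
    # A radio that was left at MTU 1500 with LLDP switched on.
    radio = dict(SM_BASELINE, ethernet_mtu=1500, lldp_enabled=True)
    for key, (expected, actual) in sorted(audit(radio, SM_BASELINE).items()):
        print(f"{key}: expected {expected!r}, found {actual!r}")
```

An empty result means the radio matches the baseline, so the same function can be run over every SM after a firmware upgrade to spot settings that changed.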

4. The powers that be at Cambium will probably get grumpy with me about this, but I would avoid using ePMP or PTP550 for PtP’s on larger networks, especially ones that aren’t L3 routed with something like OSPF. I only use 5GHz PtP’s for small, end-of-the-line stubs on the network behind a router, where there’s no strange/spurious broadcast traffic to touch them. I’ve had frequent issues with ePMP and PTP550 and even PTP670 when I try to use them to link larger sites… mgmt interfaces disappearing, ethernet ports locking up, strange stuff like that, so I just avoid it. I now use licensed links, like PTP820 or AF11’s, right out of the gate.

I think that’s about it, if I remember anything else I’ll update this.

Do you have a large ePMP network? Do you employ any of the above mentioned suggestions? What are some of the tips and tricks that you use to make sure things run smoothly?

Can’t say that I’m onboard with #1. All firmware has some bug or bugs, and sometimes those bugs affect something specific to your network. Actually, I am very much the opposite. We currently run v4.4.3 because, given our setup, it causes the least amount of problems. I never, NEVER, just roll out the newest firmware across my network. I watch the forums for a few weeks to make sure it isn’t bricking radios and doesn’t have any major bugs that would affect us (breaking “auto channel configuration” would not affect us, while breaking TDD/Sync would). If there are no issues on the forums after a week or so, I roll it out on a single Micropop (this way if it bricks the AP it’s just a ladder on a utility pole, not a tower climb, to fix) and the SM’s on that AP (all very near the AP and all mounted where they are pretty easy to get to: no SM’s in trees, on towers, or on the peak of a 12/12 pitch roof 40ft up). Then we watch this Micropop for a week or two. No problems? Roll it out to another Micropop and maybe a tower that is close by and doesn’t have a lot of customers on it. If, after a month or two, the new firmware fixes a known problem without introducing any new ones, I upgrade a couple of larger pops. Normally we are a firmware release or two behind. In fact all my ePMP 2.4GHz stuff still runs 3.5.6 because it is very stable, and I doubt that we will ever upgrade the 2.4GHz beyond 3.5.6.

#2 couldn’t agree more and it’s the biggest reason we are moving away from using any 3000L radios anywhere.

#3 So true, the more simple it is the less things/places for something to break.

#4 absolutely agree. IMHO the ePMP gang’s attitude towards how reliable these radios should be vs cutting the cost of the radio is a complete deal breaker when it comes to a radio that can take out entire sections of your network when it fails to perform. I realize this is pretty much what drives all products but for me the ePMP gang doesn’t value reliability as much as I do.

Thank you for your valuable point of view.
Why use QoS on the SM? Why VoIP priority at a time when the amount of DSCP-tagged traffic from customers is almost zero? Are you marking upload traffic from customers on home routers before the SM? In the SM GUI I do not see any option for DSCP rules or frame classification, even by size.

Maybe I should have elaborated more on how I test Cambium firmware. I don’t just blindly install the newest one that comes out without any testing. I typically will run betas on a select few AP’s and clients, usually a mix of e1k/e2k/e3k… just to make sure they work on all platforms. ATM I have 4.6-RC29 beta running on like 4 e3k AP’s, and a couple e2k AP’s. I usually don’t run the beta on clients unless there’s a specific feature or bug I’m testing. If there are show-stopping issues with the beta, I’ll note as much as possible, gather logs and engineering dumps, and then roll back to stable and open a dialog with someone at Cambium, usually Chinmay, Fedor or Dmitry via WhatsApp. It helps to have as much detail as possible and to be able to reproduce the issue somewhat reliably. I also have a dedicated VPN along with an actual computer for Cambium staff to log in to my network and check things out any time they want. This actually works out well because a lot of the developers are overseas and when we’re asleep, they’re working and can test things on my network with minimal interruption to clients. I make it as easy as possible for Cambium to verify and fix things. Many bugs, some of which took over a year to find and solve, were resolved because of my network.

After running the beta on a small group of radios, and if I feel confident with it, I’ll usually do a staggered rollout over the course of a few nights once the stable version comes out. I start with upgrading the beta radios to stable, and if that goes well, then everything else.

I can honestly say that I can only remember like one stable version that bit me… I think it was like 4.3, or 4.4… there was some after the fact bug introduced. But aside from that, I’ve had excellent luck with the aforementioned approach to firmware mgmt.
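
For anyone who wants to script a staggered rollout like the ones described in this thread, here is a rough sketch of the ordering logic only. The `Site` fields, the site names, and the `per_night` batch size are all made up for illustration; it does not talk to cnMaestro or to any radio.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    subscribers: int
    ran_beta: bool = False  # True if this AP already ran the release candidate


def plan_waves(sites, per_night=2):
    """Return a list of waves (lists of site names), one wave per night.

    The first wave is every site that already ran the beta. The remaining
    sites follow, smallest subscriber count first, in batches of per_night.
    """
    beta = [s.name for s in sites if s.ran_beta]
    rest = sorted((s for s in sites if not s.ran_beta),
                  key=lambda s: s.subscribers)
    waves = [beta] if beta else []
    for i in range(0, len(rest), per_night):
        waves.append([s.name for s in rest[i:i + per_night]])
    return waves


if __name__ == "__main__":
    sites = [
        Site("tower-a", 60),
        Site("micropop-1", 6, ran_beta=True),
        Site("tower-b", 25),
        Site("micropop-2", 9),
    ]
    for night, wave in enumerate(plan_waves(sites), start=1):
        print(f"night {night}: {', '.join(wave)}")
```

Sorting by subscriber count is just one possible risk measure. Ease of physical access (ladder vs. tower climb) would be an equally reasonable sort key.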

First off, I’m not a network expert… I made the assumption that enabling VoIP priority would help with any type of VoIP traffic that the radio could ‘see’. To me this seems like a feature that could either help, or just do nothing, but not actually hurt anything… so why not enable it?

The help bubble for the VoIP priority feature is as follows:

[image: screenshot of the VoIP priority help bubble]

We have many ePMP PTP’s (around 330, a mix of infrastructure and customer links, mostly infrastructure, on a routed OSPF network). For the most part we have had good luck with them on small to medium towers: unless the spectrum is too dirty they just work, until we can justify a licensed link (820’s/AF11’s mostly). Almost all are running 3.5.6 unless it’s a 300 (only a handful of sites), in which case we are using 4.5.6 (latest version). We have also recently started to use 3GHz (CBRS) 450B high gains as BHM’s for small towers where the 5GHz spectrum is just too dirty to get a stable connection. They have been doing the job, with just a bit more latency than an ePMP. Thanks for the write-up Eric, always nice to hear how others are doing things. I appreciate your posts!

🙂 Famous last words 🙂

We’ve been running those settings on all of our ePMP clients since… well, since they were available as settings. Things seem to be running smoothly and the sky hasn’t fallen…yet.

I agree with almost everything, Eric, but we do not control or provide routers to our clients, so we have no control over what they use and nothing but a PoE injector to get back when they up and leave (which happens too often). So our setup is a bit more complex: we use the SM as a true endpoint and use either PPPoE with NAT, or EAP-TTLS on RADIUS with NAT, or, for some clients that we provide a managed network for, EAP-TTLS on RADIUS bridged to our managed L3 switch. And if we are caring for a more involved network, we add a Cisco router/ASA that we can use to monitor and access the remote network when needed. We use EAP-TTLS because it can (and does in our setup) use RADIUS-set MIR settings that are not on the AP and are in excess of the 15 slots you are allowed. We also ensure data accounting is kept, so clients who choose a limited account are actually limited correctly.
The other bonus is that a radius setup also is used to give techs access to radios without having to give them a system wide admin password, theirs just works and if they leave we just disable their account.

We always turn on VoIP QoS, as most in-game voice chat is true VoIP and is much better with it on. Facebook Messenger voice is also true VoIP and is horrible without this prioritization.

All in all we almost never see the SM CPU over 45%, and when we do it’s just for a few seconds at most. Most of the time the CPU is down around 10%.

Generally I agree with all of these points.

Just one question: why do you set the MTU to 1700 on both the AP and SM’s?

Setting the MTU to 1700 allows any normal ethernet packet to pass un-fragmented. It also lowers the CPU utilization on the leading radio.
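
To put rough numbers on that, here is a small sketch that adds up standard header sizes for a full 1500-byte IP packet under a few common encapsulations and shows how much room a 1700-byte limit leaves. The header sizes are standard. Whether the radio counts the Ethernet header and FCS against its MTU setting is an assumption here, so the sketch counts every byte as the worst case.

```python
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
ETH_FCS = 4       # frame check sequence
DOT1Q_TAG = 4     # one 802.1Q VLAN tag
PPPOE = 8         # 6-byte PPPoE header + 2-byte PPP protocol ID
MPLS_LABEL = 4    # one MPLS label stack entry


def frame_size(ip_packet=1500, vlan_tags=0, pppoe=False, mpls_labels=0):
    """Size in bytes of the Ethernet frame carrying one IP packet."""
    return (ip_packet + ETH_HEADER + ETH_FCS
            + vlan_tags * DOT1Q_TAG
            + (PPPOE if pppoe else 0)
            + mpls_labels * MPLS_LABEL)


def headroom(limit=1700, **encap):
    """Bytes left under `limit` (worst case: every header byte counted)."""
    return limit - frame_size(**encap)


if __name__ == "__main__":
    cases = {
        "plain ethernet": {},
        "one VLAN tag": {"vlan_tags": 1},
        "QinQ": {"vlan_tags": 2},
        "QinQ + PPPoE + 2 MPLS labels": {"vlan_tags": 2, "pppoe": True,
                                          "mpls_labels": 2},
    }
    for name, encap in cases.items():
        print(f"{name}: {frame_size(**encap)} bytes, "
              f"{headroom(**encap)} to spare")
```

Even the stacked case stays well under 1700, which is consistent with the point that customer frames never need to be fragmented by the radio.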

I prefer management VLAN’s and no NAT but I’m substantially in agreement with your article.

Management VLAN’s allow you to redirect customer traffic for no-pay without breaking management for updates etc. They also allow you to isolate your equipment to your management network keeping it off of both the public internet and away from customer accessibility.

Extra layers of NAT cause connectivity issues. The best practice is to provide public IPv4 and IPv6 to your customer’s router. If you cannot due to lack of available IP’s, then at least provide a CGNAT IP direct to the WiFi router with a single layer of NAT in your network. The best NAT in a carrier network is NO-NAT, but the reality is many of us require NAT, so we can at least limit it as much as possible.
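
As a quick way to check what a customer router is actually being handed, a sketch like the following classifies an address as public, CGNAT (the 100.64.0.0/10 shared address space from RFC 6598), or other private space, using Python's standard `ipaddress` module. The `classify` function and its labels are just illustrative names.

```python
import ipaddress

# RFC 6598 shared address space, commonly used for carrier-grade NAT.
CGNAT = ipaddress.ip_network("100.64.0.0/10")


def classify(addr):
    """Label an address as 'cgnat', 'private', or 'public'."""
    ip = ipaddress.ip_address(addr)
    if ip in CGNAT:          # check CGNAT first: it is not RFC 1918 space
        return "cgnat"
    if ip.is_private:        # RFC 1918 and other non-routable ranges
        return "private"
    return "public"


if __name__ == "__main__":
    for wan_ip in ("100.64.12.5", "192.168.1.10", "8.8.8.8"):
        print(wan_ip, "->", classify(wan_ip))
```

A router WAN address that comes back as "private" while the network also translates at the edge is a hint that the customer is sitting behind more than one layer of NAT.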

A notable mention is to configure DHCP Relay with Option 82 if your provisioning system supports it. At least with my system, this allows the customer to be authenticated against the SM MAC address because it’s included as an extra attribute. This allows the customer to change their WiFi router and get an address via DHCP yet they are still a known user because the DHCP request is appended with the SM MAC. If they have a Public IP assigned then it will continue to assign the same Public to the new router. It also allows you to control the DHCP server which must answer the request.
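
For reference, Option 82 (the DHCP Relay Agent Information option, RFC 3046) is a small TLV structure: option code 82, a length byte, then sub-options, where sub-option 1 is the Agent Circuit ID and sub-option 2 is the Agent Remote ID. The sketch below builds and parses one. Putting the raw 6-byte SM MAC in the Remote ID is an assumption for illustration; what a given relay actually puts in each sub-option is vendor-specific, and the example MAC and circuit ID are made up.

```python
CIRCUIT_ID = 1  # RFC 3046 Agent Circuit ID sub-option
REMOTE_ID = 2   # RFC 3046 Agent Remote ID sub-option


def build_option82(circuit_id: bytes, remote_id: bytes) -> bytes:
    """Encode option 82 with a Circuit ID and a Remote ID sub-option."""
    sub = (bytes([CIRCUIT_ID, len(circuit_id)]) + circuit_id
           + bytes([REMOTE_ID, len(remote_id)]) + remote_id)
    return bytes([82, len(sub)]) + sub


def parse_option82(option: bytes) -> dict:
    """Decode option 82 into {sub_option_code: value_bytes}."""
    if len(option) < 2 or option[0] != 82:
        raise ValueError("not a Relay Agent Information option")
    body = option[2:2 + option[1]]
    subs, i = {}, 0
    while i + 2 <= len(body):
        code, length = body[i], body[i + 1]
        subs[code] = body[i + 2:i + 2 + length]
        i += 2 + length
    return subs


def mac_str(raw: bytes) -> str:
    """Format raw MAC bytes as colon-separated lowercase hex."""
    return ":".join(f"{b:02x}" for b in raw)


if __name__ == "__main__":
    # Assumption: the relay places the SM MAC in the Remote ID sub-option.
    opt = build_option82(b"ap1", bytes.fromhex("000456aabbcc"))
    subs = parse_option82(opt)
    print("circuit id:", subs[CIRCUIT_ID])
    print("SM MAC:", mac_str(subs[REMOTE_ID]))
```

Because the key comes from the relay rather than from the customer's router, the provisioning lookup stays the same when the customer swaps routers.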

Clean power is a huge deal. Not only does it help your ePMP but if you are using Mikrotik or UBNT routers it will help balance out the input voltage. A regulated power bank is key to having a stable site, especially at rural and remote sites. I don’t know how many times adding a DC rectifier has solved issues at tower sites.

Use your OSI model. Fix the layer 1 stuff first. Good power, good grounding, properly terminated cables, proper standoffs, etc. Remember things like RF shadowing can be an issue.

I use the car analogy. You can have a 900 horsepower, pavement wrinkling torque monster but if it has bad gas it won’t go.

I am probably going to regret this @brubble1 but you know what I would love to do about your comment regarding the ePMP gang not caring about reliability?? I would love to welcome you to a 1 on 1 meeting with a few of our QA and R&D leads. We will present to you what we do, what kind of lab setup we have and our test plans. If after that, you feel we don’t care about what we ship to you then we will gladly take all constructive input and apply to current processes.
@Eric_Ozrelic - Thanks for your post but keep in mind not every network, not every customer globally is the same. There are many who are using things like voip priority on vast networks for example and offering triple play services. Your post is appreciated by many as I can see but there is some generalization there.

I agree on most of your tips; maybe the issues you mention are because of the lack of a management VLAN, that is important!
We still have mini sites with F200 as backhaul, but many PTP650 and PTP820.

Regards,

Sakid,

I would love to be part of this, I understand some of Brubble1’s issues and concerns. Reliability has been interesting and frustrating for the last couple years to the point that we have lost customers/clients.

The 4.x series of firmware has issues that plague the ePMP radios in a manner similar to the issues UBNT equipment has. We are seeing graphical changes instead of stable and reliable firmware. There has been much trouble with radios suddenly dropping subscribers, NAT mode data flow issues, GPS sync loss despite having ample satellites tracked, and random RF resets with nothing to give a hint as to what is going on. If it wasn’t for needing 4.x series firmware for AC/N compatibility, I would have stayed on 3.5.6.
