PPPoE Problem has me at my wits' end...

The problem: ePMP radios appear to just stop attempting to connect via PPPoE, or sometimes never start (a freshly rebooted radio sometimes doesn't get a PPPoE connection).

Frequency: We used to have maybe one radio a month with this problem, but about 2 weeks ago it suddenly became at least one or two a day, and some days there are 10 or more.

What changed: Nothing. We haven't upgraded or updated or replaced a single piece of gear on our network in the last month.

Radios are 2.4 and 5 GHz, all makes and models of ePMP (though so far I don't believe we have had an Elevated Ubiquiti do this; all of them are Elevate v3.4.1). *Elevate radios are just as affected as the rest.* Firmware versions on customer radios are 3.4.1 and 3.5 (APs are all 3.4.1).

So when this happens, a customer calls in to say they have no Internet. We log into their radio and see that it does not have a wireless IP address. Monitor->Network shows PPPoE as Enabled-Connecting. There is nothing in the logs, and nothing anywhere on the radio that we can find that would explain why it just sits there at Enabled-Connecting. I log into the PPPoE server and search the logs and don't see any errors. I even searched for the wireless MAC address to see if the radio is even trying to authenticate, and found nothing. From what I can see, it looks like the PPPoE client on the radio just hangs.

Today I rebooted a customer radio for a different, unrelated problem, and when it came back up the PPPoE didn't connect. Rebooted it again and it did.

I also found a radio (I went looking for radios not getting a PPPoE connection) belonging to a customer I knew was out of town, so I left it alone. For about 14 hours (I have no idea how long it was down before I found it) it just sat there with no IP, saying Enabled-Connecting. Nothing in the PPPoE server logs, nothing anywhere to indicate a problem. After at least 14 hours of no IP/PPPoE connection I rebooted it, and it got an IP/PPPoE connection as soon as it came back up.

So, I'm willing to consider that something on my network may be causing the ePMP radios to do this, but I have no idea how to figure out what is even going on with them. Are there logs or other data on these radios, accessible via SSH or some other way, that would give me some kind of clue as to what their problem is?
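One way to check from the server side whether a stuck radio is sending anything at all is to capture on the interface facing the radios and look for PPPoE discovery frames from its wireless MAC. Below is a minimal Python sketch of the decoding step only. It assumes untagged Ethernet frames (no VLAN header) that some other capture tool has already handed over as raw bytes; the function names are made up for illustration, and the EtherType and code values are the ones defined in RFC 2516.

```python
import struct

# PPPoE Discovery stage constants from RFC 2516.
PPPOE_DISCOVERY_ETHERTYPE = 0x8863
PPPOE_CODES = {
    0x09: "PADI",  # client broadcast: "is there a PPPoE server?"
    0x07: "PADO",  # server offer
    0x19: "PADR",  # client request
    0x65: "PADS",  # server session confirmation
    0xA7: "PADT",  # terminate
}


def decode_pppoe_discovery(frame: bytes):
    """Return (src_mac, code_name) for a PPPoE Discovery frame, else None.

    Assumes an untagged Ethernet II frame (illustrative simplification).
    """
    if len(frame) < 20:  # 14-byte Ethernet header + 6-byte PPPoE header
        return None
    _dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != PPPOE_DISCOVERY_ETHERTYPE:
        return None
    ver_type, code, _session_id, _length = struct.unpack("!BBHH", frame[14:20])
    if ver_type != 0x11:  # version 1, type 1
        return None
    src_mac = ":".join(f"{b:02x}" for b in src)
    return src_mac, PPPOE_CODES.get(code, f"unknown(0x{code:02x})")


def count_discovery_by_mac(frames, mac: str) -> dict:
    """Count discovery packet types sent by one MAC across captured frames."""
    wanted = mac.lower()
    counts = {}
    for frame in frames:
        decoded = decode_pppoe_discovery(frame)
        if decoded and decoded[0] == wanted:
            counts[decoded[1]] = counts.get(decoded[1], 0) + 1
    return counts
```

If a radio stuck at Enabled-Connecting shows zero PADI frames over several minutes, that would be consistent with the client having hung rather than with the server ignoring it.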

Also-

We see this once in a great while on a 450i customer or Canopy customer. The old Canopy radios never had this problem, EVER, until the last firmware update, when Cambium stopped releasing new firmware for the Canopy radios. That's when we first noticed it on the Canopy stuff (though all the radios were 11.something up until then). The 450i customers started having this issue (also very rarely) around the same firmware update and continue to have it very rarely, like the Canopy radios.


We never, not even one time that I remember, had this problem with our old Ubiquiti radios, and so far I don't believe we have had a single Elevated Ubiquiti radio have this issue (Elevate v3.4.1). We also never had this problem with ePMP radios until somewhere around v3.4.1 firmware, but I don't know exactly when it showed up, as it happened really rarely. Then suddenly two weeks ago it became a many-calls-a-day problem.

Nothing fixes the problem every time, but usually you can reboot the radio or power cycle it. Sometimes just kicking it off the AP will fix it, and sometimes you can just change a setting on the radio (like enabling the home account and hitting the save button) to make the radio suddenly get a PPPoE connection.


I am facing the same issue. It also started around two weeks ago, beginning with one SM, and it kept spreading until it affected 80% of the SMs. They keep hanging there until they are deregistered from the AP or power cycled.

I am using bridge mode and static IPs.

Rebooted the PPPoE server early this morning. When it came back up, the FTTH, 450i, and Canopy customers authenticated and worked fine.

Over 700 ePMP radios did not authenticate, and it doesn't look like they are even trying. So for the last 5 hours I have been sitting here logging into each ePMP customer radio and rebooting it. Rebooting doesn't always fix the problem, so I have to wait for the radio to reboot, log in again to see if it got an IP, and if it didn't, rinse and repeat. Some of the radios have taken 6+ reboots before they worked...
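The reboot, wait, check, repeat routine described above can be written as a small loop. This is only a sketch of the control flow: `reboot` and `has_pppoe_ip` are hypothetical callables that whoever uses it would have to supply (via the web UI, SSH, SNMP, or whatever access is actually available), and the attempt limit and wait time are placeholder values, not recommendations.

```python
import time
from typing import Callable


def reboot_until_connected(
    reboot: Callable[[], None],        # hypothetical: triggers a reboot of one radio
    has_pppoe_ip: Callable[[], bool],  # hypothetical: True once the radio has a PPPoE IP
    max_attempts: int = 8,
    wait_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Reboot a radio until it reports a PPPoE IP.

    Returns the number of reboots needed (0 if already connected),
    or -1 if the radio is still down after max_attempts reboots.
    """
    if has_pppoe_ip():
        return 0
    for attempt in range(1, max_attempts + 1):
        reboot()
        sleep(wait_seconds)  # give the radio time to boot and register
        if has_pppoe_ip():
            return attempt
    return -1
```

Recording the returned count per radio would also show how many reboots each one needed, which might be useful evidence for a support ticket.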

The ePMP's poorly designed interface just makes the job harder and more time consuming...

And people wonder why I argue that most CPE routing implementations suck and why we insist that we run all CPEs/SMs in bridge mode.

"Crappy PPPoE client implementations" is actually the #1 reason why.

-- Nathan

Just discovered today that the one cnPilot we have behind a bridged SM is also having this problem. So the PPPoE client on every ePMP and on our one cnPilot craps out, while the 450i, Canopy, and a horde of $30 Belkin and Netgear routers work just fine.

Edit: Oh, and I forgot about the packet loss... In the tests below, the ePMP radio and network do not change (other than putting the ePMP in bridge mode for the other two tests). Also of note: the pings when the ePMP was in PPPoE mode were being sent every 2.5 seconds. When I started the other tests and realized that, I set the pings to every 0.5 seconds. So Windows PPPoE and the $30 router's PPPoE lost zero packets in hours of sending pings every 0.5 seconds.

PC ---> ePMP doing PPPoE ---> Network ---> PPPoE Server ---> Cloud = 30%+ PL 

PC windows PPPoE --> Bridged ePMP ----> Same as above = not a single packet dropped in hours

PC ---> $30 router doing PPPoE --> Bridged ePMP ---> Same as above  = No packets dropped in hours
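Because the three tests above used different ping intervals, comparing them fairly means looking at loss as a percentage of probes sent rather than at raw counts. A minimal sketch of that arithmetic follows; the function names are illustrative, and the example numbers in the comments are made up to show the calculation, not measurements from these tests.

```python
def pings_sent(duration_seconds: float, interval_seconds: float) -> int:
    """Approximate number of pings sent over a test run at a fixed interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return int(duration_seconds // interval_seconds)


def loss_percent(sent: int, received: int) -> float:
    """Packet loss as a percentage of pings sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


# Illustrative only: one hour at each of the two intervals mentioned above.
per_hour_slow = pings_sent(3600, 2.5)  # 2.5 s interval
per_hour_fast = pings_sent(3600, 0.5)  # 0.5 s interval
```

At a 0.5 second interval the bridged tests send five times as many probes per hour as the 2.5 second test, so zero loss there is, if anything, the stricter result.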