PMP100 900Mhz dropping PPPoE sessions

Hello

Over the weekend we changed several hundred PMP100 900Mhz radios running v11.2 firmware over to PPPoE , previously we had been using NAT/DHCP with static IP addresses.  Immediatly many of the radios could not maintain a PPPoE session more than a few minutes.  We went in and changed the "Keep Alive" setting from the default 30 seconds to the minimum allowed 20 secs.  No help.  We  went in and changed them from "Keep Alive" to idle time out and changed the time to 300 minutes.  This seemed to actually help with some radios., a little, some would maintain a PPPoE session as long as 30 or 40 minutes but then they would have long periods where they still could not maintain a PPPoE connection more than a few minutes at a time.  The sessions drop when customer is moving data and they drop when the radios are idle, there appears to be no consistancy in how long they maintain a PPPoE session or when they drop.

So we upgraded the firmware  on one access point ( we have 9 POPs , 15 900Mhz APs) to the most current on the website v13.1.2 DES.  No change.  We  tried both Idle Time out and Keep Alive on the updated ap/customer radios and no change. Now there are a few, maybe 20 radios that have no problem maintaining a PPPoE connection. 

The problem does not appear to be related to signal strength or maintaining a radio session as radios with rock solid radio connections to the AP and 100%/100%  10sec link tests.  We don't believe it is the backhauls or switches because the Ubiquitis located on the same tower have no problem maintaining PPPoE sessions.

We have several hundred Ubiquiti and Fiber to the Home customers on the same PPPoE device (An Intel quadcore  Mikrotik router ) and we have no problems with these other solutions maintaining PPPoE connections. 

The MikroTik router logs show this when the radios are disconnected :

Nov/25/2014 09:00:17 pppoe,ppp,info,account wtth_username logged out, 15 54 506 3 20

Nov/25/2014 09:00:17 pppoe,ppp,info <pppoe-wtth_username>: terminating... - peer is not responding

Nov/25/2014 09:21:17 pppoe,ppp,info <pppoe-wtth_username>: disconnected

This is the PPPoE Log from the of the  v13.1.2 customer radios that can't stay connected. This is what I believe to be the logs relevent to when the radio drops the  connection:

11/25/2014 : 09:00:30 CST : rcvd PPPoE data pkt, type 0xc021

11/25/2014 : 09:00:30 CST : link 0xb320dc rcvd:
11/25/2014 : 09:00:30 CST : c0 21 05 02 00 12 6c 69 6e 6b 20 69 6e 61 63 74 69 76 65 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
11/25/2014 : 09:00:30 CST : ppp_inpkt (link 0xb320dc); prot: LCP, code TERMREQ, state OPENED, len 34

11/25/2014 : 09:00:30 CST : ppp_fsm (link 0xb320dc): LCP state: OPENED -> STOPPING

11/25/2014 : 09:00:30 CST : LCP down.

11/25/2014 : 09:00:30 CST : close (link 0xb320dc): IPCP

11/25/2014 : 09:00:30 CST : ppp_fsm (link 0xb320dc): IPCP state: OPENED -> CLOSING

11/25/2014 : 09:00:30 CST : ipcp: down

11/25/2014 : 09:00:30 CST : ppp_fsm (link 0xb320dc) sending TERMREQ, len:0

11/25/2014 : 09:00:30 CST : link 0xb320dc sending:
11/25/2014 : 09:00:30 CST : 80 21 05 b0 00 04
11/25/2014 : 09:00:30 CST : kicking off timer, oldstate 9 newstate 4

11/25/2014 : 09:00:30 CST : lowerdown (link 0xb320dc): IPCP

11/25/2014 : 09:00:30 CST : ppp_fsm (link 0xb320dc): IPCP state: CLOSING -> INITIAL

11/25/2014 : 09:00:30 CST : UPAP down.

11/25/2014 : 09:00:30 CST : poe_ln_disconnect: Disconnecting

11/25/2014 : 09:00:30 CST : poe_disconnect on link 0xb320dc, pline 0xb3c4b8

11/25/2014 : 09:00:30 CST : poe_sendctl code 0xa7, state:SESSION->CLOSED, tags:
11/25/2014 : 09:00:30 CST : close (link 0xb320dc): LCP

11/25/2014 : 09:00:30 CST : ppp_fsm (link 0xb320dc): LCP state: STOPPING -> CLOSING

11/25/2014 : 09:00:30 CST : lowerdown (link 0xb320dc): LCP

11/25/2014 : 09:00:30 CST : ppp_fsm (link 0xb320dc): LCP state: CLOSING -> INITIAL

11/25/2014 : 09:00:30 CST : LCP (0xb320dc) failed, closing IPCP

11/25/2014 : 09:00:30 CST : close (link 0xb320dc): IPCP

11/25/2014 : 09:00:30 CST : ppp_fsm (link 0xb320dc): IPCP state: INITIAL -> INITIAL

11/25/2014 : 09:00:30 CST : ppp_fsm (link 0xb320dc) sending TERMACK, len:14

11/25/2014 : 09:00:30 CST : link 0xb320dc sending:
11/25/2014 : 09:00:30 CST : c0 21 06 02 00 12 6c 69 6e 6b 20 69 6e 61 63 74 69 76 65 00
11/25/2014 : 09:00:30 CST : ppp_allpktsend (0xb320dc): lost line

11/25/2014 : 09:00:31 CST : CleanupCallback (PPPoE Server Driven)

11/25/2014 : 09:00:32 CST : PPPoE reconnect attempt in 3 seconds. (keepalive)

Any help on this is greatly apreciated .

Thanks

Do you have any protocol filters in place on your Canopy gear? Translation bridging on the APs? What is the Bridge Entry Timeout?

No protocal filters. Translation Bridging is checked Disabled. Bridge Entry Timout is 25 minutes.

brubble - One of the engineers here is looking at the log but noticed you said the latest software you went to was 13.1.2 however the PMP100 support site lists 13.1.3 as the latest.  Can you upgrade to 13.1.3?

https://support.cambiumnetworks.com/files/pmp100/

Is there alot of traffic on this link?  For 900 especially, since the available throughput is so low, you may be getting exposed to the keepalives not making it across the RF link due to overloading.

One way to mitigate this is to enable the setting to treat PPPoE control messages as High Priority.  This must be set on AP and SMs, and is on the QOS configuration page.  If you can enable High Priority itself for each SM (so that the High Priorify VC is  created, you'll have even better results when enabling this PPPoE control message option.

If that doesnt work, can you please send a CNUT capture of an SM that has gone through the problem to support@cambiumnetworks.com?  

It is most likely the issue is based on the keepalives, so you are playing in the right spot.  Are If the RF links on these devices showing issues  stable?  Have you tried changing the keepalive / timeout settings on the MikroTik?

Regards,

-Aaron

I have upgraded the test AP and the customer radios  to the v13.1.3. No effect on the problem.

As far as traffic goes it is our least populated 900Mhz radio and even during the wee hours of the morning when there is almost no traffic at all the problem persists. 

I will enable the setting to treat PPPoE control messages as High Priority to see if it makes a difference.

The PPPoE Server is configured with a 48 hour idle  timeout and the keep alive timout on the PPPoE server is disabled.

Will try your suggestions and see if anything works.

Thanks

I fixed part of the problem. There is a setting in the Mikrotik router called "Keep Alive timout" that causes the rotuer to check and see if the client is still there every X seconds and if it isn't then the router drops the session. Since the canopy radios default setting is to not reply to pings at the WAN interface the mikrotik was bumping them off anytime they were idle for more than 20 seconds.   I turned that function off and found that if I set the canopy radio to "kee alive" then SOME of the radios will maintain PPPoE sessions for days at a time. This also seems to keep the radios from dropping PPPoE sessions even when they were active and now it looks like they only drop the session when idle for random amounts of time.

If I set the customer radio to "Idle time out" or  "Keep Alive" and I enable the "Reply to Ping on WAN Interface" then I start Pinging the WAN interfacthe the radio will maintain a PPPoE session for as long as I am pinging it.

I configured the Mikrotik's PPPoE session idle time out to 2 days (I even turned it off) and it has no affect on the problem. The "Keep alive timout" is also disabled on the mikrotik.  So as far as I can tell there is no reason (assuming the mikrotik is working correctly) for it to terminate a PPPoE session. If I ping the WAN interface on the radio the radio will maintain the PPPoE session for as long as it is being pinged.

I guess at this point I need to figure out if the  canopy radios are terminating the session or the if it is the Mikrotik is doing it.  I don't see anything in the Mikrotik logs that appears to be the mikrotik terminating the sessions and I don't understand the canopy's logs.  

Below are the logs from a radio that had just restarted a PPPoE session about 20 seconsds before I copy/pasted the logs.  Can someone look at these logs and tell me the reason the PPoE session ended ?  Thanks

12/22/2014 : 11:01:56 CST : rcvd PPPoE data pkt, type 0xc021

12/22/2014 : 11:01:56 CST : link 0xb420dc rcvd:
12/22/2014 : 11:01:56 CST : c0 21 05 02 00 11 6c 69 6e 6b 20 69 6e 61 63 74 69 76 65 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
12/22/2014 : 11:01:56 CST : ppp_inpkt (link 0xb420dc); prot: LCP, code TERMREQ, state OPENED, len 34

12/22/2014 : 11:01:56 CST : ppp_fsm (link 0xb420dc): LCP state: OPENED -> STOPPING

12/22/2014 : 11:01:56 CST : LCP down.

12/22/2014 : 11:01:56 CST : close (link 0xb420dc): IPCP

12/22/2014 : 11:01:56 CST : ppp_fsm (link 0xb420dc): IPCP state: OPENED -> CLOSING

12/22/2014 : 11:01:56 CST : ipcp: down

12/22/2014 : 11:01:56 CST : ppp_fsm (link 0xb420dc) sending TERMREQ, len:0

12/22/2014 : 11:01:56 CST : link 0xb420dc sending:
12/22/2014 : 11:01:56 CST : 80 21 05 11 00 04
12/22/2014 : 11:01:56 CST : kicking off timer, oldstate 9 newstate 4

12/22/2014 : 11:01:56 CST : lowerdown (link 0xb420dc): IPCP

12/22/2014 : 11:01:56 CST : ppp_fsm (link 0xb420dc): IPCP state: CLOSING -> INITIAL

12/22/2014 : 11:01:56 CST : UPAP down.

12/22/2014 : 11:01:56 CST : poe_ln_disconnect: Disconnecting

12/22/2014 : 11:01:56 CST : poe_disconnect on link 0xb420dc, pline 0xb4c3bc

12/22/2014 : 11:01:56 CST : poe_sendctl code 0xa7, state:SESSION->CLOSED, tags:
12/22/2014 : 11:01:56 CST : close (link 0xb420dc): LCP

12/22/2014 : 11:01:56 CST : ppp_fsm (link 0xb420dc): LCP state: STOPPING -> CLOSING

12/22/2014 : 11:01:56 CST : lowerdown (link 0xb420dc): LCP

12/22/2014 : 11:01:56 CST : ppp_fsm (link 0xb420dc): LCP state: CLOSING -> INITIAL

12/22/2014 : 11:01:56 CST : LCP (0xb420dc) failed, closing IPCP

12/22/2014 : 11:01:56 CST : close (link 0xb420dc): IPCP

12/22/2014 : 11:01:56 CST : ppp_fsm (link 0xb420dc): IPCP state: INITIAL -> INITIAL

12/22/2014 : 11:01:56 CST : ppp_fsm (link 0xb420dc) sending TERMACK, len:13

12/22/2014 : 11:01:56 CST : link 0xb420dc sending:
12/22/2014 : 11:01:56 CST : c0 21 06 02 00 11 6c 69 6e 6b 20 69 6e 61 63 74 69 76 65
12/22/2014 : 11:01:56 CST : ppp_allpktsend (0xb420dc): lost line

12/22/2014 : 11:01:57 CST : CleanupCallback (PPPoE Server Driven)

12/22/2014 : 11:01:58 CST : PPPoE reconnect attempt in 3 seconds. (not idle)

12/22/2014 : 11:02:01 CST : start_ppp_link: Link No Longer Idle Attempt Connect

12/22/2014 : 11:02:01 CST : Connect (Link No Longer Idle)

12/22/2014 : 11:02:01 CST : open (link 0xb420dc): IPCP

12/22/2014 : 11:02:01 CST : ppp_fsm (link 0xb420dc): IPCP state: INITIAL -> STARTING

12/22/2014 : 11:02:01 CST : ppp_fsm_init (link 0xb420dc)

12/22/2014 : 11:02:01 CST : open (link 0xb420dc): LCP

12/22/2014 : 11:02:01 CST : ppp_fsm (link 0xb420dc): LCP state: INITIAL -> STARTING

12/22/2014 : 11:02:01 CST : poe_sendctl code 0x09, state:INIT->PADISENT, tags:
12/22/2014 : 11:02:01 CST : poe_rxctlcode code 0x7

12/22/2014 : 11:02:01 CST : poe_sendctl code 0x19, state:PADISENT->PADRSENT, tags:
12/22/2014 : 11:02:01 CST : poe_rxctlcode code 0x65

12/22/2014 : 11:02:01 CST : lowerup: (link 0xb420dc) LCP

12/22/2014 : 11:02:01 CST : ppp_fsm (link 0xb420dc): LCP state: STARTING -> REQSENT

12/22/2014 : 11:02:01 CST : ppp_fsm (link 0xb420dc) sending CONFREQ, len:10

12/22/2014 : 11:02:01 CST : link 0xb420dc sending:
12/22/2014 : 11:02:01 CST : c0 21 01 06 00 0e 01 04 05 d4 05 06 00 b4 20 e8
12/22/2014 : 11:02:01 CST : kicking off timer, oldstate 1 newstate 6

12/22/2014 : 11:02:02 CST : rcvd PPPoE data pkt, type 0xc021

12/22/2014 : 11:02:02 CST : link 0xb420dc rcvd:
12/22/2014 : 11:02:02 CST : c0 21 01 01 00 12 03 04 c0 23 01 04 05 d4 05 06 8a 4b f4 5f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
12/22/2014 : 11:02:02 CST : ppp_inpkt (link 0xb420dc); prot: LCP, code CONFREQ, state REQSENT, len 34

12/22/2014 : 11:02:02 CST : lcp_reqci: rcvd AUTHTYPE

12/22/2014 : 11:02:02 CST : (c023)
12/22/2014 : 11:02:02 CST : (ACK)

12/22/2014 : 11:02:02 CST : lcp_reqci: rcvd MRU

12/22/2014 : 11:02:02 CST : (1492)
12/22/2014 : 11:02:02 CST : (ACK)

12/22/2014 : 11:02:02 CST : lcp_reqci: rcvd MAGICNUMBER

12/22/2014 : 11:02:02 CST : (8a4bf45f)
12/22/2014 : 11:02:02 CST : (ACK)

12/22/2014 : 11:02:02 CST : lcp_reqci: returning CONFACK.

12/22/2014 : 11:02:02 CST : ppp_fsm (link 0xb420dc): LCP state: REQSENT -> ACKSENT

12/22/2014 : 11:02:02 CST : ppp_fsm (link 0xb420dc) sending CONFACK, len:14

12/22/2014 : 11:02:02 CST : link 0xb420dc sending:
12/22/2014 : 11:02:02 CST : c0 21 02 01 00 12 03 04 c0 23 01 04 05 d4 05 06 8a 4b f4 5f
12/22/2014 : 11:02:02 CST : rcvd PPPoE data pkt, type 0xc021

12/22/2014 : 11:02:02 CST : link 0xb420dc rcvd:
12/22/2014 : 11:02:02 CST : c0 21 02 06 00 0e 01 04 05 d4 05 06 00 b4 20 e8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
12/22/2014 : 11:02:02 CST : ppp_inpkt (link 0xb420dc); prot: LCP, code CONFACK, state ACKSENT, len 34

12/22/2014 : 11:02:02 CST : LCP rcvd ACK: OK

12/22/2014 : 11:02:02 CST : ppp_fsm (link 0xb420dc): LCP state: ACKSENT -> OPENED

12/22/2014 : 11:02:02 CST : lowerup: (link 0xb420dc) AUTH

12/22/2014 : 11:02:02 CST : link 0xb420dc sending:
12/22/2014 : 11:02:02 CST : c0 23 01 01 00 1c 0e 63 77 74 74 68 5f 69 6e 73 70 65 63 74 31 08 73 70 61 72 6b 6c 65 73
12/22/2014 : 11:02:02 CST : upap_sauth: Sent id 1.

12/22/2014 : 11:02:02 CST : rcvd PPPoE data pkt, type 0xc023

12/22/2014 : 11:02:02 CST : link 0xb420dc rcvd:
12/22/2014 : 11:02:02 CST : c0 23 02 00 00 0d 08 4c 6f 67 69 6e 20 6f 6b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
12/22/2014 : 11:02:02 CST : upap_input link b420dc: code: AUTHACK, id 0, len 13

12/22/2014 : 11:02:02 CST : Login ok

12/22/2014 : 11:02:02 CST : lowerup: (link 0xb420dc) IPCP

12/22/2014 : 11:02:02 CST : ppp_fsm (link 0xb420dc): IPCP state: STARTING -> REQSENT

12/22/2014 : 11:02:02 CST : ipcp_addci: ouraddr==0.0.0.0

12/22/2014 : 11:02:02 CST : ppp_fsm (link 0xb420dc) sending CONFREQ, len:18

12/22/2014 : 11:02:02 CST : link 0xb420dc sending:
12/22/2014 : 11:02:02 CST : 80 21 01 12 00 16 03 06 00 00 00 00 81 06 00 00 00 00 83 06 00 00 00 00
12/22/2014 : 11:02:02 CST : kicking off timer, oldstate 1 newstate 6

12/22/2014 : 11:02:02 CST : rcvd PPPoE data pkt, type 0x8021

12/22/2014 : 11:02:02 CST : link 0xb420dc rcvd:
12/22/2014 : 11:02:02 CST : 80 21 01 01 00 0a 03 06 42 2b 23 fe 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
12/22/2014 : 11:02:02 CST : ppp_inpkt (link 0xb420dc); prot: IPCP, code CONFREQ, state REQSENT, len 34

12/22/2014 : 11:02:02 CST : ipcp: received ADDR
12/22/2014 : 11:02:02 CST : IP:66.43.35.254:
12/22/2014 : 11:02:02 CST :

12/22/2014 : 11:02:02 CST : (ACK)

12/22/2014 : 11:02:02 CST : ipcp: returning Configure-ACK

12/22/2014 : 11:02:02 CST : ppp_fsm (link 0xb420dc): IPCP state: REQSENT -> ACKSENT

12/22/2014 : 11:02:02 CST : ppp_fsm (link 0xb420dc) sending CONFACK, len:6

12/22/2014 : 11:02:02 CST : link 0xb420dc sending:
12/22/2014 : 11:02:02 CST : 80 21 02 01 00 0a 03 06 42 2b 23 fe
12/22/2014 : 11:02:02 CST : rcvd PPPoE data pkt, type 0x8057

12/22/2014 : 11:02:02 CST : link 0xb420dc rcvd:
12/22/2014 : 11:02:02 CST : 80 57 01 01 00 0e 01 0a 00 00 00 00 00 f0 51 d5 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
12/22/2014 : 11:02:02 CST : ppp_inpkt: (link 0xb420dc): unsupported prot 8057

12/22/2014 : 11:02:02 CST : link 0xb420dc sending:
12/22/2014 : 11:02:02 CST : c0 21 08 06 00 2c 80 57 01 01 00 0e 01 0a 00 00 00 00 00 f0 51 d5 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
12/22/2014 : 11:02:02 CST : rcvd PPPoE data pkt, type 0x8281

12/22/2014 : 11:02:02 CST : link 0xb420dc rcvd:
12/22/2014 : 11:02:02 CST : 82 81 01 01 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
12/22/2014 : 11:02:02 CST : ppp_inpkt: (link 0xb420dc): unsupported prot 8281

12/22/2014 : 11:02:02 CST : link 0xb420dc sending:
12/22/2014 : 11:02:02 CST : c0 21 08 06 00 2c 82 81 01 01 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
12/22/2014 : 11:02:02 CST : rcvd PPPoE data pkt, type 0x8021

12/22/2014 : 11:02:02 CST : link 0xb420dc rcvd:
12/22/2014 : 11:02:02 CST : 80 21 03 12 00 16 03 06 a2 dc b7 2e 81 06 08 08 08 08 83 06 08 08 04 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
12/22/2014 : 11:02:02 CST : ppp_inpkt (link 0xb420dc); prot: IPCP, code CONFNAK, state ACKSENT, len 34

12/22/2014 : 11:02:02 CST : local IP address 162.220.183.46

12/22/2014 : 11:02:02 CST : ipcp rcv NAK: DNS (0x81) address 8.8.8.8

12/22/2014 : 11:02:02 CST : ipcp rcv NAK: DNS (0x83) address 8.8.4.4

12/22/2014 : 11:02:02 CST : ppp_fsm (link 0xb420dc): IPCP state: ACKSENT -> ACKSENT

12/22/2014 : 11:02:02 CST : ppp_fsm (link 0xb420dc) sending CONFREQ, len:18

12/22/2014 : 11:02:02 CST : link 0xb420dc sending:
12/22/2014 : 11:02:02 CST : 80 21 01 13 00 16 03 06 a2 dc b7 2e 81 06 08 08 08 08 83 06 08 08 04 04
12/22/2014 : 11:02:02 CST : rcvd PPPoE data pkt, type 0x8021

12/22/2014 : 11:02:02 CST : link 0xb420dc rcvd:
12/22/2014 : 11:02:02 CST : 80 21 02 13 00 16 03 06 a2 dc b7 2e 81 06 08 08 08 08 83 06 08 08 04 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
12/22/2014 : 11:02:02 CST : ppp_inpkt (link 0xb420dc); prot: IPCP, code CONFACK, state ACKSENT, len 34

12/22/2014 : 11:02:02 CST : ppp_fsm (link 0xb420dc): IPCP state: ACKSENT -> OPENED

12/22/2014 : 11:02:02 CST : ipcp: up