PTP link starts slowing down till it passes almost nothing

brubble1 · January 28, 2017, 6:52pm

Set of ePMP PTP radios:

Firmware: was 2.6.1, now 3.2.1

TDD PtP mode

75/25

5310 MHz

40 MHz channel

13 dBm TP

RSSI d/u (from Master) -55/-60

This link is on a brand new PoP with 2 customers and it moves very little data for most of the day as both customers only use the service in the evenings.

We have a program we call NAG that pings each device on the network 5 times every 15 minutes. If a device fails to answer all 5 pings, I get a text on my phone. A few weeks ago I got a text that the slave end of this backhaul, the two APs, and the Pi behind the backhaul were unreachable. So I remoted in and, sure enough, I couldn't reach the slave or any devices behind it. I logged into the master radio and it showed it had a link and all the stats looked normal.
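For anyone wanting to set up a similar check, here is a minimal sketch of the alerting rule described above. It is not the actual NAG program. The function names are made up for illustration, `ping_once` assumes the Linux iputils `ping` command, and the default threshold assumes "fails to answer all 5 pings" means none of the 5 were answered (set `min_replies=5` to alert on any missed ping instead).

```python
import subprocess
from typing import Callable, Iterable, List

PINGS_PER_CHECK = 5          # from the post: 5 pings per device
CHECK_INTERVAL_S = 15 * 60   # from the post: one round every 15 minutes


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo with the Linux iputils `ping` (assumed platform)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_alert(replies: List[bool], min_replies: int = 1) -> bool:
    """Alert when fewer than `min_replies` pings were answered.

    The default (1) is an assumption: alert only if no ping came back.
    """
    return sum(replies) < min_replies


def check_devices(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = ping_once,
    count: int = PINGS_PER_CHECK,
    min_replies: int = 1,
) -> List[str]:
    """Return the hosts that should trigger a text in this round."""
    unreachable = []
    for host in hosts:
        replies = [probe(host) for _ in range(count)]
        if should_alert(replies, min_replies):
            unreachable.append(host)
    return unreachable
```

A scheduler (cron, or a loop sleeping `CHECK_INTERVAL_S`) would call `check_devices` and hand the returned list to whatever sends the text message. Passing a fake `probe` makes the rule easy to test without touching the network.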

Since I couldn't do anything else (and by now one of the two customers had called to report his Internet wasn't working) I rebooted the master radio. It fixed the problem. I could reach everything and all tests came back good.

A few days later I got a Nag that it could not reach one of the devices at that site. I remoted in, and while I was able to reach the slave radio, it was really slow and took a long time to load the web interface ... well, I was running 2.6.1 at that point, so when I say slow I mean even slower than usual. I started pinging the device and was getting about 30% packet loss and sporadic high latency.

OK, I didn't have a lot of time to mess with it, so I upgraded the firmware from 2.6.1 to 3.2.1, rebooted, and all seemed good.

2 days later, another Nag. Again the slave radio loaded really slowly, with lots of packet loss and latency. So I deregistered the slave from the master, and when it reconnected everything was fine.

A day later, another Nag. Again the slave radio loaded really slowly, with lots of packet loss and latency. It was early morning, so I decided to leave it and see if it got worse. Every 15 minutes I got a Nag telling me various devices at that PoP couldn't be reached, and after about an hour nothing could be reached. I tried to bring up the slave radio and couldn't reach it. I logged into the master radio, which showed it had a link and everything looked fine. So I decided to run a link test just to see how much the two devices were able to talk to each other (I mean, it showed it had a link with the slave, so they were passing at least some data).

The link test came back 15/9. I ran it again: 30/12, then 60/40, then 105/43, and every test after that came in around 100/40, and the link was now running like it should. The next day I got a Nag and just logged in and ran the link test: 20/9, then 40/20, then 100/40, and it was fixed again.

So, any ideas what might be going on here?

Completely unrelated to this, I noticed that when I was running 2.6.1 the link tests were in the 150/45 vicinity vs 100/43 on 3.2.1. However, under both versions of the firmware I can consistently push 177 Mbps / 48 Mbps UDP over this link (testing between the MikroTik routers at each end of the link). That's when the link is running correctly, not when the link is acting up, of course.
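As a quick sanity check on those numbers against the configured 75/25 split, the downlink/uplink share of each measurement can be worked out directly. This is just arithmetic on the figures quoted above; the helper name is made up for illustration.

```python
def split_percent(down_mbps: float, up_mbps: float) -> tuple:
    """Return (downlink %, uplink %) of the combined throughput, to 1 decimal."""
    total = down_mbps + up_mbps
    return round(100 * down_mbps / total, 1), round(100 * up_mbps / total, 1)


# Figures quoted in the post (Mbps down / up).
udp_between_routers = split_percent(177, 48)   # UDP test, either firmware
link_test_2_6_1 = split_percent(150, 45)       # built-in link test on 2.6.1
link_test_3_2_1 = split_percent(100, 43)       # built-in link test on 3.2.1
```

The UDP test and the 2.6.1 link test both land within a few points of 75/25, while the 3.2.1 link test comes out closer to 70/30, so the drop between firmware versions shows up mostly on the downlink side of the built-in test.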

Eric_Ozrelic · January 29, 2017, 7:41am

How much bandwidth do you need? Could you try using a smaller channel width? Does eDetect show any interfering radios? Also, have you tried ePTP mode?

brubble1 · January 30, 2017, 12:00am

I need all the bandwidth I can get. We hopefully will be loading lots of customers on this PoP and we are putting up another PoP that will backhaul through this one. With a little luck I'll be needing 300+Mbps here by mid summer.

I can temporarily try a smaller channel to see if the problem goes away.

eDetect shows nothing other than the radio at the other end of the link (e.g. run from the slave it shows nothing but the master, and vice versa).

I could put it in ePTP mode for a few days just to see if the problem goes away.

brubble1 · September 4, 2017, 5:02pm

This problem has never been resolved. Putting the radios on a 20 MHz channel in ePTP mode (no GPS sync) does not exhibit the problem; however, TDD PTP with GPS sync and 75/25 on a 20 or 40 MHz channel does. We have upgraded the radios with every firmware version and most recently updated them to 3.5. We have replaced both the slave radio and the master radio with brand new ePMP PTP radios and GPS pucks. We have tried using a SyncInjector instead of the onboard GPS/pucks. Nothing works. If I try to use GPS sync on this link, then within hours it will stop passing all but a trickle of traffic. The slave radio and all radios beyond it become inaccessible.

When this happens the SNR and RSSI look great on the master and the MCS is 15/15, but you cannot reach anything beyond it. However, if you run a link test between the master and slave, even a 4 second one, they start passing traffic again and the link works just fine... for a while.

Do I need to start an actual ticket?

Luis · September 6, 2017, 10:18pm

Hello brubble1,

I think it would be best to submit a ticket with Cambium support for this issue, as it may need to be troubleshot in your current deployment. There is no difference between TDD PTP and TDD PtMP modes other than restricting the number of clients to only one in PTP, so it is strange that similar behavior is not seen in PtMP deployments. ePTP and TDD PTP are different radio drivers, so issues seen in one will not necessarily happen in the other.

Regards

gregn · December 9, 2017, 5:40am

Hi All,

My name is Greg. I run several hundred ePMP1000 & 2000 PTP links and I've seen this exact problem many times. We were using Force200s as PTP links, but they needed almost daily rebooting of the SM. For some reason rebooting the AP (Master) didn't do much. We assumed it was a hardware resource problem, so the Force200s were dumped in preference for the new and improved ePMP2000. The same problem persists with the ePMP2000 when used in PTP mode. I believe a similar problem also exists when the ePMP equipment is used in PMP mode. Many of our clients have the cnPilot 201P deployed, and these power the SM via PoE from the modem's WAN port. We set the 201P modems to auto reboot every morning and this in turn resets the SMs. Can't really do that for PTP links. The problem occurs in all firmware versions from 3.0 to 3.5. Please note: in firmware version 3.0 the ePMP2000 can go into "sleep mode" and lose all communication to the Ethernet port. The only cure is a power reset, and that's a problem when it's the remote end of a link.

The slowdown effect is similar to congestion. The difference is that the link doesn't recover until reset. The problem is not caused by RF interference, although RF interference does play havoc with the spectrum analyzer function. If Cambium can't solve this soon we'll have to source a replacement for the ePMP2000 product. On the upside, the ePMP2000 is more flexible and has better noise tolerance than the more expensive PTP450i or the PTP670s. The ePMP2000 is a cost-effective PTP link good for around 150 Mbps. Cambium just needs to make it stable.

Look forward to a speedy solution.

brubble1 · December 12, 2017, 5:56pm

Our problem was fixed by changing the frequency we were using. We are very rural and never found any interference in the channels when running the SA from both ends. The problem only occurred using TDD PTP with GPS sync, so if the problem was noise/interference then I don't know why it would not have affected ePMP non-sync mode also. That said, in the end it appears to have been an interference problem of some kind that only affected TDD PTP. As much as we watched those frequencies for interference, we never saw any. If interference was the problem, then it was a random blip that would only happen once every day or two, and that blip would screw up our TDD PTP connection so badly that it would just stop passing more than a trickle of traffic.

Things that seemed to be relevant:

1) The site had only a few customers and traffic was minimal most of the time.

2) The problem seemed to start during very low use periods. Like when there was just a trickle of data moving through the connection it would get stuck on trickle.

3) Running the link test from the master end almost always cleared it up. When it didn't, rebooting either end would clear it up.

4) Only affected TDD PTP using GPS Sync

5) Went away when we moved it to a different frequency/channel.
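The pattern in the list above (radio stats look perfect, yet traffic is down to a trickle) can be turned into a simple check. The following is only a sketch: the `LinkSample` fields and the thresholds are assumptions chosen from numbers mentioned in this thread (MCS 14 or 15, roughly 30% ping loss), and how the MCS, RSSI, and loss values are collected from the radios is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    mcs: int            # MCS reported by the master
    rssi_dbm: float     # RSSI reported by the master
    ping_loss: float    # fraction of pings lost to the slave, 0.0 to 1.0


def looks_stuck(
    sample: LinkSample,
    min_mcs: int = 14,            # assumption: thread reports MCS 14 or 15 while stuck
    min_rssi_dbm: float = -70.0,  # assumption: anything stronger counts as a healthy signal
    max_loss: float = 0.30,       # assumption: about the loss seen in the first post
) -> bool:
    """True when the radio looks healthy but traffic is not getting through.

    A link with poor MCS or RSSI is treated as an ordinary RF problem,
    not the "stuck on trickle" condition described in this thread.
    """
    radio_healthy = sample.mcs >= min_mcs and sample.rssi_dbm >= min_rssi_dbm
    traffic_bad = sample.ping_loss >= max_loss
    return radio_healthy and traffic_bad
```

When `looks_stuck` returns True, the workaround reported here would be to run a link test from the master end, or reboot either radio if that fails. Triggering that automatically depends on the management interface available, which this sketch does not attempt.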

Douglas_Generous · December 14, 2017, 2:40pm

I have found that any interference on your channel can cause similar issues. 802.11ac home routers that are not using the same center channel but are still encroaching on your 20/40 MHz channel can force the radio to use a lower MCS and not try higher MCS schemes. We too ended up using a different center frequency, and it took a bit of testing to find the right center that mitigated the interference that we could not see in the SA tool. We did notice that using the 2.5 ms frame did help, but at the cost of link throughput. The 10% rule is not correct; we lost 20 Mbps each way!

brubble1 · December 15, 2017, 3:54pm

"can force the radio to use a lower MCS and not try higher MCS schemes"

The radios always showed MCS 15 or 14. When the link would get stuck in "trickle mode", everything about the link looked perfect. There was nothing to indicate there was any kind of problem with the link other than that it would only pass a few kbps of traffic until you ran a link test between them or rebooted one of them.

Douglas_Generous · December 23, 2017, 5:07pm

This is my own discovery and is in no way proof or otherwise. Though I have tried to be scientific in my approach, I did not follow exact procedures in any manner. So the following are the notes that I am sharing in hopes that others may gain some insight into similar issues.

I've been doing some testing, and an ePMP PTP link can get stuck in a lower MCS level despite what the webpage reports. I can do this by having a strong RF signal too close to the link radios. The signal does not even have to be in the same band! E.g. we still have some PMP100 900 MHz gear and it was too close to the 5 GHz PTP link I was playing with. The link worked great until I let it rest for about an hour and then tried to pull a large file from the other side of the link. I could not get more than 15 Mbps through a link that had been providing 80 Mbps, despite running MCS 14 with a signal level of -67. Ran a link test and suddenly I'm pulling 80 Mbps again. There is no specific reason, but I moved the dish on the tower to be 6 ft lower and it has not had an issue since. Total vertical separation is 10 ft now. We have 3 APs connected to KP Performance antennas that use an aluminum housing for the radios, and they have never exhibited this issue despite being on the same plane as the 900 MHz equipment.

Known to affect Force 200 gear and Force 110 gear. Force 110 GPS sync gear seems immune. Possibly a shielding issue? Maybe a clock issue? GPS requires a very accurate clock on the receiver to measure the difference in delay. The non-GPS gear would not need as precise a clock, or even as sophisticated a clock.

Settings: it did not matter if it was in ePTP or TDD PTP. Even tried using TDD PMP and WiFi modes. No difference in behaviour found. Several channels were tried and still no difference. All other settings were Cambium default. PTP AP was tried in each frame size and mix.

Thing to note: another ePMP PTP link is on this tower and has not shown this issue since we moved to a channel with less channel edge interference.