PTP650 link working fine, but cannot ping/access the BH GUI

Ted_Stewart · April 3, 2017, 6:18pm

I'm having an odd problem. We have two PTP650 links, and seemingly randomly they decide to stop responding to pings and won't let me access their GUIs. Traffic and pings through them are fine, but pinging them directly leads to 80+% packet loss, and when I'm lucky enough to get to the login page I'm not able to get past it to the GUI.

It seems to happen to both links at the same time. It reminds me of a corrupted AV blocking specific IPs and web pages, but I've seen it when using multiple computers with different OSs and AVs on them.

These links are getting pounded with weather radar noise, which sometimes drops them down to 16 QAM and locks them there due to an issue the dev team is already aware of. I'm not sure if that's pertinent to the issue, but it's the only odd thing about these links that I know of.

Has anybody else experienced this?

Ted_Stewart · April 7, 2017, 4:53pm

This is still happening fairly regularly. Has anybody else experienced something similar?

Mark_Thomas · April 10, 2017, 4:13pm

Hi Ted,

Are you using a management VLAN with these links?

Do you take any action (like a reboot) to restore the management access?

Thanks, Mark

Ted_Stewart · April 10, 2017, 5:43pm

We're not using a VLAN. I don't have easy physical access to the units to reboot them, so I haven't tried it. The issue seems to randomly go away on its own, and the uptime indicates it didn't restart to do that.

Mark_Thomas · April 19, 2017, 2:15pm

Hi Ted,

The CPU in the PTP 650 controls and monitors the wireless link, and also supports the network management functions like HTTPS and SNMP. It's a powerful CPU and it has plenty of performance under normal circumstances.

If the CPU receives unwanted Ethernet frames it discards them, but processing the received frames still consumes some CPU cycles. It's important that the processing needed to inspect and discard unwanted frames does not use up resources needed for the higher priority tasks associated with keeping the wireless link in service.

The ODU guards against this by detecting the overload and discarding (in hardware) a proportion of the traffic destined for the management agent. This is an effective technique but not at all selective. Consequently, the ODU can appear to be unresponsive when the defenses are triggered.

This protection is essential to prevent a Denial of Service (DoS) attack, where a malicious user attempts to overwhelm the CPU with excessive traffic addressed to the management agent, and by this means to disrupt the end-to-end service.

You can check if the DoS defences have been activated by looking for "event, resource_low" in the syslog record. I appreciate that you can't do this whilst the ODU is non-responsive, so you will necessarily be looking back at an earlier time.
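If the syslog has been exported to a text file or is collected on a remote syslog server, the search can be scripted. The following is only a minimal sketch: the `resource_low` marker comes from the description above, while the function name and the sample lines are hypothetical and do not reproduce the actual PTP 650 log format.

```python
from typing import Iterable, List

# Substring to look for, taken from the post above.
MARKER = "resource_low"


def find_resource_low(lines: Iterable[str]) -> List[str]:
    """Return every log line that mentions the marker (case-insensitive)."""
    return [line.rstrip("\n") for line in lines if MARKER in line.lower()]


if __name__ == "__main__":
    # Made-up sample lines for illustration only; real entries will differ.
    sample = [
        "Apr 19 10:00:01 odu event, resource_low\n",
        "Apr 19 10:00:02 odu some other event\n",
    ]
    for hit in find_resource_low(sample):
        print(hit)
```

To scan a saved file instead, pass the open file object, for example `find_resource_low(open("syslog.txt"))`, where `syslog.txt` is a placeholder name.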

The management agent receives all Ethernet frames directly addressed to the ODU, plus all broadcast and multicast frames. This means that the anti-DoS defenses can be triggered when the link is carrying a large volume of perfectly legitimate broadcast or multicast traffic. This is largely a function of the design of the network in terms of switches, routers, use of PPPoE and such like. For example, a simple link between two routers will not normally need much Ethernet broadcast traffic. On the other hand, a broadcast storm would be a really bad thing.
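The rule described above can be sketched in a few lines of Python. This is an illustration, not the ODU's actual implementation: it relies only on the standard Ethernet convention that the broadcast address is `ff:ff:ff:ff:ff:ff` and that a multicast address has the least significant bit of its first octet set. The function names and the example MAC addresses are hypothetical.

```python
def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or with dashes) into 6 raw bytes."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def frame_kind(dst_mac: str) -> str:
    """Classify a destination MAC as broadcast, multicast or unicast."""
    raw = parse_mac(dst_mac)
    if raw == b"\xff" * 6:
        return "broadcast"
    if raw[0] & 0x01:  # group bit set means multicast
        return "multicast"
    return "unicast"


def reaches_management_agent(dst_mac: str, odu_mac: str) -> bool:
    """Model of the rule above, with no management VLAN configured:
    frames addressed to the ODU, plus all broadcast and multicast frames."""
    if frame_kind(dst_mac) in ("broadcast", "multicast"):
        return True
    return parse_mac(dst_mac) == parse_mac(odu_mac)
```

Under this model, every broadcast or multicast frame crossing the link costs the management CPU some work, which is why a busy but legitimate network can trigger the defenses.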

If the above explanation is confirmed, a good solution would be to introduce a management VLAN. This has the benefit that the only broadcast frames to reach the management agent will be the ones already in the management VLAN. The VLAN filtering is in the PTP 650 hardware, so there is then no danger that the CPU will be overloaded.
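As a rough software illustration of what such a filter decides (the real filtering is done in hardware, as noted above), the sketch below reads the 802.1Q tag of a raw Ethernet frame and passes only frames in the management VLAN. It assumes a single 802.1Q C-tag with TPID `0x8100`; service tags and other tagging options are ignored, and the function names and VLAN ID are hypothetical.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the 802.1Q VLAN ID of an Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 0-11 are the destination and source MACs; the tag follows them.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # low 12 bits of the TCI hold the VLAN ID


def passes_management_filter(frame: bytes, mgmt_vid: int) -> bool:
    """True only for frames tagged with the management VLAN ID."""
    return vlan_id(frame) == mgmt_vid
```

With a filter like this in front of the CPU, broadcast traffic in customer VLANs never reaches the management agent at all.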

In addition, we think it is always a good idea to use a management VLAN to minimise the possibility of real DoS attacks from malicious users.

Please let us know how you get on.

Mark

Ted_Stewart · April 22, 2017, 6:45pm

Excellent response as always, Mark. There are indeed a bunch of resource_low events in the syslog.

We're looking into adding a management VLAN, so this just adds another reason to do so.

Thanks!

kendali · July 23, 2019, 8:08pm

I must jump into this because you described our situation perfectly, BUT we do have a management VLAN in place that doesn't have very much in it. We are using PTP 600 links.

Thinking it could be a broadcast storm (we have about 4-5 providers with separate VLANs traversing the link), I peeled off one VLAN at a time to try to find the culprit. There are two different VLANs that, if I take them off at the same time, let the radio recover to no packet loss and single-digit ping times. This seems to resolve the problem for a short period, but then it starts up again.

I was thinking we should just budget to replace the 600s, but this sounds like a by-design issue, and replacing the radios may result in the same management loss issue. Might need to find a bigger magnifying glass to try and see what is going on in the provider networks.

RayS · July 23, 2019, 8:16pm

Congratulations Kendali

Your post is the 90,000th post on the Community.

Send me a message and we will send you a shirt.

Ray