Looking for help from anyone with a solution, or for Aaron/Charlie/Vernon/Not-Matt to consider a firmware patch if possible.
Over the last few months we have noticed an increasing number of broadcast storms that end up knocking out visibility to individual APs and to whole towers. We used to see maybe 1-2 a year across 5000 customers, usually due to a loopback or LAN switch failure. Now we have seen 4 in the last 2 months, all related to Netgear routers.
As you guys know, Netgear routers have been getting owned on customer networks and randomly start a broadcast storm that ends up crippling a tower or triggering the AP to be shut down through protection.
This forces us to go through the AP and disable/enable SMs one at a time until we find the infected customer who is causing the ARP flood.
What's strange is that the source of the ARP request is the MAC of the SM, while the destination is the broadcast address. That makes it seem like the radio itself is at fault, which is why I am reaching out to you guys.
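For anyone who wants to check for the same pattern, here is a rough Python sketch of how ARP requests sent to broadcast could be tallied per source MAC from captured frames. It assumes you already have raw, untagged Ethernet frames as bytes (for example from a mirror port); the function names are made up for illustration, and 802.1Q-tagged frames are not handled.

```python
import struct
from collections import Counter

BROADCAST = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1


def mac_str(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC string."""
    return ":".join(f"{b:02x}" for b in raw)


def arp_broadcast_source(frame: bytes):
    """Return the source MAC if the frame is an untagged ARP request
    addressed to broadcast, otherwise None."""
    # 14-byte Ethernet header plus the first 8 bytes of the ARP header.
    if len(frame) < 22:
        return None
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if dst != BROADCAST or ethertype != ETHERTYPE_ARP:
        return None
    # ARP header layout: htype(2) ptype(2) hlen(1) plen(1) oper(2).
    (oper,) = struct.unpack("!H", frame[20:22])
    if oper != ARP_REQUEST:
        return None
    return mac_str(src)


def top_arp_talkers(frames, limit=5):
    """Count broadcast ARP requests per source MAC, busiest first."""
    counts = Counter()
    for frame in frames:
        src = arp_broadcast_source(frame)
        if src is not None:
            counts[src] += 1
    return counts.most_common(limit)
```

If one SM MAC dominates the list, that is probably the sector to start with, though it only tells you which MAC the frames appear to come from, not what is behind it.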
Is there any way you guys can consider a firmware update that specifically detects this type of activity and implements some type of storm protection to shut down the traffic?
I understand this is at layer 2 and goes beyond QoS filtering, but it has to be detectable.
Please share any information that you guys can spare. I understand this is a common problem in our industry, but as I stated previously, these events tied to compromised Netgear routers have become far more frequent.
We have tons of Netgear routers on the network and haven't seen this since we enabled broadcast/multicast uplink rate limiting (8 kbps) and the multicast filter on all SMs about 5 years ago.
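To give a feel for why a limit that low contains a storm, here is a small token-bucket sketch in Python. This is only the generic technique, not how the radio firmware actually implements its limiter; the class name, the 1000-byte burst size, and the 64-byte frame size are assumptions for illustration.

```python
class TokenBucket:
    """Generic token-bucket limiter (illustrative only, not vendor code)."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate_bytes = rate_bps / 8.0   # refill rate in bytes per second
        self.capacity = float(burst_bytes)  # maximum stored allowance
        self.tokens = float(burst_bytes)    # start with a full bucket
        self.last = 0.0                     # time of the last refill, seconds

    def allow(self, frame_len: int, now: float) -> bool:
        """Return True if a frame of frame_len bytes may pass at time now."""
        if now > self.last:
            refill = (now - self.last) * self.rate_bytes
            self.tokens = min(self.capacity, self.tokens + refill)
            self.last = now
        if frame_len <= self.tokens:
            self.tokens -= frame_len
            return True
        return False


# Example: 8 kbps limit, assumed 1000-byte burst, a 64-byte frame every 1 ms.
bucket = TokenBucket(rate_bps=8000, burst_bytes=1000)
passed = sum(bucket.allow(64, now=i / 1000) for i in range(1000))
print(f"{passed} of 1000 broadcast frames passed in one second")
```

With these assumed numbers only a few dozen of the thousand frames get through, so a single flooding customer cannot fill the uplink with broadcast traffic.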
We don't use spanning tree anywhere in the network on any L2 segments. We keep broadcast domains as small as possible. We have a number of routed core sites. The remote sites off of those are all bridged back into the core network. So if there's an issue like this, it's isolated to that segment.
That said, we did have a business customer plug a switch into their SM and loop two ports on it. That caused what you describe, ARP hell. We shut down the SM's Ethernet interface. And yes, it took a while to figure out which customer it was, so I can relate to your frustrations there. We're considering going to NAT mode on 99% of our SMs for things like this.
We already filter multicast, but on the PMP 450 network (which is why I put the post here) the Netgear routers that get compromised seem to be causing a bigger issue.
The ARP filtering option (which is also specific to the PMP 450) does not seem to be helping either.
The vulnerability was officially announced on October 12th, so we are just seeing the beginning of this now. Ideally Netgear would step up and patch their firmware, but that does not seem to be happening yet.
One idea we have been floating is to disable the router, turn DHCP on for the SM, add the MAC to the bmu, then pray for the best. Our lab setup for the Juniper storm protection has not been successful so far.
Are we the only people that are seeing this?
It is also worth mentioning that we are using the LastMile CTMs at the tower sites affected so far.
We run something similar with regard to SM isolation. We take it a step further and run port isolation on our switch to prevent one AP from sending any packets to SMs on another AP. For us this is more effective than splitting into different VLANs per AP: it is less work and wastes fewer IPs on the network. We have also been converting all SMs to NAT mode, and we also use PowerCode. Overall, your network is just much more stable if customers are not able to bridge frames onto your layer 2.
Just wanted to follow up on this topic, as I am sure more people are seeing this event in the wild.
We had to switch to SM Isolation Option 2, then apply policers on our Juniper switches to limit some of the damage. We still aren't out of the woods yet, and would really like to know if anyone has a better solution.
As I stated previously, we had only seen one or two of these events a year, and now we have seen close to a dozen in the last 6 months. We have filtering and isolation in place, but it still turns into an ARP hunt to eliminate the infected customer.
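On the hunt itself, one thing that might cut down the disable/enable cycles is a halving search instead of going one SM at a time. Below is a rough Python sketch of the idea. It assumes a single offender, and `storm_present_with` is a hypothetical callback you would have to supply yourself: it should leave only the listed SMs enabled, wait a bit, and report whether the storm is still visible.

```python
from typing import Callable, Sequence


def find_offending_sm(
    sms: Sequence[str],
    storm_present_with: Callable[[Sequence[str]], bool],
) -> str:
    """Halving search for the one SM whose traffic causes the storm.

    storm_present_with(enabled) is a hypothetical probe supplied by the
    caller; it is not part of any vendor tool.
    """
    if not sms:
        raise ValueError("no SMs to search")
    candidates = list(sms)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if storm_present_with(half):
            # The offender is in the half that was left enabled.
            candidates = half
        else:
            # Storm stopped, so the offender is in the other half.
            candidates = candidates[len(half):]
    return candidates[0]
```

On a 40-SM sector this needs about 6 probes rather than up to 40. The trade-off is that each probe takes a larger group of customers offline at once, so it may only make sense when the sector is effectively down anyway.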