Broadcast Storm Control

Any decent managed layer-2 bridge / switch has this nifty feature. Canopy should be no different.

We have a 4-AP cluster of around 500 SM nodes that has been crippled multiple times by broadcast storms. The storms were caused by misconfigured and/or malfunctioning customer routers. To our dismay, there is no effective way to filter a broadcast storm at the radio level - we have to manually track down the offending node(s) and shut down their ethernet.

Broadcast Storm Control should be implemented at the SM for greatest effect. Count the number of broadcast packets received per second at the ethernet port. If the broadcast rate exceeds a threshold setting, drop further broadcasts for a period of time.
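
In pseudocode terms it's just a per-second counter with a hold-down timer. Here's a rough Python sketch of the idea - the threshold and hold-down values are numbers I made up for illustration, not anything Canopy actually ships with:

    import time

    BCAST_THRESHOLD = 100   # max broadcast frames/sec (assumed value)
    HOLD_DOWN_SECS = 30     # drop broadcasts this long once tripped (assumed)

    class StormControl:
        def __init__(self):
            self.window_start = time.monotonic()
            self.count = 0
            self.blocked_until = 0.0

        def allow(self, frame_is_broadcast):
            """Return True if the frame may be forwarded."""
            if not frame_is_broadcast:
                return True          # unicast is never rate-limited
            now = time.monotonic()
            if now < self.blocked_until:
                return False         # still in hold-down: drop broadcasts
            if now - self.window_start >= 1.0:
                self.window_start = now   # start a new one-second window
                self.count = 0
            self.count += 1
            if self.count > BCAST_THRESHOLD:
                self.blocked_until = now + HOLD_DOWN_SECS
                return False         # threshold tripped: begin hold-down
            return True

The SM would call allow() on every frame arriving at its ethernet port and silently discard whatever gets a False back.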

What do you think?

It should be (if it isn’t) implemented at the AP end, along with the CMM. If the packet isn’t bound for the IP/MAC address of another SM/AP, it should never get there.

A basic fact of Layer-2 ethernet is that Broadcasts must be delivered to all nodes within the same domain or VLAN. Otherwise basic things like ARP would not function correctly, and you’d break connectivity for Layer-3 (IP).

In other words, you could not simply block broadcasts at the AP if you wanted an SM to be able to communicate directly to devices at another SM. We have subscribers that need this ability for a VPN connection, or remote management from home.

Today, broadcast storms are not an issue for our network, because I’ve eliminated the faulty routers that a couple of our subscribers had. Tomorrow, it could be in my face again. If I wanted to cause some havoc in a Canopy cluster, I would generate as many broadcast and multicast packets as I could, flooding everybody else off the air. Note also that your AP’s CPU time gets consumed dealing with all the broadcasts, and it does slow down the web interface.

Sure, you could use the NAT function in a Canopy to create a broadcast domain barrier, but that also breaks VPN and remote access ability for the subscriber. Tell me: how else could a Canopy network operator protect his IP network from a subscriber sending, say, a broadcast ARP flood?

jbaland wrote:
A basic fact of Layer-2 ethernet is that Broadcasts must be delivered to all nodes within the same domain or VLAN. Otherwise basic things like ARP would not function correctly, and you'd break connectivity for Layer-3 (IP).


Totally incorrect. Look into how network switches work. If the AP worked that way, life would be good.

In other words, you could not simply block broadcasts at the AP if you wanted an SM to be able to communicate directly to devices at another SM. We have subscribers that need this ability for a VPN connection, or remote management from home.


Not true.

Sure, you could use the NAT function in a Canopy to create a broadcast domain barrier, but that also breaks VPN and remote access ability for the subscriber. Tell me: how else could a Canopy network operator protect his IP network from a subscriber sending, say, a broadcast ARP flood?


It doesn't break VPNs; it just means there'd have to be reverse NAT (port forwarding).

Perhaps it would help you to view this document about Switching and VLANs: http://www.netcraftsmen.net/welcher/pap … hvlan.html. The part that explains broadcasts being forwarded to all ports is just after the heading “What is a LAN Switch?”.

Canopy systems function as a layer-2 switch, with a wireless layer-1. When a device at an SM sends a broadcast, the AP at your tower is forced to repeat that packet out its ethernet AND back to the downlink channel for all your SM (and their ethernet-attached devices) to hear.

If you want to try a test, you need two PCs, two SMs, and an AP.

You can try an IP conflict test: set both PCs to the same address. Most OSes will complain about a duplicate IP or that the address is already in use. How did one PC know the IP was in use? It sent an ARP broadcast asking the whole network if anybody has this IP.

You can also try pinging unique IPs within the same subnet: set one PC to 192.168.1.50 and the other to 192.168.1.51, and ping each other. If you have no firewall on your PCs and no other filtering, you should get responses.

Canopy does not look at the IP (layer 3), only the MAC (layer 2). When you ping some other IP on the local network, your PC has no idea of its MAC; it has to ask with an ARP broadcast. All devices hear the broadcast; the one that has the requested IP responds with a unicast ARP reply addressed to your MAC. That final reply is not sent everywhere - only to the one SM.
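
If you want to watch that exchange on the wire, here's a rough sketch using the scapy packet library (nothing to do with Canopy itself; the target IP is just the one from the ping test above, and you need root to send raw frames):

    from scapy.all import ARP, Ether, srp

    # Broadcast ARP request: "who has 192.168.1.51?"
    answered, _ = srp(
        Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst="192.168.1.51"),
        timeout=2, verbose=False)

    for _request, reply in answered:
        # The reply comes back unicast, addressed straight to our MAC.
        print(reply[ARP].psrc, "is at", reply[ARP].hwsrc)

Run a sniffer on a PC behind a third SM while this executes: you should see the broadcast request arrive there, but never the unicast reply.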

jbaland wrote:
Canopy systems function as a layer-2 switch, with a wireless layer-1. When a device at an SM sends a broadcast, the AP at your tower is forced to repeat that packet out its ethernet AND back to the downlink channel for all your SM (and their ethernet-attached devices) to hear.


I see what you're getting at now.

jbaland,

It doesn’t address every case, but I’m curious if you have Bootp Server and SMB filters enabled, or whether they would have helped with the packet storms you have seen.

FYI, if the APs had OSPF and kept a routing table, they would be a lot more versatile than a bridge. :confused:

One thing to notice is that the AP comes defaulted with a broadcast repeat count of 2. This means that when you do have a broadcast storm, the over-the-air traffic is automatically doubled. I suggest reducing this value.

The feature request has been sent for limiting TX broadcasts at the SM and AP.

I'm curious if you have Bootp Server and SMB filters enabled


Yes, as of 4.2.3 I have been enabling a few checkmarks (to block) in the advanced network config:

    * PPPoE
    * SMB
    * Bootp Server
    * IPv4 Multicast
    * All others (not ipv4)


This set blocks most of the junk you don’t want to see from customers on a pure IP network. The Bootp Server filter stops a customer from serving DHCP out their SM onto your network.

the AP comes defaulted with a broadcast repeat count of 2


This is a good note - reducing this to 1 should help when set on every AP in the cluster. I’ve set it and have had no problems from it so far - it doesn’t seem to have affected the reliability of DHCP or ARP.

Our specific broadcast storm is very interesting, I think. Perhaps somebody else here would benefit from knowing about it.

Recap of the system: a Canopy cluster of 4 APs (5250), with about 400-500 SMs. The busiest AP has 159 subscribers, and it’s still truckin’ along. We now run the HW scheduler, and the SMs are a mix of P7, P8, and P9 hardware. All run the latest 7.2.9 firmware. Almost every SM is attached to a consumer router from Linksys, D-Link, Belkin, etc. We use private addressing and a central Linux-based NAT engine.

I have recently logged packets from our bandwidth-hogging broadcast storm. The cause is a SINGLE packet from one of these “consumer” routers. I have a binary pcap file available if anybody is interested in trying to figure out WHY this happens, but here’s a quick breakdown…

One router sends out an “igmp v3 report” message to an L2 multicast address. Canopy treats this as a broadcast, and every SM gets it. Next, many other routers respond with an “icmp protocol 2 unreachable” message. What’s worse, those ICMP replies are addressed to the ethernet broadcast address! So every SM hears those too! The effect is that all bandwidth on the whole cluster (since it’s switched in the CMM) goes to 0 for the length of the storm.

Here are the first few packets of the storm. There are over 400 ICMP replies to ethernet broadcast per second. I counted two IGMP report packets at the start of each storm, spaced one second apart.

16:33:03.072748 00:40:ca:38:1a:75 > 01:00:5e:00:00:16, ethertype IPv4 (0x0800),
length 60: IP 172.26.40.113 > 224.0.0.22: igmp v3 report, 1 group record(s)

16:33:03.084267 00:06:25:9a:98:64 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800),
length 74: IP 192.168.1.1 > 172.26.40.113: icmp 40: 224.0.0.22 protocol 2
unreachable

16:33:03.096916 00:0f:b5:25:b3:81 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800),
length 74: IP 192.168.1.1 > 172.26.40.113: icmp 40: 224.0.0.22 protocol 2
unreachable

16:33:03.111053 00:0f:b5:ec:f6:83 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800),
length 74: IP 192.168.1.1 > 172.26.40.113: icmp 40: 224.0.0.22 protocol 2
unreachable

So now, the “IPv4 Multicast” filter on the advanced page WILL prevent this, because it blocks the very first “igmp v3 report” packet. However, a few of our installers regularly forget to set the filters, and so the storms persist. :frowning:
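
If anybody wants to check their own capture for this signature, here's a rough scapy sketch - the filename is a placeholder, and the 400/sec trigger is just the rate I happened to observe:

    from collections import Counter
    from scapy.all import IP, Ether, rdpcap

    per_second = Counter()
    for pkt in rdpcap("storm.pcap"):   # substitute your own capture file
        if Ether in pkt and IP in pkt and pkt[Ether].dst == "ff:ff:ff:ff:ff:ff":
            # IP traffic addressed to the ethernet broadcast MAC - the
            # signature of those bogus "protocol 2 unreachable" replies.
            per_second[int(pkt.time)] += 1

    for second, count in sorted(per_second.items()):
        if count > 400:   # the rate observed during our storms
            print(second, count, "broadcast IP frames - storm in progress")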

I have had exactly the same problem; however, I narrowed it down to one SM in our network.

The problem first started after an update was performed on the network using the CNUT tool. The last item updated was the SM that started causing the problems. During the update this SM experienced a problem and the update was rejected. Then the flooding started. It took me three days to narrow it down to this SM, but eventually we swapped it out and all worked fine again. Our Ethereal traces show exactly the same thing as the previous post: a message sent to the multicast IP address, and hence the problems.

We now use the filter feature to stop client-side floods, but occasionally we find SMs that start this process off again; a reset to defaults and a reload of the latest firmware seems to fix the problem.

Where does one change the broadcast repeat count setting in 8.1.4?

Never mind - I was just told that this is not configurable with hardware scheduling (thanks rjk in #canopy on irc.esper.net).