ICMP protocol 2 unreachable packet floods

I didn't mean to post in this section, but oh well… Maybe the staff can move it over to general discussion?

I was wondering if anybody has seen something like this happening on their network. The symptom: at intermittent times, for about one to two minutes, an entire cluster's throughput goes to 0. By that I mean no data can be transferred between SM and AP. I found this to be caused by a TON of broadcast packets from my SMs.

We have a Canopy cluster of 4 APs (5250), with about 400-500 SMs. The busiest AP has 159 subscribers. We run the HW scheduler, and the SMs are a mix of P7, P8, and P9. All run the latest 7.2.9 firmware. Almost every SM is attached to a consumer router from Linksys, D-Link, Belkin, etc. We use private addressing and a central Linux-based NAT engine.

I have recently logged packets from our bandwidth-hogging broadcast storm. The cause is a SINGLE packet from one of these “consumer” routers. I have a binary pcap file available if anybody is interested in trying to figure out WHY this happens, but here’s a quick breakdown…

One router sends out an “igmp v3 report” message to a layer-2 multicast address. Canopy treats this as a broadcast, so every SM gets it. Next, many other routers respond with an “icmp protocol 2 unreachable” message (protocol 2 is IGMP, so those routers are essentially announcing that they don't speak IGMP). What's worse, those ICMP replies are addressed to the Ethernet broadcast address, so every SM hears them too! The effect is that all bandwidth on the whole cluster (since it's switched in a CMM) goes to 0 for the length of the storm.
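As an aside, the 01:00:5e:00:00:16 destination in the capture is not arbitrary: it's the standard RFC 1112 mapping of 224.0.0.22 (the IGMPv3 membership-report address) onto an Ethernet multicast MAC, which is 01:00:5e followed by the low 23 bits of the IP. A quick sketch of the mapping:

```python
def multicast_mac(ip):
    """RFC 1112 mapping: IPv4 multicast address -> Ethernet multicast MAC.
    The MAC is 01:00:5e followed by the low 23 bits of the IP address."""
    o = [int(x) for x in ip.split(".")]
    return "01:00:5e:%02x:%02x:%02x" % (o[1] & 0x7F, o[2], o[3])

# 224.0.0.22 is the IGMPv3 membership-report destination
print(multicast_mac("224.0.0.22"))  # -> 01:00:5e:00:00:16
```

Since Canopy has no IGMP snooping, a frame to that MAC gets flooded to every SM just like a broadcast.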

Here’s the first few packets of the storm. There are over 400 ICMP replies to Ethernet broadcast per second. I counted two IGMP report packets at the start of each storm, spaced one second apart.

16:33:03.072748 00:40:ca:38:1a:75 > 01:00:5e:00:00:16, ethertype IPv4 (0x0800),
length 60: IP > igmp v3 report, 1 group record(s)

16:33:03.084267 00:06:25:9a:98:64 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800),
length 74: IP > icmp 40: protocol 2

16:33:03.096916 00:0f:b5:25:b3:81 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800),
length 74: IP > icmp 40: protocol 2

16:33:03.111053 00:0f:b5:ec:f6:83 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800),
length 74: IP > icmp 40: protocol 2
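For anyone else chasing this, here's a rough sketch of flagging the storms from a text capture. It's a hypothetical helper, and the regex assumes tcpdump lines shaped like the ones above; it just counts Ethernet-broadcast frames per one-second bucket:

```python
import re
from collections import Counter

BCAST = "ff:ff:ff:ff:ff:ff"
# matches e.g. "16:33:03.084267 00:06:25:9a:98:64 > ff:ff:ff:ff:ff:ff, ..."
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2})\.\d+ \S+ > ([0-9a-f:]{17}),")

def storm_seconds(lines, threshold=400):
    """Return {"HH:MM:SS": count} for each one-second bucket whose
    count of Ethernet-broadcast frames exceeds the threshold."""
    per_second = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group(2) == BCAST:
            per_second[m.group(1)] += 1
    return {t: n for t, n in per_second.items() if n > threshold}
```

Run `tcpdump -e -n` on the CMM uplink, feed the lines in, and any second that comes back is a storm worth correlating with the IGMP reports that precede it.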

So now, the “IPv4 Multicast” filter on the advanced page WILL prevent this, because it blocks the very first “igmp v3 report” packet. However, a few of our installers regularly forget to set the filters, and so the storms persist. :frowning:

Yes, we have found the exact same thing. We couldn't quite figure out where it was originating, but we quickly found that the problem was coming from those radios that didn't have “IPv4 Multicast” checked. As soon as we checked that option, the problem disappeared.

I think an option for limiting packets per second would also be very helpful. It's an option on a lot of gear and should be in these radios as well.

I just came in here to suggest this and found this post, which was exactly the problem we wanted to address. I know this problem has happened to numerous colleagues of mine who use this same system, and most of them don't check this option in their radios.
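For what it's worth, the packets-per-second limit could be as simple as a token bucket per SM. A minimal sketch of the idea (just an illustration of the proposed feature, not anything Canopy ships):

```python
import time

class TokenBucket:
    """Allow at most `rate` packets per second, with bursts up to `burst`.
    One bucket per SM, applied only to broadcast/multicast frames, would
    cap a storm without touching unicast traffic."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill tokens in proportion to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop the frame
```

With, say, rate=10 and burst=20, a storm of 400+ broadcast frames in one second would pass only the first handful and drop the rest.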

Hit the exact same problem today. It's very frustrating trying to track a problem down when, out of the 20k+ packets affecting the network, that one initial packet is the cause.

Googling around for solutions to this issue, I ended up… here.

I had somehow hoped that this problem was not directly Canopy related, but now I'm beginning to think that the Canopy equipment is the sole cause.

This would explain why my analysis of millions of packets has turned up no useful results - after all, if the Canopy hardware itself is causing the problem, no amount of packet-level troubleshooting will get anywhere.

This is causing me considerable issues again, with hundreds of millions of these packets killing large portions of my network… it makes running OSPF rather difficult. On one segment with around 2200 Canopy SMs, we were seeing upwards of 1.5 million packets in under a minute.

Sigh, we need answers :-’

We’ve been crippled by this a few times. Filtering IPv4 multicast has done the trick, but the “packets per second limit” sounds like a good general-purpose feature for Canopy.

We got this problem when we first moved to the 7.2.9 software. Since then we have most of our users NAT-enabled, and those that are NAT-disabled are always on a VLAN.

This has helped us prevent the problem, although at the time we did not know an IGMP broadcast was causing the issue.