Detecting broadcast storms on a flat network

alexdehaini · October 30, 2007, 5:56pm

Hi everyone,

We have a flat network with over 15 CMMs connecting several backhauls and APs together. Recently we have been experiencing strange problems across the whole network, and whenever one occurs we have no definite way of figuring out what caused it.

If there is a broadcast storm, we can’t trace the source. Is there any tool we can use to track down broadcast storms?

vince · October 30, 2007, 6:48pm

Get a tool like Ethereal or Wireshark; it’s a free download. The only broadcast storm issues we have are with IGMP floods: it’s an IGMP v2 request that causes the flood.

twinkletoes · October 30, 2007, 7:16pm

Plug a packet sniffer into one of your switch ports to see the broadcasts. Run Ethereal on Windows, or tcpdump on Linux, or other packet capture/dump software to see what’s happening.

If you just sat there with tcpdump -i eth0 -w blah and let it sit for a day, the ‘blah’ file could be analyzed by anything that interprets PCAP format and you could see all kinds of broadcast traffic that you didn’t expect.
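As a rough sketch of what "analyzing the blah file" could look like, the Python below counts broadcast and multicast frames per source MAC. It assumes the file is in classic pcap format (not pcapng) and was captured on Ethernet; the function name `count_broadcast_sources` is made up for illustration.

```python
import struct
from collections import Counter

def count_broadcast_sources(pcap_bytes):
    """Count broadcast/multicast Ethernet frames per source MAC in a classic pcap file."""
    magic = pcap_bytes[:4]
    # The magic number tells us the byte order (microsecond or nanosecond variant).
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    # Last field of the 24-byte global header is the link type; 1 means Ethernet.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled in this sketch")

    counts = Counter()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16])
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        if len(frame) < 14:
            continue  # too short to hold an Ethernet header
        dst, src = frame[0:6], frame[6:12]
        # The low bit of the first destination byte marks broadcast/multicast.
        if dst[0] & 1:
            counts[":".join(f"{b:02x}" for b in src)] += 1
    return counts
```

Calling `count_broadcast_sources(open("blah", "rb").read()).most_common(10)` would list the ten noisiest source MACs. A full day of capture may be too big to read into memory in one go, so treat this as a starting point rather than a finished tool.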

Also, some cheap switches treat UDP traffic as broadcast. There are all kinds of weird things that happen with cheap switches that shouldn’t. You can catch this type of crap with a packet dump, too, if you know what to look for. It takes some time and critical thinking, both of which are hard for most companies: you’re always too busy and you don’t have enough information (typically experience) to be able to make decisions on the traffic. I’ve been an IP and Ethernet custodian since 1991 so I have “seen it all”, yet I still encounter new stuff all the time.

Jerry_Richardson · October 30, 2007, 8:34pm

Our experience has been that customers with infected PCs are the culprit. Once we confirmed that any SMs in bridge mode have the IPv4 multicast and SMB filters enabled, we have not seen a flood since.

Generally, having customers’ PCs connected directly to the network with the SM in bridge mode is bad practice; the SM needs to be in NAT or the customer needs a DSL router.

alexdehaini · October 31, 2007, 8:08am

Thanks for all your help and recommendations, I really appreciate it. The next question is what to look for when you run a sniffer like tcpdump. How do you know something is wrong? The amount of information we get from tcpdump and Wireshark is overwhelming, and searching through all of it can be daunting.

Are there any filters in tcpdump and/or Wireshark that can detect broadcast storms? What are the common problems that can affect a network running Motorola devices on a flat network?

Recently, we had a weird issue that took out all of our CMMs; they all froze and only a manual reboot could bring them up again. We had no clue where to look because as soon as the problem hit, the entire network went down. We run an NDS called OSSIM and found that there were a lot of MAC address changes from certain clients. Could this lead to anything?

Anonymous1 · October 31, 2007, 8:08pm

When we had a similar problem, we noticed a lot of entries from the same IP address - tons and tons going to the same ports (in particular, 25 - it was an email worm).

Look for large numbers of connections coming from single addresses. I’ve found 3 or 4 at a time is not unusual when making connections to websites, but 10 or more in a row, followed by a brief pause and then another 10 or so is usually a good indication you have the culprit in sight.
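If you would rather not eyeball the log for this, the "10 or more in a row" rule of thumb can be roughed out in a few lines of Python. This assumes you have already pulled the source address of each connection out of your log or capture, in arrival order; the function name and the threshold default are only illustrative.

```python
from itertools import groupby

def sources_with_runs(sources, min_run=10):
    """Return {source: longest_run} for sources seen min_run or more times in a row."""
    flagged = {}
    # groupby collapses consecutive identical entries into one group each.
    for src, group in groupby(sources):
        run = sum(1 for _ in group)
        if run >= min_run:
            flagged[src] = max(run, flagged.get(src, 0))
    return flagged
```

A source that shows up in several long runs separated by short pauses will be reported once, with its longest run, which is roughly the pattern described above.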

alexdehaini · November 1, 2007, 8:28am

wifiguy,

Thanks for your reply. You see, that is the problem: knowing what to look for. We have tons of rules on our NDS (Network Detection System) that will flag a virus or malicious attack. However, if the problem originates from a remote site, the entire network can go down without the problem ever reaching our NDS, because we run a flat network.

I know the ultimate solution is to put in routers at key sites, but this is a long-term solution and we have over 20 sites at the moment. I was wondering if there is a way we can spot this issue whilst it is happening.

Thanks once again.

Anonymous1 · November 1, 2007, 2:44pm

As has been suggested, you need to plug a laptop running a packet sniffer into the network - if you don’t know exactly where the flood is coming from, you may need to try it in several locations or have a few people help you out and sniff multiple sites at once.

For me, I was fortunate in that the traffic made it to my router and was being logged, so from there I could go out to the site where it was coming from and figure out exactly which user was the cause.

alexdehaini · November 1, 2007, 3:05pm

Thanks guys, your assistance has really been helpful. Many thanks.

vince · November 1, 2007, 5:11pm

As far as monitoring my network, one thing I’m trying to do is set up a box that only captures IGMP reports. Does anyone know how to configure Wireshark for that? I want to leave this running all day so if there ever is a spike I can find the source ASAP.

Thanks
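For the spike-watching part, one possible approach (a sketch only, not a Wireshark setting) is to bucket the captured reports into fixed time windows and flag any window that goes over a threshold. The Python below assumes the capture has already been reduced to (timestamp in seconds, source address) pairs; the window size, the threshold, and the function name are placeholder assumptions you would tune for your own network.

```python
from collections import Counter

def find_spikes(events, window=60, threshold=1000):
    """events: iterable of (timestamp_seconds, source).

    Returns a list of (window_start, total, top_source) for every window
    whose packet count reaches the threshold.
    """
    per_window = {}
    for ts, src in events:
        # Round the timestamp down to the start of its window.
        start = int(ts // window) * window
        per_window.setdefault(start, Counter())[src] += 1

    spikes = []
    for start in sorted(per_window):
        counts = per_window[start]
        total = sum(counts.values())
        if total >= threshold:
            # most_common(1) gives the single busiest source in this window.
            spikes.append((start, total, counts.most_common(1)[0][0]))
    return spikes
```

The busiest source in a flagged window is only a first suspect; on a flat network the loudest responder is not necessarily the device that started the flood.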

brakoli · November 1, 2007, 8:36pm

As others have said, this used to be a pretty major issue with Canopy devices in a flat network. Most of the port filters now in place are there because of all the complaining most WISPs did to Moto about this.

That being said, all our storms were caused by Linksys routers issuing improper IGMP v2 packets, and every Netgear router on our network trying to respond… 100k packets in 2 minutes… great fun. We segmented after it was obvious it was an issue, but once we implemented the following Protocol Filtering on the SMs, the problems went away:
SMB
SNMP
Bootp Server: stops rogue servers when a customer plugs a router in wrong
IPv4 Multicast: stops the Linksys crap
User Defined Port 1: TCP/UDP on port 440 (per our Moto tech rep)

It’s important to remember these filters are one-way, FROM the SM.

If you have Ethereal running on the segment, trust me, it’s not hard to see! Stop the Ethereal device, then look at the top of the storm to see who sent the first IGMP Report request. That is usually the first place to look (not always, but most of the time it is the culprit).
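If scrolling to the top of the storm by hand gets tedious, the same lookup can be sketched in Python. This assumes you already have the captured frames as (timestamp, raw Ethernet frame bytes) pairs, with untagged IPv4 (no VLAN header), and it treats "IGMP Report" as the membership report types 0x12 (v1), 0x16 (v2) and 0x22 (v3). The function names are made up for illustration.

```python
IGMP_REPORT_TYPES = {0x12, 0x16, 0x22}  # v1, v2, v3 membership reports

def igmp_report_source(frame):
    """Return the source IPv4 address if frame is an IGMP membership report, else None."""
    # Need an Ethernet header (14 bytes), a minimal IPv4 header, and EtherType 0x0800.
    if len(frame) < 14 + 20 or frame[12:14] != b"\x08\x00":
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes (options included)
    # Must be IPv4, with a sane header length, protocol 2 (IGMP), and a payload.
    if ip[0] >> 4 != 4 or ihl < 20 or ip[9] != 2 or len(ip) <= ihl:
        return None
    if ip[ihl] not in IGMP_REPORT_TYPES:
        return None
    return ".".join(str(b) for b in ip[12:16])

def first_report_sender(frames):
    """frames: iterable of (timestamp, frame bytes). Returns (timestamp, src) of the earliest report."""
    earliest = None
    for ts, frame in frames:
        src = igmp_report_source(frame)
        if src is not None and (earliest is None or ts < earliest[0]):
            earliest = (ts, src)
    return earliest
```

The address it returns is a lead to check, not proof, for the same "not always" reason given above.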

Brian Brakke
LP Broadband, Inc.

alexdehaini · November 3, 2007, 12:24pm

Thanks for the recommendations guys - I knew I could get some great ideas and suggestions from here. Thanks once again