On two recent occasions, two different APs appeared to be flooded with what the Prizm graph shows as 5-15 Mbps of Ethernet traffic, RF traffic and Ethernet discards.
This usually lasts for 15-30 minutes at a time and may occur several times a day.
Prizm also shows that a single SM is seeing or generating the exact same thing.
It is my understanding that a 900 MHz radio can’t do more than 4-5 Mbps.
If so, how can Prizm show 15 Mbps of transmit and receive traffic?
When this happens all customers on the affected AP get locked out.
Has anyone seen this odd behavior?
What causes discards?
Is this a sign of a bad radio?
I first thought virus, and this might very well be the case.
Any help would be appreciated.
I have seen this problem, but with a 5.7 GHz AP. Every time we reboot an AP, SM or BH, Prizm shows a spike on the traffic graph that can go as high as 100 Mbps, but it does not last as long as you mentioned. Maybe that AP of yours keeps rebooting.
Sounds like a broadcast storm.
Are you using the SM’s in NAT or Bridge Mode? If they are in bridge mode, turn on the IPv4 Multicast filter on all SM’s.
I turned on the filtering you recommended late yesterday afternoon after reading one of your earlier posts.
Exceptions: Bootp client, since we serve out IP’s via DHCP, and I had a question about SNMP.
Will filtering SNMP block Prizm or CNUT?
Thanks for your expertise on this forum. I was hoping you would lend a hand.
You can sniff that traffic when it happens… might help you diagnose what’s going on.
You’ll need to install “Wireshark” on your computer (formerly “Ethereal”)
Since you can determine the SM that causes the issue, you’ll also know the VLAN that this traffic is being generated from.
When the issue is happening, simply put your PC in the same VLAN on your Alcatel switch, then do a Wireshark packet capture for a short while.
Let me know if you need a hand getting that set up.
The last time this happened, about 4:00 yesterday, 90% of all SM’s on all 14 AP’s showed the 5 Mbps traffic on their graphs in Prizm.
That was the first time I saw more than one SM with the noise.
Only the 1 AP reflected the same traffic and discards.
So far, since I enacted the filtering Jerry recommended, no more bursts have occurred.
This behavior has been so sporadic, it may be days before I know if we caught it.
I have Ethereal. I will see if I can still remember how to use it.
A broadcast storm will show up on more than one SM, that’s why it’s so tough to find the source. By blocking the Multicast you stop the computers from being able to talk to each other within the network.
AFAIK, SNMP filter only blocks it from the customer looking out. Keeps little Joey Hacker from getting into your network.
You should be hearing from the culprit customer soon
It’s happening right now.
An hour ago, I saw a single SM running at 4.772 Mbps of transmit traffic and discards.
The AP did not see it.
I had trouble getting into his radio to set the filtering last night.
I always figured this guy unplugged his SM when he was not using.
Sure enough, he came up about an hour ago and wham.
I disco’d his 8.03 link and was able to set the filtering.
After reboot, I watched for a couple of scans and the traffic had stopped.
Right before I re-enabled his port, he called. While we were talking I saw his radio again show this anomaly.
How could this be? His Ethernet port is turned off.
After the anomaly died down, I turned his port back on. 30 minutes later the AP showed the anomaly, and I can’t find any SM with the traffic in the same time frame.
I am starting to doubt the data from Prizm.
It will show a customer down (failed to poll), and when I go to the AP the customer will have 0 re-regs.
Keep in mind that SNMP is a UDP protocol, so (to quote one of Mythbuster Adam Savage’s tshirts) “Failure is always an option” You may have lost the UDP SNMP packet, so Prizm concludes the client is down, when it was just a dropped packet.
Still something to investigate, though, because unless there is network congestion or another problem of the network kind, you shouldn’t see a whole lot of lost UDP traffic…
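To illustrate the point about a single lost UDP packet, here is a minimal sketch of a poller that only reports a radio as down after several failed attempts in a row. The `poll_once` callable and the `attempts` value are hypothetical stand-ins; this is not how Prizm itself is implemented or configured.

```python
from typing import Callable


def is_down(poll_once: Callable[[], bool], attempts: int = 3) -> bool:
    """Return True only if every one of `attempts` polls fails.

    `poll_once` is a hypothetical function that sends one SNMP request
    and returns True if a reply came back before the timeout.
    """
    for _ in range(attempts):
        if poll_once():
            return False  # one good reply is enough to call the radio up
    return True


# Simulated radio: the first request is lost, the second one is answered.
replies = iter([False, True])
print(is_down(lambda: next(replies)))
```

With the simulated replies above, the first dropped packet is ignored because the second poll succeeds, so the radio is not reported as down.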
Here is something I found on the 5.2 BH slave.
It coincides with the interruptions I see at the remote AP.
21:50:42 UT : 12/13/06 : File root.c : Line 946 Software Version : CANOPY 7.3.6 Oct 24 2005 12:06:56 BH-DES
21:50:42 UT : 12/13/06 : File root.c : Line 950 Software Boot Version : CANOPYBOOT 3.0
21:50:42 UT : 12/13/06 : File root.c : Line 956 FPGA Version : 070605
21:50:42 UT : 12/13/06 : File root.c : Line 960 FPGA Features : DES Sched
21:52:11 UT : 12/13/06 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1062 Time set
03:52:38 UT : 12/15/06 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1062 Time set
09:53:18 UT : 12/16/06 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1062 Time set
15:53:57 UT : 12/17/06 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1062 Time set
21:54:37 UT : 12/18/06 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1062 Time set
03:55:17 UT : 12/20/06 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1062 Time set
09:55:56 UT : 12/21/06 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1062 Time set
The time stamps for each “Line 1062 Time Set” coincide with the anomalies.
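For anyone wanting to line these entries up against the Prizm graphs, here is a rough sketch that pulls the "Time set" timestamps out of the BHS event log and prints the gap between consecutive entries. It assumes the dates are in MM/DD/YY form and that each entry sits on a single line; the function name `parse_time_set` is made up for this example.

```python
from datetime import datetime

SYSLOG = "C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c"
STAMPS = [
    "21:52:11 UT : 12/13/06", "03:52:38 UT : 12/15/06",
    "09:53:18 UT : 12/16/06", "15:53:57 UT : 12/17/06",
    "21:54:37 UT : 12/18/06", "03:55:17 UT : 12/20/06",
    "09:55:56 UT : 12/21/06",
]
# Rebuild the log lines exactly as they appear in the event log above.
LOG_LINES = [f"{s} : File {SYSLOG} : Line 1062 Time set" for s in STAMPS]


def parse_time_set(line: str):
    """Return the timestamp of a 'Time set' entry, or None for other lines."""
    if not line.rstrip().endswith("Time set"):
        return None
    clock_field, date_field = line.split(" : ")[:2]
    clock = clock_field.split()[0]  # drop the trailing "UT"
    # Assumption: the date field is month/day/two-digit year.
    return datetime.strptime(f"{date_field.strip()} {clock}", "%m/%d/%y %H:%M:%S")


times = [t for t in map(parse_time_set, LOG_LINES) if t is not None]
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

for stamp, gap in zip(times[1:], gaps):
    print(stamp.isoformat(), f"{gap.total_seconds() / 3600:.2f} hours since previous")
```

For the seven entries above, each gap works out to a little over 30 hours, which may be worth comparing against the times the anomaly shows up on the graphs.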
The BH Master event log is clear.
I would call Moto. Be patient and let them lead you through all the troubleshooting steps.
You may have some type of legit bug that will get escalated if first level help desk cannot solve the problem.
I made level 2.
They determined there was poor link quality from the Cyclone AP to the SM’s.
Something about the beam pattern of the cyclone Omni.
They could not explain the Ghost 5-10 Mbs traffic as shown in Prizm.
The “1062 Time set” line in the event log of the BHS was explained as a periodic adjustment inside the AP to TIME, due to the AP getting more than 5 seconds off from the timing master or BHM.
They saw no relevance to the SM’s getting blocked out of the AP.
Funny that all time stamps of this TIME SET, coincide with the anomaly.
So, I’m still reaching.
My next guess would be to put a CMM at the remote AP site and forgo passing sync through the Backhaul radios.
And of course the anomaly is becoming more regular.
When you see timestamps over and over in the event log this normally points to a GPS issue. In addition, if you are seeing the ethernet port flooded with discards then your radio is getting more data than it can process. So I would recommend turning on NAT for this client and put a sniffer at the CMM to see if you can determine what kind of traffic this customer is sending. If it is traffic on a particular port then you can use the custom filter option to filter out the traffic at his SM.
I at one time thought I had this anomaly narrowed down to one customer.
However, as discussed in this post, I turned his port off and the anomaly continued on his SM and the AP, according to Prizm.
The abnormally high traffic and high discards exactly coincide with the timestamps of the “1062 time set” lines in the event log of the BHS. (I think I said AP before.)
Are you stating that the GPS issues are a result of some kind of customer traffic?
The remote AP, the BHS and the BHM do not have logs of “loss of sync” what other GPS issues could there be?
If the AP is losing GPS sync, could it cause all other AP’s and SM’s in range to see spikes in traffic?
I just looked at every AP and SM within the 7 mile range and during the anomalies, every Canopy radio (over 120 SM’s) saw a 500% spike in traffic and discards.
The spikes on the other radios only went up to 500Kbs but the times are right.
Only the SM’s on this particular AP are going offline and showing 5 Mbs.
However, they all see spikes regardless of the frequency setting.
We use 906,915 and 923 on the AP’s.
the SM’s are set to all three freqs.
Could the spikes be this remote AP transmitting without sync, causing interference the whole system can see?
For the first time yesterday, the remote AP event log shows a “Loss of Sync” as the first event of that day.
The time stamp matches the first time the anomaly occurred.
I have not been able to make that correlation before with loss of sync.
I’m assuming that the 900’s are all Omnis so you have a total of 3 AP’s, 1 at 906, one at 915, and a third at 923.
In theory these are all non-overlapping channels so they should not interfere with each other. However if you have an AP that is losing sync and there are other AP’s within 10 miles that are on the same frequency, they can be affected by the out-of-sync AP.
Confirm that all AP’s in your network are configured with identical
- Control Slots
- Max Range
- Downlink Ratio
Confirm your BHM’s are also configured the same.
Check the CMM logfile and confirm you are not getting any GPS or other errors.
Check cabling on all towers
Two tower sites are clusters of 6 AP’s with CMM.
On each tower the Freqs. are 906,915,923, 906,915 and 923.
The two on the same freq. are 180 degrees opposed. That was the recommended config from Moto.
The third is a cluster of two 60 degree AP’s with CMM.
The fourth is this single AP w/Cyclone Omni.
I just installed a CMM Micro there and it has been back up for 30 minutes.
If it is a sync problem I hope this action cures it.
I think you might be right.
Every time I rebooted the AP during the CMM install, the graph shows the spike to 5 Mbs.
To test it, I just did a reboot of the AP and sure enough a 5 Mbs spike.
If that’s the case, then the AP might have rebooted every time the “Line 1062 Time Set” message occurred on the BHS.
The event log on the AP does not reflect that much activity.
Until yesterday, the AP event log did not show a reset or loss of sync.
If AP reboots cause Prizm to show what looks like bad traffic, Moto needs to document the behavior. I spent days and days looking for a mischievous customer.
The spikes you are seeing during reboot have nothing to do with high levels of traffic. When you are looking at the graph, you should spend a few minutes to think about how the software actually works. Every 5 minutes, your poller (Prizm, Cacti, etc.) contacts the radio and asks it for the current octets (bytes) transferred on the Ethernet port. If you reboot the radio, the counter restarts from zero, so that number is most likely lower than it was at the last 5-minute poll. So Cacti, Prizm, etc. have no choice but to assume that so much data was transferred in the last 5-minute period that the counter must have wrapped around the 32-bit barrier (the SNMP counters are 32-bit or 64-bit, depending on design, and I’m pretty sure the Moto Ethernet counter is 32-bit).
So, depending on how far you were from the end of the 32-bit counter, that’s what your traffic “spike” will look like when you’re done. Here’s a practical example… Let’s say you were at 2102467920 bytes, then you reboot and now you’re at 210022 bytes on the Ethernet output octet count. Your poller now assumes that your data transfer was 4294967296 (2^32, the point at which an unsigned 32-bit counter wraps back to zero) minus the previous 2102467920 bytes, plus the new value of 210022 bytes. That’s 2192709398 bytes over 5 minutes! So, that’s 17541675184 bits in 300 seconds, or 58472250 bits/sec, or 58 Mbps on your graph, but only for that 5-minute period. Like, duh.
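The same arithmetic as a short sketch, for anyone who wants to plug in their own counter values. It assumes a 32-bit counter and a 300-second poll interval as described above; the function names are made up for illustration and are not taken from Prizm or Cacti.

```python
COUNTER_MODULUS = 2 ** 32  # a 32-bit counter wraps to zero after 4294967296 values


def naive_delta(previous: int, current: int) -> int:
    """Bytes between two polls, treating any decrease as a counter wrap.

    This is the assumption that turns a reboot (counter reset to a small
    value) into a huge apparent transfer.
    """
    if current >= previous:
        return current - previous
    return COUNTER_MODULUS - previous + current


def rate_bps(delta_bytes: int, interval_s: int = 300) -> float:
    """Average bits per second over one poll interval."""
    return delta_bytes * 8 / interval_s


delta = naive_delta(2102467920, 210022)
print(delta)                 # 2192709398 bytes
print(int(rate_bps(delta)))  # 58472250 bits/sec, i.e. about 58 Mbps
```

The closer the pre-reboot counter was to zero, the bigger the phantom spike, since nearly the whole 2^32 range gets counted as traffic.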
I think I got all that…
The problem has been resolved with the addition of the CMM Micro.
Apparently the sync was not passing properly via the backhaul.
No more time resets on the BH slave and no more re-regs of SM’s.
Thanks everybody for your help.