Intermittency issues without session drops. I needz help.

A significant portion of the 16 SMs connected to one of our 900 MHz omni APs recently started losing connectivity to the internet… but not to the AP. O_o Where should I start troubleshooting this?

We use a monitoring system called Cacti, which polls every 5 minutes. On the AP page in Cacti, the Registered SM Count graph is a flat line (no one dropping). For the most part this is borne out by the Session Status page on the AP, which mostly shows session uptimes that any 900 MHz Canopy deployer would be happy with. Even when it does show SMs dropping, the drops are less frequent than what our customers are reporting. Start clicking on individual SMs in Cacti, however, and their graphs tell a different story: about half of them show significant intermittency (presumably the SNMP polls are timing out).
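To put numbers on those blank stretches in the per-SM graphs, one rough approach is to list the gaps between successful polls. This is only a sketch: it assumes you can export the timestamps (Unix epoch seconds) of the polls that actually returned data for one SM, and the function name `find_poll_gaps` is hypothetical, not part of Cacti.

```python
POLL_INTERVAL_S = 300  # Cacti polls every 5 minutes in this setup


def find_poll_gaps(timestamps, interval=POLL_INTERVAL_S, tolerance=1.5):
    """Return (start, end, missed_polls) for each gap between successful polls.

    timestamps: epoch seconds of polls that returned data (hypothetical export).
    A gap is any spacing longer than tolerance * interval.
    """
    ts = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ts, ts[1:]):
        delta = cur - prev
        if delta > tolerance * interval:
            # Number of whole polling slots that came back empty.
            missed = round(delta / interval) - 1
            gaps.append((prev, cur, missed))
    return gaps
```

For example, polls at 0, 300, 600, 1500 and 1800 seconds give one gap, (600, 1500, 2): two polls in a row timed out. Comparing these gaps against the SM's Session Uptime on the AP shows whether the SM really re-registered or only stopped answering SNMP.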

One of our customers pulls a nearly constant 300 to 400 kbps, and that traffic shows up steadily on the AP traffic graph in Cacti. His individual traffic graph, however, has many empty spaces (which would normally indicate signal drops), though what remains of it shows the correct amount of traffic. Pinging him gives an average latency of around 788 ms with 17% packet loss. Are high ping times common when someone is using a fair amount of bandwidth?
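A figure like "788 ms average, 17% loss" combines two different things. The average is computed only over the probes that came back, while the lost probes count toward the loss figure instead. A minimal sketch of that summary, assuming you have one entry per probe (RTT in ms, or None for no reply); `summarize_ping` is a hypothetical helper, not a ping option:

```python
from statistics import mean


def summarize_ping(rtts_ms):
    """Summarize a ping run: one entry per probe, RTT in ms or None if lost."""
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    # Lost probes are excluded from the average, so a high loss rate
    # does not show up in avg_ms at all.
    avg_ms = mean(replies) if replies else None
    return {"sent": sent, "received": len(replies),
            "loss_pct": loss_pct, "avg_ms": avg_ms}
```

So a customer can look only moderately slow on average while a fair share of his packets never return at all, which fits the empty stretches in his graph.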

So far I’ve replaced the AP and messed around with the settings a bit. I don’t have extensive training, but the only thing I can find that might have anything to do with this is in the RF stats of the 5.7 GHz BH slave: a Corrupt Data Count of 6209.

APs and SMs running software ver. 9.5

From the AP to us:
1st BH pair running ver. 7.2.9
2nd BH pair are Motorola PTP 58500 Lites running ver. 58500-03-02
All my equipment are belong to the Motorola brand name. :smiley:

Sounds like the SMs are re-registering, and it also sounds like an interference issue. Run link tests on the SMs; if they are below 90%, you’ve got an alignment or interference issue. 700+ ms is way, way, way too high. It should never be more than 20 ms from AP to SM, and anything above 60-80 ms from AP to SM indicates a problem on the RF side. Also check the Ethernet statistics on the BHs and APs to make sure you don’t have a flaky Ethernet port or a duplex mismatch.
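The rules of thumb in this reply can be written down as a quick checklist. This is a sketch only: the thresholds are the ones stated above, not official Motorola limits, the 60 ms cutoff is an assumption taken from the low end of the "60-80 ms" range, and `assess_sm` is a hypothetical name.

```python
LINK_TEST_MIN_PCT = 90    # below this: alignment or interference (per the reply)
AP_SM_EXPECTED_MS = 20    # AP-to-SM ping should not exceed this
AP_SM_PROBLEM_MS = 60     # assumed low end of the "60-80 ms" range


def assess_sm(link_down_pct, link_up_pct, ap_to_sm_ping_ms):
    """Return a list of findings for one SM; an empty list means nothing flagged."""
    findings = []
    if min(link_down_pct, link_up_pct) < LINK_TEST_MIN_PCT:
        findings.append("link test below 90%: check alignment/interference")
    if ap_to_sm_ping_ms > AP_SM_PROBLEM_MS:
        findings.append("AP-to-SM latency above 60 ms: likely RF-side problem")
    elif ap_to_sm_ping_ms > AP_SM_EXPECTED_MS:
        findings.append("AP-to-SM latency above 20 ms: higher than expected")
    return findings
```

Note that the ping input must be measured from the AP to the SM; a ping from servers further upstream also includes the backhaul and the wired network.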

But that’s what I mean: they are NOT re-registering, at least not as often as our customers and our monitoring system indicate. For example, I am logged into a customer right now (the aforementioned bandwidth muncher) whose Session Uptime shows that he has been up since I rebooted the AP two and a half hours ago, yet Cacti shows him being down for at least half that time. Link Test gives me 99% down/70% up. This Link Test result is normal for many of our customers, and it has never been a problem. When I ping someone else with similar numbers on the same AP, I get 0% packet loss. Also keep in mind that the ping is from our servers, not from the AP.

Put simply, my main question is this: why would half the customers on a particular AP/BH pair lose their connection to the internet but not to the AP?

Hmm, well, did you actually look at the Sessions tab on the AP itself to see whether the session list shows re-registrations for the problem SM? Just in case Cacti is reporting incorrectly for some reason; just a thought. Even from your servers (which I assume are on the local network), ping times should definitely not be 700 ms; 150 ms is the most I’d expect. Anything higher would indicate a routing/router/switch problem. What I would try is running a ping test from a telnet session on the AP itself and watching the ping times, to rule out a router/routing/switch problem. Do the CMM Ethernet statistics show any errors on the ports?

I did have a similar problem once with an antenna that was going bad. All tests indicated the antenna was fine, but we replaced it anyway because we could not find any other problem, and with the new antenna the problem was resolved. Run a link test on the 5.7 GHz BH. Is the PTP 500 constantly changing modulation rates?

I need more info on the specifics of what exactly is going on and to whom. For example, are all customers on the AP having issues, or just one? What exactly is happening on the customer end: high ping times, no access through the SM, etc.?
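The comparison suggested here (ping from the servers versus ping from the AP itself) can be sketched as a simple decision. Assumptions: both pings target the same SM at about the same time; the 150 ms ceiling comes from this reply, the 60 ms AP-to-SM limit from the earlier reply; `localize_latency` is a hypothetical helper.

```python
RF_LIMIT_MS = 60       # AP-to-SM limit suggested in the earlier reply
TOTAL_LIMIT_MS = 150   # server-to-SM ceiling suggested in this reply


def localize_latency(server_to_sm_ms, ap_to_sm_ms):
    """Guess which segment adds the delay; returns (verdict, upstream_ms).

    upstream_ms estimates the delay added between the servers and the AP
    (backhauls, routers, switches) as the difference of the two pings.
    """
    upstream_ms = max(server_to_sm_ms - ap_to_sm_ms, 0)
    if ap_to_sm_ms > RF_LIMIT_MS:
        return "rf", upstream_ms                 # the radio leg itself is slow
    if server_to_sm_ms > TOTAL_LIMIT_MS:
        return "backhaul_or_wired", upstream_ms  # RF leg fine, delay added upstream
    return "ok", upstream_ms
```

If the AP-to-SM ping stays low while the server ping is in the hundreds of milliseconds, the BH links and Ethernet ports between the servers and the AP are the place to look.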

I’d say it’s a connection problem.

I’d disable that one client who is generating all the traffic and see if that fixes things for the rest of the clients.

It could be that one client is opening so many P2P connections that he’s saturating the AP, which causes the return times to skyrocket and makes it appear as if the clients cannot get online.

Those programs rely on a simple ping to determine whether someone is up or down. If your radio is getting overloaded, or you have ANY transmission problems at all on your network, this will happen. You need to check whether you are getting outbound packet discards, and if you are, take steps to correct it: limit excessive usage on the AP or install a second AP, and the problem will go away.
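One way to check for outbound discards is to sample the interface counters twice and compare. A minimal sketch, assuming the AP exposes the standard IF-MIB counters `ifOutDiscards` and `ifOutUcastPkts` as 32-bit values (whether and how a given Canopy release exposes them is an assumption to verify). The dictionaries below stand in for whatever SNMP tool you use to read them. Multicast and broadcast packets are ignored for simplicity.

```python
COUNTER32_MOD = 2 ** 32


def counter_delta(old, new, modulus=COUNTER32_MOD):
    """Increase of an SNMP Counter32 between two readings, allowing one wrap."""
    return (new - old) % modulus


def out_discard_pct(prev, cur):
    """Percent of outbound packets discarded between two counter snapshots.

    prev/cur: dicts with 'ifOutDiscards' and 'ifOutUcastPkts' readings
    taken some interval apart (e.g. one Cacti polling cycle).
    """
    discards = counter_delta(prev["ifOutDiscards"], cur["ifOutDiscards"])
    sent = counter_delta(prev["ifOutUcastPkts"], cur["ifOutUcastPkts"])
    attempted = sent + discards
    return 100.0 * discards / attempted if attempted else 0.0
```

A discard percentage that climbs whenever the heavy user is active would support the saturation theory above.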