Continual reboot of SM

I have an SM that reboots itself every minute or so several times in a row, then runs for several minutes before the cycle starts all over again. Does anyone have any ideas? I also have a couple of other SMs that seem to just quit passing traffic. I can still ping them and log on to them, but they won’t pass traffic until they’re rebooted. Is anyone else experiencing these problems, and what’s the fix?

Thanks

For the ones that seem to stop passing traffic: are you using NAT, and are the customers behind the SM using VPN clients? Under normal circumstances, can the client behind the SM ping the SM (or is it hidden from them by your IP addressing scheme)? If they can ping it under normal circumstances, can they also ping it when it seems to stop passing traffic? Are you using a CMM for timing? Do these issues occur on SMs connected to one particular AP or to multiple APs?

I haven’t heard of the SM reboot issue so the SM may need to be RMA’ed. What version of software/boot/fpga are you running and what band radio?

We are not using NAT on any of our SMs. This customer has his own router, and all of the radios are hidden on a separate IP address range. I don’t believe the customer is using VPNs at all. When the failure occurs, we can still ping the radio across the air link and get on the web interface of the SM, and everything looks OK. All of my SMs are currently 5.2 radios running software 4.2.1, Boot ver 2.5 and FPGA 06240308 (Single, 40Mhz, ExtBus, DES, Type0). I believe there are two SMs acting this way, but I need to verify the second one; if I remember correctly, it may in fact be off the same AP.

The continual rebooting SM is off the same AP as the above.

bhaggard wrote:
I have a SM that reboots itself every minute or so for several times then will go several minutes and start all over again. Anyone have any ideas? Also I have a couple of other SM that seem to just quit passing traffic. I can still ping them and log on to them but they won't pass traffic until they're rebooted. Anyone else experiencing these problems and what's the fix?

Thanks


I am having the same issue, but with a 5.2 AP in AES trim. I can ping it and access the web utility, but that's it. Rebooting corrects it, and it only happens once in a while. The other three APs in the cluster on the CMM have not had this problem. If it happens again, I will RMA the AP.

Other than changing the IP and color code, the system is at default.

Dan

Here are several entries from the event log covering the reboots. Does anyone have any idea what’s causing this?

***
14:54:38 UT : 12/14/04 : File root.c : Line 874 System Startup
14:54:38 UT : 12/14/04 : File root.c : Line 879 Software Version : CANOPY4.2.1 Apr 16 2004 15:23:05 SM-DES
14:54:38 UT : 12/14/04 : File root.c : Line 883 Software Boot Version : CANOPYBOOT 2.5
14:54:38 UT : 12/14/04 : File root.c : Line 887 FPGA Version : 09110208
14:54:39 UT : 12/14/04 : File root.c : Line 891 FPGA Features : DES
14:56:36 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set
14:56:36 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set
14:56:36 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 914 System Reset Exception – External Hard Reset WatchDog Cur ExtInt 137 Max ExtInt 275 Cur DecInt 82 Max DecInt 315 Cur Sync 47 Max Sync 63 Cur LED 0 Max LED 1 Cur EthXcvr 0 Max EthXcvr 1 Cur FEC 10 Max FEC 53 Cur FPGA 35 Max FPGA 100 Cur FrmLoc 92 Max FrmLoc 132 AAState 0
14:56:36 UT : 12/14/04 : File root.c : Line 874 System Startup
14:56:37 UT : 12/14/04 : File root.c : Line 879 Software Version : CANOPY4.2.1 Apr 16 2004 15:23:05 SM-DES
14:56:37 UT : 12/14/04 : File root.c : Line 883 Software Boot Version : CANOPYBOOT 2.5
14:56:37 UT : 12/14/04 : File root.c : Line 887 FPGA Version : 09110208
14:56:38 UT : 12/14/04 : File root.c : Line 891 FPGA Features : DES
14:58:27 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set
14:58:52 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set
14:58:52 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 914 System Reset Exception – External Hard Reset WatchDog Cur ExtInt 156 Max ExtInt 238 Cur DecInt 84 Max DecInt 199 Cur Sync 34 Max Sync 80 Cur LED 0 Max LED 1 Cur EthXcvr 0 Max EthXcvr 1 Cur FEC 23 Max FEC 42 Cur FPGA 49 Max FPGA 79 Cur FrmLoc 85 Max FrmLoc 132 AAState 0
14:58:52 UT : 12/14/04 : File root.c : Line 874 System Startup
14:58:53 UT : 12/14/04 : File root.c : Line 879 Software Version : CANOPY4.2.1 Apr 16 2004 15:23:05 SM-DES
14:58:53 UT : 12/14/04 : File root.c : Line 883 Software Boot Version : CANOPYBOOT 2.5
14:58:53 UT : 12/14/04 : File root.c : Line 887 FPGA Version : 09110208
14:58:54 UT : 12/14/04 : File root.c : Line 891 FPGA Features : DES
15:00:16 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set
15:00:17 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set
15:00:17 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 914 System Reset Exception – External Hard Reset WatchDog Cur ExtInt 128 Max ExtInt 266 Cur DecInt 76 Max DecInt 197 Cur Sync 45 Max Sync 58 Cur LED 0 Max LED 1 Cur EthXcvr 0 Max EthXcvr 1 Cur FEC 12 Max FEC 46 Cur FPGA 29 Max FPGA 88 Cur FrmLoc 87 Max FrmLoc 133 AAState 0
15:00:17 UT : 12/14/04 : File root.c : Line 874 System Startup
15:00:18 UT : 12/14/04 : File root.c : Line 879 Software Version : CANOPY4.2.1 Apr 16 2004 15:23:05 SM-DES
15:00:18 UT : 12/14/04 : File root.c : Line 883 Software Boot Version : CANOPYBOOT 2.5
15:00:18 UT : 12/14/04 : File root.c : Line 887 FPGA Version : 09110208
15:00:19 UT : 12/14/04 : File root.c : Line 891 FPGA Features : DES
15:01:22 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set


Thanks
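
One way to see how regular the resets are is to pull the timestamps of the "System Reset Exception" entries out of the event log and compute the gaps between them. The following is only a minimal Python sketch: it assumes the entry layout shown in the log above (time, "UT", date as MM/DD/YY, source file, line number, message), and the function names are made up for illustration rather than taken from any Canopy tool.

```python
import re
from datetime import datetime

# Matches the entry layout seen in the log above (an assumption based on that log):
#   HH:MM:SS UT : MM/DD/YY : File <path> : Line <n> <message>
ENTRY_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) UT : (\d{2}/\d{2}/\d{2}) : File (\S+) : Line (\d+) (.*)$"
)


def parse_entry(line):
    """Return (timestamp, message) for one log line, or None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    time_text, date_text, _source_file, _source_line, message = match.groups()
    stamp = datetime.strptime(f"{date_text} {time_text}", "%m/%d/%y %H:%M:%S")
    return stamp, message


def reset_intervals(log_text):
    """Return the seconds between consecutive 'System Reset Exception' entries."""
    stamps = []
    for line in log_text.splitlines():
        entry = parse_entry(line)
        if entry is not None and "System Reset Exception" in entry[1]:
            stamps.append(entry[0])
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(stamps, stamps[1:])
    ]


# Entries taken from the log above, with the counter fields trimmed for brevity.
SAMPLE_LOG = """\
14:56:36 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 914 System Reset Exception
14:56:36 UT : 12/14/04 : File root.c : Line 874 System Startup
14:58:52 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 914 System Reset Exception
14:58:52 UT : 12/14/04 : File root.c : Line 874 System Startup
15:00:17 UT : 12/14/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 914 System Reset Exception
"""

print(reset_intervals(SAMPLE_LOG))  # [136.0, 85.0]
```

For the three resets shown, the gaps are 136 and 85 seconds. Note that the timestamp is whatever the radio recorded for the entry, so it is only as accurate as the radio's clock at that moment.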

We have error logs that look similar to this on our SMs also. We are not using NAT on any of them; some are passing VPN traffic and some are not. The strange thing is that the times at which the log reports the SM rebooting do not match up with the “Up-Time” of the SM on the status page.

We are using all 5.7 gear with DES. Our errors do not occur as often as some of the others mentioned in this thread, but we still have them. They do not seem to have any effect on system performance. It’s almost as if the log just randomly places these messages in there when in fact the SM is not actually rebooting.

Unfortunately, mine really is rebooting, and the uptime does reflect the actual uptime. I’m not sure what’s causing this, but I’m going to bring the code up to 4.2.3, and maybe the boot code to 3.0, and see what that does. Due to weather conditions we cannot replace the radio immediately. I’ll keep everyone posted on the progress.

I had a similar issue with some devices in a CMM gen2 cluster (see clip from event log below). While I’m not 100% certain that I discovered the root cause, I was able to eliminate the problem by replacing the UPS providing power to the CMM.

This UPS was clipping voltage and I have not seen this issue in this cluster since replacing the UPS.

Tim

Event Log Clip:
04:28:08 UT : 09/26/04 : File root.c : Line 874 System Startup
04:28:08 UT : 09/26/04 : File root.c : Line 879 Software Version : CANOPY4.2.1 Apr 16 2004 15:23:05 BH20-DES
04:28:08 UT : 09/26/04 : File root.c : Line 883 Software Boot Version : CANOPYBOOT 2.5
04:28:08 UT : 09/26/04 : File root.c : Line 887 FPGA Version : 06240318
04:28:08 UT : 09/26/04 : File root.c : Line 891 FPGA Features : DES
04:28:08 UT : 09/26/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set
04:28:08 UT : 09/26/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 914 System Reset Exception – External Hard Reset WatchDog Cur ExtInt 20 Max ExtInt 21 Cur DecInt 46 Max DecInt 33 Cur Sync 33 Max Sync 10 Cur LED 0 Max LED 1 Cur EthXcvr 0 Max EthXcvr 1 Cur FEC 0 Max FEC 7 Cur FPGA 20 Max FPGA 21 Cur FrmLoc 0 Max FrmLoc 0 AAState 0
04:28:08 UT : 09/26/04 : File root.c : Line 874 System Startup
04:28:08 UT : 09/26/04 : File root.c : Line 879 Software Version : CANOPY4.2.1 Apr 16 2004 15:23:05 BH20-DES
04:28:08 UT : 09/26/04 : File root.c : Line 883 Software Boot Version : CANOPYBOOT 2.5
04:28:08 UT : 09/26/04 : File root.c : Line 887 FPGA Version : 06240318
04:28:08 UT : 09/26/04 : File root.c : Line 891 FPGA Features : DES
04:28:08 UT : 09/26/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set
04:28:08 UT : 09/26/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 914 System Reset Exception – External Hard Reset WatchDog Cur ExtInt 25 Max ExtInt 18 Cur DecInt 47 Max DecInt 33 Cur Sync 45 Max Sync 5 Cur LED 0 Max LED 1 Cur EthXcvr 0 Max EthXcvr 1 Cur FEC 1 Max FEC 6 Cur FPGA 24 Max FPGA 18 Cur FrmLoc 0 Max FrmLoc 0 AAState 0
04:28:08 UT : 09/26/04 : File root.c : Line 874 System Startup
04:28:08 UT : 09/26/04 : File root.c : Line 879 Software Version : CANOPY4.2.1 Apr 16 2004 15:23:05 BH20-DES
04:28:08 UT : 09/26/04 : File root.c : Line 883 Software Boot Version : CANOPYBOOT 2.5
04:28:08 UT : 09/26/04 : File root.c : Line 887 FPGA Version : 06240318
04:28:08 UT : 09/26/04 : File root.c : Line 891 FPGA Features : DES
04:28:08 UT : 09/26/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set
04:28:08 UT : 09/26/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 914 System Reset Exception – External Hard Reset WatchDog Cur ExtInt 3 Max ExtInt 22 Cur DecInt 19 Max DecInt 33 Cur Sync 9 Max Sync 48 Cur LED 0 Max LED 1 Cur EthXcvr 0 Max EthXcvr 1 Cur FEC 0 Max FEC 7 Cur FPGA 3 Max FPGA 21 Cur FrmLoc 0 Max FrmLoc 0 AAState 0
04:28:08 UT : 09/26/04 : File root.c : Line 874 System Startup
04:28:08 UT : 09/26/04 : File root.c : Line 879 Software Version : CANOPY4.2.1 Apr 16 2004 15:23:05 BH20-DES
04:28:08 UT : 09/26/04 : File root.c : Line 883 Software Boot Version : CANOPYBOOT 2.5
04:28:08 UT : 09/26/04 : File root.c : Line 887 FPGA Version : 06240318
04:28:08 UT : 09/26/04 : File root.c : Line 891 FPGA Features : DES
04:28:08 UT : 09/26/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set
04:28:08 UT : 09/26/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 914 System Reset Exception – External Hard Reset WatchDog Cur ExtInt 9 Max ExtInt 29 Cur DecInt 27 Max DecInt 33 Cur Sync 31 Max Sync 60 Cur LED 0 Max LED 1 Cur EthXcvr 0 Max EthXcvr 1 Cur FEC 0 Max FEC 6 Cur FPGA 9 Max FPGA 28 Cur FrmLoc 0 Max FrmLoc 0 AAState 0
04:28:08 UT : 09/26/04 : File root.c : Line 874 System Startup
04:28:08 UT : 09/26/04 : File root.c : Line 879 Software Version : CANOPY4.2.1 Apr 16 2004 15:23:05 BH20-DES
04:28:08 UT : 09/26/04 : File root.c : Line 883 Software Boot Version : CANOPYBOOT 2.5
04:28:08 UT : 09/26/04 : File root.c : Line 887 FPGA Version : 06240318
04:28:08 UT : 09/26/04 : File root.c : Line 891 FPGA Features : DES
04:28:08 UT : 09/26/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 1055 Time set
04:28:08 UT : 09/26/04 : File C:/ISIPPC/pssppc.250/bsps/devices/whisp/syslog.c : Line 914 System Reset Exception – External Hard Reset WatchDog Cur ExtInt 3 Max ExtInt 18 Cur DecInt 25 Max DecInt 33 Cur Sync 4 Max Sync 30 Cur LED 0 Max LED 1 Cur EthXcvr 0 Max EthXcvr 1 Cur FEC 1 Max FEC 5 Cur FPGA 2 Max FPGA 17 Cur FrmLoc 0 Max FrmLoc 0 AAState 0
04:28:08 UT : 09/26/04 : File root.c : Line 874 System Startup
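
To compare reset entries between units, it can help to pull the "Cur"/"Max" counter pairs out of a reset line so they can be read side by side. The sketch below is a minimal Python example under the assumption that every counter appears as "Cur NAME value" or "Max NAME value", as in the clip above. What each counter measures is not explained anywhere in this thread, so the code only extracts the numbers and does not interpret them; the function name is hypothetical.

```python
import re

# One counter field looks like "Cur ExtInt 20" or "Max ExtInt 21"
# (layout assumed from the reset entries in the clip above).
COUNTER_RE = re.compile(r"\b(Cur|Max) (\w+) (\d+)")


def watchdog_counters(message):
    """Return {counter_name: {"Cur": int, "Max": int}} for one reset message."""
    counters = {}
    for kind, name, value in COUNTER_RE.findall(message):
        counters.setdefault(name, {})[kind] = int(value)
    return counters


# Counter portion of the first reset entry in the clip above.
SAMPLE_MESSAGE = (
    "WatchDog Cur ExtInt 20 Max ExtInt 21 Cur DecInt 46 Max DecInt 33 "
    "Cur Sync 33 Max Sync 10 Cur LED 0 Max LED 1 Cur EthXcvr 0 Max EthXcvr 1 "
    "Cur FEC 0 Max FEC 7 Cur FPGA 20 Max FPGA 21 Cur FrmLoc 0 Max FrmLoc 0 AAState 0"
)

for name, pair in watchdog_counters(SAMPLE_MESSAGE).items():
    print(f"{name:8s} Cur={pair['Cur']:4d} Max={pair['Max']:4d}")
```

Fields that do not follow the Cur/Max pattern, such as AAState, are simply skipped.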

This is on an SM, and it does not have a UPS on it. However, you bring up an idea: I wonder if the power brick is failing, then recovers, only to fail again. I’ll try the power brick first, then replace the radio, and go from there. Thanks for the idea. Hopefully this will fix it; otherwise I’ll be back to square one.

Bob,

I’ve also improved some AP - SM links by cleaning up the client’s power. Getting the brick directly into the wall is always a good idea and I’ll leave a $2 power strip to free up that bottom outlet.

Tim

We had been experiencing identical symptoms (continual rebooting of the SM). Replacing the SM and the power supply did not fix the problem.

The power supply was plugged into a plain old power strip, which was plugged into a UPS. Plugging it directly into the UPS seems to have fixed the problem (at least for the last 20 minutes).

Just wanted to let you know I am having the exact same problem with one SM constantly rebooting. This is a site that has been working just fine for a year and has now suddenly started doing this.

Is there any reason why poor alignment would cause this? It makes no sense that it would, but that is the only thing that has happened to this site. The radio was knocked out of alignment, and I helped the customer dial it in as much as they could over the phone (very remote client). They are up and running, but the unit reboots itself very frequently. We need to do a truck roll, but I am trying to get a better understanding of all the factors involved before we trek up there.

I will definitely take a new power brick with me and will check the Cat-5 wiring to the SM. It could be that we have a nick in the wire, or something that is barely maintaining contact for the power supply and is losing contact occasionally due to movement or contraction/expansion from temperature changes.

Will let you know if I find anything out.

I’ve had 2 somewhat similar failures. The one I started this thread with was working fine until a big windstorm came up and knocked it out of alignment. I went out and repositioned the best I could (didn’t have a ladder at the time) and it connected to the AP. However, that’s when we began to notice the reboot. I finally replaced the radio and power brick completely and realigned the radio and it’s been rock solid ever since.

The second we’re replacing even as I type this. We could get to the radio interface but the ethernet link kept dropping in and out. It appears to be the radio ethernet interface though we’ll know shortly after they finish replacing it.

I still haven’t upgraded to 4.2.3 though I just downloaded CNUT and 6.1 and may go direct to that version and see if it changes anything.

Yes, in my case proper alignment seemed to have fixed the constant reboot problem as well. It seems like if the radio is just barely aligned, it keeps restarting itself. Not sure why that would be, but it seems from my experience and yours that there is something to this.

Yes, there seems to be something there. However, after I realigned it, it continued to reboot, so we replaced it and then had to align it again. We may have been better at it the second time around. I’m going to upgrade the radio to 6.1 and see if it will reboot itself on the bench, then go from there.

Have you or anyone else on this thread used the ‘tone’ for alignment? I haven’t tried it with the new code, which I think is supposed to improve it. Just wondering if it works better than just watching the RSSI and jitter at the time of alignment.

bob

I have played around with the tone method, but I don’t like it very much because there is no way to know exactly what you are getting. Yes, you can hear the change in the tone and aim it as best you can, meaning where you get the higher-pitched noise or the rapid cadence, etc., but until you fire it up on a computer you cannot tell what you have actually ended up with.

I suppose someone out there is going to claim that they can tell the actual RSSI or jitter from the pitch or the cadence, but I would not believe that unless you spend all day doing it. For us, many of our installs are difficult installs, meaning that we often have to hunt around for the best location on a rooftop or on a wall to get a proper signal, so it is very important that I can see the actual RSSI and especially the jitter.

I am looking into how you can use a PDA to do this instead of having to lug my laptop up 40’ ladders! I am going to post another topic asking about that…

I have had this happen on two different occasions and resolved both. Alignment is a possibility, and so is the power issue, but in both of my cases it was an AP problem. The first was a timing issue: the timing cable at the AP had a bad connection, and replacing the connector solved the problem. On the second, I eventually had to replace the AP itself; it could also have been a timing issue, but an internal one. No more reboots since.

But correct me if I am wrong: if it is an AP issue, then you would see this on ALL SMs registered to that AP, no?