450M Unexplained reboots - No event log entry

Hello,

On two of our 450M APs I have seen a few reboots happen over the past 10 days or so. No events leading up to the reboot were logged in the event log, and there is no startup entry showing the AP's System Startup either. Has anyone else noticed their 450M APs rebooting for no apparent reason? This seems to have started with v15.1.3. Thus far I have noticed this twice on one 450M and once on another.
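Because these reboots leave nothing in the event log, polling the AP's uptime is one way to catch them. Below is a minimal sketch, assuming you already collect the uptime periodically by some other means (SNMP, the web UI, a monitoring system); `find_reboots` and its 30-second tolerance are hypothetical, not part of any Cambium tool. The idea: without a reboot, uptime grows by roughly the wall-clock time between polls, so if it grew by noticeably less, the unit restarted in between.

```python
from datetime import datetime, timedelta


def find_reboots(samples, tolerance=timedelta(seconds=30)):
    """Return the poll times at which the unit must have restarted.

    samples: list of (poll_time: datetime, uptime: timedelta), in time order.
    Counter wrap-around (e.g. a 32-bit uptime counter) is not handled.
    """
    reboots = []
    for (t0, up0), (t1, up1) in zip(samples, samples[1:]):
        # Uptime we would expect at t1 if the unit had stayed up since t0.
        expected = up0 + (t1 - t0)
        if up1 + tolerance < expected:
            reboots.append(t1)
    return reboots


if __name__ == "__main__":
    base = datetime(2018, 2, 25, 1, 0)
    polls = [
        (base, timedelta(days=5)),
        (base + timedelta(minutes=5), timedelta(days=5, minutes=5)),
        (base + timedelta(minutes=10), timedelta(minutes=2)),  # uptime fell back
    ]
    print(find_reboots(polls))  # flags the 01:10 poll
```

Comparing against the expected uptime, rather than just checking whether uptime went down, also catches a reboot when the polls are far enough apart that the new uptime has already passed the old value.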

Normally I would expect to see something like this:

******System Startup******
System Reset Exception -- User Initiated Reset
Software Version : CANOPY 15.1.3 (Build BETA-7) AP-DES
Software Boot Version : CANOPYBOOT 1.0
Board Type : P14
Board Temperature : 0 C / 32 F
Device Setting : 5.7GHz MU-MIMO OFDM - Access Point - 0a-00-3e-60-xx-xx - 5740.0 MHz - 30.0 MHz - 1/16 - CC 85 - 2.5 ms - North America/United States
FPGA Version : 0b1e75
FPGA Features : DES, Sched, US/ETSI;

Following this as we are seeing the same thing.

Yes, with version 15.1.3 there are many ghost restarts. There is no trace anywhere except in the uptime.
So far it seems to me that this happens on the Medusa AP.
I also have lots of reboots due to FatalError:

90670FC8 63E70FC6 A7A516DD 2C0C12CC 8B5F7ACF 3E864D39 9C721BE8 9149D106
88BFF30B E9C8A782 B028B655 050067FC EED6C462 F7669620 02FBA5FC DBA833C4
01/12/2018 : 11:26:03 UTC : :FatalError()

****SYSTEM STARTUP******
System Reset Exception -- User Initiated Reset
Software Version : CANOPY 15.1.3 (Build BETA-7) AP-AES
Software Boot Version : CANOPYBOOT 1.0
[…]
01/16/2018 : 13:27:13 UTC : :NiFreeBuf(): Invalid NiBuf:7609b800 Hdr:c532 on Src:44876 SrcIF:236 Dst:1246 Alc:60136 Len:48473 Cop:128

STAT 62 ( 0%) 0 0 0 254293660 ( 8192/ 3%/46%) Ready 0x8000584
IDLE 63 ( 0%) 0 0 0 938546787 ( 8192/ 3%/46%) Ready 0x8000584
PRI PC ID
----------------
01/01/2016 : 02:00:00 CEST : :

01/01/2016 : 00:00:00 UTC : :Time Set
01/01/2016 : 02:00:00 CEST :
******System Startup******
System Reset Exception -- Watchdog Reset
Software Version : CANOPY 15.1.3 AP-AES
Board Type : P12


0x76f0ef58: 00000030 00000000 00000000 00000000
0x76f0ef68: 00000000
02/07/2018 : 10:05:31 UTC : :FatalError()
02/07/2018 : 10:05:32 UTC : :Forced reset;
02/07/2018 : 10:05:33 UTC :
******SYSTEM STARTUP******
System Reset Exception -- Reset due to FatalError
Software Version : CANOPY 15.1.3 AP-AES
Software Boot Version : CANOPYBOOT 1.0
Board Type : P14
Board Temperature : 0 C / 32 F
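When comparing logs like these across several APs, it can help to tally the startup banners, reset reasons and `FatalError()` entries. The sketch below assumes the event log has been copied out as plain text in the format quoted above; `summarize_resets` and its output shape are made up for illustration and are not a Cambium tool.

```python
import re
from collections import Counter

# The reset-reason line that follows each startup banner,
# e.g. "System Reset Exception -- Watchdog Reset".
RESET_RE = re.compile(r"System Reset Exception -- (.+?)\s*$", re.MULTILINE)
# The banner itself; the logs above show both "******System Startup******"
# and "****SYSTEM STARTUP******", so match case-insensitively.
BANNER_RE = re.compile(r"\*+\s*system startup\s*\*+", re.IGNORECASE)
# Entries such as "02/07/2018 : 10:05:31 UTC : :FatalError()".
FATAL_RE = re.compile(r":FatalError\(\)")


def summarize_resets(log_text: str) -> dict:
    """Count startup banners, reset reasons and FatalError() entries."""
    reasons = Counter(m.group(1) for m in RESET_RE.finditer(log_text))
    return {
        "startups": len(BANNER_RE.findall(log_text)),
        "fatal_errors": len(FATAL_RE.findall(log_text)),
        "reasons": dict(reasons),
    }


if __name__ == "__main__":
    sample = """\
02/07/2018 : 10:05:31 UTC : :FatalError()
02/07/2018 : 10:05:32 UTC : :Forced reset;
******SYSTEM STARTUP******
System Reset Exception -- Reset due to FatalError
01/01/2016 : 02:00:00 CEST :
******System Startup******
System Reset Exception -- Watchdog Reset
"""
    print(summarize_resets(sample))
```

A reboot that shows up only in uptime, with no banner at all, is invisible to this count, so a `startups` total lower than the number of reboots you know about is itself a sign of the missing entries.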

Are you leaning towards this being firmware-related?

Well, I think it's either 15.1.3 or the 30 MHz channel. One 450M has been up for 13 months and the other for 8 months, and this didn't happen until ~10 days ago. We loaded 15.1.3 onto one AP on 12/13/17 and onto the other on 1/19/18. Looking back at my logs, I see another unintended reboot, but there were some errors just before it (again with no System Startup entry). Here is the log just before the unintended reboot:

02/25/2018 : 01:11:15 CST : :Invalid NiBuf:0x754b5000 Hdr:5866 on Q:0x10ab2fa8 Src:38 SrcIF:2 Dst:39779 Alc:7967 Len:153 Cop:31
02/25/2018 : 01:11:15 CST : :Buffer:
AllocLUID 7967
BufLen 153
CpyCnt 31
DstLUID 39779
MsgType 1
NiBufSignature x5866
SrcAP 0
RtIF 31
SrcIF 2
SrcLUID 38
Rcvd RSSI 64 V 67 H
VLAN VID 2735
1A003E60 34891A00 3EB2B740 08004500 00875D28 00004011 5A0D0A85 57260A85
570107D1 07D10073 27E52802 00FD0026 00231C00 00000800 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00005800 00000000 00000000 00002630
613A3030 3A33653A 62323A62 373A3430 00000000 003D62AD B3300000 00000000
7BB1ACD2 02DC25FB 015630D0 AE78CDD6 07C06E26 2850C2F5 10990026 81000AAF
C68ACCA8 E16422C1 1C81FEB7 63E8A131 0E99AA43 AD695092 EABD312D 6DE30032
1EC44785 D938B7C7 24D560B5 76481FB9 9922B3DD FF89316E C9C515F8 98397825
02/25/2018 : 01:11:15 CST : :FatalError()
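If you are collecting several of these "Invalid NiBuf" entries to send in with the diags, pulling their `Key:value` fields out makes it easier to line occurrences up side by side (for example the two quoted in this thread). A minimal sketch follows; `parse_nibuf` is hypothetical, and since the thread does not document what each field means, the keys are kept exactly as logged.

```python
import re

# "Key:value" pairs such as "Hdr:5866" or "Q:0x10ab2fa8".
FIELD_RE = re.compile(r"(\w+):(\S+)")


def parse_nibuf(line: str) -> dict:
    """Return the key/value fields that follow 'Invalid NiBuf' in a log line."""
    start = line.find("Invalid NiBuf")
    if start == -1:
        raise ValueError("not an Invalid NiBuf line")
    # Start at "NiBuf:<addr>"; slicing also keeps the timestamp's HH:MM:SS
    # out of the matches. Bare words such as "on" have no colon and are skipped.
    return dict(FIELD_RE.findall(line[start + len("Invalid "):]))


if __name__ == "__main__":
    a = parse_nibuf("02/25/2018 : 01:11:15 CST : :Invalid NiBuf:0x754b5000 "
                    "Hdr:5866 on Q:0x10ab2fa8 Src:38 SrcIF:2 Dst:39779 "
                    "Alc:7967 Len:153 Cop:31")
    b = parse_nibuf("01/16/2018 : 13:27:13 UTC : :NiFreeBuf(): Invalid "
                    "NiBuf:7609b800 Hdr:c532 on Src:44876 SrcIF:236 Dst:1246 "
                    "Alc:60136 Len:48473 Cop:128")
    for key in sorted(set(a) | set(b)):
        print(f"{key:6} {a.get(key, '-'):>12} {b.get(key, '-'):>12}")
```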

Hello,

I'm very sorry to hear about this problem.

If any of you can go to IPADDRESS/field_diags.cgi on the affected units, please download the file and send it to me at andrew.rimmer@cambiumnetworks.com

I will make sure it is looked at as soon as possible.

I think we already have some from Tuvix_IT, but we have not been able to reproduce the problem here yet.

best regards,

Andy.


I think we already have some from Tuvix_IT, but we have not been able to reproduce the problem here yet.


That's right, I've already sent all the possible files to Bart, Venkata and Alan. If you need other files, I'll keep sending them ;)

any updates to this?


@gouda wrote:

any updates to this?


We have not managed to reproduce it here in the lab yet.

The diags are useful in tracking any common element between APs that have the problem.

It may have something to do with QinQ and VLAN setup, but we are still investigating.

I will update as soon as we have any more information.

best regards,

Andy.

Well, I will throw it out there that it must have something to do with the firmware rather than the 30 MHz channel, as we do not have any channels wider than 20 MHz.

We have 18 radios in the field so far and have seen a few stray reboots. Last week, though, one started rebooting in pretty quick cycles. We powered it off, got it replaced, and now I have it running at the office to see if it starts rebooting.

Could this have something to do with a traffic type that hits the radios?

Thanks Kendal,

We now have reports of this fault at 20, 30 and 40 MHz. There is a hint in the diags (the reported contents of the NiBuf) that it has something to do with QinQ, but we are still looking.
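For anyone curious what a QinQ check on a captured frame looks like: in an Ethernet frame, any VLAN tags sit right after the two MAC addresses, each one a 2-byte tag EtherType followed by a 2-byte TCI whose low 12 bits are the VLAN ID. The sketch below walks those tags in a hex dump like the one posted earlier. It assumes the dump begins at the destination MAC, which is a guess about the NiBuf layout rather than anything Cambium has documented, so treat it as a reading aid, not a diagnosis; both function names are hypothetical.

```python
# Tag EtherTypes: IEEE 802.1Q C-tag and 802.1ad S-tag (QinQ outer tag).
# 0x9100 is a pre-standard QinQ outer-tag value some equipment still uses.
TAG_TYPES = {0x8100: "802.1Q", 0x88A8: "802.1ad", 0x9100: "QinQ (legacy)"}


def dump_to_bytes(hex_words: str) -> bytes:
    """Join whitespace-separated hex words such as '1A003E60 34891A00'."""
    return bytes.fromhex("".join(hex_words.split()))


def vlan_tags(frame: bytes):
    """Return ([(tag_kind, vid), ...], inner_ethertype) for an Ethernet frame."""
    tags = []
    offset = 12  # 6-byte destination MAC + 6-byte source MAC
    while offset + 2 <= len(frame):
        ethertype = int.from_bytes(frame[offset:offset + 2], "big")
        if ethertype not in TAG_TYPES:
            return tags, ethertype  # first non-tag EtherType, e.g. 0x0800 IPv4
        if offset + 4 > len(frame):
            break
        tci = int.from_bytes(frame[offset + 2:offset + 4], "big")
        tags.append((TAG_TYPES[ethertype], tci & 0x0FFF))  # low 12 bits = VID
        offset += 4
    raise ValueError("dump ends before the EtherType")


if __name__ == "__main__":
    # First four words of the dump in the 02/25/2018 log above.
    print(vlan_tags(dump_to_bytes("1A003E60 34891A00 3EB2B740 08004500")))
```

Run on the start of that dump, the frame reads as untagged (EtherType 0x0800 at offset 12); a double-tagged frame would instead report two entries before the inner EtherType.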

Thank you (and everyone else) for sending the diags you collected.

Be assured that finding the cause and a fix is our highest priority; the VP of Engineering is keeping a close eye on this one.

best regards,

Andy.

Status Update:

We're still working on it....

We have deployed builds with extra debugging to customers affected by this issue.

We may have to go through a few iterations of special debug loads to find the root cause.

I will update again in a couple of weeks or if we find the root cause.

A fix is expected in the 15.2 software release.

I'd also like to report that we too are seeing reboots on our 450m AP. We only have one AP up currently. It's been up about 2 weeks and has rebooted 3 times now.

The AP is running 15.1.3 with a 20 MHz channel and has VLAN enabled, with 126 connected SMs in total. Let me know if you would like more details.

@ericmule wrote:

I'd also like to report we too are seeing reboots on our 450m AP. We only have one AP up currently. It's been up about 2 weeks and has rebooted 3 times now.

AP is running 15.1.3 with 20mhz and has vlan enabled. 126 total connected SM's. Let me know if you would like more details.


Hi ericmule,

Thank you for the offer. I think we are getting close to a fix for this. We have been running development/debug builds on a couple of networks. These included a fix, plus debugging counters that increment if the condition we think caused the reboot occurs again. Over the last week the counters have incremented and the units have not rebooted, so it's looking promising.

The fix will be in the official 15.2 release (which should be less than a month away). If anyone doesn't want to wait for this, 15.1.1 should be stable and not have the reboot issue. If you need a feature from 15.1.3 (e.g. 30/40 MHz or the 5 ms frame) and can't possibly wait for 15.2, contact me and I'll see what I can do about making a closed beta release available after Easter.

best regards,

Andy.

The closed beta should be available now if you contact support (tier 1 should have been informed) or through your RTM. It's not an open beta because it has not yet had as much SIT testing time as usual.

This load includes the following:

450m reset fix (NiBuf error)

450m GPS coordinate display fix

450b improvements/bugfixes

430 SM first registration fix for moving from 14.x or earlier s/w

best regards,

Andy

The official 15.1.5 release includes the fix for this bug.

Thank you to all who helped us diagnose it.

regards,

Andy.

I've been experiencing reboots on 2 brand-new 450m APs we recently purchased. It happens on 15.1.1, 15.1.3 and 15.1.5. I have only tried 15 MHz and 20 MHz channels, but it happens with both. The only way I can get the APs to stay up for more than 10 minutes is to select "none" as the carrier frequency. Below is an excerpt from the event log. Please advise.

01/01/2016 : 00:00:01 UTC : 
******System Startup****** 
System Reset Exception -- Power-On Reset 
Software Version : CANOPY 15.1.5 AP-None
Software Boot Version : CANOPYBOOT 1.0
Board Type : P14
Board Temperature : 0 C / 32 F
Device Setting : 5.7GHz MU-MIMO OFDM - Access Point - 0a-00-3e-60-46-81 - 5795.0 MHz - 15.0 MHz - 1/16 - CC 0 - 5.0 ms - North America/United States
FPGA Version : 031576
FPGA Features : DES, Sched, US/ETSI;
01/01/2016 : 00:00:07 UTC : :RFSync: EVENT = 4 STATE = 0
05/03/2018 : 14:58:09 UTC : :Time Set
01/01/2016 : 00:00:01 UTC : :
01/01/2016 : 00:00:00 UTC : :Time Set

01/01/2016 : 00:00:01 UTC : 
******System Startup****** 
System Reset Exception -- Power-On Reset 
Software Version : CANOPY 15.1.5 AP-None
Software Boot Version : CANOPYBOOT 1.0
Board Type : P14
Board Temperature : 0 C / 32 F
Device Setting : 5.7GHz MU-MIMO OFDM - Access Point - 0a-00-3e-60-46-81 - 5795.0 MHz - 15.0 MHz - 1/16 - CC 0 - 5.0 ms - North America/United States
FPGA Version : 031576
FPGA Features : DES, Sched, US/ETSI;
01/01/2016 : 00:00:07 UTC : :RFSync: EVENT = 4 STATE = 0
05/03/2018 : 15:03:55 UTC : :Time Set
01/01/2016 : 00:00:01 UTC : :
01/01/2016 : 00:00:00 UTC : :Time Set

01/01/2016 : 00:00:01 UTC : 
******System Startup****** 
System Reset Exception -- Power-On Reset 
Software Version : CANOPY 15.1.5 AP-None
Software Boot Version : CANOPYBOOT 1.0
Board Type : P14
Board Temperature : 0 C / 32 F
Device Setting : 5.7GHz MU-MIMO OFDM - Access Point - 0a-00-3e-60-46-81 - 5795.0 MHz - 15.0 MHz - 1/16 - CC 0 - 5.0 ms - North America/United States
FPGA Version : 031576
FPGA Features : DES, Sched, US/ETSI;
01/01/2016 : 00:00:07 UTC : :RFSync: EVENT = 4 STATE = 0
05/03/2018 : 15:08:09 UTC : :Time Set
01/01/2016 : 00:00:01 UTC : :
01/01/2016 : 00:00:00 UTC : :Time Set

01/01/2016 : 00:00:01 UTC : 
******System Startup****** 
System Reset Exception -- Power-On Reset 
Software Version : CANOPY 15.1.5 AP-None
Software Boot Version : CANOPYBOOT 1.0
Board Type : P14
Board Temperature : 0 C / 32 F
Device Setting : 5.7GHz MU-MIMO OFDM - Access Point - 0a-00-3e-60-46-81 - 5795.0 MHz - 15.0 MHz - 1/16 - CC 0 - 5.0 ms - North America/United States
FPGA Version : 031576
FPGA Features : DES, Sched, US/ETSI;
01/01/2016 : 00:00:07 UTC : :RFSync: EVENT = 4 STATE = 0
05/03/2018 : 15:13:34 UTC : :Time Set

Hi Zach,

Sorry to hear this.

Could you go to the AP and capture IPADDRESS/field_diags.cgi

and send those to me at

andrew.rimmer@cambiumnetworks.com

I will get these looked at as soon as possible for clues as to the problem.

If you could raise a support ticket as well, that would be great; it helps if others have the same problem or if an RMA is needed.

[edit]

Those all look like power-cycle reboots. How are you powering the 450m, and do you have an alternative?

Selecting frequency 'none' effectively disables TX, so the unit draws less power.

best regards,

Andy

Update on Zach's issue.

After some emails, it looks like this was a problem with trying to use an underrated power supply.

Hopefully Zach can confirm once the correct power source is used.

regards,

Andy.

We’re also experiencing similar issues with our two 3.6 GHz Medusa APs.

12/31/2019 : 17:00:06 MST : :RFSync: EVENT = 4 STATE = 0
09/23/2021 : 10:39:50 MST : :Time Set
09/23/2021 : 10:40:20 MST : :Board Temperature: 16 C / 60 F

The APs reboot every 30 to 40 minutes. Our East AP was working well at first with 10 SMs connected, but it started acting up when we started connecting SMs to the second AP. We tried updating the firmware to v20.2.2.1, but no luck. The East AP eventually went into Radio-Not-Calibrated mode.

@Cambium_AndyR wrote (May '18):

Update on Zach’s issue.

After emails it looks like this was a problem with trying to use an underrated power supply.

Are Medusas limited to working only with the CMM5?