Force 200 rebooting

dcshobby · January 30, 2017, 3:57pm

Hello. We just replaced a NanoBridge backhaul with a 5 GHz Force 200 link. It is powered from a MikroTik RB750UP and is currently receiving 22.5 V, well within the 10 to 30 V range listed on the spec sheet.

Since the swap, the new radio has rebooted itself 15 times in under an hour. The old radio never did this, and the Rocket M5 powered by the same MikroTik is not rebooting; it has been up for 96 days, the same uptime the NanoBridge had before we replaced it.

Any ideas why the Force 200 would be rebooting itself randomly? It doesn't appear to be a power issue, since the other gear is fine and the voltage is within the spec sheet range.

Running 3.2.1 firmware on it.

Ideas? Thanks

dcshobby · January 30, 2017, 6:44pm

Jan 30 12:42:22 mn DEVICE-AGENT[5296]: send_trap_info: Wake up JSON Object: [{"msgType": 699, "eId": "STA_REG", "eTime": 1485801742, "varPayload": {"mac": "00:04:56:FE:71:81"}}]
Jan 30 12:42:22 mn DEVICE-AGENT[5296]: ping call back
Jan 30 12:42:22 mn DEVICE-AGENT[5296]: PING_DATA: len=68 msg [{"Message":"ping from device agent", "Pid": "5296", "PongLoss": "0"}]
Jan 30 12:42:22 mn DEVICE-AGENT[5296]: service_ping: Ping enqueued
Jan 30 12:42:22 mn DEVICE-AGENT[5296]: send new ping after 15 seconds
Jan 30 12:42:22 mn DEVICE-AGENT[5296]: callback_websocket: LWS_CALLBACK_CLIENT_WRITEABLE
Jan 30 12:42:22 mn DEVICE-AGENT[5296]: callback_websocket: LWS_CALLBACK_CLIENT_WRITEABLE
Jan 30 12:42:22 mn DEVICE-AGENT[5296]: callback_websocket: Received Pong message from cnMaestro
Jan 30 12:42:26 mn kernel: [1552568.070000] SM[00:04:56:fe:71:81] aid=25 disassociated. Reason: COMMUNICATION LOST
Jan 30 12:42:26 mn hostapd: ath0: STA 00:04:56:fe:71:81 IEEE 802.11: disassociated
Jan 30 12:42:26 mn DEVICE-AGENT[5296]: event_rx_cb: MSG [0x7fcb7ae4] len=1
Jan 30 12:42:26 mn DEVICE-AGENT[5296]: event_rx_cb:MSG TYPE = 1
Jan 30 12:42:26 mn DEVICE-AGENT[5296]: Trap data received: name STA_REJECT timestamp 1485801746 mac 00:04:56:FE:71:81 status 0 msg [COMMUNICATION LOST]
Jan 30 12:42:26 mn DEVICE-AGENT[5296]: send_trap_info: Wake up JSON Object: [{"msgType": 699, "eId": "STA_REJECT", "eTime": 1485801746}]
Jan 30 12:42:26 mn DEVICE-AGENT[5296]: callback_websocket: LWS_CALLBACK_CLIENT_WRITEABLE
Jan 30 12:42:26 mn DEVICE-AGENT[5296]: event_rx_cb: MSG [0x7fcb7ae4] len=1
Jan 30 12:42:26 mn DEVICE-AGENT[5296]: event_rx_cb:MSG TYPE = 1
Jan 30 12:42:26 mn DEVICE-AGENT[5296]: Trap data received: name STA_DROP timestamp 1485801746 mac 00:04:56:FE:71:81 status 0 msg [COMMUNICATION LOST]
Jan 30 12:42:26 mn DEVICE-AGENT[5296]: send_trap_info: Wake up JSON Object: [{"msgType": 699, "eId": "STA_DROP", "eTime": 1485801746, "varPayload": {"mac": "00:04:56:FE:71:81"}}]
Jan 30 12:42:26 mn DEVICE-AGENT[5296]: callback_websocket: LWS_CALLBACK_CLIENT_WRITEABLE

It keeps rebooting; it has now rebooted 27 times on its own. This is the log from the Access Point.

system_log (5).txt (5.49 KB)
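As a rough way to quantify the drops from the AP side, here is a minimal Python sketch that tallies the disassociation and trap lines in a syslog like the one above. It assumes the exact line formats shown in this excerpt (the kernel "disassociated. Reason:" lines and the DEVICE-AGENT "Trap data received" lines); other firmware versions may log differently. Treating each COMMUNICATION LOST drop as one CPE reboot is also an assumption, and the function name `count_events` and the "DISASSOC" label are hypothetical, chosen for illustration.

```python
import re
import sys
from collections import Counter

# Assumed format (copied from the excerpt above) of kernel disassociation lines:
# "... SM[00:04:56:fe:71:81] aid=25 disassociated. Reason: COMMUNICATION LOST"
DISASSOC_RE = re.compile(
    r"SM\[(?P<mac>[0-9A-Fa-f:]{17})\] aid=\d+ disassociated\. Reason: (?P<msg>.+)$"
)

# Assumed format of DEVICE-AGENT trap lines:
# "... Trap data received: name STA_DROP timestamp 1485801746
#      mac 00:04:56:FE:71:81 status 0 msg [COMMUNICATION LOST]"
TRAP_RE = re.compile(
    r"Trap data received: name (?P<name>\S+) timestamp \d+ "
    r"mac (?P<mac>[0-9A-Fa-f:]{17}) status \d+ msg \[(?P<msg>[^\]]*)\]"
)


def count_events(lines):
    """Tally (event, MAC, reason) tuples from an iterable of syslog lines.

    MACs are upper-cased so the kernel (lowercase) and DEVICE-AGENT
    (uppercase) spellings collapse to one key. Other lines are ignored.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        m = DISASSOC_RE.search(line)
        if m:
            counts[("DISASSOC", m["mac"].upper(), m["msg"].strip())] += 1
            continue
        m = TRAP_RE.search(line)
        if m:
            counts[(m["name"], m["mac"].upper(), m["msg"])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_drops.py "system_log (5).txt"
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (event, mac, reason), n in sorted(count_events(f).items()):
            print(f"{event:<10} {mac}  {reason}: {n}")
```

In this excerpt a single drop shows up under several event names (DISASSOC, STA_REJECT, STA_DROP), so compare counts within one event type rather than summing across them.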

Luis · January 30, 2017, 6:56pm

Hello Darin,

Does the "debug crashlog" command report a crash log? To run it, log in to the device over SSH and execute the command from the CLI.

Regards

dcshobby · January 30, 2017, 7:05pm

On the CPE (the Force 200), that command returns "Crashlog not found".

The AP has no crash log either.

Cambium_Sri · January 30, 2017, 7:10pm

Can you try disabling Remote Management (cnMaestro) under Configuration->System->cnMaestro?

dcshobby · January 30, 2017, 8:07pm

Can I ask how that would help the situation? Just curious. We manage all our ePMP gear with cnMaestro and haven't had a problem yet.

Cambium_Sri · January 30, 2017, 8:21pm

@dcshobby wrote:

Can I ask what that would do to help the situation? Just curious. We manage all our ePMP with maestro and haven't had a problem yet.

In the syslog you sent, I can see "DA watchdog" entries. DA (the Device Agent) is the cnMaestro process running on the radio. However, I believe you may be running into the snmpd bug in 3.2/3.2.1, where snmpd maxes out the CPU and starves other processes. That could be what triggers the DA watchdog. We specifically fixed this in 3.2.2.

If possible, try disabling cnMaestro to see whether that settles the radio. However, if it's the snmpd issue, you would be better off upgrading to 3.2.2. If it still happens after that, we may have to dig deeper.

Thanks,

Sriram

dcshobby · January 30, 2017, 8:56pm

I've completed the upgrade to 3.2.2 and will see whether it helps with the reboots.

dcshobby · January 30, 2017, 11:28pm

Upgraded to 3.2.2 and we're at 43 reboots so far since this morning.

Anything else we can do to troubleshoot?

Eric_Ozrelic · January 30, 2017, 11:31pm

Here's an idea: what if you power the radio but don't plug it into your network? Temporarily connect it to a laptop, or otherwise isolate it from the rest of your network. Does it still reboot? I'm wondering if some odd traffic is causing the reboots.

Cambium_Sri · January 31, 2017, 12:43am

Darin,

At this point, we have to look elsewhere. Eric has a good suggestion. Also, can you please try the PoE injector supplied in the box with the Force 200? I ask because the crash log comes up empty, which typically happens when power to the radio is low or cut off. A software issue, or even a component issue, usually leaves something in the crash log. If none of this helps, we may have to try swapping out the radio, since you say the other end of the link (which is also a Force 200) is powered the same way and works fine?

Thanks,

Sriram

dcshobby · January 31, 2017, 3:59pm

The Force 200 that is having issues is connected to an ePMP 2000 sector, not to another Force 200.

The NanoBridge, connected to the same router port with the same cable, ran fine for 96 days with no reboots. Since we replaced it with the Force 200, the new radio has rebooted 82 times since yesterday morning. It is getting 22.5 V, and the rated range is 10 to 30 V, so we should be OK, right?

I don't have an easy way to power it from the PoE injector, since the gear is all on the roof and powered by the router located up there.

I will likely have to do some more testing in the office, and maybe swap in a NanoBeam XW to get things going again.

dkeltgen · January 31, 2017, 8:19pm

Swap it with another Force 200. I've run into a handful of Force 180s/200s that reboot several times a day with no crash log. I tried replacing PoE injectors and cables with no change in behavior. Eventually I just replaced the radio and the issue went away.

dcshobby · February 1, 2017, 5:27am

Hey Dan,

Have you RMA'ed any of those radios and heard back from Cambium on a possible cause?

dkeltgen · February 1, 2017, 9:42pm

One of them turned out to be a physical issue with the network connector: if I wiggled the cable or connector, it would reboot. I have two more that I haven't looked into yet.

Douglas_Generous · February 8, 2017, 4:12am

How close are the serial numbers on the Force 200s that you had to swap? I have a number of them and (knock on wood) all are working fine. Our policy requires building a strain relief for the cable before it enters any radio. Maybe this has prevented the issues.

dcshobby · February 12, 2017, 3:09am

OK, so we didn't replace the radio; instead we changed the PoE setting on the MikroTik RB750UP. Rather than leaving PoE detection on "Auto", we set it to "Forced On", and the radio has not rebooted once in the past week.

So for some reason the MikroTik was having trouble detecting the PoE load (the Force 200) and was occasionally cutting power to the port, even though auto-detection worked fine with the Ubiquiti radio.

Weird problem but glad we got it fixed without replacing hardware any further.

mvcwireless · April 24, 2017, 3:45am

I'm currently having issues with one Force 200. It is running the latest 3.3 firmware with the original PoE injector. The cable has been replaced, the PoE injector has been replaced, a UPS has been installed, and remote management has been disabled. The CPE will go hours without a reboot, or at any moment it will go through multiple reboots within 30 minutes. The crash log shows nothing.

fjp · April 27, 2017, 3:27pm

I had the same problem and solved it with firmware 2.6.2. Let me know if the same thing works for you.

mvcwireless · April 27, 2017, 8:10pm

Update:

After multiple attempts with different firmware versions, we replaced the CPE itself, and so far the unit has gone 26 hours without rebooting.

We are now setting up a bench test with the CPE we pulled to see whether the reboots continue on it.