CNUT does not have anywhere near the features needed to determine a hardware failure; it is part of a logging and firmware management system. Log into the radio and actively watch it, or use MRTG to pull each radio's RSSI and SNR, radio temperature, and Ethernet link status, plus have all syslogs sent to a syslog server where you can split the data stream and watch just that SM and AP separately. Determining a hardware failure is not always easy, nor is it always a full fault. Reboot issues fall into a particular class of power-management problems: an antenna cable shorting out will not usually cause a power problem because of how antenna amplifiers are built, but a cable that lets water into the radio via capillary action can cause other shorts that reboot it. Heat from the FPGA or crystal oscillator can cause system halts that reboot the radio, and so can a bad cable crimp that heats up and pulls away from the connection.
In short, you need to analyze the circumstances under which the SM reboots, isolate the possible causes, and find the commonality between them. Sorry, but none of us here can be much more helpful, as most of us do this intuitively and already have the systems in place to provide the information needed to understand what is going on.
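To make the "find the commonality" step concrete, here is a minimal Python sketch that scans an archived syslog for reboot events and buckets them by hour of day (a midday cluster hints at heat, a storm-hour cluster hints at water ingress). The timestamp format and the match strings are assumptions for illustration, not actual Canopy log text, so adjust them to whatever your radios emit.

```python
from collections import Counter
from datetime import datetime

def reboot_hour_histogram(events):
    """Count reboot events per hour of day from (timestamp, message) pairs.

    The "%Y-%m-%d %H:%M:%S" timestamp format and the 'reboot' /
    'system startup' match strings are assumptions; substitute the
    patterns your radios actually log.
    """
    hist = Counter()
    for ts, msg in events:
        text = msg.lower()
        if "reboot" in text or "system startup" in text:
            hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour
            hist[hour] += 1
    return hist
```

Run it over a week or two of logs and the clustering (or lack of it) usually narrows the suspect list quickly.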
I suggest a simple Debian box or VM with Apache, RRDtool, MRTG, rsyslogd, the SNMP dictionary (MIB) for your radios, and routers2.cgi installed, on the same network as this link so it can reach the management data from both ends. Point the radios' syslog server IP at your new data collection server and use MRTG to pull statistics from both radios. Remember that you want each radio to show up twice, once per chain, with SNR per chain. Most Cacti recipes can be cleaned up enough to work, or just the important bits copied over to MRTG (Cacti is a bit more complicated to set up, but it is essentially the same thing as MRTG), so there are a lot of references in a Google search and in here.
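Once rsyslogd is receiving from both radios, splitting the combined stream per device can be as simple as this Python sketch. The IPs and the leading-IP line format (e.g. an rsyslog template that prefixes the sender's address) are assumptions for illustration; map them to your actual management addresses.

```python
import re
from collections import defaultdict

# Hypothetical management IPs for the two ends of this link.
HOSTS = {"10.10.0.1": "AP", "10.10.0.2": "SM"}

def split_stream(lines):
    """Split a combined syslog stream so the SM and AP can be watched
    separately. Assumes each line starts with the sender's IPv4 address
    (e.g. an rsyslog template using the sender-address property)."""
    streams = defaultdict(list)
    for line in lines:
        m = re.match(r"(\d{1,3}(?:\.\d{1,3}){3})\s+(.*)", line)
        if not m:
            continue  # skip lines that don't carry a source IP
        ip, msg = m.groups()
        streams[HOSTS.get(ip, ip)].append(msg)
    return streams
```

Feed it a tail of the combined log and you get one clean stream per radio to watch while you wait for the next reboot.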
I will follow your suggestion and setup the data collection services.
A bit of additional information.
My throughput has been cut in half.
On the Link Status page of the SM, Receive Fragments Modulation Path H shows N/A and the session is in MIMO-A.
I’m unsure if this is due to a worsening signal or a hardware issue. The noise floor is the same as it was prior to the drop in throughput, and unless the forest service has been dropping Miracle-Gro, I doubt it’s a new obstruction in the path.
The radios are at best difficult to access, but if replacing the SM fixes the problem and returns the link to its original level of performance, it will be worth it.
There may be a problem with an antenna cable that has caused the SM to shut one radio chain off. Try swapping the cables' positions and see what happens. Also, at this point turn auto power control off and set the TX power on the SM to what auto power was setting it to. This stops the SM from making adjustments that may be causing the issue. Do not set it to max power unless you are 100% sure you are allowed to, and that the antenna and cables are verified good with no shorts or extra load capacitance (water in an RF cable does this, and it doesn't take much!).
The radio log will tell you what has happened since power-on/reboot; once you have a syslog server, it will also tell you what happened before the reboot/power loss.
If you're using Linux for the server, I highly suggest the nano text editor, as it is easier than emacs or vi and can be used in their place. You can also use tftpd on Windows with the firewall disabled to make a quick-and-dirty temporary syslog collector, but it does not persist the log after it is closed.
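If you just want something quick before the full Debian box is up, a throwaway syslog collector with on-disk persistence is only a few lines of Python. The unprivileged port 5514 is an arbitrary choice (real syslog is 514/udp, which needs root), so point the radio's syslog IP and port at whatever you pick.

```python
import socket

def collect_syslog(bind_addr=("0.0.0.0", 5514), logfile="radio.log", max_msgs=None):
    """Minimal UDP syslog collector that persists every message to disk.

    bind_addr: where to listen; 5514 is an arbitrary unprivileged port,
    standard syslog uses 514/udp (requires root).
    max_msgs: stop after N messages (None = run forever); handy for tests.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(bind_addr)
    count = 0
    with open(logfile, "a") as f:
        while max_msgs is None or count < max_msgs:
            data, src = sock.recvfrom(4096)
            # Prefix each line with the sender's IP so AP and SM can be
            # told apart later when the stream is split per radio.
            f.write(f"{src[0]} {data.decode(errors='replace')}\n")
            f.flush()  # survive a crash or Ctrl-C mid-run
            count += 1
    sock.close()
```

Unlike the Windows tftpd trick, the log file survives after the collector is closed.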
I will wait for you to collect and post some logs.
First, do as Eric has said and update to the latest version 16 firmware on both ends.
Second, clear any watchdogs you may have set. A factory default is a good way to ensure there is nothing hidden in the config files, but it's not usually required; if you do plan to set everything up from scratch, do it after the firmware upgrades.
I have not set up any watchdogs and was assuming it is an internal watchdog programmed into the firmware.
Will do the firmware update, but will do it locally for each radio rather than over the link, just in case it decides to go belly up during the transfer.
But that does beg the question: is CNUT smart enough to abort the update if the firmware file suffers a transfer error or is corrupted?
As for the other logs, they seemed to show generic "lost connection / reconnected" type entries.
CNUT is just a repository and a dedicated FTP (TFTP) server that sends an SNMP command to each radio telling it to download and update. It is the radios themselves that perform the update: they download the firmware into memory, test its integrity, and then extract the image and burn it over the existing NVRAM image.
CNUT is not smart; thankfully the radios are!
You can also do this from the web interface of each radio if you don't have the files in CNUT yet.
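For what it's worth, the download-verify-then-burn order the radio follows is the standard safe-flash pattern, and here is an illustrative Python sketch of that logic. The SHA-256 check stands in for whatever integrity check the radio actually uses, which isn't documented here; the point is simply that a corrupt transfer raises before anything touches NVRAM.

```python
import hashlib

def stage_and_verify(image: bytes, expected_sha256: str) -> bytes:
    """Illustrative safe-flash flow: hold the downloaded image in RAM,
    verify its integrity, and only then hand it off to be burned over
    the existing NVRAM image.

    The SHA-256 digest is a stand-in assumption; the radio's real
    image format and checksum scheme are not documented here.
    """
    digest = hashlib.sha256(image).hexdigest()
    if digest != expected_sha256:
        # A transfer error or corrupt file stops the process here,
        # before the existing firmware is overwritten.
        raise ValueError("firmware image failed integrity check; aborting flash")
    return image  # verified; safe to burn
```

This is why a failed transfer leaves the radio on its old firmware rather than bricked.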
Will be updating this weekend. Looking through the release notes, I did not see anything specific to the 900 MHz band radios. Took a look at the v20 release notes as well; same thing.
Are there any improvements or fixes in either of these firmware updates that apply specifically to the 450 900 MHz band radio, or are they just general, across-the-board fixes and additions for new radios?
@Eric_Ozrelic I updated the firmware on both radios. Unfortunately that did not seem to resolve the issues I was having, so I decided to swap out the SM yesterday.
Unfortunately things went from bad to worse.
I started a new thread as it seemed to be a new problem.