We are automating much of the deployment work for these radios in our network. One recurring issue is how long the SNMP daemon takes to start responding. For example, when we upgrade a radio we see the following pattern during boot:
Upgrade completes, radio remains responsive for 3 seconds.
Radio does not respond to ICMP / SNMP for approximately 45s.
It then responds to ICMP (NO SNMP) for about 10s, followed by an 8s pause. My guess is Ethernet negotiation or something else taking down the interface.
After this, ICMP is 100% responsive, but the SNMP daemon on the radio does not respond until 120s after reboot, which is 75s after the initial ICMP response.
We have mapped similar patterns when simply setting changes on the device via SNMP and then applying them, though the times are much shorter. This also seems to generate the error:
Error in packet Reason: (noSuchName) There is no such variable name in this MIB.
A few seconds later, though, polling the same OID responds just fine. I assume we are hitting the daemon during a reload of some sort.
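For anyone scripting around the transient noSuchName, here is a minimal Python sketch of a retry wrapper. It assumes your own SNMP layer is passed in as `snmp_get` and raises a (hypothetical) `TransientSnmpError` for noSuchName or timeouts. The attempt count and delay are guesses, not measured values for these radios.

```python
import time
from typing import Callable

SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0"  # standard MIB-II sysDescr.0


class TransientSnmpError(Exception):
    """Hypothetical error type: raised by your SNMP layer on noSuchName or timeout."""


def get_with_retry(
    snmp_get: Callable[[str], str],
    oid: str,
    attempts: int = 5,          # assumption: tune for your devices
    delay_s: float = 2.0,       # assumption: "a few seconds later" from the post
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call snmp_get(oid), retrying while the daemon looks like it is reloading."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return snmp_get(oid)
        except TransientSnmpError:
            # Give up only after the last attempt; otherwise wait and retry.
            if attempt == attempts - 1:
                raise
            sleep(delay_s)
    raise AssertionError("unreachable")
```

Passing `sleep` in makes it easy to test without real delays. A persistent noSuchName still surfaces as an error after the last attempt, so a genuinely missing OID is not hidden.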
We could really use an SNMP daemon on the radios that becomes active nearly simultaneously with ICMP responsiveness in all cases. We would even prefer it take longer to respond to ICMP if that is needed to get this working.
Is there a way of having the ePMP units (Elevate or real) send an SNMP message telling the backend system that the unit is alive and ready to handle packets? This would mean that we could code for a specific response. It also eliminates the boot-time coding and having to R&D the boot time on each device after each software update to ensure our timeouts are still valid. It further eliminates the effective difference of using Elevate devices, as they can also send the same message once they are ready to handle packets again.
I realize that SNMP is traditionally a polling protocol, but there are provisions for this type of active response system.
I know SNMP messages are small, but if you have 1000 devices to poll and you have to continuously poll one device for that long with no response to decide if it's dead or alive, this can add up to a lot of bandwidth used for management and wasted resources. This gets worse with more devices, and I don't want to hear "nobody has that many devices on a network".
What you are describing would be an SNMP trap and devices can typically send them out at restart. However, that typically is directed to your NMS and would not help in the scenarios we are facing. You would somehow have to first tell the unit what IP to send this to and then wait for it.
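To make the trap idea concrete, here is a rough Python sketch of the receiving side. It assumes the radios have already been told which IP to send to, which is exactly the catch mentioned above. It does not decode the SNMP PDU at all: any datagram from an expected device counts as an "I'm back" hint. A real receiver would use an SNMP library to confirm the notification type and would normally listen on UDP port 162. The function name and parameters are made up for illustration.

```python
import socket
import time
from typing import Optional, Set


def wait_for_notification(
    sock: socket.socket,
    expected_ips: Set[str],
    timeout_s: float,
) -> Optional[str]:
    """Return the IP of the first expected device that sends a datagram, or None on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        sock.settimeout(remaining)
        try:
            _payload, (src_ip, _src_port) = sock.recvfrom(4096)
        except socket.timeout:
            return None
        # Ignore datagrams from hosts we are not waiting on.
        if src_ip in expected_ips:
            return src_ip
```

Even with something like this in place, you would probably still confirm with an SNMP GET before treating the unit as ready, since a restart notification and a daemon that answers polls are not necessarily the same moment.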
We have 1000's of devices, but in our case you would typically only be waiting for a few to come back online. The network impact is minimal as we do most of the waiting with NO network activity. The way we have it coded works for ALL units regardless of the timing. It basically is as follows:
Send SNMP command to reboot
Test for ICMP
  If good (we use 3 packets sent / 3 received as "up")
    Wait 20s (gets us past Ethernet reset)
    Poll SNMP GET for sysDescr
      If valid response -> Alive
    Loop Poll SNMP
Loop Test ICMP
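The steps above could be sketched in Python roughly like this. It is not our production code, just one reading of the loop. `ping_ok` and `snmp_sysdescr` are placeholders for whatever ICMP and SNMP calls you already have, and `max_wait_s` is an added safety limit that is not part of the steps listed above.

```python
import time
from typing import Callable, Optional


def wait_until_alive(
    ping_ok: Callable[[], bool],                  # True when 3 of 3 pings came back
    snmp_sysdescr: Callable[[], Optional[str]],   # sysDescr string, or None on no/invalid reply
    settle_s: float = 20.0,                       # gets past the Ethernet reset
    poll_interval_s: float = 5.0,                 # polling interval in both phases
    max_wait_s: float = 600.0,                    # assumption: overall give-up limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once the device answers an SNMP GET, False if max_wait_s runs out."""
    deadline = clock() + max_wait_s

    # Phase 1: wait for ICMP to come back.
    while not ping_ok():
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)

    # Skip the window where the interface drops again after first responding.
    sleep(settle_s)

    # Phase 2: poll SNMP until the daemon answers.
    while clock() < deadline:
        if snmp_sysdescr():
            return True
        sleep(poll_interval_s)
    return False
```

Call it right after sending the reboot command. Because nothing in it depends on fixed boot times, it should not need retuning after a firmware update, only a `max_wait_s` large enough for the slowest unit.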
So during the ICMP phase you are talking up to 6 packets every 5s, and then during the SNMP phase up to 2 packets every 5s. That is not a significant impact on the network.
Now of course if you needed to do this for all 1000's of devices at once, that is a different story, but at that point I suspect you have more pressing problems to be working on - like why is my entire network down :)
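To put numbers on that, here is a quick Python back-of-the-envelope. The function name and the device count of 10 are only for illustration; the 6 and 2 packets per 5s are the figures above (3 echo requests plus 3 replies, and 1 GET plus 1 response).

```python
def packets_per_second(packets_per_poll: int, interval_s: float, devices: int = 1) -> float:
    """Worst-case management packet rate while waiting on `devices` units."""
    return packets_per_poll * devices / interval_s


icmp_rate = packets_per_second(6, 5)       # one device in the ICMP phase
snmp_rate = packets_per_second(2, 5)       # one device in the SNMP phase
ten_units = packets_per_second(6, 5, 10)   # ten devices rebooting at once

print(icmp_rate, snmp_rate, ten_units)     # prints: 1.2 0.4 12.0
```

So even ten units rebooting together cost about 12 small packets per second at worst.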