Hi everyone, we’ve seen an issue where our APs lose connection to the proxy and then almost immediately drop to the Granted state, which disables transmit. TLDR/questions are at the bottom if you want to skip the full description.
It is our understanding that the radio heartbeat is good for 3:40 (3 minutes 40 seconds), but the logs on the AP and the proxy show the connection dropping for only around 40-50 seconds before the AP shuts off transmit.
Here is our current setup:
cnMaestro on-premises, deployed solely for use with CBRS.
An external (on premise) tinyproxy server.
Roughly 80-90 APs onboarded and using CBRS through the proxy.
We found the proxy was restarting during log rotation, and we have disabled the reload command to reduce downtime on our end. This is still concerning, though: if the proxy host drops and we are waiting on, say, an HA switchover, a 60-second switchover caused by a single server failure could shut down all of our APs.
Complete log excerpt from a radio during this issue:
10/05/2020 : 18:02:17 CST : Heartbeat request connection timed out, state = CBSD_HEARTBEAT_TX
10/05/2020 : 18:02:28 CST : Error receiving data from domain proxy : Timeout was reached
10/05/2020 : 18:03:01 CST : Error receiving data from domain proxy : Timeout was reached
10/05/2020 : 18:03:06 CST : Transmit time expired hence transitioning to granted state
10/05/2020 : 18:03:42 CST : Error receiving data from domain proxy : Timeout was reached
10/05/2020 : 18:04:06 CST : Processing Transmit expiry timeout
10/05/2020 : 18:04:45 CST : Error receiving data from domain proxy : Timeout was reached
10/05/2020 : 18:05:29 CST : Enabling transmission on radio
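For anyone who wants to check the timing, here is a small Python sketch that pulls the elapsed times out of three of the AP log lines above. The parsing helper `parse_line` is just something written for this post, and it assumes every line follows the `date : time zone : message` layout seen in this excerpt.

```python
from datetime import datetime

# Three lines copied from the AP log excerpt above.
AP_LOG = """\
10/05/2020 : 18:02:17 CST : Heartbeat request connection timed out, state = CBSD_HEARTBEAT_TX
10/05/2020 : 18:03:06 CST : Transmit time expired hence transitioning to granted state
10/05/2020 : 18:05:29 CST : Enabling transmission on radio
"""

def parse_line(line):
    """Split one log line into (timestamp, message).

    Assumes the 'MM/DD/YYYY : HH:MM:SS TZ : message' layout seen in the excerpt.
    The time zone label is dropped because only differences matter here.
    """
    date_part, time_part, message = line.split(" : ", 2)
    clock = time_part.split()[0]  # "18:02:17 CST" -> "18:02:17"
    stamp = datetime.strptime(f"{date_part} {clock}", "%m/%d/%Y %H:%M:%S")
    return stamp, message

events = [parse_line(line) for line in AP_LOG.splitlines() if line.strip()]
first_timeout, transmit_expired, transmit_enabled = (stamp for stamp, _ in events)

# Time from the first failed heartbeat to the AP disabling transmit.
seconds_until_shutoff = (transmit_expired - first_timeout).total_seconds()
# Time the AP spent with transmit disabled.
seconds_transmit_off = (transmit_enabled - transmit_expired).total_seconds()

print(f"First heartbeat timeout to transmit expiry: {seconds_until_shutoff:.0f} s")
print(f"Transmit disabled for: {seconds_transmit_off:.0f} s")
```

On this excerpt that works out to 49 seconds from the first heartbeat timeout to the transmit expiry, and 143 seconds with transmit disabled.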
Excerpt (truncated) from our proxy during the reload from the last request before the reload to the first request after:
INFO Oct 05 23:59:59 : Closed connection between local client (fd:7) and remote client (fd:8)
INFO Oct 06 00:00:29 : Reloading config file
CONNECT Oct 06 00:00:40 : Connect (file descriptor 7): 10.X.X.X [10.X.X.X]
CONNECT Oct 06 00:00:41 : Request (file descriptor 7): CONNECT sas.cbrs.cambiumnetworks.com:443 HTTP/1.1
TLDR: Our on-premise proxy dropped for roughly 40 seconds, during which 80% of the APs on our network could not reach the SAS and reverted to Granted (transmit disabled).
If the radios are talking to the SAS 60 seconds prior to the proxy failure, and the heartbeat is good for almost 4 minutes, wouldn’t they be able to withstand a 60-90 second drop, say in the case of an HA failover?
If I’m understanding the tech we’ve been emailing correctly, they only heartbeat every 3 minutes and 40 seconds, so even if they did talk to the SAS a minute prior, they will still drop offline if that 3:40 timer expires during the proxy failure. Is this correct?
If that is correct, does anyone have any suggestions for keeping the radios online during a short failure, such as the loss of a server or of the backhaul leading to it?
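To make the timing question concrete, here is a rough Python sketch of the model we have in our heads. It is only our understanding, not confirmed AP or SAS behavior: it assumes transmit stays authorized for 3:40 after the last successful heartbeat and that the AP recovers as soon as the proxy is reachable again. The function name `stays_online` and its parameters are made up for this post.

```python
# Assumed validity window after the last successful heartbeat: 3 min 40 s.
HEARTBEAT_VALIDITY_S = 3 * 60 + 40

def stays_online(seconds_since_last_heartbeat, outage_seconds,
                 validity_seconds=HEARTBEAT_VALIDITY_S):
    """Return True if the outage ends before the assumed validity window expires.

    Simplified model: the AP keeps transmitting until validity_seconds have
    passed since its last successful heartbeat, and a retry succeeds as soon
    as the proxy comes back.
    """
    return seconds_since_last_heartbeat + outage_seconds < validity_seconds

# Heartbeat succeeded 60 s before the proxy failed, outage lasts 90 s.
print(stays_online(60, 90))   # True: 150 s elapsed, under the 220 s window

# Heartbeat succeeded 200 s before the proxy failed, outage lasts 40 s.
print(stays_online(200, 40))  # False: 240 s elapsed, window already expired
```

Under this model an AP only drops when its window happens to run out during the outage, which is why so many radios dropping during a roughly 40 second outage surprised us.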
Edit: redacted IP addresses from the log.