E2E Controller disconnect and re-connect behaviour

GREG3f · October 11, 2022, 11:00am

Is this expected behaviour, or should I open a ticket?

We had an issue with our VM running out of disk space, which caused the controller to lose communication with the network a couple of times.

The first time the controller became unavailable, the network appeared to keep running as expected. However, when the controller restored communication with the network and cnMaestro, new configs were pushed to many, if not all, of the devices in the network. It took over 20 minutes for the network to be fully restored, and during that time many DNs and CNs were unreachable, which is not acceptable.

The second time it happened, the network again stayed up without the controller, but when the controller came back the network went down and new configs were pushed once more. To try to speed up the restoration, I enabled Ignition on links that were not coming up on their own. When I ignited them, they reconnected within a minute and usually received a fresh config. This reduced the downtime to approximately 5 minutes before the network was fully restored.

I also noticed that the system time was reported as out of sync by over 8 hours between cnMaestro and E2E, but when I checked the system time on both servers, they matched. About 10 minutes after the server reconnected, the system time was reported as in sync. I'm not sure whether this is related, but it seems odd, since the clocks never changed before, during, or after the loss of communication.

Prasanna_TM · October 11, 2022, 3:23pm

Hi,
Can you share the techdump of the E2E controller, along with the field diagnostic files from the PoP and from one of the DNs to which you believe new configs were pushed?

Once I have analyzed these files, I will be able to share more engineering comments.

Thank you