I’ve now verified that the 3 latest links we’ve deployed cause VPLS tunnels to flap every 3 to 5 minutes. Two of these links were previously AF5xHD and had no issues until the Force 425 radios were swapped in, running 5.3 at the time of the swap. I upgraded the third link from 5.1.3 to 5.3 over the weekend and moved traffic onto it from the link it had been backing up.
The resolution on all three links today was to downgrade them to 5.1.3. That is the only change required for stable tunnel operation. Fortunately, none of these links carried critical traffic, other than the third link. But that also meant that it took until businesses opened today for me to find out about the issue.
I have not tweaked any of the performance knobs in 5.3 on these radios; the performance mode was left at balanced. Ethernet MTU is set to 1700 on the radios and 1600 on the router interfaces, per our usual.
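For anyone wanting to sanity-check the MTU figures above, here is a rough sketch of the encapsulation arithmetic for an Ethernet frame carried in a VPLS pseudowire. It assumes a 1500-byte customer payload, two MPLS labels (transport plus pseudowire), and an optional control word, and it assumes the configured MTU counts the bytes that follow the outer Ethernet header. The function names are made up for illustration and the real label depth depends on the network.

```python
# Standard header sizes in bytes.
ETH_HEADER = 14     # inner Ethernet header carried inside the pseudowire (no FCS)
VLAN_TAG = 4        # one 802.1Q tag on the customer frame
MPLS_LABEL = 4      # one MPLS label stack entry
CONTROL_WORD = 4    # optional pseudowire control word


def vpls_frame_size(payload=1500, labels=2, vlan_tags=0, control_word=True):
    """Bytes of MPLS-encapsulated customer frame, excluding the outer
    Ethernet header and FCS (assumption: this is what the MTU must cover)."""
    size = payload + ETH_HEADER + vlan_tags * VLAN_TAG + labels * MPLS_LABEL
    if control_word:
        size += CONTROL_WORD
    return size


def headroom(link_mtu, **kwargs):
    """Spare bytes left on a link with the given MTU (negative means too small)."""
    return link_mtu - vpls_frame_size(**kwargs)


if __name__ == "__main__":
    for name, mtu in (("router interface", 1600), ("radio Ethernet", 1700)):
        print(f"{name}: MTU {mtu}, headroom {headroom(mtu)} bytes")
```

Under those assumptions the encapsulated frame is 1526 bytes, so both the 1600 and 1700 settings leave spare room, which fits with the downgrade alone fixing the tunnels.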
Time permitting, I’ll set up a lab to verify, but simply downgrading firmware on all three links today resolved all issues in and of itself.
Putting this out there in case others run into the same issue.
Thank you for the report! I will try to replicate this case in the lab.
In the meantime, could you tell me whether you noticed anything in the performance of the radios, or any VPLS-related logs on your routers?
Nothing in router or radio logs and no sudden swings in MCS rates or other performance indicators that are obvious. The tunnels will just drop approximately every 3 minutes with 5.3. LDP neighbors stay established just fine, it is just the VPLS tunnels.
Downgrading to 5.1.3 fixes the problem immediately without any other changes.
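If it helps anyone compare firmware versions, the flap cadence can be measured from tunnel-down timestamps collected however your routers expose them. The sketch below only does the interval arithmetic; the timestamp format and the sample values are hypothetical placeholders, not data from these links.

```python
from datetime import datetime


def flap_intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Seconds between consecutive tunnel-down events."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


def summarize(intervals):
    """Min, mean, and max gap in seconds, or None if there are no gaps."""
    if not intervals:
        return None
    return {
        "count": len(intervals),
        "min_s": min(intervals),
        "mean_s": sum(intervals) / len(intervals),
        "max_s": max(intervals),
    }


if __name__ == "__main__":
    # Hypothetical sample events, for illustration only.
    sample = [
        "2024-01-01 09:00:00",
        "2024-01-01 09:03:10",
        "2024-01-01 09:06:05",
        "2024-01-01 09:10:55",
    ]
    print(summarize(flap_intervals(sample)))
```

A steady gap in the 3 to 5 minute range on 5.3, and no events at all on 5.1.3, would match what is described above.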
@aka to really test this, you would need the tunnel to terminate on a router somewhere else on the network, rather than on a directly connected router. I’ll set up some remote access and PM you. I can really only play with one of these links, as the others carry customer traffic.
Please forgive my childlike example of a topology drawing here. The red lines are the VPLS tunnels, and the image shows the various paths between sites. These tunnels are examples of where the problem has shown up with 5.3, while the same tunnels work fine with 5.1.3.
Does it happen on 5.4.1 as well? Do you take any action to recover, or does it go away by itself?
Can I have tech supports? We already have an engineering ticket, but in our lab we see it only on 160 MHz.