Force 400C - QoS

TDJ211 · June 9, 2022, 11:32pm

I have a single pair of Force 400Cs that I'm using as the 5 GHz failover for a Siklu 70 GHz radio. This site has grown beyond the capacity of the 5 GHz link during heavy rain events, and I was anticipating using the QoS DSCP traffic prioritization feature found on the Force 300, but it appears DSCP hasn't made it to the Force 400 yet.

Is the radio DSCP-aware at all? Is there any way to do QoS over these radios? This really sucks. I may have to replace them.

Douglas_Generous · June 10, 2022, 12:26am

Not in the current software.

Depending on the switches/routers you have on both ends, you can implement a policy and enforce shaping and queueing on the 5 GHz link without needing the radios to do anything more than they currently do.
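Shaping on the routers generally comes down to a token bucket: packets are released only while enough tokens (bytes of credit) are available, and the bucket refills at the shaped rate. A minimal Python sketch of the idea, assuming a hypothetical 200 Mbps rate and a 1 MB burst; it is not a model of any particular router's queueing implementation:

```python
class TokenBucket:
    """Minimal token-bucket shaper: tokens are bytes, refilled at rate_bps / 8 per second."""

    def __init__(self, rate_bps: float, burst_bytes: float) -> None:
        self.rate_bytes = rate_bps / 8.0
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def allow(self, size_bytes: int, now: float) -> bool:
        # Refill for the time elapsed since the last check, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate_bytes)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False  # a shaper would queue this packet; a policer would drop it


bucket = TokenBucket(rate_bps=200e6, burst_bytes=1_000_000)
sent = sum(bucket.allow(1500, now=0.0) for _ in range(1000))
print(sent)  # 666
```

After the initial burst is spent, the bucket admits about 25,000 bytes per millisecond (200 Mbps / 8), so sustained traffic is held to the shaped rate no matter how much arrives.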

TDJ211 · June 10, 2022, 12:45am

Thanks for the confirmation.

Yeah, but the 5 GHz link (200M) is just a failover, while the 70 GHz link (1G) is the primary. So the MikroTik routers would somehow have to know that the link is in failover (via scripting, I imagine) and then enable a QoS policy for the 5 GHz link.

We do have a fiber failover from a fellow WISP, but it's an unofficial 200M. I'm going to have to learn some OSPF magic now. It was inevitable, I guess.

Douglas_Generous · June 10, 2022, 12:50am

Since you have MikroTiks, use policy-based routing and apply the policy and queueing after the routing decision point. Use an IP SLA to monitor the connections and decide when to start moving data over the backup link.

You will need someone more knowledgeable with MikroTiks to help get this set up, but since I do this with Cisco routers, I know it can be done.
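The IP SLA part can be sketched as a probe loop with hysteresis: fail over only after several consecutive lost probes, and fail back only after several consecutive successes, so a single dropped ping does not flap the route. A minimal Python sketch; the class name, thresholds, and "primary"/"backup" labels are hypothetical and do not correspond to any Cisco or MikroTik setting:

```python
from dataclasses import dataclass


@dataclass
class SlaTracker:
    """Hysteresis around probe results: N failures to fail over, M successes to fail back."""
    fail_threshold: int = 3
    recover_threshold: int = 5
    on_backup: bool = False
    streak: int = 0  # consecutive probes that argue for switching paths

    def observe(self, probe_ok: bool) -> str:
        # A probe that agrees with the current state resets the streak.
        wants_switch = probe_ok if self.on_backup else not probe_ok
        self.streak = self.streak + 1 if wants_switch else 0
        limit = self.recover_threshold if self.on_backup else self.fail_threshold
        if self.streak >= limit:
            self.on_backup = not self.on_backup
            self.streak = 0
        return "backup" if self.on_backup else "primary"


sla = SlaTracker()
probes = [True, False, False, False, True, True, True, True, True]
result = [sla.observe(p) for p in probes]
print(result)
```

The returned label is what a policy-based route would key on to pick a routing table or next hop; the two thresholds trade failover speed against the risk of flapping.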

Nicholas_Eastman · June 10, 2022, 2:31pm

Depending on your network, there are a couple of ways to set this up. So I have a few preliminary questions:

Are you already using dynamic routing (OSPF, BGP, or both) between these sites?
Do you care if traffic is not flowing across this unit unless it's a failover situation?

We have failover set up on our network over a couple of larger links like the one you described. Simplifying it down, we run both links in OSPF, with the 5 GHz link given a high cost (100 in our case). We then have a scheduled script that acts as our SLA and watches the OSPF neighbors. If the primary link (the 70 GHz link in your case) loses its neighbor, the script lowers the cost of the backup link. This does force OSPF to recalculate, but that is usually a quick drop since the backup adjacency was already established at the higher cost. Once the main link is back online, the same script sets the backup cost back to 100 so traffic returns to the 70 GHz radio.

After this setup, all you would need is Mangle/QoS rules to handle the traffic that goes over the 5 GHz link.

Here is the script we use:

# OSPF SLA Script for Mikrotiks

:local pri
# Primary link: set to the name of the interface in the router
:set pri "CHANGE ME"

:local back
# Backup link: set to the name of the interface in the router
:set back "CHANGE ME"

/routing ospf neighbor
:local found
:set found 0

:foreach n in=[find] do={
    :if ([get $n interface] = $pri) do={
        :if ([get $n state] = "Full") do={
            :set found 1
        } else={}
    } else={}
}

/routing ospf interface
:if ($found = 0) do={
    :if ([get [find interface=$back] cost] = 100) do={
        set [find interface=$back] cost=10
    } else={}
} else={
    :if ([get [find interface=$back] cost] = 10) do={
        set [find interface=$back] cost=100
    } else={}
}

We usually install this directly in a system scheduler entry with a reasonably low interval (5 seconds for us); adjust it to your case. Just remember that checking at really short intervals puts more load on the router, so I wouldn't go much lower than 5 seconds. All you have to do is replace the lines that say “CHANGE ME” with the interface names of your radios. The script looks for an OSPF neighbor in the Full state on the primary interface; if there isn't one, it lowers the backup interface's cost from 100 to 10, and once the primary neighbor is Full again it raises the backup cost back to 100. Side note: these links are set up as point-to-point for us. I cannot guarantee how well it will work with other OSPF interface types.

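Here is a terminal-friendly version you can paste in. It is added with disabled=yes, so enable the scheduler entry once you have filled in the interface names: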

/system scheduler
add disabled=yes interval=5s name=SLA on-event="# OSPF SLA Script for Mikrotiks\r\
    \n\r\
    \n:local pri\r\
    \n# Primary link: set to the name of the interface in the router\r\
    \n:set pri \"CHANGE ME\"\r\
    \n\r\
    \n:local back\r\
    \n# Backup link: set to the name of the interface in the router\r\
    \n:set back \"CHANGE ME\"\r\
    \n\r\
    \n/routing ospf neighbor\r\
    \n:local found\r\
    \n:set found 0\r\
    \n\r\
    \n:foreach n in=[find] do={\r\
    \n    :if ([get \$n interface] = \$pri) do={\r\
    \n        :if ([get \$n state] = \"Full\") do={\r\
    \n            :set found 1\r\
    \n        } else={}\r\
    \n    } else={}\r\
    \n}\r\
    \n\r\
    \n/routing ospf interface\r\
    \n:if (\$found = 0) do={\r\
    \n    :if ([get [find interface=\$back] cost] = 100) do={\r\
    \n        set [find interface=\$back] cost=10\r\
    \n    } else={}\r\
    \n} else={\r\
    \n    :if ([get [find interface=\$back] cost] = 10) do={\r\
    \n        set [find interface=\$back] cost=100\r\
    \n    } else={}\r\
    \n}\r\
    \n" policy=read,write start-time=startup
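The Mangle/QoS step mentioned in this post (handling the traffic that goes over the 5 GHz link) can be illustrated as strict-priority allocation: higher-priority classes are served first, and lower classes share whatever capacity is left. A minimal Python sketch, assuming hypothetical traffic classes and demands (in Mbps) on a 200 Mbps link:

```python
def allocate(capacity_mbps, demands):
    """Strict-priority allocation. demands is a list of (class_name, priority, mbps),
    where a lower priority number is served first."""
    remaining = capacity_mbps
    grants = {}
    for name, _prio, want in sorted(demands, key=lambda d: d[1]):
        grants[name] = min(want, remaining)
        remaining -= grants[name]
    return grants


demands = [
    ("bulk", 3, 250),   # e.g. downloads and updates
    ("voice", 1, 20),   # e.g. VoIP
    ("web", 2, 120),
]
print(allocate(200, demands))  # {'voice': 20, 'web': 120, 'bulk': 60}
```

Pure strict priority can starve the lowest class; hierarchical queues such as HTB, which MikroTik queue trees are based on, can also give each class a guaranteed minimum rate.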

TDJ211 · June 11, 2022, 5:19pm

Oh hell yeah! This is amazing! Yes, we just transitioned to an OSPF/BGP-based network. I have a firm grasp of it now, but I'm still pretty new to solving these types of problems.

I actually just started using ECMP to send traffic over both paths (we had a very heavy rain storm come through the other day) to keep traffic under 200M for now. I've also started entertaining the idea of OSPF Transit Fabric Traffic Engineering for unequal links (basically, create multiple VLANs and ECMP across them in proportion to each path's capacity). One caveat to this method is that my failover traffic has to go over an EoIP tunnel back to the core to maintain the iBGP peering (since it runs over another ISP). This results in a slightly lower IP MTU of 1458. I'm not exactly sure how much of an issue that poses, but it's something I feel I should avoid if possible. Not to mention, this failover was offered free of charge as a kind gesture from a fellow WISP, so I certainly don't want to abuse it.

Anyway, all of that is to say: this seems like a much better option. Thank you!! I'll let you know how it goes.
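The VLAN-based traffic engineering idea works because ECMP splits flows evenly across equal-cost next hops, so giving each physical path a number of equal-cost VLANs proportional to its capacity approximates a weighted split. A minimal Python sketch, assuming the 1G and 200M capacities from this thread and a hypothetical budget of 12 VLANs:

```python
from fractions import Fraction


def vlans_per_path(capacities_mbps, total_vlans):
    """Split total_vlans across paths in proportion to capacity (largest-remainder rounding)."""
    total_cap = sum(capacities_mbps.values())
    exact = {p: Fraction(c * total_vlans, total_cap) for p, c in capacities_mbps.items()}
    counts = {p: int(v) for p, v in exact.items()}  # floor of each exact share
    leftover = total_vlans - sum(counts.values())
    # Hand any remaining VLANs to the paths with the largest fractional remainders.
    for p in sorted(exact, key=lambda p: exact[p] - counts[p], reverse=True)[:leftover]:
        counts[p] += 1
    return counts


paths = {"path_1g": 1000, "path_200m": 200}
split = vlans_per_path(paths, 12)
print(split)  # {'path_1g': 10, 'path_200m': 2}
for path, n in split.items():
    print(path, f"{n / sum(split.values()):.0%} of flows")
```

ECMP typically hashes per flow rather than per byte, so the real traffic split only approximates these ratios.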

TDJ211 · June 11, 2022, 6:19pm

After looking over the script more closely, I realized I failed to mention that the 5 GHz failover is built into the Siklu radio, so the transition from 70 GHz to 5 GHz happens internally on the Siklu (sub-millisecond failover) and everything goes over the same OSPF interface. That said, this is a great starting point, and I believe there may be a way for the Siklu radio to signal to the MikroTik that it's on 5 GHz. I'll keep y'all posted.

Mau_Padil · June 16, 2022, 1:37am

You could use the QoE solution! A WISP with a similar scenario already used it during a failover, and QoE saved the day by keeping the traffic from overloading the PTP backup link.

TDJ211 · June 16, 2022, 1:55am

Are you referring to Cambium QoE? Or something like Preseem? I'm using Preseem, but they cannot apply QoE to backhauls, only to the APs themselves.

And the Force 400C does not have software available for QoS/QoE as far as I'm aware.

Mau_Padil · June 16, 2022, 2:22am

Our Cambium QoE solution gives you similar functionality to Preseem, but with the ability to accelerate TCP and do rate limiting and traffic shaping; the last of these could be the key to reducing traffic so it doesn't overload the PTP. This solution can sit in a box/appliance at your NOC, so you could control and enhance the backhaul as well as the last mile!
If I can help somehow, you can reach me at mauricio.padilla@cambiumnetworks.com

Nicholas_Eastman · June 16, 2022, 7:31pm

We actually have been running that setup for a couple of years now. There were some definite growing pains, but the BGP on OSPF routing has made it so much easier to manage traffic flows and the VLAN balancing is a lifesaver on dissimilar links.

As far as the EoIP failover link goes, the MTUs will affect you some, but if you're not using jumbo frames the effect will be minimal for most traffic. We actually use eBGP between sites and then just clean up the private ASNs in the core before advertising to our uplinks. It's definitely a task to implement (it's actually the latter half of the slide show you referenced), but it can be worth it in these situations. If that link is only between those two routers, and those routers talk to each other over it and no other router-to-router link, then you could use eBGP routing policies to keep your traffic off of it until there is a failure, at which point traffic will just drop to it. Otherwise, this SLA script has been a lifesaver in those situations. We even have a similar version set up on our Juniper core for when our Preseem server fails for whatever reason: as soon as Preseem is unable to pass the OSPF control traffic across the “link”, our traffic fails over to a direct connection around the unit.
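Keeping traffic off the EoIP link with routing policy usually means making its routes lose the BGP best-path comparison, for example by giving them a lower local preference, so they are used only once the preferred route is withdrawn. A minimal Python sketch of just the first two tie-breakers (highest local preference, then shortest AS path); the next-hop labels, local-preference values, and private ASNs are hypothetical, and real BGP best-path selection has many more steps:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    via: str
    local_pref: int
    as_path: tuple


def best_path(routes):
    # Highest local preference wins; a shorter AS path breaks ties.
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


primary = Route("siklu-backhaul", local_pref=200, as_path=(65001,))
backup = Route("eoip-tunnel", local_pref=100, as_path=(65002,))

print(best_path([primary, backup]).via)  # both routes present
print(best_path([backup]).via)           # primary withdrawn after a failure
```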

While looking for a reference for you, I noticed our friends at IP ArchiTechs have published a new blog series on using communities to influence traffic flow. There are 3 parts so far, and the others are listed in the pingbacks at the bottom of the first write-up.

As for the Siklu, I have not played with it, but I know they support VLAN configurations per Ethernet/wireless port, so you might be able to split the traffic up that way. I have only looked at it once and never spent much time working on that deployment.

TDJ211 · June 18, 2022, 8:09pm

Man, every bit of this post is gold, thank you!

As far as the EoIP failover link, the MTUs will affect you some but if you’re not using jumbo frames the effect will be minimal for most traffic

Of course, I just finished implementing jumbo frames across my entire network, so this is a very good tidbit to know! If I were to revert to standard frames for the EoIP, would it be enough to do it on that particular tower router, or would it have to be the entire path, including my core/edge routers?

At this particular site, I do have all VLANs going over the SFPs of a CCR1036, so I could easily break out the EoIP path on the router to an Ethernet port and switch ports with standard frames. But what about everything before and after it?

EDIT: And by problems, do you mean router performance? I have this EoIP tunnel between two CCR1036s, and I am able to saturate the 400M+ connection I do have. I'm just concerned whether the “problems” will be noticeable on the user end.

Nicholas_Eastman · June 23, 2022, 8:16pm

I’m not too informed on jumbo frames, so I can’t answer most of the questions you have regarding running standard frames on the EoIP link.

As far as performance is concerned, sending full-size (or larger) frames through the link causes fragmentation across the EoIP tunnel. Basically, the packet is too large, so the transmitting router has to break it apart and the receiving one has to reassemble it once all the fragments arrive. This costs the routers some performance (honestly, I'm not sure how much at the traffic level you are describing). The biggest hit comes when the traffic needs to be real-time (video conferencing, large VPN packets, etc.), where it can cause issues that customers may notice. You can force the MikroTiks not to fragment packets across the tunnel. According to my limited understanding (someone please correct me if I'm wrong, and RTFM and do your research as well), the router then drops the oversized packet and sends an ICMP “fragmentation needed” message back to the subscriber, which is how Path MTU Discovery works. The sub's equipment receives the notice and retransmits a smaller packet out to the internet. This might cause some slowness or interruptions, but they tend to happen all at once, and then it's back to business as usual once the program/device knows what size to send.
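To put numbers on the fragmentation described above: a full-size 1500-byte IPv4 packet does not fit the 1458-byte tunnel MTU mentioned earlier, so if fragmentation is allowed it is split in two. A minimal Python sketch of IPv4 fragmentation arithmetic, assuming a 20-byte IP header with no options (every fragment except the last must carry a multiple of 8 payload bytes):

```python
IP_HEADER = 20  # IPv4 header without options


def fragment_sizes(packet_len: int, mtu: int) -> list[int]:
    """Return the on-wire length of each IPv4 fragment of a packet_len-byte packet."""
    if packet_len <= mtu:
        return [packet_len]
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_payload = (mtu - IP_HEADER) // 8 * 8
    payload = packet_len - IP_HEADER
    sizes = []
    while payload > 0:
        chunk = min(max_payload, payload)
        sizes.append(IP_HEADER + chunk)
        payload -= chunk
    return sizes


print(fragment_sizes(1500, 1458))  # [1452, 68]: one full-size packet becomes two
print(fragment_sizes(1458, 1458))  # [1458]: sized to the tunnel, no fragmentation
```

Every full-size packet crossing the tunnel therefore turns into two packets to forward and one reassembly at the far end, which is where the extra router work comes from.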

The best example I can give you of this is gaming with excessive latency. With fragmentation enabled, the link can cause jitter as the connection jumps from great to medium/poor and back again, depending on how much fragmentation that one customer's connection sees. With fragmentation disabled, jitter drops once the connection has settled on a packet size that fits the link. I can play reasonably well with a consistent latency of 200 ms after the initial handshake, but if that latency is jumping around, it's more likely to cause issues.

Standard warnings apply: YMMV, RTFM, and I’m learning as I go as well.

TDJ211 · June 25, 2022, 6:23pm

Thanks for the reply.

I just had an epiphany regarding jumbo frames (please correct me if I'm wrong): if the sender is NOT sending jumbo frames (none of our CPEs are set for jumbos; it's just our towers and core), then having jumbo frames enabled isn't going to turn a normal frame into a jumbo frame anywhere in the network; it only matters if a jumbo frame happens to appear. And even if I set all the CPEs for jumbos, the customer would still have to set their own devices for jumbos to create a jumbo packet (which I highly doubt they would).

So if that's the case, then I think I'm good. I say that assuming we aren't aware of any issues running over the EoIP. Just trying to make sense of it.

TDJ211 · August 18, 2022, 7:36pm

Hey Nicholas,

In reference to the OSPF script you posted earlier, why are you using scripting to change the cost?

Why not leave everything as is and let OSPF handle failover the normal way? For example, when the lower-cost link is down, it routes over the higher-cost link, and when the lower-cost link comes back online, routing resumes over it. What is the advantage of scripting the cost change?

Nicholas_Eastman · August 18, 2022, 9:42pm

Sorry, I missed your post from June. To touch on it briefly: your assumptions about jumbo frames are correct, in that customer traffic won't automatically become jumbo just because you have them enabled. Enabling jumbo frames on your equipment lets you add extra headers to packets without causing fragmentation. MPLS is an excellent example of this: if the customer is already sending a max-sized standard frame and your router adds an MPLS label on top of it, you just created a jumbo frame. The same goes for the EoIP tunnel: its encapsulation adds more on top of a frame and could cause fragmentation unless jumbo frames are available on the link that carries the tunnel.
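The overhead arithmetic behind this can be written out: the underlying link needs an MTU at least as large as the customer's packet plus every header added on top of it. A minimal Python sketch, using the 4-byte size of one MPLS label and the 42-byte EoIP overhead (20-byte outer IPv4 header, 8-byte GRE header with key, 14-byte inner Ethernet header), which matches the 1458-byte tunnel MTU mentioned earlier in the thread:

```python
# Per-layer overhead in bytes (IPv4 without options).
MPLS_LABEL = 4
EOIP = 20 + 8 + 14  # outer IPv4 + GRE with key + inner Ethernet header


def required_mtu(customer_packet: int, *overheads: int) -> int:
    """Smallest link MTU that carries customer_packet without fragmentation."""
    return customer_packet + sum(overheads)


def usable_inner_mtu(link_mtu: int, *overheads: int) -> int:
    """Largest customer packet that fits a link_mtu link after encapsulation."""
    return link_mtu - sum(overheads)


print(usable_inner_mtu(1500, EOIP))      # 1458: standard frames under the tunnel
print(required_mtu(1500, EOIP))          # 1542: link MTU needed for full-size packets
print(required_mtu(1500, MPLS_LABEL))    # 1504: one MPLS label on a full-size packet
```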

As for your current question, we run the script to prevent routing loops on our equipment. Our BGP runs on top of OSPF: OSPF decides which link is best to send traffic on, and BGP decides the actual next hop. Assume we have this setup:

The dotted link between Site A and the Customer POP is slower than the main link and has a higher cost, to keep traffic off of it unless the main link fails. The dashed link from the Core to Site B is much slower than even the dotted link to Site A. If the main link between the Customer POP tower and Site A fails, the backup link is still running, but at a higher cost than the path through Site B. BGP says that the next hop is Site A, but OSPF says the best way to get to Site A is through Site B because that path has a lower cost. Site B then gets the packet, and since it prefers to send all traffic through the Customer POP in this example, it sends the traffic back to the Customer POP. Rinse and repeat. With our script in place, as soon as OSPF notices the main link is down, the script brings the backup link's cost down to match, OSPF immediately starts using it, and all traffic continues to flow through Site A to the internet.
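The cost problem described here can be reproduced with a shortest-path calculation. A minimal Python sketch of Dijkstra's algorithm (the SPF algorithm OSPF is built on) over a hypothetical version of this topology; the link costs are invented for illustration, the failed main link is left out, and the dotted backup link starts at cost 100:

```python
import heapq


def shortest_path(graph, src, dst):
    """Dijkstra over an undirected graph given as {(a, b): cost}; returns (cost, path)."""
    adj = {}
    for (a, b), cost in graph.items():
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in adj.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (dist + cost, nxt, path + [nxt]))
    return None


# Hypothetical costs; the main POP-SiteA link has failed and is absent.
links = {
    ("POP", "SiteB"): 10,
    ("SiteB", "Core"): 50,
    ("Core", "SiteA"): 10,
    ("POP", "SiteA"): 100,  # dotted backup link at its normal high cost
}
print(shortest_path(links, "POP", "SiteA"))  # (70, ['POP', 'SiteB', 'Core', 'SiteA'])

links[("POP", "SiteA")] = 10  # what the SLA script does once the main link is down
print(shortest_path(links, "POP", "SiteA"))  # (10, ['POP', 'SiteA'])
```

With the backup at cost 100, the POP's best path to Site A runs through Site B, so BGP traffic whose next hop is Site A is handed to Site B first; lowering the backup cost makes the direct link win.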

Coincidentally, this setup has a similar effect when both links to Site A drop. BGP runs multihop to reach the other router, so if the links drop, the Customer POP can still reach Site A through the ring and sends traffic to Site B. Site B can't see that the links to Site A dropped and sends the traffic back to the Customer POP, as it is its natural next hop. To prevent this, we have another script running on the routers that disables BGP peers if all OSPF links to that peer go down.
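The rule behind that second script can be stated as: a BGP peer stays enabled only while at least one OSPF adjacency toward that peer's router is Full. A minimal Python sketch of that decision only; the data shapes, peer names, and interface names are hypothetical, and this is not the script from the post, which isn't included in the thread:

```python
def peers_to_disable(peer_links, ospf_state):
    """peer_links maps a BGP peer to the interfaces that reach it;
    ospf_state maps an interface to its OSPF neighbor state."""
    return sorted(
        peer
        for peer, ifaces in peer_links.items()
        if not any(ospf_state.get(i) == "Full" for i in ifaces)
    )


peer_links = {"SiteA": ["main-link", "backup-link"], "SiteB": ["ring-b"]}
ospf_state = {"main-link": "Down", "backup-link": "Down", "ring-b": "Full"}
print(peers_to_disable(peer_links, ospf_state))  # ['SiteA']
```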

Douglas_Generous · August 27, 2022, 10:14pm

You should not be using a script to change the OSPF cost on a link. Just set the lower-priority link's cost high enough to make it secondary, and then use bandwidth metrics to tell OSPF what kind of link is attached. The expected behavior is that OSPF will use the lower-cost, higher-bandwidth link 90% of the time, but can/will use the older link for low-priority data if it is both the shortest path and load balancing is enabled (it is by default).
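On platforms that derive OSPF cost from bandwidth (such as Cisco's auto-cost), the cost is the reference bandwidth divided by the interface bandwidth, truncated and never below 1. A minimal Python sketch, assuming a 10 Gbps reference bandwidth; Cisco's default reference is 100 Mbps, which gives every link of 100 Mbps or faster the same cost of 1. Note that standard OSPF splits traffic only across paths whose total costs are equal:

```python
def ospf_cost(bandwidth_mbps: float, reference_mbps: float = 10_000) -> int:
    """Bandwidth-derived OSPF cost: reference / bandwidth, truncated, minimum 1."""
    return max(1, int(reference_mbps // bandwidth_mbps))


for name, mbps in [("70 GHz primary", 1000), ("5 GHz backup", 200)]:
    print(name, ospf_cost(mbps))  # 10 and 50

# With a 100 Mbps reference, both links collapse to cost 1.
print(ospf_cost(1000, reference_mbps=100), ospf_cost(200, reference_mbps=100))
```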

My reasoning is that changing the cost on one port forces the entire OSPF path calculation to run, which triggers upstream routers to do the same, and this will drop traffic on each router that is doing these recalculations.

Nicholas_Eastman · September 1, 2022, 1:53pm

I did not realize the path tree for other routers would also recalculate. What would you recommend for running OSPF with BGP while preventing routing loops? The script is in place only because we have issues with looping when the OSPF cost is higher but the next hop is still that adjacent router.

Douglas_Generous · September 1, 2022, 2:51pm

Sorry, to grasp the full scope of your question I needed to go into a somewhat heavier explanation, hence the long-winded post.

From a network design point of view, you are overcomplicating the desired functions at the wrong layer.

Routing is like a layer cake: each layer must be made and in place before you stack the next. So your physical network connections must be in place before spanning tree sorts out the loops and determines the offline backup path. If this were an L2 network you'd be done, but since you are routing, you can make those backup links work at the same time by using dedicated VLANs for the PTP links, which allows spanning-tree separation so multiple paths can coexist if you are not using dedicated physical ports for the PTP links. Your routing protocol determines the best path based on the configured metrics; this does not mean only one path may be used, as it also monitors bandwidth used and will offload to a secondary (higher-cost) path.

Let spanning tree handle loops with rapid per-VLAN spanning tree.
Use OSPF as it is intended and set one path's cost higher to send less data to it. Use bandwidth metrics to limit what OSPF will use the link for.

Don't mess with OSPF manually. If a link goes down, OSPF will monitor the path state, determine whether the path is available, and if it is not, drop the path on that pair of routers only, with no recalculation unless all paths between the affected locations are down. (I realize this skips a lot in the function description, but it's for simplicity.)

OSPF creates a connectivity map and route paths; change one and the whole thing recalculates.

iBGP should have a pair of routers acting as route reflectors. iBGP should only hold endpoints that need to connect to each other, like core router to core router, egress gateways for transit data, and core to tower routers. Not tower router to tower router.
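The case for route reflectors is easiest to see in session counts: a full iBGP mesh needs a session between every pair of routers, while with two route reflectors each client only peers with both reflectors. A minimal Python sketch, assuming a hypothetical network of n iBGP speakers where two of them act as reflectors and also peer with each other:

```python
def full_mesh_sessions(n: int) -> int:
    # Every pair of iBGP speakers needs its own session.
    return n * (n - 1) // 2


def two_rr_sessions(n: int) -> int:
    # n - 2 clients each peer with both reflectors, plus one reflector-to-reflector session.
    return 2 * (n - 2) + 1


for n in (5, 10, 30):
    print(n, full_mesh_sessions(n), two_rr_sessions(n))
# 5 10 7
# 10 45 17
# 30 435 57
```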

To get offline backup links to work you need spanning tree working, but to use both links in live failover (the ideal setup), a dedicated interface on the routers for each link is needed. If you only have 2 interfaces, then use sub-interfaces and a VLAN-aware switch; just remember to keep your interface oversubscription rate reasonable. This means that if you have a 60 GHz link at 1 Gbps and a backup link that's only 300 Mbps, you can share that interface, but you will not be able to realize the full capacity of both links at the same time. That said, you can spread your PTP sub-interfaces across several interfaces and use both links at the same time to capacity.
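The oversubscription point can be checked with simple arithmetic: add up the capacity of the PTP links carried on each router interface and compare it with the interface speed. A minimal Python sketch, using the 1 Gbps and 300 Mbps link figures from the paragraph above and assuming 1 Gbps router ports:

```python
def oversubscription(port_speed_mbps, link_capacities_mbps):
    """Ratio of total PTP link capacity carried on a port to the port's own speed."""
    return sum(link_capacities_mbps) / port_speed_mbps


# Both PTP links as sub-interfaces (VLANs) on one 1 Gbps port.
shared = oversubscription(1000, [1000, 300])
# Each PTP link on its own 1 Gbps port.
spread = [oversubscription(1000, [1000]), oversubscription(1000, [300])]

print(f"shared port: {shared:.1f}:1")                     # 1.3:1
print("separate ports:", [f"{r:.1f}:1" for r in spread])  # ['1.0:1', '0.3:1']
```

Anything above 1:1 means both links cannot run at full capacity at the same time through that port, which is the situation the paragraph warns about.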

system · September 1, 2023, 2:51pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.