Effects of NAT on Speed

Alright we’ll take a chance on the forums =) Moto’s call center doesn’t seem to want to answer our question. Here’s the ‘question’ posed to Moto originally…
-----------------------------------------------------------------------------------
“Hello -

Recently we have experienced several occurrences wherein a customer (SM) is experiencing horrible throughput - 100K and less on a connection that allows for 3000K. When we DISABLE NAT on the SM, the speeds are the normal 3000K, as we would expect. When we ENABLE NAT, their speeds drop down to below 70K, sometimes as low as 16K.

Since many times when we call with questions Moto likes to say that ‘Motorola is unaware of this problem’ - We are reporting the occurrence of this problem to you - so Moto is no longer ‘unaware’.

We would like to know what difference having NAT on or OFF can make as far as throughput.

Canopy 7.3.6”

NAT takes processing power but wow…

Make sure I have this right:
SM in bridge mode: PC pulls an IP directly from your DHCP server and it works fine

SM in NAT mode: SM pulls an IP from your DHCP server, PC pulls an IP from the SM and speeds are slow…

Have you tried disconnecting the customer’s computer and using your laptop to test? i.e. does the problem occur regardless of the computer used?

"Make sure I have this right:
SM in bridge mode: PC pulls an IP directly from your DHCP server and it works fine"


Yes - whether PC or customer’s router

"SM in NAT mode: SM pulls an IP from your DHCP server, PC pulls an IP from the SM and speeds are slow…"

Yes - PC or Router - So whether the PC is connected through their router, or the PC is connected directly, we see this difference. We’ve seen three cases so far.

"Have you tried disconnecting the customer’s computer and using your laptop to test? i.e. does the problem occur regardless of the computer used?"

No - we are not at the location. The PC and router are not primarily suspected, mainly due to the fact that they seem to be quite capable of moving the data on the connection at 3+Mbps when NAT is off on the SM. This is of course with either SM<->PC, or SM<->Router<->PC. It’s when NAT is enabled that we have the crippling slowdown.

Feels like the NAT on the SM is most suspect-able (to coin a new term).

What ideas do you have as to how the relationship between the PC and the NAT on the SM could be interacting that might cause this?

Steps I would take from here:

Log into your core and/or edge routers and clear arp and clear arp-cache

Have the user disable the firewall(s).

Reset the SM to factory defaults and reconfigure.

Reload the software (back to 7.2.9, then up to 7.3.6).

After that replace the SM.

"Log into your core and/or edge routers and clear arp and clear arp-cache
Have the user disable the firewall(s).
Reset the SM to factory defaults and reconfigure.
Reload the software (back to 7.2.9, then up to 7.3.6).
After that replace the SM."


While I explain this to the SysAdmin, can I tell him the theory behind how/why ARP cache might cause this? We tend to try to find the actual source of the problem, rather than trying to work around it. For instance, if the cache for 3900 other connections is working fine, and we need to keep resetting border routers to fix 1 of 3900 connections, this might pose an issue. What do you imagine is happening in the ARP cache that would cause this? I could see how bad ARP cache info might cause loss of routing, but I do not understand how severe and chronic bandwidth restriction could result. As far as we can tell so far, the source of the problem seems to be within the SM only during NAT operation.

If clearing the ARP tables fixes the issue, then you know it’s a bridge table issue, and that gives you a direction to look in.

Hey!

When the customer(s) experience the slowdown, do you look at the NAT Table menu? I have seen customers with badness (viruses, generally) on a computer, and, over a relatively short period, all of the available NAT slots get filled up.

There are basically 2048 potential ports a NAT-ed SM can open, and each has a timer associated with the type of protocol passed through it: once no information has passed over a port for a certain period of time, it becomes available for new data.

An infected machine tends to open up all the available data ports, and keep using them, eventually choking off the end-user connection. Of course, when you take it out of NAT mode, the port limitation of the SM is no longer an issue. Also, if you (hypothetically) reboot the NAT-ed SM, it works OK for a while, until the ports available for ‘good’ traffic start to dwindle.

Just something to be aware of . . .
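
For anyone who wants to see the mechanism rather than take my word for it, here is a rough Python sketch of a fixed-size NAT table with per-protocol idle timers. It is only a toy model, not how the SM firmware actually works: the 2048 figure is the one quoted above, while the timeout values, the NatTable class and its method names are all made up for illustration.

```python
# Toy model of a fixed-size NAT table with per-protocol idle timers.
# Assumptions: NAT_SLOTS is the 2048 figure quoted in this thread; the
# timeout values and every name here are hypothetical, not Canopy's.

NAT_SLOTS = 2048
IDLE_TIMEOUT = {"tcp": 120.0, "udp": 30.0}  # seconds, made-up values


class NatTable:
    def __init__(self, slots=NAT_SLOTS, timeouts=None):
        self.slots = slots
        self.timeouts = timeouts or IDLE_TIMEOUT
        self.entries = {}  # (protocol, flow_id) -> time of last packet

    def expire(self, now):
        """Free every slot that has been idle for its protocol's timeout."""
        stale = [key for key, last_seen in self.entries.items()
                 if now - last_seen >= self.timeouts[key[0]]]
        for key in stale:
            del self.entries[key]

    def touch(self, protocol, flow_id, now):
        """Record a packet for a flow. Returns False if the table is full."""
        self.expire(now)
        key = (protocol, flow_id)
        if key in self.entries or len(self.entries) < self.slots:
            self.entries[key] = now
            return True
        return False


if __name__ == "__main__":
    table = NatTable()
    # An infected PC opens every slot and keeps refreshing them.
    for t in (0.0, 20.0):
        for flow in range(NAT_SLOTS):
            table.touch("udp", flow, t)
    # A legitimate new connection has nowhere to go while the junk is active.
    print("good flow while flooded:", table.touch("tcp", "web", 25.0))
    # Once the junk flows go quiet past their timeout, slots free up again.
    print("good flow after timeout:", table.touch("tcp", "web", 60.0))
```

The point of the model is that a full table refuses new flows while existing ones keep working, which is why the customer sees a choked connection rather than a dead one.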

newcastle -

thanks for the suggestion - no this is not a full NAT table issue - we check things like that and also we normally watch a tcpdump of the connection while troubleshooting - this is a nice quiet clean connection

What type of router and what subnet are you using for the NAT range at the SM?

I’ve run into random issues with Linksys, Netgear and D-Link and the 192.168.x.x subnets on the WAN side, so if I absolutely have to do double-NAT, I use a subnet in the 10.x.x.x range.
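
One thing worth ruling out with double-NAT is the router's WAN side (the SM's NAT range) landing in the same subnet as the router's own LAN, since many consumer routers default to a 192.168.x.x LAN. I am not claiming that is what is happening here; it is just a quick check. A minimal sketch using Python's standard ipaddress module, where the subnets_overlap helper and the example addresses are hypothetical:

```python
import ipaddress


def subnets_overlap(sm_nat_cidr, router_lan_cidr):
    """Return True if the SM's NAT range and the router's LAN range overlap.

    Both arguments are CIDR strings. strict=False lets you pass a host
    address with a mask (e.g. "192.168.1.1/24") instead of the network
    address.
    """
    sm_nat = ipaddress.ip_network(sm_nat_cidr, strict=False)
    router_lan = ipaddress.ip_network(router_lan_cidr, strict=False)
    return sm_nat.overlaps(router_lan)


if __name__ == "__main__":
    # Example addresses are made up for illustration.
    print(subnets_overlap("192.168.1.1/24", "192.168.1.0/24"))
    print(subnets_overlap("10.10.5.1/24", "192.168.1.0/24"))
```

If the check comes back True for a customer, moving the SM's NAT range into 10.x.x.x as described above takes that variable off the table.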