Core Network Meltdown Since Installing 4.7.4-R19

We are having massive network-wide issues since upgrading our cnPilot routers to 4.7.4-R19.

SYMPTOMS: Devices don’t get DHCP addresses, cnPilots vanish from the cloud, config changes can’t be made, software updates/downgrades can’t be made, and eventually routers crash.

The core issue seems to stem from a massive crit_techsup file generated in /etc/cambium.

This issue prevents any firmware updates/downgrades from happening. It prevents any config changes from being made. And it eventually crashes the router and requires a truck roll to fix.

We have been talking to Cambium support for over 3 weeks and get only “This is a critical issue internally”, yet while we’re having a literal core meltdown there isn’t so much as a workaround or solution provided yet.

I e-mailed our RTM and talked to him on the phone on Thursday, and have not heard anything further from him.

Where can I take this further up the chain in Cambium, and is anyone else seeing this issue?

What version did you upgrade from?

Only issue I’ve found is with routing out “Other” interfaces. Nothing else.

Multiple versions… really can’t tell you.

Are you running version 4.7.4-R19 ?

If so, can you log into one of your routers and run the commands:

df

and then:

ls -l /etc/cambium

?

Apparently between certain versions, a factory reset was needed. Whether it merely needed a fresh config (thereby omitting erroneous parameters), or it actually changed something more permanently, I cannot confirm. That being said, every time I get a new router I upgrade the firmware and factory reset the config, then load a config template. I can’t recall the version past which this procedure was needed (check the release notes).

Yes I am running 4.7.4-R19. I’ll check the output of those commands when I get a gap. What is your output?

Some questions:

  1. Over WiFi or cable?
  2. Which model of the R series?
  3. What SM (ePMP / PMP450)?
  4. What is the layout of the client install? SM → R-Series (WAN) → customer (LAN / WiFi) ?
  5. Are there any other devices or switches used inline? Or that you install on the LAN?
  6. Have there been any lightning strikes or storms?
  7. Do you use VLANs on the WAN, and if so what interface types? Trunk or implicit VLAN?
  8. Are you querying the devices via SNMP?

Some of the above answers may give me an idea of the issue.

Over WiFi or cable?

Doesn’t matter

Which model of the R series?

190, and 195

What SM (ePMP / PMP450)?

Doesn’t matter. Ubiquiti, PMP450, or BaiCells

What is the layout of the client install? SM → R-Series (WAN) → customer (LAN / WiFi) ?

SM → R-Series WAN → Customer (LAN/WiFi) yes

Are there any other devices or switches used inline? Or that you install on the LAN?

No other devices

Have there been any lightning strikes or storms?

No this is across the entire network 500+ devices

Do you use VLANs on the WAN, and if so what interface types? Trunk or implicit VLAN?

No VLANS

Are you querying the devices via SNMP?

We don’t query cnPilots via SNMP

The issue is that the cnPilot has run out of storage space.

# touch blah
touch: blah: No space left on device
# df
Filesystem           1k-blocks      Used Available Use% Mounted on
rootfs                   10496     10496         0 100% /
/dev/root                10496     10496         0 100% /
/dev/mtdblock13            384       340        44  89% /etc/cambium
# ls -l
-rw-r--r--    1 0        0             121 keystore
-rw-r--r--    1 0        0               4 DA_VERSION
-rw-r--r--    1 0        0           33850 bkup-config.txt
-rw-r--r--    1 0        0          451733 crit_techsup
-rw-r--r--    1 0        0            1400 udhcpd.leases
-rw-r--r--    1 0        0              19 account_id
# rm crit_techsup
# df
Filesystem           1k-blocks      Used Available Use% Mounted on
rootfs                   10496     10496         0 100% /
/dev/root                10496     10496         0 100% /
/dev/mtdblock13            384       220       164  57% /etc/cambium
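
For anyone who wants to check this across more than a handful of routers, here is a rough Python sketch that parses `df` output captured from a router and pulls out the usage of the /etc/cambium partition. It only assumes the column layout shown in the output above. How you collect the text from each router (SSH, a saved session log, and so on) is up to you, and the function names are made up for illustration.

```python
def parse_df(text):
    """Parse busybox-style `df` output into a list of dicts, one per filesystem row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly 6 fields with a numeric block count.
        # The header splits into 7 fields ("Mounted on"), so it is skipped,
        # as are shell prompts and error messages mixed into the capture.
        if len(parts) != 6 or not parts[1].isdigit():
            continue
        fs, blocks, used, avail, pct, mount = parts
        rows.append({
            "filesystem": fs,
            "blocks_k": int(blocks),
            "used_k": int(used),
            "available_k": int(avail),
            "use_pct": int(pct.rstrip("%")),
            "mount": mount,
        })
    return rows


def mount_usage(text, mount="/etc/cambium"):
    """Return the usage row for one mount point, or None if it is not listed."""
    for row in parse_df(text):
        if row["mount"] == mount:
            return row
    return None


# Sample copied from the output above (before crit_techsup was removed).
SAMPLE_BEFORE = """\
# df
Filesystem           1k-blocks      Used Available Use% Mounted on
rootfs                   10496     10496         0 100% /
/dev/root                10496     10496         0 100% /
/dev/mtdblock13            384       340        44  89% /etc/cambium
"""

if __name__ == "__main__":
    row = mount_usage(SAMPLE_BEFORE)
    print(row["mount"], row["use_pct"], row["available_k"])
```

Since / is read-only and always shows 100%, the only row worth alerting on here is /etc/cambium.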

rootfs and /dev/root are read only. “touch: blah: Read-only file system”. My output below.

# df
Filesystem           1k-blocks      Used Available Use% Mounted on
rootfs                   10496     10496         0 100% /
/dev/root                10496     10496         0 100% /
/dev/mtdblock13            384       260       124  68% /etc/cambium

# touch /etc/cambium/blah
# ls -l /etc/cambium
-rw-r--r--    1 0        0             123 keystore
-rw-r--r--    1 0        0               4 DA_VERSION
-rw-r--r--    1 0        0           33840 bkup-config.txt
-rw-r--r--    1 0        0            3290 crit_techsup
-rw-r--r--    1 0        0            1080 udhcpd.leases
-rw-r--r--    1 0        0              21 account_id
-rw-r--r--    1 0        0               0 blah
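
To compare listings like the two above without eyeballing them, a small Python sketch along these lines can parse the short `ls -l` format and flag any file over a size limit. The six-column layout (no date columns) is taken from the output in this thread, the 100000-byte limit is an arbitrary number picked for illustration, and file names containing spaces are not handled.

```python
def parse_ls(text):
    """Map file name -> size in bytes from the short `ls -l` format shown in this thread."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected fields: mode, links, uid, gid, size, name.
        if len(parts) != 6 or not parts[0].startswith("-") or not parts[4].isdigit():
            continue
        sizes[parts[5]] = int(parts[4])
    return sizes


def oversized(sizes, limit_bytes=100_000):
    """Return the names of files larger than limit_bytes, sorted alphabetically."""
    return sorted(name for name, size in sizes.items() if size > limit_bytes)


# Listing from the router that ran out of space.
FULL_ROUTER = """\
-rw-r--r--    1 0        0             121 keystore
-rw-r--r--    1 0        0               4 DA_VERSION
-rw-r--r--    1 0        0           33850 bkup-config.txt
-rw-r--r--    1 0        0          451733 crit_techsup
-rw-r--r--    1 0        0            1400 udhcpd.leases
-rw-r--r--    1 0        0              19 account_id
"""

# Listing from the router that still had free space.
HEALTHY_ROUTER = """\
-rw-r--r--    1 0        0             123 keystore
-rw-r--r--    1 0        0               4 DA_VERSION
-rw-r--r--    1 0        0           33840 bkup-config.txt
-rw-r--r--    1 0        0            3290 crit_techsup
-rw-r--r--    1 0        0            1080 udhcpd.leases
-rw-r--r--    1 0        0              21 account_id
-rw-r--r--    1 0        0               0 blah
"""

if __name__ == "__main__":
    print(oversized(parse_ls(FULL_ROUTER)))
    print(oversized(parse_ls(HEALTHY_ROUTER)))
```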

I would check that your MTUs are correct (at least 1518) and that you don’t have any fragmentation through your network. If there is, then set the desired MTU in the WAN interface settings (accounting for TCP overhead and the protocol over your network). I take it you don’t have a management network? I only use VLANs, but at the very least you should have a management network on a VLAN.

I’m facing a similar issue: customers “stop getting internet for unexplainable reasons”, and the typical resolution is to reboot the router (R190V/W & 195W), after which everything works again.

However, this morning I got into the logs and found:

<Mon Oct 24 19:36:23 2022> udhcpd[17608]: Sending OFFER of 192.168.11.199
<Mon Oct 24 19:36:23 2022> udhcpd[17608]: Sending ACK to 192.168.11.199
<Mon Oct 24 19:36:23 2022> udhcpd[17608]: can't open '/etc/cambium/udhcpd.leases': No space left on device

This shows up multiple times, so it looks like the system fails to write some files right before assigning the IP.

I will submit a support ticket tomorrow, but I would really appreciate it if someone from Cambium could jump in here to explain what is going on.

I have 250+ units deployed, and this is the most common problem I have (unexplainable loss of connection). Everything looks great from cnMaestro, all the way from the AP to the router, and I’m even able to ping/traceroute in cnMaestro from the router to any website.
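
If you have saved logs from several routers, a quick way to see how often this happens is to scan them for the "No space left on device" message. Here is a rough Python sketch. The regular expression is built only from the log line format pasted above, so treat it as an assumption that other firmware versions log in the same shape, and the function names are just for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like:
# <Mon Oct 24 19:36:23 2022> udhcpd[17608]: can't open '/some/path': No space left on device
LINE_RE = re.compile(
    r"^<(?P<ts>[^>]+)>\s+(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]:\s+"
    r"can't open '(?P<path>[^']+)': No space left on device\s*$"
)


def enospc_events(log_text):
    """Return (timestamp, process, path) for each out-of-space write failure."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((match.group("ts"), match.group("proc"), match.group("path")))
    return events


def count_by_path(log_text):
    """Count out-of-space failures per file path."""
    return Counter(path for _, _, path in enospc_events(log_text))


# Sample copied from the log excerpt above.
SAMPLE_LOG = """\
<Mon Oct 24 19:36:23 2022> udhcpd[17608]: Sending OFFER of 192.168.11.199
<Mon Oct 24 19:36:23 2022> udhcpd[17608]: Sending ACK to 192.168.11.199
<Mon Oct 24 19:36:23 2022> udhcpd[17608]: can't open '/etc/cambium/udhcpd.leases': No space left on device
"""

if __name__ == "__main__":
    for path, count in count_by_path(SAMPLE_LOG).items():
        print(path, count)
```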

@Ignacio_Ocampo_Milla
This issue is fixed in the latest release 4.8-R15. Please upgrade to this version.
It’s already available on the support site https://support.cambiumnetworks.com/files/cnpilot_r195p/
It will be available in cnMaestro shortly.

Is it compatible with the cnPilot 190V too? I read the release notes https://support.cambiumnetworks.com/file/b0c5d686dc090b8bf793f2543cb914ade7e975a5, and I don’t see anything related to the disk space.

Can you please elaborate more? I would like to understand the root cause and how it was fixed?

Yes, it is compatible with the 190V as well.

I was told it was fixed in 4.7.5-R2… was it not?

We ran into this issue when upgrading to 4.7.5-R2 from 4.7.2R-10. So I would say no, it was not fixed in 4.7.5-R2. To fix the issue we had a beta of 4.8R-X.

We are still having the issue with 4.7.5-R2… not on all routers but on some… so yeah, I don’t think it’s fully resolved.

We are still having the issue on 4.8. This issue has now been going on for 3 years. Someone at Cambium needs to take control of this issue and get it fixed.

@RFWaveRider Questions… Do you have DNS proxy enabled?

@iBound On the 450, no, we disabled that after we found a bug in it. On the cnPilot I would have to look.

Probably. I believe the cnPilot hands itself out as the DNS server for LAN devices.

On the R-Series, there is a feature called DNS proxy on the LAN. Originally I found that if there is interference, it hangs and stops providing DNS, and only a reboot fixes this. I have it turned off by default, so I can’t say for sure that the problem still exists on newer firmware versions. Printer direct WiFi and Roku media players often purposefully (or unintelligently) select their channels.

A good test would be to see if an IP is still ping-able…
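
To make that test concrete, here is a rough Python sketch to run from a machine on the customer LAN. It swaps ping for a TCP connect to an IP literal (ICMP needs raw sockets, which usually means root), and the target IP, port, and hostname are whatever you choose to pass in. The idea is simply that if a bare IP is reachable but a name lookup fails, DNS is the likely culprit rather than the link.

```python
import socket


def can_reach_ip(ip, port=53, timeout=3.0):
    """Try a TCP connect to an IP literal, so no DNS lookup is involved."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(hostname):
    """Try a name lookup through whatever resolver the client was handed."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def diagnose(ip_ok, dns_ok):
    """Turn the two checks into a rough verdict."""
    if ip_ok and dns_ok:
        return "ok"
    if ip_ok:
        # Packets flow but names do not resolve: look at DNS / DNS proxy.
        return "dns"
    if dns_ok:
        # Names resolve (possibly from a cache) but the IP test failed.
        return "ip-unreachable"
    return "no-connectivity"


if __name__ == "__main__":
    import sys
    # Usage: python check.py <ip> <hostname>
    ip, name = sys.argv[1], sys.argv[2]
    print(diagnose(can_reach_ip(ip), can_resolve(name)))
```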

@iBound I’ll look. But I don’t think this is the current issue.

The current issue is that if the modem and the wireless router come up at the same time, say from a power outage, the modem doesn’t yet have internet, so the cnPilot doesn’t get an IP address. The cnPilot just never gets an IP address and hangs with an empty DHCP on the WAN interface until you hard reboot it.

I’m assuming this is IPv4? IPv6 was / is broken (haven’t tested in a while). When the session drops, IPv6 stops functioning. Thinking about it, it could be a DHCP issue over IPv4/6 in general.

Check LAN/Advanced and set the MTU. Maybe play with that. There’s also MTU setting for the WAN interfaces. To be honest, I haven’t tested the WAN MTU thing. Test pings with allow fragmentation off.
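
For the ping test, the payload size you pass to ping is the MTU minus 28 bytes (20 for the IPv4 header without options, 8 for the ICMP header), so a 1500-byte MTU is tested with a 1472-byte payload. Here is a rough Python sketch of that arithmetic plus a binary search for the largest MTU that passes. The `probe` function is a placeholder you would supply yourself, for example by wrapping a ping with the don't-fragment option set.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, echo request header


def icmp_payload_for_mtu(mtu):
    """Payload size that makes an ICMP echo packet exactly `mtu` bytes long."""
    if mtu < IPV4_HEADER + ICMP_HEADER:
        raise ValueError("MTU is smaller than the IPv4 and ICMP headers")
    return mtu - IPV4_HEADER - ICMP_HEADER


def find_path_mtu(probe, low=576, high=1500):
    """Binary search for the largest MTU in [low, high] that passes.

    `probe(payload_size)` must return True when a don't-fragment ping with
    that payload gets through. Returns None if even `low` fails.
    """
    if not probe(icmp_payload_for_mtu(low)):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(icmp_payload_for_mtu(mid)):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Stand-in probe that simulates a path limited to 1492 bytes.
    def fake_probe(payload_size):
        return payload_size + IPV4_HEADER + ICMP_HEADER <= 1492

    print(icmp_payload_for_mtu(1500))
    print(find_path_mtu(fake_probe))
```

On Linux with iputils ping, a single probe looks like `ping -M do -s 1472 <host>`, and on Windows `ping -f -l 1472 <host>`.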

What is the latency from the DHCP server to the router?