Core Network Meltdown Since Installing 4.7.4-R19

We have been having massive network-wide issues since upgrading our cnPilot routers to 4.7.4-R19.

SYMPTOMS: Devices don’t get DHCP addresses, cnPilots vanish from the cloud, config changes can’t be made, software upgrades/downgrades can’t be made, and eventually the routers crash.

The core issue seems to stem from a massive crit_techsup file generated in /etc/cambium.

This issue prevents any firmware updates/downgrades from happening. It prevents any config changes from being made. And it eventually crashes the router and requires a truck roll to fix.

We have been talking to Cambium support for over 3 weeks and get only “This is a critical issue internally”. Yet while we’re having a literal core meltdown, not so much as a workaround or solution has been provided.

I e-mailed our RTM and spoke to him on the phone on Thursday, and have not heard anything further from him.

Where can I take this further up the chain in Cambium, and is anyone else seeing this issue?

What version did you upgrade from?

Only issue I’ve found is with routing out “Other” interfaces. Nothing else.

Multiple versions… really can’t tell you.

Are you running version 4.7.4-R19?

If so, can you log into one of your routers and run the commands:

df

and then:

ls -l /etc/cambium

?

Apparently, between certain versions, a factory reset was needed. Whether it merely needed a fresh config (thereby omitting erroneous parameters) or it actually changed something more permanently, I cannot confirm. That being said, every time I get a new router I upgrade the firmware and factory reset the config, then load a config template. I can’t recall the version past which this procedure was needed (check the release notes).

Yes I am running 4.7.4-R19. I’ll check the output of those commands when I get a gap. What is your output?

Some questions:

  1. Over WiFi or cable?
  2. Which model of the R series?
  3. What SM (ePMP / PMP450)?
  4. What is the layout of the client install? SM → R-Series (WAN) → customer (LAN / WiFi) ?
  5. Are there any other devices or switches used inline? Or that you install on the LAN?
  6. Have there been any lightning strikes or storms?
  7. Do you use VLANs on the WAN, and if so what interface types? Trunk or implicit VLAN?
  8. Are you querying the devices via SNMP?

Some of the above answers may give me an idea of the issue.

Over WiFi or cable?

Doesn’t matter

Which model of the R series?

190, and 195

What SM (ePMP / PMP450)?

Doesn’t matter. Ubiquiti, PMP450, or BaiCells

What is the layout of the client install? SM → R-Series (WAN) → customer (LAN / WiFi) ?

SM → R-Series WAN → Customer (LAN/WiFi) yes

Are there any other devices or switches used inline? Or that you install on the LAN?

No other devices

Have there been any lightning strikes or storms?

No, this is across the entire network, 500+ devices

Do you use VLANs on the WAN, and if so what interface types? Trunk or implicit VLAN?

No VLANs

Are you querying the devices via SNMP?

We don’t query cnPilots via SNMP

The issue is that the cnPilot has run out of storage space.

# touch blah
touch: blah: No space left on device
# df
Filesystem           1k-blocks      Used Available Use% Mounted on
rootfs                   10496     10496         0 100% /
/dev/root                10496     10496         0 100% /
/dev/mtdblock13            384       340        44  89% /etc/cambium
# ls -l
-rw-r--r--    1 0        0             121 keystore
-rw-r--r--    1 0        0               4 DA_VERSION
-rw-r--r--    1 0        0           33850 bkup-config.txt
-rw-r--r--    1 0        0          451733 crit_techsup
-rw-r--r--    1 0        0            1400 udhcpd.leases
-rw-r--r--    1 0        0              19 account_id
# rm crit_techsup
# df
Filesystem           1k-blocks      Used Available Use% Mounted on
rootfs                   10496     10496         0 100% /
/dev/root                10496     10496         0 100% /
/dev/mtdblock13            384       220       164  57% /etc/cambium
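
For anyone wanting to check this across a lot of routers, here is a rough sketch of how the df output above could be screened for a nearly full /etc/cambium partition. The function names and the 85% threshold are my own assumptions, not anything from Cambium, and it assumes awk is available wherever you run it (for example on a management host, fed with df output collected from the router).

```shell
#!/bin/sh
# Print the Use% (digits only) for one mount point, reading `df` output on stdin.
# Hypothetical helper; the name is not part of any Cambium tooling.
mount_use_pct() {
    awk -v m="$1" 'NR > 1 && $NF == m { sub(/%/, "", $(NF-1)); print $(NF-1) }'
}

# Hypothetical wrapper: warn when /etc/cambium is at or above a threshold.
# Reads `df` output on stdin; the 85% default is an arbitrary choice.
warn_if_cambium_full() {
    limit="${1:-85}"
    pct="$(mount_use_pct /etc/cambium)"
    if [ "${pct:-0}" -ge "$limit" ]; then
        echo "WARNING: /etc/cambium at ${pct}% (check crit_techsup size)"
    else
        echo "OK: /etc/cambium at ${pct:-unknown}%"
    fi
}
```

Saving the df output from a router to a file and running `warn_if_cambium_full < df_output.txt` would flag the 89% case shown above; what to do about it (such as removing crit_techsup) is still a manual decision.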

rootfs and /dev/root are read-only: “touch: blah: Read-only file system”. My output is below.

# df
Filesystem           1k-blocks      Used Available Use% Mounted on
rootfs                   10496     10496         0 100% /
/dev/root                10496     10496         0 100% /
/dev/mtdblock13            384       260       124  68% /etc/cambium

# touch /etc/cambium/blah
# ls -l /etc/cambium
-rw-r--r--    1 0        0             123 keystore
-rw-r--r--    1 0        0               4 DA_VERSION
-rw-r--r--    1 0        0           33840 bkup-config.txt
-rw-r--r--    1 0        0            3290 crit_techsup
-rw-r--r--    1 0        0            1080 udhcpd.leases
-rw-r--r--    1 0        0              21 account_id
-rw-r--r--    1 0        0               0 blah

I would check that your MTUs are correct (at least 1518) and that you don’t have any fragmentation through your network. If there is, then set the desired MTU in the WAN interface settings (accounting for TCP overhead and the protocol used over your network). I take it you don’t have a management network? I only use VLANs, but at the very least you should have a management network on a VLAN.
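
As a rough illustration of the overhead arithmetic, the sketch below works out the largest ICMP payload and TCP MSS that fit in a given IP MTU without fragmentation. It assumes plain IPv4 with no options (20-byte IP header, 8-byte ICMP header, 20-byte TCP header); the function names are made up for the example, and any tunnelling or PPPoE on your path would reduce the numbers further.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20-byte IPv4 header minus 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Largest TCP segment payload (MSS) for the given MTU:
# MTU minus 20-byte IPv4 header minus 20-byte TCP header (no options).
tcp_mss_for_mtu() {
    echo $(( $1 - 20 - 20 ))
}
```

From a Linux host with iputils ping, something like `ping -M do -c 3 -s "$(icmp_payload_for_mtu 1500)" 192.0.2.1` (placeholder address) sends packets with the don't-fragment bit set; if that fails while a smaller size succeeds, something on the path has a lower MTU.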
