Okay, yes, this is exactly what’s going on: the problem is triggered when Auto-RF selects a DFS channel to switch to.
So the conditions that must be met for this bug to trigger are as follows:
- There must be at least one open-security VAP
- It must be running on 5GHz
- Auto-RF must be enabled on the underlying radio
- ‘channel-list all-channels’ must be set on the underlying radio
- Auto-RF must decide to change channels, moving from a non-DFS channel to a DFS one
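To make the combination explicit, here is a minimal Python sketch of the conditions above as a single predicate. It is illustrative only: `RadioState`, `is_dfs` and `bug_can_trigger` are hypothetical names, and the DFS channel set assumes the US/FCC 5GHz plan (UNII-2A 52-64 and UNII-2C 100-144). Other regulatory domains use different DFS ranges. It does not model the possible ‘channel-width 40’ requirement.

```python
from dataclasses import dataclass

# Assumption: US/FCC channel plan. DFS applies to UNII-2A (52-64) and
# UNII-2C (100-144); other regulatory domains differ.
DFS_CHANNELS_FCC = set(range(52, 65, 4)) | set(range(100, 145, 4))


def is_dfs(channel: int) -> bool:
    """Return True if the 5GHz channel requires DFS under the assumed FCC plan."""
    return channel in DFS_CHANNELS_FCC


@dataclass
class RadioState:
    # Hypothetical snapshot of the relevant AP settings; field names are illustrative.
    has_open_vap_on_5ghz: bool  # at least one open-security VAP on 5GHz
    auto_rf: bool               # 'auto-rf' enabled on the radio
    channel_list_all: bool      # 'channel-list all-channels' set on the radio


def bug_can_trigger(state: RadioState, old_channel: int, new_channel: int) -> bool:
    """True only when every listed condition holds for an Auto-RF channel change."""
    return (
        state.has_open_vap_on_5ghz
        and state.auto_rf
        and state.channel_list_all
        and not is_dfs(old_channel)  # must start on a non-DFS channel...
        and is_dfs(new_channel)      # ...and move to a DFS one
    )
```

For example, `bug_can_trigger(RadioState(True, True, True), 36, 100)` is true, while a move from 36 to 149 (non-DFS to non-DFS) is not.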
Here is my minimal config, followed by the steps to reproduce:
wireless radio 1
shutdown
exit
wireless radio 2
channel-width 40
channel-list all-channels
off-channel-scan
auto-rf
exit
wireless wlan 1
ssid Test
band 5GHz
exit
For bug reproduction to succeed, we need the cnPilot to pick a non-DFS channel initially, and then switch to a DFS channel.
1. Load software 4.2.2.1-r3 onto an E410.
2. Run ‘delete config’ and reboot to ensure it is running at defaults.
3. Apply & save the above sample config.
4. If ‘show wireless radios’ shows that the 5GHz radio picked a DFS channel from the start, force it to pick a new channel immediately with the following:
wireless radio 2
channel 36
apply
channel auto
apply
exit
(Just issuing ‘shutdown’ and ‘no shutdown’ does not work! You need to actually set a fixed, non-auto channel, then set it back to auto.)
5. If ‘show wireless radios’ continues to show a DFS channel, repeat step 4 until it finally picks a non-DFS one.
6. Use a second AP to generate noise on the channel that your test E410 picked. I do this by setting the second AP to the same channel, connecting a client to it, and then running a continuous transmission (e.g., an iperf test).
7. Wait for Auto-RF on the test E410 to decide that channel utilization exceeds the acceptable threshold and perform a channel change (this can take 30+ minutes).
If Auto-RF picks a DFS channel, you can issue ‘service radio iwpriv wlan16 get_authmode’, which will return ‘get_authmode:3’. This indicates the bug has been triggered. At this point, if you connect a wireless client to the Test SSID, you will be able to verify that no data can move through that VAP.
If Auto-RF changes the channel but picks another non-DFS one, then the bug will not be triggered (‘get_authmode’ will show ‘1’ and wireless clients will have no trouble moving data when connected to Test SSID), and you will need to try again until you can get the Auto-RF subsystem to pick a DFS channel.
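If you are checking this repeatedly (the wait can be long), a small helper can classify the ‘get_authmode’ output. This is a minimal Python sketch under two assumptions: the command output contains the ‘get_authmode:N’ token quoted above (possibly preceded by the interface name), and the value meanings (1 = healthy, 3 = bug triggered) come only from the observations in this post, not from documented iwpriv semantics. `parse_authmode` and `bug_triggered` are hypothetical names.

```python
import re

# Matches the "get_authmode:N" token, tolerating a leading interface name
# and optional whitespace after the colon.
_AUTHMODE_RE = re.compile(r"get_authmode:\s*(\d+)")


def parse_authmode(output: str) -> int:
    """Extract the integer auth mode from 'iwpriv ... get_authmode' output."""
    match = _AUTHMODE_RE.search(output)
    if match is None:
        raise ValueError(f"no get_authmode value in: {output!r}")
    return int(match.group(1))


def bug_triggered(output: str) -> bool:
    """Per the observations above: 3 on an open VAP means the bug fired; 1 is healthy."""
    return parse_authmode(output) == 3
```

Feeding it the captured output of ‘service radio iwpriv wlan16 get_authmode’ after each Auto-RF channel change gives a quick yes/no without reading the raw value each time.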
I previously commented that I had a hard time understanding how such a bug (an Auto-RF channel change event breaking all open-security VAPs) could have escaped the engineers' testing. But now that it’s clear there is an additional condition that must be met (the channel must change to a DFS one), it’s more understandable how this could have been missed.
EDIT: It also looks like ‘channel-width 40’ might be necessary for the bug to be tripped.