Random reboots of PMP450i access points due to memory allocation failure caused by NTP server issues on 21.0 and 21.1

arobo · July 31, 2022, 9:46am

Hi,

We are seeing a strange issue on our 450i access points only. It recurs regularly but at random times: the APs reboot randomly, logging the following:

07/31/2022 : 19:27:41 AEST : :Resetting due to memory allocation failure
07/31/2022 : 19:27:41 AEST : :Forced reset;
07/31/2022 : 19:27:41 AEST :
System Startup
System Reset Exception – Reset due to FatalError
Software Version : CANOPY 21.1 AP

All our 450i units are showing the same error and rebooting. They are on 21.1, but we have also seen this behaviour on previous versions (20.3 onwards).

We are not seeing the same error or reboots on our 450m access points. Has anyone else seen this issue?

Eric_Ozrelic · July 31, 2022, 5:17pm

Is this on 3 GHz/CBRS and 5 GHz 450i APs? Have you tried the newest beta on just the AP?

@Charlie, anything else you can think of other than raising a ticket and submitting diags?

arobo · July 31, 2022, 9:50pm

Hi Eric,

The issue is occurring on the 5 GHz 450i units. I had thought it was a GPS issue (all have two GPS sync inputs: cnPulse on the AUX port and the main power sync). I had seen this issue before on the 5 GHz 450m units…

I have opened a support ticket and submitted diags, but I wanted to see if anyone else had the issue, as it is occurring regularly on all our 5 GHz 450i units.

I will try the beta tonight (I was waiting for the release, but will test the beta).

Charlie · August 1, 2022, 4:40pm

@arobo, would you send me the engineering.cgi?

I hope this is the same issue already fixed in 21.1.0.1 BETA-3:
CPY-18035: AP with unreachable NTP server is running into FatalError reset due to fail to allocate mem, out of small heap in 21.0 or 21.1.

This issue only happens when one of the listed NTP servers is unreachable. It was introduced in 21.0.

We have not released 21.1.0.1 BETA-3 yet, unfortunately. A workaround would be to reconfigure your NTP servers to addresses that are valid and reachable, or to 0.0.0.0.

Thanks for posting, and I hope it is this problem. The eng.cgi will confirm it for me.

We are planning to release 21.1.0.1 officially in around two weeks' time, unless something urgent comes up in the meantime that we need to fix.

arobo · August 1, 2022, 10:20pm

Hi Charlie, I have already sent the eng.cgi in a support ticket:

Support ID: 315833

I can send it to you directly if required. I can try the beta tonight for you. The NTP servers on the APs are the same as on the 450m access points, and they should be reachable (I will double-check that nothing has gone awry). I will also try 0.0.0.0.

Thanks as always for the support.

Charlie · August 1, 2022, 11:15pm

Thanks for that. I’m a developer, so I never do the “look up what tickets a customer has open” thing. It would have been smart to ask someone to do it for me, though. Oops. Thanks for the ticket number.

Yeah, I don’t like the HTML format; next time, please upload the eng.cgi in XML if you would.

Yes, I have confirmed this is the same unreachable NTP server memory leak issue we have fixed in 21.1.0.1 BETA-3. We will be getting that posted soon.

You’re welcome. But it’s sort of like thanking a firefighter for putting out a fire… that they started. hehe

But I do appreciate you taking the time to post and being willing to try the beta. We need customers like you to make this as good as time allows.

arobo · August 3, 2022, 12:04pm

Hi Charlie,

A bit more info here.

We have updated to the beta, but are still having issues with NTP updates on the AP. It is very strange, as when we test the NTP server it seems to be working fine.

NTP time updates on the switch the APs are connected to, using the same NTP server, work fine.

Pinging the NTP server from the AP works fine. Pinging the AP from the NTP server works fine. So network connectivity in both directions is OK.

There are no firewall issues, and as stated, other devices are updating their time fine from the same NTP server. But the access points still show "Server time out" on the AP's NTP page. We will keep looking (happy to take suggestions, though).
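For anyone debugging a similar case: a successful ping only proves ICMP reachability, while NTP runs over UDP port 123 and can be refused or silently dropped by the server's access rules even when ping works. Below is a minimal sketch of an SNTP client query (mode 3, as described in RFC 4330) that could be run from a host on the same network as the APs to check whether the server actually answers NTP requests. It is not the AP's own NTP client, so a reply here does not guarantee the AP's requests will be accepted; the function names are hypothetical.

```python
import socket
import struct

NTP_PORT = 123
# Seconds between the NTP era start (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX_OFFSET = 2208988800


def build_sntp_request() -> bytes:
    """Return a 48-byte SNTP client request.

    The first byte packs LI=0 (2 bits), VN=4 (3 bits) and Mode=3 "client"
    (3 bits): 0b00_100_011 == 0x23. All other fields may be zero.
    """
    return bytes([0x23]) + bytes(47)


def parse_transmit_time(packet: bytes) -> float:
    """Extract the server's Transmit Timestamp (bytes 40-47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("NTP reply shorter than 48 bytes")
    if packet[0] & 0x07 != 4:  # low 3 bits hold the mode; 4 means "server"
        raise ValueError("reply is not an NTP server (mode 4) packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_TO_UNIX_OFFSET + fraction / 2**32


def query_ntp(server: str, timeout: float = 2.0) -> float:
    """Send one SNTP request over UDP/123 and return the server's time.

    Raises socket.timeout if no reply arrives within `timeout` seconds,
    which is what you would expect if the server ignores the request.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, NTP_PORT))
        reply, _addr = sock.recvfrom(512)
    return parse_transmit_time(reply)
```

For example, `query_ntp("192.0.2.10")` (a placeholder address; substitute your NTP server) returns the server's time as a Unix timestamp. If ping succeeds but this times out, the server's access restrictions are a likely suspect.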

Charlie · August 3, 2022, 3:38pm

Yeah, the beta only fixed the memory leak triggered by an unreachable NTP server, not NTP parsing. So at least your AP won’t reset after a long time in this condition.

But we still need to figure out why it’s not updating your time. Let's see if we can reproduce it here.

In the meantime, if you try another NTP server does it work?

arobo · August 11, 2022, 10:08am

Hi everyone, an update: we have found the NTP issue. The problem was on the NTP server itself; we needed to relax its restrictions a bit for the network the PMP APs were on. Once we loosened some of those restrictions, the APs updated correctly. Lesson learned: check the NTP server config and make sure the right settings are applied for the specific network the APs are on, so they can get NTP updates. And yes, the new beta also stopped the rebooting.

Charlie · August 11, 2022, 6:11pm

Thanks for reporting back; good to hear. I appreciate you confirming the beta fixed the reset.

jon3laze · November 5, 2022, 12:35am

I’m running into a similar NTP issue on 450m units. I can verify connectivity by pinging to and from the AP and the NTP server, and I can see the traffic passing through the firewall on the NTP server. The AP will not update; it just continuously times out. I tried updating the firmware to 21.1.0.1, but that doesn’t seem to have fixed it.

@arobo Are you able to share what NTP config settings you changed? I don’t believe that I am restricting anything there but any guidance is appreciated.

jon3laze · November 5, 2022, 1:05am

I was able to locate the issue. It seems that using the notrap restriction in ntp.conf causes these APs to fail.

notrap
Decline to provide mode 6 control message trap service to matching hosts. The trap service is a subsystem of the ntpdc control message protocol which is intended for use by remote event logging programs.

If you run into this issue, you can remove the notrap option from the restrict line.
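If you manage several ntp.conf files, the fix described above (dropping `notrap` from the `restrict` line) can be scripted. The following is a hypothetical helper, not part of ntpd or any Cambium tooling: it removes only the `notrap` token from `restrict` lines, leaves comments and every other directive untouched, and returns the edited text so you can review it before writing it back and restarting ntpd.

```python
def drop_notrap(conf_text: str) -> str:
    """Remove the `notrap` flag from every `restrict` line in ntp.conf text.

    Comment lines, inline comments and all other directives are preserved;
    each line keeps its original line ending.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        newline = line[len(body):]
        # Only inspect the part before an inline "#" comment.
        code, hash_mark, comment = body.partition("#")
        tokens = code.split()
        if tokens and tokens[0] == "restrict" and "notrap" in tokens:
            indent = code[: len(code) - len(code.lstrip())]
            kept = [tok for tok in tokens if tok != "notrap"]
            code = indent + " ".join(kept) + (" " if hash_mark else "")
            body = code + hash_mark + comment
        out.append(body + newline)
    return "".join(out)
```

For example, a typical line such as `restrict default kod nomodify notrap nopeer noquery` becomes `restrict default kod nomodify nopeer noquery`, while `server` lines and `# ...` comments pass through unchanged.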

system · November 5, 2023, 1:05am

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.