-
Notifications
You must be signed in to change notification settings - Fork 7.7k
Description
Hardware:
Core Installation version: 1.6.0
Computer OS: ?Windows 10? ?Mac OSX? ?Ubuntu?
Description:
I have an app specifically designed to find the "safe" lower heap limit which will allow any app to continue to run. Above 80k , it works exactly as designed, forcing the "heap grabbing" routine to back off once the lower limit is exceeded, and allowing it to continue as the heap is freed and returns back above a (higher) safe value.
The UI
The TL;DR is that Q is std::queue of asynchronous SSE writes which update the UI in "real time" (or at least as fast as the TCP TX rate will allow - currently can't seem to get past 20/s) once the TX rate is exceeded they get Q'd and the matching TCP ACK then pops them off the Q, freeing some heap, allowing more sends...rinse and repeat. The "Visualiser rate" is the rate per second that the Visualiser "LED" is flashed. Each "flash" is an SSE event and there is a rotary encoder hooked up to the Visualiser rate so I can just turn up / down the rate at which the Q fills / drains, and so by adjusting that and the cut-in and cut-out values I can force or avoid a low heap crash. The "Go" button (and the center button of the encoder on GPIO32) stop and start the visualiser so I can gauge the rate of Q fill/drain. The aim of the app is to find the exact lowest-safe-heap breaking point so that "production" code can be tuned to avoid it by judicious choice of of cut-out and cut-in values.
It all works wonderfully until it gets below about 80k, then I'm getting some strange results I don't understand which seems "plain wrong"
Crash Log
On a run with a lower bound "Heap Cut-out" of 50000 the following crash occured:
H= free heap, MB=Max free heap block
00:21:53.684 -> AARD:1: <---- AK 39 bytes Q=1334 H=77268 MB=67952 LOK=0
00:21:53.730 -> AARD:1: <---- AK 39 bytes Q=1337 H=77016 MB=67952 LOK=0
00:21:53.777 -> AARD:1: <---- AK 39 bytes Q=1340 H=76760 MB=67952 LOK=0
00:21:53.871 -> AARD:1: <---- AK 39 bytes Q=1346 H=76224 MB=67952 LOK=0
00:21:53.916 -> abort() was called at PC 0x40169a3f on core 1
00:21:53.916 ->
00:21:53.916 -> ELF file SHA256: 0000000000000000
00:21:53.916 ->
00:21:53.916 -> Backtrace: 0x400887c4:0x3ffb1d90 0x40088a41:0x3ffb1db0 0x40169a3f:0x3ffb1dd0 0x40169a86:0x3ffb1df0 0x4016915f:0x3ffb1e10 0x40169496:0x3ffb1e30 0x401691fd:0x3ffb1e50 0x400e5733:0x3ffb1e70 0x400e7946:0x3ffb1eb0 0x400db636:0x3ffb1ed0 0x400e01be:0x3ffb1ef0 0x400e0ee7:0x3ffb1f10 0x400e0f8a:0x3ffb1f30 0x400e0fa6:0x3ffb1f50 0x400e0ffe:0x3ffb1f90 0x400f0eed:0x3ffb1fb0 0x40089a52:0x3ffb1fd0
00:21:53.963 ->
The failing line is a malloc(1436) call in WiFiUDP, but previous examples have been in my own code, ArduionOTA, MDNS...however in every instance the abort is called on a malloc requesting some small value well below what appears to be available.
Stack Trace
Decoding stack results
0x400887c4: invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 156
0x40088a41: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 171
0x40169a3f: __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 47
0x40169a86: std::terminate() at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 57
0x4016915f: __cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*)) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_throw.cc line 87
0x40169496: operator new(unsigned int) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/new_op.cc line 54
0x401691fd: operator new[](unsigned int) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/new_opv.cc line 32
0x400e5733: WiFiUDP::parsePacket() at C:\Users\phil\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.6\libraries\WiFi\src\WiFiUdp.cpp line 210
0x400e7946: ArduinoOTAClass::handle() at C:\Users\phil\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.6\libraries\ArduinoOTA\src\ArduinoOTA.cpp line 379
0x400db636: std::_Function_handler >::_M_invoke(const std::_Any_data &) at C:\Users\phil\Documents\Arduino\libraries\h4plugins\src\H4P_WiFi.cpp line 189
0x400e01be: std::function ::operator()() const at c:\users\phil\appdata\local\arduino15\packages\esp32\tools\xtensa-esp32-elf-gcc\1.22.0-97-gc752ad5-5.2.0\xtensa-esp32-elf\include\c++\5.2.0/functional line 2271
0x400e0ee7: task::operator()() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 154
0x400e0f8a: H4::next() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 270
0x400e0fa6: H4::loop() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 358
0x400e0ffe: loop() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 315
0x400f0eed: loopTask(void*) at C:\Users\phil\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.6\cores\esp32\main.cpp line 23
0x40089a52: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143
Questions
- Why does malloc(1436) fail when ESP.getFreeHeap() shows 76224 and ESP.getMaxAllocHeap() shows 67952? Most of the fails in my own code are of only 30 or 40 bytes, but still the failing line is always a malloc while apparently huge amounts of free heap are available
- I can hear someone saying "heap corruption rather than exhaustion" but why then when the lower limit is in the 80s or 90s does it run all day without missing a beat? What possible things could be happening in the background that only screw-up below 77k ? Why 77k?
- If the ESP heap calls are not the best predictor of impending heap failure, what other metric is? What is the safest and most reliable way of knowing when you are running dangerously low on memory?
Activity
igrr commentedon Jul 2, 2021
Just to make sure the heap is not too fragmented, can you try calling heap_caps_get_largest_free_block(MALLOC_CAP_DEFAULT) and printing its output as well?
philbowles commentedon Jul 2, 2021
I had tried that before, tried that it: gave exactly the same value as ESP.maxHeapBlock. but I guess I didn't get as far with the code at that time.
In the early stages, I see it is indeed the same for a long time, but starts to frgament a little around FH=~185k. AT that point it gets worse and worse fast, but interestingly, while ESP.maxHeapBlock also starts fragmenting at the same point, after a while the two values start to diverge, with ESP.maxHeapBlock showing a much higher figure:
...
and a few seconds later when the limit of 78k FH is passed, there is > 10-fold discrepancy:
SO I guess the new questions are:
Next I tried to force the crash again: by dropping the limit to 50000. And now I see why the crash is occuring: I won't even bother to do the dump because we both know where it will be!
Ok so this explains the FH ~78k issue: its actually the fragmentation reaching crazy levels and other tasks needing that memory. This more than ever convinces me that ESP.maxHeapBlock is broken. I will run it again and see if ESP.getFreeHeap differs from heap_caps_get_free_size
philbowles commentedon Jul 2, 2021
heap_caps_get_free_size(MALLOC_CAP_DEFAULT is showing a huge difference - is that the right call? If not what is the heap_caps_* equivalent of ESP.getFreeHeap?
philbowles commentedon Jul 2, 2021
Trusting ESP.getFreeHeap seems to be the problem: again there is this 10-fold discrepancy at the point of crash:
Pablo2048 commentedon Jul 2, 2021
ESP.getFreeHeap
equivalent isarduino-esp32/cores/esp32/Esp.cpp
Line 137 in c37557c
philbowles commentedon Jul 2, 2021
So whats the difference between MALLOC_CAP_DEFAULT vs MALLOC_CAP_INTERNAL ? The real question is: what is the correct call to ACCURATELY predict when the heap is dangerously low? Obvioulsy if I'm making a big allocation I would check heap_caps_get_largest_free_block(MALLOC_CAP_DEFAULT) first but in a multi-tasking environment, its the other background tasks I have to think of too in the general picture and not starve them - I need a figure for the overall free heap. Obviously I'm going to add e.g. 10% safety margin to what that figure is at run-time, but I need to know a reliable and accurate way to get it!
Already obvious that ESP.* calls aren't the answer!
igrr commentedon Jul 2, 2021
MALLOC_CAP_INTERNAL describes any internal memory — for example, 32-bit aligned IRAM region is also included. Since you never allocate from that region, it is mostly "free" and this explains why the value gets "stuck".
MALLOC_CAP_DEFAULT is the same memory type as will be allocated by malloc. When PSRAM is not enabled, it will be MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT — i.e. internal memory accessible at 1-byte granularity with no alignment restrictions.
If PSRAM is enabled, and
malloc
is configured to also use PSRAM, then MALLOC_CAP_DEFAULT will also include external memory into the calculation.So MALLOC_CAP_DEFAULT can return either smaller or larger value than MALLOC_CAP_INTERNAL.
If you want to see the value that is related to the heap that
malloc
andfree
functions use, then you should use MALLOC_CAP_DEFAULT.philbowles commentedon Jul 2, 2021
@igrr thanks for that clear explanatin so are you saying I should replace my calls to ESP.getFreeHeap() with heap_caps_free_size(MALLOC_CAP_DEFAULT) ? Does this also explaing the max heap block discrepancy between ESP. and heap_caps ? Is this just a documentation isuue?
My ultimate goal is to get a single figure figure below which I should back off any more of my own heap allocation to let the other tasks "breathe", and prevent a crash.
I can live with "if MBLIMIT < maxblocks OR FSLIMIT < freespace then backoff"... I just need to know the "best" way if the ESP. functions can't give me a good answer, whihc at the moment it seems they can't
igrr commentedon Jul 2, 2021
@me-no-dev do you recall why ESP.getFreeHeap and ESP.maxHeapBlock used MALLOC_CAP_INTERNAL instead of MALLOC_CAP_DEFAULT? It seems that showing the amount of heap accessible by
malloc
(which is MALLOC_CAP_DEFAULT) would cause less confusion.Pablo2048 commentedon Jul 4, 2021
Well, it seems like MALLOC_CAP_DEFAULT show all the available memory, including PSRAM on Wrover modules - ESP.getFreeHeap shows 172696 bytes, heap_caps_free_size(MALLOC_CAP_DEFAULT) shows 4356824 bytes.
me-no-dev commentedon Jul 5, 2021
@igrr what I recall is that prior to 2.0.0, PSRAM was not mapped to malloc and that we wanted FreeHeap to show the internal memory, while FreePSRAM to show the external. It probably makes sense to be changed now that all is available through malloc
igrr commentedon Jul 5, 2021
Okay, if you decide to keep MALLOC_CAP_INTERNAL, please add
| MALLOC_CAP_8BIT
since otherwise the 32-bit-only IRAM region that isn't available through malloc is included (on ESP32).debug: Only report byte accessable free heap.
stale commentedon Sep 6, 2021
[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
6 remaining items