Zulip Chat Archive
Stream: lean4 dev
Topic: Code generator/compiler/memory allocator docs?
Tom (Oct 03 2024 at 21:14):
I managed to cause a crash in Lean (filed here).
At first, I thought is was simple stack overflow (due to the structure of the code, which is what I was trying to test). However, to investigate, I built a debug version of Lean and it's not entirely clear to me that it is:
The final part of the stack looks like this:
* thread #2, stop reason = EXC_BAD_ACCESS (code=2, address=0x16fe03fe8)
frame #0: 0x000000010889056c libleanshared.dylib`lean::allocator::heap* std::__1::__cxx_atomic_load[abi:ue170006]<lean::allocator::heap*>(__a=0x000000012781c000, __order=memory_order_seq_cst) at cxx_atomic_impl.h:364:47
frame #1: 0x0000000108890550 libleanshared.dylib`std::__1::__atomic_base<lean::allocator::heap*, false>::load[abi:ue170006](this=0x000000012781c000, __m=memory_order_seq_cst) const at atomic_base.h:60:14
frame #2: 0x0000000108890524 libleanshared.dylib`std::__1::__atomic_base<lean::allocator::heap*, false>::operator lean::allocator::heap*[abi:ue170006](this=0x000000012781c000) const at atomic_base.h:65:53
frame #3: 0x000000010888f1ac libleanshared.dylib`lean::allocator::page::get_heap(this=0x000000012781c000) at alloc.cpp:84:32
frame #4: 0x000000010888f088 libleanshared.dylib`lean::allocator::page::push_free_obj(this=0x000000012781c000, o=0x000000012781ca98) at alloc.cpp:210:20
frame #5: 0x0000000108890338 libleanshared.dylib`lean::dealloc_small_core(o=0x000000012781ca98) at alloc.cpp:430:12
frame #6: 0x000000010889036c libleanshared.dylib`lean_free_small(o=0x000000012781ca98) at alloc.cpp:446:5
frame #7: 0x000000010884f648 libleanshared.dylib`lean_free_small_object(o=0x000000012781ca98) at lean.h:374:5
frame #8: 0x000000010884f7e4 libleanshared.dylib`lean::lean_del_core(o=0x000000012781ca98, todo=0x000000016fe041f0) at object.cpp:273:9
frame #9: 0x000000010884f6d8 libleanshared.dylib`lean_dec_ref_cold(o=0x000000012781ca98) at object.cpp:329:13
* frame #10: 0x00000001053d3980 libleanshared.dylib`l___private_Lean_Util_Trace_0__Lean_checkTraceOption + 188
frame #11: 0x00000001063d2950 libleanshared.dylib`l_Lean_isTracingEnabledFor___at_Lean_Meta_processPostponed_loop___spec__1 + 80
frame #12: 0x00000001062bda08 libleanshared.dylib`l_Lean_Meta_whnfEasyCases___at_Lean_Meta_whnfCore_go___spec__3 + 1776
frame #13: 0x00000001062ed31c libleanshared.dylib`l_Lean_Meta_whnfEasyCases___at_Lean_Meta_whnfImp___spec__3___lambda__4 + 320
frame #14: 0x0000000108872a78 libleanshared.dylib`lean_apply_5(f=0x000000012780f208, a1=0x00000001278178b8, a2=0x0000000140531c28, a3=0x0000000127819848, a4=0x0000000140531c68, a5=0x0000000000000001) at apply.cpp:349:24
Looking higher up the stack, I see the function
frame #9: 0x000000010884f6d8 libleanshared.dylib`lean_dec_ref_cold(o=0x000000012781ca98) at object.cpp:329:13
326 #else
327 object * todo = nullptr;
328 while (true) {
-> 329 lean_del_core(o, todo);
So this appears to be a part of some garbage-collection process; in this case, specifically, hitting the "LEAN_SMALL_ALLOCATOR":
372 static inline void lean_free_small_object(lean_object * o) {
373 #ifdef LEAN_SMALL_ALLOCATOR
-> 374 lean_free_small(o);
375 #else
Ultimately, the crash happens inside of get_heap
:
-> 84 heap * get_heap() { return m_header.m_heap; }
Is there any documentation anywhere where I could read more about how the memory allocator works/is supposed to work in Lean?
In addition, is there a "trace"-like command I can use which writes eagerly to e.g. stderr? I don't get traces when Lean crashes. (Side note, I also noticed that tracing in debug mode is extremely slow and makes it less useful with deeply recursive functions).
Thanks!
Tom (Oct 03 2024 at 21:32):
sandebug
reports it's a stack overflow after all. However, if there's an answer to my questions, I'd still love to know.
Mario Carneiro (Oct 03 2024 at 21:55):
if you have a stack overflow, the top part of the stack is usually not interesting. Is there a huge middle part containing a repetitive element?
Mario Carneiro (Oct 03 2024 at 21:56):
the top part will be some random code which happened to be the first to cross the threshold while sitting on top of a giant stack of other calls using up most of stack memory
Tom (Oct 03 2024 at 21:58):
Yes, it seems to be coming from recursion in whnfCore.
I realized the top of the stack would be a read herring. My current hypothesis is that this:
let f' ← go f
In WHNF.lean on line 595 is not in a tail position.
Tom (Oct 03 2024 at 21:59):
e.g.
frame #10339: 0x00000001062eb93c libleanshared.dylib`l_Lean_withTraceNode___at_Lean_Meta_whnfImp___spec__1 + 664
frame #10340: 0x00000001062ec330 libleanshared.dylib`l_Lean_Meta_whnfEasyCases___at_Lean_Meta_whnfImp___spec__3 + 1912
frame #10341: 0x0000000106295ce4 libleanshared.dylib`l___private_Lean_Meta_WHNF_0__Lean_Meta_reduceRec___rarg + 1352
frame #10342: 0x00000001062bf958 libleanshared.dylib`l_Lean_Meta_whnfEasyCases___at_Lean_Meta_whnfCore_go___spec__3___lambda__4 + 2636
frame #10343: 0x00000001062bda94 libleanshared.dylib`l_Lean_Meta_whnfEasyCases___at_Lean_Meta_whnfCore_go___spec__3 + 1916
frame #10344: 0x00000001062bd0b0 libleanshared.dylib`l___private_Lean_Meta_WHNF_0__Lean_Meta_deltaBetaDefinition___at_Lean_Meta_whnfCore_go___spec__2 + 784
frame #10345: 0x00000001062bf86c libleanshared.dylib`l_Lean_Meta_whnfEasyCases___at_Lean_Meta_whnfCore_go___spec__3___lambda__4 + 2400
frame #10346: 0x00000001062bda94 libleanshared.dylib`l_Lean_Meta_whnfEasyCases___at_Lean_Meta_whnfCore_go___spec__3 + 1916
frame #10347: 0x00000001062c026c libleanshared.dylib`l_Lean_Meta_whnfEasyCases___at_Lean_Meta_whnfCore_go___spec__3___lambda__5 + 856
frame #10348: 0x00000001062bda94 libleanshared.dylib`l_Lean_Meta_whnfEasyCases___at_Lean_Meta_whnfCore_go___spec__3 + 1916
frame #10349: 0x00000001062ed31c libleanshared.dylib`l_Lean_Meta_whnfEasyCases___at_Lean_Meta_whnfImp___spec__3___lambda__4 + 320
frame #10350: 0x0000000108872a78 libleanshared.dylib`lean_apply_5(f=0x00000001006790c8, a1=0x0000000100685f38, a2=0x000000011fb35c28, a3=0x000000010067a738, a4=0x000000011fb35c68, a5=0x0000000000000001) at apply.cpp:349:24
Mario Carneiro (Oct 03 2024 at 22:00):
that's not enough calls to count
Mario Carneiro (Oct 03 2024 at 22:00):
is there on the order of 4000 similar looking calls in the call stack
Tom (Oct 03 2024 at 22:00):
If I'm understanding the mangling correctly, it's picking the third alternative of whngCore.go
, i.e. the app
version.
Mario Carneiro (Oct 03 2024 at 22:00):
that call is not in tail position, but this is expected, WHNF is just traversing the term and deep terms are bad
Tom (Oct 03 2024 at 22:01):
I didn't want to spam the thread with the full bt
but it's just a snippet that seems to be recurring, yes.
Mario Carneiro (Oct 03 2024 at 22:02):
in that case maybe you gave WHNF a deep term
Mario Carneiro (Oct 03 2024 at 22:02):
what is the code which is causing this error?
Tom (Oct 03 2024 at 22:02):
Yes, I linked it in the associated issue at the start of the thread.
Mario Carneiro (Oct 03 2024 at 22:03):
you cranked up the maxRecDepth, this is basically taking the safety off the footgun
Mario Carneiro (Oct 03 2024 at 22:03):
the maxRecDepth is there for a reason, to avoid you seeing the real stack limit
Mario Carneiro (Oct 03 2024 at 22:03):
the typeclass instance is generating a very deep term in that example
Tom (Oct 03 2024 at 22:04):
Right, I understand. I was trying to reproduce another issue for someone else which required me to do that. I can close the issue if your take is "works as expected". This crash was just a side-effect.
Mario Carneiro (Oct 03 2024 at 22:05):
you can probably get the crash to go away if you also increase the C stack limit, but this requires tinkering with OS settings
Tom (Oct 03 2024 at 22:07):
Right. I'll see if I can come up with a different approach.
Any thoughts about my other questions, specifically:
- Any code gen or allocator docs?
- A "trace"-like command I can use which writes eagerly to e.g. stderr?
Tom (Oct 03 2024 at 22:09):
It's a shame that whnfCore
is not tail recursive because it's pretty close. Is this one of the "frozen" sections of the code?
Mario Carneiro (Oct 03 2024 at 22:15):
it's not frozen, but it is "no user serviceable parts inside"
Tom (Oct 03 2024 at 22:15):
I'll move on then.
Mario Carneiro (Oct 03 2024 at 22:16):
the answer to whether there are docs for X in lean is usually no unless it's user-facing
Mario Carneiro (Oct 03 2024 at 22:16):
there are a few module docs here and there
Mario Carneiro (Oct 03 2024 at 22:18):
for tracing, there is dbg_trace "foo"
(which works also on pure functions) and trace[Meta.debug] "foo"
(which shows up with set_option trace.Meta.debug true
)
Tom (Oct 03 2024 at 22:19):
I don't see those outputs when Lean crashes though, unless I'm doing something wrong.
Mario Carneiro (Oct 03 2024 at 22:19):
you can try using panic!
, this will go to the output panel
Mario Carneiro (Oct 03 2024 at 22:19):
you usually don't want to write directly to stderr because lean is talking to vscode via LSP protocol and that would mess it up
Mario Carneiro (Oct 03 2024 at 22:20):
although actually maybe stderr is fine and it's only stdout that's off limits
Tom (Oct 03 2024 at 22:20):
Makes sense on LSP. I am having to run the the cmdline anyway since the code crashes Lean and hence the server.
Mario Carneiro (Oct 03 2024 at 22:20):
I think stderr also goes to the output panel
Mario Carneiro (Oct 03 2024 at 22:21):
if you are ok writing to stdout on a plain lean program you can use println!
Mario Carneiro (Oct 03 2024 at 22:22):
or IO.eprintln
for stderr
Tom (Oct 03 2024 at 22:23):
Ok, those two make sense but does the MetaM monad support IO? Which is where I was trying to do the stderr tracing.
Mario Carneiro (Oct 03 2024 at 22:26):
it does, most of lean's elaborator monads extend IO
Eric Wieser (Oct 03 2024 at 22:30):
Mario Carneiro said:
the maxRecDepth is there for a reason, to avoid you seeing the real stack limit
It might not be checked in the recursive loop?
Eric Wieser (Oct 03 2024 at 22:32):
Yeah, I see no checkSystem
withIncRecDepth
in go
Jannis Limperg (Oct 03 2024 at 22:47):
That whole file contains only a single withIncRecDepth
. :fear:
Mario Carneiro (Oct 03 2024 at 23:34):
actually I'm not sure that call can lead to deep recursion? It's calling go
on the head of an application, which means that the recursive call can't end up in the app case anymore
Mario Carneiro (Oct 03 2024 at 23:37):
oh but that call could end up calling go
on an arbitrary term, for instance the letE
case just instantiates the binder and tail-calls go
, so if you had an application whose head is a let that instantiates to an application with a let at the head and so on, you would get a non-tail-recursive stack as deep as the term
Last updated: May 02 2025 at 03:31 UTC