Zulip Chat Archive
Stream: lean4
Topic: The state of the code generator
Hanting Zhang (Jan 08 2023 at 16:48):
From the recent thread :upside_down::
this is a workaround for the fact that there are two lean compilers
Can someone explain some of the end-to-end details behind what is happening with the code generator? I've been able to piece things together by going to dev meetings, looking at the PR history, and scanning the compiler code base, so I have ~some idea of what's going on. However, I've never heard the "full story" and would appreciate any insight!
Henrik Böving (Jan 08 2023 at 17:11):
There exists a 2 part old code generator, the first part is written in C++ in src/library/compiler
, the 2nd part is written in Lean in src/Lean/Compiler/IR
. The Lean part implements the lambda RC part of the pipeline, most notably the garbage collection optimizations + outputting C or LLVM code based on that. This part is written in Lean due to historical reasons.
src/Lean/Compiler/LCNF
is right now a reimplementation of src/library/compiler
in Lean and better. It is not fully functional because we cannot generate actual executable code with it yet but basically all relevant parts of the old C++ part are already ported here and I do have a branch laying around that gets 90% the way there to attach the LCNF Lean implementation to the IR implementation so that we can produce executable code, we're probably gonna integrate that soon :tm: Once that works we can basically drop the C++ part of the compiler.
The next step from there would be to port src/Lean/Compiler/IR
to the LCNF representation, this involves two major things:
- We will need to port the new LLVM backend that is available for the IR part right now. I don't know if we want to port the C backend that is in use right now as well.
- We will need to port the lambda rc optimizations
Once we have that tackled we can also drop src/Lean/Compiler/IR
, then the entire compiler will live in src/Lean/Compiler/LCNF
and we can be happy at last :octopus:
Hanting Zhang (Jan 08 2023 at 17:21):
Does it look something like this?
Lean -> LCNF (originally in `library/compiler`, rewritten in `Lean/Compiler/LCNF`; theoretically done, but untested)
LCNF -> LCNF.IR (originally ???, rewritten in Lean soon?; not done, but should be soon)
(afaiu Perceus is written in Lean)
LCNF.IR -> LLVM (originally `C/C++` in `EmitC.lean`?, rewritten in `EmitLLVM.lean`?; theoretically done?)
Hanting Zhang (Jan 08 2023 at 17:22):
What does "porting the LLVM/C backend" mean?
Henrik Böving (Jan 08 2023 at 17:23):
Yes it does look like that pretty much
Right now IR
supports outputting the final code as both LLVM and C, if we want to get rid off IR
we have to reimplement that for the LCNF format.
Hanting Zhang (Jan 08 2023 at 17:35):
By get rid of IR
, do you mean removing the LCNF.IR
type? Or is the plan to move LCNF.IR into Compiler/LCNF and reimplement it, porting the lowering/Perceus/EmitLLVM
Also src/runtime
will remain, right? Then we will link against its binaries as normal
Hanting Zhang (Jan 08 2023 at 17:36):
(deleted)
Henrik Böving (Jan 08 2023 at 17:43):
We want to drop all of the old code generator and implement it completely with the LCNF type in one pipeline and yes that involves porting perceus and EmitLLVM to LCNF.
I am not aware of plans to touch the runtime, however we might end up eventually reimplementing the C++ based interpeter while we are at it.
Hanting Zhang (Jan 08 2023 at 17:48):
Ah so the new pipeline should be
Lean -> LCNF
LCNF -> LCNF (base phase, mono phase)
LCNF -> LCNF (refcounting here?)
LCNF -> LLVM
Henrik Böving (Jan 08 2023 at 17:54):
Yes
Hanting Zhang (Jan 08 2023 at 17:55):
:O ok interesting, thank you!
Locria Cyber (Feb 03 2023 at 21:20):
Henrik Böving said:
Yes it does look like that pretty much
Right now
IR
supports outputting the final code as both LLVM and C, if we want to get rid offIR
we have to reimplement that for the LCNF format.
How to use the LLVM backend?
Henrik Böving (Feb 03 2023 at 21:29):
There was a flag for it I tihnk? @Siddharth Bhat should know
Locria Cyber (Feb 03 2023 at 21:38):
I found the flag in CMakefiles.txt. I had to manually edit those.
After I enabled it, what CLI flags do I have to use?
Siddharth Bhat (Feb 03 2023 at 21:44):
-DLLVM=ON
. Note that one needs to point to a copy of llvm-config
that is LLVM 15.
Locria Cyber (Feb 03 2023 at 21:45):
Siddharth Bhat said:
-DLLVM=ON
. Note that one needs to point to a copy ofllvm-config
that is LLVM 15.
The CMake flag is not public
Locria Cyber (Feb 03 2023 at 21:45):
It's not editable with make edit_cache
Locria Cyber (Feb 03 2023 at 21:56):
I got this error
lean_llvm_parse_bitcode ERROR: file doesn't start with bitcode header
LEAN ASSERTION VIOLATION
File: compiler/llvm.cpp
Line: 1065
!is_error && "failed to parse bitcode"
'unreachable' code was reached
Siddharth Bhat (Feb 03 2023 at 21:56):
@Locria Cyber interesting, I have not used edit_cache
. I usually configure these with cmake ../ -DLLVM=ON -DLLVM_CONFIG=/path/to/llvm/15
. I shall change the flag to be public once we are less overwhelmed with high priority mathlib
porting changes.
Locria Cyber (Feb 03 2023 at 21:56):
(deleted)
Locria Cyber (Feb 03 2023 at 21:57):
Siddharth Bhat said:
Locria Cyber interesting, I have not used
edit_cache
. I usually configure these withcmake ../ -DLLVM=ON -DLLVM_CONFIG=/path/to/llvm/15
. I shall change the flag to be public once we are less overwhelmed with high prioritymathlib
porting changes.
I can do that. There are two CMakeLists.txt, one in stage0/. It seems like I have to change both
Siddharth Bhat (Feb 03 2023 at 22:00):
@Locria Cyber there is no need. Right now, even with the LLVM backend enabled, lean still compiles _itself_ using the C backend. Compiling the stage2
lean compiler via the LLVM backend is under development: https://github.com/bollu/lean4/tree/llvm/stage2-build
Locria Cyber (Feb 03 2023 at 22:01):
Locria Cyber said:
I got this error
lean_llvm_parse_bitcode ERROR: file doesn't start with bitcode header LEAN ASSERTION VIOLATION File: compiler/llvm.cpp Line: 1065 !is_error && "failed to parse bitcode" 'unreachable' code was reached
It seems like my build is broken. I built it as above, and used elan toolchain link xxx bulid/stage1
to set it as the default toolchain.
Siddharth Bhat (Feb 03 2023 at 22:03):
@Locria Cyber could you please help me reproduce this locally? What should I do, starting from a bare git clone
of lean to see this error?
Locria Cyber (Feb 03 2023 at 22:07):
I hope lean will bootstrap from WASM like Zig. No need to build stage0 then.
Locria Cyber (Feb 03 2023 at 22:13):
CMake Error at runtime/CMakeLists.txt:22 (message):
building 'lean.h.bc', need CMAKE_CXX_COMPILER_ID to match Clang to build
LLVM bitcode file of Lean runtime.
I got this when building stage1
Jireh Loreaux (Feb 03 2023 at 22:14):
sorry, wrong thread
Siddharth Bhat (Feb 03 2023 at 22:15):
@Locria Cyber could you please send a pastebin of:
$ cmake --trace <YOUR_OPTIONS_HERE>
$ VERBOSE=1 make <YOUR_OPTIONS_HERE>
Locria Cyber (Feb 03 2023 at 22:18):
First, I did cmake -G "Unix Makefiles" -B build -DLLVM=ON
gcc is selected as the compiler
I then changed the compiler to clang/clang++ in build/CMakeCache.txt
Locria Cyber (Feb 03 2023 at 22:18):
CMAKE_C_COMPILER:FILEPATH=/usr/bin/cc
to CMAKE_C_COMPILER:FILEPATH=/usr/bin/clang
CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/c++
to CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/clang++
Locria Cyber (Feb 03 2023 at 22:18):
Locria Cyber said:
CMake Error at runtime/CMakeLists.txt:22 (message): building 'lean.h.bc', need CMAKE_CXX_COMPILER_ID to match Clang to build LLVM bitcode file of Lean runtime.
I got this when building stage1
Then this error
Siddharth Bhat (Feb 03 2023 at 22:20):
@Locria Cyber You might want to ensure that you have a system clang
installed. Could you try to nuke your build
directory and retry with:
$ CC=clang CXX=clang++ cmake ../ -DLLVM=ON
I believe that manually editing the CMAKE_{C,CXX}_COMPILER
does not refresh CMAKE_CXX_COMPILER_ID
.
Locria Cyber (Feb 03 2023 at 22:25):
Is there a way to skip stage0 and use existing lean compiler?
Siddharth Bhat (Feb 03 2023 at 22:27):
Not that I'm aware of, perhaps @Sebastian Ullrich knows :)
Locria Cyber (Feb 03 2023 at 22:27):
It's too slow to build it every time
Siddharth Bhat (Feb 03 2023 at 22:29):
@Locria Cyber if you have ccache
, it ought to be much faster the subsequent times
Locria Cyber (Feb 03 2023 at 22:29):
When will the build system gets changed from CMake+Make?
Locria Cyber (Feb 03 2023 at 22:29):
I found the problem. compiler detection of stage1 happens independently.
Henrik Böving (Feb 03 2023 at 22:30):
Locria Cyber said:
When will the build system gets changed from CMake+Make?
That is not at all on the list right now afaik.
Locria Cyber (Feb 03 2023 at 22:32):
It's too flaky.
Locria Cyber (Feb 03 2023 at 22:32):
Locria Cyber said:
I found the problem. compiler detection of stage1 happens independently.
Solved with env var CC, CXX
Siddharth Bhat (Feb 03 2023 at 22:33):
@Locria Cyber thanks, I shall write that down.
Locria Cyber (Feb 03 2023 at 22:40):
Siddharth Bhat said:
Locria Cyber thanks, I shall write that down.
Should I open an issue
Siddharth Bhat (Feb 03 2023 at 22:45):
Please do so.
Locria Cyber (Feb 03 2023 at 22:46):
Now it works (can compile to bc)
However, I got
❯ lean --bc=Hello.bc --target=wasm32-freestanding Hello.lean
'x86-64' is not a recognized processor for this target (ignoring processor)
'+cx8' is not a recognized feature for this target (ignoring feature)
'+fxsr' is not a recognized feature for this target (ignoring feature)
'+mmx' is not a recognized feature for this target (ignoring feature)
'+sse' is not a recognized feature for this target (ignoring feature)
'+sse2' is not a recognized feature for this target (ignoring feature)
'+x87' is not a recognized feature for this target (ignoring feature)
'x86-64' is not a recognized processor for this target (ignoring processor)
'x86-64' is not a recognized processor for this target (ignoring processor)
'+cx8' is not a recognized feature for this target (ignoring feature)
'+fxsr' is not a recognized feature for this target (ignoring feature)
'+mmx' is not a recognized feature for this target (ignoring feature)
'+sse' is not a recognized feature for this target (ignoring feature)
'+sse2' is not a recognized feature for this target (ignoring feature)
'+x87' is not a recognized feature for this target (ignoring feature)
'x86-64' is not a recognized processor for this target (ignoring processor)
Siddharth Bhat (Feb 03 2023 at 22:52):
@Locria Cyber those are warning emitted by the LLVM toolchain, I believe? Does it produce a <FILENAME>.bc.o
file in the same folder, which should be a wasm blob?
Locria Cyber (Feb 03 2023 at 22:52):
Yes it does
Locria Cyber (Feb 03 2023 at 22:56):
Slight problem
Locria Cyber (Feb 03 2023 at 22:56):
the main function signature is incorrect
Locria Cyber (Feb 03 2023 at 22:56):
1 ❯ zig cc --target=wasm32-wasi -lc -o Hello.wasm Hello.bc.o
LLD Link... wasm-ld: warning: function signature mismatch: main
>>> defined as (i32, i32) -> i32 in /home/user/.cache/zig/o/9e4aeb420731c7ddc81a4c7da00c93e4/libc.a(/home/user/.cache/zig/o/e856621094245318444f2a7d4ed86adf/__main_void.o)
>>> defined as (i64, i32) -> i64 in Hello.bc.o
Locria Cyber (Feb 03 2023 at 22:57):
That's definitely not right. wasm32 is 32-bit
Locria Cyber (Feb 03 2023 at 23:22):
@Siddharth Bhat Do you think it's ok to stop at emitting bytecode? I think lean
doesn't need to compile the object file.
Also, it's better to emit LLVM IR (for readability).
Siddharth Bhat (Feb 04 2023 at 09:50):
@Locria Cyber Couple of things:
- Since we already have the LLVM bitcode in memory, it ought to be faster for
lean
to build the object file, rather than paying the cost of savingfile.bc
, then launchllc
to compilefile.bc
intofile.o
. I'll be benchmarking this soon :tm: - One can always go back from LLVM bitcode to readable LLVM with
opt -S file.bc -o file.llvm
Sebastian Ullrich (Feb 04 2023 at 09:56):
Writing out .bc
would probably be better for implementing ccache-like caching, or even the basic version "do not recompile .o if .bc is exactly the same as before" that we can implement even in Make
Locria Cyber (Feb 15 2023 at 16:53):
Hello @Sebastian Ullrich . Is there a way to skip stage0 and use existing lean
?
Locria Cyber (Feb 15 2023 at 16:56):
I found a way
cd build/
rm -r stage0
rsync -a ~/.elan/toolchains/leanprover--lean4---nightly/ stage0/
touch stage0-prefix/src/stage0-stamp/stage0-build
# now stage0 is "built"
make
Henrik Böving (Feb 15 2023 at 17:35):
i think you should at least make -t stage0-prefix/src/stage0-stamp/stage0-build
to make that reliable
Last updated: Dec 20 2023 at 11:08 UTC