Zulip Chat Archive

Stream: lean4

Topic: The state of the code generator


Hanting Zhang (Jan 08 2023 at 16:48):

From the recent thread :upside_down::

this is a workaround for the fact that there are two lean compilers

Can someone explain some of the end-to-end details behind what is happening with the code generator? I've been able to piece things together by going to dev meetings, looking at the PR history, and scanning the compiler code base, so I have ~some idea of what's going on. However, I've never heard the "full story" and would appreciate any insight!

Henrik Böving (Jan 08 2023 at 17:11):

There exists a 2 part old code generator, the first part is written in C++ in src/library/compiler, the 2nd part is written in Lean in src/Lean/Compiler/IR. The Lean part implements the lambda RC part of the pipeline, most notably the garbage collection optimizations + outputting C or LLVM code based on that. This part is written in Lean due to historical reasons.

src/Lean/Compiler/LCNF is right now a reimplementation of src/library/compiler in Lean and better. It is not fully functional because we cannot generate actual executable code with it yet but basically all relevant parts of the old C++ part are already ported here and I do have a branch laying around that gets 90% the way there to attach the LCNF Lean implementation to the IR implementation so that we can produce executable code, we're probably gonna integrate that soon :tm: Once that works we can basically drop the C++ part of the compiler.

The next step from there would be to port src/Lean/Compiler/IR to the LCNF representation, this involves two major things:

  1. We will need to port the new LLVM backend that is available for the IR part right now. I don't know if we want to port the C backend that is in use right now as well.
  2. We will need to port the lambda rc optimizations

Once we have that tackled we can also drop src/Lean/Compiler/IR, then the entire compiler will live in src/Lean/Compiler/LCNF and we can be happy at last :octopus:

Hanting Zhang (Jan 08 2023 at 17:21):

Does it look something like this?

Lean -> LCNF (originally in `library/compiler`, rewritten in `Lean/Compiler/LCNF`; theoretically done, but untested)
LCNF -> LCNF.IR (originally ???, rewritten in Lean soon?; not done, but should be soon)
                (afaiu Perceus is written in Lean)
LCNF.IR -> LLVM (originally `C/C++` in `EmitC.lean`?, rewritten in `EmitLLVM.lean`?; theoretically done?)

Hanting Zhang (Jan 08 2023 at 17:22):

What does "porting the LLVM/C backend" mean?

Henrik Böving (Jan 08 2023 at 17:23):

Yes it does look like that pretty much

Right now IR supports outputting the final code as both LLVM and C, if we want to get rid off IR we have to reimplement that for the LCNF format.

Hanting Zhang (Jan 08 2023 at 17:35):

By get rid of IR, do you mean removing the LCNF.IR type? Or is the plan to move LCNF.IR into Compiler/LCNF and reimplement it, porting the lowering/Perceus/EmitLLVM

Also src/runtime will remain, right? Then we will link against its binaries as normal

Hanting Zhang (Jan 08 2023 at 17:36):

(deleted)

Henrik Böving (Jan 08 2023 at 17:43):

We want to drop all of the old code generator and implement it completely with the LCNF type in one pipeline and yes that involves porting perceus and EmitLLVM to LCNF.

I am not aware of plans to touch the runtime, however we might end up eventually reimplementing the C++ based interpeter while we are at it.

Hanting Zhang (Jan 08 2023 at 17:48):

Ah so the new pipeline should be

Lean -> LCNF
LCNF -> LCNF (base phase, mono phase)
LCNF -> LCNF (refcounting here?)
LCNF -> LLVM

Henrik Böving (Jan 08 2023 at 17:54):

Yes

Hanting Zhang (Jan 08 2023 at 17:55):

:O ok interesting, thank you!

Locria Cyber (Feb 03 2023 at 21:20):

Henrik Böving said:

Yes it does look like that pretty much

Right now IR supports outputting the final code as both LLVM and C, if we want to get rid off IR we have to reimplement that for the LCNF format.

How to use the LLVM backend?

Henrik Böving (Feb 03 2023 at 21:29):

There was a flag for it I tihnk? @Siddharth Bhat should know

Locria Cyber (Feb 03 2023 at 21:38):

I found the flag in CMakefiles.txt. I had to manually edit those.
After I enabled it, what CLI flags do I have to use?

Siddharth Bhat (Feb 03 2023 at 21:44):

-DLLVM=ON. Note that one needs to point to a copy of llvm-config that is LLVM 15.

Locria Cyber (Feb 03 2023 at 21:45):

Siddharth Bhat said:

-DLLVM=ON. Note that one needs to point to a copy of llvm-config that is LLVM 15.

The CMake flag is not public

Locria Cyber (Feb 03 2023 at 21:45):

It's not editable with make edit_cache

Locria Cyber (Feb 03 2023 at 21:56):

I got this error

lean_llvm_parse_bitcode ERROR: file doesn't start with bitcode header
LEAN ASSERTION VIOLATION
File: compiler/llvm.cpp
Line: 1065
!is_error && "failed to parse bitcode"
'unreachable' code was reached

Siddharth Bhat (Feb 03 2023 at 21:56):

@Locria Cyber interesting, I have not used edit_cache. I usually configure these with cmake ../ -DLLVM=ON -DLLVM_CONFIG=/path/to/llvm/15 . I shall change the flag to be public once we are less overwhelmed with high priority mathlib porting changes.

Locria Cyber (Feb 03 2023 at 21:56):

(deleted)

Locria Cyber (Feb 03 2023 at 21:57):

Siddharth Bhat said:

Locria Cyber interesting, I have not used edit_cache. I usually configure these with cmake ../ -DLLVM=ON -DLLVM_CONFIG=/path/to/llvm/15 . I shall change the flag to be public once we are less overwhelmed with high priority mathlib porting changes.

I can do that. There are two CMakeLists.txt, one in stage0/. It seems like I have to change both

Siddharth Bhat (Feb 03 2023 at 22:00):

@Locria Cyber there is no need. Right now, even with the LLVM backend enabled, lean still compiles _itself_ using the C backend. Compiling the stage2 lean compiler via the LLVM backend is under development: https://github.com/bollu/lean4/tree/llvm/stage2-build

Locria Cyber (Feb 03 2023 at 22:01):

Locria Cyber said:

I got this error

lean_llvm_parse_bitcode ERROR: file doesn't start with bitcode header
LEAN ASSERTION VIOLATION
File: compiler/llvm.cpp
Line: 1065
!is_error && "failed to parse bitcode"
'unreachable' code was reached

It seems like my build is broken. I built it as above, and used elan toolchain link xxx bulid/stage1 to set it as the default toolchain.

Siddharth Bhat (Feb 03 2023 at 22:03):

@Locria Cyber could you please help me reproduce this locally? What should I do, starting from a bare git clone of lean to see this error?

Locria Cyber (Feb 03 2023 at 22:07):

I hope lean will bootstrap from WASM like Zig. No need to build stage0 then.

Locria Cyber (Feb 03 2023 at 22:13):

CMake Error at runtime/CMakeLists.txt:22 (message):
  building 'lean.h.bc', need CMAKE_CXX_COMPILER_ID to match Clang to build
  LLVM bitcode file of Lean runtime.

I got this when building stage1

Jireh Loreaux (Feb 03 2023 at 22:14):

sorry, wrong thread

Siddharth Bhat (Feb 03 2023 at 22:15):

@Locria Cyber could you please send a pastebin of:

$ cmake --trace <YOUR_OPTIONS_HERE>
$ VERBOSE=1 make <YOUR_OPTIONS_HERE>

Locria Cyber (Feb 03 2023 at 22:18):

First, I did cmake -G "Unix Makefiles" -B build -DLLVM=ON
gcc is selected as the compiler
I then changed the compiler to clang/clang++ in build/CMakeCache.txt

Locria Cyber (Feb 03 2023 at 22:18):

CMAKE_C_COMPILER:FILEPATH=/usr/bin/cc to CMAKE_C_COMPILER:FILEPATH=/usr/bin/clang
CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/c++ to CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/clang++

Locria Cyber (Feb 03 2023 at 22:18):

Locria Cyber said:

CMake Error at runtime/CMakeLists.txt:22 (message):
  building 'lean.h.bc', need CMAKE_CXX_COMPILER_ID to match Clang to build
  LLVM bitcode file of Lean runtime.

I got this when building stage1

Then this error

Siddharth Bhat (Feb 03 2023 at 22:20):

@Locria Cyber You might want to ensure that you have a system clang installed. Could you try to nuke your build directory and retry with:

$ CC=clang CXX=clang++ cmake ../ -DLLVM=ON

I believe that manually editing the CMAKE_{C,CXX}_COMPILER does not refresh CMAKE_CXX_COMPILER_ID.

Locria Cyber (Feb 03 2023 at 22:25):

Is there a way to skip stage0 and use existing lean compiler?

Siddharth Bhat (Feb 03 2023 at 22:27):

Not that I'm aware of, perhaps @Sebastian Ullrich knows :)

Locria Cyber (Feb 03 2023 at 22:27):

It's too slow to build it every time

Siddharth Bhat (Feb 03 2023 at 22:29):

@Locria Cyber if you have ccache, it ought to be much faster the subsequent times

Locria Cyber (Feb 03 2023 at 22:29):

When will the build system gets changed from CMake+Make?

Locria Cyber (Feb 03 2023 at 22:29):

I found the problem. compiler detection of stage1 happens independently.

Henrik Böving (Feb 03 2023 at 22:30):

Locria Cyber said:

When will the build system gets changed from CMake+Make?

That is not at all on the list right now afaik.

Locria Cyber (Feb 03 2023 at 22:32):

It's too flaky.

Locria Cyber (Feb 03 2023 at 22:32):

Locria Cyber said:

I found the problem. compiler detection of stage1 happens independently.

Solved with env var CC, CXX

Siddharth Bhat (Feb 03 2023 at 22:33):

@Locria Cyber thanks, I shall write that down.

Locria Cyber (Feb 03 2023 at 22:40):

Siddharth Bhat said:

Locria Cyber thanks, I shall write that down.

Should I open an issue

Siddharth Bhat (Feb 03 2023 at 22:45):

Please do so.

Locria Cyber (Feb 03 2023 at 22:46):

Now it works (can compile to bc)
However, I got

 lean --bc=Hello.bc --target=wasm32-freestanding Hello.lean
'x86-64' is not a recognized processor for this target (ignoring processor)
'+cx8' is not a recognized feature for this target (ignoring feature)
'+fxsr' is not a recognized feature for this target (ignoring feature)
'+mmx' is not a recognized feature for this target (ignoring feature)
'+sse' is not a recognized feature for this target (ignoring feature)
'+sse2' is not a recognized feature for this target (ignoring feature)
'+x87' is not a recognized feature for this target (ignoring feature)
'x86-64' is not a recognized processor for this target (ignoring processor)
'x86-64' is not a recognized processor for this target (ignoring processor)
'+cx8' is not a recognized feature for this target (ignoring feature)
'+fxsr' is not a recognized feature for this target (ignoring feature)
'+mmx' is not a recognized feature for this target (ignoring feature)
'+sse' is not a recognized feature for this target (ignoring feature)
'+sse2' is not a recognized feature for this target (ignoring feature)
'+x87' is not a recognized feature for this target (ignoring feature)
'x86-64' is not a recognized processor for this target (ignoring processor)

Siddharth Bhat (Feb 03 2023 at 22:52):

@Locria Cyber those are warning emitted by the LLVM toolchain, I believe? Does it produce a <FILENAME>.bc.o file in the same folder, which should be a wasm blob?

Locria Cyber (Feb 03 2023 at 22:52):

Yes it does

Locria Cyber (Feb 03 2023 at 22:56):

Slight problem

Locria Cyber (Feb 03 2023 at 22:56):

the main function signature is incorrect

Locria Cyber (Feb 03 2023 at 22:56):

1  zig cc --target=wasm32-wasi -lc -o Hello.wasm Hello.bc.o
LLD Link... wasm-ld: warning: function signature mismatch: main
>>> defined as (i32, i32) -> i32 in /home/user/.cache/zig/o/9e4aeb420731c7ddc81a4c7da00c93e4/libc.a(/home/user/.cache/zig/o/e856621094245318444f2a7d4ed86adf/__main_void.o)
>>> defined as (i64, i32) -> i64 in Hello.bc.o

Locria Cyber (Feb 03 2023 at 22:57):

That's definitely not right. wasm32 is 32-bit

Locria Cyber (Feb 03 2023 at 23:22):

@Siddharth Bhat Do you think it's ok to stop at emitting bytecode? I think lean doesn't need to compile the object file.
Also, it's better to emit LLVM IR (for readability).

Siddharth Bhat (Feb 04 2023 at 09:50):

@Locria Cyber Couple of things:

  1. Since we already have the LLVM bitcode in memory, it ought to be faster for lean to build the object file, rather than paying the cost of saving file.bc, then launch llc to compile file.bc into file.o. I'll be benchmarking this soon :tm:
  2. One can always go back from LLVM bitcode to readable LLVM with opt -S file.bc -o file.llvm

Sebastian Ullrich (Feb 04 2023 at 09:56):

Writing out .bc would probably be better for implementing ccache-like caching, or even the basic version "do not recompile .o if .bc is exactly the same as before" that we can implement even in Make

Locria Cyber (Feb 15 2023 at 16:53):

Hello @Sebastian Ullrich . Is there a way to skip stage0 and use existing lean?

Locria Cyber (Feb 15 2023 at 16:56):

I found a way

cd build/
rm -r stage0
rsync -a ~/.elan/toolchains/leanprover--lean4---nightly/ stage0/
touch stage0-prefix/src/stage0-stamp/stage0-build
# now stage0 is "built"
make

Henrik Böving (Feb 15 2023 at 17:35):

i think you should at least make -t stage0-prefix/src/stage0-stamp/stage0-build to make that reliable


Last updated: Dec 20 2023 at 11:08 UTC