Zulip Chat Archive

Stream: lean4

Topic: Lean Backend


Schrodinger ZHU Yifan (Sep 30 2023 at 04:16):

It seems to me that to write a backend for lean, I do need to rewrite a lot of functions that is in lean.h.
Is there a better way? Some current sketch:

import LeanGccBackend.Basic
import LeanGccJit.Core

open LeanGccJit.Core
namespace Lean.IR
namespace GccJit

def getLeanObject : CodegenM (Struct × Array Field) :=
  getOrCreateStruct "lean_object" do
    let ctx  getCtx
    let int  ctx.getType TypeEnum.Int
    let unsigned  ctx.getType TypeEnum.UnsignedInt
    let m_rc  ctx.newField none int "m_rc"
    let m_cs_sz  ctx.newBitField none unsigned 16 "m_cs_sz"
    let m_other  ctx.newBitField none unsigned 8 "m_other"
    let m_tag  ctx.newBitField none unsigned 8 "m_tag"
    let obj  ctx.newStructType none "lean_object" #[m_rc, m_cs_sz, m_other, m_tag]
    pure (obj, #[m_rc, m_cs_sz, m_other, m_tag])

def getLeanObjPtr : CodegenM JitType :=
  getLeanObject >>= (·.fst.asJitType) >>= (·.getPointer)

def getLeanBox : CodegenM Func :=
  getOrCreateFunction "lean_box" do
    let ctx  getCtx
    let obj  getLeanObjPtr
    let size_t  ctx.getType TypeEnum.SizeT
    let input  ctx.newParam none size_t "value"
    let leanBox  ctx.newFunction none FunctionKind.AlwaysInline obj "lean_box" #[input] false
    let block  leanBox.newBlock "entry"
    let one  ctx.one size_t
    let shifted  ctx.newBinaryOp none BinaryOp.LShift size_t ( input.asRValue) one
    let set  ctx.newBinaryOp none BinaryOp.BitwiseOr size_t shifted one
    let casted  ctx.newBitCast none set obj
    block.endWithReturn none casted
    pure leanBox

def getLeanUnbox : CodegenM Func :=
  getOrCreateFunction "leam_unbox" do
    let ctx  getCtx
    let obj  getLeanObjPtr
    let size_t  ctx.getType TypeEnum.SizeT
    let input  ctx.newParam none obj "boxed"
    let leanUnbox  ctx.newFunction none FunctionKind.AlwaysInline obj "lean_unbox" #[input] false
    let block  leanUnbox.newBlock "entry"
    let casted  ctx.newBitCast none ( input.asRValue) size_t
    let shifted  ctx.newBinaryOp none BinaryOp.RShift size_t casted ( ctx.one size_t)
    block.endWithReturn none shifted
    pure leanUnbox

def getLeanIsScalar : CodegenM Func :=
  getOrCreateFunction "lean_is_scalar" do
    let ctx  getCtx
    let obj  getLeanObjPtr
    let bool  ctx.getType TypeEnum.Bool
    let input  ctx.newParam none obj "obj"
    let leanIsScalar  ctx.newFunction none FunctionKind.AlwaysInline bool "lean_is_scalar" #[input] false
    let block  leanIsScalar.newBlock "entry"
    let casted  ctx.newBitCast none ( input.asRValue) ( ctx.getType TypeEnum.SizeT)
    let one  ctx.one ( ctx.getType TypeEnum.SizeT)
    let bitand  ctx.newBinaryOp none BinaryOp.BitwiseAnd ( ctx.getType TypeEnum.SizeT) casted one
    let cmp  ctx.newComparison none Comparison.EQ bitand one
    block.endWithReturn none cmp
    pure leanIsScalar

def importFunction (name: String) (ret: JitType) (params: Array (String × JitType)) : CodegenM Func :=
  getOrCreateFunction name do
    let ctx  getCtx
    let mut inputs := #[]
    for (name, ty) in params do
      inputs := inputs.push ( ctx.newParam none ty name)
    ctx.newFunction none FunctionKind.Imported ret name inputs false

def getLeanMarkPersistent : CodegenM Func := do
  let ctx  getCtx
  let void  ctx.getType TypeEnum.Void
  let obj  getLeanObjPtr
  importFunction "lean_mark_persistent" void #[("obj", obj)]

def getLeanAllocSmall : CodegenM Func := do
  let ctx  getCtx
  let voidPtr  ctx.getType TypeEnum.VoidPtr
  let unsigned  ctx.getType TypeEnum.UnsignedInt
  importFunction "lean_alloc_small" voidPtr #[("sz", unsigned), ("slot_idx", unsigned)]

def getLeanFreeSmall : CodegenM Func := do
  let ctx  getCtx
  let void  ctx.getType TypeEnum.Void
  let voidPtr  ctx.getType TypeEnum.VoidPtr
  importFunction "lean_free_small" void #[("p", voidPtr)]

def getLeanSmallMemSize : CodegenM Func := do
  let ctx  getCtx
  let unsigned  ctx.getType TypeEnum.UnsignedInt
  let voidPtr  ctx.getType TypeEnum.VoidPtr
  importFunction "lean_small_mem_size" unsigned #[("p", voidPtr)]

def getLeanIncHeartbeat : CodegenM Func := do
  let ctx  getCtx
  let void  ctx.getType TypeEnum.Void
  importFunction "lean_inc_heartbeat" void #[]

def callLeanBox (value : RValue) : CodegenM RValue := do
  let ctx  getCtx
  let leanBox  getLeanBox
  ctx.newCall none leanBox #[value]

def callLeanMarkPersistent (obj : RValue) : CodegenM RValue := do
  let ctx  getCtx
  let leanMarkPersistent  getLeanMarkPersistent
  ctx.newCall none leanMarkPersistent #[obj]

Mario Carneiro (Sep 30 2023 at 04:18):

(typo: leam_unbox)

Mario Carneiro (Sep 30 2023 at 04:18):

but I suppose that makes your point that with a lot of functions there is a lot of potential for mistakes

Mario Carneiro (Sep 30 2023 at 04:19):

Is there a way to feed a header file into gccgit?

Mario Carneiro (Sep 30 2023 at 04:19):

because that does seem pretty painful

Schrodinger ZHU Yifan (Sep 30 2023 at 04:20):

I think the LLVM.Backend is also repeating the definitions. So painful.

Schrodinger ZHU Yifan (Sep 30 2023 at 04:21):

I imagine that one day the ABI changes, and then a lot of repeated modifications are required.

Mario Carneiro (Sep 30 2023 at 04:22):

is it possible for the LLVM.Backend to at least share code with this implementation?

Schrodinger ZHU Yifan (Sep 30 2023 at 04:26):

It will be hard I think. Mine is personal interest while the LLVM one is official. I mean, there may be generic abstraction that fit both backends, but LLVM is the only one inside lean, so ad-hoc code is better :D

Mario Carneiro (Sep 30 2023 at 04:27):

well, I mean it more in the opposite sense, maybe the official LLVM backend can expose things in such a way that you can use it for your backend

Mario Carneiro (Sep 30 2023 at 04:27):

that said, I am not that surprised that writing a backend is a lot of work. It's not like we expect that to be a common activity

Mario Carneiro (Sep 30 2023 at 04:30):

maybe you should define some abstraction for describing the functions (maybe, C code?) and instantiate it to both backends

Mario Carneiro (Sep 30 2023 at 04:30):

that's probably more efficient than writing things like this even if it is only for you

Mario Carneiro (Sep 30 2023 at 04:31):

(efficient in terms of effort and entropy in the file, not necessarily runtime)

Schrodinger ZHU Yifan (Sep 30 2023 at 04:31):

Mario Carneiro said:

that said, I am not that surprised that writing a backend is a lot of work. It's not like we expect that to be a common activity

Yeah. It requires efforts.

Mario Carneiro (Sep 30 2023 at 04:33):

I wonder, is Alloy able to parse lean.h? Alternatively, you could call clang to parse the file. Either way, you could get the AST and translate it into function definitions as a way to automate this

Mario Carneiro (Sep 30 2023 at 04:34):

but that might be harder than just writing it out by hand :smile:

Schrodinger ZHU Yifan (Sep 30 2023 at 04:37):

And, it may induce extra runtime cost :crying_cat:.

Well, I would say that being FBIP-flavor, koka and lean's runtime is already lighter than many other PLs.

Maybe I'll just put it up bit by bit in my leisure time.

Mario Carneiro (Sep 30 2023 at 04:48):

you can do elab-time codegen if you want to remove the runtime cost

Mario Carneiro (Sep 30 2023 at 04:48):

i.e. making a macro which generates the code you have written above

Mario Carneiro (Sep 30 2023 at 04:49):

then the cost would be shifted to compiling the library itself

Mac Malone (Sep 30 2023 at 22:06):

Mario Carneiro said:

I wonder, is Alloy able to parse lean.h?

Sadly, not at the moment. The opencompl C Parser may be able to (and may be a better fit in this kind of use case anyway). However, it does require a custom toolchain since lean4#2293 has yet to land.

Mario Carneiro (Sep 30 2023 at 22:10):

it's a bit funny that the project is considered 96% C code even though it's a lean project

Mario Carneiro (Sep 30 2023 at 22:10):

I'm guessing there must be some big tests

Mac Malone (Sep 30 2023 at 22:13):

@Mario Carneiro A quick skim gave, as an example test, shell_clangd.c which is 23147 lines long. So, yeah, I think you are right about the big tests. :rofl:

Schrodinger ZHU Yifan (Sep 30 2023 at 22:51):

https://github.com/SchrodingerZhu/LeanGccBackend/blob/main/LeanGccBackend/Runtime.lean

Schrodinger ZHU Yifan (Sep 30 2023 at 22:52):

porting runtime all day long — not even half :stuck_out_tongue_closed_eyes:

Mario Carneiro (Sep 30 2023 at 22:53):

is there a reason ctx is an explicit argument to functions like getType? It could just pull the value out of the monad

Schrodinger ZHU Yifan (Sep 30 2023 at 22:55):

no. I should have wrapped getType.

Schrodinger ZHU Yifan (Sep 30 2023 at 22:59):

also some common type can be made easier to construct?

Schrodinger ZHU Yifan (Sep 30 2023 at 22:59):

like size_t/void/unsigned

Mario Carneiro (Sep 30 2023 at 22:59):

I would spend more time trying to make the writing of these functions simpler, e.g.:

def getLeanArrayObject : CodegenM (Struct × Array Field) :=
  mkStruct "lean_array_object" fun _ => return #[
     field ( obj) "m_header",
     field ( size_t) "m_size",
     field ( size_t) "m_capacity",
     field ( array ( obj) 0) "m_data"
  ]

Mario Carneiro (Sep 30 2023 at 23:01):

actually, if field took a monadic function as its first argument and so did array, you could simplify this to

def getLeanArrayObject : CodegenM (Struct × Array Field) :=
  mkStruct "lean_array_object" fun _ => return #[
     field obj "m_header",
     field size_t "m_size",
     field size_t "m_capacity",
     field (array obj 0) "m_data"
  ]

Mario Carneiro (Sep 30 2023 at 23:02):

and that's without even pulling out the macros, which could give this almost C-like syntax if you wanted

Mario Carneiro (Sep 30 2023 at 23:03):

which would make it easier to just copy-paste-modify from lean.h

Schrodinger ZHU Yifan (Sep 30 2023 at 23:04):

fair enough. also, some if then else branch helper would also reduce the code

Mario Carneiro (Sep 30 2023 at 23:05):

you have to name every basic block? That's messy

Mario Carneiro (Sep 30 2023 at 23:06):

if you call newBlock twice with the same name, what happens?

Schrodinger ZHU Yifan (Sep 30 2023 at 23:07):

that label can be none

Schrodinger ZHU Yifan (Sep 30 2023 at 23:07):

But I did not set default values in the upstream

Schrodinger ZHU Yifan (Sep 30 2023 at 23:09):

Also, all those none passing around is for debug info. I don’t put them in the last place cuz I wanted to keep them in the same order of the CAPI. But I think it is a good decision

Schrodinger ZHU Yifan (Sep 30 2023 at 23:09):

not good

Mario Carneiro (Sep 30 2023 at 23:09):

you can just have a defaulted parameter if it's almost always none

Mario Carneiro (Sep 30 2023 at 23:11):

besides, the functions I am showing here like size_t, array etc are not meant to be coming from LeanGccJit, they are wrappers you have in this file to compress the task of writing these functions

Mario Carneiro (Sep 30 2023 at 23:11):

which means they can be more tailored to exactly how you intend to use the API in this file

Mario Carneiro (Sep 30 2023 at 23:12):

they may or may not be useful for other purposes

Mario Carneiro (Sep 30 2023 at 23:13):

but creating DSLs on the spot is one of lean's selling points, you should take advantage of it whenever you notice you are writing a lot of boilerplate

Schrodinger ZHU Yifan (Oct 01 2023 at 04:37):

I think I have made it much shorter now: https://github.com/SchrodingerZhu/LeanGccBackend/blob/main/LeanGccBackend/Runtime.lean

There are still a lot of arrows flying around --- I actually begin with monoidal parameters, but then I find that such operations are lazy: If you pass a monoidal variable, the evaluation time and order are nondeterministic. Thus, I end up using arrows anyway.

Sebastian Ullrich (Oct 01 2023 at 06:01):

Schrodinger ZHU Yifan said:

I think the LLVM.Backend is also repeating the definitions

It isn't, see lean.h.bc

Mac Malone (Oct 01 2023 at 09:15):

Mac Malone said:

Mario Carneiro said:

I wonder, is Alloy able to parse lean.h?

Sadly, not at the moment.

Alloy is now able to parse lean.h with minimal overhead:

import Alloy.C.Grammar
import Lean.Elab.Command

open Lean Parser Elab Command

/- By Mario: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/Undefine.20lambda/near/393694214 -/
def Lean.Parser.Trie.isEmpty {α} (t : Trie α) : Bool := t matches .Node none .leaf
partial def Lean.Parser.Trie.erase {α} (t : Trie α) (s : String) : Trie α :=
  let rec loop : Trie α  String.Pos  Option (Trie α)
    | val, m⟩, i =>
      match s.atEnd i with
      | true  => some (Trie.Node none m)
      | false => do
        let c := s.get i
        let i := s.next i
        let t  m.find compare c
        let t  loop t i
        let m := if t.isEmpty then m.erase compare c else m.insert compare c t
        some val, m
  (loop t 0).getD t

def mkParserModuleContext (env : Environment) (scope : Scope) : ParserModuleContext where
  env := env
  options := scope.opts
  currNamespace := scope.currNamespace
  openDecls := scope.openDecls

syntax "LEAN_CASSERT" "(" cExpr ")" ";" : cCmd
syntax "extern" str "{" : cCmd
syntax "}" : cCmd

partial def test : CommandElabM PUnit := do
  let fileName := "lean.h"
  let includeDir := ( Lean.getBuildDir) / "include" / "lean"
  let input  IO.FS.readFile <| includeDir / fileName
  let input := input.replace "--" "-=1"
  let p := categoryParser `cCmd 0
  let ictx := mkInputContext input fileName
  let env  getEnv
  let env := parserExtension.modifyState env fun s =>
    {s with tokens := s.tokens.insert "~" "~" |>.erase "fun" |>.erase "end"}
  let pmctx := mkParserModuleContext env ( getScope)
  go p.fn ictx pmctx (mkParserState ictx.input)
where
  go p ictx pmctx s := do
    let s := p.run ictx pmctx (getTokenTable pmctx.env) s
    if s.hasError then
      logError <| s.toErrorMsg ictx
    else if ictx.input.atEnd s.pos then
      pure ()
    else
      go p ictx pmctx s

#eval test -- no error

Mario Carneiro (Oct 01 2023 at 09:22):

you don't support --?

Mario Carneiro (Oct 01 2023 at 09:22):

...oh this is a lean comment issue isn't it

Schrodinger ZHU Yifan (Oct 01 2023 at 14:58):

Sebastian Ullrich said:

Schrodinger ZHU Yifan said:

I think the LLVM.Backend is also repeating the definitions

It isn't, see lean.h.bc

that is good. unfortunately, libgccjit is so simple that it does not have IR/Bytecode support.

Schrodinger ZHU Yifan (Oct 04 2023 at 01:23):

After writing several levels of the wrapper, I have made the codegen straightforward:

def getModuleInitializationFunction : CodegenM Func := do
  let bool  bool
  let uint8_t  uint8_t
  let obj_ptr  «lean_object
  let env  getEnv
  let modName  getModName
  let mut importedInits := #[]
  for i in env.imports do
    let f  importFunction (mkModuleInitializationFunctionName i.module) obj_ptr #[(uint8_t, "builtin"), (obj_ptr, "w")]
    importedInits := importedInits.push f
  let _G_initialized  getOrCreateGlobal "_G_initialized" bool (init := constantZero bool)
  mkFunctionM (mkModuleInitializationFunctionName modName) obj_ptr #[(uint8_t, "builtin"), (obj_ptr, "w")] (kind := FunctionKind.Exported) do
    let res  mkLocalVarM obj_ptr "res"
    let epilogue  mkNewBlock "epilogue"
    mkIfBranchM _G_initialized
      (do
        goto epilogue
      )
      (do
        mkAssignmentM _G_initialized ( constantOne bool)
        for i in importedInits do
          mkAssignmentM res (call i ( getParamM! 0,  getParamM! 1))
          let isErr  call ( getLeanIOResultIsError) res
          mkIfBranchM isErr
            (do
              mkReturnM res
            )
            (do
              mkEvalM (call ( getLeanDecRef) res)
            )
        goto epilogue
      )
    moveTo epilogue
    epilogue.addComment none "TODO: generate decl init code"
    let unit  call ( getLeanBox) ( constantZero ( size_t))
    let ok  call ( getLeanIOResultMkOk) unit
    mkReturnM ok

However, I think I will need more help in many ways. Does anyone interested in looking into this and helping me write/review the codegen?

Schrodinger ZHU Yifan (Oct 08 2023 at 22:58):

image.png
Lean GccJit Backend just compiled its first working example. (There is still a lot of work to do)

Sebastian Ullrich (Oct 09 2023 at 07:12):

I'm curious, what were the reasons that made you go with GCC instead of LLVM JIT?

Schrodinger ZHU Yifan (Oct 10 2023 at 00:45):

Sebastian Ullrich said:

I'm curious, what were the reasons that made you go with GCC instead of LLVM JIT?

:laughing:, just to test around!

Schrodinger ZHU Yifan (Oct 10 2023 at 00:47):

def getLeanNatSub : CodegenM Func :=
  getLeanNatBinOp "sub" fun blk a b => do
    let size_t  size_t
    let one  constantOne size_t
    let a  a ::! size_t
    let b  (b ::! size_t) &&& ( ·~· one)
    let (result, overflow)  overflowCheck blk "result" a "sub" b
    let overflow  unlikely overflow
    mkIfBranch blk overflow
      (fun then_ => do
        let unit  call ( getLeanBox) ( constantZero size_t)
        mkReturn then_ unit
      )
      (fun else_ => do
        mkReturn else_ ( result ::! (←«lean_object))
      )

Schrodinger ZHU Yifan (Oct 10 2023 at 00:48):

I think some nat/int operations in lean.h can be further optimized. For example, the above routine (together with add) in my gccjit backend makes (fib 47) 3 seconds faster than current lean.h implementation.

Schrodinger ZHU Yifan (Oct 10 2023 at 00:50):

Sebastian Ullrich said:

I'm curious, what were the reasons that made you go with GCC instead of LLVM JIT?

allow me to quote the section in rust’s gccjit backend:

“The primary goal of this project is to be able to compile Rust code on platforms unsupported by LLVM. A secondary goal is to check if using the gcc backend will provide any run-time speed improvement for the programs compiled using rustc.”

Schrodinger ZHU Yifan (Oct 19 2023 at 04:06):

089F081C-775E-41D0-A8FC-E4D2661330DC.jpg

Schrodinger ZHU Yifan (Oct 21 2023 at 03:44):

Call For Help:

This project needs more help writing commits as in https://github.com/SchrodingerZhu/LeanGccBackend/pull/1 to support full features from lean's runtime.

Schrodinger ZHU Yifan (Oct 21 2023 at 03:48):

For example, this branch currently failed to build as the compiler does not know some functions from lean.h:
image.png

One can try to add definitions to these functions to get it work! Believe me --- It is fun writing the code in lean to compose Gimple (GCC's IR)!
I hope to get people's help just because there are too many of them :D

Utensil Song (Oct 21 2023 at 05:16):

Related: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/.E2.9C.94.20FFI.20to.20C.2B.2B.3A.20GiNaC/near/392446969 , particularly the log in the spoiler.

Try supplying -v, then you'll see leanc will call gcc or clang, then there should be something like -lLean which I believe to be where these symbols are located.

Utensil Song (Oct 21 2023 at 05:32):

Oh, these symbols are not in .a as confirmed by nm, but in the dynamic library libleanshared.so/dylib/dll.

Utensil Song (Oct 21 2023 at 05:43):

I thought you were missing some --load-dynlib in weakLeanArgs but they shouldn't be needed for Lean itself. So I suspected your lean.h is incompatible with the toolchain but you are using leanprover/lean4:nightly-2023-10-13 which is not too old.

Utensil Song (Oct 21 2023 at 05:44):

Finally, I realized that what you are missing are all static inline functions implemented in lean.h, that might be the cause.

Utensil Song (Oct 21 2023 at 05:53):

That means these symbols are now in your FFI library, and they need to be linked properly (especially when building a standalone exe as Lean itself is no longer involved during run time), you may check the flags in entries like package, lean_lib, lean_exe, and also target libginac_ffi in https://github.com/utensil/ginac-lean/blob/main/lakefile.lean to see what might be needed to add, but it would be less as your project is C, not C++.

Schrodinger ZHU Yifan (Oct 21 2023 at 06:24):

Thanks for providing a way to locate similar issues. However, linkage and missing symbol is not a problem here. I am actually looking for people interested in writing yet another backend for Lean to help fill up those static inline functions in Lean. As stated above, there are a lot of inline functions to be ported to GCCJIT runtime so I am hoping to get help :D

I am using the branch as an example of how specific functions (e.g. lean_nat_mod) are added to the GCCJIT backend.

Utensil Song (Oct 21 2023 at 06:34):

Oh sorry, I misunderstood your goal. I read your PR, it seems writing them in Lean would involve lots of boilerplates, is it more fun if you can just parse what's already in lean.h then generate the Lean code?

Schrodinger ZHU Yifan (Oct 21 2023 at 06:49):

Yes and No. C syntax is actually not that simple. for example, typecast/_Atomic actually makes the syntax context-dependent; and generating the code to GIMPLE would potentially require more work.

On the other hand, I think it is also worth check if rewriting lean.h can give a better codegen. As I mentioned before, lean.h does not seem performance-optimal to me. On aarch64, I speed up Fib 47 from 12s to 9s by saving a few boxing/unboxing instructions (using signed multiplication overflow/making fast path even shorter in bit operations or compare operations).

BTW, I think upstream should be aware of such optimizations. I guess the reason why these optimizations are not applied to lean.h is just that writing so may affect inter-compiler portability and readability and they are not that important as people seldom do such concentrated computations in Lean, as least for now. Am I correct?

Anyway, let this project to examine if such codegen optimization works can be helpful.

Schrodinger ZHU Yifan (Dec 04 2023 at 02:58):

I am sorry that project has no update for a while. I am currently burried in my school work and various of other things.


Last updated: Dec 20 2023 at 11:08 UTC