Zulip Chat Archive

Stream: general

Topic: Parallel Lean Repl


Frederick Pu (Apr 08 2025 at 18:35):

I've written the following batch proof verifying in lean repl

unsafe def getHeaderEnv (header : String) : IO Command.State := do
  Lean.initSearchPath ( Lean.findSysroot)
  enableInitializersExecution
  let inputCtx   := Parser.mkInputContext header "<input>"
  let (header, parserState, messages)   Parser.parseHeader inputCtx
  let (env, _)  processHeader header {} messages inputCtx
  let commandState := (Command.mkState env messages {})
  let s  IO.processCommands inputCtx parserState commandState <&> Frontend.State.commandState
  pure s

unsafe def batchVerifySequential (commandState : Command.State)  (proofs : Array String) : IO (Array (CommandResponse  Error)) := do
  proofs.mapM (fun pf => do
    let inputCtx   := Parser.mkInputContext pf "<input>"
    let parserState := { : Parser.ModuleParserState }
    let (_, msgs, _)  Lean.Elab.IO.processCommandsWithInfoTrees inputCtx parserState commandState
    return .inl ({ env := 0,
                    messages :=  msgs.mapM fun m => Message.of m,
                    sorries := [],
                    tactics := [],
                    infotree := none })
  )

unsafe def batchVerifyParrallelNaive (header : String) (proofs : Array String) : IO (Array (CommandResponse  Error)) := do
  let commandState  getHeaderEnv header
  let tasks : Array (Task (Except IO.Error CommandResponse))  (proofs.mapM <| fun proof => IO.asTask <| do
    let inputCtx   := Parser.mkInputContext proof "<input>"
    let parserState := { : Parser.ModuleParserState }
    let (_, msgs, _)  Lean.Elab.IO.processCommandsWithInfoTrees inputCtx parserState commandState
    return ({ env := 0,
              messages :=  msgs.mapM fun m => Message.of m,
              sorries := [],
              tactics := [],
              infotree := none
            })
  )
  let result :=  IO.wait <|  IO.mapTasks (·.mapM (pure ·)) tasks.toList (prio := Task.Priority.max)
  match result with
  | .ok results =>
    return (results.map
      fun x =>
        match x with
        | .ok cmdres => .inl cmdres
        | .error e => .inr (Error.mk e.toString)
      ).toArray
  | .error _ => return #[]

unsafe def batchVerifyParrallel (header : String) (proofs : Array String) (buckets : Option Nat): IO (Array (CommandResponse  Error)) := do
  let buckets :=
    match buckets with
    | some x => x
    | none => max 50 proofs.size
  let commandState  getHeaderEnv header
  let tasks  (splitArray proofs buckets |>.mapM <| fun bucket => IO.asTask ( (batchVerifySequential commandState bucket)))
  let result :=  IO.wait <|  IO.mapTasks (·.mapM (pure ·)) tasks.toList (prio := Task.Priority.max)
  match result with
  | .ok results => {
    let womp : List (Array (CommandResponse  Error))  results.mapM (
      fun x => do
        match x with
        | .ok bucket => pure bucket
        | .error e =>
          IO.println e
          pure #[]
    )
    return womp.toArray.flatMap id
  }
  | .error _ => return #[]

However, the performance is significantly worse than running in leanrepl on multiple python processes. Can someone tell me what I'm doing wrong?

Frederick Pu (Apr 08 2025 at 18:36):

here the link to my full repo: https://github.com/leanprover-community/repl

Arthur Paulino (Apr 08 2025 at 18:47):

First thing that comes to my mind is: are you running compiled code or are you interpreting it?

Frederick Pu (Apr 08 2025 at 21:47):

im running lake build, and then lake exe repl

Frederick Pu (Apr 08 2025 at 23:33):

One thing I noticed is that when you run lake build, or lake exe cache get, it usually spawns multiple processes. However my lean repl instance never does this

Frederick Pu (Apr 08 2025 at 23:40):

hmm seems like lake still uses subprocces calls to actually build the modules: https://github.com/leanprover/lean4/blob/master/src/lake/Lake/Build/Actions.lean

Frederick Pu (Apr 08 2025 at 23:46):

also when running naive parrallel you eventually create so many threads that you get the following error: libc++abi: terminating due to uncaught exception of type std::bad_alloc: std::bad_alloc. Is there any way to see when you are about to reach the thread limit?


Last updated: May 02 2025 at 03:31 UTC