Zulip Chat Archive

Stream: lean4

Topic: What's the right way to print lots of lines?


Yury G. Kudryashov (Mar 15 2025 at 16:54):

I'm trying to print all declaration names in the environment. My current code

#eval show MetaM Unit from do
  let env  getEnv
  for (c, info) in env.constants do
    if !( Batteries.Tactic.Lint.isAutoDecl c) then
      IO.println c

runs for a long time without printing anything. Does it accumulate all those strings before printing them all at once? Can I make it print them one by one? Also, how can I make it work faster?

Kyle Miller (Mar 15 2025 at 17:39):

#eval does accumulate everything before printing. It needs to capture stdout to accumulate a message to be printed.

I think it has this limitation even if it's being run from the command line. Potentially we should allow stdout to print immediately in this case, but there's the complexity that #eval is frequently used in conjunction with #guard_msgs in tests, which would need to be detected somehow.

It shouldn't be too hard to write a#eval_unisolated command that evaluates without using withIsolatedStreams by copying the code.

Yury G. Kudryashov (Mar 15 2025 at 18:21):

This script takes about 15 minutes on my laptop. Is there something wrong with it (besides using eval)?

Robin Arnez (Mar 15 2025 at 18:48):

well, there are around 300,000 declarations with your criteria, so it's gonna take a while to format all the text

Kyle Miller (Mar 15 2025 at 19:00):

3ms per declaration seems like a very long time to me, given what is being checked.

Maybe it's being dominated by string concatenation, from the isolated streams in eval?

Kyle Miller (Mar 15 2025 at 19:00):

You could try writing directly to a file instead

Julian Berman (Mar 15 2025 at 19:00):

When I did this a few weeks ago (and wanted more than just names) I did:

import Lean
import Lean.DocString
import Mathlib

open Lean Meta System

structure DeclInfo where
  name : Name
  type : String
  modulePath : String
  docString : String
  deriving ToJson, FromJson

def getDeclInfo (env : Environment) (declName : Name): MetaM DeclInfo := do
  let const  getConstInfo declName
  let declType  Meta.ppExpr const.type
  let modulePath := toString declName.getPrefix
  let docString := ( findDocString? env declName).getD ""
  pure {
    name := declName,
    type := toString declType,
    modulePath := modulePath,
    docString := docString
  }

def dumpDecls (outputHandle : IO.FS.Handle) : CoreM Unit := do
  let env  getEnv
  let declNames := env.constants.map₁.keys
  for declName in declNames do
    let declInfo  MetaM.run' <| getDeclInfo env declName
    outputHandle.putStrLn <| toJson declInfo |>.compress

def main (args : List String) : IO Unit := do
  let some outputPath := args.head?
    | throw (IO.Error.userError "Please provide an output path")
  IO.FS.withFile outputPath .write (fun outputHandle =>
    CoreM.withImportModules #[`Mathlib] (dumpDecls outputHandle))

#eval main ["mathlib_decls.json"]

in case that's helpful to reuse. Though possibly the declaration export thing in the leanprover org is even more useful (but when running that at least you can tail the file and watch progress).

Yury G. Kudryashov (Mar 15 2025 at 19:30):

I moved it to an executable file and it works in ~4s.

Yury G. Kudryashov (Mar 15 2025 at 19:31):

Sorry for long delays in replies in a topic I've started: I'm on a plane.

Kyle Miller (Mar 15 2025 at 19:59):

I wonder what the performance issue is. It could be docs#IO.FS.Stream.ofBuffer having accidental nonlinearity somewhere. There's also docs#String.fromUTF8!, but I'd expect that to be better tested on large byte arrays already.

Yury G. Kudryashov (Mar 15 2025 at 20:20):

I can't test rn, because my laptop's battery is running low and the outlet in the seat doesn't work. I can try to test it in a few hours.

Henrik Böving (Mar 15 2025 at 20:44):

There's also docs#String.fromUTF8!, but I'd expect that to be better tested on large byte arrays already.

bv_decide regularly works with this function on up to GB long strings, it's pretty good

Yury G. Kudryashov said:

I'm trying to print all declaration names in the environment. My current code

#eval show MetaM Unit from do
  let env  getEnv
  for (c, info) in env.constants do
    if !( Batteries.Tactic.Lint.isAutoDecl c) then
      IO.println c

runs for a long time without printing anything. Does it accumulate all those strings before printing them all at once? Can I make it print them one by one? Also, how can I make it work faster?

Looking at a profile of your code it's just stuck formatting after doing initial work
image.png

So if this was run without the formatting machinery of #eval it would be basically instant.

Henrik Böving (Mar 15 2025 at 21:06):

The reason this bottlenecks on extract is likely due to the following function:

private partial def be (w : Nat) [Monad m] [MonadPrettyFormat m] : List WorkGroup  m Unit
  | []                           => pure ()
  |   { items := [],    .. }::gs => be w gs
  | g@{ items := i::is, .. }::gs => do
    let gs' (is' : List WorkItem) := { g with items := is' }::gs;
    match i.f with
    | text s =>
      let p := s.posOf '\n'
      if p == s.endPos then
        pushOutput s
        endTags i.activeTags
        be w (gs' is)
      else
        pushOutput (s.extract {} p)
        pushNewline i.indent.toNat
        let is := { i with f := text (s.extract (s.next p) s.endPos) }::is
        -- after a hard line break, re-evaluate whether to flatten the remaining group
        pushGroup g.flb is gs w >>= be w

Our string is of the form

foo\n
bar\n
foobar\n
...

this function advances us to the next \n, then does an extract until that \n and an extract starting at that \n to the end of the previous string, the C implementation of extract is:

extern "C" LEAN_EXPORT object * lean_mk_string_from_bytes_unchecked(char const * s, size_t sz) {
    return lean_mk_string_unchecked(s, sz, utf8_strlen(s, sz));
}

extern "C" LEAN_EXPORT obj_res lean_string_utf8_extract(b_obj_arg s, b_obj_arg b0, b_obj_arg e0) {
    // blablablabla
    return lean_mk_string_from_bytes_unchecked(lean_string_cstr(s) + b, new_sz);
}

So what this does is always copy (mk_string_unchecked copies) and recompute utf8_strlen of the remaining part of the string after every newline, given that we have O(n) newlines in the code this comes down to a quadratic behavior

Henrik Böving (Mar 15 2025 at 21:09):

Due to the layout of strings in leans runtime right now it is in fact impossible to not copy the string when doing an extract, strlen might be improvable. Regardless the proper fix is likely to instead maintain a docs#String.Iterator as we e.g. also do in the Parsec module

Robin Arnez (Mar 15 2025 at 21:23):

Substring would probably also work well here

Kyle Miller (Mar 15 2025 at 21:49):

Another (independent) fix here would be to have #eval have max output configuration, like we have in the pretty printer for the Infoview. It wouldn't help if you wanted the whole output, but at least you wouldn't wait 15 minutes on formatting.


Last updated: May 02 2025 at 03:31 UTC