Zulip Chat Archive
Stream: lean4
Topic: What's the right way to print lots of lines?
Yury G. Kudryashov (Mar 15 2025 at 16:54):
I'm trying to print all declaration names in the environment. My current code
#eval show MetaM Unit from do
let env ← getEnv
for (c, info) in env.constants do
if !(← Batteries.Tactic.Lint.isAutoDecl c) then
IO.println c
runs for a long time without printing anything. Does it accumulate all those strings before printing them all at once? Can I make it print them one by one? Also, how can I make it work faster?
Kyle Miller (Mar 15 2025 at 17:39):
#eval
does accumulate everything before printing. It needs to capture stdout to accumulate a message to be printed.
I think it has this limitation even if it's being run from the command line. Potentially we should allow stdout to print immediately in this case, but there's the complexity that #eval
is frequently used in conjunction with #guard_msgs
in tests, which would need to be detected somehow.
It shouldn't be too hard to write a#eval_unisolated
command that evaluates without using withIsolatedStreams
by copying the code.
Yury G. Kudryashov (Mar 15 2025 at 18:21):
This script takes about 15 minutes on my laptop. Is there something wrong with it (besides using eval
)?
Robin Arnez (Mar 15 2025 at 18:48):
well, there are around 300,000 declarations with your criteria, so it's gonna take a while to format all the text
Kyle Miller (Mar 15 2025 at 19:00):
3ms per declaration seems like a very long time to me, given what is being checked.
Maybe it's being dominated by string concatenation, from the isolated streams in eval?
Kyle Miller (Mar 15 2025 at 19:00):
You could try writing directly to a file instead
Julian Berman (Mar 15 2025 at 19:00):
When I did this a few weeks ago (and wanted more than just names) I did:
import Lean
import Lean.DocString
import Mathlib
open Lean Meta System
structure DeclInfo where
name : Name
type : String
modulePath : String
docString : String
deriving ToJson, FromJson
def getDeclInfo (env : Environment) (declName : Name): MetaM DeclInfo := do
let const ← getConstInfo declName
let declType ← Meta.ppExpr const.type
let modulePath := toString declName.getPrefix
let docString := (← findDocString? env declName).getD ""
pure {
name := declName,
type := toString declType,
modulePath := modulePath,
docString := docString
}
def dumpDecls (outputHandle : IO.FS.Handle) : CoreM Unit := do
let env ← getEnv
let declNames := env.constants.map₁.keys
for declName in declNames do
let declInfo ← MetaM.run' <| getDeclInfo env declName
outputHandle.putStrLn <| toJson declInfo |>.compress
def main (args : List String) : IO Unit := do
let some outputPath := args.head?
| throw (IO.Error.userError "Please provide an output path")
IO.FS.withFile outputPath .write (fun outputHandle =>
CoreM.withImportModules #[`Mathlib] (dumpDecls outputHandle))
#eval main ["mathlib_decls.json"]
in case that's helpful to reuse. Though possibly the declaration export thing in the leanprover org is even more useful (but when running that at least you can tail the file and watch progress).
Yury G. Kudryashov (Mar 15 2025 at 19:30):
I moved it to an executable file and it works in ~4s.
Yury G. Kudryashov (Mar 15 2025 at 19:31):
Sorry for long delays in replies in a topic I've started: I'm on a plane.
Kyle Miller (Mar 15 2025 at 19:59):
I wonder what the performance issue is. It could be docs#IO.FS.Stream.ofBuffer having accidental nonlinearity somewhere. There's also docs#String.fromUTF8!, but I'd expect that to be better tested on large byte arrays already.
Yury G. Kudryashov (Mar 15 2025 at 20:20):
I can't test rn, because my laptop's battery is running low and the outlet in the seat doesn't work. I can try to test it in a few hours.
Henrik Böving (Mar 15 2025 at 20:44):
There's also docs#String.fromUTF8!, but I'd expect that to be better tested on large byte arrays already.
bv_decide regularly works with this function on up to GB long strings, it's pretty good
Yury G. Kudryashov said:
I'm trying to print all declaration names in the environment. My current code
#eval show MetaM Unit from do let env ← getEnv for (c, info) in env.constants do if !(← Batteries.Tactic.Lint.isAutoDecl c) then IO.println c
runs for a long time without printing anything. Does it accumulate all those strings before printing them all at once? Can I make it print them one by one? Also, how can I make it work faster?
Looking at a profile of your code it's just stuck formatting after doing initial work
image.png
So if this was run without the formatting machinery of #eval
it would be basically instant.
Henrik Böving (Mar 15 2025 at 21:06):
The reason this bottlenecks on extract is likely due to the following function:
private partial def be (w : Nat) [Monad m] [MonadPrettyFormat m] : List WorkGroup → m Unit
| [] => pure ()
| { items := [], .. }::gs => be w gs
| g@{ items := i::is, .. }::gs => do
let gs' (is' : List WorkItem) := { g with items := is' }::gs;
match i.f with
| text s =>
let p := s.posOf '\n'
if p == s.endPos then
pushOutput s
endTags i.activeTags
be w (gs' is)
else
pushOutput (s.extract {} p)
pushNewline i.indent.toNat
let is := { i with f := text (s.extract (s.next p) s.endPos) }::is
-- after a hard line break, re-evaluate whether to flatten the remaining group
pushGroup g.flb is gs w >>= be w
Our string is of the form
foo\n
bar\n
foobar\n
...
this function advances us to the next \n
, then does an extract until that \n
and an extract starting at that \n
to the end of the previous string, the C implementation of extract is:
extern "C" LEAN_EXPORT object * lean_mk_string_from_bytes_unchecked(char const * s, size_t sz) {
return lean_mk_string_unchecked(s, sz, utf8_strlen(s, sz));
}
extern "C" LEAN_EXPORT obj_res lean_string_utf8_extract(b_obj_arg s, b_obj_arg b0, b_obj_arg e0) {
// blablablabla
return lean_mk_string_from_bytes_unchecked(lean_string_cstr(s) + b, new_sz);
}
So what this does is always copy (mk_string_unchecked copies) and recompute utf8_strlen of the remaining part of the string after every newline, given that we have O(n)
newlines in the code this comes down to a quadratic behavior
Henrik Böving (Mar 15 2025 at 21:09):
Due to the layout of strings in leans runtime right now it is in fact impossible to not copy the string when doing an extract
, strlen
might be improvable. Regardless the proper fix is likely to instead maintain a docs#String.Iterator as we e.g. also do in the Parsec module
Robin Arnez (Mar 15 2025 at 21:23):
Substring
would probably also work well here
Kyle Miller (Mar 15 2025 at 21:49):
Another (independent) fix here would be to have #eval
have max output configuration, like we have in the pretty printer for the Infoview. It wouldn't help if you wanted the whole output, but at least you wouldn't wait 15 minutes on formatting.
Last updated: May 02 2025 at 03:31 UTC