Zulip Chat Archive

Stream: general

Topic: Rendered source


Patrick Massot (Mar 15 2022 at 11:54):

We periodically discuss the idea of having html file showing the source files together with links to definitions and tactic state information, see for instance https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/_theory.20top.20folders for a recent discussion, and https://agda.github.io/agda-stdlib/Algebra.Structures.html for an Agda source file rendered with links (but no tactic state). In principle Alectryon should be a natural starting point. But it uses lean --server which is meant for editor integration and is awfully unreliable for this purpose. Running alectryon on a Lean 3 file has a rather low probability of success when it comes to displaying tactic state. Note that lean-client-python is not more successful, it will also randomly fail (either reporting excessive memory consumption or not returning anything).

Of course the standard answer to all this is that we should wait for Lean 4. But I think we are missing opportunities. I have quite a few propaganda talks to give in the near future and it's really a shame that we can't tell people they can inspect Lean proofs in a webpage. Even during talks, I always want to show how nice it is to be able to jump to definitions, but it actually isn't because you need to wait for orange bars to be able to jump (and of course there is also a risk that Lean will jump to a tactic definition instead of the definition you asked for...).

It's hard to believe that we can have a powerful mathport tool but we can't produce this kind of webpage. I understand that expert time is very scarce. But I'd like to read what @Mario Carneiro and @Gabriel Ebner think about how hard it would be to get something that works, either by hacking Lean3 some more or using the existing pipeline (maybe lean --ast?).

Gabriel Ebner (Mar 15 2022 at 11:55):

The AST output doesn't contain tactic states, but that seems relatively easy to add: https://github.com/leanprover-community/lean/pull/681

Gabriel Ebner (Mar 15 2022 at 11:58):

I have no idea how good the tactic states are that Daniel picked (since you mention that the alectryon version is highly unreliable).

Eric Rodriguez (Mar 15 2022 at 12:00):

Alectryon uses --server just for tactic states; with this new patch (which I didn't know was a possibility!) it could easily be reworked to use that. It's also much faster, because the --server call is single-threaded (or at least very blocking), due to how the threading model of the lean server works

Patrick Massot (Mar 15 2022 at 12:03):

After quite a lot of experimentation, I'm convinced that anything that uses lean --server for this purpose won't be reliable.

Patrick Massot (Mar 15 2022 at 12:03):

That patch by Daniel doesn't seem to be related to the server, so I'm optimistic

Patrick Massot (Mar 15 2022 at 12:04):

@Daniel Selsam what do you think? Could you reopen that PR?

Patrick Massot (Mar 15 2022 at 12:06):

Eric, do you think you understand alectryon enough to be able to add those missing features? Note that alectryon is also missing links to definitions.

Patrick Massot (Mar 15 2022 at 12:07):

Note that it is easy to get information about location of declaration after the fact, as long as we have fully qualified names, because this can be asked using Lean metaprogramming, without using the server.

Eric Rodriguez (Mar 15 2022 at 12:07):

for FQNs yes, we can use the docgen style

Eric Rodriguez (Mar 15 2022 at 12:07):

but in actual code I think it's much much harder

Gabriel Ebner (Mar 15 2022 at 12:09):

I think most constant nodes now contain a fully-qualified name in the AST (there were some recent changes here to support mathport).

Eric Rodriguez (Mar 15 2022 at 12:10):

that's amazing, so this could also be done as a post-processing step (one per file, but still). I guess you'd have to make a file with the required imports and get the links from the FQNs there

Eric Rodriguez (Mar 15 2022 at 12:11):

I'm not fully sure how the doc-gen approach works, mind, but I just know it does :b

Patrick Massot (Mar 15 2022 at 12:12):

I guess doc-gen does the same thing leancrawler does: create a lean file importing everything and simply iterate on the environment.

Patrick Massot (Mar 15 2022 at 12:12):

This is also what the blueprint framework does.

Eric Rodriguez (Mar 15 2022 at 12:19):

the "issue" is that alectryon works on a per-file model

Eric Rodriguez (Mar 15 2022 at 12:20):

so I think we could use the doc-gen style where each FQN gets its own page that redirects to where it actually is

Eric Rodriguez (Mar 15 2022 at 12:20):

and then we can just say that the FQN links to website/find/FQN

Patrick Massot (Mar 15 2022 at 12:36):

In the worst case we could simply need a post-processing pass on generated html file, replacing dummy urls written by alectryon. We also need alectryon to put anchors. But modifying the HTML template is not the difficult part of course.

Daniel Selsam (Mar 15 2022 at 14:18):

Patrick Massot said:

Daniel Selsam what do you think? Could you reopen that PR?

With that commit enabled, and this commit on the mathport side, and then a little more lean4 code that processes the mathport dump, one can enumerate (tactic-start-pos, tactic-end-pos, tactic-state) triples. The export includes data at multiple granularities, e.g. tacA ; tacB will induce tacA ; tacB, tacA and tacB so there is a design decision about how to handle the overlaps.

Eric Rodriguez (Mar 15 2022 at 14:20):

the alectryon design decision on tactic combinators was just to just show the result after all the combinators, so I guess this could be carried forward

Patrick Massot (Mar 15 2022 at 14:37):

Daniel, what do you mean by "commit enabled"? Do you mean we should use a fork of lean3? Or could you reopen that PR and add a flag so that tactic state isn't exported by default?

Daniel Selsam (Mar 15 2022 at 14:43):

Patrick Massot said:

Daniel, what do you mean by "commit enabled"? Do you mean we should use a fork of lean3? Or could you reopen that PR and add a flag so that tactic state isn't exported by default?

If the plan sounds good, I will PR the feature. The change I made is tiny, it will be more code to weave a flag through.

Eric Rodriguez (Mar 15 2022 at 14:44):

is it worth benchmarking the performance difference?

Gabriel Ebner (Mar 15 2022 at 14:44):

The reason I wanted a flag because it generates a lot more json output and the ast files already take half a gigabyte.

Daniel Selsam (Mar 15 2022 at 14:45):

@Patrick Massot I am happy to PR the feature to Lean3, to PR the sister commit to mathport, and to write a Lean4 script that enumerates those triples.

Gabriel Ebner (Mar 15 2022 at 14:45):

(But I am fine with enabling it by default if the size increase and extra runtime turns out to be reasonable.)

Patrick Massot (Mar 15 2022 at 15:13):

That would be really nice.

Mario Carneiro (Mar 15 2022 at 17:40):

Gabriel Ebner said:

The AST output doesn't contain tactic states, but that seems relatively easy to add: https://github.com/leanprover-community/lean/pull/681

The AST output already includes tactic states, in the sense of having structured tactic state objects with the exact expression, associated to the span of each tactic application. This PR was only about pretty printing those states.

Gabriel Ebner (Mar 15 2022 at 17:53):

Does this help you if you want to print it to a web page? (Serious question. The AST format does not have much documentation, and pretty-printing the JSON gets you loads of opaque numeric references.)

Gabriel Ebner (Mar 15 2022 at 17:55):

I mean I see the following in the JSON file, but it doesn't really help me:

    {
      "decl": [
        "nat",
        "div_eq_sub_mod_div"
      ],
      "goals": [
        [
          [
            {
              "name": [
                0,
                "_fresh",
                803,
                138642
              ],
              "pp": "m",
              "type": 39601
            },
            {
              "name": [
                0,
                "_fresh",
                803,
                138643
              ],
              "pp": "n",
              "type": 39601
            },
            {
              "name": [
                0,
                "_fresh",
                810,
                34738
              ],
              "pp": "n0",
              "type": 62105
            }
          ],
          62120
        ]
      ]
    },

Gabriel Ebner (Mar 15 2022 at 17:58):

Apparently the numbers are indices in the expr array, which contains a serialized version of the expression like in the textual export. No wonder the AST files are so huge...

Patrick Massot (Mar 15 2022 at 19:41):

I also stared at the ast output without being able to decide whether it would be possible to do use it for this purpose. Mario, do you have some documentation to share?

Eric Rodriguez (Mar 15 2022 at 20:59):

i think the expr array can largely be ignored for this purpose

Eric Rodriguez (Mar 15 2022 at 21:00):

especially with Daniel's pp patch

Patrick Massot (Mar 16 2022 at 20:09):

Mario, is there any documentation about the lean --ast output apart from reading the source code of lean3 and mathport?

Eric Rodriguez (Mar 16 2022 at 20:11):

https://hackmd.io/8Dr1Me6DSGaoUeuqFd3ZmA

Eric Rodriguez (Mar 16 2022 at 20:11):

is an outdated one

Patrick Massot (Mar 16 2022 at 20:12):

This is a nice find but indeed it claims to be 9 months old...

Mario Carneiro (Mar 16 2022 at 20:12):

that's when the exporter was written

Mario Carneiro (Mar 16 2022 at 20:14):

There is also some documentation on the PR/commit that added the ast feature lean#608

Mario Carneiro (Mar 16 2022 at 20:15):

But indeed, writing a proper documentation has been on my todo list for a while and I have a half written doc on my computer that I need to get back to

Patrick Massot (Mar 16 2022 at 20:16):

That could be very useful.

Mario Carneiro (Mar 16 2022 at 20:19):

@Gabriel Ebner Like I said a while ago, the ast files can actually replace tlean in the sense that they contain all the same information. Expressions are stored as actual parsed data structures, which is why you have a bunch of numbers. Pretty printing these expressions is nontrivial because you would have to replicate lean 3's pretty printing functionality, but if you are running inside lean 3 itself then you can deserialize the expressions and pretty print them using lean 3's pretty printer

Mario Carneiro (Mar 16 2022 at 20:23):

I guess for rendered source you might want more than plain pretty printing anyway since you want things like declaration links

Mario Carneiro (Mar 16 2022 at 20:24):

it would be great if there was a way for lean to be able to produce this information on demand instead of just dumping giant files to be read by docgen

Patrick Massot (Mar 16 2022 at 20:28):

Clearly we are missing a lot of tooling, and I hope the situation will improve with Lean 4. Our traditional hack is to hack the editor integration framework, but this really doesn't work (at least nobody managed to make it work). My new hope is to hack the porting infrastructure, but this is still a hack.

Patrick Massot (Mar 16 2022 at 20:30):

I don't know which language we should be using to produce rendered source code. Using Lean3 indeed gives access to the pretty-printer, and Ed also had great success in his widget framework to render stuff. Using Lean4 has the advantage that mathport is already written in Lean4 so we could try to start from there. We could also use a completely different language reading the json file.

Daniel Selsam (Mar 17 2022 at 03:37):

@Patrick Massot not sure if you still wanted these features but FYI:

Here is a Python script to build the data from the AST files exported from lean (when --tspp flag is passed):

import json
import os
import argparse
import jsonlines

parser = argparse.ArgumentParser()
parser.add_argument("--in-dir", type=str, help="directory to crawl", required=True)
parser.add_argument("--out", type=str, help="output file", required=True)
args = parser.parse_args()

writer = jsonlines.open(args.out, "w")

for root, dirname, files in os.walk(args.in_dir):
    for file in files:
        if file.endswith(".ast.json"):
            doc = json.load(open(os.path.join(root, file)))
            if "tactics" not in doc: continue
            for tactic in doc["tactics"]:
                # {'ast': 46, 'end': 1, 'start': 0, 'success': True}
                if not tactic["success"]: continue
                if tactic["start"] == tactic["end"]: continue
                if tactic["ast"] >= len(doc["ast"]): continue
                tac_ast = doc["ast"][tactic["ast"]]
                # {'children': [47], 'end': [17, 13], 'kind': 'tactic', 'pexpr': 62, 'start': [17, 3], 'value': 'by_cases'}
                if "end" not in tac_ast: continue
                writer.write({
                   "filename" : os.path.relpath(os.path.join(root, file[:-len("ast.json")] + "lean"), args.in_dir),
                    "start_pos" :tac_ast["start"],
                    "end_pos" : tac_ast["end"],
                    "start_pp" : doc["tspps"][tactic["start"]],
                    "end_pp" : doc["tspps"][tactic["end"]]
                })

It produces a jsonl file with lines like:

{"filename": "init/meta/well_founded_tactics.lean", "start_pos": [17, 24], "end_pos": [17, 43], "start_pp": "a : ℕ,\nthis : 0 < a + 1\n⊢ 0 < 1 + a", "end_pp": "a : ℕ,\nthis : 0 < a + 1\n⊢ 0 < a + 1"}

If you want the tactic text itself, it can be extracted from the source files using the start_pos and end_pos information.

Patrick Massot (Mar 17 2022 at 06:24):

Yes, I'm very much still interested, thanks a lot Daniel. I'll process this message as soon as I'll be able to find one hour (probably not today).

Patrick Massot (Apr 03 2022 at 17:26):

I've managed to use the new pretty-printed tactic state export to fix Alectryon rendering of one of my file where Alectryon couldn't find any tactic state. This is already great. But what would make it actually useful would be to add the "jump to definition" feature. However it seems that the ast exporter doesn't always keep track of full names. For instance, assume that test.lean contains

open nat

example (n : ) : n+1  0 :=
begin
  exact succ_ne_zero n
end

then, using lean 3.42.1, lean --ast --tspp --tsast test.lean produces a json file which contains no indication that succ_ne_zero is nat.succ_ne_zero. Note that this work in term mode:

open nat

example (n : ) : n+1  0 :=
succ_ne_zero n

has all the information. @Mario Carneiro @Daniel Selsam is there any hope to fix this? I guess it would also help mathport.

Mario Carneiro (Apr 04 2022 at 04:53):

The expr associated to that syntax should contain a reference to the constant nat.succ_ne_zero, but this connection is not always easy to trace through. I think @Gabriel Ebner added some additional support for tracking name resolution for use in mathport?

Patrick Massot (Apr 04 2022 at 06:35):

I don't think it contains any such reference:

{"ast": [None,
  {"children": [0, 2, 3], "kind": "file", "start": [0, 0]},
  {"end": [1, 0], "kind": "imports", "start": [1, 0]},
  {"children": [4, 7], "end": [1, 0], "kind": "commands", "start": [1, 0]},
  {"children": [5], "end": [1, 0], "kind": "open", "start": [1, 0]},
  {"children": [6, 0], "end": [1, 5], "kind": "group", "start": [1, 5]},
  {"end": [1, 5], "kind": "ident", "start": [1, 5], "value": "nat"},
  {"children": [0, 0, 0, 0, 8, 17, 25],
   "end": [3, 0],
   "kind": "example",
   "start": [3, 0]},
  {"children": [10], "end": [3, 8], "kind": "binders", "start": [3, 8]},
  {"children": [11], "end": [3, 9], "kind": "vars", "start": [3, 9]},
  {"children": [9, 0, 12], "end": [3, 9], "kind": "binder_0", "start": [3, 9]},
  {"end": [3, 9], "kind": "ident", "start": [3, 9], "value": "n"},
  {"end": [3, 14],
   "kind": "notation",
   "pexpr": 1,
   "start": [3, 13],
   "value": "exprℕ"},
  {"end": [3, 19],
   "kind": "ident",
   "pexpr": 2,
   "start": [3, 18],
   "value": "n"},
  {"end": [3, 21], "kind": "nat", "pexpr": 3, "start": [3, 20], "value": "1"},
  {"children": [13, 14],
   "end": [3, 21],
   "kind": "notation",
   "pexpr": 6,
   "start": [3, 19],
   "value": "expr + "},
  {"end": [3, 25], "kind": "nat", "pexpr": 7, "start": [3, 24], "value": "0"},
  {"children": [15, 16],
   "end": [3, 25],
   "kind": "notation",
   "pexpr": 10,
   "start": [3, 22],
   "value": "expr ≠ "},
  {"children": [0, 0, 19],
   "end": [6, 3],
   "kind": "begin",
   "pexpr": 79,
   "start": [4, 0]},
  {"children": [20],
   "end": [5, 22],
   "kind": "tactic",
   "pexpr": 57,
   "start": [5, 2],
   "value": "exact"},
  {"children": [24],
   "end": [5, 22],
   "kind": "parse",
   "pexpr": 54,
   "start": [5, 8]},
  {"end": [5, 20],
   "kind": "ident",
   "pexpr": 49,
   "start": [5, 8],
   "value": "succ_ne_zero"},
  {"end": [5, 22],
   "kind": "ident",
   "pexpr": 51,
   "start": [5, 21],
   "value": "n"},
  {"children": [21, 22],
   "end": [5, 22],
   "kind": "app",
   "pexpr": 52,
   "start": [5, 8]},
  {"children": [23], "end": [6, 0], "kind": "expr", "start": [5, 8]},
  {"children": [18],
   "end": [6, 3],
   "kind": "notation",
   "pexpr": 80,
   "start": [4, 0],
   "value": "begin"}],
 "comments": [],
 "expr": [None,
  {"const": ["nat", []]},
  {"local": {"bi": 0, "name": "n", "pp": "n", "type": 1}},
  {"prenum": {"args": [], "value": "1"}},
  {"const": [["has_add", "add"], []]},
  {"app": [4, 2]},
  {"app": [5, 3]},
  {"prenum": {"args": [], "value": "0"}},
  {"const": ["ne", []]},
  {"app": [8, 6]},
  {"app": [9, 7]},
  {"const": [["interactive", "executor", "execute_explicit"], []]},
  {"const": ["tactic", []]},
  {"app": [11, 12]},
  {"const": [["has_bind", "seq"], []]},
  {"const": [["has_bind", "seq"], []]},
  {"const": [["tactic", "save_info"], []]},
  {"const": [[4, "_", 0], []]},
  {"prenum": {"args": [], "value": "4"}},
  {"app": [17, 18]},
  {"prenum": {"args": [], "value": "0"}},
  {"app": [19, 20]},
  {"annotation": {"args": [21], "name": "anonymous_constructor"}},
  {"app": [16, 22]},
  {"app": [15, 23]},
  {"const": [["has_bind", "seq"], []]},
  {"const": [["has_bind", "seq"], []]},
  {"const": [["tactic", "save_info"], []]},
  {"const": [[4, "_", 3], []]},
  {"prenum": {"args": [], "value": "5"}},
  {"app": [28, 29]},
  {"prenum": {"args": [], "value": "2"}},
  {"app": [30, 31]},
  {"annotation": {"args": [32], "name": "anonymous_constructor"}},
  {"app": [27, 33]},
  {"app": [26, 34]},
  {"const": [["tactic", "istep"], []]},
  {"prenum": {"args": [], "value": "5"}},
  {"app": [36, 37]},
  {"prenum": {"args": [], "value": "2"}},
  {"app": [38, 39]},
  {"prenum": {"args": [], "value": "5"}},
  {"app": [40, 41]},
  {"prenum": {"args": [], "value": "2"}},
  {"app": [42, 43]},
  {"prenum": {"args": [], "value": "19"}},
  {"app": [44, 45]},
  {"const": [["tactic", "interactive", "exact"], []]},
  {"const": [[4, "_", 1], []]},
  {"local": {"bi": 0,
    "name": "succ_ne_zero",
    "pp": "succ_ne_zero",
    "type": 48}},
  {"const": [[4, "_", 2], []]},
  {"local": {"bi": 0, "name": "n", "pp": "n", "type": 50}},
  {"app": [49, 51]},
  {"expr_quote_macro": {"args": [], "reflected": False, "value": 52}},
  {"annotation": {"args": [53], "name": "as_is"}},
  {"app": [47, 54]},
  {"app": [46, 55]},
  {"app": [35, 56]},
  {"app": [25, 57]},
  {"const": [["tactic", "save_info"], []]},
  {"const": [[4, "_", 4], []]},
  {"prenum": {"args": [], "value": "6"}},
  {"app": [60, 61]},
  {"prenum": {"args": [], "value": "1"}},
  {"app": [62, 63]},
  {"annotation": {"args": [64], "name": "anonymous_constructor"}},
  {"app": [59, 65]},
  {"app": [58, 66]},
  {"app": [24, 67]},
  {"app": [14, 68]},
  {"const": [["tactic", "save_info"], []]},
  {"const": [[4, "_", 5], []]},
  {"prenum": {"args": [], "value": "6"}},
  {"app": [71, 72]},
  {"prenum": {"args": [], "value": "0"}},
  {"app": [73, 74]},
  {"annotation": {"args": [75], "name": "anonymous_constructor"}},
  {"app": [70, 76]},
  {"app": [69, 77]},
  {"app": [13, 78]},
  {"annotation": {"args": [79], "name": "by"}}],
 "file": 1,
 "level": ["0"]}

Patrick Massot (Apr 04 2022 at 06:38):

The only places where succ_ne_zero are mentioned are:

  {"end": [5, 20],
   "kind": "ident",
   "pexpr": 49,
   "start": [5, 8],
   "value": "succ_ne_zero"},

in the ast section, and then the referenced expr, number 49, which is:

{"local": {"bi": 0,
    "name": "succ_ne_zero",
    "pp": "succ_ne_zero",
    "type": 48}},

Mario Carneiro (Apr 04 2022 at 10:38):

hm, there is an optional expr field on ast nodes which supplies the elaborated expression associated to a piece of syntax, but it is not being attached in the majority of cases, it looks like. This was a low priority for mathport so I think it got left by the wayside

Patrick Massot (Apr 04 2022 at 10:48):

Do you think you could easily fix this? Or would it be too much work and you prefer to focus on mathport or other things?

Edward Ayers (Apr 05 2022 at 19:24):

Why would succ_ne_zero be a local and not a const?

Mario Carneiro (Apr 05 2022 at 19:25):

it's a local in the pexpr, probably because it wasn't resolved immediately?

Mario Carneiro (Apr 05 2022 at 19:25):

if it's in exact then we don't know at parse time whether it's going to refer to a local variable in the context

Edward Ayers (Apr 05 2022 at 20:37):

I've got a debug loop set up to sort this out but I don't know where the relevant ast_data is set for the the "ident" type.

Mario Carneiro (Apr 05 2022 at 20:41):

not sure what you mean

Edward Ayers (Apr 05 2022 at 20:43):

Like the problem is that the ast json is only emitting the pexpr identifier that hasn't been resolved yet?

Mario Carneiro (Apr 05 2022 at 20:44):

yes, that part is working as intended

Edward Ayers (Apr 05 2022 at 20:45):

But Patrick wants the resolved name

Mario Carneiro (Apr 05 2022 at 20:45):

what isn't being done yet is the association of the resolved expression once elaboration of that pexpr completes

Mario Carneiro (Apr 05 2022 at 20:46):

I don't think you need to worry about the "ident" token itself

Mario Carneiro (Apr 05 2022 at 20:47):

the trick that is used to track the ast_data in pexprs is the "tag" field which you can get with get_tag(e)

Mario Carneiro (Apr 05 2022 at 20:48):

or parser::get_id(e)

Edward Ayers (Apr 05 2022 at 20:48):

It's not totally clear to me but I think what Patrick wants is to take a position in code and use the "ast" emitted json to figure out what the resolved name is of the identifier to get nat.succ_ne_asdf

Edward Ayers (Apr 05 2022 at 20:49):

and ast nodes can have expr fields as well as pexpr fields.

Mario Carneiro (Apr 05 2022 at 20:49):

they already do

Mario Carneiro (Apr 05 2022 at 20:49):

the expr field is just not set most of the time

Mario Carneiro (Apr 05 2022 at 20:50):

so the fix here is to find suitable places in the elaborator where an expr is "done" and attach it to the ast_data as given by get_id(pexpr)

Edward Ayers (Apr 05 2022 at 20:51):

ok the suitable places here are: whenever name resolution finishes

Mario Carneiro (Apr 05 2022 at 20:51):

sure, but in the elaborator that's mixed in with everything else

Edward Ayers (Apr 05 2022 at 20:51):

oof

Edward Ayers (Apr 05 2022 at 20:51):

there must be like a method called 'resolve name' or something

Mario Carneiro (Apr 05 2022 at 20:53):

I was thinking more like elaborate(e)

Mario Carneiro (Apr 05 2022 at 20:54):

the input is a pexpr and the output is an expr , so you can just take the tag of the input pexpr and attach the output expr to the ast_data

Mario Carneiro (Apr 05 2022 at 20:54):

and in the end you will get an expr.const associated to that pexpr local

Mario Carneiro (Apr 05 2022 at 20:54):

with the resolved name in it

Edward Ayers (Apr 05 2022 at 23:17):

Almost there, with parser::set_ast_expr and parser::get_id, but parser::elaborate already calls this, so I guess set_ast_expr also needs to be called within elaborator::elaborate too. I'm not sure how to get parser's state from inside the elaborator.

Mario Carneiro (Apr 05 2022 at 23:23):

something similar needed to be done to get the parser AST data accessible to tactics, which run during elaboration

Mario Carneiro (Apr 05 2022 at 23:23):

since tactics have to push their tactic states to the "tactics" array

Mario Carneiro (Apr 05 2022 at 23:26):

this is the scope_tactic_log value in elaborate_proof, which puts the tactic log in a thread local that is picked up in log_tactic

Mario Carneiro (Apr 05 2022 at 23:30):

elaborate_proof has a whole bunch of arguments that are all passed by value because the caller puts all that data in a closure as part of a "proof elaboration" task that can outlive the parser itself

Mario Carneiro (Apr 05 2022 at 23:33):

I'm starting to remember why I didn't implement this in the first pass

Kaiyu Yang (Jan 29 2023 at 21:04):

Mario Carneiro said:

elaborate_proof has a whole bunch of arguments that are all passed by value because the caller puts all that data in a closure as part of a "proof elaboration" task that can outlive the parser itself

Hi Mario,

I just ran into a bug potentially related to the proof elaboration task outliving the parser. The parser's destructor calls m_tactic_log->detach(), which makes the proof elaboration tasks unable to log the tactic states, since the get_tactic_log function returns nullptr if the tactic log is detached. Any idea of how to fix this? Thank you!

Kaiyu Yang (Jan 29 2023 at 21:07):

Another related bug: Sometimes lean --ast --tsast fails to log the tactic states because the export_ast task in src/library/module_mgr.cpp is executed before the log_tactic function is called in src/library/tactic/tactic_state.cpp. The problem disappears if I add a sleep(5) at the beginning of the export_ast task to make it slower.

Mario Carneiro (Jan 29 2023 at 21:08):

it's been a while since I looked at this code, but IIRC the idea is that m_tactic_log is a shared pointer shared between the parser and the elaborator tasks, and m_detached is true if the parser's copy of the shared pointer has been released

Mario Carneiro (Jan 29 2023 at 21:11):

the get_tactic_log() function in tactic_log.cpp is only callable within the context of a scope_tactic_log() call

Kaiyu Yang (Jan 29 2023 at 21:14):

Thanks for the explanation! The problem is, when get_tactic_log() is called in the proof elaboration task (within the context of scope_tactic_log()), it returns nullptr because the parser has been destructed (m_detached == true). Is there anything in the code preventing this scenario from happening?

Kaiyu Yang (Jan 29 2023 at 21:18):

An unrelated question: I'm wondering if there are any reasons --tsast only record the tactic states of theorem and lemma but not example. I see scope_tactic_log is only called in elaborate_proof in definition_cmds.cpp but not in the check_example function.

Mario Carneiro (Jan 29 2023 at 21:20):

Kaiyu Yang said:

Thanks for the explanation! The problem is, when get_tactic_log() is called in the proof elaboration task (within the context of scope_tactic_log()), it returns nullptr because the parser has been destructed (m_detached == true). Is there anything in the code preventing this scenario from happening?

No, this is an expected situation, related to your second issue - once the parser is gone, it's not clear where or why to keep collecting data

Mario Carneiro (Jan 29 2023 at 21:21):

Kaiyu Yang said:

An unrelated question: I'm wondering if there are any reasons --tsast only record the tactic states of theorem and lemma but not example. I see scope_tactic_log is only called in elaborate_proof in definition_cmds.cpp but not in the check_example function.

Note that --tsast is not very well supported, since it is not used by anything that gets daily testing, and the original reason it was implemented is no longer relevant (it was used for some ML training but that project is done)

Kaiyu Yang (Jan 29 2023 at 21:28):

I see. I was trying to modify check_example to record tactic states for the code below:

open nat

example (n : ) : n+1  0 :=
begin
  exact succ_ne_zero n
end

Then I ran lean --ast --tsast but couldn't find the recorded states in the output files (using Lean 3.50.3). Then I found the reason was that the parser is gone too quickly. But the problem doesn't happen if I replace example with theorem my_thm. I guess it's because example has no effects on the environment, so there was no need to keep the parser around?

Kaiyu Yang (Jan 29 2023 at 21:31):

I basically had to do two things for the tactic states in the code above to be recorded correctly.

  1. Remove if (m_tactic_log) m_tactic_log->detach(); in the parser's destructor.
  2. Add sleep (5) in the export_ast task in module_mgr.cpp

Last updated: Dec 20 2023 at 11:08 UTC