Zulip Chat Archive

Stream: lean4

Topic: Mathlib 4 source files


Aurélien Saue (May 11 2021 at 10:25):

Hi everyone,
So I have been working on creating Lean4 source files for Mathlib from the tlean files. For now, sorry is used everywhere, but apart from that the results are getting decent.
What I have worked on is mainly:

  • detecting autogenerated definitions
  • prettyprinting classes, structures, inductive types, instances and notations
  • moving a reasonable amount of arguments before the colon in definitions and theorems
  • parsing the namespace structure and comments of the Lean3 source files

There are still a handful of problems, some of the most frequent being:

  • prettyprinted expressions that do not compile because of missing type annotations or implicit arguments that cannot be synthesized,
  • class/structure extensions not handled

Here is for instance the result I get on algebra/add_torsor: https://gist.github.com/AurelienSaue/4d9e54100bb231a1196c881dfcb63454
What do you think of the approach?

Johan Commelin (May 11 2021 at 10:55):

Just glancing through that file, it looks marvellous!

Johan Commelin (May 11 2021 at 10:56):

About the sorryd proofs: do you have plans to also auto-port those later on?

Aurélien Saue (May 11 2021 at 10:59):

Yes that would of course be great but the main problem for now is that a lot of them are referencing autogenerated definitions (._main, ._proof_1, ...) which are eliminated

Patrick Massot (May 11 2021 at 11:11):

Seeing progress on this is wonderful! Do you think you could do anything to hide of these universe variables? I guess the original file has Type* everywhere and not a single explicit max or universes.

Patrick Massot (May 11 2021 at 11:12):

Something very easy to fix would be to remove those empty lines between docstrings and declarations.

Aurélien Saue (May 11 2021 at 11:14):

Yes sure I will try, that should not be too hard

Mac (May 11 2021 at 17:50):

Just checking, but the [(i : I) → add_group (fg i)] (in add_torsor) is not valid Lean 4, right? Functions can't be classes?

Patrick Massot (May 11 2021 at 17:50):

I think it's valid.

Mario Carneiro (May 11 2021 at 17:50):

It's valid in lean 3

Mario Carneiro (May 11 2021 at 17:51):

lean will just do typeclass search in the extended context

Mario Carneiro (May 11 2021 at 17:51):

the requirement is only that it be a pi type ending in a class

Mac (May 11 2021 at 17:57):

interesting! I learn something new about Lean every day :)

Daniel Selsam (May 14 2021 at 18:37):

FYI there has been some off-channel discussion of an alternative approach to porting that avoids the .tleans and tries to port syntax directly. Specifically, this would entail:

  • writing a complete lean3/mathlib parser in lean4 (without elaboration)
  • writing syntax transformers to convert the parsed syntax to lean4 syntax (e.g. fun _, _ -> fun _ => _)
  • improving the formatter (not the delaborator) so the generated lean4 files are (mostly) aesthetic
  • dealing with the gillion squiggly lines this will produce (e.g. from missing or modified tactics, elaborator changes, etc)
    • possibly by implementing missing tactics
    • possibly by updating syntax transformers
    • possibly by running bulk automation (e.g. a "squeezing" tree-search tactic)
    • probably largely by hand

Mario Carneiro (May 14 2021 at 18:40):

writing a complete lean3/mathlib parser in lean4 (without elaboration)

I want to add that an alternative to this is to modify lean 3 to produce AST export data, kind of like tlean but focused on the input syntax, and then parsing that export data in lean 4

Gabriel Ebner (May 14 2021 at 18:46):

I believe @Aurélien Saue's tool is also parsing the Lean 3 source file to extract comments, etc. (in addition to the mathported olean file). https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/Mathlib.204.20source.20files/near/238283727

Aurélien Saue (May 17 2021 at 08:30):

Sorry for seeing this so late.
This seems like a very good idea to me since all the issues I face currently are due to delaboration.
Has anyone started this process yet?

Daniel Selsam (May 20 2021 at 03:53):

I just quickly scraped together a scaffold: https://github.com/dselsam/synport

Mario Carneiro (May 20 2021 at 03:54):

this is for the "complete lean3/mathlib parser in lean4" plan, right?

Daniel Selsam (May 20 2021 at 03:55):

  • synport.py: takes a lean3 file and prints the lean4 file to stdout
  • it does this by prepending import SynPort; #lean3 to the file and calling lean4 on the resulting file
  • the lean4 elaboration of the #lean3 command will do the rest
  • there is almost no syntax/transformers there yet, but enough to see the pattern

Daniel Selsam (May 20 2021 at 03:55):

Mario Carneiro said:

this is for the "complete lean3/mathlib parser in lean4" plan, right?

Yes.

Mario Carneiro (May 20 2021 at 03:56):

I'm not sure that a fully faithful lean 3 parser can be written in the lean 4 parser infrastructure. In particular lean 3 parsers can do IO and access the tactic state

Daniel Selsam (May 20 2021 at 03:59):

What do you think this means for SynPort in practice? How many things will be unparseable? Can you backport a sanity-restriction to prevent this behavior?

Mario Carneiro (May 20 2021 at 04:04):

Not clear. I have long held the belief that lean 3 parsing for anything other than lean 3 is nigh impossible, and I have had very little success in approximating it from the outside. Plus the lean 3 grammar is pretty complicated even assuming the edge cases are somehow dealt with, and portions of the grammar (tactic parsers) are written in lean and it's not clear how to get at that without adding a lean 3 interpreter to the mix

Daniel Selsam (May 20 2021 at 04:05):

My guess is that we can find a way to syn-port the parts that syn-port and skip the edge cases, and that it will work well for a big chunk of the library

Daniel Selsam (May 20 2021 at 04:06):

In my manual porting experiments on top of mathport, so much of the work was just changing lambda/match syntax + friends

Mario Carneiro (May 20 2021 at 04:08):

One way to get something close without too much work is to copy and paste the lean 4 grammar and see if it works on lean 4 files

Daniel Selsam (May 20 2021 at 04:09):

did you mean for those both to be 4s?

Mario Carneiro (May 20 2021 at 04:09):

yes

Mario Carneiro (May 20 2021 at 04:10):

lean 4 already has its own grammar written using lean 4 notations, so if you use that as a base for modification you can get a rough lean 3 parser easier than if you wrote one from scratch

Daniel Selsam (May 20 2021 at 04:11):

That I definitely agree with. I was assuming we would effectively copy-paste.

Daniel Selsam (May 20 2021 at 04:12):

Note: I tried just adding new Lean4 syntax for the lean3 fun and match, and it seemed to work well also. So we could even try reusing Lean4's term category, but it seems like it could cause problems downstream

Mario Carneiro (May 20 2021 at 04:16):

No we want term3 because we are going to reflect on tactic proofs

Sebastian Ullrich (May 20 2021 at 07:25):

Mario Carneiro said:

No we want term3 because we are going to reflect on tactic proofs

What exactly does that mean?

Mario Carneiro (May 20 2021 at 07:47):

We want to be able to transform the lean 3 tactic AST into equivalent lean 4 tactic structures. It's not clear that this can be done if we just directly produce lean 4 exprs because when they get elaborated we lose the tactics. But I might just be confused about lean 4 macro expansion

Sebastian Ullrich (May 20 2021 at 07:50):

Ah no, I don't think the plan is to expand all macros in order to move from Lean 3 to Lean 4 syntax, if that is what you were referring to. Instead we would selectively expand macros from SynPort only.

Sebastian Ullrich (May 20 2021 at 07:52):

I.e. if there is a mathlib3 tactic that we want to map to an existing/different Lean 4 syntax, we would describe it as a macro and expand away, but no further.

Mario Carneiro (May 20 2021 at 07:54):

right, I don't really know how macro expansion is controlled like that

Mario Carneiro (May 20 2021 at 07:55):

What about type-aware expansion? For example we might want to select a lean 4 tactic based on what actually works on the given goal

Sebastian Ullrich (May 20 2021 at 08:00):

Hah, alright, that would be the next difficulty level. We could possibly retrieve that information from the InfoTree, which is the data structure used to transfer infromation from the elaborator to the server.

Sebastian Ullrich (May 20 2021 at 08:01):

Currently the info tree only holds macro expansion steps, not elaboration steps, but we will likely need to change that anyway if we want to implement "go to macro/elaborator" on arbitrary syntax.

Mario Carneiro (May 20 2021 at 08:02):

I was thinking we could parse the whole file into some command3 syntax object, then write an elaborator that uses the syntax object as instructions to construct a lean 4 syntax and simultaneously elaborate it to inform the construction

Sebastian Ullrich (May 20 2021 at 08:04):

So you basically want to reimplement (an extended version of) the Lean 3 elaborator in Lean 4? I would have hoped we could avoid that.

Mario Carneiro (May 20 2021 at 08:04):

Not really, it is the lean 4 elaborator which is informing the construction

Sebastian Ullrich (May 20 2021 at 08:05):

Okay, I'm not completely sure how you will coerce/extend the Lean 4 elaborator into constructing a new syntax tree on the side

Sebastian Ullrich (May 20 2021 at 08:06):

But also we should evaluate how often we actually think that will happen and if it is worthwhile to automate it away, or if it makes sense to offload it to the manual porting steps that will need to happen anyway. Same with Lean 3 parsing using I/O, how often does that happen?

Mario Carneiro (May 20 2021 at 08:07):

True, there is an obvious fallback if this doesn't work or isn't useful enough. I figure that given the size of mathlib it's worth investing in anything we can do to improve the success rate

Sebastian Ullrich (May 20 2021 at 08:09):

I agree. The InfoTree would be my best bet for doing that so far.

Mario Carneiro (May 20 2021 at 08:12):

To give an example of what I mean, suppose there is a lean 3 tactic foo that lean 4 writes as bar on goals of the form |- bar and foo otherwise. So we parse foo : tactic3, and then we have a program with type Syntax -> Option Expr -> TermElabM Syntax which does:

fun `(foo:tactic3) expectedType? =>
  if expectedType? = some `(bar) then `(tactic| bar) else `(tactic| foo)

Mario Carneiro (May 20 2021 at 08:13):

in general, it would go through the motions of actually running said lean 4 tactic, in addition to constructing the syntax for the tactic

Sebastian Ullrich (May 20 2021 at 08:14):

If you use adaptExpander, this will actually log the transformation in the info tree today.

Sebastian Ullrich (May 20 2021 at 08:15):

(tactic version: https://github.com/leanprover/lean4/blob/cd5dbc66ce2fd6bf1d38e40599a279df8c01a9ce/src/Lean/Elab/Tactic/Basic.lean#L258-L260)

Mario Carneiro (May 20 2021 at 08:21):

This would still not be perfect since it doesn't know lean 3's opinion about how it is supposed to elaborate, which is often more important than lean 4's reconstruction. That could be addressed by augmenting the lean 3 AST dump with type information about the goal state after each tactic or at each application. I think that the highest fidelity version would be if lean 3 elaborated everything into an AST dump and lean 4 parsed that instead of trying to parse lean 3 directly. That way lean 4 would have access both to the lean 3 type information and the lean 4 type information and could fill in motives and repair gaps.

Daniel Selsam (May 20 2021 at 14:00):

Sebastian Ullrich said:

Ah no, I don't think the plan is to expand all macros in order to move from Lean 3 to Lean 4 syntax, if that is what you were referring to. Instead we would selectively expand macros from SynPort only.

Right -- the current scaffold doesn't even use macros, it is literally a function Syntax -> <some monad> Syntax that we have complete control over.

Daniel Selsam (May 20 2021 at 14:00):

It then calls the formatter on the new syntax and that is it.

Daniel Selsam (May 20 2021 at 14:04):

Mario Carneiro said:

What about type-aware expansion? For example we might want to select a lean 4 tactic based on what actually works on the given goal

This could be a second pass -- i.e. read the new lean4 files and for each squiggly in a proof, try running proof search and if it works, replace what is there with the new proof.

Daniel Selsam (May 20 2021 at 14:11):

^ I wouldn't even want to try this right away, not until manually inspecting a bunch of the squigglies to see if they are caused by important issues/bugs

Mac (May 21 2021 at 00:38):

(deleted, posted in the wrong thread!)


Last updated: Dec 20 2023 at 11:08 UTC