Zulip Chat Archive

Stream: lean4

Topic: mathlib:delab


Daniel Selsam (Apr 15 2021 at 02:40):

There is still a lot of uncertainty about how to structure the textual port, but one high-level strategy that has been floated is to push heuristic delaboration as far as possible before manually touching files. I just wrote a toy demo of how this might work to start the discussion: https://gist.github.com/dselsam/bed905f85021dc4b41b971ad2c95a8c2 Here is a tentative proposal:

  1. We implement delaborator round-tripping https://github.com/leanprover/lean4/issues/368 . At this point, we should be able to (in principle) delaborate mathlib into large and unreadable files.

  2. We iterate the following:

    • find an ostensibly-tactic-compressible part of a delaborated proof (e.g. by simp, ring, etc.)
    • implement a heuristic delaboration rule for it (implementing the tactics as needed)
    • rerun the delaborator
  3. Baseless speculation is that we can produce reasonable .lean files for the entire library this way with (say) a few hundred auto-placed sorrys for large proof-terms that can't be compressed, before we reach diminishing returns.

  4. We fill in the sorrys manually, looking at the old lean3 tactic scripts for guidance.

  5. Once the textual port is complete, we do a manual pass to use lean4 constructions instead of auto-ported lean3 constructions. For example, lean3 definitions that use the equation compiler will translate as definitions in terms of foo._main, which will be defined by brec_on, and there will be a ton of explicit equation lemmas. These will all start out in the delaborated .lean files. So we delete all this cruft, and write the definition using the equation compiler, consulting the lean3 definitions as necessary. I don't expect this kind of change to propagate very far, since most things will be def-eq.

Thoughts?

Mario Carneiro (Apr 15 2021 at 02:47):

The example for 5 doesn't sound like it needs to be a manual pass

Mario Carneiro (Apr 15 2021 at 02:48):

We have visibility into the fact that the lean 3 definition was done with the equation compiler so we can use it

Daniel Selsam (Apr 15 2021 at 02:51):

Mario Carneiro said:

We have visibility into the fact that the lean 3 definition was done with the equation compiler so we can use it

True, we could separately export from lean3 the fully elaborated equations, and then have a special equations-delaborator on the lean4 side

Daniel Selsam (Apr 15 2021 at 02:53):

But I am not sure if there are enough uses to merit this though.

Mario Carneiro (Apr 15 2021 at 02:53):

There's another variation on this plan, where we put in enough work to extract a lean 3 AST and then map it to reasonable substitutes in lean 4. It will look good but most likely will be slightly broken all over the place; but it seems like a better starting point for manual editing

Daniel Selsam (Apr 15 2021 at 02:55):

Yes, a different and somewhat compatible high-level strategy is to export parsed lean3 tactic scripts, automate some of the work of making it lean4 syntax (e.g. lambda , -> =>), and then manually tweaking to make it type-check. As you said it will be slightly broken everywhere though. With the delab approach the sorry damage will be controlled and localized.

Daniel Selsam (Apr 15 2021 at 02:56):

We can definitely do this for the sorry stage, but I am hoping there will be few enough sorries that it is easier to just copy-paste and manually tweak the syntax.

Mario Carneiro (Apr 15 2021 at 02:57):

I'm not as positive about the outlook of step 2

Mario Carneiro (Apr 15 2021 at 02:58):

it will be difficult to get the tactics to actually reproduce the proof terms we want to compress

Daniel Selsam (Apr 15 2021 at 02:58):

A lower bar is producing reasonably short proof terms that type-check, even if they are slightly weird and not how a human would have written it.

Mario Carneiro (Apr 15 2021 at 02:59):

I am not so worried about simp and ring so much as the structural tactics. The elaborator differences are going to be a huge pain

Daniel Selsam (Apr 15 2021 at 02:59):

What do you mean by a structural tactic?

Mario Carneiro (Apr 15 2021 at 03:00):

change, cases, induction, refine, apply

Mario Carneiro (Apr 15 2021 at 03:00):

These are tactics that already exist but don't do exactly the same thing

Daniel Selsam (Apr 15 2021 at 03:00):

Mario Carneiro said:

I'm not as positive about the outlook of step 2

Also, if I were positive, I wouldn't even bother posting :) I am not at all positive, and I am hoping people share their concerns about potential roadblocks.

Daniel Selsam (Apr 15 2021 at 03:01):

Mario Carneiro said:

These are tactics that already exist but don't do exactly the same thing

This is why heuristic delaboration is likely to be more reliable than heuristic translation -- it doesn't matter what the lean3 tactics did.

Mario Carneiro (Apr 15 2021 at 03:01):

specifically, "a few hundred sorrys" sounds like an extremely high bar

Mario Carneiro (Apr 15 2021 at 03:01):

I'm expecting more like ten thousand sorrys

Daniel Selsam (Apr 15 2021 at 03:03):

At what granularity are you expecting? I mean, will we be able to stick sorrys deep inside otherwise OK proof terms, or will we effectively need to sorry out entire proofs?

Mario Carneiro (Apr 15 2021 at 03:03):

Daniel Selsam said:

Mario Carneiro said:

These are tactics that already exist but don't do exactly the same thing

This is why heuristic delaboration is likely to be more reliable than heuristic translation -- it doesn't matter what the lean3 tactics did.

Yes, but the issue with delaboration is that it's mostly a one way street - you can't easily reconstruct the original proof and will have to consult the lean 3 proof heavily anyway. I doubt the term will even help that much beyond being able to act as a kind of long winded version of sorry

Mario Carneiro (Apr 15 2021 at 03:04):

Daniel Selsam said:

At what granularity are you expecting? I mean, will we be able to stick sorrys deep inside otherwise OK proof terms, or will we effectively need to sorry out entire proofs?

If the bar is "terms that are too big even after iterating step 2 to fixpoint" I would guess there will be one in almost every proof

Mario Carneiro (Apr 15 2021 at 03:05):

AST translation is much more likely to be helpful to human editors

Daniel Selsam (Apr 15 2021 at 03:06):

Mario Carneiro said:

Daniel Selsam said:

Mario Carneiro said:

These are tactics that already exist but don't do exactly the same thing

This is why heuristic delaboration is likely to be more reliable than heuristic translation -- it doesn't matter what the lean3 tactics did.

Yes, but the issue with delaboration is that it's mostly a one way street - you can't easily reconstruct the original proof and will have to consult the lean 3 proof heavily anyway. I doubt the term will even help that much beyond being able to act as a kind of long winded version of sorry

I agree that this is a risk. Specifically, that even if it delaborates to something short enough, that when it comes time to refactor, the proof is still so unconventional that people just look to the old lean3 version anyway.

Mario Carneiro (Apr 15 2021 at 03:06):

I would probably do an AST translation and then splitscreen with the lean 3 proof to see intermediate states so I can repair the broken bits

Daniel Selsam (Apr 15 2021 at 03:06):

Mario Carneiro said:

AST translation is much more likely to be helpful to human editors

I agree.

Daniel Selsam (Apr 15 2021 at 03:08):

So it really seems to hinge on how dense the sorrys will be after iterating (2), and how easy it will be for a human to align a sorry within a delaborated proof with the corresponding part of the corresponding lean3 tactic script.

Mario Carneiro (Apr 15 2021 at 03:09):

We can certainly do tests for that

Mario Carneiro (Apr 15 2021 at 03:10):

This can't really start until we have most mathlib tactics first though

Daniel Selsam (Apr 15 2021 at 03:10):

It will take a while to develop the delaborator to test these hypotheses though. I am not even sure how long the initial (1) step will take -- probably easy for Sebastian but I'll need to get up to speed on a good chunk of the frontend first.

Mario Carneiro (Apr 15 2021 at 03:11):

step 1 sounds a bit like an open research question :oops:

Daniel Selsam (Apr 15 2021 at 03:11):

Mario Carneiro said:

This can't really start until we have most mathlib tactics first though

We don't need all the tactics to do the experiment. We already have analogues of simp, cases, and induction, and we can easily detect goals that could be proven by a future norm_num, ring, or linarith (without needing them to be implemented yet)

Daniel Selsam (Apr 15 2021 at 03:13):

Mario Carneiro said:

step 1 sounds a bit like an open research question :oops:

Maybe a research question to prove it works, but we may be able to get very far just iterating some form of:

- try delab
- detect error message
- add more detail to the subterm with the message

Mario Carneiro (Apr 15 2021 at 03:13):

There are a lot of tactics other than those though. I'm thinking of the other 90% of tactics that do silly little things like contradiction, existsi, constructor, choose

Mario Carneiro (Apr 15 2021 at 03:14):

If you look in logic.basic, nat.basic, and list.basic you won't find much linarith but quite a bit of simp, lots of induction and a bit of cases and rcases

Daniel Selsam (Apr 15 2021 at 03:14):

Even without those tactics implemented, the proofs will delab to simple and concise things. But that kind of little tactic shouldn't be hard to implement.

Daniel Selsam (Apr 15 2021 at 03:15):

Mario Carneiro said:

If you look in logic.basic, nat.basic, and list.basic you won't find much linarith but quite a bit of simp, lots of induction and a bit of cases and rcases

This is a good starting point to experiment with. We could even start experimenting with (2) before doing (1), since this kind of basic thing will probably work with pp.default.

Daniel Selsam (Apr 15 2021 at 03:16):

I'll try those three files tomorrow.

Mario Carneiro (Apr 15 2021 at 03:17):

plus the lemmas in nat.basic (or at least init.data.nat.lemmas) are actually kind of important, I seem to run across a half dozen of them when I try to prove anything at all (see findLeast thread)

Mario Carneiro (Apr 15 2021 at 03:17):

so some of them will need to end up in the mathlib prelude most likely

Daniel Selsam (Apr 15 2021 at 03:18):

Will the mathlib prelude be a stand-alone lean4 repo with no mathport dependency?

Mario Carneiro (Apr 15 2021 at 03:18):

that's the plan

Daniel Selsam (Apr 15 2021 at 03:19):

And the goal is mainly to replicate src/tactic right? As opposed to starting a separate bottom-up porting process.

Mario Carneiro (Apr 15 2021 at 03:19):

I need to understand better how compilation and dependencies work in lean 4 but I think we need tactics to be in a separate project for them to be compiled for use in mathlib

Mario Carneiro (Apr 15 2021 at 03:20):

yes, it's only for tactic stuff and supporting lemmas, like lean 3 core (which is a bit bigger than lean 4 core is right now) + mathlib tactic/

Daniel Selsam (Apr 15 2021 at 03:20):

Mario Carneiro said:

I need to understand better how compilation and dependencies work in lean 4 but I think we need tactics to be in a separate project for them to be compiled for use in mathlib

I believe this is currently the case, yes.

Scott Morrison (Apr 15 2021 at 04:58):

Is that planned to change?

Gabriel Ebner (Apr 15 2021 at 07:28):

Daniel Selsam said:

[...] definitions in terms of foo._main, which will be defined by brec_on [...]

Delaborating proofs is one challenge, and it's good it's being worked on. But I believe that these auxiliary definitions pose a much bigger issue in automatic porting because they are used extensively in mathlib. As you've already noticed, there are auxiliary definitions created by Lean, such as _proof_* or _match_*, etc. But there's also lots of automatically generated code from mathlib:

  • @[to_additive] duplicates every definition it is tagged with
  • @[ext], @[mk_iff], @[simps], etc., generate auxiliary lemmas
  • @[derive] also exists in Lean 3

And these are just the common ones. If the automatically ported file contains the expanded versions of these, then the task is not "fill in a few sorrys", but rather "delete it and port the Lean 3 version manually".

Sebastian Ullrich (Apr 15 2021 at 07:30):

Scott Morrison said:

Is that planned to change?

No, because a package seems like a sensible code unit to generate a shared library from. But maybe people are confused about what the implications of "separate packages" are - there is absolutely no reason that you couldn't put multiple packages into the same repo. It's just that leanpkg hasn't been extended in such a way yet.

Sebastian Ullrich (Apr 15 2021 at 07:32):

Well, separate packages might not be necessary if we could JIT compile Lean programs, but those are dreams of the future. Shared libraries we can produce right now.

Gabriel Ebner (Apr 15 2021 at 07:32):

Now, I don't think that that is necessarily a bad strategy. Let's say we can port all of mathlib to Lean 4 automatically, even if every proof is sorry and all to_additive are expanded, etc. Then at least we can parallelize the manual porting process afterwards. Because it becomes possible to replace an ugly automatically-ported Lean 4 file by a manually ported version, and have CI check that that everything still builds.

Gabriel Ebner (Apr 15 2021 at 07:33):

Also pinging @Aurélien Saue, who has been experimenting with a similar automatic porting tool.

Gabriel Ebner (Apr 15 2021 at 07:37):

I don't expect this kind of change to propagate very far, since most things will be def-eq.

If you write def foo := foo._main manually instead of def foo := 42, this changes several things, even though they are definitionally equal:

  • rw foo produces a different result (and also simp, delta, etc.). This is because the equational lemmas are different.
  • Unification behaves differently, exposing brec_on. If you use pattern matching, Lean will use smart unfolding instead.

Scott Morrison (Apr 15 2021 at 09:24):

Sebastian Ullrich said:

Scott Morrison said:

Is that planned to change?

No, because a package seems like a sensible code unit to generate a shared library from. But maybe people are confused about what the implications of "separate packages" are - there is absolutely no reason that you couldn't put multiple packages into the same repo. It's just that leanpkg hasn't been extended in such a way yet.

But what about our tactics that, e.g. rely on part of the category theory library before you can even write them, and then get used in later parts of the category theory library? e.g. the recent elementwise I wrote needs to know about particular lemmas.

Gabriel Ebner (Apr 15 2021 at 09:28):

I believe we can still use tactics immediately in the same file. It's just they will be interpreted (as slow as in Lean 3). They only need to be in a different project if we want them to be compiled.

Scott Morrison (Apr 15 2021 at 09:31):

I see. I guess we'll actually adapt to that pretty easily. If you look at the current import graph for mathlib there is a very obvious horizontal line, before which there is no interesting maths, and after which there are no substantial tactics. We can just split mathlib on that line.

Sebastian Ullrich (Apr 15 2021 at 09:32):

The Lean 4 interpreter is actually quite a bit faster than in Lean 3 IIRC. Also, compilation only matters for tactics that do significant work themselves. If most time is spent in a nested simp, it will use the native code for that anyway.

Aurélien Saue (Apr 15 2021 at 09:53):

Gabriel Ebner said:

Now, I don't think that that is necessarily a bad strategy. Let's say we can port all of mathlib to Lean 4 automatically, even if every proof is sorry and all to_additive are expanded, etc. Then at least we can parallelize the manual porting process afterwards. Because it becomes possible to replace an ugly automatically-ported Lean 4 file by a manually ported version, and have CI check that that everything still builds.

So that has been the approach I have been trying to explore, and I think it would be a good first step. After time, these sorries would be replaced by proper proofs but at least we have a buildable first version. For it to be comfortable to use for users we could also include all the Lean3 proofs as comments after the sorry for them not to have to look up the Lean3 files all the time.

Aurélien Saue (Apr 15 2021 at 09:57):

For step one, is the idea to delaborate from the mathport generated olean files?

Scott Morrison (Apr 15 2021 at 10:33):

Regarding delaborating from mathport oleans: is it plausible for us to modify lean3 tactics so that they insert "wrappers" into the proof terms, which essentially serve as instructions for a mathport specific delaboration step?

Sebastian Ullrich (Apr 15 2021 at 10:45):

Gabriel Ebner said:

Now, I don't think that that is necessarily a bad strategy. Let's say we can port all of mathlib to Lean 4 automatically, even if every proof is sorry and all to_additive are expanded, etc. Then at least we can parallelize the manual porting process afterwards. Because it becomes possible to replace an ugly automatically-ported Lean 4 file by a manually ported version, and have CI check that that everything still builds.

So you would effectively use the delaborator not for its source output to iterate upon but to move from Lean 4-compatible Lean 3 .olean files to equivalent ones that conform to the output of native Lean 4 declarations, e.g. regarding equations, is that right?

Gabriel Ebner (Apr 15 2021 at 11:30):

Sebastian Ullrich said:

Gabriel Ebner said:
So you would effectively use the delaborator not for its source output to iterate upon [...]

Yes. It would still produce source output, but the output is only/mostly there to make mathlib4 build while we port files individually. The advantage of using Lean 4 source files instead of converted olean files is that 1) they are producing the native equation lemmas, etc., and 2) that we can modify them during porting.
For example the category theory library extensively uses automation that is integrated into the definitions. There is lots of auto_params with custom tactics. Say we begin by changing these auto_param tactics to Lean 4 equivalents. This will certainly break downstream files. But you can't start with the downstream files either, because doing category theory without automation is pointless.

Gabriel Ebner (Apr 15 2021 at 11:32):

There are other parts of mathlib that are much easier to port. For example, I fully expect the delaboration-based porting tool to produce a usable version of logic.basic.

Sebastian Ullrich (Apr 15 2021 at 11:55):

I see, makes sense to me. But that also means that the delaborator output wouldn't have to be "great", i.e. using the current state with "try the default options; if it doesn't round-trip, use pp.all" could be sufficient (assuming there are no pp.all bugs). It could still be improved in parallel to manual porting if you do want to reuse its output for such files.

Aurélien Saue (Apr 15 2021 at 12:39):

I just want to tell you all that I am still a newbie but if during this process you encounter some task that does not require too much of an in-depth knowledge and that you would like to delegate, I would be happy to give it my best try :)

Daniel Selsam (Apr 15 2021 at 15:37):

Gabriel Ebner said:

If you write def foo := foo._main manually instead of def foo := 42, this changes several things, even though they are definitionally equal:

  • rw foo produces a different result (and also simp, delta, etc.). This is because the equational lemmas are different.

Yes but this one is at the tactic level, which wouldn't matter if the downstream proofs are generated by the delaborator.

  • Unification behaves differently, exposing brec_on. If you use pattern matching, Lean will use smart unfolding instead.

Actually, when importing mathlib binaries we still get smart unfolding, because foo._sunfold is generated and ported as well and Lean4 just checks for the name. The smart unfolding behavior is a little different between lean3 and lean4 (in particular, there is no smart unfolding anymore for non-recursive definitions), but I think most recursive smart unfolds will be the same.

Daniel Selsam (Apr 15 2021 at 15:41):

Scott Morrison said:

Regarding delaborating from mathport oleans: is it plausible for us to modify lean3 tactics so that they insert "wrappers" into the proof terms, which essentially serve as instructions for a mathport specific delaboration step?

Yes. We could create an inductive type MyExtraInfoForLean4 and define a tagging function:

axiom MyExtraInfoForLean4 : Type
def id_tag4 {α : Type} (tag : MyExtraInfoForLean4) (x : α) : α := x

Even if MyExtraLean4Info uses strings and names and other types that don't align perfectly, mathport can still decode concrete values into the corresponding Lean4 values. It already does this for auto_param names.

Gabriel Ebner (Apr 15 2021 at 15:43):

Daniel Selsam said:

Gabriel Ebner said:

If you write def foo := foo._main manually instead of def foo := 42, this changes several things, even though they are definitionally equal:

  • rw foo produces a different result (and also simp, delta, etc.). This is because the equational lemmas are different.

Yes but this one is at the tactic level, which wouldn't matter if the downstream proofs are generated by the delaborator.

In that case the delaborated files can't be used as is and still need to be cleaned up in an API-changing way. Which breaks all the nicely delaborated proofs afterwards.

Daniel Selsam (Apr 15 2021 at 15:48):

We can also get rid of the foo._mains during mathport, and fuse them with foo.


Last updated: Dec 20 2023 at 11:08 UTC