Zulip Chat Archive

Stream: general

Topic: automatic inclusion of variables in mathlib 4


Sebastien Gouezel (Aug 22 2023 at 14:15):

There has been an ongoing discussion for some time about issues with automatic variable inclusion in Lean 4. Here is a quick summary of the discussion.

Sebastien Gouezel (Aug 22 2023 at 14:15):

Currently, when starting the proof of a theorem (or the body of a definition, it doesn't make a difference), all variables in the context are included and may be used in the proof. When the proof is over, only those variables that have indeed been used are kept for the statement of the theorem (i.e., all the other ones are discarded). This looks nice, but we have faced several issues because of this:

  • if in the context there is variable {n : ℕ} (hn : 1 < n) (h'n : 2 < n) and a theorem using hn but not h'n, then induction n will also pull h'n in the assumptions of the theorem. So will subst. The solution is to clear h'n before the induction even though h'n should not be in sight anywhere.
  • When cleaning up a proof, say by using lemma B instead of lemma A in the proof, new typeclass assumptions may be pulled in by lemma B and weaken the initial statement. All this will go unnoticed (no linter can help here).
  • In files which are heavy on typeclass assumptions (say files on vector bundles), the goal view is cluttered by dozens of lines of typeclass assumptions which are not relevant for the lemma being currently proved.

Sebastien Gouezel (Aug 22 2023 at 14:15):

The Lean 3 behavior was less clever, but more predictable. It went like this:

  • include for a lemma or a def all the variables that are mentioned explicitly in the statement of the lemma, and in its body if it is in term mode.
  • Also include all the typeclass assumptions about these variables which are in the context, and unnamed.
  • There was also a mechanism where include hf would force inclusion of hf in all the forthcoming statements, until the end of the section or until omit hf was encountered.

Sebastien Gouezel (Aug 22 2023 at 14:15):

This wasn't perfect either. Especially, the include / omit mechanism made things quite hard to parse for the user, and was one of the issues that prompted for a different mechanism in Lean 4.

Sebastien Gouezel (Aug 22 2023 at 14:16):

Based on several discussions, the following was proposed, which would probably improve both on the Lean 3 and Lean 4 behavior. It would be based on the Lean 3 behavior but removing some sharp edges:

  • Include for a lemma or a def all the variables that are mentioned explicitly in the statement of the lemma (not its proof, even in term mode)
  • Also include all the typeclass assumptions about these variables which are in the context, and unnamed (edit: possibly also named ones, see the discussion below).
  • Allow include hf hg in before the statement of a theorem, to include hf and hg in the theorem (and all the unnamed typeclass assumptions that depend on them, as above). (edit: maybe also omit h in to remove a named typeclass assumption, see discussion below).
  • No global include / omit mechanism.
  • No automatic removal of unused typeclass assumptions at the end of the proof: Better let the linter signal it, and let the user fix it by hand.

Sebastien Gouezel (Aug 22 2023 at 14:16):

Another option was mentioned instead of include ... in: If there is some assumption hf in the context, then theorem foo ... (hf) ... would force the inclusion of hf in the theorem, without the need to restate its type. And one could adapt binders, for instance with theorem foo ... {hf} ....

Sebastien Gouezel (Aug 22 2023 at 14:16):

Here is a poll to see what mathlib folks think of these different options. What core would think of it is another issue, but before anything it would be useful to know what mathlib prefers.

Sebastien Gouezel (Aug 22 2023 at 14:16):

/poll How should mathlib4 include variables in theorems

  • current Lean 4 behavior: include everything in scope, remove unused stuff by the end of the proof.
  • Lean 3 behavior.
  • above proposal, with include hf in to force the inclusion of hf.
  • above proposal, with theorem foo ... (hf) ... to force the inclusion of hf.

Sebastien Gouezel (Aug 22 2023 at 14:17):

I'm mainly opening the discussion, and possibly missing interesting points, so additional discussion is most welcome!

Anatole Dedecker (Aug 22 2023 at 14:29):

I really don't miss the Lean3 behavior being dependent on wether typeclass assumptions were named or not. Besides, we probably want way more named arguments in Lean4 to be able to specify them manually. I'm really looking forward to being able to use that extensively in the topology library (I was even thinking of systematically naming all TopologicalSpace assumptions), but making named TC arguments work differently than unnamed ones would seriously discourage people from using it in that case.

Frédéric Dupuis (Aug 22 2023 at 14:30):

I think it would be still be nice to retain the option of keeping the current Lean 4 behavior; so far I've found it a lot more pleasant than the constant micromanagement of variables in Lean 3, and haven't had any of these issues bite me so far.

Anatole Dedecker (Aug 22 2023 at 14:31):

The problem is that we don't have a net to catch these errors for now...

Anatole Dedecker (Aug 22 2023 at 14:33):

The include in feature will already be way better than include/omit in Lean3 imo, because we no longer have to be smart about it by grouping together lemmas that require the same variables and so on.

Frédéric Dupuis (Aug 22 2023 at 14:34):

Sure, but it would be still be nice to be able to opt out of the Lean 3 style for the 99% of the time when the current Lean 4 behavior works well.

Sebastien Gouezel (Aug 22 2023 at 14:39):

The problem with the Lean 4 behavior is that you can't know if it worked well or not (and maybe it works well at some point in time, but it doesn't after some refactor). And there is no linter to help us catch its mistakes.

Sebastien Gouezel (Aug 22 2023 at 14:42):

I agree that having more named instances would be a good idea, and is not compatible with automatic inclusion of unnamed instances only. Another option would be to include all instances that depend on already included variables, be they named or not. This would force a little bit more micromanagement, but it would also increase readability and predictability, IMHO.

Johan Commelin (Aug 22 2023 at 14:45):

If necessary, you could imagine having

omit myNamedInst in
theorem HodgeConjecture : ...

Floris van Doorn (Aug 22 2023 at 14:46):

Some smaller suggestions:

  • include (hf) {hg} in might be nice to force the inclusion and update the binder type at the same time - and this can also be used to update the binder type even if the theorems statement mentions hf or hg. I like that syntax better than the theorem foo (hf) {hg} syntax.
  • Do not re-elaborate variables for every declaration separately, but only once. This was causing performance issues before, and I think it's a bug - not a feature - that variable types can be elaborated differently for the two options (I think it's also a feature that variable types generally have a universe variable as universe level).

Shall I make new options for these? Or is the proposal flexible enough to encapsulate (at least) the first suggestion?

Sebastien Gouezel (Aug 22 2023 at 14:48):

These are all very good ideas. I'd say let's not add new suggestions to the poll for now, and perhaps make a second finer poll once the discussion has settled a little bit. It will make it easier to see big trends.

Damiano Testa (Aug 22 2023 at 14:49):

I think that limiting the scope of include/omit to single declarations is generally an improvement.

I also like Florian's syntax.

Maybe, there should be also a stricter naming scheme for assumptions: I've found the new option of calling out assumptions by their label awesome and would like to be able to guess them more reliably!

Damiano Testa (Aug 22 2023 at 14:54):

In any case, if all the variables that I see in the infoview survive in the final type, that would already be an improvement. Right now, I am always worried that something that I'm not "using" ends up being part of the declaration.

Damiano Testa (Aug 22 2023 at 14:55):

(That and the unused variable linter pointing out unused stuff.)

Frédéric Dupuis (Aug 22 2023 at 14:59):

Can't you just hover the mouse over the lemma name to see what declaration was generated? Am I missing something here?

Damiano Testa (Aug 22 2023 at 15:00):

You can, but you're probably not going to remember to do it after every refactor...

Damiano Testa (Aug 22 2023 at 15:02):

Or if you decide to add another assumption to the variable declaration at the beginning of a section.

Eric Rodriguez (Aug 22 2023 at 16:06):

I like Floris' suggestion too, I think it's the best of both worlds. It's very often theorem by theorem what things should be explicit, even in a section, and being able to make things ex/implicit at will is great

Sebastien Gouezel (Aug 23 2023 at 19:25):

RFC at lean4#2452

Scott Morrison (Aug 23 2023 at 23:56):

It would be great if we could hear from any non-Mathlib users of Lean, and how this would effect them. If anyone feels they are in that category, please speak up!

Shreyas Srinivas (Aug 25 2023 at 22:29):

I haven't had time to consider the details, and how they affect my projects. I dont think this will happen overnight given the size and nature of the proposal, and the subtle errors that it might introduce. But I have a question and one suggestion for the process of introducing such changes:

  1. Can all these features be (at least initially)implemented as locally set options at the file or project lakefile level, which are not set by default for non-mathlib projects?
  2. Perhaps these options be tested on some experimental branch of mathlib for some time before being introduced to the main branch.

Last updated: Dec 20 2023 at 11:08 UTC