Zulip Chat Archive

Stream: lean4

Topic: Lean 4 variable command discussion


Floris van Doorn (Jul 18 2023 at 10:42):

There are quite some people that have expressed problems with the new variable command. Some complaints I've heard:

  1. The goal view shows variables that aren't actually relevant for this declaration, which clutters the goal state.
  2. It is not predictable what variables end up being included in a declaration (refactoring a proof might change it).
  3. variable (x : Nat) in def ... doesn't actually force x to be included in the declaration.
  4. Variables are elaborated again for every declaration, which can be a performance issue and cause the variables to elaborate to different expressions in different declarations.
  5. It is not easy to see at a glance what the statement of a theorem is (what variables ended up included)
  6. There are large ugly variable declarations at the top of some files.
  7. It conflicts with potential features like parallel compilation of proofs or a mode where you don't execute proof scripts.

5 and 6 were already problems in Lean 3 (5 improved a bit, since you can mouse-over the declaration name in Lean 4 to see the statement).
2 was somewhat already a problem in Lean 3, but much more so in Lean 4.

Floris van Doorn (Jul 18 2023 at 10:43):

I believe many Mathlib4 users prefer the Lean 3 variable command over the Lean 4 one, acknowledging that the Lean 3 variable command also had issues.

Floris van Doorn (Jul 18 2023 at 10:44):

Should we go back to the Lean 3 variable command? Can we make the variable command better than either Lean 3 or the current version in Lean 4?

Oliver Nash (Jul 18 2023 at 10:51):

At least for now, I still prefer the Lean 3 behaviour.

Is it true that the Lean 4 behaviour has the advantage that include and omit statements are no longer required?

Floris van Doorn (Jul 18 2023 at 10:53):

Yes, that is correct, in Lean 4 you don't need the include and omit commands anymore.

Eric Wieser (Jul 18 2023 at 10:58):

Variables are elaborated again for every declaration, which can be a performance issue.

This can also cause surprises, especially when they do not elaborate the same way every time

Scott Morrison (Jul 18 2023 at 11:09):

I am very happy to see omit and include gone, and hope that they are not coming back in an RFC!

Floris van Doorn (Jul 18 2023 at 11:21):

Eric Wieser said:

Variables are elaborated again for every declaration, which can be a performance issue.

This can also cause surprises, especially when they do not elaborate the same way every time

Good point. I added that to the first post.

Floris van Doorn (Jul 18 2023 at 11:25):

Scott Morrison said:

I am very happy to see omit and include gone, and hope that they are not coming back in an RFC!

How would you feel about the following, in combination with the Lean 3 heuristic? We can decide to not have a scoped version of include, but only a per-declaration version.

include a b c in
theorem ...

Sebastien Gouezel (Jul 18 2023 at 11:54):

Another suggestion that was mentioned at some point: only use variables that appear explicitly in the goal, and typeclasses thereof. And force inclusion of variables with {h} or (h) with the following meaning: if there is an h in the list of variables, this forces the inclusion of h in the current statement (with the binder type you have just specificed). While if there is no h, then go back to autoImplicit behavior. With this, no need for omit and include.

Johan Commelin (Jul 18 2023 at 11:57):

I could also imagine include h1 h2 in ... syntax. The problem with include and omit was that I couldn't keep track of the current state of what is included and what not. But with include in that isn't a problem. Of course there is a bit of a risk that we would need an include in before 95% of the statements in certain files. But that just hints at another design issue in such a file.

Kyle Miller (Jul 18 2023 at 12:07):

@Sebastien Gouezel Some design questions with that: What if these {h}/(h) variables do not appear in the same order as the declared variables? What if they come after another variable with some name n, and h depends on a variable with name n? These are just some things that need to be clearly specified.

Kyle Miller (Jul 18 2023 at 12:08):

If we have include ... in, it would be nice if you could override binder types, like include {h} in or include (h) in. (Though with Lean 4 dot notation, you don't need an explicit argument for the object anymore. It can be implicit.)

Evgeniy Kuznetsov (Jul 18 2023 at 12:13):

Even trivial examples cause induction to generalize irrelevant section variables

inductive Vector (α : Type): Nat  Type where
| nil : Vector α 0
| cons (head : α) (tail : Vector α n) : Vector α (n+1)

namespace Vector
  variable {n: Nat} {α: Type} (xs ys: Vector α n)

  def length: {n: Nat}  Vector α n  Nat
  | _, .nil       => 0
  | _, .cons _ xs => xs.length + 1

  theorem length_eq: xs.length = n := by induction xs <;> simp_all!
end Vector

Sebastien Gouezel (Jul 18 2023 at 12:15):

Indeed, these need to be specified. My take on these:

  • when doing {h}, it means the old include h (so respect the position at which it was declared in the variable block).
  • if h depends on some other variable n which was also in a variable block, then n should also be included, and transitively. If there is another different explicit n in the theorem statement, then the previous ones will be marked with a dagger by hygiene, as usual.

Kyle Miller (Jul 18 2023 at 12:30):

I worry that it will be confusing that these {h}/(h) instructions might not correspond to where they appear as arguments in the completed theorem. Would there be any problem restricting these to always come first before normal arguments? (Maybe we disallow any {h}/(h) that come later if h would shadow a variable variable?)

Sebastian Ullrich (Jul 18 2023 at 13:18):

Without yet forming an opinion on the overall issue, my first reaction would be that variable/include ... in appears to me as the more clear and confusion-avoiding option compared to overloading the meaning of regular binders, even if more verbose

Jon Eugster (Jul 18 2023 at 13:19):

To add to the examples, the only problem I had with the current variable command so far is that sorrying a proof would change the signature of the declaration, which is annoying while working on stuff. a very simple include surj in that keeps order and implicitely as it was in the variable call would already suffice for this use case:

variable (surj : ...)

theorem my_thm : STATEMENT := by
  sorry -- Proof will eventually need `surj` but doesn't until the `sorry` is worked on.

example (surj : ...) : STATEMENT := by
  apply my_thm surj -- doesn't work until the proof of `my_thm` used the variable

For the cluttered proof state, I'd be completely happy to just mark variables that could be included but aren't yet, if that was easy to implement:

import Mathlib.Logic.Basic

variable (P Q : Prop) (surj :  n, n = n)

example : ¬ (P  Q)  ¬ P  ¬ Q := by
  -- here
  exact not_and_or

screenshot-lean.png

(ofc there are other issues mentioned here which need to be addressed too. This is only the issue I'm confronted with most frequently.)

François G. Dorais (Jul 19 2023 at 11:46):

It seems to me that a large part of the unpredictability issue has to do with how variable generalization works. I wonder if it would make sense for the variable command to make its variables non-generalizable. That would certainly cause different style of headaches since uses of generalization are often invisible, but it would fix a lot of issues mentioned here.

Eric Wieser (Jul 19 2023 at 11:51):

By which you mean, elaborate them all in the variable command, then use the elaborated version in all future declarations?

François G. Dorais (Jul 19 2023 at 12:06):

No, I think that's much stronger. I think that could create unusual relationships between universe levels, for example. (For example, a theorem of the type {α : Type (max u v)} (x : α), x = x just because variable elaborated α that way at the outset.) I'm thinking about the induction example, where the induction needs to generalize n but then automatically generalizes ys too since it depends on n. Maybe a more sophisticated implementation of generalization would also work.

Eric Wieser (Jul 19 2023 at 12:13):

Note that right now it usually happens in the other directions; definitions accidentally add the unusual relationships between universes because the implementer doesn't notice. It's very rare for that type of mistake to appear in the variables line, but quite common inside a theorem statement and even more common inside a definition.

Sebastien Gouezel (Aug 20 2023 at 09:45):

I stumbled on yet another issue with the current automatic variable integration/removal, in #6689. I refactor a lemma A, which used to require assumptions [SecondCountableTopology α] [SecondCountableTopology β] and now only needs [SecondCountableTopologyEither α β] (a typeclass which follows from second countability either of α or β). Then I would like to weaken the assumptions of lemmas using A in the same way. Unfortunately, the unused variables linter won't tell me about them: if there are assumptions [SecondCountableTopology α] [SecondCountableTopology β] in the context, then the application of A will just happily use the first of them, and then the second one will be discarded silently at the end of the proof (so no unused variables). This means we end up with a suboptimal version of the new lemma, requiring [SecondCountableTopology α] instead of [SecondCountableTopologyEither α β].

The unused variables linter has still found a few of them, where typeclasses were explicit in the statement, but I'm sure I'm missing a lot of them. Yet another case showing that our current variables management is not working as smoothly as it used to in Lean 3, this time hindering refactors.

Is there already an RFC open somewhere about the variable management issue? Otherwise, I'll open one (in core or in mathlib?) explaning the various issues we had and the solutions that were advertised in this thread.

Eric Wieser (Aug 20 2023 at 09:53):

Sebastien Gouezel said:

then the application of A will just happily use the first of them, and then the second one will be discarded silently at the end of the proof (so no unused variables).

What do you mean by "discarded silently"; are you claiming that the argument ends up being tangled in the proof somehow but not being used?

Sebastien Gouezel (Aug 20 2023 at 10:01):

To apply Lemma A, you need to check the typeclass [SecondCountableTopologyEither α β]. So typeclass inference starts, finds that it follows from [SecondCountableTopology α], and uses this instance. So it never needs [SecondCountableTopology β].

Eric Wieser (Aug 20 2023 at 10:05):

And what you want it to do instead is use a local [SecondCountableTopologyEither α β] hypothesis?

Sebastien Gouezel (Aug 20 2023 at 10:06):

Yes, that would be better. But in the current situation there is no linter which can notice that.


Last updated: Dec 20 2023 at 11:08 UTC