Zulip Chat Archive

Stream: general

Topic: Changing a proof triggers recompilation of importing files


Michael Stoll (Jun 11 2022 at 19:54):

I understand that adding a def/lemma to a file in mathlib will necessitate the recompilation of all dependent files, since (as was pointed out to me in a different thread) the simp set could change or there could be new instances.

But when just a proof of a lemma gets changed, then I would think it should not be necessary to compile the dependent files again, since it should only matter that a proof term is provided and not what it looks like. Still, my observation is that running leanproject build after such a change does recompile everything below the file in the import tree. Is there a good reason for that?

Damiano Testa (Jun 11 2022 at 19:56):

I'm not sure about the reason for recompiling, but placing a dot . in a file prevents lean from compiling above the dot.

Eric Wieser (Jun 11 2022 at 19:59):

There's a Todo about this in the lean source code

Eric Wieser (Jun 11 2022 at 19:59):

Lean doesn't know that you only modified that one proof, it assumes the entire rest of the file has changed too

Eric Wieser (Jun 11 2022 at 19:59):

Indeed, it's entirely possible for the script in one proof to affect later proofs by changing global settings

Damiano Testa (Jun 11 2022 at 20:00):

This page mentions the dot:

https://leanprover-community.github.io/tips_and_tricks.html

Michael Stoll (Jun 11 2022 at 20:03):

Eric Wieser said:

Lean doesn't know that you only modified that one proof, it assumes the entire rest of the file has changed too

Could Lean not check that the net result (after compiling the affeced file) is the same as before the change (so that every importing file sees exactly the same as before) and then skip the further recompilations?

Michael Stoll (Jun 11 2022 at 20:04):

I think that (say) golfing proofs in files close to the root gets quite inefficient this way...

Bart Michels (Jun 11 2022 at 20:08):

Is it not theoretically possible that a proof relies on the specific term produced by a lemma?

Michael Stoll (Jun 11 2022 at 20:12):

As far as I know, all proofs (= terms) of a statement (of type Prop) are considered equivalent ("proof irrelevance"), so I would think it is not even possible that a proof relies on the specifics of some other proof term.

Michael Stoll (Jun 11 2022 at 20:15):

Eric Wieser said:

Indeed, it's entirely possible for the script in one proof to affect later proofs by changing global settings

How is this possible? I would have thought that everything I do in a proof is local to that.

Damiano Testa (Jun 11 2022 at 20:35):

About proof irrelevance, I learned recently that it is not as strong as I would have thought. For instance docs#tactic.interactive.delta can access the actual proof (if I remember correctly, in on mobile).

Jason Rute (Jun 11 2022 at 20:41):

@Michael Stoll @Damiano Testa The environment holds a lot of information, including the term proof of every theorem. To see this, just do #print <theorem_name>. Moreover, tactics have full access to the environment, so theoretically, any change to a proof could change the behavior of a downstream tactic. It is unlikely, but not impossible.

Jason Rute (Jun 11 2022 at 20:50):

I think proof irrelevance is more referring to the behavior of the terms proofs and compiled code after they are constructed by tactics. So if you take the term proof export of mathlib and change a proof of a Prop, I think (hopefully) it is impossible for it to effect downstream proofs or the compiled code (but there might be more subtleties here than I am aware of).

Bart Michels (Jun 11 2022 at 20:54):

So the question is, if you have lemma some_nat : ℕ := 0 and lemma some_other_nat : ℕ := 0, is it possible at all to prove some_nat = some_other_nat?

Jason Rute (Jun 11 2022 at 20:55):

As for the state of the art on recompilation avoidance in Lean 4 and other interactive theorem provers, there is some discussion by @Sebastian Ullrich on the Proof Assistants Stack Exchange.
Sebastian also mentions an RFC.

Michael Stoll (Jun 11 2022 at 20:56):

Bart Michels said:

So the question is, if you have lemma some_nat : ℕ := 0 and lemma some_other_nat : ℕ := 0, is it possible at all to prove some_nat = some_other_nat?

These are definitions, not lemmas: they construct something of a type whose type is not Prop. "Proof irrelevance" refers only to propositions, not to data.

Yaël Dillies (Jun 11 2022 at 20:58):

Sure, but lemma is the same as def + high irreducibility.

Jason Rute (Jun 11 2022 at 20:58):

Bart Michels said:

So the question is, if you have lemma some_nat : ℕ := 0 and lemma some_other_nat : ℕ := 0, is it possible at all to prove some_nat = some_other_nat?

lemma some_nat :  := 0
lemma some_other_nat :  := 0
example : some_nat = some_other_nat := begin
 dunfold some_nat,
 dunfold some_other_nat,
 refl
end

(Also, what Michael said. Proof irrelevance only refers to Props, regardless whether they are declared with def or lemma/theorem.)

Michael Stoll (Jun 11 2022 at 21:05):

(This is what I meant.)

Yaël Dillies (Jun 11 2022 at 21:07):

I don't think Bart was referring to proof irrelevance.

Jason Rute (Jun 11 2022 at 21:10):

Maybe I misunderstood the motivation behind Bart's problem.

Yaël Dillies (Jun 11 2022 at 21:13):

The question was whether the information was even available, and the answer is "Yes, but it's well hidden and you need dunfold to access it".

Jason Rute (Jun 11 2022 at 21:15):

But if his lemmas were of type, say exists x, x=x, then the information would not be available in the same way, even using dunfold, right?

Yaël Dillies (Jun 11 2022 at 21:17):

Pretty sure it would. What propositional irrelevance means is that you don't even need to access it. Instead, you just look at the type, see it's a Prop and say "Oh, it's irrelevant, so they are equal!".

Jason Rute (Jun 11 2022 at 21:18):

Can you provide an example?

Yaël Dillies (Jun 11 2022 at 21:19):

exists.intro 0 rfl and exists.intro 1 rfl are both proofs of ∃ x : ℕ, x = x, so they are equal by propositional irrelevance, but they are nonetheless recorded as different proof terms.

Jason Rute (Jun 11 2022 at 21:20):

Ok, by information I mean the 0 and 1. If this was proof relevant homotopy type theory, then I would be able to access the 0 and 1.

Yaël Dillies (Jun 11 2022 at 21:20):

Yes, exactly.

Yaël Dillies (Jun 11 2022 at 21:22):

Propositional irrelevance is an additional defeq rule. It doesn't affect how Lean records declarations (although that would probably lead to some speedup, as Michael suggested at the very beginning of this thread).

Jason Rute (Jun 11 2022 at 22:50):

@Yaël Dillies I admit I was mixing up the axiom/defeq rule Propositional irrelevance with another concept, which for lack of remembering the correct name I'll call "proof erasure". At least is some places in Lean, the proofs of propositions are literally erased in Lean I believe. One place where this occurs is in compiled code. But I also think it is also true of proofs. Consider this example:

theorem thm1 : true  true := λ x, x
theorem thm2 : true  true := λ x, true.intro

example : thm1 = thm2 := begin
  dunfold thm1, -- _ = thm2
  dunfold thm2, -- _ = _,
  refl
end

Notice that dunfold doesn't expand thm1 or thm2 to their definitions. It just expands them to _. This suggests to me that the proof of thm1 was erased inside Lean, no? And this makes sense since because of the propositional irrelavance defeq rule, it is impossible to give a proof of say thm1 = thm1 which is not also a proof of thm2 = thm2. So there is no need to keep around the proof terms. For example:

def foo {α : Sort} : (λ x : α, x) = (λ x : α, x) := rfl
example : thm1 = thm1 := foo
example : thm2 = thm2 := foo

Yaël Dillies (Jun 11 2022 at 23:04):

There's such a thing as erased proofs, but I'm pretty sure that the _ is just pretty-printing. Isn't pp.proofs the option to turn this off?

Alex J. Best (Jun 11 2022 at 23:17):

One thing that actually does change with different proofs of the same lemma, is whether a proof of a simp lemma is just rfl or not. If so, dsimp will apply the lemma, and not otherwise. In this situation changing the proof of a lemma will affect the result of downstream tactics.

Alex J. Best (Jun 11 2022 at 23:19):

Of course that doesn't mean we should give up on the idea of reducing proof recompilation. As mathlib grows it becomes an even more important issue, but this sort of example shows that it's not a trivial problem that will have an easy fix in the lean codebase

Jason Rute (Jun 11 2022 at 23:29):

Yaël Dillies said:

There's such a thing as erased proofs, but I'm pretty sure that the _ is just pretty-printing. Isn't pp.proofs the option to turn this off?

I stand corrected. You are right about the pretty printing. I should have checked with pp.all first.

Junyan Xu (Jun 12 2022 at 01:51):

An example where the proof term affects whether a subsequent proof succeeds or not:

lemma eraseN (h : true) (n : ) : true := h
lemma eraseN_eq (h : true) (n : ) : eraseN h n = h := rfl

lemma trivial1 : true := trivial
lemma trivial2 : true := eraseN trivial 0

lemma test1 : trivial1 = trivial := by { dunfold trivial1, apply eraseN_eq }
/-
solve1 tactic failed, focused goal has not been solved
state:
⊢ ℕ
-/

lemma test2 : trivial2 = trivial := by { dunfold trivial2, apply eraseN_eq }
/- goals accomplished -/

Violeta Hernández (Jun 12 2022 at 08:40):

That's actually scary

Violeta Hernández (Jun 12 2022 at 08:40):

I'm aware of random changes breaking random code in Lean but I thought proofs were the one safe haven

Mario Carneiro (Jun 12 2022 at 08:50):

Keep in mind that this code is very contrived. No one should be using dunfold or delta tactics to unfold lemmas

Mario Carneiro (Jun 12 2022 at 08:53):

As another contrived example, a tactic could open the containing file and only succeed if it contains exactly 30126 characters

Violeta Hernández (Jun 12 2022 at 08:54):

Tactics have access to that too? That's wild

Mario Carneiro (Jun 12 2022 at 08:55):

tactics can do arbitrary IO

Mario Carneiro (Jun 12 2022 at 08:55):

how else would the gptf tactic work? It literally makes a network request

Mario Carneiro (Jun 12 2022 at 08:55):

there are also tactics that connect to external smt solvers like z3 and vampire

Michael Stoll (Jun 12 2022 at 09:12):

What is the gptf tactic? I can't find the string "gptf" in all of mathlib...

Bart Michels (Jun 12 2022 at 09:50):

Yaël Dillies said:

The question was whether the information was even available, and the answer is "Yes, but it's well hidden and you need dunfold to access it".

Right, but I was of course wrong to replace props with nat. I was imagining having proofs that rely on lemma1 = lemma2 and lemma1 ≠ lemma2 respectively (that would each compile depending on the terms used), but this would require an already present contradiction, not create a new one.
So never mind. I'm glad it spawned more interesting questions.

Jason Rute (Jun 12 2022 at 10:06):

Tactics using IO are wonderful for AI machinery like #lean-gptf but obviously shouldn’t be left in code.

Alex J. Best (Jun 12 2022 at 11:14):

While we don't have any right now in mathlib, I could definitely imagine a use for tactics using file IO, to read some proof format stored as text, that doesn't fit as a DSL within Lean itself. I'm thinking of some sort of SAT (dis)-proof format for example.

Yaël Dillies (Jun 12 2022 at 11:15):

Something à la #10645?

Alex J. Best (Jun 12 2022 at 11:25):

Can you explain more? I have some vague memory of you adding an algorithm to compute these numbers, but are the proofs more appropriately stored in a simpler format?

Yaël Dillies (Jun 12 2022 at 11:25):

Not yet, but that's my goal.

Michael Stoll (Jun 12 2022 at 12:14):

Alex J. Best said:

Of course that doesn't mean we should give up on the idea of reducing proof recompilation. As mathlib grows it becomes an even more important issue, but this sort of example shows that it's not a trivial problem that will have an easy fix in the lean codebase

I imagine one could define a restricted set of allowed tactics that are guaranteed to have no side effects w.r.t. importing files. I guess a linter could check that a given file only uses these, and then perhaps it would be safe to not trigger recompilations down the tree if only proofs are changed.

Or perhaps it's the other way round: if the importing file only uses "safe" tactics, then (it can be proved that) it will not be affected when proof terms (and nothing else) changes in imported files.

Mario Carneiro (Jun 12 2022 at 12:16):

Bart Michels said:

Right, but I was of course wrong to replace props with nat. I was imagining having proofs that rely on lemma1 = lemma2 and lemma1 ≠ lemma2 respectively (that would each compile depending on the terms used), but this would require an already present contradiction, not create a new one.

No, lemma1 = lemma2 is always true if it typechecks. This is what proof irrelevance is all about

Mario Carneiro (Jun 12 2022 at 12:17):

you can break a tactic based on random things that the tactic looks at but if you fix the proof term then there is no change to the content of the lemma that will break the proof

Mario Carneiro (Jun 12 2022 at 12:19):

@Michael Stoll Unfortunately simp would not be on that list so it is a bit of a non-starter

Michael Stoll (Jun 12 2022 at 12:21):

I see.

Michael Stoll (Jun 12 2022 at 12:21):

But it would certainly be good to have a more efficient way of modifying mathlib...

Mario Carneiro (Jun 12 2022 at 12:35):

I think the easiest thing (well, still pretty hard) would be to support a sort of "quick and dirty" mode which just ignores the potential issues that arise from changing proofs. The tricky part I think is just detecting that this is in fact all that happened, given that the user is typing things in live and momentarily broken states could look like a change to something other than a proof

Junyan Xu (Jun 12 2022 at 13:09):

@_**Gabriel Ebner|110043** said:

what happens is that on every file update the server receives, all reverse dependencies of the file are visited to recompile them. The recompilation happens asynchronously of course, but dependency resolution etc. all happens during the original file sync request. Thus: many reverse dependencies = laggy typing. We could try to handle the invalidation part of the file sync request asynchronously, this wouldn't fix everything but at least diagnostics etc. would still be responsive.

This is another problem that makes editing low-level files inconvenient, from another thread. I find this behavior strange, because unless you save a file, edits in the file shouldn't affect other files importing it. I think this might be easier to fix? (The current recommendation is to close the high level files, but if you have to do that, you may as well ask VSCode to tell Lean there are no other files open.)

Sebastian Ullrich (Jun 12 2022 at 13:11):

Lean 4 uses a much more conservative approach there, it will not propagate changes between open files unless you explicitly tell it to

Junyan Xu (Jun 12 2022 at 13:18):

Oh I was wrong actually, editing a file without saving can indeed affect files importing it. The imported file ... uses sorry message can appear without saving the imported file, as I just learned by testing.

Jason Rute (Jun 12 2022 at 13:21):

Mario Carneiro said:

[...] but if you fix the proof term then there is no change to the content of the lemma that will break the proof

Just to play devil's advocate, is this technically true given type checking timeouts (and the non-transitivity of defeq for that matter)? I imagine one could make a situation where if one changes an upstream proof term in just the right way, then the downstream proof term would time out (or fail to typecheck because of the non-transitivity of defeq or because of technicalities with subject reduction). I'm just being a troll of course. I see now reason this needs to be taken into account when considering practical proof recompilation strategies

Junyan Xu (Jun 12 2022 at 13:26):

Sebastian Ullrich said:

Lean 4 uses a much more conservative approach there, it will not propagate changes between open files unless you explicitly tell it to

If modifying Lean 3 is not the way to go, maybe we could just add a feature in the VSCode extension to close a file in Lean but not in VSCode (and reopen it in Lean when focused in VSCode). (It would also be nice to auto-detect the list of open files (transitively) importing the low-level file and provide an option to close them all in Lean.)

Junyan Xu (Jun 12 2022 at 13:36):

Oh it seems the old mode also helps with this lagging issue. So maybe I should just add --old to the Lean command in VSCode settings. But when I do want to sync the changes, restarting Lean with --old will have no effect, and I'd have to manually call lean --make, right? Seems a fair price to pay.

No, the old mode doesn't seem to help with the lagging issue; changes in imported files still affect files importing them even without saving.


Last updated: Dec 20 2023 at 11:08 UTC