Stream: new members

Topic: tactic preference

Damiano Testa (Oct 18 2020 at 05:56):

Dear All,

in the process of cleaning up proofs, I find myself having to make choices between using different tactics with the same final effect. In practice, I often end up with the following scheme:

simp becomes rw (if I am lucky, via squeeze_simp)
apply [...], exact [...] becomes exact [......]
rw [... \iff ...] becomes apply [(...).mp(r)...] (when reasonable/possible)
have becomes a previous lemma

I realize that tactics are not just not linearly ordered, they are probably not even a poset! Still, a proof commits you to some choices and I would like to know that I am at least going in a reasonable direction. I would like to know whether you find the scheme above reasonable. What other substitutions would you make? For instance, I rarely use subst [...] since the next tactic is almost invariably rw and then I merge them in a longer rw. Which is normally preferred: subst [...], rw [...] or rw [......]?

Sorry for the vague question: I hope that someone can still give me some pointers!

Mario Carneiro (Oct 18 2020 at 05:57):

I try to merge subst into the nearest rcases

Damiano Testa (Oct 18 2020 at 05:58):

Mario Carneiro said:

I try to merge subst into the nearest rcases

Ah, I use very rarely rcases! The fact that you mention it, makes me understand that I should use it more! Thanks!

Mario Carneiro (Oct 18 2020 at 05:59):

You can go really far with rcases and refine

Mario Carneiro (Oct 18 2020 at 05:59):

it's basically term mode proof in tactic clothing

Damiano Testa (Oct 18 2020 at 05:59):

refine I *try * a lot, but end up being able to apply it only very rarely.

Mario Carneiro (Oct 18 2020 at 06:00):

refine can be used in place of almost any tactic, although the equivalent may not be pretty depending on the tactic it replaces

Damiano Testa (Oct 18 2020 at 06:00):

so would a good measure be the length of the output of show_term to inform the choices above?

Mario Carneiro (Oct 18 2020 at 06:01):

it's good for replacing split, existsi/use and intro in some combination

Damiano Testa (Oct 18 2020 at 06:02):

ah, I did not know that splits could be avoided by refine: this will keep me busy! Ahaha

Mario Carneiro (Oct 18 2020 at 06:02):

yes, although you should be aware that some tactics do work that doesn't show up in the term, for example simp

Mario Carneiro (Oct 18 2020 at 06:02):

split is just refine <_, _>

Mario Carneiro (Oct 18 2020 at 06:02):

and you can stick a term in there if one of the sides is easy

Damiano Testa (Oct 18 2020 at 06:02):

I never leave a simp and almost never leave even a simp only

Damiano Testa (Oct 18 2020 at 06:03):

Mario Carneiro said:

split is just refine <_, _>

I did not know! I will stop using split, then!

Mario Carneiro (Oct 18 2020 at 06:04):

I will also point out that this doesn't necessarily match everyone's preference; this is good for making low compile time proofs but may go against one's aesthetics

Mario Carneiro (Oct 18 2020 at 06:05):

existsi a is just refine <a, _>

Damiano Testa (Oct 18 2020 at 06:05):

Yes, I understand that. Although, to be honest, proof irrelevance has the effect in my mind of replacing "beauty" with "efficiency" in a proof.

Mario Carneiro (Oct 18 2020 at 06:06):

a kindred spirit :grinning_face_with_smiling_eyes:

Damiano Testa (Oct 18 2020 at 06:06):

Mario Carneiro said:

existsi a is just refine <a, _>

I did not even know about the tactic existsi.

Mario Carneiro (Oct 18 2020 at 06:06):

use is basically the same as existsi

Damiano Testa (Oct 18 2020 at 06:07):

Ah, good to know! I work aournd statements so as to avoid needing use whenever I can. I do not know why...

Mario Carneiro (Oct 18 2020 at 06:08):

It does a bit of postprocessing with simp to make it more user friendly I think

Mario Carneiro (Oct 18 2020 at 06:09):

check out this beautiful proof:

theorem split_bits.determ_l {n₁ n₂ l} (h₁ : split_bits n₁ l) (h₂ : split_bits n₂ l) : n₁ = n₂ :=
begin
  induction l generalizing n₁ n₂, {cases h₁, cases h₂, refl},
  rcases l_hd with ⟨_, l', rfl⟩,
  induction l' generalizing n₁ n₂,
  { cases h₁, cases h₂, exact l_ih h₁_a h₂_a },
  { have : ∀ {n l'},
      split_bits n l' →
      l' = ⟨_, l'_hd :: l'_tl, rfl⟩ :: l_tl →
      l'_hd = nat.bodd n ∧
      split_bits (nat.div2 n) (⟨_, l'_tl, rfl⟩ :: l_tl),
    { intros, cases a; try {cases a_1},
      rcases a_bs with ⟨l₂, rfl⟩,
      injection a_1, cases h_2,
      cases congr_arg (λ v : Σ n, bitvec n, v.2.1) h_1,
      exact ⟨rfl, a_a⟩ },
    rcases this h₁ rfl with ⟨rfl, h₁'⟩,
    rcases this h₂ rfl with ⟨e, h₂'⟩,
    rw [← nat.bit_decomp n₁, e, l'_ih h₁' h₂', nat.bit_decomp] }
end

100% old school tactics

Damiano Testa (Oct 18 2020 at 06:09):

I now need to resist the urge to go back to all the code that I have written and preform further substitutions... surely, alongside proof-checkers, there are "proof-spas", beautifying your code, no?

Mario Carneiro (Oct 18 2020 at 06:10):

it's big business in metamath land, but lean doesn't make it so easy to optimize proofs in any large scale way

Damiano Testa (Oct 18 2020 at 06:11):

Why do you call the proof above "old school tactics"? What are examples "new school tactics"? I feel that I am a beginner learning the old-school...

Heather Macbeth (Oct 18 2020 at 06:12):

Is this something AI can do for us someday? Take the corpus of all the lemmas and definitions in mathlib, and find its own proofs, sometimes 3 lines shorter than the existing proofs?

Damiano Testa (Oct 18 2020 at 06:13):

Heather Macbeth said:

Is this something AI can do for us someday? Take the corpus of all the lemmas and definitions in mathlib, and find its own proofs, sometimes 3 lines shorter than the existing proofs?

In my proofs, I think I would be happy if it only shortened them by 3 lines! Even my one-line-proofs are long...

Mario Carneiro (Oct 18 2020 at 06:14):

It's something that can absolutely be done in principle (that's basically what GPT-f did for metamath), but the limiting factor is ease of access of the data to automated tools, which is crazy considering that it's a formal theorem prover

Heather Macbeth (Oct 18 2020 at 06:16):

Mario Carneiro said:

that's basically what GPT-f did for metamath

I don't know about this, can you suggest something for further reading?

Mario Carneiro (Oct 18 2020 at 06:17):

In fact, for a long time the main metamath tool has had a proof processing tool called "minimize" that is basically a peephole optimizer: it looks at sequences of 2 or 3 steps in your proof and tries to find ways to do the same thing in 1 or 2 steps utilizing the whole library. It's computationally expensive, so it is run once a year or so on the entire library, but it can also be run in the background while you are working on your new proof

Mario Carneiro (Oct 18 2020 at 06:18):

There is some discussion about GPT-f on the #Machine Learning for Theorem Proving stream

Mario Carneiro (Oct 18 2020 at 06:20):

The upshot is that most proofs in set.mm are "locally optimal": they don't necessarily take the globally shortest path but there are no local missteps so everything proceeds straightforwardly

Mario Carneiro (Oct 18 2020 at 06:21):

a simple analogue for lean would be deleting have statements that are never used

Mario Carneiro (Oct 18 2020 at 06:22):

But none of that can happen in lean 3 because no one can parse lean except lean

Damiano Testa (Oct 18 2020 at 06:38):

@Mario Carneiro Thank you very much for your input! I have already shortened a couple of proofs using some of your ideas!

Julian Berman (Oct 18 2020 at 13:29):

What causes the parsing difficulties in Lean 3 (and what's changing)?

Mario Carneiro (Oct 18 2020 at 13:29):

Well, lean 4 will be even harder to parse

Mario Carneiro (Oct 18 2020 at 13:30):

the reason lean 3 is hard to parse is because you can run arbitrary code while parsing

Julian Berman (Oct 18 2020 at 13:30):

Ah, of course.

Mario Carneiro (Oct 18 2020 at 13:31):

which means that parsing correctly implies also running the VM correctly, having a typechecker and tactic framework and every other observable feature of lean

Kenny Lau (Oct 18 2020 at 13:36):

how do you run arbitrary code while parsing?

Julian Berman (Oct 18 2020 at 13:40):

notation I guess?

Julian Berman (Oct 18 2020 at 13:41):

Or are you saying how would you implement that? I mean I guess if you're sufficiently motivated you start with running a Lean process externally or embedding it in the parser, parse everything you can statically and then communicate with it for the bits you can't parse yourself or something

Mario Carneiro (Oct 18 2020 at 13:42):

Anything using the parser monad

Mario Carneiro (Oct 18 2020 at 13:42):

tactic argument parsing and user commands

Mario Carneiro (Oct 18 2020 at 13:46):

meta def tactic.interactive.even (e : interactive.parse (do
  n ← lean.parser.small_nat,
  guard (2 ∣ n))) := tactic.skip

example : true := by even 1234 -- ok
example : true := by even 1233 -- parse error

Mario Carneiro (Oct 18 2020 at 13:47):

I'm saying that it is impossible to parse lean without also doing everything else lean does. It is not a separate pass

Kevin Lacker (Oct 19 2020 at 17:21):

can you get Lean to output a representation of the abstract syntax tree of a file? like in python, it's also a pain to make a third-party parser for Python code, but you don't have to do it because Python itself can use its parser and then you can walk the AST to encode it into json or something

Reid Barton (Oct 19 2020 at 17:23):

I think there never really is an abstract syntax tree, or at least, not of the whole file at the same time

Reid Barton (Oct 19 2020 at 17:24):

which isn't to say you couldn't retrofit such a concept onto Lean, but it sounds like a lot of work

Reid Barton (Oct 19 2020 at 17:25):

I looked into how the "return information for the identifier at a specific location" feature works and it just reparses the whole file up to that location

Kevin Lacker (Oct 19 2020 at 17:28):

weird

Reid Barton (Oct 19 2020 at 17:28):

In Lean 4 the parser is written in Lean and will be user-extensible in ways which hopefully include being able to capture the parsed AST for whatever nefarious purposes one might have

Reid Barton (Oct 19 2020 at 17:31):

and there's actually a Syntax Lean type which is the AST, rather than, when the parser encounters the inductive keyword, it just calling the handler for inductive commands, or however it works now

Mario Carneiro (Oct 19 2020 at 17:35):

One way to implement AST retrieval would be to have an AST structure in the parser state, and then just log all parser function calls in it, i.e. whenever you call p.inductive() the first thing it does is call ast.inductive() which pushes an inductive AST node on the stack. You end up with a preorder traversal of the AST as an array and stream that out

Kevin Lacker (Oct 19 2020 at 17:36):

i guess even the ast is kind of weird if you're thinking of external proof minimizers or similar

Kevin Lacker (Oct 19 2020 at 17:37):

what you really want is a reduced language... like theoretically, would it be possible to write every proof in lean as a sequence of have ___, from <existing lemma> _ _ _ type lines?

Kevin Lacker (Oct 19 2020 at 17:37):

because otherwise the minimizer has to understand norm_num, solve_by_elim, all these complex things

Mario Carneiro (Oct 19 2020 at 17:37):

That kind of thing is in the olean files and the export format, and there are already external parsers for that

Mario Carneiro (Oct 19 2020 at 17:38):

the external typecheckers work on the data from the export format, which you can get by running lean --export

Mario Carneiro (Oct 19 2020 at 17:39):

but what we don't have access to is stuff like the list of tactics used in a file

Reid Barton (Oct 19 2020 at 17:39):

But this might somehow be "too reduced"

Kevin Lacker (Oct 19 2020 at 17:40):

it's too reduced to minimize a proof - it isn't obvious how minimal a lean proof is, from its compiled form. it might be useful for something though...

Mario Carneiro (Oct 19 2020 at 17:41):

the olean files are a bit more feature rich in that they contain notations and annotations, in addition to raw logical proof data (which is all that is in the lean export files), but they don't have anything about begin end blocks

Kevin Lacker (Oct 19 2020 at 17:41):

is lean 4 going to maintain these formats

Mario Carneiro (Oct 19 2020 at 17:41):

no, olean was never meant to be a stable format

Mario Carneiro (Oct 19 2020 at 17:42):

and even the core logic has changed so the export format would have to be different

Mario Carneiro (Oct 19 2020 at 17:42):

but I think in lean 4 there won't be an export format (but there will be enough tools to roll your own)

Kevin Lacker (Oct 19 2020 at 17:42):

i wonder how painful transitioning from 3 -> 4 is going to be....

Mario Carneiro (Oct 19 2020 at 17:42):

quite

Mario Carneiro (Oct 19 2020 at 17:44):

I'm told that lean 4 is at least compatible enough to read lean 3 olean files and get "raw mathlib"

Mario Carneiro (Oct 19 2020 at 17:44):

or maybe that was the export format? @Gabriel Ebner knows

Reid Barton (Oct 19 2020 at 17:44):

olean or export?

Reid Barton (Oct 19 2020 at 17:44):

I think it was export

Mario Carneiro (Oct 19 2020 at 17:45):

that would make more sense. No one wants to parse the rats nest that is olean files

Reid Barton (Oct 19 2020 at 17:45):

says the author of olean-rs

Reid Barton (Oct 19 2020 at 17:45):

by the way, last time I checked it was out of date :upside_down:

Mario Carneiro (Oct 19 2020 at 17:46):

I should do something about that

Gabriel Ebner (Oct 19 2020 at 17:46):

There is an importer for the text output format here: https://github.com/gebner/lean4-mathlib-import

Gabriel Ebner (Oct 19 2020 at 17:47):

Import mathlib into Lean 4

This tool takes the text export of mathlib and creates an olean file that you can import with Lean 4.

Notation, etc., is not supported. All theorems are imported as axioms. Some names are adapted to the new convention, some aren't. This code is not intended for production.

You can build the Mathlib.olean as follows (make sure you have enough RAM):

make
make emacs    # install the lean4 emacs package yourself!

(If you're on NixOS, you might need to run everything inside nix-shell ~/lean4.)

After that, you can import mathlib and play with it in Lean 4:

import Import.Mathlib
#check (deriv : (real → real) → (real → real))

Reid Barton (Oct 19 2020 at 17:47):

Relevant Zulip thread https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Importing.20mathlib.20into.20Lean.204/near/187561178

Mario Carneiro (Oct 19 2020 at 17:47):

oh, you write olean files?

Kevin Lacker (Oct 19 2020 at 17:47):

so let's say I want to parse, or quasi-parse mathlib. the theory is that you could make a faster library_search if you did some preprocessing and wrote things in a faster language. I just want the names, namespaces, and signatures of the lemmas/theorems, though. enough to generate some code that refers to them. I don't need to verify the proofs. is that possible from the export format?

Mario Carneiro (Oct 19 2020 at 17:47):

so I suppose you know what lean 4 olean looks like

Gabriel Ebner (Oct 19 2020 at 17:48):

I'm using the Lean 4 API for olean files.

Mario Carneiro (Oct 19 2020 at 17:49):

You can fairly easily write an introspection tactic from lean to gather that information. That's what leandoc is

Reid Barton (Oct 19 2020 at 17:49):

You can also reconstruct this information from the export format.

Kevin Lacker (Oct 19 2020 at 17:50):

ok. so it's not the full AST but it's like the first level of the AST

Mario Carneiro (Oct 19 2020 at 17:50):

more like the last level

Mario Carneiro (Oct 19 2020 at 17:50):

it's the fully digested form

Kevin Lacker (Oct 19 2020 at 17:51):

well, if you think of the AST as a tree where the root is like a namespace, and the first level of the tree is the children of that namespace, and then the next levels are how those children start to be defined, etc

Mario Carneiro (Oct 19 2020 at 17:51):

you don't even have information about implicit args though, so generating a parseable expr from that info is tough

Mario Carneiro (Oct 19 2020 at 17:51):

lean isn't really structured like that though

Mario Carneiro (Oct 19 2020 at 17:52):

you can put stuff in namespaces in an arbitrary order

Mario Carneiro (Oct 19 2020 at 17:52):

from any file

Mario Carneiro (Oct 19 2020 at 17:53):

I mean, the file structure is like that with namespaces and sections, but none of that structure is visible in the digested theorems

Kevin Lacker (Oct 19 2020 at 17:53):

makes sense. well i'll check out what leandoc does

Gabriel Ebner (Oct 19 2020 at 17:54):

Implicit arguments are there in the text output format. Even infix notation.

Gabriel Ebner (Oct 19 2020 at 17:54):

But yes, please write your own exporter in lean like doc-gen.

Mario Carneiro (Oct 19 2020 at 17:54):

That's another thing that you could only get at with a proper AST - we currently have no idea when namespaces and sections start and end, except when you are inside them

Mario Carneiro (Oct 19 2020 at 17:55):

and getting the current namespace is only recently available too. I think we still can't get at the list of variables in scope

Mario Carneiro (Oct 19 2020 at 17:56):

so my dream of writing a def-equivalent tactic is as yet unfulfilled

Last updated: May 02 2025 at 03:31 UTC