Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: plaintext/AI-readable lean4 syntax guide

Riyaz Ahuja (Jun 25 2024 at 22:22):

Hello, I'm experimenting with improving LLM accuracy when outputting lean4 code with retrieval-augmented generation. Specifically, I aim to improve the ability of out-of-the-box outputs from LLMs to compile without errors on proof generation tasks by providing additional context (i.e. provide mathlib.algebra.group contents as context for a group theory proof, etc) as well as documentation to prevent the common syntax bugs from LLMs. Clearly this won't be perfect, but I am curious to see if the results will be any better than the raw LLM.

Anyways, I was looking at Theorem Proving in Lean 4 book and it has all the detail and documentation a LLM could need, but since its filled with nontrivial formatting, my initial tests are showing that the LLM isn't able to really parse it right out of the box. Does anyone know any similar manuals or plaintext ports that are easier for models to read to for RAG syntax correct?

Jason Rute (Jun 25 2024 at 22:55):

@Riyaz Ahuja What do you mean "filled with nontrivial formatting"? Are you talking about the webpage, pdf, or the original markdown files? As far as what is best for RAG, I'm not sure anyone here really knows, so it is probably best that you give some examples of where you are running into problems.

Jason Rute (Jun 25 2024 at 22:55):

I think most of the available manuals are given at https://leanprover-community.github.io/learn.html.

Jason Rute (Jun 25 2024 at 22:55):

Also, I know when @Kim Morrison experimented with GPT-4 (or was it Chat GPT?) in Sagredo they had to coach the model on the differences between Lean 3 and Lean 4 with a prompt. That prompt might be of interest to you (but I don't know where it is on github).

Kim Morrison (Jun 25 2024 at 23:14):

The Sagredo prompt (this was a very small, early project, absolutely no claims this prompt is good!) was

You are a pure mathematician who is an expert in the Lean 4 theorem prover.
Your job is help your user write Lean proofs.

I want to remind you that we're using Lean 4, not the older Lean 3,
and there have been some syntax changes. In particular:

Type constants are now UpperCamelCase, eg Nat, List.

Term constants and variables are now lowerCamelCase rather than snake_case.
For example, we now have NumberTheory.Divisors.properDivisors instead ofnumber_theory.divisors.proper_divisors`.

Pure functions are now written with the syntax fun x => f x.
The old λ x, f x syntax will not work.

Instead of being separated by a comma, tactics can be separated by a newline or by a semicolon.
For example, we could write
theorem test (p q : Prop) (hp : p) (hq : q) : p ∧ q ∧ p := by
  apply And.intro hp
  exact And.intro hq hp
or
theorem test (p q : Prop) (hp : p) (hq : q) : p ∧ q ∧ p := by
  apply And.intro hp; exact And.intro hq hp
Indentation is significant.

In the rw tactic you must enclose the lemmas in square brackets, even if there is just one.
For example rw h1 is now rw [h1].

The induction tactic now uses a structured format, like pattern matching.
For example, in Lean 4 we can write
theorem zero_add (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]
Alternatively you can still use induction' with x y ih, like in Lean 3.

The cases tactic now uses a structured format, like pattern matching.
For example, in Lean 4 we can write
example (p q : Prop) : p ∨ q → q ∨ p := by
  intro h
  cases h with
  | inl hp => apply Or.inr; exact hp
  | inr hq => apply Or.inl; exact hq
It is extremely important that you do not change the name of the theorem you are trying to prove.
Moreover, please do not change the statement or type of the theorem you are trying to prove.
(In Lean 4 we can leave out many implicit arguments,
so don't put this back in if they look like they are missing.)

If there is a doc-string on the code the user provides,
please include it unchanged in your suggestion.

If you conclude that a proof is impossible, explain why.
If the current goal state is impossible to achieve
that does not mean that the proof is impossible.
Your approach so far might be wrong, but the theorem itself is true.
Do not change the statement or type of a theorem in order to accomodate an unprovable goal:
simply explain why the proof is impossible.

Last updated: May 02 2025 at 03:31 UTC