Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Extracting full goal from REPL with type information

Jui-Hui Chung (May 16 2025 at 20:24):

Hi all, I'm trying to reconstruct the statement of a theorem from the Lean 4 REPL output (e.g., the sorries field or what appears in Infoview in VSCode). However, I’ve noticed that some type information is lost or misinterpreted.

Here’s the original theorem:

theorem lean_workbook_47828
  (S : Finset ℕ)
  (h₀ : ∀ (c : ℕ), c ∈ S ↔ 0 < c ∧ ∃ (x : ℚ), 5 * x^2 + 11 * x + c = 0) :
  ∏ k ∈ S, k = 12 := by sorry

In the REPL or Infoview, the goal is shown as:

S : Finset ℕ
h₀ : ∀ (c : ℕ), c ∈ S ↔ 0 < c ∧ ∃ x, 5 * x ^ 2 + 11 * x + ↑c = 0
⊢ ∏ k ∈ S, k = 12

If I try to reconstruct the theorem from this output:

theorem lean_workbook_47828_infoview
  (S : Finset ℕ)
  (h₀ : ∀ (c : ℕ), c ∈ S ↔ 0 < c ∧ ∃ x, 5 * x ^ 2 + 11 * x + ↑c = 0) :
  ∏ k ∈ S, k = 12 := by sorry

then Lean defaults x to type ℕ, which is incorrect—the original theorem used x : ℚ.

Is there a reliable way to extract the goal with full type information, including implicit types?

Thanks in advance!

Kevin Buzzard (May 16 2025 at 21:10):

If you switch on some pretty-printer options (the most extreme of which would be set_option pp.all true) then you have a much higher chance that your goal state will round-trip in this way.

Eric Wieser (May 16 2025 at 21:12):

Often this is an #xy problem, and the short answer is "don't try to go from Expr to Syntax in the first place"

Jui-Hui Chung (May 16 2025 at 22:33):

Eric Wieser said:

Often this is an #xy problem, and the short answer is "don't try to go from Expr to Syntax in the first place"

Thanks for the response.

If this approach is incorrect, what would be the proper way to achieve this? In my project, I’m trying to extract the goal state from the middle of a proof and convert it into a standalone theorem, so that a machine can attempt to prove the resulting subgoal. (The machine is trained to process inputs that are standalone theorems ending with := by sorry.)

Thomas Zhu (May 16 2025 at 23:48):

(I remember this question was asked somewhere before.)

Short answer is set_option pp.funBinderTypes true.

Long answer: I am assuming you are training your machine to take as input a string representation of the theorem like "theorem h1 h2 : G := by sorry". The way you obtain this string from a goal can be what you're currently doing. But you shouldn't put this string back into Lean verbatim, because the delaborator that produces this string is not designed to be information preserving (in this case it can be fixed like set_option pp.funBinderTypes true but this is ad-hoc). I guess a reasonable alternative is to let the machine find a proof, and put the proof back into where you extracted the goal in the first place?

Thomas Zhu (May 16 2025 at 23:59):

Alternatively if you really want to put the theorem back into Lean, then you can print the state twice, once with pp.all false to feed to the LLM and once with pp.all true to feed to Lean as the statement

David Xu (May 17 2025 at 22:42):

There is also a tactic called extract_goal:

import Mathlib.Tactic

-- Tell the pretty-printer to print `∃ (x : ℚ)`, not `∃ x`, which defaults to `∃ (x : ℕ)`
set_option pp.funBinderTypes true
-- set_option pp.all true  -- print every detail (very verbose)

theorem lean_workbook_47828
  (S : Finset ℕ)
  (h₀ : ∀ (c : ℕ), c ∈ S ↔ 0 < c ∧ ∃ (x : ℚ), 5 * x^2 + 11 * x + c = 0) :
  ∏ k ∈ S, k = 12 := by
  extract_goal
  /-
  It prints this message:
  theorem extracted_1 (S : Finset ℕ) (h₀ : ∀ (c : ℕ), c ∈ S ↔ 0 < c ∧ ∃ (x : ℚ), 5 * x ^ 2 + 11 * x + ↑c = 0) :
    ∏ k ∈ S, k = 12 := sorry
  -/
  sorry

theorem extracted_1 (S : Finset ℕ) (h₀ : ∀ (c : ℕ), c ∈ S ↔ 0 < c ∧ ∃ (x : ℚ), 5 * x ^ 2 + 11 * x + ↑c = 0) :
  ∏ k ∈ S, k = 12 := sorry

David Xu (May 17 2025 at 22:44):

The documentation string: https://github.com/dx2102/mathlib4-tactics-list?tab=readme-ov-file#223-extract_goal (I found this tactic by searching with Ctrl+F)

extract_goal formats the current goal as a stand-alone theorem or definition after
cleaning up the local context of irrelevant variables.
A variable is relevant if (1) it occurs in the target type, (2) there is a relevant variable
that depends on it, or (3) the type of the variable is a proposition that depends on a
relevant variable.

If the target is False, then for convenience extract_goal includes all variables.
extract_goal * formats the current goal without cleaning up the local context.
extract_goal a b c ... formats the current goal after removing everything that the given
variables a, b, c, ... do not depend on.
extract_goal ... using name uses the name name for the theorem or definition rather than
the autogenerated name.

The tactic tries to produce an output that can be copy-pasted and just work,
but its success depends on whether the expressions are amenable
to being unambiguously pretty printed.

The tactic responds to pretty printing options.
For example, set_option pp.all true in extract_goal gives the pp.all form.

Last updated: Feb 28 2026 at 14:05 UTC