Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: great 8b models for generating lean proofs?

Bulhwi Cha (Feb 18 2026 at 14:12):

Are there language models with eight billion parameters on HuggingFace that produce Lean proofs without errors?

Bulhwi Cha (Feb 18 2026 at 14:16):

Yong Lin said:

Our small model, Goedel-Prover-V2-8B, reaches 83.3% on MiniF2F test set at Pass@32, matching the performance of prior state-of-the-art DeepSeek-Prover-V2-671B while being nearly 100 times smaller in model size.

Bulhwi Cha (Feb 18 2026 at 14:22):

I don't have enough money to pay for ChatGPT Plus or Claude Pro, so I'll try using Goedel-Prover-V2-8B.

Bulhwi Cha (Feb 18 2026 at 15:09):

namespace Sequence

theorem const_of_rec (f : Nat → α) (hcon : ∀ (n : Nat), f (n + 1) = f n) (n : Nat) : f n = f 0 := by
  have h : ∀ (m : Nat), f m = f 0 := by
    intro m
    induction m with
    | zero => rfl
    | succ m ih =>
      rw [hcon, ih]
  exact h n

theorem const_of_strongRec (f : Nat → α) {c : Nat}
    (hcon : ∀ {n : Nat}, c ≤ n → f (n + 1) = f n) {n : Nat} (hle : c ≤ n) : f n = f c := by
  have h : ∀ (k : Nat), f (c + k) = f c := by
    intro k
    induction k with
    | zero => rfl
    | succ k ih =>
      have h₁ : c ≤ c + k := Nat.le_add_right c k
      have h₂ : f (c + k + 1) = f (c + k) := hcon (n := c + k) (by simpa [Nat.add_assoc] using h₁)
      rw [Nat.add_assoc] at *
      simpa [Nat.add_assoc, Nat.add_comm, Nat.add_left_comm] using h₂.trans ih
  have h₁ : f n = f c := by
    have h₂ : f (c + (n - c)) = f c := h (n - c)
    have h₃ : c + (n - c) = n := by
      have h₄ : c ≤ n := hle
      omega
    rw [h₃] at h₂
    exact h₂
  exact h₁

end Sequence

My proofs

namespace Sequence

theorem const_of_rec (f : Nat → α) (hcon : ∀ (n : Nat), f (n + 1) = f n) (n : Nat) : f n = f 0 := by
  induction n with
  | zero => rfl
  | succ k ih => rw [hcon k, ih]

theorem const_of_strongRec (f : Nat → α) {c : Nat} (hcon : ∀ {n : Nat}, c ≤ n → f (n + 1) = f n)
    {n : Nat} (hle : c ≤ n) : f n = f c := by
  let g : Nat → α := fun (d : Nat) ↦ f (c + d)
  have heq : g (n - c) = g 0 := by
    apply const_of_rec g _ (n - c)
    unfold g
    show ∀ (d : Nat), f (c + (d + 1)) = f (c + d)
    intro d
    rw [← Nat.add_assoc]
    exact hcon (show c ≤ c + d from Nat.le_add_right c d)
  unfold g at heq
  rwa [← Nat.add_sub_assoc hle, Nat.add_sub_self_left, Nat.add_zero] at heq

end Sequence

Bulhwi Cha (Feb 18 2026 at 15:10):

I'm satisfied with the result. My desktop computer can run Goedel-Prover-V2-8B, so it's free in a sense!

Bulhwi Cha (Feb 18 2026 at 15:51):

Sadly, it failed to prove the following theorems:

-- this code is written in Lean v4.27.0

import Mathlib.Data.Set.Basic

namespace Set

open Function

universe u

theorem ext' {α : Type u} {A B : Set α} (h : ∃ (f : A → B), Surjective f ∧
    ∀ (a : A), ∃ (haB : a.val ∈ B), f a = ⟨a.val, haB⟩) : A = B := by
  sorry

theorem ext_iff' {α : Type u} {A B : Set α} : A = B ↔ ∃ (f : A → B), Surjective f ∧
    ∀ (x : A), ∃ (hxB : x.val ∈ B), f x = ⟨x, hxB⟩ := by
  sorry

end Set

Bulhwi Cha (Feb 18 2026 at 16:15):

Thread

User

Prove the theorem Set.ext' in the following Lean 4 code:

-- this code is written in Lean v4.27.0

import Mathlib.Data.Set.Basic

namespace Set

open Function

universe u

theorem ext' {α : Type u} {A B : Set α} (h : ∃ (f : A → B), Surjective f ∧
    ∀ (a : A), ∃ (haB : a.val ∈ B), f a = ⟨a.val, haB⟩) : A = B := by
  sorry

Assistant

Step 1: Understand the Problem

First, we need to understand the theorem statement. The theorem claims that under certain conditions, two sets A and B are equal. The condition is that there exists a surjective function from A to B such that for every element a in A, the value f a is a pair where the first component is a.val and the second component is a proof that a.val is in B.

However, the statement of Set.ext' in the Lean code is not entirely correct or complete. The theorem is supposed to establish that A = B, but the current hypothesis is not sufficient for that. In particular:
1. The hypothesis mentions a.val and ⟨a.val, haB⟩, but the type of a is not explicitly stated. However, from the context, A is a set of elements of type α, and a : α (or a : A if A is a subtype of α). The hypothesis seems to treat a as an element of A and a.val as its underlying value in α. However, if A is not a subtype, this would be invalid. But in the Lean code, A is just a set (Set α), and a is an element of Set α (i.e., a : α). The statement a.val would only make sense if a is a subtype, but it isn't. Therefore, there is likely an error in the statement of the theorem, as a.val is not a valid expression when a : α.

Upon closer inspection, the Lean code actually has a : α (or a : A if A is a type), but A is a set (Set α). The condition a.val is not valid here because a is an element of Set α (a : α), and α is arbitrary, so a.val would not be defined unless α has a val field.

However, in Lean, Set α is a type of subsets of α, and a : Set α is not a valid type, because Set α is not a type of elements but a type of functions α → Prop. The statement A : Set α means A is a predicate α → Prop, and a is some variable of type α. Therefore, a.val is not a valid expression unless α is a structure with a val field, but this is not specified, and α is arbitrary.

But this is incorrect because in the Lean code, A and B are of type Set α, and the hypothesis is ∃ (f : A → B), Surjective f ∧ .... Here, A and B are not types but sets, so f : A → B is ill-typed because A and B are not types but subsets of α.

However, Lean's Set α is actually α → Prop, and A and B are of type α → Prop. Therefore, f : A → B is well-typed because A and B are treated as types (functions from α to Prop are treated as types).

But then the statement f a = ⟨a.val, haB⟩ is problematic because a : α (since a : A would mean a : (A : α → Prop) is a : α → Prop, but f a : B means f a : α → Prop and ⟨a.val, haB⟩ : ? is ill-typed because a.val is not defined unless α has a val field.

But [Stopped]
</thinking>

Here are my proofs:

-- this code is written in Lean v4.26.0

import Mathlib.Data.Set.Basic

namespace Set

open Function

universe u

theorem ext' {α : Type u} {A B : Set α} (h : ∃ (f : A → B), Surjective f ∧
    ∀ (a : A), ∃ (haB : a.val ∈ B), f a = ⟨a.val, haB⟩) : A = B := by
  let ⟨f, hsur, heq⟩ := h
  ext
  constructor
  · next x =>
    intro hxA
    let ⟨(hxB : x ∈ B), (hfx : f ⟨x, hxA⟩ = ⟨x, hxB⟩)⟩ := heq ⟨x, hxA⟩
    exact hxB
  · next y =>
    intro hyB
    let ⟨⟨x, hxA⟩, hfa₁⟩ := hsur ⟨y, hyB⟩
    let ⟨(hxB : x ∈ B), (hfa₂ : f ⟨x, hxA⟩ = ⟨x, hxB⟩)⟩ := heq ⟨x, hxA⟩
    have : x = y := by
      refine Subtype.mk.inj (show ⟨x, hxB⟩ = (⟨y, hyB⟩ : B) from ?_)
      rw [← hfa₁, hfa₂]
    subst y
    exact hxA

theorem ext_iff' {α : Type u} {A B : Set α} : A = B ↔ ∃ (f : A → B), Surjective f ∧
    ∀ (x : A), ∃ (hxB : x.val ∈ B), f x = ⟨x, hxB⟩ := by
  constructor
  · intro h
    exists fun x ↦ ⟨x.val, h ▸ x.property⟩
    constructor
    · intro y
      exists ⟨y.val, h.symm ▸ y.property⟩
    · intro x
      exists (h ▸ x.property)
  · exact ext'

end Set

Bulhwi Cha (Feb 18 2026 at 16:39):

Goedel-Prover-V2-8B also failed to prove the following theorem:

-- this code is written in Lean v4.27.0

@[simp] theorem List.isSuffixOf_iff_IsSuffix [BEq α] [LawfulBEq α] {l₁ l₂ : List α} :
    isSuffixOf l₁ l₂ ↔ l₁ <:+ l₂ := by
  sorry

Thread

User

Prove the theorem List.isSuffixOf_iff_IsSuffix in the @example06.lean file.

Assistant

Step-by-Step Abstract Plan

Problem Analysis

The theorem List.isSuffixOf_iff_IsSuffix states that for two lists l₁ and l₂, l₁ is a suffix of l₂ (isSuffixOf l₁ l₂) if and only if the concatenation of l₁ and l₂ is equal to l₁ ++ l₂ (l₁ <:+ l₂).

Definitions

Suffix: A list l₁ is a suffix of l₂ if there exists a list m such that l₂ = m ++ l₁.
List Concatenation: The concatenation l₁ <:+ l₂ is equivalent to l₁ ++ l₂.

Proof Outline

Forward Direction (isSuffixOf l₁ l₂ → l₁ <:+ l₂):
- Assume isSuffixOf l₁ l₂. Then there exists m such that l₂ = m ++ l₁.
- Then, l₁ <:+ l₂ = l₁ ++ (m ++ l₁) = (l₁ ++ m) ++ l₁ (by associativity of ++).
- But l₁ <:+ l₂ is by definition l₁ ++ l₂ = l₁ ++ (m ++ l₁) = (l₁ ++ m) ++ l₁.
- However, this doesn't directly show that l₁ <:+ l₂ is l₁ ++ l₂ unless we have a specific relationship.
- The problem is that the statement is incorrect. For example, take l₁ = [1] and l₂ = []:
  - isSuffixOf l₁ l₂ is true because l₂ = [] ++ [1] = [].
  - But l₁ <:+ l₂ is [1] ++ [] = [1], not l₁ ++ l₂ = [1] ++ [] = [1].
  - However, l₁ <:+ l₂ is l₁ ++ l₂, so l₁ <:+ l₂ = [1] ++ [] = [1] and l₁ ++ l₂ = [1] ++ [] = [1], which are the same.
- The issue arises when l₂ is a prefix of l₁. For example, take l₁ = [1, 2] and l₂ = [1]:
  - isSuffixOf l₁ l₂ is false because l₂ is not a suffix of l₁.
  - l₁ <:+ l₂ = [1, 2] ++ [1] = [1, 2, 1] and l₁ ++ l₂ = [1, 2] ++ [1] = [1, 2, 1].
- The confusion is in the statement. The actual condition for isSuffixOf l₁ l₂ is that l₂ can be written as m ++ l₁ for some m. So, isSuffixOf l₁ l₂ is equivalent to l₂ being a suffix of l₁.
- But in Lean, isSuffixOf l₁ l₂ is ∃ m, l₂ = m ++ l₁.
- And l₁ <:+ l₂ is l₁ ++ l₂ by definition.
- So, the statement is:
  - ∃ m, l₂ = m ++ l₁ ↔ l₁ ++ l₂ = l₁ ++ l₂.
- But l₁ ++ l₂ = l₁ ++ l₂ is always true, so the statement is:
  - ∃ m, l₂ = m ++ l₁ ↔ True, which is always true.
- But this is not correct. The statement ∃ m, l₂ = m ++ l₁ is equivalent to isSuffixOf l₁ l₂.
- And l₁ <:+ l₂ is l₁ ++ l₂ by definition.
- So, the statement is isSuffixOf l₁ l₂ ↔ l₁ ++ l₂ = l₁ ++ l₂.
- But l₁ ++ l₂ = l₁ ++ l₂ is always true, so the statement is isSuffixOf l₁ l₂ ↔ True.
- This means that the theorem is claiming that isSuffixOf l₁ l₂ is always true, which is not correct.

Correction

The theorem statement is incorrect. The correct statement should be:
isSuffixOf l₁ l₂ ↔ ∃ m, l₂ = m ++ l₁.
But ∃ m, l₂ = m ++ l₁ is equivalent to isSuffixOf l₁ l₂ by definition.
However, l₁ <:+ l₂ is l₁ ++ l₂, not l₁ ++ l₂.

But the original problem statement is isSuffixOf l₁ l₂ ↔ l₁ <:+ l₂, and l₁ <:+ l₂ is l₁ ++ l₂.
But isSuffixOf l₁ l₂ is not l₁ ++ l₂ unless l₂ = [].
This means the theorem is not correct as stated.

But perhaps the intended meaning is that l₁ is a suffix of l₂ if and only if l₁ <:+ l₂ = l₁ ++ l₂.
But l₁ <:+ l₂ is by definition l₁ ++ l₂, so l₁ <:+ l₂ = l₁ ++ l₂ is always true, and the theorem would be:
isSuffixOf l₁ l₂ ↔ True, which is not correct.

Conclusion

The theorem statement is incorrect, and the correct version should be:
isSuffixof l₁ l₂ ↔ ∃ m, l₂ = m ++ l₁, which is equivalent to the definition of isSuffixOf.

But the problem is asking to prove isSuffixOf l₁ l₂ ↔ l₁ <:+ l₂.
Since l₁ <:+ l₂ = l₁ ++ l₂, the right side is always true, but isSuffixOf l₁ l₂ is not always true.
Thus, the theorem is false, and no proof exists.

However, the problem might have a typo, and the intended statement is isSuffixOf l₁ l₂ ↔ ∃ m, l₂ = m ++ l₁.
But we cannot change the statement, and we must prove isSuffixOf l₁ l₂ ↔ l₁ <:+ l₂.

But l₁ <:+ l₂ is l₁ ++ l₂, so l₁ <:+ l₂ = l₁ ++ l₂ is always true, and the statement is:
isSuffixOf l₁ l₂ ↔ True, which is false.

But perhaps l₁ <:+ l₂ is not l₁ ++ l₂, but something else.
In Lean, List.hasSuffix l₁ l₂ is ∃ m, l₂ = m ++ l₁.
And l₁ <:+ l₂ is l₁ ++ l₂.

But in Lean, the actual theorem is isSuffixOf l₁ l₂ ↔ l₁ <:+ l₂.
Since l₁ <:+ l₂ is l₁ ++ l₂, the theorem is isSuffixOf l₁ l₂ ↔ l₁ ++ l₂ = l₁ ++ l₂, which is isSuffixOf l₁ l₂ ↔ True.

But isSuffixOf l₁ l₂ is ∃ m, l₂ = m ++ l₁, and this is not always true.
Thus, the theorem is false, and no proof exists.

But in Lean, isSuffixOf l₁ l₂ is defined as ∃ m, l₂ = m ++ l₁.
So, the theorem is ∃ m, l₂ = m ++ l₁ ↔ True, which is ∃ m, l₂ = m ++ l₁.

But ∃ m, l₂ = m ++ l₁ is not always true, so the theorem is false.

But perhaps l₁ <:+ l₂ is not l₁ ++ l₂.
But in Lean, List.hasSuffix l₁ l₂ is ∃ m, l₂ = m ++ l₁.
And l₁ <:+ l₂ is l₁ ++ l₂.

But in Lean 4.27.0, List.hasSuffix is ∃ m, l₂ = m ++ l₁.
And l₁ <:+ l₂ is l₁ ++ l₂.

But the problem is that isSuffixOf l₁ l₂ is not l₁ <:+ l₂.
The isSuffixOf predicate is ∃ m, l₂ = m ++ l₁.
But l₁ <:+ l₂ is l₁ ++ l₂.

But in Lean, isSuffixOf is defined as ∃ m, l₂ = m ++ l₁.
And l₁ <:+ l₂ is l₁ ++ l₂.

But the theorem is isSuffixOf l₁ l₂ ↔ l₁ <:+ l₂, which is ∃ m, l₂ = m ++ l₁ ↔ l₁ ++ l₂ = l₁ ++ l₂.

But l₁ ++ l₂ = l₁ ++ l₂ is always true, so the theorem is ∃ m, l₂ = m ++ l₁ ↔ True.

But ∃ m, l₂ = m ++ l₁ is not always true, so the theorem is false.

But the user wants to prove this, so the statement must be incorrect, and we must prove a contradiction.

But we [Stopped]
</thinking>

Bulhwi Cha (Feb 18 2026 at 16:43):

Here's my proof:

namespace List

@[simp] theorem isPrefixOf_iff_IsPrefix [BEq α] [LawfulBEq α] {l₁ l₂ : List α} :
    isPrefixOf l₁ l₂ ↔ l₁ <+: l₂ := by
  match l₁, l₂ with
  | [],   _  => simp [nil_prefix]
  | _::_, [] => simp
  | a::as, b::bs =>
    constructor
    · intro h
      simp only [isPrefixOf, Bool.and_eq_true, beq_iff_eq] at h
      let ⟨t, ht⟩ := isPrefixOf_iff_IsPrefix.mp h.2
      exists t
      simpa [h.1]
    · intro ⟨t, ht⟩
      simp only [cons_append, cons.injEq] at ht
      have hpf : as <+: bs := ⟨t, ht.2⟩
      simpa [isPrefixOf, ht.1] using isPrefixOf_iff_IsPrefix.mpr hpf

@[simp] theorem isSuffixOf_iff_IsSuffix [BEq α] [LawfulBEq α] {l₁ l₂ : List α} :
    isSuffixOf l₁ l₂ ↔ l₁ <:+ l₂ := by
  match l₁, l₂ with
  | [],   _  => simp [nil_suffix, isSuffixOf]
  | _::_, [] => simp [isSuffixOf, isPrefixOf]
  | a::as, b::bs =>
    constructor
    · intro h
      simp only [isSuffixOf, reverse_cons] at h
      let ⟨t, ht⟩ := isPrefixOf_iff_IsPrefix.mp h
      exists t.reverse
      simpa using congrArg reverse ht
    · intro ⟨t, ht⟩
      have hpf : (as.reverse ++ [a]) <+: (bs.reverse ++ [b]) := by
        exists t.reverse
        simpa using congrArg reverse ht
      simpa [isSuffixOf, reverse_cons] using isPrefixOf_iff_IsPrefix.mpr hpf

end List

ChatGPT 5.2 Thinking's Lean code

-- this code is written in Lean v4.27.0

namespace List

@[simp] theorem isPrefixOf_iff_IsPrefix [BEq α] [LawfulBEq α] {l₁ l₂ : List α} :
    isPrefixOf l₁ l₂ ↔ l₁ <+: l₂ := by
  induction l₁ generalizing l₂ with
  | nil =>
      constructor
      · intro _; exact ⟨l₂, by simp⟩
      · intro _; simp
  | cons a as ih =>
      cases l₂ with
      | nil =>
          simp [List.IsPrefix]
      | cons b bs =>
          have hIs : (a :: as) <+: (b :: bs) ↔ a = b ∧ as <+: bs := by
            constructor
            · intro hs
              rcases hs with ⟨t, ht⟩
              have ht' : a :: (as ++ t) = b :: bs := by
                simpa [List.cons_append] using ht
              cases ht'
              exact ⟨rfl, ⟨t, rfl⟩⟩
            · intro hs
              rcases hs with ⟨hab, hs⟩
              rcases hs with ⟨t, ht⟩
              subst hab
              refine ⟨t, ?_⟩
              simp [List.cons_append, ht]
          have hBool : isPrefixOf (a :: as) (b :: bs) ↔ a = b ∧ as <+: bs := by
            simpa [List.isPrefixOf, Bool.and_eq_true, beq_iff_eq, ih]
          exact hBool.trans hIs.symm

@[simp] theorem List.isSuffixOf_iff_IsSuffix [BEq α] [LawfulBEq α] {l₁ l₂ : List α} :
    isSuffixOf l₁ l₂ ↔ l₁ <:+ l₂ := by
  constructor
  · intro h
    have hp : l₁.reverse <+: l₂.reverse := by
      have : isPrefixOf l₁.reverse l₂.reverse := by
        simpa [List.isSuffixOf] using h
      exact
        (List.isPrefixOf_iff_IsPrefix (l₁ := l₁.reverse) (l₂ := l₂.reverse)).1 this
    rcases hp with ⟨t, ht⟩
    refine ⟨t.reverse, ?_⟩
    have hrev := congrArg List.reverse ht
    simpa [List.reverse_append, List.reverse_reverse] using hrev
  · intro hs
    rcases hs with ⟨t, ht⟩
    have hp : l₁.reverse <+: l₂.reverse := by
      refine ⟨t.reverse, ?_⟩
      have hrev := congrArg List.reverse ht
      simpa [List.reverse_append, List.reverse_reverse] using hrev
    have : isPrefixOf l₁.reverse l₂.reverse :=
      (List.isPrefixOf_iff_IsPrefix (l₁ := l₁.reverse) (l₂ := l₂.reverse)).2 hp
    simpa [List.isSuffixOf] using this

end List

Bulhwi Cha (Feb 18 2026 at 17:07):

What do we need to do for these relatively small language models to be much better at generating Lean proofs? Do we need more formalized theorems that are readily available to the public?

Bulhwi Cha (Feb 18 2026 at 17:25):

Are handmade natural language and formal language pairs still worthwhile?

Last updated: Feb 28 2026 at 14:05 UTC