Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: OpenAI "Learning to Reason with LLMs"

Floris van Doorn (Sep 12 2024 at 19:08):

I just saw this blog post: https://openai.com/index/learning-to-reason-with-llms/

Adam Topaz (Sep 12 2024 at 19:12):

yeah, it's all over twitter.

Jason Rute (Sep 12 2024 at 19:15):

Their approach is very reminiscent of the approaches which work for formal theorem proving: test-time-compute/search with self-improvement via reinforcement learning. It is exciting to see. They are getting a high score on AIME 2024 (at least compared to previous models). I would love to see more benchmarking of this model on math and formal theorem proving. Also when the small version is available through ChatGPT, I’d love to know what Lean/Coq/Isabelle users think of the suggestions. Does it still mix up Lean 3 and Lean 4? Is it a tool you could see yourself using regularly in the right workflow?

Adam Topaz (Sep 12 2024 at 19:17):

Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users.

Ralph Furman (Sep 12 2024 at 19:18):

Meaning they actually decided it for business reasons?

Adam Topaz (Sep 12 2024 at 19:19):

That's what I assumed from that statement

Jason Rute (Sep 12 2024 at 19:20):

“Competitive advantage”

Adam Topaz (Sep 12 2024 at 19:25):

I must say the chain of thought output in the math example from the blogpost is impressive.

Matthew Ballard (Sep 12 2024 at 19:35):

So it does better on PhD level physics than AP calculus or am I misreading

Xiyu Zhai (Sep 12 2024 at 19:45):

Jason Rute said:

Their approach is very reminiscent of the approaches which work for formal theorem proving: test-time-compute/search with self-improvement via reinforcement learning. It is exciting to see. They are getting a high score on AIME 2024 (at least compared to previous models). I would love to see more benchmarking of this model on math and formal theorem proving. Also when the small version is available through ChatGPT, I’d love to know what Lean/Coq/Isabelle users think of the suggestions. Does it still mix up Lean 3 and Lean 4? Is it a tool you could see yourself using regularly in the right workflow?

In my personal viewpoint, these problems are all just RL problems converted from NP problems with the verifier either specified or learned. They don’t have the complications of MDP as in generic RL.

David Renshaw (Sep 12 2024 at 20:03):

I understand "RL" to be "reinforcement learning", but what are NP and MDP?

Adam Topaz (Sep 12 2024 at 20:03):

Xiyu Zhai said:

In my personal viewpoint, these problems are all just RL problems converted from NP problems with the verifier either specified or learned. They don’t have the complications of MDP as in generic RL.

Can you expand on that?

Adam Topaz (Sep 12 2024 at 20:03):

I assume NP is as in "NP-complete"? and MDP is Markov decision process, I suppose

Jason Rute (Sep 12 2024 at 20:30):

I think Xiyu is saying you can cast informal reasoning problems as NP problems (in the sense that it is easier to verify the solution than find the solution). Once you have a verifier (which for informal problems is not straightforward since there is no formal calculus, so you have to train a verifier model, which in this case is likely part of the LLM training). Then it is now a matter of searching for a solution. If the model’s first attempt is wrong it can backtrack and find a different answer. As for how to search, RL is one of the best search algorithms we have. Now I don’t know if Xiyu is thinking on the macro scale where RL is used to train the LLM model itself. AlphaProof uses test-time RL but this model only uses test-time compute in the form of chain of thought, not RL or traditional tree search. I don’t know the details of how training time RL works. MDP (Markov decision processes) refers to an RL setup where there is a graph of states connected by transitions (which are actions, but could also be stochastic). The transitions also come with rewards and the goal is to maximize the total reward in the end. A NP-style search problem is a simpler MDP where there are no intermediate rewards, just a reward of 1 at the end for finding the solution. But IMHO this makes the RL even more difficult because you don’t have a good partial signal and have to come up with your own version of internal reward. It also makes it hard to assign credit to early actions in the trajectory. I’m not sure it anything in this work is “just” X. The challenge is getting all the details right and we will see how long it takes others to catch up.

Jason Rute (Sep 12 2024 at 20:34):

Maybe @Xiyu Zhai isn’t talking about RL in the sense of gradient training using observed rewards, but just the challenge of finding the highest reward path through an MDP (of which we already said that NP-style search problems are a special case).

Xiyu Zhai (Sep 12 2024 at 20:45):

I didn’t expand because of inconfidence lol. I graduated from a theory group where they have been taking about RL theory for several years and I mostly don’t quite understand. In my thesis, I made this bold conjecture that most AI problems in CV or NLP or AI Math are NP problem in disguise. Not NP hard, just NP. Including even mnist, where the proper mathematical definition of shapes I believe is of some NP form, a certificate and a verifier. For those with geometric backgrounds, homotopy can be viewed as NP, riemannian isometry can be viewed as NP, and shape is just similar and intermediate between topological and riemannian. Now let’s talk about NLP. Sure things in physics and chemistry can’t be formalized easily as math, but I imagine there is some neural informal verification mechanism for them.

Xiyu Zhai (Sep 12 2024 at 20:46):

As for RL, I meant mostly the task not any specific methodology. Gradient descent on V or Q function is just one way.

Xiyu Zhai (Sep 12 2024 at 20:47):

My understanding is that NP problem with a recursive nature can be viewed as RL problem. The bellman equation is recursive to me

Xiyu Zhai (Sep 12 2024 at 20:49):

Jason Rute said:

I think Xiyu is saying you can cast informal reasoning problems as NP problems (in the sense that it is easier to verify the solution than find the solution). Once you have a verifier (which for informal problems is not straightforward since there is no formal calculus, so you have to train a verifier model, which in this case is likely part of the LLM training). Then it is now a matter of searching for a solution. If the model’s first attempt is wrong it can backtrack and find a different answer. As for how to search, RL is one of the best search algorithms we have. Now I don’t know if Xiyu is thinking on the macro scale where RL is used to train the LLM model itself. AlphaProof uses test-time RL but this model only uses test-time compute in the form of chain of thought, not RL or traditional tree search. I don’t know the details of how training time RL works. MDP (Markov decision processes) refers to an RL setup where there is a graph of states connected by transitions (which are actions, but could also be stochastic). The transitions also come with rewards and the goal is to maximize the total reward in the end. A NP-style search problem is a simpler MDP where there are no intermediate rewards, just a reward of 1 at the end for finding the solution. But IMHO this makes the RL even more difficult because you don’t have a good partial signal and have to come up with your own version of internal reward. It also makes it hard to assign credit to early actions in the trajectory. I’m not sure it anything in this work is “just” X. The challenge is getting all the details right and we will see how long it takes others to catch up.

Agree. Thanks a lot

Xiyu Zhai (Sep 12 2024 at 20:58):

One can compare this with world models proposed by Yann Lecun. I believe world model is limited to physical environments but NP viewpoint is more general.

Jason Rute (Sep 12 2024 at 21:13):

I think LeCun’s JEPA ideas could be applicable even to theorem proving and other NP problems. But I admit it is not as fleshed-out and explored as the current approaches.

Xiyu Zhai (Sep 12 2024 at 21:13):

Yeah agree

Xiyu Zhai (Sep 12 2024 at 21:13):

Like some high level planning

Xiyu Zhai (Sep 12 2024 at 21:16):

Lecun from his vision background might focus more on the physical world. But this hierarchy idea is transferrable

Matthew Ballard (Sep 12 2024 at 22:00):

Create a formal statement of Lagrange’s theorem in Lean

import group_theory.subgroup

variables {G : Type*} [group G] [fintype G] (H : subgroup G) [fintype H]

theorem lagrange_theorem :
  fintype.card H ∣ fintype.card G :=
subgroup.card_subgroup_dvd_card H

In the current version of Lean

import Mathlib.GroupTheory.Subgroup.Basic
import Mathlib.Data.Fintype.Basic

variable {G : Type _} [Group G] [Fintype G] (H : Subgroup G)

theorem lagrange_theorem :
  Fintype.card H ∣ Fintype.card G :=
Subgroup.card_subgroup_dvd_card H

Jason Rute (Sep 12 2024 at 22:33):

@Matthew Ballard Is that using o1-preview?

Jason Rute (Sep 13 2024 at 03:22):

How is it at proofs?

Johan Commelin (Sep 13 2024 at 03:55):

So it still didn't get the memo that the cool kids all use Lean 4 these days :oops:

Eric Taucher (Sep 13 2024 at 10:58):

FYI - Announcement in OpenAI forum

https://community.openai.com/t/new-reasoning-models-openai-o1-preview-and-o1-mini/938081

ChatGPT Plus subscribers can also try o1 today.

Matthew Ballard (Sep 13 2024 at 11:53):

Haven’t tried anything more yet. If anyone has suggestions, I’ll test them this afternoon.

Eric Taucher (Sep 13 2024 at 14:40):

Matthew Ballard said:

If anyone has suggestions

Suggestion: Not for examples to try but for creating information relevant for others. I am doing the same so these are my current ideas.

Include how you ran the prompt, e.g. API or ChatGPT.
Include which model used, e.g. o1-mini, o1-preview.
Show the result using another common model such as GPT 4o, Claude, etc.
Include the number of seconds the OpenAI reasoning model thought, e.g. Thought for 37 seconds. A video of a result showing the thinking process step names would be useful.

Leonardo de Moura (Sep 13 2024 at 14:46):

BTW, our friend and major past Lean contributor, @Daniel Selsam , is working on this project :tada:

Jason Rute (Sep 13 2024 at 14:55):

Yes, I saw he was a “foundational contributor”.

Eric Taucher (Sep 13 2024 at 14:59):

Here is a rough post of a first example, mostly to just see what makes sense for posting more examples.

The first example is taken from the popular tutorial

https://github.com/PatrickMassot/GlimpseOfLean

ChatGPT - 4o

Prompt

You can solve the example using basic algebraic manipulations with Lean’s tactic mode.

Here’s the complete solution:

example (a b c d e : ℝ) (h : a = b + c) (h' : b = d - e) : a + e = d + c := by
rw [h, h'] -- rewrite a using h and b using h'
linarith -- use linarith to handle the arithmetic
Explanation:

rw [h, h'] rewrites a using the equation a = b + c and b using b = d - e.
After the rewrite, we have a + e = (d - e) + c + e. The linarith tactic can automatically simplify this expression to the required result d + c.

ChatGPT - o1-mini - Thought for 6 seconds

Prompt

Certainly! Let’s solve the given Lean 4 example by providing a complete proof.

Problem Statement
We are given real numbers ( a, b, c, d, e ) and the following hypotheses:

( a = b + c ) (h : a = b + c)
( b = d - e ) (h' : b = d - e)
We need to prove that:
[ a + e = d + c ]

Proof Strategy
To prove ( a + e = d + c ), we’ll use the given hypotheses to substitute ( a ) and ( b ) and perform algebraic manipulations.

Substitute ( a ): From ( h ), we know ( a = b + c ). Therefore, ( a + e = (b + c) + e ).
Substitute ( b ): From ( h’ ), ( b = d - e ). Substituting this into the previous expression gives:
[ a + e = (d - e + c) + e ]
Simplify: Combine like terms:
[ a + e = d + c - e + e = d + c ]
Thus, ( a + e = d + c ) is proven.

Lean 4 Implementation
Here’s how you can implement this proof in Lean 4:

example (a b c d e : ℝ) (h : a = b + c) (h' : b = d - e) : a + e = d + c := by
-- Substitute a with b + c using hypothesis h
rw h
-- Substitute b with d - e using hypothesis h'
rw h'
-- Simplify the expression
linarith
Explanation of the Lean Proof
rw h: The rw (rewrite) tactic replaces ( a ) with ( b + c ) in the goal ( a + e = d + c ), resulting in:
[ (b + c) + e = d + c ]

rw h': Again, using rw, we replace ( b ) with ( d - e ) in the expression:
[ (d - e + c) + e = d + c ]

linarith: This tactic handles linear arithmetic. It automatically simplifies the left-hand side:
[ d - e + c + e = d + c ]
The terms ( -e ) and ( +e ) cancel out, leaving:
[ d + c = d + c ]
Which is trivially true.

Alternative Proof Using calc Syntax
Lean also offers the calc syntax for a more step-by-step proof:

example (a b c d e : ℝ) (h : a = b + c) (h' : b = d - e) : a + e = d + c := by
calc
a + e = (b + c) + e := by rw h
_ = (d - e + c) + e := by rw h'
_ = d + c := by linarith
Conclusion
Both proofs effectively demonstrate how to use Lean’s tactics to manipulate and simplify expressions based on given hypotheses. The rw tactic is particularly powerful for substituting equal expressions, and linarith efficiently handles linear arithmetic simplifications.

ChatGPT - o1-preview - Thought for 37 seconds

Prompt

To solve the Lean 4 example, you can use the rw tactic to rewrite the expressions using the given hypotheses and then apply the ring tactic to simplify the resulting expression. Here’s the complete proof:

example (a b c d e : ℝ) (h₁ : a = b + c) (h₂ : b = d - e) : a + e = d + c := by
rw [h₁, h₂]
ring
Explanation:

Rewriting with Hypotheses (rw tactic):

rw [h₁, h₂] replaces a with b + c and b with d - e in the goal.
After rewriting, the goal becomes d - e + c + e = d + c.
Simplifying the Expression (ring tactic):

The ring tactic simplifies the arithmetic expression.
It recognizes that -e + e cancels out to 0, simplifying the left side to d + c.
Therefore, both sides of the equation become equal, and the proof is complete.
This proof leverages Lean’s powerful tactics to handle algebraic manipulations succinctly.

Jason Rute (Sep 13 2024 at 15:43):

Thanks! But maybe this example is too easy. They all seem to get it modulo not using brackets for rw sometimes.

Matthew Ballard (Sep 13 2024 at 16:01):

Tried a question from my old graduate algebra 1 final exam.

o1-preview through the web UI

Let G be a group. Prove that G is finite if and only if G has finitely many subgroups.

Matthew Ballard (Sep 13 2024 at 16:04):

Thought for 296 seconds.

Response

To prove the statement "A group G is finite if and only if G has finitely many subgroups," we need to establish two implications:

If G is finite, then G has finitely many subgroups.
If G has finitely many subgroups, then G is finite.

1. If G is finite, then G has finitely many subgroups.

This direction is straightforward.
* Proof:
* A finite group G has a finite number of subsets (specifically, 2∣G∣ subsets).
* Each subgroup of G is a subset of G that satisfies the group axioms.
* Since there are finitely many subsets, there are finitely many possible subgroups.
* Therefore, G has finitely many subgroups.

2. If G has finitely many subgroups, then G is finite.

This direction requires a more detailed argument.
* Proof:
* Assume, for contradiction, that G is infinite but has only finitely many subgroups H1,H2,…,Hn.
* Consider the set U=H1∪H2∪⋯∪Hn. Since there are finitely many subgroups, U is the union of finitely many subgroups.
* Case 1: All Hi are finite.
* Each Hi is finite, so their union U is finite.
* However, G is infinite, so there exist elements g∈G not in U.
* For any such g, the cyclic subgroup ⟨g⟩ generated by g is a subgroup of G.
* Since g∈/U, ⟨g⟩ is not among H1,H2,…,Hn, contradicting the assumption that these are all the subgroups of G.
* Case 2: At least one Hi is infinite.
* Without loss of generality, let H1 be infinite.
* The infinite cyclic subgroup H1 must be isomorphic to Z or contain elements of infinite order.
* The infinite cyclic group Z has infinitely many subgroups (e.g., nZ for each positive integer n).
* Therefore, H1 (and thus G) has infinitely many subgroups, contradicting the assumption that Ghas only finitely many subgroups.
* In both cases, we reach a contradiction.
* Therefore, our assumption that G is infinite must be false.
* Hence, G must be finite.

Conclusion

The group G is finite if and only if it has finitely many subgroups.
This completes the proof.
Answer:
Proof:

If G is finite, then it has finitely many subgroups:
- A finite group has finitely many subsets, so it can have only finitely many subgroups.
  2. If G has finitely many subgroups, then G is finite:
- Assume G is infinite but has only finitely many subgroups H1,H2,…,Hn.
- Every element g of G generates a cyclic subgroup ⟨g⟩, which must be one of the Hi.
- Since G is infinite but the Hi are finitely many, at least one Hi must be infinite.
- An infinite cyclic subgroup has infinitely many subgroups, contradicting that G has only finitely many subgroups.
- Therefore, G must be finite.

Thus, a group is finite if and only if it has finitely many subgroups.
Therefore, G is finite exactly when it has finitely many subgroups.

Matthew Ballard (Sep 13 2024 at 16:10):

To prove the statement that a group \( G \) is finite if and only if \( G \) has finitely many subgroups, we need to establish two directions:

1. If \( G \) is finite, then \( G \) has finitely many subgroups.

Let \( G \) be a finite group. By Lagrange's Theorem, the order (number of elements) of any subgroup of \( G \) divides the order of \( G \). Since there are only finitely many divisors of the order of \( G \), there can only be finitely many subgroups, as each subgroup must have one of these divisors as its order.

Additionally, for each divisor of the order of \( G \), there are finitely many subgroups of that order because the number of subgroups of a given order in a finite group is bounded. Therefore, \( G \) has finitely many subgroups.

2. If \( G \) has finitely many subgroups, then \( G \) is finite.

Assume that \( G \) has finitely many subgroups. We will show that \( G \) must be finite.

Suppose, for contradiction, that \( G \) is infinite. Consider the cyclic subgroups generated by each element of \( G \). For any \( g \in G \), the cyclic subgroup \( \langle g \rangle \) is a subgroup of \( G \). Since \( G \) is infinite, there are infinitely many elements in \( G \), and consequently, there would be infinitely many distinct cyclic subgroups of \( G \) (because distinct elements of \( G \) can generate distinct subgroups, particularly if the elements have different orders or are not related by a power).

This contradicts the assumption that \( G \) has finitely many subgroups. Therefore, \( G \) must be finite.

Conclusion:

Thus, we have proven both directions: \( G \) is finite if and only if \( G \) has finitely many subgroups.

Matthew Ballard (Sep 13 2024 at 16:17):

Give a formal proof in the current version of Lean

Thought for 80 seconds

Response

Certainly! Here is a formal proof of the statement in Lean:

import group_theory.subgroup
import data.fintype.basic

open_locale classical

/-- Lemma: The set of subgroups of ℤ is infinite. -/
lemma infinite_subgroup_int : infinite (subgroup ℤ) :=
begin
  -- Define a function from positive natural numbers to subgroups of ℤ
  let φ : ℕ+ → subgroup ℤ := λ n, subgroup.zmultiples (n : ℤ),
  -- Show that this function is injective
  have h_injective : function.injective φ,
  {
    intros n m hnm,
    rw [subgroup.zmultiples_eq_iff] at hnm,
    exact subtype.ext hnm,
  },
  -- Since ℕ+ is infinite and φ is injective, the set of subgroups of ℤ is infinite
  exact infinite.of_injective h_injective,
end

/-- Lemma: An infinite group has infinitely many subgroups. -/
lemma infinite_group_has_infinite_subgroups {G : Type*} [group G] (hG : infinite G) :
  infinite (subgroup G) :=
begin
  -- Assume, for contradiction, that `subgroup G` is finite
  by_contradiction h_subG_finite,
  haveI : fintype (subgroup G) := fintype.of_finite (subgroup G),
  -- Consider the map φ : G → subgroup G defined by φ(g) = subgroup generated by g
  let φ : G → subgroup G := λ g, subgroup.closure {g},
  -- The image of φ is finite since `subgroup G` is finite
  have h_image_finite : finite (set.range φ) := finite.of_fintype (set.range φ),
  -- Since G is infinite and the image of φ is finite, some fiber φ⁻¹(H) must be infinite
  have h_exists_H : ∃ H : subgroup G, infinite (φ ⁻¹' {H}),
  {
    apply infinite.exists_infinite_fiber φ hG h_image_finite,
  },
  obtain ⟨H, hH⟩ := h_exists_H,
  -- The subgroup H is nontrivial
  have hH_nontrivial : H ≠ ⊥,
  {
    intro h_trivial,
    have h_singleton : φ ⁻¹' {⊥} = {1},
    {
      ext g,
      split; intro h,
      {
        rw [set.mem_preimage, set.mem_singleton_iff] at h,
        rw [←h, subgroup.closure_singleton_eq_bot_iff] at h,
        simp [h],
      },
      {
        simp at h,
        rw [h, subgroup.closure_singleton_one],
        exact set.mem_singleton _,
      },
    },
    have h_finite : finite (φ ⁻¹' {⊥}) := by rw h_singleton; apply finite_singleton,
    exact infinite.not_finite hH h_finite,
  },
  -- The preimage φ⁻¹(H) is contained in H and is infinite, so H is infinite
  have hH_infinite : infinite H,
  {
    have h_preimage_subset : φ ⁻¹' {H} ⊆ H,
    {
      intros g hg,
      rw [set.mem_preimage, set.mem_singleton_iff] at hg,
      rw ←hg,
      exact subgroup.mem_closure_singleton_self g,
    },
    exact infinite.mono h_preimage_subset hH,
  },
  -- Since H is infinite cyclic or contains an infinite cyclic subgroup
  have hH_cyclic : ∃ g : H, subgroup.closure ({g} : set H) = H,
  {
    apply subgroup.is_cyclic.exists_generator,
    exact subgroup.is_cyclic_of_subgroup_of_cyclic,
  },
  obtain ⟨g, hg⟩ := hH_cyclic,
  -- Since H is infinite cyclic, H ≃ ℤ
  have hH_iso_Z : H ≃* ℤ,
  {
    apply subgroup.equiv_int_of_cyclic_of_infinite,
    exact ⟨g, hg⟩,
    exact hH_infinite,
  },
  -- Therefore, subgroup H ≃ subgroup ℤ
  have h_subH_iso : subgroup H ≃ subgroup ℤ := subgroup_equiv_subgroup hH_iso_Z,
  -- Since subgroup ℤ is infinite, so is subgroup H
  have h_subH_infinite : infinite (subgroup H) := infinite.of_injective h_subH_iso.injective infinite_subgroup_int,
  -- Contradiction: `subgroup G` cannot be finite if it contains an infinite subset
  have h_subG_infinite : infinite (subgroup G) := infinite.of_injective (subgroup.inclusion_injective H) h_subH_infinite,
  exact h_subG_finite h_subG_infinite,
end

/-- **Main Theorem:** A group `G` is finite if and only if it has finitely many subgroups. -/
theorem group_finite_iff_subgroup_finite (G : Type*) [group G] :
  fintype G ↔ fintype (subgroup G) :=
begin
  split,
  { -- First direction: If `G` is finite, then `subgroup G` is finite
    intro hG,
    haveI : fintype G := hG,
    -- Since `G` is finite, the set of subgroups is finite
    exact fintype.of_finite (subgroup G),
  },
  { -- Second direction: If `subgroup G` is finite, then `G` is finite
    intro h_subG,
    by_contradiction hG,
    have hG_inf : infinite G := infinite.of_not_finite hG,
    -- Use the lemma that an infinite group has infinitely many subgroups
    have h_subG_inf : infinite (subgroup G) := infinite_group_has_infinite_subgroups hG_inf,
    -- Contradiction: `subgroup G` cannot be both finite and infinite
    exact infinite.not_finite h_subG_inf h_subG,
  }
end

Explanation:
* We first import the necessary Lean libraries for group theory and finite types.
* We prove a lemma that the set of subgroups of the integer group ℤ is infinite.
* We then prove a lemma that any infinite group has infinitely many subgroups by considering the map that sends an element to the cyclic subgroup it generates.
* Finally, we prove the main theorem by showing both directions:
* If G is finite, then the set of subgroups subgroup G is finite.
* If subgroup G is finite and G were infinite, it would lead to a contradiction since an infinite group must have infinitely many subgroups.

This formal proof uses Lean's capabilities to handle classical logic and infinite sets, constructing rigorous arguments mirroring the informal proof provided earlier.

Matthew Ballard (Sep 13 2024 at 16:19):

The current version of Lean please

Thought for 90 seconds

Response

Matthew Ballard (Sep 13 2024 at 16:21):

Anyone still have Lean 3 running?

Jason Rute (Sep 13 2024 at 16:23):

This might be fun, but maybe too hard:

Consider this Lean 4 code:

inductive MyNat
| zero : MyNat
| succ : MyNat -> MyNat

def MyNat.plus_left : (m n : MyNat) -> MyNat
| .zero, n => n
| (.succ m), n => .succ (plus_left m n)

def MyNat.plus_right : (m n : MyNat) -> MyNat
| n, .zero => n
| m, (.succ n) => .succ (plus_right m n)

example (m n: MyNat) : MyNat.plus_left m n = MyNat.plus_right m n := by sorry

Complete the proof of the example theorem.

Matthew Ballard (Sep 13 2024 at 16:24):

Same prompts for 4o and I get (after emphasizing the current version of Lean)

import Mathlib.GroupTheory.Subgroup.Basic
import Mathlib.Data.Fintype.Basic

-- Let G be a group
variables {G : Type*} [Group G]

-- First, we show that if G is finite, then G has finitely many subgroups
lemma finite_group_finite_subgroups [Fintype G] : Fintype (Subgroup G) :=
begin
  -- Subgroups are subsets of G, and since G is finite, the number of subsets is finite
  apply Fintype.ofInjective (λ H : Subgroup G, H.carrier),
  -- Injectivity: if two subgroups have the same carrier, then they are equal
  intros H1 H2 h,
  ext x,
  exact Set.ext_iff.mp h x,
end

-- Next, we show that if G has finitely many subgroups, then G is finite
lemma finite_subgroups_finite_group [Fintype (Subgroup G)] : Fintype G :=
begin
  -- We prove this by contradiction, assuming G is infinite
  by_contradiction h,
  -- Assume G is infinite
  have : Infinite G := Infinite.of_not_fintype h,
  -- Consider the set of cyclic subgroups of G
  let cyclic_subgroups := Set.range (λ g : G, Subgroup.zmultiples g),
  -- Each element of G generates a cyclic subgroup, and distinct elements generate distinct cyclic subgroups
  have : Infinite cyclic_subgroups,
  { apply Set.infinite_of_injective_forall_mem,
    intros g1 g2 h_cyclic,
    -- If the cyclic subgroups generated by g1 and g2 are equal, their generators must be related
    have : g1 ∈ Subgroup.zmultiples g2 := by { rw ←h_cyclic, exact Subgroup.mem_zmultiples g1 },
    rw Subgroup.mem_zmultiples_iff at this,
    -- g1 = g2^n for some integer n
    cases this with n h,
    cases n,
    -- If n = 0, g1 is the identity element
    { simp [pow_zero] at h, exact h.symm },
    -- Otherwise, g1 and g2 are equal up to powers, ensuring injectivity
    exact h },
  -- But this contradicts the assumption that G has finitely many subgroups
  exact not_infinite_of_fintype this,
end

-- Finally, we combine the two lemmas to get the desired iff statement
theorem group_finite_iff_finitely_many_subgroups : Fintype G ↔ Fintype (Subgroup G) :=
⟨finite_group_finite_subgroups, finite_subgroups_finite_group⟩

Matthew Ballard (Sep 13 2024 at 16:25):

Which is like Lean 3.75

Matthew Ballard (Sep 13 2024 at 16:27):

To @Jason Rute 's prompt

example (m n : MyNat) : MyNat.plus_left m n = MyNat.plus_right m n := by
  induction m with
  | zero =>
    -- Base case: m = zero
    -- plus_left zero n = n by definition
    -- plus_right zero n = n by definition
    rfl
  | succ m ih =>
    -- Inductive step: Assume the hypothesis holds for m
    -- Show it holds for succ m
    calc
      plus_left (succ m) n
        = succ (plus_left m n)        : by rfl
        ... = succ (plus_right m n)   : by rw ih
        ... = plus_right (succ m) n   : by rfl

Matthew Ballard (Sep 13 2024 at 16:28):

And I need to return to productivity :)

Eric Taucher (Sep 13 2024 at 16:53):

Ran Jason's example and captured to video which was larger than the 25MB limit for uploading files here. Sorry I am not so adept with modifing and uploading video files but more than happy to email it to someone so they can use their magic

Here are the messages that ChatGPT displayed with model 01-preview.

ChatGPT messages

Thinking
Examining Lean 4 code
Understanding definition
Slugging through induction
Comparing expressions
Evaluating induction approach
Advancing induction
Breaking down the process
Confirming base case
Expanding the proof
Thinking through induction
Breaking down the base case
Breaking down the calculations
Analyzing base case
Demonstrating equality
Harnessing induction
Setting up the foundation
Evaluating function behavior
Detailing induction steps
Confirming equivalence
Translating to Lean 4
Validating assumptions
Exploring Lean proofs
Mapping out the proof strategy
Considering induction steps
Thought for 55 seconds

Eric Taucher (Sep 13 2024 at 17:04):

FYI

AMA with the OpenAI o1 team

https://x.com/OpenAIDevs/status/1834608585151594537

Eric Wieser (Sep 13 2024 at 17:17):

Matthew Ballard said:

Anyone still have Lean 3 running?

The web editor is still live, just use ```lean3 instead of ``` for code blocks and the "open in playground" link will still work

import mathlib

Eric Taucher (Sep 14 2024 at 12:24):

One point of note related to the new o1 models from OpenAI is that OpenAI will adjust the models moving forward and some prompts that work at present may not in the next few weeks. As such I have found it of benefit to establish a baseline of different questions early on to compare against future adjustments so that one has valid facts for determining if the model is degrading.

As I am not as adept as many here on building such a baseline of such questions, if others could post them here it would be appreciated.

Bas Spitters (Sep 16 2024 at 08:26):

OpenAI has released a new model that is claimed to be better at reasoning. Has anyone tried to rerun any of the existing benchmarks of AI for ITP to see whether it is relevant for our community?
https://openai.com/index/introducing-openai-o1-preview/

Notification Bot (Sep 16 2024 at 11:03):

A message was moved here from #Machine Learning for Theorem Proving > o1 by Jason Rute.

Jason Rute (Sep 16 2024 at 11:17):

@Bas Spitters, it would have to be a benchmark that previously used an LLM like GPT-4 or GPT-4o to synthesize whole proofs. There certainly are such benchmarks for MiniF2F and PutnamBench, for example, those run by @George Tsoukalas and @Amitayush Thakur, or those in CoqPilot (none of the CoqPilot authors appear to be in this Zulip, so we might have to repeat this discussion on the Coq zulip). I think there would be some practical concerns such as if the same prompt works for o1, if they can do these evals without getting rate limited, and if these researchers can afford the extra cost of running o1 benchmarks. (And the usual concerns about benchmarks getting leaked into LLM training data.) I think approaches like Copra (which call GPT-4o at each step of the proof) would not be a good fit for o1, since it would be using a lot of computation for each step, but that is not to say there isn't a future way to combine o1's power with knowledge of Lean/Coq's internal state.

Jason Rute (Sep 16 2024 at 11:23):

Oh, and @Sean Welleck's benchmarks in miniCTX and miniCodeProps. Those would be good o1 benchmark candidates.

Jason Rute (Sep 16 2024 at 11:28):

Cross post to the CoqPilot discussion: https://coq.zulipchat.com/#narrow/stream/252087-Machine-learning-and-automation/topic/CoqPilot/near/470561267

George Tsoukalas (Sep 16 2024 at 15:28):

Jason Rute said:

There certainly are such benchmarks for MiniF2F and PutnamBench, for example, those run by George Tsoukalas and Amitayush Thakur if they can do these evals without getting rate limited, and if these researchers can afford the extra cost of running o1 benchmarks.

An interesting thing I noticed is that the math demonstration of o1 in the blog post is actually on a Putnam problem (2023 A2) from last year! I tried o1-preview on the Lean formalization in PutnamBench and it wasn't able to produce a proof. o1-preview mentioned that there are some intricate parts in the argument and left sorry's instead of proving them. Also it seems that it still confuses Lean 3 and Lean 4 syntax. Based on this, I'm not sure I would expect strong results. And the cost of running it on the whole benchmark would be relatively high, the API is fairly pricy and the outputs generally produce a lot of tokens for these problems. I think if OpenAI could do an evaluation on any formal theorem-proving benchmark (doesn't have to be PutnamBench) I would be more confident about using some of our budget to do an eval ourselves.

Sid (Sep 21 2024 at 01:17):

Seems like Qwen 2.5 70B matches with a reward model on 256 generations + python execution matches O1-mini's AIME score (O1-mini is rumored to be 70B as well)

Jason Rute (Sep 21 2024 at 07:10):

@Sid reference? Is this Qwen-2.5-Math? Is this in the official documentation or someone else’s result? I pitty the graders who had to check 256 automatic-proofs for correctness. :)

Sid (Sep 21 2024 at 07:25):

Jason Rute said:

Sid reference? Is this Qwen-2.5-Math? Is this in the official documentation or someone else’s result? I pitty the graders who had to check 256 automatic-proofs for correctness. :)

It's here (https://qwenlm.github.io/blog/qwen2.5-math/) - last table on the page. Haha I think it's just final answer correctness (that said my guess is so is OAI's result)

Last updated: May 02 2025 at 03:31 UTC