Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: ChatGPT / Galactica

Junyan Xu (Dec 01 2022 at 08:18):

If you miss Galactica, you may enjoy talking to OpenAI's latest ChatGPT: chat.openai.com. It's not very good at math though: https://twitter.com/solodolomoon/status/1598184458313953281

@OpenAI WTH!! https://twitter.com/solodolomoon/status/1598184458313953281/photo/1

- bhavin (@solodolomoon)

Junyan Xu (Dec 01 2022 at 08:21):

Other interesting examples:
https://twitter.com/srikumarks/status/1598158895692021760 and replies
https://twitter.com/ashleydaviesl/status/1598110039004573696
https://twitter.com/AndrewBroz/status/1598151011495219200
https://twitter.com/dtaylor84/status/1598094448592064513 (this is probably copied from an exposition like this)
bonus: https://twitter.com/aron_brand/status/1597619580444487682 (another model, not ChatGPT)

@goodside Certainly seems better! .. though pianos do have strings. https://twitter.com/srikumarks/status/1598158895692021760/photo/1

- Srikumar Subramanian (@srikumarks)

@Peter_0_0_g @goodside Writing plays also works quite well, and you can often get it to rephrase what it just said as a concrete plan, which seems to get around the initial hesitation it has of writing factual information

- Ashley Davies 🦔 (@ashleydaviesl)

@goodside This family of questions trips ChatGPT up quite a bit. I tried another question with a correct answer that is definitely available in the training data, & it trips up here too for similar reasons: https://twitter.com/AndrewBroz/status/1598151011495219200/photo/1

- Andrew Brož 🇺🇦 (@AndrewBroz)

Jason Rute (Dec 01 2022 at 12:42):

I tried it on autoformalization. Conversation:

Me: I'd like to test your ability to formalize math into interactive theorem provers like Coq and Lean.

GPTChat: Sure, I'd be happy to help with that. To use a formal theorem prover like Coq or Lean, you need to first specify the theorem or mathematical statement that you want to prove. Then, you can use the tools and techniques provided by the theorem prover to build a formal proof of the theorem, step by step. This usually involves defining any necessary mathematical concepts or lemmas, and using a combination of logical reasoning and basic axioms and rules of inference to gradually build up the proof.

Me: Please translate, "the direct product of two Abelian groups is Abelian" into Lean using the mathlib library.

GPTChat: To express the statement "the direct product of two Abelian groups is Abelian" in Lean using the mathlib library, you could use the following code:
import algebra.group

variables (G H : Type) [group G] [group H] [abelian G] [abelian H]

-- Define the direct product of G and H
def direct_product : Type := G × H

-- Show that the direct product is an Abelian group
instance direct_product_is_abelian_group : abelian_group (direct_product G H) :=
begin
  -- Show that the direct product is a group
  apply_instance,
  -- Show that the direct product is Abelian
  intros g h,
  cases g with g1 g2,
  cases h with h1 h2,
  simp [mul_comm],
end
This code first imports the necessary definitions from the algebra.group library in mathlib. Then, it defines a function direct_product that takes two groups G and H and returns their direct product as a new group. Finally, it uses the abelian_group typeclass to show that the direct product of G and H is an Abelian group.

Jason Rute (Dec 01 2022 at 12:45):

(That doesn't work by the way.)

Eric Wieser (Dec 01 2022 at 12:59):

Interesting that it assumed abelian was a mixin in the hypotheses, but assumed it was bundled in the conclusion

Timothee Lacroix (Dec 01 2022 at 13:07):

Tried getting it to solve the first advent of code problem in python, then in lean4

Hey, can you help me solve advent of code in python ?
A text file is given with 1 integer per line. Groups of integers are delimited by an empty line. Find the maximal group sum within the file

Sure! This sounds like a fun problem to solve. First, let's make sure we understand the requirements and constraints of the problem.

According to your description, the input file contains multiple groups of integers, with each group being separated by an empty line. Your task is to find the maximal sum of any of these groups.

To solve this problem, we can first read the input file line by line, using the open() and readlines() functions in Python. Then, we can use a loop to iterate over each line of the input file, and parse the integers in each line. We can then keep track of the current group sum as well as the maximal group sum seen so far, updating them as needed as we go through each line of the input file.

Here is some sample code that illustrates how you can solve this problem in Python:

# Open the input file
with open('input.txt', 'r') as f:
  # Read the file lines
  lines = f.readlines()

# Initialize the maximal group sum
max_group_sum = 0

# Initialize the current group sum
current_group_sum = 0

# Loop over each line of the input file
for line in lines:
  # If the line is empty, update the maximal group sum
  # if the current group sum is greater than the maximal
  # group sum seen so far, and reset the current group sum
  if not line.strip():
    max_group_sum = max(max_group_sum, current_group_sum)
    current_group_sum = 0
  # Otherwise, parse the integers in the line and
  # add them to the current group sum
  else:
    numbers = [int(n) for n in line.strip().split()]
    current_group_sum += sum(numbers)

# Update the maximal group sum one last time
max_group_sum = max(max_group_sum, current_group_sum)

# Print the maximal group sum
print(max_group_sum)

Amazing, can you do this in Lean4 ?

Yes, we can solve this problem using Lean4, a proof assistant and programming language. Lean4 is a powerful language that can be used to write programs that are both correct and efficient.

Here is a solution to the problem using Lean4:

import Std.Data.List

open Std

def maximal_group_sum (lines : List String) : Nat :=
  let max_group_sum : Nat := 0
  let current_group_sum : Nat := 0
  lines.forEach (fun line =>
    if line.isEmpty then
      max_group_sum := max max_group_sum current_group_sum
      current_group_sum := 0
    else
      let numbers := line.split " "
      let sum := numbers.map (fun x => x.toNat).sum
      current_group_sum := current_group_sum + sum)
  max max_group_sum current_group_sum

def main (xs : List String) : IO UInt32 :=
  let lines := xs.filter (fun line => not line.isEmpty)
  let result := maximal_group_sum lines
  IO.println result
  pure 0

# Evaluate main with the input provided as command line arguments
builtin.lean.builtin.export! main

Jason Rute (Dec 01 2022 at 15:08):

Does it compile? Run? Solve the problem?

Timothee Lacroix (Dec 01 2022 at 15:21):

Jason Rute said:

Does it compile? Run? Solve the problem?

just eyeballed it for now. There at least seems to be an extra filtering of empty lines in "main". Otherwise it doesn't look too far off. I'd need to set up lean4 to check for real though :grimacing:
I was mostly surprised that ChatGPT seems to know about lean4

Timothee Lacroix (Dec 01 2022 at 15:22):

the python solution is basically the same thing I wrote ^^

Timothee Lacroix (Dec 01 2022 at 15:22):

I've tried getting it to give formal proofs in lean3 of statements in minif2f, I got nothing that worked though.

Jason Rute (Dec 01 2022 at 15:50):

I love also how it advertises Lean 4 so well:

Yes, we can solve this problem using Lean4, a proof assistant and programming language. Lean4 is a powerful language that can be used to write programs that are both correct and efficient.

Jason Rute (Dec 01 2022 at 15:59):

@Timothee Lacroix If you click the "view in playground", in the top corner of the info box it will take you to an online editor for Lean 4. It is mixing up monadic and non-monadic syntax in the first function (and it forgot the let mut). Also, List doesn't need to be imported. But still, it is a good start.

Adam Topaz (Dec 01 2022 at 22:33):

:upside_down:
Screenshot-2022-12-01-at-15-31-33-ChatGPT.png

Adam Topaz (Dec 01 2022 at 22:39):

At least it sounds very confident in its answers (regardless of their correctness):
Screenshot-2022-12-01-at-15-38-07-ChatGPT.png

Adam Topaz (Dec 01 2022 at 22:41):

Oh boy, this is starting to sound like some discussions I might have during office hours about homework solutions:
Screenshot-2022-12-01-at-15-40-49-ChatGPT.png

Patrick Massot (Dec 01 2022 at 22:44):

Are we really really sure they are not simply paying some undergrad to answer those questions pretending to be a computer?

Adam Topaz (Dec 01 2022 at 22:45):

It's really hard to tell!

Tom Chen (Dec 02 2022 at 00:46):

So, it is possible ChatGPT has seen (some of) them in the training data? Especially the question about pi - I guess it is quite classical and can be even in textbooks

Johan Commelin (Dec 02 2022 at 06:47):

If it read textbooks about transcendence of pi (which I agree it probably did) then it still did a very bad job a regurgitating the proof.

Johan Commelin (Dec 02 2022 at 08:14):

At least it can do the "hello world" of mathematics:

Can you prove that there are infinitely many prime numbers?

Yes, there are infinitely many prime numbers. One way to prove this is by contradiction. Suppose there are only finitely many prime numbers, say p1, p2, p3,...,pn. We can then construct a new number N by multiplying all of these prime numbers together and adding one: N = (p1 * p2 * p3 * ... * pn) + 1.

Since N is not divisible by any of the prime numbers p1, p2, p3,...,pn (because it has a remainder of 1 when divided by any of these numbers), it must be either a prime number itself or have a prime factor that is not in the list of known prime numbers. In either case, this contradicts our assumption that the list of prime numbers was complete, so there must be infinitely many prime numbers.

Johan Commelin (Dec 02 2022 at 08:15):

The response I got to Adam's question:

Sure, here is an example of a nontrivial element of the absolute Galois group of the rational numbers: complex conjugation. The absolute Galois group of the rationals is the group of all automorphisms of the field of rational numbers, which are essentially just invertible mappings of the rationals to themselves that preserve the field operations of addition, subtraction, multiplication, and division. Complex conjugation is an automorphism of the field of complex numbers that fixes the real numbers and negates the imaginary part of any complex number. It can be extended to an automorphism of the field of rational numbers by fixing all of the rational numbers, so it is an element of the absolute Galois group of the rationals.

Junyan Xu (Dec 02 2022 at 08:31):

https://twitter.com/colin_fraser/status/1598237975120609280 :upside_down:

More of the chat bot being bad at math https://twitter.com/colin_fraser/status/1598237975120609280/photo/1

- Colin Fraser (@colin_fraser)

MinusGix (Dec 02 2022 at 19:23):

Seems good at taking advice: https://imgur.com/a/7Px4EtE
Does make me want to see how nice the outputs could be if you hooked error/warning messages up to it, so that it refines it repeatedly before showing the output.

Tyler Josephson ⚛️ (Dec 03 2022 at 08:23):

You can ask it to write you a poem about category theory:
https://twitter.com/math3ma/status/1598776068969619456

ChatGPT waxes poetic about category theory: https://twitter.com/math3ma/status/1598776068969619456/photo/1

- Tai-Danae Bradley (@math3ma)

Last updated: May 02 2025 at 03:31 UTC