Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Lean Chat is publicly available


Zhangir Azerbayev (Aug 02 2022 at 16:52):

The Lean Chat app developed by myself and @Edward Ayers is now publicly available as a VS Code extension. Users no longer need to provide their own OpenAI API key.

The interface is similar to previous demos. To open the app, hit Cmd+Shift+P in VS Code and run the command Open chat. Type the statement of a theorem in LaTeX, and the app will try to formalize the statement. If the returned declaration is incorrect, you can give the system instructions for what to fix in a natural, dialogue-like way.

Additionally, you can now give feedback to the system on whether each formalization attempt is correct or not. We will use this feedback to further improve the quality of outputs.

Under the hood, the user's inputs are wrapped in a few-shot prompt and completed by the Codex language model, accessed through the OpenAI API.
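
Roughly speaking, the prompt interleaves natural-language statements with their Lean formalizations and ends with the user's statement, so the model's completion is the candidate Lean declaration. A simplified, illustrative sketch (not the exact prompt, which you can find in the lean-chat source):

-- Natural language: If $a$ and $b$ are odd natural numbers, then $a + b$ is even.
theorem odd_add_odd (a b : ℕ) (ha : odd a) (hb : odd b) : even (a + b) := sorry

-- Natural language: <the user's LaTeX statement goes here>
theorem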

I'm excited to see what the community does with this app! I'd love to hear about any striking examples of successes or failures, any interesting patterns you discover, and of course reports of any bugs.

example with groups, example with topology

Oliver Nash (Aug 02 2022 at 19:55):

Holy crap, this is pretty amazing.

Oliver Nash (Aug 02 2022 at 19:55):

image.png

Zhangir Azerbayev (Aug 02 2022 at 20:28):

Oliver Nash said:

Holy crap, this is pretty amazing.

My (anecdotal) experience so far is that the app has rather good knowledge of advanced mathematics. The error rate for theorems about abelian categories isn't much, if at all, higher than the error rate for elementary number theory.

Oliver Nash (Aug 02 2022 at 20:36):

I think that might depend heavily on what one means by "advanced": certainly Mathlib contains hardly any (if any) mathematics that a mainstream mathematician would call advanced. Still, I'm very impressed and curious to see more.

David Renshaw (Aug 02 2022 at 22:51):

It seems to have trouble with olympiad-type puzzles like those in https://github.com/dwrensha/mathematical-puzzles-in-lean.
chat.png

Zhangir Azerbayev (Aug 03 2022 at 01:30):

David Renshaw said:

It seems to have trouble with olympiad-type puzzles like those in https://github.com/dwrensha/mathematical-puzzles-in-lean.
chat.png

Interesting. I tried rephrasing the prompt and reply in a few different ways, but could never get the model to include both the divisibility condition and the digits condition.

In general, though, writing all mathematical symbols in LaTeX, even when it seems unnecessary to do so (in this example, $n$ rather than n), gives better results. For the language model, the dollar signs are a cue that it should be doing mathematical reasoning.

Jason Rute (Aug 03 2022 at 01:47):

You need a post to Zulip button so we can more easily share our findings. :)

Johan Commelin (Aug 05 2022 at 05:57):

@Zhangir Azerbayev @Edward Ayers Really cool! Minor feature request: Ctrl-Enter as keyboard shortcut for "Send".

Johan Commelin (Aug 05 2022 at 05:58):

I'm currently trying to coach it through

Let G be a Lie group. Then the tangent space at the identity element is a Lie algebra.

But it's not cooperating on getting the conclusion right, and the assumptions are missing the fact that G is a Lie group.

Johan Commelin (Aug 05 2022 at 05:59):

theorem  tangent_space_at_identity_is_lie_algebra (G : Type*) [group G] [topological_space G] [topological_group G] [charted_space G] [smooth_manifold G] :
  lie_algebra (tangent_space_at_identity G) :=

Johan Commelin (Aug 05 2022 at 07:07):

Let G be a Lie group over a non-trivially normed field 𝕜. Then the tangent space at the identity element of G is a Lie algebra over 𝕜.

theorem  tangent_space_at_identity_is_lie_algebra (G : Type*) [lie_group G] [normed_field 𝕜] [nontrivial 𝕜] :
  lie_algebra 𝕜 (tangent_space_at_identity G) :=

Julian Berman (Aug 05 2022 at 07:28):

Can I ask how the not-needing-a-key works? Is there a key "hidden" in the built extension? Or is OpenAI just configured to allow public access to this model now? (Specifically, should I expect to be able to get this working in neovim now?)

Jason Rute (Aug 05 2022 at 11:13):

I suspect that needing to be signed into GitHub via VS Code matters, but I'm also curious about Julian's question.

Jason Rute (Aug 05 2022 at 11:15):

I want to know if it is possible to add settings to play with the examples in the prompt (and maybe even the other parts of the prompt so I could see how it works in Coq) or if this violates the Codex TOS.

Edward Ayers (Aug 05 2022 at 11:20):

In order to satisfy the Codex TOS, OpenAI requires us to make each user of the API log in or be identifiable in some way. We implemented this by having users sign in to GitHub via the VS Code extension, and we use their GitHub IDs as the identifier.

If you do want to experiment with the Codex API, you can find our source code (including prompts) here, and then you need to apply to OpenAI for a Codex API key.

We also welcome contributions :smile:

Jason Rute (Aug 05 2022 at 11:25):

Other minor change suggestions:

  • “Open Chat” doesn’t seem very plugin-specific. “Lean: Open Chat” or “Open Lean Chat” would maybe be better.
  • The app is easy to use, but a better description and set of instructions on the plugin page would go a long way toward user-friendliness.
  • The navy blue message bubbles on a black background in dark mode are readable, but could maybe be better.
  • Going backwards is unintuitive: I don’t see why I should have to hit clear twice to undo my last prompt. Also, I often just want to edit my last prompt, so it is annoying to have it completely deleted from the text box.

But these are just nitpicks. It is very cool.

Edward Ayers (Aug 05 2022 at 11:27):

I'm not sure whether OpenAI will permit us to let users play with adding custom prompts and so on. But that can also be a feature to add.

Edward Ayers (Aug 05 2022 at 11:32):

I've opened some issues; feel free to add more bugs and requests!

Julian Berman (Aug 06 2022 at 10:51):

@Edward Ayers is hitting https://lean-chat-server.deno.dev/ with additional traffic also OK with you? Or are you saying you'd prefer that anything other than the VSCode extension run its own proxy?

Johan Commelin (Aug 15 2022 at 11:45):

Another little success story: I copied the statement of Burnside's theorem from wiki

if G is a finite group of order p a q b {\displaystyle p^{a}q^{b}} {\displaystyle p^{a}q^{b}} where p and q are prime numbers, and a and b are non-negative integers, then G is solvable

and fed it to the chat without cleaning up at all. So p a q b {\displaystyle p^{a}q^{b}} {\displaystyle p^{a}q^{b}} was literally in the input.

The first output was

theorem  solvable_finite_group_of_prime_power_order (G : Type*) [group G] [fintype G]
  (p q : ℕ) (hp : prime p) (hq : prime q) (a b : ℕ) (h : fintype.card G = p ^ a * q ^ b) :
  solvable G :=

And then I told it

The predicate for solvability of groups is called `is_solvable` instead of `solvable`.

After that it gave me a statement that Lean was happy with.
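
So the final accepted statement was presumably the one above with `is_solvable G` as the conclusion, i.e. roughly:

theorem  solvable_finite_group_of_prime_power_order (G : Type*) [group G] [fintype G]
  (p q : ℕ) (hp : prime p) (hq : prime q) (a b : ℕ) (h : fintype.card G = p ^ a * q ^ b) :
  is_solvable G :=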

Johan Commelin (Aug 15 2022 at 11:45):

As far as I know, this result is not yet in mathlib.

Zhangir Azerbayev (Aug 15 2022 at 18:49):

Johan Commelin said:

As far as I know, this result is not yet in mathlib.

Interestingly, prompting the app with proper LaTeX gives a worse result, although the model partially redeems itself by understanding the quite high-level instruction "no need for the existential".

burnside.png

Johan Commelin (Aug 17 2022 at 08:07):

There are infinitely many prime numbers.

theorem  infinite_primes : ∃ (p : ℕ), prime p :=

:sad:

Johan Commelin (Aug 17 2022 at 08:08):

This only says that a prime exists. I'm asking for the existence of infinitely many primes.

theorem  infinite_primes : ∀ (p : ℕ), ∃ (q : ℕ), prime q ∧ q > p :=

:smiley:
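
For comparison, mathlib states the infinitude of primes in the same "unbounded" form; if I remember the name right, it is roughly:

theorem nat.exists_infinite_primes (n : ℕ) : ∃ p, n ≤ p ∧ nat.prime p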

Albert Jiang (Aug 17 2022 at 08:24):

Johan Commelin said:

This only says that a prime exists. I'm asking for the existence of infinitely many primes.

theorem  infinite_primes : ∀ (p : ℕ), ∃ (q : ℕ), prime q ∧ q > p :=

:smiley:

Wow such a cool example!

Daniel Selsam (Aug 17 2022 at 14:52):

The funny part is that it is probably just an alignment problem -- the model knows how to formalize the original statement, but doesn't know it is _supposed_ to formalize it correctly since there is so much data on the internet where things are wrong the first time.

Scott Morrison (Sep 14 2022 at 23:36):

@Daniel Selsam, I'm really curious about this alignment issue. How does one even begin to diagnose whether this is actually the case?

Daniel Selsam (Sep 19 2022 at 15:11):

Scott Morrison said:

Daniel Selsam, I'm really curious about this alignment issue. How does one even begin to diagnose whether this is actually the case?

By tweaking the prompt and seeing if the error rate goes down :) A sufficiently good model perhaps spends most of its energy just trying to guess the quality of the page it is on and what kinds of errors it expects to see, and indeed additional context in the prompt that correlates with higher-quality web pages can often help a lot. I don't know the prompt that is being used now, but one could try prepending something that sounds official.

Langze YE (Dec 04 2023 at 12:58):

I really want to try it. Does it still work now? I get a 'code 404' error.


Last updated: Dec 20 2023 at 11:08 UTC