Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Morph Prover v0 7B


Jesse Michael Han (Oct 13 2023 at 17:47):

morph-prover-visual.gif

Hello everyone! On behalf of the entire team at Morph Labs, I am excited to announce Morph Prover v0 7B, the first open-source model trained as a conversational assistant for Lean users. In a few days, we will make the model weights available and share more details in a blog post.

Morph Prover v0 7B is a chat fine-tune of Mistral which achieves a new state of the art in autoformalization on ProofNet while remaining as strong as the original Mistral model on benchmarks like AGIEval and MMLU. This model was trained in collaboration with Nous Research and the Safe and Trustworthy AI (STAIR) Research group at Stanford led by professor Sanmi Koyejo, with major contributions by @Brando Miranda and help from Peter Holderrieth and Jin Peng Zhou.

This will be an early research release. While this is the best open-source model for Lean we have seen, it is not completely trained and still makes mistakes like hallucinating lemmas and confusing Lean 3 and Lean 4 syntax. We hope to improve it with your feedback and we will be open-sourcing the model weights in the hope that it stimulates further work on conversational assistants for autoformalization in Lean.

While it will be possible to run the model at 20 tokens / s on an M2 Macbook using GGML, we will also host the model for interactive use. Please sign up here for an invite link and we'll let you know as soon as it's available.

Utensil Song (Oct 13 2023 at 17:50):

603BCA73-B8E3-42F4-9A4F-C522E8D56838.jpg

Utensil Song (Oct 13 2023 at 17:53):

It's excited to finally see an open LLM based code assistant for Lean proof assistant :big_smile:

Jesse Michael Han (Oct 13 2023 at 18:17):

fixed! google forms always gets me with that one

Utensil Song (Oct 14 2023 at 05:03):

I wonder why it would confuse with Lean 3 and Lean 4 syntax. As Lean 3 is obsolete, and the mathlib fully ported to Lean 4, moving forward it's most valueable to be Lean 4 only.

For GPT-4, it's trained in a way that hasn't paid attention to the subtlety, and it manages to do OK. For an open LLM with total control to the data curation, if your target is to be Lean 4 only, you can be quite sure that little Lean 3 corpus has sneaked into the dataset.

Although, incompatible syntax or API between "minor" versions is inevitable, and can be mitigated by agent-style multi-round interactions, and LLMs can handle Lean feedback pretty well with a StackOverflow style Zulip corpus, with careful labeling the discussion to be about Lean 3 or Lean 4, which would greatly benefit from the efforts of moderator to move the topic to the right stream to avoid confusions for humans.

Scott Morrison (Oct 14 2023 at 05:48):

This is a fine tune, so there's no direct control on the dataset.

Jesse Michael Han (Oct 14 2023 at 06:19):

yep this can be smoothed out with further data engineering --- currently the base model has seen Lean 3 and we finetuned on mathlib3 github and zulip data in addition to mathlib4 without carefully disambiguating. we'll fix in a subsequent release.

Utensil Song (Oct 14 2023 at 06:29):

Ah, I see. This is a recurring issue. The foundation model contains general knowledge of a certain domain, when fine-tuning on the high-quality dataset of that domain, it's difficult to make it properly prune (e.g. MEMIT & REMEDI ) all prior knowledge of the domain, and start the finetune from a clean slate (for the domain).

Min-Hsien Weng (Oct 14 2023 at 09:23):

@Jesse Michael Han
Excited to see more open source models being developed. Lean 3 and 4 programs share some similar syntax, so the model's confusion is understandable (I often got confused too).

I look forward to test Morph Prover and learn more about the model (I've already filled out the interest form).

Utensil Song (Oct 14 2023 at 10:45):

Particularly, I'm looking forward to running CORPRA on Morph Prover, the chemistry could be quite something. Open LLM can easily pretend to be a GPT API, with 20 tokens / s times less than 60 inferences, it seems to be a reasonable time to figure a proof out (compared to human), and one can expect a helpful explanation even when it fails to complete the proof with such an assistant style model.

Jesse Michael Han (Oct 17 2023 at 23:34):

Blog post with samples from the model: https://morph.so/blog/the-personal-ai-proof-engineer

Model weights are available now, free hosted version for interactive use coming very soon.

Here is a little Python project for spinning the model up locally. It should work out of the box at 20 tok / s on an M2 Macbook and much slower on CPU.

https://github.com/morph-labs/morph-prover-cli/

HuggingFace weights: https://huggingface.co/morph-labs/morph-prover-v0-7b
GGUF binaries: https://huggingface.co/morph-labs/morph-prover-v0-7b-gguf

Heather Macbeth (Oct 17 2023 at 23:41):

In some of the blog post examples the prose talks about Lean 4 but the code (e.g. the imports and the indentation) is Lean 3. It seems that all ML models at this moment in time have this problem ... do you have any ideas for solving it?

Jesse Michael Han (Oct 18 2023 at 00:03):

yes, the idea is to teach the model to identify Lean 3 and Lean 4 and then to "just ask" for Lean 4, and this can be done by synthesizing training examples where the user and the model talk about Lean 3 and Lean 4 versions of a proof. Over a few hundred to a few thousand training examples it should learn to disambiguate and start to use e.g. camelCase when asked for Lean 4.

Heather Macbeth (Oct 18 2023 at 00:06):

Thanks for explaining. Sorry I don't know the terminology -- you're saying that by recording the interactions of users with the chatbot you expect (within the first thousand few interactions) to get a supplementary dataset that will clarify this point for the model?

Heather Macbeth (Oct 18 2023 at 00:16):

Oh ... by "synthesizing" do you mean that you will autogenerate (e.g. with mathport) some "fake" user interactions with the chatbot in which the chatbot provides a translation from Lean 3 to Lean 4, and then you will train the model on that?

Jesse Michael Han (Oct 18 2023 at 00:52):

Heather Macbeth said:

Oh ... by "synthesizing" do you mean that you will autogenerate (e.g. with mathport) some "fake" user interactions with the chatbot in which the chatbot provides a translation from Lean 3 to Lean 4, and then you will train the model on that?

yes exactly! the chatbot doesn't need to provide the translation, it just needs to trained on some data where it articulates its knowledge about the difference between Lean 3 and Lean 4 with source code for both visible in the training data

Scott Morrison (Oct 18 2023 at 02:08):

Experiences so far:

  • it doesn't know when to stop, and seems to answer multiple times. The answers very often have [INST] and [/INST] in them.
  • the cli doesn't seem to allow multi-line inputs, making it hard to evaluate!

Jesse Michael Han (Oct 18 2023 at 03:57):

thanks for the feedback, i pushed a couple formatting fixes. this version of the model wasn't trained on very long conversations so it tends to derail as the conversations get longer. we'll improve it in future iterations!

we'll be hosting the model as a Discord bot very soon where multiline formatting with code blocks will be much nicer.

Eric Rodriguez (Oct 18 2023 at 05:43):

What does tok/sec mean?

Scott Morrison (Oct 18 2023 at 05:46):

"tokens per second"

Scott Morrison (Oct 18 2023 at 05:47):

Each token is usually a few characters. (The details can be different for every model; it's standard to customize your choice of tokens for your application.)

Scott Morrison (Oct 18 2023 at 05:47):

It's a measure of how fast the model produces text.

Scott Morrison (Oct 18 2023 at 05:47):

The morph model was plenty fast on my computer!

Johan Commelin (Oct 18 2023 at 06:31):

But you also have a plenty fast computer :grinning:

Utensil Song (Oct 18 2023 at 08:33):

Tested with some simple cases with Morph Prover v0 7B (q8_0 gguf with unmodified Alpaca prompt template):

Case 1. Morph Prover doesn't succeed at first, but can be prompted to succeed, works well with natural language and #mwe proofs with sorry

image.png

Case 2. Morph Prover struggles to get a correct lemma name

image.png

Case 3. Morph Prover confuses about Lean 3 and Lean 4

image.png

Case 4. Morph Prover explains proofs with lots of lemmas and notations well

image.png

Jesse Michael Han (Oct 19 2023 at 05:33):

hey all, the hosted version of morph prover is now ready for experimental use as a Discord bot. we've got around a dozen folks so far testing it out in a private channel - RSVP here and we'll send you an invite link!

https://forms.gle/vr28p6dQCJuRVLAE6

Mario Carneiro (Oct 19 2023 at 05:35):

have you considered a Zulip bot instead/in addition?

Bolton Bailey (Oct 19 2023 at 05:39):

Jesse Michael Han said:

hey all, the hosted version of morph prover is now ready for experimental use as a Discord bot. we've got around a dozen folks so far testing it out in a private channel - RSVP here and we'll send you an invite link!

https://forms.gle/vr28p6dQCJuRVLAE6

Which discord server? Are you saying the server is private, or that it's a private channel on some public server?

Jesse Michael Han (Oct 19 2023 at 05:49):

private channel on the Morph Discord:

https://discord.gg/wa5sgWMfqv

Jesse Michael Han (Oct 19 2023 at 05:50):

Mario Carneiro said:

have you considered a Zulip bot instead/in addition?

working on it and will roll it out as soon as the infra is a bit more stable :+1:

Junyan Xu (Oct 19 2023 at 07:35):

Utensil Song said:

Tested with some simple cases with Morph Prover v0 7B (q8_0 gguf with unmodified Alpaca prompt template)

image.png :eyes:

apply_rules does exist in Lean 4.

The linear map explanation isn't super clear; I'd expect it to spell out the type after applying each operation.

Siddhartha Gadgil (Oct 19 2023 at 07:38):

I was impressed by the explanations of Lean Code (informalization). A couple of examples:

Prompt: Describe the following Lean 4 inductive type, including its purpose.

inductive DiaphontineSolution (a b c : ) where
    | solution : (x y : )   a * x + b * y = c  DiaphontineSolution a b c
    | unsolvable : ( x y : , ¬ (a * x + b * y = c))  DiaphontineSolution a b c

Answer:
This Lean 4 inductive type is called DiaphontineSolution. It represents the solutions to a Diophantine equation of the form ax + by = c, where a, b, and c are integers. The purpose of this type is to classify all possible solutions to such equations into two categories: solution and unsolvable.

The solution constructor takes a pair of integers (x, y) and a proof that the equation ax + by = c holds for those values. This represents the case where there exists a solution to the Diophantine equation.

The unsolvable constructor takes a proof that there is no solution to the Diophantine equation. It states that for any pair of integers (x, y), the equation ax + by = c does not hold. This represents the case where there are no solutions to the Diophantine equation.

In summary, the DiaphontineSolution type allows us to reason about the existence and non-existence of solutions to a Diophantine equation, providing a way to classify them into two categories.

Siddhartha Gadgil (Oct 19 2023 at 07:40):

Prompt: Add a documentation string describing the following Lean 4 function, INCLUDING how it differs from Nat.sub:

def minus (m n : )(hyp : n  m) :  :=
  match m, n, hyp with
  | m, 0, _ => m
  | 0, _ +1, pf => nomatch pf
  | m + 1, n + 1 , pf =>
    minus m n (le_of_succ_le_succ pf)

Answer:

The function minus is a modified version of the Nat.sub function in Lean 4. It takes two natural numbers m and n, along with a proof hyp that n is less than or equal to m. The function returns the difference between m and n, but with some additional constraints.

The main difference between minus and Nat.sub is that minus requires a proof that n is less than or equal to m. This ensures that the subtraction operation is well-defined, as it prevents negative results when n is greater than m. In contrast, Nat.sub does not require such a proof and allows for subtraction of any two natural numbers.

Here's a documentation string for the minus function:

/-!
# Modified Subtraction Function for Natural Numbers

The `minus` function is a modified version of the `Nat.sub` function in Lean 4. It takes two natural numbers `m` and `n`, along with a proof `hyp` that `n` is less than or equal to `m`. The function returns the difference between `m` and `n`, but with some additional constraints.

The main difference between `minus` and `Nat.sub` is that `minus` requires a proof that `n` is less than or equal to `m`. This ensures that the subtraction operation is well-defined, as it prevents negative results when `n` is greater than `m`. In contrast, `Nat.sub` does not require such a proof and allows for subtraction of any two natural numbers.

## Usage

The `minus` function can be used in the following way:
lean
def minus (m n : ℕ)(hyp : n ≤ m)
: ℕ :=
  match m, n, hyp with
  | m, 0,  => m
  | 0,  +1, pf => nomatch pf
  | m + 1, n + 1 , pf =>
    minus m n (le_of_succ_le_succ pf)
## Examples

Here are some examples of how to use the `minus` function:
lean
example : minus 5 2 (nat.zerole ) = 3 := rfl
example : minus 10 7 (nat.zerole ) = 3 := rfl
example : minus 7 10 (nat.zerole ) = 0 := rfl
-/

Siddhartha Gadgil (Oct 19 2023 at 07:40):

The examples at the end are wrong though.

Siddhartha Gadgil (Oct 19 2023 at 07:43):

For comparison, here is the ChatGPT response (also excellent):

The DiophantineSolution inductive type is a construct used in Lean 4, a theorem prover and formal proof development system, to represent Diophantine equations and their solutions. Diophantine equations are mathematical equations in which the variables are restricted to integer values, and the goal is to find integer solutions for those variables. This type is used for expressing and working with these kinds of equations in a formalized manner.

The DiophantineSolution inductive type is defined with three integer parameters a, b, and c, which represent the coefficients of the Diophantine equation a * x + b * y = c. Here's a breakdown of the two constructors for this inductive type:

  1. solution Constructor:

    • This constructor represents the case when there exists a solution to the Diophantine equation.
    • It takes two additional integer parameters x and y that represent the values of the variables in the equation.
    • The condition (a * x + b * y = c) specifies that this constructor can only be used when the equation is satisfied with the provided values of x and y.
  2. unsolvable Constructor:

    • This constructor represents the case when there are no integer solutions to the Diophantine equation.
    • It does not require any additional parameters.
    • The condition ¬ (a * x + b * y = c) states that for all integer values of x and y, the equation cannot be satisfied.

The purpose of this DiophantineSolution inductive type is to provide a formal and expressive way to represent Diophantine equations and their solutions or lack thereof within the Lean 4 proof assistant. By using this inductive type, mathematicians and researchers can reason about and prove properties related to Diophantine equations in a structured and rigorous manner. It can be particularly useful in areas of mathematics and computer science where integer solutions to equations are of interest, such as number theory or cryptography.

Utensil Song (Oct 19 2023 at 08:45):

Junyan Xu said:

apply_rules does exist in Lean 4.

Yes, but when I say it doesn't exist, the true issue is I didn't import anything, it's expected to tell me what I need to import (this is where assistants shine in many debugging scenarios), but instead, Morph Prover replied with extra explanation that "there is no built-in tactic apply_rules" which is correct for raw Lean 4 but less helpful. The next suggestion has commas, which is Lean 3.

Adam Topaz (Oct 19 2023 at 11:59):

Is this working for anyone else? I keep getting Error - Expecting value: line 1 column 1 (char 0).

Jesse Michael Han (Oct 19 2023 at 13:15):

Adam Topaz said:

Is this working for anyone else? I keep getting Error - Expecting value: line 1 column 1 (char 0).

network issue, fixed!

Junyan Xu (Oct 25 2023 at 21:40):

Eric Wieser said:

Yaël Dillies said:

We really should make AddMonoidAlgebra the one field structure and Polynomial an abbrev.

I'd be inclined to prioritize this refactor, but maybe others have different opinions

@Jesse Michael Han Do you think you can make the Morph Prover submit commits to fix CI errors once we change the definition and open a WIP pull request? It would be nice if a pipeline is set up for Prover to learn from feedback gathered from comments to its own commits. There are many desired refactors like this (abandoned mathlib3 PRs, an example of mathlib4 PR) but it's unrewarding and often time-consuming for humans to do. I think this task is in line with Morph's goal of solving programming. It would be nice if Prover can ask for help when we're awake and diligently fix errors (or fill sorries?) when we're asleep. (Update: there is also now a list of small TODOs.)

Prover can also check for performance degradation (or improvement) using !bench and use the metrics as reward signals. Currently (according to my impression) it's really hard for humans to predict what the effect of a refactor (or introduction of instances/typeclasses) might have to all the different files in mathlib, and reinforcement learning might work better here.

Another possible use case of Prover is to split big files (see thread). I guess we should have some tooling to verify the declaration names aren't changed and the arguments are the same up to permutation though.

I've also thought about asking Prover to review PRs, which is a great way to interact and learn. However, at Prover's current knowledge/skill level (having tested on Discord myself) I think we'd need to use another repo that mirrors mathlib4 PRs to avoid polluting mathlib4 PRs proper. People interested in training Prover can watch the repo and comment to tell Prover whether its ideas are good or bad. The mathlib reviewer welcome message PR review guide probably makes a good prompt for generating reviews.


Last updated: Dec 20 2023 at 11:08 UTC