Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Llemma


Zhangir Azerbayev (Oct 17 2023 at 16:39):

Hi everyone,

My team has released Llemma, an open language model for mathematics with 7B and 34B parameter versions. Llemma was initialized from Code Llama weights and further trained for up to 200B tokens on mathematical documents.

You can view some of Llemma's outputs on our sample explorer website.

We put a special emphasis on including formal proofs in our training set, for example, by using @Scott Morrison's lean-training-data tool to extract goal/tactic pairs. With few-shot prompting only, Llemma outperforms COPRA and almost matches ReProver on MiniF2F.

ArXiv
Models
Data
Code
Blog post
Sample Explorer

Adam Topaz (Oct 17 2023 at 16:43):

@Zhangir Azerbayev I look forward to trying it out! Just a quick note: I noticed the "our paper" link at the bottom of the readme on the huggingface page is broken, and just links to the same huggingface page: https://huggingface.co/EleutherAI/llemma_7b

Patrick Massot (Oct 17 2023 at 16:45):

Is this something one can actually use or only a research announcement? I mean, is there a tactic I can run in Lean to get suggestions from this model?

Adam Topaz (Oct 17 2023 at 16:50):

The huggingface page says that this was trained on Proof-Pile-2. Just to clarify, does that dataset now include goal/tactic pairs extracted with lean-training-data?

Utensil Song (Oct 17 2023 at 17:00):

Adam Topaz said:

The huggingface page says that this was trained on Proof-Pile-2. Just to clarify, does that dataset now include goal/tactic pairs extracted with lean-training-data?

EDIT: They are extracted separately, but included in the dataset as algebraic_stack/*/lean_proofsteps.jsonl.zst.

B.1.2 LEAN PROOFSTEPS

We extract a dataset of (tactic state, next tactic) pairs from Mathlib 4 (mathlib Community, 2020) using the lean-training-data (Morrison, 2023) tool. We use Mathlib 4 commit c779bd5, which was created on August 20th 2023.
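
If you want to inspect those pairs directly, here is a minimal sketch of streaming one `lean_proofsteps.jsonl.zst` shard with the `zstandard` package; it just prints raw records, since I haven't fixed a schema for them here:

```python
# Minimal sketch: stream records from a lean_proofsteps shard.
# Prints the first few JSON records so you can see the actual schema.
import io
import json

import zstandard  # pip install zstandard

def iter_records(path):
    """Yield one JSON object per line of a .jsonl.zst file."""
    with open(path, "rb") as fh:
        reader = zstandard.ZstdDecompressor().stream_reader(fh)
        for line in io.TextIOWrapper(reader, encoding="utf-8"):
            yield json.loads(line)

for i, rec in enumerate(iter_records("lean_proofsteps.jsonl.zst")):
    print(rec)  # each record should hold a (tactic state, next tactic) pair
    if i >= 2:
        break
```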

Zhangir Azerbayev (Oct 17 2023 at 17:09):

Adam Topaz said:

The huggingface page says that this was trained on Proof-Pile-2. Just to clarify, does that dataset now include goal/tactic pairs extracted with lean-training-data?

This dataset includes the goal/tactic pairs.

Zhangir Azerbayev (Oct 17 2023 at 17:10):

Adam Topaz said:

Zhangir Azerbayev I look forward to trying it out! Just a quick note: I noticed the "our paper" link at the bottom of the readme on the huggingface page is broken, and just links to the same huggingface page: https://huggingface.co/EleutherAI/llemma_7b

Ah thanks for catching this. Will update rn.

Zhangir Azerbayev (Oct 17 2023 at 17:14):

Patrick Massot said:

Is this something one can actually use or only a research announcement? I mean, is there a tactic I can run in Lean to get suggestions from this model?

There is nothing you can use yet. Llemma is more of a research platform, which has to be fine-tuned and packaged with a front-end before it's directly useful. Systems like Morph Prover v0 could swap in Llemma as their base model and expect a substantial gain in capabilities.
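
If you just want to poke at the raw model in the meantime, the standard transformers flow works; a minimal sketch (the prompt here is illustrative, not necessarily our exact evaluation template):

```python
# Minimal sketch: few-shot next-tactic generation with the released weights.
# Needs a GPU with enough memory; the prompt is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/llemma_7b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/llemma_7b", torch_dtype=torch.float16, device_map="auto"
)

prompt = (
    "Given the Lean 4 tactic state, suggest a next tactic.\n\n"
    "Tactic state:\n⊢ ∀ (n : ℕ), 0 + n = n\nNext tactic:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```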

Patrick Massot (Oct 17 2023 at 17:30):

Thanks for this clarification.

Utensil Song (Oct 18 2023 at 03:45):

Six hours ago TheBloke uploaded quantized versions of Llemma, both 7B and 34B. That means small tweaks to the server script of the llm-step tactic could make Llemma available in Lean; one may also need to tweak how llm_step feeds the goal states to the LLM (a rough sketch of such a server is below).
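
A rough sketch of the kind of local server tweak I mean, backing an llmstep-style endpoint with a quantized GGUF file via llama-cpp-python; the endpoint path, JSON shape, and filename are assumptions, so the real llm-step server script would need adapting:

```python
# Rough sketch of an llmstep-style suggestion server backed by a quantized
# Llemma GGUF file. The endpoint path and JSON shape are assumptions; the
# real llm-step server script would need to be adapted accordingly.
from flask import Flask, jsonify, request  # pip install flask
from llama_cpp import Llama                # pip install llama-cpp-python

llm = Llama(model_path="llemma-7b.Q8_0.gguf", n_ctx=2048)  # hypothetical filename
app = Flask(__name__)

@app.post("/suggest")
def suggest():
    prompt = request.get_json()["prompt"]  # the few-shot prompt, built client-side
    out = llm(prompt, max_tokens=64, stop=["---"], temperature=0.0)
    return jsonify({"suggestions": [out["choices"][0]["text"].strip()]})

if __name__ == "__main__":
    app.run(port=5000)
```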

Utensil Song (Oct 18 2023 at 07:09):

I've manually created some prompts from tactic states inside Mathlib proofs and some #mwe posts from Zulip, following the few-shot prompt template (the instruction "Given the Lean 4 tactic state, suggest a next tactic." followed by few-shot examples, with stop string `---`; a sketch of the prompt shape follows the screenshots). Llemma (7B, q8_0 GGUF) generates good-looking Lean 4 proofs, but with no success so far (correct lemma with wrong tactic, inaccurate lemma names, hallucinated lemma names, etc.):

image.png
image.png
image.png
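
For reference, roughly how I assembled those prompts; the worked example pair below is made up, and the real template's examples and spacing may differ:

```python
# Sketch of the few-shot prompt shape described above; the worked example
# pair is made up, and the real template's examples and spacing may differ.
INSTRUCTION = "Given the Lean 4 tactic state, suggest a next tactic."

EXAMPLES = [
    ("⊢ ∀ (n : ℕ), 0 + n = n", "intro n\nsimp"),  # made-up example pair
]

def build_prompt(tactic_state: str) -> str:
    shots = "\n---\n".join(
        f"Tactic state:\n{s}\nNext tactic:\n{t}" for s, t in EXAMPLES
    )
    # generation is meant to be cut at the next "---" via the stop string
    return f"{INSTRUCTION}\n\n{shots}\n---\nTactic state:\n{tactic_state}\nNext tactic:\n"

print(build_prompt("⊢ 2 + 2 = 4"))
```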

For now, it seems to be a new Minerva and is not yet ready for use in proving, although the paper does include a formal-to-formal (Lean 4) example, which worked after a few regenerations:

image.png

P.S. Sometimes `---` is not output, so the model doesn't know when to stop, as shown above.

Configs and chatlog attached:

Disclaimer: the chat log is cherry-picked to demonstrate the better results; it contains only tactic states, not the original statements, because there were too many attempts.


Last updated: Dec 20 2023 at 11:08 UTC