Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: GPT frontend for mathlib4


Scott Morrison (May 04 2023 at 01:11):

Participants here may be interested in two PRs I just made for mathlib4:

  • !4#3785, which adds a simple ChatGPT API written in Lean
  • !4#3786, which uses this to implement a #formalize command.

Very little thought went into the prompting for #formalize, so I'm very happy to take suggestions. Typical usage:

#formalize "The sum of the first n natural numbers."

producing

/-- The sum of the first `n` natural numbers is equal to n * (n + 1) / 2. -/
theorem sum_of_first_n_natural_numbers (n : ℕ) : (n * (n + 1)) / 2 = ∑ i in Finset.range (n + 1), i := sorry

It is intended as a demo that it is easy to write GPT backed tools in Lean, not to be a good autoformalizer.

Scott Morrison (May 04 2023 at 01:11):

A monadic API wrapper that keeps track of the conversation will be added later.

Scott Morrison (May 04 2023 at 01:12):

If anyone would like to point me to an accessible API for other LLMs, I would love to have a swappable backend. :-)

Scott Morrison (May 04 2023 at 01:15):

!4#3785 tries to use the "gpt-4" model, and automatically falls back to "gpt-3.5-turbo" if you don't have access. But in either case you'll need an OpenAI API key to use these.
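For reference, here is a minimal sketch of how an API key could be picked up from the environment; the function name and the variable name OPENAI_API_KEY are assumptions for illustration, not necessarily what the PR does.

-- Minimal sketch (not the PR's code): read an OpenAI API key from an
-- environment variable. `OPENAI_API_KEY` is an assumed variable name.
def getOpenAIKey : IO String := do
  match (← IO.getEnv "OPENAI_API_KEY") with
  | some key => return key
  | none => throw <| IO.userError
      "No OpenAI API key found. Set the OPENAI_API_KEY environment variable."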

Adam Topaz (May 04 2023 at 02:30):

@Scott Morrison what do you think about allowing for a default model? I think something like the following should do it:

structure ChatConfig where
  /-- Model temperature.
  0 is close to deterministic, selecting the most probable token,
  while 1 samples from the predicted probability density on tokens. -/
  temperature : Float := 0.7
  /-- Specify a default model to use. -/
  defaultModel : Option Model := none
  /-- Specify a list of models to use.
  Within each file, if one model fails with an access denied message,
  subsequent attempts will ignore that model. -/
  models : List Model := [.gpt_4, .gpt_35_turbo]
  /-- Retry after server errors. -/
  attempts : Nat := 3
  /-- Log JSON messages to and from https://api.openai.com/ on stdout. -/
  trace : Bool := false

namespace ChatConfig

/-- Select the first model from the configured list that has not been marked as unavailable. -/
def selectModel (cfg : ChatConfig) :
    IO (Option Model) := do
  match cfg.defaultModel with
  | some e => return some e
  | none =>
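      -- `unavailableModels` is assumed to be a mutable reference (e.g. an `IO.Ref`) defined
      -- elsewhere in the PR, recording models that have already returned access-denied.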
      let unavailable ← unavailableModels.get
      return cfg.models.filter (fun m => ¬ unavailable.contains m) |>.head?

end ChatConfig

Adam Topaz (May 04 2023 at 02:31):

Since GPT4 is so much more expensive than 3.5, I think having such an option would be worthwhile.

Mario Carneiro (May 04 2023 at 02:34):

I think we should also have at least one open source model / model you can run on your local machine, even if it is not very good

Scott Morrison (May 04 2023 at 02:36):

Adam Topaz said:

Since GPT4 is so much more expensive than 3.5, I think having such an option would be worthwhile.

Oh, my intent was just that you set models := [.gpt_35_turbo] to change the default.
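For concreteness, a sketch of that override against the ChatConfig structure proposed above (assuming the field defaults shown there):

-- Sketch: a config that only ever tries gpt-3.5-turbo, using the proposed `ChatConfig`.
def cheapConfig : ChatConfig := { models := [.gpt_35_turbo] }

-- Or narrow an existing config:
def onlyGpt35 (cfg : ChatConfig) : ChatConfig := { cfg with models := [.gpt_35_turbo] }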

Adam Topaz (May 04 2023 at 02:36):

Ah I see.

Scott Morrison (May 04 2023 at 02:37):

Mario Carneiro said:

I think we should also have at least one open source model / model you can run on your local machine, even if it is not very good

If someone points me to one, I will try to write the wrapper for it. :-)

Adam Topaz (May 04 2023 at 02:37):

Are there any locally-runnable LLMs whose API matches OpenAI's?

Mario Carneiro (May 04 2023 at 02:38):

I don't think it is important for the API to match OpenAI's

Adam Topaz (May 04 2023 at 02:38):

Well, I'm asking just because the types that Scott wrote would work out of the box if that's the case.

Mario Carneiro (May 04 2023 at 02:40):

I ask because I'm not really comfortable landing this in mathlib4 with only openai options. That's just an ad

Scott Morrison (May 04 2023 at 02:42):

Sure, I agree it is problematic. :-) How would you feel if the backend were a separate project, but we then allowed uses of it, e.g. #formalize or sagredo, from within mathlib4?

Mario Carneiro (May 04 2023 at 02:42):

@Zhangir Azerbayev might have some idea about how to do local models

Scott Morrison (May 04 2023 at 02:42):

I suspect https://github.com/ggerganov/llama.cpp is a good place to start. Their documentation looks nice.

Mario Carneiro (May 04 2023 at 02:42):

that name came up in discussion

Scott Morrison (May 04 2023 at 02:43):

I don't think the exact API matters much. It is easy to parse some text or JSON.

Mario Carneiro (May 04 2023 at 02:44):

yeah I can't imagine there is much complexity to the API for a transformer model

Mario Carneiro (May 04 2023 at 02:50):

Scott Morrison said:

Sure, I agree it is problematic. :-) How would you feel if the backend were a separate project, but we then allowed uses of it, e.g. #formalize or sagredo, from within mathlib4?

Would it work to have said external project as an optional dependency for mathlib? That way you could switch some option in the lakefile and get access to a #formalize command defined elsewhere
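For illustration, a rough sketch of what such an optional dependency could look like in mathlib4's lakefile.lean, following the pattern already used there for doc-gen4; the package name, URL, and llm config flag here are assumptions:

-- Rough sketch, not a final design: an optional dependency guarded by a lakefile config flag,
-- in the style of mathlib4's existing doc-gen4 dependency. Names and URL are illustrative.
meta if get_config? llm = some "on" then
require llm from git "https://github.com/leanprover-community/llm" @ "main"

which could then be enabled with something like lake -Kllm=on build.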

Scott Morrison (May 04 2023 at 02:51):

I guess it could. Why make it an option, though? The dependency would presumably be tiny.

Mario Carneiro (May 04 2023 at 02:52):

not if it includes llama.cpp and/or a cached local model

Mario Carneiro (May 04 2023 at 02:53):

I think it has sufficiently different requirements from mathlib that it could go as a separate project even if it is small. Most of mathlib's dependencies are small projects

Mario Carneiro (May 04 2023 at 02:54):

there is also the maintenance angle, I'm not sure we want mathlib maintainers to become ML researchers

Zhangir Azerbayev (May 04 2023 at 02:54):

llama.cpp is the way to go for local inference. A limitation of using LLaMA, though, is that the GPL3 license doesn't allow us to redistribute the weights, so we'd have to ask users to download the weights and compile the model binary manually.

Scott Morrison (May 04 2023 at 02:55):

Oh, I was assuming that using llama.cpp would require a user to do their own setup.

Scott Morrison (May 04 2023 at 02:56):

So a putative LLM interface in mathlib4, or a project which mathlib4 imports, would be small, plus require the user to do one of:

  • have an OpenAI key
  • download weights and build llama.cpp locally
  • some other equivalent

Zhangir Azerbayev (May 04 2023 at 02:56):

In principle you're supposed to download the weights by applying to meta's access program. In practice, most people get the weights via a torrent.

Scott Morrison (May 04 2023 at 02:57):

But it needs at least some foothold in mathlib4 or people will never use it.

Scott Morrison (May 04 2023 at 02:59):

i.e. if adding an LLM interface as a dependency for mathlib4 is a no go, then I won't bother.

Mario Carneiro (May 04 2023 at 03:01):

we could make it as easy as lake exe install-sagredo or so, which enables the dependency and calls a script in the external dependency that walks the user through the OpenAI key / weights downloading stuff

Scott Morrison (May 04 2023 at 03:01):

Okay, I could live with that. It feels like the code to enable the dependency might be more complicated than the current PR. :-)

Scott Morrison (May 04 2023 at 03:02):

Also it seems strange to have polyrith in mathlib4 but draw the line at this. Both are just curl to a JSON API...

Mario Carneiro (May 04 2023 at 03:02):

well, I expect that this won't be the extent of the ML stuff we will see

Mario Carneiro (May 04 2023 at 03:03):

I would prefer to put this stuff in a separate project to make it easier for ML folks to play with it

Scott Morrison (May 04 2023 at 03:03):

At the moment we have none, which is sad, because people keep writing proof of principle demos that the rest of us can't use.

Mario Carneiro (May 04 2023 at 03:04):

so our (mathlib's) main focus should be on enabling that interface and otherwise getting out of the way

Scott Morrison (May 04 2023 at 03:05):

I guess I have the opposite perspective: the ML folks don't seem interested in writing tools that mathematicians actually get to use. Where is our premise selection? I want to be able to write tactics, in mathlib, that call a premise selection function. I'm happy if mathlib has the dumbest possible implementation of that function, as long as there is an easy way for a user to hook into someone else's function that requires downloading a model.

Scott Morrison (May 04 2023 at 03:06):

But if you can only write something that relies on having premise selection code in a place that can never be run from mathlib, what is the point?

Mario Carneiro (May 04 2023 at 03:07):

I'm not saying it can never be run from mathlib, just the opposite

Mario Carneiro (May 04 2023 at 03:07):

it being a dependency of mathlib means it can be used anywhere

Mario Carneiro (May 04 2023 at 03:07):

it being optional means you have to be able to live without it though

Mario Carneiro (May 04 2023 at 03:08):

because there will be people who can't be bothered to get an openai key or download some big binary blob

Scott Morrison (May 04 2023 at 03:09):

Mario Carneiro said:

because there will be people who can't be bothered to get an openai key or download some big binary blob

And for whom #formalize could give a helpful error message explaining that it isn't available unless they do one of those things?

Scott Morrison (May 04 2023 at 03:09):

Okay. Sometime later I will move those two PRs to a new repo, and maybe then you can explain to me how you'd like to make them available from mathlib. :-) I will have to move one or two functions to Std first.

Scott Morrison (May 04 2023 at 05:42):

First by-hand attempt at running #formalize on LLaMA:

This is not possible because the language Lean does not exist yet.

:-)

Johan Commelin (May 04 2023 at 05:43):

Maybe LLaMa can invent the language Lean for us? :grinning:

Scott Morrison (May 05 2023 at 11:29):

Okay, the two PRs mentioned above have now been replaced by #formalize, backed by a choice of LLMs.

All the LLM interaction code is in a separate repository, https://github.com/leanprover-community/llm, and it supports talking to GPT or to locally running models via pygpt4all or llama.cpp. You can select which model you want to use from a downstream project, or just call findChatBot, which attempts to find the first model that works (in the order I listed the models above) and, if it finds none, fails with an error message explaining how to set one up.

The mathlib4 code is just a prompt, and a couple of lines to implement the #formalize command. Example outputs here. The summary is that you would not want to use a local model for #formalize. :-)

Scott Morrison (May 05 2023 at 11:35):

And the same philosophy applies as before: #formalize may just be a toy, entertaining for spoiling traditional homework problems when teaching Lean, but hopefully it is a good demo of how to use LLMs from within mathlib.

Adam Topaz (Jun 06 2023 at 23:34):

Hi all. I hacked together a few little tools using some vector embeddings and GPT (using Scott's GPT frontend). Here's a demo:
find-with-gpt-demo.gif

There are four new commands:

  1. #find_name searches theorems based on a given name, using cosine similarity.
  2. #find_type searches theorems based on their type, using cosine similarity.
  3. #find_name_with_gpt uses GPT to generate (i.e. hallucinate) a name for a hypothetical theorem based on a natural language prompt, and looks for theorems with similar names.
  4. #find_type_with_gpt uses GPT to generate/hallucinate the type of a hypothetical theorem based on a natural language prompt, and looks for theorems with similar types.

The GPT prompts for 3 and 4 are very naive and there's a ton of room for improvement.

At this point, this work is not quite ready for prime time, as there is quite a lot of setup required to get this working. If anyone is willing to help in reducing the setup overhead, please let me know!
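For anyone unfamiliar with the scoring used by the #find commands above, here is a minimal Lean sketch of cosine similarity between two embedding vectors; it is just the formula, not the code from the branch:

-- Minimal sketch of cosine similarity: dot(a, b) / (‖a‖ * ‖b‖).
-- Assumes nonzero vectors of equal length; not the branch's actual implementation.
def dot (a b : Array Float) : Float :=
  (a.zip b).foldl (fun acc (x, y) => acc + x * y) 0.0

def cosineSimilarity (a b : Array Float) : Float :=
  dot a b / (Float.sqrt (dot a a) * Float.sqrt (dot b b))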

Scott Morrison (Jun 07 2023 at 01:35):

Fun! It looks like ... cosine similarity doesn't work very well? :-) It's hard to judge from the screenshots but often it's not finding what you want, or it is way down the list.

Scott Morrison (Jun 07 2023 at 01:35):

I'd love to play with it. :-)

Adam Topaz (Jun 07 2023 at 01:47):

Yeah, just using cosine similarity doesn't get too far. One thing I found it to be very useful for is helping me find the Mathlib4 name given a name with an approximate Mathlib3 naming convention. The current version doesn't do this, but one thing that works better is querying for some results with cosine similarity search, then feeding those into GPT for a second round to refine its guess. I wanted to implement something like that which can do n such iterations for arbitrary n, but haven't had a chance to do it yet.

Adam Topaz (Jun 07 2023 at 01:49):

The code is in the mathlib4 find-with-gpt branch, and it uses two repos https://github.com/adamtopaz/lean_pinecone and https://github.com/adamtopaz/lean_embedding

The first is a small interface for pinecone.io (the vector database I used), and the second is an interface for OpenAI's embedding API.

Adam Topaz (Jun 07 2023 at 01:50):

I was looking for a single-file vector database to use instead of this hosted one, but I didn't find any good option. Does anyone know of one? Something like sqlite, but with good support for vectors?

Adam Topaz (Jun 07 2023 at 01:52):

If you want to get this working you need the following:

  1. an openai api key (set as an env variable)
  2. a pinecone api key, pinecone project id, and pinecone index (set as env variables)
  3. A pinecone index set up properly

You then run lake exe embed go, which generates embeddings for both types and names and stores them in (huge) jsonl files; it then takes the data from those files and uploads it to Pinecone. I tried just using the jsonl files directly and using Lean for the dot product calculation, and unsurprisingly, it was way too slow.

Adam Topaz (Jun 07 2023 at 01:55):

Right now, the code only gets embeddings for theorems, found using ConstantInfo.thmInfo. This results in a little over 100K declarations in mathlib4. Computing the embeddings costs about 5-10 USD in API fees, and produces two files which are a little over 2GB in size each.

Adam Topaz (Jun 07 2023 at 02:00):

Oh, this is a bit silly, but I also found it useful for finding the name of the module of some theorems I was looking for :)

Junyan Xu (Jun 07 2023 at 04:25):

Adam Topaz said:

I was looking for a single-file vector database to use instead of this hosted one, but I didn't find any good option. Does anyone know of one? Something like sqlite, but with good support for vectors?

This post suggests that you could simply use numpy :) As for vector databases, Auto-GPT used to support Redis, Milvus, and Weaviate in addition to Pinecone, for example.

Update: another list from Microsoft:
AzureCognitiveSearch, DuckDB, Pinecone, Postgres, Qdrant, Redis, Sqlite, Weaviate

Scott Morrison (Jun 07 2023 at 05:25):

I'm curious if the dot products were slow, or loading the JSON files. I'm hoping it's the latter, and you can just use pickle / unpickle for storage. (We use this for the library_search cache, for example, and it is fast.)

Adam Topaz (Jun 07 2023 at 14:37):

Scott Morrison said:

I'm curious if the dot products were slow, or loading the JSON files. I'm hoping it's the latter, and you can just use pickle / unpickle for storage. (We use this for the library_search cache, for example, and it is fast.)

When I tried this out, I wanted to minimize memory consumption as much as possible for some reason. So what I ended up doing is streaming the 2.2GB file line by line and doing the computation that way. But maybe it's safe to assume that everyone now has 32GB of RAM, so the pickle/unpickle approach might be ok. I'll try it out soon.

Adam Topaz (Jun 07 2023 at 14:43):

OTOH, there are some quite serious benefits to using some sort of database. For example, I envision storing hashes of names/types, and only recomputing embeddings when the hash changes.
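A rough sketch of what that hash check could look like; every name below is made up for illustration, and none of it is code from the branch:

-- Hypothetical cache entry pairing a declaration with the hash of the text that was
-- embedded, so the embedding is only recomputed when that hash changes.
structure CachedEmbedding where
  declName  : String
  textHash  : UInt64
  embedding : Array Float

/-- Decide whether a declaration's embedding needs recomputing, given the current
pretty-printed text and a cache entry (if any). -/
def needsRecompute (text : String) (cached : Option CachedEmbedding) : Bool :=
  match cached with
  | none => true
  | some entry => entry.textHash != hash text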

Fabian Glöckle (Jun 07 2023 at 22:39):

https://github.com/facebookresearch/faiss
is said to be very fast and easily deployable

Adam Topaz (Jun 07 2023 at 22:44):

I thought FAISS was a library that has to be connected to some other database.

Scott Morrison (Jun 08 2023 at 01:02):

FAISS just operates in-memory for the most part, and has simple functions to read and write indices to disk. There's a section in the docs for indices that don't fit in RAM.

Zhangir Azerbayev (Jun 08 2023 at 01:03):

Adam Topaz said:

I thought FAISS was a library that has to be connected to some other database.

For the mathlib-semantic-search demo (which no longer works because openai discontinued code-davinci-002), we used FAISS and our database was a numpy array.

I think that numpy's save/load utility makes more sense than pickle, since it is specifically optimized for matrices. Edit: Scott pointed out that FAISS has a save_index api, which will be better than numpy save/load.

For most use cases, the exotic vector databases are overkill.

Scott Morrison (Jun 08 2023 at 01:05):

(The pickle I was referring to was mathlib4's src4#pickle, btw)

Scott Morrison (Jun 08 2023 at 01:05):

But doing this in Lean is probably a bad idea. :-)

Adam Topaz (Jun 08 2023 at 01:09):

FWIW, I experimented today by trying to save embeddings using lean’s pickle for about 1/7th of mathlib4, and ran out of memory on my 32GB machine.

Adam Topaz (Jun 08 2023 at 01:11):

I think I’ll try using faiss with some interface between python and lean. Should be straightforward.

Adam Topaz (Jun 08 2023 at 01:13):

Another experiment I did today: I saved (randomly generated) “embeddings” for all of mathlib (both names and types) in a SQLite database, and the file was ~6GB. It’s big, but not completely unreasonable to distribute.

Zhangir Azerbayev (Jun 08 2023 at 01:14):

Another good option is to try using the FAISS C++ api. This will allow you to compile a binary and call it from Lean, which might be simpler than interfacing with python.

Adam Topaz (Jun 08 2023 at 01:16):

Yeah, but I haven’t written any C++ since 2006 :rofl:

Junyan Xu (Jun 09 2023 at 03:07):

Some nice proofs found by retrieval from https://github.com/zhangir-azerbayev/ProofNet/pull/14


Last updated: Dec 20 2023 at 11:08 UTC